Pavel Punsky 5959ecfb13 Add UDP-GSO send path (--udp-gso) (#1907)
## Summary

- New `--udp-gso` flag (Linux, requires `--udp-sendmmsg`) collapses
same-destination, same-size sendmmsg batches into a single `sendmsg`
with a `UDP_SEGMENT` cmsg, so the kernel allocates one super-skb that
traverses the network stack once and is segmented at egress instead of
running `udp_sendmsg → ip_finish_output → __dev_queue_xmit` per
datagram.
- Also wraps the relay-side `recvmmsg` callback loop in
`udp_sendmmsg_batch_begin/end` so that peer→client sends triggered inside
a recv batch can also coalesce. Without that wrapping, the relay path
issues one `sendto` per delivered datagram.
- Sticky-disable on `EINVAL` / `ENOPROTOOPT` for older kernels/NICs that
lack UDP-GSO: one warning is logged, then sends fall back transparently
to the existing `sendmmsg` and `udp_send` paths.
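
As a rough illustration of the mechanism described in the bullets above, here is a minimal sketch of a GSO send on Linux. It is not the code from this PR: `gso_ctrl`, `gso_attach_segment_size`, `gso_errno_is_unsupported`, `gso_send`, and the caller-owned `gso_disabled` flag are hypothetical names, and the sketch assumes the caller has already laid out equal-size datagrams for a single destination in `iov`.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SOL_UDP
#define SOL_UDP 17 /* same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* from linux/udp.h; absent from older libc headers */
#endif

/* Control buffer sized and aligned for exactly one UDP_SEGMENT cmsg. */
union gso_ctrl {
  char buf[CMSG_SPACE(sizeof(uint16_t))];
  struct cmsghdr align;
};

/* Hypothetical helper: attach the per-segment payload size to msg. */
void gso_attach_segment_size(struct msghdr *msg, union gso_ctrl *ctrl,
                             uint16_t seg_size) {
  memset(ctrl, 0, sizeof(*ctrl));
  msg->msg_control = ctrl->buf;
  msg->msg_controllen = sizeof(ctrl->buf);
  struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
  cm->cmsg_level = SOL_UDP;
  cm->cmsg_type = UDP_SEGMENT;
  cm->cmsg_len = CMSG_LEN(sizeof(seg_size));
  memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));
}

/* The two errors the summary treats as "UDP-GSO unavailable". */
bool gso_errno_is_unsupported(int err) {
  return err == EINVAL || err == ENOPROTOOPT;
}

/* Hypothetical helper: send iov as one super-datagram that the kernel
 * splits every seg_size bytes. On an "unsupported" error the flag is set
 * and stays set; the caller is expected to resend via its non-GSO path. */
ssize_t gso_send(int fd, const struct sockaddr *dst, socklen_t dst_len,
                 struct iovec *iov, size_t iov_cnt, uint16_t seg_size,
                 bool *gso_disabled) {
  if (*gso_disabled) {
    errno = ENOPROTOOPT; /* already disabled: skip the syscall entirely */
    return -1;
  }
  union gso_ctrl ctrl;
  struct msghdr msg;
  memset(&msg, 0, sizeof(msg));
  msg.msg_name = (void *)dst;
  msg.msg_namelen = dst_len;
  msg.msg_iov = iov;
  msg.msg_iovlen = iov_cnt;
  gso_attach_segment_size(&msg, &ctrl, seg_size);
  ssize_t n = sendmsg(fd, &msg, 0);
  if (n < 0 && gso_errno_is_unsupported(errno)) {
    *gso_disabled = true; /* sticky: never retried on this flag */
  }
  return n;
}
```

Keeping the disable flag outside the helper is one possible design; it lets the caller decide whether the flag is per-socket or process-wide and where the single warning gets logged.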

## Why

The `--udp-recvmmsg` and `--udp-sendmmsg` follow-ups confirmed (see
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md)) that
on the relay flood workload the dominant cost is the per-datagram kernel
TX path. mmsg-style batching reduces only the syscall entry/exit, not
the per-skb stack traversal — UDP-GSO collapses both.

## Result

DigitalOcean nyc1 c-4, 30 s alternating A/B, `-Y packet -m 1`, eth1 TX
as the authoritative server forwarding metric:

| Variant | eth1 RX (pps) | eth1 TX (pps) | sys CPU | idle CPU |
|---|---:|---:|---:|---:|
| baseline (no flags) | 322,091 | 127,445 | 22.9 % | 67.5 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 266,068 | **257,996** | 15.0 % | 78.7 % |
| baseline (no flags) | 309,475 | 125,573 | 20.9 % | 70.7 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 275,992 | **225,366** | 14.9 % | 74.3 % |

Mean server forwarding rate: **126.5 k → 241.7 k pps (+91 %, 1.91×)**,
mean system CPU **21.9 % → 14.9 %** — about **2.8× CPU efficiency** (TX
pps per system-CPU-%). Full perf-children comparison and methodology in
the new section of
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md).
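
For readers who want to re-derive the headline numbers, the sketch below recomputes them from the four table rows. The function names are hypothetical, and the table values are hard-coded purely for illustration.

```c
/* Mean of the two A/B runs reported for each variant. */
double mean2(double a, double b) { return (a + b) / 2.0; }

/* CPU efficiency as defined above: TX pps per system-CPU percent. */
double tx_pps_per_sys_cpu(double tx_pps, double sys_cpu_pct) {
  return tx_pps / sys_cpu_pct;
}

/* Mean GSO forwarding rate divided by mean baseline forwarding rate. */
double forwarding_speedup(void) {
  return mean2(257996.0, 225366.0) / mean2(127445.0, 125573.0);
}

/* Ratio of the two efficiency figures, using the per-variant means. */
double efficiency_gain(void) {
  double base = tx_pps_per_sys_cpu(mean2(127445.0, 125573.0),
                                   mean2(22.9, 20.9));
  double gso = tx_pps_per_sys_cpu(mean2(257996.0, 225366.0),
                                  mean2(15.0, 14.9));
  return gso / base;
}
```

`forwarding_speedup()` comes out at roughly 1.91 and `efficiency_gain()` at roughly 2.8, matching the figures quoted above.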

## Notes for reviewers

- `--udp-gso` is opt-in and requires `--udp-sendmmsg` (the help text
states the dependency). Without `--udp-sendmmsg` the batch state never
accumulates and GSO has nothing to flush.
- GSO eligibility resets on every `udp_sendmmsg_batch_begin/end` pair.
Mixed-destination, mixed-size, or oversize batches fall back
transparently through `sendmmsg` / `udp_send`.
- Rebased onto current `master`; the recvmmsg dependency is already
merged via #1906.
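
The eligibility rule in the notes above could be modelled roughly as follows. This is a simplified sketch, not the PR's implementation: `batch_entry`, `same_dst`, and `batch_gso_eligible` are hypothetical, only IPv4 destinations are handled, and the segment-count and total-size limits are passed in as parameters because the values the PR actually uses are not stated here.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued datagram: destination plus payload length. */
struct batch_entry {
  struct sockaddr_in dst;
  size_t len;
};

/* Two IPv4 destinations match when family, port, and address all agree. */
bool same_dst(const struct sockaddr_in *a, const struct sockaddr_in *b) {
  return a->sin_family == b->sin_family && a->sin_port == b->sin_port &&
         a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/* A batch qualifies for a single GSO send only if every entry has the
 * same destination and the same non-zero size, and the batch stays within
 * the caller-supplied limits. Anything else is left to the fallback path. */
bool batch_gso_eligible(const struct batch_entry *e, size_t n,
                        size_t max_segs, size_t max_total) {
  if (n < 2 || n > max_segs) {
    return false; /* a single datagram gains nothing from GSO */
  }
  size_t total = 0;
  for (size_t i = 0; i < n; i++) {
    if (e[i].len == 0 || e[i].len != e[0].len) {
      return false; /* mixed-size batch */
    }
    if (!same_dst(&e[i].dst, &e[0].dst)) {
      return false; /* mixed-destination batch */
    }
    total += e[i].len;
  }
  return total <= max_total; /* false here means an oversize batch */
}
```

Because the check is a pure function of the queued entries, running it at flush time gives every batch a fresh decision, which is one way to get the per-batch reset described above.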

## Test plan

- [x] `cmake --build build --target turnserver` (RelWithDebInfo + ASan
local builds clean)
- [x] `ctest --test-dir build --output-on-failure`: 3/3 unit tests pass
- [x] `examples/run_tests.sh`: TCP/TLS/UDP pass; DTLS fails in the macOS
environment, a pre-existing failure unrelated to this change
- [x] DigitalOcean A/B perf validation captured above
- [ ] Reviewer to confirm CI green on Linux build/test/CodeQL

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 08:05:38 -07:00