1935 Commits

Author SHA1 Message Date
Pavel Punsky fb94ab117d turnutils_uclient: sender thread pool + UDP-GSO send batching + recv_pps reporting (#1913)
## Summary

Three related changes to `turnutils_uclient` that together stop the
loadgen from being the bottleneck when benchmarking the relay:

1. **Sender thread pool** (`--sender-threads <N>`, max 4, auto-bumped to
2 at `-m >= 4`). Mirrors the listener pool that landed in #1911. Each
sender thread owns its own libevent base, a session shard (round-robin
assigned at allocation time via `elem->sender_id`), and a 100 µs timer
that runs the burst loop just like the legacy main-thread
`timer_handler` did. Send-side counters (`tot_send_messages`,
`tot_send_bytes`, `tot_send_dropped`, `load_sent_packets`) and the
completion accumulators in `client_timer_handler` (`total_loss` /
`total_latency` / `total_jitter`) are written into per-thread
cache-line-aligned slabs and reduced into the globals after
`pthread_join`. This avoids the cross-core atomic-counter contention
that the listener-pool work already documented.

2. **UDP-GSO send batching** in `send_buffer` for the plain-UDP path.
The sender pool opens a thread-local batch window around its per-tick
iteration; within the window, `send_buffer` copies the payload into a
per-thread slot and appends to a scatter-gather `iov[]`. On flush:
- **If `count > 1` and all segments share the same size** → one
`sendmsg(2)` with a `UDP_SEGMENT` cmsg.
- **If GSO is unavailable** (kernel returns
`EINVAL`/`ENOPROTOOPT`/`EOPNOTSUPP`) → sticky-disable per thread, fall
back to `sendmmsg(2)` over the same iov array.
- **Per-entry `send(2)`** as the final fallback for whatever sendmmsg
refused (EAGAIN tail, etc.).

Auto-flush triggers: different fd (next session in iteration), different
segment size, batch capacity (64), or end of iteration.

3. **`recv_pps` in `print_load_generator_rate`**, alongside the existing
`send_pps`. Once the sender pool + GSO let uclient push >>1 Mpps of UDP,
the meaningful end-to-end metric is the round-trip count, not the
send-side count — the relay/peer pipeline drops 95+% of packets when
uclient outpaces it. The progress line now reads:

    send_pps=6012928.00, recv_pps=101486.00, total_sent=112975924, total_recv=1853369
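
The GSO flush rule from item 2 can be sketched as below. This is a minimal illustration, not the code from the PR: `batch_gso_eligible` and `gso_fill_msghdr` are hypothetical helper names, and the fallback `#define`s assume the Linux values for `SOL_UDP` and `UDP_SEGMENT` when the libc headers do not provide them.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Assumed Linux values, used only if the headers lack them. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/* Hypothetical helper: a batch qualifies for a single GSO sendmsg only
 * when it holds more than one segment and all segments share one size. */
static bool batch_gso_eligible(const struct iovec *iov, size_t count) {
  if (count < 2) {
    return false;
  }
  for (size_t i = 1; i < count; ++i) {
    if (iov[i].iov_len != iov[0].iov_len) {
      return false;
    }
  }
  return true;
}

/* Hypothetical helper: point msg at the scatter-gather array and attach
 * a UDP_SEGMENT control message carrying the per-segment size. */
static void gso_fill_msghdr(struct msghdr *msg, struct iovec *iov, size_t count,
                            uint16_t seg_size, void *ctrl, size_t ctrl_len) {
  assert(ctrl_len >= CMSG_SPACE(sizeof(uint16_t)));
  memset(msg, 0, sizeof(*msg));
  memset(ctrl, 0, ctrl_len);
  msg->msg_iov = iov;
  msg->msg_iovlen = count;
  msg->msg_control = ctrl;
  msg->msg_controllen = CMSG_SPACE(sizeof(uint16_t));

  struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
  cm->cmsg_level = SOL_UDP;
  cm->cmsg_type = UDP_SEGMENT;
  cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
  memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));
}
```

A caller would pass the filled `msghdr` to `sendmsg(2)` on a connected UDP socket and, if the kernel answers `EINVAL`, `ENOPROTOOPT` or `EOPNOTSUPP`, remember that per thread and fall back to `sendmmsg(2)` over the same `iov[]`.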

## Why

Benchmarking `--multiplex-client` / `--multiplex-peer` on a c-4
DigitalOcean droplet, the loadgen's single-threaded `timer_handler`
saturated one CPU around 300 kpps regardless of `-m`. The relay was
never put under real pressure, so the multiplex paths' value couldn't be
measured. With this patch the loadgen can produce >6 Mpps from a single
c-4 droplet, far above the relay's per-thread saturation point, so the
bottleneck moves to the server where it belongs.

## Benchmark — multiplex-client turnserver, c-4 loadgen, m=4, 20 s

| Round | OLD (master) | NEW (this PR) | Lift |
|-------|--------------|---------------|------|
| 1 | 246k send_pps | 7.48M | 30.4× |
| 2 | 459k | 6.06M | 13.2× |
| 3 | 360k | 5.07M | 14.1× |
| **avg** | **355k** | **6.20M** | **17.5×** |

Throughput cap shifts from loadgen to relay. End-to-end recv_pps (which
is now first-class in the progress line) is ~100 kpps in this
configuration — limited by the relay, not uclient.

## Design notes

- **Cache-line alignment** on `uclient_sender` mirrors the
listener-pool's slab pattern. Same false-sharing trap, same fix.
- **Main-thread timer slows to 10 ms** when the sender pool is engaged.
The main timer still fires for lifecycle / `__turn_getMSTime` refresh,
but `timer_handler` early-returns when `num_sender_threads > 0` so we
don't burn a core on no-op 100 µs ticks.
- **Stop ordering**: `stop_sender_threads()` runs before
`stop_listener_threads()` — the senders own session mutation (wmsgnum,
to_send_timems, shutdown), so joining them first prevents a race where a
listener accumulates a stat into a session whose owning sender is still
iterating it.
- **UDP-GSO copy**: the per-slot memcpy is intentional. The caller
(`client_write`) reuses `elem->out_buffer` across burst iterations, so
pointing `iov[i]` at the session buffer would alias all entries to the
most recent payload. A rotating per-session output ring would eliminate
the copy — left out of this PR because the kernel-side savings from
collapsing N sendmsg into one GSO sendmsg dominate the per-packet copy
cost at the rates we measured.
- **Linux-only**: send-side batching machinery is gated by `#if
defined(__linux__)`. Non-Linux builds get no-op
`uclient_send_batch_begin`/`_end` and `uclient_tx_enqueue` returns
false, falling through to the legacy `send(2)` loop.
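
The slab pattern from the first design note can be illustrated with a small sketch. It assumes a 64-byte cache line; `sender_slab`, `sender_ctx` and `run_sender_pool` are hypothetical names for illustration, not the structures in `uclient`.

```c
#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumed cache-line size */
#define MAX_SENDER_THREADS 4 /* the PR caps the pool at 4 */

/* One slab per sender thread. The alignment pads each slab to its own
 * cache line so two threads never write to the same line. */
struct sender_slab {
  uint64_t send_messages;
  uint64_t send_bytes;
} __attribute__((aligned(CACHE_LINE_SIZE)));

struct sender_ctx {
  pthread_t tid;
  struct sender_slab slab;
  uint64_t packets;      /* packets this thread pretends to send */
  uint64_t payload_size; /* bytes per packet */
};

static void *sender_main(void *arg) {
  struct sender_ctx *ctx = arg;
  for (uint64_t i = 0; i < ctx->packets; ++i) {
    /* Plain stores: only the owning thread writes its slab. */
    ctx->slab.send_messages += 1;
    ctx->slab.send_bytes += ctx->payload_size;
  }
  return NULL;
}

/* Start n senders, join them, then reduce the slabs into the totals.
 * Returns 0 on success, -1 on a bad thread count or a failed start. */
static int run_sender_pool(unsigned n, uint64_t packets_each,
                           uint64_t payload_size, uint64_t *tot_msgs,
                           uint64_t *tot_bytes) {
  struct sender_ctx ctx[MAX_SENDER_THREADS];
  unsigned started = 0;

  if (n == 0 || n > MAX_SENDER_THREADS) {
    return -1;
  }
  for (unsigned i = 0; i < n; ++i) {
    ctx[i].slab.send_messages = 0;
    ctx[i].slab.send_bytes = 0;
    ctx[i].packets = packets_each;
    ctx[i].payload_size = payload_size;
    if (pthread_create(&ctx[i].tid, NULL, sender_main, &ctx[i]) != 0) {
      break;
    }
    ++started;
  }
  *tot_msgs = 0;
  *tot_bytes = 0;
  for (unsigned i = 0; i < started; ++i) {
    pthread_join(ctx[i].tid, NULL); /* the join orders the slab reads */
    *tot_msgs += ctx[i].slab.send_messages;
    *tot_bytes += ctx[i].slab.send_bytes;
  }
  return started == n ? 0 : -1;
}
```

Because the reduction happens only after `pthread_join`, the hot loop needs no atomics at all.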

## Test plan

- [x] macOS local build (Apple Silicon, AppleClang). Sender-pool code
paths compile under both Linux and non-Linux gates.
- [x] `clang-format-15 --dry-run --Werror` clean.
- [x] Linux build on a c-4 Ubuntu 24.04 droplet (`cmake
-DCMAKE_BUILD_TYPE=Release`).
- [x] `--help` includes the new `--sender-threads` option with
valid-range hint; out-of-range values rejected.
- [x] Benchmark on two c-4 droplets in nyc1 against `turnserver
--multiplex-client`: 3 alternating rounds OLD vs NEW, +17.5× average
send-side lift (data table above).
- [x] `print_load_generator_rate` output verified — `send_pps`,
`recv_pps`, `total_sent`, `total_recv` all populated and consistent
across listener slab reductions.

## Limitations

- `--multiplex-peer` is not driven by this PR. uclient's pattern (each
`-m N` opens two internal sessions per client that share the same peer
port) hits the multiplex-peer "one allocation per peer endpoint" rule;
benchmarking that flag at high concurrency requires a separate small
change (per-session secondary peer port) — not in scope here.
- The wider per-round variance under the sender pool (rounds in our
bench ranged 13×–30× lift) is timing/scheduler noise at small per-thread
shards. It smooths out as `-m` and per-thread session counts grow.
2026-05-11 20:59:12 -07:00
Pavel Punsky f7bb459357 Fix memory leak introduced by recvmmsg path (#1912) 2026-05-11 16:56:22 -07:00
Pavel Punsky df8912db5a turnutils_uclient: multi-threaded listener (recv) pool (#1911)
> Updated 2026-05-11 — three follow-up improvements applied on top of
the original draft, with a single-trial real-Linux confirmation run.
Auto threshold lowered, per-listener counter slabs added (eliminating
the K=2/K=4 cache-line regression), per-elem stats atomicised.

## What

Adds an N-thread receive pool to `turnutils_uclient`. Main thread keeps
owning the sender timer, the lifecycle, and the control plane; `EV_READ`
events for client UDP sockets are routed to one of N listener threads,
each with its own libevent base. Sessions are sharded round-robin across
listeners at allocation time and pinned to their owning listener for the
lifetime of the test.

### State ownership / synchronisation

- **Per-session bookkeeping** (`recvmsgnum`, `recvtimems`, `rmsgnum`):
mutated only by the owning listener thread — no locking, no atomics.
- **Per-session running stats** (`elem->loss/latency/jitter`): listener
does `__atomic_fetch_add`, the timer on main harvests with
`__atomic_exchange_n` (read-and-zero atomically). Closes the race that
existed in the original draft where a listener increment landing between
the timer's read and its zero-store would be lost.
- **Per-test totals** (`tot_recv_messages`, `tot_recv_bytes`, min/max
latency/jitter): each listener has its own cache-line-aligned slab
inside `uclient_listener`. The listener writes into its own slab (no
atomic); main reads on-demand via snapshot helpers (atomic loads, no
contention with writers); `stop_listener_threads()` folds slabs back
into the global totals before reporting. An early draft used a single
shared atomic counter and the K=2/K=4 path regressed ~19× on `-m {4,
8}`; the slabs close that gap.
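
The read-and-zero harvest for the per-session running stats can be sketched as follows. The struct and function names are hypothetical; only the `__atomic_fetch_add` / `__atomic_exchange_n` pairing comes from the description above, and relaxed ordering is an assumption made for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-session running stats. */
struct elem_stats {
  uint64_t loss;
  uint64_t latency;
  uint64_t jitter;
};

/* Listener side: add one sample to the running sums. */
static void listener_add_sample(struct elem_stats *s, uint64_t loss,
                                uint64_t latency, uint64_t jitter) {
  __atomic_fetch_add(&s->loss, loss, __ATOMIC_RELAXED);
  __atomic_fetch_add(&s->latency, latency, __ATOMIC_RELAXED);
  __atomic_fetch_add(&s->jitter, jitter, __ATOMIC_RELAXED);
}

/* Timer side: read and zero each sum in one atomic step, so an
 * increment that lands concurrently is either included in this harvest
 * or left for the next one, never lost. */
static void timer_harvest(struct elem_stats *s, struct elem_stats *totals) {
  totals->loss += __atomic_exchange_n(&s->loss, 0, __ATOMIC_RELAXED);
  totals->latency += __atomic_exchange_n(&s->latency, 0, __ATOMIC_RELAXED);
  totals->jitter += __atomic_exchange_n(&s->jitter, 0, __ATOMIC_RELAXED);
}
```

A separate load followed by a store of zero would reopen the window that the exchange closes.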

Sits on top of the recvmmsg path landed in #1910.

## CLI

```
-K N, --listener-threads N
    Number of receive threads. Default auto:
      -m < 2   ->  K=0  (no worker thread, recv on main event_base)
      -m >= 2  ->  K=1  (one worker thread, clean send/recv split)
    -K overrides the auto rule. Capped at 4.
```

`start_listener_threads()` logs the resolved count and whether it came
from `-K` or the auto rule, so an operator can confirm what actually
ran.
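
The resolution rule in the help text can be written down as a small function. This is a sketch under assumptions: `resolve_listener_threads`, `UCLIENT_MAX_LISTENERS` and `K_UNSET` are hypothetical names, and rejecting out-of-range values (rather than clamping them) follows the `-K 99` case in the test plan.

```c
#define UCLIENT_AUTO_LISTENERS_THRESHOLD 2 /* value after the follow-ups */
#define UCLIENT_MAX_LISTENERS 4            /* hypothetical name for the cap */
#define K_UNSET (-1)                       /* hypothetical "no -K given" marker */

/* Returns the listener-thread count, or -1 if an explicit -K value is
 * out of range. *from_auto is set to 1 when the auto rule decided. */
static int resolve_listener_threads(int m, int k_requested, int *from_auto) {
  if (k_requested != K_UNSET) {
    if (k_requested < 0 || k_requested > UCLIENT_MAX_LISTENERS) {
      return -1;
    }
    *from_auto = 0;
    return k_requested;
  }
  *from_auto = 1;
  return (m >= UCLIENT_AUTO_LISTENERS_THRESHOLD) ? 1 : 0;
}
```

Returning the `from_auto` flag alongside the count is what lets a start-up log line say whether the value came from `-K` or from the auto rule.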

## Real-Linux bench (DigitalOcean, 2× c-4 / 4 vCPU, private VPC)

**Single-trial confirmation** of this PR's follow-ups vs the previous
3-trial baseline.

### After follow-ups (this PR head, 1 trial, recv pps)

| `-m` | **K=0** | **K=1** | K=2 | K=4 |
|---:|---:|---:|---:|---:|
| 1 | **4545** | 3226 | 3226 | 3226 |
| 2 | 6250 | **4878** | 4878 | 153 |
| 4 | **7692** | 6557 | 371 | 350 |
| 8 | **8602** | 689 † | 708 | 199 |
| 16 | 1174 | **1339** | 1281 | 1283 |
| 32 | 1445 | 1517 | **2149** | 1421 |

† K=1 `-m=8` = 689 looks like a single-trial outlier (the matching K=0
row finishes in 0.93 s sending all 8000 packets; K=1 hit the 14 s
timeout with similar throughput). Re-running would clarify, but per the
user's "1 iteration" instruction this is the data we have.

### Comparison vs the previous 3-trial averages

| `-m` | metric | before follow-ups | after follow-ups | delta |
|---:|:---|---:|---:|:---|
| 4 | K=0 recv pps | 5123 | **7692** | **+50 %** |
| 8 | K=0 recv pps | 3967 | **8602** | **+117 %** |
| 2 | K=2 recv pps | 1749 | **4878** | **+179 %** |
| 4 | K=2 status | 344 pps, ~91 % loss | 371 pps, **2.1 %** loss | regression unblocked |
| 8 | K=2 recv pps | 665 | **708** | +6 % (loss 11 % → 3.2 %) |
| 32 | K=2 recv pps | 1763 | **2149** | +22 % |

The K=0 hot path improved noticeably at mid-`m` (+50 % at `-m=4`,
+117 % at `-m=8`). The K=2 cache-line-bouncing regression that
originally made the auto-threshold conservative no longer shows up as
loss: K=2 at `-m=4` went from 344 pps with ~91 % loss to 371 pps with
2.1 % loss, although in this single trial its recv pps still trails K=0
(7692) and K=1 (6557). K=2 at `-m=32` reached **2149 pps**, the new
high-concurrency winner.

### Auto-rule change

With the K=2 regression unblocked, `UCLIENT_AUTO_LISTENERS_THRESHOLD`
lowered from 4 to 2. The auto-default now goes:

- `-m=1` → K=0 (still the lowest-overhead config; ~30 % faster than K=1
at this concurrency)
- `-m>=2` → K=1 (clean send/recv split; consistently strong from m=4
upward)

Explicit `-K` still overrides, so anyone curious about K=2/K=4 on
different hardware can dial them up — they're no longer landmines.

## Combined effect on top of #1910

vs `master` *before* either improvement, at `-m=8`: 148 → **8602 pps** =
**58× speedup**.

## Test plan

- [x] Builds clean on macOS (clang) and Linux (gcc, Debian Trixie via
container)
- [x] `make lint` clean for the modified files
- [x] CLI parsing smoke (`-K 99` rejected, `-K 0` works,
`--listener-threads` long form works)
- [x] Real-Linux single-trial bench across `K ∈ {0,1,2,4}` × `m ∈
{1,2,4,8,16,32}`
- [x] Auto-bump log line confirmed (`uclient: started 1 listener
thread(s) (auto)`)
- [x] Per-listener slab fold-back verified end-to-end (totals match
after `pthread_join`)
- [ ] Functional smoke: `examples/scripts/basic/relay.sh` +
`examples/scripts/basic/udp_c2c_client.sh`
- [ ] CI

## CMake / portability

- pthreads pulled in transitively via `turnclient` → `common`
(`find_package(Threads REQUIRED)`). No link-line change needed.
- `getopt_long()` requires `<getopt.h>` — included unconditionally on
POSIX so the long-option table compiles on macOS as well as glibc.
- `_Thread_local`, `__atomic_*` builtins, struct `aligned(64)`
attribute: C11 / GCC builtins, available on GCC and Clang.
2026-05-11 09:52:41 -07:00
nfuhler faff5bf106 examples/turnserver.conf: update description of cli option (#1909)
the previous description does not describe the correct default since
commit 9467af5
2026-05-10 21:12:56 -07:00
Pavel Punsky 284e441a00 turnutils_uclient: Linux recvmmsg receive path + larger SO_RCVBUF (#1910)
## Summary

Two improvements to `turnutils_uclient`'s receive path. With `-Y packet
-m N` workloads the loadgen was previously hitting client-side queue
overflow before the server was anywhere near saturated, so reported
"lost packets" reflected uclient's own kernel-buffer drops rather than
real server loss. Mirrors the recvmmsg work already landed in
`turnutils_peer` (#1908) and the relay (#1906) — this closes the loop on
the loadgen side.

- **`SO_RCVBUF`/`SO_SNDBUF` 64 KB → 4 MB.** New `UCLIENT_SOCK_BUF_SIZE`
constant in `uclient.h`, applied at the 3 socket-creation sites in
`startuclient.c`. `set_sock_buf_size()` already halves on `EPERM/EINVAL`
so this is safe under any `net.core.rmem_max`.
- **Linux-only `recvmmsg(2)` batched receive in
`client_input_handler`.** Refactored `client_read()` → extracted
`process_received_buffer()` so the per-packet processing
(channel/Indication/Data parsing, latency/jitter accounting) can run
after either a single `recv()` or a batched `recvmmsg()`. New
`client_read_batch_udp()` drains up to 32 datagrams per syscall;
`client_input_handler` dispatches to it for plain UDP only (no SSL/DTLS,
no TCP, no TCP-relay sub-connections). All other paths fall through to
the legacy single-`recv()` loop unchanged. Behind `#if
defined(__linux__)` with `_GNU_SOURCE`; static scratch buffers (32 ×
2048 B = 64 KB).
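
The halving fallback mentioned for `set_sock_buf_size()` can be sketched like this. It is a hypothetical re-creation for illustration only: `set_rcvbuf_with_fallback` and the 1 KB floor are assumptions, and the real helper in coturn may differ in detail.

```c
#include <errno.h>
#include <sys/socket.h>

#define MIN_SOCK_BUF_SIZE 1024 /* assumed floor for this sketch */

/* Try to set SO_RCVBUF to `want` bytes; on EPERM or EINVAL halve the
 * request and retry. Returns the size that was accepted, or -1 if the
 * socket is unusable or nothing down to the floor was accepted. */
static int set_rcvbuf_with_fallback(int fd, int want) {
  for (int sz = want; sz >= MIN_SOCK_BUF_SIZE; sz /= 2) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) == 0) {
      return sz;
    }
    if (errno != EPERM && errno != EINVAL) {
      return -1; /* unrelated error, for example EBADF */
    }
  }
  return -1;
}
```

On Linux a request above `net.core.rmem_max` is normally clamped by the kernel rather than rejected, so the first attempt usually succeeds and the loop matters mainly on other platforms.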

## Bench (local Docker, 3 trials per concurrency, identical server, only uclient binary differs)

| `-m` | OLD recv pps | NEW recv pps | speedup | OLD loss | NEW loss |
|---:|---:|---:|---:|---:|---:|
| 1 | 58 | **71** | **+22%** | 42% | 27% |
| 2 | 78 | **124** | **+59%** | 60% | 33% |
| 4 | 135 | **211** | **+56%** | 64% | 45% |
| 8 | 145 | **359** | **+148%** | 80% | 50% |
| 16 | 224 | **591** | **+164%** | 84% | 54% |

Send PPS is essentially unchanged (uclient was never send-bound). Server
CPU stayed below 1% across all runs — confirming the bottleneck was the
loadgen, not the server.

## Notes

- macOS / *BSD keep the legacy single-`recv()` loop. The SO_RCVBUF bump
still applies and helps somewhat.
- The remaining loss at higher concurrency is loadgen single-event-loop
saturation, which is a larger restructuring (worker-pool uclient) and
out of scope for this PR.
- DTLS / TCP / SSL paths are deliberately left on the legacy code path —
`recvmmsg` doesn't apply.

## Test plan

- [x] Builds clean on macOS (clang) and Linux (gcc, Debian Trixie via
container)
- [x] `make lint` clean for the modified files
- [x] Functional smoke: `examples/scripts/basic/relay.sh` +
`examples/scripts/basic/udp_c2c_client.sh` still pass
- [x] Bench above (3 trials per `-m {1,2,4,8,16}`)
2026-05-10 21:12:22 -07:00
Pavel Punsky 5959ecfb13 Add UDP-GSO send path (--udp-gso) (#1907)
## Summary

- New `--udp-gso` flag (Linux, requires `--udp-sendmmsg`) collapses
same-destination, same-size sendmmsg batches into a single `sendmsg`
with a `UDP_SEGMENT` cmsg, so the kernel allocates one super-skb that
traverses the network stack once and is segmented at egress instead of
running `udp_sendmsg → ip_finish_output → __dev_queue_xmit` per
datagram.
- Also wraps the relay-side `recvmmsg` callback loop in
`udp_sendmmsg_batch_begin/end` so peer→client sends triggered inside a
recv batch can also coalesce — without that wrapping the relay path
issues one `sendto` per delivered datagram.
- Sticky-disable on `EINVAL/ENOPROTOOPT` for older kernels/NICs that
lack UDP-GSO; one warning logged, then transparent fallback to the
existing `sendmmsg` and `udp_send` paths.

## Why

The `--udp-recvmmsg` and `--udp-sendmmsg` follow-ups confirmed (see
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md)) that
on the relay flood workload the dominant cost is the per-datagram kernel
TX path. mmsg-style batching reduces only the syscall entry/exit, not
the per-skb stack traversal — UDP-GSO collapses both.

## Result

DigitalOcean nyc1 c-4, 30 s alternating A/B, `-Y packet -m 1`, eth1 TX
as the authoritative server forwarding metric:

| Variant | eth1 RX | eth1 TX | sys CPU | idle CPU |
|---|---:|---:|---:|---:|
| baseline (no flags) | 322,091 | 127,445 | 22.9 % | 67.5 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 266,068 | **257,996** | 15.0 % | 78.7 % |
| baseline (no flags) | 309,475 | 125,573 | 20.9 % | 70.7 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 275,992 | **225,366** | 14.9 % | 74.3 % |

Mean server forwarding rate: **126.5 k → 241.7 k pps (+91 %, 1.91×)**,
mean system CPU **21.9 % → 14.9 %** — about **2.8× CPU efficiency** (TX
pps per system-CPU-%). Full perf-children comparison and methodology in
the new section of
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md).
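
As a quick check, the headline figures can be recomputed from the two A/B pairs in the table (rounded values, using the quoted mean system CPU of 21.9 % and 14.9 %):

$$
\begin{aligned}
\text{TX}_{\text{baseline}} &= \tfrac{1}{2}\,(127{,}445 + 125{,}573) \approx 126.5\ \text{k pps} \\
\text{TX}_{\text{GSO}} &= \tfrac{1}{2}\,(257{,}996 + 225{,}366) \approx 241.7\ \text{k pps} \\
\frac{\text{TX}_{\text{GSO}}}{\text{TX}_{\text{baseline}}} &\approx 1.91 \\
\frac{241.7 / 14.9}{126.5 / 21.9} &\approx \frac{16.2}{5.78} \approx 2.8
\end{aligned}
$$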

## Notes for reviewers

- `--udp-gso` is opt-in and requires `--udp-sendmmsg` (the help text
states the dependency). Without `--udp-sendmmsg` the batch state never
accumulates and GSO has nothing to flush.
- GSO eligibility resets on every `_begin/_end`. Mixed-destination,
mixed-size, or oversize batches transparently fall back through
`sendmmsg` / `udp_send`.
- Rebased onto current `master`; the recvmmsg dependency is already
merged via #1906.

## Test plan

- [x] `cmake --build build --target turnserver` (RelWithDebInfo + ASan
local builds clean)
- [x] `ctest --test-dir build --output-on-failure` — 3/3 unit tests pass
- [x] `examples/run_tests.sh` — TCP/TLS/UDP pass; DTLS pre-existing
failure on macOS environment, unrelated to this change
- [x] DigitalOcean A/B perf validation captured above
- [ ] Reviewer to confirm CI green on Linux build/test/CodeQL

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 08:05:38 -07:00
Kai Ren 78c1f7c7ce Upgrade Docker image to 4.11.0 Coturn version docker/4.11.0-r0 2026-05-09 13:12:33 +03:00
Kai Ren 259f0d3c67 Update Debian "trixie" to 20260406 snapshot in Docker image 2026-05-09 13:10:35 +03:00
Pavel Punsky e59e227dfd turnutils_peer: Linux fast path with drain loop, recvmmsg/sendmmsg, UDP-GSO (#1908)

The libevent EV_READ handler used to do one recvfrom + one sendto per
ready event, so a packet flood through the relay generated O(N) libevent
re-entries and 2N syscalls per N relayed datagrams — saturating one core
on the loadgen-side peer well below modern relay throughput.

On Linux, replace the handler with:
* a drain loop: keep calling recvmmsg with MSG_DONTWAIT until a call
returns fewer than a full batch, bounded by MAX_DRAIN_ROUNDS so a flood
can't starve the rest of the event loop;
* recvmmsg into a static mmsghdr[32] (peer is single-threaded) and reuse
the same mmsghdr array for sendmmsg back — each entry already has
msg_name pointing at the source (the echo destination) and the iovec
pointing at the received bytes, so no userspace copy;
* UDP-GSO: when the recvmmsg batch is homogeneous (≥2 entries, same
source, same size, ≤1472 B), echo it as one sendmsg with UDP_SEGMENT
cmsg so the kernel allocates one super-skb that traverses the network
stack once.
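
The drain loop from the first bullet can be sketched as below. This is a simplified, receive-only illustration rather than the peer's actual handler: `drain_socket` is a hypothetical name, the value of `MAX_DRAIN_ROUNDS` is assumed, and the echo via `sendmmsg` / UDP-GSO is left out.

```c
#define _GNU_SOURCE /* for recvmmsg(2) and struct mmsghdr on glibc */
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 32           /* batch size named in the commit message */
#define DGRAM_MAX 2048     /* assumed per-datagram buffer size */
#define MAX_DRAIN_ROUNDS 8 /* assumed bound; the real value is not given */

/* Drain a datagram socket in batches. Returns the number of datagrams
 * read in this call, or -1 on an error other than "queue empty". */
static long drain_socket(int fd) {
  /* Static scratch state is fine here because the peer is single-threaded. */
  static struct mmsghdr msgs[BATCH];
  static struct iovec iov[BATCH];
  static char bufs[BATCH][DGRAM_MAX];
  long total = 0;

  for (int round = 0; round < MAX_DRAIN_ROUNDS; ++round) {
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; ++i) {
      iov[i].iov_base = bufs[i];
      iov[i].iov_len = DGRAM_MAX;
      msgs[i].msg_hdr.msg_iov = &iov[i];
      msgs[i].msg_hdr.msg_iovlen = 1;
    }
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    if (n < 0) {
      if (errno == EAGAIN || errno == EWOULDBLOCK) {
        break; /* queue is empty */
      }
      return -1;
    }
    total += n;
    if (n < BATCH) {
      break; /* short batch: the queue has been drained */
    }
  }
  return total;
}
```

The round bound is what keeps a sustained flood from monopolising the event loop: after `MAX_DRAIN_ROUNDS` full batches the handler returns and libevent gets to service other events before the socket is read again.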

The non-Linux build keeps the original recvfrom/sendto handler.

DigitalOcean nyc1 c-4 30 s alternating A/B paired with the GSO
turnserver (-Y packet -m 1):
  old peer: turn TX mean 228 k pps, peer CPU mean 91.0 % (saturated)
  new peer: turn TX mean 255 k pps, peer CPU mean 28.8 %

Peer CPU drops 3.2× while turn-side throughput climbs ~12 % because the
old peer was no longer fully reflecting at the GSO turnserver's rate.
The peer is no longer the loadgen-side bottleneck, freeing CPU for
multi-flow tests.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:56:19 -07:00
Pavel Punsky a5005c4193 Relay recvmmsg (#1906)
## Summary

Extends the existing Linux-only `--udp-recvmmsg` flag from the UDP
listener socket to also cover **connected per-session UDP relay
sockets**, so steady-state client→relay and peer→relay traffic on plain
UDP is read in batches of up to 16 datagrams per `recvmmsg(2)` instead
of one `recvmsg` per packet. DTLS sessions still go through the SSL read
path and are unchanged.

The flag stays **opt-in**: receive-side batching works correctly, but on
the current `m=1` / `m=100` benchmarks throughput is flat to slightly
negative — the bottleneck has moved past receive (see results below).

## What's in the change

- **Shared receive helpers** (`src/apps/relay/ns_ioalib_engine_impl.c`,
`src/apps/relay/ns_ioalib_impl.h`):
- `ioa_parse_udp_recvmsg_cmsg()` — single TTL/TOS/`IP_RECVERR` cmsg
parser used by both `udp_recvfrom()` and the new batch path. Replaces
the duplicated parser previously inlined in `dtls_listener.c` and
`udp_recvfrom()`.
- `ioa_init_recvmmsg_hdr()` — single initializer for
`mmsghdr`/`iovec`/cmsg/source-address fields, also used by the listener.
- New `IOA_UDP_RECVMMSG_MAX_BATCH = 16` constant; both listener and
relay paths now share it.
- **Connected relay batch read** (`socket_udp_read_batch_recvmmsg` in
`ns_ioalib_engine_impl.c`): called from `socket_input_worker` for
non-SSL UDP sockets when `--udp-recvmmsg` is on. Allocates per-message
`stun_buffer_list_elem`s, calls `recvmmsg(MSG_DONTWAIT)`, dispatches
each datagram through the existing `read_cb` path, and falls back
cleanly on `ENOSYS`/`EINVAL`/`EOPNOTSUPP` (auto-disables the flag) and
on `EAGAIN`/short-batch (releases unused buffers).
- **Per-engine scratch state**: the `mmsghdr[16]` / `iovec[16]` / cmsg /
src-addr arrays live on `ioa_engine`, not on every socket — keeps memory
flat at thousands of allocations.
- **TTL/TOS-sized cmsg buffers** in the listener: the listener
previously over-allocated `64 KiB` per slot; it now uses the same
TTL+TOS sizing as the relay path.
- **Opt-in occupancy stats** behind a new `--udp-recvmmsg-log` flag:
every 10 s the relay logs `udp-recvmmsg stats: calls=… packets=…
avg_batch=… wouldblock=… unavailable=… no_buffer=… hist_1=… hist_2=…
hist_3_4=… hist_5_8=… hist_9_16=…`. Counters are always tracked (cheap);
the periodic log is gated by the new flag so default operation is
silent.
- **CLI plumbing**: `--udp-recvmmsg-log` long option in
`mainrelay.c`/`mainrelay.h`, `cli_print_flag` entry in
`turn_admin_server.c`, doc updates in `README.turnserver`.
- **Docs**: `docs/PerformanceIterationLog.md` records the iteration
steps, validation, and two rounds of DigitalOcean A/B numbers.
`CLAUDE.md` load-test instructions updated to mention the new flag and
the `tot_recv_msgs` / `tot_recv_bytes` workaround.
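
The occupancy histogram named in the stats line (`hist_1`, `hist_2`, `hist_3_4`, `hist_5_8`, `hist_9_16`) implies a simple bucketing of the per-call batch size. A minimal sketch, with hypothetical type and function names, assuming the batch maximum of 16 stated above:

```c
/* Bucket indices mirroring the field names in the log line. */
enum hist_bucket { HIST_1, HIST_2, HIST_3_4, HIST_5_8, HIST_9_16, HIST_BUCKETS };

struct recvmmsg_stats {
  unsigned long calls;
  unsigned long packets;
  unsigned long hist[HIST_BUCKETS];
};

/* Map a batch size in 1..16 to its bucket; -1 for anything else. */
static int hist_bucket_for(unsigned n) {
  if (n == 0 || n > 16) {
    return -1;
  }
  if (n == 1) {
    return HIST_1;
  }
  if (n == 2) {
    return HIST_2;
  }
  if (n <= 4) {
    return HIST_3_4;
  }
  if (n <= 8) {
    return HIST_5_8;
  }
  return HIST_9_16;
}

/* Record one successful recvmmsg call that returned n datagrams. */
static void stats_record(struct recvmmsg_stats *s, unsigned n) {
  int bucket = hist_bucket_for(n);
  if (bucket < 0) {
    return;
  }
  s->calls += 1;
  s->packets += n;
  s->hist[bucket] += 1;
}

/* avg_batch as packets per call; 0 when nothing was recorded. */
static double stats_avg_batch(const struct recvmmsg_stats *s) {
  return s->calls ? (double)s->packets / (double)s->calls : 0.0;
}
```

An `avg_batch` close to 1 with most calls landing in `hist_1` would indicate that batching rarely finds more than one datagram queued, which is consistent with the flat throughput reported for this flag.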
2026-05-08 22:47:46 -07:00
Pavel Punsky b1d5c467f3 fuzzing: use hex escapes for HTTP EOH dictionary entry (#1905)
## Summary

`fuzzing/stun.dict` line 147 used C-style `\r\n\r\n` for the HTTP
end-of-headers keyword:

```
kw_http_eoh="\r\n\r\n"
```

libFuzzer's `ParseDictionaryFile` only accepts three escape sequences
inside quoted entries: `\\`, `\"`, and `\xAB` hex. `\r` / `\n` are
unrecognized, so a local fuzz run aborts dictionary load with:

```
ParseDictionaryFile: error in line 147
                kw_http_eoh="\r\n\r\n"
```

Replace with the hex form used by the other 111 entries in the file:

```
kw_http_eoh="\x0d\x0a\x0d\x0a"
```
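
The escape rule described above is easy to check mechanically. The function below is a simplified checker modelled on that rule, not libFuzzer's own parser; `dict_value_escapes_ok` is a hypothetical name, and it takes only the text between the quotes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every backslash in `body` starts one of the three
 * accepted escapes: \\ , \" , or \x followed by two hex digits. */
static bool dict_value_escapes_ok(const char *body) {
  for (size_t i = 0; body[i] != '\0'; ++i) {
    if (body[i] != '\\') {
      continue;
    }
    char next = body[i + 1];
    if (next == '\\' || next == '"') {
      i += 1; /* skip the escaped character */
      continue;
    }
    if (next == 'x' && isxdigit((unsigned char)body[i + 2]) &&
        isxdigit((unsigned char)body[i + 3])) {
      i += 3; /* skip 'x' and both hex digits */
      continue;
    }
    return false; /* unknown or truncated escape such as \r or \n */
  }
  return true;
}
```

Run over each quoted entry of a dictionary file, a check like this would flag the old `\r\n\r\n` form before a fuzz run aborts on it.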
2026-05-08 20:26:46 -07:00
Pavel Punsky 61332bebca Sync turnserver man page with current CLI options (#1903)
The turnserver man page had drifted from the actual CLI options the
binary accepts. The shipped `man/man1/turnserver.1` was last regenerated
on 05 June 2021, so several options added since then were missing and
one removed option was still documented.

The man page is auto-generated from `README.turnserver` via
`make-man.sh` (txt2man), so the source-of-truth edit is in the README;
the `.1` files are then regenerated.

In `README.turnserver`:
- Add 13 options that exist in `mainrelay.c` long_options[] but were
undocumented: --include-reason-string, --syslog-facility,
--drop-invalid-packets, --drop-invalid-packets-log, --udp-recvmmsg,
--respond-http-unsupported, --prometheus-address, --prometheus-path,
--version, --cpus, --no-cli, --no-rfc5780,
--response-origin-only-with-rfc5780.
- Document --sql-userdb as an alias on the existing --psql-userdb line.
- Remove the stale --ne=[1|2|3] entry (no longer parsed by the binary).

The regenerated `man/man1/turnserver.1` also picks up a backlog of
options that were already in the README but never reached the shipped
page (--software-attribute, --cli, --sock-buf-size, --raw-public-keys,
--stun-backward-compatibility, and the corrected --no-tlsv1_2 wording).

`man/man1/turnadmin.1` and `man/man1/turnutils.1` are regenerated as a
side-effect of `make-man.sh` running over all three READMEs; their
content was similarly stale relative to README.turnadmin /
README.turnutils.
2026-05-08 18:27:30 -07:00
Pavel Punsky 9d0cfca6f1 Remove stale --ne option from turnserver --help (#1904)
## Summary

- The `--ne=[1|2|3]` option was already removed from `long_options[]`
and the option parser, so `turnserver` rejects it at runtime, but the
help text printed by `turnserver --help` still advertised it.
2026-05-08 18:26:08 -07:00
Pavel Punsky 36e1eee855 Restore CodeQL permissions, category, and manual build mode (#1901)
PR #1517 (Jun 2024) simplified codeql.yml in ways that left scans
incomplete: it dropped the actions:read / contents:read permissions and
the analyze category, both of which CodeQL Action requires for results
to land under the existing language category. Combined with the later
cpp -> c-cpp rename and v3 -> v4 upgrade, scheduled scans have not
refreshed the Security tab since Jun 1, 2024.

- Add actions:read and contents:read back to job permissions
- Set build-mode: manual on init (required for v3+/v4 manual builds)
- Pass category "/language:c-cpp" on analyze so SARIF de-duplicates
against the configured language
- Build with --parallel so the tracer keeps up on default runners
2026-05-08 09:02:51 -07:00
dependabot[bot] 97fd597fcb Bump repolevedavaj/install-nsis from 1.1.0 to 1.2.0 (#1899)
Bumps
[repolevedavaj/install-nsis](https://github.com/repolevedavaj/install-nsis)
from 1.1.0 to 1.2.0.
Release notes (sourced from
[repolevedavaj/install-nsis's releases](https://github.com/repolevedavaj/install-nsis/releases)):

> ## v1.2.0
>
> ## 🚀 New features and improvements
>
> - Fix NSIS installer download ([#40](https://redirect.github.com/repolevedavaj/install-nsis/issues/40) [`@lordmulder`](https://github.com/lordmulder) + [#41](https://redirect.github.com/repolevedavaj/install-nsis/issues/41) [`@repolevedavaj`](https://github.com/repolevedavaj))
>
> ## 📦 Dependency updates
>
> - Bump release-drafter/release-drafter from 6.1.0 to 7.2.0 ([#37](https://redirect.github.com/repolevedavaj/install-nsis/issues/37)) @[dependabot[bot]](https://github.com/apps/dependabot)
> - Bump actions/checkout from 5 to 6 ([#29](https://redirect.github.com/repolevedavaj/install-nsis/issues/29)) @[dependabot[bot]](https://github.com/apps/dependabot)

Commits:

- [`c14d0ea`](https://github.com/repolevedavaj/install-nsis/commit/c14d0ea1b829818b4e9313d8e009b43f0a65fddd) Merge pull request [#41](https://redirect.github.com/repolevedavaj/install-nsis/issues/41) from repolevedavaj/fix/strlen-patch-download-url
- [`7b8697d`](https://github.com/repolevedavaj/install-nsis/commit/7b8697d1199a09d10256551b45814079f103086b) Apply PR [#40](https://redirect.github.com/repolevedavaj/install-nsis/issues/40) download fix to strlen_8192 patch step
- [`9abd7fa`](https://github.com/repolevedavaj/install-nsis/commit/9abd7fac248dc054cdc4e14fcf9b42cbd578100e) Merge pull request [#37](https://redirect.github.com/repolevedavaj/install-nsis/issues/37) from repolevedavaj/dependabot/github_actions/release-d...
- [`c569f7c`](https://github.com/repolevedavaj/install-nsis/commit/c569f7c7d0186ead4335b0606bd0ca9d488aaada) Merge pull request [#40](https://redirect.github.com/repolevedavaj/install-nsis/issues/40) from lordmulder/main
- [`fb4d83d`](https://github.com/repolevedavaj/install-nsis/commit/fb4d83d77758496ff007a717b63647e8634d138d) Fixed.
- [`9f060d0`](https://github.com/repolevedavaj/install-nsis/commit/9f060d05951d525ff9528d59c9bf00561a02e60d) Fixed
- [`187e888`](https://github.com/repolevedavaj/install-nsis/commit/187e8883da4fe935fdaaf3a3bce7d7d11f5654b5) Change NSIS installer download link and User-Agent
- [`5963871`](https://github.com/repolevedavaj/install-nsis/commit/59638711aebe5255a768fb9cde11cc47d06c7140) Bump release-drafter/release-drafter from 6.1.0 to 7.2.0
- [`618f596`](https://github.com/repolevedavaj/install-nsis/commit/618f596b61aeb0254a327cceba89036242b79758) Merge pull request [#29](https://redirect.github.com/repolevedavaj/install-nsis/issues/29) from repolevedavaj/dependabot/github_actions/actions/c...
- [`81fc6ef`](https://github.com/repolevedavaj/install-nsis/commit/81fc6efc3373bd2a346607b92f73997e28c970df) Bump actions/checkout from 5 to 6
- See full diff in [compare view](https://github.com/repolevedavaj/install-nsis/compare/v1.1.0...v1.2.0)



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
4.11.0
2026-05-07 22:23:07 -07:00
Pavel Punsky 238c311f05 Fix Prometheus metrics response leak (#1900)
Fix memory leak introduced in #1853

Resolved #1898
2026-05-07 22:22:33 -07:00
Pavel Punsky 326816a92a Update version to 4.11.0 (#1897) 2026-05-04 18:55:49 -07:00
Pavel Punsky 24f474878e Filc harness and pointer typedefs (#1896)
## Summary

- Add a self-contained Fil-C build/test harness under `filc/` that
mirrors the existing `fuzzing/` pattern: one host script
(`filc/run-local.sh`) builds an Ubuntu 24.04 image with the
[Fil-C](https://github.com/pizlonator/fil-c) optfil 0.678 toolchain,
builds turnserver with `CC=filcc`/`CXX=fil++`, runs unit tests + system
tests, and drops a per-run timestamped log directory with `SUMMARY.txt`
+ `ISSUES.txt`.
- Fix the two real Fil-C compatibility bugs the harness surfaces by
changing `ur_map_value_type` and `ur_addr_map_value_type` from
`uintptr_t` to `void *` in `src/server/ns_turn_maps.h`.

## Why

[Fil-C](https://fil-c.org) is a memory-safe C/C++ compiler (Clang 20
fork) that pairs every pointer with an "InvisiCap" capability and turns
UB into deterministic panics with no `unsafe` escape hatch. Putting
coturn through it answers two questions: (a) does it compile unmodified,
and (b) does it run correctly under capability-enforced memory safety.
After this PR, the answer is **yes** for both — turnserver,
turnutils_peer, and turnutils_uclient relay TCP/TLS/UDP/DTLS traffic
with full Fil-C enforcement, all unit tests pass, and
`examples/run_tests_conf.sh` runs end-to-end.

## What's in the PR

### `filc/` harness (commit 1)

| File | Purpose |
|---|---|
| `filc/Dockerfile` | Ubuntu 24.04 + Fil-C optfil 0.678 (extracts the
nested `fil.tar.xz` to `/opt/fil`); `--platform linux/amd64` so it works
on Apple Silicon under emulation. |
| `filc/run-local.sh` | Host-side: build image, create
`filc/logs/<UTC-ts>/`, run container with source mounted read-only and
log dir mounted r/w. |
| `filc/docker-entrypoint.sh` | In-container orchestrator. Phases: env /
source-copy / build / unit-tests / system-cli / system-conf. Runs every
phase even when a prior one fails (no aborting mid-pass). Captures
per-phase logs + a combined `all.log` + JUnit XML for ctest. Greps
panics/errors into `ISSUES.txt`. Downgrades `system-*` phases to FAIL
when `examples/run_tests*.sh` prints `FAIL` despite exiting 0 (existing
fragility in those scripts). |
| `filc/build.sh` | `cmake … -DBUILD_TESTING=ON -DCMAKE_C_COMPILER=filcc
-DCMAKE_CXX_COMPILER=fil++ -DCMAKE_BUILD_TYPE=RelWithDebInfo`, then
build. |
| `filc/.gitignore` | Ignore the on-host `logs/` dir. |

The harness also bumps the post-launch sleep in `examples/run_tests.sh`
from 2s to 6s **only inside the container** (sed-in-place on the copied
source; upstream is untouched). Under linux/amd64 emulation the
Fil-C-built turnserver isn't accepting TCP at 2s, so the first sub-test
races and prints `FAIL`. This is comparable to the 5s sleep already used by
`run_tests_conf.sh`.

### Pointer-typedef fixes (commit 2)

`src/server/ns_turn_maps.h`:

```diff
-typedef uintptr_t ur_map_value_type;
+typedef void *ur_map_value_type;
...
-typedef uintptr_t ur_addr_map_value_type;
+typedef void *ur_addr_map_value_type;
```

**Why this is necessary.** Both maps store pointers, but their value
slot is integer-typed. Every existing `_put` site casts a pointer
through `(ur_*_value_type)` to store, and every `_get` site casts back.
Under standard C this is a well-defined no-op. Under Fil-C, casting a
pointer to `uintptr_t` discards its InvisiCap; casting back yields a
pointer with a non-null address but a NULL Fil-C object — the next
dereference panics with `cannot read pointer with null object`.
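
As a rough illustration of the put/get round-trip described above, here is a minimal sketch with a one-slot stand-in for the map. The names `demo_value_type`, `demo_put`, `demo_get`, and `demo_roundtrip` are hypothetical and are not part of coturn; only the idea of a `void *` value slot comes from this PR.

```c
#include <stddef.h>

/* Hypothetical stand-in for the map value slot; mirrors the new typedef. */
typedef void *demo_value_type;

struct demo_session {
  int client_socket;
};

/* One-slot "map", just enough to show the store and load conversions. */
static demo_value_type demo_slot;

static void demo_put(struct demo_session *ss) {
  /* No integer cast: the value stays a pointer end to end. */
  demo_slot = (demo_value_type)ss;
}

static struct demo_session *demo_get(void) {
  return (struct demo_session *)demo_slot;
}

/* Stores a session, then reads a field back through the retrieved pointer. */
static int demo_roundtrip(int fd) {
  struct demo_session ss = {fd};
  demo_put(&ss);
  int got = demo_get()->client_socket;
  demo_slot = NULL; /* do not keep a pointer to a dead stack object */
  return got;
}
```

With a `uintptr_t` slot, the cast in `demo_put` would be a pointer-to-integer conversion, which is the step that drops the capability under Fil-C.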

The harness caught two such panics, both in the auth-resume /
relay-allocate flow:

1. `src/server/ns_turn_server.c:3248` — `ss->client_socket`, where `ss`
came from `sessions_map` (a `ur_map`).
2. `src/apps/relay/turn_ports.c:225` — `tp->mutex`, where `tp` came from
`ip_to_turnports_*` (a `ur_addr_map`) via `turnipports_add`.

**Why this is also a correctness improvement on a normal build.** The
new typedef makes the API strictly more type-safe — the compiler now
enforces "you put a pointer in." It eliminates a class of accidental
misuse (storing a non-pointer integer where a pointer was expected) that
the integer typedef silently allowed. Same generated code on a normal
build; different (correct) Fil-C semantics.

**Audit.** Verified before changing:
- All `ur_map_put` / `lm_map_put` / `ur_addr_map_put` callers store
pointer-typed values exclusively (no callers store raw integers).
- No internal arithmetic on the value type anywhere in `ns_turn_maps.c`.
- `ur_map_del_func` / `ur_addr_map_func` implementations either don't
exist (all `_del` callers pass `NULL`) or immediately cast their
parameter to a real pointer type — no source change needed.
- `KHASH_MAP_INIT_INT64(3, ur_map_value_type)` works identically with
`void *`.
- `ur_addr_map`'s `addr_elem.value` is assigned, read, compared for
truthiness, and cleared with `= 0` — all valid for `void *`.

## Test plan

- [ ] `filc/run-local.sh` reports all six phases PASS (env / source-copy
/ build / unit-tests / system-cli / system-conf), `ISSUES.txt` carries
no Fil-C panic / safety / sanitizer entries.
- [ ] Local `cmake -S . -B build -DBUILD_TESTING=ON && cmake --build
build && ctest --test-dir build --output-on-failure` is green (no
regression on the regular build).
- [ ] `examples/run_tests.sh` and `examples/run_tests_conf.sh` are green
on Linux per `CLAUDE.md`.
- [ ] Existing `fuzzing/run-local.sh ASan 0 -runs=1` still passes (the
new `filc/` directory is independent and shouldn't perturb anything).
2026-05-04 18:49:18 -07:00
Pavel Punsky 69bc0e7351 Load generator mode in turnutils_uclient (#1894)
## Summary

Adds load-generator modes to `turnutils_uclient` for repeatable TURN
server performance testing:

- Adds `-Y packet|alloc|invalid` load modes.
- Supports packet flood, allocation flood, and invalid-packet flood
workflows.
- Adds unique local client ports for allocation flood mode.
- Removes default packet pacing in load-generator modes unless
explicitly set.
- Adds helper scripts under `examples/loadtest/`.
- Documents load-test usage in `README.turnutils`,
`man/man1/turnutils.1`, `CLAUDE.md`, and
`docs/PerformanceIterationLog.md`.

The performance log captures DigitalOcean benchmark methodology, A/B
lessons, hot-path findings, and future optimization candidates.
2026-05-03 22:03:08 -07:00
Pavel Punsky 4b97d032ad Cache hot lookups in TURN data-path handlers (#1893)
write_to_peerchannel(): get_relay_socket_ss() and
ioa_network_buffer_get_size() were each called twice per channel-data
packet. The compiler can't CSE the calls (cross-TU through a
get_relay_socket() accessor in ns_turn_allocation.c that it can't prove
pure), so cache the relay socket and the inbound size once.

handle_turn_send(): same get_relay_socket_ss() duplication on the
STUN_SEND path.

read_client_connection(): the inbound size was fetched four times
(received_bytes accumulator, verbose log, blen seed, ret check). Reuse
ret as orig_blen.

No behavior change. Targets the ~0.4% per-packet overhead these helpers
were contributing in the m=1 packet-flood profile.
2026-05-03 21:45:54 -07:00
Pavel Punsky 62ee3759f4 Inline get_ioa_addr_len() in the header (#1891)
This is a four-instruction accessor (read sa_family, return struct
sockaddr_{in,in6} size) that gets called from every per-packet sendto(),
recvmsg(), and addr-map lookup. Cross-TU it stays a real function call;
moving the body into ns_turn_ioaddr.h as static inline lets each call
site fold the family branch directly into the syscall setup.
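
A minimal sketch of what such a header-level accessor might look like. The union `demo_addr` and the function `demo_addr_len` are hypothetical names chosen for this example and are only assumed to resemble the real types.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical address union, loosely modeled on a sockaddr-style type. */
typedef union {
  struct sockaddr ss;
  struct sockaddr_in s4;
  struct sockaddr_in6 s6;
} demo_addr;

/* Defined in the header as static inline so each call site can fold the
 * family branch into its own code instead of making a cross-TU call. */
static inline size_t demo_addr_len(const demo_addr *addr) {
  if (addr->ss.sa_family == AF_INET) {
    return sizeof(struct sockaddr_in);
  }
  if (addr->ss.sa_family == AF_INET6) {
    return sizeof(struct sockaddr_in6);
  }
  return 0; /* unknown family */
}
```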

perf record on the m=1 packet flood (c-4 nyc1) confirms the win:
 - udp_recvfrom self-time: 0.76% -> 0.35%  (-54%)
 - udp_send     self-time: 0.60% -> 0.26%  (-57%)
End-to-end throughput stays in the run-to-run noise band, as
expected for a kernel-bound workload, but the released CPU is real.
2026-05-03 20:31:44 -07:00
Pavel Punsky 1a53e51141 Trim two redundant checks from per-packet relay hot path (#1890)
ioa_socket_check_bandwidth(): hoist the "no bps limit configured"
fast-exit before the multi-condition socket-state check. The vast
majority of sessions have max_bps == 0, so the existing path was running
5+ pointer dereferences and equality tests just to land on the same
return-1.
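
The reordering can be sketched roughly as follows. The struct and its fields are hypothetical simplifications, and the limit comparison at the end is an assumption made only so the example is complete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified socket state; field names are illustrative. */
struct demo_socket {
  uint64_t max_bps;  /* 0 means no bandwidth limit is configured */
  uint64_t used_bps; /* assumed usage counter for this sketch */
  bool done;
  int fd;
};

/* Returns true when the packet may be sent. */
static bool demo_check_bandwidth(const struct demo_socket *s) {
  /* Fast exit first: most sessions have no limit, so skip the state checks. */
  if (s->max_bps == 0) {
    return true;
  }
  /* Multi-condition state check, now reached only when a limit exists. */
  if (s->done || s->fd == -1) {
    return true;
  }
  return s->used_bps < s->max_bps;
}
```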

send_data_from_ioa_socket_nbh(): drop the redundant inner "if (!(s->done
|| s->fd == -1))" gate. The outer if/else-if branch already filtered
those, and ioa_socket_tobeclosed() rechecks both, so the inner test was
dead code on every successful send.

perf record on the c-4 nyc1 droplet (m=1 packet flood, 12s) shows
send_data_from_ioa_socket_nbh self-time drop from 0.91% to 0.54% and
ioa_socket_check_bandwidth fall out of the top-25 user-space symbols
(was 0.33%). Throughput is within run-to-run noise — the relay is
syscall-bound, so user-space wins don't translate 1:1 — but the released
CPU is real.
2026-05-03 20:18:56 -07:00
Pavel Punsky a619d9d6d9 Inline addr_cpy() in the header (#1892)
Same pattern as the get_ioa_addr_len() inline: addr_cpy() is a
single-memcpy helper that fires on every receive (each packet dispatch
copies the source address into ioa_net_data, plus allocation/permission
map-key copies). Cross-TU it stays a real function call.

Combined with the previous four iterations (turn_server_get_engine
hoist, bandwidth fast-exit + dead-check removal, cached relay-socket and
buffer-size lookups, get_ioa_addr_len inline), the alternating A/B run
on the same c-4 nyc1 droplet now shows a consistent +5% throughput on
the m=1 packet flood test (recv_msgs/30s mean over 6 rounds: B=146984 /
I=155468).
2026-05-03 20:14:56 -07:00
Pavel Punsky 23e8538657 Hoist turn_server_get_engine() out of per-packet hot path (#1889)
turn_report_session_usage() runs on every packet but only does real
reporting work once per 4096 packets. Re-order the early returns so the
bitmask fast-exit fires before the cross-TU
turn_server_get_engine() call, and flatten the nested if-blocks into
guard clauses for readability.
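
A minimal sketch of the guard-clause ordering, assuming a per-session packet counter. `demo_usage` and `demo_report_usage` are hypothetical names, and the slow path is reduced to a counter increment.

```c
#include <stdint.h>

/* Hypothetical counters; not coturn's actual session struct. */
struct demo_usage {
  uint32_t packets; /* packets seen so far */
  uint32_t reports; /* how many times the slow path ran */
};

/* Called once per packet; does real work only on every 4096th call. */
static void demo_report_usage(struct demo_usage *u) {
  u->packets++;
  /* Guard clause: cheap mask test before any expensive lookup. */
  if ((u->packets & 4095u) != 0) {
    return;
  }
  /* Slow path: the engine lookup and the reporting would go here. */
  u->reports++;
}
```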

No behavior change. A/B testing on a c-4 nyc1 droplet shows the
single-client packet-flood throughput within noise (alternating B/I
rounds: B=149317 / I=153844 mean recv_msgs over 30s, ~3% in iter1's
favor with ~10% run-to-run variance).
2026-05-03 20:09:44 -07:00
Pavel Punsky cb701a47b4 Add fuzz coverage for integrity helpers (#1888) 2026-04-30 22:32:50 -07:00
Pavel Punsky 247118d1b4 Add deterministic challenge-response builder to FuzzStun (#1886)
stun_is_challenge_response_str in src/client/ns_turn_msg.c only descends
into its three inner stun_attr_get_first_by_type_str calls when the
input is an error response with err_code 401 or 438 *and* a REALM
attribute *and* a NONCE attribute. The OAuth branch additionally
requires STUN_ATTRIBUTE_THIRD_PARTY_AUTHORIZATION.

The fuzzer-driven path in harness_attr_iter calls the predicate every
iteration but the conjunction of conditions is too specific for
libFuzzer to discover from binary mutation alone — OSS-Fuzz introspector
flags 9 unreached callsites on
stun_attr_get_first_by_type_str gated on this function.

Add harness_challenge_response_builder that constructs six deterministic
message variants on every iteration and runs each through the predicate:

  - 401 with REALM + NONCE                      (canonical success)
  - 401 with REALM + NONCE + THIRD-PARTY-AUTH   (OAuth branch)
  - 438 with REALM + NONCE                      (438 disjunct)
  - 401 with REALM only                         (NONCE-missing path)
  - 401 with no REALM                           (REALM-missing path)
  - 400 with REALM + NONCE                      (wrong err_code path)

Each variant runs once with a non-NULL oauth pointer and once with NULL
to cover both branches of the optional output. Realm / nonce /
server-name lengths and the transaction id are derived from fuzz bytes
so iterations stay meaningfully distinct.

Verified by stand-alone harness:
  - 401+REALM+NONCE returns true with attrs copied out, oauth=false
  - 401+REALM+NONCE+TPA returns true with oauth=true and server_name
    populated

This confirms that all three inner get_first_by_type_str callsites and
the OAuth disjunct are now exercised.
2026-04-28 14:30:25 -07:00
Pavel Punsky 8952415609 Seed address-mapping table in fuzz initializer (#1885)
map_addr_from_public_to_private and map_addr_from_private_to_public in
src/client/ns_turn_ioaddr.c walk a static public_addrs[] table of size
mcount. Without an explicit ioa_addr_add_mapping call mcount stays 0 for
the entire fuzz process, so the loop body — including the
addr_eq_no_port call it gates — is dead code in every fuzz iteration.
OSS-Fuzz introspector flags this as 19 unreached callsites under
map_addr_from_public_to_private and 4+4 under addr_eq_no_port.

Extend the existing shared LLVMFuzzerInitialize in FuzzOpenSSLInit.c
(linked into both FuzzStun and FuzzStunClient via FUZZ_COMMON_SOURCES)
to register two synthetic public<->private mapping pairs — one v4
(192.0.2.1 <-> 10.0.0.1) and one v6 (2001:db8::1 <-> fd00::1) — once at
startup. The header comment for ioa_addr_add_mapping requires
single-threaded init before fuzzing begins, which matches exactly when
LLVMFuzzerInitialize runs.

Verified by stand-alone harness: after init,
stun_attr_get_first_addr_str on a XOR-MAPPED-ADDRESS attribute holding
192.0.2.1:443 returns 10.0.0.1:443, and the v6 equivalent returns
[fd00::1]:8080 — confirming addr_eq_no_port is now called inside the
loop body in both helpers.
2026-04-27 21:29:23 -07:00
Pavel Punsky 301415d848 Unblock fuzz coverage for is_http and rare STUN attributes (#1884)
OSS-Fuzz introspector flags three functions, in two groups, that the
fuzzer cannot reach on its own:

1. findstr() in src/client/ns_turn_msg.c is gated by is_http(), which
requires GET/POST/PUT/DELETE prefix + " HTTP/" + "\r\n\r\n". The
fuzzer's binary STUN seeds never synthesize a valid HTTP frame.

2. stun_attr_get_reservation_token_value() and
stun_attr_get_response_port_str() are called from harness_attr_iter only
when the input contains the matching attribute type. Neither appears in
the existing seed corpus.

Add HTTP framing keywords to fuzzing/stun.dict and four new seed files
covering both gaps:

  - seed_http_get.raw: minimal "GET / HTTP/1.1\r\nHost: x\r\n\r\n"
  - seed_http_post_clen.raw: POST with Content-Length to drive the strtoul
    branch in is_http
  - seed_reservation_token.raw: STUN allocate response with an 8-byte
    RESERVATION-TOKEN attribute
  - seed_response_port.raw: STUN binding request with a 4-byte
    RESPONSE-PORT attribute

Each new STUN seed validated against the real parsers
(stun_get_message_len_str, stun_attr_get_first_by_type_str, is_http) to
confirm it reaches the targeted branch.

The corpus zips also drop pre-existing __MACOSX/ and .DS_Store entries
that had snuck in during a prior macOS zip step; net file count rises
(24 -> 28 in FuzzStun, 4 -> 8 in FuzzStunClient) while archive size
shrinks because of the junk removal.
2026-04-27 18:08:40 -07:00
Pavel Punsky 301d12fdda HTTP parsing fixes (#1882) 2026-04-27 08:34:38 -07:00
Pavel Punsky b4c138c409 Cover all public stun_buffer.c wrappers in FuzzStunClient (#1883)
Add harness_stun_buffer_api to FuzzStunClient.c that exercises every
public wrapper in src/apps/common/stun_buffer.c not already reached by
the existing harnesses: stun_get_size (NULL/non-NULL), the init_request
/ init_indication / init_success_response builders, the tid accessors,
the stun_is_indication wrapper (which gates the static is_channel_msg),
the attr_add / attr_add_channel_number / attr_add_addr /
attr_add_even_port (both branches) / attr_get_first_by_type accessors,
stun_set_allocate_request (rt NULL and non-NULL paths),
stun_set_binding_request /
stun_prepare_binding_request, and the channel-message wrappers.

Each builder call is followed by inspect_buffer_message so the resulting
serialized message is also walked by the parser predicates. A tail block
also pumps raw fuzzer bytes through the wrapper-form predicates
(stun_is_indication, stun_is_channel_message, stun_tid_from_message,
stun_attr_get_first_by_type) so they see malformed inputs the serializer
paths cannot produce.
2026-04-26 11:28:28 -07:00
Pavel Punsky 46e5117fb1 Extend fuzzing coverage and enable local fuzzing in a container (#1881) 2026-04-24 22:11:27 -07:00
Pavel Punsky 741b2983cc Extend STUN client fuzz builder coverage (#1878) 2026-04-22 19:06:41 -07:00
Pavel Punsky 4ffa60d32e Out of bound HTTP detection in parser (#1877) 2026-04-21 21:28:41 -07:00
Pavel Punsky 51520c77a2 Delete log line per relay thread on start (#1876)
This is a unique log per relay thread. Auth threads do not log.
2026-04-20 22:14:34 -07:00
Pavel Punsky 453afd1fdc Add Unity-based unit test scaffolding (#1875)
## Summary
Introduces an opt-in unit test layer for coturn using
[Unity](https://github.com/ThrowTheSwitch/Unity), a pure-C test framework
built from a single source file that matches coturn's C11 toolchain,
portability bar, and zero-C++ production tree.

- Unity v2.6.0 is fetched on demand via CMake `FetchContent` (nothing
vendored).
- Tests are gated behind `-DBUILD_TESTING=ON` (off by default), so the
standard build and OSS-Fuzz pipeline are unaffected.
- Two test binaries cover pure C-callable code in `libturnclient`:
  - `test_ioaddr` (6 cases): `make_ioa_addr`,
    `addr_get_port`/`addr_set_port`, `addr_eq` variants, `addr_to_string`,
    IPv4/IPv6/garbage input
  - `test_stun_msg` (7 cases): STUN header construction,
    request/indication/success/error response classification, transaction-ID
    round-trip, channel message parsing, truncated/zeroed buffer rejection
- New `check` cmake target builds tests before running ctest (avoids the
`make test` footgun where the auto-generated `test` target only runs
already-built binaries).
- Legacy `Makefile.in` gets a `unit-tests` target that bootstraps
`build/unit-tests/` and delegates to the cmake `check` target. `make
check` and `make test` now run the RFC 5769 conformance suite **plus**
the Unity unit tests.
- CLAUDE.md documents the new workflow plus the one-liner for adding a
new `test_<name>.c`.

## Why
The existing test story is shell-script integration suites under
`examples/scripts/` — they exercise the binary end-to-end but can't pin
down behavior of individual functions, can't run without a full build
environment, and don't fail loudly when a unit-level invariant breaks. A
lightweight unit layer gives us:

- Targeted regression coverage for protocol parsing/encoding (the
highest bug-yield area).
- A natural home for tests of the kinds of subtle invariants already
documented in CLAUDE.md (port-counter overflow safety, port-bounds
inclusivity, HMAC buffer initialization).
- Sub-second feedback for contributors.

## Usage
```bash
# CMake direct
cmake -S . -B build -DBUILD_TESTING=ON
cmake --build build -j --target check     # build + run all unit tests
ctest --test-dir build --output-on-failure   # run already-built tests

# Legacy Makefile bridge (after ./configure)
make unit-tests   # bootstraps build/unit-tests/, builds + runs Unity tests
make check        # RFC 5769 conformance + unit tests
```

Adding a new test:
1. Drop `tests/test_<name>.c`
2. Append `coturn_add_test(test_<name>)` in `tests/CMakeLists.txt`
3. The `check` target picks it up automatically.

## Test plan
- [x] Clean cmake build with `-DBUILD_TESTING=ON` succeeds; full source
tree (turnserver, turnadmin, turnclient, turn_server, all turnutils)
still builds
- [x] `cmake --build build --target check` builds and runs both test
binaries — 13/13 cases pass
- [x] `ctest --verbose` shows per-case PASS lines for all 13 cases
- [x] Default build (`-DBUILD_TESTING` unset) does not fetch Unity or
build any test binary

## Notes for reviewers
- Why Unity over GoogleTest/Catch2: pure C, single source file, no C++
toolchain dependency, runs anywhere coturn does (incl. exotic CMake
targets like Solaris/AIX). GoogleTest would force `extern "C"` wrappers
and a C++ compiler everywhere.
2026-04-20 21:15:12 -07:00
Pavel Punsky c1518d5f2a Drop udp_relay_servers_number config and clean up dead UDP id-space (#1874)
## Summary
- Remove the `udp-relay-servers` config knob and the
`udp_relay_servers_number` field — after #1849 left only the
`PER_THREAD` UDP engine, this knob is no longer wired to anything that
creates extra UDP relay threads.
- Delete the now-orphaned `udp_relay_servers[]` array, the
`TURNSERVER_ID_BOUNDARY_BETWEEN_TCP_AND_UDP` / `..._UDP_AND_TCP` macros,
and the `id >= boundary` branch in `get_relay_server()`. The array was
read but never written anywhere in the tree, so the branch was
unreachable dead code.
- Drop a stray unused `char s[257]` in
`dbd_redis.c::redis_list_secrets`.
- Adjust the startup banner to log "Total relay threads" (was "Total
General servers" + a never-fired "Total UDP servers" block).

## Test plan
- [x] `cmake .. && make -j8` clean build
- [x] `examples/scripts/rfc5769.sh` — all RFC 5769 conformance vectors
pass
- [x] `examples/scripts/basic/relay.sh` + `udp_c2c_client.sh` —
12000/12000 msgs, 0 lost, 0 dropped
2026-04-19 19:37:52 -07:00
Pavel Punsky c8b3dd6513 Merge 10 fuzz targets into FuzzStun and FuzzStunClient via dispatcher (#1873)
Upstream OSS-Fuzz build recipe
(google/oss-fuzz/projects/coturn/build.sh) only copies two fuzzer
binaries -- FuzzStun and FuzzStunClient -- and their seed corpora into
$OUT. The eight additional fuzz targets added later never ran on
oss-fuzz.com, which is why the introspector profile reports "fuzzer no
longer available" for them.

Rather than patching the Google-owned build recipe, fold all fuzzers
into the two binaries OSS-Fuzz actually ships. Each target now begins
with a single-byte selector (Data[0] mod 5) that dispatches to one of
five sub-harnesses:

  FuzzStun        - integrity (SHA1/multi-SHA), attr_iter, attr_add,
                    old_stun
  FuzzStunClient  - stun_client, channel_data, addr_codec, oauth_token,
                    oauth_roundtrip

No upstream OSS-Fuzz changes are required.
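
The selector scheme can be sketched roughly like this. In a libFuzzer target the logic would sit inside `LLVMFuzzerTestOneInput`; here it is a plain function so it can be called directly. `demo_dispatch` and `demo_harness` are hypothetical names, and passing the remaining bytes to the sub-harness is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Records which sub-harness ran last; stands in for five real harnesses. */
static int demo_last_harness = -1;

static void demo_harness(int id, const uint8_t *data, size_t size) {
  (void)data;
  (void)size;
  demo_last_harness = id;
}

/* Dispatch on the first byte, then hand the remaining bytes to the harness. */
static int demo_dispatch(const uint8_t *data, size_t size) {
  if (size < 1) {
    return 0; /* nothing to select on */
  }
  const int selector = data[0] % 5;
  demo_harness(selector, data + 1, size - 1);
  return 0;
}
```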
2026-04-19 13:00:19 -07:00
Pavel Punsky 4f8385e142 Fix build failure: define _GNU_SOURCE for recvmmsg() on Linux (#1868)
## Summary

- Fixes compilation error on Linux when `_GNU_SOURCE` is not defined by
the toolchain: `struct mmsghdr` has incomplete type and `recvmmsg()` is
implicitly declared
- Defines `_GNU_SOURCE` in three places for full coverage across build
systems:
  - `dtls_listener.c`: before includes, guarded by
    `#if defined(__linux__)`
  - `configure`: adds `-D_GNU_SOURCE` to `OSCFLAGS` on Linux for the
    legacy build path
  - `CMakeLists.txt`: adds `-D_GNU_SOURCE` on Linux for the CMake build

## Context

The `recvmmsg()` batched receive path added in #1852 uses `struct
mmsghdr` and `recvmmsg()`, which are glibc extensions requiring
`_GNU_SOURCE`. Some Linux distros/toolchains don't define this
implicitly, causing:

```
src/apps/relay/dtls_listener.c:129:18: error: array type has incomplete element type 'struct mmsghdr'
src/apps/relay/dtls_listener.c:748:18: warning: implicit declaration of function 'recvmmsg'
```
Fixes #1867

## Test plan

- [x] Verified CMake build succeeds on macOS (recvmmsg code is `#if
defined(__linux__)` guarded — no effect on non-Linux)
- [x] Verify build succeeds on Linux with and without `_GNU_SOURCE` in
the environment
- [x] Verify both `cmake` and `./configure && make` build paths work

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 22:08:11 -07:00
Pavel Punsky c37ccf4df9 Pin session origin only after MESSAGE-INTEGRITY validates (#1871)
The first ALLOCATE set ss->origin_set=1 before check_stun_auth ran, so
an unauthenticated attacker could lock the session into a realm of their
choice by forging the ORIGIN attribute on the first packet. If per-realm
ACLs differ, this lets the attacker pick the most permissive realm for
that session.

Defer the commit of ss->origin_set until check_stun_auth succeeds with a
valid MESSAGE-INTEGRITY. Until auth passes, every request re-parses
ORIGIN, so the 401 challenge still carries the correct realm derived
from the current ORIGIN attribute.
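
A minimal sketch of the deferred commit, under the assumption that the session holds a flag and a realm buffer. `demo_session` and `demo_handle_request` are hypothetical and much simpler than the real request path.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified session; not coturn's real struct. */
struct demo_session {
  bool origin_set; /* true once the realm is pinned */
  char realm[64];
};

/* Processes one request. `origin_realm` is the realm derived from the
 * ORIGIN attribute; `integrity_ok` is the MESSAGE-INTEGRITY result.
 * Returns true when the request is authenticated. */
static bool demo_handle_request(struct demo_session *ss,
                                const char *origin_realm, bool integrity_ok) {
  if (!ss->origin_set) {
    /* Not pinned yet: re-derive the realm on every request so the 401
     * challenge reflects the current ORIGIN. */
    snprintf(ss->realm, sizeof(ss->realm), "%s", origin_realm);
  }
  if (!integrity_ok) {
    return false; /* challenge the client; nothing is committed */
  }
  ss->origin_set = true; /* commit only after authentication succeeds */
  return true;
}
```

In this sketch a forged ORIGIN on an unauthenticated first packet changes only the realm used for the challenge, not the pinned state.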
2026-04-18 17:16:47 -07:00
Pavel Punsky 4d0b3c7660 Abort on malformed allowed/denied-peer-ip at startup (#1872)
A bad value like CIDR notation in allowed-peer-ip or denied-peer-ip was
silently dropped: add_ip_list_range returned -1 but the config parser
kept going, leaving the intended whitelist or blocklist partial.
Operators expecting denied-peer-ip=10.0.0.0/8 would end up with no block
at all, enabling SSRF-via-TURN to internal networks.

Fail closed: log the offending value and exit, so the problem is visible
at startup. CIDR parsing is not added (separate feature).
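
The fail-closed behavior can be sketched as below. `demo_add_peer_ip` is a hypothetical validator that accepts only a single IPv4 literal, which is narrower than what the real option accepts, and the caller is assumed to exit when the loader returns -1.

```c
#include <arpa/inet.h>
#include <stdio.h>

/* Hypothetical validator: accepts a single IPv4 literal only.
 * Returns 0 on success, -1 on anything else (including CIDR notation). */
static int demo_add_peer_ip(const char *value) {
  struct in_addr addr;
  return inet_pton(AF_INET, value, &addr) == 1 ? 0 : -1;
}

/* Fail closed: stop at the first bad entry instead of skipping it.
 * Returns 0 if every entry was accepted, -1 otherwise. */
static int demo_load_denied_peers(const char *const *values, int count) {
  for (int i = 0; i < count; i++) {
    if (demo_add_peer_ip(values[i]) != 0) {
      fprintf(stderr, "invalid denied-peer-ip value: %s\n", values[i]);
      return -1; /* caller is expected to exit at startup */
    }
  }
  return 0;
}
```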
2026-04-18 17:10:50 -07:00
Pavel Punsky f707471ffd Fix format-string injection in Redis DB driver (#1870)
snprintf-then-redisCommand(rc, s) passed attacker-influenced bytes (STUN
USERNAME/REALM, admin CLI inputs) as the printf format string to
hiredis. A `%s`/`%n`/`%x` sequence in a REALM attribute could cause a
stack misread or provide a write primitive.

Replace every call site with redisCommand(rc, FORMAT, args) so user
bytes are arguments, never the format string.
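
To illustrate the safe pattern without pulling in hiredis, the sketch below uses `snprintf` as a stand-in for a printf-style command builder. `demo_format_cmd` and the key layout `demo/realm/%s/value` are made up for this example.

```c
#include <stdio.h>

/* Formats a command with the user-supplied realm as an ARGUMENT.
 * The format string is a fixed literal, so '%' bytes inside `realm` are
 * copied verbatim and never interpreted as conversion specifiers. */
static int demo_format_cmd(char *out, size_t out_len, const char *realm) {
  return snprintf(out, out_len, "get demo/realm/%s/value", realm);
}
```

The unsafe shape is the opposite: building the string first and then passing that string as the format argument of a second printf-style call.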
2026-04-18 17:09:14 -07:00
Pavel Punsky dbc2884096 Use constant-time compare for STUN MESSAGE-INTEGRITY HMAC (#1869)
memcmp short-circuits on first differing byte, letting an attacker
recover a valid HMAC byte-by-byte via response-time differences. Switch
to CRYPTO_memcmp, which is constant-time regardless of the first
mismatching byte.
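
For readers unfamiliar with the technique, here is a hand-rolled sketch of a comparison that always visits every byte. `demo_ct_compare` is a hypothetical name and is for illustration only; the commit itself uses OpenSSL's `CRYPTO_memcmp`, which is the better choice in real code.

```c
#include <stddef.h>

/* Illustration only: accumulates differences across all bytes so the loop
 * length does not depend on where the first mismatch occurs. */
static int demo_ct_compare(const void *a, const void *b, size_t len) {
  const volatile unsigned char *pa = a;
  const volatile unsigned char *pb = b;
  unsigned char diff = 0;
  for (size_t i = 0; i < len; i++) {
    diff |= (unsigned char)(pa[i] ^ pb[i]); /* no early return */
  }
  return diff; /* 0 when equal, nonzero otherwise */
}
```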
2026-04-18 17:08:46 -07:00
tyranron c3a17d06fd Update Alpine to 3.23.4 version to fix CVEs in Docker image
musl
- CVE-2026-6042
- CVE-2026-40200

openssl
- CVE-2026-31790
- CVE-2026-28387
- CVE-2026-28388
- CVE-2026-28389
- CVE-2026-28390
- CVE-2026-31789

zlib
- CVE-2026-22184
- CVE-2026-27171
docker/4.10.0-r1
2026-04-16 12:01:21 +03:00
tyranron 315e185591 Upgrade Docker image to 4.10.0 Coturn version docker/4.10.0-r0 2026-04-14 11:58:48 +03:00
dependabot[bot] eec3b277ed Upgrade softprops/action-gh-release from 2 to 3 version (#1866)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 10:43:37 +02:00
Pavel Punsky 8e2c575229 Update version to 4.10.0 (#1864) 4.10.0 2026-04-13 15:16:42 -07:00
Pavel Punsky 14572fa091 Skip response buffer allocation for STUN indications (#1863)
## Summary

- Skip allocating a 65 KB response buffer for STUN indications (SEND,
DATA, BINDING indication) in `read_client_connection()` — indications
never produce a response, so the buffer was immediately freed
- Guard the unknown-attributes error-response block in
`handle_turn_command()` with a NULL check on `nbh` to match the skipped
allocation
## Motivation

On the UDP data-relay hot path, every SEND indication triggered a
pool-get + pool-put cycle for a response buffer that was never used.
This is the highest-frequency STUN command type during active media
relay. The change eliminates one unnecessary 65 KB buffer round-trip per
SEND indication.

## Test plan

- [ ] Build passes clean (`cmake .. && make -j$(nproc)`)
- [ ] Run RFC 5769 conformance tests (`examples/scripts/rfc5769.sh`)
- [ ] Run basic UDP relay test to verify SEND indications still relay
data correctly
- [ ] Verify STUN requests (ALLOCATE, REFRESH, BINDING request) still
receive proper error responses
2026-04-12 22:49:44 -07:00
Pavel Punsky 5379b3ac63 Fix windows build (#1865) 2026-04-12 22:15:01 -07:00
Pavel Punsky f233910ef6 Remove unused mutex from ur_map structure (#1861)
Remove the unused mutex field and associated lock/unlock functions from
the `ur_map` structure.

- Removed `TURN_MUTEX_DECLARE(mutex)` field from `struct _ur_map`
- Removed mutex initialization in `ur_map_init()`
- Removed mutex destruction in `ur_map_free()`
- Removed `ur_map_lock()` and `ur_map_unlock()` functions that were not
being used

This cleanup reduces unnecessary synchronization overhead and simplifies
the codebase.
2026-04-12 20:00:44 -07:00
Pavel Punsky eaa9e7920e Merge commit from fork
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:00:20 -07:00