mirror of
https://github.com/swift-server/swift-aws-lambda-runtime.git
synced 2026-05-03 07:22:27 +00:00
102f92aafb
This PR fixes a race condition in `LambdaRuntimeClient` that causes a
fatal crash when an old channel's `closeFuture` callback fires after a
new connection has been established. The fix adds proper channel
lifecycle tracking and replaces the fatal error with graceful handling.
## Problem
**Crash Location**: `LambdaRuntimeClient.swift:270` in `channelClosed()`
**Error Message**:
```
Fatal error: Invalid state: connected(SocketChannel { ... }), closed
```
**Root Cause**: A race condition in which:
1. An old channel's `closeFuture` callback fires
2. after a new connection has already been established
(`connectionState = .connected`),
3. while `closingState` is still `.closed` from a previous close operation.
4. The code treated this combination as impossible and crashed with
`fatalError`.
This can occur when:
- Network conditions cause delayed channel cleanup
- The connection is recycled quickly (the old channel is still closing
while the new one connects)
- The old channel's close callback is delivered only after the new
connection has been established
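The sequence above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `FakeChannel`, `PreFixModel`, and `CloseOutcome` are hypothetical stand-ins rather than the real `LambdaRuntimeClient` or SwiftNIO types, the state enums are reduced to the cases mentioned in this description, and the crash is represented by a returned `.wouldCrash` value instead of an actual `fatalError`.

```swift
// Simplified stand-ins for illustration; these are not the real
// LambdaRuntimeClient or SwiftNIO types.
final class FakeChannel {}

enum ConnectionState {
    case disconnected
    case connected(FakeChannel)
}

enum ClosingState {
    case notClosing
    case closed
}

enum CloseOutcome: Equatable {
    case handled
    case wouldCrash(String)  // stands in for fatalError in this sketch
}

struct PreFixModel {
    var connectionState: ConnectionState = .disconnected
    var closingState: ClosingState = .notClosing

    // Models the described pre-fix behavior: the callback never checks
    // which channel actually closed.
    mutating func channelClosed(_ channel: FakeChannel) -> CloseOutcome {
        switch (connectionState, closingState) {
        case (.connected, .closed):
            return .wouldCrash("Invalid state: connected, closed")
        case (.disconnected, .closed):
            return .wouldCrash("Invalid state: disconnected, closed")
        case (_, .notClosing):
            // Unconditional transition, regardless of channel identity.
            connectionState = .disconnected
            return .handled
        }
    }
}

var model = PreFixModel()

// An old connection exists and a previous close operation leaves
// closingState at .closed.
let oldChannel = FakeChannel()
model.connectionState = .connected(oldChannel)
model.closingState = .closed

// A new connection is established before the old channel's
// close callback has run.
let newChannel = FakeChannel()
model.connectionState = .connected(newChannel)

// The late callback for the OLD channel now sees (.connected, .closed).
let outcome = model.channelClosed(oldChannel)
// outcome is .wouldCrash: the state pair the original code treated as impossible.
```

In this model the late callback reaches the `(.connected, .closed)` branch even though the channel that closed is not the one currently stored in `connectionState`, which matches the state pair in the reported error message.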
## Solution
### Key Changes
1. **Added channel identity tracking**:
```swift
private var channelsBeingClosed: Set<ObjectIdentifier> = []
```
Tracks which channels are in the process of closing to distinguish old
channels from the current one.
2. **Enhanced `connectionWillClose()`**:
- Marks channels as "being closed" using `ObjectIdentifier`
- Adds logging when old channels close while a new connection is active
3. **Rewrote `channelClosed()` with defensive logic**:
- **Early return for tracked old channels**: Handles them gracefully
without affecting the current connection
- **Replaced `fatalError` with warning log**: The `(_, .closed)` case
now logs a warning instead of crashing
- **Channel identity checks**: Only transitions state if the closing
channel is the CURRENT channel
- **Removed unconditional state change**: Previously set
`connectionState = .disconnected` for ANY channel close, now only for
the current channel
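The key changes can be sketched in isolation as follows. This is a minimal model and not the code from the PR: `FakeChannel`, `PostFixModel`, `CloseOutcome`, and the helper `isCurrent(_:)` are hypothetical names introduced for illustration, `closingState` is left out for brevity, and the warning log is represented by a distinct return value. Only the `channelsBeingClosed: Set<ObjectIdentifier>` property and the method names `connectionWillClose()` and `channelClosed()` come from the description above.

```swift
// Simplified stand-ins for illustration; these are not the real
// LambdaRuntimeClient or SwiftNIO types.
final class FakeChannel {}

enum ConnectionState {
    case disconnected
    case connected(FakeChannel)
}

enum CloseOutcome: Equatable {
    case ignoredStaleChannel         // tracked old channel, state untouched
    case disconnectedCurrentChannel  // current channel closed, state updated
    case ignoredUnknownChannel       // where a warning would be logged
}

struct PostFixModel {
    var connectionState: ConnectionState = .disconnected
    var channelsBeingClosed: Set<ObjectIdentifier> = []

    // Hypothetical helper: is this the channel held in connectionState?
    func isCurrent(_ channel: FakeChannel) -> Bool {
        if case .connected(let current) = connectionState {
            return current === channel
        }
        return false
    }

    // Marks a channel as "being closed" by its object identity.
    mutating func connectionWillClose(_ channel: FakeChannel) {
        channelsBeingClosed.insert(ObjectIdentifier(channel))
    }

    mutating func channelClosed(_ channel: FakeChannel) -> CloseOutcome {
        let wasTracked = channelsBeingClosed.remove(ObjectIdentifier(channel)) != nil
        let current = isCurrent(channel)

        // Early return: a tracked channel that is no longer the current one.
        if wasTracked && !current {
            return .ignoredStaleChannel
        }
        // Identity check: only the current channel may change the state.
        guard current else {
            return .ignoredUnknownChannel
        }
        connectionState = .disconnected
        return .disconnectedCurrentChannel
    }
}

var model = PostFixModel()

let oldChannel = FakeChannel()
model.connectionState = .connected(oldChannel)
model.connectionWillClose(oldChannel)

// A new connection replaces the old one before the old close callback runs.
let newChannel = FakeChannel()
model.connectionState = .connected(newChannel)

// Late callback for the old channel: ignored, the new connection is kept.
let staleOutcome = model.channelClosed(oldChannel)
let stillConnected = model.isCurrent(newChannel)

// Callback for the current channel: this one does transition the state.
let currentOutcome = model.channelClosed(newChannel)
```

One design point worth noting: an `ObjectIdentifier` is only meaningful while the instance it was created from is alive, so in this sketch the entry is removed from the set as soon as the close callback for that channel has been handled.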
### Why This Fixes the Bug
The fix addresses the race condition by:
- Distinguishing between "current channel closing" and "old channel
closing"
- Handling old channel closes gracefully without crashing or corrupting
state
- Not overwriting connection state when old channels close
- Providing visibility through logging when the race condition occurs
## Changes
### Modified Files
- **Sources/AWSLambdaRuntime/HTTPClient/LambdaRuntimeClient.swift**
- Added `channelsBeingClosed: Set<ObjectIdentifier>` property
- Enhanced `connectionWillClose()` with channel tracking
- Rewrote `channelClosed()` with defensive logic and identity checks
- Replaced `fatalError` with warning log for unexpected states
- Removed unconditional state change in `closeFuture` callback
**Lines Changed**: ~150 lines modified/added
**Backward Compatibility**: ✅ Fully compatible, no API changes
## Testing
### ✅ All Existing Tests Pass
```bash
swift test
# Result: 91 tests passed in 14 suites
```
All original functionality is preserved with no regressions.
### ⚠️ Note on Test Coverage
While we cannot reproduce the exact race condition from bug #624 in a
deterministic test (it requires specific network timing), the fix:
- Is logically sound for the described race condition
- Improves defensive programming around channel lifecycle
- Replaces a fatal crash with graceful handling and logging
- Should prevent the crash by properly tracking channel identity
## Related Issues
Fixes #624
---------
Co-authored-by: Sebastien Stormacq <stormacq@amazon.lu>