[docs](trx-rs): deep codebase review with updated architecture and improvement plan

Architecture.md: added detailed component notes covering rig_task internals,
audio pipeline, remote client dual-connection model, FrontendRuntimeContext
field groups, decoder implementation patterns, and FT-817 backend workarounds.

Improvement-Areas.md: added 10 new findings from deep review including LIFO
command batching, unbounded decoder history, missing jitter in backoff, rig
task crash recovery, SoapySdrRig constructor complexity, and protocol versioning.

CLAUDE.md: refreshed review observations with accurate LOC counts, prioritized
improvement items (P1/P2/P3), and new strengths identified.

https://claude.ai/code/session_011aiY4GfrmDUrpYVvEUGNGm
Signed-off-by: Claude <noreply@anthropic.com>
This commit is contained in:
Claude
2026-03-28 09:52:08 +00:00
committed by Stan Grams
parent b3e7c22260
commit 9df71cf36c
4 changed files with 314 additions and 21 deletions
+35 -18
View File
@@ -104,27 +104,44 @@ Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`. Use `(trx-rs
## Codebase Review Observations
Full architecture documentation: `docs/architecture.md`
Improvement plan: `docs/improvement-plan.md`
Full architecture documentation: `docs/Architecture.md`
Improvement plan: `docs/Improvement-Areas.md`
*Last reviewed: 2026-03-28*
### Strengths
- **Explicit state machine**: `RigMachineState` FSM prevents invalid states with a deterministic transition table and exhaustive matching. Well-tested with lifecycle, error recovery, and invalid transition tests.
- **Trait-based polymorphism**: Clean abstraction boundaries (`RigCat`, `RigSdr`, `AudioSource`, `RigListener`, `RigCommandHandler`, `CommandExecutor`, `TokenValidator`, `FrontendSpawner`) enable loose coupling and testability.
- **Multi-rig architecture**: Per-rig task isolation with `HashMap<rig_id, RigHandle>` routing, per-rig state/spectrum/audio watch channels, and backward-compatible single-rig mode.
- **Async concurrency model**: Proper use of tokio channels -- `watch` for state snapshots, `broadcast` for PCM/decode fan-out, `mpsc` for commands. No mutex contention on hot paths.
- **Comprehensive SDR support**: Full DSP pipeline with multi-mode demodulation (SSB, AM, SAM, FM, WFM, AIS, VDES), virtual channel management, squelch, noise blanker, spectrum FFT, RDS decoding.
- **Pure Rust decoders**: FT8/FT4/FT2, APRS, CW, WSPR, AIS, VDES, RDS -- all implemented without C FFI dependencies.
- **Good test coverage** in protocol layer: codec, mapping, auth all have thorough unit tests with round-trip verification.
- **Feature-gated backends**: ft817, ft450d, soapysdr compiled conditionally to minimize binary size.
- **Explicit state machine**: `RigMachineState` FSM (7 states) prevents invalid states with a deterministic transition table and exhaustive matching. Well-tested with lifecycle, error recovery, and invalid transition tests. `ReadyStateData`/`TransmittingStateData` use `pub(crate)` fields with controlled accessors.
- **Trait-based polymorphism**: Clean abstraction boundaries (`RigCat`, `RigSdr`, `AudioSource`, `RigListener`, `RigCommandHandler`, `CommandExecutor`, `TokenValidator`, `FrontendSpawner`) enable loose coupling and testability. `RigCat`/`RigSdr` split cleanly separates CAT ops from SDR-specific methods.
- **Multi-rig architecture**: Per-rig task isolation with `HashMap<rig_id, RigHandle>` routing, per-rig state/spectrum/audio/decoder-history channels, dual-connection model (main + spectrum) in the client, and backward-compatible single-rig mode.
- **Async concurrency model**: Proper use of tokio channels -- `watch` for state snapshots, `broadcast` for PCM/decode fan-out, `mpsc` for commands. No mutex contention on hot paths. Spectrum deduplication collapses concurrent GetSpectrum requests.
- **Comprehensive SDR support**: Full DSP pipeline with multi-mode demodulation (SSB, AM, SAM, FM, WFM, AIS, VDES), virtual channel management, squelch, noise blanker, spectrum FFT, RDS decoding. AVX2-optimized FM discriminator with scalar fallbacks.
- **Pure Rust decoders**: FT8/FT4/FT2, APRS, CW, WSPR, AIS, VDES, RDS -- all implemented without C FFI dependencies. Consistent decoder pattern: stateful struct → `process_block()``decode_if_ready()`.
- **Good test coverage** in protocol layer: codec, mapping, auth all have thorough unit tests with round-trip verification. 45+ mapping tests cover all command variants.
- **Feature-gated backends**: ft817, ft450d, soapysdr compiled conditionally to minimize binary size. Factory pattern with name normalization for registration.
- **Defensive error handling**: Lock poisoning recovery, stream error deduplication with 60s summaries, input truncation in logs (128 chars), per-IP rate limiting on auth endpoints.
- **Well-documented DSP guidelines**: `docs/Optimization-Guidelines.md` captures lessons on NCO design, polyphase resampling, AVX2 batching, and stereo FM decoding.
### Areas for Improvement
- **FrontendRuntimeContext** (`trx-frontend/src/lib.rs`) is a 60+ field god-struct mixing audio, decode, auth, UI config, and per-rig routing. Should be decomposed into sub-structs.
- **Dual command enums**: `ClientCommand` and `RigCommand` are near-identical 40+ variant enums with mechanical 1:1 mapping in `mapping.rs`. Adding a command requires 4-file changes.
- **Command handler boilerplate**: 11 implementations of `RigCommandHandler` follow identical patterns across 500+ lines. A macro could reduce this to ~100 lines.
- **No integration tests** for `rig_task.rs` (1050 lines) or `audio.rs` (850+ lines) -- the most critical server components.
- **No command execution timeouts** at the `CommandExecutor` level. Backend stalls propagate up. The 10s timeout only exists at the rig_task layer.
- **Event emitter not panic-safe**: If one `RigListener` panics, remaining listeners in the notification loop are skipped.
- **Sparse structured logging** in trx-core: only one `debug!()` call. State transitions, command execution, and retries are not instrumented.
- **No command rate limiting**: rapid-fire commands from clients can overwhelm slow serial CAT interfaces.
**P1 — High:**
- **FrontendRuntimeContext** (`trx-frontend/src/lib.rs`) is a ~50-field god-struct mixing audio channels, decode histories (9 types), auth config (7 fields), UI settings, rig routing, virtual channels, and branding. Should be decomposed into sub-structs (see `docs/Improvement-Areas.md`).
- **Rig task command batching uses LIFO** (`rig_task.rs`): `batch.pop()` reverses arrival order. Commands execute newest-first, causing unexpected transient states.
- **Decoder history unbounded** (`audio.rs`): No capacity limit on `VecDeque` queues; only 24h time-based pruning. Busy AIS channels can exhaust memory.
- **ExponentialBackoff has no jitter** (`policies.rs`): All rigs/clients retry at identical times after a server restart (thundering herd).
- **No rig task crash recovery** (`main.rs`): If a rig task panics, it silently disappears. No supervisor, no restart, no health monitoring.
**P2 — Medium:**
- **Dual command enums**: `ClientCommand` and `RigCommand` are near-identical 40+ variant enums with mechanical 1:1 mapping in `mapping.rs` (675 lines). Adding a command requires 4-file changes. `GetRigs` triggers `unreachable!()`.
- **SoapySdrRig 20-parameter constructor**: No builder pattern, fragile call sites.
- **Lock poisoning recovery hides panics**: `unwrap_or_else(|e| e.into_inner())` throughout `audio.rs` silently continues with potentially inconsistent data.
- **Hardcoded timeouts**: 10+ timeout/retention constants scattered across files, none configurable via TOML.
- **Config duplication**: `config.rs` in server (1,512 LOC) and client (1,181 LOC) mirror many structs.
**P3 — Low:**
- **Command handler boilerplate**: 11 `RigCommandHandler` impls follow identical patterns across 500+ lines. Macro opportunity.
- **No integration tests** for `rig_task.rs` (1,315 LOC) or `audio.rs` (3,977 LOC) — the two largest server modules.
- **No command execution timeouts** at the `CommandExecutor` level. Backend stalls propagate up.
- **FT-817 VFO inference fragile**: Fails when VFO A and B share the same frequency.
- **VDES decoder incomplete**: Turbo FEC and link-layer parsing not implemented.
- **No protocol versioning**: Unknown commands cause parse errors with no graceful degradation.
+87
View File
@@ -1003,3 +1003,90 @@ DecoderHistories buffer ↓
listener connections ↓
stream decoder messages HTTP WebSocket / local speakers
```
---
## Detailed Component Notes
### Rig Task Internals (`rig_task.rs` — 1,315 lines)
The rig task is the heart of the server. Key implementation details:
- **Command batching**: Accumulates pending requests before processing sequentially. Uses `batch.pop()` (LIFO order, not FIFO).
- **Spectrum deduplication**: Concurrent `GetSpectrum` requests are collapsed — one DSP computation broadcasts to all waiting responders.
- **Adaptive polling**: Poll interval adjusts based on TX state (100ms during TX, 500ms idle).
- **Grace period**: 800ms pause on polling after power-on/off operations to let hardware settle.
- **VFO priming**: Optional initialization sequence that toggles VFO A/B to populate the state cache.
- **Per-rig decoder histories**: Each rig maintains independent `Arc<DecoderHistories>` for all 11 decoder types.
- **Timeout enforcement**: Commands have `COMMAND_EXEC_TIMEOUT` (10s), polling has `POLL_REFRESH_TIMEOUT` (8s).
### Audio Pipeline (`audio.rs` — 3,977 lines)
The audio module handles decoder history storage and stream management:
- **`DecoderHistories`**: Per-rig mutable store for 11 decoder history queues (AIS, VDES, APRS, HF_APRS, CW, FT8, FT4, FT2, WSPR, WXSAT, LRPT).
- **Time-based retention**: 24h TTL on all history with periodic pruning.
- **Atomic total count**: `AtomicUsize` with CAS loop avoids acquiring 11 mutex locks in `snapshot_all()`.
- **Lock poisoning tolerance**: Uses `.unwrap_or_else(|e| e.into_inner())` to survive poisoned mutexes.
- **`StreamErrorLogger`**: Suppresses duplicate stream errors with 60s periodic summaries and error classification (alsa_poll_failure, input/output_stream_error).
- **CRC filtering**: APRS records filtered by `crc_ok` before storage.
### Remote Client Dual-Connection Model
`remote_client.rs` maintains two independent TCP connections to the server:
1. **Main connection** (port 4530): State polling, command forwarding, rig discovery.
2. **Spectrum connection** (dedicated): Polls `GetSpectrum` at 50ms intervals (20 fps) independently to avoid blocking the main connection during command processing.
Constants: `CONNECT_TIMEOUT: 5s`, `IO_TIMEOUT: 15s`, `SPECTRUM_IO_TIMEOUT: 3s`. Exponential backoff with jitter on reconnect.
### FrontendRuntimeContext Field Groups (~50 fields)
The `FrontendRuntimeContext` struct in `trx-frontend/src/lib.rs` organizes into logical groups:
| Group | Fields | Examples |
|-------|--------|---------|
| Audio/decode channels | 13 | `audio_rx`, `decode_rx`, `ais_history`, `ft8_history` |
| Client tracking | 4 | `sse_clients`, `rigctl_clients`, `audio_clients` |
| Connection state | 4 | `server_connected`, `rig_server_connected` |
| HTTP auth config | 7 | `http_auth_enabled`, `http_auth_session_ttl_secs` |
| HTTP UI config | 6 | `http_initial_map_zoom`, `http_spectrum_usable_span_ratio` |
| Remote rig management | 4 | `remote_active_rig_id`, `remote_rigs`, `rig_states` |
| Owner/branding | 4 | `owner_callsign`, `owner_website_url`, `ais_vessel_url_base` |
| Spectrum & per-rig audio | 4 | `spectrum`, `rig_audio_rx`, `rig_audio_info` |
| Virtual channel audio | 4 | `rig_vchan_audio_cmd`, `vchan_audio`, `vchan_destroyed` |
### Decoder Implementation Patterns
All real-time decoders follow a consistent pattern:
```rust
// 1. Stateful decoder struct with sample buffer
pub struct XxxDecoder { sample_buf: Vec<f32>, ... }
// 2. Block/sample processing
pub fn process_block(&mut self, samples: &[f32]) { ... }
// 3. Result extraction
pub fn decode_if_ready(&mut self) -> Vec<XxxResult> { ... }
```
| Decoder | Algorithm | Sample Rate | Key Constants |
|---------|-----------|-------------|---------------|
| FT8/FT4/FT2 | Waterfall + LDPC/OSD | Varies | MAX_LDPC_ITERATIONS=20, MAX_CANDIDATES=120 |
| CW | Goertzel tone detection | Varies | 10ms windows, tone range 3001200 Hz |
| APRS | Bell 202 AFSK (1200/2200 Hz) | 9600 | HDLC framing, NRZI, CRC-16-CCITT |
| AIS | GMSK 9600 baud | 9600 | Narrowband FM input |
| WSPR | Fano decoder | 12000 | 162 symbols, 120s slot, 1.46 Hz spacing |
| RDS | RRC matched filter + Costas PLL | Native | 57 kHz subcarrier, 1187.5 bps, OSD FEC |
| VDES | pi/4-QPSK 76.8 ksps | 100k | Burst detection, partial Turbo FEC |
### Backend Reliability Workarounds (FT-817)
The FT-817 CAT backend (`trx-backend-ft817/`) includes empirical workarounds for hardware quirks:
- **Duplicate frame sends**: `set_mode()` and `set_ptt()` send CAT frames twice with 80ms delay (radio sometimes drops first frame).
- **Panel unlock before commands**: Clears stale bytes from the serial buffer.
- **Power-on dummy frame**: CPU wakes before CAT framing locks; dummy frame ensures readiness.
- **VFO state inference**: Infers VFO A/B by matching frequencies against cached values (fragile when frequencies collide).
- **Read timeout**: 800ms per CAT read operation (not configurable).
+1 -1
View File
@@ -11,4 +11,4 @@ and remote control are exposed elsewhere on the network.
- [Architecture](Architecture) — system design, crate layout, data flow, and internals
- [Optimization Guidelines](Optimization-Guidelines) — performance guidelines for the real-time DSP pipeline
- [Planned Features](Planned-Features) — planned features and design notes
- [Improvement Areas](Improvement-Areas) — codebase audit: quality, architecture, security, and performance
- [Improvement Areas](Improvement-Areas) — codebase audit: quality, architecture, security, performance, and improvement plan
+191 -2
View File
@@ -4,7 +4,7 @@ A comprehensive audit of the trx-rs codebase covering code quality, architecture
security, testing, and performance. Each item includes the affected location and
a suggested fix.
*Last updated: 2026-03-26*
*Last updated: 2026-03-28*
---
@@ -122,4 +122,193 @@ No plugin API version, capability manifest, or unload/reload semantics. Old
plugins break silently on API changes.
**Fix:** Add a version field to the registration struct and reject incompatible
plugins at load time.
plugins at load time.
---
## New Findings (2026-03-28 Deep Review)
### High Priority (P1)
#### Rig task command batching uses LIFO order
**Location:** `src/trx-server/src/<rig_task.rs>``batch.pop()`
Pending commands are accumulated into a `Vec` and processed with `pop()`, which
reverses arrival order. If a client sends `SetFreq(14.074)` then `SetMode(USB)`,
the mode change executes before the frequency change. This can cause unexpected
transient state on hardware that validates mode against frequency (e.g. FT-817
rejects CW below 1.8 MHz).
**Fix:** Replace `pop()` with `drain(..)` or iterate in forward order.
#### FrontendRuntimeContext god-struct (~50 fields)
**Location:** `src/trx-client/trx-frontend/src/<lib.rs>`
Mixes audio channels, decode histories, auth config, UI settings, rig routing,
virtual channel management, and branding info into a single struct passed
through `Arc`. Every frontend receives all 50 fields even if it only needs a
subset. Changes to any field group force recompilation of all frontends.
**Suggested decomposition:**
```
FrontendRuntimeContext
├── AudioContext (audio_rx, audio_tx, audio_info, decode_rx)
├── DecodeHistoryContext (ais, vdes, aprs, hf_aprs, cw, ft8, ft4, ft2, wspr)
├── HttpAuthConfig (enabled, passphrases, session_ttl, cookie settings)
├── HttpUiConfig (map_zoom, spectrum settings, history retention)
├── RigRoutingContext (active_rig_id, remote_rigs, rig_states, rig_spectrums)
├── OwnerInfo (callsign, website_url, website_name, ais_vessel_url)
└── VChanContext (vchan_audio, vchan_audio_cmd, vchan_destroyed)
```
#### Decoder history queues have no capacity bounds
**Location:** `src/trx-server/src/<audio.rs>``DecoderHistories`
History queues (`VecDeque`) grow unbounded until the 24h retention period expires.
Under high traffic (e.g. busy AIS channel near a port), a single queue could
accumulate millions of entries and consume gigabytes of memory.
**Fix:** Add per-decoder max capacity (e.g. 10,000 entries). Evict oldest entries
when capacity is reached, independent of time-based pruning.
#### ExponentialBackoff has no jitter
**Location:** `src/trx-core/src/rig/controller/<policies.rs>`
Multiple rigs or reconnecting clients using the same backoff parameters will retry
at identical times (thundering herd). This is especially problematic when a server
restarts and all clients reconnect simultaneously.
**Fix:** Add randomized jitter (e.g. ±25% of the computed delay) to the
`ExponentialBackoff::delay()` method.
#### No crash recovery for rig tasks
**Location:** `src/trx-server/src/<main.rs>`
If a rig task panics (e.g. due to an unexpected backend error), the task simply
disappears. The listener continues routing commands to the dead rig's channel,
where they silently timeout. No automatic restart or health monitoring exists.
**Fix:** Wrap rig tasks in a supervisor loop that detects task completion/panic
and restarts with backoff. Emit a `RigMachineState::Error` on the watch channel
so clients see the failure.
---
### Medium Priority (P2)
#### SoapySdrRig constructor takes 20+ parameters
**Location:** `src/trx-server/trx-backend/trx-backend-soapysdr/src/<lib.rs>`
`new_with_config()`
The constructor accepts 20+ positional parameters with no builder pattern,
making call sites fragile and hard to read. Adding a new parameter requires
updating all callers.
**Fix:** Introduce a `SoapySdrConfig` builder struct with sensible defaults.
#### Dual command enums with mechanical 1:1 mapping
**Location:** `src/trx-protocol/src/<mapping.rs>` (675 lines),
`src/trx-protocol/src/<types.rs>`, `src/trx-core/src/rig/<command.rs>`
`ClientCommand` and `RigCommand` are near-identical 40+ variant enums with
purely mechanical mapping in `mapping.rs`. Adding a new command requires editing
4 files (command.rs, types.rs, mapping.rs in both directions, codec.rs).
`mapping.rs` contains an `unreachable!()` for `GetRigs` that would panic if
the listener logic changes.
**Fix:** Consider a macro that generates both enums and the mapping from a single
definition. Alternatively, collapse to a single enum with serde annotations.
#### Lock poisoning recovery hides panics
**Location:** `src/trx-server/src/<audio.rs>``DecoderHistories`
All mutex acquisitions use `.unwrap_or_else(|e| e.into_inner())` which silently
recovers from poisoned mutexes. While this prevents cascading panics, it hides
the original panic and may operate on inconsistent data.
**Fix:** Log a warning when recovering from a poisoned lock, and consider whether
the recovered data is actually safe to use. For history queues, clearing the
queue on poison recovery may be safer than continuing with partial data.
#### Configuration duplication between server and client
**Location:** `src/trx-server/src/<config.rs>` (1,512 lines),
`src/trx-client/src/<config.rs>` (1,181 lines)
14 config structs each, many mirrored between server and client (GeneralConfig,
rig model definitions, defaults). Shared config definitions should live in
`trx-app`.
#### Hardcoded timeouts and retention periods
**Locations:** Multiple files
| Constant | Value | Location |
|----------|-------|----------|
| COMMAND_EXEC_TIMEOUT | 10s | rig_task.rs |
| POLL_REFRESH_TIMEOUT | 8s | rig_task.rs |
| IO_TIMEOUT | 10s | listener.rs |
| REQUEST_TIMEOUT | 12s | listener.rs |
| History retention | 24h | audio.rs |
| FT-817 read timeout | 800ms | trx-backend-ft817 |
| RIG_TASK_CHANNEL_BUFFER | 32 | main.rs |
None are configurable. Making these part of the TOML config would help
deployments with slow serial links or high-latency networks.
---
### Low Priority (P3)
#### FT-817 VFO state inference is fragile
**Location:** `src/trx-server/trx-backend/trx-backend-ft817/src/<lib.rs>`
VFO state starts as `Unknown` and is inferred by matching frequencies against
cached values. When VFO A and VFO B are set to the same frequency, inference
fails and the rig may report the wrong VFO.
**Fix:** Some FT-817 variants support extended status bytes that indicate active
VFO directly. Detect firmware version and use the direct query when available.
#### VDES decoder has incomplete FEC
**Location:** `src/decoders/trx-vdes/src/<lib.rs>`
The VDES decoder implements burst detection and pi/4-QPSK demodulation but the
Turbo FEC (1/2 rate) decoder and link-layer (M.2092-1) parser are not complete.
Decoded output is limited to raw symbols.
#### No forward compatibility in protocol
**Location:** `src/trx-protocol/src/<codec.rs>`
Unknown commands cause parse errors. There is no protocol version field in the
envelope, making it impossible for older clients to gracefully degrade when
connecting to newer servers.
**Fix:** Add an optional `protocol_version` field to `ClientEnvelope`. Unknown
commands should return an error response rather than a parse failure.
#### Command handler boilerplate
**Location:** `src/trx-core/src/rig/controller/<handlers.rs>`
11 `RigCommandHandler` implementations follow identical patterns across 500+
lines (validate state → call executor method → return result). A declarative
macro could reduce this to ~100 lines.
#### Missing tests for critical paths
Serial backends (FT-817, FT-450D), plugin loading/discovery, and the audio
pipeline (Opus encode/decode) have no or minimal test coverage. `rig_task.rs`
(1,315 lines) and `audio.rs` (3,977 lines) — the two largest server modules —
have no integration tests.