[docs](trx-rs): deep codebase review with updated architecture and improvement plan
Architecture.md: added detailed component notes covering rig_task internals, audio pipeline, remote client dual-connection model, FrontendRuntimeContext field groups, decoder implementation patterns, and FT-817 backend workarounds. Improvement-Areas.md: added 10 new findings from deep review including LIFO command batching, unbounded decoder history, missing jitter in backoff, rig task crash recovery, SoapySdrRig constructor complexity, and protocol versioning. CLAUDE.md: refreshed review observations with accurate LOC counts, prioritized improvement items (P1/P2/P3), and new strengths identified. https://claude.ai/code/session_011aiY4GfrmDUrpYVvEUGNGm Signed-off-by: Claude <noreply@anthropic.com>
This commit is contained in:
+191
-2
@@ -4,7 +4,7 @@ A comprehensive audit of the trx-rs codebase covering code quality, architecture
|
||||
security, testing, and performance. Each item includes the affected location and
|
||||
a suggested fix.
|
||||
|
||||
*Last updated: 2026-03-26*
|
||||
*Last updated: 2026-03-28*
|
||||
|
||||
---
|
||||
|
||||
@@ -122,4 +122,193 @@ No plugin API version, capability manifest, or unload/reload semantics. Old
|
||||
plugins break silently on API changes.
|
||||
|
||||
**Fix:** Add a version field to the registration struct and reject incompatible
|
||||
plugins at load time.
|
||||
plugins at load time.
|
||||
|
||||
---
|
||||
|
||||
## New Findings (2026-03-28 Deep Review)
|
||||
|
||||
### High Priority (P1)
|
||||
|
||||
#### Rig task command batching uses LIFO order
|
||||
|
||||
**Location:** `src/trx-server/src/<rig_task.rs>` — `batch.pop()`
|
||||
|
||||
Pending commands are accumulated into a `Vec` and processed with `pop()`, which
|
||||
reverses arrival order. If a client sends `SetFreq(14.074)` then `SetMode(USB)`,
|
||||
the mode change executes before the frequency change. This can cause unexpected
|
||||
transient state on hardware that validates mode against frequency (e.g. FT-817
|
||||
rejects CW below 1.8 MHz).
|
||||
|
||||
**Fix:** Replace `pop()` with `drain(..)` or iterate in forward order.
|
||||
|
||||
#### FrontendRuntimeContext god-struct (~50 fields)
|
||||
|
||||
**Location:** `src/trx-client/trx-frontend/src/<lib.rs>`
|
||||
|
||||
Mixes audio channels, decode histories, auth config, UI settings, rig routing,
|
||||
virtual channel management, and branding info into a single struct passed
|
||||
through `Arc`. Every frontend receives all 50 fields even if it only needs a
|
||||
subset. Changes to any field group force recompilation of all frontends.
|
||||
|
||||
**Suggested decomposition:**
|
||||
```
|
||||
FrontendRuntimeContext
|
||||
├── AudioContext (audio_rx, audio_tx, audio_info, decode_rx)
|
||||
├── DecodeHistoryContext (ais, vdes, aprs, hf_aprs, cw, ft8, ft4, ft2, wspr)
|
||||
├── HttpAuthConfig (enabled, passphrases, session_ttl, cookie settings)
|
||||
├── HttpUiConfig (map_zoom, spectrum settings, history retention)
|
||||
├── RigRoutingContext (active_rig_id, remote_rigs, rig_states, rig_spectrums)
|
||||
├── OwnerInfo (callsign, website_url, website_name, ais_vessel_url)
|
||||
└── VChanContext (vchan_audio, vchan_audio_cmd, vchan_destroyed)
|
||||
```
|
||||
|
||||
#### Decoder history queues have no capacity bounds
|
||||
|
||||
**Location:** `src/trx-server/src/<audio.rs>` — `DecoderHistories`
|
||||
|
||||
History queues (`VecDeque`) grow unbounded until the 24h retention period expires.
|
||||
Under high traffic (e.g. busy AIS channel near a port), a single queue could
|
||||
accumulate millions of entries and consume gigabytes of memory.
|
||||
|
||||
**Fix:** Add per-decoder max capacity (e.g. 10,000 entries). Evict oldest entries
|
||||
when capacity is reached, independent of time-based pruning.
|
||||
|
||||
#### ExponentialBackoff has no jitter
|
||||
|
||||
**Location:** `src/trx-core/src/rig/controller/<policies.rs>`
|
||||
|
||||
Multiple rigs or reconnecting clients using the same backoff parameters will retry
|
||||
at identical times (thundering herd). This is especially problematic when a server
|
||||
restarts and all clients reconnect simultaneously.
|
||||
|
||||
**Fix:** Add randomized jitter (e.g. ±25% of the computed delay) to the
|
||||
`ExponentialBackoff::delay()` method.
|
||||
|
||||
#### No crash recovery for rig tasks
|
||||
|
||||
**Location:** `src/trx-server/src/<main.rs>`
|
||||
|
||||
If a rig task panics (e.g. due to an unexpected backend error), the task simply
|
||||
disappears. The listener continues routing commands to the dead rig's channel,
|
||||
where they silently timeout. No automatic restart or health monitoring exists.
|
||||
|
||||
**Fix:** Wrap rig tasks in a supervisor loop that detects task completion/panic
|
||||
and restarts with backoff. Emit a `RigMachineState::Error` on the watch channel
|
||||
so clients see the failure.
|
||||
|
||||
---
|
||||
|
||||
### Medium Priority (P2)
|
||||
|
||||
#### SoapySdrRig constructor takes 20+ parameters
|
||||
|
||||
**Location:** `src/trx-server/trx-backend/trx-backend-soapysdr/src/<lib.rs>`
|
||||
— `new_with_config()`
|
||||
|
||||
The constructor accepts 20+ positional parameters with no builder pattern,
|
||||
making call sites fragile and hard to read. Adding a new parameter requires
|
||||
updating all callers.
|
||||
|
||||
**Fix:** Introduce a `SoapySdrConfig` builder struct with sensible defaults.
|
||||
|
||||
#### Dual command enums with mechanical 1:1 mapping
|
||||
|
||||
**Location:** `src/trx-protocol/src/<mapping.rs>` (675 lines),
|
||||
`src/trx-protocol/src/<types.rs>`, `src/trx-core/src/rig/<command.rs>`
|
||||
|
||||
`ClientCommand` and `RigCommand` are near-identical 40+ variant enums with
|
||||
purely mechanical mapping in `mapping.rs`. Adding a new command requires editing
|
||||
4 files (command.rs, types.rs, mapping.rs in both directions, codec.rs).
|
||||
`mapping.rs` contains an `unreachable!()` for `GetRigs` that would panic if
|
||||
the listener logic changes.
|
||||
|
||||
**Fix:** Consider a macro that generates both enums and the mapping from a single
|
||||
definition. Alternatively, collapse to a single enum with serde annotations.
|
||||
|
||||
#### Lock poisoning recovery hides panics
|
||||
|
||||
**Location:** `src/trx-server/src/<audio.rs>` — `DecoderHistories`
|
||||
|
||||
All mutex acquisitions use `.unwrap_or_else(|e| e.into_inner())` which silently
|
||||
recovers from poisoned mutexes. While this prevents cascading panics, it hides
|
||||
the original panic and may operate on inconsistent data.
|
||||
|
||||
**Fix:** Log a warning when recovering from a poisoned lock, and consider whether
|
||||
the recovered data is actually safe to use. For history queues, clearing the
|
||||
queue on poison recovery may be safer than continuing with partial data.
|
||||
|
||||
#### Configuration duplication between server and client
|
||||
|
||||
**Location:** `src/trx-server/src/<config.rs>` (1,512 lines),
|
||||
`src/trx-client/src/<config.rs>` (1,181 lines)
|
||||
|
||||
14 config structs each, many mirrored between server and client (GeneralConfig,
|
||||
rig model definitions, defaults). Shared config definitions should live in
|
||||
`trx-app`.
|
||||
|
||||
#### Hardcoded timeouts and retention periods
|
||||
|
||||
**Locations:** Multiple files
|
||||
|
||||
| Constant | Value | Location |
|
||||
|----------|-------|----------|
|
||||
| COMMAND_EXEC_TIMEOUT | 10s | rig_task.rs |
|
||||
| POLL_REFRESH_TIMEOUT | 8s | rig_task.rs |
|
||||
| IO_TIMEOUT | 10s | listener.rs |
|
||||
| REQUEST_TIMEOUT | 12s | listener.rs |
|
||||
| History retention | 24h | audio.rs |
|
||||
| FT-817 read timeout | 800ms | trx-backend-ft817 |
|
||||
| RIG_TASK_CHANNEL_BUFFER | 32 | main.rs |
|
||||
|
||||
None are configurable. Making these part of the TOML config would help
|
||||
deployments with slow serial links or high-latency networks.
|
||||
|
||||
---
|
||||
|
||||
### Low Priority (P3)
|
||||
|
||||
#### FT-817 VFO state inference is fragile
|
||||
|
||||
**Location:** `src/trx-server/trx-backend/trx-backend-ft817/src/<lib.rs>`
|
||||
|
||||
VFO state starts as `Unknown` and is inferred by matching frequencies against
|
||||
cached values. When VFO A and VFO B are set to the same frequency, inference
|
||||
fails and the rig may report the wrong VFO.
|
||||
|
||||
**Fix:** Some FT-817 variants support extended status bytes that indicate active
|
||||
VFO directly. Detect firmware version and use the direct query when available.
|
||||
|
||||
#### VDES decoder has incomplete FEC
|
||||
|
||||
**Location:** `src/decoders/trx-vdes/src/<lib.rs>`
|
||||
|
||||
The VDES decoder implements burst detection and pi/4-QPSK demodulation but the
|
||||
Turbo FEC (1/2 rate) decoder and link-layer (M.2092-1) parser are not complete.
|
||||
Decoded output is limited to raw symbols.
|
||||
|
||||
#### No forward compatibility in protocol
|
||||
|
||||
**Location:** `src/trx-protocol/src/<codec.rs>`
|
||||
|
||||
Unknown commands cause parse errors. There is no protocol version field in the
|
||||
envelope, making it impossible for older clients to gracefully degrade when
|
||||
connecting to newer servers.
|
||||
|
||||
**Fix:** Add an optional `protocol_version` field to `ClientEnvelope`. Unknown
|
||||
commands should return an error response rather than a parse failure.
|
||||
|
||||
#### Command handler boilerplate
|
||||
|
||||
**Location:** `src/trx-core/src/rig/controller/<handlers.rs>`
|
||||
|
||||
11 `RigCommandHandler` implementations follow identical patterns across 500+
|
||||
lines (validate state → call executor method → return result). A declarative
|
||||
macro could reduce this to ~100 lines.
|
||||
|
||||
#### Missing tests for critical paths
|
||||
|
||||
Serial backends (FT-817, FT-450D), plugin loading/discovery, and the audio
|
||||
pipeline (Opus encode/decode) have no or minimal test coverage. `rig_task.rs`
|
||||
(1,315 lines) and `audio.rs` (3,977 lines) — the two largest server modules —
|
||||
have no integration tests.
|
||||
Reference in New Issue
Block a user