[docs](trx-rs): deep codebase review with updated architecture and improvement plan
Architecture.md: added detailed component notes covering rig_task internals, audio pipeline, remote client dual-connection model, FrontendRuntimeContext field groups, decoder implementation patterns, and FT-817 backend workarounds. Improvement-Areas.md: added 10 new findings from deep review including LIFO command batching, unbounded decoder history, missing jitter in backoff, rig task crash recovery, SoapySdrRig constructor complexity, and protocol versioning. CLAUDE.md: refreshed review observations with accurate LOC counts, prioritized improvement items (P1/P2/P3), and new strengths identified. https://claude.ai/code/session_011aiY4GfrmDUrpYVvEUGNGm Signed-off-by: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -1003,3 +1003,90 @@ DecoderHistories buffer ↓
|
||||
listener connections ↓
|
||||
stream decoder messages HTTP WebSocket / local speakers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Detailed Component Notes
|
||||
|
||||
### Rig Task Internals (`rig_task.rs` — 1,315 lines)
|
||||
|
||||
The rig task is the heart of the server. Key implementation details:
|
||||
|
||||
- **Command batching**: Accumulates pending requests before processing sequentially. Uses `batch.pop()` (LIFO order, not FIFO).
|
||||
- **Spectrum deduplication**: Concurrent `GetSpectrum` requests are collapsed — one DSP computation broadcasts to all waiting responders.
|
||||
- **Adaptive polling**: Poll interval adjusts based on TX state (100ms during TX, 500ms idle).
|
||||
- **Grace period**: 800ms pause on polling after power-on/off operations to let hardware settle.
|
||||
- **VFO priming**: Optional initialization sequence that toggles VFO A/B to populate the state cache.
|
||||
- **Per-rig decoder histories**: Each rig maintains independent `Arc<DecoderHistories>` for all 11 decoder types.
|
||||
- **Timeout enforcement**: Commands have `COMMAND_EXEC_TIMEOUT` (10s), polling has `POLL_REFRESH_TIMEOUT` (8s).
|
||||
|
||||
### Audio Pipeline (`audio.rs` — 3,977 lines)
|
||||
|
||||
The audio module handles decoder history storage and stream management:
|
||||
|
||||
- **`DecoderHistories`**: Per-rig mutable store for 11 decoder history queues (AIS, VDES, APRS, HF_APRS, CW, FT8, FT4, FT2, WSPR, WXSAT, LRPT).
|
||||
- **Time-based retention**: 24h TTL on all history with periodic pruning.
|
||||
- **Atomic total count**: `AtomicUsize` with CAS loop avoids acquiring 11 mutex locks in `snapshot_all()`.
|
||||
- **Lock poisoning tolerance**: Uses `.unwrap_or_else(|e| e.into_inner())` to survive poisoned mutexes.
|
||||
- **`StreamErrorLogger`**: Suppresses duplicate stream errors with 60s periodic summaries and error classification (alsa_poll_failure, input/output_stream_error).
|
||||
- **CRC filtering**: APRS records filtered by `crc_ok` before storage.
|
||||
|
||||
### Remote Client Dual-Connection Model
|
||||
|
||||
`remote_client.rs` maintains two independent TCP connections to the server:
|
||||
|
||||
1. **Main connection** (port 4530): State polling, command forwarding, rig discovery.
|
||||
2. **Spectrum connection** (dedicated): Polls `GetSpectrum` at 50ms intervals (20 fps) independently to avoid blocking the main connection during command processing.
|
||||
|
||||
Constants: `CONNECT_TIMEOUT: 5s`, `IO_TIMEOUT: 15s`, `SPECTRUM_IO_TIMEOUT: 3s`. Exponential backoff with jitter on reconnect.
|
||||
|
||||
### FrontendRuntimeContext Field Groups (~50 fields)
|
||||
|
||||
The `FrontendRuntimeContext` struct in `trx-frontend/src/lib.rs` organizes into logical groups:
|
||||
|
||||
| Group | Fields | Examples |
|
||||
|-------|--------|---------|
|
||||
| Audio/decode channels | 13 | `audio_rx`, `decode_rx`, `ais_history`, `ft8_history` |
|
||||
| Client tracking | 4 | `sse_clients`, `rigctl_clients`, `audio_clients` |
|
||||
| Connection state | 4 | `server_connected`, `rig_server_connected` |
|
||||
| HTTP auth config | 7 | `http_auth_enabled`, `http_auth_session_ttl_secs` |
|
||||
| HTTP UI config | 6 | `http_initial_map_zoom`, `http_spectrum_usable_span_ratio` |
|
||||
| Remote rig management | 4 | `remote_active_rig_id`, `remote_rigs`, `rig_states` |
|
||||
| Owner/branding | 4 | `owner_callsign`, `owner_website_url`, `ais_vessel_url_base` |
|
||||
| Spectrum & per-rig audio | 4 | `spectrum`, `rig_audio_rx`, `rig_audio_info` |
|
||||
| Virtual channel audio | 4 | `rig_vchan_audio_cmd`, `vchan_audio`, `vchan_destroyed` |
|
||||
|
||||
### Decoder Implementation Patterns
|
||||
|
||||
All real-time decoders follow a consistent pattern:
|
||||
|
||||
```rust
|
||||
// 1. Stateful decoder struct with sample buffer
|
||||
pub struct XxxDecoder { sample_buf: Vec<f32>, ... }
|
||||
|
||||
// 2. Block/sample processing
|
||||
pub fn process_block(&mut self, samples: &[f32]) { ... }
|
||||
|
||||
// 3. Result extraction
|
||||
pub fn decode_if_ready(&mut self) -> Vec<XxxResult> { ... }
|
||||
```
|
||||
|
||||
| Decoder | Algorithm | Sample Rate | Key Constants |
|
||||
|---------|-----------|-------------|---------------|
|
||||
| FT8/FT4/FT2 | Waterfall + LDPC/OSD | Varies | MAX_LDPC_ITERATIONS=20, MAX_CANDIDATES=120 |
|
||||
| CW | Goertzel tone detection | Varies | 10ms windows, tone range 300–1200 Hz |
|
||||
| APRS | Bell 202 AFSK (1200/2200 Hz) | 9600 | HDLC framing, NRZI, CRC-16-CCITT |
|
||||
| AIS | GMSK 9600 baud | 9600 | Narrowband FM input |
|
||||
| WSPR | Fano decoder | 12000 | 162 symbols, 120s slot, 1.46 Hz spacing |
|
||||
| RDS | RRC matched filter + Costas PLL | Native | 57 kHz subcarrier, 1187.5 bps, OSD FEC |
|
||||
| VDES | pi/4-QPSK 76.8 ksps | 100k | Burst detection, partial Turbo FEC |
|
||||
|
||||
### Backend Reliability Workarounds (FT-817)
|
||||
|
||||
The FT-817 CAT backend (`trx-backend-ft817/`) includes empirical workarounds for hardware quirks:
|
||||
|
||||
- **Duplicate frame sends**: `set_mode()` and `set_ptt()` send CAT frames twice with 80ms delay (radio sometimes drops first frame).
|
||||
- **Panel unlock before commands**: Clears stale bytes from the serial buffer.
|
||||
- **Power-on dummy frame**: CPU wakes before CAT framing locks; dummy frame ensures readiness.
|
||||
- **VFO state inference**: Infers VFO A/B by matching frequencies against cached values (fragile when frequencies collide).
|
||||
- **Read timeout**: 800ms per CAT read operation (not configurable).
|
||||
|
||||
+1
-1
@@ -11,4 +11,4 @@ and remote control are exposed elsewhere on the network.
|
||||
- [Architecture](Architecture) — system design, crate layout, data flow, and internals
|
||||
- [Optimization Guidelines](Optimization-Guidelines) — performance guidelines for the real-time DSP pipeline
|
||||
- [Planned Features](Planned-Features) — planned features and design notes
|
||||
- [Improvement Areas](Improvement-Areas) — codebase audit: quality, architecture, security, and performance
|
||||
- [Improvement Areas](Improvement-Areas) — codebase audit: quality, architecture, security, performance, and improvement plan
|
||||
|
||||
+191
-2
@@ -4,7 +4,7 @@ A comprehensive audit of the trx-rs codebase covering code quality, architecture
|
||||
security, testing, and performance. Each item includes the affected location and
|
||||
a suggested fix.
|
||||
|
||||
*Last updated: 2026-03-26*
|
||||
*Last updated: 2026-03-28*
|
||||
|
||||
---
|
||||
|
||||
@@ -122,4 +122,193 @@ No plugin API version, capability manifest, or unload/reload semantics. Old
|
||||
plugins break silently on API changes.
|
||||
|
||||
**Fix:** Add a version field to the registration struct and reject incompatible
|
||||
plugins at load time.
|
||||
plugins at load time.
|
||||
|
||||
---
|
||||
|
||||
## New Findings (2026-03-28 Deep Review)
|
||||
|
||||
### High Priority (P1)
|
||||
|
||||
#### Rig task command batching uses LIFO order
|
||||
|
||||
**Location:** `src/trx-server/src/<rig_task.rs>` — `batch.pop()`
|
||||
|
||||
Pending commands are accumulated into a `Vec` and processed with `pop()`, which
|
||||
reverses arrival order. If a client sends `SetFreq(14.074)` then `SetMode(USB)`,
|
||||
the mode change executes before the frequency change. This can cause unexpected
|
||||
transient state on hardware that validates mode against frequency (e.g. FT-817
|
||||
rejects CW below 1.8 MHz).
|
||||
|
||||
**Fix:** Replace `pop()` with `drain(..)` or iterate in forward order.
|
||||
|
||||
#### FrontendRuntimeContext god-struct (~50 fields)
|
||||
|
||||
**Location:** `src/trx-client/trx-frontend/src/<lib.rs>`
|
||||
|
||||
Mixes audio channels, decode histories, auth config, UI settings, rig routing,
|
||||
virtual channel management, and branding info into a single struct passed
|
||||
through `Arc`. Every frontend receives all 50 fields even if it only needs a
|
||||
subset. Changes to any field group force recompilation of all frontends.
|
||||
|
||||
**Suggested decomposition:**
|
||||
```
|
||||
FrontendRuntimeContext
|
||||
├── AudioContext (audio_rx, audio_tx, audio_info, decode_rx)
|
||||
├── DecodeHistoryContext (ais, vdes, aprs, hf_aprs, cw, ft8, ft4, ft2, wspr)
|
||||
├── HttpAuthConfig (enabled, passphrases, session_ttl, cookie settings)
|
||||
├── HttpUiConfig (map_zoom, spectrum settings, history retention)
|
||||
├── RigRoutingContext (active_rig_id, remote_rigs, rig_states, rig_spectrums)
|
||||
├── OwnerInfo (callsign, website_url, website_name, ais_vessel_url)
|
||||
└── VChanContext (vchan_audio, vchan_audio_cmd, vchan_destroyed)
|
||||
```
|
||||
|
||||
#### Decoder history queues have no capacity bounds
|
||||
|
||||
**Location:** `src/trx-server/src/<audio.rs>` — `DecoderHistories`
|
||||
|
||||
History queues (`VecDeque`) grow unbounded until the 24h retention period expires.
|
||||
Under high traffic (e.g. busy AIS channel near a port), a single queue could
|
||||
accumulate millions of entries and consume gigabytes of memory.
|
||||
|
||||
**Fix:** Add per-decoder max capacity (e.g. 10,000 entries). Evict oldest entries
|
||||
when capacity is reached, independent of time-based pruning.
|
||||
|
||||
#### ExponentialBackoff has no jitter
|
||||
|
||||
**Location:** `src/trx-core/src/rig/controller/<policies.rs>`
|
||||
|
||||
Multiple rigs or reconnecting clients using the same backoff parameters will retry
|
||||
at identical times (thundering herd). This is especially problematic when a server
|
||||
restarts and all clients reconnect simultaneously.
|
||||
|
||||
**Fix:** Add randomized jitter (e.g. ±25% of the computed delay) to the
|
||||
`ExponentialBackoff::delay()` method.
|
||||
|
||||
#### No crash recovery for rig tasks
|
||||
|
||||
**Location:** `src/trx-server/src/<main.rs>`
|
||||
|
||||
If a rig task panics (e.g. due to an unexpected backend error), the task simply
|
||||
disappears. The listener continues routing commands to the dead rig's channel,
|
||||
where they silently timeout. No automatic restart or health monitoring exists.
|
||||
|
||||
**Fix:** Wrap rig tasks in a supervisor loop that detects task completion/panic
|
||||
and restarts with backoff. Emit a `RigMachineState::Error` on the watch channel
|
||||
so clients see the failure.
|
||||
|
||||
---
|
||||
|
||||
### Medium Priority (P2)
|
||||
|
||||
#### SoapySdrRig constructor takes 20+ parameters
|
||||
|
||||
**Location:** `src/trx-server/trx-backend/trx-backend-soapysdr/src/<lib.rs>`
|
||||
— `new_with_config()`
|
||||
|
||||
The constructor accepts 20+ positional parameters with no builder pattern,
|
||||
making call sites fragile and hard to read. Adding a new parameter requires
|
||||
updating all callers.
|
||||
|
||||
**Fix:** Introduce a `SoapySdrConfig` builder struct with sensible defaults.
|
||||
|
||||
#### Dual command enums with mechanical 1:1 mapping
|
||||
|
||||
**Location:** `src/trx-protocol/src/<mapping.rs>` (675 lines),
|
||||
`src/trx-protocol/src/<types.rs>`, `src/trx-core/src/rig/<command.rs>`
|
||||
|
||||
`ClientCommand` and `RigCommand` are near-identical 40+ variant enums with
|
||||
purely mechanical mapping in `mapping.rs`. Adding a new command requires editing
|
||||
4 files (command.rs, types.rs, mapping.rs in both directions, codec.rs).
|
||||
`mapping.rs` contains an `unreachable!()` for `GetRigs` that would panic if
|
||||
the listener logic changes.
|
||||
|
||||
**Fix:** Consider a macro that generates both enums and the mapping from a single
|
||||
definition. Alternatively, collapse to a single enum with serde annotations.
|
||||
|
||||
#### Lock poisoning recovery hides panics
|
||||
|
||||
**Location:** `src/trx-server/src/<audio.rs>` — `DecoderHistories`
|
||||
|
||||
All mutex acquisitions use `.unwrap_or_else(|e| e.into_inner())` which silently
|
||||
recovers from poisoned mutexes. While this prevents cascading panics, it hides
|
||||
the original panic and may operate on inconsistent data.
|
||||
|
||||
**Fix:** Log a warning when recovering from a poisoned lock, and consider whether
|
||||
the recovered data is actually safe to use. For history queues, clearing the
|
||||
queue on poison recovery may be safer than continuing with partial data.
|
||||
|
||||
#### Configuration duplication between server and client
|
||||
|
||||
**Location:** `src/trx-server/src/<config.rs>` (1,512 lines),
|
||||
`src/trx-client/src/<config.rs>` (1,181 lines)
|
||||
|
||||
14 config structs each, many mirrored between server and client (GeneralConfig,
|
||||
rig model definitions, defaults). Shared config definitions should live in
|
||||
`trx-app`.
|
||||
|
||||
#### Hardcoded timeouts and retention periods
|
||||
|
||||
**Locations:** Multiple files
|
||||
|
||||
| Constant | Value | Location |
|
||||
|----------|-------|----------|
|
||||
| COMMAND_EXEC_TIMEOUT | 10s | rig_task.rs |
|
||||
| POLL_REFRESH_TIMEOUT | 8s | rig_task.rs |
|
||||
| IO_TIMEOUT | 10s | listener.rs |
|
||||
| REQUEST_TIMEOUT | 12s | listener.rs |
|
||||
| History retention | 24h | audio.rs |
|
||||
| FT-817 read timeout | 800ms | trx-backend-ft817 |
|
||||
| RIG_TASK_CHANNEL_BUFFER | 32 | main.rs |
|
||||
|
||||
None are configurable. Making these part of the TOML config would help
|
||||
deployments with slow serial links or high-latency networks.
|
||||
|
||||
---
|
||||
|
||||
### Low Priority (P3)
|
||||
|
||||
#### FT-817 VFO state inference is fragile
|
||||
|
||||
**Location:** `src/trx-server/trx-backend/trx-backend-ft817/src/<lib.rs>`
|
||||
|
||||
VFO state starts as `Unknown` and is inferred by matching frequencies against
|
||||
cached values. When VFO A and VFO B are set to the same frequency, inference
|
||||
fails and the rig may report the wrong VFO.
|
||||
|
||||
**Fix:** Some FT-817 variants support extended status bytes that indicate active
|
||||
VFO directly. Detect firmware version and use the direct query when available.
|
||||
|
||||
#### VDES decoder has incomplete FEC
|
||||
|
||||
**Location:** `src/decoders/trx-vdes/src/<lib.rs>`
|
||||
|
||||
The VDES decoder implements burst detection and pi/4-QPSK demodulation but the
|
||||
Turbo FEC (1/2 rate) decoder and link-layer (M.2092-1) parser are not complete.
|
||||
Decoded output is limited to raw symbols.
|
||||
|
||||
#### No forward compatibility in protocol
|
||||
|
||||
**Location:** `src/trx-protocol/src/<codec.rs>`
|
||||
|
||||
Unknown commands cause parse errors. There is no protocol version field in the
|
||||
envelope, making it impossible for older clients to gracefully degrade when
|
||||
connecting to newer servers.
|
||||
|
||||
**Fix:** Add an optional `protocol_version` field to `ClientEnvelope`. Unknown
|
||||
commands should return an error response rather than a parse failure.
|
||||
|
||||
#### Command handler boilerplate
|
||||
|
||||
**Location:** `src/trx-core/src/rig/controller/<handlers.rs>`
|
||||
|
||||
11 `RigCommandHandler` implementations follow identical patterns across 500+
|
||||
lines (validate state → call executor method → return result). A declarative
|
||||
macro could reduce this to ~100 lines.
|
||||
|
||||
#### Missing tests for critical paths
|
||||
|
||||
Serial backends (FT-817, FT-450D), plugin loading/discovery, and the audio
|
||||
pipeline (Opus encode/decode) have no or minimal test coverage. `rig_task.rs`
|
||||
(1,315 lines) and `audio.rs` (3,977 lines) — the two largest server modules —
|
||||
have no integration tests.
|
||||
Reference in New Issue
Block a user