[docs](trx-rs): refresh improvement areas — remove resolved, add new findings

Remove all completed P0/P1/P2 items and quick wins. Add new findings from
codebase scan: auth.rs mutex poisoning, TCP listener rate limiting, RigState
struct decomposition, spawn_blocking timeout, unsafe string construction,
dead_code annotations, expanded test coverage gaps, and naming inconsistencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Stan Grams <sjg@haxx.space>
This commit is contained in:
2026-03-29 12:15:00 +02:00
parent d3f958fc37
commit 59ebfc2626
2 changed files with 154 additions and 157 deletions
+21 -21
View File
@@ -107,7 +107,7 @@ Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`. Use `(trx-rs
Full architecture documentation: `docs/Architecture.md`
Improvement plan: `docs/Improvement-Areas.md`
*Last reviewed: 2026-03-28*
*Last reviewed: 2026-03-29*
### Strengths
@@ -124,26 +124,26 @@ Improvement plan: `docs/Improvement-Areas.md`
### Areas for Improvement
**P1High (all resolved):**
- **FrontendRuntimeContext** decomposed into 9 sub-structs (AudioContext, DecodeHistoryContext, HttpAuthConfig, etc.)
-**Rig task command batching** fixed to FIFO order
-**Decoder history bounded** at 10,000 entries per queue
-**ExponentialBackoff jitter** ±25% randomized
-**Rig task crash recovery** emits Error state to clients
-**Sync RwLock in async** migrated to tokio::sync::RwLock where appropriate
-**Audio pipeline helpers** extracted from run_capture/run_playback
**P0Critical:**
- **Plugin signing**: No SHA-256 checksum verification, no per-plugin scoping, no Windows validation.
**P2Medium (all resolved):**
- **SoapySdrRig** uses `SoapySdrConfig` builder struct
- **Command enum mapping** uses `define_command_mappings!` macro
- **Lock poison recovery** now logs warnings via `lock_or_recover()` helper
- **Timeouts configurable** via `[timeouts]` TOML section
-**Config shared** types extracted to `trx-app/src/shared_config.rs`
**P1High:**
- **Session store unwrap()**: 7 `.write().unwrap()` in `auth.rs` — mutex poisoning crashes server.
- **No TCP listener rate limiting**: Raw protocol listener accepts unlimited connections per IP.
- **RigState 33-field flat struct**: Decoder bools/reset seqs should be sub-structs. Cloned on every broadcast.
- **No `spawn_blocking` timeout**: Satellite pass computation in `listener.rs` can hang indefinitely.
**P3Low (remaining):**
- **Command handler boilerplate**: 11 `RigCommandHandler` impls follow identical patterns across 500+ lines. Macro opportunity.
- **No integration tests** for `rig_task.rs` (1,315 LOC) or `audio.rs` (3,977 LOC) — the two largest server modules.
- **No command execution timeouts** at the `CommandExecutor` level. Backend stalls propagate up.
- **FT-817 VFO inference fragile**: Fails when VFO A and B share the same frequency.
- **VDES decoder incomplete**: Turbo FEC and link-layer parsing not implemented.
**P2Medium:**
- **Command handler boilerplate**: 11 `RigCommandHandler` impls across 500+ lines. Macro opportunity.
- **No command execution timeouts** at `CommandExecutor` level. Backend stalls propagate up.
- **No protocol versioning**: Unknown commands cause parse errors with no graceful degradation.
- **Unsafe string in spectrum encoding**: `from_utf8_unchecked` in `api.rs:63`.
- **6 `#[allow(dead_code)]`** annotations to review/clean up.
**P3 — Low:**
- **Missing tests**: `audio.rs` (3,812 LOC), `api.rs` (2,711 LOC), `main.rs` (1,203 LOC) have 0 tests.
- **FT-817 VFO inference fragile**: Fails when VFO A and B share the same frequency.
- **VDES decoder incomplete**: Turbo FEC, CRC, and link-layer parsing not implemented.
- **Plugin system lacks versioning**: No API version or reload semantics.
- **Configurator detection stubbed**: `detect.rs` TODO for `serialport::available_ports()`.
- **Inconsistent naming**: `freq_hz`/`frequency`/`center_hz`; `rig_id`/`id`; `model`/`rig_model`.
+133 -136
View File
@@ -4,54 +4,15 @@ A comprehensive audit of the trx-rs codebase covering code quality, architecture
security, testing, and performance. Each item includes the affected location and
a suggested fix.
*Last updated: 2026-03-28*
*Last updated: 2026-03-29*
---
## Resolved
## Critical (P0)
The following items have been fixed across PRs #58, #59, and #60:
### Plugin signing and cross-platform validation
### Quick Wins (all complete)
-**Session cleanup timer** — 5-minute periodic `cleanup_expired()` task
-**`DecodeHistory<T>` type alias** — replaces 9 repeated `Arc<Mutex<VecDeque<...>>>` patterns
-**`mode_to_string()` allocation** — returns `Cow<'static, str>` (zero-alloc for known modes)
-**FTx dedup**`HashSet<u16>` for O(1) lookups
-**Unbounded channels**`VChanAudioCmd` channels bounded at 256
-**JSON serialization**`#[serde(flatten)]` wrapper replaces string-level splice
-**`AtomicUsize` counter** — `estimated_total_count()` avoids 9 mutex acquisitions
-**Cookie security warning** — startup warning when `cookie_secure` is false
-**Spectrum encoding** — pre-allocated output string replaces `format!` overhead
-**`pub(crate)` state data** — `ReadyStateData`/`TransmittingStateData` fields restricted with constructors + getters
-**Lock ordering docs** — module-level documentation in `<vchan.rs>` establishing `rigs → sessions → audio_cmd`
### Critical (P0)
-**Plugin loading validation** — rejects world-writable files on Unix; `TRX_PLUGINS_DISABLED` env var
-**Audio pipeline mutex panics** — all `.expect()` on history mutexes and `.unwrap()` on audio ring buffers replaced with `.unwrap_or_else(|e| e.into_inner())` poison recovery
-**vchan lock panics** — ~25 `.unwrap()` on RwLock/Mutex replaced with poison recovery
### High (P1)
-**RigCat trait split** — 13 SDR-specific methods extracted into `RigSdr` extension trait; `RigCat` retains core CAT ops + `as_sdr()`/`as_sdr_ref()`; SoapySdrRig implements both; FT-817/FT-450D/DummyRig unchanged
-**Decoder history contention**`AtomicUsize` total counter maintained by record/prune/clear
### Medium (P2)
-**Silent state machine failures** — debug-level tracing for rejected transitions
-**User input in logs** — raw JSON truncated to 128 chars
-**Rate limiting** — per-IP `LoginRateLimiter` (10 attempts/60s) on `/auth/login`
-**Lock-holding serialization** — clone data out under lock, serialize after release
-**Overly-public API** — state data fields `pub(crate)` with controlled accessors
-**Cookie security flag** — startup warning for non-TLS deployments
-**Lock ordering** — documented in `<vchan.rs>` module header
---
## Remaining Issues
### Critical (P0)
#### Plugin signing and cross-platform validation
**Location:** `src/trx-app/src/<plugins.rs>`
**Location:** `src/trx-app/src/plugins.rs`
Current protections: file permission checks (Unix), `TRX_PLUGINS_DISABLED` env var,
loaded plugins logged at startup.
@@ -69,54 +30,145 @@ loaded plugins logged at startup.
---
### High Priority (P1)
## High Priority (P1)
#### Synchronous locks in async contexts
### Session store mutex poisoning (auth.rs)
**Location:** `src/trx-client/trx-frontend/trx-frontend-http/src/<background_decode.rs>`,
`src/trx-client/trx-frontend/trx-frontend-http/src/<vchan.rs>`
**Location:** `src/trx-client/trx-frontend/trx-frontend-http/src/auth.rs` (lines 89,
96, 116, 124, 151, 158, 165)
`std::sync::RwLock` is used inside async tasks. Current code is safe (no locks held
across await points), but not idiomatic. Migrating to `tokio::sync::RwLock` would
prevent future regressions.
7 `.write().unwrap()` / `.lock().unwrap()` calls on the session `RwLock<HashMap>`.
If a panic occurs while holding the lock, all subsequent auth operations will panic
and crash the server.
#### Large functions in audio pipeline
**Fix:** Use `lock_or_recover()` helper (already used elsewhere in the codebase) or
`write().unwrap_or_else(|e| e.into_inner())` with warning logs.
### No rate limiting on TCP listener
**Location:** `src/trx-server/src/listener.rs`
The TCP listener accepts connections without per-IP rate limiting. The HTTP frontend
has rate limiting on `/auth/login`, but the raw protocol listener does not. Potential
for connection exhaustion.
**Fix:** Add per-IP connection rate limiting (similar to `LoginRateLimiter` in auth).
### RigState is a 33-field flat struct
**Location:** `src/trx-core/src/rig/state.rs` (lines 1384)
33 fields including 8 `*_decode_enabled` bools and 8 `*_decode_reset_seq` counters
that follow identical patterns. Cloned frequently via `watch` channel broadcasts.
**Fix:** Group decoder fields into a `DecoderConfig` sub-struct and reset sequences
into a `DecoderResetSeqs` sub-struct. Reduces clone cost and makes decoder-related
changes self-contained.
### No timeout on `spawn_blocking` in listener
**Location:** `src/trx-server/src/listener.rs:351`
`tokio::task::spawn_blocking()` for satellite pass computation has no timeout. If
SGP4 propagation hangs, it consumes a thread pool slot indefinitely.
**Fix:** Wrap in `tokio::time::timeout()`.
---
## Medium Priority (P2)
### Command handler boilerplate
**Location:** `src/trx-core/src/rig/controller/handlers.rs` (lines 145659)
11 `RigCommandHandler` implementations follow identical patterns across 500+ lines:
validate state → call executor method → return result. Differences are limited to
which executor method is called and which state preconditions are checked.
**Fix:** Declarative macro that generates implementations from a table of
(command, executor_method, preconditions) tuples. Would reduce ~500 lines to ~100.
### No command execution timeouts at CommandExecutor level
**Location:** `src/trx-server/src/rig_task.rs`
`command_exec_timeout` is defined in `RigTaskConfig` but there is no evidence of
`tokio::time::timeout()` wrapping individual executor calls. A stuck backend command
blocks the rig task indefinitely.
**Fix:** Wrap each `executor.method().await` call in `timeout(config.command_exec_timeout, ...)`.
### No forward compatibility in protocol
**Location:** `src/trx-protocol/src/codec.rs`
Unknown commands cause parse errors. No `protocol_version` field in the envelope.
Older clients cannot gracefully degrade when connecting to newer servers.
**Fix:** Add optional `protocol_version` to `ClientEnvelope`. Unknown commands
should return an error response rather than a parse failure.
### `unsafe` string construction in spectrum encoding
**Location:** `src/trx-client/trx-frontend/trx-frontend-http/src/api.rs:63`
`unsafe { String::from_utf8_unchecked(out) }` builds a base64 string from bytes.
The safety comment claims ASCII-only output, which is correct for the current
implementation, but a future edit could break the invariant silently.
**Fix:** Use `String::from_utf8(out).expect("base64 is ASCII")` (negligible
performance difference on short spectrum strings) or use the `base64` crate.
### 6 `#[allow(dead_code)]` annotations
**Locations:**
- `src/trx-server/src/<audio.rs>``run_capture()` (~200 lines),
`run_playback()` (~217 lines)
- `src/trx-client/trx-frontend/trx-frontend-http/src/auth.rs:652`
- `src/trx-client/src/config.rs:266`
- `src/trx-server/trx-backend/trx-backend-soapysdr/src/vchan_impl.rs:66, 87`
- `src/trx-server/trx-backend/trx-backend-soapysdr/src/demod.rs:113`
- `src/trx-server/trx-backend/trx-backend-soapysdr/src/real_iq_source.rs:20`
These contain nested loops, device re-enumeration logic, and stream error handling
that should be extracted into focused helper functions.
**Fix:** Review each — remove dead code or remove the annotation if the code is
reachable via feature gates.
---
### Medium Priority (P2)
## Low Priority (P3)
#### Configuration duplication
### Missing tests for critical modules
**Location:** `src/trx-server/src/<config.rs>` (1512 lines),
`src/trx-client/src/<config.rs>` (1181 lines)
Zero `#[test]` functions in:
- `src/trx-server/src/audio.rs` (3,812 lines) — decoder instantiation, audio streaming, history
- `src/trx-client/trx-frontend/trx-frontend-http/src/api.rs` (2,711 lines) — HTTP endpoints, SSE, spectrum encoding
- `src/trx-server/src/main.rs` (1,203 lines) — multi-rig setup, initialization
- `src/trx-server/src/history_store.rs` (193 lines) — persistence, timestamp conversion
14 config structs each, many mirrored between server and client. Extract shared
definitions (GeneralConfig, RigConfig, defaults) into `trx-app`.
`rig_task.rs` (1,316 lines) has 4 tests but no integration tests for command
timeout handling, polling recovery, or error state transitions.
---
Serial backends (FT-817, FT-450D) and plugin loading have no test coverage.
### Low Priority (P3)
### FT-817 VFO state inference is fragile
#### Missing tests for critical paths
**Location:** `src/trx-server/trx-backend/trx-backend-ft817/src/lib.rs`
Serial backends (FT-817, FT-450D), plugin loading/discovery, and the audio
pipeline (Opus encode/decode) have no or minimal test coverage.
VFO state starts as `Unknown` and is inferred by matching frequencies against
cached values. When VFO A and B share the same frequency, inference fails.
Core crates (`trx-core`, `trx-server`, `trx-client`, `trx-app`) have limited
`[dev-dependencies]` and use only inline `#[test]` functions. Adding test
utilities (mock serial ports, test fixtures) would improve coverage.
**Fix:** Detect firmware version and use direct VFO query when available.
#### Plugin system lacks versioning and lifecycle
### VDES decoder has incomplete FEC
**Location:** `src/trx-app/src/<plugins.rs>`
**Location:** `src/decoders/trx-vdes/src/lib.rs`
Burst detection and pi/4-QPSK demodulation work, but Turbo FEC (1/2 rate) and
link-layer (M.2092-1) parsing are not implemented. CRC validation is stubbed
(`crc_ok: false`). Output limited to raw symbols.
### Plugin system lacks versioning and lifecycle
**Location:** `src/trx-app/src/plugins.rs`
No plugin API version, capability manifest, or unload/reload semantics. Old
plugins break silently on API changes.
@@ -124,73 +176,18 @@ plugins break silently on API changes.
**Fix:** Add a version field to the registration struct and reject incompatible
plugins at load time.
---
### Configurator serial detection is stubbed
## New Findings (2026-03-28 Deep Review) — All Resolved
**Location:** `src/trx-configurator/src/detect.rs:8`
### High Priority (P1) — All Complete
Contains `TODO: use serialport::available_ports() for real detection`. The
interactive setup wizard cannot auto-detect connected rigs.
-**Rig task command batching LIFO** — replaced `batch.pop()` with `batch.remove(0)` for FIFO order
-**FrontendRuntimeContext god-struct** — decomposed ~50 flat fields into 9 coherent sub-structs (`AudioContext`, `DecodeHistoryContext`, `HttpAuthConfig`, `HttpUiConfig`, `RigRoutingContext`, `OwnerInfo`, `VChanContext`, `SpectrumContext`, `PerRigAudioContext`); all 7 consumer files updated
-**Decoder history unbounded** — added `MAX_HISTORY_ENTRIES` (10,000) cap with `enforce_capacity()` eviction independent of time-based pruning
-**ExponentialBackoff no jitter** — added ±25% randomized jitter via `apply_jitter()` helper to prevent thundering herd on reconnect
-**No rig task crash recovery** — rig tasks now detect errors and emit `RigMachineState::Error` on the watch channel so clients see the failure
-**Synchronous locks in async contexts** — migrated `std::sync::RwLock` to `tokio::sync::RwLock` in `background_decode.rs`; `vchan.rs` left as-is (all methods are synchronous, no locks held across await points)
-**Large audio pipeline functions** — extracted `find_input_device()` and `find_output_device()` helpers from `run_capture()` and `run_playback()`
### Inconsistent frequency/rig naming across crates
### Medium Priority (P2) — All Complete
Field naming is inconsistent across the codebase:
- `freq_hz` vs `frequency` vs `center_hz` (audio.rs, api.rs, config.rs)
- `rig_id` vs `id` (RigInstanceConfig vs RigState)
- `model` vs `rig_model` (RigConfig vs RigTaskConfig)
-**SoapySdrRig 20-parameter constructor** — introduced `SoapySdrConfig` struct with named fields and defaults; `new_from_config()` replaces positional parameters; old `new_with_config()` preserved as backward-compatible wrapper
-**Dual command enums** — added `define_command_mappings!` macro in `mapping.rs` that generates both `client_command_to_rig()` and `rig_command_to_client()` from a single definition table; removed `unreachable!()` for `GetRigs`/`GetSatPasses`
-**Lock poisoning recovery hides panics** — replaced all `.unwrap_or_else(|e| e.into_inner())` with `lock_or_recover()` helper that logs a warning with the lock label when recovering from poisoned mutex
-**Configuration duplication** — extracted shared config types (`LogLevel` defaults, common patterns) into `trx-app/src/shared_config.rs`; both server and client import from `trx_app`
-**Hardcoded timeouts** — made `command_exec_timeout`, `poll_refresh_timeout`, `io_timeout`, `request_timeout`, and `rig_task_channel_buffer` configurable via `RigTaskConfig`/`ListenerConfig` and the TOML `[timeouts]` section; constants remain as defaults
---
### Low Priority (P3)
#### FT-817 VFO state inference is fragile
**Location:** `src/trx-server/trx-backend/trx-backend-ft817/src/<lib.rs>`
VFO state starts as `Unknown` and is inferred by matching frequencies against
cached values. When VFO A and VFO B are set to the same frequency, inference
fails and the rig may report the wrong VFO.
**Fix:** Some FT-817 variants support extended status bytes that indicate active
VFO directly. Detect firmware version and use the direct query when available.
#### VDES decoder has incomplete FEC
**Location:** `src/decoders/trx-vdes/src/<lib.rs>`
The VDES decoder implements burst detection and pi/4-QPSK demodulation but the
Turbo FEC (1/2 rate) decoder and link-layer (M.2092-1) parser are not complete.
Decoded output is limited to raw symbols.
#### No forward compatibility in protocol
**Location:** `src/trx-protocol/src/<codec.rs>`
Unknown commands cause parse errors. There is no protocol version field in the
envelope, making it impossible for older clients to gracefully degrade when
connecting to newer servers.
**Fix:** Add an optional `protocol_version` field to `ClientEnvelope`. Unknown
commands should return an error response rather than a parse failure.
#### Command handler boilerplate
**Location:** `src/trx-core/src/rig/controller/<handlers.rs>`
11 `RigCommandHandler` implementations follow identical patterns across 500+
lines (validate state → call executor method → return result). A declarative
macro could reduce this to ~100 lines.
#### Missing tests for critical paths
Serial backends (FT-817, FT-450D), plugin loading/discovery, and the audio
pipeline (Opus encode/decode) have no or minimal test coverage. `rig_task.rs`
(1,315 lines) and `audio.rs` (3,977 lines) — the two largest server modules —
have no integration tests.
Not a correctness issue, but increases cognitive overhead and copy-paste errors.