[docs](workspace): refresh post-refactor enhancement plan
Reanalyze current architecture status and rewrite ENHANCEMENT.md to reflect remaining high-impact issues after completed phases. Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: Stanislaw Grams <stanislawgrams@gmail.com>
This commit is contained in:
+38
-39
@@ -1,70 +1,69 @@
|
|||||||
# Top 5 Real Architecture Issues
|
# Top 5 Real Architecture Issues (Post-Refactor)
|
||||||
|
|
||||||
## 1) Global plugin compatibility registries still exist
|
## 1) Plugin ABI is still brittle and unversioned
|
||||||
### Files
|
### Files
|
||||||
- `src/trx-server/trx-backend/src/lib.rs`
|
- `src/trx-app/src/plugins.rs`
|
||||||
- `src/trx-client/trx-frontend/src/lib.rs`
|
- `examples/trx-plugin-example/src/lib.rs`
|
||||||
|
|
||||||
### Why this matters
|
### Why this matters
|
||||||
`OnceLock<Mutex<...>>` registry shims still hold mutable global state. This keeps plugin registration behavior implicit and harder to test.
|
Plugin loading is now explicit (good), but still assumes exact symbol names and raw FFI contracts with no ABI/version handshake. A plugin built against an older/newer ABI can fail at runtime in hard-to-diagnose ways.
|
||||||
|
|
||||||
### Fix steps
|
### Fix steps
|
||||||
1. Introduce explicit plugin registration API that takes a mutable context.
|
1. Add an ABI version symbol/handshake (`trx_plugin_abi_version`) and reject incompatible plugins with clear errors.
|
||||||
2. Make plugin loader return registration data instead of relying on global side effects.
|
2. Split plugin capability metadata (backend/frontend/both) from registration symbols to avoid noisy failed-load logs.
|
||||||
3. Remove global `register_*`/`snapshot_bootstrap_context` wrappers after migration.
|
3. Provide a tiny shared plugin-API crate for stable entrypoint signatures.
|
||||||
|
|
||||||
## 2) No supervised shutdown/lifecycle model
|
## 2) Runtime supervision is still ad-hoc (sleep + abort)
|
||||||
### Files
|
### Files
|
||||||
- `src/trx-server/src/main.rs`
|
- `src/trx-server/src/main.rs`
|
||||||
- `src/trx-client/src/main.rs`
|
- `src/trx-client/src/main.rs`
|
||||||
|
|
||||||
### Why this matters
|
### Why this matters
|
||||||
Many tasks are detached via `tokio::spawn` and process shutdown mostly waits on Ctrl+C. Task failures and cancellation order are not centrally managed.
|
Shutdown is coordinated, but supervision still uses a fixed delay plus manual `abort()` over `Vec<JoinHandle<_>>`. This can mask task failures, race shutdown ordering, and make lifecycle behavior harder to reason about.
|
||||||
|
|
||||||
### Fix steps
|
### Fix steps
|
||||||
1. Add shared cancellation token.
|
1. Move to `JoinSet` (or a small supervisor type) for task ownership and result handling.
|
||||||
2. Track tasks in `JoinSet`.
|
2. Replace fixed sleep with bounded graceful-join timeout logic.
|
||||||
3. On shutdown: stop listeners, cancel workers, await joins with timeout, then exit.
|
3. Surface task failure reasons consistently in one place.
|
||||||
|
|
||||||
## 3) Protocol/network hardening gaps
|
## 3) JSON/TCP transport logic is duplicated across modules
|
||||||
### Files
|
### Files
|
||||||
- `src/trx-client/src/remote_client.rs`
|
|
||||||
- `src/trx-server/src/listener.rs`
|
- `src/trx-server/src/listener.rs`
|
||||||
- `src/trx-client/trx-frontend/trx-frontend-http-json/src/server.rs`
|
- `src/trx-client/trx-frontend/trx-frontend-http-json/src/server.rs`
|
||||||
|
- `src/trx-client/src/remote_client.rs`
|
||||||
|
|
||||||
### Why this matters
|
### Why this matters
|
||||||
`parse_remote_url` is ad-hoc and line-based listeners accept unbounded lines. This risks parsing edge cases and memory pressure.
|
`read_limited_line`, timeout handling, and response write patterns are repeated in multiple places. This increases drift risk and makes protocol hardening changes expensive.
|
||||||
|
|
||||||
### Fix steps
|
### Fix steps
|
||||||
1. Replace string URL parsing with typed address parsing (support IPv4/IPv6/hostnames explicitly).
|
1. Extract shared JSON-over-TCP helpers into `trx-protocol` (or a small transport crate/module).
|
||||||
2. Enforce maximum line/frame size for JSON-over-TCP.
|
2. Keep one source of truth for max line size, timeout behavior, and framing errors.
|
||||||
3. Add read/write/request timeouts and explicit error messages.
|
3. Cover shared transport with focused tests once instead of per-module copies.
|
||||||
|
|
||||||
## 4) Config has parse defaults but weak semantic validation
|
## 4) Boundary tests are present but mostly ignored in constrained envs
|
||||||
### Files
|
|
||||||
- `src/trx-server/src/config.rs`
|
|
||||||
- `src/trx-client/src/config.rs`
|
|
||||||
|
|
||||||
### Why this matters
|
|
||||||
Config loads successfully even when values are semantically bad (timings, ports, audio params), leading to runtime failures.
|
|
||||||
|
|
||||||
### Fix steps
|
|
||||||
1. Add `validate()` to server/client config models.
|
|
||||||
2. Validate ranges and required field combinations.
|
|
||||||
3. Call `validate()` in startup before spawning tasks; fail fast with clear path-based errors.
|
|
||||||
|
|
||||||
## 5) Integration coverage is still thin at boundaries
|
|
||||||
### Files
|
### Files
|
||||||
- `src/trx-server/src/listener.rs`
|
- `src/trx-server/src/listener.rs`
|
||||||
- `src/trx-client/src/remote_client.rs`
|
- `src/trx-client/src/remote_client.rs`
|
||||||
- `src/trx-client/trx-frontend/trx-frontend-http-json/src/server.rs`
|
- `src/trx-client/trx-frontend/trx-frontend-http-json/src/server.rs`
|
||||||
- `src/trx-app/src/plugins.rs`
|
|
||||||
|
|
||||||
### Why this matters
|
### Why this matters
|
||||||
Most coverage is unit-level. Critical network/plugin/runtime flows can regress without tests.
|
Important network-path tests exist, but are marked `#[ignore]` in this environment due bind restrictions. Without a clear CI strategy, regressions can still slip through.
|
||||||
|
|
||||||
### Fix steps
|
### Fix steps
|
||||||
1. Add integration tests for JSON TCP auth/command flow.
|
1. Add CI jobs/environment where bind-based tests run by default.
|
||||||
2. Add reconnect tests for remote client.
|
2. Split pure transport logic from socket bind/accept so more behavior can be tested without real sockets.
|
||||||
3. Add plugin load/failure isolation tests.
|
3. Keep ignored tests minimal and document how/when they run.
|
||||||
4. Add shutdown behavior tests once lifecycle supervision is added.
|
|
||||||
|
## 5) Decode/history shared state still relies on global mutexes
|
||||||
|
### Files
|
||||||
|
- `src/trx-server/src/audio.rs`
|
||||||
|
- `src/trx-client/trx-frontend/src/lib.rs`
|
||||||
|
- `src/trx-client/trx-frontend/trx-frontend-http/src/audio.rs`
|
||||||
|
|
||||||
|
### Why this matters
|
||||||
|
History/state paths still use shared mutex-backed globals/contexts with `expect` on lock poisoning in hot paths. This is workable but fragile for long-running async services.
|
||||||
|
|
||||||
|
### Fix steps
|
||||||
|
1. Replace panic-on-poison lock usage with resilient handling.
|
||||||
|
2. Consider bounded channel or lock-free append/read model for decode history.
|
||||||
|
3. Define explicit ownership/lifetime for history data instead of implicit shared mutation.
|
||||||
|
|||||||
Reference in New Issue
Block a user