# Route Comparison Protocol

Goal: compare three FM-mechanism × traffic-property route variants on a unified training base. All routes start from the current `Unified_CFM` SOTA recipe and change one mechanism axis.

## Unified base (LOCKED)

| Item | Value |
|---|---|
| Dataset | CICIoT2023 |
| Source store | `datasets/ciciot2023/processed/full_store/` |
| Flows | `datasets/ciciot2023/processed/full_store/flows.parquet` |
| Flow features | `datasets/ciciot2023/processed/flow_features.parquet` (canonical 20-d) |
| Train: benign | 10,000 (Shafir within-dataset protocol) |
| Sequence length | T = 64 |
| Packet preprocess | `mixed_dequant` (Routes A/B); raw binaries (Route C) |
| Benign split | 80/20, `split_seed=42` |
| Val cap | 10,000 |
| Attack cap | 20,000 (stratified) |
| Multi-seed | {42, 43, 44} |

## Architecture base (LOCKED)

| Item | Value |
|---|---|
| `d_model` | 128 |
| `n_layers` | 4 |
| `n_heads` | 4 |
| `mlp_ratio` | 4.0 |
| `time_dim` | 64 |
| `sigma` | 0.1 |
| `use_ot` | True |
| `lambda_flow / lambda_packet` | 0.3 / 0.3 |
| `packet_mask_ratio` | 0.5 |
| Optimizer | AdamW, lr=3e-4, wd=0.01, grad_clip=1.0 |
| Schedule | CosineAnnealingLR over total steps |
| Epochs | 50 |
| Batch size | 256 |

## Routes

| Route | Mechanism axis | Traffic property targeted |
|---|---|---|
| **Baseline** | Standard UnifiedCFM (current SOTA) | — |
| **A: Causal** | Packet-causal attention mask | Protocol causality (TCP/HTTP handshake) |
| **B: Spectral** | Append K=8-band DFT of (size, IAT) — 32 dims — to flow features (`flow_dim` 20→52); model architecture unchanged | Burstiness / LRD / self-similarity |
| **C: Mixed FM** | Continuous-CFM on (size,IAT,win) + DFM on flags | Discrete-continuous mixed channels |

Route D (Edit Flows) is deferred until A/B/C show signal.
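One way to keep the "locked base, one axis per route" rule checkable is to encode the base as a frozen dataclass and express each route as a small override. The sketch below is only an illustration under assumptions: the field names `causal_mask` and `discrete_flags`, the string `"raw_binaries"`, and the helpers `build_grid` and `changed_fields` are hypothetical and are not taken from the actual training code. The numeric values come from the tables above.

```python
from dataclasses import dataclass, fields, replace
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    # Locked base values, copied from the protocol tables.
    d_model: int = 128
    n_layers: int = 4
    n_heads: int = 4
    mlp_ratio: float = 4.0
    time_dim: int = 64
    sigma: float = 0.1
    use_ot: bool = True
    lambda_flow: float = 0.3
    lambda_packet: float = 0.3
    packet_mask_ratio: float = 0.5
    lr: float = 3e-4
    weight_decay: float = 0.01
    grad_clip: float = 1.0
    epochs: int = 50
    batch_size: int = 256
    seq_len: int = 64
    packet_preprocess: str = "mixed_dequant"
    flow_dim: int = 20
    # Hypothetical per-route switches; the real config keys may differ.
    causal_mask: bool = False
    discrete_flags: bool = False
    seed: int = 42


BASE = RunConfig()
SPECTRAL_DIMS = 32  # size of the Route B spectral feature block
SEEDS = (42, 43, 44)

# Each route is the base plus a minimal override (assumed key names).
ROUTE_OVERRIDES = {
    "baseline": {},
    "A": {"causal_mask": True},
    "B": {"flow_dim": BASE.flow_dim + SPECTRAL_DIMS},
    "C": {"discrete_flags": True, "packet_preprocess": "raw_binaries"},
}


def build_grid():
    """Return {(route, seed): RunConfig} for every route x seed pair."""
    return {
        (route, seed): replace(BASE, seed=seed, **overrides)
        for (route, overrides), seed in product(ROUTE_OVERRIDES.items(), SEEDS)
    }


def changed_fields(cfg):
    """Names of fields where cfg differs from BASE, ignoring the seed."""
    return sorted(
        f.name
        for f in fields(cfg)
        if f.name != "seed" and getattr(cfg, f.name) != getattr(BASE, f.name)
    )


grid = build_grid()
```

Calling `changed_fields` on any entry of `grid` lists exactly what a route altered, so an accidental drift from the locked base shows up as an unexpected field name. In this sketch Route C touches two fields because its packet preprocessing also differs from Routes A/B.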
## Reporting

Each route × seed produces:

```
artifacts/route_comparison/_seed/
├── model.pt
├── config.yaml           # actual config used
├── history.json
├── phase1_summary.json   # 34-score per-attack-class AUROC table
└── train.log
```

Final aggregate at `artifacts/route_comparison/RESULTS.md`:

```
| Route    | terminal_norm    | route-specific score           | param count | train wall |
| baseline | 0.962 (existing) | —                              | 1.23M       | ~2 min     |
| A        | ?                | causal_surprisal_packet_median | ?           | ?          |
| B        | ?                | velocity_freq                  | ?           | ?          |
| C        | ?                | nll_disc + terminal_cont       | ?           | ?          |
```

Plus per-attack-class breakdown for the top 10 attack labels by support.

## Baseline reference (single-seed, from existing run)

`artifacts/runs/unified_cfm_ciciot2023_2026_04_29/`:

- 50 epochs, σ=0.1, λ=0.3
- final `auroc_terminal_norm` = **0.962**
- This is the number to compare against; we'll re-run it under multi-seed for fair comparison.
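Once all three seeds have run, each route's score needs to be collapsed into a single comparable number. The sketch below shows one plausible way to do that: mean and sample standard deviation over seeds, plus the gap to the single-seed 0.962 reference. The function names and the row layout are assumptions for illustration; the protocol does not prescribe how the per-seed values are aggregated or how they are read from `phase1_summary.json`.

```python
from statistics import mean, stdev

BASELINE_TERMINAL_NORM = 0.962  # single-seed reference quoted in this document


def summarize(per_seed):
    """Return (mean, sample std) of a {seed: score} mapping."""
    values = list(per_seed.values())
    if not values:
        raise ValueError("need at least one seed")
    # Sample std is undefined for one value; report 0.0 in that case.
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


def results_row(route, per_seed):
    """Format a hypothetical row: route, mean ± std, delta vs. reference, seed count."""
    mu, sd = summarize(per_seed)
    delta = mu - BASELINE_TERMINAL_NORM
    return f"| {route} | {mu:.3f} ± {sd:.3f} | {delta:+.3f} | n={len(per_seed)} |"
```

A call would look like `results_row("A", scores)`, where `scores` maps each of the seeds 42, 43, 44 to that run's `auroc_terminal_norm`. Until the baseline itself is re-run under all three seeds, the delta column compares a multi-seed mean against a single-seed number and should be read with that caveat.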