Route Comparison Protocol

Goal: compare three FM-mechanism × traffic-property route variants on a unified training base. Every route starts from the current Unified_CFM SOTA recipe and changes exactly one mechanism axis, so a difference in score can be attributed to that axis.

Unified base (LOCKED)

| Item | Value |
| --- | --- |
| Dataset | CICIoT2023 |
| Source store | datasets/ciciot2023/processed/full_store/ |
| Flows | datasets/ciciot2023/processed/full_store/flows.parquet |
| Flow features | datasets/ciciot2023/processed/flow_features.parquet (canonical 20-d) |
| Train | benign 10,000 (Shafir within-dataset protocol) |
| Sequence length | T = 64 |
| Packet preprocess | mixed_dequant (Routes A/B); raw binaries (Route C) |
| Benign split | 80/20, split_seed=42 |
| Val cap | 10,000 |
| Attack cap | 20,000 (stratified) |
| Multi-seed | {42, 43, 44} |
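
The split and cap settings above could be applied roughly as in the following sketch. It assumes flows are addressed by integer index. The helper names `split_benign` and `cap` and the pool size of 100,000 are hypothetical, not taken from the repository. The sketch does not model the 10,000-flow benign training budget or the stratification of the attack cap.

```python
import random

def split_benign(n_benign: int, train_frac: float = 0.8, split_seed: int = 42):
    """Deterministically shuffle benign indices and split them train/val."""
    idx = list(range(n_benign))
    random.Random(split_seed).shuffle(idx)  # private RNG, global state untouched
    n_train = int(round(train_frac * n_benign))
    return idx[:n_train], idx[n_train:]

def cap(indices, max_items: int, seed: int = 42):
    """Subsample without replacement when a set exceeds its cap."""
    indices = list(indices)
    if len(indices) <= max_items:
        return indices
    return random.Random(seed).sample(indices, max_items)

# Hypothetical pool of 100,000 benign flows, for illustration only.
train_idx, val_idx = split_benign(100_000)
val_idx = cap(val_idx, 10_000)  # "Val cap" from the table
```

Seeding a private `random.Random` instance keeps the split reproducible regardless of what other code does with the global RNG, which matters when the same split has to be shared by every route.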

Architecture base (LOCKED)

| Item | Value |
| --- | --- |
| d_model | 128 |
| n_layers | 4 |
| n_heads | 4 |
| mlp_ratio | 4.0 |
| time_dim | 64 |
| sigma | 0.1 |
| use_ot | True |
| lambda_flow / lambda_packet | 0.3 / 0.3 |
| packet_mask_ratio | 0.5 |
| Optimizer | AdamW, lr=3e-4, wd=0.01, grad_clip=1.0 |
| Schedule | CosineAnnealingLR over total steps |
| Epochs | 50 |
| Batch size | 256 |
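
As a reading aid, the locked values can be collected in one frozen config object, and the schedule can be written in closed form. This is a sketch in plain Python rather than the project's actual config class: `TrainConfig`, `cosine_lr`, and `steps_per_epoch` are hypothetical names, and `eta_min = 0` is an assumption (it matches the default of PyTorch's `CosineAnnealingLR`).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    # Field names mirror the table; the class itself is illustrative.
    d_model: int = 128
    n_layers: int = 4
    n_heads: int = 4
    mlp_ratio: float = 4.0
    time_dim: int = 64
    sigma: float = 0.1
    use_ot: bool = True
    lambda_flow: float = 0.3
    lambda_packet: float = 0.3
    packet_mask_ratio: float = 0.5
    lr: float = 3e-4
    weight_decay: float = 0.01
    grad_clip: float = 1.0
    epochs: int = 50
    batch_size: int = 256

def cosine_lr(step: int, total_steps: int, base_lr: float, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: base_lr at step 0, eta_min at total_steps."""
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * cos_term

cfg = TrainConfig()
steps_per_epoch = 40  # hypothetical; depends on the training-set size
total_steps = cfg.epochs * steps_per_epoch
```

Freezing the dataclass makes accidental per-route edits to a locked value raise an error instead of silently changing the comparison.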

Routes

| Route | Mechanism axis | Traffic property targeted |
| --- | --- | --- |
| Baseline | Standard UnifiedCFM (current SOTA) | |
| A: Causal | Packet-causal attention mask | Protocol causality (TCP/HTTP handshake) |
| B: Spectral | Append K=8-band DFT of (size, IAT), 32 dims, to flow features (flow_dim 20→52); model architecture unchanged | Burstiness / LRD / self-similarity |
| C: Mixed FM | Continuous-CFM on (size,IAT,win) + DFM on flags | Discrete-continuous mixed channels |
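
For Route A, a packet-causal mask can be sketched as a lower-triangular boolean matrix over the T = 64 packet positions. This assumes the usual convention that packet i may attend to itself and to earlier packets only. How the flow-level token interacts with the mask is not specified here and is not modeled; `causal_mask` is a hypothetical helper name.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """mask[i][j] is True when packet i may attend to packet j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

mask = causal_mask(64)  # T = 64 from the unified base
```

Row i of the mask has exactly i + 1 allowed positions, so the first packet sees only itself and the last packet sees the whole sequence.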

Route D (Edit Flows) is deferred until A/B/C show signal.
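
The Route B dimension arithmetic (20 + 32 = 52) can be checked with a small sketch. The protocol does not say how K = 8 bands over two channels yield 32 dimensions. The code below assumes two statistics per band (mean and max log-power), giving 2 channels × 8 bands × 2 = 32. That choice, the function names, and the toy inputs are all assumptions for illustration.

```python
import cmath
import math

def band_features(x: list[float], n_bands: int = 8) -> list[float]:
    """Return 2 * n_bands features: mean and max log-power per frequency band.

    Uses the non-DC DFT bins 1..T/2, split into equal-width bands.
    """
    T = len(x)
    n_bins = T // 2  # for T = 64: 32 bins, 4 per band
    power = []
    for k in range(1, n_bins + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
        power.append(math.log1p(abs(coeff) ** 2))
    width = n_bins // n_bands
    feats = []
    for b in range(n_bands):
        band = power[b * width:(b + 1) * width]
        feats.extend([sum(band) / len(band), max(band)])
    return feats

def spectral_features(sizes, iats, n_bands: int = 8) -> list[float]:
    """Concatenate band features of both channels (assumed layout)."""
    return band_features(sizes, n_bands) + band_features(iats, n_bands)

# Toy inputs, not real traffic: a period-4 size pattern and a period-8 IAT ramp.
sizes = [float(60 + 40 * (t % 4 == 0)) for t in range(64)]
iats = [0.01 * (1 + t % 8) for t in range(64)]
flow_features = [0.0] * 20  # placeholder for the canonical 20-d vector
augmented = flow_features + spectral_features(sizes, iats)  # length 52
```

Because the spectral summary is appended to the flow feature vector, only the input width changes, which is consistent with the statement that the model architecture stays the same.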

Reporting

Each route × seed produces:

artifacts/route_comparison/<route>_seed<S>/
├── model.pt
├── config.yaml          # actual config used
├── history.json
├── phase1_summary.json  # 34-score per-attack-class AUROC table
└── train.log
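
A completeness check over this layout could look like the sketch below. The route directory labels (`baseline`, `A`, `B`, `C`) are an assumption about how `<route>` is spelled, and `run_dir` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path

ROUTES = ["baseline", "A", "B", "C"]  # assumed spellings of <route>
SEEDS = [42, 43, 44]
EXPECTED_FILES = [
    "model.pt",
    "config.yaml",
    "history.json",
    "phase1_summary.json",
    "train.log",
]
DEFAULT_ROOT = Path("artifacts/route_comparison")

def run_dir(route: str, seed: int, root: Path = DEFAULT_ROOT) -> Path:
    """Directory for one route x seed run, following <route>_seed<S>."""
    return root / f"{route}_seed{seed}"

def missing_files(route: str, seed: int, root: Path = DEFAULT_ROOT) -> list[str]:
    """Expected artifact files that are absent for one run."""
    d = run_dir(route, seed, root)
    return [name for name in EXPECTED_FILES if not (d / name).is_file()]

all_runs = [run_dir(r, s) for r in ROUTES for s in SEEDS]
```

Running `missing_files` over every entry of `all_runs` before aggregation would flag incomplete runs early instead of producing a partially filled results table.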

Final aggregate at artifacts/route_comparison/RESULTS.md:

| Route | terminal_norm | route-specific score | param count | train wall |
| baseline | 0.962 (existing) | — | 1.23M | ~2 min |
| A | ? | causal_surprisal_packet_median | ? | ? |
| B | ? | velocity_freq | ? | ? |
| C | ? | nll_disc + terminal_cont | ? | ? |

Plus per-attack-class breakdown for the top 10 attack labels by support.
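
The aggregation step can be sketched with the standard library. The function names are hypothetical, and the scores and labels below are placeholders chosen for easy arithmetic, not measured results or real class names from the dataset.

```python
from collections import Counter
from statistics import mean, stdev

def summarize_seeds(scores_by_seed: dict[int, float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one metric across seeds."""
    vals = list(scores_by_seed.values())
    return mean(vals), (stdev(vals) if len(vals) > 1 else 0.0)

def top_labels_by_support(labels: list[str], k: int = 10) -> list[str]:
    """Attack labels ordered by sample count, most frequent first."""
    return [label for label, _ in Counter(labels).most_common(k)]

# Placeholder values for illustration only; these are not results.
toy_scores = {42: 0.90, 43: 0.92, 44: 0.94}
toy_labels = ["class_a"] * 5 + ["class_b"] * 3 + ["class_c"] * 1
mu, sd = summarize_seeds(toy_scores)
top2 = top_labels_by_support(toy_labels, k=2)
```

Reporting the standard deviation next to the mean makes it visible when a route's gain over the baseline is smaller than the seed-to-seed spread.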

Baseline reference (single-seed, from existing run)

artifacts/runs/unified_cfm_ciciot2023_2026_04_29/:

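  • 50 epochs, σ=0.1, λ=0.3 (lambda_flow and lambda_packet)
  • final auroc_terminal_norm = 0.962
  • This single-seed value is the reference to compare against; the baseline will be re-run under the same seeds {42, 43, 44} so that the comparison is seed-matched.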