ablation: add Group A (aggregator) + Group B (architecture) infrastructure

Extends MixedCFMConfig with 5 backwards-compatible flags (use_flow_token,
n_packet_tokens, disc_as_cont, cont_as_disc + cont_n_bins) so existing
JANUS-full checkpoints load with 0 missing/unexpected keys.

Adds:
- 60 ablation training configs (5 variants × 4 datasets × 3 seeds)
- scripts/ablation/{generate_configs.py, run_groupB.sh, run_cross_groupB.sh,
  smoke_test.sh} — config generation + GPU drivers
- scripts/aggregate/aggregate_ablation{,_cross,_cross_B}.py — produces
  within-dataset and cross-dataset (3×3) ablation tables with 3-seed mean
  ± 95% t-CI plus optional paired DeLong p-values

README updated with ablation section pointing at
artifacts/ablation/ABLATION_SUMMARY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-08 23:59:27 +08:00
parent 1d8862fbeb
commit a6bcbbd299
72 changed files with 3642 additions and 96 deletions

View File

@@ -51,6 +51,28 @@ Source (rows) trained on 10K benign of source dataset; target (columns) tested o
Forward CICIDS17→CICDDoS19 (0.969) beats Shafir 0.89 by **+0.08**; reverse CICDDoS19→CICIDS17 (0.941) approximately matches Shafir 0.93. CICIoT23 is hardest both as source and target — its IoT-protocol diversity makes the "benign of source ≈ benign of target" assumption brittle. Full table at `artifacts/route_comparison/CROSS_MATRIX_3x3.md`.
### Ablations (architecture & aggregator)
Two orthogonal ablation axes, each evaluated **within-dataset** (4 datasets × 3 seeds) **and** **cross-dataset** (3×3 transfer × 3 seeds):
- **Group A** — 7 alternative aggregators on the same JANUS-full sub-score vector (post-processing only; no retraining).
- **Group B** — 5 architecture variants, each retrained 4 datasets × 3 seeds = 60 runs + 90 cross-evals.
Every load-bearing JANUS design choice has the **same shape of ablation curve**: small in-distribution cost, large cross-dataset gain.
| Component (removed in ablation) | Variant | Within Δ | Cross-mean Δ | Cross-worst Δ |
|---|---|---:|---:|---:|
| FLOW token (global context) | B1 | **0.94** | 6.70 | 19.97 |
| Packet sequence | B2 | +0.15 | **23.82** | **36.27** |
| Cont/disc head split (drop disc head) | B3 | +0.44 | **13.14** | **25.03** |
| CFM head (drop continuous side) | B4 | **2.37** | 2.03 | 2.86 |
| Joint training of two heads | B5 | +0.20 | **18.93** | **27.54** |
| OAS Mahalanobis aggregator | A1 vs A5 | +0.37 | **15.88** | **27.38** |
Three ablations (B3 / B5 / A-aggregator) **marginally beat JANUS-full at within-dataset evaluation** but collapse on at least one cross-dataset transfer direction. The disc head, joint training, and OAS aggregator are deliberate trades: their value is exclusively in cross-dataset robustness.
Full headline summary: `artifacts/ablation/ABLATION_SUMMARY.md`. Per-variant 3×3 cross matrices: `artifacts/ablation/ABLATION_CROSS_B_full.md` and `artifacts/ablation/ABLATION_TABLE_CROSS_full.md`.
## Layout
```
@@ -74,6 +96,12 @@ scripts/ Workspace-level pcap → artifact pipeline,
orchestration. aggregate_score_router.py is the
deployable score path; run_cross_3x3.sh +
cross_3x3_table.py produce the cross matrix.
aggregate_ablation.py / aggregate_ablation_cross.py /
aggregate_ablation_cross_B.py produce the ablation
tables in artifacts/ablation/.
ablation/ B-group ablation training/eval drivers
(generate_configs.py, run_groupB.sh,
run_cross_groupB.sh).
tests/ Data-contract unit tests.
```
@@ -177,7 +205,8 @@ Common gotcha: if CSV timestamps and pcap epochs are in different time zones, `e
## Authoritative documents
- `RESULTS.md` — full headline tables, ablations, per-attack analysis, JANUS configuration, thresholded operating-point metrics, what the experiments proved / disproved.
- `RESULTS.md` — full headline tables, per-attack analysis, JANUS configuration, thresholded operating-point metrics, what the experiments proved / disproved.
- `artifacts/ablation/ABLATION_SUMMARY.md` — paper-facing ablation summary (Group A aggregator + Group B architecture, both within and cross views).
- `Mixed_CFM/model.py` and `common/data_contract.py` — model + data-contract source of truth.
## Python environment