README: academic-style within-dataset comparison table with 12 baselines + JANUS

2026-05-08 11:51:47 +08:00
parent c33efc290a
commit a1e81f16b5
1 changed file with 21 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -15,16 +15,29 @@ JANUS is the first NIDS method to use Flow Matching as the training paradigm in

 3-seed mean ± std AUROC. Selection-bias-free Mahalanobis-OAS aggregator on the 10-d JANUS score vector, fit on benign val only.

-### Within-dataset
+### Within-dataset comparison (AUROC %, mean ± std)

-| Task | Shafir 2026 SOTA | **JANUS** | Δ |
-|---|---|---|---|
-| ISCXTor2016 (NonTor → Tor) | 0.8731 | **0.9909 ± 0.0013** | **+0.118** |
-| CICIDS2017 within | 0.9303 | **0.9826 ± 0.0035** | **+0.052** |
-| CICDDoS2019 within | 0.93 | **0.9918 ± 0.0005** | **+0.062** |
-| CICIoT2023 within | F1=0.9951 (different metric) | 0.9590 ± 0.0022 (AUROC) | N/A — metric mismatch |
+| Method | Venue | CIC-IDS2017 | CIC-DDoS2019 | CIC-IoT2023 | ISCXTor2016 |
+|---|---|---:|---:|---:|---:|
+| Isolation Forest | classical | 55.27 ± 0.4 † | — | — | — |
+| OCSVM | classical | 59.59 ± 0.6 † | — | — | — |
+| AnoFormer | ICLR'22 | 63.37 ± 0.7 † | — | — | — |
+| GANomaly | ACCV'18 | 82.75 ± 5.6 † | — | — | — |
+| RD4AD | CVPR'22 | 83.78 ± 0.8 † | — | — | — |
+| TSLANet | ICML'24 | 84.45 ± 1.7 † | — | — | — |
+| ARCADE | — | 84.85 ± 2.0 † | — | — | — |
+| MFAD | — | 86.02 ± 0.8 † | — | — | — |
+| STFPM | BMVC'21 | 86.29 ± 1.7 † | — | — | — |
+| MMR | — | 89.26 ± 1.2 † | — | — | — |
+| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 99.51 §‡ | 87.31 ‡ |
+| ConMD | TIFS'26 | 94.43 ± 0.1 † | — | — | — |
+| **JANUS (ours)** | — | **98.26 ± 0.35** | **99.18 ± 0.05** | **95.90 ± 0.22** | **99.09 ± 0.13** |

-3/3 directly comparable within-dataset benchmarks beat external Shafir 2026 SOTA. CICIoT2023 is reported as additional benchmark only (Shafir reports F1, we report AUROC; not a +SOTA claim). See `RESULTS.md` for caveats and the full headline table.
+† Numbers taken from ConMD (TIFS'26), Table I. Protocol: train on 10 K benign samples; test on a balanced set of 5 K benign + 5 K attack samples; 5-seed mean ± std.
+‡ Numbers taken from Shafir et al. (arXiv'26). Protocol: train on 10 K benign samples using a SHAP-selected feature subset per dataset.
+§ Metric mismatch on CIC-IoT2023: Shafir reports F1 = 99.51 at a threshold chosen by Youden's J statistic, which requires attack labels to tune, whereas we report the threshold-free AUROC = 95.90. The two numbers are therefore not directly comparable. Thresholded F1 for JANUS is reported in `RESULTS.md` Section D and `artifacts/route_comparison/THRESHOLDED.md`.
+
+JANUS sets a new SOTA on **3/3 directly comparable benchmarks**: CIC-IDS2017 (+3.83 AUROC points over ConMD), CIC-DDoS2019 (+6.18 over Shafir NF + Shapley) and ISCXTor2016 (+11.78 over Shafir NF + Shapley). Every margin is larger than the corresponding JANUS seed std. JANUS is fully unsupervised (benign-only training, no attack labels at any stage) and aggregates its raw 10-d score vector with the Mahalanobis-OAS aggregator, whose parameters are fit on the benign validation split only.

 ### 3×3 cross-dataset transfer matrix