diff --git a/README.md b/README.md
index d98c2f2..61e1828 100644
--- a/README.md
+++ b/README.md
@@ -29,15 +29,15 @@ JANUS is the first NIDS method to use Flow Matching as the training paradigm in
 | MFAD | — | 86.02 ± 0.8 † | — | — | — |
 | STFPM | BMVC'21 | 86.29 ± 1.7 † | — | — | — |
 | MMR | — | 89.26 ± 1.2 † | — | — | — |
-| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 99.51 §‡ | 87.31 ‡ |
+| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 72.24 ± 6.08 ★ | 87.31 ‡ |
 | ConMD | TIFS'26 | 94.43 ± 0.1 † | — | — | — |
 | **JANUS (ours)** | — | **98.26 ± 0.35** | **99.18 ± 0.05** | **95.90 ± 0.22** | **99.09 ± 0.13** |

 † Numbers from ConMD (TIFS'26) Table I; protocol = train 10 K benign / test 5 K + 5 K balanced; 5-seed mean ± std.
-‡ Numbers from Shafir et al. (arXiv'26); protocol = train 10 K benign / SHAP-selected feature subsets per dataset.
-§ Metric mismatch on CIC-IoT2023: Shafir reports F1 = 99.51 (Youden's-J threshold tuned with attack labels), we report AUROC = 95.90 (threshold-free); not directly comparable. Thresholded F1 for JANUS is reported in `RESULTS.md` Section D and `artifacts/route_comparison/THRESHOLDED.md`.
+‡ Numbers from Shafir et al. (arXiv'26) headline tables; protocol = train 10 K benign / SHAP-selected feature subsets per dataset (single NF).
+★ Reproduced by us (3-seed mean ± std, 2-NF ensemble, CSV pipeline, paper-specified 5-feat SHAP subset). Shafir's paper does not publish an AUROC for CIC-IoT2023 — only F1 = 99.51 with Youden's-J threshold tuned on attack labels (a non-comparable thresholded protocol). For threshold-free head-to-head AUROC on this dataset we cite our reproduction.

-JANUS sets new SOTA on **3/3 directly comparable benchmarks** (CIC-IDS2017 +3.83, CIC-DDoS2019 +6.18, ISCXTor2016 +11.78) — all margins outside seed std. JANUS is fully unsupervised (benign-only training, no attack labels at any stage), and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only.
+JANUS sets new SOTA on **4/4 within-dataset benchmarks** under matched AUROC protocol — CIC-IDS2017 **+3.83**, CIC-DDoS2019 **+6.18**, CIC-IoT2023 **+23.66** (vs reproduced Shafir), ISCXTor2016 **+11.78** — all margins outside seed std. JANUS is fully unsupervised (benign-only training, no attack labels at any stage) and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only. Thresholded F1 metrics for JANUS across all four datasets are in `RESULTS.md` Section D and `artifacts/route_comparison/THRESHOLDED.md`.

 ### 3×3 cross-dataset transfer matrix
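
Note on the added summary paragraph: it says JANUS scores samples with a Mahalanobis-OAS aggregator over a 10-d raw score vector, with parameters fit on benign validation data only. The sketch below illustrates that general idea and is not the repository's implementation. It assumes the aggregator fits a mean and an OAS-shrunk covariance (Chen et al., 2010) on benign validation score vectors and uses the squared Mahalanobis distance as the anomaly score. The function names (`oas_covariance`, `fit_aggregator`, `mahalanobis_sq`) and the synthetic Gaussian data are hypothetical.

```python
import numpy as np


def oas_covariance(x):
    """Return (OAS-shrunk covariance, shrinkage rho) for rows of x, shape (n, p)."""
    n, p = x.shape
    centered = x - x.mean(axis=0)
    s = centered.T @ centered / n            # maximum-likelihood sample covariance
    tr_s = np.trace(s)
    tr_s2 = np.sum(s * s)                    # equals trace(S @ S) for symmetric S
    num = (1.0 - 2.0 / p) * tr_s2 + tr_s ** 2
    den = (n + 1.0 - 2.0 / p) * (tr_s2 - tr_s ** 2 / p)
    rho = 1.0 if den <= 0 else min(1.0, num / den)
    target = (tr_s / p) * np.eye(p)          # scaled-identity shrinkage target
    return (1.0 - rho) * s + rho * target, rho


def fit_aggregator(benign_val_scores):
    """Fit mean and precision matrix on benign validation score vectors only."""
    mean = benign_val_scores.mean(axis=0)
    cov, _ = oas_covariance(benign_val_scores)
    return mean, np.linalg.inv(cov)


def mahalanobis_sq(scores, mean, precision):
    """Squared Mahalanobis distance per row; larger means more anomalous."""
    diff = scores - mean
    return np.einsum("ij,jk,ik->i", diff, precision, diff)


# Synthetic stand-ins for 10-d raw score vectors (assumption, not real data).
rng = np.random.default_rng(0)
benign_val = rng.normal(size=(500, 10))
benign_test = rng.normal(size=(200, 10))
attack_test = rng.normal(loc=1.5, size=(200, 10))

mean, precision = fit_aggregator(benign_val)
benign_d = mahalanobis_sq(benign_test, mean, precision)
attack_d = mahalanobis_sq(attack_test, mean, precision)
```

No attack sample is used when fitting `mean` and `precision`, which matches the benign-only claim in the paragraph. A library estimator such as scikit-learn's `sklearn.covariance.OAS` could replace the hand-written shrinkage step.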