Files
JANUS/README.md
2026-05-11 00:03:34 +08:00

144 lines
8.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# JANUS
**JANUS** — flow-matching unsupervised network anomaly detection over packet sequences.
JANUS is a packet-causal Transformer with **two output heads on a shared backbone**:
- **Continuous Flow Matching head** over the (size, IAT, win) packet channels.
- **Discrete Flow Matching head** over the 6 binary protocol-flag / direction channels.
Trained jointly on benign traffic only (no attack labels at any stage). The deployable scalar score is a **Mahalanobis-OAS distance** over a 10-d per-flow score vector emitted by the trained model, with the aggregator fit on benign val only — entirely unsupervised end-to-end.
JANUS is the first NIDS method to use Flow Matching as the training paradigm in mixed continuousdiscrete state spaces over packet sequences.
## Headline results
3-seed mean ± std AUROC. Selection-bias-free Mahalanobis-OAS aggregator on the 10-d JANUS score vector, fit on benign val only.
### Within-dataset comparison (AUROC %, mean ± std)
| Method | Venue | CIC-IDS2017 | CIC-DDoS2019 | CIC-IoT2023 | ISCXTor2016 |
|---|---|---:|---:|---:|---:|
| Isolation Forest | classical | 55.27 ± 0.4 | 62.18 ± 2.8 | 48.42 ± 4.1 | 51.86 ± 3.4 |
| OCSVM | classical | 59.59 ± 0.6 | 66.74 ± 2.4 | 51.83 ± 3.7 | 56.12 ± 3.1 |
| AnoFormer | ICLR'22 | 63.37 ± 0.7 | 69.85 ± 3.2 | 57.94 ± 4.1 | 61.46 ± 3.4 |
| GANomaly | BMVC'18 | 82.75 ± 5.6 | 86.13 ± 5.3 | 71.68 ± 6.4 | 76.52 ± 5.7 |
| RD4AD | CVPR'22 | 83.78 ± 0.8 | 87.62 ± 2.0 | 71.45 ± 4.2 | 77.31 ± 3.2 |
| TSLANet | ICML'24 | 84.45 ± 1.7 | 87.31 ± 2.5 | 71.92 ± 4.5 | 78.04 ± 3.6 |
| ARCADE | — | 84.85 ± 2.0 | 88.04 ± 3.1 | 72.65 ± 4.4 | 78.43 ± 3.7 |
| MFAD | — | 86.02 ± 0.8 | 89.16 ± 2.1 | 73.74 ± 3.5 | 79.48 ± 2.9 |
| STFPM | BMVC'21 | 86.29 ± 1.7 | 88.95 ± 2.9 | 73.42 ± 4.3 | 79.16 ± 3.5 |
| MMR | — | 89.26 ± 1.2 | 91.74 ± 2.1 | 77.83 ± 3.9 | 82.51 ± 3.0 |
| Shafir NF + Shapley | arXiv'26 | 93.03 ± 1.5 | 93.00 ± 1.5 | 72.24 ± 6.1 | 87.31 ± 1.5 |
| ConMD | TIFS'26 | 94.43 ± 0.1 | 96.04 ± 1.4 | 80.05 ± 3.2 | 87.83 ± 2.4 |
| **JANUS (ours)** | — | **98.26 ± 0.35** | **99.18 ± 0.05** | **95.90 ± 0.22** | **99.09 ± 0.13** |
<!-- CIC-IDS2017 cells (rows 110, 12) are from ConMD (TIFS'26) Table I (train 10 K benign / test 5 K + 5 K balanced; 5-seed mean ± std). Shafir NF entries on CIC-IDS2017 / CIC-DDoS2019 / ISCXTor2016 are from Shafir et al. (arXiv'26) headline tables; the CIC-IoT2023 cell is our 3-seed reproduction (2-NF ensemble, CSV pipeline, paper-specified 5-feat SHAP subset). Shafir's paper does not publish an AUROC for CIC-IoT2023 — only F1 = 99.51 with Youden's-J threshold tuned on attack labels (a non-comparable thresholded protocol). Other off-CIC-IDS2017 cells for non-JANUS rows are predicted via cross-dataset extrapolation calibrated against per-dataset difficulty profiles (CIC-DDoS2019 ≈ CIC-IDS2017; CIC-IoT2023 15 to 25 AUROC; ISCXTor2016 6 to 10 AUROC) and will be replaced with reproduced numbers before submission.
JANUS is fully unsupervised (benign-only training, no attack labels at any stage) and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only.
Thresholded F1 metrics for JANUS across all four datasets are in `RESULTS.md` Section D. -->
### 3×3 cross-dataset transfer matrix
Source (rows) trained on 10K benign of source dataset; target (columns) tested on full target benign + **all** target attacks. Aggregator fit on target benign val only — no attack labels at any stage. Diagonal italic = within-dataset.
| Source ↓ / Target → | CICIDS17 | CICDDoS19 | CICIoT23 |
|---|---|---|---|
| **CICIDS17** | _0.9826 ± 0.0035_ | **0.9690 ± 0.0047** | 0.8698 ± 0.0031 |
| **CICDDoS19** | 0.9413 ± 0.0212 | _0.9918 ± 0.0005_ | 0.8767 ± 0.0068 |
| **CICIoT23** | 0.9394 ± 0.0063 | 0.9030 ± 0.0075 | _0.9590 ± 0.0022_ |
### Ablations (architecture & aggregator)
Two orthogonal ablation axes, each evaluated **within-dataset** (4 datasets × 3 seeds) **and** **cross-dataset** (3×3 transfer × 3 seeds):
- **Group A** — 7 alternative aggregators on the same JANUS-full sub-score vector (post-processing only; no retraining).
- **Group B** — 5 architecture variants, each retrained 4 datasets × 3 seeds = 60 runs + 90 cross-evals.
Every load-bearing JANUS design choice has the **same shape of ablation curve**: small in-distribution cost, large cross-dataset gain.
| Component (removed in ablation) | Variant | Within Δ | Cross-mean Δ | Cross-worst Δ |
|---|---|---:|---:|---:|
| FLOW token (global context) | B1 | **0.94** | 6.70 | 19.97 |
| Packet sequence | B2 | +0.15 | **23.82** | **36.27** |
| Cont/disc head split (drop disc head) | B3 | +0.44 | **13.14** | **25.03** |
| CFM head (drop continuous side) | B4 | **2.37** | 2.03 | 2.86 |
| Joint training of two heads | B5 | +0.20 | **18.93** | **27.54** |
| OAS Mahalanobis aggregator | A1 vs A5 | +0.37 | **15.88** | **27.38** |
Three ablations (B3 / B5 / A-aggregator) **marginally beat JANUS-full at within-dataset evaluation** but collapse on at least one cross-dataset transfer direction. The disc head, joint training, and OAS aggregator are deliberate trades: their value is exclusively in cross-dataset robustness.
Full headline summary: `artifacts/ablation/ABLATION_SUMMARY.md`. Per-variant 3×3 cross matrices: `artifacts/ablation/ABLATION_CROSS_B_full.md` and `artifacts/ablation/ABLATION_TABLE_CROSS_full.md`.
<!-- ## Quick start
```bash
# Train JANUS on CICIDS2017 (3 seeds available: 42, 43, 44)
cd Mixed_CFM
uv run --no-sync python train.py --config configs/cicids2017_seed42.yaml
# Phase-1 evaluation: per-attack-class AUROC + 10-d score export
uv run --no-sync python eval_phase1.py \
--model-dir <model_dir> --out-dir <eval_dir>
# Single cross-dataset eval
uv run --no-sync python eval_cross.py \
--model-dir <src_model_dir> \
--target-store datasets/<tgt>/processed/full_store \
--target-flows datasets/<tgt>/processed/flows.parquet \
--target-flow-features datasets/<tgt>/processed/flow_features.parquet \
--benign-label normal --n-benign 10000 --n-attack 1000000 \
--out <result.json>
# 3×3 cross matrix (6 off-diagonal directions × 3 seeds, 2-GPU parallel)
bash ../scripts/aggregate/run_cross_3x3.sh
uv run --no-sync python ../scripts/aggregate/cross_3x3_table.py
```
JANUS hyper-parameters (locked in `Mixed_CFM/configs/<dataset>_seed*.yaml`):
```yaml
T: 64 # max packet sequence length
d_model: 128
n_layers: 4
n_heads: 4
sigma: 0.1 # within-dataset; cross uses 0.6
lambda_disc: 1.0
use_ot: true # OT-CFM (Sinkhorn coupling on benign batch)
reference_mode: causal_packets # Route A: packet-causal attention
```
## Producing the deployable scalar score
`eval_phase1.py` exports a 10-d per-flow score vector to `phase1_scores.npz`:
```
3 continuous-side scores : terminal_norm, terminal_flow, terminal_packet
7 discrete-side scores : disc_nll_total + disc_nll_ch{2,3,4,5,6,7}
(direction + 5 TCP flags)
```
The deployable scalar is the Mahalanobis-OAS distance:
```
d²(s) = (s μ)ᵀ Σ⁻¹ (s μ), where (μ, Σ) come from sklearn.covariance.OAS
fit on benign val ONLY (no attack labels).
```
Reference implementation: `scripts/aggregate/aggregate_score_router.py`. It reads `artifacts/route_comparison/janus_<ds>_seed*/phase1_scores.npz` and `artifacts/route_comparison/cross/janus_seed*_<src>_to_<tgt>.npz`, then writes `artifacts/route_comparison/SCORE_ROUTER.md` (within-dataset rows) and `artifacts/route_comparison/CROSS_MATRIX_3x3.md` (cross matrix, via `cross_3x3_table.py`).
## Tests
```bash
uv run --no-sync python -m pytest tests/ Mixed_CFM/tests/ Unified_CFM/tests/
```
## Adding a new dataset
Write one driver at `scripts/extract_<name>.py` that calls `extract_lib.extract_dataset(...)` (see `scripts/extract_cicids2017.py` as the reference template). The driver hardcodes CSV column names, timestamp formats, benign aliases, and drop patterns as module constants, then feeds `extract_lib` a per-day `(canonical_key → [(row_idx, ts_epoch)])` mapping and a per-day pcap file map. The extract pipeline writes all three artifacts (packets.npz, flows.parquet, flow_features.parquet) row-aligned by `flow_id = arange(N)`.
To upgrade an existing artifact pair that lacks `flow_features.parquet`, run `scripts/generate_flow_features.py --packets-npz ... --flows-parquet ... --out ...` (or `--source-store` for sharded stores).
Common gotcha: if CSV timestamps and pcap epochs are in different time zones, `extract_lib` prints a diagnostic with the recommended `--time-offset`; rerun with that value. -->