# JANUS

**JANUS**: flow-matching, unsupervised network anomaly detection over packet sequences.

JANUS is a packet-causal Transformer with **two output heads on a shared backbone**:

- **Continuous Flow Matching head** over the (size, IAT, win) packet channels.
- **Discrete Flow Matching head** over the 6 binary channels: packet direction plus 5 TCP flags.

Both heads are trained jointly on benign traffic only; no attack labels are used at any stage. The deployable scalar score is a **Mahalanobis-OAS distance** over a 10-d per-flow score vector emitted by the trained model. The aggregator is also fit on benign val only, so the pipeline is unsupervised end to end.

JANUS is the first NIDS method to use Flow Matching as its training paradigm in mixed continuous–discrete state spaces over packet sequences.

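"Packet-causal" means each packet position can attend only to itself and to earlier packets. A minimal NumPy sketch of that masking rule for a single attention head, with random untrained weights and toy sizes. It illustrates causal masking in general and is not the JANUS backbone, which additionally has learned multi-head layers, a FLOW token, and the two flow-matching heads:

```python
import numpy as np


def causal_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention with a causal mask.

    x: (T, d) per-packet embeddings; wq/wk/wv: (d, d) projections.
    Position t may only attend to positions 0..t.
    """
    T, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(d)                      # (T, T) attention logits
    future = np.triu(np.ones((T, T), dtype=bool), k=1)   # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)           # block attention to later packets
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)                             # exp(-inf) = 0 for masked entries
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
T, d = 8, 16  # toy sizes; the configs use T=64, d_model=128
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
x = rng.normal(size=(T, d))
out = causal_self_attention(x, wq, wk, wv)

# Perturb only the last packet: every earlier position's output is unchanged.
x2 = x.copy()
x2[-1] += 10.0
out2 = causal_self_attention(x2, wq, wk, wv)
print(np.allclose(out[:-1], out2[:-1]))  # True
```

The final check is the defining property of packet-causal attention: information from a later packet cannot reach the representation of an earlier one.
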
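## Headline results

JANUS numbers are 3-seed mean ± std AUROC, scored with the Mahalanobis-OAS aggregator on the 10-d JANUS score vector. The aggregator is fit on benign val only and is not chosen or tuned with attack labels, which keeps the numbers free of selection bias.
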
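All result tables use AUROC, which needs no threshold: it is the probability that a randomly drawn attack flow receives a higher anomaly score than a randomly drawn benign flow. A minimal pure-Python sketch via the Mann–Whitney rank statistic, assuming a higher score means "more anomalous". This is the textbook definition, not code from this repository; the within-dataset table reports it ×100, the transfer matrix as a fraction:

```python
def auroc(benign_scores, attack_scores):
    """Threshold-free AUROC via the Mann-Whitney U statistic.

    Equals P(score of a random attack flow > score of a random benign flow),
    counting ties as 1/2. Assumes a higher score means "more anomalous".
    """
    pooled = sorted([(s, 0) for s in benign_scores] + [(s, 1) for s in attack_scores])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied scores pooled[i..j]; each gets the average 1-based rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n_attack, n_benign = len(attack_scores), len(benign_scores)
    attack_rank_sum = sum(r for r, (_, is_attack) in zip(ranks, pooled) if is_attack)
    u = attack_rank_sum - n_attack * (n_attack + 1) / 2
    return u / (n_attack * n_benign)


print(auroc([0.1, 0.35, 0.4], [0.4, 0.8]))  # ≈ 0.9167: 5.5 of 6 pairs, the 0.4 tie counts 1/2
```
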
### Within-dataset comparison (AUROC %, mean ± std)

| Method | Venue | CIC-IDS2017 | CIC-DDoS2019 | CIC-IoT2023 | ISCXTor2016 |
|---|---|---:|---:|---:|---:|
| Isolation Forest | classical | 55.27 ± 0.4 † | — | — | — |
| OCSVM | classical | 59.59 ± 0.6 † | — | — | — |
| AnoFormer | ICLR'22 | 63.37 ± 0.7 † | — | — | — |
| GANomaly | BMVC'18 | 82.75 ± 5.6 † | — | — | — |
| RD4AD | CVPR'22 | 83.78 ± 0.8 † | — | — | — |
| TSLANet | ICML'24 | 84.45 ± 1.7 † | — | — | — |
| ARCADE | — | 84.85 ± 2.0 † | — | — | — |
| MFAD | — | 86.02 ± 0.8 † | — | — | — |
| STFPM | BMVC'21 | 86.29 ± 1.7 † | — | — | — |
| MMR | — | 89.26 ± 1.2 † | — | — | — |
| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 72.24 ± 6.08 ★ | 87.31 ‡ |
| ConMD | TIFS'26 | 94.43 ± 0.1 † | — | — | — |
| **JANUS (ours)** | — | **98.26 ± 0.35** | **99.18 ± 0.05** | **95.90 ± 0.22** | **99.09 ± 0.13** |

† Numbers from ConMD (TIFS'26) Table I. Protocol: train on 10K benign flows, test on 5K benign + 5K attack (balanced); 5-seed mean ± std.

‡ Numbers from Shafir et al. (arXiv'26) headline tables. Protocol: train on 10K benign flows using per-dataset SHAP-selected feature subsets (single NF).

★ Reproduced by us (3-seed mean ± std, 2-NF ensemble, CSV pipeline, paper-specified 5-feature SHAP subset). Shafir et al. do not publish an AUROC for CIC-IoT2023, only F1 = 99.51 at a Youden's J threshold tuned on attack labels, a thresholded protocol that is not comparable. For a threshold-free, head-to-head AUROC on this dataset we cite our reproduction.

JANUS, by contrast, is fully unsupervised: training is benign-only, no attack labels are used at any stage, and the Mahalanobis-OAS aggregator over its 10-d raw score vector is fit on benign val only.

Thresholded F1 metrics for JANUS across all four datasets are in `RESULTS.md` Section D.

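Why the ★ F1 figure is not comparable: Youden's J picks the threshold that maximizes TPR − FPR, and computing TPR requires attack labels on the tuning data. A minimal pure-Python sketch of that selection plus F1 at the chosen threshold. It uses the standard definitions and is not Shafir et al.'s code, nor necessarily the thresholding behind `RESULTS.md` Section D:

```python
def youden_threshold(benign_scores, attack_scores):
    """Return (threshold, J) maximizing J = TPR - FPR, flagging score >= threshold.

    Needs attack labels: this is a supervised choice of operating point.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(benign_scores) | set(attack_scores)):
        tpr = sum(s >= t for s in attack_scores) / len(attack_scores)
        fpr = sum(s >= t for s in benign_scores) / len(benign_scores)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j


def f1_at(threshold, benign_scores, attack_scores):
    """F1 with attacks as the positive class, flagging score >= threshold."""
    tp = sum(s >= threshold for s in attack_scores)
    fp = sum(s >= threshold for s in benign_scores)
    fn = len(attack_scores) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


benign = [0.1, 0.2, 0.3, 0.6]
attack = [0.5, 0.7, 0.9]
t, j = youden_threshold(benign, attack)
print((t, j))                      # (0.5, 0.75)
print(f1_at(t, benign, attack))    # 6/7 ≈ 0.857
```
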
### 3×3 cross-dataset transfer matrix

Each row is a source model trained on 10K benign flows of that dataset; each column is a target, tested on the full target benign set plus **all** target attacks. The aggregator is fit on target benign val only, so no attack labels are used at any stage. Cells are AUROC as a fraction (3-seed mean ± std); italic diagonal cells are the within-dataset results.

| Source ↓ / Target → | CICIDS17 | CICDDoS19 | CICIoT23 |
|---|---:|---:|---:|
| **CICIDS17** | _0.9826 ± 0.0035_ | **0.9690 ± 0.0047** | 0.8698 ± 0.0031 |
| **CICDDoS19** | 0.9413 ± 0.0212 | _0.9918 ± 0.0005_ | 0.8767 ± 0.0068 |
| **CICIoT23** | 0.9394 ± 0.0063 | 0.9030 ± 0.0075 | _0.9590 ± 0.0022_ |

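The ablation table reports cross-mean and cross-worst changes. Reading those as the mean and the minimum over the six off-diagonal cells (our interpretation; this README does not define them), the JANUS-full reference values follow from the transfer-matrix means with a few lines of stdlib Python:

```python
import statistics

DATASETS = ["CICIDS17", "CICDDoS19", "CICIoT23"]
# 3-seed mean AUROC from the transfer matrix (rows = source, columns = target).
JANUS_FULL = [
    [0.9826, 0.9690, 0.8698],
    [0.9413, 0.9918, 0.8767],
    [0.9394, 0.9030, 0.9590],
]


def cross_summary(matrix):
    """Mean and worst AUROC over the off-diagonal (true cross-dataset) cells."""
    off = {
        (DATASETS[i], DATASETS[j]): matrix[i][j]
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j
    }
    worst_pair = min(off, key=off.get)
    return statistics.mean(off.values()), worst_pair, off[worst_pair]


mean_auc, worst_pair, worst_auc = cross_summary(JANUS_FULL)
print(f"cross-mean {mean_auc:.4f}; worst {worst_pair[0]} -> {worst_pair[1]} = {worst_auc:.4f}")
# cross-mean 0.9165; worst CICIDS17 -> CICIoT23 = 0.8698
```

Applying the same function to an ablated variant's matrix and subtracting would give Δs on the same footing, again under this assumed definition.
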
### Ablations (architecture & aggregator)

Two orthogonal ablation axes, each evaluated **within-dataset** (4 datasets × 3 seeds) **and** **cross-dataset** (3×3 transfer × 3 seeds):

- **Group A**: 7 alternative aggregators on the same JANUS-full sub-score vector (post-processing only; no retraining).
- **Group B**: 5 architecture variants, each retrained on 4 datasets × 3 seeds (5 × 4 × 3 = 60 training runs), plus 5 × 6 × 3 = 90 cross-dataset evaluations.

Most load-bearing JANUS design choices show the **same shape of ablation curve**: keeping the component costs little or nothing in-distribution but buys a large cross-dataset gain. The CFM head (B4) is the exception; removing it hurts within-dataset (−2.37) about as much as cross-dataset (−2.03 mean).

Δ values are AUROC percentage points of the ablated variant minus JANUS-full, so a negative Δ means the removed component helps.

| Component (removed in ablation) | Variant | Within Δ | Cross-mean Δ | Cross-worst Δ |
|---|---|---:|---:|---:|
| FLOW token (global context) | B1 | **−0.94** | −6.70 | −19.97 |
| Packet sequence | B2 | +0.15 | **−23.82** | **−36.27** |
| Cont/disc head split (drop disc head) | B3 | +0.44 | **−13.14** | **−25.03** |
| CFM head (drop continuous side) | B4 | **−2.37** | −2.03 | −2.86 |
| Joint training of two heads | B5 | +0.20 | **−18.93** | **−27.54** |
| OAS Mahalanobis aggregator | A1 vs A5 | +0.37 | **−15.88** | **−27.38** |

Four ablations (B2 / B3 / B5 / A-aggregator) **marginally beat JANUS-full at within-dataset evaluation** but collapse on at least one cross-dataset transfer direction. The packet sequence, disc head, joint training, and OAS aggregator are deliberate trades: their value lies in cross-dataset robustness rather than in-distribution accuracy.

Full headline summary: `artifacts/ablation/ABLATION_SUMMARY.md`. Per-variant 3×3 cross matrices: `artifacts/ablation/ABLATION_CROSS_B_full.md` and `artifacts/ablation/ABLATION_TABLE_CROSS_full.md`.

## Quick start

```bash
# Replace <angle-bracket> placeholders with real paths before running.

# Train JANUS on CICIDS2017 (3 seeds available: 42, 43, 44)
cd Mixed_CFM
uv run --no-sync python train.py --config configs/cicids2017_seed42.yaml

# Phase-1 evaluation: per-attack-class AUROC + 10-d score export
uv run --no-sync python eval_phase1.py \
  --model-dir <model_dir> --out-dir <eval_dir>

# Single cross-dataset eval
uv run --no-sync python eval_cross.py \
  --model-dir <src_model_dir> \
  --target-store datasets/<tgt>/processed/full_store \
  --target-flows datasets/<tgt>/processed/flows.parquet \
  --target-flow-features datasets/<tgt>/processed/flow_features.parquet \
  --benign-label normal --n-benign 10000 --n-attack 1000000 \
  --out <result.json>

# 3×3 cross matrix (6 off-diagonal directions × 3 seeds, 2-GPU parallel)
bash ../scripts/aggregate/run_cross_3x3.sh
uv run --no-sync python ../scripts/aggregate/cross_3x3_table.py
```

JANUS hyper-parameters (locked in `Mixed_CFM/configs/<dataset>_seed*.yaml`):

```yaml
T: 64                           # max packet sequence length
d_model: 128
n_layers: 4
n_heads: 4
sigma: 0.1                      # within-dataset; cross uses 0.6
lambda_disc: 1.0
use_ot: true                    # OT-CFM (Sinkhorn coupling on benign batch)
reference_mode: causal_packets  # Route A: packet-causal attention
```

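`sigma` and `use_ot` refer to conditional flow matching with an optimal-transport coupling. The sketch below shows the generic OT-CFM recipe on continuous features: pair source noise with a benign batch through an entropic (Sinkhorn) OT plan, interpolate along straight lines, add Gaussian noise of width `sigma`, and regress the velocity `x1 - x0`. It is a NumPy illustration under assumptions not confirmed for JANUS: a standard Gaussian source, uniform marginals, `reg=0.05` on a max-normalized squared-Euclidean cost, and pairs resampled from the plan. The discrete head and its `lambda_disc` weighting are not shown:

```python
import numpy as np


def sinkhorn_plan(x0, x1, reg=0.05, n_iters=200):
    """Entropic OT plan between two batches with uniform marginals (Sinkhorn-Knopp)."""
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean
    cost = cost / cost.max()            # normalize so `reg` does not depend on feature scale
    kernel = np.exp(-cost / reg)
    a = np.full(len(x0), 1.0 / len(x0))
    b = np.full(len(x1), 1.0 / len(x1))
    u = np.ones_like(a)
    for _ in range(n_iters):            # alternate scalings to match column, then row marginals
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def ot_cfm_batch(x1, sigma, rng, reg=0.05):
    """Return (t, x_t, u_t): inputs and velocity targets for one OT-CFM regression step."""
    n, d = x1.shape
    x0 = rng.standard_normal((n, d))    # assumed source: standard Gaussian noise
    plan = sinkhorn_plan(x0, x1, reg)
    # Resample (source, target) index pairs in proportion to the transport plan.
    flat = rng.choice(n * n, size=n, p=plan.ravel() / plan.sum())
    i, j = np.divmod(flat, n)
    x0, x1 = x0[i], x1[j]
    t = rng.uniform(size=(n, 1))
    x_t = (1 - t) * x0 + t * x1 + sigma * rng.standard_normal((n, d))  # noisy straight path
    u_t = x1 - x0                       # a velocity network v(t, x_t) is regressed onto this
    return t, x_t, u_t


rng = np.random.default_rng(0)
benign_batch = rng.normal(loc=3.0, size=(16, 3))  # stand-in for continuous packet features
t, x_t, u_t = ot_cfm_batch(benign_batch, sigma=0.1, rng=rng)
```

The OT pairing makes the regression targets straighter and less crossing than independent random pairing, which is the usual motivation for OT-CFM.
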
## Producing the deployable scalar score

`eval_phase1.py` exports a 10-d per-flow score vector to `phase1_scores.npz`:

```
3 continuous-side scores : terminal_norm, terminal_flow, terminal_packet
7 discrete-side scores   : disc_nll_total + disc_nll_ch{2,3,4,5,6,7}
                           (direction + 5 TCP flags)
```

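The array names stored inside `phase1_scores.npz` are not listed here, so a safe first step is to inspect the file. A minimal sketch using NumPy's standard `.npz` reader; for standalone use the demo writes a throwaway file with a hypothetical key `scores`. For real runs, point it at a file such as `artifacts/route_comparison/janus_<ds>_seed*/phase1_scores.npz`:

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:  # NpzFile supports the context-manager protocol
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


# Standalone demo with a throwaway file and a hypothetical key name.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo_scores.npz")
    np.savez(demo, scores=np.zeros((5, 10), dtype=np.float32))
    print(describe_npz(demo))  # {'scores': ((5, 10), 'float32')}
```
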
The deployable scalar is the Mahalanobis-OAS distance:

```
d²(s) = (s − μ)ᵀ Σ⁻¹ (s − μ),  where (μ, Σ) come from sklearn.covariance.OAS
                               fit on benign val ONLY (no attack labels).
```

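A NumPy sketch of the formula above, to make explicit what gets fit on benign val. It re-implements OAS shrinkage as we read it in recent scikit-learn source (MLE covariance shrunk toward a scaled identity); older releases may compute the shrinkage slightly differently, and this is not `aggregate_score_router.py`. For real use, `sklearn.covariance.OAS().fit(benign_val)` followed by `.mahalanobis(scores)`, which returns squared distances, is the simpler route. The random arrays stand in for real score vectors:

```python
import numpy as np


def fit_oas(benign_val):
    """Fit (mu, precision, shrinkage) with Oracle Approximating Shrinkage.

    benign_val: (n_flows, p) raw score vectors from benign val only.
    """
    n, p = benign_val.shape
    mu = benign_val.mean(axis=0)
    centered = benign_val - mu
    emp_cov = centered.T @ centered / n             # MLE covariance (divide by n)
    trace_mean = np.trace(emp_cov) / p              # scale of the identity target
    alpha = np.mean(emp_cov ** 2)
    num = alpha + trace_mean ** 2
    den = (n + 1) * (alpha - trace_mean ** 2 / p)
    shrinkage = 1.0 if den == 0 else min(num / den, 1.0)
    cov = (1.0 - shrinkage) * emp_cov + shrinkage * trace_mean * np.eye(p)
    return mu, np.linalg.inv(cov), shrinkage


def mahalanobis_sq(scores, mu, precision):
    """d^2(s) = (s - mu)^T Sigma^{-1} (s - mu), one value per row of `scores`."""
    diff = scores - mu
    return np.einsum("ij,jk,ik->i", diff, precision, diff)


rng = np.random.default_rng(0)
benign_val = rng.normal(size=(500, 10))              # stand-in for benign val score vectors
mu, precision, shrinkage = fit_oas(benign_val)
test_scores = np.vstack([rng.normal(size=(3, 10)), np.full((1, 10), 6.0)])
d2 = mahalanobis_sq(test_scores, mu, precision)      # last row, far from benign, scores highest
```

Shrinking toward a scaled identity keeps Σ well conditioned even when some of the 10 sub-scores are strongly correlated, which is the usual reason to prefer OAS over the raw sample covariance.
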
Reference implementation: `scripts/aggregate/aggregate_score_router.py`. It reads `artifacts/route_comparison/janus_<ds>_seed*/phase1_scores.npz` and `artifacts/route_comparison/cross/janus_seed*_<src>_to_<tgt>.npz`, then writes `artifacts/route_comparison/SCORE_ROUTER.md` (within-dataset rows) and `artifacts/route_comparison/CROSS_MATRIX_3x3.md` (cross matrix, via `cross_3x3_table.py`).
## Tests

```bash
uv run --no-sync python -m pytest tests/ Mixed_CFM/tests/ Unified_CFM/tests/
```

## Adding a new dataset

Write one driver at `scripts/extract_<name>.py` that calls `extract_lib.extract_dataset(...)`, using `scripts/extract_cicids2017.py` as the reference template. The driver hardcodes CSV column names, timestamp formats, benign aliases, and drop patterns as module constants, then feeds `extract_lib` a per-day `(canonical_key → [(row_idx, ts_epoch)])` mapping and a per-day pcap file map. The extract pipeline writes all three artifacts (`packets.npz`, `flows.parquet`, `flow_features.parquet`) row-aligned by `flow_id = arange(N)`.

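To make the expected mapping concrete, here is a sketch that builds `canonical_key → [(row_idx, ts_epoch)]` for one day's CSV with the standard library. Everything in it is hypothetical: the column names, the timestamp format, the CSV time zone, and the choice of a direction-insensitive 5-tuple as the canonical key. The real key definition and constants live in `extract_lib` and `scripts/extract_cicids2017.py`:

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical module constants; a real driver hardcodes its own dataset's values.
COL_SRC_IP, COL_SRC_PORT = "src_ip", "src_port"
COL_DST_IP, COL_DST_PORT = "dst_ip", "dst_port"
COL_PROTO, COL_TS = "protocol", "timestamp"
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
CSV_TZ = timezone(timedelta(hours=-3))  # assumed local time zone of the CSV clock


def canonical_key(row):
    """Direction-insensitive 5-tuple, so both directions of a flow share one key.

    Endpoints are ordered by (IP string, port); any consistent order works.
    """
    a = (row[COL_SRC_IP], int(row[COL_SRC_PORT]))
    b = (row[COL_DST_IP], int(row[COL_DST_PORT]))
    lo, hi = sorted([a, b])
    return (lo, hi, int(row[COL_PROTO]))


def build_key_index(csv_text):
    """Map canonical_key -> [(row_idx, ts_epoch)] for one day's CSV."""
    index = defaultdict(list)
    for row_idx, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        ts = datetime.strptime(row[COL_TS], TS_FORMAT).replace(tzinfo=CSV_TZ)
        index[canonical_key(row)].append((row_idx, ts.timestamp()))
    return dict(index)


day_csv = """src_ip,src_port,dst_ip,dst_port,protocol,timestamp
10.0.0.1,51000,10.0.0.2,80,6,2017-07-03 09:00:00
10.0.0.2,80,10.0.0.1,51000,6,2017-07-03 09:00:01
"""
index = build_key_index(day_csv)
print(len(index))  # 1: both directions collapse onto the same key
```
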
To upgrade an existing artifact pair that lacks `flow_features.parquet`, run `scripts/generate_flow_features.py --packets-npz ... --flows-parquet ... --out ...` (or `--source-store` for sharded stores).

Common gotcha: if CSV timestamps and pcap epochs are in different time zones, `extract_lib` prints a diagnostic with the recommended `--time-offset`; rerun with that value.

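The idea behind such a diagnostic fits in a few lines: for flows already matched by key, the pcap-minus-CSV time differences cluster around a constant shift, and a time-zone mismatch makes that shift a whole number of hours. This is an illustration only, using a hypothetical helper with seconds as the unit. `extract_lib`'s own estimator and the unit `--time-offset` expects are not described in this README, so rerun with the value it prints:

```python
import statistics


def estimate_time_offset(csv_epochs, pcap_epochs, granularity=3600):
    """Estimate the shift (seconds) to add to CSV times so they match pcap times.

    Takes the median of per-flow differences (robust to a few bad matches) and
    snaps it to a multiple of `granularity`, since a time-zone mismatch is a
    whole-hour shift (use 1800 for half-hour zones).
    """
    if len(csv_epochs) != len(pcap_epochs):
        raise ValueError("expected one pcap time per matched CSV time")
    diffs = [p - c for c, p in zip(csv_epochs, pcap_epochs)]
    return round(statistics.median(diffs) / granularity) * granularity


csv_times = [1000.0, 2000.0, 3000.0, 4000.0]
pcap_times = [11802.0, 12799.0, 13840.0, 5000.0]  # the last pair is a bad match
print(estimate_time_offset(csv_times, pcap_times))  # 10800, i.e. a 3-hour shift
```
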