BattleTag 4263fa8807 README: slim public-facing sections; gitignore CLAUDE.md
Trim README down to results/quickstart by removing Layout, Data contract,
Python environment, and Authoritative documents sections (these now live
in CLAUDE.md). Add CLAUDE.md to .gitignore so it stays as private dev
notes rather than committed docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 08:42:51 +08:00

JANUS

JANUS — flow-matching unsupervised network anomaly detection over packet sequences.

JANUS is a packet-causal Transformer with two output heads on a shared backbone:

  • Continuous Flow Matching head over the (size, IAT, win) packet channels.
  • Discrete Flow Matching head over the 6 binary protocol-flag / direction channels.

Both heads are trained jointly on benign traffic only (no attack labels at any stage). The deployable scalar score is a Mahalanobis-OAS distance over a 10-d per-flow score vector emitted by the trained model. The aggregator is fit on benign val only, so the pipeline is unsupervised end-to-end.

JANUS is the first NIDS method to use Flow Matching as the training paradigm in mixed continuous-discrete state spaces over packet sequences.

Headline results

3-seed mean ± std AUROC. Selection-bias-free Mahalanobis-OAS aggregator on the 10-d JANUS score vector, fit on benign val only.

Within-dataset comparison (AUROC %, mean ± std)

| Method | Venue | CIC-IDS2017 | CIC-DDoS2019 | CIC-IoT2023 | ISCXTor2016 |
|---|---|---|---|---|---|
| Isolation Forest | classical | 55.27 ± 0.4 † | | | |
| OCSVM | classical | 59.59 ± 0.6 † | | | |
| AnoFormer | ICLR'22 | 63.37 ± 0.7 † | | | |
| GANomaly | BMVC'18 | 82.75 ± 5.6 † | | | |
| RD4AD | CVPR'22 | 83.78 ± 0.8 † | | | |
| TSLANet | ICML'24 | 84.45 ± 1.7 † | | | |
| ARCADE | | 84.85 ± 2.0 † | | | |
| MFAD | | 86.02 ± 0.8 † | | | |
| STFPM | BMVC'21 | 86.29 ± 1.7 † | | | |
| MMR | | 89.26 ± 1.2 † | | | |
| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 72.24 ± 6.08 ★ | 87.31 ‡ |
| ConMD | TIFS'26 | 94.43 ± 0.1 † | | | |
| JANUS (ours) | | 98.26 ± 0.35 | 99.18 ± 0.05 | 95.90 ± 0.22 | 99.09 ± 0.13 |

† Numbers from ConMD (TIFS'26) Table I; protocol = train 10K benign / test 5K + 5K balanced; 5-seed mean ± std.

‡ Numbers from Shafir et al. (arXiv'26) headline tables; protocol = train 10K benign / SHAP-selected feature subsets per dataset (single NF).

★ Reproduced by us (3-seed mean ± std, 2-NF ensemble, CSV pipeline, paper-specified 5-feat SHAP subset). Shafir's paper does not publish an AUROC for CIC-IoT2023; it reports only F1 = 99.51 with a Youden's-J threshold tuned on attack labels (a non-comparable thresholded protocol). For a threshold-free head-to-head AUROC on this dataset we cite our reproduction.

JANUS is fully unsupervised (benign-only training, no attack labels at any stage) and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only.

Thresholded F1 metrics for JANUS across all four datasets are in RESULTS.md Section D.
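
The headline metric is a threshold-free AUROC reported as mean ± std over seeds. Below is a minimal sketch of that computation, assuming per-flow anomaly scores where higher means more anomalous. The function names and the toy Gaussian scores are illustrative and not taken from the JANUS code. Whether the reported std is the population or sample form is not stated here; the sketch assumes the population form.

```python
import numpy as np

def auroc(benign_scores, attack_scores):
    """Probability that a random attack flow outscores a random benign flow (ties count 0.5)."""
    b = np.asarray(benign_scores, dtype=float)[None, :]
    a = np.asarray(attack_scores, dtype=float)[:, None]
    return float(np.mean((a > b) + 0.5 * (a == b)))

def mean_std_percent(per_seed_auroc):
    """Aggregate per-seed AUROC values into (mean, std) in percent."""
    vals = 100.0 * np.asarray(per_seed_auroc, dtype=float)
    return float(vals.mean()), float(vals.std())  # ddof=0 is an assumption

# Toy stand-in for three seeds; real scores would come from the trained models.
per_seed = []
for seed in (42, 43, 44):
    rng = np.random.default_rng(seed)
    benign = rng.normal(0.0, 1.0, size=500)
    attack = rng.normal(3.0, 1.0, size=500)
    per_seed.append(auroc(benign, attack))

mean_pct, std_pct = mean_std_percent(per_seed)
print(f"AUROC = {mean_pct:.2f} ± {std_pct:.2f} %")
```

The pairwise comparison is quadratic in the number of flows, so a rank-based implementation would be preferable on full test sets.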

3×3 cross-dataset transfer matrix

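Each row is a source dataset (model trained on 10K benign samples from the source); each column is a target dataset (tested on the full target benign set plus all target attacks). Entries are AUROC, 3-seed mean ± std. The aggregator is fit on target benign val only, with no attack labels at any stage. Italic diagonal entries are within-dataset results.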

| Source ↓ / Target → | CICIDS17 | CICDDoS19 | CICIoT23 |
|---|---|---|---|
| CICIDS17 | *0.9826 ± 0.0035* | 0.9690 ± 0.0047 | 0.8698 ± 0.0031 |
| CICDDoS19 | 0.9413 ± 0.0212 | *0.9918 ± 0.0005* | 0.8767 ± 0.0068 |
| CICIoT23 | 0.9394 ± 0.0063 | 0.9030 ± 0.0075 | *0.9590 ± 0.0022* |
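
To read the matrix at a glance, the mean AUROC values above can be summarized into a within-dataset mean, an off-diagonal (transfer) mean, and the weakest transfer direction. This is only a convenience sketch over the numbers in the table. The variable names are illustrative, and it is not claimed to match how the ablation tables define their cross-dataset summaries.

```python
import numpy as np

names = ["CICIDS17", "CICDDoS19", "CICIoT23"]

# Mean AUROC copied from the matrix above: rows = source, columns = target.
auroc = np.array([
    [0.9826, 0.9690, 0.8698],
    [0.9413, 0.9918, 0.8767],
    [0.9394, 0.9030, 0.9590],
])

off_diag = ~np.eye(3, dtype=bool)            # True for the 6 transfer directions
within_mean = float(np.diag(auroc).mean())   # diagonal = within-dataset
cross_mean = float(auroc[off_diag].mean())

# Mask the diagonal with +inf so argmin only sees transfer directions.
masked = np.where(off_diag, auroc, np.inf)
src, tgt = np.unravel_index(np.argmin(masked), masked.shape)
worst = float(auroc[src, tgt])

print(f"within mean = {within_mean:.4f}")
print(f"cross mean  = {cross_mean:.4f}")
print(f"worst       = {names[src]} -> {names[tgt]} ({worst:.4f})")
```

For the values above this gives a within-dataset mean of about 0.9778, a transfer mean of about 0.9165, and CICIDS17 -> CICIoT23 (0.8698) as the weakest direction.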

Ablations (architecture & aggregator)

Two orthogonal ablation axes, each evaluated within-dataset (4 datasets × 3 seeds) and cross-dataset (3×3 transfer × 3 seeds):

  • Group A: 7 alternative aggregators on the same JANUS-full sub-score vector (post-processing only; no retraining).
  • Group B: 5 architecture variants, each retrained on 4 datasets × 3 seeds (5 × 12 = 60 training runs in total), plus 5 × 6 × 3 = 90 cross-dataset evaluations.

Every load-bearing JANUS design choice has the same shape of ablation curve: small in-distribution cost, large cross-dataset gain.

| Component (removed in ablation) | Variant | Within Δ | Cross-mean Δ | Cross-worst Δ |
|---|---|---|---|---|
| FLOW token (global context) | B1 | −0.94 | −6.70 | −19.97 |
| Packet sequence | B2 | +0.15 | −23.82 | −36.27 |
| Cont/disc head split (drop disc head) | B3 | +0.44 | −13.14 | −25.03 |
| CFM head (drop continuous side) | B4 | −2.37 | −2.03 | −2.86 |
| Joint training of two heads | B5 | +0.20 | −18.93 | −27.54 |
| OAS Mahalanobis aggregator | A1 vs A5 | +0.37 | −15.88 | −27.38 |

Three ablations (B3 / B5 / A-aggregator) marginally beat JANUS-full at within-dataset evaluation but collapse on at least one cross-dataset transfer direction. The disc head, joint training, and OAS aggregator are deliberate trades: their value is exclusively in cross-dataset robustness.

Full headline summary: artifacts/ablation/ABLATION_SUMMARY.md. Per-variant 3×3 cross matrices: artifacts/ablation/ABLATION_CROSS_B_full.md and artifacts/ablation/ABLATION_TABLE_CROSS_full.md.

Quick start

# Train JANUS on CICIDS2017 (3 seeds available: 42, 43, 44)
cd Mixed_CFM
uv run --no-sync python train.py --config configs/cicids2017_seed42.yaml

# Phase-1 evaluation: per-attack-class AUROC + 10-d score export
uv run --no-sync python eval_phase1.py \
    --model-dir <model_dir> --out-dir <eval_dir>

# Single cross-dataset eval
uv run --no-sync python eval_cross.py \
    --model-dir <src_model_dir> \
    --target-store datasets/<tgt>/processed/full_store \
    --target-flows datasets/<tgt>/processed/flows.parquet \
    --target-flow-features datasets/<tgt>/processed/flow_features.parquet \
    --benign-label normal --n-benign 10000 --n-attack 1000000 \
    --out <result.json>

# 3×3 cross matrix (6 off-diagonal directions × 3 seeds, 2-GPU parallel)
bash ../scripts/aggregate/run_cross_3x3.sh
uv run --no-sync python ../scripts/aggregate/cross_3x3_table.py

JANUS hyper-parameters (locked in Mixed_CFM/configs/<dataset>_seed*.yaml):

T: 64                  # max packet sequence length
d_model: 128
n_layers: 4
n_heads: 4
sigma: 0.1             # within-dataset; cross uses 0.6
lambda_disc: 1.0
use_ot: true           # OT-CFM (Sinkhorn coupling on benign batch)
reference_mode: causal_packets    # Route A: packet-causal attention
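
For orientation on what sigma and use_ot refer to, here is a minimal NumPy sketch of how one training pair is sampled in generic conditional flow matching with a Gaussian probability path. It is an assumption that the JANUS continuous head follows this generic form. The function names are hypothetical, and the OT coupling is only indicated in a comment, not implemented.

```python
import numpy as np

def cfm_training_pair(x0, x1, sigma, rng):
    """Sample (t, x_t, u_t) for one batch of conditional flow matching.

    x0: noise samples, x1: data samples, both shaped (batch, ...).
    With use_ot enabled, x0 and x1 would first be re-paired by an
    optimal-transport plan computed on the batch; here they are paired as given.
    """
    batch = x0.shape[0]
    # One time value per sample, broadcast over all trailing dimensions.
    t = rng.uniform(size=(batch,) + (1,) * (x0.ndim - 1))
    mean_t = (1.0 - t) * x0 + t * x1                   # straight-line interpolation
    x_t = mean_t + sigma * rng.normal(size=x0.shape)   # Gaussian path of width sigma
    u_t = x1 - x0                                      # regression target for the velocity
    return t, x_t, u_t

def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - u_t) ** 2))

rng = np.random.default_rng(0)
x1 = rng.normal(size=(8, 64, 3))   # toy batch: 8 flows, T=64 packets, 3 channels
x0 = rng.normal(size=x1.shape)
t, x_t, u_t = cfm_training_pair(x0, x1, sigma=0.1, rng=rng)
print(x_t.shape, cfm_loss(np.zeros_like(u_t), u_t))
```

In a real training loop the network would predict the velocity from (x_t, t) and be optimized against u_t; the zero prediction above is only a placeholder.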

Producing the deployable scalar score

eval_phase1.py exports a 10-d per-flow score vector to phase1_scores.npz:

3 continuous-side scores  : terminal_norm, terminal_flow, terminal_packet
7 discrete-side scores    : disc_nll_total + disc_nll_ch{2,3,4,5,6,7}
                            (direction + 5 TCP flags)
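
Expanding the brace pattern makes the 3 + 7 = 10 layout explicit. The sketch below only enumerates the score names listed above. The ordering and the constant names are assumptions, since the actual layout of phase1_scores.npz is not shown here.

```python
# Continuous-side scores, as listed above.
CONT_SCORES = ["terminal_norm", "terminal_flow", "terminal_packet"]

# Discrete-side scores: one total plus one per channel 2..7
# (6 channels: direction + 5 TCP flags).
DISC_SCORES = ["disc_nll_total"] + [f"disc_nll_ch{c}" for c in range(2, 8)]

# Assumed ordering: continuous first, then discrete.
SCORE_NAMES = CONT_SCORES + DISC_SCORES

for i, name in enumerate(SCORE_NAMES):
    print(i, name)
```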

The deployable scalar is the Mahalanobis-OAS distance:

d²(s) = (s − μ)ᵀ Σ⁻¹ (s − μ),    where (μ, Σ) come from sklearn.covariance.OAS
                                 fit on benign val ONLY (no attack labels).
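
A minimal sketch of this aggregation step using sklearn.covariance.OAS, whose fitted estimator provides a mahalanobis() method returning squared distances. The synthetic arrays stand in for 10-d score vectors; loading phase1_scores.npz is omitted because its keys are not shown here. The helper names are illustrative and are not those of aggregate_score_router.py.

```python
import numpy as np
from sklearn.covariance import OAS

def fit_aggregator(benign_val_scores):
    """Fit (mu, Sigma) with OAS shrinkage on benign validation scores only."""
    return OAS().fit(benign_val_scores)

def mahalanobis_oas(aggregator, scores):
    """Squared Mahalanobis distance d^2(s) of each row to the benign fit."""
    return aggregator.mahalanobis(scores)

# Synthetic stand-ins for 10-d per-flow score vectors.
rng = np.random.default_rng(0)
benign_val = rng.normal(size=(2000, 10))
test_benign = rng.normal(size=(500, 10))
test_shifted = rng.normal(loc=3.0, size=(500, 10))   # toy "anomalous" flows

agg = fit_aggregator(benign_val)
d_benign = mahalanobis_oas(agg, test_benign)
d_shifted = mahalanobis_oas(agg, test_shifted)
print(d_benign.mean(), d_shifted.mean())
```

Because the aggregator sees only benign validation data, no attack label enters the score; larger d² simply means further from the benign score distribution.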

Reference implementation: scripts/aggregate/aggregate_score_router.py. It reads artifacts/route_comparison/janus_<ds>_seed*/phase1_scores.npz and artifacts/route_comparison/cross/janus_seed*_<src>_to_<tgt>.npz, then writes artifacts/route_comparison/SCORE_ROUTER.md (within-dataset rows) and artifacts/route_comparison/CROSS_MATRIX_3x3.md (cross matrix, via cross_3x3_table.py).

Tests

uv run --no-sync python -m pytest tests/ Mixed_CFM/tests/ Unified_CFM/tests/

Adding a new dataset

Write one driver at scripts/extract_<name>.py that calls extract_lib.extract_dataset(...) (see scripts/extract_cicids2017.py as the reference template). The driver hardcodes CSV column names, timestamp formats, benign aliases, and drop patterns as module constants, then feeds extract_lib a per-day (canonical_key → [(row_idx, ts_epoch)]) mapping and a per-day pcap file map. The extract pipeline writes all three artifacts (packets.npz, flows.parquet, flow_features.parquet) row-aligned by flow_id = arange(N).

To upgrade an existing artifact pair that lacks flow_features.parquet, run scripts/generate_flow_features.py --packets-npz ... --flows-parquet ... --out ... (or --source-store for sharded stores).

Common gotcha: if CSV timestamps and pcap epochs are in different time zones, extract_lib prints a diagnostic with the recommended --time-offset; rerun with that value.
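
To make the gotcha concrete, here is one way such an offset could be estimated from matched flows. This is a hypothetical sketch and not extract_lib's actual logic. The function name, the 15-minute rounding step, and the sign convention (seconds to add to CSV timestamps) are all assumptions, so check the diagnostic output for the convention --time-offset really uses.

```python
from statistics import median

def estimate_time_offset(csv_epochs, pcap_epochs, step=900):
    """Estimate the constant shift (in seconds) between matched timestamps.

    Takes the median of (pcap - csv) differences to resist outliers, then
    rounds to the nearest `step` seconds, since time-zone offsets are
    multiples of 15 minutes.
    """
    diffs = [p - c for c, p in zip(csv_epochs, pcap_epochs)]
    return int(round(median(diffs) / step) * step)

# Toy example: pcap clock runs 4 hours ahead of the CSV, with small jitter.
csv_ts = [1000.0, 2000.0, 3000.0]
pcap_ts = [1000.0 + 14400.4, 2000.0 + 14399.8, 3000.0 + 14401.3]
print(estimate_time_offset(csv_ts, pcap_ts))
```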