# JANUS
**JANUS** (Joint Anomaly via Normalizing-flows of Unified States) — flow-matching unsupervised network anomaly detection over packet sequences.
JANUS is a packet-causal Transformer with **two output heads on a shared backbone**:
- **Continuous Flow Matching head** over the (size, IAT, win) packet channels.
- **Discrete Flow Matching head** over the 6 binary protocol-flag / direction channels.
Both heads are trained jointly on benign traffic only (no attack labels at any stage). The deployable scalar score is a **Mahalanobis-OAS distance** over a 10-d per-flow score vector emitted by the trained model, with the aggregator fit on the benign validation split only, so the pipeline is unsupervised end to end.
JANUS is the first NIDS method to use Flow Matching as the training paradigm in mixed continuous-discrete state spaces over packet sequences.
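To make the "two heads on a shared backbone" idea concrete, here is a minimal PyTorch sketch. It is not the code in `Mixed_CFM/model.py`: the class name `TwoHeadCausalSketch`, the additive time conditioning (the repo's backbone uses AdaLN blocks instead), and the split into `x_cont` / `x_disc` inputs are all assumptions made for illustration.

```python
import math
import torch
from torch import nn


def sinusoidal_time_embedding(t, dim):
    """Map flow time t (shape [B]) to a [B, dim] sinusoidal embedding."""
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32, device=t.device) / half
    )
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)


class TwoHeadCausalSketch(nn.Module):
    """Hypothetical stand-in: shared causal Transformer with two output heads."""

    def __init__(self, n_cont=3, n_disc=6, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Linear(n_cont + n_disc, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, dropout=0.0, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.velocity_head = nn.Linear(d_model, n_cont)  # continuous channels
        self.logit_head = nn.Linear(d_model, n_disc)     # binary channels

    def forward(self, x_cont, x_disc, t):
        seq_len = x_cont.shape[1]
        h = self.embed(torch.cat([x_cont, x_disc.float()], dim=-1))
        # Simplification: add the time embedding instead of using AdaLN.
        h = h + sinusoidal_time_embedding(t, self.d_model)[:, None, :]
        # True above the diagonal = packet i may not attend to packets after i.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=h.device), diagonal=1
        )
        h = self.backbone(h, mask=causal)
        return self.velocity_head(h), self.logit_head(h)
```

Both heads read the same hidden states, so the continuous velocity prediction and the per-flag logits for each packet depend only on that packet and the ones before it.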
## Headline results
3-seed mean ± std AUROC. Selection-bias-free Mahalanobis-OAS aggregator on the 10-d JANUS score vector, fit on benign val only.
### Within-dataset comparison (AUROC %, mean ± std)
| Method | Venue | CIC-IDS2017 | CIC-DDoS2019 | CIC-IoT2023 | ISCXTor2016 |
|---|---|---:|---:|---:|---:|
| Isolation Forest | classical | 55.27 ± 0.4 † | — | — | — |
| OCSVM | classical | 59.59 ± 0.6 † | — | — | — |
| AnoFormer | ICLR'22 | 63.37 ± 0.7 † | — | — | — |
| GANomaly | BMVC'18 | 82.75 ± 5.6 † | — | — | — |
| RD4AD | CVPR'22 | 83.78 ± 0.8 † | — | — | — |
| TSLANet | ICML'24 | 84.45 ± 1.7 † | — | — | — |
| ARCADE | — | 84.85 ± 2.0 † | — | — | — |
| MFAD | — | 86.02 ± 0.8 † | — | — | — |
| STFPM | BMVC'21 | 86.29 ± 1.7 † | — | — | — |
| MMR | — | 89.26 ± 1.2 † | — | — | — |
| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 99.51 §‡ | 87.31 ‡ |
| ConMD | TIFS'26 | 94.43 ± 0.1 † | — | — | — |
| **JANUS (ours)** | — | **98.26 ± 0.35** | **99.18 ± 0.05** | **95.90 ± 0.22** | **99.09 ± 0.13** |
† Numbers from ConMD (TIFS'26) Table I; protocol = train 10 K benign / test 5 K + 5 K balanced; 5-seed mean ± std.
‡ Numbers from Shafir et al. (arXiv'26); protocol = train 10 K benign / SHAP-selected feature subsets per dataset.
§ Metric mismatch on CIC-IoT2023: Shafir reports F1 = 99.51 (Youden's-J threshold tuned with attack labels), we report AUROC = 95.90 (threshold-free); not directly comparable. Thresholded F1 for JANUS is reported in `RESULTS.md` Section D and `artifacts/route_comparison/THRESHOLDED.md`.
JANUS sets new SOTA on **3/3 directly comparable benchmarks** (CIC-IDS2017 +3.83, CIC-DDoS2019 +6.18, ISCXTor2016 +11.78) — all margins outside seed std. JANUS is fully unsupervised (benign-only training, no attack labels at any stage), and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only.
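The margins quoted above follow directly from the within-dataset table. As a quick check, the snippet below recomputes them from the table values; the dictionary layout and the `margins` helper are illustrative only and are not part of the repo.

```python
# AUROC (%) values copied from the within-dataset table: (mean, seed std).
janus = {
    "CIC-IDS2017": (98.26, 0.35),
    "CIC-DDoS2019": (99.18, 0.05),
    "ISCXTor2016": (99.09, 0.13),
}
# Strongest listed baseline per directly comparable benchmark.
best_baseline = {
    "CIC-IDS2017": ("ConMD", 94.43),
    "CIC-DDoS2019": ("Shafir NF + Shapley", 93.00),
    "ISCXTor2016": ("Shafir NF + Shapley", 87.31),
}


def margins(janus, best_baseline):
    """Return {dataset: (margin in AUROC points, margin exceeds JANUS seed std)}."""
    out = {}
    for dataset, (mean, std) in janus.items():
        _, baseline = best_baseline[dataset]
        margin = round(mean - baseline, 2)
        out[dataset] = (margin, margin > std)
    return out


for dataset, (margin, outside_std) in margins(janus, best_baseline).items():
    print(f"{dataset}: +{margin:.2f} (outside seed std: {outside_std})")
```

CIC-IoT2023 is left out because the only competing number there is an F1 score, not an AUROC.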
### 3×3 cross-dataset transfer matrix
Each source model (rows) is trained on 10K benign samples from the source dataset and tested on the full target benign set plus **all** target attacks (columns). The aggregator is fit on target benign val only, with no attack labels at any stage. Italic diagonal entries are within-dataset results.
| Source ↓ / Target → | CICIDS17 | CICDDoS19 | CICIoT23 |
|---|---|---|---|
| **CICIDS17** | _0.9826 ± 0.0035_ | **0.9690 ± 0.0047** | 0.8698 ± 0.0031 |
| **CICDDoS19** | 0.9413 ± 0.0212 | _0.9918 ± 0.0005_ | 0.8767 ± 0.0068 |
| **CICIoT23** | 0.9394 ± 0.0063 | 0.9030 ± 0.0075 | _0.9590 ± 0.0022_ |
Forward CICIDS17→CICDDoS19 (0.969) beats Shafir 0.89 by **+0.08**; reverse CICDDoS19→CICIDS17 (0.941) approximately matches Shafir 0.93. CICIoT23 is hardest both as source and target — its IoT-protocol diversity makes the "benign of source ≈ benign of target" assumption brittle. Full table at `artifacts/route_comparison/CROSS_MATRIX_3x3.md`.
## Layout
```
common/            Data contract: single source of truth for the
                   9-d packet schema, 20-d packet-derived flow schema,
                   label normalization, and packet preprocessing.
Mixed_CFM/         The JANUS model. Mixed continuous-discrete CFM
                   with two output heads on a shared causal Transformer.
  configs/         Per-(dataset × seed) training configs.
  model.py         MixedTokenCFM + MixedVelocity.
  train.py / eval_phase1.py / eval_cross.py
Unified_CFM/       Legacy unified token CFM. Mixed_CFM imports its
                   AdaLNBlock + sinusoidal time embedding for backbone
                   reuse. Kept as internal ablation reference.
scripts/           Workspace-level pcap → artifact pipeline,
                   CSV adapters, cross-package eval tooling.
  download/        UNB/CIC dataset downloaders.
  baselines/       Third-party baseline runners (Kitsune, Shafir-NF,
                   Anomaly-Transformer).
  aggregate/       Mahalanobis-OAS score-router + cross-matrix
                   orchestration. aggregate_score_router.py is the
                   deployable score path; run_cross_3x3.sh +
                   cross_3x3_table.py produce the cross matrix.
tests/             Data-contract unit tests.
```
The following directories are **gitignored** (live on the dev box, not in the repo):
```
artifacts/   All run outputs (checkpoints, eval JSONs, score
             npzs, figures). Per-(dataset × seed) model dirs at
             artifacts/route_comparison/janus_<ds>_seed<N>/.
datasets/    Raw + processed datasets (~1 TB).
baselines/   Third-party baseline forks (Kitsune-py,
             Anomaly-Transformer, ConMD, ganomaly, TIPSO-GAN, ...).
paper/       Paper sources & external PDFs (Shafir 2026, Lipman
             2210.02747, etc.).
.venv/       uv-managed Python 3.14 virtual env.
```
## Data contract
Every processed dataset under `datasets/<name>/processed/` ships an aligned triple, all with the same row order (`flow_id = arange(N)`):
```
packets.npz             packet_tokens [N, T_full, 9], packet_lengths [N], flow_id [N]
                        (or full_store/, a sharded PacketShardStore, for large datasets)
flows.parquet           flow_id + label + 5-tuple metadata (src_ip, dst_ip, ports, protocol)
flow_features.parquet   flow_id + label + 20 canonical packet-derived features
```
The 9-d packet schema and 20-d flow schema are FIXED in `common/data_contract.py`. Flow features are computed by `compute_flow_features_from_packets(packet_tokens, lens)` so row alignment is guaranteed.
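A quick way to sanity-check the row-alignment guarantee on a processed dataset is sketched below. The helper `check_row_alignment` is hypothetical (it is not part of `common/data_contract.py`) and works on plain arrays so it stays independent of how the files are loaded.

```python
import numpy as np


def check_row_alignment(packet_flow_id, packet_lengths, flows_flow_id, features_flow_id):
    """Return a list of contract violations; an empty list means the triple is aligned."""
    n = len(packet_flow_id)
    expected = np.arange(n)
    problems = []
    if len(packet_lengths) != n:
        problems.append(f"packet_lengths has {len(packet_lengths)} rows, expected {n}")
    named_ids = (
        ("packets", packet_flow_id),
        ("flows", flows_flow_id),
        ("flow_features", features_flow_id),
    )
    for name, ids in named_ids:
        ids = np.asarray(ids)
        # Every artifact must carry flow_id = 0, 1, ..., N-1 in that exact order.
        if ids.shape != expected.shape or not np.array_equal(ids, expected):
            problems.append(f"{name}: flow_id is not arange({n})")
    return problems
```

In practice the inputs would come from `np.load` on `packets.npz` and `pandas.read_parquet` on the two Parquet files, passing each file's `flow_id` column.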
## Quick start
```bash
# Train JANUS on CICIDS2017 (3 seeds available: 42, 43, 44)
cd Mixed_CFM
uv run --no-sync python train.py --config configs/cicids2017_seed42.yaml
# Phase-1 evaluation: per-attack-class AUROC + 10-d score export
uv run --no-sync python eval_phase1.py \
--model-dir <model_dir> --out-dir <eval_dir>
# Single cross-dataset eval
uv run --no-sync python eval_cross.py \
--model-dir <src_model_dir> \
--target-store datasets/<tgt>/processed/full_store \
--target-flows datasets/<tgt>/processed/flows.parquet \
--target-flow-features datasets/<tgt>/processed/flow_features.parquet \
--benign-label normal --n-benign 10000 --n-attack 1000000 \
--out <result.json>
# 3×3 cross matrix (6 off-diagonal directions × 3 seeds, 2-GPU parallel)
bash ../scripts/aggregate/run_cross_3x3.sh
uv run --no-sync python ../scripts/aggregate/cross_3x3_table.py
```
JANUS hyper-parameters (locked in `Mixed_CFM/configs/<dataset>_seed*.yaml`):
```yaml
T: 64 # max packet sequence length
d_model: 128
n_layers: 4
n_heads: 4
sigma: 0.1 # within-dataset; cross uses 0.6
lambda_disc: 1.0
use_ot: true # OT-CFM (Sinkhorn coupling on benign batch)
reference_mode: causal_packets # Route A: packet-causal attention
```
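For readers unfamiliar with the role of `sigma`, the sketch below shows one common conditional flow matching construction for the continuous channels: a noisy straight-line path between a Gaussian sample and a data sample, with the constant displacement as regression target. This is an assumption about the general technique, not a description of `Mixed_CFM/train.py`; the function name `cfm_training_pair` is hypothetical, and the Sinkhorn coupling implied by `use_ot` is deliberately left out (noise and data are paired independently here).

```python
import numpy as np


def cfm_training_pair(x1, sigma=0.1, rng=None):
    """Build one (t, x_t, u_t) regression example per row of the data batch x1.

    x0  ~ N(0, I)                         source noise
    t   ~ U(0, 1)                         flow time, one value per row
    x_t = (1 - t) x0 + t x1 + sigma eps   noisy point on the straight path
    u_t = x1 - x0                         target velocity along that path
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = rng.standard_normal(x1.shape)
    # Shape t as [B, 1, ..., 1] so it broadcasts over the remaining axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    x_t = (1.0 - t) * x0 + t * x1 + sigma * rng.standard_normal(x1.shape)
    u_t = x1 - x0
    return t, x_t, u_t
```

A velocity network would then be trained to predict `u_t` from `(x_t, t)` with a squared-error loss. Under this construction a larger `sigma` widens the path around each straight line.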
## Producing the deployable scalar score
`eval_phase1.py` exports a 10-d per-flow score vector to `phase1_scores.npz`:
```
3 continuous-side scores : terminal_norm, terminal_flow, terminal_packet
7 discrete-side scores   : disc_nll_total + disc_nll_ch{2,3,4,5,6,7}
                           (direction + 5 TCP flags)
```
The deployable scalar is the Mahalanobis-OAS distance:
```
d²(s) = (s - μ)ᵀ Σ⁻¹ (s - μ),  where (μ, Σ) come from sklearn.covariance.OAS
        fit on benign val ONLY (no attack labels).
```
Reference implementation: `scripts/aggregate/aggregate_score_router.py`. It reads `artifacts/route_comparison/janus_<ds>_seed*/phase1_scores.npz` and `artifacts/route_comparison/cross/janus_seed*_<src>_to_<tgt>.npz`, then writes `artifacts/route_comparison/SCORE_ROUTER.md` (within-dataset rows) and `artifacts/route_comparison/CROSS_MATRIX_3x3.md` (cross matrix, via `cross_3x3_table.py`).
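The core of the aggregation step fits in a few lines of scikit-learn. The sketch below is independent of `aggregate_score_router.py`: the function names are hypothetical, and it assumes the benign validation scores are already loaded as an `[N, 10]` array.

```python
import numpy as np
from sklearn.covariance import OAS


def fit_oas_aggregator(benign_val_scores):
    """Fit mean and OAS-shrunk covariance on benign validation scores only."""
    return OAS().fit(np.asarray(benign_val_scores))


def mahalanobis_score(aggregator, scores):
    """Squared Mahalanobis distance d²(s) of each row to the benign fit."""
    return aggregator.mahalanobis(np.asarray(scores))
```

`OAS.mahalanobis` returns squared distances, matching d²(s) as defined above; the fitted μ and Σ⁻¹ are available as `aggregator.location_` and `aggregator.precision_`. Higher values mean a flow is further from the benign score distribution.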
## Tests
```bash
uv run --no-sync python -m pytest tests/ Mixed_CFM/tests/ Unified_CFM/tests/
```
## Adding a new dataset
Write one driver at `scripts/extract_<name>.py` that calls `extract_lib.extract_dataset(...)` (see `scripts/extract_cicids2017.py` as the reference template). The driver hardcodes CSV column names, timestamp formats, benign aliases, and drop patterns as module constants, then feeds `extract_lib` a per-day `(canonical_key → [(row_idx, ts_epoch)])` mapping and a per-day pcap file map. The extract pipeline writes all three artifacts (packets.npz, flows.parquet, flow_features.parquet) row-aligned by `flow_id = arange(N)`.
To upgrade an existing artifact pair that lacks `flow_features.parquet`, run `scripts/generate_flow_features.py --packets-npz ... --flows-parquet ... --out ...` (or `--source-store` for sharded stores).
Common gotcha: if CSV timestamps and pcap epochs are in different time zones, `extract_lib` prints a diagnostic with the recommended `--time-offset`; rerun with that value.
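If you want to cross-check the recommended offset by hand, one rough approach is to take flows whose CSV and pcap timestamps you have already paired, and snap the median difference to the nearest whole hour. The helper below is hypothetical and is not how `extract_lib` computes its diagnostic; the sign convention (pcap minus CSV) is also an assumption, so confirm it against the tool's own output.

```python
from statistics import median


def suggest_time_offset(csv_epochs, pcap_epochs, step=3600):
    """Median (pcap - csv) difference in seconds, snapped to a multiple of `step`.

    Both inputs are epoch seconds for the same flows, in the same order.
    Use step=1800 for time zones with half-hour offsets.
    """
    diffs = [p - c for c, p in zip(csv_epochs, pcap_epochs)]
    if not diffs:
        raise ValueError("need at least one paired timestamp")
    return int(round(median(diffs) / step)) * step
```

Snapping to whole hours keeps per-flow jitter between the CSV timestamps and the first captured packet from leaking into the estimate.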
## Authoritative documents
- `RESULTS.md` — full headline tables, ablations, per-attack analysis, JANUS configuration, thresholded operating-point metrics, what the experiments proved / disproved.
- `Mixed_CFM/model.py` and `common/data_contract.py` — model + data-contract source of truth.
## Python environment
- `requires-python = ">=3.14"`; PyTorch pinned to the `pytorch-cu128` index, plus `mamba-ssm`, `causal-conv1d`, `scapy`, `dpkt`, `pyarrow`, `scikit-learn` (imported as `sklearn`, for the OAS aggregator).
- Two `pyproject.toml` files exist: root and `Mixed_CFM/`; they are not declared as a uv workspace and resolve independently. Run `uv run ...` from whichever directory owns the entry point.
- `Unified_CFM/` has no `pyproject.toml`; it uses the root venv (`uv run --no-sync python <script.py>`).
- Scripts under `scripts/download/` are pure stdlib — invoke with `python3`.