mambafortrafficmodeling

Network traffic anomaly detection with continuous flow matching (CFM). Three sibling model packages over a shared canonical data contract.

Layout

common/data_contract.py — single source of truth for the canonical packet schema (9-d) and flow schema (20-d, packet-derived). All three packages import constants and helpers from here.
Packet_CFM/ — packet-sequence OT-CFM with explicit σ-band benign distribution learning.
Flow_CFM/ — flow-level CFM on the workspace-canonical 20-d packet-derived flow_features.parquet. Legacy 61-d CICFlowMeter CSV caches are kept only for paper reproduction (--legacy-csv-features flag).
Unified_CFM/ — unified packet+flow token CFM. Current best model, used for all main results (SOTA within-dataset on ISCXTor2016 / CICIDS2017 / CICDDoS2019 and, against the corrected 0.89 baseline, on the CICIDS2017 → CICDDoS2019 cross-dataset task).
datasets/<name>/processed/ — canonical artifact bundle:
- packets.npz (small/medium) or full_store/ (large, sharded)
- flows.parquet (label + 5-tuple metadata)
- flow_features.parquet (20-d packet-derived, row-aligned)
scripts/ — workspace-level pcap → artifact extraction, CSV adapters, and cross-package eval tooling. Also contains scripts/download/.
artifacts/ — run outputs (training checkpoints, eval JSONs, reports). Phase 0 / 1 / 2 / 2.5 experiment summaries live under artifacts/phase{0,1,2}* directories.
paper/ — paper PDFs we compare against (Shafir 2026 NF, ConMD 2026, TIPSO-GAN 2026, Lipman 2210.02747 flow matching).

The root keeps only workspace-level files. All model/training/eval code lives under one of the three packages.
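The row alignment between flows.parquet and flow_features.parquet described in the layout can be sanity-checked with a short script. The following is a minimal sketch, not the repository's own loader: `FLOW_DIM`, `check_flow_alignment`, and `load_flow_tables` are hypothetical names, and it assumes flow_features.parquet holds exactly the 20 feature columns with no extra ID columns. Real code should take its constants from common/data_contract.py instead of redefining them.

```python
from pathlib import Path

# Hypothetical local constant; the authoritative value lives in
# common/data_contract.py and should be imported from there in real code.
FLOW_DIM = 20  # width of the packet-derived flow schema


def check_flow_alignment(n_flow_rows, feature_shape, flow_dim=FLOW_DIM):
    """Raise ValueError unless the feature table is (n_flow_rows, flow_dim)."""
    n_rows, n_cols = feature_shape
    if n_rows != n_flow_rows:
        raise ValueError(
            f"row mismatch: {n_flow_rows} flows vs {n_rows} feature rows"
        )
    if n_cols != flow_dim:
        raise ValueError(f"expected {flow_dim} feature columns, got {n_cols}")


def load_flow_tables(processed_dir):
    """Load flows.parquet and flow_features.parquet and verify alignment."""
    # Imported lazily: pandas needs a parquet engine such as pyarrow.
    import pandas as pd

    root = Path(processed_dir)
    flows = pd.read_parquet(root / "flows.parquet")
    features = pd.read_parquet(root / "flow_features.parquet")
    check_flow_alignment(len(flows), features.shape)
    return flows, features
```

Calling `load_flow_tables("datasets/<name>/processed")` with a real dataset name substituted would return both tables, or fail early if the bundle is inconsistent.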

Current best results (Unified_CFM, λ=0.3, 3 seeds)

Shafir baselines verified from paper PDF tables — see artifacts/locked_baselines.md.

Task	Shafir 2026 SOTA	Our best	Δ
ISCXTor2016 (NonTor → Tor)	0.8731 (Table VI)	0.9945 ± 0.0011 (σ=0.1)	+0.121
CICIDS2017 within (10k/10k Shafir protocol)	0.9303 (Table VII)	0.9858 ± 0.0021 (σ=0.6)	+0.055
CICDDoS2019 within	0.93 (Table IX)	0.9958 ± 0.0010 (σ=0.1)	+0.066
CICIDS2017 → CICDDoS2019 cross (`terminal_norm`)	0.89 (Table IX, IDS→DDoS row)	0.9109 ± 0.0032 (σ=0.6)	+0.021
CICIDS2017 → CICDDoS2019 cross (`terminal_flow`)	0.89	0.9197 ± 0.0036	+0.030

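All four reported tasks exceed the Shafir 2026 baseline; the cross-dataset task is reported under two scores (terminal_norm and terminal_flow). The cross-dataset baseline was previously misread as 0.93; the IDS→DDoS direction in Shafir Table IX is 0.89.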
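The Δ column is simply our mean minus the Shafir baseline. As a quick arithmetic check, the sketch below recomputes it from the numbers in the table; the values are copied from the rows above, and the names `ROWS`, `deltas`, and `max_discrepancy` are illustrative only.

```python
# (task, Shafir baseline, our mean, reported delta), copied from the table.
ROWS = [
    ("ISCXTor2016 (NonTor -> Tor)", 0.8731, 0.9945, 0.121),
    ("CICIDS2017 within", 0.9303, 0.9858, 0.055),
    ("CICDDoS2019 within", 0.93, 0.9958, 0.066),
    ("CICIDS2017 -> CICDDoS2019 cross (terminal_norm)", 0.89, 0.9109, 0.021),
    ("CICIDS2017 -> CICDDoS2019 cross (terminal_flow)", 0.89, 0.9197, 0.030),
]


def deltas(rows):
    """Return (task, our mean - baseline) for every row."""
    return [(task, ours - base) for task, base, ours, _ in rows]


def max_discrepancy(rows):
    """Largest gap between a recomputed delta and the reported one."""
    return max(abs((ours - base) - reported) for _, base, ours, reported in rows)


for task, delta in deltas(ROWS):
    print(f"{task}: {delta:+.4f}")
```

Every recomputed difference agrees with the reported Δ to within rounding at the third decimal place.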

An additional architectural contribution is the flow_consistency diagnostic score, which rises from near-random (~0.6) to discriminative (~0.9) only when the model is trained with the masked-prediction consistency loss. On SSH-Patator, the hardest CICIDS2017 class for terminal_norm (0.64), flow_consistency reaches 0.94.

Authoritative result tables live in RESULTS.md (root) and artifacts/locked_baselines.md (Shafir baseline verification trail). Thresholded F1 / Precision / Recall / TPR@FPR under the unsupervised threshold protocol are in RESULTS_THRESHOLDED.md. The per-attack-family multi-seed analysis is in artifacts/phase25_multiseed_2026_04_25/PER_ATTACK_TABLE.md.
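For readers unfamiliar with TPR@FPR, the sketch below shows one common way to compute it. It is an illustration under stated assumptions, not the protocol used in RESULTS_THRESHOLDED.md: it assumes higher scores mean more anomalous, and that the threshold is chosen from benign scores alone so that at most a target fraction of benign samples is flagged. The function names are hypothetical.

```python
import math


def threshold_at_fpr(benign_scores, target_fpr):
    """Threshold such that at most target_fpr of benign scores exceed it."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(benign_scores)
    # Number of benign samples we allow strictly above the threshold.
    allowed = math.floor(target_fpr * len(ordered))
    return ordered[len(ordered) - allowed - 1]


def tpr_at_fpr(benign_scores, attack_scores, target_fpr=0.01):
    """Fraction of attack scores strictly above the benign-derived threshold."""
    thr = threshold_at_fpr(benign_scores, target_fpr)
    flagged = sum(1 for s in attack_scores if s > thr)
    return flagged / len(attack_scores)
```

Because the threshold depends only on benign scores, no attack labels are needed to set it; labels enter only when the resulting TPR is evaluated.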
