# Unified_CFM

A single multi-scale OT-CFM over one token sequence per flow:

```text
[FLOW_TOKEN, PACKET_1, ..., PACKET_T]
```

This is **not** a Flow-CFM + Packet-CFM ensemble. Flow-level and packet-level
signals interact inside one Transformer velocity field, and a Phase 2
masked-prediction consistency loss explicitly trains the cross-modal
dependency.

This is the **current SOTA model** in the repo (within-dataset SOTA on
ISCXTor2016 / CICIDS2017 / CICDDoS2019; near-SOTA cross-dataset).

## Model

`UnifiedTokenCFM` uses fixed tokenization to avoid latent-collapse shortcuts:

```text
flow token:   [type=-1, normalized 20-d canonical flow features, zero pad]
packet token: [type=+1, normalized 9-d packet features,           zero pad]
```
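Here is a minimal NumPy sketch of the fixed tokenization above. It assumes both token types are padded to a shared width of `1 + max(20, 9) = 21` channels, with the type flag first and the features after it. The padding width, the channel order, and the `tokenize` helper are illustrative assumptions, not the repo's code. Features are assumed to be normalized upstream.

```python
import numpy as np

FLOW_DIM = 20    # canonical flow features
PACKET_DIM = 9   # per-packet features
# Assumption: one shared token width = type channel + widest feature block.
TOKEN_DIM = 1 + max(FLOW_DIM, PACKET_DIM)


def tokenize(flow_feats: np.ndarray, packet_feats: np.ndarray) -> np.ndarray:
    """Build [FLOW_TOKEN, PACKET_1, ..., PACKET_T] for one flow (hypothetical helper).

    flow_feats:   (20,)   already-normalized flow features
    packet_feats: (T, 9)  already-normalized packet features
    returns:      (1 + T, TOKEN_DIM), zero-padded on the right
    """
    T = packet_feats.shape[0]
    tokens = np.zeros((1 + T, TOKEN_DIM), dtype=np.float32)
    tokens[0, 0] = -1.0                           # type = flow
    tokens[0, 1:1 + FLOW_DIM] = flow_feats
    tokens[1:, 0] = +1.0                          # type = packet
    tokens[1:, 1:1 + PACKET_DIM] = packet_feats
    return tokens
```

A flow with `T` packets yields a `(1 + T, 21)` array. Batching would also need `T` padded to a common length, plus a padding mask.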

Velocity field: 4-layer AdaLN-Zero Transformer (`d_model=128, n_heads=4`),
sinusoidal time embedding (`time_dim=64`). Total ≈ 1.23M parameters.
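The time-conditioning pattern named above can be sketched in NumPy. The sketch assumes a DiT-style block. A sinusoidal embedding of `t` (`time_dim=64`) goes through SiLU and a zero-initialised linear map, which emits a per-channel shift, scale, and gate. The gate multiplies the residual branch, so every block is the identity at initialisation. Function and class names are hypothetical, and the real model may insert an extra MLP between the embedding and the modulation.

```python
import numpy as np


def sinusoidal_time_embedding(t: np.ndarray, time_dim: int = 64,
                              max_period: float = 10000.0) -> np.ndarray:
    """Map times t of shape (B,) to (B, time_dim) cos/sin features."""
    half = time_dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)   # (half,)
    args = t[:, None] * freqs[None, :]                              # (B, half)
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Parameter-free LayerNorm over the last axis (AdaLN supplies scale/shift)."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class AdaLNZeroModulation:
    """SiLU + zero-initialised linear map: time embedding -> (shift, scale, gate)."""

    def __init__(self, time_dim: int = 64, d_model: int = 128):
        self.W = np.zeros((time_dim, 3 * d_model))   # zero init => gate == 0 at start
        self.b = np.zeros(3 * d_model)

    def __call__(self, temb: np.ndarray):
        h = temb / (1.0 + np.exp(-temb))              # SiLU
        shift, scale, gate = np.split(h @ self.W + self.b, 3, axis=-1)
        return shift, scale, gate                      # each (B, d_model)


def adaln_zero_residual(x, temb, mod, sublayer):
    """One AdaLN-Zero residual branch.

    x: (B, L, d_model); sublayer: any map (B, L, d) -> (B, L, d), e.g. attention or MLP.
    """
    shift, scale, gate = mod(temb)
    h = layer_norm(x) * (1 + scale[:, None, :]) + shift[:, None, :]
    return x + gate[:, None, :] * sublayer(h)
```

Zero-initialising the gate lets a deep stack start as an identity map, which is the usual motivation for AdaLN-Zero.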

Loss with Phase 2 consistency:

```text
L = L_main + λ_flow · L_mask_flow + λ_packet · L_mask_packet

L_main:        standard OT-CFM velocity regression with σ-band noise +
               Sinkhorn OT coupling.
L_mask_flow:   zero out the flow token's input at x_t; predict v[flow]
               from packet context only.
L_mask_packet: zero out a random 50% of real packet tokens at x_t;
               predict their velocities from flow + remaining packets.
```
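The three terms can be combined into one loss evaluation, sketched in NumPy below under several assumptions:

- `velocity_fn` stands in for the Transformer.
- Noise `x0` is already OT-paired with data `x1`.
- The σ-band is taken as `x_t = (1 - t) x0 + t x1 + σ ε`, with target `x1 - x0`.
- Packets are hidden with independent 50% Bernoulli draws, so roughly half (not exactly half) are masked.

All names are hypothetical.

```python
import numpy as np


def cfm_losses(velocity_fn, x0, x1, real_mask, sigma=0.6,
               lambda_flow=0.3, lambda_packet=0.3, packet_mask_ratio=0.5, rng=None):
    """L = L_main + lambda_flow * L_mask_flow + lambda_packet * L_mask_packet (sketch).

    velocity_fn(x_t, t) -> (B, L, D); token 0 is the flow token, 1..T are packets.
    x0: noise (B, L, D); x1: tokenized data (B, L, D), assumed already OT-paired.
    real_mask: (B, L) bool, True for real (non-padding) tokens.
    """
    if rng is None:
        rng = np.random.default_rng()
    B = x0.shape[0]
    t = rng.uniform(size=(B, 1, 1))
    eps = rng.normal(size=x0.shape)
    x_t = (1 - t) * x0 + t * x1 + sigma * eps   # assumption: σ-band around the straight path
    u_t = x1 - x0                                # regression target

    def masked_mse(v, sel):
        err = ((v - u_t) ** 2).sum(-1)           # (B, L) squared error per token
        return (err * sel).sum() / max(sel.sum(), 1)

    # L_main: velocity regression on every real token.
    l_main = masked_mse(velocity_fn(x_t, t), real_mask)

    # L_mask_flow: zero the flow token's input, score only its velocity.
    x_no_flow = x_t.copy()
    x_no_flow[:, 0, :] = 0.0
    flow_sel = np.zeros_like(real_mask)
    flow_sel[:, 0] = True
    l_flow = masked_mse(velocity_fn(x_no_flow, t), flow_sel)

    # L_mask_packet: hide ~packet_mask_ratio of real packets, score only those.
    packet_sel = real_mask.copy()
    packet_sel[:, 0] = False
    hide = packet_sel & (rng.uniform(size=packet_sel.shape) < packet_mask_ratio)
    x_hidden = np.where(hide[..., None], 0.0, x_t)
    l_packet = masked_mse(velocity_fn(x_hidden, t), hide)

    return l_main + lambda_flow * l_flow + lambda_packet * l_packet
```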

Best hyperparameters from the σ × λ sweeps:

```text
lambda_flow = lambda_packet = 0.3
packet_mask_ratio = 0.5
sigma = 0.6   # best cross-dataset; σ=0.1 is marginally better on some within-dataset runs
use_ot = True
```
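With `use_ot = True`, each minibatch of noise is paired with data through an optimal-transport plan rather than at random. Below is a minimal NumPy sketch of one way to do this: entropic (Sinkhorn) OT with uniform marginals and squared-Euclidean cost, followed by resampling pairs from the plan. The regularisation value, the cost normalisation, and the helper names are assumptions.

```python
import numpy as np


def sinkhorn_plan(x0, x1, reg=0.05, n_iters=200):
    """Entropic OT plan between two equal-size minibatches with uniform marginals.

    x0, x1: (B, ...) arrays, flattened per sample. Returns a (B, B) plan.
    """
    a = x0.reshape(len(x0), -1)
    b = x1.reshape(len(x1), -1)
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # squared Euclidean
    cost = cost / max(cost.max(), 1e-12)                    # assumption: scale cost to [0, 1]
    K = np.exp(-cost / reg)
    n = len(a)
    mu = nu = np.full(n, 1.0 / n)
    u = np.ones(n)
    v = np.ones(n)
    for _ in range(n_iters):                                # alternating marginal projections
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]


def ot_pair(x0, x1, rng, use_ot=True, reg=0.05):
    """Resample (noise, data) pairs from the OT plan; keep the given pairing if use_ot=False."""
    if not use_ot:
        return x0, x1
    plan = sinkhorn_plan(x0, x1, reg)
    p = plan.ravel() / plan.sum()
    idx = rng.choice(p.size, size=len(x0), p=p)             # sample with replacement
    i, j = np.divmod(idx, len(x0))
    return x0[i], x1[j]
```

Resampling with replacement is one common way to turn a soft plan into hard pairs. Taking the row-wise argmax is a deterministic alternative.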

## Scores

The model exposes four groups of scores at inference:

```text
# primary
terminal_norm

# decomposed (analysis only)
terminal_flow         terminal_packet
arc_length            kinetic_energy   kinetic_flow   kinetic_packet
velocity_total        velocity_flow    velocity_packet

# Phase 1 diagnostics
curvature_total       curvature_flow   curvature_packet      # ∫ ||dv/dt||² dt
kappa2_speed2norm_packet_{mean,median,trimmed10_mean}        # packet curvature / speed²
jacobian_total        jacobian_flow    jacobian_packet       # Hutchinson VJP estimate of ||∂v/∂x||_F²
velocity_*_t{01..10}                                          # 18 time-profile scores

# Phase 2 cross-modal consistency
flow_consistency      packet_consistency      consistency_total
```
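Several of these trajectory scores can be read off a single ODE solve, under the following assumptions:

- Scoring integrates the learned field backwards from the data end (`t = 1`) to the noise end (`t = 0`) with fixed-step Euler.
- `terminal_*` are L2 norms of the endpoint, split into the flow token and the packet tokens.
- The integrals are Riemann sums over the solver steps.

None of these details are stated above, and the step count and names are hypothetical.

```python
import numpy as np


def trajectory_scores(velocity_fn, x1, n_steps=10):
    """Euler-integrate dx/dt = v(x, t) from t=1 (data) to t=0 and score the path.

    x1: (L, D) tokenized flow; token 0 = flow token, tokens 1..T = packets.
    Assumption: higher scores mean "more anomalous".
    """
    dt = 1.0 / n_steps
    x = x1.copy()
    vs = []
    for k in range(n_steps):
        t = 1.0 - k * dt
        v = velocity_fn(x, t)
        vs.append(v)
        x = x - dt * v                                   # step backwards in time
    vs = np.stack(vs)                                    # (n_steps, L, D)
    speed = np.linalg.norm(vs.reshape(n_steps, -1), axis=-1)
    dv_dt = np.diff(vs, axis=0) / dt                     # finite-difference dv/dt
    return {
        "terminal_norm": float(np.linalg.norm(x)),
        "terminal_flow": float(np.linalg.norm(x[0])),
        "terminal_packet": float(np.linalg.norm(x[1:])),
        "arc_length": float(speed.sum() * dt),           # ≈ ∫ ||v|| dt
        "kinetic_energy": float((speed ** 2).sum() * dt),  # ≈ ∫ ||v||² dt
        "curvature_total": float((dv_dt ** 2).sum() * dt),  # ≈ ∫ ||dv/dt||² dt
    }
```

As a sanity check, a constant field traces a straight line. Curvature is then 0, arc length equals the speed, and kinetic energy equals the squared speed.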

`terminal_norm` is the paper's primary score. The decomposed and diagnostic
scores are for **per-attack-family analysis**; they are NOT competing
SOTA claims. Across all our runs, the multi-seed std of the `terminal_norm`
AUROC is ≤ 0.005.

The Phase 2 consistency scores have a notable property: they are
**discriminative only when the model is trained with the consistency loss**.
On a baseline model, `flow_consistency` is near chance (AUROC 0.57 on
CICIDS2017); after Phase 2 training it rises to 0.88. On SSH-Patator,
where standard density scores struggle (`terminal_norm` AUROC 0.64), Phase 2
`flow_consistency` reaches 0.94.
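One way to turn the masked-flow objective into a per-sample score is to compare two predictions of the flow-token velocity: one made with the flow token's own input and one made without it. The definition below is a hypothetical illustration, not necessarily how `flow_consistency` is computed in the repo. It averages the squared discrepancy over a few interpolation times, with `velocity_fn` again standing in for the trained Transformer.

```python
import numpy as np


def flow_consistency(velocity_fn, x1, ts=(0.25, 0.5, 0.75), sigma=0.0, rng=None):
    """Hypothetical score: change in the flow-token velocity when its input is hidden.

    x1: (L, D) tokenized flow, token 0 = flow token.
    Larger values mean the packets predict a different flow velocity than the flow token itself.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    score = 0.0
    for t in ts:
        x0 = rng.normal(size=x1.shape)
        x_t = (1 - t) * x0 + t * x1 + sigma * rng.normal(size=x1.shape)
        x_masked = x_t.copy()
        x_masked[0] = 0.0                          # hide the flow token's input
        v_full = velocity_fn(x_t, t)[0]
        v_masked = velocity_fn(x_masked, t)[0]
        score += float(((v_full - v_masked) ** 2).sum())
    return score / len(ts)
```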

## Train

```bash
# baseline (no consistency loss)
uv run python Unified_CFM/train.py --config Unified_CFM/configs/cicids2017_baseline.yaml

# Phase 2 with consistency loss (λ=0.1, σ=0.1)
uv run python Unified_CFM/train.py --config Unified_CFM/configs/cicids2017_consistency.yaml

# σ × λ sweeps and multi-seed orchestrators live in
# artifacts/verify_2026_04_24/run_*.sh
```

Runs should use the workspace-canonical 20-d flow feature file, which is
derived from the packet data:

```yaml
flow_features_path: datasets/cicids2017/processed/flow_features.parquet
flow_features_align: auto
```

`flow_features.parquet` is row-aligned with the Packet_CFM artifacts via
`flow_id`. With `flow_features_align: auto`, the loader uses direct
row/`flow_id` alignment when possible. Scan alignment is kept only for
legacy caches derived from the full CSVs.
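The `auto` decision can be sketched as follows. If the two `flow_id` columns match exactly, rows are used directly. Otherwise each packet-side `flow_id` is looked up in the feature table. This is a hypothetical sketch: the function name, the return convention, and the error path are assumptions. The real loader's legacy scan alignment is not reproduced here; the sketch raises an error instead.

```python
import numpy as np


def align_flow_features(packet_flow_ids, feat_flow_ids, feats):
    """Return (features in packet-artifact order, alignment mode). Hypothetical sketch."""
    packet_flow_ids = np.asarray(packet_flow_ids)
    feat_flow_ids = np.asarray(feat_flow_ids)
    # Fast path: both files list the same flows in the same order.
    if np.array_equal(packet_flow_ids, feat_flow_ids):
        return feats, "row"
    # Otherwise: join on flow_id.
    index = {fid: i for i, fid in enumerate(feat_flow_ids.tolist())}
    missing = [fid for fid in packet_flow_ids.tolist() if fid not in index]
    if not missing:
        rows = np.array([index[fid] for fid in packet_flow_ids.tolist()], dtype=np.int64)
        return feats[rows], "flow_id"
    # Assumption: this is where a legacy scan alignment would take over.
    raise ValueError(f"{len(missing)} flow_ids not found; legacy scan alignment required")
```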

For large datasets where a monolithic `packets.npz` would exceed memory,
the loader supports the sharded backend:

```yaml
source_store: datasets/cicddos2019/processed/full_store
val_cap: 20000
attack_cap: 20000
```
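With a sharded store, caps such as `val_cap` and `attack_cap` can be applied before any shard is read. Global indices are sampled first and then mapped back to `(shard, local index)` pairs. The minimal sketch below assumes uniform sampling without replacement under a fixed seed. The helper names and the sampling rule are assumptions, not the loader's documented behaviour.

```python
import numpy as np


def capped_indices(n_total, cap, seed=0):
    """Sorted indices of at most `cap` items, drawn uniformly without replacement."""
    if cap is None or n_total <= cap:
        return np.arange(n_total)
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_total, size=cap, replace=False))


def iter_capped(shard_sizes, cap, seed=0):
    """Yield (shard_id, local_indices) so only the capped subset ever needs loading."""
    offsets = np.concatenate([[0], np.cumsum(shard_sizes)])      # shard start positions
    keep = capped_indices(int(offsets[-1]), cap, seed)
    shard_of = np.searchsorted(offsets, keep, side="right") - 1  # owning shard per index
    for s in range(len(shard_sizes)):
        local = keep[shard_of == s] - offsets[s]
        if local.size:
            yield s, local
```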

If `flow_features_path` is empty, the loader derives compact 16-d flow-level
statistics from the packet sequence. That fallback is for debugging only;
new runs should use the canonical 20-d file generated by
`scripts/generate_flow_features.py`.

## Evaluation

`artifacts/verify_2026_04_24/eval_phase1_unified.py` runs the full Phase 1 +
Phase 2 score battery on a trained checkpoint and reports per-attack-class AUROC.

`artifacts/verify_2026_04_24/eval_phase2_cross_cicddos2019.py` runs
cross-dataset CICIDS2017→CICDDoS2019 evaluation under the standard
10k benign + 10k stratified attack protocol.
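The cross-dataset protocol and the per-attack-class AUROC can be sketched as follows. The helper names and the proportional-quota stratification are assumptions. Quotas are rounded down, so the attack total can fall slightly short of 10k. The benign side would be drawn with a plain `rng.choice(n_benign, 10_000, replace=False)`. AUROC uses the rank-sum (Mann-Whitney) form, with ties counted as ½ and higher scores treated as more anomalous.

```python
import numpy as np


def stratified_attack_sample(attack_labels, n_total=10_000, seed=0):
    """Indices of ~n_total attack rows, with per-class quotas proportional to class frequency."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(attack_labels)
    classes, counts = np.unique(labels, return_counts=True)
    quota = np.minimum(counts * n_total // counts.sum(), counts)   # integer, rounded down
    picked = [rng.choice(np.flatnonzero(labels == c), size=q, replace=False)
              for c, q in zip(classes, quota)]
    return np.concatenate(picked)


def auroc(benign_scores, attack_scores):
    """P(attack score > benign score) via average ranks; ties count 1/2."""
    s = np.concatenate([benign_scores, attack_scores])
    order = np.argsort(s, kind="mergesort")
    _, inv, cnt = np.unique(s[order], return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(cnt) - (cnt - 1) / 2.0    # 1-based average rank per distinct value
    ranks = np.empty(len(s))
    ranks[order] = avg_rank[inv]
    n, m = len(benign_scores), len(attack_scores)
    return (ranks[n:].sum() - m * (m + 1) / 2.0) / (n * m)


def per_class_auroc(benign_scores, attack_scores, attack_labels):
    """AUROC of benign vs. each attack class separately."""
    labels = np.asarray(attack_labels)
    return {str(c): auroc(benign_scores, attack_scores[labels == c])
            for c in np.unique(labels)}
```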