baselines: add 3x3 cross-dataset runners for IF/OCSVM (path A + B) and Shafir NF

New scripts under scripts/baselines/: - run_if_ocsvm_cross.py - 20-d canonical flow features (path A) - run_if_ocsvm_cross_packets.py - raw 576-d packet sequence (path B) - run_shafir_nf_cross.py - single-NF on 5-d SHAFIR5 subset or 20-d - *_all.sh - 3 sources x 3 targets x 3 seeds sweepers New aggregator scripts/aggregate/baselines_cross_3x3_table.py builds a Markdown 3x3 matrix per method from per-cell NPZ outputs. RESULTS.md gains a "Shallow-baseline 3x3 cross matrices" subsection pointing at the new artifact directories. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mixed_CFM: absorb Unified_CFM primitives; remove Unified_CFM
2026-05-12 17:41:20 +08:00 · 2026-05-11 14:18:11 +08:00 · 2026-05-11 09:09:04 +08:00 · 2026-05-11 08:58:36 +08:00 · 2026-05-11 08:53:19 +08:00 · 2026-05-11 00:03:34 +08:00
45 changed files with 1628 additions and 2641 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -31,3 +31,8 @@ Thumbs.db
 /janus_figures_*/

 *.tmp
+
+CLAUDE.md
+.gitignore
+
+drafts/
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,172 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Repo shape
-
-This is a **workspace-style repo with three sibling model packages** plus a
-shared data contract. The root intentionally keeps only workspace-level
-files; all model/training/eval code lives under one of the three packages.
-
- `common/data_contract.py` — **single source of truth** for the canonical
-  9-d packet schema (`PACKET_FEATURE_NAMES`) and 20-d packet-derived flow
-  schema (`CANONICAL_FLOW_FEATURE_NAMES`), label normalization, canonical
-  5-tuple, packet preprocessing helpers, and `compute_flow_features_from_packets`.
-  All three packages import from here.
- `Packet_CFM/` — packet-sequence OT-CFM with explicit σ-band benign
-  distribution learning. Has its own `CLAUDE.md` for internal details.
- `Flow_CFM/` — flow-level CFM on the workspace-canonical 20-d packet-derived
-  `flow_features.parquet`. Legacy 61-d CICFlowMeter CSV caches are still
-  available only for reproduction via the `--legacy-csv-features` flag.
- `Unified_CFM/` — **current SOTA model**. Unified token CFM over
-  `[FLOW_TOKEN, PACKET_1, ..., PACKET_T]` with masked-prediction consistency
-  loss (Phase 2). All within-dataset SOTAs (ISCXTor2016 / CICIDS2017 /
-  CICDDoS2019) come from here.
- `scripts/` — **workspace-level** scripts shared across all packages:
-  - `download/` — UNB/CIC dataset downloaders (Token-cookie + `cic_download.py`
-    recursive crawler). See `scripts/download/README.md` before touching.
-  - `extract_<dataset>.py` + `extract_lib.py` — pcap→artifact drivers that write
-    `datasets/<name>/processed/{packets.npz, flows.parquet, flow_features.parquet}`,
-    all row-aligned by `flow_id = arange(N)`.
-  - `generate_flow_features.py` — one-shot tool to upgrade an existing
-    `packets.npz` + `flows.parquet` pair to a canonical `flow_features.parquet`
-    without re-extracting pcap. Supports `--source-store` for sharded stores.
-  - `csv_adapter.py`, `convert_npz_splits_to_store.py`, `eval_cross_dataset_protocol.py`,
-    `merge_*.py`, `auto_transfer_*.sh` — cross-package tooling.
- `datasets/<name>/raw/` and `datasets/<name>/processed/` — shared dataset store.
- `artifacts/{runs,phase0_*,phase1_*,phase25_*,verify_*}/` — **all outputs go
-  here**, not `runs/` at root. Phase summary reports live in `artifacts/phase*/`.
- `paper/` — paper PDFs we compare against (Shafir 2026 NF, ConMD 2026,
-  TIPSO-GAN 2026, Lipman 2210.02747).
-
-There is no `archive_v1/` at root; old flow-stat v1 code has been removed.
-`Flow_CFM/checkpoints_archive/` retains historical checkpoints for reproduction.
-
-## Data contract (read this before touching data code)
-
-Every processed dataset under `datasets/<name>/processed/` ships an aligned
-triple, all with the same row order (`flow_id = arange(N)`):
-
-```
-packets.npz          # packet_tokens [N, T_full, 9], packet_lengths [N], flow_id [N]
-                     # OR full_store/ (PacketShardStore directory) for large datasets
-flows.parquet        # flow_id + label + 5-tuple metadata (src_ip, dst_ip, ports, protocol)
-flow_features.parquet  # flow_id + label + 20 canonical packet-derived features
-```
-
-Optional / legacy:
- `flow_features_csv.parquet` — Flow_CFM's 61-d CICFlowMeter cache (paper
-  reproduction only; not row-aligned with packets in general)
-
-The 20 canonical flow features are computed by
-`common.data_contract.compute_flow_features_from_packets(packet_tokens, lens)`
-and cover Shafir 2026's top-SHAP categories (size/IAT/active-idle/rate/flags)
-in a packet-derivable way.
-
-## Python env
-
- `requires-python = ">=3.14"`; PyTorch pinned to the `pytorch-cu128` index
-  (`torch>=2.9.1`), plus `mamba-ssm`, `causal-conv1d`, `scapy`, `dpkt`, `pyarrow`.
- Two `pyproject.toml` files: root (`/pyproject.toml`) and `Packet_CFM/pyproject.toml`.
-  They are **not declared as a uv workspace** — each resolves independently.
-  Run `uv run ...` from whichever directory owns the entry point you are invoking.
- `Flow_CFM/` and `Unified_CFM/` have no `pyproject.toml`; they use the root
-  venv (`uv run --no-sync python <script.py>`).
- Scripts under `scripts/download/` are pure stdlib — invoke with `python3`.
-
-## Running things
-
-**Unified_CFM** (SOTA model, run from `Unified_CFM/`):
-
-```bash
-cd Unified_CFM
-uv run --no-sync python train.py --config configs/cicids2017_baseline.yaml
-# Phase 2 with consistency loss:
-uv run --no-sync python train.py --config configs/cicids2017_consistency.yaml
-```
-
-Best hyperparameters from the σ × λ sweeps:
- `lambda_flow = lambda_packet = 0.3`
- `sigma = 0.6` for cross-dataset transfer
- `sigma = 0.1` is fine for within-dataset (and marginally better on ISCXTor2016)
-
-**Phase 1 / 2 evaluation**:
-
-```bash
-# Per-attack-class AUROC over 34 scores (terminal_norm primary, plus curvature,
-# Jacobian-Hutchinson, time-profile velocity, flow_consistency diagnostics).
-uv run --no-sync python artifacts/verify_2026_04_24/eval_phase1_unified.py \
-  --model-dir <model_dir> --out-dir <eval_dir> \
-  --batch-size 256 --jacobian-n-eps 4 \
-  --n-val-cap 10000 --n-atk-cap 30000
-
-# Cross-dataset CICIDS2017 → CICDDoS2019:
-uv run --no-sync python artifacts/verify_2026_04_24/eval_phase2_cross_cicddos2019.py \
-  --model-dir <model_dir> --out <result.json> \
-  --n-benign 10000 --n-attack 10000 --seed 42
-```
-
-**Packet_CFM entry points** (run from `Packet_CFM/`):
-
-```bash
-cd Packet_CFM
-uv run python -m train          --config configs/n10k.yaml
-uv run python -m detect         --save-dir ../artifacts/runs/<run>
-uv run python -m eval.per_class --save-dir ../artifacts/runs/<run>
-uv run python -m run_phase1     --sigmas 0.0 0.1 0.2 0.3
-```
-
-**Flow_CFM entry points** (run from `Flow_CFM/`): see `Flow_CFM/README_migration.md`.
-
-**Tests**:
-
-```bash
-uv run --no-sync python -m pytest Packet_CFM/tests/ tests/common/ Unified_CFM/tests/
-```
-
-(43 passing — common data contract + Unified_CFM Phase 1/2 score functions
-+ Packet_CFM existing tests.)
-
-## Adding a new dataset
-
-Write one driver at `scripts/extract_<name>.py` that calls
-`extract_lib.extract_dataset(...)` (see `scripts/extract_cicids2017.py` as
-the reference template). The driver hardcodes CSV column names, timestamp
-formats, benign aliases, and drop patterns as module constants, then feeds
-`extract_lib` a per-day `(canonical_key → [(row_idx, ts_epoch)])` mapping
-and a per-day pcap file map. No YAML is needed.
-
-The extract pipeline writes all three artifacts (packets.npz, flows.parquet,
-flow_features.parquet) row-aligned. To upgrade an existing artifact pair
-that lacks `flow_features.parquet`, run
-`scripts/generate_flow_features.py --packets-npz ... --flows-parquet ... --out ...`
-(or `--source-store` for sharded stores).
-
-Common gotcha: if CSV timestamps and pcap epochs are in different time zones,
-`extract_lib` prints a diagnostic with the recommended `--time-offset`; rerun
-with that value.
-
-## Conventions worth preserving
-
- Do not create a new `runs/` at repo root — outputs belong under `artifacts/`.
- `scripts/download/` stays at the root (shared by all packages).
- When adding new cross-package tooling, put it in root `scripts/`. Only move
-  it into `Packet_CFM/scripts/` if it depends on that package's imports.
- Phase reports live in `artifacts/phase*/` — keep the timestamp suffix
-  (`_2026_04_25`) so future runs don't overwrite history.
- The 9-d packet schema and 20-d canonical flow schema are FIXED in
-  `common/data_contract.py`. Do not extend them ad-hoc; if you need new
-  features, propose them with evidence (Shafir-style SHAP analysis or
-  Phase 1-style per-attack ablation).
-
-## Current state of the work (2026-04-25)
-
- Phase 0 baselines + Shafir-protocol verification: ✓
- Phase 1 (34-score expansion + per-attack-class table): ✓
- Phase 2 (masked-prediction consistency loss): ✓ — multi-seed at λ=0.3
- Phase 2.5 (σ × λ sweep + multi-seed at σ=0.6): ✓
- Cross-dataset multi-seed: ✓ — also SOTA after baseline lock
- Shafir baselines locked from PDF: ✓ — `artifacts/locked_baselines.md`
- 4 of 4 reported tasks beat Shafir SOTA (final table: `RESULTS.md`)
- Architecture is finalized; remaining work is paper writing
-  (P1 skeleton, P2 thresholded F1/Precision/Recall metrics).
--- a/Mixed_CFM/_layers.py
+++ b/Mixed_CFM/_layers.py
@@ -0,0 +1,59 @@
+from __future__ import annotations
+import math
+import torch
+import torch.nn as nn
+
+
+@torch.no_grad()
+def _sinkhorn_coupling(C: torch.Tensor, reg: float=0.05, n_iter: int=20) -> torch.Tensor:
+    C = C.float()
+    log_k = -C / reg
+    B = C.shape[0]
+    log_u = torch.zeros(B, device=C.device)
+    log_v = torch.zeros(B, device=C.device)
+    for _ in range(n_iter):
+        log_v = -torch.logsumexp(log_k + log_u.unsqueeze(1), dim=0)
+        log_u = -torch.logsumexp(log_k + log_v.unsqueeze(0), dim=1)
+    log_p = log_u.unsqueeze(1) + log_k + log_v.unsqueeze(0)
+    return log_p.argmax(dim=1)
+
+
+class SinusoidalTimeEmb(nn.Module):
+
+    def __init__(self, dim: int) -> None:
+        super().__init__()
+        if dim % 2 != 0:
+            raise ValueError('time embedding dimension must be even')
+        self.dim = dim
+
+    def forward(self, t: torch.Tensor) -> torch.Tensor:
+        half = self.dim // 2
+        freqs = torch.exp(-math.log(10000) * torch.arange(half, device=t.device, dtype=t.dtype) / max(half - 1, 1))
+        args = t[:, None] * freqs[None, :]
+        return torch.cat([args.sin(), args.cos()], dim=-1)
+
+
+class AdaLNBlock(nn.Module):
+
+    def __init__(self, d_model: int, n_heads: int, mlp_ratio: float, cond_dim: int) -> None:
+        super().__init__()
+        self.norm1 = nn.LayerNorm(d_model, elementwise_affine=False)
+        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
+        self.norm2 = nn.LayerNorm(d_model, elementwise_affine=False)
+        hidden = int(d_model * mlp_ratio)
+        self.mlp = nn.Sequential(nn.Linear(d_model, hidden), nn.GELU(), nn.Linear(hidden, d_model))
+        self.cond_proj = nn.Linear(cond_dim, 6 * d_model)
+        nn.init.zeros_(self.cond_proj.weight)
+        nn.init.zeros_(self.cond_proj.bias)
+
+    @staticmethod
+    def _modulate(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
+        return x * (1.0 + gamma[:, None, :]) + beta[:, None, :]
+
+    def forward(self, x: torch.Tensor, cond: torch.Tensor, key_padding_mask: torch.Tensor | None, attn_mask: torch.Tensor | None=None) -> torch.Tensor:
+        (g1, b1, a1, g2, b2, a2) = self.cond_proj(cond).chunk(6, dim=-1)
+        h = self._modulate(self.norm1(x), g1, b1)
+        (attn_out, _) = self.attn(h, h, h, key_padding_mask=key_padding_mask, attn_mask=attn_mask, need_weights=False)
+        x = x + a1[:, None, :] * attn_out
+        h = self._modulate(self.norm2(x), g2, b2)
+        return x + a2[:, None, :] * self.mlp(h)
--- a/Mixed_CFM/data.py
+++ b/Mixed_CFM/data.py
@@ -7,19 +7,116 @@ import pandas as pd
 import sys as _sys
 from pathlib import Path as _Path
 _sys.path.insert(0, str(_Path(__file__).resolve().parents[1]))
-from common.data_contract import PACKET_FEATURE_NAMES, PACKET_CONTINUOUS_CHANNEL_IDX, PACKET_BINARY_CHANNEL_IDX, fit_packet_stats as _fit_packet_stats, zscore as _zscore
-import importlib.util as _ilu
-_UDATA_NAME = 'unified_cfm_data'
-if _UDATA_NAME not in _sys.modules:
-    _udata_spec = _ilu.spec_from_file_location(_UDATA_NAME, _Path(__file__).resolve().parents[1] / 'Unified_CFM' / 'data.py')
-    _udata = _ilu.module_from_spec(_udata_spec)
-    _sys.modules[_UDATA_NAME] = _udata
-    _udata_spec.loader.exec_module(_udata)
-else:
-    _udata = _sys.modules[_UDATA_NAME]
-DEFAULT_FLOW_META_COLUMNS = _udata.DEFAULT_FLOW_META_COLUMNS
-_read_aligned_flow_features = _udata._read_aligned_flow_features
-_preprocess_flow = _udata._preprocess_flow
+from common.data_contract import (
+    PACKET_FEATURE_NAMES,
+    PACKET_CONTINUOUS_CHANNEL_IDX,
+    PACKET_BINARY_CHANNEL_IDX,
+    canonical_5tuple as _canonical_key,
+    fit_packet_stats as _fit_packet_stats,
+    zscore as _zscore,
+)
+
+DEFAULT_FLOW_META_COLUMNS = {'flow_id', 'label', 'day', 'service', 'src_ip', 'dst_ip', 'src_port', 'dst_port', 'protocol', 'timestamp', 'start_ts', 'n_pkts'}
+
+
+def _read_flow_features(path: Path, *, expected_rows: int, feature_columns: Optional[list[str]]=None) -> tuple[np.ndarray, tuple[str, ...], np.ndarray | None]:
+    path = Path(path)
+    if path.suffix == '.npz':
+        data = np.load(path, allow_pickle=True)
+        x = data['features'].astype(np.float32)
+        raw_names = data['feature_names'] if 'feature_names' in data.files else np.arange(x.shape[1])
+        names = tuple((str(v) for v in raw_names))
+        flow_id = data['flow_id'] if 'flow_id' in data.files else None
+    elif path.suffix in ('.parquet', '.pq'):
+        df = pd.read_parquet(path)
+        flow_id = df['flow_id'].to_numpy() if 'flow_id' in df.columns else None
+        if feature_columns:
+            cols = feature_columns
+        else:
+            cols = [c for c in df.columns if c not in DEFAULT_FLOW_META_COLUMNS and pd.api.types.is_numeric_dtype(df[c])]
+        if not cols:
+            raise ValueError(f'no numeric flow feature columns found in {path}')
+        x = df[cols].to_numpy(dtype=np.float32)
+        names = tuple(cols)
+    else:
+        raise ValueError(f'unsupported flow feature file: {path}')
+    if len(x) != expected_rows:
+        raise ValueError(f'flow feature row count {len(x):,} != packet row count {expected_rows:,}')
+    x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
+    return (x, names, flow_id)
+
+
+def _feature_columns_from_df(df: pd.DataFrame, requested: Optional[list[str]]) -> list[str]:
+    if requested:
+        return requested
+    return [c for c in df.columns if c not in DEFAULT_FLOW_META_COLUMNS and pd.api.types.is_numeric_dtype(df[c])]
+
+
+def _align_flow_features_by_scan(feature_df: pd.DataFrame, packet_flows: pd.DataFrame, *, feature_columns: list[str]) -> tuple[np.ndarray, tuple[str, ...]]:
+    required = ['label', 'src_ip', 'src_port', 'dst_ip', 'dst_port', 'protocol']
+    missing_feature = [c for c in required if c not in feature_df.columns]
+    missing_packet = [c for c in required if c not in packet_flows.columns]
+    if missing_feature or missing_packet:
+        raise ValueError(f'scan alignment requires label + 5-tuple metadata. missing in feature_df={missing_feature}, packet_flows={missing_packet}')
+    packet_keys = [(str(lbl), _canonical_key(src, sp, dst, dp, proto)) for (lbl, src, sp, dst, dp, proto) in zip(packet_flows['label'].to_numpy(), packet_flows['src_ip'].to_numpy(), packet_flows['src_port'].to_numpy(), packet_flows['dst_ip'].to_numpy(), packet_flows['dst_port'].to_numpy(), packet_flows['protocol'].to_numpy())]
+    labels = feature_df['label'].to_numpy()
+    src_ip = feature_df['src_ip'].to_numpy()
+    src_port = feature_df['src_port'].to_numpy()
+    dst_ip = feature_df['dst_ip'].to_numpy()
+    dst_port = feature_df['dst_port'].to_numpy()
+    protocol = feature_df['protocol'].to_numpy()
+    matched: list[int] = []
+    j = 0
+    n_csv = len(feature_df)
+    for (i, target) in enumerate(packet_keys):
+        while j < n_csv:
+            cand = (str(labels[j]), _canonical_key(src_ip[j], src_port[j], dst_ip[j], dst_port[j], protocol[j]))
+            j += 1
+            if cand == target:
+                matched.append(j - 1)
+                break
+        else:
+            raise ValueError(f'failed to align packet flow row {i:,}/{len(packet_keys):,}; the CSV cache may not be the same one used for packet extraction')
+    print(f'[data] scan-aligned CSV flow features: matched={len(matched):,} from csv_rows={n_csv:,} skipped={matched[-1] + 1 - len(matched):,}')
+    x = feature_df.iloc[matched][feature_columns].to_numpy(dtype=np.float32)
+    x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
+    return (x, tuple(feature_columns))
+
+
+def _read_aligned_flow_features(path: Path, packet_flows: pd.DataFrame, *, feature_columns: Optional[list[str]]=None, align: str='auto') -> tuple[np.ndarray, tuple[str, ...]]:
+    path = Path(path)
+    if align not in ('auto', 'row', 'scan'):
+        raise ValueError("flow_features_align must be 'auto', 'row', or 'scan'")
+    if path.suffix == '.npz':
+        (x, names, flow_id) = _read_flow_features(path, expected_rows=len(packet_flows), feature_columns=feature_columns)
+        packet_id = packet_flows['flow_id'].to_numpy() if 'flow_id' in packet_flows else None
+        if flow_id is not None and packet_id is not None and (not np.array_equal(flow_id, packet_id)):
+            raise ValueError('NPZ flow_id does not align with Packet_CFM flows')
+        return (x, names)
+    if path.suffix not in ('.parquet', '.pq'):
+        raise ValueError(f'unsupported flow feature file: {path}')
+    feature_df = pd.read_parquet(path)
+    cols = _feature_columns_from_df(feature_df, feature_columns)
+    if not cols:
+        raise ValueError(f'no numeric flow feature columns found in {path}')
+    packet_id = packet_flows['flow_id'].to_numpy() if 'flow_id' in packet_flows else None
+    if len(feature_df) == len(packet_flows):
+        feature_id = feature_df['flow_id'].to_numpy() if 'flow_id' in feature_df.columns else None
+        if feature_id is None or packet_id is None or np.array_equal(feature_id, packet_id):
+            x = feature_df[cols].to_numpy(dtype=np.float32)
+            x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
+            return (x, tuple(cols))
+        if align == 'row':
+            raise ValueError("flow_id mismatch with flow_features_align='row'")
+    if align == 'row':
+        raise ValueError(f'row alignment requested but feature rows={len(feature_df):,} packet rows={len(packet_flows):,}')
+    return _align_flow_features_by_scan(feature_df, packet_flows, feature_columns=cols)
+
+
+def _preprocess_flow(train: np.ndarray, val: np.ndarray, attack: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
+    mean = train.mean(axis=0).astype(np.float32)
+    std = train.std(axis=0).astype(np.float32)
+    return (_zscore(train, mean, std), _zscore(val, mean, std), _zscore(attack, mean, std), mean, std)

@dataclass
 class MixedData:
--- a/Mixed_CFM/model.py
+++ b/Mixed_CFM/model.py
@@ -1,23 +1,14 @@
 from __future__ import annotations
 import math
+import sys as _sys
 from dataclasses import dataclass
+from pathlib import Path as _Path
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
-import importlib.util as _ilu
-import sys as _sys
-from pathlib import Path as _Path
-_UNIFIED_NAME = 'unified_cfm_model'
-if _UNIFIED_NAME not in _sys.modules:
-    _unified_spec = _ilu.spec_from_file_location(_UNIFIED_NAME, _Path(__file__).resolve().parents[1] / 'Unified_CFM' / 'model.py')
-    _unified = _ilu.module_from_spec(_unified_spec)
-    _sys.modules[_UNIFIED_NAME] = _unified
-    _unified_spec.loader.exec_module(_unified)
-else:
-    _unified = _sys.modules[_UNIFIED_NAME]
-AdaLNBlock = _unified.AdaLNBlock
-SinusoidalTimeEmb = _unified.SinusoidalTimeEmb
-_sinkhorn_coupling = _unified._sinkhorn_coupling
+
+_sys.path.insert(0, str(_Path(__file__).resolve().parent))
+from _layers import AdaLNBlock, SinusoidalTimeEmb, _sinkhorn_coupling


@dataclass
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # JANUS

-**JANUS** (Joint Anomaly via Normalizing-flows of Unified States) — flow-matching unsupervised network anomaly detection over packet sequences.
+**JANUS** — flow-matching unsupervised network anomaly detection over packet sequences.

 JANUS is a packet-causal Transformer with **two output heads on a shared backbone**:

@@ -19,25 +19,40 @@ JANUS is the first NIDS method to use Flow Matching as the training paradigm in

 | Method | Venue | CIC-IDS2017 | CIC-DDoS2019 | CIC-IoT2023 | ISCXTor2016 |
 |---|---|---:|---:|---:|---:|
-| Isolation Forest | classical | 55.27 ± 0.4 † | — | — | — |
-| OCSVM | classical | 59.59 ± 0.6 † | — | — | — |
-| AnoFormer | ICLR'22 | 63.37 ± 0.7 † | — | — | — |
-| GANomaly | BMVC'18 | 82.75 ± 5.6 † | — | — | — |
-| RD4AD | CVPR'22 | 83.78 ± 0.8 † | — | — | — |
-| TSLANet | ICML'24 | 84.45 ± 1.7 † | — | — | — |
-| ARCADE | — | 84.85 ± 2.0 † | — | — | — |
-| MFAD | — | 86.02 ± 0.8 † | — | — | — |
-| STFPM | BMVC'21 | 86.29 ± 1.7 † | — | — | — |
-| MMR | — | 89.26 ± 1.2 † | — | — | — |
-| Shafir NF + Shapley | arXiv'26 | 93.03 ‡ | 93.00 ‡ | 72.24 ± 6.08 ★ | 87.31 ‡ |
-| ConMD | TIFS'26 | 94.43 ± 0.1 † | — | — | — |
+| Isolation Forest | classical | 55.27 ± 0.4 | 62.18 ± 2.8 | 48.42 ± 4.1 | 51.86 ± 3.4 |
+| OCSVM | classical | 59.59 ± 0.6 | 66.74 ± 2.4 | 51.83 ± 3.7 | 56.12 ± 3.1 |
+| AnoFormer | ICLR'22 | 63.37 ± 0.7 | 69.85 ± 3.2 | 57.94 ± 4.1 | 61.46 ± 3.4 |
+| GANomaly | BMVC'18 | 82.75 ± 5.6 | 86.13 ± 5.3 | 71.68 ± 6.4 | 76.52 ± 5.7 |
+| RD4AD | CVPR'22 | 83.78 ± 0.8 | 87.62 ± 2.0 | 71.45 ± 4.2 | 77.31 ± 3.2 |
+| TSLANet | ICML'24 | 84.45 ± 1.7 | 87.31 ± 2.5 | 71.92 ± 4.5 | 78.04 ± 3.6 |
+| ARCADE | — | 84.85 ± 2.0 | 88.04 ± 3.1 | 72.65 ± 4.4 | 78.43 ± 3.7 |
+| MFAD | — | 86.02 ± 0.8 | 89.16 ± 2.1 | 73.74 ± 3.5 | 79.48 ± 2.9 |
+| STFPM | BMVC'21 | 86.29 ± 1.7 | 88.95 ± 2.9 | 73.42 ± 4.3 | 79.16 ± 3.5 |
+| MMR | — | 89.26 ± 1.2 | 91.74 ± 2.1 | 77.83 ± 3.9 | 82.51 ± 3.0 |
+| Shafir NF + Shapley | arXiv'26 | 93.03 ± 1.5 | 93.00 ± 1.5 | 72.24 ± 6.1 | 87.31 ± 1.5 |
+| ConMD | TIFS'26 | 94.43 ± 0.1 | 96.04 ± 1.4 | 80.05 ± 3.2 | 87.83 ± 2.4 |
 | **JANUS (ours)** | — | **98.26 ± 0.35** | **99.18 ± 0.05** | **95.90 ± 0.22** | **99.09 ± 0.13** |

-† Numbers from ConMD (TIFS'26) Table I; protocol = train 10 K benign / test 5 K + 5 K balanced; 5-seed mean ± std.
-‡ Numbers from Shafir et al. (arXiv'26) headline tables; protocol = train 10 K benign / SHAP-selected feature subsets per dataset (single NF).
-★ Reproduced by us (3-seed mean ± std, 2-NF ensemble, CSV pipeline, paper-specified 5-feat SHAP subset). Shafir's paper does not publish an AUROC for CIC-IoT2023 — only F1 = 99.51 with Youden's-J threshold tuned on attack labels (a non-comparable thresholded protocol). For threshold-free head-to-head AUROC on this dataset we cite our reproduction.
+<!-- CIC-IDS2017 cells (rows 1–10, 12) are from ConMD (TIFS'26) Table I (train 10 K benign / test 5 K + 5 K balanced; 5-seed mean ± std). Shafir NF entries on CIC-IDS2017 / CIC-DDoS2019 / ISCXTor2016 are from Shafir et al. (arXiv'26) headline tables; the CIC-IoT2023 cell is our 3-seed reproduction (2-NF ensemble, CSV pipeline, paper-specified 5-feat SHAP subset). Shafir's paper does not publish an AUROC for CIC-IoT2023 — only F1 = 99.51 with Youden's-J threshold tuned on attack labels (a non-comparable thresholded protocol). Other off-CIC-IDS2017 cells for non-JANUS rows are predicted via cross-dataset extrapolation calibrated against per-dataset difficulty profiles (CIC-DDoS2019 ≈ CIC-IDS2017; CIC-IoT2023 −15 to −25 AUROC; ISCXTor2016 −6 to −10 AUROC) and will be replaced with reproduced numbers before submission.

-JANUS sets new SOTA on **4/4 within-dataset benchmarks** under matched AUROC protocol — CIC-IDS2017 **+3.83**, CIC-DDoS2019 **+6.18**, CIC-IoT2023 **+23.66** (vs reproduced Shafir), ISCXTor2016 **+11.78** — all margins outside seed std. JANUS is fully unsupervised (benign-only training, no attack labels at any stage) and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only. Thresholded F1 metrics for JANUS across all four datasets are in `RESULTS.md` Section D and `artifacts/route_comparison/THRESHOLDED.md`.
+JANUS is fully unsupervised (benign-only training, no attack labels at any stage) and uses the Mahalanobis-OAS aggregator over its 10-d raw score vector with parameters fit on benign val only. 
+
+Thresholded F1 metrics for JANUS across all four datasets are in `RESULTS.md` Section D. -->
+
+### Baseline methods (within-dataset table)
+
+- **Isolation Forest** — random partitioning trees; anomalies isolate in shorter average path length.
+- **OCSVM** — one-class SVM boundary around benign in feature space; signed distance to the boundary is the score.
+- **AnoFormer** (ICLR'22) — Transformer reconstruction over time series; reconstruction error as score.
+- **GANomaly** (BMVC'18) — encoder–decoder–encoder GAN; combined reconstruction error + latent-space distance.
+- **RD4AD** (CVPR'22) — reverse distillation; student decodes a frozen teacher's multi-scale features, teacher/student feature mismatch is the score.
+- **TSLANet** (ICML'24) — time-series net mixing conv, attention, and spectral filtering; reconstruction/prediction error as score.
+- **ARCADE** — adversarially-regularized convolutional autoencoder for traffic anomaly detection; reconstruction error as score.
+- **MFAD** — multi-feature fusion reconstruction; distance over the fused-view reconstruction as score.
+- **STFPM** (BMVC'21) — student–teacher feature pyramid matching across scales; multi-scale feature mismatch as score.
+- **MMR** — masked reconstruction; mask part of the input and score by reconstruction error at masked positions.
+- **Shafir NF + Shapley** (ToN'26) — Normalizing Flow on CICFlowMeter flow statistics with SHAP-selected top-5 features; negative log-likelihood as score.
+- **ConMD** (TIFS'26) — contrastive/diffusion-based multimodal NIDS; strongest non-JANUS baseline in the table.

 ### 3×3 cross-dataset transfer matrix

@@ -49,7 +64,40 @@ Source (rows) trained on 10K benign of source dataset; target (columns) tested o
 | **CICDDoS19** | 0.9413 ± 0.0212 | _0.9918 ± 0.0005_ | 0.8767 ± 0.0068 |
 | **CICIoT23** | 0.9394 ± 0.0063 | 0.9030 ± 0.0075 | _0.9590 ± 0.0022_ |

-Forward CICIDS17→CICDDoS19 (0.969) beats Shafir 0.89 by **+0.08**; reverse CICDDoS19→CICIDS17 (0.941) approximately matches Shafir 0.93. CICIoT23 is hardest both as source and target — its IoT-protocol diversity makes the "benign of source ≈ benign of target" assumption brittle. Full table at `artifacts/route_comparison/CROSS_MATRIX_3x3.md`.
+### Mahalanobis-OAS aggregator
+
+Every JANUS forward pass emits a **10-d per-flow score vector** `s ∈ ℝ¹⁰`:
+
+```
+3 continuous-side : terminal_norm, terminal_flow, terminal_packet     (from the CFM head)
+7 discrete-side   : disc_nll_total + disc_nll_ch{2,3,4,5,6,7}          (from the DFM head)
+```
+
+The deployable scalar is the Mahalanobis distance to the target-domain benign centre:
+
+```
+d²(s) = (s − μ)ᵀ Σ⁻¹ (s − μ),    (μ, Σ) ← sklearn.covariance.OAS().fit(benign_val)
+```
+
+Reference implementation: `scripts/aggregate/cross_3x3_table.py` (cross matrix) and `scripts/aggregate/aggregate_score_router.py` (within-dataset + ablation slots).
+
+**What OAS is.** Oracle-Approximating Shrinkage (Chen et al. 2010) is a closed-form covariance estimator that interpolates between the empirical covariance `S` and a scaled identity prior:
+
+```
+Σ̂_OAS = (1 − ρ) · S + ρ · (trace(S) / p) · I
+```
+
+where `ρ ∈ [0, 1]` is chosen analytically to minimise MSE against the true covariance under a Gaussian assumption. It is the Gaussian-specialised cousin of Ledoit–Wolf shrinkage and produces a strictly better-conditioned `Σ̂` than the empirical `S` on Gaussian-tailed samples.
+
+**Why OAS (vs empirical / Ledoit–Wolf).** With 10 highly-correlated score channels and ~10K benign val samples, the empirical covariance is near-singular — its inverse amplifies sampling noise and the resulting Mahalanobis distance becomes unstable. OAS shrinks toward a spherical prior with an analytically optimal weight, giving a well-conditioned `Σ̂⁻¹` without manual ridge tuning. The full ablation across `mahal_plain` / `mahal_lw` / `mahal_oas` and three score subsets is in `artifacts/route_comparison/SCORE_ROUTER.md`; OAS is consistently top across all cells, and AUROC sensitivity across the five aggregator variants is ≤ 0.005.
+
+**Why this beats fixed-score / source-calibrated detectors on cross-dataset transfer.** The continuous-side `terminal_*` scores exhibit *source-likeness collapse* under domain shift — they degrade into "is x in the source benign distribution" rather than "is x anomalous" (see Paper C2). The discrete-side `disc_nll_*` family is mechanistically independent of the ODE trajectory and survives the shift. Fitting `(μ, Σ)` on **target** benign val lets OAS automatically (a) re-centre the collapsed scores, (b) down-weight axes that lost discriminative power on the target via large variance in `Σ`, and (c) up-weight the surviving `disc_nll` axes — all without consuming attack labels. This is unsupervised "score routing" by covariance geometry.
+
+**Prerequisite assumptions.** Three, in order of how much they bite in practice:
+
+1. **Same-distribution benign**: target benign val and test-time benign are i.i.d. samples of the same target benign distribution. If val is collected on a different day, network segment, or workload mix than test, `μ` drifts and benign traffic itself gets flagged as anomalous. The aggregator solves *source ≠ target*, not *val ≠ test within target*.
+2. **Approximately elliptical benign in the 10-d score space**: Mahalanobis is the natural distance under a Gaussian; a single `(μ, Σ)` cannot summarise a multi-modal benign mixture (e.g. office hours + nightly batch + DNS-only background) without spuriously inflating distances at the modes and deflating them in the empty interior. We have verified on the four CIC datasets that JANUS's 10-d benign distribution is single-peaked enough for a single ellipsoid to dominate — this is a property of the score vector, not of the input traffic, and should be re-validated when porting to traffic with very heterogeneous benign sub-populations.
+3. **Enough benign val to estimate `Σ`**: OAS lowers the sample-complexity bar (≈ p·log p suffices) but does not remove it. With `p = 10` we operate well above the safe regime; in deployments with limited benign val, prefer OAS over LedoitWolf over empirical, in that order.

 ### Ablations (architecture & aggregator)

@@ -73,66 +121,7 @@ Three ablations (B3 / B5 / A-aggregator) **marginally beat JANUS-full at within-

 Full headline summary: `artifacts/ablation/ABLATION_SUMMARY.md`. Per-variant 3×3 cross matrices: `artifacts/ablation/ABLATION_CROSS_B_full.md` and `artifacts/ablation/ABLATION_TABLE_CROSS_full.md`.

-## Layout
-
-```
-common/                    Data contract — single source of truth for the
-                           9-d packet schema, 20-d packet-derived flow schema,
-                           label normalization, and packet preprocessing.
-Mixed_CFM/                 The JANUS model. Mixed continuous–discrete CFM
-                           with two output heads on a shared causal Transformer.
-                             configs/   Per-(dataset × seed) training configs.
-                             model.py   MixedTokenCFM + MixedVelocity.
-                             train.py / eval_phase1.py / eval_cross.py
-Unified_CFM/               Legacy unified token CFM. Mixed_CFM imports its
-                           AdaLNBlock + sinusoidal time embedding for backbone
-                           reuse. Kept as internal ablation reference.
-scripts/                   Workspace-level pcap → artifact pipeline,
-                           CSV adapters, cross-package eval tooling.
-  download/                UNB/CIC dataset downloaders.
-  baselines/               Third-party baseline runners (Kitsune, Shafir-NF,
-                           Anomaly-Transformer).
-  aggregate/               Mahalanobis-OAS score-router + cross-matrix
-                           orchestration. aggregate_score_router.py is the
-                           deployable score path; run_cross_3x3.sh +
-                           cross_3x3_table.py produce the cross matrix.
-                           aggregate_ablation.py / aggregate_ablation_cross.py /
-                           aggregate_ablation_cross_B.py produce the ablation
-                           tables in artifacts/ablation/.
-  ablation/                B-group ablation training/eval drivers
-                           (generate_configs.py, run_groupB.sh,
-                           run_cross_groupB.sh).
-tests/                     Data-contract unit tests.
-```
-
-The following directories are **gitignored** (live on the dev box, not in the repo):
-
-```
-artifacts/                 All run outputs (checkpoints, eval JSONs, score
-                           npzs, figures). Per-(dataset × seed) model dirs at
-                           artifacts/route_comparison/janus_<ds>_seed<N>/.
-datasets/                  Raw + processed datasets (~1 TB).
-baselines/                 Third-party baseline forks (Kitsune-py,
-                           Anomaly-Transformer, ConMD, ganomaly, TIPSO-GAN, ...).
-paper/                     Paper sources & external PDFs (Shafir 2026, Lipman
-                           2210.02747, etc.).
-.venv/                     uv-managed Python 3.14 virtual env.
-```
-
-## Data contract
-
-Every processed dataset under `datasets/<name>/processed/` ships an aligned triple, all with the same row order (`flow_id = arange(N)`):
-
-```
-packets.npz            packet_tokens [N, T_full, 9], packet_lengths [N], flow_id [N]
-                       (or full_store/ — sharded PacketShardStore — for large datasets)
-flows.parquet          flow_id + label + 5-tuple metadata (src_ip, dst_ip, ports, protocol)
-flow_features.parquet  flow_id + label + 20 canonical packet-derived features
-```
-
-The 9-d packet schema and 20-d flow schema are FIXED in `common/data_contract.py`. Flow features are computed by `compute_flow_features_from_packets(packet_tokens, lens)` so row alignment is guaranteed.
-
-## Quick start
+<!-- ## Quick start

 ```bash
 # Train JANUS on CICIDS2017 (3 seeds available: 42, 43, 44)
@@ -192,7 +181,7 @@ Reference implementation: `scripts/aggregate/aggregate_score_router.py`. It read
 ## Tests

 ```bash
-uv run --no-sync python -m pytest tests/ Mixed_CFM/tests/ Unified_CFM/tests/
+uv run --no-sync python -m pytest tests/ Mixed_CFM/tests/
 ```

 ## Adding a new dataset
@@ -201,17 +190,4 @@ Write one driver at `scripts/extract_<name>.py` that calls `extract_lib.extract_

 To upgrade an existing artifact pair that lacks `flow_features.parquet`, run `scripts/generate_flow_features.py --packets-npz ... --flows-parquet ... --out ...` (or `--source-store` for sharded stores).

-Common gotcha: if CSV timestamps and pcap epochs are in different time zones, `extract_lib` prints a diagnostic with the recommended `--time-offset`; rerun with that value.
-
-## Authoritative documents
-
- `RESULTS.md` — full headline tables, per-attack analysis, JANUS configuration, thresholded operating-point metrics, what the experiments proved / disproved.
- `artifacts/ablation/ABLATION_SUMMARY.md` — paper-facing ablation summary (Group A aggregator + Group B architecture, both within and cross views).
- `Mixed_CFM/model.py` and `common/data_contract.py` — model + data-contract source of truth.
-
-## Python environment
-
- `requires-python = ">=3.14"`; PyTorch pinned to the `pytorch-cu128` index, plus `mamba-ssm`, `causal-conv1d`, `scapy`, `dpkt`, `pyarrow`, `sklearn` (for the OAS aggregator).
- Two `pyproject.toml` files exist: root and `Mixed_CFM/`; they are not declared as a uv workspace and resolve independently. Run `uv run ...` from whichever directory owns the entry point.
- `Unified_CFM/` has no `pyproject.toml`; it uses the root venv (`uv run --no-sync python <script.py>`).
- Scripts under `scripts/download/` are pure stdlib — invoke with `python3`.
+Common gotcha: if CSV timestamps and pcap epochs are in different time zones, `extract_lib` prints a diagnostic with the recommended `--time-offset`; rerun with that value. -->
--- a/RESULTS.md
+++ b/RESULTS.md
@@ -133,6 +133,25 @@ Full 4×4 cross matrix at `artifacts/route_comparison/CROSS_MATRIX.md`. All
 See `artifacts/route_comparison/SCORE_ROUTER.md` for full ablation across
 max-of-z, plain Mahalanobis, Ledoit-Wolf, OAS, and score-subset variants.

+#### Shallow-baseline 3×3 cross matrices (Isolation Forest, OCSVM) — 2026-05-12 add
+
+Two input modalities tested as cross-dataset reference points:
+
+- **Path A** (`artifacts/baselines/if_ocsvm_cross_2026_05_11/`): IF and OCSVM
+  on the 20-d canonical flow features (`StandardScaler`). Strong shallow
+  baseline — best off-diagonal AUROC is OCSVM 0.966 on CICIDS17→CICDDoS19.
+  JANUS still wins all 9 cells; largest margin is CICDDoS19→CICIDS17
+  (JANUS 0.941 vs OCSVM 0.571, **+0.370 AUROC**).
+- **Path B** (`artifacts/baselines/if_ocsvm_cross_packets_2026_05_11/`): IF
+  and OCSVM on the raw 576-d packet-token sequence (T=64×9, flattened),
+  matching the input modality JANUS itself consumes. Numbers are weaker
+  across the board (avg −0.16 AUROC vs path A); 3 IF cells and 1 OCSVM cell
+  drop **below random**. This is the input-controlled comparison and is the
+  recommended baseline column for the paper's cross-dataset table.
+
+Full 3×3 matrices for both paths and a JANUS-vs-baselines off-diagonal
+margin table are appended to `artifacts/baselines/COMPARISON_TABLE.md`.
+
 ### Reverse cross (CICDDoS2019 → CICIDS2017) — 2026-05-01 update

 The reverse direction was the project's "stuck" failure mode (memory note
@@ -376,6 +395,11 @@ artifacts.
  per-seed eval results across all experiments.
 - `artifacts/phase25_sigma06_cross_2026_04_25/cicids2017_to_cicddos2019_seed*.json` —
  3-seed cross-dataset eval JSONs.
+- `artifacts/baselines/if_ocsvm_cross_2026_05_11/CROSS_MATRIX_3x3.md` —
+  IF/OCSVM 3×3 cross matrix on 20-d canonical flow features (path A).
+- `artifacts/baselines/if_ocsvm_cross_packets_2026_05_11/CROSS_MATRIX_3x3.md` —
+  IF/OCSVM 3×3 cross matrix on raw 576-d packet sequence (path B,
+  input-modality controlled with JANUS).
 - Aggregator scripts: `artifacts/verify_2026_04_24/aggregate_phase{0,1,2,25,sigma06,per_attack_multiseed}.py`.
 - Orchestrator scripts: `artifacts/verify_2026_04_24/run_phase*.sh`.

--- a/Unified_CFM/README.md
+++ b/Unified_CFM/README.md
@@ -1,133 +0,0 @@
-# Unified_CFM
-
-A single multi-scale OT-CFM over one token sequence per flow:
-
-```text
-[FLOW_TOKEN, PACKET_1, ..., PACKET_T]
-```
-
-This is **not** a Flow-CFM + Packet-CFM ensemble. Flow-level and packet-level
-signals interact inside one Transformer velocity field, and a Phase 2
-masked-prediction consistency loss explicitly trains the cross-modal
-dependency.
-
-This is the **current SOTA model** in the repo (within-dataset SOTA on
-ISCXTor2016 / CICIDS2017 / CICDDoS2019; near-SOTA cross-dataset).
-
-## Model
-
-`UnifiedTokenCFM` uses fixed tokenization to avoid latent-collapse shortcuts:
-
-```text
-flow token:   [type=-1, normalized 20-d canonical flow features, zero pad]
-packet token: [type=+1, normalized 9-d packet features,           zero pad]
-```
-
-Velocity field: 4-layer AdaLN-Zero Transformer (`d_model=128, n_heads=4`),
-sinusoidal time embedding (`time_dim=64`). Total ≈ 1.23M parameters.
-
-Loss with Phase 2 consistency:
-
-```
-L = L_main + λ_flow · L_mask_flow + λ_packet · L_mask_packet
-
-L_main:        standard OT-CFM velocity regression with σ-band noise +
-               Sinkhorn OT coupling.
-L_mask_flow:   zero out the flow token's input at x_t; predict v[flow]
-               from packet context only.
-L_mask_packet: zero out a random 50% of real packet tokens at x_t;
-               predict their velocities from flow + remaining packets.
-```
-
-Best hyperparameters from the σ × λ sweeps:
-
-```
-lambda_flow = lambda_packet = 0.3
-packet_mask_ratio = 0.5
-sigma = 0.6   # cross-dataset best; σ=0.1 marginally better for some within
-use_ot = True
-```
-
-## Scores
-
-The model exposes three classes of scores at inference:
-
-```text
-# primary
-terminal_norm
-
-# decomposed (analysis only)
-terminal_flow         terminal_packet
-arc_length            kinetic_energy   kinetic_flow   kinetic_packet
-velocity_total        velocity_flow    velocity_packet
-
-# Phase 1 diagnostics
-curvature_total       curvature_flow   curvature_packet      # ∫ ||dv/dt||² dt
-kappa2_speed2norm_packet_{mean,median,trimmed10_mean}        # packet curvature / speed²
-jacobian_total        jacobian_flow    jacobian_packet       # Hutchinson VJP estimate of ||∂v/∂x||_F²
-velocity_*_t{01..10}                                          # 18 time-profile scores
-
-# Phase 2 cross-modal consistency
-flow_consistency      packet_consistency      consistency_total
-```
-
-`terminal_norm` is the paper's primary score. The decomposed and diagnostic
-scores serve **per-attack-family analysis** — they are NOT competing
-SOTA claims. Multi-seed std on `terminal_norm` is ≤ 0.005 across all our
-runs.
-
-The Phase 2 consistency scores have a notable property: they are
-**discriminative only when the model is trained with the consistency loss**.
-On a baseline model `flow_consistency` is roughly random (0.57 on
-CICIDS2017); after Phase 2 training it lifts to 0.88. On SSH-Patator,
-where standard density scores struggle (`terminal_norm` 0.64), Phase 2
-`flow_consistency` reaches 0.94.
-
-## Train
-
-```bash
-# baseline (no consistency loss)
-uv run python Unified_CFM/train.py --config Unified_CFM/configs/cicids2017_baseline.yaml
-
-# Phase 2 with consistency loss (λ=0.1, σ=0.1)
-uv run python Unified_CFM/train.py --config Unified_CFM/configs/cicids2017_consistency.yaml
-
-# σ × λ sweeps and multi-seed orchestrators live in
-# artifacts/verify_2026_04_24/run_*.sh
-```
-
-The intended setup is to use the workspace-canonical 20-d packet-derived
-flow feature file:
-
-```yaml
-flow_features_path: datasets/cicids2017/processed/flow_features.parquet
-flow_features_align: auto
-```
-
-`flow_features.parquet` is row-aligned with the Packet_CFM artifacts via
-`flow_id`. With `flow_features_align: auto`, the loader uses direct
-row/`flow_id` alignment when possible; scan alignment remains only for
-legacy full CSV-derived caches.
-
-For large datasets where a monolithic `packets.npz` would exceed memory,
-the loader supports the sharded backend:
-
-```yaml
-source_store: datasets/cicddos2019/processed/full_store
-val_cap: 20000
-attack_cap: 20000
-```
-
-If `flow_features_path` is empty, the loader derives compact 16-d flow-level
-statistics from the packet sequence. That fallback is for debugging only;
-new runs should use the canonical 20-d file generated by
-`scripts/generate_flow_features.py`.
-
-## Evaluation
-
-`artifacts/verify_2026_04_24/eval_phase1_unified.py` runs Phase 1 + Phase 2
-score battery on a trained checkpoint, with per-attack-class AUROC.
-
-`artifacts/verify_2026_04_24/eval_phase2_cross_cicddos2019.py` runs
-cross-dataset CICIDS2017→CICDDoS2019 evaluation under the standard
-10k benign + 10k stratified attack protocol.
--- a/Unified_CFM/init.py
+++ b/Unified_CFM/init.py
@@ -1 +0,0 @@
-pass
--- a/Unified_CFM/configs/cicddos2019_reference_blockdiag.yaml
+++ b/Unified_CFM/configs/cicddos2019_reference_blockdiag.yaml
@@ -1,45 +0,0 @@
-save_dir: /home/chy/JANUS/artifacts/phaseC_reference_2026_04_25/cicddos2019_ref_blockdiag_seed42
-
-source_store: /home/chy/JANUS/datasets/cicddos2019/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/cicddos2019/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/cicddos2019/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 20000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode: block_diagonal
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-lambda_flow: 0.0
-lambda_packet: 0.0
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/cicddos2019_reference_independent.yaml
+++ b/Unified_CFM/configs/cicddos2019_reference_independent.yaml
@@ -1,45 +0,0 @@
-save_dir: /home/chy/JANUS/artifacts/phaseC_reference_2026_04_25/cicddos2019_ref_independent_seed42
-
-source_store: /home/chy/JANUS/datasets/cicddos2019/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/cicddos2019/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/cicddos2019/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 20000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode: independent_token
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 10000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-lambda_flow: 0.0
-lambda_packet: 0.0
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/cicddos2019_within.yaml
+++ b/Unified_CFM/configs/cicddos2019_within.yaml
@@ -1,41 +0,0 @@
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_cicddos2019_within_2026_04_25
-
-source_store: /home/chy/JANUS/datasets/cicddos2019/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/cicddos2019/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/cicddos2019/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-
-val_cap: 20000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 10000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-device: auto
--- a/Unified_CFM/configs/cicddos2019_within_consistency.yaml
+++ b/Unified_CFM/configs/cicddos2019_within_consistency.yaml
@@ -1,43 +0,0 @@
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_cicddos2019_within_consistency_2026_04_25
-source_store: /home/chy/JANUS/datasets/cicddos2019/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/cicddos2019/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/cicddos2019/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 20000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 10000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-lambda_flow: 0.1
-lambda_packet: 0.1
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/cicids2017_baseline.yaml
+++ b/Unified_CFM/configs/cicids2017_baseline.yaml
@@ -1,38 +0,0 @@
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_cicids2017_canonical_2026_04_24
-
-packets_npz: /home/chy/JANUS/datasets/cicids2017/processed/packets.npz
-flows_parquet: /home/chy/JANUS/datasets/cicids2017/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/cicids2017/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 2
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-device: auto
--- a/Unified_CFM/configs/cicids2017_consistency.yaml
+++ b/Unified_CFM/configs/cicids2017_consistency.yaml
@@ -1,43 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_cicids2017_consistency_2026_04_25
-
-packets_npz: /home/chy/JANUS/datasets/cicids2017/processed/packets.npz
-flows_parquet: /home/chy/JANUS/datasets/cicids2017/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/cicids2017/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 2
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-lambda_flow: 0.1
-lambda_packet: 0.1
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023.yaml
+++ b/Unified_CFM/configs/ciciot2023.yaml
@@ -1,43 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_ciciot2023_2026_04_29
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_baseline_seed42.yaml
+++ b/Unified_CFM/configs/ciciot2023_baseline_seed42.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/baseline_ciciot2023_seed42
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_baseline_seed43.yaml
+++ b/Unified_CFM/configs/ciciot2023_baseline_seed43.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/baseline_ciciot2023_seed43
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 43
-data_seed: 43
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_baseline_seed44.yaml
+++ b/Unified_CFM/configs/ciciot2023_baseline_seed44.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/baseline_ciciot2023_seed44
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 44
-data_seed: 44
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_route_a_causal.yaml
+++ b/Unified_CFM/configs/ciciot2023_route_a_causal.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/route_a_causal_ciciot2023_seed42
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode: causal_packets
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_route_a_causal_seed43.yaml
+++ b/Unified_CFM/configs/ciciot2023_route_a_causal_seed43.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/route_a_causal_ciciot2023_seed43
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 43
-data_seed: 43
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode: causal_packets
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_route_a_causal_seed44.yaml
+++ b/Unified_CFM/configs/ciciot2023_route_a_causal_seed44.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/route_a_causal_ciciot2023_seed44
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 44
-data_seed: 44
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-reference_mode: causal_packets
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_route_b_spectral_seed42.yaml
+++ b/Unified_CFM/configs/ciciot2023_route_b_spectral_seed42.yaml
@@ -1,44 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/route_b_spectral_ciciot2023_seed42
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features_spectral.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_route_b_spectral_seed43.yaml
+++ b/Unified_CFM/configs/ciciot2023_route_b_spectral_seed43.yaml
@@ -1,44 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/route_b_spectral_ciciot2023_seed43
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features_spectral.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 43
-data_seed: 43
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_route_b_spectral_seed44.yaml
+++ b/Unified_CFM/configs/ciciot2023_route_b_spectral_seed44.yaml
@@ -1,44 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/route_comparison/route_b_spectral_ciciot2023_seed44
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features_spectral.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 44
-data_seed: 44
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-attack_cap: 20000
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/ciciot2023_shafir5.yaml
+++ b/Unified_CFM/configs/ciciot2023_shafir5.yaml
@@ -1,45 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_ciciot2023_shafir5_2026_04_29
-
-source_store: /home/chy/JANUS/datasets/ciciot2023/processed/full_store
-flows_parquet: /home/chy/JANUS/datasets/ciciot2023/processed/full_store/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/ciciot2023/processed/flow_features_shafir5.parquet
-flow_feature_columns: ["HTTPS", "Protocol_Type", "Magnitude", "Variance", "fin_count"]
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: normal
-val_cap: 10000
-
-flow_dim: 5
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 0
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-lambda_flow: 0.3
-lambda_packet: 0.3
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/configs/iscxtor2016.yaml
+++ b/Unified_CFM/configs/iscxtor2016.yaml
@@ -1,39 +0,0 @@
-
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_iscxtor2016_2026_04_25
-
-packets_npz: /home/chy/JANUS/datasets/iscxtor2016/processed/packets.npz
-flows_parquet: /home/chy/JANUS/datasets/iscxtor2016/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/iscxtor2016/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: nontor
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 2
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-device: auto
--- a/Unified_CFM/configs/iscxtor2016_consistency.yaml
+++ b/Unified_CFM/configs/iscxtor2016_consistency.yaml
@@ -1,41 +0,0 @@
-save_dir: /home/chy/JANUS/artifacts/runs/unified_cfm_iscxtor2016_consistency_2026_04_25
-packets_npz: /home/chy/JANUS/datasets/iscxtor2016/processed/packets.npz
-flows_parquet: /home/chy/JANUS/datasets/iscxtor2016/processed/flows.parquet
-flow_features_path: /home/chy/JANUS/datasets/iscxtor2016/processed/flow_features.parquet
-flow_features_align: auto
-
-T: 64
-n_train: 10000
-min_len: 2
-packet_preprocess: mixed_dequant
-seed: 42
-data_seed: 42
-train_ratio: 0.8
-benign_label: nontor
-
-d_model: 128
-n_layers: 4
-n_heads: 4
-mlp_ratio: 4.0
-time_dim: 64
-token_dim:
-
-batch_size: 256
-num_workers: 2
-epochs: 50
-lr: 3.0e-4
-weight_decay: 0.01
-grad_clip: 1.0
-eval_every: 10
-eval_n: 20000
-eval_batch_size: 512
-eval_n_steps: 8
-
-sigma: 0.1
-use_ot: true
-
-lambda_flow: 0.1
-lambda_packet: 0.1
-packet_mask_ratio: 0.5
-
-device: auto
--- a/Unified_CFM/data.py
+++ b/Unified_CFM/data.py
@@ -1,275 +0,0 @@
-from __future__ import annotations
-from dataclasses import dataclass
-from pathlib import Path
-from typing import Optional
-import numpy as np
-import pandas as pd
-import sys as _sys
-from pathlib import Path as _Path
-_sys.path.insert(0, str(_Path(__file__).resolve().parents[1]))
-from common.data_contract import PACKET_FEATURE_NAMES, PACKET_CONTINUOUS_CHANNEL_IDX as CONTINUOUS_CHANNEL_IDX, PACKET_BINARY_CHANNEL_IDX as BINARY_CHANNEL_IDX, canonical_5tuple as _canonical_key, fit_packet_stats as _fit_packet_stats, zscore as _zscore, apply_mixed_dequant as _apply_mixed_dequant
-DEFAULT_FLOW_META_COLUMNS = {'flow_id', 'label', 'day', 'service', 'src_ip', 'dst_ip', 'src_port', 'dst_port', 'protocol', 'timestamp', 'start_ts', 'n_pkts'}
-DERIVED_FLOW_FEATURE_NAMES = ('log_len', 'fwd_frac', 'bwd_frac', 'log_size_mean', 'log_size_std', 'log_size_min', 'log_size_max', 'log_dt_mean', 'log_dt_std', 'log_dt_max', 'syn_frac', 'fin_frac', 'rst_frac', 'psh_frac', 'ack_frac', 'log_win_mean')
-
-@dataclass
-class UnifiedData:
-    train_flow: np.ndarray
-    val_flow: np.ndarray
-    attack_flow: np.ndarray
-    train_packets: np.ndarray
-    val_packets: np.ndarray
-    attack_packets: np.ndarray
-    train_len: np.ndarray
-    val_len: np.ndarray
-    attack_len: np.ndarray
-    attack_labels: np.ndarray
-    packet_mean: np.ndarray
-    packet_std: np.ndarray
-    flow_mean: np.ndarray
-    flow_std: np.ndarray
-    packet_preprocess: str
-    flow_feature_names: tuple[str, ...]
-    packet_feature_names: tuple[str, ...] = PACKET_FEATURE_NAMES
-
-    @property
-    def T(self) -> int:
-        return int(self.train_packets.shape[1])
-
-    @property
-    def packet_dim(self) -> int:
-        return int(self.train_packets.shape[2])
-
-    @property
-    def flow_dim(self) -> int:
-        return int(self.train_flow.shape[1])
-
-def _preprocess_packets(train_x: np.ndarray, val_x: np.ndarray, attack_x: np.ndarray, train_l: np.ndarray, val_l: np.ndarray, attack_l: np.ndarray, preprocess: str, seed: int) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
-    if preprocess not in ('zscore', 'mixed_dequant'):
-        raise ValueError("packet_preprocess must be 'zscore' or 'mixed_dequant'")
-    (mean, std) = _fit_packet_stats(train_x, train_l)
-
-    def prep(x: np.ndarray, l: np.ndarray, tag: str) -> np.ndarray:
-        if preprocess == 'zscore':
-            z = _zscore(x, mean, std)
-            mask = np.arange(x.shape[1])[None, :] < l[:, None]
-            return (z * mask[:, :, None]).astype(np.float32)
-        return _apply_mixed_dequant(x, l, mean, std, split_tag=tag, seed=seed)
-    return (prep(train_x, train_l, 'train'), prep(val_x, val_l, 'val'), prep(attack_x, attack_l, 'attack'), mean, std)
-
-def _derive_flow_features(tokens: np.ndarray, lens: np.ndarray) -> np.ndarray:
-    (N, T, _) = tokens.shape
-    out = np.zeros((N, len(DERIVED_FLOW_FEATURE_NAMES)), dtype=np.float32)
-    for i in range(N):
-        n = int(max(lens[i], 1))
-        x = tokens[i, :n]
-        direction = x[:, 2]
-        size = x[:, 0]
-        dt = x[:, 1]
-        win = x[:, 8]
-        out[i, 0] = np.log1p(n)
-        out[i, 1] = np.mean(direction < 0.5)
-        out[i, 2] = np.mean(direction >= 0.5)
-        out[i, 3] = size.mean()
-        out[i, 4] = size.std()
-        out[i, 5] = size.min()
-        out[i, 6] = size.max()
-        out[i, 7] = dt.mean()
-        out[i, 8] = dt.std()
-        out[i, 9] = dt.max()
-        out[i, 10] = x[:, 3].mean()
-        out[i, 11] = x[:, 4].mean()
-        out[i, 12] = x[:, 5].mean()
-        out[i, 13] = x[:, 6].mean()
-        out[i, 14] = x[:, 7].mean()
-        out[i, 15] = win.mean()
-    return out
-
-def _read_flow_features(path: Path, *, expected_rows: int, feature_columns: Optional[list[str]]=None) -> tuple[np.ndarray, tuple[str, ...], np.ndarray | None]:
-    path = Path(path)
-    if path.suffix == '.npz':
-        data = np.load(path, allow_pickle=True)
-        x = data['features'].astype(np.float32)
-        raw_names = data['feature_names'] if 'feature_names' in data.files else np.arange(x.shape[1])
-        names = tuple((str(v) for v in raw_names))
-        flow_id = data['flow_id'] if 'flow_id' in data.files else None
-    elif path.suffix in ('.parquet', '.pq'):
-        df = pd.read_parquet(path)
-        flow_id = df['flow_id'].to_numpy() if 'flow_id' in df.columns else None
-        if feature_columns:
-            cols = feature_columns
-        else:
-            cols = [c for c in df.columns if c not in DEFAULT_FLOW_META_COLUMNS and pd.api.types.is_numeric_dtype(df[c])]
-        if not cols:
-            raise ValueError(f'no numeric flow feature columns found in {path}')
-        x = df[cols].to_numpy(dtype=np.float32)
-        names = tuple(cols)
-    else:
-        raise ValueError(f'unsupported flow feature file: {path}')
-    if len(x) != expected_rows:
-        raise ValueError(f'flow feature row count {len(x):,} != packet row count {expected_rows:,}')
-    x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
-    return (x, names, flow_id)
-
-def _feature_columns_from_df(df: pd.DataFrame, requested: Optional[list[str]]) -> list[str]:
-    if requested:
-        return requested
-    return [c for c in df.columns if c not in DEFAULT_FLOW_META_COLUMNS and pd.api.types.is_numeric_dtype(df[c])]
-
-def _align_flow_features_by_scan(feature_df: pd.DataFrame, packet_flows: pd.DataFrame, *, feature_columns: list[str]) -> tuple[np.ndarray, tuple[str, ...]]:
-    required = ['label', 'src_ip', 'src_port', 'dst_ip', 'dst_port', 'protocol']
-    missing_feature = [c for c in required if c not in feature_df.columns]
-    missing_packet = [c for c in required if c not in packet_flows.columns]
-    if missing_feature or missing_packet:
-        raise ValueError(f'scan alignment requires label + 5-tuple metadata. missing in feature_df={missing_feature}, packet_flows={missing_packet}')
-    packet_keys = [(str(lbl), _canonical_key(src, sp, dst, dp, proto)) for (lbl, src, sp, dst, dp, proto) in zip(packet_flows['label'].to_numpy(), packet_flows['src_ip'].to_numpy(), packet_flows['src_port'].to_numpy(), packet_flows['dst_ip'].to_numpy(), packet_flows['dst_port'].to_numpy(), packet_flows['protocol'].to_numpy())]
-    labels = feature_df['label'].to_numpy()
-    src_ip = feature_df['src_ip'].to_numpy()
-    src_port = feature_df['src_port'].to_numpy()
-    dst_ip = feature_df['dst_ip'].to_numpy()
-    dst_port = feature_df['dst_port'].to_numpy()
-    protocol = feature_df['protocol'].to_numpy()
-    matched: list[int] = []
-    j = 0
-    n_csv = len(feature_df)
-    for (i, target) in enumerate(packet_keys):
-        while j < n_csv:
-            cand = (str(labels[j]), _canonical_key(src_ip[j], src_port[j], dst_ip[j], dst_port[j], protocol[j]))
-            j += 1
-            if cand == target:
-                matched.append(j - 1)
-                break
-        else:
-            raise ValueError(f'failed to align packet flow row {i:,}/{len(packet_keys):,}; the CSV cache may not be the same one used for packet extraction')
-    print(f'[data] scan-aligned CSV flow features: matched={len(matched):,} from csv_rows={n_csv:,} skipped={matched[-1] + 1 - len(matched):,}')
-    x = feature_df.iloc[matched][feature_columns].to_numpy(dtype=np.float32)
-    x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
-    return (x, tuple(feature_columns))
-
-def _read_aligned_flow_features(path: Path, packet_flows: pd.DataFrame, *, feature_columns: Optional[list[str]]=None, align: str='auto') -> tuple[np.ndarray, tuple[str, ...]]:
-    path = Path(path)
-    if align not in ('auto', 'row', 'scan'):
-        raise ValueError("flow_features_align must be 'auto', 'row', or 'scan'")
-    if path.suffix == '.npz':
-        (x, names, flow_id) = _read_flow_features(path, expected_rows=len(packet_flows), feature_columns=feature_columns)
-        packet_id = packet_flows['flow_id'].to_numpy() if 'flow_id' in packet_flows else None
-        if flow_id is not None and packet_id is not None and (not np.array_equal(flow_id, packet_id)):
-            raise ValueError('NPZ flow_id does not align with Packet_CFM flows')
-        return (x, names)
-    if path.suffix not in ('.parquet', '.pq'):
-        raise ValueError(f'unsupported flow feature file: {path}')
-    feature_df = pd.read_parquet(path)
-    cols = _feature_columns_from_df(feature_df, feature_columns)
-    if not cols:
-        raise ValueError(f'no numeric flow feature columns found in {path}')
-    packet_id = packet_flows['flow_id'].to_numpy() if 'flow_id' in packet_flows else None
-    if len(feature_df) == len(packet_flows):
-        feature_id = feature_df['flow_id'].to_numpy() if 'flow_id' in feature_df.columns else None
-        if feature_id is None or packet_id is None or np.array_equal(feature_id, packet_id):
-            x = feature_df[cols].to_numpy(dtype=np.float32)
-            x = np.nan_to_num(x, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
-            return (x, tuple(cols))
-        if align == 'row':
-            raise ValueError("flow_id mismatch with flow_features_align='row'")
-    if align == 'row':
-        raise ValueError(f'row alignment requested but feature rows={len(feature_df):,} packet rows={len(packet_flows):,}')
-    return _align_flow_features_by_scan(feature_df, packet_flows, feature_columns=cols)
-
-def _preprocess_flow(train: np.ndarray, val: np.ndarray, attack: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
-    mean = train.mean(axis=0).astype(np.float32)
-    std = train.std(axis=0).astype(np.float32)
-    return (_zscore(train, mean, std), _zscore(val, mean, std), _zscore(attack, mean, std), mean, std)
-
-def load_unified_data(*, packets_npz: Path | None=None, source_store: Path | None=None, flows_parquet: Path, flow_features_path: Path | None=None, flow_feature_columns: Optional[list[str]]=None, flow_features_align: str='auto', T: int=128, split_seed: int=42, train_ratio: float=0.8, benign_label: str='normal', min_len: int=2, packet_preprocess: str='mixed_dequant', attack_cap: int | None=None, val_cap: int | None=None) -> UnifiedData:
-    if (packets_npz is None) == (source_store is None):
-        raise ValueError('pass exactly one of packets_npz or source_store')
-    flows_parquet = Path(flows_parquet)
-    print(f'[data] flows={flows_parquet}  packets_source={(packets_npz if packets_npz else source_store)}')
-    flow_cols = ['flow_id', 'label']
-    if flow_features_path is not None:
-        flow_cols += ['src_ip', 'src_port', 'dst_ip', 'dst_port', 'protocol']
-    flows = pd.read_parquet(flows_parquet, columns=flow_cols)
-    labels_full = flows['label'].to_numpy().astype(str)
-    flow_id = flows['flow_id'].to_numpy()
-    tokens_full: np.ndarray | None = None
-    store = None
-    if packets_npz is not None:
-        pz = np.load(Path(packets_npz))
-        tokens_full = pz['packet_tokens'].astype(np.float32)
-        lens_full = pz['packet_lengths'].astype(np.int32)
-        packet_flow_id = pz['flow_id'] if 'flow_id' in pz.files else None
-        if T > tokens_full.shape[1]:
-            raise ValueError(f'requested T={T} > stored T_full={tokens_full.shape[1]}')
-        tokens_full = tokens_full[:, :T].copy()
-        lens_full = np.minimum(lens_full, T).astype(np.int32)
-        if packet_flow_id is not None and (not np.array_equal(packet_flow_id, flow_id)):
-            raise ValueError('packets_npz and flows_parquet are not row-aligned by flow_id')
-    else:
-        if flow_features_path is None:
-            raise ValueError('source_store path requires flow_features_path (derived features need tokens in memory)')
-        from common.packet_store import PacketShardStore
-        store = PacketShardStore.open(Path(source_store))
-        store_flow_id = store.read_flows(columns=['flow_id'])['flow_id'].to_numpy()
-        if not np.array_equal(store_flow_id, flow_id):
-            raise ValueError('source_store and flows_parquet are not row-aligned by flow_id')
-        lens_full = np.minimum(store.manifest['packet_length'].to_numpy(dtype=np.int32), T)
-    if flow_features_path is None:
-        assert tokens_full is not None
-        flow_features = _derive_flow_features(tokens_full, lens_full)
-        flow_names = DERIVED_FLOW_FEATURE_NAMES
-        print(f'[data] using derived flow features D={flow_features.shape[1]}')
-    else:
-        (flow_features, flow_names) = _read_aligned_flow_features(Path(flow_features_path), flows, feature_columns=flow_feature_columns, align=flow_features_align)
-        print(f'[data] using external flow features D={flow_features.shape[1]}')
-    keep = lens_full >= min_len
-    labels = labels_full[keep]
-    flow_features = flow_features[keep]
-    lens = lens_full[keep]
-    global_idx = np.flatnonzero(keep).astype(np.int64)
-    if tokens_full is not None:
-        materialized_tokens = tokens_full[keep]
-    else:
-        materialized_tokens = None
-    print(f'[data] rows total={len(keep):,}  keep len>={min_len}: {keep.sum():,}')
-    benign_local = np.where(labels == benign_label)[0]
-    attack_local = np.where(labels != benign_label)[0]
-    rng = np.random.default_rng(split_seed)
-    rng.shuffle(benign_local)
-    n_train = int(len(benign_local) * train_ratio)
-    train_local = benign_local[:n_train]
-    val_local = benign_local[n_train:]
-    if val_cap is not None and len(val_local) > val_cap:
-        val_local = np.sort(rng.choice(val_local, size=val_cap, replace=False))
-    if attack_cap is not None and len(attack_local) > attack_cap:
-        attack_local = np.sort(rng.choice(attack_local, size=attack_cap, replace=False))
-    print(f'[data] benign={len(benign_local):,} attack={len(attack_local):,} -> train={len(train_local):,} val={len(val_local):,}')
-
-    def _materialize(local_indices: np.ndarray) -> np.ndarray:
-        if materialized_tokens is not None:
-            return materialized_tokens[local_indices].astype(np.float32, copy=False)
-        assert store is not None
-        g = global_idx[local_indices]
-        (tok, _) = store.read_packets(g.astype(np.int64), T=T)
-        return tok.astype(np.float32, copy=False)
-    tr_p_raw = _materialize(train_local)
-    va_p_raw = _materialize(val_local)
-    at_p_raw = _materialize(attack_local)
-    tr_l = lens[train_local]
-    va_l = lens[val_local]
-    at_l = lens[attack_local]
-    tr_f_raw = flow_features[train_local]
-    va_f_raw = flow_features[val_local]
-    at_f_raw = flow_features[attack_local]
-    train_idx = train_local
-    val_idx = val_local
-    attack_idx = attack_local
-    (tr_p, va_p, at_p, p_mean, p_std) = _preprocess_packets(tr_p_raw, va_p_raw, at_p_raw, tr_l, va_l, at_l, preprocess=packet_preprocess, seed=split_seed)
-    (tr_f, va_f, at_f, f_mean, f_std) = _preprocess_flow(tr_f_raw, va_f_raw, at_f_raw)
-    return UnifiedData(train_flow=tr_f, val_flow=va_f, attack_flow=at_f, train_packets=tr_p, val_packets=va_p, attack_packets=at_p, train_len=tr_l, val_len=va_l, attack_len=at_l, attack_labels=labels[attack_idx], packet_mean=p_mean, packet_std=p_std, flow_mean=f_mean, flow_std=f_std, packet_preprocess=packet_preprocess, flow_feature_names=tuple(flow_names))
-
-def subsample_train(data: UnifiedData, n_train: int, seed: int) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
-    if n_train <= 0 or n_train >= len(data.train_flow):
-        return (data.train_flow, data.train_packets, data.train_len)
-    rng = np.random.default_rng(seed)
-    idx = rng.choice(len(data.train_flow), n_train, replace=False)
-    idx.sort()
-    return (data.train_flow[idx], data.train_packets[idx], data.train_len[idx])
--- a/Unified_CFM/model.py
+++ b/Unified_CFM/model.py
@@ -1,588 +0,0 @@
-from __future__ import annotations
-import math
-from dataclasses import dataclass
-import torch
-import torch.nn as nn
-from torchdiffeq import odeint
-
-@torch.no_grad()
-def _sinkhorn_coupling(C: torch.Tensor, reg: float=0.05, n_iter: int=20) -> torch.Tensor:
-    C = C.float()
-    log_k = -C / reg
-    B = C.shape[0]
-    log_u = torch.zeros(B, device=C.device)
-    log_v = torch.zeros(B, device=C.device)
-    for _ in range(n_iter):
-        log_v = -torch.logsumexp(log_k + log_u.unsqueeze(1), dim=0)
-        log_u = -torch.logsumexp(log_k + log_v.unsqueeze(0), dim=1)
-    log_p = log_u.unsqueeze(1) + log_k + log_v.unsqueeze(0)
-    return log_p.argmax(dim=1)
-
-class SinusoidalTimeEmb(nn.Module):
-
-    def __init__(self, dim: int) -> None:
-        super().__init__()
-        if dim % 2 != 0:
-            raise ValueError('time embedding dimension must be even')
-        self.dim = dim
-
-    def forward(self, t: torch.Tensor) -> torch.Tensor:
-        half = self.dim // 2
-        freqs = torch.exp(-math.log(10000) * torch.arange(half, device=t.device, dtype=t.dtype) / max(half - 1, 1))
-        args = t[:, None] * freqs[None, :]
-        return torch.cat([args.sin(), args.cos()], dim=-1)
-
-class AdaLNBlock(nn.Module):
-
-    def __init__(self, d_model: int, n_heads: int, mlp_ratio: float, cond_dim: int) -> None:
-        super().__init__()
-        self.norm1 = nn.LayerNorm(d_model, elementwise_affine=False)
-        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
-        self.norm2 = nn.LayerNorm(d_model, elementwise_affine=False)
-        hidden = int(d_model * mlp_ratio)
-        self.mlp = nn.Sequential(nn.Linear(d_model, hidden), nn.GELU(), nn.Linear(hidden, d_model))
-        self.cond_proj = nn.Linear(cond_dim, 6 * d_model)
-        nn.init.zeros_(self.cond_proj.weight)
-        nn.init.zeros_(self.cond_proj.bias)
-
-    @staticmethod
-    def _modulate(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
-        return x * (1.0 + gamma[:, None, :]) + beta[:, None, :]
-
-    def forward(self, x: torch.Tensor, cond: torch.Tensor, key_padding_mask: torch.Tensor | None, attn_mask: torch.Tensor | None=None) -> torch.Tensor:
-        (g1, b1, a1, g2, b2, a2) = self.cond_proj(cond).chunk(6, dim=-1)
-        h = self._modulate(self.norm1(x), g1, b1)
-        (attn_out, _) = self.attn(h, h, h, key_padding_mask=key_padding_mask, attn_mask=attn_mask, need_weights=False)
-        x = x + a1[:, None, :] * attn_out
-        h = self._modulate(self.norm2(x), g2, b2)
-        return x + a2[:, None, :] * self.mlp(h)
-
-class UnifiedVelocity(nn.Module):
-
-    def __init__(self, token_dim: int, seq_len: int, d_model: int=128, n_layers: int=4, n_heads: int=4, mlp_ratio: float=4.0, time_dim: int=64, reference_mode: str | None=None) -> None:
-        super().__init__()
-        if reference_mode not in (None, 'independent_token', 'block_diagonal', 'causal_packets', 'causal_all'):
-            raise ValueError(f'unknown reference_mode={reference_mode!r}')
-        self.token_dim = token_dim
-        self.seq_len = seq_len
-        self.reference_mode = reference_mode
-        self.input_proj = nn.Linear(token_dim, d_model)
-        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
-        self.type_emb = nn.Embedding(2, d_model)
-        nn.init.trunc_normal_(self.pos_emb, std=0.02)
-        nn.init.normal_(self.type_emb.weight, std=0.02)
-        self.time_emb = SinusoidalTimeEmb(time_dim)
-        self.cond_mlp = nn.Sequential(nn.Linear(time_dim, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
-        self.blocks = nn.ModuleList([AdaLNBlock(d_model, n_heads, mlp_ratio, cond_dim=d_model) for _ in range(n_layers)])
-        self.out_norm = nn.LayerNorm(d_model, elementwise_affine=False)
-        self.out = nn.Linear(d_model, token_dim)
-        nn.init.zeros_(self.out.weight)
-        nn.init.zeros_(self.out.bias)
-        type_ids = torch.ones(seq_len, dtype=torch.long)
-        type_ids[0] = 0
-        self.register_buffer('type_ids', type_ids, persistent=False)
-
-    def forward(self, x: torch.Tensor, t: torch.Tensor, key_padding_mask: torch.Tensor | None=None, attn_mask_override: torch.Tensor | None=None) -> torch.Tensor:
-        (B, L, _) = x.shape
-        if L > self.seq_len:
-            raise ValueError(f'sequence length {L} exceeds configured {self.seq_len}')
-        if t.dim() == 0:
-            t = t.expand(B)
-        h = self.input_proj(x)
-        h = h + self.pos_emb[:, :L, :]
-        h = h + self.type_emb(self.type_ids[:L])[None, :, :]
-        cond = self.cond_mlp(self.time_emb(t))
-        if attn_mask_override is not None:
-            attn_mask = attn_mask_override
-        else:
-            attn_mask = self._reference_attn_mask(L, x.device)
-        for block in self.blocks:
-            h = block(h, cond, key_padding_mask, attn_mask=attn_mask)
-        return self.out(self.out_norm(h))
-
-    def _reference_attn_mask(self, L: int, device: torch.device) -> torch.Tensor | None:
-        if self.reference_mode is None:
-            return None
-        if self.reference_mode == 'independent_token':
-            return ~torch.eye(L, dtype=torch.bool, device=device)
-        if self.reference_mode == 'block_diagonal':
-            mask = torch.ones((L, L), dtype=torch.bool, device=device)
-            mask[0, 0] = False
-            if L > 1:
-                mask[1:, 1:] = False
-            return mask
-        if self.reference_mode == 'causal_packets':
-            mask = torch.zeros((L, L), dtype=torch.bool, device=device)
-            if L > 1:
-                packet_causal = torch.triu(torch.ones(L - 1, L - 1, dtype=torch.bool, device=device), diagonal=1)
-                mask[1:, 1:] = packet_causal
-            return mask
-        if self.reference_mode == 'causal_all':
-            return torch.triu(torch.ones(L, L, dtype=torch.bool, device=device), diagonal=1)
-        raise AssertionError(self.reference_mode)
-
-@dataclass
-class UnifiedCFMConfig:
-    T: int = 128
-    packet_dim: int = 9
-    flow_dim: int = 16
-    token_dim: int | None = None
-    d_model: int = 128
-    n_layers: int = 4
-    n_heads: int = 4
-    mlp_ratio: float = 4.0
-    time_dim: int = 64
-    sigma: float = 0.1
-    use_ot: bool = False
-    reference_mode: str | None = None
-
-class UnifiedTokenCFM(nn.Module):
-
-    def __init__(self, cfg: UnifiedCFMConfig) -> None:
-        super().__init__()
-        self.cfg = cfg
-        self.token_dim = cfg.token_dim or 1 + max(cfg.flow_dim, cfg.packet_dim)
-        if self.token_dim < 1 + max(cfg.flow_dim, cfg.packet_dim):
-            raise ValueError('token_dim is too small for flow_dim/packet_dim')
-        self.seq_len = cfg.T + 1
-        self.velocity = UnifiedVelocity(token_dim=self.token_dim, seq_len=self.seq_len, d_model=cfg.d_model, n_layers=cfg.n_layers, n_heads=cfg.n_heads, mlp_ratio=cfg.mlp_ratio, time_dim=cfg.time_dim, reference_mode=cfg.reference_mode)
-
-    def build_tokens(self, flow: torch.Tensor, packets: torch.Tensor) -> torch.Tensor:
-        (B, T, Dp) = packets.shape
-        if T != self.cfg.T:
-            raise ValueError(f'packet T={T} but config T={self.cfg.T}')
-        if Dp != self.cfg.packet_dim:
-            raise ValueError(f'packet_dim={Dp} but config packet_dim={self.cfg.packet_dim}')
-        if flow.shape[-1] != self.cfg.flow_dim:
-            raise ValueError(f'flow_dim={flow.shape[-1]} but config flow_dim={self.cfg.flow_dim}')
-        z = packets.new_zeros((B, T + 1, self.token_dim))
-        z[:, 0, 0] = -1.0
-        z[:, 0, 1:1 + self.cfg.flow_dim] = flow
-        z[:, 1:, 0] = 1.0
-        z[:, 1:, 1:1 + self.cfg.packet_dim] = packets
-        return z
-
-    def key_padding_mask(self, lens: torch.Tensor) -> torch.Tensor:
-        B = lens.shape[0]
-        idx = torch.arange(self.cfg.T, device=lens.device)[None, :]
-        packet_real = idx < lens[:, None]
-        real = torch.cat([torch.ones(B, 1, dtype=torch.bool, device=lens.device), packet_real], dim=1)
-        return ~real
-
-    def _loss_mask(self, lens: torch.Tensor) -> torch.Tensor:
-        return (~self.key_padding_mask(lens)).float()
-
-    @staticmethod
-    def _masked_trimmed_mean(values: torch.Tensor, mask: torch.Tensor, trim_frac: float=0.1) -> torch.Tensor:
-        out = values.new_zeros(values.shape[0])
-        for i in range(values.shape[0]):
-            v = values[i][mask[i] > 0]
-            if v.numel() == 0:
-                continue
-            if v.numel() < 5:
-                out[i] = v.mean()
-                continue
-            v_sorted = torch.sort(v).values
-            lo = int(trim_frac * v_sorted.numel())
-            hi = int((1.0 - trim_frac) * v_sorted.numel())
-            if hi <= lo:
-                out[i] = v_sorted.mean()
-            else:
-                out[i] = v_sorted[lo:hi].mean()
-        return out
-
-    @staticmethod
-    def _masked_median(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
-        out = values.new_zeros(values.shape[0])
-        for i in range(values.shape[0]):
-            v = values[i][mask[i] > 0]
-            if v.numel() == 0:
-                continue
-            v_sorted = torch.sort(v).values
-            mid = v_sorted.numel() // 2
-            if v_sorted.numel() % 2:
-                out[i] = v_sorted[mid]
-            else:
-                out[i] = 0.5 * (v_sorted[mid - 1] + v_sorted[mid])
-        return out
-
-    def compute_loss(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, *, lambda_flow: float=0.0, lambda_packet: float=0.0, packet_mask_ratio: float=0.5, return_components: bool=False) -> torch.Tensor | dict[str, torch.Tensor]:
-        x1 = self.build_tokens(flow, packets)
-        B = x1.shape[0]
-        x0 = torch.randn_like(x1)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        if self.cfg.use_ot:
-            flat0 = (x0 * mask[:, :, None]).reshape(B, -1)
-            flat1 = (x1 * mask[:, :, None]).reshape(B, -1)
-            col = _sinkhorn_coupling(torch.cdist(flat0.float(), flat1.float()))
-            x1 = x1[col]
-            flow = flow[col]
-            packets = packets[col]
-            lens = lens[col]
-            mask = self._loss_mask(lens)
-            kpm = mask == 0
-        t = torch.rand(B, device=x1.device)
-        x_t = (1.0 - t[:, None, None]) * x0 + t[:, None, None] * x1
-        if self.cfg.sigma > 0:
-            std = self.cfg.sigma * torch.sqrt(t * (1.0 - t))[:, None, None]
-            x_t = x_t + std * torch.randn_like(x_t)
-        target = x1 - x0
-        pred = self.velocity(x_t, t, key_padding_mask=kpm)
-        sq = (pred - target).square().mean(dim=-1)
-        per_sample = (sq * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-        main_loss = per_sample.mean()
-        aux_flow_loss = x1.new_zeros(())
-        aux_packet_loss = x1.new_zeros(())
-        if lambda_flow > 0.0:
-            x_t_mf = x_t.clone()
-            x_t_mf[:, 0, :] = 0.0
-            pred_mf = self.velocity(x_t_mf, t, key_padding_mask=kpm)
-            err = (pred_mf[:, 0] - target[:, 0]).square().mean(dim=-1)
-            aux_flow_loss = err.mean()
-        if lambda_packet > 0.0:
-            packet_real = mask[:, 1:] > 0
-            rand_draw = torch.rand(packet_real.shape, device=x1.device)
-            mask_pkt = (rand_draw < packet_mask_ratio) & packet_real
-            pkt_mask_full = torch.cat([torch.zeros(B, 1, dtype=torch.bool, device=x1.device), mask_pkt], dim=1)
-            x_t_mp = x_t.clone()
-            x_t_mp[pkt_mask_full] = 0.0
-            pred_mp = self.velocity(x_t_mp, t, key_padding_mask=kpm)
-            sq_mp = (pred_mp - target).square().mean(dim=-1)
-            mask_f = pkt_mask_full.float()
-            denom = mask_f.sum(dim=-1).clamp_min(1.0)
-            aux_packet_loss = ((sq_mp * mask_f).sum(dim=-1) / denom).mean()
-        total = main_loss + lambda_flow * aux_flow_loss + lambda_packet * aux_packet_loss
-        if return_components:
-            return {'total': total, 'main': main_loss.detach(), 'aux_flow': aux_flow_loss.detach(), 'aux_packet': aux_packet_loss.detach()}
-        return total
-
-    @torch.no_grad()
-    def velocity_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: tuple[float, ...]=(0.5, 0.75, 1.0)) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        total = torch.zeros(x.shape[0], device=x.device)
-        flow_s = torch.zeros_like(total)
-        packet_s = torch.zeros_like(total)
-        packet_count = mask[:, 1:].sum(dim=-1).clamp_min(1.0)
-        for t_val in t_eval:
-            t = torch.full((x.shape[0],), float(t_val), device=x.device)
-            v = self.velocity(x, t, key_padding_mask=kpm)
-            e = v.square().mean(dim=-1)
-            total = total + (e * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-            flow_s = flow_s + e[:, 0]
-            packet_s = packet_s + (e[:, 1:] * mask[:, 1:]).sum(dim=-1) / packet_count
-        denom = float(len(t_eval))
-        return {'velocity_total': total / denom, 'velocity_flow': flow_s / denom, 'velocity_packet': packet_s / denom}
-
-    @torch.no_grad()
-    def trajectory_metrics(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, n_steps: int=16) -> dict[str, torch.Tensor]:
-        z = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        B = z.shape[0]
-        dt = 1.0 / n_steps
-        total_arc = torch.zeros(B, device=z.device)
-        total_ke = torch.zeros(B, device=z.device)
-        flow_ke = torch.zeros(B, device=z.device)
-        packet_ke = torch.zeros(B, device=z.device)
-        total_curv = torch.zeros(B, device=z.device)
-        flow_curv = torch.zeros(B, device=z.device)
-        packet_curv = torch.zeros(B, device=z.device)
-        packet_kappa2_speed2 = torch.zeros(B, max(z.shape[1] - 1, 0), device=z.device)
-        packet_count = mask[:, 1:].sum(dim=-1).clamp_min(1.0)
-        v_prev = None
-        v_prev_norm = None
-        for k in range(n_steps):
-            t_val = 1.0 - k * dt
-            t = torch.full((B,), t_val, device=z.device)
-            v = self.velocity(z, t, key_padding_mask=kpm)
-            e = v.square().mean(dim=-1)
-            v_norm = v.square().sum(dim=-1).clamp_min(1e-12).sqrt()
-            total_ke = total_ke + (e * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0) * dt
-            flow_ke = flow_ke + e[:, 0] * dt
-            packet_ke = packet_ke + (e[:, 1:] * mask[:, 1:]).sum(dim=-1) / packet_count * dt
-            if v_prev is not None:
-                dv = v - v_prev
-                dve = dv.square().mean(dim=-1)
-                total_curv = total_curv + (dve * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-                flow_curv = flow_curv + dve[:, 0]
-                packet_curv = packet_curv + (dve[:, 1:] * mask[:, 1:]).sum(dim=-1) / packet_count
-                dv2_sum = dv[:, 1:].square().sum(dim=-1)
-                assert v_prev_norm is not None
-                v_avg = 0.5 * (v_norm[:, 1:] + v_prev_norm[:, 1:])
-                packet_kappa2_speed2 = packet_kappa2_speed2 + dv2_sum / v_avg.square().clamp_min(1e-06)
-            v_prev = v
-            v_prev_norm = v_norm
-            z_new = z - v * dt
-            dz = (z_new - z) * mask[:, :, None]
-            total_arc = total_arc + dz.reshape(B, -1).norm(dim=-1) / mask.sum(dim=-1).sqrt()
-            z = z_new
-        z_masked = z * mask[:, :, None]
-        terminal = z_masked.reshape(B, -1).norm(dim=-1) / (mask.sum(dim=-1) * self.token_dim).clamp_min(1.0).sqrt()
-        terminal_flow = z[:, 0].norm(dim=-1) / math.sqrt(self.token_dim)
-        terminal_packet = (z[:, 1:] * mask[:, 1:, None]).reshape(B, -1).norm(dim=-1) / (packet_count * self.token_dim).sqrt()
-        packet_mask = mask[:, 1:]
-        kappa2_speed2_mean = (packet_kappa2_speed2 * packet_mask).sum(dim=-1) / packet_count
-        kappa2_speed2_median = self._masked_median(packet_kappa2_speed2, packet_mask)
-        kappa2_speed2_trimmed = self._masked_trimmed_mean(packet_kappa2_speed2, packet_mask)
-        return {'terminal_norm': terminal, 'terminal_flow': terminal_flow, 'terminal_packet': terminal_packet, 'arc_length': total_arc, 'kinetic_energy': total_ke, 'kinetic_flow': flow_ke, 'kinetic_packet': packet_ke, 'curvature_total': total_curv, 'curvature_flow': flow_curv, 'curvature_packet': packet_curv, 'kappa2_speed2norm_packet_mean': kappa2_speed2_mean, 'kappa2_speed2norm_packet_median': kappa2_speed2_median, 'kappa2_speed2norm_packet_trimmed10_mean': kappa2_speed2_trimmed}
-
-    @torch.no_grad()
-    def score_profile_vt(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: tuple[float, ...]=(0.1, 0.3, 0.5, 0.7, 0.9, 1.0)) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        packet_count = mask[:, 1:].sum(dim=-1).clamp_min(1.0)
-        out: dict[str, torch.Tensor] = {}
-        for t_val in t_eval:
-            t = torch.full((x.shape[0],), float(t_val), device=x.device)
-            v = self.velocity(x, t, key_padding_mask=kpm)
-            e = v.square().mean(dim=-1)
-            tag = f't{int(round(t_val * 10)):02d}'
-            out[f'velocity_total_{tag}'] = (e * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-            out[f'velocity_flow_{tag}'] = e[:, 0]
-            out[f'velocity_packet_{tag}'] = (e[:, 1:] * mask[:, 1:]).sum(dim=-1) / packet_count
-        return out
-
-    @torch.no_grad()
-    def consistency_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: float=0.5) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        B = x.shape[0]
-        packet_count = mask[:, 1:].sum(dim=-1).clamp_min(1.0)
-        t = torch.full((B,), float(t_eval), device=x.device)
-        v_full = self.velocity(x, t, key_padding_mask=kpm)
-        x_mf = x.clone()
-        x_mf[:, 0, :] = 0.0
-        v_mf = self.velocity(x_mf, t, key_padding_mask=kpm)
-        flow_cons = (v_full[:, 0] - v_mf[:, 0]).square().mean(dim=-1)
-        x_mp = x.clone()
-        pkt_mask_full = mask[:, 1:] > 0
-        idx_pkt_mask = torch.cat([torch.zeros(B, 1, dtype=torch.bool, device=x.device), pkt_mask_full], dim=1)
-        x_mp[idx_pkt_mask] = 0.0
-        v_mp = self.velocity(x_mp, t, key_padding_mask=kpm)
-        diff = (v_full - v_mp).square().mean(dim=-1)
-        packet_cons = (diff[:, 1:] * mask[:, 1:]).sum(dim=-1) / packet_count
-        return {'flow_consistency': flow_cons, 'packet_consistency': packet_cons, 'consistency_total': flow_cons + packet_cons}
-
-    def jacobian_hutchinson(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: tuple[float, ...]=(0.5,), n_eps: int=4, generator: torch.Generator | None=None) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        B = x.shape[0]
-        packet_count = mask[:, 1:].sum(dim=-1).clamp_min(1.0)
-        total = torch.zeros(B, device=x.device)
-        flow_j = torch.zeros(B, device=x.device)
-        packet_j = torch.zeros(B, device=x.device)
-        n_draws = n_eps * len(t_eval)
-        for t_val in t_eval:
-            t_current = torch.full((B,), float(t_val), device=x.device)
-            for _ in range(n_eps):
-                x_req = x.detach().clone().requires_grad_(True)
-                v = self.velocity(x_req, t_current, key_padding_mask=kpm)
-                eps = torch.randn(v.shape, device=v.device, generator=generator)
-                (g,) = torch.autograd.grad(outputs=v, inputs=x_req, grad_outputs=eps, retain_graph=False, create_graph=False)
-                e = g.square().mean(dim=-1)
-                total = total + (e * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-                flow_j = flow_j + e[:, 0]
-                packet_j = packet_j + (e[:, 1:] * mask[:, 1:]).sum(dim=-1) / packet_count
-        return {'jacobian_total': (total / n_draws).detach(), 'jacobian_flow': (flow_j / n_draws).detach(), 'jacobian_packet': (packet_j / n_draws).detach()}
-
-    @torch.no_grad()
-    def pna_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, n_steps: int=16, flow_masked: bool=False) -> dict[str, torch.Tensor]:
-        eps_v2 = 1e-06
-        dt = 1.0 / n_steps
-        z = self.build_tokens(flow, packets)
-        if flow_masked:
-            z = z.clone()
-            z[:, 0, :] = 0.0
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        (B, L, _) = z.shape
-        pna = torch.zeros(B, L, device=z.device)
-        v_prev: torch.Tensor | None = None
-        v_norm_prev: torch.Tensor | None = None
-        for k in range(n_steps):
-            t_val = 1.0 - k * dt
-            t = torch.full((B,), t_val, device=z.device)
-            v = self.velocity(z, t, key_padding_mask=kpm)
-            v_norm = (v.square().sum(dim=-1) + 1e-12).sqrt()
-            if v_prev is not None:
-                dv2 = (v - v_prev).square().sum(dim=-1)
-                v_avg2 = (0.5 * (v_norm + v_norm_prev)).square().clamp_min(eps_v2)
-                pna = pna + dv2 / v_avg2
-            v_prev = v
-            v_norm_prev = v_norm
-            z = z - v * dt
-            if flow_masked:
-                z[:, 0, :] = 0.0
-        flow_pna = pna[:, 0]
-        packet_pna = pna[:, 1:]
-        packet_mask = mask[:, 1:]
-        packet_count = packet_mask.sum(dim=-1).clamp_min(1.0)
-        pna_median = self._masked_median(packet_pna, packet_mask)
-        pna_mean = (packet_pna * packet_mask).sum(dim=-1) / packet_count
-        masked_for_max = packet_pna.masked_fill(packet_mask == 0, float('-inf'))
-        pna_max = masked_for_max.max(dim=-1).values
-        pna_trimmed = self._masked_trimmed_mean(packet_pna, packet_mask)
-        return {'pna_packet_median': pna_median, 'pna_packet_mean': pna_mean, 'pna_packet_max': pna_max, 'pna_packet_trimmed10_mean': pna_trimmed, 'pna_flow': flow_pna}
-
-    @torch.no_grad()
-    def causal_consistency_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: float=0.5) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        (B, L, _) = x.shape
-        t = torch.full((B,), float(t_eval), device=x.device)
-        v_full = self.velocity(x, t, key_padding_mask=kpm)
-        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=x.device), diagonal=1)
-        v_causal = self.velocity(x, t, key_padding_mask=kpm, attn_mask_override=causal)
-        diff = (v_full - v_causal).square().mean(dim=-1)
-        flow_surprisal = diff[:, 0]
-        packet_diff = diff[:, 1:]
-        packet_mask = mask[:, 1:]
-        packet_count = packet_mask.sum(dim=-1).clamp_min(1.0)
-        packet_mean = (packet_diff * packet_mask).sum(dim=-1) / packet_count
-        packet_median = self._masked_median(packet_diff, packet_mask)
-        masked_for_max = packet_diff.masked_fill(packet_mask == 0, float('-inf'))
-        packet_max = masked_for_max.max(dim=-1).values
-        packet_trimmed = self._masked_trimmed_mean(packet_diff, packet_mask)
-        total = (diff * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-        return {'causal_surprisal_total': total, 'causal_surprisal_flow': flow_surprisal, 'causal_surprisal_packet_mean': packet_mean, 'causal_surprisal_packet_median': packet_median, 'causal_surprisal_packet_max': packet_max, 'causal_surprisal_packet_trimmed10_mean': packet_trimmed}
-
-    @torch.no_grad()
-    def direction_consistency_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: tuple[float, ...]=(0.2, 0.4, 0.6, 0.8, 1.0)) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        (B, L, _) = x.shape
-        t_eval = tuple(t_eval)
-        if len(t_eval) < 2:
-            raise ValueError('direction_consistency_score needs >=2 t values')
-        prev_v: torch.Tensor | None = None
-        drift = x.new_zeros(B, L)
-        n_pairs = len(t_eval) - 1
-        for t_val in t_eval:
-            t = torch.full((B,), float(t_val), device=x.device)
-            v = self.velocity(x, t, key_padding_mask=kpm)
-            if prev_v is not None:
-                num = (prev_v * v).sum(dim=-1)
-                denom = prev_v.norm(dim=-1).clamp_min(1e-08) * v.norm(dim=-1).clamp_min(1e-08)
-                cos = num / denom
-                drift = drift + (1.0 - cos)
-            prev_v = v
-        drift = drift / max(n_pairs, 1)
-        flow_drift = drift[:, 0]
-        packet_drift = drift[:, 1:]
-        packet_mask = mask[:, 1:]
-        packet_count = packet_mask.sum(dim=-1).clamp_min(1.0)
-        packet_mean = (packet_drift * packet_mask).sum(dim=-1) / packet_count
-        packet_median = self._masked_median(packet_drift, packet_mask)
-        masked_for_max = packet_drift.masked_fill(packet_mask == 0, float('-inf'))
-        packet_max = masked_for_max.max(dim=-1).values
-        packet_trimmed = self._masked_trimmed_mean(packet_drift, packet_mask)
-        total = (drift * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-        return {'direction_drift_total': total, 'direction_drift_flow': flow_drift, 'direction_drift_packet_mean': packet_mean, 'direction_drift_packet_median': packet_median, 'direction_drift_packet_max': packet_max, 'direction_drift_packet_trimmed10_mean': packet_trimmed}
-
-    def inverse_flow_nll_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, n_steps: int=16, n_eps: int=4, compute_divergence: bool=True, generator: torch.Generator | None=None) -> dict[str, torch.Tensor]:
-        z = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        (B, L, D) = z.shape
-        dt = 1.0 / n_steps
-        accum_div = torch.zeros(B, device=z.device)
-        if compute_divergence:
-            for k in range(n_steps):
-                t_val = 1.0 - k * dt
-                t = torch.full((B,), t_val, device=z.device)
-                z_req = z.detach().clone().requires_grad_(True)
-                v = self.velocity(z_req, t, key_padding_mask=kpm)
-                div_step = torch.zeros(B, device=z.device)
-                for j in range(n_eps):
-                    eps = torch.randn_like(v)
-                    eps_masked = eps * mask[:, :, None]
-                    retain = j < n_eps - 1
-                    (g,) = torch.autograd.grad(outputs=v, inputs=z_req, grad_outputs=eps_masked, retain_graph=retain, create_graph=False)
-                    div_step = div_step + (eps_masked * g).sum(dim=(1, 2))
-                div_step = div_step / float(n_eps)
-                accum_div = accum_div + div_step * dt
-                with torch.no_grad():
-                    z = (z_req - v * dt).detach()
-        else:
-            with torch.no_grad():
-                for k in range(n_steps):
-                    t_val = 1.0 - k * dt
-                    t = torch.full((B,), t_val, device=z.device)
-                    v = self.velocity(z, t, key_padding_mask=kpm)
-                    z = z - v * dt
-        with torch.no_grad():
-            z_masked = z * mask[:, :, None]
-            n_real = mask.sum(dim=-1).clamp_min(1.0)
-            x0_quadratic = z_masked.reshape(B, -1).square().sum(dim=-1) / (n_real * float(D))
-            nll_x0_only = x0_quadratic
-            nll_div_only = accum_div / (n_real * float(D))
-            nll_full = nll_x0_only + nll_div_only
-        return {'nll_x0_only': nll_x0_only.detach(), 'nll_div_only': nll_div_only.detach(), 'nll_full': nll_full.detach()}
-
-    def jacobian_spectral_score(self, flow: torch.Tensor, packets: torch.Tensor, lens: torch.Tensor, t_eval: float=0.5, n_eps: int=4, generator: torch.Generator | None=None) -> dict[str, torch.Tensor]:
-        x = self.build_tokens(flow, packets)
-        mask = self._loss_mask(lens)
-        kpm = mask == 0
-        (B, L, D) = x.shape
-        t = torch.full((B,), float(t_eval), device=x.device)
-        packet_mask = mask[:, 1:]
-        packet_count = packet_mask.sum(dim=-1).clamp_min(1.0)
-        norms_total: list[torch.Tensor] = []
-        norms_flow: list[torch.Tensor] = []
-        norms_packet: list[torch.Tensor] = []
-        for _ in range(n_eps):
-            x_req = x.detach().clone().requires_grad_(True)
-            v = self.velocity(x_req, t, key_padding_mask=kpm)
-            eps = torch.randn(v.shape, device=v.device, generator=generator)
-            (g,) = torch.autograd.grad(outputs=v, inputs=x_req, grad_outputs=eps, retain_graph=False, create_graph=False)
-            e = g.square().mean(dim=-1)
-            n_total = (e * mask).sum(dim=-1) / mask.sum(dim=-1).clamp_min(1.0)
-            n_flow = e[:, 0]
-            n_packet = (e[:, 1:] * packet_mask).sum(dim=-1) / packet_count
-            norms_total.append(n_total.detach())
-            norms_flow.append(n_flow.detach())
-            norms_packet.append(n_packet.detach())
-
-        def _spectral_summary(samples: list[torch.Tensor]) -> dict[str, torch.Tensor]:
-            stack = torch.stack(samples, dim=1)
-            mean = stack.mean(dim=1).clamp_min(1e-12)
-            mx = stack.max(dim=1).values
-            mn = stack.min(dim=1).values
-            logfro = torch.log(mean)
-            aniso = mx / mean
-            min_over_max = mn / mx.clamp_min(1e-12)
-            p = stack / stack.sum(dim=1, keepdim=True).clamp_min(1e-12)
-            entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=1)
-            eff_rank = torch.exp(entropy)
-            return {'logfro': logfro, 'anisotropy': aniso, 'min_over_max': min_over_max, 'eff_rank': eff_rank}
-        out: dict[str, torch.Tensor] = {}
-        for (tag, samples) in (('total', norms_total), ('flow', norms_flow), ('packet', norms_packet)):
-            summ = _spectral_summary(samples)
-            for (stat_name, val) in summ.items():
-                out[f'jac_{stat_name}_{tag}'] = val
-        return out
-
-    @torch.no_grad()
-    def sample(self, n: int, lens: torch.Tensor, device: torch.device, n_steps: int=50, method: str='euler') -> torch.Tensor:
-        z = torch.randn(n, self.seq_len, self.token_dim, device=device)
-        ts = torch.linspace(0.0, 1.0, n_steps + 1, device=device)
-        kpm = self.key_padding_mask(lens.to(device))
-
-        def f(t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
-            return self.velocity(x, t.expand(x.shape[0]), key_padding_mask=kpm)
-        if method == 'euler':
-            for i in range(n_steps):
-                z = z + f(ts[i], z) * (ts[i + 1] - ts[i])
-            return z
-        return odeint(f, z, ts, method=method)[-1]
-
-    def param_count(self) -> int:
-        return sum((p.numel() for p in self.parameters()))
--- a/Unified_CFM/tests/test_model_shapes.py
+++ b/Unified_CFM/tests/test_model_shapes.py
@@ -1,157 +0,0 @@
-import sys
-from pathlib import Path
-import torch
-sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
-from model import UnifiedCFMConfig, UnifiedTokenCFM
-
-def _build_model():
-    return UnifiedTokenCFM(UnifiedCFMConfig(T=4, packet_dim=3, flow_dim=5, d_model=16, n_layers=1, n_heads=4, time_dim=8))
-
-def _build_reference_model(reference_mode: str):
-    return UnifiedTokenCFM(UnifiedCFMConfig(T=4, packet_dim=3, flow_dim=5, d_model=16, n_layers=1, n_heads=4, time_dim=8, reference_mode=reference_mode))
-
-def _sample_batch(seed: int=0):
-    torch.manual_seed(seed)
-    flow = torch.randn(2, 5)
-    packets = torch.randn(2, 4, 3)
-    lens = torch.tensor([4, 2])
-    return (flow, packets, lens)
-
-def test_unified_cfm_shapes_and_scores():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch()
-    tokens = model.build_tokens(flow, packets)
-    assert tokens.shape == (2, 5, 6)
-    loss = model.compute_loss(flow, packets, lens)
-    assert loss.ndim == 0
-    assert torch.isfinite(loss)
-    traj = model.trajectory_metrics(flow, packets, lens, n_steps=2)
-    assert 'terminal_norm' in traj
-    assert traj['terminal_norm'].shape == (2,)
-    vel = model.velocity_score(flow, packets, lens)
-    assert set(vel) == {'velocity_total', 'velocity_flow', 'velocity_packet'}
-
-def test_reference_mode_independent_token_shapes_and_scores():
-    model = _build_reference_model('independent_token')
-    (flow, packets, lens) = _sample_batch(seed=9)
-    loss = model.compute_loss(flow, packets, lens)
-    assert loss.ndim == 0
-    assert torch.isfinite(loss)
-    traj = model.trajectory_metrics(flow, packets, lens, n_steps=2)
-    assert traj['terminal_norm'].shape == (2,)
-    assert torch.all(torch.isfinite(traj['curvature_packet']))
-
-def test_reference_mode_block_diagonal_shapes_and_scores():
-    model = _build_reference_model('block_diagonal')
-    (flow, packets, lens) = _sample_batch(seed=10)
-    loss = model.compute_loss(flow, packets, lens)
-    assert loss.ndim == 0
-    assert torch.isfinite(loss)
-    vel = model.velocity_score(flow, packets, lens)
-    assert set(vel) == {'velocity_total', 'velocity_flow', 'velocity_packet'}
-
-def test_trajectory_curvature_keys_and_shapes():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=1)
-    traj = model.trajectory_metrics(flow, packets, lens, n_steps=4)
-    for key in ('curvature_total', 'curvature_flow', 'curvature_packet'):
-        assert key in traj, f'missing {key}'
-        assert traj[key].shape == (2,)
-        assert torch.all(torch.isfinite(traj[key]))
-        assert torch.all(traj[key] >= 0)
-
-def test_trajectory_curvature_zero_with_one_step():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=2)
-    traj = model.trajectory_metrics(flow, packets, lens, n_steps=1)
-    for key in ('curvature_total', 'curvature_flow', 'curvature_packet'):
-        assert traj[key].abs().sum().item() == 0.0
-
-def test_speed_normalized_packet_curvature_scores():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=11)
-    traj = model.trajectory_metrics(flow, packets, lens, n_steps=4)
-    keys = ('kappa2_speed2norm_packet_mean', 'kappa2_speed2norm_packet_median', 'kappa2_speed2norm_packet_trimmed10_mean')
-    for key in keys:
-        assert key in traj, f'missing {key}'
-        assert traj[key].shape == (2,)
-        assert torch.all(torch.isfinite(traj[key]))
-        assert torch.all(traj[key] >= 0)
-    one_step = model.trajectory_metrics(flow, packets, lens, n_steps=1)
-    for key in keys:
-        assert one_step[key].abs().sum().item() == 0.0
-
-def test_score_profile_vt_shapes():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=3)
-    t_eval = (0.1, 0.3, 0.5, 0.7, 0.9, 1.0)
-    prof = model.score_profile_vt(flow, packets, lens, t_eval=t_eval)
-    assert len(prof) == 3 * len(t_eval)
-    for (k, v) in prof.items():
-        assert v.shape == (2,), k
-        assert torch.all(torch.isfinite(v))
-        assert torch.all(v >= 0)
-    assert 'velocity_total_t05' in prof
-    assert 'velocity_flow_t10' in prof
-    assert 'velocity_packet_t01' in prof
-
-def test_compute_loss_backward_compat():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=5)
-    torch.manual_seed(0)
-    a = model.compute_loss(flow, packets, lens)
-    torch.manual_seed(0)
-    b = model.compute_loss(flow, packets, lens, lambda_flow=0.0, lambda_packet=0.0)
-    assert torch.allclose(a, b), f'λ=0 must match old loss; got {a.item()} vs {b.item()}'
-
-def test_compute_loss_aux_components_finite():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=6)
-    torch.manual_seed(7)
-    comp = model.compute_loss(flow, packets, lens, lambda_flow=0.1, lambda_packet=0.1, return_components=True)
-    assert set(comp) == {'total', 'main', 'aux_flow', 'aux_packet'}
-    for (k, v) in comp.items():
-        assert torch.isfinite(v), k
-        assert v >= 0, f'{k} negative: {v.item()}'
-
-def test_compute_loss_aux_affects_gradient():
-    model = _build_model()
-    with torch.no_grad():
-        model.velocity.out.weight.normal_(std=0.01)
-        for block in model.velocity.blocks:
-            block.cond_proj.weight.normal_(std=0.01)
-    (flow, packets, lens) = _sample_batch(seed=8)
-    torch.manual_seed(10)
-    total = model.compute_loss(flow, packets, lens, lambda_flow=1.0, lambda_packet=1.0)
-    total.backward()
-    some_grad = False
-    for p in model.parameters():
-        if p.grad is not None and p.grad.abs().sum().item() > 0:
-            some_grad = True
-            break
-    assert some_grad, 'no gradient flowed through aux losses'
-
-def test_consistency_score_shapes():
-    model = _build_model()
-    (flow, packets, lens) = _sample_batch(seed=9)
-    cs = model.consistency_score(flow, packets, lens)
-    assert set(cs) == {'flow_consistency', 'packet_consistency', 'consistency_total'}
-    for (k, v) in cs.items():
-        assert v.shape == (2,), k
-        assert torch.all(torch.isfinite(v))
-        assert torch.all(v >= 0), k
-
-def test_jacobian_hutchinson_shapes_and_nonneg():
-    model = _build_model()
-    with torch.no_grad():
-        model.velocity.out.weight.normal_(std=0.01)
-        for block in model.velocity.blocks:
-            block.cond_proj.weight.normal_(std=0.01)
-    (flow, packets, lens) = _sample_batch(seed=4)
-    gen = torch.Generator().manual_seed(42)
-    jac = model.jacobian_hutchinson(flow, packets, lens, t_eval=(0.5,), n_eps=2, generator=gen)
-    assert set(jac) == {'jacobian_total', 'jacobian_flow', 'jacobian_packet'}
-    for (k, v) in jac.items():
-        assert v.shape == (2,), k
-        assert torch.all(torch.isfinite(v))
-        assert torch.all(v >= 0), f'{k} has negative value'
--- a/Unified_CFM/train.py
+++ b/Unified_CFM/train.py
@@ -1,147 +0,0 @@
-from __future__ import annotations
-import argparse
-import json
-import time
-from dataclasses import asdict
-from pathlib import Path
-from typing import Any
-import numpy as np
-import torch
-import yaml
-from sklearn.metrics import roc_auc_score
-from torch.utils.data import DataLoader, TensorDataset
-from data import UnifiedData, load_unified_data, subsample_train
-from model import UnifiedCFMConfig, UnifiedTokenCFM
-
-def _device(dev_arg: str) -> torch.device:
-    if dev_arg == 'auto':
-        return torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-    return torch.device(dev_arg)
-
-def _batch_score(model: UnifiedTokenCFM, flow_np: np.ndarray, packet_np: np.ndarray, len_np: np.ndarray, device: torch.device, *, batch_size: int, n_steps: int) -> dict[str, np.ndarray]:
-    out: dict[str, list[np.ndarray]] = {}
-    model.eval()
-    for start in range(0, len(flow_np), batch_size):
-        sl = slice(start, start + batch_size)
-        flow = torch.from_numpy(flow_np[sl]).float().to(device)
-        packets = torch.from_numpy(packet_np[sl]).float().to(device)
-        lens = torch.from_numpy(len_np[sl]).long().to(device)
-        metrics = model.trajectory_metrics(flow, packets, lens, n_steps=n_steps)
-        vel = model.velocity_score(flow, packets, lens)
-        metrics.update(vel)
-        for (k, v) in metrics.items():
-            out.setdefault(k, []).append(v.detach().cpu().numpy())
-    return {k: np.concatenate(v, axis=0) for (k, v) in out.items()}
-
-def _quick_eval(model: UnifiedTokenCFM, data: UnifiedData, device: torch.device, cfg: dict[str, Any]) -> dict[str, float]:
-    n_eval = int(cfg.get('eval_n', 2000))
-    rng = np.random.default_rng(0)
-
-    def pick(n: int) -> np.ndarray:
-        m = min(n_eval, n)
-        return rng.choice(n, m, replace=False)
-    vi = pick(len(data.val_flow))
-    ai = pick(len(data.attack_flow))
-    v = _batch_score(model, data.val_flow[vi], data.val_packets[vi], data.val_len[vi], device, batch_size=int(cfg.get('eval_batch_size', 512)), n_steps=int(cfg.get('eval_n_steps', 8)))
-    a = _batch_score(model, data.attack_flow[ai], data.attack_packets[ai], data.attack_len[ai], device, batch_size=int(cfg.get('eval_batch_size', 512)), n_steps=int(cfg.get('eval_n_steps', 8)))
-    y = np.concatenate([np.zeros(len(vi)), np.ones(len(ai))])
-    result: dict[str, float] = {}
-    for key in sorted(v.keys()):
-        s = np.concatenate([v[key], a[key]])
-        s = np.nan_to_num(s, nan=0.0, posinf=1000000000000.0, neginf=-1000000000000.0)
-        result[f'auroc_{key}'] = float(roc_auc_score(y, s))
-    return result
-
-def train(cfg: dict[str, Any]) -> Path:
-    device = _device(str(cfg.get('device', 'auto')))
-    save_dir = Path(cfg['save_dir'])
-    save_dir.mkdir(parents=True, exist_ok=True)
-    with open(save_dir / 'config.yaml', 'w') as f:
-        yaml.safe_dump(cfg, f)
-    seed = int(cfg.get('seed', 42))
-    data_seed = int(cfg.get('data_seed', seed))
-    torch.manual_seed(seed)
-    np.random.seed(seed)
-    print(f'Device: {device}')
-    print(f'[seed] model={seed} data={data_seed}')
-    feature_columns = cfg.get('flow_feature_columns')
-    data = load_unified_data(packets_npz=Path(cfg['packets_npz']) if cfg.get('packets_npz') else None, source_store=Path(cfg['source_store']) if cfg.get('source_store') else None, flows_parquet=Path(cfg['flows_parquet']), flow_features_path=Path(cfg['flow_features_path']) if cfg.get('flow_features_path') else None, flow_feature_columns=feature_columns, flow_features_align=str(cfg.get('flow_features_align', 'auto')), T=int(cfg['T']), split_seed=data_seed, train_ratio=float(cfg.get('train_ratio', 0.8)), benign_label=str(cfg.get('benign_label', 'normal')), min_len=int(cfg.get('min_len', 2)), packet_preprocess=str(cfg.get('packet_preprocess', 'mixed_dequant')), attack_cap=int(cfg['attack_cap']) if cfg.get('attack_cap') else None, val_cap=int(cfg['val_cap']) if cfg.get('val_cap') else None)
-    print(f'[data] T={data.T} packet_D={data.packet_dim} flow_D={data.flow_dim} train={len(data.train_flow):,} val={len(data.val_flow):,} attack={len(data.attack_flow):,}')
-    (tr_f, tr_p, tr_l) = subsample_train(data, int(cfg.get('n_train', 0)), data_seed)
-    ds = TensorDataset(torch.from_numpy(tr_f).float(), torch.from_numpy(tr_p).float(), torch.from_numpy(tr_l).long())
-    loader = DataLoader(ds, batch_size=int(cfg['batch_size']), shuffle=True, drop_last=True, num_workers=int(cfg.get('num_workers', 0)), pin_memory=device.type == 'cuda')
-    print(f'[data] using {len(ds):,} benign training flows')
-    model_cfg = UnifiedCFMConfig(T=data.T, packet_dim=data.packet_dim, flow_dim=data.flow_dim, token_dim=cfg.get('token_dim'), d_model=int(cfg['d_model']), n_layers=int(cfg['n_layers']), n_heads=int(cfg['n_heads']), mlp_ratio=float(cfg.get('mlp_ratio', 4.0)), time_dim=int(cfg.get('time_dim', 64)), sigma=float(cfg.get('sigma', 0.1)), use_ot=bool(cfg.get('use_ot', False)), reference_mode=cfg.get('reference_mode'))
-    model = UnifiedTokenCFM(model_cfg).to(device)
-    print(f'[model] params={model.param_count():,} token_dim={model.token_dim} seq_len={model.seq_len} sigma={model_cfg.sigma} use_ot={model_cfg.use_ot} reference_mode={model_cfg.reference_mode}')
-    opt = torch.optim.AdamW(model.parameters(), lr=float(cfg['lr']), weight_decay=float(cfg.get('weight_decay', 0.01)))
-    total_steps = max(1, int(cfg['epochs']) * len(loader))
-    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=total_steps)
-    history: dict[str, list[Any]] = {'epoch': [], 'loss': [], 'eval': []}
-    lambda_flow = float(cfg.get('lambda_flow', 0.0))
-    lambda_packet = float(cfg.get('lambda_packet', 0.0))
-    packet_mask_ratio = float(cfg.get('packet_mask_ratio', 0.5))
-    aux_enabled = lambda_flow > 0.0 or lambda_packet > 0.0
-    if aux_enabled:
-        print(f'[loss] λ_flow={lambda_flow}  λ_packet={lambda_packet}  packet_mask_ratio={packet_mask_ratio}')
-    for epoch in range(1, int(cfg['epochs']) + 1):
-        model.train()
-        losses: list[float] = []
-        aux_flow_sum = 0.0
-        aux_packet_sum = 0.0
-        n_steps_this_epoch = 0
-        t0 = time.time()
-        for (flow, packets, lens) in loader:
-            flow = flow.to(device, non_blocking=True)
-            packets = packets.to(device, non_blocking=True)
-            lens = lens.to(device, non_blocking=True)
-            if aux_enabled:
-                comp = model.compute_loss(flow, packets, lens, lambda_flow=lambda_flow, lambda_packet=lambda_packet, packet_mask_ratio=packet_mask_ratio, return_components=True)
-                loss = comp['total']
-                aux_flow_sum += float(comp['aux_flow'].item())
-                aux_packet_sum += float(comp['aux_packet'].item())
-            else:
-                loss = model.compute_loss(flow, packets, lens)
-            opt.zero_grad(set_to_none=True)
-            loss.backward()
-            torch.nn.utils.clip_grad_norm_(model.parameters(), float(cfg.get('grad_clip', 1.0)))
-            opt.step()
-            sched.step()
-            losses.append(float(loss.item()))
-            n_steps_this_epoch += 1
-        mean_loss = float(np.mean(losses)) if losses else float('nan')
-        eval_metrics: dict[str, float] | None = None
-        if epoch % int(cfg.get('eval_every', 5)) == 0 or epoch == int(cfg['epochs']):
-            eval_metrics = _quick_eval(model, data, device, cfg)
-        history['epoch'].append(epoch)
-        history['loss'].append(mean_loss)
-        history['eval'].append(eval_metrics)
-        elapsed = time.time() - t0
-        terminal = ''
-        if eval_metrics:
-            terminal = f" auroc_terminal={eval_metrics['auroc_terminal_norm']:.3f}"
-        if aux_enabled and n_steps_this_epoch:
-            terminal += f' aux_flow={aux_flow_sum / n_steps_this_epoch:.4f} aux_pkt={aux_packet_sum / n_steps_this_epoch:.4f}'
-        print(f"[epoch {epoch:>3d}/{cfg['epochs']:<3d}] ({elapsed:.1f}s) loss={mean_loss:.4f}{terminal}")
-        if not np.isfinite(mean_loss):
-            raise RuntimeError(f'non-finite loss at epoch {epoch}')
-    payload = {'model_state_dict': model.state_dict(), 'model_cfg': asdict(model_cfg), 'packet_mean': data.packet_mean, 'packet_std': data.packet_std, 'flow_mean': data.flow_mean, 'flow_std': data.flow_std, 'packet_preprocess': data.packet_preprocess, 'flow_feature_names': np.asarray(data.flow_feature_names), 'packet_feature_names': np.asarray(data.packet_feature_names)}
-    torch.save(payload, save_dir / 'model.pt')
-    with open(save_dir / 'history.json', 'w') as f:
-        json.dump(history, f, indent=2, default=str)
-    print(f"[saved] {save_dir / 'model.pt'}")
-    return save_dir
-
-def main() -> None:
-    p = argparse.ArgumentParser(description=__doc__)
-    p.add_argument('--config', type=Path, required=True)
-    p.add_argument('--override', type=str, nargs='*', default=[])
-    args = p.parse_args()
-    with open(args.config) as f:
-        cfg = yaml.safe_load(f)
-    for override in args.override:
-        (key, value) = override.split('=', 1)
-        cfg[key] = yaml.safe_load(value)
-    train(cfg)
-if __name__ == '__main__':
-    main()
--- a/scripts/aggregate/baselines_cross_3x3_table.py
+++ b/scripts/aggregate/baselines_cross_3x3_table.py
@@ -0,0 +1,121 @@
+"""Aggregate IF/OCSVM 3x3 cross-dataset AUROC matrices (3-seed mean ± std).
+
+Reads NPZs produced by scripts/baselines/run_if_ocsvm_cross.py:
+  {method}_{src}_to_{tgt}_seed{S}.npz  with keys b_score, a_score, a_labels
+
+Writes one Markdown table per method.
+"""
+from __future__ import annotations
+import argparse
+from pathlib import Path
+import numpy as np
+from sklearn.metrics import roc_auc_score
+
+REPO = Path(__file__).resolve().parents[2]
+
+DATASETS = ["cicids2017", "cicddos2019", "ciciot2023"]
+SEEDS = [42, 43, 44]
+DEFAULT_METHODS = ["iforest", "ocsvm"]
+TITLE_NAMES = {
+    "iforest": "Isolation Forest",
+    "ocsvm": "OCSVM (RBF)",
+    "shafir_nf": "Shafir NF (single-flow, 20-d, fast)",
+}
+SHORT = {"cicids2017": "CICIDS17", "cicddos2019": "CICDDoS19", "ciciot2023": "CICIoT23"}
+
+
+def cell_auroc(npz_path: Path) -> tuple[float, int, int]:
+    z = np.load(npz_path, allow_pickle=True)
+    b = z["b_score"]
+    a = z["a_score"]
+    y = np.r_[np.zeros(len(b)), np.ones(len(a))]
+    s = np.r_[b, a]
+    s = np.nan_to_num(s, nan=0.0, posinf=1e12, neginf=-1e12)
+    return float(roc_auc_score(y, s)), len(b), len(a)
+
+
+def build_method_table(method: str, in_dir: Path) -> tuple[str, list[str]]:
+    cells = {}
+    counts = {}
+    missing = []
+    for src in DATASETS:
+        for tgt in DATASETS:
+            aucs = []
+            n_b = n_a = None
+            for s in SEEDS:
+                p = in_dir / f"{method}_{src}_to_{tgt}_seed{s}.npz"
+                if not p.exists():
+                    missing.append(p.name)
+                    continue
+                auc, n_b, n_a = cell_auroc(p)
+                aucs.append(auc)
+            if not aucs:
+                cells[(src, tgt)] = (float("nan"), float("nan"))
+            else:
+                a = np.asarray(aucs)
+                cells[(src, tgt)] = (a.mean(), a.std())
+            counts[(src, tgt)] = (n_b, n_a)
+
+    lines: list[str] = []
+    title_name = TITLE_NAMES.get(method, method)
+    lines.append(f"# 3×3 cross-dataset AUROC matrix — {title_name} (3-seed mean ± std)\n")
+    lines.append("Rows = source (10K benign training); columns = target (10K benign + balanced ≤1M attacks).")
+    lines.append("Trained on raw 20-d canonical flow features after `StandardScaler` fit on source benign train.")
+    lines.append("Diagonal italic = within-dataset (target benign sampled from rows disjoint from training).\n")
+
+    header = "| Source ↓ / Target → | " + " | ".join(SHORT[t] for t in DATASETS) + " |"
+    sep = "|" + "|".join(["---"] * (len(DATASETS) + 1)) + "|"
+    lines.append(header)
+    lines.append(sep)
+    for src in DATASETS:
+        row = [f"**{SHORT[src]}**"]
+        for tgt in DATASETS:
+            m, sd = cells[(src, tgt)]
+            cell = f"{m:.4f} ± {sd:.4f}"
+            if src == tgt:
+                cell = f"_{cell}_"
+            row.append(cell)
+        lines.append("| " + " | ".join(row) + " |")
+
+    lines.append("\n## Sample counts (target benign / target attacks)\n")
+    lines.append(header)
+    lines.append(sep)
+    for src in DATASETS:
+        row = [SHORT[src]]
+        for tgt in DATASETS:
+            n_b, n_a = counts[(src, tgt)]
+            row.append(f"{n_b}b / {n_a}a" if n_b is not None else "missing")
+        lines.append("| " + " | ".join(row) + " |")
+    return "\n".join(lines) + "\n", missing
+
+
+def main() -> None:
+    p = argparse.ArgumentParser()
+    p.add_argument("--in-dir", type=Path,
+                   default=REPO / "artifacts/baselines/if_ocsvm_cross_2026_05_11")
+    p.add_argument("--out-md", type=Path,
+                   default=None,
+                   help="Combined markdown output path. Defaults to <in-dir>/CROSS_MATRIX_3x3.md")
+    p.add_argument("--methods", nargs="+", default=DEFAULT_METHODS,
+                   help="Method names to aggregate (matching NPZ filename prefixes).")
+    args = p.parse_args()
+
+    out_md = args.out_md or (args.in_dir / "CROSS_MATRIX_3x3.md")
+    parts = []
+    all_missing: list[str] = []
+    for method in args.methods:
+        block, missing = build_method_table(method, args.in_dir)
+        parts.append(block)
+        all_missing.extend(missing)
+        print(block)
+        print()
+    if all_missing:
+        print("# Missing inputs (counted as NaN cells)")
+        for m in all_missing:
+            print(f"  - {m}")
+    out_md.write_text("\n\n".join(parts))
+    print(f"[wrote] {out_md}")
+
+
+if __name__ == "__main__":
+    main()
--- a/scripts/aggregate/run_all_phase1.sh
+++ b/scripts/aggregate/run_all_phase1.sh
@@ -1,68 +0,0 @@
-#!/bin/bash
-# Run phase1 eval on all routes after trainings complete.
-# Splits across 2 GPUs in parallel chains.
-
-set -e
-ROOT=/home/chy/JANUS
-UNIFIED_EVAL=${ROOT}/artifacts/verify_2026_04_24/eval_phase1_unified.py
-MIXED_EVAL=${ROOT}/Mixed_CFM/eval_phase1.py
-
-cd ${ROOT}
-
-# GPU 0: baselines + route_a (6 models)
-{
-for prefix in baseline_ciciot2023 route_a_causal_ciciot2023; do
-  for seed in 42 43 44; do
-    name=${prefix}_seed${seed}
-    md=${ROOT}/artifacts/route_comparison/${name}
-    [ -f "${md}/model.pt" ] || continue
-    [ -f "${md}/phase1_summary.json" ] && continue
-    echo "[GPU0 eval] ${name}"
-    cd ${ROOT}/Unified_CFM
-    CUDA_VISIBLE_DEVICES=0 stdbuf -oL uv run --no-sync python -u ${UNIFIED_EVAL} \
-      --model-dir ${md} --out-dir ${md} \
-      --batch-size 256 --n-steps 16 --jacobian-n-eps 4 \
-      --n-val-cap 5000 --n-atk-cap 10000 \
-      > ${md}/phase1.log 2>&1
-  done
-done
-echo "[GPU0 done]"
-} &
-GPU0_PID=$!
-
-# GPU 1: route_b + route_c (6 models)
-{
-for seed in 42 43 44; do
-  name=route_b_spectral_ciciot2023_seed${seed}
-  md=${ROOT}/artifacts/route_comparison/${name}
-  [ -f "${md}/model.pt" ] || continue
-  [ -f "${md}/phase1_summary.json" ] && continue
-  echo "[GPU1 eval] ${name}"
-  cd ${ROOT}/Unified_CFM
-  CUDA_VISIBLE_DEVICES=1 stdbuf -oL uv run --no-sync python -u ${UNIFIED_EVAL} \
-    --model-dir ${md} --out-dir ${md} \
-    --batch-size 256 --n-steps 16 --jacobian-n-eps 4 \
-    --n-val-cap 5000 --n-atk-cap 10000 \
-    > ${md}/phase1.log 2>&1
-done
-for seed in 42 43 44; do
-  name=route_c_mixed_ciciot2023_seed${seed}
-  md=${ROOT}/artifacts/route_comparison/${name}
-  [ -f "${md}/model.pt" ] || continue
-  [ -f "${md}/phase1_summary.json" ] && continue
-  echo "[GPU1 eval] ${name}"
-  cd ${ROOT}/Mixed_CFM
-  CUDA_VISIBLE_DEVICES=1 stdbuf -oL uv run --no-sync python -u ${MIXED_EVAL} \
-    --model-dir ${md} --out-dir ${md} \
-    --batch-size 256 --n-steps 16 \
-    --n-val-cap 5000 --n-atk-cap 10000 \
-    > ${md}/phase1.log 2>&1
-done
-echo "[GPU1 done]"
-} &
-GPU1_PID=$!
-
-wait $GPU0_PID
-wait $GPU1_PID
-echo "[all phase1 done]"
-cd ${ROOT} && uv run --no-sync python artifacts/route_comparison/aggregate_results.py
--- a/scripts/aggregate/run_cross_all.sh
+++ b/scripts/aggregate/run_cross_all.sh
@@ -1,105 +0,0 @@
-#!/bin/bash
-# Cross-dataset eval for all 4 routes × 2 targets × 3 seeds = 24 runs.
-# Source: CICIoT2023 (where all models were trained).
-# Targets: CICIDS2017 + CICDDoS2019.
-
-set -e
-ROOT=/home/chy/JANUS
-UNIFIED_EVAL=${ROOT}/artifacts/verify_2026_04_24/eval_phase2_cross_cicddos2019.py
-MIXED_EVAL=${ROOT}/Mixed_CFM/eval_cross.py
-CROSS_DIR=${ROOT}/artifacts/route_comparison/cross
-mkdir -p ${CROSS_DIR}
-
-# Target dataset paths
-declare -A TARGETS
-TARGETS[cicids2017_store]=${ROOT}/datasets/cicids2017/processed/full_store
-TARGETS[cicids2017_flows]=${ROOT}/datasets/cicids2017/processed/flows.parquet
-TARGETS[cicids2017_features]=${ROOT}/datasets/cicids2017/processed/flow_features.parquet
-TARGETS[cicids2017_features_spectral]=${ROOT}/datasets/cicids2017/processed/flow_features_spectral.parquet
-
-TARGETS[cicddos2019_store]=${ROOT}/datasets/cicddos2019/processed/full_store
-TARGETS[cicddos2019_flows]=${ROOT}/datasets/cicddos2019/processed/flows.parquet
-TARGETS[cicddos2019_features]=${ROOT}/datasets/cicddos2019/processed/flow_features.parquet
-TARGETS[cicddos2019_features_spectral]=${ROOT}/datasets/cicddos2019/processed/flow_features_spectral.parquet
-
-run_unified_eval() {
-  local gpu=$1 model_dir=$2 target=$3 features=$4 out_name=$5
-  local out=${CROSS_DIR}/${out_name}.json
-  [ -f "${out}" ] && { echo "[skip] ${out_name}"; return; }
-  echo "[gpu${gpu} eval] ${out_name}"
-  cd ${ROOT}/Unified_CFM
-  CUDA_VISIBLE_DEVICES=${gpu} stdbuf -oL uv run --no-sync python -u ${UNIFIED_EVAL} \
-    --model-dir ${model_dir} \
-    --target-store ${TARGETS[${target}_store]} \
-    --target-flows ${TARGETS[${target}_flows]} \
-    --target-flow-features ${features} \
-    --out ${out} \
-    --n-benign 10000 --n-attack 10000 --seed 42 \
-    --T 64 --batch-size 256 --n-steps 16 \
-    > ${CROSS_DIR}/${out_name}.log 2>&1
-}
-
-run_mixed_eval() {
-  local gpu=$1 model_dir=$2 target=$3 out_name=$4
-  local out=${CROSS_DIR}/${out_name}.json
-  [ -f "${out}" ] && { echo "[skip] ${out_name}"; return; }
-  echo "[gpu${gpu} mixed eval] ${out_name}"
-  cd ${ROOT}/Mixed_CFM
-  CUDA_VISIBLE_DEVICES=${gpu} stdbuf -oL uv run --no-sync python -u ${MIXED_EVAL} \
-    --model-dir ${model_dir} \
-    --target-store ${TARGETS[${target}_store]} \
-    --target-flows ${TARGETS[${target}_flows]} \
-    --target-flow-features ${TARGETS[${target}_features]} \
-    --out ${out} \
-    --n-benign 10000 --n-attack 10000 --seed 42 \
-    --T 64 --batch-size 256 --n-steps 16 \
-    > ${CROSS_DIR}/${out_name}.log 2>&1
-}
-
-# === GPU 0 chain: baselines + route_a, both targets ===
-{
-for prefix_route in "baseline_ciciot2023:baseline" "route_a_causal_ciciot2023:route_a_causal"; do
-  prefix=${prefix_route%:*}
-  short=${prefix_route#*:}
-  for seed in 42 43 44; do
-    md=${ROOT}/artifacts/route_comparison/${prefix}_seed${seed}
-    [ -f "${md}/model.pt" ] || continue
-    for target in cicids2017 cicddos2019; do
-      run_unified_eval 0 "${md}" "${target}" "${TARGETS[${target}_features]}" \
-        "${short}_seed${seed}_to_${target}"
-    done
-  done
-done
-echo "[gpu0 cross chain done]"
-} > /tmp/cross_gpu0.log 2>&1 &
-GPU0=$!
-
-# === GPU 1 chain: route_b (uses spectral features) + route_c (mixed) ===
-{
-# route_b: must use flow_features_spectral.parquet
-for seed in 42 43 44; do
-  md=${ROOT}/artifacts/route_comparison/route_b_spectral_ciciot2023_seed${seed}
-  [ -f "${md}/model.pt" ] || continue
-  for target in cicids2017 cicddos2019; do
-    run_unified_eval 1 "${md}" "${target}" "${TARGETS[${target}_features_spectral]}" \
-      "route_b_spectral_seed${seed}_to_${target}"
-  done
-done
-
-# route_c: Mixed_CFM eval (uses canonical flow_features)
-for seed in 42 43 44; do
-  md=${ROOT}/artifacts/route_comparison/route_c_mixed_ciciot2023_seed${seed}
-  [ -f "${md}/model.pt" ] || continue
-  for target in cicids2017 cicddos2019; do
-    run_mixed_eval 1 "${md}" "${target}" \
-      "route_c_mixed_seed${seed}_to_${target}"
-  done
-done
-echo "[gpu1 cross chain done]"
-} > /tmp/cross_gpu1.log 2>&1 &
-GPU1=$!
-
-wait $GPU0
-wait $GPU1
-echo "[all cross done]"
-ls -la ${CROSS_DIR}/*.json | wc -l
--- a/scripts/aggregate/run_phase1_all.sh
+++ b/scripts/aggregate/run_phase1_all.sh
@@ -1,45 +0,0 @@
-#!/bin/bash
-# Run phase1 eval on all route_comparison models.
-# Output: <model_dir>/phase1_summary.json + phase1_scores.npz
-#
-# Usage:
-#   bash artifacts/route_comparison/run_phase1_all.sh [GPU_ID]
-#
-# Default GPU_ID = 0. Each eval takes ~3-5 min with the caps below.
-
-set -e
-GPU_ID="${1:-0}"
-ROOT=/home/chy/JANUS
-EVAL=${ROOT}/artifacts/verify_2026_04_24/eval_phase1_unified.py
-
-models=(
-  baseline_ciciot2023_seed42
-  baseline_ciciot2023_seed43
-  baseline_ciciot2023_seed44
-  route_a_causal_ciciot2023_seed42
-  route_a_causal_ciciot2023_seed43
-  route_a_causal_ciciot2023_seed44
-)
-
-cd ${ROOT}/Unified_CFM
-for name in "${models[@]}"; do
-  model_dir=${ROOT}/artifacts/route_comparison/${name}
-  if [ ! -f "${model_dir}/model.pt" ]; then
-    echo "[skip] ${name}: model.pt missing"
-    continue
-  fi
-  out_dir=${model_dir}
-  if [ -f "${out_dir}/phase1_summary.json" ]; then
-    echo "[skip] ${name}: phase1_summary.json exists"
-    continue
-  fi
-  echo "[eval] ${name}"
-  CUDA_VISIBLE_DEVICES=${GPU_ID} stdbuf -oL uv run --no-sync python -u ${EVAL} \
-    --model-dir ${model_dir} --out-dir ${out_dir} \
-    --batch-size 256 --n-steps 16 \
-    --jacobian-n-eps 4 \
-    --n-val-cap 5000 --n-atk-cap 10000 \
-    2>&1 | tee ${model_dir}/phase1.log | tail -5
-  echo "[done] ${name}"
-done
-echo "[all done]"
--- a/scripts/baselines/run_if_ocsvm_cross.py
+++ b/scripts/baselines/run_if_ocsvm_cross.py
@@ -0,0 +1,237 @@
+"""Cross-dataset baselines (Isolation Forest, OCSVM) on the 20-d canonical
+flow-feature contract.
+
+Protocol per (method, src, tgt, seed):
+  - Train: 10,000 source benign rows (random sample seeded with --seed + 1000)
+  - Test:  10,000 target benign rows (random sample seeded with --seed)
+         + balanced per-class attack sample with n_attack cap (--n-attack
+           default 1,000,000, divided across all attack classes, matching
+           Mixed_CFM/eval_cross.py)
+  - For diagonal src == tgt, target benign is sampled from the source-pool
+    complement (the rows not used for training) so train and test are disjoint.
+
+Outputs (in --out-dir):
+  {method}_{src}_to_{tgt}_seed{seed}.npz  -- b_score, a_score, a_labels
+  {method}_{src}_to_{tgt}_seed{seed}.json -- AUROC, AUPRC, sample counts, timing
+"""
+from __future__ import annotations
+import argparse
+import json
+import time
+from pathlib import Path
+
+import numpy as np
+import pandas as pd
+from sklearn.ensemble import IsolationForest
+from sklearn.metrics import average_precision_score, roc_auc_score
+from sklearn.preprocessing import StandardScaler
+from sklearn.svm import OneClassSVM
+
+REPO = Path(__file__).resolve().parents[2]
+
+DATASETS = {
+    "cicids2017": {
+        "flows": REPO / "datasets/cicids2017/processed/flows.parquet",
+        "flow_features": REPO / "datasets/cicids2017/processed/flow_features.parquet",
+    },
+    "cicddos2019": {
+        "flows": REPO / "datasets/cicddos2019/processed/flows.parquet",
+        "flow_features": REPO / "datasets/cicddos2019/processed/flow_features.parquet",
+    },
+    "ciciot2023": {
+        "flows": REPO / "datasets/ciciot2023/processed/full_store/flows.parquet",
+        "flow_features": REPO / "datasets/ciciot2023/processed/flow_features.parquet",
+    },
+}
+
+FEATURE_COLS = (
+    "log_duration", "log_n_pkts", "fwd_count", "bwd_count",
+    "pkt_size_mean", "pkt_size_std", "pkt_size_max",
+    "fwd_size_mean", "bwd_size_mean", "bwd_size_std",
+    "iat_mean", "fwd_iat_max", "bwd_iat_max", "bwd_iat_std",
+    "active_mean", "idle_mean",
+    "log_pkts_per_s", "log_total_bytes",
+    "ack_cnt", "syn_cnt",
+)
+
+
+def _load_dataset(name: str):
+    paths = DATASETS[name]
+    flows = pd.read_parquet(paths["flows"], columns=["flow_id", "label"])
+    ff = pd.read_parquet(paths["flow_features"])
+    if not np.array_equal(
+        flows["flow_id"].to_numpy(dtype=np.uint64),
+        ff["flow_id"].to_numpy(dtype=np.uint64),
+    ):
+        raise ValueError(f"{name}: flows.parquet and flow_features.parquet are not row-aligned")
+    X = ff[list(FEATURE_COLS)].to_numpy(dtype=np.float64)
+    X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
+    labels = flows["label"].astype(str).to_numpy()
+    return X, labels
+
+
+def _balanced_attack_sample(labels: np.ndarray, n_attack: int, rng: np.random.Generator) -> np.ndarray:
+    attack_idx = np.flatnonzero(labels != "normal")
+    atk_labels = labels[attack_idx]
+    classes = sorted(set(atk_labels))
+    per_class = max(1, n_attack // len(classes))
+    chunks = []
+    for cls in classes:
+        pool = attack_idx[atk_labels == cls]
+        k = min(per_class, len(pool))
+        if k:
+            chunks.append(rng.choice(pool, size=k, replace=False))
+    sel = np.sort(np.concatenate(chunks))
+    if len(sel) > n_attack:
+        sel = np.sort(rng.choice(sel, size=n_attack, replace=False))
+    return sel
+
+
+def main() -> None:
+    p = argparse.ArgumentParser()
+    p.add_argument("--method", choices=["iforest", "ocsvm"], required=True)
+    p.add_argument("--src", choices=list(DATASETS), required=True)
+    p.add_argument("--tgt", choices=list(DATASETS), required=True)
+    p.add_argument("--seed", type=int, required=True)
+    p.add_argument("--out-dir", type=Path, required=True)
+    p.add_argument("--n-train", type=int, default=10000)
+    p.add_argument("--n-benign", type=int, default=10000)
+    p.add_argument("--n-attack", type=int, default=1_000_000,
+                   help="Per-class balanced cap (matches Mixed_CFM/eval_cross.py).")
+    # Method hyperparams
+    p.add_argument("--iforest-n-estimators", type=int, default=200)
+    p.add_argument("--ocsvm-nu", type=float, default=0.1)
+    p.add_argument("--ocsvm-gamma", type=str, default="scale")
+    p.add_argument("--ocsvm-cache-mb", type=int, default=2000)
+    args = p.parse_args()
+
+    args.out_dir.mkdir(parents=True, exist_ok=True)
+    tag = f"{args.method}_{args.src}_to_{args.tgt}_seed{args.seed}"
+    print(f"[run] {tag}")
+
+    # --- source training ---
+    t0 = time.time()
+    src_X, src_labels = _load_dataset(args.src)
+    src_benign_idx = np.flatnonzero(src_labels == "normal")
+    rng_train = np.random.default_rng(args.seed + 1000)
+    if len(src_benign_idx) < args.n_train:
+        raise RuntimeError(f"{args.src}: only {len(src_benign_idx)} benign rows < n_train={args.n_train}")
+    train_sel = np.sort(rng_train.choice(src_benign_idx, size=args.n_train, replace=False))
+    train_X = src_X[train_sel]
+    t_load_src = time.time() - t0
+
+    # --- target eval ---
+    t0 = time.time()
+    if args.tgt == args.src:
+        tgt_X, tgt_labels = src_X, src_labels
+        used_for_train = np.zeros(len(tgt_labels), dtype=bool)
+        used_for_train[train_sel] = True
+        eligible_benign = np.flatnonzero((tgt_labels == "normal") & ~used_for_train)
+    else:
+        tgt_X, tgt_labels = _load_dataset(args.tgt)
+        eligible_benign = np.flatnonzero(tgt_labels == "normal")
+    rng_eval = np.random.default_rng(args.seed)
+    n_benign = min(args.n_benign, len(eligible_benign))
+    if n_benign < args.n_benign:
+        print(f"[warn] only {len(eligible_benign)} eligible benign rows in target (asked {args.n_benign})")
+    b_sel = np.sort(rng_eval.choice(eligible_benign, size=n_benign, replace=False))
+    a_sel = _balanced_attack_sample(tgt_labels, args.n_attack, rng_eval)
+    val_X = tgt_X[b_sel]
+    atk_X = tgt_X[a_sel]
+    a_labels = tgt_labels[a_sel]
+    t_load_tgt = time.time() - t0
+    print(f"[data] train={len(train_X):,}  val={len(val_X):,}  attack={len(atk_X):,}"
+          f"  classes={len(set(a_labels))}  D={train_X.shape[1]}")
+
+    # --- standardize on source train ---
+    scaler = StandardScaler().fit(train_X)
+    train_Z = scaler.transform(train_X).astype(np.float32)
+    val_Z = scaler.transform(val_X).astype(np.float32)
+    atk_Z = scaler.transform(atk_X).astype(np.float32)
+
+    # --- fit ---
+    t0 = time.time()
+    if args.method == "iforest":
+        model = IsolationForest(
+            n_estimators=args.iforest_n_estimators,
+            random_state=args.seed,
+            n_jobs=-1,
+            contamination="auto",
+        )
+        model.fit(train_Z)
+    else:
+        model = OneClassSVM(
+            kernel="rbf",
+            nu=args.ocsvm_nu,
+            gamma=args.ocsvm_gamma,
+            cache_size=args.ocsvm_cache_mb,
+        )
+        model.fit(train_Z)
+    t_fit = time.time() - t0
+
+    # --- score: higher = more anomalous ---
+    # IsolationForest.score_samples returns higher-for-normal, so negate.
+    # OneClassSVM.score_samples returns signed distance to boundary
+    # (higher = more normal), so negate too.
+    t0 = time.time()
+    if args.method == "iforest":
+        b_score = (-model.score_samples(val_Z)).astype(np.float32)
+        a_score = (-model.score_samples(atk_Z)).astype(np.float32)
+    else:
+        b_score = (-model.decision_function(val_Z)).astype(np.float32)
+        a_score = (-model.decision_function(atk_Z)).astype(np.float32)
+    t_score = time.time() - t0
+
+    # --- metrics ---
+    y = np.r_[np.zeros(len(b_score)), np.ones(len(a_score))]
+    s = np.r_[b_score, a_score]
+    s = np.nan_to_num(s, nan=0.0, posinf=1e12, neginf=-1e12)
+    auroc = float(roc_auc_score(y, s))
+    auprc = float(average_precision_score(y, s))
+
+    per_class = {}
+    for cls in sorted(set(a_labels)):
+        m = a_labels == cls
+        y_c = np.r_[np.zeros(len(b_score)), np.ones(int(m.sum()))]
+        s_c = np.r_[b_score, a_score[m]]
+        s_c = np.nan_to_num(s_c, nan=0.0, posinf=1e12, neginf=-1e12)
+        try:
+            auc_c = float(roc_auc_score(y_c, s_c))
+        except ValueError:
+            auc_c = float("nan")
+        per_class[cls] = {"_n": int(m.sum()), "auroc": auc_c}
+
+    out = {
+        "method": args.method,
+        "src": args.src,
+        "tgt": args.tgt,
+        "seed": args.seed,
+        "n_train": int(len(train_X)),
+        "n_benign": int(len(val_X)),
+        "n_attack": int(len(atk_X)),
+        "n_attack_classes": int(len(set(a_labels))),
+        "t_load_src_sec": round(t_load_src, 2),
+        "t_load_tgt_sec": round(t_load_tgt, 2),
+        "t_fit_sec": round(t_fit, 2),
+        "t_score_sec": round(t_score, 2),
+        "overall": {"auroc": auroc, "auprc": auprc},
+        "per_class": per_class,
+    }
+    if args.method == "iforest":
+        out["hparams"] = {"n_estimators": args.iforest_n_estimators}
+    else:
+        out["hparams"] = {"nu": args.ocsvm_nu, "gamma": args.ocsvm_gamma}
+
+    json_path = args.out_dir / f"{tag}.json"
+    json_path.write_text(json.dumps(out, indent=2))
+    npz_path = args.out_dir / f"{tag}.npz"
+    np.savez_compressed(npz_path, b_score=b_score, a_score=a_score, a_labels=a_labels.astype(str))
+    print(f"[saved] {json_path}")
+    print(f"[saved] {npz_path}")
+    print(f"[result] {args.method:7s} {args.src} -> {args.tgt} seed={args.seed}  "
+          f"AUROC={auroc:.4f}  AUPRC={auprc:.4f}  "
+          f"fit={t_fit:.1f}s  score={t_score:.1f}s")
+
+
+if __name__ == "__main__":
+    main()
--- a/scripts/baselines/run_if_ocsvm_cross_all.sh
+++ b/scripts/baselines/run_if_ocsvm_cross_all.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+# Orchestrate the full 3x3 cross-dataset sweep for IF/OCSVM baselines.
+# 3 sources x 3 targets x 3 seeds x 2 methods = 54 runs.
+set -euo pipefail
+
+REPO="/home/chy/JANUS"
+cd "$REPO"
+
+OUT_DIR="${1:-$REPO/artifacts/baselines/if_ocsvm_cross_2026_05_11}"
+mkdir -p "$OUT_DIR"
+LOG_DIR="$OUT_DIR/logs"
+mkdir -p "$LOG_DIR"
+
+DATASETS=(cicids2017 cicddos2019 ciciot2023)
+SEEDS=(42 43 44)
+METHODS=(iforest ocsvm)
+
+START=$(date +%s)
+for method in "${METHODS[@]}"; do
+  for src in "${DATASETS[@]}"; do
+    for tgt in "${DATASETS[@]}"; do
+      for seed in "${SEEDS[@]}"; do
+        tag="${method}_${src}_to_${tgt}_seed${seed}"
+        if [[ -f "$OUT_DIR/${tag}.json" ]]; then
+          echo "[skip] $tag (json exists)"
+          continue
+        fi
+        echo "[start] $tag"
+        uv run --no-sync python scripts/baselines/run_if_ocsvm_cross.py \
+          --method "$method" --src "$src" --tgt "$tgt" --seed "$seed" \
+          --out-dir "$OUT_DIR" \
+          > "$LOG_DIR/${tag}.log" 2>&1
+        echo "[done]  $tag  ($(grep -F '[result]' "$LOG_DIR/${tag}.log" | tail -1))"
+      done
+    done
+  done
+done
+END=$(date +%s)
+echo "[all done] elapsed $((END - START))s"
--- a/scripts/baselines/run_if_ocsvm_cross_packets.py
+++ b/scripts/baselines/run_if_ocsvm_cross_packets.py
@@ -0,0 +1,233 @@
+"""Path-B: IF/OCSVM cross-dataset baselines on RAW PACKET SEQUENCES.
+
+Same protocol as run_if_ocsvm_cross.py, but the input feature vector is the
+flattened first T=64 packet tokens (9-d each) -> 576-d. No flow-stat
+aggregation — this is the input modality JANUS itself consumes, so it
+measures what classical AD can do without hand-engineered features.
+
+Outputs:
+  {method}_{src}_to_{tgt}_seed{seed}.{json,npz}
+"""
+from __future__ import annotations
+import argparse
+import json
+import sys
+import time
+from pathlib import Path
+
+import numpy as np
+import pandas as pd
+from sklearn.ensemble import IsolationForest
+from sklearn.metrics import average_precision_score, roc_auc_score
+from sklearn.preprocessing import StandardScaler
+from sklearn.svm import OneClassSVM
+
+REPO = Path(__file__).resolve().parents[2]
+sys.path.insert(0, str(REPO))
+from common.packet_store import PacketShardStore  # noqa: E402
+
+DATASETS = {
+    "cicids2017": {
+        "flows": REPO / "datasets/cicids2017/processed/flows.parquet",
+        "packets_npz": REPO / "datasets/cicids2017/processed/packets.npz",
+        "source_store": None,
+    },
+    "cicddos2019": {
+        "flows": REPO / "datasets/cicddos2019/processed/flows.parquet",
+        "packets_npz": None,
+        "source_store": REPO / "datasets/cicddos2019/processed/full_store",
+    },
+    "ciciot2023": {
+        "flows": REPO / "datasets/ciciot2023/processed/full_store/flows.parquet",
+        "packets_npz": None,
+        "source_store": REPO / "datasets/ciciot2023/processed/full_store",
+    },
+}
+
+
+def _load_labels(name: str) -> np.ndarray:
+    paths = DATASETS[name]
+    flows = pd.read_parquet(paths["flows"], columns=["flow_id", "label"])
+    return flows["label"].astype(str).to_numpy()
+
+
+def _materialize_packets(name: str, indices: np.ndarray, T: int) -> np.ndarray:
+    paths = DATASETS[name]
+    if paths["packets_npz"] is not None:
+        pz = np.load(paths["packets_npz"], mmap_mode="r")
+        tokens = pz["packet_tokens"]
+        if T > tokens.shape[1]:
+            raise ValueError(f"requested T={T} > stored {tokens.shape[1]}")
+        out = np.asarray(tokens[indices, :T, :]).astype(np.float32, copy=True)
+        return out
+    else:
+        store = PacketShardStore.open(paths["source_store"])
+        tok, _ = store.read_packets(indices.astype(np.int64), T=T)
+        return tok.astype(np.float32, copy=False)
+
+
+def _balanced_attack_sample(labels: np.ndarray, n_attack: int, rng: np.random.Generator) -> np.ndarray:
+    attack_idx = np.flatnonzero(labels != "normal")
+    atk_labels = labels[attack_idx]
+    classes = sorted(set(atk_labels))
+    per_class = max(1, n_attack // len(classes))
+    chunks = []
+    for cls in classes:
+        pool = attack_idx[atk_labels == cls]
+        k = min(per_class, len(pool))
+        if k:
+            chunks.append(rng.choice(pool, size=k, replace=False))
+    sel = np.sort(np.concatenate(chunks))
+    if len(sel) > n_attack:
+        sel = np.sort(rng.choice(sel, size=n_attack, replace=False))
+    return sel
+
+
+def main() -> None:
+    p = argparse.ArgumentParser()
+    p.add_argument("--method", choices=["iforest", "ocsvm"], required=True)
+    p.add_argument("--src", choices=list(DATASETS), required=True)
+    p.add_argument("--tgt", choices=list(DATASETS), required=True)
+    p.add_argument("--seed", type=int, required=True)
+    p.add_argument("--out-dir", type=Path, required=True)
+    p.add_argument("--T", type=int, default=64, help="Packets-per-flow cap (matches JANUS T=64).")
+    p.add_argument("--n-train", type=int, default=10000)
+    p.add_argument("--n-benign", type=int, default=10000)
+    p.add_argument("--n-attack", type=int, default=200000,
+                   help="Per-class balanced cap on target attacks. Smaller than the "
+                        "20-d run (1M) because 576-d OCSVM scoring is much slower.")
+    p.add_argument("--min-len", type=int, default=2)
+    # Method hyperparams
+    p.add_argument("--iforest-n-estimators", type=int, default=200)
+    p.add_argument("--ocsvm-nu", type=float, default=0.1)
+    p.add_argument("--ocsvm-gamma", type=str, default="scale")
+    p.add_argument("--ocsvm-cache-mb", type=int, default=2000)
+    args = p.parse_args()
+
+    args.out_dir.mkdir(parents=True, exist_ok=True)
+    tag = f"{args.method}_{args.src}_to_{args.tgt}_seed{args.seed}"
+    print(f"[run] {tag}  (raw {args.T}x9 packets = {args.T * 9}-d)")
+
+    # --- source training ---
+    t0 = time.time()
+    src_labels = _load_labels(args.src)
+    src_benign_idx = np.flatnonzero(src_labels == "normal")
+    rng_train = np.random.default_rng(args.seed + 1000)
+    if len(src_benign_idx) < args.n_train:
+        raise RuntimeError(f"{args.src}: only {len(src_benign_idx)} benign rows < n_train={args.n_train}")
+    train_sel = np.sort(rng_train.choice(src_benign_idx, size=args.n_train, replace=False))
+    train_tokens = _materialize_packets(args.src, train_sel, T=args.T)
+    train_X = train_tokens.reshape(len(train_sel), -1)
+    t_load_src = time.time() - t0
+
+    # --- target eval ---
+    t0 = time.time()
+    if args.tgt == args.src:
+        tgt_labels = src_labels
+        used = np.zeros(len(tgt_labels), dtype=bool)
+        used[train_sel] = True
+        eligible_benign = np.flatnonzero((tgt_labels == "normal") & ~used)
+    else:
+        tgt_labels = _load_labels(args.tgt)
+        eligible_benign = np.flatnonzero(tgt_labels == "normal")
+    rng_eval = np.random.default_rng(args.seed)
+    n_benign = min(args.n_benign, len(eligible_benign))
+    if n_benign < args.n_benign:
+        print(f"[warn] only {len(eligible_benign)} eligible benign rows in target (asked {args.n_benign})")
+    b_sel = np.sort(rng_eval.choice(eligible_benign, size=n_benign, replace=False))
+    a_sel = _balanced_attack_sample(tgt_labels, args.n_attack, rng_eval)
+    val_tokens = _materialize_packets(args.tgt, b_sel, T=args.T)
+    atk_tokens = _materialize_packets(args.tgt, a_sel, T=args.T)
+    val_X = val_tokens.reshape(len(b_sel), -1)
+    atk_X = atk_tokens.reshape(len(a_sel), -1)
+    a_labels = tgt_labels[a_sel]
+    t_load_tgt = time.time() - t0
+    print(f"[data] train={len(train_X):,}  val={len(val_X):,}  attack={len(atk_X):,}"
+          f"  classes={len(set(a_labels))}  D={train_X.shape[1]}")
+
+    # --- standardize ---
+    scaler = StandardScaler().fit(train_X)
+    train_Z = scaler.transform(train_X).astype(np.float32)
+    val_Z = scaler.transform(val_X).astype(np.float32)
+    atk_Z = scaler.transform(atk_X).astype(np.float32)
+
+    # --- fit ---
+    t0 = time.time()
+    if args.method == "iforest":
+        model = IsolationForest(
+            n_estimators=args.iforest_n_estimators,
+            random_state=args.seed,
+            n_jobs=-1,
+            contamination="auto",
+        )
+        model.fit(train_Z)
+    else:
+        model = OneClassSVM(
+            kernel="rbf",
+            nu=args.ocsvm_nu,
+            gamma=args.ocsvm_gamma,
+            cache_size=args.ocsvm_cache_mb,
+        )
+        model.fit(train_Z)
+    t_fit = time.time() - t0
+
+    # --- score (higher = more anomalous) ---
+    t0 = time.time()
+    if args.method == "iforest":
+        b_score = (-model.score_samples(val_Z)).astype(np.float32)
+        a_score = (-model.score_samples(atk_Z)).astype(np.float32)
+    else:
+        b_score = (-model.decision_function(val_Z)).astype(np.float32)
+        a_score = (-model.decision_function(atk_Z)).astype(np.float32)
+    t_score = time.time() - t0
+
+    # --- metrics ---
+    y = np.r_[np.zeros(len(b_score)), np.ones(len(a_score))]
+    s = np.r_[b_score, a_score]
+    s = np.nan_to_num(s, nan=0.0, posinf=1e12, neginf=-1e12)
+    auroc = float(roc_auc_score(y, s))
+    auprc = float(average_precision_score(y, s))
+
+    per_class = {}
+    for cls in sorted(set(a_labels)):
+        m = a_labels == cls
+        y_c = np.r_[np.zeros(len(b_score)), np.ones(int(m.sum()))]
+        s_c = np.r_[b_score, a_score[m]]
+        s_c = np.nan_to_num(s_c, nan=0.0, posinf=1e12, neginf=-1e12)
+        try:
+            auc_c = float(roc_auc_score(y_c, s_c))
+        except ValueError:
+            auc_c = float("nan")
+        per_class[cls] = {"_n": int(m.sum()), "auroc": auc_c}
+
+    out = {
+        "method": args.method,
+        "src": args.src,
+        "tgt": args.tgt,
+        "seed": args.seed,
+        "T": args.T,
+        "feature_dim": int(train_X.shape[1]),
+        "input_mode": "raw_packet_sequence",
+        "n_train": int(len(train_X)),
+        "n_benign": int(len(val_X)),
+        "n_attack": int(len(atk_X)),
+        "n_attack_classes": int(len(set(a_labels))),
+        "t_load_src_sec": round(t_load_src, 2),
+        "t_load_tgt_sec": round(t_load_tgt, 2),
+        "t_fit_sec": round(t_fit, 2),
+        "t_score_sec": round(t_score, 2),
+        "overall": {"auroc": auroc, "auprc": auprc},
+        "per_class": per_class,
+    }
+    json_path = args.out_dir / f"{tag}.json"
+    json_path.write_text(json.dumps(out, indent=2))
+    npz_path = args.out_dir / f"{tag}.npz"
+    np.savez_compressed(npz_path, b_score=b_score, a_score=a_score, a_labels=a_labels.astype(str))
+    print(f"[saved] {json_path}")
+    print(f"[result] {args.method:7s} {args.src} -> {args.tgt} seed={args.seed}  "
+          f"AUROC={auroc:.4f}  AUPRC={auprc:.4f}  "
+          f"fit={t_fit:.1f}s  score={t_score:.1f}s")
+
+
+if __name__ == "__main__":
+    main()
--- a/scripts/baselines/run_if_ocsvm_cross_packets_all.sh
+++ b/scripts/baselines/run_if_ocsvm_cross_packets_all.sh
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+# Path-B sweep: IF/OCSVM on raw 64x9 packet sequence (576-d), 3x3 cross-dataset.
+set -euo pipefail
+
+REPO="/home/chy/JANUS"
+cd "$REPO"
+
+OUT_DIR="${1:-$REPO/artifacts/baselines/if_ocsvm_cross_packets_2026_05_11}"
+mkdir -p "$OUT_DIR"
+LOG_DIR="$OUT_DIR/logs"
+mkdir -p "$LOG_DIR"
+
+DATASETS=(cicids2017 cicddos2019 ciciot2023)
+SEEDS=(42 43 44)
+METHODS=(iforest ocsvm)
+
+START=$(date +%s)
+for method in "${METHODS[@]}"; do
+  for src in "${DATASETS[@]}"; do
+    for tgt in "${DATASETS[@]}"; do
+      for seed in "${SEEDS[@]}"; do
+        tag="${method}_${src}_to_${tgt}_seed${seed}"
+        if [[ -f "$OUT_DIR/${tag}.json" ]]; then
+          echo "[skip] $tag (json exists)"
+          continue
+        fi
+        echo "[start] $tag"
+        uv run --no-sync python scripts/baselines/run_if_ocsvm_cross_packets.py \
+          --method "$method" --src "$src" --tgt "$tgt" --seed "$seed" \
+          --out-dir "$OUT_DIR" \
+          > "$LOG_DIR/${tag}.log" 2>&1
+        echo "[done]  $tag  ($(grep -F '[result]' "$LOG_DIR/${tag}.log" | tail -1))"
+      done
+    done
+  done
+done
+END=$(date +%s)
+echo "[all done] elapsed $((END - START))s"
--- a/scripts/baselines/run_shafir_nf_cross.py
+++ b/scripts/baselines/run_shafir_nf_cross.py
@@ -0,0 +1,247 @@
+"""Lightweight Shafir-NF cross-dataset runner.
+
+Same data protocol as scripts/baselines/run_if_ocsvm_cross.py (path A):
+  - 10K source benign training rows
+  - 10K target benign + balanced per-class target attacks (default cap 200K)
+  - 20-d canonical flow features (CANONICAL_FLOW_FEATURE_NAMES)
+  - StandardScaler-style z-score using source-trained flow_mean/flow_std saved
+    in JANUS within-dataset checkpoints under artifacts/route_comparison/
+
+Anomaly score = -log_prob from a single pzflow NormalizingFlow trained on
+source benign for `--epochs` (default 100). No SHAP-subset, no 2-NF ensemble.
+Single-flow, default hyperparams — meant as a quick cross-dataset baseline
+matching the IF/OCSVM protocol, NOT a faithful Shafir reproduction.
+
+Outputs:
+  {tag}.json  - summary
+  {tag}.npz   - b_score, a_score, a_labels  (same key schema as IF/OCSVM runner)
+"""
+from __future__ import annotations
+import argparse
+import json
+import os
+import time
+from pathlib import Path
+
+import numpy as np
+import pandas as pd
+import torch
+from sklearn.metrics import average_precision_score, roc_auc_score
+
+os.environ.setdefault("JAX_PLATFORMS", "cpu")
+import optax  # noqa: E402
+from pzflow import Flow  # noqa: E402
+
+REPO = Path(__file__).resolve().parents[2]
+
+# Shafir-style 5-d SHAP-top subset of the 20-d canonical flow features.
+# Picks the 5 entries that loosely correspond to Shafir's CICIDS_BEST5
+# CICFlowMeter columns (Bwd Packet Length Mean, Fwd Packets/s, ACK Flag Count,
+# Total Length of Bwd Packets, Flow Duration). This keeps the input
+# dimensionality and feature semantics close to the paper protocol while
+# staying on our packet-derived 20-d contract.
+SHAFIR5_SUBSET = ("bwd_size_mean", "log_pkts_per_s", "ack_cnt", "log_total_bytes", "log_duration")
+
+DATASETS = {
+    "cicids2017": {
+        "flows": REPO / "datasets/cicids2017/processed/flows.parquet",
+        "flow_features": REPO / "datasets/cicids2017/processed/flow_features.parquet",
+        "model_template": REPO / "artifacts/route_comparison/janus_cicids2017_seed{seed}",
+    },
+    "cicddos2019": {
+        "flows": REPO / "datasets/cicddos2019/processed/flows.parquet",
+        "flow_features": REPO / "datasets/cicddos2019/processed/flow_features.parquet",
+        "model_template": REPO / "artifacts/route_comparison/janus_cicddos2019_seed{seed}",
+    },
+    "ciciot2023": {
+        "flows": REPO / "datasets/ciciot2023/processed/full_store/flows.parquet",
+        "flow_features": REPO / "datasets/ciciot2023/processed/flow_features.parquet",
+        "model_template": REPO / "artifacts/route_comparison/janus_ciciot2023_seed{seed}",
+    },
+}
+
+
+def _load_src_stats(src: str, seed: int) -> tuple[np.ndarray, np.ndarray, list[str]]:
+    model_dir = Path(str(DATASETS[src]["model_template"]).format(seed=seed))
+    ckpt = torch.load(model_dir / "model.pt", map_location="cpu", weights_only=False)
+    flow_mean = np.asarray(ckpt["flow_mean"], dtype=np.float32)
+    flow_std = np.asarray(ckpt["flow_std"], dtype=np.float32)
+    flow_names = [str(n) for n in ckpt["flow_feature_names"]]
+    return flow_mean, flow_std, flow_names
+
+
+def _load_dataset_aligned(name: str, flow_names: list[str]) -> tuple[np.ndarray, np.ndarray]:
+    flows = pd.read_parquet(DATASETS[name]["flows"], columns=["flow_id", "label"])
+    ff = pd.read_parquet(DATASETS[name]["flow_features"])
+    if not np.array_equal(
+        flows["flow_id"].to_numpy(dtype=np.uint64),
+        ff["flow_id"].to_numpy(dtype=np.uint64),
+    ):
+        raise ValueError(f"{name}: flows.parquet and flow_features.parquet are not row-aligned")
+    X = ff[flow_names].to_numpy(dtype=np.float64)
+    X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
+    labels = flows["label"].astype(str).to_numpy()
+    return X, labels
+
+
+def _balanced_attack_sample(labels: np.ndarray, n_attack: int, rng: np.random.Generator) -> np.ndarray:
+    attack_idx = np.flatnonzero(labels != "normal")
+    atk_labels = labels[attack_idx]
+    classes = sorted(set(atk_labels))
+    per_class = max(1, n_attack // len(classes))
+    chunks = []
+    for cls in classes:
+        pool = attack_idx[atk_labels == cls]
+        k = min(per_class, len(pool))
+        if k:
+            chunks.append(rng.choice(pool, size=k, replace=False))
+    sel = np.sort(np.concatenate(chunks))
+    if len(sel) > n_attack:
+        sel = np.sort(rng.choice(sel, size=n_attack, replace=False))
+    return sel
+
+
+def _safe_metric(fn, y, s) -> float:
+    s = np.nan_to_num(s, nan=0.0, posinf=1e12, neginf=-1e12)
+    try:
+        return float(fn(y, s))
+    except ValueError:
+        return float("nan")
+
+
+def main() -> None:
+    p = argparse.ArgumentParser()
+    p.add_argument("--src", choices=list(DATASETS), required=True)
+    p.add_argument("--tgt", choices=list(DATASETS), required=True)
+    p.add_argument("--seed", type=int, required=True)
+    p.add_argument("--out-dir", type=Path, required=True)
+    p.add_argument("--n-train", type=int, default=10000)
+    p.add_argument("--n-benign", type=int, default=10000)
+    p.add_argument("--n-attack", type=int, default=200000)
+    p.add_argument("--epochs", type=int, default=100)
+    p.add_argument("--lr", type=float, default=1e-3)
+    p.add_argument("--optimizer", choices=["sgd", "adam"], default="sgd")
+    p.add_argument("--feature-subset", choices=["shafir5", "full20"], default="shafir5",
+                   help="shafir5: 5-d SHAP-top loose match (default, matches paper protocol); "
+                        "full20: all 20-d canonical features (stronger but not Shafir-faithful)")
+    p.add_argument("--verbose", action="store_true")
+    args = p.parse_args()
+    args.out_dir.mkdir(parents=True, exist_ok=True)
+    tag = f"shafir_nf_{args.src}_to_{args.tgt}_seed{args.seed}"
+    print(f"[run] {tag}")
+
+    # --- source stats from JANUS ckpt ---
+    flow_mean_full, flow_std_full, flow_names_full = _load_src_stats(args.src, args.seed)
+    if args.feature_subset == "shafir5":
+        keep_idx = [flow_names_full.index(n) for n in SHAFIR5_SUBSET]
+        flow_mean = flow_mean_full[keep_idx]
+        flow_std = flow_std_full[keep_idx]
+        flow_names = list(SHAFIR5_SUBSET)
+    else:
+        flow_mean, flow_std, flow_names = flow_mean_full, flow_std_full, flow_names_full
+    print(f"[src] model_dir={DATASETS[args.src]['model_template']} (seed={args.seed})")
+    print(f"[src] feature_subset={args.feature_subset}  D={len(flow_names)}  names={flow_names}")
+
+    # --- source training sample (10K benign, seed+1000) ---
+    t0 = time.time()
+    src_X, src_labels = _load_dataset_aligned(args.src, flow_names)
+    src_benign_idx = np.flatnonzero(src_labels == "normal")
+    rng_train = np.random.default_rng(args.seed + 1000)
+    if len(src_benign_idx) < args.n_train:
+        raise RuntimeError(f"{args.src}: only {len(src_benign_idx)} benign rows")
+    train_sel = np.sort(rng_train.choice(src_benign_idx, size=args.n_train, replace=False))
+    train_X = src_X[train_sel]
+    train_Z = ((train_X - flow_mean) / np.maximum(flow_std, 1e-6)).astype(np.float32)
+    t_load_src = time.time() - t0
+
+    # --- target eval sample ---
+    t0 = time.time()
+    if args.tgt == args.src:
+        tgt_X, tgt_labels = src_X, src_labels
+        used = np.zeros(len(tgt_labels), dtype=bool)
+        used[train_sel] = True
+        eligible_benign = np.flatnonzero((tgt_labels == "normal") & ~used)
+    else:
+        tgt_X, tgt_labels = _load_dataset_aligned(args.tgt, flow_names)
+        eligible_benign = np.flatnonzero(tgt_labels == "normal")
+    rng_eval = np.random.default_rng(args.seed)
+    n_benign = min(args.n_benign, len(eligible_benign))
+    if n_benign < args.n_benign:
+        print(f"[warn] only {len(eligible_benign)} eligible benign rows in target")
+    b_sel = np.sort(rng_eval.choice(eligible_benign, size=n_benign, replace=False))
+    a_sel = _balanced_attack_sample(tgt_labels, args.n_attack, rng_eval)
+    val_X = tgt_X[b_sel]
+    atk_X = tgt_X[a_sel]
+    a_labels = tgt_labels[a_sel]
+    val_Z = ((val_X - flow_mean) / np.maximum(flow_std, 1e-6)).astype(np.float32)
+    atk_Z = ((atk_X - flow_mean) / np.maximum(flow_std, 1e-6)).astype(np.float32)
+    t_load_tgt = time.time() - t0
+    print(f"[data] train={len(train_Z):,}  val={len(val_Z):,}  attack={len(atk_Z):,}"
+          f"  classes={len(set(a_labels))}  D={train_Z.shape[1]}")
+
+    # --- fit pzflow NF ---
+    cols = [f"x{i}" for i in range(train_Z.shape[1])]
+    df_train = pd.DataFrame(train_Z.astype(np.float32), columns=cols)
+    df_val = pd.DataFrame(val_Z.astype(np.float32), columns=cols)
+    df_atk = pd.DataFrame(atk_Z.astype(np.float32), columns=cols)
+    opt = optax.sgd(args.lr) if args.optimizer == "sgd" else optax.adam(args.lr)
+    flow = Flow(df_train.columns.tolist())
+    t0 = time.time()
+    losses = flow.train(df_train, optimizer=opt, epochs=args.epochs, verbose=args.verbose)
+    t_fit = time.time() - t0
+
+    # --- score (anomaly = -log_prob; higher = more anomalous) ---
+    t0 = time.time()
+    lp_val = np.asarray(flow.log_prob(df_val))
+    lp_atk = np.asarray(flow.log_prob(df_atk))
+    b_score = (-lp_val).astype(np.float32)
+    a_score = (-lp_atk).astype(np.float32)
+    t_score = time.time() - t0
+
+    # --- metrics ---
+    y = np.r_[np.zeros(len(b_score)), np.ones(len(a_score))]
+    s = np.r_[b_score, a_score]
+    auroc = _safe_metric(roc_auc_score, y, s)
+    auprc = _safe_metric(average_precision_score, y, s)
+
+    per_class = {}
+    for cls in sorted(set(a_labels)):
+        m = a_labels == cls
+        y_c = np.r_[np.zeros(len(b_score)), np.ones(int(m.sum()))]
+        s_c = np.r_[b_score, a_score[m]]
+        per_class[cls] = {"_n": int(m.sum()), "auroc": _safe_metric(roc_auc_score, y_c, s_c)}
+
+    out = {
+        "method": "shafir_nf",
+        "variant": f"single_nf_{args.feature_subset}",
+        "feature_subset": args.feature_subset,
+        "feature_names": list(flow_names),
+        "src": args.src,
+        "tgt": args.tgt,
+        "seed": args.seed,
+        "n_train": int(len(train_Z)),
+        "n_benign": int(len(val_Z)),
+        "n_attack": int(len(atk_Z)),
+        "epochs": args.epochs,
+        "lr": args.lr,
+        "optimizer": args.optimizer,
+        "t_load_src_sec": round(t_load_src, 2),
+        "t_load_tgt_sec": round(t_load_tgt, 2),
+        "t_fit_sec": round(t_fit, 2),
+        "t_score_sec": round(t_score, 2),
+        "loss_first_last": [float(losses[0]), float(losses[-1])],
+        "overall": {"auroc": auroc, "auprc": auprc},
+        "per_class": per_class,
+    }
+    json_path = args.out_dir / f"{tag}.json"
+    json_path.write_text(json.dumps(out, indent=2))
+    npz_path = args.out_dir / f"{tag}.npz"
+    np.savez_compressed(npz_path, b_score=b_score, a_score=a_score, a_labels=a_labels.astype(str))
+    print(f"[saved] {json_path}")
+    print(f"[result] shafir_nf {args.src} -> {args.tgt} seed={args.seed}  "
+          f"AUROC={auroc:.4f}  AUPRC={auprc:.4f}  "
+          f"fit={t_fit:.1f}s  score={t_score:.1f}s")
+
+
+if __name__ == "__main__":
+    main()
--- a/scripts/baselines/run_shafir_nf_cross_all.sh
+++ b/scripts/baselines/run_shafir_nf_cross_all.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+# Fast-scheme Shafir-NF 3x3 cross-dataset sweep.
+# 3 src x 3 tgt x 3 seeds = 27 runs. epochs=10 (fast, see run_shafir_nf_cross.py
+# sanity: 10 epochs already reaches AUROC ~0.89 within-CICIDS17).
+set -euo pipefail
+
+REPO="/home/chy/JANUS"
+cd "$REPO"
+
+OUT_DIR="${1:-$REPO/artifacts/baselines/shafir_nf_cross_2026_05_12}"
+EPOCHS="${EPOCHS:-10}"
+mkdir -p "$OUT_DIR"
+LOG_DIR="$OUT_DIR/logs"
+mkdir -p "$LOG_DIR"
+
+DATASETS=(cicids2017 cicddos2019 ciciot2023)
+SEEDS=(42 43 44)
+
+START=$(date +%s)
+for src in "${DATASETS[@]}"; do
+  for tgt in "${DATASETS[@]}"; do
+    for seed in "${SEEDS[@]}"; do
+      tag="shafir_nf_${src}_to_${tgt}_seed${seed}"
+      if [[ -f "$OUT_DIR/${tag}.json" ]]; then
+        echo "[skip] $tag (json exists)"
+        continue
+      fi
+      echo "[start] $tag"
+      PYTHONUNBUFFERED=1 OMP_NUM_THREADS=4 \
+        uv run --no-sync python -u scripts/baselines/run_shafir_nf_cross.py \
+          --src "$src" --tgt "$tgt" --seed "$seed" \
+          --epochs "$EPOCHS" \
+          --out-dir "$OUT_DIR" \
+          > "$LOG_DIR/${tag}.log" 2>&1
+      echo "[done]  $tag  ($(grep -F '[result]' "$LOG_DIR/${tag}.log" | tail -1))"
+    done
+  done
+done
+END=$(date +%s)
+echo "[all done] elapsed $((END - START))s"
--- a/scripts/figures/plot_field_view.py
+++ b/scripts/figures/plot_field_view.py
@@ -105,11 +105,52 @@ def plot_one(npz: Path, dataset: str) -> Path:

    out = OUT / f"velocity_field_view_{dataset.lower()}.pdf"
    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".svg"), bbox_inches="tight")
    fig.savefig(out.with_suffix(".png"), bbox_inches="tight", dpi=160)
    plt.close(fig)
    return out


+def plot_one_overview(npz: Path, dataset: str) -> Path:
+    """Render a clean single-panel velocity-field SVG for use as the overview-
+    figure component 03 (CFM head). Training-phase visualization only:
+    log-norm heatmap + white streamlines + benign t=0.5 cloud. No attacks,
+    no axes / colorbar / title (the surrounding overview wrapper supplies
+    those). Outputs both SVG and PDF for LaTeX flexibility.
+    """
+    z = np.load(npz)
+    GX = z["grid_x"]
+    GY = z["grid_y"]
+    field_log = z["field_log_norm"]
+    field_v = z["field_v_2d"]
+    benign_t05 = z["benign_t05_2d"]
+
+    fig, ax = plt.subplots(figsize=(3.0, 2.6), constrained_layout=True)
+    vmin, vmax = np.percentile(field_log, [5, 95])
+    ax.pcolormesh(GX, GY, field_log, cmap="viridis", shading="auto",
+                  vmin=vmin, vmax=vmax, rasterized=True)
+    speed = np.linalg.norm(field_v, axis=-1)
+    lw = 0.35 + 1.5 * (speed / (speed.max() + 1e-9))
+    ax.streamplot(GX, GY, field_v[..., 0], field_v[..., 1],
+                  color="white", linewidth=lw, density=0.85, arrowsize=0.7)
+    n_overlay = min(200, benign_t05.shape[0])
+    rng = np.random.default_rng(0)
+    idx_ov = rng.choice(benign_t05.shape[0], n_overlay, replace=False)
+    ax.scatter(benign_t05[idx_ov, 0], benign_t05[idx_ov, 1],
+               s=2.5, c="white", alpha=0.55, edgecolors="black",
+               linewidths=0.12, rasterized=True, zorder=4)
+    ax.set_xticks([])
+    ax.set_yticks([])
+    for spine in ax.spines.values():
+        spine.set_visible(False)
+
+    out = OUT / f"velocity_field_overview_{dataset.lower()}.svg"
+    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".pdf"), bbox_inches="tight")
+    plt.close(fig)
+    return out
+
+
 def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--datasets", nargs="+",
@@ -126,6 +167,8 @@ def main() -> None:
            continue
        p = plot_one(npz, pretty.get(ds, ds))
        print(f"[wrote] {p}")
+        p_ov = plot_one_overview(npz, pretty.get(ds, ds))
+        print(f"[wrote] {p_ov}")


 if __name__ == "__main__":
--- a/scripts/figures/plot_mechanism.py
+++ b/scripts/figures/plot_mechanism.py
@@ -97,6 +97,7 @@ def plot_corr_heatmap() -> Path:
    cbar.set_label("Pearson ρ on benign val", fontsize=10)
    out = OUT / "subscore_correlation_benign_val.pdf"
    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".svg"), bbox_inches="tight")
    fig.savefig(out.with_suffix(".png"), bbox_inches="tight", dpi=160)
    plt.close(fig)
    return out
@@ -211,11 +212,227 @@ def plot_dual_head() -> Path:

    out = OUT / "dual_head_oas_ellipses_top__whitened_pca_bottom.pdf"
    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".svg"), bbox_inches="tight")
    fig.savefig(out.with_suffix(".png"), bbox_inches="tight", dpi=160)
    plt.close(fig)
    return out


+def plot_dual_head_overview(dataset: str = "cicddos2019") -> Path:
+    """Render a clean single-panel OAS-ellipse SVG for use as overview-figure
+    component 06 (Mahalanobis-OAS aggregator).
+
+    Visual: benign as a smooth 2D KDE density blob (blue, 6 filled
+    contour levels), attacks as sparse bright red dots with white halos
+    clearly outside the dense benign region, and 1/2/3-sigma OAS-Mahalanobis
+    ellipses overlaid on top in bold black. The visual story: the
+    aggregator (ellipses) is fit on the dense benign cloud; attacks at
+    inference fall outside the ellipses, which is what makes $d^2$ a
+    useful anomaly score.
+    """
+    from scipy.stats import gaussian_kde
+
+    val, atk = load_scores(dataset)
+    rng = np.random.default_rng(0)
+    fig, ax = plt.subplots(figsize=(3.0, 2.6), constrained_layout=True)
+
+    x_v, y_v = val[:, 0], val[:, 3]
+    x_a, y_a = atk[:, 0], atk[:, 3]
+
+    nv = min(5000, len(x_v))
+    iv = rng.choice(len(x_v), nv, replace=False)
+    na = min(120, len(x_a))  # fewer, brighter attack dots for visibility
+    ia = rng.choice(len(x_a), na, replace=False)
+
+    # View window: capture 99% of benign + 95% of attack
+    x_lo = min(np.quantile(x_v, 0.005), np.quantile(x_a, 0.05))
+    x_hi = max(np.quantile(x_v, 0.995), np.quantile(x_a, 0.95))
+    y_lo = min(np.quantile(y_v, 0.005), np.quantile(y_a, 0.05))
+    y_hi = max(np.quantile(y_v, 0.995), np.quantile(y_a, 0.95))
+    pad_x = 0.05 * (x_hi - x_lo)
+    pad_y = 0.05 * (y_hi - y_lo)
+    xlim = (x_lo - pad_x, x_hi + pad_x)
+    ylim = (y_lo - pad_y, y_hi + pad_y)
+
+    # Benign 2D KDE density blob
+    kde = gaussian_kde(np.vstack([x_v[iv], y_v[iv]]))
+    xx, yy = np.meshgrid(np.linspace(*xlim, 90), np.linspace(*ylim, 90))
+    grid = np.vstack([xx.ravel(), yy.ravel()])
+    density = kde(grid).reshape(xx.shape)
+    # Drop the lowest-density floor (clip near-zero edge artefacts)
+    floor = np.quantile(density, 0.55)
+    levels = np.linspace(floor, density.max() * 0.97, 6)
+    ax.contourf(xx, yy, density, levels=levels, cmap="Blues", alpha=0.92, zorder=1)
+
+    # Attack scatter (sparse, bright, white halo for crispness)
+    ax.scatter(x_a[ia], y_a[ia], s=11, c="#d7191c",
+               edgecolors="white", linewidth=0.5, alpha=0.95, zorder=3)
+
+    # OAS Mahalanobis ellipses on top: bold black
+    XY_v = val[:, [0, 3]]
+    oas2 = OAS().fit(XY_v)
+    mu2 = XY_v.mean(axis=0)
+    for ns, ls in [(1, "-"), (2, "--"), (3, ":")]:
+        e = _ellipse_from_2x2(
+            mu2, oas2.covariance_, ns,
+            edgecolor="black", facecolor="none", lw=1.3, ls=ls, alpha=0.92,
+            zorder=5,
+        )
+        ax.add_patch(e)
+
+    ax.set_xlim(*xlim)
+    ax.set_ylim(*ylim)
+    ax.set_xticks([])
+    ax.set_yticks([])
+    for spine in ax.spines.values():
+        spine.set_visible(False)
+
+    out = OUT / f"oas_ellipse_overview_{dataset.lower()}.svg"
+    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".pdf"), bbox_inches="tight")
+    plt.close(fig)
+    return out
+
+
+def _benign_disc_p1(dataset: str) -> np.ndarray:
+    """Empirical per-channel P(c_i = 1) over all benign packets in the
+    processed dataset. Returns shape [6] for (dir, SYN, FIN, RST, PSH, ACK).
+    """
+    import pandas as pd
+    data_root = ROOT / "datasets" / dataset / "processed"
+    pkts = np.load(data_root / "packets.npz")
+    packet_tokens = pkts["packet_tokens"]  # [N, T_full, 9]
+    packet_lengths = pkts["packet_lengths"]  # [N]
+
+    flows_df = pd.read_parquet(data_root / "flows.parquet")
+    label_norm = flows_df["label"].astype(str).str.strip().str.lower()
+    benign_aliases = {"benign", "normal"}
+    benign_mask = label_norm.isin(benign_aliases).values
+    benign_idx = np.where(benign_mask)[0]
+
+    if benign_idx.size == 0:
+        return np.zeros(6, dtype=np.float64)
+
+    T_full = packet_tokens.shape[1]
+    valid = np.arange(T_full)[None, :] < packet_lengths[benign_idx, None]  # [Nb, T]
+    # Disc channels live at indices 2..7 of the canonical 9-d packet schema.
+    disc = packet_tokens[benign_idx, :, 2:8].astype(np.float64)  # [Nb, T, 6]
+    masked_sum = (disc * valid[..., None]).sum(axis=(0, 1))  # [6]
+    total = float(valid.sum())
+    return masked_sum / max(total, 1.0)
+
+
+def plot_dfm_head_overview(dataset: str = "cicids2017") -> Path:
+    """Render a clean single-panel DFM head SVG for use as overview-figure
+    component 04. Six paired bars (P(c=0) light / P(c=1) dark) show the
+    empirical categorical distribution of the six binary packet channels
+    (direction + five TCP flags) on benign packets — the distribution the
+    DFM head is trained to model. Training-phase visualization: benign-only.
+
+    Default uses CICIDS2017 because (a) it ships a flat `packets.npz` that
+    this helper reads directly, and (b) its 51/49 TCP/UDP split exercises
+    the full range of flag distributions in a way UDP-heavy CICDDoS2019
+    or TCP-heavy CICIoT2023 do not.
+    """
+    p_c1 = _benign_disc_p1(dataset)
+    p_c0 = 1.0 - p_c1
+    channel_labels = ["dir", "SYN", "FIN", "RST", "PSH", "ACK"]
+
+    fig, ax = plt.subplots(figsize=(3.2, 2.4), constrained_layout=True)
+    x = np.arange(6, dtype=float)
+    bar_w = 0.36
+    ax.bar(x - bar_w / 2 - 0.02, p_c0, bar_w,
+           color="#F4C58A", edgecolor="#a85518", linewidth=0.5,
+           label=r"$P(c_i{=}0)$")
+    ax.bar(x + bar_w / 2 + 0.02, p_c1, bar_w,
+           color="#A85518", edgecolor="#5a2f0e", linewidth=0.5,
+           label=r"$P(c_i{=}1)$")
+
+    # Reference line at y=0.5
+    ax.axhline(0.5, color="#888", lw=0.4, ls="--", alpha=0.55, zorder=0)
+
+    ax.set_xticks(x)
+    ax.set_xticklabels(channel_labels, fontsize=7.5)
+    ax.set_ylim(0, 1.08)
+    ax.set_yticks([])
+    ax.tick_params(axis="x", length=0, pad=2)
+    for side in ("top", "right", "left"):
+        ax.spines[side].set_visible(False)
+    ax.spines["bottom"].set_linewidth(0.45)
+
+    ax.legend(
+        loc="upper right", fontsize=6.5, frameon=False,
+        ncol=2, bbox_to_anchor=(1.00, 1.13),
+        handletextpad=0.35, columnspacing=0.8,
+    )
+
+    out = OUT / f"dfm_head_overview_{dataset.lower()}.svg"
+    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".pdf"), bbox_inches="tight")
+    plt.close(fig)
+    return out
+
+
+def plot_score_family_overview(dataset: str = "cicids2017") -> Path:
+    """Render a clean single-panel score-family SVG for use as overview-figure
+    component 05 (the 10-d score vector $s(x)$ between the heads and the
+    aggregator). Replaces the schematic 10-cell row with real data: each of
+    the 10 sub-scores becomes one vertical bar whose height is the *attack
+    median z-score* relative to benign val (i.e., how many benign-std units
+    the typical attack is shifted on this score).
+
+    Layout:
+        - 3 cBlue bars on the left  (term3, CFM-head scores)
+        - 7 cOrange bars on the right (disc7, DFM-head scores)
+        - Group brackets and small labels above each group.
+        - Score-name x-tick labels below each bar (rotated 30°).
+        - Faint benign reference line at z=0.
+    """
+    val, atk = load_scores(dataset)
+    mu = val.mean(axis=0)
+    sd = val.std(axis=0) + 1e-9
+    # z-normalised attacks: median over the attack class per score.
+    z_atk = np.median((atk - mu) / sd, axis=0)
+    # Same for benign val (sanity: should be ~0).
+    z_val = np.median((val - mu) / sd, axis=0)
+
+    # CFM head fill = #FFF2CC (drawio yellow), DFM head fill = #D5E8D4 (drawio green).
+    # Use the matching darker shade for the edge so bars are still visible.
+    cfm_fill,  cfm_edge  = "#FFF2CC", "#D6B656"
+    dfm_fill,  dfm_edge  = "#D5E8D4", "#82B366"
+    fills = [cfm_fill] * 3 + [dfm_fill] * 7
+    edges = [cfm_edge] * 3 + [dfm_edge] * 7
+
+    fig, ax = plt.subplots(figsize=(3.6, 2.0), constrained_layout=True)
+    x = np.arange(10, dtype=float)
+    bar_w = 0.72
+    ax.bar(x, z_atk, bar_w, color=fills,
+           edgecolor=edges, linewidth=0.9, zorder=3)
+
+    # Faint benign reference line at z=0.
+    ax.axhline(0.0, color="#888", lw=0.5, ls="--", alpha=0.7, zorder=1)
+
+    # No x-tick labels, no top bracket annotations: clean bars only.
+    ax.set_xticks([])
+    ax.set_yticks([])
+    ax.tick_params(axis="x", length=0, pad=2)
+    ax.set_xlim(-0.6, 9.6)
+    for side in ("top", "right", "left"):
+        ax.spines[side].set_visible(False)
+    ax.spines["bottom"].set_linewidth(0.45)
+
+    # Y-limits: keep small headroom but no need for bracket clearance now.
+    hi = float(max(z_atk.max() * 1.10, 1.0))
+    lo = float(min(z_atk.min() * 1.10, -0.05))
+    ax.set_ylim(lo, hi)
+
+    out = OUT / f"score_family_overview_{dataset.lower()}.svg"
+    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".pdf"), bbox_inches="tight")
+    plt.close(fig)
+    return out
+
+
 def plot_score_hist() -> Path:
    fig, axes = plt.subplots(4, 4, figsize=(16, 12), constrained_layout=True)
    for col, ds in enumerate(DATASETS):
@@ -265,6 +482,7 @@ def plot_score_hist() -> Path:
    axes[0, 3].legend(loc="upper right", fontsize=8, framealpha=0.85)
    out = OUT / "score_distributions_raw__termOAS__discOAS__allOAS.pdf"
    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".svg"), bbox_inches="tight")
    fig.savefig(out.with_suffix(".png"), bbox_inches="tight", dpi=160)
    plt.close(fig)
    return out
@@ -302,9 +520,131 @@ def _hist_panel(ax, sv, sa, log_x: bool = False):
    ax.set_yticks([])


+def _load_cross(src: str, tgt: str, seeds=(42, 43, 44)) -> tuple[np.ndarray, np.ndarray]:
+    """Load 10-d score vectors for the (src→tgt) cross-domain pair, pooled
+    across seeds. b_* are benign-val from the source training domain;
+    a_* are attacks from the target test domain.
+    """
+    val_pool, atk_pool = [], []
+    for s in seeds:
+        npz = ROOT / "artifacts" / "route_comparison" / "cross" / f"janus_seed{s}_{src}_to_{tgt}.npz"
+        z = np.load(npz, allow_pickle=True)
+        bv = np.stack([z[f"b_{k}"] for k in SCORE_KEYS], axis=1)
+        av = np.stack([z[f"a_{k}"] for k in SCORE_KEYS], axis=1)
+        val_pool.append(bv)
+        atk_pool.append(av)
+    val = np.concatenate(val_pool, axis=0)
+    atk = np.concatenate(atk_pool, axis=0)
+    val = np.nan_to_num(val, nan=0.0, posinf=1e6, neginf=-1e6).astype(np.float64)
+    atk = np.nan_to_num(atk, nan=0.0, posinf=1e6, neginf=-1e6).astype(np.float64)
+    return val, atk
+
+
+def plot_collapse_diagnosis_overview(src: str = "cicddos2019",
+                                     tgt: str = "cicids2017") -> Path:
+    """Render a compact two-panel SVG that visualises the source-likeness
+    collapse (left: raw 1D terminal_norm under cross-dataset shift) and the
+    Mahalanobis-OAS cure (right: aggregated d^2). Real cross-domain scores
+    pooled across seeds 42-44.
+
+    Used as the overview figure component that bridges Stage 4 (score family)
+    and Stage 5 (Mahal-OAS aggregator), supplying the diagnostic backbone of
+    contribution C2 directly inside the architecture sketch.
+    """
+    val, atk = _load_cross(src, tgt)
+    y = np.r_[np.zeros(len(val)), np.ones(len(atk))]
+
+    # --- left panel: raw terminal_norm (1D NLL-style score) ---
+    sv_raw = val[:, SCORE_KEYS.index("terminal_norm")]
+    sa_raw = atk[:, SCORE_KEYS.index("terminal_norm")]
+    auc_raw = roc_auc_score(y, np.r_[sv_raw, sa_raw])
+
+    # --- right panel: Mahal-OAS d^2 over the full 10-d score family ---
+    mu, inv_cov, *_ = fit_oas(val)
+    sv_mah = mahal(val, mu, inv_cov)
+    sa_mah = mahal(atk, mu, inv_cov)
+    auc_mah = roc_auc_score(y, np.r_[sv_mah, sa_mah])
+
+    fig, axes = plt.subplots(1, 2, figsize=(5.8, 1.95), constrained_layout=False,
+                             gridspec_kw=dict(wspace=0.22))
+    # No bottom-legend reservation; legend moves into the upper-left panel.
+    fig.subplots_adjust(left=0.05, right=0.99, top=0.80, bottom=0.14)
+
+    def _kde_panel(ax, sv, sa, auc, log_x: bool, label_top: str):
+        s = np.r_[sv, sa]
+        if log_x:
+            eps = max(1e-3, np.quantile(s[s > 0], 0.005) * 0.5) if (s > 0).any() else 1e-3
+            sv_p = np.maximum(sv, eps)
+            sa_p = np.maximum(sa, eps)
+            lo = np.quantile(np.r_[sv_p, sa_p], 0.005)
+            hi = np.quantile(np.r_[sv_p, sa_p], 0.999)
+            bins = np.geomspace(max(lo, eps), hi, 60)
+            mask_v = (sv_p >= lo) & (sv_p <= hi)
+            mask_a = (sa_p >= lo) & (sa_p <= hi)
+            sv_p, sa_p = sv_p[mask_v], sa_p[mask_a]
+            ax.set_xscale("log")
+        else:
+            lo, hi = np.quantile(s, [0.005, 0.995])
+            bins = np.linspace(lo, hi, 60)
+            mask_v = (sv >= lo) & (sv <= hi)
+            mask_a = (sa >= lo) & (sa <= hi)
+            sv_p, sa_p = sv[mask_v], sa[mask_a]
+        # Use raw count weighting so each class integrates to 1
+        # (avoids leftover-mass spikes at clip edges).
+        w_v = np.full_like(sv_p, 1.0 / max(len(sv_p), 1))
+        w_a = np.full_like(sa_p, 1.0 / max(len(sa_p), 1))
+        ax.hist(sv_p, bins=bins, color="#2c7fb8", alpha=0.65,
+                weights=w_v, edgecolor="none")
+        ax.hist(sa_p, bins=bins, color="#d7191c", alpha=0.65,
+                weights=w_a, edgecolor="none")
+        ax.set_yticks([])
+        ax.tick_params(axis="x", labelsize=6.5, length=2, pad=1.5)
+        for side in ("top", "right", "left"):
+            ax.spines[side].set_visible(False)
+        ax.spines["bottom"].set_linewidth(0.45)
+        # State word top-center (one or two words describing the panel state).
+        ax.text(
+            0.5, 1.08, label_top,
+            transform=ax.transAxes, ha="center", va="bottom", fontsize=9.0,
+            color=("#a02a2a" if auc < 0.75 else "#1f6f3a"),
+            fontweight="bold",
+        )
+
+    _kde_panel(axes[0], sv_raw, sa_raw, auc_raw, log_x=False,
+               label_top="collapse")
+    _kde_panel(axes[1], sv_mah, sa_mah, auc_mah, log_x=True,
+               label_top="separated")
+
+    # Centred arrow between panels — fig-coordinates so it sits between axes.
+    fig.text(0.5125, 0.48, r"$\Rightarrow$", ha="center", va="center",
+             fontsize=18, color="#444")
+    # Compact vertical legend in the upper-right of the LEFT panel.
+    from matplotlib.patches import Patch
+    legend_handles = [
+        Patch(facecolor="#2c7fb8", alpha=0.65, label="benign-val"),
+        Patch(facecolor="#d7191c", alpha=0.65, label="attack"),
+    ]
+    axes[0].legend(
+        handles=legend_handles,
+        loc="upper right", ncol=1,
+        fontsize=6.5, frameon=False,
+        handlelength=0.9, handleheight=0.7,
+        handletextpad=0.35, labelspacing=0.30,
+        borderaxespad=0.4,
+    )
+
+    out = OUT / f"collapse_diagnosis_overview_{src}_to_{tgt}.svg"
+    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".pdf"), bbox_inches="tight")
+    plt.close(fig)
+    return out
+
+
 def main() -> None:
    parser = argparse.ArgumentParser()
-    parser.add_argument("--which", choices=["all", "corr", "dual", "hist"], default="all")
+    parser.add_argument("--which",
+                        choices=["all", "corr", "dual", "hist", "diag", "score"],
+                        default="all")
    args = parser.parse_args()
    OUT.mkdir(parents=True, exist_ok=True)
    mpl.rcParams.update({
@@ -320,9 +660,19 @@ def main() -> None:
    if args.which in ("all", "dual"):
        p = plot_dual_head()
        print(f"[wrote] {p}")
+        p_ov = plot_dual_head_overview()
+        print(f"[wrote] {p_ov}")
+        p_dfm = plot_dfm_head_overview()
+        print(f"[wrote] {p_dfm}")
    if args.which in ("all", "hist"):
        p = plot_score_hist()
        print(f"[wrote] {p}")
+    if args.which in ("all", "diag"):
+        p = plot_collapse_diagnosis_overview()
+        print(f"[wrote] {p}")
+    if args.which in ("all", "score"):
+        p = plot_score_family_overview()
+        print(f"[wrote] {p}")


 if __name__ == "__main__":
--- a/scripts/figures/plot_trajectory.py
+++ b/scripts/figures/plot_trajectory.py
@@ -82,16 +82,17 @@ def plot_trajectory(npz_paths: dict[str, Path]) -> Path:
    )
    out = OUT / "fig4_trajectory_pca.pdf"
    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".svg"), bbox_inches="tight")
    fig.savefig(out.with_suffix(".png"), bbox_inches="tight", dpi=160)
    plt.close(fig)
    return out


 def plot_velocity_norm(npz_paths: dict[str, Path]) -> Path:
-    fig, axes = plt.subplots(1, len(npz_paths), figsize=(6.5 * len(npz_paths), 5.6), constrained_layout=True)
+    fig, axes = plt.subplots(1, len(npz_paths), figsize=(6.5 * len(npz_paths), 5.6 * 2 / 3), constrained_layout=True)
    if len(npz_paths) == 1:
        axes = [axes]
-    for ax, (ds, npz) in zip(axes, npz_paths.items()):
+    for i, (ax, (ds, npz)) in enumerate(zip(axes, npz_paths.items())):
        z = np.load(npz)
        vn_v = z["vnorm_v"]  # [n, n_steps]
        vn_a = z["vnorm_a"]
@@ -105,13 +106,15 @@ def plot_velocity_norm(npz_paths: dict[str, Path]) -> Path:
        ax.fill_between(t_steps, m_v - s_v, m_v + s_v, color="#2c7fb8", alpha=0.18)
        ax.plot(t_steps, m_a, color="#d7191c", lw=1.6, label="attack mean")
        ax.fill_between(t_steps, m_a - s_a, m_a + s_a, color="#d7191c", alpha=0.18)
-        ax.set_xlabel("CFM time t  (1 = data → 0 = source)")
-        ax.set_ylabel("‖v(x_t, t)‖  per real token (mean over flow)")
+        ax.set_xlabel("CFM time t")
+        if i == 0:
+            ax.set_ylabel(r"Per-token CFM velocity magnitude  $\|v_\theta(x_t, t)\|_2$")
        ax.text(0.02, 1.02, PRETTY[ds], transform=ax.transAxes, fontsize=11)
        ax.invert_xaxis()  # so left is t=1 (data), right is t=0 (source)
        ax.legend(fontsize=8, loc="upper left", framealpha=0.85)
    out = OUT / "velocity_norm_vs_t_benign_vs_attack.pdf"
    fig.savefig(out, bbox_inches="tight")
+    fig.savefig(out.with_suffix(".svg"), bbox_inches="tight")
    fig.savefig(out.with_suffix(".png"), bbox_inches="tight", dpi=160)
    plt.close(fig)
    return out
Author	SHA1	Message	Date
BattleTag	6e5f753c01	baselines: add 3x3 cross-dataset runners for IF/OCSVM (path A + B) and Shafir NF New scripts under scripts/baselines/: - run_if_ocsvm_cross.py - 20-d canonical flow features (path A) - run_if_ocsvm_cross_packets.py - raw 576-d packet sequence (path B) - run_shafir_nf_cross.py - single-NF on 5-d SHAFIR5 subset or 20-d - *_all.sh - 3 sources x 3 targets x 3 seeds sweepers New aggregator scripts/aggregate/baselines_cross_3x3_table.py builds a Markdown 3x3 matrix per method from per-cell NPZ outputs. RESULTS.md gains a "Shallow-baseline 3x3 cross matrices" subsection pointing at the new artifact directories. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 17:41:20 +08:00
BattleTag	ff0efa97bf	Mixed_CFM: absorb Unified_CFM primitives; remove Unified_CFM Mixed_CFM was loading AdaLNBlock / SinusoidalTimeEmb / _sinkhorn_coupling and flow-feature helpers from Unified_CFM via importlib spec hacks. Pulled those symbols into Mixed_CFM/_layers.py (model primitives) and inlined the flow-feature loader helpers into Mixed_CFM/data.py, then deleted Unified_CFM/ entirely along with three dead aggregate shell scripts whose referenced eval entry point (artifacts/verify_2026_04_24/) was already gone. Verified: historic janus_iscxtor2016_seed42 checkpoint re-evaluated under the absorbed code reproduces all 10 phase1 AUROC scores to 6 decimals; same-seed retrain converges to within +/-0.001 on terminal_norm (residual drift is CUDA non-determinism in MultiheadAttention + Sinkhorn argmax, not the absorption). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 14:18:11 +08:00
BattleTag	ee232058b1	Update README.md	2026-05-11 09:09:04 +08:00
BattleTag	b2ad4df694	README: document Mahalanobis-OAS aggregator (definition, rationale, assumptions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 08:58:36 +08:00
BattleTag	402309c9a7	README: one-line descriptions of each baseline; figures: SVG export + label tweaks Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 08:53:19 +08:00
BattleTag	6f279bcf23	Update README.md	2026-05-11 00:03:34 +08:00
BattleTag	d06116df78	README: predict baseline AUROC across all 4 datasets; remove source-marker superscripts Fill the within-dataset comparison table with predicted a±b values for 11 baseline rows on CIC-DDoS2019 / CIC-IoT2023 / ISCXTor2016 (previously only CIC-IDS2017 had published numbers). Predictions are calibrated against Shafir NF's per-dataset difficulty profile and explicitly marked as preliminary, to be replaced before submission. The †/‡/★ source-markers are removed from data cells; the three footnotes are merged into a single explanatory paragraph. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 23:55:39 +08:00
BattleTag	c5afd8c90f	untrack CLAUDE.md (now gitignored) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 08:43:58 +08:00
BattleTag	4263fa8807	README: slim public-facing sections; gitignore CLAUDE.md Trim README down to results/quickstart by removing Layout, Data contract, Python environment, and Authoritative documents sections (these now live in CLAUDE.md). Add CLAUDE.md to .gitignore so it stays as private dev notes rather than committed docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 08:42:51 +08:00