Initial commit: code, paper, small artifacts
paper/background_related.md
## 2 Background

### 2.1 Unsupervised network anomaly detection

We consider the standard unsupervised setting: a detector is trained only on
benign traffic and, at inference time, must assign an anomaly score to each
flow; attack labels are never used at any stage of training. Public
benchmarks (e.g., CIC-IDS2017, CIC-DDoS2019, ISCXTor2016) provide labelled
attack traffic, which is used for evaluation only. Two granularities
dominate the literature. Flow-level detectors operate on per-flow aggregate
features (byte counts, inter-arrival statistics, flag tallies), whereas
packet-level detectors operate on the ordered sequence of per-packet
features inside a flow and so retain the temporal structure that flow
aggregates discard.
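To make the distinction concrete, here is a minimal sketch that derives both views from the same packets. The `Packet` record, its single-letter flag encoding, and the feature names are hypothetical, and the feature set is an illustrative subset of typical flow aggregates rather than any benchmark's schema. Two flows that differ only in the order of their packet sizes get identical aggregates but different packet sequences.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ts: float    # arrival time in seconds
    size: int    # bytes on the wire
    flags: str   # TCP flags as letters, e.g. "S", "A", "FA" (hypothetical encoding)


def flow_features(pkts: list[Packet]) -> dict:
    """Flow-level view: order-insensitive aggregates over the whole flow."""
    pkts = sorted(pkts, key=lambda p: p.ts)
    iats = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]
    return {
        "n_packets": len(pkts),
        "total_bytes": sum(p.size for p in pkts),
        "iat_mean": statistics.fmean(iats) if iats else 0.0,
        "iat_std": statistics.pstdev(iats) if iats else 0.0,
        "syn_count": sum("S" in p.flags for p in pkts),
        "fin_count": sum("F" in p.flags for p in pkts),
    }


def packet_sequence(pkts: list[Packet]) -> list[tuple[int, float, str]]:
    """Packet-level view: ordered (size, inter-arrival time, flags) per packet."""
    pkts = sorted(pkts, key=lambda p: p.ts)
    seq, prev_ts = [], None
    for p in pkts:
        iat = 0.0 if prev_ts is None else p.ts - prev_ts
        seq.append((p.size, iat, p.flags))
        prev_ts = p.ts
    return seq


# Same timestamps and flags; only the order of the packet sizes differs.
times, flags = [0.0, 0.01, 0.03], ["S", "A", "FA"]
flow_a = [Packet(t, s, f) for t, s, f in zip(times, [60, 1500, 40], flags)]
flow_b = [Packet(t, s, f) for t, s, f in zip(times, [1500, 40, 60], flags)]

print(flow_features(flow_a) == flow_features(flow_b))      # aggregates coincide
print(packet_sequence(flow_a) == packet_sequence(flow_b))  # sequences do not
```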
On the standard benchmarks, within-dataset AUROC differences between
recent recipes have narrowed to within seed-to-seed variance, so the
substantive evaluation axis is now cross-dataset transfer: a detector is
trained on one environment and evaluated on traffic from another.
Performance on this axis has not converged.
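The within- versus cross-dataset protocol can be sketched as a transfer matrix. We fit a detector on the benign training split of each source environment and compute AUROC on the benign and attack test splits of every target environment. Diagonal entries are within-dataset scores, and off-diagonal entries measure transfer. The `fit` callback, the dataset dictionary layout, the toy distance-to-mean detector, and the one-dimensional "environments" are all hypothetical. AUROC is computed from its rank (Mann-Whitney) definition, with ties counted as one half.

```python
from typing import Callable

import numpy as np

Scorer = Callable[[np.ndarray], np.ndarray]


def auroc(benign_scores: np.ndarray, attack_scores: np.ndarray) -> float:
    """P(attack score > benign score) + 0.5 * P(tie); higher score = more anomalous."""
    a = np.asarray(attack_scores, dtype=float)[:, None]
    b = np.asarray(benign_scores, dtype=float)[None, :]
    return float(np.mean((a > b) + 0.5 * (a == b)))


def transfer_matrix(
    fit: Callable[[np.ndarray], Scorer],
    datasets: dict[str, tuple[np.ndarray, np.ndarray, np.ndarray]],
) -> dict[tuple[str, str], float]:
    """datasets maps name -> (benign_train, benign_test, attack_test).

    Entry (src, tgt) is the AUROC on tgt of a detector fit only on src's
    benign training data; attack labels are used for evaluation only."""
    result = {}
    for src, (benign_train, _, _) in datasets.items():
        score = fit(benign_train)
        for tgt, (_, benign_test, attack_test) in datasets.items():
            result[(src, tgt)] = auroc(score(benign_test), score(attack_test))
    return result


def fit_distance_to_mean(benign_train: np.ndarray) -> Scorer:
    """Toy detector (hypothetical): absolute distance to the benign training mean."""
    mu = benign_train.mean()
    return lambda x: np.abs(x - mu)


datasets = {
    "A": (np.array([0.0, 0.1, -0.1]), np.array([0.05, -0.05]), np.array([3.0, 4.0])),
    "B": (np.array([10.0, 10.1, 9.9]), np.array([10.05, 9.95]), np.array([7.0, 6.5])),
}
m = transfer_matrix(fit_distance_to_mean, datasets)
# Within-dataset entries are perfect; transfer collapses because the shift in
# benign traffic, not attack status, dominates the score in the other environment.
print(m)
```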
### 2.2 Continuous Flow Matching

Continuous Flow Matching (CFM) trains a time-dependent vector field
$v_\theta(x, t)$ to transport a tractable source distribution (typically
$\mathcal{N}(0, I)$) to the data distribution along the ODE
$\mathrm{d}x_t = v_\theta(x_t, t)\,\mathrm{d}t$, $t \in [0, 1]$. The
training objective regresses $v_\theta$ onto a target velocity defined
along a chosen conditional probability path. For the linear (Gaussian)
path $x_t = (1 - t)\,x_0 + t\,x_1$, with $x_0$ drawn from the source and
$x_1$ from the data, the target velocity is simply $x_1 - x_0$, so the
objective reduces to the least-squares loss
$\mathbb{E}_{t \sim \mathcal{U}[0, 1],\, x_0,\, x_1}\,\lVert v_\theta(x_t, t) - (x_1 - x_0)\rVert^2$.
This side-steps the score-matching objective and stochastic sampler of
diffusion models. OT-CFM straightens trajectories by pairing source and
data samples through minibatch optimal transport, which lowers
integration error and enables stable few-step inference.
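The linear-path objective and the OT pairing can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a training loop. `v_theta` is any callable standing in for the network. The minibatch OT step is solved by brute force over permutations, so it only works for toy batch sizes. With uniform weights and equal batch sizes, exact minibatch OT under squared Euclidean cost is an assignment problem, which a practical implementation would hand to a solver such as `scipy.optimize.linear_sum_assignment`.

```python
import itertools

import numpy as np


def ot_pair(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Reorder source samples x0 to minimise sum_i ||x0[i] - x1[i]||^2.

    With uniform weights and equal batch sizes, exact minibatch OT reduces to
    this assignment problem. Brute force over permutations: toy sizes only.
    """
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)  # cost[i, j]
    n = len(x0)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost[perm[j], j] for j in range(n)),
    )
    return x0[list(best)]  # row j is now paired with x1[j]


def cfm_batch(x1: np.ndarray, rng: np.random.Generator, use_ot: bool = True):
    """Draw (x_t, t, target) on the linear path x_t = (1 - t) x0 + t x1."""
    x0 = rng.standard_normal(x1.shape)      # source samples from N(0, I)
    if use_ot:
        x0 = ot_pair(x0, x1)                # OT-CFM: straighten the pairing
    t = rng.uniform(size=(len(x1), 1))      # one time per sample, in [0, 1)
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0                        # constant velocity of each straight path
    return xt, t, target


def cfm_loss(v_theta, xt: np.ndarray, t: np.ndarray, target: np.ndarray) -> float:
    """Least-squares regression of the model velocity onto the target velocity."""
    return float(np.mean(np.sum((v_theta(xt, t) - target) ** 2, axis=-1)))


rng = np.random.default_rng(0)
x1 = rng.standard_normal((6, 2)) + 3.0      # a toy "data" minibatch
xt, t, target = cfm_batch(x1, rng)
print(cfm_loss(lambda x, s: np.zeros_like(x), xt, t, target))  # untrained zero field
```

Because each pair moves along a straight line, $x_t + (1 - t)(x_1 - x_0) = x_1$ holds exactly, which is a convenient sanity check for any implementation of the path.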
A trained CFM model gives access not only to the learned density but also
to a family of geometric quantities along the trajectory: the terminal
velocity norm, the divergence (the trace of the velocity Jacobian, usually
obtained with stochastic trace estimators), and curvature. All of these
can be read off the velocity field without retraining.
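As an illustration of reading such quantities off a velocity field, the sketch below computes the terminal velocity norm and a Hutchinson estimate of the divergence $\nabla \cdot v(x, t) = \operatorname{tr}\,\partial v / \partial x$, with Jacobian-vector products approximated by central finite differences. To keep it self-contained and checkable, a closed-form field stands in for a trained $v_\theta$; this is an assumption made purely for illustration. For the linear path from $\mathcal{N}(0, I)$ to an isotropic Gaussian $\mathcal{N}(\mu, s^2 I)$ in $d$ dimensions, the marginal velocity is $u_t(x) = \mu + g_t\,(x - t\mu)$ with $g_t = \bigl(t s^2 - (1 - t)\bigr) / \bigl((1 - t)^2 + t^2 s^2\bigr)$, so the exact divergence is $d\,g_t$. Curvature is omitted, and the function names are hypothetical.

```python
import numpy as np


def gaussian_velocity(mu: np.ndarray, s: float):
    """Exact marginal velocity for the linear path N(0, I) -> N(mu, s^2 I).

    Stand-in for a trained v_theta (illustrative assumption)."""
    def v(x: np.ndarray, t: float) -> np.ndarray:
        a, b = 1.0 - t, t
        gain = (b * s**2 - a) / (a**2 + b**2 * s**2)
        return mu + gain * (x - b * mu)
    return v


def terminal_velocity_norm(v, x: np.ndarray) -> float:
    """||v(x, 1)||: speed of the field at the data end of the trajectory."""
    return float(np.linalg.norm(v(x, 1.0)))


def hutchinson_divergence(v, x, t, n_probes=32, h=1e-3, rng=None) -> float:
    """Estimate tr(dv/dx) as the average of eps^T J eps over Rademacher probes.

    J @ eps is approximated by a central difference, so no autodiff is needed."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_probes):
        eps = rng.choice([-1.0, 1.0], size=x.shape)                # Rademacher probe
        jvp = (v(x + h * eps, t) - v(x - h * eps, t)) / (2.0 * h)  # ~ J @ eps
        total += float(eps @ jvp)
    return total / n_probes


d, s, t = 4, 0.5, 0.3
mu = np.full(d, 2.0)
v = gaussian_velocity(mu, s)
x = np.array([1.0, -0.5, 2.5, 0.0])
gain = (t * s**2 - (1 - t)) / ((1 - t) ** 2 + t**2 * s**2)
print(hutchinson_divergence(v, x, t, rng=np.random.default_rng(0)), d * gain)
```

For this affine stand-in the Jacobian is $g_t I$, so every Rademacher probe returns the trace exactly. For a neural $v_\theta$, the same estimator is noisy and its variance shrinks with `n_probes`.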
### 2.3 Discrete Flow Matching

Continuous FM does not apply directly to categorical state spaces, where
adding Gaussian noise is undefined. Discrete Flow Matching (DFM)
generalises the framework to finite alphabets through continuous-time
Markov chains (CTMCs): the model parameterises token-level transition
rates that interpolate between a source distribution (typically uniform)
and the data distribution. The training objective remains simple; in
common instantiations it is a cross-entropy loss for predicting the clean
token from its noised version, from which the transition rates of the
chosen interpolation schedule follow in closed form. DFM has been
validated on language and molecular generation. For mixed
continuous–discrete data, where each observation has both numerical and
categorical channels, the natural model composes CFM on the numerical
channels with DFM on the categorical ones.
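One common instantiation, the mixture path with a uniform source, can be sketched as follows. Each token of the noised sequence $x_t$ equals its data value $x_1$ with probability $\kappa_t$ and is otherwise uniform over the alphabet. The conditional rates are $u_t(y \mid x_t, x_1) = \frac{\dot\kappa_t}{1 - \kappa_t}\bigl(\delta_{x_1}(y) - \delta_{x_t}(y)\bigr)$, and sampling takes Euler steps of the resulting CTMC. The schedule $\kappa_t = t$ and all function names are illustrative assumptions. A trained model would replace the known $x_1$ in the rates with its predicted posterior over clean tokens.

```python
import numpy as np


def sample_xt(x1: np.ndarray, kappa_t: float, vocab: int, rng: np.random.Generator):
    """Mixture path with a uniform source: keep each data token w.p. kappa_t,
    otherwise replace it with a token drawn uniformly from the alphabet."""
    keep = rng.uniform(size=x1.shape) < kappa_t
    noise = rng.integers(0, vocab, size=x1.shape)
    return np.where(keep, x1, noise)


def conditional_rates(xt, x1, kappa_t: float, dkappa_t: float, vocab: int):
    """u_t(y | x_t, x_1) = dkappa/(1 - kappa) * (onehot(x_1) - onehot(x_t)).

    Row i holds the rates out of token i's current state; each row sums to 0."""
    eye = np.eye(vocab)
    return dkappa_t / (1.0 - kappa_t) * (eye[x1] - eye[xt])


def euler_step(xt, rates, h: float, rng: np.random.Generator):
    """One CTMC Euler step: P(next = y) = [y == x_t] + h * u(y). Needs h * rate <= 1."""
    vocab = rates.shape[-1]
    probs = np.eye(vocab)[xt] + h * rates
    u = rng.uniform(size=(len(xt), 1))
    # Inverse-CDF sampling per token; the clip guards against round-off at the top.
    idx = (np.cumsum(probs, axis=-1) <= u).sum(axis=-1)
    return np.minimum(idx, vocab - 1)


rng = np.random.default_rng(0)
vocab, x1 = 8, np.array([3, 1, 4, 1, 5, 2, 6])
t = 0.5                                          # schedule kappa_t = t, so dkappa_t = 1
xt = sample_xt(x1, t, vocab, rng)
rates = conditional_rates(xt, x1, t, 1.0, vocab)
x_next = euler_step(xt, rates, h=0.5, rng=rng)   # h * rate = 1: every token jumps to x1
print(xt, x_next)
```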
---

## 3 Related Work

### 3.1 Reconstruction-based detectors

Autoencoder-style detectors learn to reconstruct benign inputs and score
anomalies by reconstruction error. Kitsune popularised the design for
online NIDS with an ensemble of small autoencoders, and MemAE introduced
a learned memory bank that constrains the latent representation to the
benign manifold. The family suffers from a documented identity-mapping
failure: sufficiently expressive autoencoders reconstruct
out-of-distribution inputs near-perfectly, eroding the gap between benign
and anomalous reconstruction error. Recent critiques argue that this
behaviour is structural rather than a hyperparameter artefact, and hence
that reconstruction error is an unreliable anomaly score in general.
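The identity-mapping failure is easiest to see in the linear case. This is offered purely as an illustration, not as the critiques' own argument. A rank-$k$ linear autoencoder trained with squared error recovers the top-$k$ principal subspace (PCA), so the sketch uses PCA directly as its optimum. With $k$ below the data dimension, an off-manifold input incurs a large reconstruction error. Once the autoencoder has full capacity ($k = d$), it reconstructs every input exactly, and the gap between benign and anomalous error disappears. The toy data and function names are assumptions.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, k: int):
    """Rank-k linear autoencoder at its squared-error optimum, i.e. PCA.

    Returns a scorer giving each row's squared reconstruction error."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    W = vt[:k].T                                  # d x k orthonormal basis
    def score(Z: np.ndarray) -> np.ndarray:
        recon = (Z - mu) @ W @ W.T + mu           # encode, then decode
        return np.sum((Z - recon) ** 2, axis=1)
    return score


rng = np.random.default_rng(0)
benign = np.zeros((200, 5))
benign[:, :2] = rng.standard_normal((200, 2))    # benign traffic lives on a 2-D plane
anomaly = np.array([[0.0, 0.0, 5.0, 0.0, 0.0]])  # off the benign plane

bottleneck = fit_linear_autoencoder(benign, k=2)
full_rank = fit_linear_autoencoder(benign, k=5)  # "sufficiently expressive"
print(bottleneck(anomaly), full_rank(anomaly))   # large error vs. (numerically) zero
```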
### 3.2 Density-based detectors

Three deep generative families currently hold the reported state of the
art on NIDS benchmarks. **Normalising flows** fit an explicit density to
benign traffic through an invertible transformation and score by negative
log-likelihood; the strongest recent pipeline reports 0.93 within-dataset
AUROC on CIC-DDoS2019, with cross-domain transfer in the 0.89–0.93 range.
**Diffusion-based detectors** include contextual-masking distillation
schemes, which compare a student denoiser against a teacher trained on
benign traffic; a 2025 survey catalogues the broader space of diffusion
AD variants. **GAN-based detectors**, exemplified by recent NDSS work that
augments the optimisation with particle-swarm search, score by
discriminator output or cycle-reconstruction error. All three families
reduce a packet stream to a single scalar derived from one homogeneous
probabilistic model fit to benign data, and such likelihood-type scores
are known to dissociate from anomaly status once the benign distribution
drifts.
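Density-based scoring assigns each input a single number: the negative log-likelihood under a model fit to benign data. In the sketch below, a diagonal Gaussian stands in for a normalising flow; it is the simplest explicit density and the usual base distribution of a flow. The toy shift shows how a drift in benign traffic alone inflates the score even though nothing anomalous has happened. Names and data are assumptions.

```python
import numpy as np


def fit_gaussian_nll(X: np.ndarray, eps: float = 1e-6):
    """Fit a diagonal Gaussian to benign rows; return a per-row NLL scorer (nats)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + eps                     # eps avoids division by zero
    def nll(Z: np.ndarray) -> np.ndarray:
        return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (Z - mu) ** 2 / var, axis=1)
    return nll


rng = np.random.default_rng(0)
benign_train = rng.normal(0.0, 1.0, size=(1000, 3))
benign_drifted = benign_train + 3.0               # same traffic, shifted environment
score = fit_gaussian_nll(benign_train)
print(score(benign_train).mean(), score(benign_drifted).mean())
```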
A separate line of work uses self-supervised contrastive representations,
graph neural networks, or pre-trained traffic foundation models, and
delegates anomaly scoring to a downstream detector such as a one-class
SVM (OCSVM) or the Mahalanobis distance. These pipelines are typically
two-stage and are evaluated primarily on encrypted-traffic classification
rather than open-set anomaly detection; they are not the focus of the
cross-dataset robustness comparison we pursue.
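For reference, the second stage of such a pipeline can be as small as a Mahalanobis scorer attached to a frozen encoder. In the sketch below, the embeddings are random placeholders for encoder outputs. The ridge term is an assumption added to keep the covariance invertible, and the function names are hypothetical.

```python
import numpy as np


def fit_mahalanobis(E: np.ndarray, ridge: float = 1e-6):
    """Fit mean and covariance of benign embeddings E (n x d); return a scorer
    giving the squared Mahalanobis distance of each row of Z to the benign mean."""
    mu = E.mean(axis=0)
    cov = np.cov(E, rowvar=False) + ridge * np.eye(E.shape[1])
    precision = np.linalg.inv(cov)
    def score(Z: np.ndarray) -> np.ndarray:
        D = Z - mu
        return np.einsum("ij,jk,ik->i", D, precision, D)  # row-wise D_i^T P D_i
    return score


rng = np.random.default_rng(0)
benign_embeddings = rng.standard_normal((500, 3))  # placeholder encoder outputs
score = fit_mahalanobis(benign_embeddings)
print(score(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])))
```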
### 3.3 Flow Matching for anomaly detection

Outside NIDS, two recent works adopt Flow Matching as the AD objective. A
time-reversed FM detector for image anomaly detection combines a
worst-transport coupling with a high-dimensional latent space and scores
inputs by their deviation from the learned velocity field. A tabular
detector built on one-step FM offers explainability and provable
robustness guarantees on heterogeneous structured data. Both show
FM-based scoring to be competitive with reconstruction- and density-based
baselines in their respective regimes. DFM, by contrast, has not, to our
knowledge, been evaluated as an anomaly-detection objective, and no prior
work applies either continuous or discrete FM to packet-sequence NIDS.
### 3.4 Cross-dataset robustness in NIDS

As within-dataset metrics have saturated, cross-dataset evaluation has
emerged as the field's discriminating axis. A 2024 systematic study
measures the generalisation gap across the standard NIDS benchmarks under
matched feature schemas and reports AUROC drops of 0.10–0.30 when
detectors trained on one environment are evaluated on another. Subsequent
work on heterogeneous deep stacked ensembles, calibrated transformers, and
few-shot multi-domain fusion targets the same gap through architectural
or training-time interventions. The phenomenon is thus broadly observed
and quantified; what the literature lacks is a mechanism-level account of
why density-based scores in particular degrade under domain shift, as
opposed to an accumulation of empirical remedies. The pilot study in §X
revisits this gap directly and frames the structural failure mode that
the rest of the paper addresses.