ER-TP-DGP/docs/phase4_labels.md
BattleTag b86ae87b75 Initial commit: ER-TP-DGP research prototype
Event-Reified Temporal Provenance Dual-Granularity Prompting for
LLM-based APT detection on DARPA provenance datasets.

Includes phase 0-14 method spec, IR/graph/metapath/trimming/prompt
modules, scripts for THEIA candidate universe, landmark CSG construction,
hybrid prompting, and LLM inference. Excludes data/, reports/, and
local LLM config from version control.
2026-05-15 16:53:57 +08:00


Phase 4 Ground Truth Mapping and Labels

Ground truth is used only for label mapping and evaluation. It must not enter LLM prompts.

Label Levels

  • Event-level: direct matched attack events.
  • Process-level: processes involved in malicious event chains.
  • Subgraph-level: local evidence subgraphs containing key attack-chain events.
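
The three levels can be read as successive projections of the matched attack events. The sketch below is a simplified illustration, not the project's implementation: the `Event` type, its fields, and `derive_labels` are hypothetical names, and it ignores the ambiguous cases described in the next section.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event: an ID plus the process that emitted it."""
    event_id: str
    process_id: str


def derive_labels(
    matched_events: Iterable[Event],
    subgraphs: Dict[str, List[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Project matched attack events onto the three label levels.

    matched_events: events matched directly against ground truth.
    subgraphs: maps a subgraph ID to the event IDs it contains.
    Returns (malicious event IDs, process IDs, subgraph IDs).
    """
    matched = list(matched_events)
    # Event-level: the directly matched attack events.
    event_ids = {e.event_id for e in matched}
    # Process-level: processes that take part in those events.
    process_ids = {e.process_id for e in matched}
    # Subgraph-level: subgraphs containing at least one matched event.
    subgraph_ids = {
        sid for sid, members in subgraphs.items() if event_ids & set(members)
    }
    return event_ids, process_ids, subgraph_ids
```

For example, two matched events from the same process yield two event-level labels, one process-level label, and one label for each subgraph that contains either event.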

Ambiguous Cases

Assign ambiguous targets the label unknown or ignore; do not force them to malicious or benign. Ambiguous cases include:

  • attack window overlap without explicit evidence;
  • normal child behavior from a compromised process;
  • normal process later abused by an attacker;
  • missing fields that prevent reliable mapping.
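
One way to encode the cases above is a small decision function that falls back to unknown or ignore before it ever returns malicious or benign. This is a sketch under stated assumptions: the `Candidate` fields are hypothetical, and the choice of ignore for missing fields versus unknown for the other cases is an illustrative convention that the document does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical mapping candidate with the flags needed for labeling."""
    target_id: str
    in_attack_window: bool = False
    has_explicit_evidence: bool = False
    parent_compromised: bool = False
    later_abused: bool = False
    missing_fields: bool = False


def assign_label(c: Candidate) -> str:
    """Return 'malicious', 'benign', 'unknown', or 'ignore'."""
    # Missing fields prevent reliable mapping, so drop the target.
    if c.missing_fields:
        return "ignore"
    # A normal process that is later abused is ambiguous as a whole.
    if c.later_abused:
        return "unknown"
    # Only explicit evidence justifies a malicious label.
    if c.has_explicit_evidence:
        return "malicious"
    # Window overlap or a compromised parent alone is not evidence.
    if c.in_attack_window or c.parent_compromised:
        return "unknown"
    return "benign"
```

The ordering matters: the ambiguity checks run before the evidence check, so a target is labeled malicious only when nothing else makes the mapping doubtful.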

Negative Sampling

Negative sampling must avoid:

  • arbitrary benign labels inside attack windows;
  • train/test leakage through the same attack entity;
  • adjacent attack-chain events split across train and test;
  • using attack-report text as prompt content.
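
The first three constraints can be sketched as a window filter followed by a group-aware split. The code below is an assumption-laden illustration: the record keys (`target_id`, `timestamp`, `group_id`), the inclusive window bounds, and the hash-based split are all hypothetical choices, with `group_id` standing in for an attack entity or attack chain.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

Window = Tuple[int, int]  # (start, end) timestamps, inclusive


def outside_attack_windows(ts: int, windows: Sequence[Window]) -> bool:
    """True if the timestamp falls in none of the attack windows."""
    return all(not (start <= ts <= end) for start, end in windows)


def split_of(group_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a whole group to 'train' or 'test'.

    Hashing the group ID keeps every record of one attack entity or
    chain on the same side of the split.
    """
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return "test" if bucket < int(test_fraction * 2**64) else "train"


def sample_negatives(
    candidates: Sequence[dict],
    windows: Sequence[Window],
    test_fraction: float = 0.2,
) -> Dict[str, List[str]]:
    """Keep only candidates outside attack windows, split by group."""
    splits: Dict[str, List[str]] = {"train": [], "test": []}
    for c in candidates:
        # Never assign a benign label inside an attack window.
        if not outside_attack_windows(c["timestamp"], windows):
            continue
        splits[split_of(c["group_id"], test_fraction)].append(c["target_id"])
    return splits
```

Splitting on the group rather than on individual records is what prevents adjacent attack-chain events, or the same attack entity, from landing in both train and test.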

Checks

  • Label records are never allowed into prompts.
  • Each label has source and confidence.
  • Trainable labels require high confidence.
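
These checks lend themselves to an automated validator. The following is a minimal sketch, assuming a hypothetical `LabelRecord` with `prompt_allowed`, `source`, `confidence`, and `trainable` fields and a categorical confidence scale; none of these names come from the project itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LabelRecord:
    """Hypothetical label record; field names are illustrative."""
    target_id: str
    value: str
    source: Optional[str] = None
    confidence: Optional[str] = None  # assumed scale: "high", "medium", "low"
    prompt_allowed: bool = False
    trainable: bool = False


def check_label(record: LabelRecord) -> List[str]:
    """Return a list of violated checks (empty if the record passes)."""
    problems: List[str] = []
    # Ground truth must never reach an LLM prompt.
    if record.prompt_allowed:
        problems.append("label record is prompt-allowed")
    # Every label needs provenance and a confidence level.
    if not record.source:
        problems.append("missing source")
    if not record.confidence:
        problems.append("missing confidence")
    # Only high-confidence labels may be used for training.
    if record.trainable and record.confidence != "high":
        problems.append("trainable label lacks high confidence")
    return problems
```

A record such as `LabelRecord("p1", "malicious", source="ground_truth", confidence="high", trainable=True)` passes with an empty list, while a trainable record with medium confidence is flagged.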