Event-Reified Temporal Provenance Dual-Granularity Prompting for LLM-based APT detection on DARPA provenance datasets. Includes the Phase 0-14 method specification; IR, graph, metapath, trimming, and prompt modules; and scripts for the THEIA candidate universe, landmark CSG construction, hybrid prompting, and LLM inference. Excludes data/, reports/, and local LLM config from version control.
Phase 4: Ground-Truth Mapping and Labels
Ground truth is used only for label mapping and evaluation. It must not enter LLM prompts.
Label Levels
- Event-level: directly matched attack events.
- Process-level: processes involved in malicious event chains.
- Subgraph-level: local evidence subgraphs containing key attack-chain events.
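The three levels can be derived from the event-level matches. A minimal sketch, assuming a hypothetical `Event` record with `event_id`, `subject_pid`, and `object_id` fields, and assuming that a process counts as "involved" when it is the subject of a matched event and that a subgraph is positive when it contains at least one matched event. Both rules are simplifications, not the spec's definitions.

```python
from dataclasses import dataclass


# Hypothetical event record; field names are assumptions, not the repo's IR.
@dataclass(frozen=True)
class Event:
    event_id: str
    subject_pid: str  # acting process
    object_id: str    # file, socket, or process acted upon


def propagate_labels(events, matched_ids, subgraphs):
    """Derive process- and subgraph-level labels from event-level matches.

    events: list of Event
    matched_ids: set of event_ids directly matched to ground truth
    subgraphs: dict mapping subgraph_id -> set of event_ids it contains
    """
    # Event level: positive only for direct ground-truth matches.
    event_labels = {e.event_id: e.event_id in matched_ids for e in events}
    # Process level: only subject processes are labeled; a process is
    # positive if it is the subject of any matched event.
    malicious_procs = {e.subject_pid for e in events if e.event_id in matched_ids}
    process_labels = {e.subject_pid: e.subject_pid in malicious_procs for e in events}
    # Subgraph level: positive if it contains at least one matched event.
    subgraph_labels = {sg: bool(ids & matched_ids) for sg, ids in subgraphs.items()}
    return event_labels, process_labels, subgraph_labels
```

For example, with one matched event `e1` issued by process `p1`, only `p1` and the subgraphs containing `e1` become positive.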
Ambiguous Cases
Ambiguous targets should be labeled unknown or ignore rather than forced into malicious or benign. This applies to:
- overlap with an attack window without explicit attack evidence;
- normal behavior of a child spawned by a compromised process;
- normal process later abused by an attacker;
- missing fields that prevent reliable mapping.
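One way to keep these cases out of the binary labels is an ordered rule that falls back to unknown or ignore. The sketch below is illustrative only: the flag names, the rule order, and the `Label` enum are assumptions, and returning benign when no flag fires presumes the target lies outside attack windows and has no compromised ancestry or later abuse.

```python
from enum import Enum


class Label(Enum):
    MALICIOUS = "malicious"
    BENIGN = "benign"
    UNKNOWN = "unknown"
    IGNORE = "ignore"


def assign_label(*, has_required_fields, explicit_attack_evidence,
                 in_attack_window, parent_compromised, later_abused):
    # Missing fields: the target cannot be mapped reliably, so exclude it.
    if not has_required_fields:
        return Label.IGNORE
    # Only explicit evidence (e.g. a direct ground-truth match) yields malicious.
    if explicit_attack_evidence:
        return Label.MALICIOUS
    # Window overlap, a compromised parent, or later abuse alone do not
    # justify either binary label, so the target stays unknown.
    if in_attack_window or parent_compromised or later_abused:
        return Label.UNKNOWN
    # No ambiguity flag fired (assumed to mean a clean benign context).
    return Label.BENIGN
```

Keyword-only arguments make each call site state which ambiguity conditions were checked.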
Negative Sampling
Negative sampling must avoid:
- assigning benign labels arbitrarily to events inside attack windows;
- train/test leakage through the same attack entity;
- adjacent attack-chain events split across train and test;
- using attack-report text as prompt content.
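The first three constraints can be enforced by splitting on attack groups rather than individual samples and by drawing negatives only from outside attack windows. A minimal sketch, assuming hypothetical sample dicts with `id`, `group` (attack entity or chain identifier, `None` for benign samples), and `time` keys, and attack windows given as `(start, end)` pairs. The fourth constraint concerns prompt construction and is not covered here.

```python
import random
from collections import defaultdict


def split_by_attack_group(samples, test_fraction=0.3, seed=0):
    """Split samples so that no attack group straddles train and test."""
    groups = defaultdict(list)
    for s in samples:
        # Ungrouped (benign) samples each form their own singleton group.
        key = s["group"] if s["group"] is not None else ("solo", s["id"])
        groups[key].append(s)
    keys = sorted(groups, key=str)  # deterministic order before shuffling
    random.Random(seed).shuffle(keys)
    n_test = round(len(keys) * test_fraction)
    test = [s for k in keys[:n_test] for s in groups[k]]
    train = [s for k in keys[n_test:] for s in groups[k]]
    return train, test


def sample_negatives(candidates, attack_windows, k, seed=0):
    """Draw benign negatives only from events outside every attack window."""
    outside = [c for c in candidates
               if not any(start <= c["time"] <= end for start, end in attack_windows)]
    return random.Random(seed).sample(outside, min(k, len(outside)))
```

Grouping by chain keeps adjacent attack-chain events on the same side of the split; whether groups are defined per entity, per chain, or per scenario is a design choice left to the spec.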
Checks
- Label records are not prompt-allowed.
- Each label records its source and a confidence level.
- Trainable labels require high confidence.
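These checks can run as a validation pass over label records. A minimal sketch, assuming a hypothetical `LabelRecord` schema with a numeric confidence in [0, 1] and an illustrative `HIGH_CONFIDENCE` threshold of 0.9; the spec itself does not fix a number.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_CONFIDENCE = 0.9  # hypothetical threshold; the spec only says "high confidence"


@dataclass
class LabelRecord:
    target_id: str
    level: str                   # "event", "process", or "subgraph"
    label: str                   # "malicious", "benign", "unknown", or "ignore"
    source: str                  # which ground-truth source produced the label
    confidence: Optional[float]
    prompt_allowed: bool = False
    trainable: bool = False


def check_record(r):
    """Return a list of Phase 4 check violations (empty if the record passes)."""
    errors = []
    if r.prompt_allowed:
        errors.append("label record marked prompt-allowed")
    if not r.source:
        errors.append("missing source")
    if r.confidence is None or not 0.0 <= r.confidence <= 1.0:
        errors.append("missing or out-of-range confidence")
    elif r.trainable and r.confidence < HIGH_CONFIDENCE:
        errors.append("trainable label below high-confidence threshold")
    return errors
```

Returning a list of violations rather than raising on the first one lets a batch run report every problem in a label file at once.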