Initial commit: ER-TP-DGP research prototype
Event-Reified Temporal Provenance Dual-Granularity Prompting for LLM-based APT detection on DARPA provenance datasets. Includes phase 0-14 method spec, IR/graph/metapath/trimming/prompt modules, scripts for THEIA candidate universe, landmark CSG construction, hybrid prompting, and LLM inference. Excludes data/, reports/, and local LLM config from version control.
docs/phase4_labels.md
# Phase 4 Ground Truth Mapping and Labels

Ground truth is used only for label mapping and evaluation. It must not enter
LLM prompts.

## Label Levels

- Event-level: direct matched attack events.
- Process-level: processes involved in malicious event chains.
- Subgraph-level: local evidence subgraphs containing key attack-chain events.
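
The three levels above could be represented with a small typed record. This is a minimal sketch, not the project's actual schema: `LabelLevel`, `Label`, and the fields `target_id` and `value` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LabelLevel(Enum):
    """Granularity at which a label applies (names are illustrative)."""
    EVENT = "event"        # direct matched attack events
    PROCESS = "process"    # processes involved in malicious event chains
    SUBGRAPH = "subgraph"  # local evidence subgraphs with key attack-chain events


@dataclass(frozen=True)
class Label:
    level: LabelLevel
    target_id: str  # hypothetical ID of the event, process, or subgraph
    value: str      # e.g. "malicious", "benign", "unknown"


# Example records, one per level (IDs are made up).
labels = [
    Label(LabelLevel.EVENT, "evt-001", "malicious"),
    Label(LabelLevel.PROCESS, "proc-42", "malicious"),
    Label(LabelLevel.SUBGRAPH, "sg-7", "unknown"),
]
```

Keeping the level explicit on each record makes it harder to mix event-level and process-level labels by accident during evaluation.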

## Ambiguous Cases

Ambiguous targets should be assigned `unknown` or `ignore`, not forced to
malicious or benign:

- attack window overlap without explicit evidence;
- normal child behavior from a compromised process;
- normal process later abused by an attacker;
- missing fields that prevent reliable mapping.
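
The rule above can be sketched as a small decision function. The `Target` fields and the order of the checks are assumptions made for illustration; the sketch also uses only `unknown` and leaves the choice between `unknown` and `ignore` open.

```python
from dataclasses import dataclass


@dataclass
class Target:
    # All field names are hypothetical flags derived during label mapping.
    explicit_attack_evidence: bool = False
    in_attack_window: bool = False
    child_of_compromised_process: bool = False
    later_abused_by_attacker: bool = False
    missing_required_fields: bool = False


def assign_label(t: Target) -> str:
    """Return "malicious", "benign", or "unknown" for one target."""
    # Missing fields prevent reliable mapping, so nothing else is trusted.
    if t.missing_required_fields:
        return "unknown"
    # A normal process that is abused later is treated as ambiguous (assumption).
    if t.later_abused_by_attacker:
        return "unknown"
    if t.explicit_attack_evidence:
        return "malicious"
    # Window overlap or a compromised parent alone is not explicit evidence.
    if t.in_attack_window or t.child_of_compromised_process:
        return "unknown"
    return "benign"
```

For example, `assign_label(Target(in_attack_window=True))` yields `unknown` rather than `benign`, which matches the first bullet.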

## Negative Sampling

Negative sampling must avoid:

- arbitrary benign labels inside attack windows;
- train/test leakage through the same attack entity;
- adjacent attack-chain events split across train and test;
- using attack-report text as prompt content.
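
One possible way to enforce the first three constraints is to draw negatives only from outside attack windows and to split by group instead of by sample. The sketch below assumes each sample is a dict with hypothetical keys `ts`, `label`, and `group_id`, where `group_id` stands for the attack entity or attack chain the sample belongs to.

```python
def in_any_window(ts, windows):
    """True if timestamp ts falls inside any (start, end) attack window."""
    return any(start <= ts <= end for start, end in windows)


def negative_pool(candidates, windows):
    """Keep benign candidates that lie outside every attack window."""
    return [
        c for c in candidates
        if c["label"] == "benign" and not in_any_window(c["ts"], windows)
    ]


def split_by_group(samples, test_groups):
    """Send every sample of a group to the same side of the split."""
    train = [s for s in samples if s["group_id"] not in test_groups]
    test = [s for s in samples if s["group_id"] in test_groups]
    return train, test


def leaked_groups(train, test):
    """Groups that appear on both sides of the split; should be empty."""
    return {s["group_id"] for s in train} & {s["group_id"] for s in test}
```

Splitting on `group_id` keeps adjacent events of one attack chain together, so a model cannot see part of a chain during training and the rest at test time. Calling `leaked_groups` after any split gives a cheap leakage check.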

## Checks

- Label records are not prompt-allowed.
- Each label has source and confidence.
- Trainable labels require high confidence.
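
These checks could be automated with a small validator. This is a sketch under stated assumptions: the `LabelRecord` fields are hypothetical, and confidence is modeled as a categorical string where `"high"` is the only value that permits training, since the document does not define a numeric threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabelRecord:
    # Field names are illustrative, not taken from the project code.
    target_id: str
    value: str
    source: Optional[str] = None
    confidence: Optional[str] = None  # assumed categories: "high", "medium", "low"
    trainable: bool = False
    prompt_allowed: bool = False


def check_label(rec: LabelRecord) -> List[str]:
    """Return a list of violated checks; an empty list means the record passes."""
    problems = []
    if rec.prompt_allowed:
        problems.append("label record is prompt-allowed")
    if not rec.source:
        problems.append("missing source")
    if rec.confidence is None:
        problems.append("missing confidence")
    if rec.trainable and rec.confidence != "high":
        problems.append("trainable label without high confidence")
    return problems
```

Returning a list of problems instead of raising on the first one makes it possible to report every violation for a record in a single pass.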