Initial commit: ER-TP-DGP research prototype

Event-Reified Temporal Provenance Dual-Granularity Prompting for
LLM-based APT detection on DARPA provenance datasets.

Includes phase 0-14 method spec, IR/graph/metapath/trimming/prompt
modules, scripts for THEIA candidate universe, landmark CSG construction,
hybrid prompting, and LLM inference. Excludes data/, reports/, and
local LLM config from version control.
BattleTag
2026-05-15 16:53:57 +08:00
commit b86ae87b75
88 changed files with 18570 additions and 0 deletions

docs/phase5_candidates.md

@@ -0,0 +1,34 @@
# Phase 5 Candidate Target Generation
Candidate generation narrows the set of targets passed to the LLM, which reduces call volume. It is a pre-filter, not the final detector.
## Allowed Signals
Signals must be label-free:
- rare parent-child process relation;
- rare process path;
- rare file path;
- first-seen external endpoint;
- write-then-execute behavior;
- read-then-send behavior;
- unusual process tree depth;
- login followed by lateral communication;
- statistical anomaly or weak detector alert.
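
Two of the signals above (rare parent-child process relation and write-then-execute behavior) could be sketched as follows. This is a minimal illustration, not the repository's implementation: the `Event` record, its field names, and the `max_count` rarity threshold are all assumptions made for the example. Neither function reads a label.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; field names are assumptions for this sketch.
    ts: float     # event timestamp
    proc: str     # identifier of the acting process
    parent: str   # name of the parent process
    name: str     # name of the acting process
    action: str   # e.g. "write", "exec", "read"
    obj: str      # file path or endpoint touched by the event


def rare_parent_child(events, max_count=1):
    """Flag processes whose (parent, child) name pair is seen on at most max_count processes."""
    pair_of = {}
    for e in events:
        pair_of.setdefault(e.proc, (e.parent, e.name))
    counts = Counter(pair_of.values())
    return {proc for proc, pair in pair_of.items() if counts[pair] <= max_count}


def write_then_execute(events):
    """Flag processes that execute a path that was written strictly earlier in the trace."""
    first_write = {}  # path -> timestamp of its earliest write
    flagged = set()
    for e in sorted(events, key=lambda ev: ev.ts):
        if e.action == "write":
            first_write.setdefault(e.obj, e.ts)
        elif e.action == "exec" and e.obj in first_write and first_write[e.obj] < e.ts:
            flagged.add(e.proc)
    return flagged


events = [
    Event(1.0, "p1", "bash", "curl", "write", "/tmp/x"),
    Event(2.0, "p2", "bash", "sh", "exec", "/tmp/x"),
    Event(3.0, "p3", "sshd", "bash", "read", "/etc/passwd"),
    Event(4.0, "p4", "sshd", "bash", "read", "/etc/hosts"),
]
candidates = rare_parent_child(events) | write_then_execute(events)
```

Taking the union of per-signal sets keeps each signal independently auditable; a real pipeline would likely estimate rarity against a benign reference window rather than the trace itself.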
## Required Evaluation
Candidate generation is evaluated separately from final LLM classification:
- candidate generation recall;
- candidate generation precision;
- number of candidates;
- positive coverage by process/event target;
- end-to-end recall after LLM classification.
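
The metrics above can be computed from three sets of target identifiers. The sketch below is one possible formulation under stated assumptions: the function names, the set-based interface, and the reading of "positive coverage by process/event target" as per-type recall over a hypothetical `target_type` mapping are illustrative choices, not the project's code. Ground-truth positives appear here only because this is evaluation, not candidate generation.

```python
from collections import Counter


def candidate_metrics(candidates, positives, llm_flagged):
    """Compute candidate-stage and end-to-end metrics from sets of target ids.

    candidates:  targets emitted by candidate generation
    positives:   ground-truth malicious targets (evaluation only)
    llm_flagged: targets the LLM classified as malicious
    """
    hits = candidates & positives
    # Only candidates reach the LLM, so end-to-end detections are a subset of hits.
    detected = llm_flagged & hits
    n_pos = len(positives)
    return {
        "num_candidates": len(candidates),
        "candidate_recall": len(hits) / n_pos if n_pos else 0.0,
        "candidate_precision": len(hits) / len(candidates) if candidates else 0.0,
        "end_to_end_recall": len(detected) / n_pos if n_pos else 0.0,
    }


def positive_coverage_by_type(candidates, positives, target_type):
    """Fraction of positives of each target type (e.g. process, event) kept as candidates."""
    totals, covered = Counter(), Counter()
    for target in positives:
        kind = target_type[target]
        totals[kind] += 1
        if target in candidates:
            covered[kind] += 1
    return {kind: covered[kind] / totals[kind] for kind in totals}


metrics = candidate_metrics(
    candidates={"a", "b", "c", "d"},
    positives={"a", "b", "e"},
    llm_flagged={"a", "c"},
)
coverage = positive_coverage_by_type(
    candidates={"a", "b", "c", "d"},
    positives={"a", "b", "e"},
    target_type={"a": "process", "b": "event", "e": "event"},
)
```

In this formulation end-to-end recall can never exceed candidate recall, which is why the two stages are reported separately: a positive dropped at the candidate stage cannot be recovered by the LLM.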
## Checks
- Candidate generation must not use test labels.
- Candidate generation must not use attack report narratives.
- Weak signals are retained for audit but do not replace ER-TP-DGP.
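
The first two checks could be enforced mechanically with a guard on the generator's input. The sketch below assumes candidate generation consumes dictionary records; the field names in `FORBIDDEN_FIELDS` are hypothetical and would need to match whatever label or report-derived columns the real pipeline carries.

```python
# Hypothetical names for label-derived or report-derived fields.
FORBIDDEN_FIELDS = frozenset({"label", "is_malicious", "ground_truth", "attack_report"})


def assert_label_free(records):
    """Raise ValueError if any input record exposes a forbidden field."""
    for index, record in enumerate(records):
        leaked = FORBIDDEN_FIELDS.intersection(record)
        if leaked:
            raise ValueError(
                f"record {index} exposes label-derived fields: {sorted(leaked)}"
            )


# Passes silently: the record carries only observable provenance attributes.
assert_label_free([{"proc": "p1", "action": "exec", "obj": "/tmp/x"}])
```

A field-name guard only catches direct leakage; it does not detect signals that were tuned by hand against attack reports, which still has to be reviewed manually.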