Initial commit: ER-TP-DGP research prototype

Event-Reified Temporal Provenance Dual-Granularity Prompting for LLM-based APT detection on DARPA provenance datasets. Includes phase 0-14 method spec, IR/graph/metapath/trimming/prompt modules, scripts for THEIA candidate universe, landmark CSG construction, hybrid prompting, and LLM inference. Excludes data/, reports/, and local LLM config from version control.
2026-05-15 16:53:57 +08:00
commit b86ae87b75
88 changed files with 18570 additions and 0 deletions
--- a/docs/phase5_candidates.md
+++ b/docs/phase5_candidates.md
@@ -0,0 +1,34 @@
+# Phase 5 Candidate Target Generation
+
+Candidate generation narrows the set of targets sent to the LLM, which reduces LLM call volume. It is a pre-filter, not the final detector.
+
+## Allowed Signals
+
+Signals must be label-free, meaning they can be computed without ground-truth attack labels:
+
+- rare parent-child process relation;
+- rare process path;
+- rare file path;
+- first-seen external endpoint;
+- write-then-execute behavior;
+- read-then-send behavior;
+- unusual process tree depth;
+- login followed by lateral communication;
+- statistical anomaly or weak detector alert.
+
+## Required Evaluation
+
+Candidate generation is evaluated separately from final LLM classification:
+
+- candidate generation recall;
+- candidate generation precision;
+- number of candidates;
+- coverage of positive targets, reported separately for process targets and event targets;
+- end-to-end recall after LLM classification.
+
+## Checks
+
+- Candidate generation must not use test labels.
+- Candidate generation must not use attack report narratives.
+- Weak signals are retained for audit but do not replace ER-TP-DGP.
+
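The metrics listed under "Required Evaluation" in `docs/phase5_candidates.md` can be sketched as follows. This is a minimal illustration, not code from this commit: the function `evaluate_candidates`, the `CandidateReport` dataclass, and the `(kind, id)` target encoding are all hypothetical names chosen for the example. Ground-truth labels appear only in this evaluation step and never in candidate generation itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateReport:
    num_candidates: int
    candidate_recall: float
    candidate_precision: float
    coverage_by_kind: dict
    end_to_end_recall: float


def evaluate_candidates(candidates, positives, llm_positive):
    """Score candidate generation separately from LLM classification.

    Every target is a hypothetical (kind, id) tuple, for example
    ("process", "p2") or ("event", "e4").
    """
    candidates = set(candidates)
    positives = set(positives)
    # The LLM only sees candidates, so restrict its detections to them.
    detected = set(llm_positive) & candidates
    hits = candidates & positives

    recall = len(hits) / len(positives) if positives else 0.0
    precision = len(hits) / len(candidates) if candidates else 0.0

    # Fraction of positives of each kind that survived candidate generation.
    total_by_kind = Counter(kind for kind, _ in positives)
    hit_by_kind = Counter(kind for kind, _ in hits)
    coverage = {k: hit_by_kind[k] / n for k, n in total_by_kind.items()}

    # A positive missed by candidate generation can never be recovered later.
    e2e = len(detected & positives) / len(positives) if positives else 0.0
    return CandidateReport(len(candidates), recall, precision, coverage, e2e)


candidates = {("process", "p1"), ("process", "p2"), ("event", "e3"), ("event", "e4")}
positives = {("process", "p2"), ("process", "p9"), ("event", "e4")}
llm_positive = {("process", "p2"), ("event", "e3")}
report = evaluate_candidates(candidates, positives, llm_positive)
```

In this toy input, candidate generation keeps two of three positives, so candidate recall bounds end-to-end recall from above regardless of how well the LLM classifies. That is the reason the two stages are reported separately.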