# ER-TP-DGP

Event-Reified Temporal Provenance Dual-Granularity Prompting for LLM-based APT detection.

This repository is a research prototype for evaluating graph-enhanced LLM detection on DARPA provenance datasets. The main method is not raw log prompting, not a GNN classifier, and not a rules detector.

The main pipeline is:

```text
DARPA provenance records
-> schema-aware provenance IR
-> event-reified temporal heterogeneous graph
-> time-respecting APT semantic evidence paths
-> dual-granularity graph prompt
-> LLM classification with evidence path IDs
```

The current implementation is data-independent scaffolding. It intentionally does not assume that every DARPA dataset contains command lines, registry objects, hashes, domains, services, tasks, modules, or complete ground truth.

## Core Formula

```text
Prompt(q) = Fine(q) + Local(q) + sum_P [Summary_P(q) + Stats_P(q) + Evidence_P(q)]
```

`q` is a process or event target. `P` is an APT semantic metapath such as execution chain, file staging, network/C2, exfiltration-like, persistence, or lateral movement.

## Current Status

Implemented without real data:

- Phase 0 method specification.
- Phase 1 dataset schema audit model and report generation.
- Unified provenance IR dataclasses.
- IR validation and JSONL serialization.
- Dataset adapter interface and schema mismatch reporting.
- Event-view and causal-view graph construction.
- Time-window, host-filtered, target-context, and ID-based graph views.
- Time-respecting APT metapath path extraction for core path families.
- Temporal, structural, semantic, and security-aware trimming scaffold.
- Dual-granularity prompt construction with evidence IDs.
- Label-only ground-truth mapping interfaces.
- LLM strategy, baseline, and ablation method registry.
- Imbalanced APT detection metrics including AUPRC, AUROC, Macro-F1, Precision@K, Recall@K, FPR at fixed recall, detection delay, token/cost accounting, and evidence-path hit rate.
- Time, campaign, and host split helpers with leakage checks for raw event IDs, process IDs, IOC-like file paths, duplicated prompts, summaries, campaigns, and same-host time windows.
- OpenAI-compatible LLM inference client for remote API and local deployments, with first-token `MALICIOUS`/`BENIGN` parsing and raw response retention.
- THEIA CDM18 action semantics with auditable canonical actions, causal directions, metapath hints, and MEMORY entity support.
- Common-behavior context annotations such as browser-like process ratio and local IPC flow ratio. These are neutral prompt features, not hard filters or rule-based benign decisions.
- Synthetic unit tests for interface and invariant checks.

## LLM Inference

Remote OpenAI-compatible API:

```bash
export OPENAI_COMPAT_API_KEY='...'
cp configs/llm.example.yaml configs/llm.yaml
# edit configs/llm.yaml: provider=api, base_url, model, api_key_env
.venv/bin/python scripts/run_llm_inference.py \
  --config configs/llm.yaml \
  --prompt-file reports/theia_e3_idea/prompt.txt \
  --output-jsonl reports/llm_predictions.jsonl
```

Local OpenAI-compatible deployment:

```bash
cp configs/llm.example.yaml configs/llm.yaml
# edit configs/llm.yaml: provider=local, base_url, model
.venv/bin/python scripts/run_llm_inference.py \
  --config configs/llm.yaml \
  --prompt-file reports/theia_e3_idea/prompt.txt \
  --output-jsonl reports/local_llm_predictions.jsonl
```

The LLM prompt must not include ground-truth reports, IOC narratives, or labels. Ground truth is only for label mapping and evaluation.

Synthetic examples are debugging-only fixtures and are not experimental results.
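
The first-token `MALICIOUS`/`BENIGN` parsing with raw response retention listed under Current Status could look roughly like the sketch below. It is illustrative only and is not taken from the repository's client code: the names `ParsedPrediction`, `parse_first_token`, and `VALID_LABELS` are hypothetical, and the choice to return `None` for unparseable responses is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label set; the README only names MALICIOUS and BENIGN.
VALID_LABELS = ("MALICIOUS", "BENIGN")


@dataclass
class ParsedPrediction:
    label: Optional[str]  # "MALICIOUS", "BENIGN", or None if unparseable
    raw_response: str     # kept verbatim so every decision can be audited


def parse_first_token(raw_response: str) -> ParsedPrediction:
    """Read the label from the first whitespace-delimited token only."""
    tokens = raw_response.strip().split()
    # Strip common punctuation and markdown emphasis around the token.
    first = tokens[0].strip(".,:;!*`\"'").upper() if tokens else ""
    label = first if first in VALID_LABELS else None
    return ParsedPrediction(label=label, raw_response=raw_response)


if __name__ == "__main__":
    for text in ("MALICIOUS\nEvidence: P1, P3", "benign.", "I think it is MALICIOUS"):
        print(parse_first_token(text))
```

Looking only at the first token means a label word that appears later in free-form reasoning is not counted as a prediction. Under this sketch such a response is recorded as unparsed, with the raw text still available for inspection.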