
BattleTag b86ae87b75 Initial commit: ER-TP-DGP research prototype

Event-Reified Temporal Provenance Dual-Granularity Prompting for
LLM-based APT detection on DARPA provenance datasets.

Includes phase 0-14 method spec, IR/graph/metapath/trimming/prompt
modules, scripts for THEIA candidate universe, landmark CSG construction,
hybrid prompting, and LLM inference. Excludes data/, reports/, and
local LLM config from version control.

2026-05-15 16:53:57 +08:00

Top-level entries, all last changed by the initial commit (2026-05-15 16:53:57 +08:00):

configs
docs
examples
refers
scripts
src/er_tp_dgp
tests
.codex
.gitignore
pyproject.toml
README.md
uv.lock

README.md

ER-TP-DGP

Event-Reified Temporal Provenance Dual-Granularity Prompting for LLM-based APT detection.

This repository is a research prototype for evaluating graph-enhanced LLM detection on DARPA provenance datasets. The method is neither raw-log prompting, nor a GNN classifier, nor a rule-based detector. The pipeline is:

DARPA provenance records
  -> schema-aware provenance IR
  -> event-reified temporal heterogeneous graph
  -> time-respecting APT semantic evidence paths
  -> dual-granularity graph prompt
  -> LLM classification with evidence path IDs

The current implementation is data-independent scaffolding. It intentionally does not assume that every DARPA dataset contains command lines, registry objects, hashes, domains, services, tasks, modules, or complete ground truth.
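One way to avoid assuming that a field exists is to make it optional in the IR and report what is missing rather than fail. The sketch below illustrates that idea only; `ProcessEntity`, its field names, and `missing_fields` are hypothetical and are not taken from the repository's actual IR dataclasses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessEntity:
    """Hypothetical IR entity: only the ID and host are assumed to always exist."""

    entity_id: str
    host_id: str
    command_line: Optional[str] = None  # not present in every dataset
    image_hash: Optional[str] = None    # not present in every dataset


def missing_fields(entity: ProcessEntity) -> list[str]:
    """Return the names of optional fields the source dataset did not supply."""
    optional = ("command_line", "image_hash")
    return [name for name in optional if getattr(entity, name) is None]
```

A schema audit could aggregate the output of `missing_fields` per dataset so that downstream prompt construction knows which features are unavailable.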

Core Formula

Prompt(q) = Fine(q) + Local(q)
          + sum_P [Summary_P(q) + Stats_P(q) + Evidence_P(q)]

q is the detection target (a process or an event). P ranges over APT semantic metapaths such as execution chain, file staging, network/C2, exfiltration-like behavior, persistence, and lateral movement.
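The formula can be read as string concatenation: one fine-grained block, one local block, and one summary/stats/evidence block per metapath. The following is a minimal sketch of that reading. The `MetapathBlock` type, the `build_prompt` function, and the section tags are illustrative assumptions, not the repository's prompt module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetapathBlock:
    """Hypothetical container for the three per-metapath terms of the formula."""

    name: str            # metapath family, e.g. "execution_chain"
    summary: str         # Summary_P(q)
    stats: str           # Stats_P(q)
    evidence: list[str]  # Evidence_P(q): one line per evidence path, ID first


def build_prompt(fine: str, local: str, blocks: list[MetapathBlock]) -> str:
    """Assemble Fine(q) + Local(q) + sum over P of the per-metapath blocks."""
    sections = [f"[FINE]\n{fine}", f"[LOCAL]\n{local}"]
    for block in blocks:
        evidence = "\n".join(block.evidence)
        sections.append(
            f"[METAPATH {block.name}]\n"
            f"summary: {block.summary}\n"
            f"stats: {block.stats}\n"
            f"evidence:\n{evidence}"
        )
    return "\n\n".join(sections)
```

Keeping an ID at the start of every evidence line is what would let the model cite evidence path IDs in its answer.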

Current Status

Implemented without real data:

Phase 0 method specification.
Phase 1 dataset schema audit model and report generation.
Unified provenance IR dataclasses.
IR validation and JSONL serialization.
Dataset adapter interface and schema mismatch reporting.
Event-view and causal-view graph construction.
Time-window, host-filtered, target-context, and ID-based graph views.
Time-respecting APT metapath path extraction for core path families.
Temporal, structural, semantic, and security-aware trimming scaffold.
Dual-granularity prompt construction with evidence IDs.
Label-only ground-truth mapping interfaces.
LLM strategy, baseline, and ablation method registry.
Imbalanced APT detection metrics including AUPRC, AUROC, Macro-F1, Precision@K, Recall@K, FPR at fixed recall, detection delay, token/cost accounting, and evidence-path hit rate.
Time, campaign, and host split helpers with leakage checks for raw event IDs, process IDs, IOC-like file paths, duplicated prompts, summaries, campaigns, and same-host time windows.
OpenAI-compatible LLM inference client for remote API and local deployments, with first-token MALICIOUS/BENIGN parsing and raw response retention.
THEIA CDM18 action semantics with auditable canonical actions, causal directions, metapath hints, and MEMORY entity support.
Common-behavior context annotations such as browser-like process ratio and local IPC flow ratio. These are neutral prompt features, not hard filters or rule-based benign decisions.
Synthetic unit tests for interface and invariant checks.
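To make "time-respecting" concrete, the sketch below checks one candidate path. It assumes that a path is a sequence of events in which each event starts where the previous one ended and timestamps never decrease. The `Event` type and both conditions are assumptions for illustration; the prototype may define the constraint differently, for example with strictly increasing timestamps.

```python
from typing import NamedTuple, Sequence


class Event(NamedTuple):
    """Hypothetical reified event connecting a source and a destination entity."""

    event_id: str
    src: str
    dst: str
    timestamp: float


def is_time_respecting(path: Sequence[Event]) -> bool:
    """True if consecutive events chain (dst == next src) and time never goes backwards."""
    for prev, nxt in zip(path, path[1:]):
        if prev.dst != nxt.src or nxt.timestamp < prev.timestamp:
            return False
    return True
```

Under these assumptions, a path whose second event precedes its first is rejected even if the entities connect, which is what rules out causally impossible evidence chains.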

LLM Inference

Remote OpenAI-compatible API:

export OPENAI_COMPAT_API_KEY='...'
cp configs/llm.example.yaml configs/llm.yaml
# edit configs/llm.yaml: provider=api, base_url, model, api_key_env

.venv/bin/python scripts/run_llm_inference.py \
  --config configs/llm.yaml \
  --prompt-file reports/theia_e3_idea/prompt.txt \
  --output-jsonl reports/llm_predictions.jsonl

Local OpenAI-compatible deployment:

cp configs/llm.example.yaml configs/llm.yaml
# edit configs/llm.yaml: provider=local, base_url, model

.venv/bin/python scripts/run_llm_inference.py \
  --config configs/llm.yaml \
  --prompt-file reports/theia_e3_idea/prompt.txt \
  --output-jsonl reports/local_llm_predictions.jsonl
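The first-token MALICIOUS/BENIGN parsing listed under Current Status could look roughly like the sketch below. The function names and the record layout are hypothetical, and stripping punctuation and ignoring case are assumptions rather than documented behavior of the inference client.

```python
import re
from typing import Optional

_LABELS = ("MALICIOUS", "BENIGN")


def parse_first_token(raw_response: str) -> Optional[str]:
    """Return the label if the first whitespace-delimited token is one, else None."""
    tokens = raw_response.strip().split(maxsplit=1)
    if not tokens:
        return None
    # Assumption: tolerate trailing punctuation and lowercase output.
    first = re.sub(r"[^A-Za-z]", "", tokens[0]).upper()
    return first if first in _LABELS else None


def to_record(prompt_id: str, raw_response: str) -> dict:
    """Keep the raw response next to the parsed label so parsing stays auditable."""
    return {
        "prompt_id": prompt_id,
        "label": parse_first_token(raw_response),
        "raw_response": raw_response,
    }
```

Returning `None` instead of guessing keeps unparseable responses visible in the output file, where they can be counted separately from real predictions.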

The LLM prompt must not include ground-truth reports, IOC narratives, or labels. Ground truth is used only for label mapping and evaluation.

Synthetic examples are debugging-only fixtures and are not experimental results.