Compare commits

5 Commits

Author SHA1 Message Date
BattleTag
444d58726a refactor: native tool calling + generic forced-retry + terminal exit
- llm_client: switch tool_call_loop from text-based <tool_call> regex
  to OpenAI-native tools=[...] / structured tool_calls field; accumulate
  delta.reasoning_content for DeepSeek thinking-mode echo-back; fold
  preserves system msg and aligns boundary to never orphan role:tool
- base_agent: generic forced-retry via mandatory_record_tools class attr
  (filesystem -> add_phenomenon, timeline -> add_temporal_edge,
  hypothesis -> add_hypothesis, report -> save_report); count via
  executor wrapper
- terminal_tools class attr + loop short-circuit: when a terminal tool
  is called, loop exits with its raw return as final_text. ReportAgent
  declares save_report as terminal - replaces the <answer>-tag stop
  signal that native tool calling broke
- _execute_*: return (raw, formatted) - terminal exit uses untruncated
  raw, conversation history uses 3000-char-capped formatted
- evidence_graph + orchestrator: LLM-derived InvestigationArea support
  (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS /
  _AREA_TOOLS); manual yaml block kept as optional seed
- strip <answer> references from agent prompts (no longer load-bearing)

Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures
(was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via
save_report (was max_iterations regression). 78/78 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:51:19 +08:00
BattleTag
0a2b344c84 fix: share _safe_json_loads with tool-call parser, not just orchestrator
Move _safe_json_loads from orchestrator.py to llm_client.py and have
_extract_tool_calls use it when parsing <tool_call> JSON blocks from
model output. orchestrator now imports it from llm_client.

Background: in the first full DeepSeek run (runs/2026-05-12T17-25-38),
~10 'Failed to parse tool call JSON' warnings appeared, all from regex
patterns where the LLM wrote \. or \* inside JSON string values:

  Failed to parse tool call JSON: {..., "pattern": "Outlook Express|...|\.dbx"}
  Failed to parse tool call JSON: {..., "pattern": "ethereal.*\.pcap"}
  Failed to parse tool call JSON: {..., "pattern": "lookatlan.*\.txt|..."}

These are exactly the kind of stray-backslash errors stage-1 sanitize
already handles for orchestrator JSON calls — but tool-call extraction
was using bare json.loads. Result: each failed tool call silently dropped
on the floor, the LLM never got a result, and at least one network agent
burned 14m26s spinning before hitting max_iterations=40.

Now the sanitize/log-on-failure path is shared. Verified against the
three failure cases from yesterday's log: all three now parse cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:29:21 +08:00
BattleTag
76df34ed79 docs: add TODO marker for adaptive edge weights
Note that the hard-coded HYPOTHESIS_EDGE_WEIGHTS table is a temporary
choice; an adaptive scheme should be explored once the full pipeline
is stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:14:23 +08:00
BattleTag
893f5b5de2 fix: address agent boundary / JSON robustness / Phase 4 no-op from CFReDS run
Issues found running the system end-to-end on the NIST CFReDS Hacking Case
disk image (SCHARDT.001, Mr. Evil). Four interconnected fixes:

1. HypothesisAgent boundary leak (two layers)
   B.1 Tool set: BaseAgent._register_graph_tools was registering
       add_phenomenon / add_lead / link_to_entity for every agent. With
       an empty graph in Phase 2, HypothesisAgent "compensated" by
       inventing phenomena, dispatching leads, and linking entities.
   B.2 Prompt leak: BaseAgent's shared system prompt hard-coded "Call
       investigation tools (list_directory, parse_registry_key, etc.)".
       HypothesisAgent hallucinated list_directory and wasted 2 LLM
       rounds on 'unknown tool' errors before backing off.

   Fix:
   - Split _register_graph_tools into _register_graph_read_tools +
     _register_graph_write_tools.
   - HypothesisAgent, ReportAgent, TimelineAgent override
     _register_graph_tools to skip write tools.
   - HypothesisAgent and TimelineAgent override _build_system_prompt
     with focused, role-specific workflows (no Phase A-D investigation
     boilerplate).

2. JSON parse failures in Phase 3 lead generation (5/6 hypotheses lost)
   DeepSeek emits JSON with stray backslashes (Windows path references)
   and occasional minor syntax slips. Old single-stage sanitize couldn't
   recover; per-hypothesis fallback silently swallowed each failure.

   Fix:
   - _safe_json_loads: progressive — stage 0 as-is, stage 1 escape stray
     \X (anything not in valid JSON escape set), log raw input on final
     failure for diagnosis.
   - New _call_llm_for_json helper: on parse failure, append the error
     to the prompt and re-call LLM (self-correcting retry, up to 2).
   - All 4 LLM-JSON callsites in orchestrator refactored to use it.

3. Phase 1 sometimes skipped add_phenomenon (LLM treated <answer> as deliverable)
   Strengthen BaseAgent's RECORDING REQUIREMENT — explicit "your <answer>
   is DISCARDED; only graph mutations propagate" plus a new rule:
   negative findings (searched X, found nothing) MUST also be recorded
   as phenomena, since they constrain the hypothesis space.

4. Phase 4 Timeline was a no-op
   TimelineAgent inherited BaseAgent's Phase A-D prompt and never called
   add_temporal_edge — produced 0 temporal edges. Override the prompt
   with concrete workflow (build_filesystem_timeline ->
   get_timestamped_phenomena -> 15-40 add_temporal_edge calls) and
   restrict tool set to read-only + its 3 temporal tools.

Verified end-to-end: HypothesisAgent now 8 tools (no writes), ReportAgent
13 (no graph writes), TimelineAgent 10 (read + temporal + timeline).
All 60 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:14:16 +08:00
BattleTag
0a966d8476 feat: switch LLM client to OpenAI SDK for DeepSeek compatibility
The previous LLMClient used raw httpx + Claude Messages API (/v1/messages,
x-api-key, Anthropic SSE event types). Incompatible with DeepSeek.

Rewrite LLMClient.__init__/chat/close to use openai.AsyncOpenAI:
- /v1/chat/completions endpoint, OpenAI message format
- Bearer auth, native SDK error types
- Stream chunks via async for + chunk.choices[0].delta.content

Tool calling protocol (ReAct text-based tags) and all surrounding helpers
(_apply_progressive_decay, _fold_old_messages, _partition_tool_calls,
tool_call_loop, etc.) are unchanged — endpoint-agnostic by design.

New optional config params surfaced to config.yaml.agent:
- reasoning_effort: "high" | "medium" | "low" — DeepSeek/o1-style depth
- thinking_enabled: bool — DeepSeek extra_body.thinking switch

main.py and regenerate_report.py pass these through to LLMClient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:13:54 +08:00
13 changed files with 1818 additions and 429 deletions

View File

@@ -36,6 +36,8 @@ main.py 入口:配置加载、镜像选择、断连
| `Entity` | `ent-*` | 人、程序、主机、IP 等可复现的实体 |
Phenomenon → Hypothesis 的边类型与权重写死在 `HYPOTHESIS_EDGE_WEIGHTS`
# TODO
当前流程跑通以后,寻找自适应方案
| 边类型 | 权重 | 语义 |
|---|---:|---|
@@ -65,14 +67,18 @@ Phenomenon → Hypothesis 的边类型与权重写死在 `HYPOTHESIS_EDGE_WEIGHT
| **Phase 4** | TimelineAgent 用 `build_filesystem_timeline` 生成 MAC 时间线,与 Phenomenon 时间戳关联 |
| **Phase 5** | ReportAgent 综合假设、证据、实体,生成 Markdown 报告 |
### Gap AnalysisPhase 3 末
### Investigation Areashypothesis-derived
`config.yaml:investigation_areas` 列出必须覆盖的调查领域系统信息、用户账户、网络配置、邮件配置、IRC 日志、PCAP、删除文件、Prefetch 等。Orchestrator 两层判定覆盖情况
Phase 2 末尾 orchestrator 调一次 LLM 从所有 active hypothesis 派生 5-12 个 **InvestigationArea**snake_case slug、description、suggested_agent、expected_keywords、expected_tools、priority、motivating_hypothesis_ids。Areas 存进 `graph.investigation_areas`,序列化到 `runs/<ts>/investigation_areas.json`。两个用途
1. **关键词匹配**`_AREA_KEYWORDS`)— 扫现有 Phenomenon 标题/描述
2. **工具命中**`_AREA_TOOLS`)— 检查是否调用过该领域的关键工具(如 `enumerate_users``parse_pcap_strings`
1. **Phase 3 主循环提示** — 每个 hypothesis 块附 `Expected areas: a, b, c`LLM 仍自由选 lead 但有软引导
2. **Phase 3 末尾 Gap Analysis** — 两层判定覆盖情况:
- **关键词匹配**:扫 Phenomenon 标题/描述对照 area.expected_keywords
- **工具命中**:检查 area.expected_tools 是否实际调用过
未覆盖的领域自动派 lead最多 3 轮补漏。
未覆盖的 area 自动派 lead`suggested_agent` + `priority` + `motivating_hypothesis_ids[0]` 透传给 `Lead.hypothesis_id` 保留 provenance,最多 3 轮补漏。
**手动 override**`config.yaml:investigation_areas` 默认注释掉,纯 LLM 派生。取消注释可添加强制必查的领域,会先于 LLM 写入并通过 slug-based dedupe 保护不被覆盖LLM 只会 augment keyword/tool 列表)。这是跨案件/跨平台适配的关键 —— 不再 hardcode Windows-specific 领域。
## Agent 体系
@@ -181,11 +187,12 @@ max_investigation_rounds: 5 # Phase 3 最大迭代轮数
# - title: "嫌疑人主动实施网络嗅探"
# description: "..."
investigation_areas: # Gap Analysis 必须覆盖的领域
- area: system_info
agent: registry
task: "..."
# ...
# investigation_areas: # 可选:手动 override默认全 LLM 派生)
# - area: shutdown_time # LLM 通过 slug dedupe 只 augment
# agent: registry # keyword/tool 列表,不覆盖 manual
# priority: 3
# keywords: [shutdown]
# tools: [get_shutdown_time]
```
未配置 `hypotheses` 时由 HypothesisAgent 自动生成。

View File

@@ -1,7 +1,9 @@
"""Hypothesis Agent — generates investigative hypotheses from phenomena.
Generates hypotheses only. Phenomenon→Hypothesis linking is handled centrally
by Orchestrator._judge_new_phenomena, so all link logic lives in one place.
by Orchestrator._judge_new_phenomena. Tool set is restricted to read-only
graph queries + add_hypothesis to prevent the agent from creating phenomena,
leads, or entity links.
"""
from __future__ import annotations
@@ -22,11 +24,16 @@ class HypothesisAgent(BaseAgent):
"and formulate investigative hypotheses about what happened on this system. "
"Your ultimate goal: build the most complete picture of events that occurred."
)
mandatory_record_tools = ("add_hypothesis",)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_hypothesis_tools()
def _register_graph_tools(self) -> None:
"""Restrict to read-only graph tools. add_hypothesis is registered separately."""
self._register_graph_read_tools()
def _register_hypothesis_tools(self) -> None:
self.register_tool(
name="add_hypothesis",
@@ -51,6 +58,32 @@ class HypothesisAgent(BaseAgent):
executor=self._add_hypothesis,
)
def _build_system_prompt(self, task: str) -> str:
"""Focused prompt — no INVESTIGATE/RECORD/LINK workflow."""
return (
f"You are {self.name}, a forensic hypothesis analyst.\n"
f"Role: {self.role}\n\n"
f"Image: {self.graph.image_path}\n"
f"Current investigation state: {self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call list_phenomena and search_graph to review existing findings.\n"
f"2. For each hypothesis you want to record, call add_hypothesis (title + description).\n"
f"3. STOP after you have generated 3-7 hypotheses. Do not call any more tools.\n\n"
f"STRICT BOUNDARIES:\n"
f"- Your only mutation tool is add_hypothesis. Do NOT attempt list_directory, "
f"parse_registry_key, extract_file, or any disk-image investigation tools — "
f"they are not yours and you will get 'unknown tool' errors.\n"
f"- You CANNOT create phenomena, leads, or entity links. The orchestrator handles "
f"all phenomenon↔hypothesis linking after you finish.\n"
f"- Each hypothesis must be specific and testable. Avoid generic templates like "
f"'Unauthorized Remote Access' or 'Malware Deployment' unless concrete phenomena "
f"in the graph already point to them.\n"
f"- If the graph is empty, generate broad starting hypotheses and mark them "
f"clearly as exploratory in their description so downstream agents know they "
f"still need evidence."
)
async def _add_hypothesis(self, title: str, description: str) -> str:
hid = await self.graph.add_hypothesis(
title=title,

View File

@@ -2,9 +2,6 @@
from __future__ import annotations
import json
import os
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
@@ -15,34 +12,46 @@ class ReportAgent(BaseAgent):
role = (
"Forensic report writer. You synthesize all findings from the investigation "
"into a structured, professional forensic analysis report organized by hypotheses.\n\n"
"IMPORTANT: Only include findings that have a source_tool attribution (marked VERIFIED). "
"Only include findings that have a source_tool attribution (marked VERIFIED). "
"If evidence lacks source attribution, mark it as UNVERIFIED. "
"Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence.\n\n"
"CRITICAL: You MUST call save_report to write the final report."
"Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence."
)
# Calling save_report is BOTH the recording action and the completion
# signal. tool_call_loop returns the moment save_report executes; the
# tool's return value becomes the agent's final_text. The forced-retry
# mechanism fires if save_report is never called.
mandatory_record_tools = ("save_report",)
terminal_tools = ("save_report",)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_graph_tools(self) -> None:
"""Restrict to read-only graph tools. Report agent does not mutate state."""
self._register_graph_read_tools()
def _build_system_prompt(self, task: str) -> str:
"""Report agent gets a clean prompt — no Phase A/B/C/D workflow."""
return (
f"You are a forensic report writer.\n"
f"Role: {self.role}\n\n"
f"Investigation state:\n{self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call get_hypotheses_with_evidence to get all hypotheses and their linked evidence\n"
f"2. Call get_all_phenomena to get detailed findings by category\n"
f"3. Call get_entities to get people, programs, and hosts\n"
f"4. Call get_case_info for case metadata\n"
f"5. Write the complete report directly in your <answer> block\n\n"
f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities, get_case_info "
f" to gather all the data needed for the report. Make these calls in parallel.\n"
f"2. Assemble the complete markdown forensic report.\n"
f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
f" This single call is the completion signal — the run ENDS the moment it executes.\n"
f" Do NOT call any read tools after this point; they will not run.\n"
f" Do NOT write the report as free text outside of save_report; only the\n"
f" `content` argument of save_report is persisted.\n\n"
f"RULES:\n"
f"- Write the report DIRECTLY in <answer> — do NOT use save_report tool\n"
f"- Only include findings present in the evidence graph\n"
f"- Do NOT invent timestamps, file paths, or data not in the phenomena\n"
f"- The report must be complete — do not cut off mid-section\n"
f"- The report must be the complete markdown — do not cut off mid-section.\n"
f"- Only include findings present in the evidence graph.\n"
f"- Do NOT invent timestamps, file paths, or data not in the phenomena.\n"
f"- The `content` argument can be 10K+ chars. JSON-escape inner quotes (\\\") and\n"
f" backslashes (\\\\) and newlines (\\n) correctly.\n"
)
def _register_tools(self) -> None:
@@ -182,10 +191,16 @@ class ReportAgent(BaseAgent):
return "\n".join(lines)
async def _save_report(self, content: str, output_path: str) -> str:
try:
os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
with open(output_path, "w") as f:
f.write(content)
return f"Report saved to {output_path} ({len(content)} chars)"
except Exception as e:
return f"Error saving report: {e}"
"""Save the report and return the content itself.
The content is returned (rather than a "saved to ..." status string)
so that when tool_call_loop short-circuits on this terminal tool,
`final_text` is the full markdown — orchestrator writes it to the
canonical report.md path under runs/<ts>/.
The output_path argument is kept for backward compat but the model's
chosen path is ignored — the orchestrator owns the persistence path.
"""
if not content:
return ""
return content

View File

@@ -1,14 +1,21 @@
"""Timeline Agent — correlates evidence across time."""
"""Timeline Agent — connects existing phenomena with temporal edges.
Operates on phenomena already in the graph. Does NOT investigate the disk
image itself. The agent's only useful output is the temporal edges it
creates between phenomena.
"""
from __future__ import annotations
import json
import logging
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
logger = logging.getLogger(__name__)
class TimelineAgent(BaseAgent):
name = "timeline"
@@ -17,29 +24,39 @@ class TimelineAgent(BaseAgent):
"MAC timestamps and correlate events across all phenomena categories in the "
"evidence graph to reconstruct the sequence of activities on the system."
)
mandatory_record_tools = ("add_temporal_edge",)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_graph_tools(self) -> None:
"""Restrict to read-only graph tools — Timeline does not add phenomena."""
self._register_graph_read_tools()
def _register_tools(self) -> None:
# Filesystem timeline tool from catalog
td = TOOL_CATALOG.get("build_filesystem_timeline")
if td:
self.register_tool(td.name, td.description, td.input_schema, td.executor)
# Custom tool to get all phenomena with timestamps for correlation
self.register_tool(
name="get_timestamped_phenomena",
description="Get all phenomena that have timestamps, sorted chronologically. Use for timeline correlation.",
description=(
"Get all phenomena that have timestamps, sorted chronologically. "
"Returns each phenomenon's id, category, title, and a short description "
"preview. Use this as your primary input for temporal correlation."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_timestamped_phenomena,
)
# Tool to add temporal edges between phenomena
self.register_tool(
name="add_temporal_edge",
description="Add a temporal relationship between two phenomena (before, after, or concurrent).",
description=(
"Add a temporal relationship edge between two existing phenomena. "
"Use 'before' when source phenomenon happened before target, "
"'concurrent' when they occurred within seconds of each other."
),
input_schema={
"type": "object",
"properties": {
@@ -56,6 +73,42 @@ class TimelineAgent(BaseAgent):
executor=self._add_temporal_edge,
)
def _build_system_prompt(self, task: str) -> str:
"""Focused prompt — Timeline connects existing phenomena, doesn't investigate."""
return (
f"You are {self.name}, a forensic timeline correlation analyst.\n"
f"Role: {self.role}\n\n"
f"Image: {self.graph.image_path}\n"
f"Current state: {self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call build_filesystem_timeline once to materialize MAC times for the disk.\n"
f"2. Call get_timestamped_phenomena to see all phenomena with timestamps, "
f"sorted chronologically. THIS IS YOUR PRIMARY INPUT.\n"
f"3. For each meaningful temporal relationship between phenomena, call "
f"add_temporal_edge(source_id, target_id, relation). Use 'before' when "
f"source happened first (the common case); 'concurrent' for events within "
f"a few seconds of each other.\n"
f" Examples of meaningful connections:\n"
f" - 'Cain installer executed' (before) 'Cain.exe first execution'\n"
f" - 'WHOIS first lookup' (before) 'WHOIS second lookup'\n"
f" - 'Recon tool cluster' (before) 'Anti-forensics defrag'\n"
f" - 'Tool installation' (before) 'Tool execution'\n"
f"4. Aim for 15-40 temporal edges that connect the major events into a "
f"forensic story.\n"
f"5. STOP after recording all meaningful temporal edges. Do not call any more tools.\n\n"
f"STRICT BOUNDARIES:\n"
f"- Your job is to CONNECT existing phenomena, NOT to discover new ones. "
f"You CANNOT call add_phenomenon — the tool isn't yours.\n"
f"- Use ONLY phenomenon IDs returned by get_timestamped_phenomena or "
f"list_phenomena. NEVER fabricate IDs.\n"
f"- Connect events that tell a forensic story (recon -> exploit -> cover-up). "
f"Do not exhaustively pair every two phenomena; focus on causally-relevant "
f"sequences.\n"
f"- The orchestrator handles report writing in the next phase. Your only "
f"output that propagates is the temporal edges you create."
)
async def _get_timestamped_phenomena(self) -> str:
items = [
ph for ph in self.graph.phenomena.values()

View File

@@ -31,11 +31,26 @@ class BaseAgent:
name: str = "base"
role: str = "A forensic analysis agent."
# Tools the agent MUST invoke at least once for the run to count as productive.
# If none of these were called when tool_call_loop returns, run() fires a
# forced retry with an explicit "you forgot to record" instruction.
# Subclasses override to declare their own recording responsibility
# (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",)
# Tools whose invocation ends the run immediately. After any terminal tool
# is called, tool_call_loop returns with that tool's result text as
# final_text. Used by agents whose "completion" is a single explicit
# action rather than "model decides to stop calling tools". For multi-call
# agents (filesystem records many phenomena) leave empty.
terminal_tools: tuple[str, ...] = ()
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
self.llm = llm
self.graph = graph
self._tools: dict[str, dict] = {} # name -> schema
self._executors: dict[str, Any] = {} # name -> async callable
self._record_call_counts: dict[str, int] = {}
self._work_log: list[str] = []
self._current_lead_id: str | None = None
@@ -52,7 +67,18 @@ class BaseAgent:
"description": description,
"input_schema": input_schema,
}
self._executors[name] = executor
if name in self.mandatory_record_tools:
self._executors[name] = self._wrap_record_executor(name, executor)
else:
self._executors[name] = executor
def _wrap_record_executor(self, name: str, executor: Any) -> Any:
"""Wrap a mandatory-record executor to count successful invocations."""
async def wrapped(*args, **kwargs):
result = await executor(*args, **kwargs)
self._record_call_counts[name] = self._record_call_counts.get(name, 0) + 1
return result
return wrapped
def get_tool_definitions(self) -> list[dict]:
"""Get tool definitions in Claude API format."""
@@ -91,14 +117,21 @@ class BaseAgent:
f" FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
f" Then call link_to_entity for each relevant phenomenon.\n"
f" NEVER guess or fabricate a phenomenon ID. If an ID is not in list_phenomena output, it does not exist.\n\n"
f"Phase D — ANSWER:\n"
f" Only give your <answer> AFTER completing Phases B and C.\n\n"
f"IMPORTANT:\n"
f"- You MUST call add_phenomenon at least once before finishing\n"
f"- Complete each phase before starting the next\n"
f"- Other agents can ONLY see what you write to the graph\n"
f"- If you don't record findings, they are LOST\n"
f"- Include relevant file paths, inode numbers, timestamps, and raw data\n\n"
f"Phase D — STOP:\n"
f" Once all phenomena are recorded and entities linked, you are DONE.\n"
f" Do not call any more tools. The orchestrator picks up automatically.\n\n"
f"CRITICAL — RECORDING REQUIREMENT:\n"
f"- Only graph mutations propagate to other agents and the final report.\n"
f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
f"a registry key) and found NOTHING, that absence IS evidence — call "
f"add_phenomenon with a 'No matches for X' title and the search scope in "
f"raw_data. Negative findings constrain the hypothesis space and prevent "
f"the next agent from wasting time re-searching.\n"
f"- If you stop without having called add_phenomenon at least once, the task "
f"is FAILED and a forced retry will fire.\n"
f"- Include exact file paths, inode numbers, timestamps, and the source_tool "
f"that produced each finding.\n\n"
f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
f"- ONLY record findings that appear VERBATIM in tool results you received\n"
f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
@@ -116,6 +149,7 @@ class BaseAgent:
self._current_lead_id = lead_id
self._register_graph_tools()
self._record_call_counts.clear()
system = self._build_system_prompt(task)
messages = [{"role": "user", "content": task}]
@@ -124,12 +158,60 @@ class BaseAgent:
ph_before = len(self.graph.phenomena)
try:
final_text, _ = await self.llm.tool_call_loop(
final_text, conversation = await self.llm.tool_call_loop(
messages=messages,
tools=self.get_tool_definitions(),
tool_executor=self._executors,
system=system,
terminal_tools=self.terminal_tools,
)
# Forced-record retry: if the agent has any mandatory recording
# tools but never invoked any of them, force one more round with
# an explicit "you forgot to record" instruction. The mandatory
# set is declared on the class — Timeline → add_temporal_edge,
# Hypothesis → add_hypothesis, ReportAgent → (). For agents with
# empty mandatory_record_tools this branch is a no-op.
registered_mandatory = [
t for t in self.mandatory_record_tools if t in self._executors
]
recorded_any = any(
self._record_call_counts.get(t, 0) > 0
for t in registered_mandatory
)
if registered_mandatory and not recorded_any:
missing = "/".join(registered_mandatory)
logger.warning(
"[%s] finished without calling any of [%s] — forcing RECORD retry",
self.name, missing,
)
conversation.append({
"role": "user",
"content": (
f"STOP. You produced an answer without ever calling "
f"{missing}. Your answer is DISCARDED — only graph "
f"mutations propagate to other agents and the final "
f"report.\n\n"
f"You MUST now call {missing} for every significant "
f"finding from your prior investigation, including "
f"exact identifiers, timestamps, and the source_tool "
f"that produced each finding. If you genuinely found "
f"NOTHING noteworthy, call the recording tool ONCE "
f"with a 'No significant findings' style entry "
f"summarizing what you searched.\n\n"
f"Do not run more investigation tools. Just record "
f"what you already found. Then end."
),
})
final_text, _ = await self.llm.tool_call_loop(
messages=conversation,
tools=self.get_tool_definitions(),
tool_executor=self._executors,
system=system,
max_iterations=10,
terminal_tools=self.terminal_tools,
)
self._work_log.append(f"[Task: {task[:80]}] -> {final_text[:150]}")
except Exception:
self.graph.agent_status[self.name] = "failed"
@@ -145,9 +227,17 @@ class BaseAgent:
# ---- Graph interaction tools --------------------------------------------
def _register_graph_tools(self) -> None:
"""Register tools for querying and writing to the evidence graph."""
"""Register graph query + mutation tools.
# --- Read tools ---
Subclasses can override to restrict the toolset. For example, a
read-only agent (hypothesis, report) overrides this to skip
_register_graph_write_tools.
"""
self._register_graph_read_tools()
self._register_graph_write_tools()
def _register_graph_read_tools(self) -> None:
"""Register read-only graph + asset query tools."""
self.register_tool(
name="list_phenomena",
@@ -213,7 +303,49 @@ class BaseAgent:
executor=self._get_hypothesis_status,
)
# --- Write tools ---
self.register_tool(
name="list_assets",
description=(
"List all files extracted from the disk image. "
"Shows filename, category, size, local path, and inode. "
"Check this before calling extract_file to avoid re-extraction."
),
input_schema={
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": [
"registry_hive", "chat_log", "prefetch", "network_capture",
"config_file", "address_book", "recycle_bin", "executable",
"text_log", "other",
],
"description": "Filter by category. Omit to list all.",
},
},
},
executor=self._list_assets,
)
self.register_tool(
name="find_extracted_file",
description=(
"Find an already-extracted file by inode or filename. "
"Returns the local path so you can use it directly with "
"parse_registry_key, read_text_file, etc. without re-extracting."
),
input_schema={
"type": "object",
"properties": {
"inode": {"type": "string", "description": "Inode to look up."},
"filename": {"type": "string", "description": "Filename or partial name to search."},
},
},
executor=self._find_extracted_file,
)
def _register_graph_write_tools(self) -> None:
"""Register graph mutation tools (add_phenomenon, add_lead, link_to_entity)."""
self.register_tool(
name="add_phenomenon",
@@ -282,49 +414,6 @@ class BaseAgent:
executor=self._link_to_entity,
)
# --- Asset library tools ---
self.register_tool(
name="list_assets",
description=(
"List all files extracted from the disk image. "
"Shows filename, category, size, local path, and inode. "
"Check this before calling extract_file to avoid re-extraction."
),
input_schema={
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": [
"registry_hive", "chat_log", "prefetch", "network_capture",
"config_file", "address_book", "recycle_bin", "executable",
"text_log", "other",
],
"description": "Filter by category. Omit to list all.",
},
},
},
executor=self._list_assets,
)
self.register_tool(
name="find_extracted_file",
description=(
"Find an already-extracted file by inode or filename. "
"Returns the local path so you can use it directly with "
"parse_registry_key, read_text_file, etc. without re-extracting."
),
input_schema={
"type": "object",
"properties": {
"inode": {"type": "string", "description": "Inode to look up."},
"filename": {"type": "string", "description": "Filename or partial name to search."},
},
},
executor=self._find_extracted_file,
)
# ---- Tool executors -----------------------------------------------------
async def _list_phenomena(self, category: str | None = None) -> str:

View File

@@ -197,6 +197,41 @@ class Lead:
return cls(**d)
@dataclass
class InvestigationArea:
"""An area to investigate to confirm/refute one or more hypotheses.
Derived by the orchestrator from active hypotheses after Phase 2; also
seeded from config.yaml:investigation_areas as an optional manual
override. Each area carries its own keywords + expected tools so the
gap-analysis coverage check is generic, not tied to hard-coded constants.
"""
id: str # "area-{slug}"
area: str # snake_case slug (dedupe key)
description: str
suggested_agent: str # filesystem / registry / communication / network / timeline
expected_keywords: list[str] = field(default_factory=list)
expected_tools: list[str] = field(default_factory=list)
priority: int = 5 # 1 (highest) - 10 (lowest)
motivating_hypothesis_ids: list[str] = field(default_factory=list)
created_by: str = "" # "manual" | "llm_derive" | "fallback"
created_at: str = ""
def to_dict(self) -> dict:
return asdict(self)
@classmethod
def from_dict(cls, d: dict) -> InvestigationArea:
return cls(**d)
def summary(self) -> str:
return (
f"[{self.area}] P{self.priority} agent={self.suggested_agent} "
f"(motivating: {len(self.motivating_hypothesis_ids)})"
)
@dataclass
class ExtractedAsset:
"""A file extracted from the disk image and tracked in the asset library."""
@@ -270,6 +305,11 @@ class EvidenceGraph:
self.asset_library: dict[str, ExtractedAsset] = {}
self._inode_index: dict[str, str] = {} # inode → asset_id
# Investigation areas — derived from hypotheses (LLM) and/or seeded
# from config.yaml:investigation_areas (manual override). Drives the
# gap-analysis coverage check.
self.investigation_areas: dict[str, InvestigationArea] = {}
# Set by BaseAgent.run() before each agent execution
self._current_agent: str = ""
@@ -295,6 +335,9 @@ class EvidenceGraph:
"leads": [l.to_dict() for l in self.leads],
"agent_status": dict(self.agent_status),
"asset_library": {aid: a.to_dict() for aid, a in self.asset_library.items()},
"investigation_areas": {
aid: a.to_dict() for aid, a in self.investigation_areas.items()
},
"saved_at": datetime.now().isoformat(),
}
tmp = self._persist_path.with_suffix(".tmp")
@@ -345,6 +388,10 @@ class EvidenceGraph:
asset = ExtractedAsset.from_dict(a_data)
graph.asset_library[aid] = asset
graph._inode_index[asset.inode] = aid
graph.investigation_areas = {
aid: InvestigationArea.from_dict(a)
for aid, a in data.get("investigation_areas", {}).items()
}
graph._rebuild_adjacency()
logger.info(
"EvidenceGraph restored: %d phenomena, %d hypotheses, %d entities, "
@@ -656,6 +703,57 @@ class EvidenceGraph:
break
self._auto_save()
# ---- Investigation areas -------------------------------------------------
async def add_investigation_area(
self,
area: str,
description: str,
suggested_agent: str,
expected_keywords: list[str] | None = None,
expected_tools: list[str] | None = None,
priority: int = 5,
motivating_hypothesis_ids: list[str] | None = None,
created_by: str = "",
) -> tuple[str, bool]:
"""Add or merge an investigation area. Dedupe key is the `area` slug.
On collision, union the three list fields (keywords / tools /
motivating_hypothesis_ids); description / suggested_agent / priority
are preserved from the first writer (manual seed wins over LLM derive).
Returns (id, was_existing).
"""
async with self._lock:
for existing in self.investigation_areas.values():
if existing.area == area:
for kw in (expected_keywords or []):
if kw not in existing.expected_keywords:
existing.expected_keywords.append(kw)
for t in (expected_tools or []):
if t not in existing.expected_tools:
existing.expected_tools.append(t)
for hid in (motivating_hypothesis_ids or []):
if hid not in existing.motivating_hypothesis_ids:
existing.motivating_hypothesis_ids.append(hid)
self._auto_save()
return existing.id, True
aid = f"area-{area}"
self.investigation_areas[aid] = InvestigationArea(
id=aid,
area=area,
description=description,
suggested_agent=suggested_agent,
expected_keywords=list(expected_keywords or []),
expected_tools=list(expected_tools or []),
priority=priority,
motivating_hypothesis_ids=list(motivating_hypothesis_ids or []),
created_by=created_by,
created_at=datetime.now().isoformat(),
)
self._auto_save()
return aid, False
# ---- Asset library -------------------------------------------------------
async def register_asset(

View File

@@ -1,8 +1,10 @@
"""Custom LLM client using httpx for Claude Messages API via third-party proxy.
"""LLM client via the OpenAI SDK (works with DeepSeek's OpenAI-compatible API).
The proxy does not support Claude's native tool_use format (it strips the `tools`
field from requests). So we embed tool definitions in the system prompt and parse
structured JSON tool calls from the model's text output (ReAct-style).
Tool calling uses the OpenAI-native `tools=[...]` parameter. The model
returns structured tool_calls via the streaming protocol; we accumulate
them, dispatch to our executors, and feed results back as `role: "tool"`
messages. This eliminates the fragile "model writes JSON inside free
text" problem of the previous ReAct text mode.
"""
from __future__ import annotations
@@ -18,6 +20,7 @@ from dataclasses import dataclass, field
from typing import Any
import httpx
from openai import APIConnectionError, APIError, APITimeoutError, AsyncOpenAI
logger = logging.getLogger(__name__)
@@ -30,69 +33,81 @@ class LLMAPIError(Exception):
self.attempts = attempts
# Markers the model uses to signal tool calls and final answers
TOOL_CALL_TAG = "<tool_call>"
TOOL_CALL_END = "</tool_call>"
TOOL_RESULT_TAG = "<tool_result>"
TOOL_RESULT_END = "</tool_result>"
# Optional answer tags — kept for backward compat with prompts that wrap
# their final response in <answer>...</answer>. Native tool calling does
# not need these (no tool_calls = final), but if the model continues to
# emit them, we strip the tags so callers see clean text.
ANSWER_TAG = "<answer>"
ANSWER_END = "</answer>"
def _build_tools_prompt(tools: list[dict]) -> str:
"""Format tool definitions for inclusion in the system prompt."""
lines = ["You have access to the following tools:\n"]
for t in tools:
schema = t.get("input_schema", {})
props = schema.get("properties", {})
required = schema.get("required", [])
def _to_openai_tools(tools: list[dict]) -> list[dict]:
"""Convert internal tool definitions to OpenAI native function-tools format."""
return [
{
"type": "function",
"function": {
"name": t["name"],
"description": t["description"],
"parameters": t.get("input_schema", {"type": "object", "properties": {}}),
},
}
for t in tools
]
params = []
for pname, pdef in props.items():
req = " (required)" if pname in required else ""
desc = pdef.get("description", "")
ptype = pdef.get("type", "string")
enum_vals = pdef.get("enum")
if enum_vals:
allowed = ", ".join(f'"{v}"' for v in enum_vals)
params.append(f" - {pname}: {ptype}{req}{desc} Allowed values: [{allowed}]")
else:
params.append(f" - {pname}: {ptype}{req}{desc}")
param_block = "\n".join(params) if params else " (no parameters)"
lines.append(f"## {t['name']}\n{t['description']}\nParameters:\n{param_block}\n")
def _extract_first_balanced(text: str, open_char: str, close_char: str) -> str | None:
"""Return the first balanced [...] or {...} substring, or None if no balanced pair.
lines.append(
"## How to use tools\n"
"To call a tool, output a JSON block wrapped in XML tags like this:\n"
f"{TOOL_CALL_TAG}\n"
'{"name": "tool_name", "arguments": {"param1": "value1"}}\n'
f"{TOOL_CALL_END}\n\n"
"You can call multiple tools in sequence. After each tool call, you will receive the result in:\n"
f"{TOOL_RESULT_TAG}\n...result...\n{TOOL_RESULT_END}\n\n"
"When you have finished your analysis and have a final answer, wrap it in:\n"
f"{ANSWER_TAG}\nyour final answer here\n{ANSWER_END}\n\n"
"Think step by step. Call tools to gather evidence before drawing conclusions.\n"
"You MUST call at least one tool before giving your final answer."
Stack-based — handles nested brackets correctly (regex with .*? would
truncate at the first inner closing bracket, regex with .* would over-eat
trailing text). Brackets inside JSON string literals are ignored by
callers because the caller passes the result through json.loads which
re-parses with proper string handling.
"""
start = text.find(open_char)
if start < 0:
return None
depth = 0
for i in range(start, len(text)):
c = text[i]
if c == open_char:
depth += 1
elif c == close_char:
depth -= 1
if depth == 0:
return text[start:i + 1]
return None
def _safe_json_loads(text: str):
"""Parse JSON with progressive sanitization for LLM-produced output.
Tries (0) as-is, (1) escape stray backslashes outside valid JSON escapes
(\\" \\\\ \\/ \\b \\f \\n \\r \\t \\uXXXX). On final failure, logs raw
input (first 600 chars) so we can diagnose what the model emitted.
Used by orchestrator JSON callsites (_call_llm_for_json) and by
tool_call_loop when parsing tool_call arguments returned by the API.
"""
try:
return json.loads(text)
except json.JSONDecodeError:
pass
stage1 = re.sub(
r'\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})',
r'\\\\',
text,
)
return "\n".join(lines)
def _extract_tool_calls(text: str) -> list[dict]:
"""Extract tool call JSON blocks from model output."""
pattern = re.compile(
re.escape(TOOL_CALL_TAG) + r"\s*(.*?)\s*" + re.escape(TOOL_CALL_END),
re.DOTALL,
)
calls = []
for match in pattern.finditer(text):
raw = match.group(1).strip()
try:
parsed = json.loads(raw)
calls.append(parsed)
except json.JSONDecodeError:
logger.warning("Failed to parse tool call JSON: %s", raw[:200])
return calls
try:
return json.loads(stage1)
except json.JSONDecodeError as e:
logger.warning(
"_safe_json_loads failed after sanitize (%s); raw head[:600]=%r",
e, text[:600],
)
raise
def _extract_answer(text: str) -> str | None:
@@ -234,50 +249,41 @@ _DECAY_TIERS: list[tuple[int, int]] = [
def _apply_progressive_decay(messages: list[dict]) -> list[dict]:
"""Truncate tool results in older messages to save context space.
"""Truncate the `content` of older `role: "tool"` messages to save context.
Operates in-place-style on a copy. Only touches user messages that
contain <tool_result> blocks (these are the tool-result messages
generated by tool_call_loop).
Each `role: "tool"` message in the conversation corresponds to one tool
call's result. We rank these messages by recency and progressively
truncate older ones according to `_DECAY_TIERS`.
"""
# Count rounds from the end. A "round" is a (assistant, user) pair.
# messages alternate: [user, assistant, user, assistant, user, ...]
# The initial user message is index 0, then pairs start at index 1.
total = len(messages)
if total <= 10: # not enough messages to bother
if total <= 10:
return messages
result = []
# Count tool-result user messages from the end
tool_result_indices = [
i for i, m in enumerate(messages)
if m["role"] == "user" and TOOL_RESULT_TAG in m.get("content", "")
tool_msg_indices = [
i for i, m in enumerate(messages) if m.get("role") == "tool"
]
# Build a set of indices that need decay, mapped to their max_chars
decay_map: dict[int, int] = {}
n_tool_msgs = len(tool_result_indices)
for rank, idx in enumerate(reversed(tool_result_indices)):
rounds_ago = rank # 0 = most recent, 1 = second most recent, ...
for rank, idx in enumerate(reversed(tool_msg_indices)):
rounds_ago = rank
for threshold, max_chars in _DECAY_TIERS:
if rounds_ago < threshold:
decay_map[idx] = max_chars
break
result = []
for i, msg in enumerate(messages):
if i in decay_map:
max_chars = decay_map[i]
content = msg["content"]
content = msg.get("content", "") or ""
if len(content) > max_chars + 200:
# Truncate but preserve the tool_result tags structure
truncated = content[:max_chars]
# Count how many tool results are in this message
n_results = content.count(TOOL_RESULT_TAG)
truncated += (
f"\n... [context compressed: {len(content)} -> {max_chars} chars, "
f"{n_results} tool result(s)]"
truncated = (
content[:max_chars]
+ f"\n... [context compressed: {len(content)} -> {max_chars} chars]"
)
result.append({"role": msg["role"], "content": truncated})
new_msg = dict(msg)
new_msg["content"] = truncated
result.append(new_msg)
else:
result.append(msg)
else:
@@ -301,44 +307,51 @@ _FOLD_SUMMARY_SYSTEM = (
class LLMClient:
"""Calls Claude Messages API through a third-party proxy using raw httpx.
"""Async LLM client via the OpenAI SDK.
Uses prompt-based tool calling (ReAct pattern) since the proxy does not
support Claude's native tool_use format.
Works with any OpenAI-compatible endpoint (OpenAI, DeepSeek, ...).
Tool calling is text-based (ReAct) — see module docstring.
"""
def __init__(
self,
base_url: str,
api_key: str,
model: str = "claude-sonnet-4-6",
model: str = "deepseek-v4-pro",
max_tokens: int = 4096,
proxy: str | None = "auto",
reasoning_effort: str | None = None,
thinking_enabled: bool = False,
) -> None:
self.base_url = base_url.rstrip("/")
self.api_key = api_key
self.model = model
self.max_tokens = max_tokens
# proxy="auto": read from env; proxy=None/""/"none": no proxy; proxy="http://...": use it
self.reasoning_effort = reasoning_effort
self.thinking_enabled = thinking_enabled
# proxy="auto": read from env; proxy=None/""/"none": no proxy
if proxy == "auto":
proxy_url = os.environ.get("https_proxy") or os.environ.get("HTTPS_PROXY")
elif proxy and proxy.lower() != "none":
proxy_url = proxy
else:
proxy_url = None
self._client = httpx.AsyncClient(
http_client = (
httpx.AsyncClient(proxy=proxy_url, timeout=300.0)
if proxy_url else None
)
self._client = AsyncOpenAI(
api_key=self.api_key,
base_url=self.base_url,
headers={
"x-api-key": self.api_key,
"anthropic-version": "2023-06-01",
"content-type": "application/json",
},
timeout=300.0,
proxy=proxy_url,
http_client=http_client,
)
async def close(self) -> None:
await self._client.aclose()
await self._client.close()
async def chat(
self,
@@ -346,81 +359,144 @@ class LLMClient:
system: str | None = None,
max_retries: int = 5,
) -> str:
"""Send a streaming chat request and return the assembled text response.
"""Send a streaming chat completion and return the assembled text."""
full_messages: list[dict] = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
Uses SSE streaming to keep the connection alive and avoid gateway
timeouts (504/524) on long-running completions.
"""
import asyncio as _asyncio
payload: dict[str, Any] = {
kwargs: dict[str, Any] = {
"model": self.model,
"messages": full_messages,
"max_tokens": self.max_tokens,
"messages": messages,
"stream": True,
}
if system:
payload["system"] = system
if self.reasoning_effort:
kwargs["reasoning_effort"] = self.reasoning_effort
if self.thinking_enabled:
kwargs["extra_body"] = {"thinking": {"type": "enabled"}}
for attempt in range(max_retries):
logger.debug("LLM request (stream): %d messages (attempt %d)", len(messages), attempt + 1)
logger.debug(
"LLM request (stream): %d messages (attempt %d)",
len(messages), attempt + 1,
)
text_parts: list[str] = []
try:
async with self._client.stream(
"POST", "/v1/messages", json=payload,
) as resp:
# Check for HTTP errors before consuming stream
if resp.status_code >= 400:
body = await resp.aread()
raise httpx.HTTPStatusError(
f"Server error '{resp.status_code}' for url '{resp.url}'",
request=resp.request,
response=resp,
)
# Parse SSE events
async for line in resp.aiter_lines():
if not line.startswith("data: "):
continue
data_str = line[6:] # strip "data: " prefix
if data_str.strip() == "[DONE]":
break
try:
event = json.loads(data_str)
except json.JSONDecodeError:
continue
event_type = event.get("type", "")
if event_type == "content_block_delta":
delta = event.get("delta", {})
if delta.get("type") == "text_delta":
text_parts.append(delta["text"])
elif event_type == "message_stop":
break
elif event_type == "error":
err_msg = event.get("error", {}).get("message", "Unknown streaming error")
raise httpx.HTTPStatusError(
err_msg, request=resp.request, response=resp,
)
stream = await self._client.chat.completions.create(**kwargs)
async for chunk in stream:
if not chunk.choices:
continue
delta = chunk.choices[0].delta
if delta.content:
text_parts.append(delta.content)
text = "".join(text_parts)
logger.debug("LLM response (stream): %d chars", len(text))
return text
except (httpx.HTTPStatusError, httpx.ConnectError, httpx.ReadTimeout, httpx.RemoteProtocolError) as e:
except (APIConnectionError, APITimeoutError, APIError) as e:
if attempt < max_retries - 1:
wait = 2 ** attempt * 10
logger.warning("Request failed (%s), retrying in %ds...", e, wait)
await _asyncio.sleep(wait)
await asyncio.sleep(wait)
else:
raise LLMAPIError(
f"LLM API unreachable after {max_retries} attempts: {e}",
attempts=max_retries,
) from e
# Should not reach here, but just in case
return ""
async def _chat_with_tools(
self,
messages: list[dict],
openai_tools: list[dict],
max_retries: int = 5,
) -> tuple[str, str | None, list[dict]]:
"""Stream a chat completion with native tool calling enabled.
Returns:
(text_content, reasoning_content, raw_tool_calls).
- reasoning_content is non-None when DeepSeek thinking mode is
active; the caller MUST echo it back in the assistant message
on subsequent requests, or the API returns HTTP 400.
- raw_tool_calls is a list of {"id","name","arguments"} dicts;
arguments is the raw JSON string returned by the API.
"""
kwargs: dict[str, Any] = {
"model": self.model,
"messages": messages,
"max_tokens": self.max_tokens,
"stream": True,
"tools": openai_tools,
}
if self.reasoning_effort:
kwargs["reasoning_effort"] = self.reasoning_effort
if self.thinking_enabled:
kwargs["extra_body"] = {"thinking": {"type": "enabled"}}
for attempt in range(max_retries):
logger.debug(
"LLM request (stream+tools): %d messages, %d tools (attempt %d)",
len(messages), len(openai_tools), attempt + 1,
)
text_parts: list[str] = []
reasoning_parts: list[str] = []
tool_calls_acc: dict[int, dict] = {} # index -> {id, name, arguments}
try:
stream = await self._client.chat.completions.create(**kwargs)
async for chunk in stream:
if not chunk.choices:
continue
delta = chunk.choices[0].delta
if delta.content:
text_parts.append(delta.content)
# DeepSeek thinking-mode: reasoning_content is returned
# alongside content and MUST be echoed back on subsequent
# requests, otherwise the API rejects with HTTP 400.
rc = getattr(delta, "reasoning_content", None)
if rc:
reasoning_parts.append(rc)
if delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index
entry = tool_calls_acc.setdefault(
idx, {"id": None, "name": None, "arguments": ""},
)
if tc_delta.id:
entry["id"] = tc_delta.id
fn = tc_delta.function
if fn:
if fn.name:
entry["name"] = fn.name
if fn.arguments:
entry["arguments"] += fn.arguments
text = "".join(text_parts)
reasoning = "".join(reasoning_parts) or None
ordered = [tool_calls_acc[i] for i in sorted(tool_calls_acc)]
logger.debug(
"LLM response (stream+tools): %d chars, %d reasoning chars, %d tool calls",
len(text), len(reasoning or ""), len(ordered),
)
return text, reasoning, ordered
except (APIConnectionError, APITimeoutError, APIError) as e:
if attempt < max_retries - 1:
wait = 2 ** attempt * 10
logger.warning(
"Tool-call request failed (%s), retrying in %ds...", e, wait,
)
await asyncio.sleep(wait)
else:
raise LLMAPIError(
f"LLM API unreachable after {max_retries} attempts: {e}",
attempts=max_retries,
) from e
return "", None, []
async def tool_call_loop(
self,
messages: list[dict],
@@ -428,87 +504,159 @@ class LLMClient:
tool_executor: dict[str, Any],
system: str | None = None,
max_iterations: int = 40,
terminal_tools: tuple[str, ...] = (),
) -> tuple[str, list[dict]]:
"""Run a ReAct-style tool-calling loop.
"""Run a tool-calling loop using OpenAI-native tool calls.
The model outputs <tool_call> blocks which we parse and execute,
feeding results back as <tool_result> blocks until the model
outputs an <answer> block.
The model returns structured `tool_calls` in its message; we
dispatch them through our executor dict and feed each result back
as a `role: "tool"` message with the matching `tool_call_id`. The
loop ends when:
- the model returns a message with no tool_calls (normal exit), or
- any tool in `terminal_tools` is called — in that case, the loop
short-circuits with that tool's result text as final_text. This
gives agents (notably ReportAgent) an explicit completion signal
that the old `<answer>` text tag used to provide.
Returns:
(final_text, all_messages)
(final_text, full_message_history)
"""
# Build system prompt with tool definitions
tools_prompt = _build_tools_prompt(tools)
full_system = f"{system}\n\n{tools_prompt}" if system else tools_prompt
terminal_set = set(terminal_tools)
openai_tools = _to_openai_tools(tools)
messages = list(messages) # don't mutate caller's list
_folded = False # Track whether we've already folded once this loop
# The caller may pass `messages` either as raw conversation (no system)
# together with `system=...`, OR as a complete history that already
# starts with the system message (retry path). Accept both shapes.
if messages and messages[0].get("role") == "system":
full_messages: list[dict] = list(messages)
else:
full_messages = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
_folded = False
for i in range(max_iterations):
for _i in range(max_iterations):
# ── Context compression before each API call ──────────────
# Stage A: progressively decay old tool results
messages = _apply_progressive_decay(messages)
# Stage B: fold oldest messages into LLM summary if too long
if not _folded and len(messages) > _FOLD_THRESHOLD:
messages = await self._fold_old_messages(messages, full_system)
full_messages = _apply_progressive_decay(full_messages)
if not _folded and len(full_messages) > _FOLD_THRESHOLD:
full_messages = await self._fold_old_messages(full_messages)
_folded = True
elif _folded and len(messages) > _FOLD_THRESHOLD + _FOLD_KEEP_RECENT:
# Allow a second fold if messages grew back significantly
messages = await self._fold_old_messages(messages, full_system)
elif _folded and len(full_messages) > _FOLD_THRESHOLD + _FOLD_KEEP_RECENT:
full_messages = await self._fold_old_messages(full_messages)
text = await self.chat(messages, system=full_system)
text, reasoning, raw_tool_calls = await self._chat_with_tools(
full_messages, openai_tools,
)
# Check for final answer
answer = _extract_answer(text)
if answer is not None:
messages.append({"role": "assistant", "content": text})
return answer, messages
if not raw_tool_calls:
# Model produced a final response. Strip optional <answer>
# tags for backward compatibility with old prompts.
final_msg: dict[str, Any] = {"role": "assistant", "content": text}
if reasoning:
final_msg["reasoning_content"] = reasoning
full_messages.append(final_msg)
answer = _extract_answer(text)
return (answer if answer is not None else text), full_messages
# Check for tool calls
tool_calls = _extract_tool_calls(text)
# Parse arguments + build internal call dicts
parsed_calls: list[dict] = []
for rc in raw_tool_calls:
args_str = rc.get("arguments", "") or ""
try:
args = _safe_json_loads(args_str) if args_str.strip() else {}
except (json.JSONDecodeError, ValueError) as e:
logger.warning(
"Failed to parse arguments for tool %s: %s",
rc.get("name"), e,
)
args = {}
parsed_calls.append({
"id": rc.get("id"),
"name": rc.get("name", ""),
"arguments": args,
})
if not tool_calls:
# No tool calls and no answer tag — treat entire text as answer
messages.append({"role": "assistant", "content": text})
return text, messages
# Append the assistant turn with the raw tool_calls (and the
# DeepSeek-mandated reasoning_content echo-back), then execute.
asst_msg: dict[str, Any] = {
"role": "assistant",
"content": text or None,
"tool_calls": [
{
"id": rc.get("id"),
"type": "function",
"function": {
"name": rc.get("name", ""),
"arguments": rc.get("arguments", "") or "",
},
}
for rc in raw_tool_calls
],
}
if reasoning:
asst_msg["reasoning_content"] = reasoning
full_messages.append(asst_msg)
# Execute tool calls — read-only tools run in parallel
messages.append({"role": "assistant", "content": text})
result_parts = []
batches = _partition_tool_calls(tool_calls)
batches = _partition_tool_calls(parsed_calls)
t_batch_start = time.monotonic()
# Each entry: (tool_call_dict, raw_result, formatted_for_llm)
executed: list[tuple[dict, str, str]] = []
for batch in batches:
if batch.is_read_only and len(batch.calls) > 1:
batch_results = await self._execute_tool_batch_parallel(
results = await self._execute_tool_batch_parallel(
batch.calls, tool_executor, tools,
)
result_parts.extend(batch_results)
for tc, (raw, formatted) in zip(batch.calls, results):
executed.append((tc, raw, formatted))
else:
for tc in batch.calls:
result_parts.append(
await self._execute_single_tool(tc, tool_executor, tools)
raw, formatted = await self._execute_single_tool(
tc, tool_executor, tools,
)
executed.append((tc, raw, formatted))
# Emit folded tool-call summary for the terminal
t_batch_elapsed = time.monotonic() - t_batch_start
_emit_tool_call_summary(tool_calls, t_batch_elapsed)
_emit_tool_call_summary(parsed_calls, t_batch_elapsed)
# Feed results back as a user message
result_message = "\n\n".join(result_parts)
messages.append({"role": "user", "content": result_message})
# Append formatted tool results to the conversation (this is
# what the LLM sees on subsequent rounds — truncated for context
# economy).
for tc, _raw, formatted in executed:
full_messages.append({
"role": "tool",
"tool_call_id": tc["id"],
"content": formatted,
})
# Terminal-tool short-circuit: if the model called any tool in
# `terminal_tools`, end the loop immediately. The terminal tool's
# RAW result (untruncated) becomes final_text — the LLM may have
# produced a 20K-char report via save_report and we must not
# truncate it just because the LLM-facing copy is truncated.
if terminal_set:
for tc, raw, _formatted in executed:
name = tc.get("name", "")
if name in terminal_set:
logger.info(
"Terminal tool %s called — exiting tool_call_loop", name,
)
return raw, full_messages
logger.warning("Tool call loop hit max iterations (%d)", max_iterations)
return "[Max tool call iterations reached]", messages
return "[Max tool call iterations reached]", full_messages
async def _execute_single_tool(
self, tc: dict, tool_executor: dict[str, Any],
tools: list[dict] | None = None,
) -> str:
"""Execute a single tool call and return the formatted result."""
) -> tuple[str, str]:
"""Execute a single tool call.
Returns (raw_result, formatted_for_llm). `raw_result` is the
unmodified executor return (used by terminal-tool short-circuit as
final_text). `formatted_for_llm` is `[tool_name] {truncated}` and
is what gets fed back to the model as the tool message content.
"""
tool_name = tc.get("name", "")
tool_args = tc.get("arguments", {})
@@ -519,72 +667,106 @@ class LLMClient:
executor = tool_executor.get(tool_name)
if executor is None:
result_text = f"Error: unknown tool '{tool_name}'"
raw = f"Error: unknown tool '{tool_name}'"
else:
try:
result_text = await executor(**tool_args)
raw = await executor(**tool_args)
except Exception as e:
logger.error("Tool %s failed: %s", tool_name, e)
result_text = f"Error executing {tool_name}: {e}"
raw = f"Error executing {tool_name}: {e}"
return (
f"{TOOL_RESULT_TAG}\n"
f"[{tool_name}] {_truncate_tool_result(result_text)}\n"
f"{TOOL_RESULT_END}"
)
formatted = f"[{tool_name}] {_truncate_tool_result(raw)}"
return raw, formatted
async def _execute_tool_batch_parallel(
self, calls: list[dict], tool_executor: dict[str, Any],
tools: list[dict] | None = None,
) -> list[str]:
"""Execute multiple read-only tool calls concurrently."""
) -> list[tuple[str, str]]:
"""Execute multiple read-only tool calls concurrently.
Returns a list of (raw_result, formatted_for_llm) tuples in the
same order as `calls`.
"""
logger.info("Executing %d read-only tools in parallel", len(calls))
async def _run_one(tc: dict) -> str:
async def _run_one(tc: dict) -> tuple[str, str]:
tool_name = tc.get("name", "")
tool_args = tc.get("arguments", {})
if tools:
tool_args = _fix_tool_args(tool_name, tool_args, tools)
logger.info("Calling tool (parallel): %s(%s)", tool_name, json.dumps(tool_args, ensure_ascii=False))
logger.info(
"Calling tool (parallel): %s(%s)",
tool_name, json.dumps(tool_args, ensure_ascii=False),
)
executor = tool_executor.get(tool_name)
if executor is None:
result_text = f"Error: unknown tool '{tool_name}'"
raw = f"Error: unknown tool '{tool_name}'"
else:
try:
result_text = await executor(**tool_args)
raw = await executor(**tool_args)
except Exception as e:
logger.error("Tool %s failed: %s", tool_name, e)
result_text = f"Error executing {tool_name}: {e}"
return (
f"{TOOL_RESULT_TAG}\n"
f"[{tool_name}] {_truncate_tool_result(result_text)}\n"
f"{TOOL_RESULT_END}"
)
raw = f"Error executing {tool_name}: {e}"
formatted = f"[{tool_name}] {_truncate_tool_result(raw)}"
return raw, formatted
results = await asyncio.gather(*[_run_one(tc) for tc in calls])
return list(results)
async def _fold_old_messages(
self, messages: list[dict], system: str,
self, messages: list[dict],
) -> list[dict]:
"""Fold old messages into an LLM-generated summary (Stage B).
Keeps the most recent _FOLD_KEEP_RECENT messages intact and
replaces earlier ones with a single summary message.
Preserves the leading system message (if any), keeps the most
recent _FOLD_KEEP_RECENT messages intact, and replaces the older
middle slice with a single summary user message.
"""
n_to_fold = len(messages) - _FOLD_KEEP_RECENT
# Pin the system message — it must NEVER be summarized away.
system_msgs: list[dict] = []
body = messages
if messages and messages[0].get("role") == "system":
system_msgs = [messages[0]]
body = messages[1:]
n_to_fold = len(body) - _FOLD_KEEP_RECENT
if n_to_fold <= 2:
return messages
old_messages = messages[:n_to_fold]
recent_messages = messages[n_to_fold:]
# Pull the fold boundary forward so we never split an assistant turn
# from its matching tool results. The API rejects (HTTP 400) any
# `role: "tool"` message that does not immediately follow an
# `assistant` message with `tool_calls`. We walk the boundary into
# `recent_messages` while its head is a `role: "tool"` message, or
# while the prior `recent` message is `assistant{tool_calls}` whose
# paired tools span the boundary.
while n_to_fold < len(body):
head = body[n_to_fold]
if head.get("role") == "tool":
n_to_fold += 1
continue
break
if n_to_fold >= len(body):
# Everything got folded — nothing recent to keep.
return system_msgs + [body[0]] if system_msgs else messages
old_messages = body[:n_to_fold]
recent_messages = body[n_to_fold:]
# Build a text dump of old messages for summarization
old_text_parts = []
for msg in old_messages:
role = msg["role"]
content = msg.get("content", "")
# Truncate each message for the summary prompt to avoid overload
role = msg.get("role", "?")
content = msg.get("content") or ""
# Render tool_calls (assistant turn) compactly.
if role == "assistant" and msg.get("tool_calls"):
tc_names = [
tc.get("function", {}).get("name", "?")
for tc in msg["tool_calls"]
]
content = (content + " " if content else "") + (
"called: " + ", ".join(tc_names)
)
if len(content) > 1000:
content = content[:1000] + "..."
old_text_parts.append(f"[{role}]: {content}")
@@ -608,7 +790,6 @@ class LLMClient:
logger.warning("Context folding failed: %s — keeping original messages", e)
return messages
# Replace old messages with a single summary
summary_message = {
"role": "user",
"content": (
@@ -616,4 +797,4 @@ class LLMClient:
f"messages in this conversation]\n\n{summary}"
),
}
return [summary_message] + recent_messages
return system_msgs + [summary_message] + recent_messages

View File

@@ -219,6 +219,8 @@ async def async_main() -> None:
model=agent_cfg["model"],
max_tokens=agent_cfg.get("max_tokens", 4096),
proxy=agent_cfg.get("proxy", "auto"),
reasoning_effort=agent_cfg.get("reasoning_effort"),
thinking_enabled=agent_cfg.get("thinking_enabled", False),
)
# Initialize evidence graph

View File

@@ -12,7 +12,8 @@ from pathlib import Path
from agent_factory import AgentFactory
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from llm_client import LLMClient, _extract_first_balanced, _safe_json_loads
from tool_registry import TOOL_CATALOG
logger = logging.getLogger(__name__)
@@ -93,6 +94,14 @@ class Orchestrator:
"Omit phenomena that are unrelated. Be conservative — only link genuinely relevant evidence."
)
_AREA_DERIVE_SYSTEM = (
"You are a forensic investigation strategist. Given a set of hypotheses, "
"decompose them into a minimal aggregate set of investigation areas. An "
"area is a focused, concrete question with the keywords and tool names an "
"answering phenomenon would mention. Aggregate aggressively — when two "
"hypotheses share an area, emit it once and list both hypothesis_ids."
)
def __init__(
self,
llm: LLMClient,
@@ -216,6 +225,170 @@ class Orchestrator:
"The ultimate goal is to reconstruct a detailed timeline of what happened on this host."
)
# ---- Investigation areas (manual seed + LLM derive) ----------------------
_VALID_AGENT_TYPES = {"filesystem", "registry", "communication", "network", "timeline"}
async def _seed_manual_investigation_areas(self) -> None:
"""Import config.yaml:investigation_areas entries (manual override).
Run early in Phase 2 so manual entries are in the graph before LLM
derivation; LLM derive then augments via slug-based dedupe.
"""
for entry in self.config.get("investigation_areas", []):
area = entry.get("area")
if not area:
continue
await self.graph.add_investigation_area(
area=area,
description=entry.get("description", entry.get("task", "")),
suggested_agent=entry.get("agent", "filesystem"),
expected_keywords=entry.get("keywords", []),
expected_tools=entry.get("tools", []),
priority=entry.get("priority", 3),
created_by="manual",
)
async def _derive_investigation_areas(self) -> None:
"""Ask LLM to derive investigation areas from active hypotheses.
Manual-seeded or already-populated graph (resume) → no-op. On LLM
failure or empty output, falls back to one-area-per-hypothesis.
"""
if self.graph.investigation_areas:
return
active = [h for h in self.graph.hypotheses.values() if h.status == "active"]
if not active:
return
available_tools = sorted(TOOL_CATALOG.keys())
hyp_lines = "\n".join(
f" [{h.id}] {h.title}: {h.description}" for h in active
)
prompt = (
f"Active hypotheses:\n{hyp_lines}\n\n"
f"Available agents: {sorted(self._VALID_AGENT_TYPES)}\n"
f"Available tool names (pick 1-3 per area for expected_tools): {available_tools}\n\n"
f"Emit 5-12 distinct investigation areas covering the FULL hypothesis set.\n"
f"Each area must include:\n"
f" - area: snake_case slug (dedupe key)\n"
f" - description: one sentence on what to find\n"
f" - suggested_agent: one of the agents above\n"
f" - expected_keywords: 3-8 lowercase tokens that an answering phenomenon would mention\n"
f" - expected_tools: 1-3 tool names from the list above\n"
f" - priority: 1 (highest) to 10\n"
f" - motivating_hypothesis_ids: at least one [hyp-xxx] from above\n\n"
f"Aggregate aggressively — when two hypotheses share an area, emit it ONCE "
f"and list both ids in motivating_hypothesis_ids.\n\n"
f"Respond ONLY with JSON:\n"
f'[{{"area":"...","description":"...","suggested_agent":"...",'
f'"expected_keywords":[...],"expected_tools":[...],"priority":1-10,'
f'"motivating_hypothesis_ids":["hyp-xxx"]}}]'
)
try:
items = await self._call_llm_for_json(
system=self._AREA_DERIVE_SYSTEM,
user_prompt=prompt,
schema="array",
)
except Exception as e:
logger.warning("Area derivation LLM failed: %s — falling back", e)
await self._derive_areas_fallback(active)
return
valid_hyp_ids = set(self.graph.hypotheses.keys())
for it in items:
area = it.get("area", "").strip()
if not area:
continue
agent = it.get("suggested_agent", "filesystem")
if agent not in self._VALID_AGENT_TYPES:
agent = AGENT_ALIASES.get(agent, "filesystem")
tools = [t for t in it.get("expected_tools", []) if t in TOOL_CATALOG]
motivating = [
h for h in it.get("motivating_hypothesis_ids", [])
if h in valid_hyp_ids
]
priority = max(1, min(10, int(it.get("priority", 5))))
await self.graph.add_investigation_area(
area=area,
description=it.get("description", ""),
suggested_agent=agent,
expected_keywords=[
str(kw).lower() for kw in it.get("expected_keywords", [])
],
expected_tools=tools,
priority=priority,
motivating_hypothesis_ids=motivating,
created_by="llm_derive",
)
if not self.graph.investigation_areas:
await self._derive_areas_fallback(active)
async def _derive_areas_fallback(self, active: list) -> None:
"""One area per active hypothesis as a minimal safety net."""
for h in active:
slug = re.sub(r"[^a-z0-9_]+", "_", h.title.lower())[:40].strip("_")
if not slug:
slug = h.id.replace("-", "_")
await self.graph.add_investigation_area(
area=slug,
description=h.title,
suggested_agent="filesystem",
expected_keywords=h.title.lower().split()[:6],
expected_tools=[],
priority=5,
motivating_hypothesis_ids=[h.id],
created_by="fallback",
)
# ---- LLM JSON helper -----------------------------------------------------
async def _call_llm_for_json(
self,
system: str,
user_prompt: str,
schema: str = "array",
max_retries: int = 2,
):
"""Call LLM expecting JSON output; self-correct on parse failure.
Two layers of safety:
1. _safe_json_loads escapes stray backslashes outside valid JSON escapes.
2. On any remaining parse error, append the error to the prompt and ask
the LLM to retry, up to max_retries additional attempts.
"""
error_hint = ""
last_err: Exception | None = None
open_c, close_c = ('[', ']') if schema == "array" else ('{', '}')
for attempt in range(max_retries + 1):
messages = [{"role": "user", "content": user_prompt + error_hint}]
response = await self.llm.chat(messages=messages, system=system)
candidate = _extract_first_balanced(response, open_c, close_c) or response
try:
return _safe_json_loads(candidate)
except (json.JSONDecodeError, ValueError) as e:
last_err = e
if attempt < max_retries:
logger.info(
"JSON parse attempt %d/%d failed (%s); retrying with hint",
attempt + 1, max_retries + 1, e,
)
error_hint = (
f"\n\n[Your previous response could not be parsed as JSON: {e}]\n"
f"Output STRICT JSON only — no markdown fences, no code blocks. "
f"Do NOT include literal backslash characters in any string value; "
f"rephrase using forward slashes or describe paths in English "
f"(e.g. 'the Cain folder under Program Files' instead of "
f"'C:\\Program Files\\Cain'). "
f"The response must be a single JSON {schema}."
)
assert last_err is not None
raise last_err
# ---- Hypothesis-directed investigation -----------------------------------
async def _generate_hypothesis_leads(self) -> None:
@@ -232,10 +405,19 @@ class Orchestrator:
existing = "\n".join(
f" - {r['node']} [{r['edge_type']}]" for r in related
) or " (none yet)"
related_areas = [
a.area for a in self.graph.investigation_areas.values()
if hyp.id in a.motivating_hypothesis_ids
]
expected_line = (
f" Expected areas to investigate: {', '.join(related_areas)}\n"
if related_areas else ""
)
hyp_blocks.append(
f"Hypothesis [{hyp.id}]: {hyp.title}\n"
f" Description: {hyp.description}\n"
f" Current confidence: {hyp.confidence:.2f}\n"
f"{expected_line}"
f" Existing evidence:\n{existing}"
)
@@ -252,15 +434,11 @@ class Orchestrator:
)
try:
response = await self.llm.chat(
messages=[{"role": "user", "content": prompt}],
tasks = await self._call_llm_for_json(
system=self._LEAD_GEN_SYSTEM,
user_prompt=prompt,
schema="array",
)
match = re.search(r'\[.*?\]', response, re.DOTALL)
if match:
tasks = json.loads(match.group())
else:
tasks = json.loads(response)
for task in tasks:
hyp_id = task.get("hypothesis_id", "")
@@ -286,12 +464,21 @@ class Orchestrator:
existing_evidence = "\n".join(
f" - {r['node']} [{r['edge_type']}]" for r in related
) or " (none yet)"
related_areas = [
a.area for a in self.graph.investigation_areas.values()
if hyp.id in a.motivating_hypothesis_ids
]
expected_line = (
f"Expected areas to investigate: {', '.join(related_areas)}\n\n"
if related_areas else ""
)
prompt = (
f"Hypothesis: {hyp.title}\n"
f"Description: {hyp.description}\n"
f"Current confidence: {hyp.confidence:.2f}\n\n"
f"Existing evidence linked to this hypothesis:\n{existing_evidence}\n\n"
f"{expected_line}"
f"What additional evidence should we look for to CONFIRM or DENY this hypothesis?\n"
f"List 1-3 specific, actionable investigation tasks.\n"
f"For each, specify which agent type should handle it: "
@@ -300,12 +487,11 @@ class Orchestrator:
f'[{{"agent": "agent_type", "task": "what to investigate", "priority": 1-10}}]'
)
try:
response = await self.llm.chat(
messages=[{"role": "user", "content": prompt}],
tasks = await self._call_llm_for_json(
system=self._LEAD_GEN_SYSTEM,
user_prompt=prompt,
schema="array",
)
match = re.search(r'\[.*?\]', response, re.DOTALL)
tasks = json.loads(match.group()) if match else json.loads(response)
for task in tasks:
await self.graph.add_lead(
target_agent=task.get("agent", "filesystem"),
@@ -351,15 +537,11 @@ class Orchestrator:
)
try:
response = await self.llm.chat(
messages=[{"role": "user", "content": prompt}],
judgments = await self._call_llm_for_json(
system=self._JUDGE_SYSTEM,
user_prompt=prompt,
schema="array",
)
match = re.search(r'\[.*?\]', response, re.DOTALL)
if match:
judgments = json.loads(match.group())
else:
judgments = json.loads(response)
for j in judgments:
hyp_id = j.get("hypothesis_id", "")
@@ -402,12 +584,11 @@ class Orchestrator:
f'[{{"phenomenon_id": "ph-xxx", "edge_type": "supports|contradicts|...", "reason": "brief explanation"}}]'
)
try:
response = await self.llm.chat(
messages=[{"role": "user", "content": prompt}],
judgments = await self._call_llm_for_json(
system=self._JUDGE_SYSTEM,
user_prompt=prompt,
schema="array",
)
match = re.search(r'\[.*?\]', response, re.DOTALL)
judgments = json.loads(match.group()) if match else json.loads(response)
for j in judgments:
ph_id = j.get("phenomenon_id", "")
edge_type = j.get("edge_type", "")
@@ -428,74 +609,57 @@ class Orchestrator:
# ---- Gap analysis (coverage check) ---------------------------------------
_AREA_KEYWORDS: dict[str, list[str]] = {
"system_info": ["install date", "registered owner", "product name", "windows xp", "system information"],
"user_accounts": ["user account", "enumerate", "sam hive", "administrator", "mr. evil"],
"shutdown_time": ["shutdown"],
"network_config": ["network interface", "network adapter", "ip address", "dhcp", "mac address", "network config"],
"installed_software": ["installed software", "program files", "installed program"],
"email_config": ["smtp", "pop3", "nntp", "email account", "email config"],
"chat_logs": ["irc", "mirc", "chat log", "channel"],
"network_activity": ["packet capture", "pcap", "interception", "http request", "user-agent"],
"deleted_files": ["deleted file", "recycle", "recycler"],
"execution_evidence": ["prefetch", "execution", "run count", "last execution"],
}
def _check_coverage(self) -> set[str]:
"""Return slugs of investigation_areas already covered by phenomena.
# Deterministic coverage: if the canonical tool was called, the area is covered.
_AREA_TOOLS: dict[str, list[str]] = {
"system_info": ["get_system_info"],
"user_accounts": ["enumerate_users"],
"shutdown_time": ["get_shutdown_time"],
"network_config": ["get_network_interfaces"],
"installed_software": ["list_installed_software"],
"email_config": ["get_email_config"],
"network_activity": ["parse_pcap_strings"],
"deleted_files": ["count_deleted_files"],
"execution_evidence": ["parse_prefetch"],
}
Layer A: any expected_keyword found in evidence text (category +
title + description, lowercased).
Layer B: any expected_tool present in the source_tool set of recorded
phenomena (deterministic — the canonical tool was actually called).
"""
evidence_text = " ".join(
f"{ph.category} {ph.title} {ph.description}".lower()
for ph in self.graph.phenomena.values()
)
used_tools: set[str] = {
ph.source_tool for ph in self.graph.phenomena.values() if ph.source_tool
}
def _check_coverage(self, areas: list[dict]) -> set[str]:
# Layer 1: keyword matching on category + title + description
evidence_text = ""
for ph in self.graph.phenomena.values():
evidence_text += f" {ph.category} {ph.title} {ph.description} ".lower()
# Layer 2: collect all source_tools that produced phenomena
used_tools: set[str] = {ph.source_tool for ph in self.graph.phenomena.values() if ph.source_tool}
covered = set()
for area in areas:
area_name = area["area"]
# Check keywords
keywords = self._AREA_KEYWORDS.get(area_name, [])
if any(kw in evidence_text for kw in keywords):
covered.add(area_name)
covered: set[str] = set()
for a in self.graph.investigation_areas.values():
if any(kw.lower() in evidence_text for kw in a.expected_keywords):
covered.add(a.area)
continue
# Check source_tool
area_tools = self._AREA_TOOLS.get(area_name, [])
if any(tool in used_tools for tool in area_tools):
covered.add(area_name)
if any(t in used_tools for t in a.expected_tools):
covered.add(a.area)
return covered
async def _run_gap_analysis(self) -> None:
areas = self.config.get("investigation_areas", [])
areas = list(self.graph.investigation_areas.values())
if not areas:
return
covered = self._check_coverage(areas)
uncovered = [a for a in areas if a["area"] not in covered]
covered = self._check_coverage()
uncovered = [a for a in areas if a.area not in covered]
if not uncovered:
_log(f"All {len(areas)} investigation areas covered", event="progress")
return
uncovered_names = ", ".join(a["area"] for a in uncovered)
_log(f"{len(uncovered)}/{len(areas)} areas uncovered: {uncovered_names}", event="dispatch")
for area in uncovered:
uncovered_names = ", ".join(a.area for a in uncovered)
_log(
f"{len(uncovered)}/{len(areas)} areas uncovered: {uncovered_names}",
event="dispatch",
)
for a in uncovered:
await self.graph.add_lead(
target_agent=area["agent"],
description=area["task"],
priority=3,
target_agent=a.suggested_agent,
description=a.description,
priority=a.priority,
hypothesis_id=(
a.motivating_hypothesis_ids[0]
if a.motivating_hypothesis_ids else None
),
)
for round_num in range(3):
@@ -542,6 +706,15 @@ class Orchestrator:
json.dumps(leads_data, ensure_ascii=False, indent=2)
)
# Investigation areas export
areas_data = {
aid: a.to_dict()
for aid, a in self.graph.investigation_areas.items()
}
(self.run_dir / "investigation_areas.json").write_text(
json.dumps(areas_data, ensure_ascii=False, indent=2)
)
# Run metadata
end_time = datetime.now()
metadata = {
@@ -601,6 +774,11 @@ class Orchestrator:
if resume_phase <= 2:
_log("Phase 2: Hypothesis Generation", event="phase")
t0 = time.monotonic()
# Seed manual investigation areas (if any) BEFORE LLM derive,
# so manual entries win the dedupe and LLM only augments.
await self._seed_manual_investigation_areas()
manual_hypotheses = self.config.get("hypotheses", [])
if manual_hypotheses:
await self._generate_hypotheses_manual(manual_hypotheses)
@@ -611,10 +789,17 @@ class Orchestrator:
if self.graph.phenomena and self.graph.hypotheses:
await self._judge_new_phenomena()
# Derive investigation areas from active hypotheses.
# No-op if manual seed already populated or resume restored areas.
await self._derive_investigation_areas()
for h in self.graph.hypotheses.values():
_log(f" {h.summary()}", event="hypothesis")
for a in self.graph.investigation_areas.values():
_log(f" {a.summary()}", event="area")
_log(
f"+{len(self.graph.hypotheses)} hypotheses generated",
f"+{len(self.graph.hypotheses)} hypotheses, "
f"{len(self.graph.investigation_areas)} areas",
event="progress", elapsed=time.monotonic() - t0,
)

View File

@@ -5,6 +5,7 @@ description = "Multi-Agent System for Digital Forensics"
requires-python = ">=3.14"
dependencies = [
"httpx[socks]>=0.28.1",
"openai>=2.36.0",
"pyyaml",
"regipy>=6.2.1",
]

View File

@@ -45,6 +45,8 @@ async def main() -> None:
api_key=agent_cfg["api_key"],
model=agent_cfg["model"],
max_tokens=16384,
reasoning_effort=agent_cfg.get("reasoning_effort"),
thinking_enabled=agent_cfg.get("thinking_enabled", False),
)
register_all_tools(graph.image_path, graph.partition_offset, graph)

View File

@@ -14,7 +14,6 @@ from evidence_graph import (
from llm_client import (
_truncate_tool_result, _partition_tool_calls, _ToolBatch, READ_ONLY_TOOLS,
_apply_progressive_decay, _FOLD_THRESHOLD, _FOLD_KEEP_RECENT,
TOOL_RESULT_TAG, TOOL_RESULT_END,
)
from tool_registry import (
_tool_result_cache, _cache_key, _make_cached, CACHEABLE_TOOLS,
@@ -598,7 +597,8 @@ class TestParallelToolExecution:
elapsed = time.monotonic() - start
assert len(results) == 3
assert all("ok" in r for r in results)
# results are (raw, formatted) tuples; both contain "ok"
assert all("ok" in raw and "ok" in formatted for raw, formatted in results)
# 3 tasks × 50ms each should take ~50ms parallel, not ~150ms serial
assert elapsed < 0.12, f"Expected parallel execution but took {elapsed:.3f}s"
@@ -627,8 +627,10 @@ class TestParallelToolExecution:
results = await client._execute_tool_batch_parallel(calls, tool_executor)
assert len(results) == 2
assert "success" in results[0]
assert "Error" in results[1]
raw0, formatted0 = results[0]
raw1, formatted1 = results[1]
assert "success" in raw0 and "success" in formatted0
assert "Error" in raw1 and "Error" in formatted1
# ---------------------------------------------------------------------------
@@ -843,21 +845,27 @@ class TestToolResultCache:
class TestProgressiveDecay:
def _make_messages(self, n_rounds: int) -> list[dict]:
"""Build a synthetic message list with n_rounds of (assistant, user) pairs."""
"""Build a synthetic message list shaped like native tool calling:
user → (assistant w/ tool_calls → tool result)+
"""
messages = [{"role": "user", "content": "Start task"}]
for i in range(n_rounds):
tc_id = f"call_{i}"
messages.append({
"role": "assistant",
"content": f"<tool_call>{{'name': 'tool_{i}'}}</tool_call>",
"content": None,
"tool_calls": [
{
"id": tc_id,
"type": "function",
"function": {"name": f"tool_{i}", "arguments": "{}"},
},
],
})
# Tool result message with substantial content
messages.append({
"role": "user",
"content": (
f"{TOOL_RESULT_TAG}\n"
f"[tool_{i}] {'x' * 2500}\n"
f"{TOOL_RESULT_END}"
),
"role": "tool",
"tool_call_id": tc_id,
"content": f"[tool_{i}] {'x' * 2500}",
})
return messages
@@ -865,21 +873,16 @@ class TestProgressiveDecay:
msgs = self._make_messages(3)
result = _apply_progressive_decay(msgs)
assert len(result) == len(msgs)
# Content should be identical for short conversations
for orig, decayed in zip(msgs, result):
assert orig["content"] == decayed["content"]
assert orig.get("content") == decayed.get("content")
def test_old_messages_truncated(self):
msgs = self._make_messages(20)
result = _apply_progressive_decay(msgs)
# Recent tool results should be full length
last_tool_result = [m for m in result if m["role"] == "user" and TOOL_RESULT_TAG in m["content"]][-1]
assert len(last_tool_result["content"]) > 2000
# Oldest tool results should be truncated
first_tool_result = [m for m in result if m["role"] == "user" and TOOL_RESULT_TAG in m["content"]][0]
assert len(first_tool_result["content"]) < 500
tool_msgs = [m for m in result if m.get("role") == "tool"]
assert len(tool_msgs[-1]["content"]) > 2000
assert len(tool_msgs[0]["content"]) < 500
def test_message_count_preserved(self):
msgs = self._make_messages(20)
@@ -915,7 +918,7 @@ class TestMessageFolding:
messages.append({"role": "assistant", "content": f"thinking step {i}"})
messages.append({"role": "user", "content": f"tool result {i}: {'data ' * 50}"})
result = await client._fold_old_messages(messages, "system prompt")
result = await client._fold_old_messages(messages)
# Should be significantly shorter
assert len(result) < len(messages)
@@ -937,7 +940,7 @@ class TestMessageFolding:
{"role": "assistant", "content": "hi"},
]
result = await client._fold_old_messages(messages, "system")
result = await client._fold_old_messages(messages)
# Should return original (n_to_fold = 2 - 10 = negative, so no folding)
assert result == messages
client.chat.assert_not_called()
@@ -951,7 +954,555 @@ class TestMessageFolding:
client.chat = AsyncMock(side_effect=Exception("API error"))
messages = [{"role": "user", "content": f"msg {i}"} for i in range(40)]
result = await client._fold_old_messages(messages, "system")
result = await client._fold_old_messages(messages)
# On failure, should return original messages unchanged
assert len(result) == 40
@pytest.mark.asyncio
async def test_fold_boundary_never_orphans_tool_message(self):
"""If the natural fold boundary would leave `role: "tool"` at the
head of `recent_messages`, fold must walk the boundary forward
until the head is non-tool. The API rejects orphan tool messages
with HTTP 400."""
from llm_client import LLMClient
from unittest.mock import AsyncMock
client = LLMClient.__new__(LLMClient)
client.chat = AsyncMock(return_value="summary")
# Build a long conversation of (assistant{tool_calls}, tool) pairs.
# Place the assistant at the exact n_to_fold boundary so its paired
# tool would otherwise be orphaned at the head of recent_messages.
msgs: list[dict] = [{"role": "user", "content": "task"}]
for i in range(30):
tc_id = f"call_{i}"
msgs.append({
"role": "assistant", "content": None,
"tool_calls": [{
"id": tc_id, "type": "function",
"function": {"name": f"t_{i}", "arguments": "{}"},
}],
})
msgs.append({"role": "tool", "tool_call_id": tc_id, "content": "ok"})
result = await client._fold_old_messages(msgs)
# No `role: "tool"` may appear without an `assistant{tool_calls}`
# immediately preceding it.
for i, m in enumerate(result):
if m.get("role") == "tool":
assert i > 0, "tool message cannot be first"
prev = result[i - 1]
assert prev.get("role") == "assistant" and prev.get("tool_calls"), (
f"tool at index {i} preceded by {prev.get('role')} "
f"(tool_calls={bool(prev.get('tool_calls'))})"
)
# ---------------------------------------------------------------------------
# Investigation areas: dataclass + derivation + coverage + dispatch
# ---------------------------------------------------------------------------
class TestInvestigationAreaDerivation:
@pytest.fixture
def graph(self):
return EvidenceGraph()
@pytest.mark.asyncio
async def test_add_investigation_area_dedupes_and_merges_lists(self, graph):
aid1, existed1 = await graph.add_investigation_area(
area="password_hashes", description="SAM dump",
suggested_agent="filesystem",
expected_keywords=["sam", "pwdump"],
expected_tools=["search_strings"],
motivating_hypothesis_ids=["hyp-a"],
created_by="llm_derive",
)
aid2, existed2 = await graph.add_investigation_area(
area="password_hashes", description="Other description",
suggested_agent="registry",
expected_keywords=["sam", "hashdump"], # one new
expected_tools=["search_strings", "parse_registry_key"], # one new
motivating_hypothesis_ids=["hyp-b"], # new
created_by="manual",
)
assert not existed1
assert existed2
assert aid1 == aid2
a = graph.investigation_areas[aid1]
# Description and suggested_agent NOT overwritten (first-write wins)
assert a.description == "SAM dump"
assert a.suggested_agent == "filesystem"
# Three list fields are unioned
assert set(a.expected_keywords) == {"sam", "pwdump", "hashdump"}
assert set(a.expected_tools) == {"search_strings", "parse_registry_key"}
assert set(a.motivating_hypothesis_ids) == {"hyp-a", "hyp-b"}
@pytest.mark.asyncio
async def test_check_coverage_keyword_layer(self, graph):
await graph.add_phenomenon(
"fs", "filesystem", "Cain SAM dump artifact",
"Found sam.lst in the Cain folder",
source_tool="list_directory",
)
await graph.add_investigation_area(
area="password_hashes", description="SAM dump",
suggested_agent="filesystem",
expected_keywords=["sam.lst", "pwdump"],
expected_tools=["nonexistent_tool"],
)
from orchestrator import Orchestrator
from agent_factory import AgentFactory
from unittest.mock import AsyncMock
orch = Orchestrator(AsyncMock(), graph, AgentFactory(AsyncMock(), graph))
covered = orch._check_coverage()
assert "password_hashes" in covered
@pytest.mark.asyncio
async def test_check_coverage_tool_layer(self, graph):
await graph.add_phenomenon(
"reg", "registry", "User accounts",
"Found accounts",
source_tool="enumerate_users",
)
await graph.add_investigation_area(
area="user_accounts", description="Enum users",
suggested_agent="registry",
expected_keywords=["irrelevant"],
expected_tools=["enumerate_users"],
)
from orchestrator import Orchestrator
from agent_factory import AgentFactory
from unittest.mock import AsyncMock
orch = Orchestrator(AsyncMock(), graph, AgentFactory(AsyncMock(), graph))
covered = orch._check_coverage()
assert "user_accounts" in covered
@pytest.mark.asyncio
async def test_load_state_round_trip_areas(self, graph, tmp_path):
await graph.add_investigation_area(
area="x", description="d", suggested_agent="filesystem",
expected_keywords=["k1"], expected_tools=["t1"],
priority=2, motivating_hypothesis_ids=["hyp-a"],
created_by="manual",
)
path = tmp_path / "state.json"
graph.save_state(path)
g2 = EvidenceGraph.load_state(path)
assert len(g2.investigation_areas) == 1
a = list(g2.investigation_areas.values())[0]
assert a.area == "x"
assert a.expected_keywords == ["k1"]
assert a.priority == 2
assert a.motivating_hypothesis_ids == ["hyp-a"]
assert a.created_by == "manual"
@pytest.mark.asyncio
async def test_derive_no_op_when_areas_already_populated(self, graph):
"""Resume safety: if areas are already in the graph (manual seed or
restored from disk), _derive_investigation_areas does nothing."""
from unittest.mock import AsyncMock
from orchestrator import Orchestrator
from agent_factory import AgentFactory
await graph.add_hypothesis("test", "desc", created_by="t")
await graph.add_investigation_area(
area="pre_existing", description="d", suggested_agent="filesystem",
created_by="manual",
)
llm = AsyncMock()
orch = Orchestrator(llm, graph, AgentFactory(llm, graph))
await orch._derive_investigation_areas()
# LLM should not have been called
assert llm.chat.call_count == 0
# Area count unchanged
assert len(graph.investigation_areas) == 1
@pytest.mark.asyncio
async def test_fallback_when_llm_returns_empty_list(self, graph):
from unittest.mock import AsyncMock
from orchestrator import Orchestrator
from agent_factory import AgentFactory
await graph.add_hypothesis("Some compromise", "desc", created_by="t")
llm = AsyncMock()
llm.chat.return_value = "[]"
orch = Orchestrator(llm, graph, AgentFactory(llm, graph))
await orch._derive_investigation_areas()
# Fallback creates one area per hypothesis
assert len(graph.investigation_areas) == 1
a = list(graph.investigation_areas.values())[0]
assert a.created_by == "fallback"
@pytest.mark.asyncio
async def test_unknown_tool_filtered_kept_keywords(self, graph):
"""LLM emits a tool name not in TOOL_CATALOG; tool is filtered,
but the area itself with its keywords is kept."""
from unittest.mock import AsyncMock
from orchestrator import Orchestrator
from agent_factory import AgentFactory
h = await graph.add_hypothesis("h", "desc", created_by="t")
llm = AsyncMock()
llm.chat.return_value = (
'[{"area":"foo","description":"desc","suggested_agent":"filesystem",'
'"expected_keywords":["kw1","kw2"],"expected_tools":["nonexistent_tool"],'
f'"priority":2,"motivating_hypothesis_ids":["{h}"]}}]'
)
orch = Orchestrator(llm, graph, AgentFactory(llm, graph))
await orch._derive_investigation_areas()
assert "area-foo" in graph.investigation_areas
a = graph.investigation_areas["area-foo"]
assert a.expected_keywords == ["kw1", "kw2"]
assert a.expected_tools == [] # filtered out
@pytest.mark.asyncio
async def test_unknown_agent_resolved_via_AGENT_ALIASES(self, graph):
"""LLM emits 'chat' (which is in AGENT_ALIASES → 'communication').
The area should land with the resolved agent name."""
from unittest.mock import AsyncMock
from orchestrator import Orchestrator
from agent_factory import AgentFactory
h = await graph.add_hypothesis("h", "desc", created_by="t")
llm = AsyncMock()
llm.chat.return_value = (
'[{"area":"chat_stuff","description":"d","suggested_agent":"chat",'
'"expected_keywords":["irc"],"expected_tools":[],'
f'"priority":3,"motivating_hypothesis_ids":["{h}"]}}]'
)
orch = Orchestrator(llm, graph, AgentFactory(llm, graph))
await orch._derive_investigation_areas()
a = graph.investigation_areas["area-chat_stuff"]
assert a.suggested_agent == "communication"
@staticmethod
def _agent_with_executor(graph, llm, tool_name: str, real_executor):
"""Build a BaseAgent that registers tool_name via the real register_tool
path so the mandatory-record wrapper is engaged."""
from base_agent import BaseAgent
agent = BaseAgent(llm, graph)
agent.name = "test_agent"
# Bypass _register_graph_tools side-effects in run() — we register
# only what the test needs.
agent._register_graph_tools = lambda: None
agent.register_tool(
name=tool_name, description="", input_schema={}, executor=real_executor,
)
return agent
@pytest.mark.asyncio
async def test_forced_record_retry_fires_when_zero_phenomena(self):
"""BaseAgent.run should automatically retry one more LLM round if
the agent finished without calling any mandatory recording tool."""
from unittest.mock import AsyncMock
graph = EvidenceGraph()
llm = AsyncMock()
async def real_add(**kw):
await graph.add_phenomenon(
source_agent="test", category="filesystem",
title="Forced retry record",
description="Recorded after STOP prompt",
source_tool="forced_retry",
)
agent = self._agent_with_executor(graph, llm, "add_phenomenon", real_add)
async def fake_tool_call_loop(messages, tools, tool_executor, system, max_iterations=40, terminal_tools=()):
already_retrying = any(
"STOP." in (m.get("content", "") if isinstance(m, dict) else "")
for m in messages
)
if not already_retrying:
return "Final answer without recording.", list(messages) + [
{"role": "assistant", "content": "Final answer without recording."}
]
await tool_executor["add_phenomenon"]() # goes through wrapper
return "Recorded.", []
llm.tool_call_loop = fake_tool_call_loop
await agent.run("test task")
assert len(graph.phenomena) == 1
assert agent._record_call_counts["add_phenomenon"] == 1
@pytest.mark.asyncio
async def test_no_retry_when_mandatory_tool_was_called(self):
"""Retry should NOT fire if a mandatory recording tool was invoked."""
from unittest.mock import AsyncMock
graph = EvidenceGraph()
llm = AsyncMock()
call_count = {"n": 0}
async def real_add(**kw):
await graph.add_phenomenon(
source_agent="test", category="filesystem", title="x",
description="y", source_tool="t",
)
agent = self._agent_with_executor(graph, llm, "add_phenomenon", real_add)
async def fake_tool_call_loop(messages, tools, tool_executor, system, max_iterations=40, terminal_tools=()):
call_count["n"] += 1
await tool_executor["add_phenomenon"]() # wrapper increments count
return "done.", list(messages)
llm.tool_call_loop = fake_tool_call_loop
await agent.run("test task")
assert call_count["n"] == 1 # no retry
@pytest.mark.asyncio
async def test_no_retry_when_mandatory_tools_empty(self):
"""ReportAgent declares mandatory_record_tools=() — retry should
not fire even with zero graph mutations (final text IS the output)."""
from unittest.mock import AsyncMock
from base_agent import BaseAgent
graph = EvidenceGraph()
llm = AsyncMock()
call_count = {"n": 0}
async def fake_tool_call_loop(messages, tools, tool_executor, system, max_iterations=40, terminal_tools=()):
call_count["n"] += 1
return "report body here", list(messages)
llm.tool_call_loop = fake_tool_call_loop
class ReportLike(BaseAgent):
mandatory_record_tools = ()
agent = ReportLike(llm, graph)
agent.name = "report_like"
agent._register_graph_tools = lambda: None
await agent.run("test task")
assert call_count["n"] == 1
@pytest.mark.asyncio
async def test_forced_retry_fires_for_timeline_agent(self):
"""TimelineAgent.mandatory_record_tools=('add_temporal_edge',) — retry
should fire when timeline finishes without creating any temporal edges,
even though the agent does not have add_phenomenon."""
from unittest.mock import AsyncMock
from base_agent import BaseAgent
graph = EvidenceGraph()
llm = AsyncMock()
call_count = {"n": 0}
edge_added = {"n": 0}
async def real_add_edge(**kw):
edge_added["n"] += 1
class TimelineLike(BaseAgent):
mandatory_record_tools = ("add_temporal_edge",)
agent = TimelineLike(llm, graph)
agent.name = "timeline_like"
agent._register_graph_tools = lambda: None
agent.register_tool("add_temporal_edge", "", {}, real_add_edge)
async def fake_tool_call_loop(messages, tools, tool_executor, system, max_iterations=40, terminal_tools=()):
call_count["n"] += 1
already_retrying = any(
"STOP." in (m.get("content", "") if isinstance(m, dict) else "")
for m in messages
)
if not already_retrying:
return "answer", list(messages)
await tool_executor["add_temporal_edge"]()
return "recorded.", []
llm.tool_call_loop = fake_tool_call_loop
await agent.run("build timeline")
assert call_count["n"] == 2 # first + retry
assert edge_added["n"] == 1
assert agent._record_call_counts["add_temporal_edge"] == 1
# ---- terminal_tools: real LLMClient.tool_call_loop short-circuit -----
@pytest.mark.asyncio
async def test_terminal_tool_exits_loop_immediately(self):
"""When a terminal tool is called, tool_call_loop must return
with that tool's result text as final_text — no further LLM calls."""
from unittest.mock import AsyncMock
from llm_client import LLMClient
client = LLMClient.__new__(LLMClient)
client.max_tokens = 4096
client.reasoning_effort = None
client.thinking_enabled = False
client.model = "test"
client._client = None
call_count = {"n": 0}
async def fake_chat_with_tools(messages, openai_tools):
call_count["n"] += 1
if call_count["n"] == 1:
# First turn: model calls a read tool then the terminal tool.
return "thinking aloud", None, [
{"id": "tc1", "name": "read_tool", "arguments": "{}"},
{"id": "tc2", "name": "save_report",
"arguments": '{"content":"FINAL REPORT BODY","output_path":"r.md"}'},
]
raise AssertionError("loop should have exited after terminal tool")
client._chat_with_tools = fake_chat_with_tools
async def read_tool():
return "some data"
async def save_report(content, output_path):
return content # terminal tool returns content as final_text
tools = [
{"name": "read_tool", "description": "", "input_schema": {"type": "object", "properties": {}}},
{"name": "save_report", "description": "",
"input_schema": {"type": "object", "properties": {
"content": {"type": "string"}, "output_path": {"type": "string"}}}},
]
executors = {"read_tool": read_tool, "save_report": save_report}
final_text, _ = await client.tool_call_loop(
messages=[{"role": "user", "content": "go"}],
tools=tools, tool_executor=executors,
system="sys", terminal_tools=("save_report",),
)
assert final_text == "FINAL REPORT BODY"
assert call_count["n"] == 1 # never called a 2nd round
@pytest.mark.asyncio
async def test_no_terminal_short_circuit_when_not_declared(self):
"""When terminal_tools is empty, the same call sequence should
run the read tool, run save_report-like tool, AND continue the loop
(i.e. another LLM round) until the model stops calling tools."""
from unittest.mock import AsyncMock
from llm_client import LLMClient
client = LLMClient.__new__(LLMClient)
client.max_tokens = 4096
client.reasoning_effort = None
client.thinking_enabled = False
client.model = "test"
client._client = None
call_count = {"n": 0}
async def fake_chat_with_tools(messages, openai_tools):
call_count["n"] += 1
if call_count["n"] == 1:
return "", None, [
{"id": "tc1", "name": "add_phenomenon",
"arguments": '{"title":"x","description":"y"}'},
]
return "all done", None, [] # model stops calling tools
client._chat_with_tools = fake_chat_with_tools
async def add_phenomenon(title, description):
return f"recorded {title}"
tools = [
{"name": "add_phenomenon", "description": "",
"input_schema": {"type": "object", "properties": {
"title": {"type": "string"}, "description": {"type": "string"}}}},
]
executors = {"add_phenomenon": add_phenomenon}
final_text, _ = await client.tool_call_loop(
messages=[{"role": "user", "content": "go"}],
tools=tools, tool_executor=executors,
system="sys", terminal_tools=(), # NOT terminal
)
assert final_text == "all done"
assert call_count["n"] == 2 # 2 rounds — terminal_tools empty, loop continues
@pytest.mark.asyncio
async def test_report_agent_terminal_tool_declared(self):
"""ReportAgent should declare save_report as both mandatory and terminal."""
from agents.report import ReportAgent
assert ReportAgent.terminal_tools == ("save_report",)
assert ReportAgent.mandatory_record_tools == ("save_report",)
@pytest.mark.asyncio
async def test_terminal_tool_result_not_truncated(self):
"""Terminal tool's raw return is used as final_text and must NOT
be truncated to 3000 chars (the truncation cap applies only to
LLM-context tool result messages). A 20K-char markdown report
passed through save_report should reach the caller intact."""
from llm_client import LLMClient
client = LLMClient.__new__(LLMClient)
client.max_tokens = 4096
client.reasoning_effort = None
client.thinking_enabled = False
client.model = "test"
client._client = None
long_report = "# Report\n" + ("- finding " + "x" * 100 + "\n") * 200
assert len(long_report) > 10000
call_count = {"n": 0}
async def fake_chat_with_tools(messages, openai_tools):
call_count["n"] += 1
if call_count["n"] == 1:
return "", None, [
{"id": "tc1", "name": "save_report",
"arguments": '{"content":"placeholder","output_path":"r.md"}'},
]
raise AssertionError("loop should have exited")
client._chat_with_tools = fake_chat_with_tools
async def save_report(content, output_path):
return long_report # ignore content arg; return long content
tools = [{"name": "save_report", "description": "",
"input_schema": {"type": "object", "properties": {
"content": {"type": "string"}, "output_path": {"type": "string"}}}}]
executors = {"save_report": save_report}
final_text, _ = await client.tool_call_loop(
messages=[{"role": "user", "content": "go"}],
tools=tools, tool_executor=executors,
system="sys", terminal_tools=("save_report",),
)
assert final_text == long_report
assert len(final_text) > 10000 # not truncated to 3000
@pytest.mark.asyncio
async def test_dispatch_uses_hypothesis_id_when_motivating_ids_present(self, graph):
from unittest.mock import AsyncMock
from orchestrator import Orchestrator
from agent_factory import AgentFactory
h = await graph.add_hypothesis("h", "desc", created_by="t")
await graph.add_investigation_area(
area="uncovered_area", description="d", suggested_agent="registry",
expected_keywords=["xyz_no_match"], expected_tools=[],
priority=2, motivating_hypothesis_ids=[h],
created_by="llm_derive",
)
orch = Orchestrator(AsyncMock(), graph, AgentFactory(AsyncMock(), graph))
# Don't actually dispatch (would call agents) — just hit the lead-add path
# by manually replicating what _run_gap_analysis does.
covered = orch._check_coverage()
assert "uncovered_area" not in covered
# Simulate dispatch
for a in graph.investigation_areas.values():
if a.area not in covered:
await graph.add_lead(
target_agent=a.suggested_agent,
description=a.description,
priority=a.priority,
hypothesis_id=(a.motivating_hypothesis_ids[0]
if a.motivating_hypothesis_ids else None),
)
assert len(graph.leads) == 1
assert graph.leads[0].hypothesis_id == h
assert graph.leads[0].target_agent == "registry"
assert graph.leads[0].priority == 2

172
uv.lock generated
View File

@@ -2,6 +2,15 @@ version = 1
revision = 3
requires-python = ">=3.14"
[[package]]
name = "annotated-types"
version = "0.7.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" },
]
[[package]]
name = "anyio"
version = "4.13.0"
@@ -41,6 +50,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b2/fb/08b3f4bf05da99aba8ffea52a558758def16e8516bc75ca94ff73587e7d3/construct-2.10.70-py3-none-any.whl", hash = "sha256:c80be81ef595a1a821ec69dc16099550ed22197615f4320b57cc9ce2a672cb30", size = 63020, upload-time = "2023-11-29T08:44:46.876Z" },
]
[[package]]
name = "distro"
version = "1.9.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/fc/f8/98eea607f65de6527f8a2e8885fc8015d3e6f5775df186e443e0964a11c3/distro-1.9.0.tar.gz", hash = "sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed", size = 60722, upload-time = "2023-12-24T09:54:32.31Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2", size = 20277, upload-time = "2023-12-24T09:54:30.421Z" },
]
[[package]]
name = "h11"
version = "0.16.0"
@@ -110,12 +128,48 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
]
[[package]]
name = "jiter"
version = "0.14.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/6e/c1/0cddc6eb17d4c53a99840953f95dd3accdc5cfc7a337b0e9b26476276be9/jiter-0.14.0.tar.gz", hash = "sha256:e8a39e66dac7153cf3f964a12aad515afa8d74938ec5cc0018adcdae5367c79e", size = 165725, upload-time = "2026-04-10T14:28:42.01Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/4f/1e/354ed92461b165bd581f9ef5150971a572c873ec3b68a916d5aa91da3cc2/jiter-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:6f396837fc7577871ca8c12edaf239ed9ccef3bbe39904ae9b8b63ce0a48b140", size = 315277, upload-time = "2026-04-10T14:27:18.109Z" },
{ url = "https://files.pythonhosted.org/packages/a6/95/8c7c7028aa8636ac21b7a55faef3e34215e6ed0cbf5ae58258427f621aa3/jiter-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a4d50ea3d8ba4176f79754333bd35f1bbcd28e91adc13eb9b7ca91bc52a6cef9", size = 315923, upload-time = "2026-04-10T14:27:19.603Z" },
{ url = "https://files.pythonhosted.org/packages/47/40/e2a852a44c4a089f2681a16611b7ce113224a80fd8504c46d78491b47220/jiter-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce17f8a050447d1b4153bda4fb7d26e6a9e74eb4f4a41913f30934c5075bf615", size = 344943, upload-time = "2026-04-10T14:27:21.262Z" },
{ url = "https://files.pythonhosted.org/packages/fc/1f/670f92adee1e9895eac41e8a4d623b6da68c4d46249d8b556b60b63f949e/jiter-0.14.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f4f1c4b125e1652aefbc2e2c1617b60a160ab789d180e3d423c41439e5f32850", size = 369725, upload-time = "2026-04-10T14:27:22.766Z" },
{ url = "https://files.pythonhosted.org/packages/01/2f/541c9ba567d05de1c4874a0f8f8c5e3fd78e2b874266623da9a775cf46e0/jiter-0.14.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:be808176a6a3a14321d18c603f2d40741858a7c4fc982f83232842689fe86dd9", size = 461210, upload-time = "2026-04-10T14:27:24.315Z" },
{ url = "https://files.pythonhosted.org/packages/ce/a9/c31cbec09627e0d5de7aeaec7690dba03e090caa808fefd8133137cf45bc/jiter-0.14.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26679d58ba816f88c3849306dd58cb863a90a1cf352cdd4ef67e30ccf8a77994", size = 380002, upload-time = "2026-04-10T14:27:26.155Z" },
{ url = "https://files.pythonhosted.org/packages/50/02/3c05c1666c41904a2f607475a73e7a4763d1cbde2d18229c4f85b22dc253/jiter-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80381f5a19af8fa9aef743f080e34f6b25ebd89656475f8cf0470ec6157052aa", size = 354678, upload-time = "2026-04-10T14:27:27.701Z" },
{ url = "https://files.pythonhosted.org/packages/7d/97/e15b33545c2b13518f560d695f974b9891b311641bdcf178d63177e8801e/jiter-0.14.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:004df5fdb8ecbd6d99f3227df18ba1a259254c4359736a2e6f036c944e02d7c5", size = 358920, upload-time = "2026-04-10T14:27:29.256Z" },
{ url = "https://files.pythonhosted.org/packages/ad/d2/8b1461def6b96ba44530df20d07ef7a1c7da22f3f9bf1727e2d611077bf1/jiter-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cff5708f7ed0fa098f2b53446c6fa74c48469118e5cd7497b4f1cd569ab06928", size = 394512, upload-time = "2026-04-10T14:27:31.344Z" },
{ url = "https://files.pythonhosted.org/packages/e3/88/837566dd6ed6e452e8d3205355afd484ce44b2533edfa4ed73a298ea893e/jiter-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:2492e5f06c36a976d25c7cc347a60e26d5470178d44cde1b9b75e60b4e519f28", size = 521120, upload-time = "2026-04-10T14:27:33.299Z" },
{ url = "https://files.pythonhosted.org/packages/89/6b/b00b45c4d1b4c031777fe161d620b755b5b02cdade1e316dcb46e4471d63/jiter-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:7609cfbe3a03d37bfdbf5052012d5a879e72b83168a363deae7b3a26564d57de", size = 553668, upload-time = "2026-04-10T14:27:34.868Z" },
{ url = "https://files.pythonhosted.org/packages/ad/d8/6fe5b42011d19397433d345716eac16728ac241862a2aac9c91923c7509a/jiter-0.14.0-cp314-cp314-win32.whl", hash = "sha256:7282342d32e357543565286b6450378c3cd402eea333fc1ebe146f1fabb306fc", size = 207001, upload-time = "2026-04-10T14:27:36.455Z" },
{ url = "https://files.pythonhosted.org/packages/e5/43/5c2e08da1efad5e410f0eaaabeadd954812612c33fbbd8fd5328b489139d/jiter-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:bd77945f38866a448e73b0b7637366afa814d4617790ecd88a18ca74377e6c02", size = 202187, upload-time = "2026-04-10T14:27:38Z" },
{ url = "https://files.pythonhosted.org/packages/aa/1f/6e39ac0b4cdfa23e606af5b245df5f9adaa76f35e0c5096790da430ca506/jiter-0.14.0-cp314-cp314-win_arm64.whl", hash = "sha256:f2d4c61da0821ee42e0cdf5489da60a6d074306313a377c2b35af464955a3611", size = 192257, upload-time = "2026-04-10T14:27:39.504Z" },
{ url = "https://files.pythonhosted.org/packages/05/57/7dbc0ffbbb5176a27e3518716608aa464aee2e2887dc938f0b900a120449/jiter-0.14.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1bf7ff85517dd2f20a5750081d2b75083c1b269cf75afc7511bdf1f9548beb3b", size = 323441, upload-time = "2026-04-10T14:27:41.039Z" },
{ url = "https://files.pythonhosted.org/packages/83/6e/7b3314398d8983f06b557aa21b670511ec72d3b79a68ee5e4d9bff972286/jiter-0.14.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c8ef8791c3e78d6c6b157c6d360fbb5c715bebb8113bc6a9303c5caff012754a", size = 348109, upload-time = "2026-04-10T14:27:42.552Z" },
{ url = "https://files.pythonhosted.org/packages/ae/4f/8dc674bcd7db6dba566de73c08c763c337058baff1dbeb34567045b27cdc/jiter-0.14.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e74663b8b10da1fe0f4e4703fd7980d24ad17174b6bb35d8498d6e3ebce2ae6a", size = 368328, upload-time = "2026-04-10T14:27:44.574Z" },
{ url = "https://files.pythonhosted.org/packages/3b/5f/188e09a1f20906f98bbdec44ed820e19f4e8eb8aff88b9d1a5a497587ff3/jiter-0.14.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1aca29ba52913f78362ec9c2da62f22cdc4c3083313403f90c15460979b84d9b", size = 463301, upload-time = "2026-04-10T14:27:46.717Z" },
{ url = "https://files.pythonhosted.org/packages/ac/f0/19046ef965ed8f349e8554775bb12ff4352f443fbe12b95d31f575891256/jiter-0.14.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8b39b7d87a952b79949af5fef44d2544e58c21a28da7f1bae3ef166455c61746", size = 378891, upload-time = "2026-04-10T14:27:48.32Z" },
{ url = "https://files.pythonhosted.org/packages/c4/c3/da43bd8431ee175695777ee78cf0e93eacbb47393ff493f18c45231b427d/jiter-0.14.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:78d918a68b26e9fab068c2b5453577ef04943ab2807b9a6275df2a812599a310", size = 360749, upload-time = "2026-04-10T14:27:49.88Z" },
{ url = "https://files.pythonhosted.org/packages/72/26/e054771be889707c6161dbdec9c23d33a9ec70945395d70f07cfea1e9a6f/jiter-0.14.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:b08997c35aee1201c1a5361466a8fb9162d03ae7bf6568df70b6c859f1e654a4", size = 358526, upload-time = "2026-04-10T14:27:51.504Z" },
{ url = "https://files.pythonhosted.org/packages/c3/0f/7bea65ea2a6d91f2bf989ff11a18136644392bf2b0497a1fa50934c30a9c/jiter-0.14.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:260bf7ca20704d58d41f669e5e9fe7fe2fa72901a6b324e79056f5d52e9c9be2", size = 393926, upload-time = "2026-04-10T14:27:53.368Z" },
{ url = "https://files.pythonhosted.org/packages/3c/a1/b1ff7d70deef61ac0b7c6c2f12d2ace950cdeecb4fdc94500a0926802857/jiter-0.14.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:37826e3df29e60f30a382f9294348d0238ef127f4b5d7f5f8da78b5b9e050560", size = 521052, upload-time = "2026-04-10T14:27:55.058Z" },
{ url = "https://files.pythonhosted.org/packages/0b/7b/3b0649983cbaf15eda26a414b5b1982e910c67bd6f7b1b490f3cfc76896a/jiter-0.14.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:645be49c46f2900937ba0eaf871ad5183c96858c0af74b6becc7f4e367e36e06", size = 553716, upload-time = "2026-04-10T14:27:57.269Z" },
{ url = "https://files.pythonhosted.org/packages/97/f8/33d78c83bd93ae0c0af05293a6660f88a1977caef39a6d72a84afab94ce0/jiter-0.14.0-cp314-cp314t-win32.whl", hash = "sha256:2f7877ed45118de283786178eceaf877110abacd04fde31efff3940ae9672674", size = 207957, upload-time = "2026-04-10T14:27:59.285Z" },
{ url = "https://files.pythonhosted.org/packages/d6/ac/2b760516c03e2227826d1f7025d89bf6bf6357a28fe75c2a2800873c50bf/jiter-0.14.0-cp314-cp314t-win_amd64.whl", hash = "sha256:14c0cb10337c49f5eafe8e7364daca5e29a020ea03580b8f8e6c597fed4e1588", size = 204690, upload-time = "2026-04-10T14:28:00.962Z" },
{ url = "https://files.pythonhosted.org/packages/dc/2e/a44c20c58aeed0355f2d326969a181696aeb551a25195f47563908a815be/jiter-0.14.0-cp314-cp314t-win_arm64.whl", hash = "sha256:5419d4aa2024961da9fe12a9cfe7484996735dca99e8e090b5c88595ef1951ff", size = 191338, upload-time = "2026-04-10T14:28:02.853Z" },
]
[[package]]
name = "masforensics"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "httpx", extra = ["socks"] },
{ name = "openai" },
{ name = "pyyaml" },
{ name = "regipy" },
]
@@ -129,6 +183,7 @@ dev = [
[package.metadata]
requires-dist = [
{ name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
{ name = "openai", specifier = ">=2.36.0" },
{ name = "pyyaml" },
{ name = "regipy", specifier = ">=6.2.1" },
]
@@ -139,6 +194,25 @@ dev = [
{ name = "pytest-asyncio", specifier = ">=1.3.0" },
]
[[package]]
name = "openai"
version = "2.36.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "anyio" },
{ name = "distro" },
{ name = "httpx" },
{ name = "jiter" },
{ name = "pydantic" },
{ name = "sniffio" },
{ name = "tqdm" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f4/a1/4d5e84cf51720fc1526cc49e10ac1961abcccb55b0efb3d970db1e9a2728/openai-2.36.0.tar.gz", hash = "sha256:139dea0edd2f1b30c33d46ae1a6929e03906254140318e4608e98fe8c566f2e7", size = 753003, upload-time = "2026-05-07T17:33:17.075Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9d/1c/5d43735b2553baae2a5e899dcbcd0670a86930d993184d72ca909bf11c9b/openai-2.36.0-py3-none-any.whl", hash = "sha256:143f6194b548dbc2c921af1f1b03b9f14c85fed8a75b5b516f5bcc11a2a50c63", size = 1302361, upload-time = "2026-05-07T17:33:15.063Z" },
]
[[package]]
name = "packaging"
version = "26.0"
@@ -157,6 +231,62 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
]
[[package]]
name = "pydantic"
version = "2.13.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-types" },
{ name = "pydantic-core" },
{ name = "typing-extensions" },
{ name = "typing-inspection" },
]
sdist = { url = "https://files.pythonhosted.org/packages/18/a5/b60d21ac674192f8ab0ba4e9fd860690f9b4a6e51ca5df118733b487d8d6/pydantic-2.13.4.tar.gz", hash = "sha256:c40756b57adaa8b1efeeced5c196f3f3b7c435f90e84ea7f443901bec8099ef6", size = 844775, upload-time = "2026-05-06T13:43:05.343Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fd/7b/122376b1fd3c62c1ed9dc80c931ace4844b3c55407b6fb2d199377c9736f/pydantic-2.13.4-py3-none-any.whl", hash = "sha256:45a282cde31d808236fd7ea9d919b128653c8b38b393d1c4ab335c62924d9aba", size = 472262, upload-time = "2026-05-06T13:43:02.641Z" },
]
[[package]]
name = "pydantic-core"
version = "2.46.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9d/56/921726b776ace8d8f5db44c4ef961006580d91dc52b803c489fafd1aa249/pydantic_core-2.46.4.tar.gz", hash = "sha256:62f875393d7f270851f20523dd2e29f082bcc82292d66db2b64ea71f64b6e1c1", size = 471464, upload-time = "2026-05-06T13:37:06.98Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8d/74/228a26ddad29c6672b805d9fd78e8d251cd04004fa7eed0e622096cd0250/pydantic_core-2.46.4-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:428e04521a40150c85216fc8b85e8d39fece235a9cf5e383761238c7fa9b96fb", size = 2102079, upload-time = "2026-05-06T13:38:41.019Z" },
{ url = "https://files.pythonhosted.org/packages/ad/1f/8970b150a4b4365623ae00fc88603491f763c627311ae8031e3111356d6e/pydantic_core-2.46.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:23ace664830ee0bfe014a0c7bc248b1f7f25ed7ad103852c317624a1083af462", size = 1952179, upload-time = "2026-05-06T13:36:59.812Z" },
{ url = "https://files.pythonhosted.org/packages/95/30/5211a831ae054928054b2f79731661087a2bc5c01e825c672b3a4a8f1b3e/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce5c1d2a8b27468f433ca974829c44060b8097eedc39933e3c206a90ee49c4a9", size = 1978926, upload-time = "2026-05-06T13:37:39.933Z" },
{ url = "https://files.pythonhosted.org/packages/57/e9/689668733b1eb67adeef047db3c2e8788fcf65a7fd9c9e2b46b7744fe245/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:7283d57845ecf5a163403eb0702dfc220cc4fbdd18919cb5ccea4f95ee1cdab4", size = 2046785, upload-time = "2026-05-06T13:38:01.995Z" },
{ url = "https://files.pythonhosted.org/packages/60/d9/6715260422ff50a2109878fd24d948a6c3446bb2664f34ee78cd972b3acd/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8daafc69c93ee8a0204506a3b6b30f586ef54028f52aeeeb5c4cfc5184fd5914", size = 2228733, upload-time = "2026-05-06T13:40:50.371Z" },
{ url = "https://files.pythonhosted.org/packages/18/ae/fdb2f64316afca925640f8e70bb1a564b0ec2721c1389e25b8eb4bf9a299/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:cd2213145bcc2ba85884d0ac63d222fece9209678f77b9b4d76f054c561adb28", size = 2307534, upload-time = "2026-05-06T13:37:21.531Z" },
{ url = "https://files.pythonhosted.org/packages/89/1d/8eff589b45bb8190a9d12c49cfad0f176a5cbd1534908a6b5125e2886239/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7a5f930472650a82629163023e630d160863fce524c616f4e5186e5de9d9a49b", size = 2099732, upload-time = "2026-05-06T13:39:31.942Z" },
{ url = "https://files.pythonhosted.org/packages/06/d5/ee5a3366637fee41dee51a1fc91562dcf12ddbc68fda34e6b253da2324bb/pydantic_core-2.46.4-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:c1b3f518abeca3aa13c712fd202306e145abf59a18b094a6bafb2d2bbf59192c", size = 2129627, upload-time = "2026-05-06T13:37:25.033Z" },
{ url = "https://files.pythonhosted.org/packages/94/33/2414be571d2c6a6c4d08be21f9292b6d3fdb08949a97b6dfe985017821db/pydantic_core-2.46.4-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a7dd0b3ee80d90150e3495a3a13ac34dbcbfd4f012996a6a1d8900e91b5c0fb", size = 2179141, upload-time = "2026-05-06T13:37:14.046Z" },
{ url = "https://files.pythonhosted.org/packages/7b/79/7daa95be995be0eecc4cf75064cb33f9bbbfe3fe0158caf2f0d4a996a5c7/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:3fb702cd90b0446a3a1c5e470bfa0dd23c0233b676a9099ddcc964fa6ca13898", size = 2184325, upload-time = "2026-05-06T13:36:53.615Z" },
{ url = "https://files.pythonhosted.org/packages/9f/cb/d0a382f5c0de8a222dc61c65348e0ce831b1f68e0a018450d31c2cace3a5/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:b8458003118a712e66286df6a707db01c52c0f52f7db8e4a38f0da1d3b94fc4e", size = 2323990, upload-time = "2026-05-06T13:40:29.971Z" },
{ url = "https://files.pythonhosted.org/packages/05/db/d9ba624cc4a5aced1598e88c04fdbd8310c8a69b9d38b9a3d39ce3a61ed7/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:372429a130e469c9cd698925ce5fc50940b7a1336b0d82038e63d5bbc4edc519", size = 2369978, upload-time = "2026-05-06T13:37:23.027Z" },
{ url = "https://files.pythonhosted.org/packages/f2/20/d15df15ba918c423461905802bfd2981c3af0bfa0e40d05e13edbfa48bc3/pydantic_core-2.46.4-cp314-cp314-win32.whl", hash = "sha256:85bb3611ff1802f3ee7fdd7dbff26b56f343fb432d57a4728fdd49b6ef35e2f4", size = 1966354, upload-time = "2026-05-06T13:38:03.499Z" },
{ url = "https://files.pythonhosted.org/packages/fc/b6/6b8de4c0a7d7ab3004c439c80c5c1e0a3e8d78bbae19379b01960383d9e5/pydantic_core-2.46.4-cp314-cp314-win_amd64.whl", hash = "sha256:811ff8e9c313ab425368bcbb36e5c4ebd7108c2bbf4e4089cfbb0b01eff63fac", size = 2072238, upload-time = "2026-05-06T13:39:40.807Z" },
{ url = "https://files.pythonhosted.org/packages/32/36/51eb763beec1f4cf59b1db243a7dcc39cbb41230f050a09b9d69faaf0a48/pydantic_core-2.46.4-cp314-cp314-win_arm64.whl", hash = "sha256:bfec22eab3c8cc2ceec0248aec886624116dc079afa027ecc8ad4a7e62010f8a", size = 2018251, upload-time = "2026-05-06T13:37:26.72Z" },
{ url = "https://files.pythonhosted.org/packages/e8/91/855af51d625b23aa987116a19e231d2aaef9c4a415273ddc189b79a45fee/pydantic_core-2.46.4-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:af8244b2bef6aaad6d92cda81372de7f8c8d36c9f0c3ea36e827c60e7d9467a0", size = 2099593, upload-time = "2026-05-06T13:39:47.682Z" },
{ url = "https://files.pythonhosted.org/packages/fb/1b/8784a54c65edb5f49f0a14d6977cf1b209bba85a4c77445b255c2de58ab3/pydantic_core-2.46.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:5a4330cdbc57162e4b3aa303f588ba752257694c9c9be3e7ebb11b4aca659b5d", size = 1935226, upload-time = "2026-05-06T13:40:40.428Z" },
{ url = "https://files.pythonhosted.org/packages/e8/e7/1955d28d1afc56dd4b3ad7cc0cf39df1b9852964cf16e5d13912756d6d6b/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:29c61fc04a3d840155ff08e475a04809278972fe6aef51e2720554e96367e34b", size = 1974605, upload-time = "2026-05-06T13:37:32.029Z" },
{ url = "https://files.pythonhosted.org/packages/93/e2/3fedbf0ba7a22850e6e9fd78117f1c0f10f950182344d8a6c535d468fdd8/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c50f2528cf200c5eed56faf3f4e22fcd5f38c157a8b78576e6ba3168ec35f000", size = 2030777, upload-time = "2026-05-06T13:38:55.239Z" },
{ url = "https://files.pythonhosted.org/packages/f8/61/46be275fcaaba0b4f5b9669dd852267ce1ff616592dccf7a7845588df091/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:0cbe8b01f948de4286c74cdd6c667aceb38f5c1e26f0693b3983d9d74887c65e", size = 2236641, upload-time = "2026-05-06T13:37:08.096Z" },
{ url = "https://files.pythonhosted.org/packages/60/db/12e93e46a8bac9988be3c016860f83293daea8c716c029c9ace279036f2f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:617d7e2ca7dcb8c5cf6bcb8c59b8832c94b36196bbf1cbd1bfb56ed341905edd", size = 2286404, upload-time = "2026-05-06T13:40:20.221Z" },
{ url = "https://files.pythonhosted.org/packages/e2/4a/4d8b19008f38d31c53b8219cfedc2e3d5de5fe99d90076b7e767de29274f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7027560ee92211647d0d34e3f7cd6f50da56399d26a9c8ad0da286d3869a53f3", size = 2109219, upload-time = "2026-05-06T13:38:12.153Z" },
{ url = "https://files.pythonhosted.org/packages/88/70/3cbc40978fefb7bb09c6708d40d4ad1a5d70fd7213c3d17f971de868ec1f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:f99626688942fb746e545232e7726926f3be91b5975f8b55327665fafda991c7", size = 2110594, upload-time = "2026-05-06T13:40:02.971Z" },
{ url = "https://files.pythonhosted.org/packages/9d/20/b8d36736216e29491125531685b2f9e61aa5b4b2599893f8268551da3338/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:fc3e9034a63de20e15e8ade85358bc6efc614008cab72898b4b4952bea0509ff", size = 2159542, upload-time = "2026-05-06T13:39:27.506Z" },
{ url = "https://files.pythonhosted.org/packages/1d/a2/367df868eb584dacf6bf82a389272406d7178e301c4ac82545ab98bc2dd9/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:97e7cf2be5c77b7d1a9713a05605d49460d02c6078d38d8bef3cbe323c548424", size = 2168146, upload-time = "2026-05-06T13:38:31.93Z" },
{ url = "https://files.pythonhosted.org/packages/c1/b8/4460f77f7e201893f649a29ab355dddd3beee8a97bcb1a320db414f9a06e/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:3bf92c5d0e00fefaab325a4d27828fe6b6e2a21848686b5b60d2d9eeb09d76c6", size = 2306309, upload-time = "2026-05-06T13:37:44.717Z" },
{ url = "https://files.pythonhosted.org/packages/64/c4/be2639293acd87dc8ddbcec41a73cee9b2ebf996fe6d892a1a74e88ad3f7/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:3ecbc122d18468d06ca279dc26a8c2e2d5acb10943bb35e36ae92096dc3b5565", size = 2369736, upload-time = "2026-05-06T13:37:05.645Z" },
{ url = "https://files.pythonhosted.org/packages/30/a6/9f9f380dbb301f67023bf8f707aaa75daadf84f7152d95c410fd7e81d994/pydantic_core-2.46.4-cp314-cp314t-win32.whl", hash = "sha256:e846ae7835bf0703ae43f534ab79a867146dadd59dc9ca5c8b53d5c8f7c9ef02", size = 1955575, upload-time = "2026-05-06T13:38:51.116Z" },
{ url = "https://files.pythonhosted.org/packages/40/1f/f1eb9eb350e795d1af8586289746f5c5677d16043040d63710e22abc43c9/pydantic_core-2.46.4-cp314-cp314t-win_amd64.whl", hash = "sha256:2108ba5c1c1eca18030634489dc544844144ee36357f2f9f780b93e7ddbb44b5", size = 2051624, upload-time = "2026-05-06T13:38:21.672Z" },
{ url = "https://files.pythonhosted.org/packages/f6/d2/42dd53d0a85c27606f316d3aa5d2869c4e8470a5ed6dec30e4a1abe19192/pydantic_core-2.46.4-cp314-cp314t-win_arm64.whl", hash = "sha256:4fcbe087dbc2068af7eda3aa87634eba216dbda64d1ae73c8684b621d33f6596", size = 2017325, upload-time = "2026-05-06T13:40:52.723Z" },
]
[[package]]
name = "pygments"
version = "2.20.0"
@@ -243,6 +373,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/65/eb/db13ab9b8d54e04f42b6619acca417ee37b07eb141a54884d13d20d7459e/regipy-6.2.1-py3-none-any.whl", hash = "sha256:b03110e5c4e12385e1ba53c032ccd120c6dcde1b71afb8c3b7aa4717a5a24e43", size = 134861, upload-time = "2026-01-22T15:26:05.653Z" },
]
[[package]]
name = "sniffio"
version = "1.3.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a2/87/a6771e1546d97e7e041b6ae58d80074f81b7d5121207425c964ddf5cfdbd/sniffio-1.3.1.tar.gz", hash = "sha256:f4324edc670a0f49750a81b895f35c3adb843cca46f0530f79fc1babb23789dc", size = 20372, upload-time = "2024-02-25T23:20:04.057Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
]
[[package]]
name = "socksio"
version = "1.0.0"
@@ -251,3 +390,36 @@ sdist = { url = "https://files.pythonhosted.org/packages/f8/5c/48a7d9495be3d1c65
wheels = [
{ url = "https://files.pythonhosted.org/packages/37/c3/6eeb6034408dac0fa653d126c9204ade96b819c936e136c5e8a6897eee9c/socksio-1.0.0-py3-none-any.whl", hash = "sha256:95dc1f15f9b34e8d7b16f06d74b8ccf48f609af32ab33c608d08761c5dcbb1f3", size = 12763, upload-time = "2020-04-17T15:50:31.878Z" },
]
[[package]]
name = "tqdm"
version = "4.67.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
]
[[package]]
name = "typing-extensions"
version = "4.15.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
]
[[package]]
name = "typing-inspection"
version = "0.4.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/55/e3/70399cb7dd41c10ac53367ae42139cf4b1ca5f36bb3dc6c9d33acdb43655/typing_inspection-0.4.2.tar.gz", hash = "sha256:ba561c48a67c5958007083d386c3295464928b01faa735ab8547c5692e87f464", size = 75949, upload-time = "2025-10-01T02:14:41.687Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/dc/9b/47798a6c91d8bdb567fe2698fe81e0c6b7cb7ef4d13da4114b41d239f65d/typing_inspection-0.4.2-py3-none-any.whl", hash = "sha256:4ed1cacbdc298c220f1bd249ed5287caa16f34d44ef4e9c3d0cbad5b521545e7", size = 14611, upload-time = "2025-10-01T02:14:40.154Z" },
]