12 Commits

Author SHA1 Message Date
BattleTag
8b964b5dec docs(strategist) S8/9: DESIGN.md updates + DESIGN_STRATEGIST.md spec
DESIGN_STRATEGIST.md §11. The strategist refit is the first sub-design
big enough to need its own document, so it lives as a sibling to
DESIGN.md rather than inline.

DESIGN_STRATEGIST.md (new, 543 lines) covers:
  §0  Scope, non-goals, invariants preserved
  §1  Data model (Lead extension, InvestigationRound)
  §2  Six tools (graph_overview / source_coverage / marginal_yield /
      budget_status / propose_lead / declare_investigation_complete)
      with full input_schema
  §3  InvestigationStrategist agent class
  §4  Orchestrator Phase 3 loop pseudocode
  §5  Persistence + resume strategy
  §6  config schema
  §7  Test plan (8 scenarios)
  §8  9-step build order (matches commit history)
  §9  Risks + mitigations
  §10 Open questions
  §11 Required DESIGN.md updates (applied here)
  §12 What this design does NOT solve (exam-test coverage, vision-
      capable LLM, blockchain explorer, etc.)

DESIGN.md updates per §11:
  §4.5  Note harmonic damping is now landed
  §4.9  Phase 3 table row now points at the strategist loop +
        inline summary
  §5    Lead + InvestigationRound rows added to the data-model
        summary table

This commit closes the strategist refit. All 174 tests pass / 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:28:06 -10:00
BattleTag
388321ee30 feat(strategist) S7: strategist resume / open-round repair
DESIGN_STRATEGIST.md §5. Support resume from a crash mid-strategist-loop.

_resume_strategist_state inspects investigation_rounds for a tail entry
without completed_at — an "open" round, i.e. one that started but never
closed. Two repairs:

  1. Mark the round closed with strategist_action="interrupted_resume"
     so the run history reflects what actually happened.
  2. Walk that round's leads; any still in "assigned" state are
     re-marked as "failed" with failure_reason="interrupted before
     complete". The Retry-failed-leads + Gap-analysis passes that run
     after the strategist loop can pick them up.

Returns max(round_number) + 1 — the round at which to resume the loop.
On a clean graph (no prior rounds) returns 1 and makes no changes.

_phase3_strategist_loop now calls this helper before the main for-loop
and uses its return value as start_round, so a resume run lands at the
right round number rather than restarting from R1.
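The repair described above can be sketched roughly as follows. This is an illustration only: `SketchLead`, `SketchRound` and the `now` argument are hypothetical stand-ins, not the project's actual classes or signature.

```python
from dataclasses import dataclass, field


@dataclass
class SketchLead:  # hypothetical stand-in for the real Lead
    id: str
    status: str = "assigned"
    failure_reason: str = ""


@dataclass
class SketchRound:  # hypothetical stand-in for InvestigationRound
    round_number: int
    completed_at: str = ""
    strategist_action: str = ""
    leads: list = field(default_factory=list)


def resume_strategist_state(rounds, now="resumed"):
    """Repair a trailing open round; return the round number to resume at."""
    if not rounds:
        return 1  # clean graph: nothing to repair
    tail = rounds[-1]
    if not tail.completed_at:  # an "open" round: started but never closed
        tail.completed_at = now
        tail.strategist_action = "interrupted_resume"
        for lead in tail.leads:
            if lead.status == "assigned":
                lead.status = "failed"
                lead.failure_reason = "interrupted before complete"
    return max(r.round_number for r in rounds) + 1
```

Leads that had already finished keep their status; only those still "assigned" are handed to the later retry pass.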

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:27:05 -10:00
BattleTag
093f3cec1f feat(strategist) S6: config.example.yaml schema for strategist + budgets
DESIGN_STRATEGIST.md §6. Document the strategist loop's tunables so
operators can override defaults without code changes.

config.yaml itself is gitignored (it carries the API key), so this
commit adds config.example.yaml as the tracked schema reference.
The runtime reads config.yaml; operators copy the example as a
starting point.

  strategist.enabled       — default true; false routes Phase 3 through
                             the legacy fixed-round loop instead.
  strategist.max_rounds    — orchestrator cap (default 10).
  strategist.hard_stop_marginal_yield_zero_rounds — safety net for
                             over-eager strategist + zero-yielding
                             workers (default 3).
  budgets.tool_calls_total — global tool-call hard cap.
  budgets.strategist_rounds_max — informational, surfaced via
                             budget_status (orchestrator enforces
                             via strategist.max_rounds instead).
  budgets.wall_clock_minutes_max — wall-clock hard cap.

Comment out any budget cap to make it unbounded — Orchestrator's
_budget_exceeded treats missing caps as no-op.
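A minimal sketch of the "missing cap means unbounded" rule, under assumptions: the function signature, the `>=` comparison and the returned cap name are illustrative choices, not the actual `_budget_exceeded` implementation.

```python
import time


def budget_exceeded(budgets, tool_calls_used, run_start_monotonic, now=None):
    """Return the name of the first exceeded cap, or None if within budget.

    A cap absent from `budgets` (e.g. commented out in config) is skipped.
    """
    now = time.monotonic() if now is None else now
    cap = budgets.get("tool_calls_total")
    if cap is not None and tool_calls_used >= cap:
        return "tool_calls_total"
    cap = budgets.get("wall_clock_minutes_max")
    if cap is not None and (now - run_start_monotonic) / 60.0 >= cap:
        return "wall_clock_minutes_max"
    return None
```

With an empty `budgets` mapping the function never reports a breach, which is the no-op behaviour the message describes.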

Legacy max_investigation_rounds is kept as the fallback used only when
strategist.enabled is false; documented inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:26:12 -10:00
BattleTag
a103c17bdb feat(strategist) S5: Phase 3 strategist loop in orchestrator
DESIGN_STRATEGIST.md §4. Replace the fixed-round hypothesis-directed
loop with a belief-driven strategist loop that runs the strategist
agent once per round and dispatches the leads it proposes.

New helpers on Orchestrator:
  _budget_exceeded()              hard budget caps (tool_calls,
                                  wall_clock_minutes), complementing
                                  strategist self-throttling.
  _execute_strategist_lead(lead)  dispatch one lead serially; the
                                  next strategist round sees the
                                  cumulative effect of this lead's
                                  graph mutations.
  _phase3_strategist_loop()       main loop. Open round, run strategist,
                                  exit on declare_complete or empty
                                  proposals, otherwise dispatch each
                                  lead, judge new phenomena, close round,
                                  apply yield/budget checks.
  _phase3_legacy_loop()           fallback when strategist.enabled is
                                  false. Identical to the
                                  pre-DESIGN_STRATEGIST behaviour.

The run() entry point branches on strategist_cfg.enabled (default
true) and always follows up with _retry_failed_leads() + Gap
Analysis + mark_remaining_inconclusive() regardless of variant.

Orchestrator.__init__ also wires graph.budgets and
graph.run_start_monotonic from config so the budget_status tool
sees real numbers.

Integration tests use a mock strategist + mock workers to verify
declare_complete, propose_lead -> worker dispatch, zero-yield-streak
hard stop, and budget-cap-stops-the-loop.
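The control flow of the loop can be approximated as below. This is a simplified sketch: the callables `strategist` and `execute_lead`, the return labels, and the single "yield" number per lead are assumptions made for illustration; phenomenon judging and round persistence are omitted.

```python
def run_strategist_loop(strategist, execute_lead, start_round=1, max_rounds=10,
                        zero_yield_limit=3, budget_exceeded=lambda: False):
    """Run rounds until the strategist stops, yield dries up, or a cap is hit.

    strategist(round_number) -> ("complete", reason) or ("propose", [leads])
    execute_lead(lead)       -> count of new phenomena/edges produced
    """
    zero_streak = 0
    history = []  # (round_number, outcome) pairs
    for round_number in range(start_round, max_rounds + 1):
        action, leads = strategist(round_number)
        if action == "complete" or not leads:
            history.append((round_number, "stop"))
            return "strategist_stopped", history
        # Leads run serially so later leads see earlier graph mutations.
        round_yield = sum(execute_lead(lead) for lead in leads)
        history.append((round_number, round_yield))
        zero_streak = zero_streak + 1 if round_yield == 0 else 0
        if zero_streak >= zero_yield_limit:
            return "zero_yield_hard_stop", history
        if budget_exceeded():
            return "budget_exceeded", history
    return "max_rounds_reached", history
```

Passing `start_round` from the resume helper lets an interrupted run continue at the right round rather than at R1.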

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:25:04 -10:00
BattleTag
65745d21dc feat(strategist) S4: InvestigationStrategist agent
DESIGN_STRATEGIST.md §3. The smallest possible agent — its entire
output per round is one decision: propose 1-3 leads (each citing a
real hypothesis it expects to move) OR declare the investigation
complete with a reason.

Constraint surface:
  mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
  terminal_tools         = ("declare_investigation_complete",)

The agent inherits the BaseAgent forced-retry mechanism: if it returns
without calling either action tool, the orchestrator force-prompts a
RECORD-only retry. declare_complete being terminal means the
tool_call_loop short-circuits the moment the strategist decides
we're done.

_register_graph_tools overrides BaseAgent's default to skip
_register_graph_write_tools entirely — the strategist NEVER writes
phenomena, entities, edges, or hypotheses directly. All graph
mutations come from the workers it dispatches via leads. This keeps
the planning agent's responsibility surface narrow: read the graph,
choose what to do next, that's it.

The prompt walks through the workflow (call graph_overview /
marginal_yield / budget_status / source_coverage first, then take
exactly one terminal action) with decision criteria for propose vs stop.

Registered in agent_factory._AGENT_CLASSES["strategist"].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:22:05 -10:00
BattleTag
ff3a05d7ce feat(strategist) S3: propose_lead / declare_investigation_complete
DESIGN_STRATEGIST.md §2.5. The strategist's two write actions.

propose_lead validates motivating_hypothesis exists in the graph,
validates expected_evidence_type is a real edge type, validates
source_id refers to a real source in the case — fast specific
errors so the strategist gets fixable feedback rather than a
generic crash. On success, calls graph.add_lead with proposed_by=
"strategist" and round_number=graph.current_strategist_round so
the round-completion code can collect this round's leads.

declare_investigation_complete sets graph.strategist_complete_requested
which the orchestrator inspects after each strategist run to decide
whether to break the loop. reason must come from a closed enum so
the audit log is consistent.
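The propose_lead checks described above might look like this in outline. The function name, argument layout and error wording are hypothetical; the edge-type set is taken from the calibration table in DESIGN.md §4.5.

```python
# Edge types as listed in DESIGN.md §4.5 (assumed to be the valid set here).
VALID_EDGE_TYPES = {
    "direct_evidence", "supports", "consequence_observed",
    "prerequisite_met", "weakens", "contradicts",
}


def validate_proposal(hypothesis_ids, source_ids, motivating_hypothesis,
                      expected_evidence_type, source_id):
    """Return a list of specific, fixable error messages (empty if valid)."""
    errors = []
    if motivating_hypothesis not in hypothesis_ids:
        errors.append(f"unknown motivating_hypothesis {motivating_hypothesis!r}; "
                      f"known: {sorted(hypothesis_ids)}")
    if expected_evidence_type not in VALID_EDGE_TYPES:
        errors.append(f"unknown expected_evidence_type {expected_evidence_type!r}; "
                      f"valid: {sorted(VALID_EDGE_TYPES)}")
    if source_id not in source_ids:
        errors.append(f"unknown source_id {source_id!r}; known: {sorted(source_ids)}")
    return errors
```

Returning every failed check at once, with the valid alternatives listed, is one way to give the strategist feedback it can act on in a single retry.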

EvidenceGraph gains two transient run-context fields:
  current_strategist_round       — set by orchestrator at start of round
  strategist_complete_requested  — flipped by declare_complete

These are intentionally NOT persisted — they're per-run flags, not
graph state.

Both tools must be listed in InvestigationStrategist.mandatory_record_tools
(added in S4) so the agent's forced-retry mechanism kicks in if
it returns without taking a documented decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:21:13 -10:00
BattleTag
6ebbc675c1 feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status
DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses
to ground its decision each round.

  graph_overview()      — hypotheses table (log_odds, conf, edges_in,
                          distinct_sources, recent_flip), sources table,
                          pending leads. distinct_sources is the
                          critical signal: a hypothesis with 23 edges
                          but only 1 distinct_source has fragile cross-
                          source independence and is a candidate for
                          a corroboration-seeking lead.
  source_coverage(src)  — per-source ✓/✗ against an expected-artefact
                          catalogue. Catalogue is heuristic hints,
                          NOT a forced checklist. Footer reminds the
                          strategist to investigate ✗ items only when
                          an active hypothesis depends on them; this
                          is the "exam-coverage capability exists but
                          is not binding" guardrail.
  marginal_yield(N)     — new phenomena / edges / status flips per
                          recent round. Two consecutive zero-yield
                          rounds = strong signal to declare complete.
  budget_status()       — usage vs caps (tool_calls, rounds, wall
                          clock). Pacing warnings at 70% / 90%.

tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type
table of (name, detector, value_for) entries. Detectors are
substring patterns on tool name + args; the matcher resolves at
call time against graph.tool_invocations. Catalogue covers iOS /
Android / Windows disk / media-collection / archive source types.
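One possible reading of the detector matcher is sketched below. The syntax is an assumption inferred from detector strings such as `fls|mmls` and `sqlite_query@sms.db` in DESIGN_STRATEGIST.md §2.2: `|` separates alternatives and `@` separates a tool-name substring from an argument substring. The function names are hypothetical.

```python
def detector_matches(detector, tool_name, args_text):
    """True if any '|' alternative matches the tool name (and args, if given)."""
    for alternative in detector.split("|"):
        tool_part, _, arg_part = alternative.partition("@")
        # An empty arg_part is a substring of everything, so it always passes.
        if tool_part in tool_name and arg_part in args_text:
            return True
    return False


def coverage(expected, invocations):
    """Map each expected artefact name to True (touched) or False (untouched).

    expected:    list of {"name": ..., "detector": ...} entries
    invocations: list of (tool_name, args_text) pairs for one source
    """
    return {
        item["name"]: any(detector_matches(item["detector"], tool, args)
                          for tool, args in invocations)
        for item in expected
    }
```

Because matching happens at call time against the logged invocations, the catalogue itself stays a static table of hints.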

All four tools registered in tool_registry, listed as read-only in
llm_client.READ_ONLY_TOOLS for parallel execution. They go through
the invocation-logging wrapper so the strategist's reads are
themselves auditable (the wrapper does NOT cache them — graph
state changes between calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:19:54 -10:00
BattleTag
ca96f29849 feat(strategist) S1: Lead extension + InvestigationRound model
DESIGN_STRATEGIST.md §1. Foundation for the Phase 3 strategist loop.

Lead now carries four annotations that let the orchestrator measure
marginal yield per lead and dedupe strategist proposals:
  - proposed_by         (agent that proposed it: "strategist", "filesystem", …)
  - motivating_hypothesis (hyp-id the lead is meant to corroborate/refute)
  - expected_evidence_type (edge type the lead's worker should produce)
  - round_number        (0 = Phase 1 lead, ≥1 = strategist-proposed)

add_lead idempotently dedupes strategist proposals on
(motivating_hypothesis, expected_evidence_type, target_agent, source_id)
to prevent the "strategist loops on the same lead" failure mode.
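A small sketch of the idempotent dedupe, assuming a dictionary keyed on the four-field tuple. `LeadStore` and its id scheme are hypothetical and only illustrate the rule.

```python
class LeadStore:
    """Hypothetical minimal store illustrating strategist-lead dedupe."""

    def __init__(self):
        self.leads = {}             # lead_id -> lead fields
        self._strategist_keys = {}  # dedupe key -> lead_id

    def add_lead(self, title, target_agent, source_id, proposed_by="",
                 motivating_hypothesis="", expected_evidence_type="",
                 round_number=0):
        key = (motivating_hypothesis, expected_evidence_type,
               target_agent, source_id)
        if proposed_by == "strategist" and key in self._strategist_keys:
            return self._strategist_keys[key]  # idempotent: reuse existing lead
        lead_id = f"lead-{len(self.leads) + 1:03d}"
        self.leads[lead_id] = {
            "title": title, "target_agent": target_agent,
            "source_id": source_id, "proposed_by": proposed_by,
            "motivating_hypothesis": motivating_hypothesis,
            "expected_evidence_type": expected_evidence_type,
            "round_number": round_number,
        }
        if proposed_by == "strategist":
            self._strategist_keys[key] = lead_id
        return lead_id
```

The title is deliberately not part of the key, so a reworded proposal for the same hypothesis, evidence type, agent and source still collapses onto the existing lead.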

New InvestigationRound dataclass records per-round provenance: before/
after hypothesis status snapshots, phenomena + edge count deltas, and
the strategist's decision_rationale. ``new_phenomena_count``,
``new_edges_count``, ``status_flips`` are derived properties that the
marginal_yield tool will use.

start_investigation_round / complete_investigation_round /
get_investigation_round / latest_round / leads_from_round complete the
lifecycle. complete is idempotent on already-closed rounds.

Lead.from_dict is forward-compat for state files written before this
commit. InvestigationRound persists as a top-level list in
graph_state.json (auto-save + load_state both wired).

EvidenceGraph also gains graph.budgets and graph.run_start_monotonic
fields that the budget_status view (S2) will read; orchestrator
populates them in S5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:18:35 -10:00
BattleTag
8020c24776 fix(graph): harmonic damping for repeated same-edge_type evidence
First full-case run (runs/2026-05-20T20-15-04/) produced hypotheses
with log_odds +31 (8 direct_evidence + 15 supports). That's the
naive-Bayes independence assumption breaking down: 15 different
phenomena all "supporting" the same hypothesis from one source are
not 15 independent pieces of evidence, they're highly correlated.
DESIGN.md §4.5 last bullet flagged this as an "unimplemented knob"; this
commit implements it.

Rule: the k-th edge of a given (hyp_id, edge_type) contributes
log_lr_base / k instead of log_lr_base. The cumulative contribution
is log_lr_base × H_N (H_N = the N-th harmonic number), which grows
only like ln N. Single-edge hypotheses are unaffected (k=1 → /1 →
no change). Replaying the 2026-05-20 graph's 108 edges under the new
rule pulls the top hypothesis from +31.0 → +8.75 and the smallest
active hypothesis from +4.0 → +2.08.
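The figures quoted above can be reproduced with a few lines. This is a sketch, not the project's code: the per-edge-type values (+2.0 for direct_evidence, +1.0 for supports) are taken from the calibration table in DESIGN.md §4.5, and `damped_log_odds` is a hypothetical helper operating on one hypothesis.

```python
from collections import defaultdict

# log10(LR) per edge type, as listed in DESIGN.md §4.5.
LOG_LR = {"direct_evidence": 2.0, "supports": 1.0}


def damped_log_odds(edge_types, prior=0.0):
    """Sum log-LR contributions; the k-th edge of each type counts 1/k."""
    seen = defaultdict(int)
    total = prior
    for edge_type in edge_types:
        seen[edge_type] += 1
        total += LOG_LR[edge_type] / seen[edge_type]
    return total


edges = ["direct_evidence"] * 8 + ["supports"] * 15
naive = sum(LOG_LR[e] for e in edges)
damped = damped_log_odds(edges)
print(naive, round(damped, 2))  # 31.0 8.75
```

Four `supports` edges give 1 + 1/2 + 1/3 + 1/4 ≈ 2.08, matching the second figure in the message.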

Also adds rank + log_lr_base to confidence_log entries so the
math is auditable from the persisted graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:16:37 -10:00
BattleTag
f04ccd4bc7 fix(base_agent): forced-retry iter cap 10→30 + narrow tools to record+read
Timeline agent on the 2026-05-20 full run produced 0 phenomena: initial
round hit max_iterations=60 cap before recording, forced retry then hit
max_iterations=10 cap because every grounding-rejected call burns one
iteration in the new gateway. Two changes restore depth without re-
introducing the original "agent wanders off and never records" failure:

  1. Raise retry cap 10 → 30. With grounding auto-rescue (prev commit)
     most rejections heal on the first retry, but some still need 2-3
     turns; 10 is empirically too tight, 30 leaves headroom.

  2. Narrow the retry tool surface to RECORD + graph-write +
     read-only-graph-query tools. Investigation tools (list_directory,
     sqlite_query, parse_registry_key) are dropped on retry so the agent
     can't restart its search loop — the retry is explicitly "record
     what you already found, then stop".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:15:08 -10:00
BattleTag
6b485b98f7 fix(grounding): auto-rescue hallucinated invocation_id + list real ids in error
First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.

Two-layer fix in evidence_graph.validate_fact_grounding:

  Layer A (silent heal): when the cited inv-id misses, search the same
  agent / task's invocations for one whose output contains the value
  (strict or normalised substring). If exactly one matches, rewrite
  fact.invocation_id in place and accept. Multi-match is NOT auto-
  rescued — the candidate ids go back to the LLM so it picks deliberately.

  Layer B (informative retry): GroundingError now appends the agent's
  recent invocation ids and a brief tool-call summary, so the LLM has
  the real ids in front of it for the next attempt rather than
  fabricating again from memory.

Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).
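Layer A can be pictured as the following lookup. It is a sketch under assumptions: invocations are plain dicts, "normalised" is taken to mean whitespace-collapsed and lower-cased, and the function name and return shape are hypothetical.

```python
def _normalise(text):
    """Collapse whitespace and lower-case (an assumed normalisation)."""
    return " ".join(text.split()).lower()


def rescue_invocation_id(value, cited_id, invocations, agent, task_id):
    """Return (resolved_id, candidates).

    resolved_id is the cited id if it is real, the unique rescued id if exactly
    one in-scope output contains the value, else None with the candidate list.
    """
    scoped = [inv for inv in invocations
              if inv["agent"] == agent and inv["task_id"] == task_id]
    if any(inv["id"] == cited_id for inv in scoped):
        return cited_id, []  # citation is real; normal validation proceeds
    wanted = _normalise(value)
    matches = [inv["id"] for inv in scoped
               if value in inv["output"] or wanted in _normalise(inv["output"])]
    if len(matches) == 1:
        return matches[0], []  # silent heal
    return None, matches       # zero or multiple matches: let the LLM choose
```

Filtering on agent and task before searching is what keeps the cross-agent isolation intact: an identical value in another agent's output is never a rescue candidate.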

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:14:20 -10:00
BattleTag
81ade8f7ac feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source
Consolidates the long-running refit work (DESIGN.md as authoritative spec)
into a single baseline commit. Six stages landed together:

  S1  Case + EvidenceSource abstraction; tools parameterised by source_id
      (case.py, main.py multi-source bootstrap, .bin extension support)
  S2  Grounding gateway in add_phenomenon: verified_facts cite real
      ToolInvocation ids; substring / normalised match enforced; agent +
      task scope checked. Phenomenon.description split into verified_facts
      (grounded) + interpretation (free text). [invocation: inv-xxx]
      prefix on every wrapped tool result so the LLM can cite.
  S3  Confidence as additive log-odds: edge_type → log10(LR) calibration
      table; commutative updates; supported / refuted thresholds derived
      from log_odds; hypothesis × evidence matrix view.
  S4  iOS plugin: unzip_archive + parse_plist / sqlite_tables /
      sqlite_query / parse_ios_keychain / read_idevice_info;
      IOSArtifactAgent; SOURCE_TYPE_AGENTS routing.
  S5  Cross-source entity resolution: typed identifiers on Entity,
      observe_identity gateway, auto coref hypothesis with shared /
      conflicting strong/weak LR edges, reversible same_as edges,
      actor_clusters() view.
  S6  Android partition probe + AndroidArtifactAgent; MediaAgent with
      OCR fallback; orchestrator Phase 1 iterates every analysable
      source; platform-aware get_triage_agent_type; ReportAgent renders
      actor clusters + per-source breakdown.

142 unit tests / 1 skipped — full coverage of the new gateway, log-odds
math, coref hypothesis fall-out, and orchestrator multi-source dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:12:10 -10:00
28 changed files with 8153 additions and 276 deletions

317
DESIGN.md Normal file

@@ -0,0 +1,317 @@
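# MASForensics System Refit Design
> Goal: refit the current "single Windows disk forensics" system into a complex forensics system that can handle **multiple devices, multiple actors, heterogeneous evidence, and cross-source correlation**. This is the sole authoritative design document (it merges the two earlier drafts, `REFIT_PLAN.md` and `RESEARCH_DESIGN.md`).
>
> The real case that triggered this refit: the 2025 Meiya Cup qualifier, Individual track. 5 pieces of evidence (1 USB E01, 1 Android full-disk image `blk0_sda.bin`, 3 iOS extractions, 1 set of transaction screenshots), spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC.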
---
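## 1. Design Principles (invariants that hold throughout)
1. **The LLM proposes; code decides.** The LLM handles language, classification, and perception; it **does not hold case state, produce numeric values, or write unverified facts**. All "truth" lives in the symbolic layer.
2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-verified.
3. **The reasoning core is device-agnostic.** All device-specific logic lives in "capability plugins"; supporting a new device means writing a plugin, never changing the core.
4. **Operations that look irreversible (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned.
These four are not slogans: every design decision below maps to one of them.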
---
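## 2. Diagnosis of Current Problems
| # | Problem | Location | Consequence |
|---|---|---|---|
| P1 | **Single-image assumption is deeply embedded**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects only one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple pieces of evidence or correlate across devices |
| P2 | **Anti-hallucination lives only in the prompt** | `base_agent.py` system prompt | Once the LLM disobeys, wrong facts enter the case record and **cannot be identified afterwards** |
| P3 | **The confidence formula has no statistical meaning and is order-dependent**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges mix, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence cannot be calibrated or defended |
| P4 | **Artifact categorization is Windows-specific**: it relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artifacts fall into `other` |
| P5 | **Case information is hard-coded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires code changes |
| P6 | **Image discovery relies on extension globs**; `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered |
| P7 | **Phenomenon carries no source annotation** | `evidence_graph.py:85` `Phenomenon` | There is no way to tell which device a finding came from, so cross-source correlation has no anchor |
The refit addresses both "onboarding new evidence" and "fixing the inherent defects P1-P7".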
---
## 3. Target Architecture
```
case.yaml ──► Case ──► N × EvidenceSource
                         ├ id / type / owner / path
                         └ access_mode: image | tree
              ┌──────────────┴───────────────┐
         image-backed                   tree-backed
   (TSK, inode addressing)    (path addressing: mounted / unpacked)
              │                              │
              └────────────┬─────────────────┘
   SourceRegistry ── source_id → SourceHandle (resolves path/offset/mode)
   ToolRegistry ── tools registered per access_mode, bound to source_id at call time
        ┌──────────────────────┼───────────────────────┐
        ▼                      ▼                       ▼
 Knowledge-Source      Graph Write Gateway      ToolInvocationLog
 Agents (LLM) ──►      (sole write entry;       (every tool call logged:
 write the graph only   enforced precondition    args / output / sha256)
 via the gateway        = grounding)
        │                      │
        └──────────────────────┴──► Grounded Evidence Graph (GEG)
                                    Phenomenon / Hypothesis / Entity
                                    confidence = summed log-odds
```
**Kept**: the existing five-phase pipeline, disconnect recovery, run archiving, tool-result caching, and `AgentFactory` dynamic composition. These designs are sound; they are adapted, not rewritten.
---
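## 4. Core Design
### 4.1 Evidence-source abstraction (addresses P1/P5/P6/P7; the foundation)
Add `case.py`:
- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (associated person), `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory).
- **`Case`**: holds `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`.
- **`access_mode` is the key design distinction**:
  - `image`: block device / disk image, addressed by inode via TSK (USB E01, each partition of the Android `blk0_sda`).
  - `tree`: mounted filesystem or unpacked directory, addressed by path (an iOS extraction after unzipping, an archive after expansion).
- Tools are registered in families by access_mode (see 4.2). A piece of evidence can be turned from image into tree through "preparation" (e.g. partition mount, zip unpack).
`main.py` (`select_image_interactive`:91-153) changes to load/construct a `Case`;
`_IMAGE_GLOBS` is replaced by type probing (`mmls` probe + file-header sniffing) and no longer relies on extensions.
`config.yaml` drops `cfreds_hacking_case`; case information moves into `case.yaml`.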
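A minimal sketch of the two data classes, assuming only the fields named above. The `access_mode` validation, the `Case.get` lookup and the `name` field are illustrative additions, and the sample values are made up.

```python
from dataclasses import dataclass, field

ACCESS_MODES = ("image", "tree")


@dataclass
class EvidenceSource:
    id: str
    label: str
    type: str
    owner: str
    path: str
    access_mode: str
    meta: dict = field(default_factory=dict)  # e.g. partition offset

    def __post_init__(self):
        # Illustrative guard: reject anything other than image / tree.
        if self.access_mode not in ACCESS_MODES:
            raise ValueError(f"access_mode must be one of {ACCESS_MODES}, "
                             f"got {self.access_mode!r}")


@dataclass
class Case:
    name: str  # hypothetical metadata field
    sources: list = field(default_factory=list)

    def get(self, source_id):
        """Look up a source by id (hypothetical helper)."""
        for source in self.sources:
            if source.id == source_id:
                return source
        raise KeyError(source_id)
```

Loading from `case.yaml` would populate `Case.sources` from the manifest; that step is left out here because the file's schema is not shown.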
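### 4.2 Tool registration parameterized by source (addresses P1)
Current state: `register_all_tools(image_path, offset, ...)` closes a single image into every tool (`tool_registry.py:159+`). Refit:
- Tool executor signatures gain `source_id`; at execution time `SourceRegistry` resolves it to the real path/offset/mode.
- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the tool set an agent receives is determined by the source types it is responsible for.
- **"Current source" context**: the orchestrator sets a current source for the agent (analogous to the existing `graph._current_agent`) and tools act on it by default, so the LLM does not have to pass `source_id` every time (fewer mistakes). Cross-source tools (timeline merge, entity query) span sources explicitly.
- The cache key `_cache_key` (`tool_registry.py:41`) incorporates `source_id` to prevent cross-source contamination.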
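One way a source-aware cache key could be built is sketched below. The hashing scheme and function signature are assumptions; only the idea of including `source_id` in the key comes from the text.

```python
import hashlib
import json


def cache_key(source_id, tool, args):
    """Derive a stable key from source, tool name, and arguments."""
    payload = json.dumps(
        {"source_id": source_id, "tool": tool, "args": args},
        sort_keys=True,   # argument order must not change the key
        default=str,      # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same tool call against two different sources then yields two different keys, so a cached listing from one device can never be served for another.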
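### 4.3 Graph write gateway (addresses P2; implements principle 1)
Current state: agents write the graph directly through tools such as `add_phenomenon`, and the constraints live only in the prompt. Refit:
- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …) converge on **a single write gateway**, which enforces preconditions in code.
- The "anti-hallucination rules" in the current prompt are pushed down into hard checks in the gateway. The LLM agent's four-stage workflow (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway underneath the RECORD step becomes stricter.
- The `mandatory_record_tools` mechanism in `base_agent.py` is kept (it guarantees that the agent actually recorded something).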
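### 4.4 Evidence grounding constraint (addresses P2; implements principle 2)
This is the core mechanism behind the system's reliability.
**ToolInvocationLog**: every tool call leaves one record
`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`.
The existing result cache (`tool_registry.py:29`) already stores deterministic outputs; extending it into a full audit trail is enough.
**Phenomenon splits in two**, separating "facts" from "interpretation":
- `verified_facts`: `list[{type, value, invocation_id}]`, with
`type ∈ {path, timestamp, inode, hash, identifier, count, ...}`
- `interpretation`: free text, the agent's analytical narrative.
**`add_phenomenon` gateway preconditions**:
1. Every fact must cite an `invocation_id` that **really occurred within this agent and this task**.
2. Code verifies that `fact.value` appears in that invocation's output:
   - text output → verbatim substring match;
   - structured/binary tool output → match against the parsed fields.
3. If any fact fails → **the whole record is rejected**, the failing facts are returned, and the agent must correct and retry.
4. If all pass → the record is written; each entry in `verified_facts` carries an `invocation_id` (so it can be re-run and checked), and `interpretation` is marked "unverified analysis".
**Effect**: recording "a path/timestamp/hash/identifier that no tool output supports" becomes **structurally impossible** in the system. The LLM may still get `interpretation` wrong, but the report renders verified facts (citations with re-run instructions) and interpretation (explicitly labelled analysis) **separately**, so a human investigator can tell them apart at a glance. This is a reliability guarantee with honestly drawn boundaries.
> The existing `_make_auto_record` (`tool_registry.py:126`) turns tool output directly into a phenomenon.
> That is the special case of "trivial grounding" (the description is the output); the new design generalizes and formalizes it.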
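The all-or-nothing precondition can be sketched as follows for the text-output case. Everything here is illustrative: the graph is a plain list, invocations are a dict keyed by id, and the `GroundingError` shape is an assumption; structured-output matching is not covered.

```python
class GroundingError(Exception):
    """Raised when one or more facts cannot be grounded (sketch)."""

    def __init__(self, failed):
        super().__init__(f"{len(failed)} ungrounded fact(s)")
        self.failed = failed


def add_phenomenon(graph, invocations, agent, task_id,
                   verified_facts, interpretation):
    """Write a phenomenon only if every fact is grounded; else reject all."""
    failed = []
    for fact in verified_facts:
        inv = invocations.get(fact["invocation_id"])
        in_scope = (inv is not None and inv["agent"] == agent
                    and inv["task_id"] == task_id)
        # Text output only: verbatim substring match of the fact's value.
        if not in_scope or str(fact["value"]) not in inv["output"]:
            failed.append(fact)
    if failed:
        raise GroundingError(failed)  # nothing is written
    graph.append({
        "verified_facts": list(verified_facts),
        "interpretation": interpretation,
        "interpretation_status": "unverified analysis",
    })
    return len(graph) - 1
```

Rejecting the whole record rather than dropping only the bad facts keeps a phenomenon from being stored with a silently altered meaning.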
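### 4.5 Hypothesis confidence: likelihood ratio / log-odds (addresses P3)
`_DEFAULT_EDGE_WEIGHTS` at `evidence_graph.py:26` changes from "gut-feel deltas" to log-odds accumulation based on **likelihood ratios (LR)**:
- Each `Phenomenon → Hypothesis` edge represents a likelihood ratio. The LLM still performs only **discrete classification** (is this evidence direct_evidence / supports / weakens / contradicts … for this hypothesis), and the numeric `log₁₀(LR)` is looked up in a calibration table. **The LLM never emits numbers** (this continues the existing "LLM picks the type, code computes the value" philosophy and gives it a statistical footing).
- Confidence update:
```
L_post = L_prior + Σ log₁₀(LR_i) # log-odds; commutative → no order dependence
confidence = 1 / (1 + 10^(-L_post))
```
- Edge type → `log₁₀(LR)` calibration table (initial values; can later be calibrated from labelled cases):
| Edge type | log₁₀LR |
|---|---:|
| `direct_evidence` | +2.0 |
| `supports` / `consequence_observed` | +1.0 |
| `prerequisite_met` | +0.5 |
| `weakens` | -0.5 |
| `contradicts` | -2.0 |
- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are now derived from `L_post`.
- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
- **Harmonic damping of same-type evidence** (landed 2026-05): the k-th edge of the same `(hypothesis, edge_type)` contributes `log_lr_base / k`. Cumulative = `log_lr_base · H_N` (harmonic series, ~ ln N). This addresses the breakdown of the naive-Bayes independence assumption, plus the same finding being added to the graph repeatedly by multiple agents, which drove L to a runaway +31 (field data from 2026-05-20). A single edge is unchanged (k=1, damping = 1.0). **Structural signals** matter more than absolute values: the strategist judges evidential depth better from `distinct_sources` than from the confidence number.
A by-product is a **hypothesis × evidence matrix** view, used by reporting and lead selection.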
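The update rule can be exercised with a short sketch. The function names are hypothetical; the exponent is taken as negated so that positive log-odds give a confidence above 0.5, consistent with the `graph_overview` sample in DESIGN_STRATEGIST.md where positive L maps to high confidence.

```python
import math


def prior_log_odds(prior_prob=0.5):
    """Convert a prior probability to base-10 log-odds (0.5 -> 0.0)."""
    return math.log10(prior_prob / (1.0 - prior_prob))


def confidence(l_post):
    """Map base-10 log-odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-l_post))


def status(l_post, supported=0.8, refuted=0.2):
    """Derive the hypothesis status from the posterior log-odds."""
    c = confidence(l_post)
    if c >= supported:
        return "supported"
    if c <= refuted:
        return "refuted"
    return "active"


# Addition is commutative, so edge arrival order cannot change the result.
contributions = [2.0, -0.5, 1.0]
forward = prior_log_odds() + sum(contributions)
backward = prior_log_odds() + sum(reversed(contributions))
```

In log-odds terms the 0.8 threshold corresponds to L ≥ log₁₀(4) ≈ 0.60, so a single `supports` edge (+1.0) on a neutral prior is already enough to reach "supported", while a single `prerequisite_met` edge (+0.5) is not.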
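### 4.6 Cross-source entity resolution (addresses the correlation problem of "complex scenarios"; implements principle 4)
The core difficulty of complex forensics: the Apple ID in an iPhone keychain, a number in the Android SMS database, the author of a USB file, a wallet address in a transaction screenshot: **which of these point to the same actor?**
**Key design: "identity coreference" is itself a hypothesis.** Entity resolution is therefore not a separate subsystem but a reuse of the hypothesis mechanism from 4.5:
- When an agent observes an identifier, it goes through the gateway's `observe_identity` and records a **typed** identifier (strong identifiers: IMEI / wallet address / email / phone number; weak identifiers: nickname / display name), attached to a provisional `Entity`.
- "Entity A ≡ Entity B" is registered as a `Hypothesis`; a shared strong identifier = a strong +LR edge, a shared weak identifier = a weak +LR edge, conflicting strong identifiers = a strong -LR edge, all scored with the same computation as 4.5.
- **No destructive merging**: when the threshold is crossed, a `same_as` edge is added between the two Entities (backed by that coref hypothesis). At query time a `same_as` connected component is treated as one actor. **Fully reversible, auditable, and overturnable by later contradicts evidence** (implements principle 4).
- **Blocking**: coref hypotheses are created only between entity pairs that "share at least one identifier or have highly similar names", avoiding O(n²).
Cross-device timelines and "who did what, when" emerge naturally from the entity graph once it is connected through `same_as`.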
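Treating `same_as` connected components as actors can be sketched with a plain graph traversal. The function below is a hypothetical illustration, not the project's `actor_clusters()` view; entity ids are arbitrary strings.

```python
from collections import defaultdict


def actor_clusters(entity_ids, same_as_edges):
    """Group entities into connected components over same_as edges."""
    adjacency = defaultdict(set)
    for a, b in same_as_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    clusters = []
    for start in entity_ids:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Because clusters are recomputed from the edges at query time, withdrawing a `same_as` edge simply splits the component again; no merged record has to be undone.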
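### 4.7 Capability plugin layer (onboarding 5 evidence categories)
Each evidence category = a `(ingest handler, tool set, knowledge-source agent)` triple. The reasoning core is untouched.
| Plugin | Ingest | New tools | Knowledge-source agent |
|---|---|---|---|
| **iOS extraction** | `unzip` unpacks into a `tree` source | `parse_plist` (incl. binary plist), `sqlite_tables`/`sqlite_query` (sms.db, WhatsApp `ChatStorage.sqlite`, contacts), `parse_ios_keychain`, `read_idevice_info` | `iOSArtifactAgent` |
| **Android full disk** | `mmls` partitions → one `image` source per partition; can be mounted as `tree` | Reuses TSK for ext4/F2FS reads; `fsstat` to detect encryption | Reuses filesystem + `AndroidArtifactAgent` |
| **Disk image (E01)** | Already supported (TSK with ewf) | Existing TSK toolchain | Existing filesystem/registry |
| **Archive** | `unzip_archive` generic unpack | - | - |
| **Media/screenshots** | - | `ocr_image` (tesseract; note that DeepSeek has no vision capability, so OCR is required) | `MediaAgent` |
**Android risk**: the `userdata` partition of `blk0_sda` is very likely FBE-encrypted. First run `fsstat` on each partition to find out: unencrypted → use TSK directly; encrypted with no key → only non-encrypted areas such as `EFS`/`PARAM`/`system` can be analysed.
`_auto_categorize` at `tool_registry.py:80` becomes extensible: each source plugin supplies its own artifact classification table instead of a global Windows keyword table (addresses P4).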
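### 4.8 Agent reorganization
The existing 7 agents are named after Windows artifacts (registry, communication = email/IRC, network = browser/PCAP). They are reorganized by **investigative function**, with platform-specific agents added:
- `_AGENT_CLASSES` in `agent_factory.py` (:34-40) is extended with `ios_artifact`, `android_artifact`, `financial` (wallets/transactions), and `media`.
- `communication` is generalized: email + IM + SMS, cross-platform.
- A new **source type → suitable agent** mapping lets Phase 1 dispatch a triage agent per source.
- The dynamic composition mechanism of `create_specialized_agent` (:69) is kept. It was always the right way to handle capability gaps; a larger tool catalogue simply gives it a richer space to choose from.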
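### 4.9 Multi-source orchestrator pipeline
| Phase | Refit |
|---|---|
| Phase 1 | "Single-image initial survey" → **parallel per-source triage**, with a type-appropriate agent dispatched to each source |
| Phase 2 | Hypotheses are generated across sources; identity-coreference hypotheses are first registered here |
| Phase 3 | **Strategist loop**: each round an LLM meta-agent reads the graph and decides to propose_lead or declare_complete; workers execute the leads; hypothesis edges are re-judged. See `DESIGN_STRATEGIST.md` for details |
| Phase 4 | Cross-source timeline merge, with **per-source timezone normalization** (iOS UTC vs Android local time) |
| Phase 5 | One consolidated report per case: hypothesis conclusions, entity-relationship graph, and a provenance citation for every conclusion |
**"The LLM decides depth" in Phase 3** (landed after the 2026-05 field run revealed that Phase 3 fired for a single round and that log-odds inflation left all 8 pending leads undispatched): the scheduling layer moves from hard-coded decisions ("max_rounds=N, converged→stop") to being driven by an LLM meta-agent.
- The new agent `InvestigationStrategist` (`agents/strategist.py`) takes one action per round: propose 1-3 leads, or declare_investigation_complete
- 4 read-only view tools, `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`), expose the scheduling signals to the LLM
- 2 write decision tools, `propose_lead` / `declare_investigation_complete`, are the strategist's mandatory_record tools
- The orchestrator reads `config.yaml:strategist.*` + `config.yaml:budgets.*` to control max_rounds and the hard caps
- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations
Disconnect recovery and run-archiving logic are kept; `graph_state.json` gains an `investigation_rounds[]` array that persists the strategist's decision for each round.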
---
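## 5. Summary of Data-Model Changes
| Node/structure | Change |
|---|---|
| `EvidenceSource` | **New** first-class node (`src-*`) |
| `ToolInvocation` | **New** audit record (`inv-*`), persisted with the graph |
| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; the semantically ambiguous `confidence` (default 1.0) is clarified/removed, and observation reliability is expressed through grounding |
| `Hypothesis` | + `prior_prob`, `log_odds` (accumulator); `confidence` becomes a derived value |
| `Entity` | + a set of typed identifiers; connected across sources through `same_as` edges |
| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`); the k-th edge of the same `(hyp, edge_type)` is harmonically damped by `1/k` |
| Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) |
| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number` (strategist annotations) |
| `InvestigationRound` | **New**: provenance of each strategist round's decision + before/after snapshots + yield metrics |
`VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard dedup in `evidence_graph.py` are adapted accordingly.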
---
## 6. Component Change List
| File | Change |
|---|---|
| `case.py` | **New**: `Case` / `EvidenceSource` / `SourceRegistry` |
| `main.py` | Source selection changes to loading a `Case`; type probing replaces extension globs |
| `tool_registry.py` | Tools parameterized by `source_id`; cache key includes the source; `_auto_categorize` made extensible; `ToolInvocationLog` |
| `evidence_graph.py` | Data-model changes (section 5); LR/log-odds confidence; write gateway + grounding checks |
| `base_agent.py` | RECORD goes through the gateway; `add_phenomenon` switches to the `verified_facts` + `interpretation` interface |
| `agent_factory.py` | `_AGENT_CLASSES` extended; source type → agent mapping |
| `orchestrator.py` | Phase 1 per source; Phase 4 cross-source timezone normalization; Phase 5 consolidated report |
| `agents/` | New `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py`; `communication.py` generalized |
| `tools/` | New `mobile_ios.py` (plist/sqlite/keychain), `media.py` (OCR), `archive.py` (unpacking) |
| `config.yaml` / `case.yaml` | Remove `cfreds_hacking_case`; create the `case.yaml` evidence manifest |
---
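## 7. Build Order (sorted by dependency)
| Stage | Content | Depends on | Value |
|---|---|---|---|
| **S1** | 4.1 evidence-source abstraction + 4.2 tool parameterization + fix P6 | - | Foundation; first get it running on the USB E01 alone to verify that existing logic is not broken |
| **S2** | 4.3 write gateway + 4.4 grounding + ToolInvocationLog | S1 | Reliability core; "zero hallucinated records" becomes measurable |
| **S3** | 4.5 LR/log-odds confidence | Independent (can run in parallel with S2) | Fixes P3; confidence becomes defensible |
| **S4** | 4.7 iOS plugin + 4.8 agent reorganization | S1 | Coverage 1/5 → 4/5 |
| **S5** | 4.6 cross-source entity resolution | S1+S3 | Cross-device correlation; the complex-scenario capability takes shape |
| **S6** | 4.7 Android + media plugins + 4.9 orchestrator adaptation | S1+S4 | All 5 pieces of evidence onboarded |
S1+S2+S3 "make the system correct"; S4-S6 "fill out its capabilities". The order should be followed strictly: if S1 is unstable, everything built after it is a castle in the air.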
---
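## 8. Design Trade-offs and Open Questions
1. **Boundary of grounding for free text**: only the structured atoms in `verified_facts` are hard-verified; `interpretation` is not verified verbatim (an honestly drawn boundary). A second-level lint could be added that scans interpretation for strings that look like paths/timestamps/hashes but are not covered by any cited invocation, and raises a warning.
2. **Initial LR calibration values are set by hand**: get things running with the initial values from section 4.5; "learning LRs from labelled cases" is future work.
3. **Android userdata encryption**: whether a decryption key can be obtained determines the evidential depth of the 4.7 Android plugin, so this needs to be established early.
4. **Entity resolution, destructive vs reversible**: this design chooses **reversible `same_as` edges** over destructive merging, trading a little query efficiency for full auditability and rollback, in line with principle 4.
5. **Report granularity**: fixed at "one consolidated report per case", with an embedded subsection per piece of evidence + cross-source correlation, rather than a standalone report per piece of evidence.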

543
DESIGN_STRATEGIST.md Normal file

@@ -0,0 +1,543 @@
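# Strategist Loop: Belief-Driven Refit of Phase 3
> This is a supplementary design document to DESIGN.md, covering the concrete rewrite of the orchestrator's Phase 3 (§4.9).
>
> **Trigger**: the first full 6-source field run on 2026-05-20 (`runs/2026-05-20T20-15-04/`) showed that Phase 3 did not work: none of the 8 pending leads was dispatched, because log-odds inflation made every hypothesis converge immediately. Even after the "harmonic damping" fix to the log-odds math (committed in `evidence_graph.py:update_hypothesis_confidence`), Phase 3 under the current architecture is still a mechanical "single-round trigger, rule-based convergence" process in which the LLM has no say at the scheduling layer. This design turns Phase 3 into an LLM-driven exploration loop.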
---
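## 0. Scope
### In scope
`orchestrator.py:Phase 3` is refit from "single-round, rule-triggered" to "strategist-loop, belief-driven":
a new `InvestigationStrategist` agent + 4 decision-view tools + 2 decision-action tools
+ a rewritten orchestrator loop.
### Out of scope
- No changes to Phase 1 (per-source triage stays as is)
- No changes to Phase 2 (HypothesisAgent is untouched; the strategist may **call** it but does not replace it)
- No changes to Phase 4/5 (timeline / report)
- No expert-level per-source checklists (only a **soft-hint** list inside the `source_coverage` tool)
- No new graph node types; leads reuse the existing structure
### Invariants preserved
- DESIGN.md §4.3 grounding gateway; all writes still go through it
- DESIGN.md §4.5 log-odds + harmonic damping
- DESIGN.md §4.4 boundary between verified_facts and interpretation
- Disconnect recovery (`graph_state.json` serialization stays compatible)
### Design principles
1. **"The LLM proposes, code decides" moves up to the scheduling layer**: DESIGN.md's first principle is currently honoured only at the fact layer (grounding); at the scheduling layer, "whether to dig deeper, where, and when to stop" is still a hard-coded decision. This design gives the LLM authority over scheduling decisions.
2. **Exam-coverage capability exists but is not binding**: the system's tool set and soft-hint list cover the artifact categories that exam scenarios require; but whether to examine a given artifact, and how deeply, is decided by the strategist from the nature of the specific case rather than forced by a predefined checklist.
3. **Explainable and auditable**: every strategist round's decision, motivation, and resulting yield is recorded in a persisted `InvestigationRound` and can be reviewed after the fact.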
---
## 1. Data-Model Changes
### 1.1 `Lead` gains 4 fields
`evidence_graph.py:Lead` currently has `(id, title, description, target_agent, source_id, status, …)`.
New fields:
```python
from dataclasses import dataclass


@dataclass
class Lead:
    # ... existing fields
    proposed_by: str = ""             # "strategist" | "filesystem" | ... (proposing agent)
    motivating_hypothesis: str = ""   # hyp-id this lead is meant to corroborate/refute
    expected_evidence_type: str = ""  # one of edge_types (edge type the lead should produce)
    round_number: int = 0             # strategist round that produced this lead
```
`motivating_hypothesis` is the key field: it explicitly ties a lead to a hypothesis, so that afterwards one can compute
"did running this lead actually change the hypothesis status", i.e. the strategist's marginal-yield metric.
### 1.2 New `InvestigationRound` node
Records each strategist round's decision itself, since provenance must be auditable too:
```python
from dataclasses import dataclass, field


@dataclass
class InvestigationRound:
    id: str  # "round-001"
    round_number: int
    started_at: str
    completed_at: str = ""
    strategist_action: str = ""  # "propose_leads" | "declare_complete"
    leads_proposed: list[str] = field(default_factory=list)
    leads_executed: list[str] = field(default_factory=list)
    hypothesis_status_snapshot_before: dict = field(default_factory=dict)  # hyp_id → status
    hypothesis_status_snapshot_after: dict = field(default_factory=dict)
    new_phenomena_count: int = 0
    new_edges_count: int = 0
    decision_rationale: str = ""  # the strategist's own explanation
```
Serialized together with the graph (added to `to_dict`/`from_dict`).
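The before/after snapshots make a per-round "status flips" metric straightforward to derive. The helpers below are a hypothetical sketch operating on plain `hyp_id → status` dicts; treating a hypothesis that is new in the "after" snapshot as a flip is an assumption.

```python
def status_flips(before, after):
    """Return {hyp_id: (old_status, new_status)} for every changed hypothesis.

    A hypothesis absent from `before` is reported with old_status None.
    """
    return {
        hyp_id: (before.get(hyp_id), new_status)
        for hyp_id, new_status in after.items()
        if before.get(hyp_id) != new_status
    }


def is_zero_yield(new_phenomena_count, new_edges_count, before, after):
    """True if the round produced no phenomena, no edges, and no flips."""
    return (new_phenomena_count == 0 and new_edges_count == 0
            and not status_flips(before, after))
```

A run of zero-yield rounds computed this way is the kind of signal the `marginal_yield` tool can surface to the strategist.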
---
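## 2. New Tools
These live in a new file, `tools/strategy.py`, and are registered following the existing `TOOL_CATALOG` pattern.
### 2.1 `graph_overview()`: global situation (read-only)
**Signature**: `graph_overview() -> str`
**Output**: markdown (easier for an LLM to read than JSON):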
```markdown
# Investigation State
## Hypotheses (8)
| id | title | L | conf | status | edges_in | distinct_sources | flipped_in_last_2_rounds |
|----|-------|---|------|--------|----------|------------------|---------------------------|
| hyp-83db8748 | Multi-Device Composite | +8.75 | 0.99 | supported | 23 | 1 | no |
| hyp-daa7c704 | Multiple Identity Aliases | +9.21 | 0.99 | supported | 11 | 3 | no |
| hyp-7fa9b13e | Sunny.zip contains timer_a | +2.08 | 0.89 | supported | 4 | 1 | yes (active→supported in R2) |
| ...
## Sources (6)
| id | type | phenomena | identities | last_touched_in_round |
| src-usb-leung | disk_image | 8 | 1 | R1 |
| ...
## Pending Leads (3)
| id | from | targeting | for_hypothesis | reason |
| lead-aaa | filesystem | src-ios-chan/Safari | hyp-83db8748 | Safari history likely contains device-switching evidence |
```
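**Key annotation**: the `distinct_sources` column exposes "this hypothesis is supported by
only one source". A strategist that sees all 23 edges coming from the android source will
conclude on its own that independent evidence from elsewhere is needed.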
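A sketch of how the `edges_in` and `distinct_sources` columns might be derived, assuming each supporting edge can be reduced to a `(hypothesis_id, source_id)` pair. That shape, and both function names, are assumptions for illustration; the real graph API may differ.

```python
from collections import defaultdict


def edge_support_summary(edges):
    """edges: iterable of (hypothesis_id, source_id) pairs (assumed shape)."""
    edges_in = defaultdict(int)
    sources = defaultdict(set)
    for hyp_id, source_id in edges:
        edges_in[hyp_id] += 1
        sources[hyp_id].add(source_id)
    return {
        hyp_id: {
            "edges_in": edges_in[hyp_id],
            "distinct_sources": len(sources[hyp_id]),
        }
        for hyp_id in edges_in
    }


def single_source_hypotheses(summary):
    """Hypotheses whose support comes from exactly one source."""
    return sorted(h for h, s in summary.items() if s["distinct_sources"] == 1)
```

Hypotheses returned by `single_source_hypotheses` are the ones for which cross-source corroboration is most likely worth a lead.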
### 2.2 `source_coverage(source_id: str)`: per-source coverage (read-only)
**Signature**: `source_coverage(source_id: str) -> str`
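**Implementation**: scan `graph.tool_invocations`, keep the entries whose `source_id` matches
the given source, and group them by tool name + main args. Then compare against
`EXPECTED_ARTEFACTS[source_type]` and mark the items not yet touched with ✗.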
```python
# tools/strategy.py
EXPECTED_ARTEFACTS: dict[str, list[dict]] = {
    "disk_image+windows": [
        {"name": "filesystem layout", "detector": "fls|mmls", "value_for": "deleted files, hidden partitions"},
        {"name": "registry hives", "detector": "parse_registry_key", "value_for": "user activity, installed software"},
        {"name": "browser history", "detector": "list_directory@AppData/.../History", "value_for": "URL access, downloads"},
        {"name": "prefetch", "detector": "extract_file@Windows/Prefetch", "value_for": "program execution evidence"},
        # ...
    ],
    "mobile_extraction": [
        {"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb", "value_for": "contacts"},
        {"name": "SMS messages", "detector": "sqlite_query@sms.db", "value_for": "messaging content"},
        {"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite", "value_for": "WhatsApp content"},
        {"name": "Call history", "detector": "sqlite_query@CallHistoryDB", "value_for": "call records"},
        {"name": "Safari history", "detector": "sqlite_query@History.db|read_text@Bookmarks.plist", "value_for": "web browsing"},
        {"name": "Photos library", "detector": "sqlite_query@Photos.sqlite", "value_for": "photo metadata, EXIF, geolocation"},
        {"name": "iCloud accounts", "detector": "parse_plist@Accounts3.sqlite|parse_keychain", "value_for": "Apple ID, services"},
        {"name": "App inventory", "detector": "list_directory@var/containers/Bundle/Application", "value_for": "installed apps"},
    ],
    "disk_image+android": [...],
    "media_collection": [
        {"name": "OCR text", "detector": "ocr_image", "value_for": "screenshot text"},
        {"name": "EXIF metadata", "detector": "exif_image", "value_for": "device, timestamps, geolocation"},
    ],
}
```
**Soft-hint semantics**: the output must always end with this sentence:
> Coverage hints are heuristics, not requirements. Skip an item if the case theory
> makes it irrelevant. Investigate ✗ items only when they could materially affect
> an active hypothesis.
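This sentence is **the key to "exam capability exists but is not binding"**: on seeing a ✗
the LLM does not dispatch blindly; it first looks at the hypothesis list and asks "does
this artefact matter to any current hypothesis?"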
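The detector strings suggest a small matching grammar: alternatives separated by `|`, each either a bare tool name or `tool@path_fragment`. The sketch below implements that reading. The grammar itself, the invocation shape (`{"tool": ..., "args": ...}`), the ✓ marker for touched items, and both function names are assumptions inferred from the table, not a confirmed implementation.

```python
def detector_matches(detector: str, invocations: list[dict]) -> bool:
    """True if any alternative in the detector is satisfied by an invocation."""
    for alternative in detector.split("|"):
        tool, _, fragment = alternative.partition("@")
        for inv in invocations:
            if inv["tool"] != tool:
                continue
            # A bare tool name matches any call; otherwise the path fragment
            # must appear somewhere in the call's arguments.
            if not fragment or fragment in str(inv.get("args", "")):
                return True
    return False


def coverage_lines(expected: list[dict], invocations: list[dict]) -> list[str]:
    """One line per expected artefact: ✓ if touched, ✗ if not."""
    return [
        f"{'✓' if detector_matches(item['detector'], invocations) else '✗'} {item['name']}"
        for item in expected
    ]
```

Substring matching on stringified args is deliberately loose; since the list is a soft hint, a false ✗ costs little.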
### 2.3 `marginal_yield(last_n_rounds: int = 2)`: marginal yield (read-only)
**Signature**: `marginal_yield(last_n_rounds: int = 2) -> str`
**Implementation**: scan the most recent N `InvestigationRound`s and count:
- new phenomena per round
- new P→H edges per round
- hypothesis status flips per round (active→supported, or the reverse)
**Output**:
```markdown
# Marginal Yield (last 2 rounds)
| round | new_phenomena | new_edges | status_flips |
| R3 | 5 | 7 | 1 |
| R4 | 2 | 1 | 0 |
Trend: decelerating (R4 yield 33% of R3).
Recommendation interpretation aid: yield trending to zero suggests diminishing
returns; consider declare_complete after one more probe.
```
The last line is LLM-friendly heuristic prose, not a mandatory signal.
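A sketch of how the trend label could be derived. It assumes each round is reduced to a dict with the three counters, and it collapses them into a single unweighted sum; that scoring rule, like the function names, is an assumption, and the percentage in the sample output above may be computed differently.

```python
def round_yield(r: dict) -> int:
    """Unweighted sum of the three per-round counters (assumed scoring)."""
    return r["new_phenomena"] + r["new_edges"] + r["status_flips"]


def yield_trend(rounds: list[dict]) -> str:
    """Compare the last two rounds and return a coarse trend label."""
    if len(rounds) < 2:
        return "insufficient data"
    prev, last = round_yield(rounds[-2]), round_yield(rounds[-1])
    if last == 0:
        return "flat at zero" if prev == 0 else "dropped to zero"
    if last > prev:
        return "accelerating"
    if last == prev:
        return "steady"
    return "decelerating"
```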
### 2.4 `budget_status()`: budget view (read-only)
**Signature**: `budget_status() -> str`
```markdown
# Budget Status
| metric | used | cap | pct |
| tool_calls | 1248 | 5000 | 25% |
| strategist_rounds | 3 | 10 | 30% |
| wall_clock_minutes | 142 | 360 | 39% |
Phase 1 used 89% of allocated. Phase 2 used 4%. Phase 3 (strategist) so far: 7%.
```
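Budgets are read from config.yaml; the new fields are listed in §6. With no budget configured,
the loop runs in unbounded mode (relying only on the strategist declaring complete itself,
plus a hard safety cap).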
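A sketch of the table rendering, including the unbounded case. `budget_rows` is a hypothetical helper; the assumption is that usage and caps arrive as two plain dicts keyed by metric name, and that a missing or zero cap means "no limit configured".

```python
def budget_rows(used: dict, caps: dict) -> list[str]:
    """Render one markdown row per metric; metrics without a cap are unbounded."""
    rows = ["| metric | used | cap | pct |"]
    for metric, value in used.items():
        cap = caps.get(metric)
        if cap:
            pct = round(100 * value / cap)
            rows.append(f"| {metric} | {value} | {cap} | {pct}% |")
        else:
            rows.append(f"| {metric} | {value} | unbounded | n/a |")
    return rows
```

Showing `unbounded` explicitly, rather than omitting the row, keeps the strategist aware that the metric is still being counted.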
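### 2.5 Decision-action tools (write)
Registered in the strategist's `mandatory_record_tools`. The strategist must call at least
one of them every round, otherwise the forced retry fires (reusing the existing mechanism).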
**`propose_lead(...)`**
```python
{
    "name": "propose_lead",
    "input_schema": {
        "type": "object",
        "required": [
            "description", "target_agent",
            "motivating_hypothesis", "expected_evidence_type",
        ],
        "properties": {
            "description": {
                "type": "string",
                "description": "1-2 sentence specific investigation request, including target source/artefact",
            },
            "target_agent": {
                "type": "string",
                "enum": ["filesystem","registry","communication","network","ios_artifact","android_artifact","media"],
            },
            "source_id": {"type": "string", "description": "which source to investigate"},
            "motivating_hypothesis": {
                "type": "string",
                "description": "hyp-id this lead is meant to corroborate or refute",
            },
            "expected_evidence_type": {
                "type": "string",
                "enum": ["direct_evidence","supports","contradicts","weakens","prerequisite_met","consequence_observed"],
            },
            "rationale": {"type": "string", "description": "why this fills a real gap"},
        }
    }
}
```
**`declare_investigation_complete(...)`**
```python
{
    "name": "declare_investigation_complete",
    "input_schema": {
        "type": "object",
        "required": ["reason"],
        "properties": {
            "reason": {
                "type": "string",
                "enum": [
                    "marginal_yield_zero",
                    "budget_exhausted",
                    "all_hypotheses_resolved",
                    "coverage_saturated",
                    "other",
                ],
            },
            "rationale": {"type": "string"},
        }
    }
}
```
Terminal tool: calling it ends the loop immediately (reusing the existing `terminal_tools` mechanism).
---
## 3. `InvestigationStrategist` agent
New file `agents/strategist.py`, roughly 150 lines.
```python
class InvestigationStrategist(BaseAgent):
    name = "strategist"
    role = (
        "You are the investigation strategist. You do not run forensic tools yourself. "
        "Your job is to read the current evidence graph and decide ONE of:\n"
        "  (a) propose 1-3 new investigation leads that would materially affect an active hypothesis, or\n"
        "  (b) declare the investigation complete.\n"
        "\n"
        "Use graph_overview / source_coverage / marginal_yield / budget_status to ground your judgment. "
        "DO NOT propose a lead that just adds more same-direction evidence to an already-supported hypothesis "
        "(harmonic damping makes it ~useless). DO propose leads when:\n"
        "  - A hypothesis is supported by edges from only ONE source — get cross-source corroboration.\n"
        "  - A hypothesis is in the active band (0.2 < conf < 0.8) — it needs the deciding evidence.\n"
        "  - A specific high-value artefact is uncovered on a source where the active hypotheses suggest it matters.\n"
        "\n"
        "Declare complete when marginal_yield is approaching zero AND no remaining active hypotheses have "
        "obvious investigation paths."
    )
    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
    terminal_tools = ("declare_investigation_complete",)

    def _register_graph_tools(self):
        # Read-only tools: the strategist NEVER writes phenomena/edges directly.
        # All graph writes happen via the workers it dispatches.
        self._register_graph_read_tools()
        # No graph_write_tools.
        # Add strategy-specific tools:
        for tool_name in (
            "graph_overview", "source_coverage", "marginal_yield", "budget_status",
            "propose_lead", "declare_investigation_complete",
        ):
            td = TOOL_CATALOG[tool_name]
            self.register_tool(td.name, td.description, td.input_schema, td.executor)
```
Registered as `agent_factory._AGENT_CLASSES["strategist"]`.
---
## 4. Orchestrator changes
### 4.1 Delete/replace: the current Phase 3
The current logic in `orchestrator.py:Phase 3` (about 150 lines): check leads → dispatch
workers → check converged → exit. **Delete it.**
### 4.2 New Phase 3: strategist loop
```python
async def _phase3_strategist_loop(self, run_dir: Path) -> None:
    """Belief-driven investigation: strategist proposes, workers execute, repeat."""
    _log("Phase 3: Strategist-Driven Investigation", event="phase")
    strategist = self.factory.get_or_create_agent("strategist")
    max_rounds = self.config.get("budgets", {}).get("strategist_rounds_max", 10)

    for round_num in range(1, max_rounds + 1):
        # 1. Record round start + snapshot
        rid = await self.graph.start_investigation_round(round_num)

        # 2. Strategist run
        _log(f"Strategist Round {round_num}", event="phase")
        await strategist.run(
            f"Review the graph and decide the next investigation action. "
            f"This is round {round_num}/{max_rounds}. Budget used so far: see budget_status."
        )

        # 3. Did strategist declare complete?
        if self.graph.is_round_terminal(rid):
            _log(f"Strategist declared complete at round {round_num}", event="progress")
            break

        # 4. Collect new leads proposed this round
        new_leads = self.graph.leads_from_round(round_num)
        if not new_leads:
            _log(f"No leads proposed in round {round_num} — stopping", event="progress")
            break

        # 5. Dispatch each lead
        for lead in new_leads:
            await self._execute_lead(lead, round_num)

        # 6. Close round + record yield
        await self.graph.complete_investigation_round(rid)

        # 7. Hard budget check
        if self._budget_exceeded():
            _log(f"Budget exhausted at round {round_num}", event="progress")
            break
```
### 4.3 `_execute_lead` reuses the existing worker dispatch logic
```python
async def _execute_lead(self, lead: Lead, round_num: int) -> None:
    agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
    worker = self.factory.get_or_create_agent(agent_type)
    if worker is None:
        logger.warning(f"No worker for lead {lead.id}: {agent_type}")
        return

    src = self.graph.case.get_source(lead.source_id) if lead.source_id else None
    if src:
        self.graph.set_active_source(src)

    _log(
        f"Round {round_num} dispatching: {lead.description}",
        event="dispatch", agent=agent_type,
    )
    await worker.run(
        f"Investigate this specific lead from the strategist:\n\n"
        f"REQUEST: {lead.description}\n"
        f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis}\n"
        f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type}\n"
        f"RATIONALE: {lead.rationale}\n\n"
        f"After investigating, record findings via add_phenomenon AND link relevant phenomena "
        f"to {lead.motivating_hypothesis} via the appropriate edge_type."
    )
    lead.status = "completed"
    self.graph._auto_save()
```
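### 4.4 Automatic hypothesis regeneration (optional, recommended)
New phenomena may give rise to **new hypotheses** (not just updates to existing ones). Let the
strategist trigger this explicitly with
`propose_lead(target_agent="hypothesis", description="re-examine recent phenomena for new hypotheses")`.
This is the strategist's own decision, not a timer; consistency is preferred over automatic scheduling.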
---
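## 5. State persistence
`graph_state.json` gains a new top-level key `investigation_rounds: list[InvestigationRound]`,
handled by `save_state` / `load_state`. On **crash recovery**:
- find the most recent round that is not completed → treat that round as failed
- restart from the next round
- phenomena / edges from completed rounds are naturally retained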
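The recovery rule can be sketched as follows, assuming rounds are held as a list of dicts in round order. `repair_open_round` is a hypothetical helper; the `interrupted_resume` label follows the S7 commit message, and everything else about the shape is an assumption.

```python
def repair_open_round(rounds: list[dict], now: str) -> int:
    """Close a trailing open round (if any) and return the round to resume from."""
    if not rounds:
        return 1  # nothing recorded yet: start at round 1
    tail = rounds[-1]
    if not tail.get("completed_at"):
        # The round started but never closed: mark it so the history
        # reflects the interruption instead of silently dropping it.
        tail["completed_at"] = now
        tail["strategist_action"] = "interrupted_resume"
    return tail["round_number"] + 1
```

Only the tail entry is inspected, on the assumption that every earlier round was closed before the next one started.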
---
## 6. Configuration
`config.yaml` gains:
```yaml
strategist:
  enabled: true                 # false = fall back to the old Phase 3 logic (safety fallback)
  max_rounds: 10
  hard_stop_marginal_yield_zero_rounds: 3   # force a stop after 3 consecutive rounds with yield=0
budgets:
  tool_calls_total: 5000
  wall_clock_minutes_max: 480
```
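A sketch of how the loader might fill in defaults once the YAML has been parsed into a dict. `DEFAULTS` and `resolve_strategist_config` are hypothetical; the default values simply mirror the example above, and treating an empty `budgets` section as unbounded mode follows §2.4.

```python
DEFAULTS = {
    "strategist": {
        "enabled": True,
        "max_rounds": 10,
        "hard_stop_marginal_yield_zero_rounds": 3,
    },
    "budgets": {},  # empty means no caps configured
}


def resolve_strategist_config(raw: dict) -> dict:
    """Overlay user-supplied values on the defaults, section by section."""
    cfg = {
        section: {**defaults, **(raw.get(section) or {})}
        for section, defaults in DEFAULTS.items()
    }
    # No budget keys at all: the loop runs in unbounded mode.
    cfg["unbounded"] = not cfg["budgets"]
    return cfg
```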
---
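## 7. Test strategy
New file `tests/test_strategist.py`, or add to `test_optimizations.py`. At minimum, test:
1. The loop exits immediately when the strategist calls `declare_complete`
2. When the strategist calls `propose_lead`, the lead enters the graph with the correct round_number
3. Round snapshots correctly capture before/after status
4. When the budget is exhausted, the loop is forced to stop even if the strategist wants to continue
5. Crash recovery: after a mid-run interruption, a restart begins from the next round
6. `graph_overview` output includes the `distinct_sources` annotation
7. `source_coverage` marks untouched items with ✗
8. `marginal_yield` numbers agree with `confidence_log`
No LLM integration tests: strategist behavior is verified through a mock LLM (this pattern
already exists, see `test_forced_record_retry_fires_when_zero_phenomena`).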
---
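## 8. Implementation order
Ordered by dependency (**one independent commit per step**: this is a structural refit, so single-point rollback is essential):
| Step | Content | Depends on | Effort estimate |
|---|---|---|---|
| 1 | `Lead` gains 4 fields + `InvestigationRound` dataclass + serialization | none | 60 lines + tests |
| 2 | `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` implementation | 1 | 250 lines + tests |
| 3 | `propose_lead` / `declare_investigation_complete` tools | 1 | 80 lines + tests |
| 4 | `InvestigationStrategist` agent class | 2, 3 | 120 lines + tests |
| 5 | Orchestrator Phase 3 rewrite | 4 | 150 lines (replacing ~50 old lines) + tests |
| 6 | config schema + loading logic | 5 | 30 lines |
| 7 | Crash-recovery handling | 5 | 40 lines + tests |
| 8 | Real-case smoke run (small scale, USB only) | 7 | 0 code |
| 9 | Docs: rewrite DESIGN.md §4.9 + archive this file | 8 | docs |
Total: ~800 lines of new code + tests + docs.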
---
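## 9. Risks + mitigations
| Risk | Mitigation |
|---|---|
| Strategist too conservative (always declare_complete) | Add prompt examples showing what a "worth going deeper" situation looks like; validate on small samples during testing |
| Strategist too aggressive (proposes 7+ leads every round) | `propose_lead` tool schema caps each round at 3-5 leads; prompt stresses "quality over quantity" |
| A single worker cannot finish its lead, causing a budget avalanche | Worker calls keep their own unchanged max_iter; the strategist budget is separate |
| LLM does not pick up on hints like `distinct_sources` | Append 1-2 plain-English sentences of interpretation to `graph_overview`, e.g. "Hypothesis X has 23 edges but all from one source → cross-source corroboration would strengthen it" |
| Leads produced during Phase 1 are ignored by the strategist | Strategist prompt states explicitly: "handle existing pending leads first, then produce new ones" |
| Infinite loop (strategist keeps producing the same lead) | Deduplicate the Lead table on the `(motivating_hyp, expected_type, source_id)` triple |
| Maintenance cost of the `EXPECTED_ARTEFACTS` list | Deliberately kept as "soft hints": an incomplete list breaks nothing, it only means some depth relies more on the LLM's own initiative |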
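The triple-based deduplication from the infinite-loop row could be as small as the sketch below. It assumes leads arrive as dicts carrying the `propose_lead` field names; `dedup_key` and `accept_lead` are hypothetical helpers.

```python
def dedup_key(lead: dict) -> tuple:
    """(motivating hypothesis, expected evidence type, source) identifies a lead."""
    return (
        lead["motivating_hypothesis"],
        lead["expected_evidence_type"],
        lead.get("source_id", ""),
    )


def accept_lead(lead: dict, seen: set) -> bool:
    """Return False for a repeat of an already-proposed triple."""
    key = dedup_key(lead)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Keeping `seen` for the whole run, rather than per round, is what stops the strategist from re-proposing a lead that already ran in an earlier round.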
---
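## 10. Open questions
1. **Should InvestigationRound run the hypothesis agent itself?**
   Leaning toward the strategist triggering it explicitly via a lead (better consistency), with no timer-based trigger.
2. **What happens on budget overrun: hard stop vs soft warning?**
   The current design hard-stops; a schema enforcement could be added so that "when the
   strategist sees budget < 10% it can only declare_complete".
3. **Should a cross-source "independence bonus" be folded into log-odds?**
   The earlier damping used `1/k` and did not distinguish cross-source from same-source edges.
   If it is to be included, the formula should become
   `1/k_within_source × bonus_for_distinct_sources`. This is separate follow-up work.
4. **Should the strategist's `rationale` output go through grounding?**
   It does not write phenomena, but the `rationale` field may contain concrete values
   ("based on inv-12345..."). Leaning toward not enforcing it: this is a meta-level judgment, not a grounded fact.
5. **Keep or delete the current Phase 3's `max_investigation_rounds` config?**
   Recommend keeping it as the fallback knob for `strategist.enabled=false`.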
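To make open question 3 concrete, here is one possible reading of the two damping schemes. This is a sketch under assumptions, not the project's formula: it assumes the k-th same-direction edge contributes `weight / k`, and the way `bonus` is applied (once per additional distinct source) is purely hypothetical.

```python
from collections import Counter


def damped_total(weights):
    """Pooled 1/k damping: the k-th edge contributes weight / k."""
    return sum(w / k for k, w in enumerate(weights, start=1))


def damped_total_per_source(edges, bonus=1.0):
    """Per-source damping: k restarts for each source.

    edges: (source_id, weight) pairs. `bonus` multiplies the total once per
    additional distinct source (hypothetical reading of the bonus term).
    """
    seen = Counter()
    total = 0.0
    for source_id, weight in edges:
        seen[source_id] += 1
        total += weight / seen[source_id]
    return total * bonus ** (len(seen) - 1)
```

Under this reading, two unit-weight edges from the same source sum to 1.5, while one edge from each of two sources sums to 2.0 even with `bonus=1.0`, so restarting `k` per source already rewards independence before any explicit bonus.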
---
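## 11. Relationship to DESIGN.md
Once this document lands, DESIGN.md needs the following corresponding updates:
- **§4.5**: add a paragraph: "also look at the **structure** of log_odds: the number of
  edges_in / distinct_sources is a key signal for the strategist's decision on whether to go
  deeper, not just the confidence value"
- **§4.9 Phase 3**: change the table content from "leads dispatched to source-aware agents" to
  "strategist loop: read the graph, propose, execute, review, stop / continue"
- **§8** (design trade-offs): add a 6th item: "the trade-off of LLM-driven scheduling: the
  strategist decides depth, but each round's budget is hard-limited by `budgets.*`; this is how
  the 'LLM proposes, code adjudicates' principle is honored at the scheduling layer"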
---
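## 12. Memo: problems this design does **not** solve
- The root cause of the 8% hit rate on exam questions is an **incomplete toolset** (no vision,
  no ZIP brute-forcing, no VeraCrypt mounting, no blockchain explorer), not a scheduling problem.
  The strategist makes the existing tools get used harder, but it will not conjure new tools
  out of nothing.
- LLM fabrication of `invocation_id` (already patched, see the `feedback_grounding_pending` memory)
  and log-odds inflation (already patched: harmonic damping) are **prerequisites** of this
  design and are out of its scope.
- Finer-grained per-edge-type Bayesian modeling (such as a cross-source independence bonus) is separate work.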


@@ -24,12 +24,16 @@ def _load_agent_classes() -> None:
"""Lazy-import agent classes to avoid circular imports."""
if _AGENT_CLASSES:
return
from agents.android_artifact import AndroidArtifactAgent
from agents.communication import CommunicationAgent
from agents.filesystem import FileSystemAgent
from agents.hypothesis import HypothesisAgent
from agents.ios_artifact import IOSArtifactAgent
from agents.media import MediaAgent
from agents.network import NetworkAgent
from agents.registry import RegistryAgent
from agents.report import ReportAgent
from agents.strategist import InvestigationStrategist
from agents.timeline import TimelineAgent
_AGENT_CLASSES["filesystem"] = FileSystemAgent
_AGENT_CLASSES["registry"] = RegistryAgent
@@ -38,6 +42,51 @@ def _load_agent_classes() -> None:
_AGENT_CLASSES["timeline"] = TimelineAgent
_AGENT_CLASSES["hypothesis"] = HypothesisAgent
_AGENT_CLASSES["report"] = ReportAgent
_AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent
_AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent
_AGENT_CLASSES["media"] = MediaAgent
_AGENT_CLASSES["strategist"] = InvestigationStrategist
# Triage agent per (source.type, platform). disk_image is ambiguous on its
# own — both a Windows USB image and an Android raw dump are disk_image —
# so the routing helper also looks at source.meta.platform when present.
SOURCE_TYPE_AGENTS: dict[str, str] = {
"disk_image": "filesystem", # default for unknown platform
"mobile_extraction": "ios_artifact",
"archive": "filesystem",
"media_collection": "media",
}
# Per-platform overrides for disk_image sources. Keys come from
# source.meta.platform in case.yaml (lowercased).
_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
"windows": "filesystem",
"linux": "filesystem",
"android": "android_artifact",
"ios": "ios_artifact",
}
def get_triage_agent_type(source) -> str:
"""Pick the right Phase-1 agent for *source*.
Accepts either an :class:`EvidenceSource` or a raw source.type string
(for back-compat with the S5 signature). Disk-image sources additionally
consult ``source.meta.platform`` so Windows USBs and Android raw dumps —
both type=disk_image — get different agents.
"""
# Back-compat: accept a plain type string.
if isinstance(source, str):
return SOURCE_TYPE_AGENTS.get(source, "filesystem")
src_type = getattr(source, "type", "disk_image")
if src_type == "disk_image":
meta = getattr(source, "meta", {}) or {}
platform = str(meta.get("platform", "")).lower()
if platform in _DISK_IMAGE_PLATFORM_AGENTS:
return _DISK_IMAGE_PLATFORM_AGENTS[platform]
return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")
logger = logging.getLogger(__name__)


@@ -0,0 +1,58 @@
"""Android Artifact Agent — multi-partition analysis of raw Android dumps.
DESIGN.md §4.7 Android: ``mmls`` slices the dump into partitions; each one is
its own analysable surface. Ext4-backed partitions (typically SYSTEM,
USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw
partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with
``search_strings``. Userdata is the prize and is often FBE-encrypted on
modern devices — the agent must check fsstat before assuming readability
(see ``probe_android_partitions`` for the survey).
"""
from __future__ import annotations
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
class AndroidArtifactAgent(BaseAgent):
name = "android_artifact"
role = (
"Android forensic analyst. You navigate raw Android disk dumps "
"(blk0_sda-style images) partition by partition. Workflow: call "
"probe_android_partitions ONCE to map the disk; pick the partitions "
"with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, "
"EFS); for each, call set_active_partition(offset_from_512_sector_column) "
"and then list_directory / extract_file / search_strings as usual. "
"For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly "
"to search_strings — they have no filesystem. If USERDATA shows "
"fs_type=unknown it is almost certainly FBE-encrypted: record that "
"as a negative finding (the absence IS evidence) and move on to "
"what's reachable."
)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_tools(self) -> None:
tool_names = [
# Android-specific
"probe_android_partitions",
"set_active_partition",
# Reused TSK toolset — partition_offset comes from active_source
"partition_info", "filesystem_info", "list_directory",
"extract_file", "find_file", "search_strings",
"count_deleted_files", "build_filesystem_timeline",
# Generic parsers
"read_text_file", "read_binary_preview", "search_text_file",
"read_text_file_section", "list_extracted_dir", "find_files",
# SQLite — Android apps store data in sqlite too (WhatsApp, etc.)
"sqlite_tables", "sqlite_query",
]
for name in tool_names:
td = TOOL_CATALOG.get(name)
if td:
self.register_tool(td.name, td.description, td.input_schema, td.executor)

agents/ios_artifact.py Normal file

@@ -0,0 +1,49 @@
"""iOS Artifact Agent — analyses unpacked iOS extractions.
DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family
the system handles (alongside disk images and pcaps). This agent owns the
iOS-specific toolset; the grounded ``add_phenomenon`` contract from
BaseAgent applies unchanged — every fact must cite a tool invocation.
"""
from __future__ import annotations
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
class IOSArtifactAgent(BaseAgent):
name = "ios_artifact"
role = (
"iOS forensic analyst. You analyse unpacked iOS extractions — "
"binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, "
"AddressBook.sqlitedb), the keychain (keychain-2.db), and the "
"iDevice_info.txt summary — to extract device identity, accounts, "
"messaging, contacts, and credential metadata. Domain-rooted iOS "
"trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) "
"are your map; navigate by path, not by inode."
)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_tools(self) -> None:
tool_names = [
# navigation — find_files is the workhorse on 10k+-file iOS trees;
# list_extracted_dir is for initial layout summary only.
"list_extracted_dir", "find_files",
"read_text_file", "read_text_file_section", "read_binary_preview",
"search_text_file",
# iOS-specific parsers
"parse_plist",
"sqlite_tables", "sqlite_query",
"parse_ios_keychain",
"read_idevice_info",
]
for name in tool_names:
td = TOOL_CATALOG.get(name)
if td:
self.register_tool(td.name, td.description, td.input_schema, td.executor)

agents/media.py Normal file

@@ -0,0 +1,52 @@
"""Media Agent — OCR-based analysis of screenshot/photo evidence.
DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG
evidence must go through tesseract first. The agent runs OCR, then
records extracted strings — especially identifiers (wallet addresses,
phone numbers, usernames) — via the grounded observe_identity gateway so
they participate in cross-source coref the same way iOS keychain entries
or Windows account names do.
If the OCR runtime is missing on the host, ocr_image returns an explicit
install hint; the agent should record that as a negative finding ("no
text extracted — tesseract not installed") rather than guessing.
"""
from __future__ import annotations
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
class MediaAgent(BaseAgent):
name = "media"
role = (
"Media / OCR forensic analyst. You analyse screenshots, photos, and "
"scanned documents — any pixel-based evidence the LLM cannot read "
"directly. Workflow: list_extracted_dir to enumerate images, "
"ocr_image on each promising one, then add_phenomenon (with the "
"OCR'd text as the verified_fact value) and observe_identity for "
"any wallet addresses, phone numbers, email addresses, or "
"usernames the text contains. If OCR fails because tesseract is "
"missing, RECORD that as a negative finding instead of fabricating "
"image content — the absence is a real fact about this run."
)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_tools(self) -> None:
tool_names = [
"ocr_image",
"list_extracted_dir", "find_files",
"read_binary_preview",
"read_text_file",
"search_text_file",
]
for name in tool_names:
td = TOOL_CATALOG.get(name)
if td:
self.register_tool(td.name, td.description, td.input_schema, td.executor)


@@ -12,9 +12,20 @@ class ReportAgent(BaseAgent):
role = (
"Forensic report writer. You synthesize all findings from the investigation "
"into a structured, professional forensic analysis report organized by hypotheses.\n\n"
"Only include findings that have a source_tool attribution (marked VERIFIED). "
"If evidence lacks source attribution, mark it as UNVERIFIED. "
"Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence."
"Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), "
"TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When "
"writing the report, render verified_facts as primary evidence with their "
"invocation citations, and render interpretation as 'agent analysis' so the "
"reader can tell ground truth from inference. Do NOT invent or fabricate any "
"data, timestamps, or findings not present in the evidence.\n\n"
"This is a cross-source case: phenomena come from multiple evidence "
"sources, and entities discovered on different sources may refer to the "
"same real-world actor. ALWAYS include:\n"
" - 'Findings by Source' section sourced from get_phenomena_by_source\n"
" - 'Actor Clusters' section sourced from get_actor_clusters (the "
"cross-source attribution view — multi-source clusters answer "
"'which findings on different devices belong to the same person')\n"
" - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix"
)
# Calling save_report is BOTH the recording action and the completion
# signal. tool_call_loop returns the moment save_report executes; the
@@ -38,9 +49,12 @@ class ReportAgent(BaseAgent):
f"Investigation state:\n{self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities, get_case_info "
f" to gather all the data needed for the report. Make these calls in parallel.\n"
f"2. Assemble the complete markdown forensic report.\n"
f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n"
f" get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n"
f" and get_phenomena_by_source in parallel — these are the eight data\n"
f" sources you assemble the report from.\n"
f"2. Assemble the complete markdown forensic report. Cross-source\n"
f" actor clusters and per-source breakdown are MANDATORY sections.\n"
f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
f" This single call is the completion signal — the run ENDS the moment it executes.\n"
f" Do NOT call any read tools after this point; they will not run.\n"
@@ -83,6 +97,45 @@ class ReportAgent(BaseAgent):
executor=self._get_entities,
)
self.register_tool(
name="get_hypothesis_evidence_matrix",
description=(
"Render the hypothesis × evidence pivot as a markdown table. "
"Columns: per edge_type counts, log_odds, confidence, status. "
"Embed this directly in the report to show how each hypothesis "
"stands relative to the others on a single screen."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_hypothesis_evidence_matrix,
)
self.register_tool(
name="get_actor_clusters",
description=(
"Render the cross-source actor clusters: each cluster is the "
"set of Entity nodes the system currently treats as the same "
"actor (via active same_as edges backed by coref hypotheses "
"≥ 0.8). Includes the aggregated identifier evidence per "
"cluster. Use this in the report's 'Entities / Actors' "
"section so readers see who-is-who across devices, not just "
"raw entity rows."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_actor_clusters,
)
self.register_tool(
name="get_phenomena_by_source",
description=(
"Group every phenomenon by its originating evidence source "
"(source_id). Use this to drive the report's 'Findings by "
"Source' section so each evidence item's per-device "
"contribution is auditable."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_phenomena_by_source,
)
self.register_tool(
name="save_report",
description="Save the final report to a file.",
@@ -115,12 +168,24 @@ class ReportAgent(BaseAgent):
items = [ph for ph in phenomena.values() if ph.category == cat]
lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---")
for ph in items:
verified = "VERIFIED" if ph.source_tool else "UNVERIFIED"
lines.append(f"\n[{verified}] {ph.title} ({ph.id})")
# Grounded = at least one verified fact AND a source_tool.
grounded = bool(ph.verified_facts) and bool(ph.source_tool)
marker = "GROUNDED" if grounded else (
"TOOL-ONLY" if ph.source_tool else "UNVERIFIED"
)
lines.append(f"\n[{marker}] {ph.title} ({ph.id})")
lines.append(f" Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}")
if ph.timestamp:
lines.append(f" Timestamp: {ph.timestamp}")
lines.append(f" {ph.description[:500]}")
if ph.verified_facts:
lines.append(f" Verified facts ({len(ph.verified_facts)}):")
for f in ph.verified_facts:
lines.append(
f" - [{f.get('type','?')}] {str(f.get('value',''))[:200]} "
f"(cite: {f.get('invocation_id','?')})"
)
if ph.interpretation:
lines.append(f" Analysis: {ph.interpretation[:500]}")
return "\n".join(lines)
async def _get_hypotheses_with_evidence(self) -> str:
@@ -150,12 +215,87 @@ class ReportAgent(BaseAgent):
return "\n".join(lines)
async def _get_case_info(self) -> str:
info = self.graph.case_info
lines = ["=== Case Information ==="]
for k, v in info.items():
lines.append(f" {k}: {v}")
lines.append(f" Image path: {self.graph.image_path}")
lines.append(f" Partition offset: {self.graph.partition_offset}")
case = self.graph.case
if case is not None:
lines.append(f" case_id: {case.case_id}")
lines.append(f" name: {case.name}")
for k, v in (case.meta or {}).items():
lines.append(f" {k}: {v}")
lines.append(f" sources: {len(case.sources)}")
for s in case.sources:
owner = f", owner={s.owner}" if s.owner else ""
platform = s.meta.get("platform") if s.meta else None
plat = f", platform={platform}" if platform else ""
lines.append(
f" - {s.id}: {s.label} "
f"(type={s.type}, mode={s.access_mode}{plat}{owner})"
)
else:
# Legacy single-image fallback — surface whatever case_info dict
# was passed in (e.g. the old CFReDS MD5 block).
for k, v in (self.graph.case_info or {}).items():
lines.append(f" {k}: {v}")
lines.append(f" Image path: {self.graph.image_path}")
lines.append(f" Partition offset: {self.graph.partition_offset}")
return "\n".join(lines)
async def _get_hypothesis_evidence_matrix(self) -> str:
return self.graph.hypothesis_evidence_matrix_markdown()
async def _get_actor_clusters(self) -> str:
clusters = self.graph.actor_clusters()
if not clusters:
return "(no entities recorded)"
# Show multi-member clusters first — they're the cross-source links
# the human reader most needs to see.
clusters.sort(key=lambda c: (-len(c["members"]), c["members"]))
lines = [f"=== Actor Clusters ({len(clusters)}) ==="]
for i, c in enumerate(clusters, 1):
members = c["members"]
label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity"
lines.append(f"\n[{label} #{i}] {len(members)} member(s):")
for eid in members:
ent = self.graph.entities.get(eid)
if ent:
lines.append(f" - {ent.summary()}")
if c["identifiers"]:
lines.append(" Aggregated identifiers:")
for ident in c["identifiers"]:
strong_tag = "strong" if ident.get("strong") else "weak"
lines.append(
f" [{strong_tag}] {ident.get('type')}={ident.get('value')} "
f"(on {ident.get('on_entity')})"
)
if c["coref_hypotheses"]:
lines.append(" Backing coref hypotheses (≥0.8 active):")
for hid in c["coref_hypotheses"]:
hyp = self.graph.hypotheses.get(hid)
if hyp:
lines.append(f" - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}")
return "\n".join(lines)
async def _get_phenomena_by_source(self) -> str:
by_src: dict[str, list] = {}
for ph in self.graph.phenomena.values():
by_src.setdefault(ph.source_id or "(unbound)", []).append(ph)
if not by_src:
return "(no phenomena recorded)"
# Resolve source labels via graph.case when possible.
def _label(src_id: str) -> str:
if self.graph.case:
src = self.graph.case.get_source(src_id)
if src:
return f"{src_id}{src.label} ({src.type})"
return src_id
lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="]
for src_id in sorted(by_src):
phs = by_src[src_id]
lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---")
for ph in phs:
grounded = "G" if ph.verified_facts and ph.source_tool else "·"
lines.append(f" [{grounded}] {ph.summary()}")
return "\n".join(lines)
async def _get_entities(self) -> str:
@@ -174,18 +314,27 @@ class ReportAgent(BaseAgent):
return "\n".join(lines)
async def _verify_phenomena(self) -> str:
verified = []
unverified = []
grounded: list[str] = []
tool_only: list[str] = []
unverified: list[str] = []
for ph in self.graph.phenomena.values():
entry = f" [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})"
if ph.source_tool:
verified.append(entry)
nf = len(ph.verified_facts)
entry = (
f" [{ph.category}] {ph.title} "
f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})"
)
if ph.verified_facts and ph.source_tool:
grounded.append(entry)
elif ph.source_tool:
tool_only.append(entry)
else:
unverified.append(entry)
lines = ["=== Phenomena Verification Report ==="]
lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):")
lines.extend(verified)
lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):")
lines.extend(grounded)
lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):")
lines.extend(tool_only)
lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):")
lines.extend(unverified)
return "\n".join(lines)

agents/strategist.py Normal file

@@ -0,0 +1,134 @@
"""InvestigationStrategist — the LLM that decides depth vs breadth.
DESIGN_STRATEGIST.md §3.
The strategist does NOT run forensic tools. Its job per round is exactly one
decision: propose 1-3 leads that would move an active hypothesis, OR declare
the investigation complete. It reads the graph through four read-only views
(graph_overview / source_coverage / marginal_yield / budget_status) and
expresses its decision through two write tools (propose_lead /
declare_investigation_complete).
This is the smallest possible agent in the system — the entire point is that
strategy decisions live in one agent so they're auditable and the rest of the
codebase doesn't carry implicit depth/breadth policy.
"""
from __future__ import annotations
import logging
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
logger = logging.getLogger(__name__)
class InvestigationStrategist(BaseAgent):
name = "strategist"
role = (
"Investigation strategist. You do not run forensic tools yourself. "
"Each round you take ONE decision: propose 1-3 new investigation leads "
"that would materially affect an active hypothesis, OR declare the "
"investigation complete. Your judgment is grounded in the graph "
"(hypotheses, sources, coverage, marginal yield, budget) — never in "
"speculation."
)
# At least one of these must be called every round, otherwise BaseAgent's
# forced RECORD retry kicks in and re-prompts the strategist to take a
# documented decision.
mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
# declare_investigation_complete is terminal — calling it short-circuits the tool loop,
# which is what we want (strategist returns immediately on "done").
terminal_tools = ("declare_investigation_complete",)
# Strategist-specific tools, plus the read-only graph queries inherited
# from BaseAgent. NO graph write tools (no add_phenomenon / link_to_entity
# / observe_identity); the strategist must NOT mutate evidence directly.
_STRATEGY_TOOLS = (
"graph_overview",
"source_coverage",
"marginal_yield",
"budget_status",
"propose_lead",
"declare_investigation_complete",
)
def _register_graph_tools(self) -> None:
"""Strategist gets read-only graph queries + the six strategy tools.
It does NOT get write tools (no add_phenomenon, observe_identity,
link_to_entity, add_temporal_edge). Every graph mutation must come
from a dispatched worker, not from the planner.
"""
self._register_graph_read_tools()
for tool_name in self._STRATEGY_TOOLS:
td = TOOL_CATALOG.get(tool_name)
if td is None:
logger.warning(
"Strategist could not find tool %s in TOOL_CATALOG — "
"register_all_tools must run before agent instantiation.",
tool_name,
)
continue
self.register_tool(td.name, td.description, td.input_schema, td.executor)
def _build_system_prompt(self, task: str) -> str:
"""Strategist-specific prompt. Replaces the BaseAgent default which
walks an INVESTIGATE→RECORD→LINK workflow that is wrong for a
planner agent.
"""
return (
f"You are {self.name}, the investigation strategist.\n"
f"Role: {self.role}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW (do this exactly):\n"
f" 1. Call graph_overview FIRST. Look at: which hypotheses are\n"
f" active (conf 0.2-0.8) vs already supported/refuted; which\n"
f" ones have many edges but only 1 distinct_source; which had\n"
f" a recent_flip vs none in two rounds.\n"
f" 2. Call marginal_yield to see if the last rounds produced anything.\n"
f" 3. Call budget_status to know your runway.\n"
f" 4. For each candidate lead direction, call source_coverage on\n"
f" the relevant source to see what's been touched.\n"
f" 5. Take exactly ONE of these terminal actions:\n"
f" (a) Call propose_lead 1-3 times for leads that would\n"
f" materially move an active hypothesis. STOP after this.\n"
f" (b) Call declare_investigation_complete with a specific\n"
f" reason. STOP after this.\n"
f"\n"
f"DECISION CRITERIA — when to propose vs when to stop:\n"
f" PROPOSE when:\n"
f" - A hypothesis is supported only by ONE source — get\n"
f" cross-source corroboration. Same-source repeats are\n"
f" cheap (harmonic damping).\n"
f" - A hypothesis is in the active band (0.2 < conf < 0.8) —\n"
f" it needs the deciding evidence.\n"
f" - A high-value artefact is ✗ on source_coverage AND an\n"
f" active hypothesis depends on the kind of evidence that\n"
f" artefact would produce.\n"
f" STOP (declare_complete) when:\n"
f" - marginal_yield shows zero across 2+ rounds.\n"
f" - budget_status warns ≥90% on tool_calls or rounds.\n"
f" - all active hypotheses are resolved (supported or refuted).\n"
f" - coverage saturation: every ✗ on every source is irrelevant\n"
f" to active hypotheses.\n"
f"\n"
f"HARD RULES:\n"
f" - You CANNOT call investigation tools (list_directory,\n"
f" sqlite_query, parse_registry_key, extract_file, etc.) — your\n"
f" job is to direct workers, not to investigate yourself.\n"
f" - You CANNOT call write tools (add_phenomenon, observe_identity,\n"
f" link_to_entity, add_hypothesis, add_temporal_edge). All\n"
f" evidence mutations come from the workers you dispatch.\n"
f" - Every propose_lead MUST cite a real hyp-id from\n"
f" graph_overview's table — fabricated ids will be rejected.\n"
f" - Don't propose more than 3 leads in one round. Quality over\n"
f" quantity — a 4th lead almost always means you're not really\n"
f" sure what would move the graph.\n"
f" - Don't re-propose a lead that's already pending. The system\n"
f" deduplicates (motivating_hyp, expected_type, agent, source)\n"
f" so duplicates silently no-op, but they waste your budget."
)
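Note on the last hard rule: the prompt says the system deduplicates leads on (motivating_hyp, expected_type, agent, source) and that duplicates silently no-op. The code that does this is not part of this diff, so the following is only a minimal sketch of how such a check could look. `LeadKey`, `LeadQueue`, and the sample field values are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LeadKey:
    # Hypothetical container for the four fields named in the prompt.
    motivating_hyp: str
    expected_type: str
    agent: str
    source: str


class LeadQueue:
    """Illustrative only: accepts a lead once per distinct key."""

    def __init__(self) -> None:
        self._pending: set[LeadKey] = set()

    def propose(self, key: LeadKey) -> bool:
        """Return True if the lead was queued, False if it was a duplicate."""
        if key in self._pending:
            return False  # silent no-op, as the prompt describes
        self._pending.add(key)
        return True


queue = LeadQueue()
key = LeadKey("hyp-1", "identity_observation", "registry", "src-suspect-laptop")
first = queue.propose(key)   # queued
second = queue.propose(key)  # same four fields, so it is dropped
```

A frozen dataclass is hashable, which is what lets the four-field key sit in a set without a hand-written `__hash__`.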


@@ -122,7 +122,15 @@ class TimelineAgent(BaseAgent):
lines = []
for ph in items:
lines.append(f"{ph.timestamp} | [{ph.category}] {ph.title} ({ph.id})")
lines.append(f" {ph.description[:150]}")
preview = ph.interpretation[:150] if ph.interpretation else ""
if ph.verified_facts:
fact_preview = ", ".join(
f"{f.get('type','?')}={str(f.get('value',''))[:40]}"
for f in ph.verified_facts[:3]
)
preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]"
if preview:
lines.append(f" {preview}")
return "\n".join(lines)
async def _add_temporal_edge(


@@ -5,6 +5,7 @@ from __future__ import annotations
import json
import logging
import time
import uuid
from typing import Any
from evidence_graph import EvidenceGraph
@@ -36,7 +37,9 @@ class BaseAgent:
# forced retry with an explicit "you forgot to record" instruction.
# Subclasses override to declare their own recording responsibility
# (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",)
# observe_identity (S5) counts as a recording too — it writes through the
# same grounding gateway and produces an identity_observation phenomenon.
mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity")
# Tools whose invocation ends the run immediately. After any terminal tool
# is called, tool_call_loop returns with that tool's result text as
@@ -110,8 +113,23 @@ class BaseAgent:
f" Call investigation tools (list_directory, parse_registry_key, etc.) to gather data.\n"
f" Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n"
f" Create add_lead for anything outside your expertise.\n\n"
f"Phase B — RECORD PHENOMENA:\n"
f" For EACH significant finding from Phase A, call add_phenomenon.\n"
f"Phase B — RECORD PHENOMENA (GROUNDED):\n"
f" For EACH significant finding from Phase A, call add_phenomenon with:\n"
f" * interpretation: your analysis — free text, NOT verified.\n"
f" * verified_facts: one entry per concrete atom (path, timestamp,\n"
f" inode, hash, identifier, count) you want recorded as truth.\n"
f" Each entry MUST have:\n"
f" - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n"
f" - value: a VERBATIM substring from the tool output\n"
f" - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n"
f" header at the top of the tool result that produced this value\n"
f" IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n"
f" whenever you see an email, phone number, Apple ID, IMEI, wallet\n"
f" address, MAC, UDID, persistent nickname, or display name. Same\n"
f" grounding contract: value must be verbatim in the cited tool\n"
f" output. This is HOW cross-source attribution gets built — without\n"
f" it, we can't tell whether the Apple ID in keychain belongs to the\n"
f" same person as the Windows account on the USB.\n"
f" Do NOT call link_to_entity yet — just record all phenomena first.\n\n"
f"Phase C — LINK ENTITIES:\n"
f" FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
@@ -125,20 +143,22 @@ class BaseAgent:
f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
f"a registry key) and found NOTHING, that absence IS evidence — call "
f"add_phenomenon with a 'No matches for X' title and the search scope in "
f"raw_data. Negative findings constrain the hypothesis space and prevent "
f"the next agent from wasting time re-searching.\n"
f"add_phenomenon with a 'No matches for X' title, the search scope in "
f"raw_data, and cite the search tool's invocation_id (verified_facts may "
f"be empty for a true negative; the cited invocation in source_tool still "
f"anchors it). Negative findings constrain the hypothesis space.\n"
f"- If you stop without having called add_phenomenon at least once, the task "
f"is FAILED and a forced retry will fire.\n"
f"- Include exact file paths, inode numbers, timestamps, and the source_tool "
f"that produced each finding.\n\n"
f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
f"- ONLY record findings that appear VERBATIM in tool results you received\n"
f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n"
f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n"
f"- Quote exact strings from tool output when recording evidence descriptions\n"
f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools"
f"is FAILED and a forced retry will fire.\n\n"
f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n"
f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n"
f" is what you cite in each fact's invocation_id.\n"
f"- fact.value must be a substring of the cited invocation's output.\n"
f" Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n"
f" anything else fabricated is REJECTED with a per-fact reason.\n"
f"- On REJECTED: quote the literal text from the output (or drop the\n"
f" fact), and put guesses / inferred paths / model names in\n"
f" `interpretation` instead. Then call add_phenomenon again.\n"
f"- You may cite ONLY invocations made within THIS task."
)
async def run(self, task: str, lead_id: str | None = None) -> str:
@@ -146,6 +166,11 @@ class BaseAgent:
_log(task, event="agent_start", agent=self.name)
self.graph.agent_status[self.name] = "running"
self.graph._current_agent = self.name
# Fresh task scope per agent run. Used by the grounding gateway to
# check that facts in add_phenomenon cite invocations made *within
# this run* — preventing the agent from forwarding stale IDs from
# earlier work or another agent.
self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}"
self._current_lead_id = lead_id
self._register_graph_tools()
@@ -203,12 +228,27 @@ class BaseAgent:
f"what you already found. Then end."
),
})
# Narrow the retry tool surface so the agent can't wander off
# to investigate again — only RECORD and read-only graph
# query tools survive. Each grounding-rejected call burns one
# iteration, so the cap is 30 (not the original 10): a
# Timeline agent writing ~10 temporal edges with one rejection
# apiece needs ~20 turns under the rewritten gateway.
retry_tool_names = set(registered_mandatory) | {
"list_phenomena", "list_assets", "search_graph",
"add_temporal_edge", "link_to_entity", "add_lead",
"add_hypothesis", "save_report",
}
retry_tools = [
td for td in self.get_tool_definitions()
if td["name"] in retry_tool_names
]
final_text, _ = await self.llm.tool_call_loop(
messages=conversation,
tools=self.get_tool_definitions(),
tools=retry_tools,
tool_executor=self._executors,
system=system,
max_iterations=10,
max_iterations=30,
terminal_tools=self.terminal_tools,
)
@@ -350,20 +390,67 @@ class BaseAgent:
self.register_tool(
name="add_phenomenon",
description=(
"Record a forensic finding (phenomenon) on the evidence graph. "
"You MUST specify source_tool: the name of the tool call that produced this finding."
"Record a forensic finding on the evidence graph. The finding is "
"split into provenance-bound atoms (verified_facts) and free-form "
"analysis (interpretation). Each fact MUST cite the invocation_id "
"of a tool call you made in THIS task — the gateway checks every "
"fact's value against that call's real output, byte-for-byte. "
"Any fact that fails grounding causes the whole record to be "
"rejected with a list of failures; fix the facts and call again."
),
input_schema={
"type": "object",
"properties": {
"category": {"type": "string", "description": "Category of the finding."},
"title": {"type": "string", "description": "Short title."},
"description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."},
"interpretation": {
"type": "string",
"description": (
"Free-form analysis text — your reasoning, why this "
"matters, what it implies. NOT verified by the gateway. "
"Rendered in reports as 'agent analysis', not truth."
),
},
"verified_facts": {
"type": "array",
"description": (
"Atoms you want preserved as ground truth. Each must "
"appear verbatim in the cited tool output."
),
"items": {
"type": "object",
"properties": {
"type": {
"type": "string",
"description": (
"Kind of fact: path, timestamp, inode, "
"hash, identifier, count, raw, ..."
),
},
"value": {
"type": "string",
"description": (
"Verbatim substring from the cited tool "
"output. The gateway does a literal "
"string-in-string check — no paraphrasing."
),
},
"invocation_id": {
"type": "string",
"description": (
"ID from the '[invocation: inv-xxx]' header "
"of the tool call that produced this value."
),
},
},
"required": ["type", "value", "invocation_id"],
},
},
"raw_data": {"type": "object", "description": "Structured raw data supporting this finding."},
"timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."},
"source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."},
},
"required": ["category", "title", "description", "source_tool"],
"required": ["category", "title", "source_tool"],
},
executor=self._add_phenomenon,
)
@@ -414,6 +501,67 @@ class BaseAgent:
executor=self._link_to_entity,
)
self.register_tool(
name="observe_identity",
description=(
"Record a typed identifier (email / phone / Apple ID / IMEI / "
"wallet address / nickname / display name / …) for an entity. "
"Goes through the same grounding gateway as add_phenomenon — "
"value MUST be a verbatim substring of the cited tool output. "
"After attachment, the engine automatically proposes / "
"strengthens / weakens cross-source coreference hypotheses "
"between this entity and any others carrying the same or "
"conflicting identifiers. This is how 'is the Apple ID in iOS "
"keychain the same person as the Windows login name?' gets "
"answered. Call this in ADDITION to add_phenomenon for "
"identifier-bearing findings."
),
input_schema={
"type": "object",
"properties": {
"entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."},
"entity_type": {
"type": "string",
"enum": ["person", "program", "file", "host", "ip_address"],
"description": "Kind of entity this identifier belongs to (usually 'person').",
},
"identifier_type": {
"type": "string",
"description": (
"Strong (near-unique): email, phone_number, imei, "
"imsi, apple_id, icloud_id, google_account, "
"wallet_address, udid, mac_address, device_serial. "
"Weak (free-form, may collide): nickname, "
"display_name, username, screen_name."
),
},
"value": {
"type": "string",
"description": (
"The identifier value, quoted VERBATIM from the "
"tool output you cite in invocation_id."
),
},
"invocation_id": {
"type": "string",
"description": (
"ID from the '[invocation: inv-xxx]' header of "
"the tool call that surfaced this identifier."
),
},
"source_tool": {
"type": "string",
"description": "Name of the tool that produced the identifier.",
},
},
"required": [
"entity_name", "entity_type", "identifier_type",
"value", "invocation_id",
],
},
executor=self._observe_identity,
)
# ---- Tool executors -----------------------------------------------------
async def _list_phenomena(self, category: str | None = None) -> str:
@@ -453,16 +601,29 @@ class BaseAgent:
self,
category: str,
title: str,
description: str,
interpretation: str = "",
verified_facts: list[dict] | None = None,
raw_data: dict | None = None,
timestamp: str | None = None,
source_tool: str = "",
# Back-compat: older prompts (and accidental LLM emissions) may pass
# ``description``; treat it as ``interpretation`` rather than failing.
description: str | None = None,
) -> str:
if description and not interpretation:
interpretation = description
# GroundingError propagates: llm_client._execute_single_tool turns
# raised exceptions into "Error executing add_phenomenon: <msg>" tool
# results the LLM sees, and _wrap_record_executor does NOT increment
# the mandatory-record counter (the increment only runs after a
# successful return), so the forced-retry mechanism still fires if
# the agent never lands a grounded phenomenon.
pid, merged = await self.graph.add_phenomenon(
source_agent=self.name,
category=category,
title=title,
description=description,
interpretation=interpretation,
verified_facts=verified_facts,
raw_data=raw_data,
timestamp=timestamp,
source_tool=source_tool,
@@ -508,6 +669,51 @@ class BaseAgent:
status = "linked to existing" if existing else "created and linked"
return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}"
async def _observe_identity(
self,
entity_name: str,
entity_type: str,
identifier_type: str,
value: str,
invocation_id: str,
source_tool: str = "",
) -> str:
# GroundingError / ValueError propagate to llm_client's per-tool
# exception handler, which formats them back to the LLM. That keeps
# the mandatory-record counter honest — only a successful return
# triggers the increment in _wrap_record_executor.
result = await self.graph.observe_identity(
entity_name=entity_name,
entity_type=entity_type,
identifier_type=identifier_type,
value=value,
source_agent=self.name,
source_tool=source_tool,
invocation_id=invocation_id,
)
lines = [
f"Identity observed: {identifier_type}={value} "
f"on entity {result['entity_id']} ({entity_name})."
]
if result.get("new_identifier"):
lines.append(
f" Observation phenomenon: {result['phenomenon_id']}"
)
else:
lines.append(" (identifier already recorded on this entity — idempotent)")
for prop in result.get("coref_proposals", []):
lines.append(
f" → Coref candidate: {prop['other_entity_id']} via "
f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
f"hypothesis={prop['hypothesis_id']})"
)
for c in prop.get("conflicts", []):
lines.append(
f" ⚠ conflict on {c['type']}: "
f"{c['new_value']} vs {c['other_value']}"
)
return "\n".join(lines)
async def _list_assets(self, category: str | None = None) -> str:
results = self.graph.list_assets(category)
if not results:
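
Note on the grounding contract: the BaseAgent prompt above says fact.value must be a substring of the cited invocation's output, with case, whitespace, and path-separator variants tolerated. The gateway itself lives in a file whose diff is not shown here, so this is a sketch of that rule under stated assumptions, not the project's implementation. `_normalise`, `is_grounded`, and the sample output string (including its invocation ID) are made up for illustration.

```python
import re


def _normalise(text: str) -> str:
    """Fold the variants the prompt says are tolerated."""
    text = text.replace("\\", "/")    # path-separator variants
    text = re.sub(r"\s+", " ", text)  # runs of whitespace
    return text.strip().lower()       # case


def is_grounded(value: str, tool_output: str) -> bool:
    """True if value appears in tool_output after normalisation."""
    needle = _normalise(value)
    # An empty value would match anything, so treat it as ungrounded.
    return bool(needle) and needle in _normalise(tool_output)


# Made-up tool result, shaped like the '[invocation: inv-xxx]' header format.
output = "[invocation: inv-1a2b3c4d]\nr/r 1234-128-1: Users\\Alice\\NTUSER.DAT"

ok = is_grounded("users/alice/ntuser.dat", output)  # differs only in case and separators
bad = is_grounded("Users/Bob/NTUSER.DAT", output)   # not present in the output
```

The add_phenomenon tool description calls the check "byte-for-byte" while the prompt lists tolerated variants. This sketch follows the prompt's wording.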

case.example.yaml Normal file

@@ -0,0 +1,41 @@
# MASForensics case definition — template
#
# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
# exists in the working directory, `python main.py` loads it automatically;
# otherwise main.py falls back to interactive single-image selection.
#
# A case is a set of evidence sources. Each source has:
# id optional — auto-derived from label if omitted ("src-<slug>")
# label human-readable name
# type disk_image | mobile_extraction | archive | media_collection
# access_mode image | tree (optional — defaults by type)
# image = block device / disk image, navigated by Sleuth Kit
# tree = mounted filesystem / unpacked extraction, path-based
# owner optional — the person the source is associated with
# path filesystem path (relative paths resolve against this file)
# partition_offset image-mode only — sector offset of the partition to analyze
# meta optional free-form notes
#
# NOTE: at the current refit stage only image-mode (disk) sources are
# analysable; tree-mode sources are accepted but skipped.
case_id: example-case
name: "Example forensic case"
meta:
notes: "free-form case-level metadata"
sources:
- id: src-suspect-laptop
label: "Suspect laptop disk image"
type: disk_image
access_mode: image
owner: "John Doe"
path: image/suspect_laptop.E01
partition_offset: 0 # run `mmls <image>` to find the right offset
- id: src-suspect-phone
label: "Suspect phone extraction"
type: mobile_extraction
access_mode: tree
owner: "John Doe"
path: image/suspect_phone.zip

case.py Normal file

@@ -0,0 +1,226 @@
"""Case and evidence-source model — the foundation for multi-evidence analysis.
A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source
has a *type* (disk image, mobile extraction, archive, ...) and an *access mode*
that determines how forensic tools reach its contents:
- ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via
inode addressing (raw, E01, dd, ...).
- ``"tree"`` — an already-mounted filesystem or unpacked extraction,
navigated by ordinary filesystem paths.
This module is pure data model + loading. Partition probing and interactive
selection live in ``main.py``.
"""
from __future__ import annotations
import logging
import re
from dataclasses import asdict, dataclass, field
from pathlib import Path
logger = logging.getLogger(__name__)
# Recognised source types and access modes.
SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"}
ACCESS_MODES = {"image", "tree"}
# Disk-image file extensions for interactive discovery.
# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed
# raw block-device dumps such as ``blk0_sda.bin``.
DISK_IMAGE_EXTS = {
".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd",
}
# Default access mode per source type.
_DEFAULT_ACCESS_MODE = {
"disk_image": "image",
"mobile_extraction": "tree",
"archive": "tree",
"media_collection": "tree",
}
def slugify(text: str) -> str:
"""Reduce *text* to a lowercase, hyphen-separated slug for use in IDs."""
slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
return slug or "src"
@dataclass
class EvidenceSource:
"""One piece of evidence within a :class:`Case`."""
id: str # "src-<slug>"
label: str # human-readable name
type: str # one of SOURCE_TYPES
path: str # filesystem path to the evidence
access_mode: str # "image" | "tree"
owner: str = "" # associated person, if known
partition_offset: int = 0 # sector offset (image-mode sources only)
meta: dict = field(default_factory=dict)
def to_dict(self) -> dict:
return asdict(self)
@classmethod
def from_dict(cls, d: dict) -> EvidenceSource:
"""Reconstruct from a dict, ignoring unknown keys (forward-compatible)."""
known = set(cls.__dataclass_fields__)
return cls(**{k: v for k, v in d.items() if k in known})
def summary(self) -> str:
loc = (
f"@{self.partition_offset}"
if self.access_mode == "image" and self.partition_offset
else ""
)
owner = f" owner={self.owner}" if self.owner else ""
return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}"
@dataclass
class Case:
"""A forensic case: a set of evidence sources plus metadata."""
case_id: str
name: str
sources: list[EvidenceSource] = field(default_factory=list)
meta: dict = field(default_factory=dict)
def to_dict(self) -> dict:
return {
"case_id": self.case_id,
"name": self.name,
"sources": [s.to_dict() for s in self.sources],
"meta": dict(self.meta),
}
@classmethod
def from_dict(cls, d: dict) -> Case:
return cls(
case_id=d.get("case_id", ""),
name=d.get("name", ""),
sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])],
meta=d.get("meta", {}),
)
def get_source(self, source_id: str) -> EvidenceSource | None:
for s in self.sources:
if s.id == source_id:
return s
return None
# ---------------------------------------------------------------------------
# case.yaml loading
# ---------------------------------------------------------------------------
def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource:
"""Validate and normalise one source entry from case.yaml.
Missing ``id`` is derived from the label; missing ``access_mode`` defaults
by type; relative paths are resolved against *base_dir* (the case file's
directory).
"""
label = str(raw.get("label") or raw.get("id") or f"source-{index}")
src_type = str(raw.get("type", "disk_image"))
if src_type not in SOURCE_TYPES:
logger.warning("Unknown source type %r for %r — treating as disk_image",
src_type, label)
src_type = "disk_image"
access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree"))
if access_mode not in ACCESS_MODES:
logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label)
access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree")
src_id = str(raw.get("id") or f"src-{slugify(label)}")
if not src_id.startswith("src-"):
src_id = f"src-{slugify(src_id)}"
raw_path = str(raw.get("path", "")).strip()
path = raw_path
if raw_path:
p = Path(raw_path).expanduser()
if not p.is_absolute():
p = (base_dir / p)
path = str(p)
return EvidenceSource(
id=src_id,
label=label,
type=src_type,
path=path,
access_mode=access_mode,
owner=str(raw.get("owner", "")),
partition_offset=int(raw.get("partition_offset", 0) or 0),
meta=dict(raw.get("meta", {})),
)
def build_case(data: dict, base_dir: Path | None = None) -> Case:
"""Build a validated :class:`Case` from a loosely-typed case.yaml dict."""
base_dir = base_dir or Path.cwd()
sources: list[EvidenceSource] = []
seen_ids: set[str] = set()
for i, raw in enumerate(data.get("sources", []) or []):
if not isinstance(raw, dict):
logger.warning("Skipping malformed source entry #%d", i)
continue
src = _build_source(raw, base_dir, i)
if src.id in seen_ids:
src.id = f"{src.id}-{i}"
seen_ids.add(src.id)
if not src.path:
logger.warning("Source %r has no path — keeping but it is not analysable",
src.label)
sources.append(src)
return Case(
case_id=str(data.get("case_id", "case")),
name=str(data.get("name", "Untitled case")),
sources=sources,
meta=dict(data.get("meta", {})),
)
def load_case(path: str | Path = "case.yaml") -> Case | None:
"""Load a :class:`Case` from a case.yaml file. Returns None if absent."""
case_path = Path(path)
if not case_path.exists():
return None
import yaml
try:
data = yaml.safe_load(case_path.read_text()) or {}
except Exception as e:
logger.error("Failed to parse %s: %s", case_path, e)
return None
if not isinstance(data, dict):
logger.error("%s is not a YAML mapping", case_path)
return None
case = build_case(data, base_dir=case_path.resolve().parent)
logger.info("Loaded case %r with %d source(s) from %s",
case.name, len(case.sources), case_path)
return case
def single_source_case(
image_path: str,
partition_offset: int = 0,
label: str | None = None,
) -> Case:
"""Wrap a single disk image as a one-source Case (interactive fallback)."""
name = label or Path(image_path).name
src = EvidenceSource(
id=f"src-{slugify(Path(image_path).stem)}",
label=name,
type="disk_image",
path=image_path,
access_mode="image",
partition_offset=partition_offset,
)
return Case(case_id="adhoc", name=name, sources=[src])
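
Note on source IDs: the id handling in `_build_source` has a few edge cases that are easier to see with concrete inputs. The sketch below copies the `slugify` expression from the diff and isolates the id logic in a helper. `derive_source_id` is a hypothetical name used only here, and the sample labels are invented.

```python
import re


def slugify(text: str) -> str:
    # Same expression as slugify in case.py above.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "src"


def derive_source_id(raw_id, label):
    # Mirrors the id handling in _build_source; the helper name is illustrative.
    src_id = str(raw_id or f"src-{slugify(label)}")
    if not src_id.startswith("src-"):
        src_id = f"src-{slugify(src_id)}"
    return src_id


from_label = derive_source_id(None, "Suspect laptop disk image")
from_raw_id = derive_source_id("Phone #2", "ignored label")
already_prefixed = derive_source_id("src-Phone", "ignored label")
no_alnum = derive_source_id(None, "???")
```

Two behaviours follow from the code as written: an explicit id that already starts with `src-` is kept verbatim (it is not lowercased), and a label with no ASCII letters or digits falls back to `src-src`.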

config.example.yaml Normal file

@@ -0,0 +1,71 @@
# MASForensics Configuration — template.
#
# Copy this file to `config.yaml` and fill in your API key. config.yaml is
# git-ignored so secrets don't land in commits. The two files share schema;
# only this template is tracked.
agent:
base_url: "https://api.deepseek.com"
api_key: "YOUR-API-KEY-HERE"
model: "deepseek-v4-pro"
max_tokens: 16384
reasoning_effort: "high" # DeepSeek/o1-style reasoning depth; omit to disable
thinking_enabled: true # DeepSeek extra_body.thinking switch
# Maximum rounds of hypothesis-directed investigation (Phase 3).
# Only consulted when strategist.enabled is false (legacy fallback path).
max_investigation_rounds: 1
# Phase 3 strategist loop (DESIGN_STRATEGIST.md). When enabled, the
# InvestigationStrategist agent decides each round whether to propose new
# leads or declare the investigation complete. When disabled, the legacy
# fixed-round investigation loop runs instead.
strategist:
enabled: true
max_rounds: 10
# Safety net: if the strategist keeps proposing leads but yield (new
# phenomena + edges + status flips) is zero for this many consecutive
# rounds, the orchestrator force-stops Phase 3 regardless.
hard_stop_marginal_yield_zero_rounds: 3
# Hard caps that bound the whole run. The strategist's budget_status tool
# reads these to pace its proposals; the orchestrator also enforces them
# as hard stops (DESIGN_STRATEGIST.md §4.2 step 7). Comment out any cap
# to make it unbounded.
budgets:
tool_calls_total: 5000
strategist_rounds_max: 10
wall_clock_minutes_max: 480
# Optional: override the per-edge-type log₁₀(LR) calibration table.
# Confidence updates accumulate these in odds space (additive, order-
# independent), then map back to probability via sigmoid. Single edge
# magnitudes: ≥ +0.602 lifts confidence above the 0.8 supported threshold,
# ≤ -0.602 drops it below the 0.2 refuted threshold.
# If omitted, evidence_graph._DEFAULT_LOG_LR is used.
# hypothesis_log_lr:
# direct_evidence: 2.0
# supports: 1.0
# consequence_observed: 1.0
# prerequisite_met: 0.5
# weakens: -0.5
# contradicts: -2.0
# Optional: manually specify initial hypotheses. If omitted, the
# HypothesisAgent auto-generates them from Phase 1 findings.
# hypotheses:
# - title: "..."
# description: "..."
# Investigation areas — LLM-derived from active hypotheses after Phase 2.
# Each entry below acts as a MANUAL OVERRIDE: it is seeded into the graph
# before the LLM derives areas, so manual entries always survive (slug-based
# dedupe; LLM only augments keyword/tool lists, never overwrites).
#
# investigation_areas:
# - area: shutdown_time
# description: "Last recorded shutdown time"
# agent: registry
# priority: 3
# keywords: [shutdown, last shutdown]
# tools: [get_shutdown_time]
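
Note on the hypothesis_log_lr comment: a magnitude of about 0.602 corresponds to the 0.8 and 0.2 thresholds because 10^0.602 is roughly 4, and odds of 4:1 are a probability of 0.8. The sketch below works through that arithmetic. It assumes a base-10 logistic mapping and a neutral prior (log-odds 0, probability 0.5), and it ignores harmonic damping. The real update in evidence_graph is not shown in this diff, and `confidence` is a hypothetical helper name.

```python
def confidence(log10_lrs, prior_log10_odds=0.0):
    """Map summed log10 likelihood ratios to a probability.

    Assumption: neutral prior and a base-10 sigmoid, p = 1 / (1 + 10**(-L)).
    """
    total = prior_log10_odds + sum(log10_lrs)
    return 1.0 / (1.0 + 10.0 ** (-total))


no_evidence = confidence([])               # stays at the prior
one_strong_for = confidence([0.602])       # about 0.8
one_strong_against = confidence([-0.602])  # about 0.2

# Addition in log-odds space makes the result independent of edge order.
supports_then_weakens = confidence([1.0, -0.5])
weakens_then_supports = confidence([-0.5, 1.0])
```

Under these assumptions a single `supports` edge (1.0) clears the 0.8 threshold on its own, while `prerequisite_met` (0.5) does not.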

File diff suppressed because it is too large

@@ -142,6 +142,14 @@ READ_ONLY_TOOLS: set[str] = {
# Parser reads
"read_text_file", "read_binary_preview", "search_text_file",
"read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
"find_files",
# iOS plugin reads (S4)
"parse_plist", "sqlite_tables", "sqlite_query",
"parse_ios_keychain", "read_idevice_info",
# Android + media reads (S6) — set_active_partition is NOT read-only.
"probe_android_partitions", "ocr_image",
# Strategist view tools (DESIGN_STRATEGIST.md §2) — pure renders.
"graph_overview", "source_coverage", "marginal_yield", "budget_status",
}
@@ -503,7 +511,7 @@ class LLMClient:
tools: list[dict],
tool_executor: dict[str, Any],
system: str | None = None,
max_iterations: int = 40,
max_iterations: int = 60,
terminal_tools: tuple[str, ...] = (),
) -> tuple[str, list[dict]]:
"""Run a tool-calling loop using OpenAI-native tool calls.

162
main.py
@@ -15,17 +15,21 @@ from pathlib import Path
import yaml
from agent_factory import AgentFactory
from case import (
DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case,
)
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from log_config import setup_logging
from orchestrator import AnalysisAborted, Orchestrator
from tool_registry import register_all_tools
from tools.archive import unzip_archive_sync
RUNS_DIR = Path("runs")
IMAGE_DIR = Path("image")
# Common forensic image extensions (only first segment / single-file formats)
_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"]
# Persistent unpack cache for tree-mode sources (zip extractions). Lives
# at project root so multiple runs can reuse the same unpacked tree.
SOURCE_CACHE_DIR = Path(".cache/sources")
def load_config(path: str = "config.yaml") -> dict:
@@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict:
# ---------------------------------------------------------------------------
def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]:
-"""Find forensic disk image files under *search_dir*."""
-images: set[Path] = set()
-for glob in _IMAGE_GLOBS:
-images.update(search_dir.glob(glob))
-return sorted(images)
+"""Find forensic disk image files under *search_dir* (case-insensitive ext)."""
+if not search_dir.is_dir():
+return []
+return sorted(
+p for p in search_dir.iterdir()
+if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
+)
def _parse_mmls(output: str) -> list[dict]:
@@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
images = _discover_images(image_dir)
if not images:
print(f"No disk images found in {image_dir}/")
-print("Supported formats: " + ", ".join(_IMAGE_GLOBS))
+print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS)))
sys.exit(1)
if len(images) == 1:
@@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
print("Invalid choice.")
def resolve_case() -> Case:
"""Resolve the Case to analyze.
Priority: an explicit case file given as a CLI argument, then ./case.yaml
in the working directory, then legacy interactive single-image selection.
"""
# 1. Explicit case file passed on the command line
if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")):
case = load_case(sys.argv[1])
if case is None:
print(f"Error: could not load case file {sys.argv[1]}")
sys.exit(1)
print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
return case
# 2. ./case.yaml in the working directory
case = load_case()
if case is not None:
print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
return case
# 3. Legacy interactive single-image selection
cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
image_path, partition_offset = select_image_interactive(cli_dir)
return single_source_case(image_path, partition_offset)
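
The three-step priority in resolve_case can be reduced to a pure function for illustration. This is a sketch only: `pick_case_source` and its `cwd_has_case_yaml` flag are hypothetical stand-ins, and it assumes that `load_case()` with no argument succeeds exactly when a loadable ./case.yaml exists.

```python
def pick_case_source(argv, cwd_has_case_yaml):
    """Return (kind, value) describing which input resolve_case would use."""
    arg = argv[1] if len(argv) > 1 else None

    # 1. An explicit .yaml / .yml argument always wins.
    if arg and arg.lower().endswith((".yaml", ".yml")):
        return ("case_file", arg)

    # 2. Otherwise ./case.yaml is tried, whatever argv[1] contains.
    if cwd_has_case_yaml:
        return ("case_file", "case.yaml")

    # 3. Legacy path: argv[1], if present, is treated as an image directory.
    return ("interactive", arg)
```

One consequence worth noting: a directory argument such as `images/` is only honoured when no ./case.yaml is present, because step 2 runs before step 3.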
def _is_analysable(src: EvidenceSource) -> bool:
"""A source is analysable when it has a path AND its mode has tooling.
S4 lights up tree-mode iOS extractions; image-mode disks were already
supported. Media-collection (screenshots) remain skipped until S6.
"""
if not src.path:
return False
if src.access_mode == "image":
return True
if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
return True
return False
def list_analysable_sources(case: Case) -> list[EvidenceSource]:
"""Return every analysable source in the case (orchestrator iterates them).
Pre-S6 main.py used to force-choose one source here; the multi-source
orchestrator (Phase 1 per-source triage) now consumes the full list.
Skipped sources are still reported for visibility.
"""
analysable = [s for s in case.sources if _is_analysable(s)]
skipped = [s for s in case.sources if not _is_analysable(s)]
if skipped:
print(
f"Note: {len(skipped)} source(s) not analysable in this build: "
+ ", ".join(f"{s.label} ({s.type})" for s in skipped)
)
if not analysable:
print("No analysable sources in this case.")
sys.exit(1)
print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:")
for s in analysable:
print(f" - {s.summary()}")
return analysable
def prepare_source(src: EvidenceSource) -> EvidenceSource:
"""Materialise a tree-mode source for analysis.
Mobile / archive sources arrive as .zip files. We unpack once into a
project-level cache (``.cache/sources/<src.id>/``) and rewrite
``src.path`` to point at the unpacked directory. Idempotent — a
second run with the cache present is a no-op (unzip_archive_sync
skips files that already exist with the matching size).
Disk-image and already-tree sources pass through unchanged.
"""
if src.access_mode != "tree":
return src
p = Path(src.path)
if p.is_dir():
return src # already a directory, nothing to do
if not p.is_file():
print(f"Warning: source path {src.path} does not exist; leaving as-is.")
return src
if p.suffix.lower() != ".zip":
# Other archive types (tar, 7z, ...) — not handled yet.
print(f"Warning: tree-mode source {src.id} is not a .zip "
f"({p.suffix}); leaving as-is.")
return src
dest = SOURCE_CACHE_DIR / src.id
dest.mkdir(parents=True, exist_ok=True)
# Password-protected zips (e.g. CTF artefacts) carry their key in
# case.yaml's meta.password — never logged, never persisted.
password = (src.meta or {}).get("password")
pw_note = " (password from meta)" if password else ""
print(f"Unpacking {p.name} → {dest}{pw_note} (idempotent) ...")
result = unzip_archive_sync(str(p), str(dest), password=password)
first_line = result.split("\n", 1)[0]
print(" " + first_line)
if first_line.startswith("Error:"):
# Surface the multi-line guidance from _do_extract verbatim.
for extra in result.split("\n")[1:]:
print(" " + extra)
print(f" Source {src.id} stays unanalysable until this is resolved.")
# Leave src.path unchanged so the source remains marked unanalysable.
return src
src.path = str(dest)
src.access_mode = "tree"
return src
def find_resumable_run() -> Path | None:
"""Find the most recent incomplete run with a saved graph state."""
if not RUNS_DIR.exists():
@@ -225,22 +343,30 @@ async def async_main() -> None:
# Initialize evidence graph
if graph is None:
-# CLI arg takes priority, otherwise interactive prompt
-cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
-image_path, partition_offset = select_image_interactive(cli_dir)
+case = resolve_case()
+# case_info derived from THIS case's meta (case.yaml), not from
+# config.yaml's legacy `cfreds_hacking_case` block. Without this,
+# the old CFReDS evidence MD5s would be embedded in reports for
+# every subsequent unrelated case.
graph = EvidenceGraph(
-case_info=config.get("cfreds_hacking_case", {}),
+case_info=dict(case.meta or {}),
persist_path=run_dir / "graph_state.json",
-edge_weights=config.get("hypothesis_edge_weights"),
+edge_log_lr=config.get("hypothesis_log_lr"),
)
-graph.image_path = image_path
-graph.partition_offset = partition_offset
+graph.case = case
graph.extracted_dir = str(run_dir / "extracted")
+analysable = list_analysable_sources(case)
+# Prepare every analysable source up front (unzip tree-mode zips,
+# etc.). Idempotent on cache hits — second run is a no-op.
+prepared = [prepare_source(s) for s in analysable]
+# Seed the active source so tools that resolve lazily have a target
+# before Phase 1 begins; the orchestrator resets it per source.
+graph.set_active_source(prepared[0])
else:
graph._persist_path = run_dir / "graph_state.json"
-# Register all tools with bound image path
-register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir)
+# Register all tools — they resolve the active evidence source at call time
+register_all_tools(graph)
# Create agent factory
factory = AgentFactory(llm, graph)

@@ -10,7 +10,7 @@ import time
from datetime import datetime
from pathlib import Path
-from agent_factory import AgentFactory
+from agent_factory import AgentFactory, get_triage_agent_type
from evidence_graph import EvidenceGraph
from llm_client import LLMClient, _extract_first_balanced, _safe_json_loads
from tool_registry import TOOL_CATALOG
@@ -119,6 +119,11 @@ class Orchestrator:
self._failure_count = 0
self._max_failures = 3
self._start_time = datetime.now()
# Make budgets visible to strategy tools via the graph object. The
# budget_status tool reads graph.budgets / graph.run_start_monotonic
# directly so it does not need a back-reference to the orchestrator.
self.graph.budgets = dict(self.config.get("budgets", {}) or {})
self.graph.run_start_monotonic = time.monotonic()
def _resolve_agent_type(self, agent_type: str) -> str:
return AGENT_ALIASES.get(agent_type, agent_type)
@@ -195,6 +200,298 @@ class Orchestrator:
lead.context["retry"] = True
await self._dispatch_leads_parallel(failed)
# ---- Phase 3: strategist loop (DESIGN_STRATEGIST.md §4) ------------------
def _budget_exceeded(self) -> bool:
"""Hard budget enforcement, complementing strategist self-throttling.
Any of these triggers an immediate Phase 3 exit even if the
strategist hasn't called declare_investigation_complete. Each cap
is optional — leave it out of config to make it unbounded.
"""
b = self.graph.budgets or {}
tc_cap = b.get("tool_calls_total")
if tc_cap and len(self.graph.tool_invocations) >= tc_cap:
return True
wc_cap = b.get("wall_clock_minutes_max")
if wc_cap and self.graph.run_start_monotonic is not None:
elapsed_min = (time.monotonic() - self.graph.run_start_monotonic) / 60.0
if elapsed_min >= wc_cap:
return True
return False
async def _execute_strategist_lead(self, lead, round_num: int) -> None:
"""Dispatch one strategist-proposed lead to its target worker.
Unlike the legacy bulk dispatcher this runs leads serially so each
worker run reads a graph that includes prior leads' findings — the
strategist's next round can see the cumulative effect of this round.
"""
agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
worker = self.factory.get_or_create_agent(agent_type)
if worker is None:
logger.warning(
"No worker registered for lead %s: target_agent=%s",
lead.id, agent_type,
)
lead.status = "failed"
lead.context["failure_reason"] = f"no worker for agent type '{agent_type}'"
self.graph._auto_save()
return
source_id = (lead.context or {}).get("source_id", "")
if source_id and self.graph.case is not None:
src = self.graph.case.get_source(source_id)
if src:
self.graph.set_active_source(src)
rationale = (lead.context or {}).get("rationale", "")
worker_task = (
f"Investigate this specific lead from the strategist:\n\n"
f"REQUEST: {lead.description}\n"
f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis or '(unspecified)'}\n"
f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type or '(unspecified)'}\n"
f"RATIONALE: {rationale or '(unspecified)'}\n\n"
f"After investigating, record findings via add_phenomenon AND "
f"link relevant phenomena to "
f"{lead.motivating_hypothesis or 'the motivating hypothesis'} via the "
f"appropriate edge_type. If your investigation produces no relevant "
f"finding, record that as a negative phenomenon so the strategist "
f"can see the gap was probed."
)
_log(
f"Round {round_num} dispatching: {lead.description[:80]}",
event="dispatch", agent=agent_type, lead=lead.id,
)
lead.status = "assigned"
self.graph._auto_save()
try:
await worker.run(worker_task, lead_id=lead.id)
lead.status = "completed"
except Exception as e:
logger.error("Strategist lead %s failed: %s", lead.id, e, exc_info=True)
lead.status = "failed"
lead.context["failure_reason"] = str(e)
finally:
self.graph._auto_save()
async def _resume_strategist_state(self) -> int:
"""Repair any open InvestigationRound after a resume and return the
next round number to use.
An "open" round is one with ``started_at`` set but ``completed_at``
empty — interrupted before its complete step. Mark it as completed
with action=interrupted_resume so the run history is self-describing,
and mark any leads still in the "assigned" state from that round as
"failed" so the gap-analysis / retry paths can re-process them.
Returns the round number the strategist loop should start from
(1 + the highest existing round_number).
"""
if not self.graph.investigation_rounds:
return 1
highest = max(r.round_number for r in self.graph.investigation_rounds)
last = self.graph.latest_round()
if last is not None and not last.completed_at:
assigned_in_round = [
l for l in self.graph.leads
if l.round_number == last.round_number
and l.status == "assigned"
]
for lead in assigned_in_round:
lead.status = "failed"
lead.context["failure_reason"] = "interrupted before complete"
await self.graph.complete_investigation_round(
last.id, strategist_action="interrupted_resume",
decision_rationale=(
f"resume repair: this round was interrupted before "
f"completion; {len(assigned_in_round)} assigned leads "
f"have been re-marked as failed."
),
)
logger.info(
"Strategist resume: repaired open round R%d (closed %d assigned leads)",
last.round_number, len(assigned_in_round),
)
return highest + 1
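
The repair performed by _resume_strategist_state can be sketched on plain dataclasses. The `Round` and `Lead` classes below are simplified, hypothetical stand-ins for InvestigationRound and Lead, the timestamp is a placeholder, and the sketch assumes `latest_round()` returns the last entry of the rounds list.

```python
from dataclasses import dataclass, field


@dataclass
class Round:
    round_number: int
    completed_at: str = ""          # empty means the round is still open
    strategist_action: str = ""


@dataclass
class Lead:
    id: str
    round_number: int
    status: str = "pending"
    context: dict = field(default_factory=dict)


def repair_open_round(rounds, leads):
    """Close a dangling tail round and return the next round number."""
    if not rounds:
        return 1
    last = rounds[-1]
    if not last.completed_at:
        for lead in leads:
            # Only leads of the open round that never finished are touched.
            if lead.round_number == last.round_number and lead.status == "assigned":
                lead.status = "failed"
                lead.context["failure_reason"] = "interrupted before complete"
        last.completed_at = "repaired-on-resume"  # placeholder timestamp
        last.strategist_action = "interrupted_resume"
    return max(r.round_number for r in rounds) + 1
```

Leads from earlier rounds and leads that already completed are left alone, so the later retry and gap-analysis steps only see the work that was actually interrupted.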
async def _phase3_strategist_loop(self) -> None:
"""Belief-driven investigation: strategist proposes, workers execute,
repeat. Replaces the legacy fixed-round investigation loop.
"""
_log("Phase 3: Strategist-Driven Investigation", event="phase")
strategist_cfg = self.config.get("strategist", {}) or {}
max_rounds = int(strategist_cfg.get("max_rounds", 10))
zero_yield_cap = int(strategist_cfg.get("hard_stop_marginal_yield_zero_rounds", 3))
strategist = self.factory.get_or_create_agent("strategist")
if strategist is None:
logger.error(
"InvestigationStrategist agent not registered — falling back "
"to legacy Phase 3 loop. Check agent_factory._AGENT_CLASSES."
)
await self._phase3_legacy_loop()
return
# Resume support: if we're restarting after an interruption, repair
# any half-open round and pick up at the next number.
start_round = await self._resume_strategist_state()
if start_round > 1:
_log(
f"Resuming strategist loop at round {start_round} "
f"(history: {len(self.graph.investigation_rounds)} prior rounds)",
event="progress",
)
zero_yield_streak = 0
for round_num in range(start_round, max_rounds + 1):
# Reset per-round flags so a previous round's declare_complete
# doesn't leak across iterations (defensive — strategist also
# only sets True, never False).
self.graph.strategist_complete_requested = False
self.graph.current_strategist_round = round_num
rid = await self.graph.start_investigation_round(round_num)
_log(
f"Strategist Round {round_num}/{max_rounds}", event="phase",
round=round_num,
)
t0 = time.monotonic()
try:
await strategist.run(
f"Review the current investigation state and decide the "
f"next action. This is round {round_num}/{max_rounds}. "
f"Use graph_overview / marginal_yield / budget_status / "
f"source_coverage to ground your decision, then call "
f"propose_lead 1-3 times OR declare_investigation_complete."
)
except Exception as e:
logger.error("Strategist round %d failed: %s", round_num, e, exc_info=True)
await self.graph.complete_investigation_round(
rid, decision_rationale=f"strategist crashed: {e}",
)
break
# Strategist declared complete → no leads execute, exit loop.
if self.graph.strategist_complete_requested:
_log(
f"Strategist declared complete at round {round_num}",
event="progress", elapsed=time.monotonic() - t0,
)
await self.graph.complete_investigation_round(
rid, strategist_action="declare_complete",
decision_rationale="strategist declare_investigation_complete",
)
break
# Collect this round's leads (proposed_by=strategist + matching round).
new_leads = [
l for l in self.graph.leads
if l.round_number == round_num
and l.proposed_by == "strategist"
and l.status == "pending"
]
if not new_leads:
_log(
f"Round {round_num}: strategist proposed no new leads — exiting loop",
event="progress", elapsed=time.monotonic() - t0,
)
await self.graph.complete_investigation_round(
rid, strategist_action="no_leads",
decision_rationale="strategist proposed no new leads",
)
break
# Dispatch each lead to its worker.
for lead in new_leads:
await self._execute_strategist_lead(lead, round_num)
# After workers run, judge any new phenomena against existing
# hypotheses (so confidence updates happen before the next round
# of strategist reasoning).
if self.graph.phenomena and self.graph.hypotheses:
await self._judge_new_phenomena()
closed = await self.graph.complete_investigation_round(
rid, strategist_action="propose_leads",
leads_executed=[l.id for l in new_leads],
)
# Show round outcome.
for h in self.graph.hypotheses.values():
_log(f" {h.summary()}", event="hypothesis")
_log(
_progress_summary(self.graph) + f" (yield: +{closed.new_phenomena_count}ph, +{closed.new_edges_count}edges, {closed.status_flips}flips)",
event="progress", elapsed=time.monotonic() - t0,
)
# Marginal-yield hard stop. Distinct from strategist self-throttle:
# if the strategist insists on continuing through repeated dry
# rounds, force-stop. This protects against an over-eager
# strategist + a confused worker that produces no edges.
yield_total = (
closed.new_phenomena_count
+ closed.new_edges_count
+ closed.status_flips
)
if yield_total == 0:
zero_yield_streak += 1
if zero_yield_streak >= zero_yield_cap:
_log(
f"Hard stop: {zero_yield_streak} consecutive "
f"zero-yield rounds (cap {zero_yield_cap})",
event="progress",
)
break
else:
zero_yield_streak = 0
if self._budget_exceeded():
_log(
f"Budget exhausted after round {round_num} — exiting Phase 3",
event="progress",
)
break
else:
_log(
f"Strategist max_rounds={max_rounds} reached", event="progress",
)
# Always reset the round counter on exit so subsequent runs don't
# inherit the last value.
self.graph.current_strategist_round = 0
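
The marginal-yield hard stop in the loop above boils down to a consecutive-zero counter. The function below is a minimal sketch of that rule in isolation; `round_of_hard_stop` is a hypothetical name, and each tuple stands for one round's (new phenomena, new edges, status flips).

```python
def round_of_hard_stop(yields, cap=3):
    """Return the 1-based round at which the hard stop fires, or None."""
    streak = 0
    for round_num, (new_ph, new_edges, flips) in enumerate(yields, start=1):
        if new_ph + new_edges + flips == 0:
            streak += 1
            if streak >= cap:
                return round_num
        else:
            # Any productive round resets the streak.
            streak = 0
    return None
```

With the default cap of 3, one productive round followed by three dry rounds stops at round 4, while a productive round in the middle of a dry spell resets the count.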
async def _phase3_legacy_loop(self) -> None:
"""Legacy fixed-round Phase 3 — preserved for fallback / regression.
Engaged when config has ``strategist.enabled: false`` or when the
strategist agent class is somehow not registered. Behaves identically
to the pre-DESIGN_STRATEGIST orchestrator: bounded iteration,
hypothesis-derived leads, parallel dispatch, gap analysis.
"""
max_rounds = self.config.get("max_investigation_rounds", 5)
for round_num in range(max_rounds):
_log(f"Phase 3: Investigation Round {round_num}", event="phase")
t0 = time.monotonic()
if self.graph.hypotheses_converged():
_log("All hypotheses converged — stopping", event="progress")
break
await self._generate_hypothesis_leads()
pending = await self.graph.get_pending_leads()
if not pending:
_log("No pending leads — round complete", event="progress")
break
await self._dispatch_leads_parallel(pending)
await self._judge_new_phenomena()
for h in self.graph.hypotheses.values():
_log(f" {h.summary()}", event="hypothesis")
_log(_progress_summary(self.graph), event="progress", elapsed=time.monotonic() - t0)
# ---- Hypothesis generation -----------------------------------------------
async def _generate_hypotheses_manual(self, hypotheses_config: list[dict]) -> None:
@@ -518,7 +815,7 @@ class Orchestrator:
if not unlinked:
return
-valid_types = list(self.graph.edge_weights.keys())
+valid_types = list(self.graph.edge_log_lr.keys())
hyp_section = "\n".join(
f" [{h.id}] {h.title}: {h.description}" for h in active
@@ -551,7 +848,7 @@ class Orchestrator:
if (
hyp_id in self.graph.hypotheses
and ph_id in self.graph.phenomena
-and edge_type in self.graph.edge_weights
+and edge_type in self.graph.edge_log_lr
):
await self.graph.update_hypothesis_confidence(
hyp_id=hyp_id,
@@ -593,7 +890,7 @@ class Orchestrator:
ph_id = j.get("phenomenon_id", "")
edge_type = j.get("edge_type", "")
reason = j.get("reason", "")
-if ph_id in self.graph.phenomena and edge_type in self.graph.edge_weights:
+if ph_id in self.graph.phenomena and edge_type in self.graph.edge_log_lr:
await self.graph.update_hypothesis_confidence(
hyp_id=hyp.id,
phenomenon_id=ph_id,
@@ -618,7 +915,10 @@ class Orchestrator:
phenomena (deterministic — the canonical tool was actually called).
"""
evidence_text = " ".join(
-f"{ph.category} {ph.title} {ph.description}".lower()
+(
+f"{ph.category} {ph.title} {ph.interpretation} "
++ " ".join(str(f.get("value", "")) for f in ph.verified_facts)
+).lower()
for ph in self.graph.phenomena.values()
)
used_tools: set[str] = {
@@ -747,28 +1047,103 @@ class Orchestrator:
# ---- Main pipeline -------------------------------------------------------
# ---- Phase 1 helpers (multi-source triage) -------------------------------
@staticmethod
def _is_analysable(src) -> bool:
"""Mirror of main._is_analysable so the orchestrator doesn't depend
on main.py's import. Disk-image sources need a path; tree-mode
sources are analysable when they're mobile_extraction or archive.
"""
if not getattr(src, "path", ""):
return False
if src.access_mode == "image":
return True
if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
return True
# media_collection is analysable too once a MediaAgent is registered.
if src.type == "media_collection":
return True
return False
def _sources_to_triage(self) -> list:
"""Pick every analysable source in the case (or fall back to the
single active_source for the legacy single-image path).
"""
case = self.graph.case
if case is None or not case.sources:
return [self.graph.active_source] if self.graph.active_source else []
return [s for s in case.sources if self._is_analysable(s)]
async def _phase1_triage_source(self, src) -> tuple[int, int]:
"""Run the right triage agent on one source. Returns (Δphenomena, Δleads)."""
ph_before = len(self.graph.phenomena)
leads_before = sum(1 for l in self.graph.leads if l.status == "pending")
self.graph.set_active_source(src)
agent_type = get_triage_agent_type(src)
agent = self.factory.get_or_create_agent(agent_type)
if agent is None:
logger.warning(
"No agent registered for type %s — skipping source %s",
agent_type, src.id,
)
return 0, 0
_log(
f"Phase 1 triage: {src.id} ({src.label}) → {agent_type}",
event="dispatch", agent=agent_type, source=src.id,
)
try:
await agent.run(
f"Perform an initial Phase-1 triage of source {src.id} "
f"({src.label}, type={src.type}). Survey the source's "
f"structure, identify the most interesting artefacts, and "
f"record significant findings via add_phenomenon. Call "
f"observe_identity for any concrete identifiers (email, "
f"phone, Apple ID, IMEI, wallet address, persistent "
f"username) you encounter — that's how this finding will "
f"link across the other sources in the case. Create "
f"add_lead for follow-up that's outside your scope."
)
except Exception as e:
logger.error("Phase 1 agent [%s] failed on %s: %s", agent_type, src.id, e)
return (
len(self.graph.phenomena) - ph_before,
sum(1 for l in self.graph.leads if l.status == "pending") - leads_before,
)
async def run(self, resume_phase: int = 1) -> str:
"""Run the 5-phase hypothesis-driven forensic analysis pipeline."""
-_log(f"Phase 1: Filesystem Survey (image: {Path(self.graph.image_path).name})", event="phase")
+sources = self._sources_to_triage()
+_log(
+f"Phase 1: per-source triage ({len(sources)} source(s))",
+event="phase",
+)
report = ""
try:
-# Phase 1: Initial filesystem survey
+# Phase 1: Initial per-source triage (S6 multi-source).
+# Runs sequentially so each agent gets its own task_id scope —
+# the grounding gateway requires that, and shared graph state
+# (active_source, partition_offset) would race under parallel
+# dispatch anyway.
if resume_phase <= 1:
t0 = time.monotonic()
ph_before = len(self.graph.phenomena)
-fs_agent = self.factory.get_or_create_agent("filesystem")
-if fs_agent:
-await fs_agent.run(
-"Perform an initial survey of this disk image. "
-"Examine the partition table, filesystem type, and root directory structure. "
-"List key user directories and identify interesting files (documents, emails, "
-"chat logs, installed programs, registry hives). "
-"Create leads for other agents based on what you find."
+for src in sources:
+new_ph, new_leads = await self._phase1_triage_source(src)
+_log(
+f" {src.id}: +{new_ph} phenomena, +{new_leads} leads",
+event="progress", source=src.id,
)
-new_ph = len(self.graph.phenomena) - ph_before
-new_leads = sum(1 for l in self.graph.leads if l.status == "pending")
-_log(f"+{new_ph} phenomena, +{new_leads} leads", event="progress", elapsed=time.monotonic() - t0)
+total_ph = len(self.graph.phenomena) - ph_before
+total_leads = sum(1 for l in self.graph.leads if l.status == "pending")
+_log(
+f"Phase 1 total: +{total_ph} phenomena, {total_leads} pending leads",
+event="progress", elapsed=time.monotonic() - t0,
+)
# Phase 2: Hypothesis generation
if resume_phase <= 2:
@@ -803,39 +1178,26 @@ class Orchestrator:
event="progress", elapsed=time.monotonic() - t0,
)
-# Phase 3: Hypothesis-directed investigation (iterative)
+# Phase 3: Strategist-driven investigation (DESIGN_STRATEGIST.md)
if resume_phase <= 3:
-max_rounds = self.config.get("max_investigation_rounds", 5)
-for round_num in range(max_rounds):
-_log(f"Phase 3: Investigation Round {round_num}", event="phase")
-t0 = time.monotonic()
+strategist_cfg = self.config.get("strategist", {}) or {}
+strategist_enabled = strategist_cfg.get("enabled", True)
+if strategist_enabled:
+await self._phase3_strategist_loop()
+else:
+# Legacy fallback — keep the old hypothesis-directed
+# iterative loop available for runs that explicitly
+# disable the strategist (debugging, regression
+# comparison, or environments without the strategist
+# agent registered).
+await self._phase3_legacy_loop()
-if self.graph.hypotheses_converged():
-_log("All hypotheses converged — stopping", event="progress")
-break
-await self._generate_hypothesis_leads()
-pending = await self.graph.get_pending_leads()
-if not pending:
-_log("No pending leads — round complete", event="progress")
-break
-await self._dispatch_leads_parallel(pending)
-await self._judge_new_phenomena()
-# Show hypothesis status update
-for h in self.graph.hypotheses.values():
-_log(f" {h.summary()}", event="hypothesis")
-_log(_progress_summary(self.graph), event="progress", elapsed=time.monotonic() - t0)
-# Retry failed leads
+# Retry failed leads + Gap Analysis run regardless of which
+# Phase 3 variant was used — they operate on the leads/
+# hypothesis graph the strategist loop leaves behind.
await self._retry_failed_leads()
-# Gap analysis
_log("Phase 3: Gap Analysis", event="phase")
await self._run_gap_analysis()
self.graph.mark_remaining_inconclusive()
# Phase 4: Timeline construction
@@ -865,8 +1227,15 @@ class Orchestrator:
"6. Conclusions and Recommendations"
)
-image_stem = Path(self.graph.image_path).stem
-report_name = f"{image_stem}_forensic_report.md"
+# Multi-source case → name by case_id (stable across sources).
+# Legacy single-image runs without a Case → fall back to the
+# last active image's stem so old workflows still produce a
+# plausible filename.
+if self.graph.case and self.graph.case.case_id:
+stem = self.graph.case.case_id
+else:
+stem = Path(self.graph.image_path).stem or "case"
+report_name = f"{stem}_forensic_report.md"
report_path = (self.run_dir / report_name) if self.run_dir else Path(report_name)
try:
report_path.write_text(report)

@@ -6,6 +6,8 @@ requires-python = ">=3.14"
dependencies = [
"httpx[socks]>=0.28.1",
"openai>=2.36.0",
"pillow>=12.2.0",
"pytesseract>=0.3.13",
"pyyaml",
"regipy>=6.2.1",
]

@@ -32,10 +32,10 @@ async def main() -> None:
config = yaml.safe_load(open("config.yaml"))
agent_cfg = config["agent"]
-# Load graph (edge_weights from config — applied to the loaded graph)
+# Load graph (edge_log_lr from config — applied to the loaded graph)
graph = EvidenceGraph.load_state(
state_path,
-edge_weights=config.get("hypothesis_edge_weights"),
+edge_log_lr=config.get("hypothesis_log_lr"),
)
print(f"Loaded: {graph.stats_summary()}")
@@ -49,7 +49,7 @@ async def main() -> None:
thinking_enabled=agent_cfg.get("thinking_enabled", False),
)
-register_all_tools(graph.image_path, graph.partition_offset, graph)
+register_all_tools(graph)
factory = AgentFactory(llm, graph)
# Run only the report agent

File diff suppressed because it is too large

File diff suppressed because it is too large

156
tools/archive.py Normal file
@@ -0,0 +1,156 @@
"""Archive extraction tools — generic unzip for tree-mode evidence sources.
Mobile extractions (iOS / Android backups), archive sources, and shared
work products all arrive as .zip files. The forensic agents work on the
unpacked tree; this module is the single entry point for safely turning
an archive into a directory.
Stdlib-only. No graph dependency.
"""
from __future__ import annotations
import logging
import os
import zipfile
from pathlib import Path
logger = logging.getLogger(__name__)
def _is_within(base: Path, target: Path) -> bool:
"""True when *target* resolves to a path inside *base* — symlink-safe."""
try:
base_r = base.resolve()
target_r = target.resolve()
except OSError:
return False
try:
target_r.relative_to(base_r)
except ValueError:
return False
return True
def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool:
"""True when any entry has the zip 'encrypted' flag bit set."""
return any(info.flag_bits & 0x1 for info in zf.infolist())
def _do_extract(
zip_path: str,
dest_dir: str,
password: str | None = None,
) -> str:
"""Shared core for unzip_archive (async) and unzip_archive_sync.
Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files
whose target already exists at the matching size are skipped). Returns
a multi-line summary the agent can read directly.
"""
zp = Path(zip_path)
if not zp.is_file():
return f"Error: {zip_path} is not a file."
dest = Path(dest_dir)
dest.mkdir(parents=True, exist_ok=True)
extracted = 0
skipped: list[str] = []
total_bytes = 0
pwd_bytes = password.encode("utf-8") if password else None
try:
with zipfile.ZipFile(zp, "r") as zf:
encrypted = _is_zip_encrypted(zf)
if encrypted and pwd_bytes is None:
return (
f"Error: {zip_path} is password-protected. "
f"Provide the password via case.yaml's "
f"meta.password on this source, or pass `password=` "
f"explicitly. Stdlib zipfile only supports the legacy "
f"ZipCrypto algorithm — AES-encrypted zips (created by "
f"7-Zip / WinZip) need an external tool like 7z."
)
for info in zf.infolist():
name = info.filename
# Block absolute paths and parent-escape attempts up front.
if name.startswith(("/", "\\")) or ".." in Path(name).parts:
skipped.append(f"escape: {name}")
continue
target = dest / name
if not _is_within(dest, target):
skipped.append(f"escape: {name}")
continue
# Symlink entries — skip rather than risk traversing out.
if ((info.external_attr >> 16) & 0o120000) == 0o120000:
skipped.append(f"symlink: {name}")
continue
if info.is_dir():
target.mkdir(parents=True, exist_ok=True)
continue
# Skip if already extracted with matching size (idempotent rerun).
if target.exists() and target.stat().st_size == info.file_size:
continue
target.parent.mkdir(parents=True, exist_ok=True)
try:
with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out:
while True:
chunk = src.read(65536)
if not chunk:
break
out.write(chunk)
except RuntimeError as e:
# zipfile raises RuntimeError for bad-password / AES-encrypted.
msg = str(e)
if "Bad password" in msg or "password required" in msg:
return (
f"Error: bad or missing password for {zip_path}. "
f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib "
f"cannot decrypt it — use `7z x -p<pwd> ...` "
f"externally and point the source path at the result."
)
raise
extracted += 1
total_bytes += info.file_size
except zipfile.BadZipFile as e:
return f"Error: {zip_path} is not a valid zip archive: {e}"
except Exception as e:
return f"Error extracting {zip_path}: {e}"
parts = [
f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}",
]
if skipped:
parts.append(f"Skipped {len(skipped)} unsafe entries:")
for s in skipped[:10]:
parts.append(f" - {s}")
if len(skipped) > 10:
parts.append(f" ... ({len(skipped) - 10} more)")
return "\n".join(parts)
async def unzip_archive(
zip_path: str, dest_dir: str, password: str | None = None,
) -> str:
"""Extract *zip_path* into *dest_dir*. Idempotent on rerun.
Defensive: skips entries with absolute paths, any '..' path component, or
that would resolve outside *dest_dir* (the classic zip-slip vector). Symlink
entries are skipped too (we never follow symlinks into the host filesystem).
Password-protected zips need the password argument (or
``meta.password`` on the source in case.yaml) — stdlib ``zipfile``
only handles the legacy ZipCrypto algorithm.
"""
return _do_extract(zip_path, dest_dir, password)
def unzip_archive_sync(
zip_path: str, dest_dir: str, password: str | None = None,
) -> str:
"""Synchronous variant of :func:`unzip_archive` for startup-time prepare_source.
Same behaviour, just no async wrapping — used before the event loop
starts so we don't have to spin one up just to unpack a zip.
"""
return _do_extract(zip_path, dest_dir, password)
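
The extraction loop above depends on `_is_within(dest, target)`, whose body is not part of this hunk. A minimal sketch of what such a containment check might look like, assuming it resolves both paths and tests ancestry; the name `is_within_sketch` and the example paths are hypothetical, not taken from the repository:

```python
from pathlib import Path


def is_within_sketch(base: Path, target: Path) -> bool:
    """Return True if *target*, once resolved, is *base* or lies beneath it."""
    base_resolved = base.resolve()
    target_resolved = target.resolve()
    # resolve() collapses '..' components, so an escaping entry ends up
    # outside the resolved base and fails the ancestry test.
    return target_resolved == base_resolved or base_resolved in target_resolved.parents


dest = Path("/tmp/case/extracted")  # hypothetical destination directory
print(is_within_sketch(dest, dest / "docs/report.txt"))   # True
print(is_within_sketch(dest, dest / "../../etc/passwd"))  # False
```

Resolving before comparing is what makes the check robust: a plain string-prefix comparison would accept `dest/../../etc/passwd` because it still starts with the destination path.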

87
tools/media.py Normal file

@@ -0,0 +1,87 @@
"""Media plugin — OCR for image evidence.
DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run
OCR locally for any image-bearing evidence. Tesseract via pytesseract is
the default; if the runtime is missing those packages, the tool returns a
clear install hint rather than failing silently.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
logger = logging.getLogger(__name__)
MAX_OUTPUT = 8000
_INSTALL_HINT = (
"Error: OCR runtime not available. Install with:\n"
" pip install pytesseract pillow\n"
" sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n"
"(or the equivalent for your distribution). Then retry."
)
def _has_ocr_runtime() -> tuple[bool, str]:
"""Return (available, reason). reason is empty when available."""
try:
import pytesseract # noqa: F401
from PIL import Image # noqa: F401
except ImportError as e:
return False, f"missing python package: {e.name}"
# Check the tesseract binary too.
import shutil
if shutil.which("tesseract") is None:
return False, "tesseract binary not on PATH"
return True, ""
async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str:
"""Extract text from an image via tesseract.
*lang* defaults to English + Simplified + Traditional Chinese, matching
the multi-language artefacts the current case involves. Pass a single
language code (e.g. ``"eng"``) to skip language packs that aren't
installed.
"""
p = Path(file_path)
if not p.is_file():
return f"Error: {file_path} is not a file."
available, reason = _has_ocr_runtime()
if not available:
return f"{_INSTALL_HINT}\n[detail: {reason}]"
import pytesseract
from PIL import Image
try:
img = Image.open(p)
except Exception as e:
return f"Error: could not open image {file_path}: {e}"
try:
text = pytesseract.image_to_string(img, lang=lang)
except pytesseract.TesseractError as e:
msg = str(e)
if "Failed loading language" in msg or "Error opening data file" in msg:
return (
f"Error: tesseract is installed but missing language pack(s) for {lang!r}. "
f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a "
f"different `lang`. Detail: {msg}"
)
return f"Error running tesseract: {msg}"
except Exception as e:
return f"Error during OCR: {e}"
size = p.stat().st_size
header = (
f"ocr: {file_path} ({size} bytes, lang={lang}, "
f"{len(text.splitlines())} line(s))\n"
)
if len(text) > MAX_OUTPUT - len(header):
body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]"
else:
body = text
return header + body
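
`ocr_image` caps its result so one tool call cannot flood the model context. As written, the `[truncated]` marker is appended after the cut, so a truncated result can run a few characters over `MAX_OUTPUT`. A small sketch of a variant that reserves room for the marker; `cap_output` and `TRUNCATION_MARKER` are illustrative names, not part of the module:

```python
MAX_OUTPUT = 8000
TRUNCATION_MARKER = "\n[truncated]"


def cap_output(header: str, text: str, limit: int = MAX_OUTPUT) -> str:
    """Join header and text, keeping the total length at or below *limit*."""
    budget = limit - len(header)
    if len(text) <= budget:
        return header + text
    # Reserve room for the marker so the final string never exceeds the limit.
    keep = max(0, budget - len(TRUNCATION_MARKER))
    return header + text[:keep] + TRUNCATION_MARKER


short = cap_output("ocr: demo.png\n", "two words")
long_result = cap_output("ocr: demo.png\n", "x" * 20000)
print(len(short), len(long_result))  # 23 8000
```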

160
tools/mobile_android.py Normal file

@@ -0,0 +1,160 @@
"""Android plugin tools — partition survey + sector translation.
DESIGN.md §4.7 Android: ``mmls`` partitions → per-partition image-mode source;
``fsstat`` per partition to classify ext4/F2FS/raw/encrypted. The shared TSK
toolchain already handles ext4/F2FS reads, so once the agent picks a partition
offset the standard list_directory / extract_file / search_strings tools work.
Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but
TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions``
emits BOTH unit systems so the agent can plug the right ``partition_offset``
value into ``set_active_partition``.
"""
from __future__ import annotations
import asyncio
import logging
import re
from pathlib import Path
logger = logging.getLogger(__name__)
MAX_OUTPUT = 8000
# Partitions worth flagging when we encounter them — informs the agent's
# strategy. Not exhaustive; just opinionated hints.
_PARTITION_HINTS: dict[str, str] = {
"EFS": "modem firmware area; often contains IMEI / MAC / serial",
"PARAM": "boot parameters; cmdline + flags",
"BOOT": "kernel + initramfs (raw image)",
"RECOVERY": "recovery image (raw)",
"SYSTEM": "Android /system — read-only OS partition (ext4)",
"CACHE": "downloaded OTA payloads; usually transient",
"USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices",
"PERSISTENT": "Samsung persistent partition; carrier/device flags",
"STEADY": "Samsung steady-state config",
"HIDDEN": "Samsung hidden partition; check before assuming empty",
"CP_DEBUG": "modem debug logs",
"TOMBSTONES": "userland crash dumps",
}
def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]:
"""Parse mmls output, returning (sector_size_bytes, partitions).
mmls states ``Units are in N-byte sectors`` near the top; we extract N
to translate between image-native units and the 512-byte units TSK
tools accept via ``-o``.
"""
sector_size = 512
m = re.search(r"Units are in (\d+)-byte sectors", output)
if m:
sector_size = int(m.group(1))
parts: list[dict] = []
for line in output.splitlines():
m = re.match(
r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)",
line,
)
if not m:
continue
_row, slot, start, end, length, desc = m.groups()
if slot == "Meta" or slot.startswith("---"):
continue
parts.append({
"slot": slot,
"start_native": int(start),
"end_native": int(end),
"length_native": int(length),
"description": desc.strip(),
})
return sector_size, parts
async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]:
proc = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
except asyncio.TimeoutError:
proc.kill()
await proc.wait()  # reap the killed child so it does not linger as a zombie
return 124, "", f"timeout after {timeout}s"
return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace")
_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE)
async def _classify_partition(image_path: str, sector_offset_512: int) -> str:
"""Run fsstat on a partition; return its reported type (e.g. 'Ext4', 'Yaffs2', 'FAT') or 'unknown'.
Any fsstat failure, including "Cannot determine file system type", is
reported as 'unknown'. That typically means a raw image (BOOT/RECOVERY/RADIO/…)
or encrypted data (modern userdata under FBE).
"""
rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15)
if rc != 0:
return "unknown"
m = _FS_TYPE_RE.search(out)
if m:
return m.group(1)
return "unknown"
async def probe_android_partitions(image_path: str) -> str:
"""Survey every partition on an Android disk dump and return a table.
The agent reads this once to plan its work: which partitions are
Ext4/F2FS (use TSK), which are raw (extract image / strings only),
which are encrypted (skip until decrypted).
"""
p = Path(image_path)
if not p.is_file():
return f"Error: {image_path} is not a file."
rc, out, err = await _run(["mmls", str(p)], timeout=30)
if rc != 0:
return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}"
sector_size, parts = _parse_mmls_with_unit(out)
if not parts:
return f"No partitions detected in {image_path}."
lines = [
f"Android partition survey: {image_path}",
f" mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)",
f" {len(parts)} data partitions",
"",
"| slot | name | start (native) | start (512-sector) | size | fs_type | hint |",
"|---|---|---:|---:|---|---|---|",
]
for prt in parts:
sector_512 = prt["start_native"] * sector_size // 512
bytes_size = prt["length_native"] * sector_size
# human-readable size
if bytes_size >= 1 << 30:
size_h = f"{bytes_size / (1 << 30):.1f} GB"
elif bytes_size >= 1 << 20:
size_h = f"{bytes_size / (1 << 20):.1f} MB"
else:
size_h = f"{bytes_size // 1024} KB"
fs_type = await _classify_partition(str(p), sector_512)
# Try to extract a friendly partition name from the description
# (mmls description often includes the partition name uppercase).
name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"])
pname = name_match.group(0) if name_match else prt["description"][:20]
hint = _PARTITION_HINTS.get(pname, "")
lines.append(
f"| {prt['slot']} | {pname} | {prt['start_native']} | "
f"{sector_512} | {size_h} | {fs_type} | {hint} |"
)
body = "\n".join(lines)
if len(body) > MAX_OUTPUT:
body = body[:MAX_OUTPUT] + "\n\n[truncated]"
return body
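
The sector translation that `probe_android_partitions` performs is simple arithmetic, but it is the step most likely to go wrong when an offset is entered by hand. A short sketch of the conversion, assuming the same `Units are in N-byte sectors` header that the parser above looks for; the helper names and the sample numbers are illustrative only:

```python
import re


def sector_size_from_mmls(output: str, default: int = 512) -> int:
    """Read the sector size that mmls announces, falling back to *default*."""
    m = re.search(r"Units are in (\d+)-byte sectors", output)
    return int(m.group(1)) if m else default


def native_to_512(start_native: int, sector_size: int) -> int:
    """Convert a partition start from image-native sectors to 512-byte sectors."""
    return start_native * sector_size // 512


# A hypothetical partition starting at native sector 6144 on a 4096-byte-sector dump.
size = sector_size_from_mmls("GUID Partition Table (EFI)\nUnits are in 4096-byte sectors\n")
print(size, native_to_512(6144, size))  # 4096 49152
print(native_to_512(2048, 512))         # 2048
```

On a 4096-byte-sector image the 512-byte offset is always eight times the native one; on a 512-byte-sector image the two columns of the survey table are identical.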

274
tools/mobile_ios.py Normal file

@@ -0,0 +1,274 @@
"""iOS extraction parsers — plist / sqlite / keychain / iDevice info.
DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth
Kit, no graph dependency. Stdlib + sqlite3 only.
iOS extractions typically arrive as a zip containing domain-rooted trees
(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary,
binary/XML plists, and several SQLite databases (sms.db, AddressBook,
keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite).
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import plistlib
import re
import sqlite3
from pathlib import Path
logger = logging.getLogger(__name__)
# Output cap (chars) — keeps a single tool result under the LLM context budget.
MAX_OUTPUT = 8000
def _trunc(text: str, limit: int = MAX_OUTPUT) -> str:
if len(text) <= limit:
return text
return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]"
# ---------------------------------------------------------------------------
# plist
# ---------------------------------------------------------------------------
def _to_jsonable(obj):
"""Make plist values JSON-serializable: bytes → hex preview, dates → iso."""
import datetime
if isinstance(obj, bytes):
if len(obj) <= 64:
return {"_bytes_hex": obj.hex()}
return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)}
if isinstance(obj, datetime.datetime):
return obj.isoformat()
if isinstance(obj, dict):
return {str(k): _to_jsonable(v) for k, v in obj.items()}
if isinstance(obj, (list, tuple)):
return [_to_jsonable(v) for v in obj]
return obj
async def parse_plist(file_path: str) -> str:
"""Parse a .plist file (XML or binary) and return its contents as JSON.
Both formats are handled transparently by ``plistlib.load``.
"""
p = Path(file_path)
if not p.is_file():
return f"Error: {file_path} is not a file."
try:
with open(p, "rb") as f:
data = plistlib.load(f)
except plistlib.InvalidFileException as e:
return f"Error: {file_path} is not a valid plist ({e})"
except Exception as e:
return f"Error parsing plist {file_path}: {e}"
serial = _to_jsonable(data)
rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str)
header = f"plist: {file_path} ({p.stat().st_size} bytes)\n"
return header + _trunc(rendered)
# ---------------------------------------------------------------------------
# sqlite
# ---------------------------------------------------------------------------
_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
async def sqlite_tables(db_path: str) -> str:
"""List user tables in a sqlite file with row counts and column names."""
p = Path(db_path)
if not p.is_file():
return f"Error: {db_path} is not a file."
try:
conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
except sqlite3.OperationalError as e:
return f"Error opening {db_path} (read-only): {e}"
try:
cur = conn.cursor()
cur.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
)
tables = [r[0] for r in cur.fetchall()]
if not tables:
return f"No user tables in {db_path}."
lines = [f"sqlite: {db_path} ({len(tables)} tables)"]
for name in tables:
try:
cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
count = cur.fetchone()[0]
except sqlite3.DatabaseError as e:
count = f"(count failed: {e})"
try:
cur.execute(f"PRAGMA table_info(\"{name}\")")
cols = [r[1] for r in cur.fetchall()]
except sqlite3.DatabaseError:
cols = []
lines.append(f" {name}: {count} row(s); cols: {', '.join(cols)}")
return _trunc("\n".join(lines))
finally:
conn.close()
async def sqlite_query(
db_path: str,
query: str,
max_rows: int = 100,
) -> str:
"""Run a single read-only SELECT against a sqlite file.
Multi-statement queries and anything other than a SELECT are rejected
(we open the database in read-only mode anyway, so writes would fail
too — but the explicit check keeps the agent honest).
"""
if not _SELECT_RE.match(query):
return "Error: only single SELECT statements are allowed."
# Strip surrounding whitespace first so "SELECT 1;\n" is not misread as multi-statement.
if ";" in query.strip().rstrip(";"):
return "Error: multi-statement queries are not allowed."
p = Path(db_path)
if not p.is_file():
return f"Error: {db_path} is not a file."
try:
conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
except sqlite3.OperationalError as e:
return f"Error opening {db_path} (read-only): {e}"
try:
cur = conn.cursor()
try:
cur.execute(query)
except sqlite3.DatabaseError as e:
return f"Error executing query: {e}"
cols = [d[0] for d in cur.description] if cur.description else []
rows = cur.fetchmany(max(1, int(max_rows)))
lines = [
f"sqlite query: {db_path}",
f"columns: {cols}",
f"rows ({len(rows)}, capped at {max_rows}):",
]
for row in rows:
rendered = [
(v.hex() if isinstance(v, bytes) else str(v))
for v in row
]
lines.append(" " + " | ".join(rendered))
return _trunc("\n".join(lines))
finally:
conn.close()
# ---------------------------------------------------------------------------
# iOS keychain (keychain-2.db)
# ---------------------------------------------------------------------------
# Standard iOS keychain tables. genp = generic passwords, inet = internet
# passwords, cert = certificates, keys = key material. Forensic extractions
# of locked keychains have ``data`` columns NULL but accounting metadata
# (agrp, acct, svce) intact — already useful for attribution work.
_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys")
async def parse_ios_keychain(keychain_root: str) -> str:
"""Locate and summarize iOS keychain entries under *keychain_root*.
*keychain_root* may be a path to ``keychain-2.db`` directly or to a
directory that contains it (e.g. ``.../var/keychains``).
"""
root = Path(keychain_root)
db: Path | None = None
if root.is_file() and root.name == "keychain-2.db":
db = root
elif root.is_dir():
candidate = root / "keychain-2.db"
if candidate.is_file():
db = candidate
else:
# Fall back to a full recursive search and take the first match.
for found in root.rglob("keychain-2.db"):
db = found
break
if db is None:
return f"No keychain-2.db found under {keychain_root}."
try:
conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True)
except sqlite3.OperationalError as e:
return f"Error opening {db}: {e}"
try:
cur = conn.cursor()
cur.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name IN ({})".format(
",".join("?" * len(_KEYCHAIN_TABLES))
),
_KEYCHAIN_TABLES,
)
present = [r[0] for r in cur.fetchall()]
if not present:
return f"keychain-2.db at {db} has no recognised tables."
lines = [f"keychain: {db}"]
for name in present:
cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
count = cur.fetchone()[0]
lines.append(f"\n[{name}] {count} row(s)")
cur.execute(f"PRAGMA table_info(\"{name}\")")
cols = [r[1] for r in cur.fetchall()]
# Pick a useful subset of accounting columns when present.
preferred = [
c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr")
if c in cols
]
if not preferred:
preferred = cols[:5]
sel = ", ".join(f'"{c}"' for c in preferred)
cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30")
for row in cur.fetchall():
lines.append(" " + " | ".join(
(v.hex() if isinstance(v, bytes) else str(v))
for v in row
))
return _trunc("\n".join(lines))
finally:
conn.close()
# ---------------------------------------------------------------------------
# iDevice_info.txt
# ---------------------------------------------------------------------------
async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str:
"""Read the standard iDevice_info.txt summary at the root of an iOS extraction.
The file is a flat ``Key: value`` dump from libimobiledevice / native
extraction tools. We surface the first *max_chars* of content verbatim
— the agent can search/extract specific keys via search_text_file if
the head isn't enough.
"""
p = Path(file_path)
if p.is_dir():
# Be helpful: if the agent passed the extraction root, find the file.
candidate = p / "iDevice_info.txt"
if candidate.is_file():
p = candidate
if not p.is_file():
return f"Error: {file_path} is not a file."
try:
with open(p, "r", encoding="utf-8", errors="replace") as f:
content = f.read(max_chars)
size = p.stat().st_size
header = f"iDevice_info: {p} ({size} bytes)\n"
if size > max_chars:
content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars}]"
return header + content
except Exception as e:
return f"Error reading {file_path}: {e}"
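
The SQLite helpers above rely on opening evidence databases through a `file:` URI with `mode=ro`, so that a stray write fails instead of altering the evidence. A self-contained sketch of that behaviour on a throwaway database; the schema is made up for the demo, and building the URI with `Path.as_uri()` is a suggestion (it percent-encodes characters such as `?` or `#` in the path), not what the module currently does:

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "demo.db"
    # Build a small throwaway database (illustrative schema, not a real iOS store).
    rw = sqlite3.connect(db_path)
    rw.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT)")
    rw.execute("INSERT INTO message (text) VALUES ('hello')")
    rw.commit()
    rw.close()

    # Reopen read-only through a file: URI.
    ro = sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
    try:
        rows = ro.execute("SELECT id, text FROM message").fetchall()
        try:
            ro.execute("INSERT INTO message (text) VALUES ('tamper')")
            write_blocked = False
        except sqlite3.OperationalError:
            # SQLite refuses the write because the connection is read-only.
            write_blocked = True
    finally:
        ro.close()

print(rows, write_blocked)  # [(1, 'hello')] True
```

The read-only connection is the real safeguard; the `SELECT`-only regex in `sqlite_query` is a second, cheaper check that gives the agent a clearer error message.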


@@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str:
return f"[Error parsing Prefetch: {e}]"
-async def list_extracted_dir(dir_path: str) -> str:
-"""List files in an extracted directory."""
async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str:
"""Smart summary of a (potentially huge) extracted tree.
Earlier versions dumped up to 200 random entries then truncated — that
leaves the agent blind on 10k+-file iOS extractions. The new layout
returns a compact summary that scales: total counts, extension
breakdown, top-level directories with their sizes, and the largest
files. For targeted lookups (e.g. find every ``*.sqlite`` under the
tree) the agent should use ``find_files`` instead.
"""
if not os.path.isdir(dir_path):
return f"[Error: {dir_path} is not a directory]"
try:
-entries = []
-for root, dirs, files in os.walk(dir_path):
total_files = 0
total_bytes = 0
ext_counts: dict[str, int] = {}
ext_bytes: dict[str, int] = {}
top_level_dirs: dict[str, dict] = {}
biggest: list[tuple[int, str]] = [] # (size, relpath)
dir_path_abs = os.path.abspath(dir_path)
for root, dirs, files in os.walk(dir_path_abs):
# Track top-level directory aggregates (cheap; no per-entry cost
# beyond the walk we're already doing).
rel_root = os.path.relpath(root, dir_path_abs)
if rel_root == ".":
top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs}
top_level_dirs.update(top_dirs)
top_key = None
else:
top_key = rel_root.split(os.sep, 1)[0]
if top_key not in top_level_dirs:
top_level_dirs[top_key] = {"files": 0, "bytes": 0}
for f in files:
full = os.path.join(root, f)
-rel = os.path.relpath(full, dir_path)
-size = os.path.getsize(full)
-entries.append(f" {rel} ({size} bytes)")
-if len(entries) > 200:
-entries.append(f" ... (truncated)")
-break
try:
size = os.path.getsize(full)
except OSError:
continue
total_files += 1
total_bytes += size
ext = os.path.splitext(f)[1].lower() or "(no ext)"
ext_counts[ext] = ext_counts.get(ext, 0) + 1
ext_bytes[ext] = ext_bytes.get(ext, 0) + size
if top_key is not None:
top_level_dirs[top_key]["files"] += 1
top_level_dirs[top_key]["bytes"] += size
# Maintain a top-10 largest list cheaply (bounded insertion).
if len(biggest) < 10:
biggest.append((size, os.path.relpath(full, dir_path_abs)))
biggest.sort(reverse=True)
elif size > biggest[-1][0]:
biggest[-1] = (size, os.path.relpath(full, dir_path_abs))
biggest.sort(reverse=True)
-return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries)
def _human(n: int) -> str:
for unit in ("B", "KB", "MB", "GB"):
if n < 1024:
return f"{n:.1f}{unit}" if unit != "B" else f"{n}B"
n /= 1024
return f"{n:.1f}TB"
lines = [
f"Directory: {dir_path}",
f" Total: {total_files} file(s), {_human(total_bytes)}",
]
# Top-level directory layout (immediate children, sorted by file count).
if top_level_dirs:
lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):")
sorted_tlds = sorted(
top_level_dirs.items(), key=lambda kv: -kv[1]["files"],
)[:15]
for d, stats in sorted_tlds:
lines.append(
f" {d}/ ({stats['files']} files, {_human(stats['bytes'])})"
)
if len(top_level_dirs) > 15:
lines.append(f" ... ({len(top_level_dirs) - 15} more top-level dirs)")
# Extension breakdown.
if ext_counts:
lines.append(f"\nExtension breakdown (top 15):")
for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]:
lines.append(
f" {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}"
)
# Largest files (often the highest-value forensic targets).
if biggest:
lines.append("\nLargest files:")
for size, rel in biggest:
lines.append(f" {rel} ({_human(size)})")
lines.append(
f"\nNext step: call find_files with a pattern like "
f"'**/*.plist' or '**/keychain-2.db' to locate specific artefacts."
)
return "\n".join(lines)
except Exception as e:
return f"[Error listing {dir_path}: {e}]"
async def find_files(
root: str,
pattern: str,
max_results: int = 500,
) -> str:
"""Recursively find files under *root* whose path matches *pattern*.
Uses fnmatch-style globs against the *full relative path*; ``**`` is
treated as "any number of path segments" (so ``**/*.plist`` finds
every plist no matter how deep). Examples:
- ``**/sms.db`` — iOS SMS database
- ``**/keychain-2.db`` — iOS keychain
- ``**/ChatStorage.sqlite`` — WhatsApp app store
- ``HomeDomain/Library/**`` — anchor at a known iOS domain root
Brace patterns such as ``**/*.{plist,sqlite,db}`` are not supported (fnmatch
has no brace expansion); make one call per extension instead.
Results are sorted by size descending — the biggest hits usually
matter most. Capped at *max_results* to keep the LLM context bounded.
"""
import fnmatch
if not os.path.isdir(root):
return f"[Error: {root} is not a directory]"
root_abs = os.path.abspath(root)
# Convert ``**`` (any-depth) to fnmatch's ``*`` (any chars including /).
# fnmatch doesn't natively distinguish segment vs path; expanding ``**``
# to ``*`` and letting fnmatch match the full relpath is good enough for
# forensic lookups.
fn_pattern = pattern.replace("**", "*")
hits: list[tuple[int, str]] = []
truncated = False
try:
for dirpath, _dirs, files in os.walk(root_abs):
for f in files:
full = os.path.join(dirpath, f)
rel = os.path.relpath(full, root_abs)
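# ``**/x`` must also match ``x`` at the root, so retry without the leading ``*/``.
if (
fnmatch.fnmatch(rel, fn_pattern)
or fnmatch.fnmatch(f, fn_pattern)
or (fn_pattern.startswith("*/") and fnmatch.fnmatch(rel, fn_pattern[2:]))
):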
try:
size = os.path.getsize(full)
except OSError:
size = 0
hits.append((size, rel))
if len(hits) >= max_results * 4:
# Hard upper bound to keep the walk cheap on huge trees.
truncated = True
break
if truncated:
break
except Exception as e:
return f"[Error searching {root}: {e}]"
hits.sort(reverse=True)
if len(hits) > max_results:
truncated = True
hits = hits[:max_results]
lines = [
f"find_files: pattern={pattern!r} under {root}",
f" matches: {len(hits)}" + (" (truncated)" if truncated else ""),
]
if not hits:
lines.append(" (no matches)")
else:
for size, rel in hits:
lines.append(f" {rel} ({size} bytes)")
return "\n".join(lines)
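
`find_files` leans on two properties of `fnmatch` that are easy to misjudge: `*` matches across `/`, and braces are literal characters. The sketch below exercises both with a handful of made-up relative paths; `expand_double_star` is a hypothetical helper that mirrors the `pattern.replace("**", "*")` step above:

```python
import fnmatch


def expand_double_star(pattern: str) -> str:
    """Collapse ``**`` to ``*``; fnmatch's ``*`` already matches across '/'."""
    return pattern.replace("**", "*")


paths = [
    "sms.db",
    "HomeDomain/Library/SMS/sms.db",
    "HomeDomain/Library/Preferences/com.apple.wifi.plist",
]

deep_pattern = expand_double_star("**/sms.db")  # becomes "*/sms.db"
deep_hits = [p for p in paths if fnmatch.fnmatch(p, deep_pattern)]
print(deep_hits)  # ['HomeDomain/Library/SMS/sms.db']

# Braces are literal characters to fnmatch, so this matches nothing here.
brace_hits = [p for p in paths if fnmatch.fnmatch(p, "*.{plist,db}")]
print(brace_hits)  # []
```

Note that the expanded pattern `*/sms.db` taken on its own requires at least one directory separator, so a root-level `sms.db` only matches if the caller also tries the pattern without its leading `*/`.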

485
tools/strategy.py Normal file

@@ -0,0 +1,485 @@
"""Strategist-loop tools — read-only views over graph state that let the
InvestigationStrategist agent decide whether to keep investigating or to
declare the investigation complete.
DESIGN_STRATEGIST.md §2. Four read-only views:
graph_overview() → hypotheses + sources + pending leads snapshot
source_coverage(src_id) → which artefact categories on this source have
been touched vs are still ✗
marginal_yield(n_rounds) → how much information the last N rounds added
budget_status() → tool calls / rounds / wall-clock against caps
These are pure render functions over the graph — they MUST NOT mutate state.
The strategist never writes phenomena/edges directly; all graph mutations
happen through worker agents that the strategist dispatches via propose_lead
(which is registered separately in tool_registry).
"""
from __future__ import annotations
import time
from typing import Any
# ---------------------------------------------------------------------------
# Expected artefact catalogue (per source type)
#
# These are SOFT HINTS — items the strategist might want to check on a given
# source type if any active hypothesis depends on them. The catalogue is
# intentionally compact; expand it in-place when a new forensic specialty
# joins the toolset. Each entry:
#
# name human-readable artefact category
# detector how to recognise that this category has been touched — either
# a tool name OR a `<tool>@<path-substring>` pattern, joined with
# `|` for alternatives. The matcher is substring on the tool name
# and on the args' string representation.
# value_for one-line description of why this category might matter
# ---------------------------------------------------------------------------
EXPECTED_ARTEFACTS: dict[str, list[dict[str, str]]] = {
"disk_image+windows": [
{"name": "partition layout", "detector": "partition_info|mmls",
"value_for": "deleted files, hidden partitions"},
{"name": "filesystem walk", "detector": "list_directory|fls",
"value_for": "directory tree, recoverable deleted entries"},
{"name": "registry hives", "detector": "parse_registry_key|list_installed_software|get_user_activity",
"value_for": "installed software, user activity, timezone"},
{"name": "browser history", "detector": "list_directory@AppData|read_text_file@History|read_text_file@Bookmarks",
"value_for": "URL access, downloads, web search terms"},
{"name": "prefetch", "detector": "parse_prefetch|extract_file@Prefetch",
"value_for": "program execution evidence"},
{"name": "email/IM config", "detector": "get_email_config",
"value_for": "user accounts, configured mail/IM clients"},
{"name": "recycle bin", "detector": "list_directory@$Recycle|count_deleted_files",
"value_for": "deleted file metadata and recovery"},
],
"disk_image+android": [
{"name": "partition probe", "detector": "probe_android_partitions",
"value_for": "discover EFS / SYSTEM / USERDATA layout"},
{"name": "system properties", "detector": "read_text_file@build.prop|read_text_file@default.prop",
"value_for": "device model, OS version, CSC region"},
{"name": "app inventory", "detector": "list_directory@data/app|list_directory@data/data",
"value_for": "installed apps, package names"},
{"name": "user data dbs", "detector": "list_directory@data/data|sqlite_query",
"value_for": "messages, contacts, app-specific data"},
{"name": "device identity", "detector": "search_strings@imei|search_strings@serial|search_strings@DRI",
"value_for": "IMEI, serial, device fingerprint"},
],
"mobile_extraction": [
{"name": "device info", "detector": "read_idevice_info|read_text_file@iDevice_info",
"value_for": "model, iOS version, IMEI, ICCID, Bluetooth MAC, UDID"},
{"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb",
"value_for": "contacts, owner identity"},
{"name": "SMS / iMessage", "detector": "sqlite_query@sms.db",
"value_for": "messaging content, OTP / verification codes"},
{"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite|sqlite_query@WhatsApp",
"value_for": "WhatsApp content, group membership, call records"},
{"name": "WeChat", "detector": "sqlite_query@MM.sqlite|sqlite_query@wcdb|list_directory@WeChat",
"value_for": "WeChat IDs, messages, follow targets"},
{"name": "Call history", "detector": "sqlite_query@CallHistory|sqlite_query@call_history",
"value_for": "incoming/outgoing call log"},
{"name": "Safari history", "detector": "sqlite_query@History.db|read_text_file@Bookmarks.plist|parse_plist@Bookmarks",
"value_for": "URL access, bookmarks, search queries"},
{"name": "Photos library", "detector": "sqlite_query@Photos.sqlite|parse_plist@Photos",
"value_for": "photo metadata, EXIF, geolocation, source app"},
{"name": "iCloud accounts", "detector": "parse_plist@Accounts3|parse_ios_keychain",
"value_for": "Apple ID, registered services, authentication tokens"},
{"name": "app inventory", "detector": "list_directory@Bundle/Application|list_directory@Containers",
"value_for": "installed apps, app-specific containers"},
{"name": "Wi-Fi history", "detector": "parse_plist@com.apple.wifi|read_text_file@known_networks",
"value_for": "connected SSIDs, keys, first/last seen times"},
],
"media_collection": [
{"name": "archive unpack", "detector": "unzip_archive|list_directory",
"value_for": "extract images / docs for downstream analysis"},
{"name": "OCR text", "detector": "ocr_image",
"value_for": "screenshot text content (chat, transaction, IDs)"},
{"name": "metadata", "detector": "read_binary_preview|search_strings",
"value_for": "EXIF, embedded timestamps, device fingerprints"},
],
"archive": [
{"name": "archive unpack", "detector": "unzip_archive",
"value_for": "expose contents for further analysis"},
],
}
def _key_for_source(src) -> str:
"""Return the EXPECTED_ARTEFACTS key for a source: 'disk_image+platform'
when platform is set in meta, otherwise just the source type."""
src_type = getattr(src, "type", "")
if src_type == "disk_image":
platform = (getattr(src, "meta", {}) or {}).get("platform", "").lower()
if platform:
return f"disk_image+{platform}"
return src_type
def _detector_matches(detector: str, tool_name: str, args_str: str) -> bool:
"""Return True if any '|'-separated branch of `detector` matches.
A branch like ``sqlite_query@AddressBook.sqlitedb`` requires both the
tool name (substring) AND the args (substring) to match. A branch like
``parse_prefetch`` is a tool-name-only check.
"""
for branch in detector.split("|"):
branch = branch.strip()
if not branch:
continue
if "@" in branch:
t, sub = branch.split("@", 1)
if t in tool_name and sub.lower() in args_str.lower():
return True
else:
if branch in tool_name:
return True
return False
# ---------------------------------------------------------------------------
# graph_overview()
# ---------------------------------------------------------------------------
def graph_overview(graph) -> str:
"""Render hypotheses + sources + pending leads as the strategist's
primary decision view.
Annotates each hypothesis with the count of distinct sources that
contribute supporting (positive-LR) edges. A hypothesis with many edges
but only one source is a strategist signal to seek cross-source
corroboration.
"""
lines: list[str] = ["# Investigation State", ""]
# Hypotheses table.
if graph.hypotheses:
lines.append(f"## Hypotheses ({len(graph.hypotheses)})")
lines.append("")
lines.append(
"| id | title | L | conf | status | edges_in | distinct_sources | recent_flip |"
)
lines.append("|----|-------|---|------|--------|---------:|-----------------:|--------------|")
# Sort active hypotheses first, then by absolute log-odds magnitude
# descending within each group, so the strategist sees the most decided
# active hypotheses at the top of the table.
for hid, h in sorted(
graph.hypotheses.items(),
key=lambda kv: (kv[1].status != "active", -abs(kv[1].log_odds)),
):
in_edges = graph._adj_rev.get(hid, [])
edges_in = len(in_edges)
# Distinct sources contributing edges (looked up via source
# phenomenon's source_id; entity→entity edges have no source).
distinct_sources: set[str] = set()
for e in in_edges:
src_node = graph.phenomena.get(e.source_id)
if src_node is not None and src_node.source_id:
distinct_sources.add(src_node.source_id)
# Did this hypothesis's status change in the last 2 rounds?
recent = "no"
recent_rounds = graph.investigation_rounds[-2:]
for r in recent_rounds:
before = r.hypothesis_status_snapshot_before.get(hid)
after = r.hypothesis_status_snapshot_after.get(hid)
if before and after and before != after:
recent = f"yes ({before}→{after} in R{r.round_number})"
break
title = (h.title or "")[:60].replace("|", "/")
lines.append(
f"| {hid[:14]} | {title} | {h.log_odds:+.2f} | "
f"{h.confidence:.2f} | {h.status} | {edges_in} | "
f"{len(distinct_sources)} | {recent} |"
)
lines.append("")
else:
lines.append("## Hypotheses\n\n_(none yet — Phase 2 has not produced any)_\n")
    # Sources table.
    if graph.case and graph.case.sources:
        lines.append(f"## Sources ({len(graph.case.sources)})")
        lines.append("")
        lines.append(
            "| id | type | phenomena | identities | last_touched_in_round |"
        )
        lines.append("|----|------|----------:|-----------:|----------------------|")
        for src in graph.case.sources:
            ph_count = sum(
                1 for p in graph.phenomena.values() if p.source_id == src.id
            )
            id_count = sum(
                1 for e in graph.entities.values()
                for i in e.identifiers
                if any(
                    p.source_id == src.id
                    for p in graph.phenomena.values()
                    if p.id == i.get("phenomenon_id")
                )
            )
            # Latest round that produced a phenomenon on this source, used as
            # a proxy for the last tool invocation made against it.
            last_r = ""
            for r in reversed(graph.investigation_rounds):
                if r.new_phenomena_count > 0:
                    # Heuristic: if any phenomenon created during this round
                    # was on this source, mark this round as the last touch.
                    in_round = [
                        p for p in graph.phenomena.values()
                        if p.source_id == src.id
                        and r.started_at <= p.created_at
                        and (not r.completed_at or p.created_at <= r.completed_at)
                    ]
                    if in_round:
                        last_r = f"R{r.round_number}"
                        break
            lines.append(
                f"| {src.id} | {src.type} | {ph_count} | {id_count} | {last_r} |"
            )
        lines.append("")
    # Pending leads.
    pending = [lead for lead in graph.leads if lead.status == "pending"]
    if pending:
        lines.append(f"## Pending Leads ({len(pending)})")
        lines.append("")
        lines.append("| id | from | target_agent | for_hypothesis | description |")
        lines.append("|----|------|--------------|----------------|-------------|")
        for lead in pending[:20]:
            desc = (lead.description or "")[:80].replace("|", "/")
            mh = lead.motivating_hypothesis or lead.hypothesis_id or ""
            lines.append(
                f"| {lead.id} | {lead.proposed_by or ''} | {lead.target_agent} | "
                f"{mh[:14]} | {desc} |"
            )
        if len(pending) > 20:
            lines.append(f"\n_(+{len(pending) - 20} more pending leads not shown)_")
        lines.append("")
    else:
        lines.append("## Pending Leads\n\n_(none — no investigations queued)_\n")
    # Interpretation hint at the end, plain English.
    lines.append("---")
    lines.append(
        "**Interpretation hints**: A hypothesis with many edges but only one "
        "distinct_source has fragile cross-source independence — a single "
        "edge from a *different* source would do more for it than another "
        "edge from the same source (harmonic damping makes repeats cheap). "
        "Hypotheses in the active band (0.2 < conf < 0.8) are the ones a "
        "well-targeted lead can flip. recent_flip = 'yes' means belief is "
        "still moving on that hypothesis; 'no' across 2 rounds suggests "
        "stability."
    )
    return "\n".join(lines)
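The distinct_sources column is easiest to see on a toy graph. The following is a minimal sketch of the counting step only; `Phenomenon`, `Edge`, and `count_distinct_sources` are hypothetical stand-ins introduced for illustration, not the project's real classes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Phenomenon:
    # Hypothetical stand-in for a phenomenon node in the graph.
    id: str
    source_id: str | None


@dataclass
class Edge:
    # Hypothetical stand-in; source_id is the id of the node the edge starts from.
    source_id: str


def count_distinct_sources(in_edges: list[Edge], phenomena: dict[str, Phenomenon]) -> int:
    """Count the distinct evidence sources behind a hypothesis's incoming edges."""
    sources: set[str] = set()
    for edge in in_edges:
        node = phenomena.get(edge.source_id)
        # Edges that do not start at a phenomenon (e.g. entity nodes) carry no source.
        if node is not None and node.source_id:
            sources.add(node.source_id)
    return len(sources)


phenomena = {
    "ph-1": Phenomenon("ph-1", "disk-A"),
    "ph-2": Phenomenon("ph-2", "disk-A"),
    "ph-3": Phenomenon("ph-3", "phone-B"),
}
edges = [Edge("ph-1"), Edge("ph-2"), Edge("ph-3"), Edge("ent-9")]
print(count_distinct_sources(edges, phenomena))  # 2
```

Four incoming edges collapse to two distinct sources here, which is the situation the interpretation hint describes: more edges from `disk-A` would add little, while one edge from a third source would add more.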
# ---------------------------------------------------------------------------
# source_coverage(source_id)
# ---------------------------------------------------------------------------
def source_coverage(graph, source_id: str) -> str:
    """Render which expected artefact categories have been touched on
    *source_id*, and which remain ✗.

    Output is markdown. The closing paragraph reminds the strategist that
    coverage hints are heuristics — investigate ✗ items only when an active
    hypothesis depends on them. This is the design's central guardrail
    against the system devolving into a fixed forensic checklist.
    """
    src = graph.case.get_source(source_id) if graph.case else None
    if src is None:
        return f"Error: source_id {source_id!r} not found in case."
    key = _key_for_source(src)
    expected = EXPECTED_ARTEFACTS.get(key, [])
    # Collect this source's invocation history.
    invs = [
        inv for inv in graph.tool_invocations.values()
        if inv.source_id == source_id
    ]
    # For each expected category, decide ✓ / ✗ + show example invocation if ✓.
    rows: list[tuple[str, str, str, str]] = []
    for entry in expected:
        name = entry["name"]
        detector = entry["detector"]
        value_for = entry["value_for"]
        matched: str | None = None
        for inv in invs:
            args_str = ""
            try:
                args_str = " ".join(f"{k}={v}" for k, v in (inv.args or {}).items())
            except Exception:
                args_str = str(inv.args)
            if _detector_matches(detector, inv.tool, args_str):
                matched = f"{inv.tool}({args_str[:60]})"
                break
        # The mark must be non-empty and distinct per state: the coverage
        # count below compares against "✓".
        mark = "✓" if matched else "✗"
        evidence = matched or ""
        rows.append((mark, name, evidence, value_for))
    lines: list[str] = [
        f"# Coverage of source `{source_id}` ({src.label})",
        "",
        f"Source type: `{src.type}` / access_mode: `{src.access_mode}`",
        f"Invocations made against this source: **{len(invs)}**",
        "",
    ]
    if not expected:
        lines.append(
            f"_(no expected-artefact catalogue entry for source type `{key}` — "
            "coverage cannot be assessed against a baseline)_"
        )
    else:
        lines.append(
            "| ✓/✗ | category | example invocation | what it would tell us |"
        )
        lines.append("|-----|----------|---------------------|------------------------|")
        for mark, name, evidence, value_for in rows:
            lines.append(
                f"| {mark} | {name} | {evidence[:70].replace('|','/')} | {value_for} |"
            )
        n_covered = sum(1 for r in rows if r[0] == "✓")
        n_total = len(rows)
        lines.append("")
        lines.append(f"Coverage: **{n_covered}/{n_total}** ({n_covered*100//max(n_total,1)}%)")
    # Other invocations on this source that didn't match any expected entry —
    # could be genuine novel exploration; strategist might want to know.
    lines.append("")
    lines.append("---")
    lines.append(
        "**Coverage hints are heuristics, not requirements.** Skip an item if "
        "the case theory makes it irrelevant — a financial-fraud case has no "
        "reason to OCR every photo. Investigate ✗ items only when they could "
        "materially affect an active hypothesis. If you propose a lead just "
        "because something is ✗, the strategist prompt is being misused."
    )
    return "\n".join(lines)
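The ✓/✗ marking and the coverage percentage can be sketched in isolation. The diff does not show `_detector_matches` or the shape of a catalogue `detector`, so the detector format below (a tool name plus an optional argument substring) and the tool names are assumptions made purely for illustration.

```python
def detector_matches(detector: dict, tool: str, args_str: str) -> bool:
    """Hypothetical matcher: tool name must match; an optional substring must appear in the args."""
    if detector.get("tool") != tool:
        return False
    needle = detector.get("arg_contains")
    return needle is None or needle in args_str


def coverage_rows(expected: list[dict], invocations: list[tuple[str, dict]]) -> list[tuple[str, str]]:
    """Return one (mark, category name) row per expected artefact category."""
    rows: list[tuple[str, str]] = []
    for entry in expected:
        hit = False
        for tool, args in invocations:
            args_str = " ".join(f"{k}={v}" for k, v in args.items())
            if detector_matches(entry["detector"], tool, args_str):
                hit = True
                break
        rows.append(("✓" if hit else "✗", entry["name"]))
    return rows


# Hypothetical catalogue entries and invocation history.
expected = [
    {"name": "browser history", "detector": {"tool": "parse_sqlite", "arg_contains": "History"}},
    {"name": "registry hives", "detector": {"tool": "parse_registry"}},
]
invocations = [("parse_sqlite", {"path": "/Users/x/History"})]

rows = coverage_rows(expected, invocations)
n_covered = sum(1 for mark, _ in rows if mark == "✓")
coverage_pct = n_covered * 100 // max(len(rows), 1)
print(rows, f"{n_covered}/{len(rows)}", f"{coverage_pct}%")
```

Whatever the real detector looks like, the two marks have to be distinct non-empty strings, otherwise the covered count cannot tell touched categories from untouched ones.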
# ---------------------------------------------------------------------------
# marginal_yield(last_n_rounds)
# ---------------------------------------------------------------------------
def marginal_yield(graph, last_n_rounds: int = 2) -> str:
    """Render the last N investigation rounds' yield deltas.

    Yield columns:
      - new_phenomena: phenomena created during the round
      - new_edges: edges (any direction) added during the round
      - status_flips: hypotheses whose status changed during the round

    A row of zeros means that round didn't move the graph. Two consecutive
    such rows is strong evidence of diminishing returns; the strategist
    should consider declare_investigation_complete with reason
    marginal_yield_zero.
    """
    rounds = [r for r in graph.investigation_rounds if r.completed_at]
    if not rounds:
        return (
            "# Marginal Yield\n\n"
            "_(no completed investigation rounds yet — yield not applicable)_"
        )
    recent = rounds[-max(1, last_n_rounds):]
    lines = [f"# Marginal Yield (last {len(recent)} of {len(rounds)} rounds)", ""]
    lines.append("| round | new_phenomena | new_edges | status_flips |")
    lines.append("|-------|--------------:|----------:|-------------:|")
    yields: list[tuple[int, int, int]] = []
    for r in recent:
        yields.append((r.new_phenomena_count, r.new_edges_count, r.status_flips))
        lines.append(
            f"| R{r.round_number} | {r.new_phenomena_count} | "
            f"{r.new_edges_count} | {r.status_flips} |"
        )
    # Trend interpretation aid.
    lines.append("")
    if all(y == (0, 0, 0) for y in yields):
        trend = (
            "Yield is zero across these rounds — diminishing returns are "
            "confirmed. Strongly consider declare_investigation_complete "
            "(reason: marginal_yield_zero)."
        )
    elif len(yields) >= 2:
        first = yields[0][0] + yields[0][1] + yields[0][2]
        last = yields[-1][0] + yields[-1][1] + yields[-1][2]
        if last == 0 and first > 0:
            trend = (
                "Yield collapsed to zero in the most recent round. One more "
                "well-targeted probe is reasonable; another zero-yield round "
                "after that means stop."
            )
        elif last < first / 2 and first > 0:
            trend = (
                f"Decelerating ({last}/{first} ≈ "
                f"{int(100*last/first)}% of the earlier round). Diminishing "
                "returns are accumulating."
            )
        else:
            trend = "Yield is still active — further investigation is paying off."
    else:
        trend = (
            "Only one completed round — too early to call a trend. Run at "
            "least one more before considering completion."
        )
    lines.append(f"**Trend**: {trend}")
    return "\n".join(lines)
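The trend branches are easier to check when separated from the markdown rendering. This is a minimal sketch that mirrors the branch order of `marginal_yield` on plain tuples; the `classify_trend` name and its short string labels are hypothetical, introduced only for this example.

```python
def classify_trend(yields: list[tuple[int, int, int]]) -> str:
    """Classify per-round (new_phenomena, new_edges, status_flips) tuples.

    Labels are illustrative shorthand for the prose messages in marginal_yield.
    """
    if not yields:
        return "no_data"
    # All-zero rows win first, even when only one round is present.
    if all(y == (0, 0, 0) for y in yields):
        return "zero"
    if len(yields) < 2:
        return "too_early"
    first = sum(yields[0])
    last = sum(yields[-1])
    if last == 0 and first > 0:
        return "collapsed"
    # Less than half the earlier round's total counts as decelerating.
    if first > 0 and last < first / 2:
        return "decelerating"
    return "active"


print(classify_trend([(0, 0, 0), (0, 0, 0)]))   # zero
print(classify_trend([(5, 3, 1), (0, 0, 0)]))   # collapsed
print(classify_trend([(10, 5, 1), (3, 1, 0)]))  # decelerating
print(classify_trend([(4, 2, 0), (5, 1, 1)]))   # active
```

Note that a round with zero yield followed by a productive one falls through to "active", because both the collapse and deceleration checks require the earlier total to be positive.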
# ---------------------------------------------------------------------------
# budget_status()
# ---------------------------------------------------------------------------
def budget_status(graph, budgets: dict[str, Any] | None, start_time: float | None) -> str:
    """Render budget usage against config.yaml `budgets` block.

    Counters:
      - tool_calls: len(graph.tool_invocations)
      - strategist_rounds: len(graph.investigation_rounds)
      - wall_clock_minutes: now - start_time (when start_time is supplied)
    """
    budgets = budgets or {}
    tool_calls_used = len(graph.tool_invocations)
    rounds_used = len(graph.investigation_rounds)
    minutes_used: float | None = None
    if start_time is not None:
        minutes_used = (time.monotonic() - start_time) / 60.0

    def _row(name: str, used: float, cap: Any) -> str:
        if cap is None:
            return f"| {name} | {used:g} | — | (unbounded) |"
        pct = (used / cap) * 100 if cap else 0
        return f"| {name} | {used:g} | {cap} | {pct:.0f}% |"

    lines = ["# Budget Status", ""]
    lines.append("| metric | used | cap | pct |")
    lines.append("|--------|-----:|----:|----:|")
    lines.append(_row("tool_calls", tool_calls_used, budgets.get("tool_calls_total")))
    lines.append(_row("strategist_rounds", rounds_used, budgets.get("strategist_rounds_max")))
    if minutes_used is not None:
        lines.append(_row(
            "wall_clock_minutes", round(minutes_used, 1),
            budgets.get("wall_clock_minutes_max"),
        ))
    # Pacing hint.
    lines.append("")
    flags = []
    cap_calls = budgets.get("tool_calls_total")
    cap_rounds = budgets.get("strategist_rounds_max")
    if cap_calls and tool_calls_used / cap_calls >= 0.9:
        flags.append("tool_calls budget ≥ 90% used — favour declare_complete")
    if cap_rounds and rounds_used / cap_rounds >= 0.7:
        flags.append("strategist rounds ≥ 70% used — only propose leads with high expected yield")
    if flags:
        lines.append("**Budget warnings**:")
        for f in flags:
            lines.append(f"- {f}")
    else:
        lines.append(
            "Budget room remains. Standard rule: each propose_lead should "
            "name a specific hypothesis it expects to move; otherwise skip it."
        )
    return "\n".join(lines)
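The pacing thresholds can be exercised without a graph. Below is a minimal sketch of the percentage and warning logic, assuming the same 90% and 70% cut-offs as `budget_status`; the helper names `usage_pct` and `budget_flags` are hypothetical and do not appear in the commit.

```python
from __future__ import annotations


def usage_pct(used: float, cap: float | None) -> float:
    """Percentage of a cap consumed; a missing or zero cap is reported as 0."""
    return (used / cap) * 100 if cap else 0


def budget_flags(
    tool_calls_used: int,
    cap_calls: int | None,
    rounds_used: int,
    cap_rounds: int | None,
) -> list[str]:
    """Return the names of budgets that crossed their warning threshold."""
    flags: list[str] = []
    # An unset cap (None) never triggers a warning.
    if cap_calls and tool_calls_used / cap_calls >= 0.9:
        flags.append("tool_calls")
    if cap_rounds and rounds_used / cap_rounds >= 0.7:
        flags.append("strategist_rounds")
    return flags


print(usage_pct(45, 60))                # 75.0
print(budget_flags(90, 100, 3, 10))     # ['tool_calls']
print(budget_flags(10, 100, 7, 10))     # ['strategist_rounds']
print(budget_flags(10, None, 9, None))  # []
```

The rounds threshold is deliberately lower than the tool-call threshold, so the strategist is nudged toward high-yield leads well before the hard stop on tool calls applies.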

50
uv.lock generated

@@ -170,6 +170,8 @@ source = { virtual = "." }
dependencies = [
{ name = "httpx", extra = ["socks"] },
{ name = "openai" },
{ name = "pillow" },
{ name = "pytesseract" },
{ name = "pyyaml" },
{ name = "regipy" },
]
@@ -184,6 +186,8 @@ dev = [
requires-dist = [
{ name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
{ name = "openai", specifier = ">=2.36.0" },
{ name = "pillow", specifier = ">=12.2.0" },
{ name = "pytesseract", specifier = ">=0.3.13" },
{ name = "pyyaml" },
{ name = "regipy", specifier = ">=6.2.1" },
]
@@ -222,6 +226,39 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
]
[[package]]
name = "pillow"
version = "12.2.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
{ url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
{ url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
{ url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
{ url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
{ url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
{ url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
{ url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
{ url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
{ url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
{ url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
{ url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
{ url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
{ url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
{ url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
{ url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
{ url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
{ url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
{ url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
{ url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
{ url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
{ url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
{ url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
{ url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
{ url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
]
[[package]]
name = "pluggy"
version = "1.6.0"
@@ -296,6 +333,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
]
[[package]]
name = "pytesseract"
version = "0.3.13"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "packaging" },
{ name = "pillow" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
]
[[package]]
name = "pytest"
version = "9.0.2"