Compare commits: main...8b964b5dec (12 Commits)

| SHA1 |
|---|
| 8b964b5dec |
| 388321ee30 |
| 093f3cec1f |
| a103c17bdb |
| 65745d21dc |
| ff3a05d7ce |
| 6ebbc675c1 |
| ca96f29849 |
| 8020c24776 |
| f04ccd4bc7 |
| 6b485b98f7 |
| 81ade8f7ac |

DESIGN.md (normal file, 317 lines added)

@@ -0,0 +1,317 @@
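# MASForensics System Refit Design

> Goal: refit the current "single Windows disk forensics" system into a complex forensics system
> that can handle **multiple devices, multiple actors, heterogeneous evidence, and cross-source
> correlation**. This is the sole authoritative design document
> (it merges the two earlier drafts, `REFIT_PLAN.md` and `RESEARCH_DESIGN.md`).
>
> The real case that triggered this refit: the 2025 Meiya Cup qualifier, Individual track, with
> 5 pieces of evidence (1 USB E01, 1 full Android disk `blk0_sda.bin`, 3 iOS extractions,
> 1 set of transaction screenshots) spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC.

---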
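
## 1. Design Principles (invariants that hold throughout)

1. **The LLM proposes, code adjudicates.** The LLM handles language, classification, and perception; it **holds no case state, produces no numeric values, and writes no unverified facts**. All "truth" lives in the symbolic layer.
2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-checked.
3. **The reasoning core is device-agnostic.** All device-specific logic lives in "capability plugins"; supporting a new device means writing a plugin, never changing the core.
4. **Seemingly irreversible operations (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned.

These four are not slogans: every design decision below maps to one of them.

---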
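
## 2. Diagnosis of Current Problems

| # | Problem | Location | Consequence |
|---|---|---|---|
| P1 | **The single-image assumption is deeply embedded**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects exactly one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple pieces of evidence or correlate across devices |
| P2 | **Anti-hallucination lives only in the prompt** | `base_agent.py` system prompt | Once the LLM disobeys, wrong facts enter the case record and **cannot be identified afterwards** |
| P3 | **The confidence formula has no statistical meaning and is order-dependent**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges mix, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence cannot be calibrated or defended |
| P4 | **Artifact categorization is Windows-only**: it relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artifacts fall into `other` |
| P5 | **Case information is hard-coded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires code changes |
| P6 | **Image discovery relies on extension globs**, and `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered |
| P7 | **Phenomenon carries no source attribution** | `evidence_graph.py:85` `Phenomenon` | No way to tell which device a finding came from; cross-source correlation has no anchor |

The refit both onboards the new evidence and fixes the inherent defects P1-P7.

---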

## 3. Target Architecture

```
case.yaml ──► Case ──► N × EvidenceSource
                        ├ id / type / owner / path
                        └ access_mode: image | tree
                                  │
                   ┌──────────────┴───────────────┐
              image-backed                   tree-backed
         (TSK, inode addressing)   (path addressing: mounted/unpacked)
                   │                              │
                   └────────────┬─────────────────┘
                                ▼
   SourceRegistry ── source_id → SourceHandle (resolves path/offset/mode)
                                │
   ToolRegistry ── tools registered per access_mode, bound to source_id at call time
                                │
         ┌──────────────────────┼───────────────────────┐
         ▼                      ▼                       ▼
  Knowledge-Source        Graph Write Gateway      ToolInvocationLog
  Agents (LLM) ──►        (sole write entry;       (every tool call is logged:
  write to the graph      enforced precondition    args / output / sha256)
  only via the gateway    = grounding)
         │                      │
         └──────────────────────┴──► Grounded Evidence Graph (GEG)
                                      Phenomenon / Hypothesis / Entity
                                      confidence = log-odds accumulation
```

**Keep** the existing five-phase pipeline, disconnect recovery, run archiving, tool-result caching,
and `AgentFactory` dynamic composition. These designs are sound; they are adapted, not rewritten.

---
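
## 4. Core Design

### 4.1 Evidence Source Abstraction (addresses P1/P5/P6/P7; the foundation)

Add `case.py`:

- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (associated person),
  `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory).
- **`Case`**: holds a `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`.
- **`access_mode` is the key design distinction**:
  - `image`: block device / disk image, addressed by inode via TSK (USB E01, each partition of the Android `blk0_sda`).
  - `tree`: mounted filesystem or unpacked directory, addressed by path (iOS extractions after unzipping, expanded archives).
- Tools are registered in families by access_mode (see 4.2). A piece of evidence can be turned from image into tree
  through "preparation" (e.g. mounting a partition, unpacking a zip).

`select_image_interactive` in `main.py` (:91-153) changes to loading/constructing a `Case`;
`_IMAGE_GLOBS` is replaced by type detection (`mmls` probing + file-header sniffing) instead of relying on extensions.
`cfreds_hacking_case` is removed from `config.yaml`, and case information moves into `case.yaml`.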
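
As a minimal sketch of the abstraction above (not the project's actual `case.py`), the two dataclasses and an extension-free type check could look like the following. The `EvidenceSource` field names come from the list above; `Case.name`, `sniff_type`, the `_MAGIC` table, and the type labels it returns are assumptions made for illustration, and the `mmls` probe is left out.

```python
from dataclasses import dataclass, field
from pathlib import Path

# The file signatures are well known; the type labels on the right are illustrative.
_MAGIC = {
    b"EVF\x09\x0d\x0a\xff\x00": "disk_image",  # EWF/E01 segment file
    b"PK\x03\x04": "archive",                  # ZIP local file header
    b"SQLite format 3\x00": "sqlite_db",       # SQLite database
}


def sniff_type(header: bytes) -> str:
    """Guess a source type from leading bytes rather than the file extension."""
    for magic, kind in _MAGIC.items():
        if header.startswith(magic):
            return kind
    # Raw dumps such as blk0_sda.bin are not matched here;
    # a partition-table probe (e.g. mmls) would be tried next.
    return "unknown"


@dataclass
class EvidenceSource:
    id: str
    label: str
    type: str
    owner: str
    path: Path
    access_mode: str                       # "image" or "tree"
    meta: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.access_mode not in ("image", "tree"):
            raise ValueError(f"unknown access_mode: {self.access_mode!r}")


@dataclass
class Case:
    name: str                              # hypothetical metadata field
    sources: list[EvidenceSource] = field(default_factory=list)

    def get_source(self, source_id: str) -> EvidenceSource:
        for src in self.sources:
            if src.id == source_id:
                return src
        raise KeyError(source_id)
```

Validating `access_mode` at construction time keeps a malformed `case.yaml` entry from reaching tool registration, where the failure would be harder to trace.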
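
### 4.2 Tool Registration Parameterized by Source (addresses P1)

Current state: `register_all_tools(image_path, offset, ...)` closes every tool over a single image
(`tool_registry.py:159+`). The refit:

- Tool executor signatures gain `source_id`; at execution time the real path/offset/mode is resolved through `SourceRegistry`.
- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the tool set an agent receives is determined by
  the source types it is responsible for.
- **"Current source" context**: the orchestrator sets a current source for the agent (analogous to the existing
  `graph._current_agent`), and tools act on it by default, so the LLM does not have to pass `source_id` on every call
  (fewer mistakes). Cross-source tools (timeline merging, entity queries) span sources explicitly.
- The cache key `_cache_key` (`tool_registry.py:41`) incorporates `source_id` to prevent cross-source contamination.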
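
The last point can be illustrated with a short sketch. It assumes tool arguments are JSON-serializable; the function name `cache_key` and the payload layout are hypothetical and are not taken from the existing `_cache_key`.

```python
import hashlib
import json


def cache_key(source_id: str, tool: str, args: dict) -> str:
    """Derive a cache key that differs whenever the source, tool, or arguments differ."""
    payload = json.dumps(
        {"source": source_id, "tool": tool, "args": args},
        sort_keys=True,   # argument order must not change the key
        default=str,      # fall back to str() for non-JSON values such as Path
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With the source in the key, the same `fls` call against two different images can no longer return each other's cached output.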
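
### 4.3 Graph Write Gateway (addresses P2; implements Principle 1)

Current state: agents write to the graph directly through tools such as `add_phenomenon`, and the constraints exist only in the prompt. The refit:

- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …)
  converge on **a single write gateway**. The gateway enforces preconditions in code.
- The "anti-hallucination rules" in the current prompt are pushed down into hard checks in the gateway. The LLM agent's four-stage workflow
  (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway beneath the RECORD step becomes stricter.
- The `mandatory_record_tools` mechanism in `base_agent.py` is kept (it guarantees that the agent actually records something).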
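
### 4.4 Evidence Grounding Constraint (addresses P2; implements Principle 2)

This is the core mechanism behind the system's reliability.

**ToolInvocationLog**: every tool invocation leaves one record
`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`.
The existing result cache (`tool_registry.py:29`) already stores deterministic output; extending it into a full log is enough.

**Phenomenon splits in two**, separating "facts" from "interpretation":

- `verified_facts`: `list[{type, value, invocation_id}]`,
  with `type ∈ {path, timestamp, inode, hash, identifier, count, ...}`.
- `interpretation`: free text, the agent's analytical narrative.

**Gateway preconditions for `add_phenomenon`**:

1. Every fact must cite an `invocation_id` that **actually occurred for this agent within this task**.
2. Code verifies that `fact.value` hits the output of that invocation:
   - text output → verbatim substring match;
   - structured/binary tool output → match against the parsed fields.
3. If any fact fails → **the whole record is rejected**, the failing facts are returned, and the agent must correct and retry.
4. If all pass → the record is written; each entry in `verified_facts` carries its `invocation_id` (so it can be re-run and re-checked),
   and `interpretation` is marked as "unverified analysis".

**Effect**: within the system, "recording a path/timestamp/hash/identifier that no tool output supports"
becomes **structurally impossible**. The LLM may still get `interpretation` wrong, but the report **renders**
verified facts (citations with re-run instructions) and interpretation (explicitly labeled analysis)
**separately**, so a human investigator can tell them apart at a glance. This is a reliability guarantee with honestly drawn boundaries.

> The existing `_make_auto_record` (`tool_registry.py:126`) turns tool output directly into a phenomenon.
> That is the special case of "trivial grounding" (the description is the output); the new design generalizes and formalizes it.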
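
A minimal sketch of preconditions 1-3 for text output follows. The names `Invocation`, `GroundingError`, and `check_grounding` are hypothetical, the log is modeled as a plain dict keyed by `invocation_id`, and the "within this task" scoping and the structured-output branch are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    invocation_id: str
    agent: str
    output: str


class GroundingError(Exception):
    """Raised when at least one fact is not supported by its cited invocation."""

    def __init__(self, failed: list[dict]):
        super().__init__(f"{len(failed)} fact(s) failed grounding")
        self.failed = failed


def check_grounding(facts: list[dict], log: dict[str, Invocation], agent: str) -> None:
    """Reject the whole record if any single fact is unsupported."""
    failed = []
    for fact in facts:
        inv = log.get(fact["invocation_id"])
        grounded = (
            inv is not None
            and inv.agent == agent                   # must be this agent's own call
            and str(fact["value"]) in inv.output     # verbatim substring match
        )
        if not grounded:
            failed.append(fact)
    if failed:
        raise GroundingError(failed)
```

Collecting every failing fact before raising, rather than stopping at the first one, lets the agent fix all of them in a single retry.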
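
### 4.5 Hypothesis Confidence: Likelihood Ratios / Log-Odds (addresses P3)

Replace `_DEFAULT_EDGE_WEIGHTS` at `evidence_graph.py:26`, currently ad hoc deltas,
with log-odds accumulation based on **likelihood ratios (LR)**:

- Each `Phenomenon → Hypothesis` edge represents one likelihood ratio. The LLM still does only **discrete classification**
  (is this evidence direct_evidence / supports / weakens / contradicts … for this hypothesis),
  and the value `log₁₀(LR)` is looked up in a calibration table. **The LLM never emits numbers** (this continues the existing
  "LLM picks the type, code computes the value" philosophy and gives it a statistical basis).
- Confidence update:
  ```
  L_post = L_prior + Σ log₁₀(LR_i)        # log-odds; commutative → no order dependence
  confidence = 1 / (1 + 10^(−L_post))
  ```
- Edge type → `log₁₀(LR)` calibration table (initial values, to be calibrated later against labeled cases):

  | Edge type | log₁₀LR |
  |---|---:|
  | `direct_evidence` | +2.0 |
  | `supports` / `consequence_observed` | +1.0 |
  | `prerequisite_met` | +0.5 |
  | `weakens` | −0.5 |
  | `contradicts` | −2.0 |

- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are simply derived from `L_post` now.
- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
- **Harmonic damping of same-type evidence** (landed 2026-05): the k-th edge with the same `(hypothesis, edge_type)`
  contributes `log_lr_base / k`. The cumulative total is `log_lr_base · H_N` (harmonic series, ~ ln N).
  This addresses the breakdown of the naive-Bayes independence assumption, plus the runaway L=+31 caused by
  multiple agents entering the same finding into the graph repeatedly (field data from 2026-05-20).
  A single edge is unchanged (k=1, damping=1.0). **Structural signals** matter more than absolute values:
  for the strategist, `distinct_sources` is a better gauge of evidential depth than the confidence number.

A **hypothesis × evidence matrix** view is produced as a by-product, for use in reporting and lead selection.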
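
The update rule and the damping can be worked through in a few lines. This is a sketch under the table's initial values, not the code in `evidence_graph.py`; the names `LOG10_LR`, `posterior_log_odds`, and `confidence` are chosen for illustration.

```python
import math
from collections import defaultdict

# Initial calibration values from the table above.
LOG10_LR = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def posterior_log_odds(edge_types: list[str], prior_prob: float = 0.5) -> float:
    """Accumulate log10 odds; the k-th edge of a given type contributes base / k."""
    l_post = math.log10(prior_prob / (1.0 - prior_prob))
    seen: dict[str, int] = defaultdict(int)
    for edge_type in edge_types:
        seen[edge_type] += 1
        l_post += LOG10_LR[edge_type] / seen[edge_type]
    return l_post


def confidence(l_post: float) -> float:
    """Map log10 odds back to a probability."""
    return 1.0 / (1.0 + 10.0 ** (-l_post))
```

For example, a single `direct_evidence` edge on a 0.5 prior gives `L_post = 2.0` and a confidence of about 0.990, which crosses the 0.8 threshold. Three `supports` edges give `1 + 1/2 + 1/3 ≈ 1.83` rather than 3.0, and because each type's total depends only on how many edges of that type exist, the arrival order does not change the result.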
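
### 4.6 Cross-Source Entity Resolution (addresses the correlation problem of "complex scenarios"; implements Principle 4)

The core difficulty of complex forensics: the Apple ID in an iPhone keychain, the numbers in an Android SMS database,
the author of a USB file, the wallet address in a transaction screenshot. **Which of these point to the same actor?**

**Key design: "identity coreference" is itself a hypothesis.** Entity resolution is therefore not a separate subsystem
but a reuse of the hypothesis mechanism from 4.5:

- When an agent observes an identifier, it goes through the gateway's `observe_identity` and records a **typed** identifier
  (strong identifiers: IMEI / wallet address / email / phone number; weak identifiers: nickname / display name),
  attached to a provisional `Entity`.
- "Entity A ≡ Entity B" is registered as a `Hypothesis`; a shared strong identifier = a strong +LR edge,
  a shared weak identifier = a weak +LR edge, conflicting strong identifiers = a strong −LR edge, all scored with the same computation as 4.5.
- **No destructive merging**: when the threshold is crossed, a `same_as` edge is added between the two Entities (backed by that coref
  hypothesis). At query time, a connected component of `same_as` is treated as one actor. **Fully reversible, auditable,
  and open to being overturned by later contradicts evidence** (implements Principle 4).
- **Blocking**: coref hypotheses are created only between entity pairs that "share at least one identifier or have highly similar names",
  avoiding O(n²).

Cross-device timelines and "who did what, when" emerge naturally from the entity graph once it is connected by `same_as`.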
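
The query-time view of `same_as` can be sketched with a small union-find. The function name `same_as_components` and the entity ids in the usage note are hypothetical; the point is that the components are recomputed from the edge list, so dropping an edge undoes a merge without touching any entity.

```python
def same_as_components(entities: list[str], same_as_edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group entities into actors: one connected component of same_as edges per actor."""
    parent = {entity: entity for entity in entities}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in same_as_edges:
        parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for entity in entities:
        groups.setdefault(find(entity), set()).add(entity)
    # Sort for a deterministic result.
    return sorted(groups.values(), key=lambda group: sorted(group))
```

With entities `ent-a` to `ent-d` and edges `(ent-a, ent-b)` and `(ent-b, ent-c)`, the result is one three-member actor plus `ent-d` on its own; removing the second edge and recomputing splits `ent-c` off again.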
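
### 4.7 Capability Plugin Layer (onboarding 5 classes of evidence)

Each class of evidence = one `(ingestion handler, tool set, knowledge-source agent)` triple. The reasoning core is untouched.

| Plugin | Ingestion | New tools | Knowledge-source agent |
|---|---|---|---|
| **iOS extraction** | `unzip` unpacks into a `tree` source | `parse_plist` (including binary plists), `sqlite_tables`/`sqlite_query` (sms.db, WhatsApp `ChatStorage.sqlite`, contacts), `parse_ios_keychain`, `read_idevice_info` | `iOSArtifactAgent` |
| **Full Android disk** | `mmls` partitioning → one `image` source per partition; can be mounted as `tree` | Reuse TSK; ext4/F2FS reading; `fsstat` to determine encryption | Reuse filesystem + `AndroidArtifactAgent` |
| **Disk image (E01)** | Already supported (TSK includes ewf) | Existing TSK toolchain | Existing filesystem/registry |
| **Archive** | `unzip_archive` generic unpacking | - | - |
| **Media/screenshots** | - | `ocr_image` (tesseract; note that DeepSeek has no vision capability, so OCR is required) | `MediaAgent` |

**Android risk**: the `userdata` partition of `blk0_sda` is very likely FBE-encrypted. Run `fsstat` on each partition first
to find out: unencrypted → use TSK directly; encrypted with no key → only unencrypted areas such as `EFS`/`PARAM`/`system` can be analyzed.

`_auto_categorize` at `tool_registry.py:80` becomes extensible: each source plugin supplies its own
artifact categorization table instead of a global Windows keyword table (addresses P4).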
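
### 4.8 Agent Reorganization

The existing 7 agents are named after Windows artifacts (registry, communication = email/IRC,
network = browser/PCAP). They are reorganized by **investigative function**, with platform-specific agents added:

- `_AGENT_CLASSES` in `agent_factory.py` (:34-40) is extended with `ios_artifact`,
  `android_artifact`, `financial` (wallets/transactions), and `media`.
- `communication` is generalized: email + IM + SMS, across platforms.
- A new **source type → suitable agent** mapping lets Phase 1 dispatch a triage agent per source.
- The dynamic composition mechanism of `create_specialized_agent` (:69) is kept. It was always the right way to handle
  capability gaps; a larger tool catalog simply gives it a richer selection space.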
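
### 4.9 Multi-Source Orchestrator Pipeline

| Phase | Change |
|---|---|
| Phase 1 | "Single-image initial survey" → **parallel per-source triage**, with a type-appropriate agent dispatched to each source |
| Phase 2 | Hypotheses are generated across sources; identity-coreference hypotheses are first registered here |
| Phase 3 | **Strategist loop**: each round an LLM meta-agent reads the graph and chooses propose_lead or declare_complete; workers execute the leads; hypothesis edges are re-evaluated. See `DESIGN_STRATEGIST.md` for details |
| Phase 4 | Cross-source timeline merging, with **per-source timezone normalization** (iOS UTC vs Android local time) |
| Phase 5 | One consolidated report per case: hypothesis conclusions, entity association graph, and a provenance citation for every conclusion |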
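
For the Phase 4 row, per-source timezone normalization might be sketched as below. It assumes each source declares a fixed UTC offset in hours (for instance in its `meta`) and that timestamps arrive as naive ISO strings; `to_utc`, `merge_timeline`, and the source ids in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_ts: str, utc_offset_hours: float) -> datetime:
    """Interpret a naive ISO timestamp in the source's own offset and convert it to UTC."""
    source_tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromisoformat(local_ts).replace(tzinfo=source_tz).astimezone(timezone.utc)


def merge_timeline(
    events: list[tuple[str, str, str]],   # (source_id, naive timestamp, description)
    offsets: dict[str, float],            # source_id -> UTC offset in hours
) -> list[tuple[datetime, str, str]]:
    """Normalize every event to UTC, then order the merged timeline chronologically."""
    merged = [(to_utc(ts, offsets[source_id]), source_id, what) for source_id, ts, what in events]
    return sorted(merged, key=lambda event: event[0])
```

An Android event stamped 09:30 at UTC+8 becomes 01:30 UTC and therefore sorts after an iOS event stamped 01:00 UTC, even though the raw strings suggest the opposite order.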

**"The LLM decides depth" in Phase 3** (landed after the 2026-05 field run exposed that single-round triggering in Phase 3 plus log-odds inflation left all 8 pending leads undispatched): the scheduling layer moves from hard-coded decisions ("max_rounds=N, converged→stop") to being driven by an LLM meta-agent.

- The new agent `InvestigationStrategist` (`agents/strategist.py`) takes one action per round: propose 1-3 leads, or declare_investigation_complete
- 4 read-only view tools, `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`), expose scheduling signals to the LLM
- 2 write decision tools, `propose_lead` / `declare_investigation_complete`, are the strategist's mandatory_record tools
- The orchestrator reads `config.yaml:strategist.*` + `config.yaml:budgets.*` to control max_rounds and hard caps
- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations

Disconnect recovery and run archiving logic are kept; `graph_state.json` gains an `investigation_rounds[]` array that persists the strategist's decision for each round.

---

## 5. Summary of Data Model Changes

| Node/structure | Change |
|---|---|
| `EvidenceSource` | **New** first-class node (`src-*`) |
| `ToolInvocation` | **New** log record (`inv-*`), persisted with the graph |
| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; the semantically ambiguous `confidence` (default 1.0) is clarified or removed, and observation reliability is expressed through grounding |
| `Hypothesis` | + `prior_prob`, `log_odds` (accumulator); `confidence` becomes a derived value |
| `Entity` | + typed identifier set; connected across sources via `same_as` edges |
| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`); the k-th edge with the same `(hyp, edge_type)` is harmonically damped by `1/k` |
| Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) |
| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number` (strategist annotations) |
| `InvestigationRound` | **New**: provenance of each strategist round's decision + before/after snapshots + yield metrics |

`VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard deduplication in `evidence_graph.py` are adapted accordingly.

---

## 6. Component Change List

| File | Change |
|---|---|
| `case.py` | **New**: `Case` / `EvidenceSource` / `SourceRegistry` |
| `main.py` | Source selection becomes loading a `Case`; type detection replaces extension globs |
| `tool_registry.py` | Tools parameterized by `source_id`; cache key includes the source; `_auto_categorize` made extensible; `ToolInvocationLog` |
| `evidence_graph.py` | Data model changes (Section 5); LR/log-odds confidence; write gateway + grounding checks |
| `base_agent.py` | RECORD goes through the gateway; `add_phenomenon` switches to the `verified_facts`+`interpretation` interface |
| `agent_factory.py` | `_AGENT_CLASSES` extended; source type→agent mapping |
| `orchestrator.py` | Phase 1 per source; Phase 4 cross-source timezone normalization; Phase 5 consolidated report |
| `agents/` | New `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py`; `communication.py` generalized |
| `tools/` | New `mobile_ios.py` (plist/sqlite/keychain), `media.py` (OCR), `archive.py` (unpacking) |
| `config.yaml` / `case.yaml` | Remove `cfreds_hacking_case`; create the `case.yaml` evidence manifest |

---
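
## 7. Build Order (sorted by dependency)

| Stage | Content | Depends on | Value |
|---|---|---|---|
| **S1** | 4.1 evidence source abstraction + 4.2 tool parameterization + fix P6 | - | Foundation; first get it running on the USB E01 alone to verify that existing logic is not broken |
| **S2** | 4.3 write gateway + 4.4 grounding + ToolInvocationLog | S1 | Reliability core; "zero-hallucination recording" becomes quantifiable |
| **S3** | 4.5 LR/log-odds confidence | Independent (can run in parallel with S2) | Fixes P3; confidence becomes defensible |
| **S4** | 4.7 iOS plugin + 4.8 agent reorganization | S1 | Coverage 1/5 → 4/5 |
| **S5** | 4.6 cross-source entity resolution | S1+S3 | Cross-device correlation; complex-scenario capability takes shape |
| **S6** | 4.7 Android + media plugins + 4.9 orchestrator adaptation | S1+S4 | All 5 pieces of evidence onboarded |

S1+S2+S3 "make the system correct"; S4-S6 "round out the capabilities". Following the order strictly is recommended:
if S1 is unstable, everything after it is a castle in the air.

---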
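
## 8. Design Trade-offs and Open Questions

1. **The boundary of grounding for free text**: only the structured atoms in `verified_facts` are hard-verified;
   `interpretation` is not checked verbatim (an honest boundary). A second-level lint could be added: scan
   interpretation for strings that look like paths/timestamps/hashes but are not covered by any cited invocation, and warn.
2. **Initial LR calibration values are hand-set**: get things running with the initial values from Section 4.5 first; "learning LRs from labeled cases" is follow-up work.
3. **Android userdata encryption**: whether the decryption key can be obtained determines the evidential depth of the 4.7 Android plugin, so this needs to be determined early.
4. **Destructive vs reversible entity resolution**: this design chooses **reversible `same_as` edges** over destructive merging,
   trading a little query efficiency for full auditability and rollback, in line with Principle 4.
5. **Report granularity**: fixed at "one consolidated report per case", with an embedded subsection per piece of evidence plus cross-source correlation,
   rather than a standalone report per piece of evidence.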

DESIGN_STRATEGIST.md (normal file, 543 lines added)

@@ -0,0 +1,543 @@
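# Strategist Loop: Belief-Driven Rework of Phase 3

> This is a supplementary design document to DESIGN.md, covering the concrete rewrite of orchestrator Phase 3 in §4.9.
>
> **Trigger**: the first full 6-source field run on 2026-05-20 (`runs/2026-05-20T20-15-04/`)
> exposed that Phase 3 does not work: none of the 8 pending leads was dispatched, because
> log-odds inflation made every hypothesis converge immediately. Even after "harmonic damping" fixed
> the log-odds math (commit in `evidence_graph.py:update_hypothesis_confidence`),
> Phase 3 under the current architecture remains a mechanical "single-round trigger, rule-based convergence" process
> in which the LLM has no say at the scheduling layer. This design turns Phase 3 into an LLM-driven exploration loop.

---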
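
## 0. Scope

### What to do

Rework `orchestrator.py:Phase 3` from "single-round, rule-triggered" to "strategist-loop, belief-driven":
add an `InvestigationStrategist` agent + 4 decision view tools + 2 decision action tools
+ a rewritten orchestrator loop.

### What not to do

- Do not change Phase 1 (per-source triage stays as is)
- Do not change Phase 2 (HypothesisAgent is untouched; the strategist may **invoke** it but does not replace it)
- Do not change Phase 4/5 (timeline / report)
- Do not write expert-level per-source checklists (only a **soft-hint** list inside the `source_coverage` tool)
- Do not introduce new graph node types beyond the `InvestigationRound` record of §1.2; leads reuse the existing structure

### Invariants kept

- The DESIGN.md §4.3 grounding gateway; all writes still go through it
- DESIGN.md §4.5 log-odds + harmonic damping
- The DESIGN.md §4.4 boundary between verified_facts and interpretation
- Disconnect recovery (`graph_state.json` serialization stays compatible)

### Design principles

1. **"The LLM proposes, code adjudicates" moves up to the scheduling layer**: DESIGN.md's first principle is currently honored
   only at the fact layer (grounding); at the scheduling layer, "whether to dig deeper, where, and when to stop" is a hard-coded decision today.
   This design gives the LLM authority over scheduling decisions.
2. **Exam-oriented capability exists without being binding**: the system's tool set and soft-hint lists cover the artifact
   categories needed in exam scenarios; but whether to examine a given artifact, and how deeply, is decided by the strategist
   from the nature of the specific case, not forced by a predefined list.
3. **Explainable and auditable**: every round's strategist decision, motivation, and resulting yield is recorded in a persisted
   `InvestigationRound` and can be reviewed afterwards.

---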

## 1. Data Model Changes

### 1.1 `Lead` gains 4 fields

`evidence_graph.py:Lead` currently has `(id, title, description, target_agent, source_id, status, …)`.
New fields:

```python
@dataclass
class Lead:
    # ... existing fields
    proposed_by: str = ""             # "strategist" | "filesystem" | ...: the proposing agent
    motivating_hypothesis: str = ""   # hyp-id this lead is meant to corroborate/refute
    expected_evidence_type: str = ""  # one of edge_types: the edge type the lead is expected to yield
    round_number: int = 0             # strategist round in which the lead was produced
```

`motivating_hypothesis` is the key: it ties a lead explicitly to a hypothesis, so that afterwards one can compute
"did running this lead actually change the hypothesis state", i.e. the strategist's marginal-yield metric.

### 1.2 New `InvestigationRound` node

Records each round's strategist decision itself, since provenance must be auditable too:

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationRound:
    id: str                                # "round-001"
    round_number: int
    started_at: str
    completed_at: str = ""
    strategist_action: str = ""            # "propose_leads" | "declare_complete"
    leads_proposed: list[str] = field(default_factory=list)
    leads_executed: list[str] = field(default_factory=list)
    hypothesis_status_snapshot_before: dict = field(default_factory=dict)  # hyp_id → status
    hypothesis_status_snapshot_after: dict = field(default_factory=dict)
    new_phenomena_count: int = 0
    new_edges_count: int = 0
    decision_rationale: str = ""           # the strategist's own account
```

Serialized with the graph (added to `to_dict`/`from_dict`).

---

## 2. New Tools

They live in a new file, `tools/strategy.py`, and are registered following the existing `TOOL_CATALOG` pattern.

### 2.1 `graph_overview()`: global situation (read-only)

**Signature**: `graph_overview() -> str`

**Output** (markdown, easier for an LLM to read than JSON):

```markdown
# Investigation State

## Hypotheses (8)
| id | title | L | conf | status | edges_in | distinct_sources | flipped_in_last_2_rounds |
|----|-------|---|------|--------|----------|------------------|---------------------------|
| hyp-83db8748 | Multi-Device Composite | +8.75 | 0.99 | supported | 23 | 1 | no |
| hyp-daa7c704 | Multiple Identity Aliases | +9.21 | 0.99 | supported | 11 | 3 | no |
| hyp-7fa9b13e | Sunny.zip contains timer_a | +2.08 | 0.99 | supported | 4 | 1 | yes (active→supported in R2) |
| ...

## Sources (6)
| id | type | phenomena | identities | last_touched_in_round |
| src-usb-leung | disk_image | 8 | 1 | R1 |
| ...

## Pending Leads (3)
| id | from | targeting | for_hypothesis | reason |
| lead-aaa | filesystem | src-ios-chan/Safari | hyp-83db8748 | Safari history likely contains device-switching evidence |
```

**Key annotation**: the `distinct_sources` column exposes that "this hypothesis rests on a single source". Seeing that
all 23 edges come from the android source, the strategist will conclude on its own that "independent evidence from elsewhere is needed".

### 2.2 `source_coverage(source_id: str)`: per-source coverage (read-only)

**Signature**: `source_coverage(source_id: str) -> str`

**Implementation**: scan `graph.tool_invocations`, filter to `source_id == this source`, and group by tool name + main args.
Then compare against `EXPECTED_ARTEFACTS[source_type]` and mark untouched items with ✗.

```python
# tools/strategy.py
EXPECTED_ARTEFACTS: dict[str, list[dict]] = {
    "disk_image+windows": [
        {"name": "filesystem layout", "detector": "fls|mmls", "value_for": "deleted files, hidden partitions"},
        {"name": "registry hives", "detector": "parse_registry_key", "value_for": "user activity, installed software"},
        {"name": "browser history", "detector": "list_directory@AppData/.../History", "value_for": "URL access, downloads"},
        {"name": "prefetch", "detector": "extract_file@Windows/Prefetch", "value_for": "program execution evidence"},
        # ...
    ],
    "mobile_extraction": [
        {"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb", "value_for": "contacts"},
        {"name": "SMS messages", "detector": "sqlite_query@sms.db", "value_for": "messaging content"},
        {"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite", "value_for": "WhatsApp content"},
        {"name": "Call history", "detector": "sqlite_query@CallHistoryDB", "value_for": "call records"},
        {"name": "Safari history", "detector": "sqlite_query@History.db|read_text@Bookmarks.plist", "value_for": "web browsing"},
        {"name": "Photos library", "detector": "sqlite_query@Photos.sqlite", "value_for": "photo metadata, EXIF, geolocation"},
        {"name": "iCloud accounts", "detector": "parse_plist@Accounts3.sqlite|parse_keychain", "value_for": "Apple ID, services"},
        {"name": "App inventory", "detector": "list_directory@var/containers/Bundle/Application", "value_for": "installed apps"},
    ],
    "disk_image+android": [...],
    "media_collection": [
        {"name": "OCR text", "detector": "ocr_image", "value_for": "screenshot text"},
        {"name": "EXIF metadata", "detector": "exif_image", "value_for": "device, timestamps, geolocation"},
    ],
}
```

**Soft-hint semantics**: the output always ends with this sentence:

> Coverage hints are heuristics, not requirements. Skip an item if the case theory
> makes it irrelevant. Investigate ✗ items only when they could materially affect
> an active hypothesis.

This sentence is **the key to "exam-oriented capability exists without being binding"**: on seeing a ✗ the LLM does not dispatch blindly;
it first looks at the hypothesis list and asks "does this artifact matter to any current hypothesis?"
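
How a `detector` string is matched against the invocation log is not spelled out above. One plausible reading, sketched here as an assumption, is that `|` separates alternatives and `tool@needle` means "this tool was called with a main argument containing `needle`". The names `detector_hit` and `coverage_marks` are hypothetical, and invocations are reduced to `(tool_name, main_arg)` pairs.

```python
def detector_hit(detector: str, invocations: list[tuple[str, str]]) -> bool:
    """True if any alternative in the detector matches a recorded invocation."""
    for alternative in detector.split("|"):
        tool, _, needle = alternative.partition("@")
        for name, main_arg in invocations:
            # A detector without "@" has an empty needle, which matches any argument.
            if name == tool and needle in main_arg:
                return True
    return False


def coverage_marks(expected: list[dict], invocations: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pair each expected artefact with a ✓ (touched) or ✗ (untouched) mark."""
    return [
        ("✓" if detector_hit(item["detector"], invocations) else "✗", item["name"])
        for item in expected
    ]
```

Under this reading a detector with an elided path such as `AppData/.../History` would never match by plain substring, so a real implementation would need glob or segment matching for those entries.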

### 2.3 `marginal_yield(last_n_rounds: int = 2)`: marginal yield (read-only)

**Signature**: `marginal_yield(last_n_rounds: int = 2) -> str`

**Implementation**: scan the most recent N `InvestigationRound`s and count:
- new phenomena per round
- new P→H edges per round
- hypothesis status flips per round (active→supported, or the reverse)

**Output**:

```markdown
# Marginal Yield (last 2 rounds)

| round | new_phenomena | new_edges | status_flips |
| R3 | 5 | 7 | 1 |
| R4 | 2 | 1 | 0 |

Trend: decelerating (R4 yield 33% of R3).
Recommendation interpretation aid: yield trending to zero suggests diminishing
returns; consider declare_complete after one more probe.
```

The last line is LLM-friendly heuristic prose, not a mandatory signal.
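
The table rows can be derived directly from the `InvestigationRound` fields of §1.2. The sketch below works on rounds in their dict form; `status_flips` and `marginal_yield_rows` are hypothetical helper names, and counting only hypotheses present in both snapshots is an assumption.

```python
def status_flips(before: dict[str, str], after: dict[str, str]) -> int:
    """Count hypotheses whose status differs between the two snapshots."""
    return sum(
        1
        for hyp_id, status in after.items()
        if hyp_id in before and before[hyp_id] != status
    )


def marginal_yield_rows(rounds: list[dict], last_n_rounds: int = 2) -> list[tuple[str, int, int, int]]:
    """Return (round label, new phenomena, new edges, status flips) for the latest rounds."""
    recent = sorted(rounds, key=lambda r: r["round_number"])[-last_n_rounds:]
    return [
        (
            f"R{r['round_number']}",
            r["new_phenomena_count"],
            r["new_edges_count"],
            status_flips(
                r["hypothesis_status_snapshot_before"],
                r["hypothesis_status_snapshot_after"],
            ),
        )
        for r in recent
    ]
```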

### 2.4 `budget_status()`: budget view (read-only)

**Signature**: `budget_status() -> str`

```markdown
# Budget Status

| metric | used | cap | pct |
| tool_calls | 1248 | 5000 | 25% |
| strategist_rounds | 3 | 10 | 30% |
| wall_clock_minutes | 142 | 360 | 39% |

Phase 1 used 89% of allocated. Phase 2 used 4%. Phase 3 (strategist) so far: 7%.
```

Budgets are read from config.yaml; the new fields are listed in §6. With no budget configured, the loop runs in unbounded mode
(relying only on the strategist declaring completion itself, plus a hard safety cap).

### 2.5 Decision action tools (write)

Registered in the strategist's `mandatory_record_tools`. The strategist must call at least one of them per round,
otherwise a forced retry fires (reusing the existing mechanism).

**`propose_lead(...)`**:

```python
{
    "name": "propose_lead",
    "input_schema": {
        "type": "object",
        "required": [
            "description", "target_agent",
            "motivating_hypothesis", "expected_evidence_type",
        ],
        "properties": {
            "description": {
                "type": "string",
                "description": "1-2 sentence specific investigation request, including target source/artefact",
            },
            "target_agent": {
                "type": "string",
                "enum": ["filesystem","registry","communication","network","ios_artifact","android_artifact","media"],
            },
            "source_id": {"type": "string", "description": "which source to investigate"},
            "motivating_hypothesis": {
                "type": "string",
                "description": "hyp-id this lead is meant to corroborate or refute",
            },
            "expected_evidence_type": {
                "type": "string",
                "enum": ["direct_evidence","supports","contradicts","weakens","prerequisite_met","consequence_observed"],
            },
            "rationale": {"type": "string", "description": "why this fills a real gap"},
        }
    }
}
```

**`declare_investigation_complete(...)`**:

```python
{
    "name": "declare_investigation_complete",
    "input_schema": {
        "type": "object",
        "required": ["reason"],
        "properties": {
            "reason": {
                "type": "string",
                "enum": [
                    "marginal_yield_zero",
                    "budget_exhausted",
                    "all_hypotheses_resolved",
                    "coverage_saturated",
                    "other",
                ],
            },
            "rationale": {"type": "string"},
        }
    }
}
```

This is a terminal tool: calling it ends the loop (reusing the existing `terminal_tools` mechanism).

---

## 3. `InvestigationStrategist` agent

New file `agents/strategist.py`, about 150 lines.

```python
class InvestigationStrategist(BaseAgent):
    name = "strategist"
    role = (
        "You are the investigation strategist. You do not run forensic tools yourself. "
        "Your job is to read the current evidence graph and decide ONE of:\n"
        "  (a) propose 1-3 new investigation leads that would materially affect an active hypothesis, or\n"
        "  (b) declare the investigation complete.\n"
        "\n"
        "Use graph_overview / source_coverage / marginal_yield / budget_status to ground your judgment. "
        "DO NOT propose a lead that just adds more same-direction evidence to an already-supported hypothesis "
        "(harmonic damping makes it ~useless). DO propose leads when:\n"
        "  - A hypothesis is supported by edges from only ONE source: get cross-source corroboration.\n"
        "  - A hypothesis is in the active band (0.2 < conf < 0.8): it needs the deciding evidence.\n"
        "  - A specific high-value artefact is uncovered on a source where the active hypotheses suggest it matters.\n"
        "\n"
        "Declare complete when marginal_yield is approaching zero AND no remaining active hypotheses have "
        "obvious investigation paths."
    )

    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
    terminal_tools = ("declare_investigation_complete",)

    def _register_graph_tools(self):
        # Read-only tools: the strategist NEVER writes phenomena/edges directly.
        # All graph writes happen via the workers it dispatches.
        self._register_graph_read_tools()
        # No graph_write_tools.
        # Add strategy-specific tools:
        for tool_name in (
            "graph_overview", "source_coverage", "marginal_yield", "budget_status",
            "propose_lead", "declare_investigation_complete",
        ):
            td = TOOL_CATALOG[tool_name]
            self.register_tool(td.name, td.description, td.input_schema, td.executor)
```

Registered as `agent_factory._AGENT_CLASSES["strategist"]`.

---

## 4. Orchestrator Changes

### 4.1 Delete/replace: the current Phase 3

The current logic of `orchestrator.py:Phase 3` (about 150 lines): check leads → dispatch workers →
check converged → exit. **Delete it.**

### 4.2 New Phase 3: the strategist loop

```python
async def _phase3_strategist_loop(self, run_dir: Path) -> None:
    """Belief-driven investigation: strategist proposes, workers execute, repeat."""
    _log("Phase 3: Strategist-Driven Investigation", event="phase")

    strategist = self.factory.get_or_create_agent("strategist")
    # The round cap lives under the `strategist` section of config.yaml (see §6).
    max_rounds = self.config.get("strategist", {}).get("max_rounds", 10)

    for round_num in range(1, max_rounds + 1):
        # 1. Record round start + snapshot
        rid = await self.graph.start_investigation_round(round_num)

        # 2. Strategist run
        _log(f"Strategist Round {round_num}", event="phase")
        await strategist.run(
            f"Review the graph and decide the next investigation action. "
            f"This is round {round_num}/{max_rounds}. Budget used so far: see budget_status."
        )

        # 3. Did strategist declare complete?
        if self.graph.is_round_terminal(rid):
            _log(f"Strategist declared complete at round {round_num}", event="progress")
            # Close the final round too, so recovery does not treat it as failed.
            await self.graph.complete_investigation_round(rid)
            break

        # 4. Collect new leads proposed this round
        new_leads = self.graph.leads_from_round(round_num)
        if not new_leads:
            _log(f"No leads proposed in round {round_num}; stopping", event="progress")
            await self.graph.complete_investigation_round(rid)
            break

        # 5. Dispatch each lead
        for lead in new_leads:
            await self._execute_lead(lead, round_num)

        # 6. Close round + record yield
        await self.graph.complete_investigation_round(rid)

        # 7. Hard budget check
        if self._budget_exceeded():
            _log(f"Budget exhausted at round {round_num}", event="progress")
            break
```

### 4.3 `_execute_lead` reuses the existing worker dispatch logic

```python
async def _execute_lead(self, lead: Lead, round_num: int) -> None:
    agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
    worker = self.factory.get_or_create_agent(agent_type)
    if worker is None:
        logger.warning(f"No worker for lead {lead.id}: {agent_type}")
        return

    # Bind the worker's tools to the lead's source, if one was given.
    src = self.graph.case.get_source(lead.source_id) if lead.source_id else None
    if src:
        self.graph.set_active_source(src)

    _log(
        f"Round {round_num} dispatching: {lead.description}",
        event="dispatch", agent=agent_type,
    )
    await worker.run(
        f"Investigate this specific lead from the strategist:\n\n"
        f"REQUEST: {lead.description}\n"
        f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis}\n"
        f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type}\n"
        f"RATIONALE: {lead.rationale}\n\n"
        f"After investigating, record findings via add_phenomenon AND link relevant phenomena "
        f"to {lead.motivating_hypothesis} via the appropriate edge_type."
    )
    lead.status = "completed"
    self.graph._auto_save()
```

### 4.4 Automatic hypothesis regeneration (optional, recommended)

New phenomena may give rise to **new hypotheses** (not just updates to existing ones). Let the strategist trigger this explicitly with
`propose_lead(target_agent="hypothesis", description="re-examine recent phenomena for new hypotheses")`.
This is the strategist's own decision, not a timer. Consistency is preferred over automatic scheduling.
Note that `hypothesis` is not among the `target_agent` enum values listed in §2.5, so the enum would have to be extended for this call to validate.

---

## 5. State Persistence

`graph_state.json` gains a top-level key `investigation_rounds: list[InvestigationRound]`,
handled by `save_state` / `load_state`. On **disconnect recovery**:

- Find the most recent round that is not completed → treat that round as failed
- Restart from the next round
- Phenomena / edges from completed rounds are naturally retained
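
The recovery rule can be sketched as a pure function over the persisted rounds in their dict form. `plan_resume` is a hypothetical name, and using an empty `completed_at` as the "not completed" test is an assumption based on the `InvestigationRound` defaults in §1.2.

```python
def plan_resume(rounds: list[dict]) -> tuple[list[str], int]:
    """Return (ids of rounds to treat as failed, round number to restart from)."""
    failed = [r["id"] for r in rounds if not r.get("completed_at")]
    # Restart after the highest round number seen, whether it completed or not.
    next_round = max((r["round_number"] for r in rounds), default=0) + 1
    return failed, next_round
```

Given a completed `round-001` and an interrupted `round-002`, this marks `round-002` as failed and resumes at round 3; with no persisted rounds it starts at round 1.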

---

## 6. Configuration

`config.yaml` gains:

```yaml
strategist:
  enabled: true                # false = fall back to the old Phase 3 logic (safety fallback)
  max_rounds: 10
  hard_stop_marginal_yield_zero_rounds: 3   # force a stop after 3 consecutive rounds with yield=0

budgets:
  tool_calls_total: 5000
  wall_clock_minutes_max: 480
```

---

## 7. Testing Strategy

A new file `tests/test_strategist.py`, or additions to `test_optimizations.py`. At minimum, test:

1. When the strategist calls `declare_investigation_complete`, the loop exits immediately
2. When the strategist calls `propose_lead`, the lead enters the graph with the correct round_number
3. The round snapshot correctly captures before/after status
4. When the budget is exhausted, the loop is force-stopped even if the strategist wants to continue
5. Disconnect recovery: after a mid-run interruption, a restart resumes from the next round
6. `graph_overview` output includes the `distinct_sources` annotation
7. `source_coverage` marks untouched items with ✗
8. `marginal_yield` numbers agree with `confidence_log`

No LLM integration tests are written: strategist behavior is verified with a mock LLM (for an existing instance of this pattern, see
`test_forced_record_retry_fires_when_zero_phenomena`).

---
|
||||
|
||||
## 8. 实施顺序
|
||||
|
||||
按依赖排(**每步独立 commit**——结构性改造,单点回滚关键):
|
||||
|
||||
| 步 | 内容 | 依赖 | 工作量估算 |
|
||||
|---|---|---|---|
|
||||
| 1 | `Lead` 加 4 字段 + `InvestigationRound` 数据类 + 序列化 | — | 60 行 + 测试 |
|
||||
| 2 | `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` 实现 | 1 | 250 行 + 测试 |
|
||||
| 3 | `propose_lead` / `declare_investigation_complete` 工具 | 1 | 80 行 + 测试 |
|
||||
| 4 | `InvestigationStrategist` agent class | 2, 3 | 120 行 + 测试 |
|
||||
| 5 | 编排器 Phase 3 重写 | 4 | 150 行(替换 ~50 行旧)+ 测试 |
|
||||
| 6 | config schema + 加载逻辑 | 5 | 30 行 |
|
||||
| 7 | 断连恢复处理 | 5 | 40 行 + 测试 |
|
||||
| 8 | 真实案件 smoke run(小规模:USB only) | 7 | 0 代码 |
|
||||
| 9 | 文档:DESIGN.md §4.9 改写 + 本文件归档 | 8 | 文档 |
|
||||
|
||||
总:~800 行新代码 + 测试 + 文档。
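
---

## 9. Risks + mitigations

| Risk | Mitigation |
|---|---|
| Strategist too conservative (always declare_complete) | Add prompt examples showing what "a situation worth digging into" looks like; validate on small samples during testing |
| Strategist too aggressive (proposes 7+ leads every round) | The `propose_lead` tool schema caps leads at 3-5 per round; the prompt stresses "quality over quantity" |
| A single worker cannot finish its lead, causing a budget avalanche | The worker call's own max_iter is unchanged; the strategist budget is separate |
| LLM does not pick up on hints like `distinct_sources` | Append 1-2 plain-English sentences of interpretation to the end of `graph_overview`: "Hypothesis X has 23 edges but all from one source → cross-source corroboration would strengthen it" |
| Leads produced by Phase 1 triggers are ignored by the strategist | The strategist prompt states explicitly: "handle existing pending leads first, then generate new ones" |
| Infinite loop (strategist keeps producing the same lead) | Deduplicate the Lead table on the triple `(motivating_hyp, expected_type, source_id)` |
| Maintenance cost of the `EXPECTED_ARTEFACTS` list | Deliberately kept as a "soft hint": an incomplete list does not break anything, it only means some depth relies more on the LLM's own initiative |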
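
The deduplication mitigation in the table can be illustrated with a small sketch. The `Lead` fields and the `LeadTable` class below are assumptions made for the example; only the triple `(motivating_hyp, expected_type, source_id)` comes from the table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

LeadKey = Tuple[str, str, str]


@dataclass(frozen=True)
class Lead:
    """Hypothetical minimal lead; the real class has more fields."""
    motivating_hyp: str
    expected_type: str
    source_id: str
    description: str = ""


class LeadTable:
    """Keeps at most one lead per (motivating_hyp, expected_type, source_id)."""

    def __init__(self) -> None:
        self._by_key: Dict[LeadKey, Lead] = {}

    def propose(self, lead: Lead) -> bool:
        """Store the lead and return True, or return False for a duplicate."""
        key = (lead.motivating_hyp, lead.expected_type, lead.source_id)
        if key in self._by_key:
            return False  # same triple already recorded: no-op
        self._by_key[key] = lead
        return True

    def __len__(self) -> int:
        return len(self._by_key)
```

Because the free-text description is not part of the key, a strategist that rephrases the same request still hits the duplicate check, which is what breaks the loop.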
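
---

## 10. Open questions

1. **Should an InvestigationRound run the hypothesis agent itself?**
   Leaning toward having the strategist trigger it explicitly via a lead (better consistency), with no timer-based triggering.

2. **What happens on budget overrun: hard stop vs soft warning?**
   The current design hard-stops; schema enforcement could be added so that "when the strategist sees budget < 10%, it may only call declare_complete".

3. **Should a cross-source "independence bonus" be folded into log-odds?**
   The earlier damping used `1/k` and did not distinguish cross-source from same-source edges. If the bonus is folded in, the formula should become
   `1/k_within_source × bonus_for_distinct_sources`. This is separate follow-up work.

4. **Should the strategist's `rationale` output go through grounding?**
   It does not write phenomena, but the `rationale` field may contain concrete values
   ("based on inv-12345..."). Leaning toward not enforcing it: this is a meta-level judgment, not a recorded fact.

5. **Keep or delete the current Phase 3 `max_investigation_rounds` config?**
   Suggest keeping it as the fallback knob for `strategist.enabled=false`.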
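
For open question 3, one possible reading of `1/k_within_source × bonus_for_distinct_sources` is sketched below. This is an assumption-laden illustration, not the system's confidence code: the function name, the `(source_id, weight)` edge shape, and the choice to apply the bonus as a single multiplier once two or more sources contribute are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def damped_log_odds(
    edges: Iterable[Tuple[str, float]],
    distinct_source_bonus: float = 1.0,
) -> float:
    """Sum edge weights with harmonic damping counted per source.

    `edges` is a sequence of (source_id, weight) pairs in arrival order.
    The k-th edge from the same source contributes weight / k, so repeats
    from one device add less and less, while the first edge from a new
    device counts in full. If more than one source contributes, the total
    is scaled by `distinct_source_bonus` (1.0 means no bonus).
    """
    seen_per_source: Dict[str, int] = defaultdict(int)
    total = 0.0
    for source_id, weight in edges:
        seen_per_source[source_id] += 1
        k_within_source = seen_per_source[source_id]
        total += weight / k_within_source
    if len(seen_per_source) > 1:
        total *= distinct_source_bonus
    return total
```

With three unit-weight edges, an all-same-source run sums to 1 + 1/2 + 1/3, while two edges from one source plus one from a second source sum to 1 + 1/2 + 1 = 2.5, so cross-source corroboration is rewarded even before any bonus is applied.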
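
---

## 11. Relationship to DESIGN.md

Once this document lands, DESIGN.md needs the following matching updates:

- **§4.5**: add a paragraph: "Also look at the **structure** of log_odds: the edges_in count / distinct_sources
  are the key signals the strategist uses to decide whether to go deeper, not just the confidence value"
- **§4.9 Phase 3**: change the table content from "leads dispatched to source-aware agents" to
  "strategist loop: inspect the graph, propose, execute, review, stop / continue"
- **§8** (design trade-offs): add item 6: "The trade-off of an LLM-driven scheduling layer: the strategist decides depth,
  but each round's budget is hard-capped by `budgets.*`; this is the 'LLM proposes, code adjudicates' principle applied at the scheduling layer"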
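
---

## 12. Note: problems this design does **not** solve

- The root cause of the 8% hit rate on the exam questions is an **incomplete toolset** (no vision, no ZIP brute-forcing, no VeraCrypt
  mounting, no blockchain explorer), not a scheduling problem. The strategist makes the existing tools get used harder,
  but it does not conjure new tools out of nothing.
- LLM-fabricated `invocation_id`s (already patched, see the `feedback_grounding_pending` memory) and
  log-odds inflation (already patched: harmonic damping) are **prerequisites** of this design and out of its scope.
- Finer-grained per-edge-type Bayesian modeling (such as a cross-source independence bonus) is separate work.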
|
||||
@@ -24,12 +24,16 @@ def _load_agent_classes() -> None:
    """Lazy-import agent classes to avoid circular imports."""
    if _AGENT_CLASSES:
        return
    from agents.android_artifact import AndroidArtifactAgent
    from agents.communication import CommunicationAgent
    from agents.filesystem import FileSystemAgent
    from agents.hypothesis import HypothesisAgent
    from agents.ios_artifact import IOSArtifactAgent
    from agents.media import MediaAgent
    from agents.network import NetworkAgent
    from agents.registry import RegistryAgent
    from agents.report import ReportAgent
    from agents.strategist import InvestigationStrategist
    from agents.timeline import TimelineAgent
    _AGENT_CLASSES["filesystem"] = FileSystemAgent
    _AGENT_CLASSES["registry"] = RegistryAgent
@@ -38,6 +42,51 @@ def _load_agent_classes() -> None:
    _AGENT_CLASSES["timeline"] = TimelineAgent
    _AGENT_CLASSES["hypothesis"] = HypothesisAgent
    _AGENT_CLASSES["report"] = ReportAgent
    _AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent
    _AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent
    _AGENT_CLASSES["media"] = MediaAgent
    _AGENT_CLASSES["strategist"] = InvestigationStrategist


# Triage agent per (source.type, platform). disk_image is ambiguous on its
# own — both a Windows USB image and an Android raw dump are disk_image —
# so the routing helper also looks at source.meta.platform when present.
SOURCE_TYPE_AGENTS: dict[str, str] = {
    "disk_image": "filesystem",  # default for unknown platform
    "mobile_extraction": "ios_artifact",
    "archive": "filesystem",
    "media_collection": "media",
}

# Per-platform overrides for disk_image sources. Keys come from
# source.meta.platform in case.yaml (lowercased).
_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
    "windows": "filesystem",
    "linux": "filesystem",
    "android": "android_artifact",
    "ios": "ios_artifact",
}


def get_triage_agent_type(source) -> str:
    """Pick the right Phase-1 agent for *source*.

    Accepts either an :class:`EvidenceSource` or a raw source.type string
    (for back-compat with the S5 signature). Disk-image sources additionally
    consult ``source.meta.platform`` so Windows USBs and Android raw dumps —
    both type=disk_image — get different agents.
    """
    # Back-compat: accept a plain type string.
    if isinstance(source, str):
        return SOURCE_TYPE_AGENTS.get(source, "filesystem")

    src_type = getattr(source, "type", "disk_image")
    if src_type == "disk_image":
        meta = getattr(source, "meta", {}) or {}
        platform = str(meta.get("platform", "")).lower()
        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")


logger = logging.getLogger(__name__)
|
||||
58
agents/android_artifact.py
Normal file
@@ -0,0 +1,58 @@
"""Android Artifact Agent — multi-partition analysis of raw Android dumps.

DESIGN.md §4.7 Android: ``mmls`` slices the dump into partitions; each one is
its own analysable surface. Ext4-backed partitions (typically SYSTEM,
USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw
partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with
``search_strings``. Userdata is the prize and is often FBE-encrypted on
modern devices — the agent must check fsstat before assuming readability
(see ``probe_android_partitions`` for the survey).
"""

from __future__ import annotations

from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG


class AndroidArtifactAgent(BaseAgent):
    name = "android_artifact"
    role = (
        "Android forensic analyst. You navigate raw Android disk dumps "
        "(blk0_sda-style images) partition by partition. Workflow: call "
        "probe_android_partitions ONCE to map the disk; pick the partitions "
        "with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, "
        "EFS); for each, call set_active_partition(offset_from_512_sector_column) "
        "and then list_directory / extract_file / search_strings as usual. "
        "For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly "
        "to search_strings — they have no filesystem. If USERDATA shows "
        "fs_type=unknown it is almost certainly FBE-encrypted: record that "
        "as a negative finding (the absence IS evidence) and move on to "
        "what's reachable."
    )

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

    def _register_tools(self) -> None:
        tool_names = [
            # Android-specific
            "probe_android_partitions",
            "set_active_partition",
            # Reused TSK toolset — partition_offset comes from active_source
            "partition_info", "filesystem_info", "list_directory",
            "extract_file", "find_file", "search_strings",
            "count_deleted_files", "build_filesystem_timeline",
            # Generic parsers
            "read_text_file", "read_binary_preview", "search_text_file",
            "read_text_file_section", "list_extracted_dir", "find_files",
            # SQLite — Android apps store data in sqlite too (WhatsApp, etc.)
            "sqlite_tables", "sqlite_query",
        ]
        for name in tool_names:
            td = TOOL_CATALOG.get(name)
            if td:
                self.register_tool(td.name, td.description, td.input_schema, td.executor)
|
||||
49
agents/ios_artifact.py
Normal file
@@ -0,0 +1,49 @@
"""iOS Artifact Agent — analyses unpacked iOS extractions.

DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family
the system handles (alongside disk images and pcaps). This agent owns the
iOS-specific toolset; the grounded ``add_phenomenon`` contract from
BaseAgent applies unchanged — every fact must cite a tool invocation.
"""

from __future__ import annotations

from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG


class IOSArtifactAgent(BaseAgent):
    name = "ios_artifact"
    role = (
        "iOS forensic analyst. You analyse unpacked iOS extractions — "
        "binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, "
        "AddressBook.sqlitedb), the keychain (keychain-2.db), and the "
        "iDevice_info.txt summary — to extract device identity, accounts, "
        "messaging, contacts, and credential metadata. Domain-rooted iOS "
        "trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) "
        "are your map; navigate by path, not by inode."
    )

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

    def _register_tools(self) -> None:
        tool_names = [
            # navigation — find_files is the workhorse on 10k+-file iOS trees;
            # list_extracted_dir is for initial layout summary only.
            "list_extracted_dir", "find_files",
            "read_text_file", "read_text_file_section", "read_binary_preview",
            "search_text_file",
            # iOS-specific parsers
            "parse_plist",
            "sqlite_tables", "sqlite_query",
            "parse_ios_keychain",
            "read_idevice_info",
        ]
        for name in tool_names:
            td = TOOL_CATALOG.get(name)
            if td:
                self.register_tool(td.name, td.description, td.input_schema, td.executor)
|
||||
52
agents/media.py
Normal file
@@ -0,0 +1,52 @@
"""Media Agent — OCR-based analysis of screenshot/photo evidence.

DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG
evidence must go through tesseract first. The agent runs OCR, then
records extracted strings — especially identifiers (wallet addresses,
phone numbers, usernames) — via the grounded observe_identity gateway so
they participate in cross-source coref the same way iOS keychain entries
or Windows account names do.

If the OCR runtime is missing on the host, ocr_image returns an explicit
install hint; the agent should record that as a negative finding ("no
text extracted — tesseract not installed") rather than guessing.
"""

from __future__ import annotations

from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG


class MediaAgent(BaseAgent):
    name = "media"
    role = (
        "Media / OCR forensic analyst. You analyse screenshots, photos, and "
        "scanned documents — any pixel-based evidence the LLM cannot read "
        "directly. Workflow: list_extracted_dir to enumerate images, "
        "ocr_image on each promising one, then add_phenomenon (with the "
        "OCR'd text as the verified_fact value) and observe_identity for "
        "any wallet addresses, phone numbers, email addresses, or "
        "usernames the text contains. If OCR fails because tesseract is "
        "missing, RECORD that as a negative finding instead of fabricating "
        "image content — the absence is a real fact about this run."
    )

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

    def _register_tools(self) -> None:
        tool_names = [
            "ocr_image",
            "list_extracted_dir", "find_files",
            "read_binary_preview",
            "read_text_file",
            "search_text_file",
        ]
        for name in tool_names:
            td = TOOL_CATALOG.get(name)
            if td:
                self.register_tool(td.name, td.description, td.input_schema, td.executor)
|
||||
191
agents/report.py
@@ -12,9 +12,20 @@ class ReportAgent(BaseAgent):
|
||||
role = (
|
||||
"Forensic report writer. You synthesize all findings from the investigation "
|
||||
"into a structured, professional forensic analysis report organized by hypotheses.\n\n"
|
||||
"Only include findings that have a source_tool attribution (marked VERIFIED). "
|
||||
"If evidence lacks source attribution, mark it as UNVERIFIED. "
|
||||
"Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence."
|
||||
"Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), "
|
||||
"TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When "
|
||||
"writing the report, render verified_facts as primary evidence with their "
|
||||
"invocation citations, and render interpretation as 'agent analysis' so the "
|
||||
"reader can tell ground truth from inference. Do NOT invent or fabricate any "
|
||||
"data, timestamps, or findings not present in the evidence.\n\n"
|
||||
"This is a cross-source case: phenomena come from multiple evidence "
|
||||
"sources, and entities discovered on different sources may refer to the "
|
||||
"same real-world actor. ALWAYS include:\n"
|
||||
" - 'Findings by Source' section sourced from get_phenomena_by_source\n"
|
||||
" - 'Actor Clusters' section sourced from get_actor_clusters (the "
|
||||
"cross-source attribution view — multi-source clusters answer "
|
||||
"'which findings on different devices belong to the same person')\n"
|
||||
" - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix"
|
||||
)
|
||||
# Calling save_report is BOTH the recording action and the completion
|
||||
# signal. tool_call_loop returns the moment save_report executes; the
|
||||
@@ -38,9 +49,12 @@ class ReportAgent(BaseAgent):
|
||||
f"Investigation state:\n{self.graph.stats_summary()}\n\n"
|
||||
f"Your task: {task}\n\n"
|
||||
f"WORKFLOW:\n"
|
||||
f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities, get_case_info "
|
||||
f" to gather all the data needed for the report. Make these calls in parallel.\n"
|
||||
f"2. Assemble the complete markdown forensic report.\n"
|
||||
f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n"
|
||||
f" get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n"
|
||||
f" and get_phenomena_by_source in parallel — these are the seven data\n"
|
||||
f" sources you assemble the report from.\n"
|
||||
f"2. Assemble the complete markdown forensic report. Cross-source\n"
|
||||
f" actor clusters and per-source breakdown are MANDATORY sections.\n"
|
||||
f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
|
||||
f" This single call is the completion signal — the run ENDS the moment it executes.\n"
|
||||
f" Do NOT call any read tools after this point; they will not run.\n"
|
||||
@@ -83,6 +97,45 @@ class ReportAgent(BaseAgent):
|
||||
executor=self._get_entities,
|
||||
)
|
||||
|
||||
self.register_tool(
|
||||
name="get_hypothesis_evidence_matrix",
|
||||
description=(
|
||||
"Render the hypothesis × evidence pivot as a markdown table. "
|
||||
"Columns: per edge_type counts, log_odds, confidence, status. "
|
||||
"Embed this directly in the report to show how each hypothesis "
|
||||
"stands relative to the others on a single screen."
|
||||
),
|
||||
input_schema={"type": "object", "properties": {}},
|
||||
executor=self._get_hypothesis_evidence_matrix,
|
||||
)
|
||||
|
||||
self.register_tool(
|
||||
name="get_actor_clusters",
|
||||
description=(
|
||||
"Render the cross-source actor clusters: each cluster is the "
|
||||
"set of Entity nodes the system currently treats as the same "
|
||||
"actor (via active same_as edges backed by coref hypotheses "
|
||||
"≥ 0.8). Includes the aggregated identifier evidence per "
|
||||
"cluster. Use this in the report's 'Entities / Actors' "
|
||||
"section so readers see who-is-who across devices, not just "
|
||||
"raw entity rows."
|
||||
),
|
||||
input_schema={"type": "object", "properties": {}},
|
||||
executor=self._get_actor_clusters,
|
||||
)
|
||||
|
||||
self.register_tool(
|
||||
name="get_phenomena_by_source",
|
||||
description=(
|
||||
"Group every phenomenon by its originating evidence source "
|
||||
"(source_id). Use this to drive the report's 'Findings by "
|
||||
"Source' section so each evidence item's per-device "
|
||||
"contribution is auditable."
|
||||
),
|
||||
input_schema={"type": "object", "properties": {}},
|
||||
executor=self._get_phenomena_by_source,
|
||||
)
|
||||
|
||||
self.register_tool(
|
||||
name="save_report",
|
||||
description="Save the final report to a file.",
|
||||
@@ -115,12 +168,24 @@ class ReportAgent(BaseAgent):
|
||||
items = [ph for ph in phenomena.values() if ph.category == cat]
|
||||
lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---")
|
||||
for ph in items:
|
||||
verified = "VERIFIED" if ph.source_tool else "UNVERIFIED"
|
||||
lines.append(f"\n[{verified}] {ph.title} ({ph.id})")
|
||||
# Grounded = at least one verified fact AND a source_tool.
|
||||
grounded = bool(ph.verified_facts) and bool(ph.source_tool)
|
||||
marker = "GROUNDED" if grounded else (
|
||||
"TOOL-ONLY" if ph.source_tool else "UNVERIFIED"
|
||||
)
|
||||
lines.append(f"\n[{marker}] {ph.title} ({ph.id})")
|
||||
lines.append(f" Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}")
|
||||
if ph.timestamp:
|
||||
lines.append(f" Timestamp: {ph.timestamp}")
|
||||
lines.append(f" {ph.description[:500]}")
|
||||
if ph.verified_facts:
|
||||
lines.append(f" Verified facts ({len(ph.verified_facts)}):")
|
||||
for f in ph.verified_facts:
|
||||
lines.append(
|
||||
f" - [{f.get('type','?')}] {str(f.get('value',''))[:200]} "
|
||||
f"(cite: {f.get('invocation_id','?')})"
|
||||
)
|
||||
if ph.interpretation:
|
||||
lines.append(f" Analysis: {ph.interpretation[:500]}")
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _get_hypotheses_with_evidence(self) -> str:
|
||||
@@ -150,12 +215,87 @@ class ReportAgent(BaseAgent):
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _get_case_info(self) -> str:
|
||||
info = self.graph.case_info
|
||||
lines = ["=== Case Information ==="]
|
||||
for k, v in info.items():
|
||||
lines.append(f" {k}: {v}")
|
||||
lines.append(f" Image path: {self.graph.image_path}")
|
||||
lines.append(f" Partition offset: {self.graph.partition_offset}")
|
||||
case = self.graph.case
|
||||
if case is not None:
|
||||
lines.append(f" case_id: {case.case_id}")
|
||||
lines.append(f" name: {case.name}")
|
||||
for k, v in (case.meta or {}).items():
|
||||
lines.append(f" {k}: {v}")
|
||||
lines.append(f" sources: {len(case.sources)}")
|
||||
for s in case.sources:
|
||||
owner = f", owner={s.owner}" if s.owner else ""
|
||||
platform = s.meta.get("platform") if s.meta else None
|
||||
plat = f", platform={platform}" if platform else ""
|
||||
lines.append(
|
||||
f" - {s.id}: {s.label} "
|
||||
f"(type={s.type}, mode={s.access_mode}{plat}{owner})"
|
||||
)
|
||||
else:
|
||||
# Legacy single-image fallback — surface whatever case_info dict
|
||||
# was passed in (e.g. the old CFReDS MD5 block).
|
||||
for k, v in (self.graph.case_info or {}).items():
|
||||
lines.append(f" {k}: {v}")
|
||||
lines.append(f" Image path: {self.graph.image_path}")
|
||||
lines.append(f" Partition offset: {self.graph.partition_offset}")
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _get_hypothesis_evidence_matrix(self) -> str:
|
||||
return self.graph.hypothesis_evidence_matrix_markdown()
|
||||
|
||||
async def _get_actor_clusters(self) -> str:
|
||||
clusters = self.graph.actor_clusters()
|
||||
if not clusters:
|
||||
return "(no entities recorded)"
|
||||
# Show multi-member clusters first — they're the cross-source links
|
||||
# the human reader most needs to see.
|
||||
clusters.sort(key=lambda c: (-len(c["members"]), c["members"]))
|
||||
lines = [f"=== Actor Clusters ({len(clusters)}) ==="]
|
||||
for i, c in enumerate(clusters, 1):
|
||||
members = c["members"]
|
||||
label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity"
|
||||
lines.append(f"\n[{label} #{i}] {len(members)} member(s):")
|
||||
for eid in members:
|
||||
ent = self.graph.entities.get(eid)
|
||||
if ent:
|
||||
lines.append(f" - {ent.summary()}")
|
||||
if c["identifiers"]:
|
||||
lines.append(" Aggregated identifiers:")
|
||||
for ident in c["identifiers"]:
|
||||
strong_tag = "strong" if ident.get("strong") else "weak"
|
||||
lines.append(
|
||||
f" [{strong_tag}] {ident.get('type')}={ident.get('value')} "
|
||||
f"(on {ident.get('on_entity')})"
|
||||
)
|
||||
if c["coref_hypotheses"]:
|
||||
lines.append(" Backing coref hypotheses (≥0.8 active):")
|
||||
for hid in c["coref_hypotheses"]:
|
||||
hyp = self.graph.hypotheses.get(hid)
|
||||
if hyp:
|
||||
lines.append(f" - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}")
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _get_phenomena_by_source(self) -> str:
|
||||
by_src: dict[str, list] = {}
|
||||
for ph in self.graph.phenomena.values():
|
||||
by_src.setdefault(ph.source_id or "(unbound)", []).append(ph)
|
||||
if not by_src:
|
||||
return "(no phenomena recorded)"
|
||||
# Resolve source labels via graph.case when possible.
|
||||
def _label(src_id: str) -> str:
|
||||
if self.graph.case:
|
||||
src = self.graph.case.get_source(src_id)
|
||||
if src:
|
||||
return f"{src_id} — {src.label} ({src.type})"
|
||||
return src_id
|
||||
|
||||
lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="]
|
||||
for src_id in sorted(by_src):
|
||||
phs = by_src[src_id]
|
||||
lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---")
|
||||
for ph in phs:
|
||||
grounded = "G" if ph.verified_facts and ph.source_tool else "·"
|
||||
lines.append(f" [{grounded}] {ph.summary()}")
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _get_entities(self) -> str:
|
||||
@@ -174,18 +314,27 @@ class ReportAgent(BaseAgent):
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _verify_phenomena(self) -> str:
|
||||
verified = []
|
||||
unverified = []
|
||||
grounded: list[str] = []
|
||||
tool_only: list[str] = []
|
||||
unverified: list[str] = []
|
||||
for ph in self.graph.phenomena.values():
|
||||
entry = f" [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})"
|
||||
if ph.source_tool:
|
||||
verified.append(entry)
|
||||
nf = len(ph.verified_facts)
|
||||
entry = (
|
||||
f" [{ph.category}] {ph.title} "
|
||||
f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})"
|
||||
)
|
||||
if ph.verified_facts and ph.source_tool:
|
||||
grounded.append(entry)
|
||||
elif ph.source_tool:
|
||||
tool_only.append(entry)
|
||||
else:
|
||||
unverified.append(entry)
|
||||
|
||||
lines = ["=== Phenomena Verification Report ==="]
|
||||
lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):")
|
||||
lines.extend(verified)
|
||||
lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):")
|
||||
lines.extend(grounded)
|
||||
lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):")
|
||||
lines.extend(tool_only)
|
||||
lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):")
|
||||
lines.extend(unverified)
|
||||
return "\n".join(lines)
|
||||
|
||||
134
agents/strategist.py
Normal file
@@ -0,0 +1,134 @@
"""InvestigationStrategist — the LLM that decides depth vs breadth.

DESIGN_STRATEGIST.md §3.

The strategist does NOT run forensic tools. Its job per round is exactly one
decision: propose 1-3 leads that would move an active hypothesis, OR declare
the investigation complete. It reads the graph through four read-only views
(graph_overview / source_coverage / marginal_yield / budget_status) and
expresses its decision through two write tools (propose_lead /
declare_investigation_complete).

This is the smallest possible agent in the system — the entire point is that
strategy decisions live in one agent so they're auditable and the rest of the
codebase doesn't carry implicit depth/breadth policy.
"""

from __future__ import annotations

import logging

from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG

logger = logging.getLogger(__name__)


class InvestigationStrategist(BaseAgent):
    name = "strategist"
    role = (
        "Investigation strategist. You do not run forensic tools yourself. "
        "Each round you take ONE decision: propose 1-3 new investigation leads "
        "that would materially affect an active hypothesis, OR declare the "
        "investigation complete. Your judgment is grounded in the graph "
        "(hypotheses, sources, coverage, marginal yield, budget) — never in "
        "speculation."
    )
    # At least one of these must be called every round, otherwise BaseAgent's
    # forced RECORD retry kicks in and re-prompts the strategist to take a
    # documented decision.
    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
    # declare_complete is terminal — calling it short-circuits the tool loop,
    # which is what we want (strategist returns immediately on "done").
    terminal_tools = ("declare_investigation_complete",)

    # Strategist-specific tools, plus the read-only graph queries inherited
    # from BaseAgent. NO graph write tools (no add_phenomenon / link_to_entity
    # / observe_identity); the strategist must NOT mutate evidence directly.
    _STRATEGY_TOOLS = (
        "graph_overview",
        "source_coverage",
        "marginal_yield",
        "budget_status",
        "propose_lead",
        "declare_investigation_complete",
    )

    def _register_graph_tools(self) -> None:
        """Strategist gets read-only graph queries + the six strategy tools.

        It does NOT get write tools (no add_phenomenon, observe_identity,
        link_to_entity, add_temporal_edge). Every graph mutation must come
        from a dispatched worker, not from the planner.
        """
        self._register_graph_read_tools()
        for tool_name in self._STRATEGY_TOOLS:
            td = TOOL_CATALOG.get(tool_name)
            if td is None:
                logger.warning(
                    "Strategist could not find tool %s in TOOL_CATALOG — "
                    "register_all_tools must run before agent instantiation.",
                    tool_name,
                )
                continue
            self.register_tool(td.name, td.description, td.input_schema, td.executor)

    def _build_system_prompt(self, task: str) -> str:
        """Strategist-specific prompt. Replaces the BaseAgent default which
        walks an INVESTIGATE→RECORD→LINK workflow that is wrong for a
        planner agent.
        """
        return (
            f"You are {self.name}, the investigation strategist.\n"
            f"Role: {self.role}\n\n"
            f"Your task: {task}\n\n"
            f"WORKFLOW (do this exactly):\n"
            f" 1. Call graph_overview FIRST. Look at: which hypotheses are\n"
            f" active (conf 0.2-0.8) vs already supported/refuted; which\n"
            f" ones have many edges but only 1 distinct_source; which had\n"
            f" a recent_flip vs none in two rounds.\n"
            f" 2. Call marginal_yield to see if the last rounds produced anything.\n"
            f" 3. Call budget_status to know your runway.\n"
            f" 4. For each candidate lead direction, call source_coverage on\n"
            f" the relevant source to see what's been touched.\n"
            f" 5. Take exactly ONE of these terminal actions:\n"
            f" (a) Call propose_lead 1-3 times for leads that would\n"
            f" materially move an active hypothesis. STOP after this.\n"
            f" (b) Call declare_investigation_complete with a specific\n"
            f" reason. STOP after this.\n"
            f"\n"
            f"DECISION CRITERIA — when to propose vs when to stop:\n"
            f" PROPOSE when:\n"
            f" - A hypothesis is supported only by ONE source — get\n"
            f" cross-source corroboration. Same-source repeats are\n"
            f" cheap (harmonic damping).\n"
            f" - A hypothesis is in the active band (0.2 < conf < 0.8) —\n"
            f" it needs the deciding evidence.\n"
            f" - A high-value artefact is ✗ on source_coverage AND an\n"
            f" active hypothesis depends on the kind of evidence that\n"
            f" artefact would produce.\n"
            f" STOP (declare_complete) when:\n"
            f" - marginal_yield shows zero across 2+ rounds.\n"
            f" - budget_status warns ≥90% on tool_calls or rounds.\n"
            f" - all active hypotheses are resolved (supported or refuted).\n"
            f" - coverage saturation: every ✗ on every source is irrelevant\n"
            f" to active hypotheses.\n"
            f"\n"
            f"HARD RULES:\n"
            f" - You CANNOT call investigation tools (list_directory,\n"
            f" sqlite_query, parse_registry_key, extract_file, etc.) — your\n"
            f" job is to direct workers, not to investigate yourself.\n"
            f" - You CANNOT call write tools (add_phenomenon, observe_identity,\n"
            f" link_to_entity, add_hypothesis, add_temporal_edge). All\n"
            f" evidence mutations come from the workers you dispatch.\n"
            f" - Every propose_lead MUST cite a real hyp-id from\n"
            f" graph_overview's table — fabricated ids will be rejected.\n"
            f" - Don't propose more than 3 leads in one round. Quality over\n"
            f" quantity — a 4th lead almost always means you're not really\n"
            f" sure what would move the graph.\n"
            f" - Don't re-propose a lead that's already pending. The system\n"
            f" deduplicates (motivating_hyp, expected_type, agent, source)\n"
            f" so duplicates silently no-op, but they waste your budget."
        )
|
||||
@@ -122,7 +122,15 @@ class TimelineAgent(BaseAgent):
|
||||
lines = []
|
||||
for ph in items:
|
||||
lines.append(f"{ph.timestamp} | [{ph.category}] {ph.title} ({ph.id})")
|
||||
lines.append(f" {ph.description[:150]}")
|
||||
preview = ph.interpretation[:150] if ph.interpretation else ""
|
||||
if ph.verified_facts:
|
||||
fact_preview = ", ".join(
|
||||
f"{f.get('type','?')}={str(f.get('value',''))[:40]}"
|
||||
for f in ph.verified_facts[:3]
|
||||
)
|
||||
preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]"
|
||||
if preview:
|
||||
lines.append(f" {preview}")
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _add_temporal_edge(
|
||||
|
||||
254
base_agent.py
@@ -5,6 +5,7 @@ from __future__ import annotations
|
||||
import json
|
||||
import logging
|
||||
import time
|
||||
import uuid
|
||||
from typing import Any
|
||||
|
||||
from evidence_graph import EvidenceGraph
|
||||
@@ -36,7 +37,9 @@ class BaseAgent:
|
||||
# forced retry with an explicit "you forgot to record" instruction.
|
||||
# Subclasses override to declare their own recording responsibility
|
||||
# (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
|
||||
mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",)
|
||||
# observe_identity (S5) counts as a recording too — it writes through the
|
||||
# same grounding gateway and produces an identity_observation phenomenon.
|
||||
mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity")
|
||||
|
||||
# Tools whose invocation ends the run immediately. After any terminal tool
|
||||
# is called, tool_call_loop returns with that tool's result text as
|
||||
@@ -110,8 +113,23 @@ class BaseAgent:
|
||||
f" Call investigation tools (list_directory, parse_registry_key, etc.) to gather data.\n"
|
||||
f" Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n"
|
||||
f" Create add_lead for anything outside your expertise.\n\n"
|
||||
f"Phase B — RECORD PHENOMENA:\n"
|
||||
f" For EACH significant finding from Phase A, call add_phenomenon.\n"
|
||||
f"Phase B — RECORD PHENOMENA (GROUNDED):\n"
|
||||
f" For EACH significant finding from Phase A, call add_phenomenon with:\n"
|
||||
f" * interpretation: your analysis — free text, NOT verified.\n"
|
||||
f" * verified_facts: one entry per concrete atom (path, timestamp,\n"
|
||||
f" inode, hash, identifier, count) you want recorded as truth.\n"
|
||||
f" Each entry MUST have:\n"
|
||||
f" - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n"
|
||||
f" - value: a VERBATIM substring from the tool output\n"
|
||||
f" - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n"
|
||||
f" header at the top of the tool result that produced this value\n"
|
||||
f" IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n"
|
||||
f" whenever you see an email, phone number, Apple ID, IMEI, wallet\n"
|
||||
f" address, MAC, UDID, persistent nickname, or display name. Same\n"
|
||||
f" grounding contract: value must be verbatim in the cited tool\n"
|
||||
f" output. This is HOW cross-source attribution gets built — without\n"
|
||||
f" it, we can't tell whether the Apple ID in keychain belongs to the\n"
|
||||
f" same person as the Windows account on the USB.\n"
|
||||
f" Do NOT call link_to_entity yet — just record all phenomena first.\n\n"
|
||||
f"Phase C — LINK ENTITIES:\n"
|
||||
f" FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
|
||||
@@ -125,20 +143,22 @@ class BaseAgent:
|
||||
f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
|
||||
f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
|
||||
f"a registry key) and found NOTHING, that absence IS evidence — call "
|
||||
f"add_phenomenon with a 'No matches for X' title and the search scope in "
|
||||
f"raw_data. Negative findings constrain the hypothesis space and prevent "
|
||||
f"the next agent from wasting time re-searching.\n"
|
||||
f"add_phenomenon with a 'No matches for X' title, the search scope in "
|
||||
f"raw_data, and cite the search tool's invocation_id (verified_facts may "
|
||||
f"be empty for a true negative; the cited invocation in source_tool still "
|
||||
f"anchors it). Negative findings constrain the hypothesis space.\n"
|
||||
f"- If you stop without having called add_phenomenon at least once, the task "
|
||||
f"is FAILED and a forced retry will fire.\n"
|
||||
f"- Include exact file paths, inode numbers, timestamps, and the source_tool "
|
||||
f"that produced each finding.\n\n"
|
||||
f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
|
||||
f"- ONLY record findings that appear VERBATIM in tool results you received\n"
|
||||
f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
|
||||
f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n"
|
||||
f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n"
|
||||
f"- Quote exact strings from tool output when recording evidence descriptions\n"
|
||||
f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools"
|
||||
f"is FAILED and a forced retry will fire.\n\n"
|
||||
f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n"
|
||||
f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n"
|
||||
f" is what you cite in each fact's invocation_id.\n"
|
||||
f"- fact.value must be a substring of the cited invocation's output.\n"
|
||||
f" Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n"
|
||||
f" anything else fabricated is REJECTED with a per-fact reason.\n"
|
||||
f"- On REJECTED: quote the literal text from the output (or drop the\n"
|
||||
f" fact), and put guesses / inferred paths / model names in\n"
|
||||
f" `interpretation` instead. Then call add_phenomenon again.\n"
|
||||
f"- You may cite ONLY invocations made within THIS task."
|
||||
)
|
||||
|
||||
async def run(self, task: str, lead_id: str | None = None) -> str:
|
||||
@@ -146,6 +166,11 @@ class BaseAgent:
|
||||
_log(task, event="agent_start", agent=self.name)
|
||||
self.graph.agent_status[self.name] = "running"
|
||||
self.graph._current_agent = self.name
|
||||
# Fresh task scope per agent run. Used by the grounding gateway to
|
||||
# check that facts in add_phenomenon cite invocations made *within
|
||||
# this run* — preventing the agent from forwarding stale IDs from
|
||||
# earlier work or another agent.
|
||||
self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}"
|
||||
self._current_lead_id = lead_id
|
||||
|
||||
self._register_graph_tools()
|
||||
@@ -203,12 +228,27 @@ class BaseAgent:
|
||||
f"what you already found. Then end."
|
||||
),
|
||||
})
|
||||
# Narrow the retry tool surface so the agent can't wander off
|
||||
# to investigate again — only RECORD and read-only graph
|
||||
# query tools survive. Each grounding-rejected call burns one
|
||||
# iteration, so the cap is 30 (not the original 10): a
|
||||
# Timeline agent writing ~10 temporal edges with one rejection
|
||||
# apiece needs ~20 turns under the rewritten gateway.
|
||||
retry_tool_names = set(registered_mandatory) | {
|
||||
"list_phenomena", "list_assets", "search_graph",
|
||||
"add_temporal_edge", "link_to_entity", "add_lead",
|
||||
"add_hypothesis", "save_report",
|
||||
}
|
||||
retry_tools = [
|
||||
td for td in self.get_tool_definitions()
|
||||
if td["name"] in retry_tool_names
|
||||
]
|
||||
final_text, _ = await self.llm.tool_call_loop(
|
||||
messages=conversation,
|
||||
tools=self.get_tool_definitions(),
|
||||
tools=retry_tools,
|
||||
tool_executor=self._executors,
|
||||
system=system,
|
||||
max_iterations=10,
|
||||
max_iterations=30,
|
||||
terminal_tools=self.terminal_tools,
|
||||
)
|
||||
|
||||
@@ -350,20 +390,67 @@ class BaseAgent:
|
||||
self.register_tool(
|
||||
name="add_phenomenon",
|
||||
description=(
|
||||
"Record a forensic finding (phenomenon) on the evidence graph. "
|
||||
"You MUST specify source_tool: the name of the tool call that produced this finding."
|
||||
"Record a forensic finding on the evidence graph. The finding is "
|
||||
"split into provenance-bound atoms (verified_facts) and free-form "
|
||||
"analysis (interpretation). Each fact MUST cite the invocation_id "
|
||||
"of a tool call you made in THIS task — the gateway checks every "
|
||||
"fact's value against that call's real output, byte-for-byte. "
|
||||
"Any fact that fails grounding causes the whole record to be "
|
||||
"rejected with a list of failures; fix the facts and call again."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"category": {"type": "string", "description": "Category of the finding."},
|
||||
"title": {"type": "string", "description": "Short title."},
|
||||
"description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."},
|
||||
"interpretation": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Free-form analysis text — your reasoning, why this "
|
||||
"matters, what it implies. NOT verified by the gateway. "
|
||||
"Rendered in reports as 'agent analysis', not truth."
|
||||
),
|
||||
},
|
||||
"verified_facts": {
|
||||
"type": "array",
|
||||
"description": (
|
||||
"Atoms you want preserved as ground truth. Each must "
|
||||
"appear verbatim in the cited tool output."
|
||||
),
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"type": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Kind of fact: path, timestamp, inode, "
|
||||
"hash, identifier, count, raw, ..."
|
||||
),
|
||||
},
|
||||
"value": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Verbatim substring from the cited tool "
|
||||
"output. The gateway does a literal "
|
||||
"string-in-string check — no paraphrasing."
|
||||
),
|
||||
},
|
||||
"invocation_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"ID from the '[invocation: inv-xxx]' header "
|
||||
"of the tool call that produced this value."
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["type", "value", "invocation_id"],
|
||||
},
|
||||
},
|
||||
"raw_data": {"type": "object", "description": "Structured raw data supporting this finding."},
|
||||
"timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."},
|
||||
"source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."},
|
||||
},
|
||||
"required": ["category", "title", "description", "source_tool"],
|
||||
"required": ["category", "title", "source_tool"],
|
||||
},
|
||||
executor=self._add_phenomenon,
|
||||
)
|
||||
@@ -414,6 +501,67 @@ class BaseAgent:
|
||||
executor=self._link_to_entity,
|
||||
)
|
||||
|
||||
self.register_tool(
|
||||
name="observe_identity",
|
||||
description=(
|
||||
"Record a typed identifier (email / phone / Apple ID / IMEI / "
|
||||
"wallet address / nickname / display name / …) for an entity. "
|
||||
"Goes through the same grounding gateway as add_phenomenon — "
|
||||
"value MUST be a verbatim substring of the cited tool output. "
|
||||
"After attachment, the engine automatically proposes / "
|
||||
"strengthens / weakens cross-source coreference hypotheses "
|
||||
"between this entity and any others carrying the same or "
|
||||
"conflicting identifiers. This is how 'is the Apple ID in iOS "
|
||||
"keychain the same person as the Windows login name?' gets "
|
||||
"answered. Call this in ADDITION to add_phenomenon for "
|
||||
"identifier-bearing findings."
|
||||
),
|
||||
input_schema={
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."},
|
||||
"entity_type": {
|
||||
"type": "string",
|
||||
"enum": ["person", "program", "file", "host", "ip_address"],
|
||||
"description": "Kind of entity this identifier belongs to (usually 'person').",
|
||||
},
|
||||
"identifier_type": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"Strong (near-unique): email, phone_number, imei, "
|
||||
"imsi, apple_id, icloud_id, google_account, "
|
||||
"wallet_address, udid, mac_address, device_serial. "
|
||||
"Weak (free-form, may collide): nickname, "
|
||||
"display_name, username, screen_name."
|
||||
),
|
||||
},
|
||||
"value": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"The identifier value, quoted VERBATIM from the "
|
||||
"tool output you cite in invocation_id."
|
||||
),
|
||||
},
|
||||
"invocation_id": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
"ID from the '[invocation: inv-xxx]' header of "
|
||||
"the tool call that surfaced this identifier."
|
||||
),
|
||||
},
|
||||
"source_tool": {
|
||||
"type": "string",
|
||||
"description": "Name of the tool that produced the identifier.",
|
||||
},
|
||||
},
|
||||
"required": [
|
||||
"entity_name", "entity_type", "identifier_type",
|
||||
"value", "invocation_id",
|
||||
],
|
||||
},
|
||||
executor=self._observe_identity,
|
||||
)
|
||||
|
||||
# ---- Tool executors -----------------------------------------------------
|
||||
|
||||
async def _list_phenomena(self, category: str | None = None) -> str:
|
||||
@@ -453,16 +601,29 @@ class BaseAgent:
|
||||
self,
|
||||
category: str,
|
||||
title: str,
|
||||
description: str,
|
||||
interpretation: str = "",
|
||||
verified_facts: list[dict] | None = None,
|
||||
raw_data: dict | None = None,
|
||||
timestamp: str | None = None,
|
||||
source_tool: str = "",
|
||||
# Back-compat: older prompts (and accidental LLM emissions) may pass
|
||||
# ``description``; treat it as ``interpretation`` rather than failing.
|
||||
description: str | None = None,
|
||||
) -> str:
|
||||
if description and not interpretation:
|
||||
interpretation = description
|
||||
# GroundingError propagates: llm_client._execute_single_tool turns
|
||||
# raised exceptions into "Error executing add_phenomenon: <msg>" tool
|
||||
# results the LLM sees, and _wrap_record_executor does NOT increment
|
||||
# the mandatory-record counter (the increment only runs after a
|
||||
# successful return), so the forced-retry mechanism still fires if
|
||||
# the agent never lands a grounded phenomenon.
|
||||
pid, merged = await self.graph.add_phenomenon(
|
||||
source_agent=self.name,
|
||||
category=category,
|
||||
title=title,
|
||||
description=description,
|
||||
interpretation=interpretation,
|
||||
verified_facts=verified_facts,
|
||||
raw_data=raw_data,
|
||||
timestamp=timestamp,
|
||||
source_tool=source_tool,
|
||||
@@ -508,6 +669,51 @@ class BaseAgent:
|
||||
status = "linked to existing" if existing else "created and linked"
|
||||
return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}"
|
||||
|
||||
async def _observe_identity(
|
||||
self,
|
||||
entity_name: str,
|
||||
entity_type: str,
|
||||
identifier_type: str,
|
||||
value: str,
|
||||
invocation_id: str,
|
||||
source_tool: str = "",
|
||||
) -> str:
|
||||
# GroundingError / ValueError propagate to llm_client's per-tool
|
||||
# exception handler, which formats them back to the LLM. That keeps
|
||||
# the mandatory-record counter honest — only a successful return
|
||||
# triggers the increment in _wrap_record_executor.
|
||||
result = await self.graph.observe_identity(
|
||||
entity_name=entity_name,
|
||||
entity_type=entity_type,
|
||||
identifier_type=identifier_type,
|
||||
value=value,
|
||||
source_agent=self.name,
|
||||
source_tool=source_tool,
|
||||
invocation_id=invocation_id,
|
||||
)
|
||||
lines = [
|
||||
f"Identity observed: {identifier_type}={value} "
|
||||
f"on entity {result['entity_id']} ({entity_name})."
|
||||
]
|
||||
if result.get("new_identifier"):
|
||||
lines.append(
|
||||
f" Observation phenomenon: {result['phenomenon_id']}"
|
||||
)
|
||||
else:
|
||||
lines.append(" (identifier already recorded on this entity — idempotent)")
|
||||
for prop in result.get("coref_proposals", []):
|
||||
lines.append(
|
||||
f" → Coref candidate: {prop['other_entity_id']} via "
|
||||
f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
|
||||
f"hypothesis={prop['hypothesis_id']})"
|
||||
)
|
||||
for c in prop.get("conflicts", []):
|
||||
lines.append(
|
||||
f" ⚠ conflict on {c['type']}: "
|
||||
f"{c['new_value']} vs {c['other_value']}"
|
||||
)
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _list_assets(self, category: str | None = None) -> str:
|
||||
results = self.graph.list_assets(category)
|
||||
if not results:
|
||||
|
||||
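The grounding contract described in the `base_agent.py` prompts above (a fact must cite an invocation from the current task, and its value must appear in that invocation's output, tolerating case, whitespace, and path-separator differences) can be sketched roughly as follows. This is a minimal illustration, not the gateway in `evidence_graph.py`, whose diff is suppressed on this page; the `Invocation` record, `check_fact`, and the rejection strings are hypothetical names chosen for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Invocation:
    """One recorded tool call (hypothetical structure, for illustration only)."""
    task_id: str
    output: str


def _normalise(text: str) -> str:
    """Fold case, unify path separators, and collapse runs of whitespace."""
    text = text.replace("\\", "/").lower()
    return re.sub(r"\s+", " ", text).strip()


def check_fact(fact: dict, log: dict[str, Invocation], current_task: str) -> str | None:
    """Return None when the fact is grounded, otherwise a rejection reason."""
    inv = log.get(fact["invocation_id"])
    if inv is None:
        return "unknown invocation_id"
    if inv.task_id != current_task:
        # Stale IDs from earlier runs or other agents are not accepted.
        return "invocation belongs to another task"
    needle = _normalise(fact["value"])
    if not needle:
        # An empty string is a substring of everything, so reject it explicitly.
        return "empty value"
    if needle not in _normalise(inv.output):
        return "value not found in cited output"
    return None


log = {"inv-0001": Invocation("task-a", "r/r 128-1:  C:\\Users\\Bob\\NTUSER.DAT")}
ok = {"type": "path", "value": "c:/users/bob/ntuser.dat", "invocation_id": "inv-0001"}
bad = {"type": "path", "value": "C:/Users/Alice", "invocation_id": "inv-0001"}

print(check_fact(ok, log, "task-a"))   # grounded despite case and separator drift
print(check_fact(bad, log, "task-a"))  # value never appeared in the output
print(check_fact(ok, log, "task-b"))   # same fact, but cited from a different task
```

Returning a reason string per fact, rather than raising on the first failure, makes it straightforward to collect every failing fact into one rejection message, which is the behaviour the `add_phenomenon` tool description promises.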
41
case.example.yaml
Normal file
@@ -0,0 +1,41 @@
# MASForensics case definition — template
#
# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
# exists in the working directory, `python main.py` loads it automatically;
# otherwise main.py falls back to interactive single-image selection.
#
# A case is a set of evidence sources. Each source has:
#   id                optional — auto-derived from label if omitted ("src-<slug>")
#   label             human-readable name
#   type              disk_image | mobile_extraction | archive | media_collection
#   access_mode       image | tree (optional — defaults by type)
#                       image = block device / disk image, navigated by Sleuth Kit
#                       tree = mounted filesystem / unpacked extraction, path-based
#   owner             optional — the person the source is associated with
#   path              filesystem path (relative paths resolve against this file)
#   partition_offset  image-mode only — sector offset of the partition to analyze
#   meta              optional free-form notes
#
# NOTE: at the current refit stage only image-mode (disk) sources are
# analysable; tree-mode sources are accepted but skipped.

case_id: example-case
name: "Example forensic case"
meta:
  notes: "free-form case-level metadata"

sources:
  - id: src-suspect-laptop
    label: "Suspect laptop disk image"
    type: disk_image
    access_mode: image
    owner: "John Doe"
    path: image/suspect_laptop.E01
    partition_offset: 0  # run `mmls <image>` to find the right offset

  - id: src-suspect-phone
    label: "Suspect phone extraction"
    type: mobile_extraction
    access_mode: tree
    owner: "John Doe"
    path: image/suspect_phone.zip
226
case.py
Normal file
@@ -0,0 +1,226 @@
"""Case and evidence-source model — the foundation for multi-evidence analysis.

A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source
has a *type* (disk image, mobile extraction, archive, ...) and an *access mode*
that determines how forensic tools reach its contents:

- ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via
  inode addressing (raw, E01, dd, ...).
- ``"tree"`` — an already-mounted filesystem or unpacked extraction,
  navigated by ordinary filesystem paths.

This module is pure data model + loading. Partition probing and interactive
selection live in ``main.py``.
"""

from __future__ import annotations

import logging
import re
from dataclasses import asdict, dataclass, field
from pathlib import Path

logger = logging.getLogger(__name__)

# Recognised source types and access modes.
SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"}
ACCESS_MODES = {"image", "tree"}

# Disk-image file extensions for interactive discovery.
# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed
# raw block-device dumps such as ``blk0_sda.bin``.
DISK_IMAGE_EXTS = {
    ".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd",
}

# Default access mode per source type.
_DEFAULT_ACCESS_MODE = {
    "disk_image": "image",
    "mobile_extraction": "tree",
    "archive": "tree",
    "media_collection": "tree",
}


def slugify(text: str) -> str:
    """Reduce *text* to a lowercase, hyphen-separated slug for use in IDs."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "src"


@dataclass
class EvidenceSource:
    """One piece of evidence within a :class:`Case`."""

    id: str                    # "src-<slug>"
    label: str                 # human-readable name
    type: str                  # one of SOURCE_TYPES
    path: str                  # filesystem path to the evidence
    access_mode: str           # "image" | "tree"
    owner: str = ""            # associated person, if known
    partition_offset: int = 0  # sector offset (image-mode sources only)
    meta: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> EvidenceSource:
        """Reconstruct from a dict, ignoring unknown keys (forward-compatible)."""
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in d.items() if k in known})

    def summary(self) -> str:
        loc = (
            f"@{self.partition_offset}"
            if self.access_mode == "image" and self.partition_offset
            else ""
        )
        owner = f" owner={self.owner}" if self.owner else ""
        return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}"


@dataclass
class Case:
    """A forensic case: a set of evidence sources plus metadata."""

    case_id: str
    name: str
    sources: list[EvidenceSource] = field(default_factory=list)
    meta: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return {
            "case_id": self.case_id,
            "name": self.name,
            "sources": [s.to_dict() for s in self.sources],
            "meta": dict(self.meta),
        }

    @classmethod
    def from_dict(cls, d: dict) -> Case:
        return cls(
            case_id=d.get("case_id", ""),
            name=d.get("name", ""),
            sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])],
            meta=d.get("meta", {}),
        )

    def get_source(self, source_id: str) -> EvidenceSource | None:
        for s in self.sources:
            if s.id == source_id:
                return s
        return None


# ---------------------------------------------------------------------------
# case.yaml loading
# ---------------------------------------------------------------------------

def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource:
    """Validate and normalise one source entry from case.yaml.

    Missing ``id`` is derived from the label; missing ``access_mode`` defaults
    by type; relative paths are resolved against *base_dir* (the case file's
    directory).
    """
    label = str(raw.get("label") or raw.get("id") or f"source-{index}")
    src_type = str(raw.get("type", "disk_image"))
    if src_type not in SOURCE_TYPES:
        logger.warning("Unknown source type %r for %r — treating as disk_image",
                       src_type, label)
        src_type = "disk_image"

    access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree"))
    if access_mode not in ACCESS_MODES:
        logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label)
        access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree")

    src_id = str(raw.get("id") or f"src-{slugify(label)}")
    if not src_id.startswith("src-"):
        src_id = f"src-{slugify(src_id)}"

    raw_path = str(raw.get("path", "")).strip()
    path = raw_path
    if raw_path:
        p = Path(raw_path).expanduser()
        if not p.is_absolute():
            p = (base_dir / p)
        path = str(p)

    return EvidenceSource(
        id=src_id,
        label=label,
        type=src_type,
        path=path,
        access_mode=access_mode,
        owner=str(raw.get("owner", "")),
        partition_offset=int(raw.get("partition_offset", 0) or 0),
        meta=dict(raw.get("meta", {})),
    )


def build_case(data: dict, base_dir: Path | None = None) -> Case:
    """Build a validated :class:`Case` from a loosely-typed case.yaml dict."""
    base_dir = base_dir or Path.cwd()
    sources: list[EvidenceSource] = []
    seen_ids: set[str] = set()
    for i, raw in enumerate(data.get("sources", []) or []):
        if not isinstance(raw, dict):
            logger.warning("Skipping malformed source entry #%d", i)
            continue
        src = _build_source(raw, base_dir, i)
        if src.id in seen_ids:
            src.id = f"{src.id}-{i}"
        seen_ids.add(src.id)
        if not src.path:
            logger.warning("Source %r has no path — keeping but it is not analysable",
                           src.label)
        sources.append(src)

    return Case(
        case_id=str(data.get("case_id", "case")),
        name=str(data.get("name", "Untitled case")),
        sources=sources,
        meta=dict(data.get("meta", {})),
    )


def load_case(path: str | Path = "case.yaml") -> Case | None:
    """Load a :class:`Case` from a case.yaml file. Returns None if absent."""
    case_path = Path(path)
    if not case_path.exists():
        return None
    import yaml

    try:
        data = yaml.safe_load(case_path.read_text()) or {}
    except Exception as e:
        logger.error("Failed to parse %s: %s", case_path, e)
        return None
    if not isinstance(data, dict):
        logger.error("%s is not a YAML mapping", case_path)
        return None

    case = build_case(data, base_dir=case_path.resolve().parent)
    logger.info("Loaded case %r with %d source(s) from %s",
                case.name, len(case.sources), case_path)
    return case


def single_source_case(
    image_path: str,
    partition_offset: int = 0,
    label: str | None = None,
) -> Case:
    """Wrap a single disk image as a one-source Case (interactive fallback)."""
    name = label or Path(image_path).name
    src = EvidenceSource(
        id=f"src-{slugify(Path(image_path).stem)}",
        label=name,
        type="disk_image",
        path=image_path,
        access_mode="image",
        partition_offset=partition_offset,
    )
    return Case(case_id="adhoc", name=name, sources=[src])
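To make the ID rules in `_build_source` and `build_case` concrete, the following standalone sketch replays just the ID derivation: slug from the label when `id` is missing, a forced `src-` prefix, and an index suffix on collisions. `derive_ids` is a hypothetical helper written for this illustration; it copies the logic from `case.py` above rather than importing it, and the sample entries are made up.

```python
from __future__ import annotations

import re


def slugify(text: str) -> str:
    """Same slug rule as case.py: lowercase, non-alphanumerics become hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "src"


def derive_ids(entries: list[dict]) -> list[str]:
    """Replay the source-ID derivation for a list of raw case.yaml entries."""
    ids: list[str] = []
    seen: set[str] = set()
    for i, raw in enumerate(entries):
        label = str(raw.get("label") or raw.get("id") or f"source-{i}")
        src_id = str(raw.get("id") or f"src-{slugify(label)}")
        if not src_id.startswith("src-"):
            src_id = f"src-{slugify(src_id)}"
        if src_id in seen:
            # Collisions are disambiguated with the entry's position in the list.
            src_id = f"{src_id}-{i}"
        seen.add(src_id)
        ids.append(src_id)
    return ids


ids = derive_ids([
    {"label": "Suspect laptop disk image"},        # no id: slug of the label
    {"id": "Phone #1", "label": "Suspect phone"},  # id without the src- prefix
    {"label": "Suspect laptop disk image"},        # duplicate label
    {},                                            # nothing given at all
])
print(ids)
```

One consequence worth noting: an explicit `id` that already starts with `src-` is kept verbatim, including any uppercase characters, because only IDs lacking the prefix are passed through `slugify`.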
71
config.example.yaml
Normal file
@@ -0,0 +1,71 @@
# MASForensics Configuration — template.
#
# Copy this file to `config.yaml` and fill in your API key. config.yaml is
# git-ignored so secrets don't land in commits. The two files share schema;
# only this template is tracked.

agent:
  base_url: "https://api.deepseek.com"
  api_key: "YOUR-API-KEY-HERE"
  model: "deepseek-v4-pro"
  max_tokens: 16384
  reasoning_effort: "high"  # DeepSeek/o1-style reasoning depth; omit to disable
  thinking_enabled: true    # DeepSeek extra_body.thinking switch

# Maximum rounds of hypothesis-directed investigation (Phase 3).
# Only consulted when strategist.enabled is false (legacy fallback path).
max_investigation_rounds: 1

# Phase 3 strategist loop (DESIGN_STRATEGIST.md). When enabled, the
# InvestigationStrategist agent decides each round whether to propose new
# leads or declare the investigation complete. When disabled, the legacy
# fixed-round investigation loop runs instead.
strategist:
  enabled: true
  max_rounds: 10
  # Safety net: if the strategist keeps proposing leads but yield (new
  # phenomena + edges + status flips) is zero for this many consecutive
  # rounds, the orchestrator force-stops Phase 3 regardless.
  hard_stop_marginal_yield_zero_rounds: 3

# Hard caps that bound the whole run. The strategist's budget_status tool
# reads these to pace its proposals; the orchestrator also enforces them
# as hard stops (DESIGN_STRATEGIST.md §4.2 step 7). Comment out any cap
# to make it unbounded.
budgets:
  tool_calls_total: 5000
  strategist_rounds_max: 10
  wall_clock_minutes_max: 480

# Optional: override the per-edge-type log₁₀(LR) calibration table.
# Confidence updates accumulate these in odds space (additive, order-
# independent), then map back to probability via sigmoid. Single edge
# magnitudes: ≥ +0.602 lifts confidence above the 0.8 supported threshold,
# ≤ −0.602 drops it below the 0.2 refuted threshold.
# If omitted, evidence_graph._DEFAULT_LOG_LR is used.
# hypothesis_log_lr:
#   direct_evidence: 2.0
#   supports: 1.0
#   consequence_observed: 1.0
#   prerequisite_met: 0.5
#   weakens: -0.5
#   contradicts: -2.0

# Optional: manually specify initial hypotheses. If omitted, the
# HypothesisAgent auto-generates them from Phase 1 findings.
# hypotheses:
#   - title: "..."
#     description: "..."

# Investigation areas — LLM-derived from active hypotheses after Phase 2.
# Each entry below acts as a MANUAL OVERRIDE: it is seeded into the graph
# before the LLM derives areas, so manual entries always survive (slug-based
# dedupe; LLM only augments keyword/tool lists, never overwrites).
#
# investigation_areas:
#   - area: shutdown_time
#     description: "Last recorded shutdown time"
#     agent: registry
#     priority: 3
#     keywords: [shutdown, last shutdown]
#     tools: [get_shutdown_time]
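The `hypothesis_log_lr` comment above describes confidence as a sum of log₁₀ likelihood ratios in odds space, mapped back to a probability. A minimal sketch of that arithmetic follows, assuming a neutral prior of 0.5 (the config comment's ±0.602 thresholds only work out under that assumption) and using the table values from the commented example. The function name `confidence` is hypothetical, and since the `evidence_graph.py` diff is suppressed on this page, this is not a copy of its implementation.

```python
import math

# Values copied from the commented hypothesis_log_lr example above.
LOG_LR = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def confidence(edge_types, prior=0.5, table=LOG_LR):
    """Sum log10 likelihood ratios onto the prior log-odds, then map to [0, 1]."""
    log_odds = math.log10(prior / (1.0 - prior))
    log_odds += sum(table[e] for e in edge_types)
    # Base-10 logistic: odds = 10**log_odds, p = odds / (1 + odds).
    return 1.0 / (1.0 + 10.0 ** (-log_odds))


print(confidence([]))                                   # neutral prior, no edges
print(confidence(["supports"]))                         # odds 10:1
print(confidence(["supports", "contradicts"]))          # net log-odds of -1
print(confidence(["contradicts", "supports"]))          # same edges, other order
print(confidence(["x"], table={"x": math.log10(4.0)}))  # odds 4:1, i.e. about 0.8
```

Because the update is a plain sum, the result depends only on which edges are present and not on their arrival order, which addresses the ordering defect listed as P3 in the design document. The 0.602 figure in the comment is log₁₀(4): odds of 4:1 correspond to a probability of 0.8, and odds of 1:4 to 0.2.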
1412
evidence_graph.py
File diff suppressed because it is too large
@@ -142,6 +142,14 @@ READ_ONLY_TOOLS: set[str] = {
|
||||
# Parser reads
|
||||
"read_text_file", "read_binary_preview", "search_text_file",
|
||||
"read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
|
||||
"find_files",
|
||||
# iOS plugin reads (S4)
|
||||
"parse_plist", "sqlite_tables", "sqlite_query",
|
||||
"parse_ios_keychain", "read_idevice_info",
|
||||
# Android + media reads (S6) — set_active_partition is NOT read-only.
|
||||
"probe_android_partitions", "ocr_image",
|
||||
# Strategist view tools (DESIGN_STRATEGIST.md §2) — pure renders.
|
||||
"graph_overview", "source_coverage", "marginal_yield", "budget_status",
|
||||
}
|
||||
|
||||
|
||||
@@ -503,7 +511,7 @@ class LLMClient:
|
||||
tools: list[dict],
|
||||
tool_executor: dict[str, Any],
|
||||
system: str | None = None,
|
||||
max_iterations: int = 40,
|
||||
max_iterations: int = 60,
|
||||
terminal_tools: tuple[str, ...] = (),
|
||||
) -> tuple[str, list[dict]]:
|
||||
"""Run a tool-calling loop using OpenAI-native tool calls.
|
||||
|
||||
162
main.py
@@ -15,17 +15,21 @@ from pathlib import Path
|
||||
import yaml
|
||||
|
||||
from agent_factory import AgentFactory
|
||||
from case import (
|
||||
DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case,
|
||||
)
|
||||
from evidence_graph import EvidenceGraph
|
||||
from llm_client import LLMClient
|
||||
from log_config import setup_logging
|
||||
from orchestrator import AnalysisAborted, Orchestrator
|
||||
from tool_registry import register_all_tools
|
||||
from tools.archive import unzip_archive_sync
|
||||
|
||||
RUNS_DIR = Path("runs")
|
||||
IMAGE_DIR = Path("image")
|
||||
|
||||
# Common forensic image extensions (only first segment / single-file formats)
|
||||
_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"]
|
||||
# Persistent unpack cache for tree-mode sources (zip extractions). Lives
|
||||
# at project root so multiple runs can reuse the same unpacked tree.
|
||||
SOURCE_CACHE_DIR = Path(".cache/sources")
|
||||
|
||||
|
||||
def load_config(path: str = "config.yaml") -> dict:
|
||||
@@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict:
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]:
|
||||
"""Find forensic disk image files under *search_dir*."""
|
||||
images: set[Path] = set()
|
||||
for glob in _IMAGE_GLOBS:
|
||||
images.update(search_dir.glob(glob))
|
||||
return sorted(images)
|
||||
"""Find forensic disk image files under *search_dir* (case-insensitive ext)."""
|
||||
if not search_dir.is_dir():
|
||||
return []
|
||||
return sorted(
|
||||
p for p in search_dir.iterdir()
|
||||
if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
|
||||
)
|
||||
|
||||
|
||||
def _parse_mmls(output: str) -> list[dict]:
|
||||
@@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
|
||||
images = _discover_images(image_dir)
|
||||
if not images:
|
||||
print(f"No disk images found in {image_dir}/")
|
||||
print("Supported formats: " + ", ".join(_IMAGE_GLOBS))
|
||||
print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS)))
|
||||
sys.exit(1)
|
||||
|
||||
if len(images) == 1:
|
||||
@@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
|
||||
print("Invalid choice.")
|
||||
|
||||
|
||||
def resolve_case() -> Case:
|
||||
"""Resolve the Case to analyze.
|
||||
|
||||
Priority: an explicit case file given as a CLI argument, then ./case.yaml
|
||||
in the working directory, then legacy interactive single-image selection.
|
||||
"""
|
||||
# 1. Explicit case file passed on the command line
|
||||
if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")):
|
||||
case = load_case(sys.argv[1])
|
||||
if case is None:
|
||||
print(f"Error: could not load case file {sys.argv[1]}")
|
||||
sys.exit(1)
|
||||
print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
|
||||
return case
|
||||
|
||||
# 2. ./case.yaml in the working directory
|
||||
case = load_case()
|
||||
if case is not None:
|
||||
print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
|
||||
return case
|
||||
|
||||
# 3. Legacy interactive single-image selection
|
||||
cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
|
||||
image_path, partition_offset = select_image_interactive(cli_dir)
|
||||
return single_source_case(image_path, partition_offset)
|
||||
|
||||
|
||||
def _is_analysable(src: EvidenceSource) -> bool:
|
||||
"""A source is analysable when it has a path AND its mode has tooling.
|
||||
|
||||
S4 lights up tree-mode iOS extractions; image-mode disks were already
|
||||
supported. Media-collection (screenshots) remain skipped until S6.
|
||||
"""
|
||||
if not src.path:
|
||||
return False
|
||||
if src.access_mode == "image":
|
||||
return True
|
||||
if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def list_analysable_sources(case: Case) -> list[EvidenceSource]:
|
||||
"""Return every analysable source in the case (orchestrator iterates them).
|
||||
|
||||
Pre-S6 main.py used to force-choose one source here; the multi-source
|
||||
orchestrator (Phase 1 per-source triage) now consumes the full list.
|
||||
Skipped sources are still reported for visibility.
|
||||
"""
|
||||
analysable = [s for s in case.sources if _is_analysable(s)]
|
||||
skipped = [s for s in case.sources if not _is_analysable(s)]
|
||||
if skipped:
|
||||
print(
|
||||
f"Note: {len(skipped)} source(s) not analysable in this build: "
|
||||
+ ", ".join(f"{s.label} ({s.type})" for s in skipped)
|
||||
)
|
||||
if not analysable:
|
||||
print("No analysable sources in this case.")
|
||||
sys.exit(1)
|
||||
print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:")
|
||||
for s in analysable:
|
||||
print(f" - {s.summary()}")
|
||||
return analysable
|
||||
|
||||
|
||||
def prepare_source(src: EvidenceSource) -> EvidenceSource:
|
||||
"""Materialise a tree-mode source for analysis.
|
||||
|
||||
Mobile / archive sources arrive as .zip files. We unpack once into a
|
||||
project-level cache (``.cache/sources/<src.id>/``) and rewrite
|
||||
``src.path`` to point at the unpacked directory. Idempotent — a
|
||||
second run with the cache present is a no-op (unzip_archive_sync
|
||||
skips files that already exist with the matching size).
|
||||
|
||||
Disk-image and already-tree sources pass through unchanged.
|
||||
"""
|
||||
if src.access_mode != "tree":
|
||||
return src
|
||||
p = Path(src.path)
|
||||
if p.is_dir():
|
||||
return src # already a directory, nothing to do
|
||||
if not p.is_file():
|
||||
print(f"Warning: source path {src.path} does not exist; leaving as-is.")
|
||||
return src
|
||||
if p.suffix.lower() != ".zip":
|
||||
# Other archive types (tar, 7z, ...) — not handled yet.
|
||||
print(f"Warning: tree-mode source {src.id} is not a .zip "
|
||||
f"({p.suffix}); leaving as-is.")
|
||||
return src
|
||||
|
||||
dest = SOURCE_CACHE_DIR / src.id
|
||||
dest.mkdir(parents=True, exist_ok=True)
|
||||
# Password-protected zips (e.g. CTF artefacts) carry their key in
|
||||
# case.yaml's meta.password — never logged, never persisted.
|
||||
password = (src.meta or {}).get("password")
|
||||
pw_note = " (password from meta)" if password else ""
|
||||
print(f"Unpacking {p.name} → {dest}{pw_note} (idempotent) ...")
|
||||
result = unzip_archive_sync(str(p), str(dest), password=password)
|
||||
first_line = result.split("\n", 1)[0]
|
||||
print(" " + first_line)
|
||||
if first_line.startswith("Error:"):
|
||||
# Surface the multi-line guidance from _do_extract verbatim.
|
||||
for extra in result.split("\n")[1:]:
|
||||
print(" " + extra)
|
||||
print(f" Source {src.id} stays unanalysable until this is resolved.")
|
||||
# Leave src.path unchanged so the source remains marked unanalysable.
|
||||
return src
|
||||
src.path = str(dest)
|
||||
src.access_mode = "tree"
|
||||
return src
|
||||
|
||||
|
||||
def find_resumable_run() -> Path | None:
|
||||
"""Find the most recent incomplete run with a saved graph state."""
|
||||
if not RUNS_DIR.exists():
|
||||
@@ -225,22 +343,30 @@ async def async_main() -> None:
|
||||
|
||||
# Initialize evidence graph
|
||||
if graph is None:
|
||||
# CLI arg takes priority, otherwise interactive prompt
|
||||
cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
|
||||
image_path, partition_offset = select_image_interactive(cli_dir)
|
||||
case = resolve_case()
|
||||
# case_info derived from THIS case's meta (case.yaml), not from
|
||||
# config.yaml's legacy `cfreds_hacking_case` block. Without this,
|
||||
# the old CFReDS evidence MD5s would be embedded in reports for
|
||||
# every subsequent unrelated case.
|
||||
graph = EvidenceGraph(
|
||||
case_info=config.get("cfreds_hacking_case", {}),
|
||||
case_info=dict(case.meta or {}),
|
||||
persist_path=run_dir / "graph_state.json",
|
||||
edge_weights=config.get("hypothesis_edge_weights"),
|
||||
edge_log_lr=config.get("hypothesis_log_lr"),
|
||||
)
|
||||
graph.image_path = image_path
|
||||
graph.partition_offset = partition_offset
|
||||
graph.case = case
|
||||
graph.extracted_dir = str(run_dir / "extracted")
|
||||
analysable = list_analysable_sources(case)
|
||||
# Prepare every analysable source up front (unzip tree-mode zips,
|
||||
# etc.). Idempotent on cache hits — second run is a no-op.
|
||||
prepared = [prepare_source(s) for s in analysable]
|
||||
# Seed the active source so tools that resolve lazily have a target
|
||||
# before Phase 1 begins; the orchestrator resets it per source.
|
||||
graph.set_active_source(prepared[0])
|
||||
else:
|
||||
graph._persist_path = run_dir / "graph_state.json"
|
||||
|
||||
# Register all tools with bound image path
|
||||
register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir)
|
||||
# Register all tools — they resolve the active evidence source at call time
|
||||
register_all_tools(graph)
|
||||
|
||||
# Create agent factory
|
||||
factory = AgentFactory(llm, graph)
|
||||
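The `_discover_images` change above replaces per-pattern globbing with a case-insensitive suffix check. A minimal sketch of that idea follows. The contents of `DISK_IMAGE_EXTS` are an assumption (the real set lives in `case.py` and is not shown in this diff), and including `.bin` is a guess based on the design note that `blk0_sda.bin` was previously not discovered.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for case.DISK_IMAGE_EXTS; the real set is not visible
# in this diff. Entries are lowercase because the suffix is lowercased below.
DISK_IMAGE_EXTS = frozenset({".001", ".dd", ".raw", ".img", ".e01", ".iso", ".bin"})


def discover_images(search_dir: Path) -> list[Path]:
    """Return regular files whose lowercased suffix is a known image extension."""
    if not search_dir.is_dir():
        return []
    return sorted(
        p for p in search_dir.iterdir()
        if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
    )


def demo() -> list[str]:
    """Create three empty files in a temp dir and report which are discovered."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("usb.E01", "blk0_sda.bin", "notes.txt"):
            (root / name).write_bytes(b"")
        return [p.name for p in discover_images(root)]
```

Under these assumptions `demo()` picks up `usb.E01` despite the upper-case suffix, which a literal `*.e01` glob would miss on a case-sensitive filesystem, and it ignores `notes.txt`.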
|
||||
465
orchestrator.py
@@ -10,7 +10,7 @@ import time
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
from agent_factory import AgentFactory
|
||||
from agent_factory import AgentFactory, get_triage_agent_type
|
||||
from evidence_graph import EvidenceGraph
|
||||
from llm_client import LLMClient, _extract_first_balanced, _safe_json_loads
|
||||
from tool_registry import TOOL_CATALOG
|
||||
@@ -119,6 +119,11 @@ class Orchestrator:
|
||||
self._failure_count = 0
|
||||
self._max_failures = 3
|
||||
self._start_time = datetime.now()
|
||||
# Make budgets visible to strategy tools via the graph object. The
|
||||
# budget_status tool reads graph.budgets / graph.run_start_monotonic
|
||||
# directly so it does not need a back-reference to the orchestrator.
|
||||
self.graph.budgets = dict(self.config.get("budgets", {}) or {})
|
||||
self.graph.run_start_monotonic = time.monotonic()
|
||||
|
||||
def _resolve_agent_type(self, agent_type: str) -> str:
|
||||
return AGENT_ALIASES.get(agent_type, agent_type)
|
||||
@@ -195,6 +200,298 @@ class Orchestrator:
|
||||
lead.context["retry"] = True
|
||||
await self._dispatch_leads_parallel(failed)
|
||||
|
||||
# ---- Phase 3: strategist loop (DESIGN_STRATEGIST.md §4) ------------------
|
||||
|
||||
def _budget_exceeded(self) -> bool:
|
||||
"""Hard budget enforcement, complementing strategist self-throttling.
|
||||
|
||||
Any of these triggers an immediate Phase 3 exit even if the
|
||||
strategist hasn't called declare_investigation_complete. Each cap
|
||||
is optional — leave it out of config to make it unbounded.
|
||||
"""
|
||||
b = self.graph.budgets or {}
|
||||
tc_cap = b.get("tool_calls_total")
|
||||
if tc_cap and len(self.graph.tool_invocations) >= tc_cap:
|
||||
return True
|
||||
wc_cap = b.get("wall_clock_minutes_max")
|
||||
if wc_cap and self.graph.run_start_monotonic is not None:
|
||||
elapsed_min = (time.monotonic() - self.graph.run_start_monotonic) / 60.0
|
||||
if elapsed_min >= wc_cap:
|
||||
return True
|
||||
return False
|
||||
|
||||
async def _execute_strategist_lead(self, lead, round_num: int) -> None:
|
||||
"""Dispatch one strategist-proposed lead to its target worker.
|
||||
|
||||
Unlike the legacy bulk dispatcher this runs leads serially so each
|
||||
worker run reads a graph that includes prior leads' findings — the
|
||||
strategist's next round can see the cumulative effect of this round.
|
||||
"""
|
||||
agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
|
||||
worker = self.factory.get_or_create_agent(agent_type)
|
||||
if worker is None:
|
||||
logger.warning(
|
||||
"No worker registered for lead %s: target_agent=%s",
|
||||
lead.id, agent_type,
|
||||
)
|
||||
lead.status = "failed"
|
||||
lead.context["failure_reason"] = f"no worker for agent type '{agent_type}'"
|
||||
self.graph._auto_save()
|
||||
return
|
||||
|
||||
source_id = (lead.context or {}).get("source_id", "")
|
||||
if source_id and self.graph.case is not None:
|
||||
src = self.graph.case.get_source(source_id)
|
||||
if src:
|
||||
self.graph.set_active_source(src)
|
||||
|
||||
rationale = (lead.context or {}).get("rationale", "")
|
||||
worker_task = (
|
||||
f"Investigate this specific lead from the strategist:\n\n"
|
||||
f"REQUEST: {lead.description}\n"
|
||||
f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis or '(unspecified)'}\n"
|
||||
f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type or '(unspecified)'}\n"
|
||||
f"RATIONALE: {rationale or '(unspecified)'}\n\n"
|
||||
f"After investigating, record findings via add_phenomenon AND "
|
||||
f"link relevant phenomena to "
|
||||
f"{lead.motivating_hypothesis or 'the motivating hypothesis'} via the "
|
||||
f"appropriate edge_type. If your investigation produces no relevant "
|
||||
f"finding, record that as a negative phenomenon so the strategist "
|
||||
f"can see the gap was probed."
|
||||
)
|
||||
_log(
|
||||
f"Round {round_num} dispatching: {lead.description[:80]}",
|
||||
event="dispatch", agent=agent_type, lead=lead.id,
|
||||
)
|
||||
lead.status = "assigned"
|
||||
self.graph._auto_save()
|
||||
try:
|
||||
await worker.run(worker_task, lead_id=lead.id)
|
||||
lead.status = "completed"
|
||||
except Exception as e:
|
||||
logger.error("Strategist lead %s failed: %s", lead.id, e, exc_info=True)
|
||||
lead.status = "failed"
|
||||
lead.context["failure_reason"] = str(e)
|
||||
finally:
|
||||
self.graph._auto_save()
|
||||
|
||||
async def _resume_strategist_state(self) -> int:
|
||||
"""Repair any open InvestigationRound after a resume and return the
|
||||
next round number to use.
|
||||
|
||||
An "open" round is one with ``started_at`` set but ``completed_at``
|
||||
empty — interrupted before its complete step. Mark it as completed
|
||||
with action=interrupted_resume so the run history is self-describing,
|
||||
and mark any leads still in the "assigned" state from that round as
|
||||
"failed" so the gap-analysis / retry paths can re-process them.
|
||||
Returns the round number the strategist loop should start from
|
||||
(1 + the highest existing round_number).
|
||||
"""
|
||||
if not self.graph.investigation_rounds:
|
||||
return 1
|
||||
highest = max(r.round_number for r in self.graph.investigation_rounds)
|
||||
last = self.graph.latest_round()
|
||||
if last is not None and not last.completed_at:
|
||||
assigned_in_round = [
|
||||
l for l in self.graph.leads
|
||||
if l.round_number == last.round_number
|
||||
and l.status == "assigned"
|
||||
]
|
||||
for lead in assigned_in_round:
|
||||
lead.status = "failed"
|
||||
lead.context["failure_reason"] = "interrupted before complete"
|
||||
await self.graph.complete_investigation_round(
|
||||
last.id, strategist_action="interrupted_resume",
|
||||
decision_rationale=(
|
||||
f"resume repair: this round was interrupted before "
|
||||
f"completion; {len(assigned_in_round)} assigned leads "
|
||||
f"have been re-marked as failed."
|
||||
),
|
||||
)
|
||||
logger.info(
|
||||
"Strategist resume: repaired open round R%d (closed %d assigned leads)",
|
||||
last.round_number, len(assigned_in_round),
|
||||
)
|
||||
return highest + 1
|
||||
|
||||
async def _phase3_strategist_loop(self) -> None:
|
||||
"""Belief-driven investigation: strategist proposes, workers execute,
|
||||
repeat. Replaces the legacy fixed-round investigation loop.
|
||||
"""
|
||||
_log("Phase 3: Strategist-Driven Investigation", event="phase")
|
||||
strategist_cfg = self.config.get("strategist", {}) or {}
|
||||
max_rounds = int(strategist_cfg.get("max_rounds", 10))
|
||||
zero_yield_cap = int(strategist_cfg.get("hard_stop_marginal_yield_zero_rounds", 3))
|
||||
|
||||
strategist = self.factory.get_or_create_agent("strategist")
|
||||
if strategist is None:
|
||||
logger.error(
|
||||
"InvestigationStrategist agent not registered — falling back "
|
||||
"to legacy Phase 3 loop. Check agent_factory._AGENT_CLASSES."
|
||||
)
|
||||
await self._phase3_legacy_loop()
|
||||
return
|
||||
|
||||
# Resume support: if we're restarting after an interruption, repair
|
||||
# any half-open round and pick up at the next number.
|
||||
start_round = await self._resume_strategist_state()
|
||||
if start_round > 1:
|
||||
_log(
|
||||
f"Resuming strategist loop at round {start_round} "
|
||||
f"(history: {len(self.graph.investigation_rounds)} prior rounds)",
|
||||
event="progress",
|
||||
)
|
||||
|
||||
zero_yield_streak = 0
|
||||
|
||||
for round_num in range(start_round, max_rounds + 1):
|
||||
# Reset per-round flags so a previous round's declare_complete
|
||||
# doesn't leak across iterations (defensive — strategist also
|
||||
# only sets True, never False).
|
||||
self.graph.strategist_complete_requested = False
|
||||
self.graph.current_strategist_round = round_num
|
||||
rid = await self.graph.start_investigation_round(round_num)
|
||||
_log(
|
||||
f"Strategist Round {round_num}/{max_rounds}", event="phase",
|
||||
round=round_num,
|
||||
)
|
||||
t0 = time.monotonic()
|
||||
|
||||
try:
|
||||
await strategist.run(
|
||||
f"Review the current investigation state and decide the "
|
||||
f"next action. This is round {round_num}/{max_rounds}. "
|
||||
f"Use graph_overview / marginal_yield / budget_status / "
|
||||
f"source_coverage to ground your decision, then call "
|
||||
f"propose_lead 1-3 times OR declare_investigation_complete."
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error("Strategist round %d failed: %s", round_num, e, exc_info=True)
|
||||
await self.graph.complete_investigation_round(
|
||||
rid, decision_rationale=f"strategist crashed: {e}",
|
||||
)
|
||||
break
|
||||
|
||||
# Strategist declared complete → no leads execute, exit loop.
|
||||
if self.graph.strategist_complete_requested:
|
||||
_log(
|
||||
f"Strategist declared complete at round {round_num}",
|
||||
event="progress", elapsed=time.monotonic() - t0,
|
||||
)
|
||||
await self.graph.complete_investigation_round(
|
||||
rid, strategist_action="declare_complete",
|
||||
decision_rationale="strategist declare_investigation_complete",
|
||||
)
|
||||
break
|
||||
|
||||
# Collect this round's leads (proposed_by=strategist + matching round).
|
||||
new_leads = [
|
||||
l for l in self.graph.leads
|
||||
if l.round_number == round_num
|
||||
and l.proposed_by == "strategist"
|
||||
and l.status == "pending"
|
||||
]
|
||||
if not new_leads:
|
||||
_log(
|
||||
f"Round {round_num}: strategist proposed no new leads — exiting loop",
|
||||
event="progress", elapsed=time.monotonic() - t0,
|
||||
)
|
||||
await self.graph.complete_investigation_round(
|
||||
rid, strategist_action="no_leads",
|
||||
decision_rationale="strategist proposed no new leads",
|
||||
)
|
||||
break
|
||||
|
||||
# Dispatch each lead to its worker.
|
||||
for lead in new_leads:
|
||||
await self._execute_strategist_lead(lead, round_num)
|
||||
|
||||
# After workers run, judge any new phenomena against existing
|
||||
# hypotheses (so confidence updates happen before the next round
|
||||
# of strategist reasoning).
|
||||
if self.graph.phenomena and self.graph.hypotheses:
|
||||
await self._judge_new_phenomena()
|
||||
|
||||
closed = await self.graph.complete_investigation_round(
|
||||
rid, strategist_action="propose_leads",
|
||||
leads_executed=[l.id for l in new_leads],
|
||||
)
|
||||
|
||||
# Show round outcome.
|
||||
for h in self.graph.hypotheses.values():
|
||||
_log(f" {h.summary()}", event="hypothesis")
|
||||
_log(
|
||||
_progress_summary(self.graph) + f" (yield: +{closed.new_phenomena_count}ph, +{closed.new_edges_count}edges, {closed.status_flips}flips)",
|
||||
event="progress", elapsed=time.monotonic() - t0,
|
||||
)
|
||||
|
||||
# Marginal-yield hard stop. Distinct from strategist self-throttle:
|
||||
# if the strategist insists on continuing through repeated dry
|
||||
# rounds, force-stop. This protects against an over-eager
|
||||
# strategist + a confused worker that produces no edges.
|
||||
yield_total = (
|
||||
closed.new_phenomena_count
|
||||
+ closed.new_edges_count
|
||||
+ closed.status_flips
|
||||
)
|
||||
if yield_total == 0:
|
||||
zero_yield_streak += 1
|
||||
if zero_yield_streak >= zero_yield_cap:
|
||||
_log(
|
||||
f"Hard stop: {zero_yield_streak} consecutive "
|
||||
f"zero-yield rounds (cap {zero_yield_cap})",
|
||||
event="progress",
|
||||
)
|
||||
break
|
||||
else:
|
||||
zero_yield_streak = 0
|
||||
|
||||
if self._budget_exceeded():
|
||||
_log(
|
||||
f"Budget exhausted after round {round_num} — exiting Phase 3",
|
||||
event="progress",
|
||||
)
|
||||
break
|
||||
else:
|
||||
_log(
|
||||
f"Strategist max_rounds={max_rounds} reached", event="progress",
|
||||
)
|
||||
|
||||
# Always reset the round counter on exit so subsequent runs don't
|
||||
# inherit the last value.
|
||||
self.graph.current_strategist_round = 0
|
||||
|
||||
async def _phase3_legacy_loop(self) -> None:
|
||||
"""Legacy fixed-round Phase 3 — preserved for fallback / regression.
|
||||
|
||||
Engaged when config has ``strategist.enabled: false`` or when the
|
||||
strategist agent class is somehow not registered. Behaves identically
|
||||
to the pre-DESIGN_STRATEGIST orchestrator: bounded iteration,
|
||||
hypothesis-derived leads, parallel dispatch, gap analysis.
|
||||
"""
|
||||
max_rounds = self.config.get("max_investigation_rounds", 5)
|
||||
for round_num in range(max_rounds):
|
||||
_log(f"Phase 3: Investigation Round {round_num}", event="phase")
|
||||
t0 = time.monotonic()
|
||||
|
||||
if self.graph.hypotheses_converged():
|
||||
_log("All hypotheses converged — stopping", event="progress")
|
||||
break
|
||||
|
||||
await self._generate_hypothesis_leads()
|
||||
|
||||
pending = await self.graph.get_pending_leads()
|
||||
if not pending:
|
||||
_log("No pending leads — round complete", event="progress")
|
||||
break
|
||||
|
||||
await self._dispatch_leads_parallel(pending)
|
||||
await self._judge_new_phenomena()
|
||||
|
||||
for h in self.graph.hypotheses.values():
|
||||
_log(f" {h.summary()}", event="hypothesis")
|
||||
_log(_progress_summary(self.graph), event="progress", elapsed=time.monotonic() - t0)
|
||||
|
||||
# ---- Hypothesis generation -----------------------------------------------
|
||||
|
||||
async def _generate_hypotheses_manual(self, hypotheses_config: list[dict]) -> None:
|
||||
@@ -518,7 +815,7 @@ class Orchestrator:
|
||||
if not unlinked:
|
||||
return
|
||||
|
||||
valid_types = list(self.graph.edge_weights.keys())
|
||||
valid_types = list(self.graph.edge_log_lr.keys())
|
||||
|
||||
hyp_section = "\n".join(
|
||||
f" [{h.id}] {h.title}: {h.description}" for h in active
|
||||
@@ -551,7 +848,7 @@ class Orchestrator:
|
||||
if (
|
||||
hyp_id in self.graph.hypotheses
|
||||
and ph_id in self.graph.phenomena
|
||||
and edge_type in self.graph.edge_weights
|
||||
and edge_type in self.graph.edge_log_lr
|
||||
):
|
||||
await self.graph.update_hypothesis_confidence(
|
||||
hyp_id=hyp_id,
|
||||
@@ -593,7 +890,7 @@ class Orchestrator:
|
||||
ph_id = j.get("phenomenon_id", "")
|
||||
edge_type = j.get("edge_type", "")
|
||||
reason = j.get("reason", "")
|
||||
if ph_id in self.graph.phenomena and edge_type in self.graph.edge_weights:
|
||||
if ph_id in self.graph.phenomena and edge_type in self.graph.edge_log_lr:
|
||||
await self.graph.update_hypothesis_confidence(
|
||||
hyp_id=hyp.id,
|
||||
phenomenon_id=ph_id,
|
||||
@@ -618,7 +915,10 @@ class Orchestrator:
|
||||
phenomena (deterministic — the canonical tool was actually called).
|
||||
"""
|
||||
evidence_text = " ".join(
|
||||
f"{ph.category} {ph.title} {ph.description}".lower()
|
||||
(
|
||||
f"{ph.category} {ph.title} {ph.interpretation} "
|
||||
+ " ".join(str(f.get("value", "")) for f in ph.verified_facts)
|
||||
).lower()
|
||||
for ph in self.graph.phenomena.values()
|
||||
)
|
||||
used_tools: set[str] = {
|
||||
@@ -747,28 +1047,103 @@ class Orchestrator:
|
||||
|
||||
# ---- Main pipeline -------------------------------------------------------
|
||||
|
||||
# ---- Phase 1 helpers (multi-source triage) -------------------------------
|
||||
|
||||
@staticmethod
|
||||
def _is_analysable(src) -> bool:
|
||||
"""Orchestrator-side copy of main._is_analysable, kept here so the
|
||||
orchestrator does not import main.py. Every source needs a path; image-mode
|
||||
and tree-mode mobile_extraction / archive sources qualify, as does media_collection (unlike main.py).
|
||||
"""
|
||||
if not getattr(src, "path", ""):
|
||||
return False
|
||||
if src.access_mode == "image":
|
||||
return True
|
||||
if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
|
||||
return True
|
||||
# media_collection is analysable too once a MediaAgent is registered.
|
||||
if src.type == "media_collection":
|
||||
return True
|
||||
return False
|
||||
|
||||
def _sources_to_triage(self) -> list:
|
||||
"""Pick every analysable source in the case (or fall back to the
|
||||
single active_source for the legacy single-image path).
|
||||
"""
|
||||
case = self.graph.case
|
||||
if case is None or not case.sources:
|
||||
return [self.graph.active_source] if self.graph.active_source else []
|
||||
return [s for s in case.sources if self._is_analysable(s)]
|
||||
|
||||
async def _phase1_triage_source(self, src) -> tuple[int, int]:
|
||||
"""Run the right triage agent on one source. Returns (Δphenomena, Δleads)."""
|
||||
ph_before = len(self.graph.phenomena)
|
||||
leads_before = sum(1 for l in self.graph.leads if l.status == "pending")
|
||||
|
||||
self.graph.set_active_source(src)
|
||||
agent_type = get_triage_agent_type(src)
|
||||
agent = self.factory.get_or_create_agent(agent_type)
|
||||
if agent is None:
|
||||
logger.warning(
|
||||
"No agent registered for type %s — skipping source %s",
|
||||
agent_type, src.id,
|
||||
)
|
||||
return 0, 0
|
||||
|
||||
_log(
|
||||
f"Phase 1 triage: {src.id} ({src.label}) → {agent_type}",
|
||||
event="dispatch", agent=agent_type, source=src.id,
|
||||
)
|
||||
try:
|
||||
await agent.run(
|
||||
f"Perform an initial Phase-1 triage of source {src.id} "
|
||||
f"({src.label}, type={src.type}). Survey the source's "
|
||||
f"structure, identify the most interesting artefacts, and "
|
||||
f"record significant findings via add_phenomenon. Call "
|
||||
f"observe_identity for any concrete identifiers (email, "
|
||||
f"phone, Apple ID, IMEI, wallet address, persistent "
|
||||
f"username) you encounter — that's how this finding will "
|
||||
f"link across the other sources in the case. Create "
|
||||
f"add_lead for follow-up that's outside your scope."
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error("Phase 1 agent [%s] failed on %s: %s", agent_type, src.id, e)
|
||||
|
||||
return (
|
||||
len(self.graph.phenomena) - ph_before,
|
||||
sum(1 for l in self.graph.leads if l.status == "pending") - leads_before,
|
||||
)
|
||||
|
||||
async def run(self, resume_phase: int = 1) -> str:
|
||||
"""Run the 5-phase hypothesis-driven forensic analysis pipeline."""
|
||||
_log(f"Phase 1: Filesystem Survey (image: {Path(self.graph.image_path).name})", event="phase")
|
||||
sources = self._sources_to_triage()
|
||||
_log(
|
||||
f"Phase 1: per-source triage ({len(sources)} source(s))",
|
||||
event="phase",
|
||||
)
|
||||
|
||||
report = ""
|
||||
try:
|
||||
# Phase 1: Initial filesystem survey
|
||||
# Phase 1: Initial per-source triage (S6 multi-source).
|
||||
# Runs sequentially so each agent gets its own task_id scope —
|
||||
# the grounding gateway requires that, and shared graph state
|
||||
# (active_source, partition_offset) would race under parallel
|
||||
# dispatch anyway.
|
||||
if resume_phase <= 1:
|
||||
t0 = time.monotonic()
|
||||
ph_before = len(self.graph.phenomena)
|
||||
fs_agent = self.factory.get_or_create_agent("filesystem")
|
||||
if fs_agent:
|
||||
await fs_agent.run(
|
||||
"Perform an initial survey of this disk image. "
|
||||
"Examine the partition table, filesystem type, and root directory structure. "
|
||||
"List key user directories and identify interesting files (documents, emails, "
|
||||
"chat logs, installed programs, registry hives). "
|
||||
"Create leads for other agents based on what you find."
|
||||
for src in sources:
|
||||
new_ph, new_leads = await self._phase1_triage_source(src)
|
||||
_log(
|
||||
f" {src.id}: +{new_ph} phenomena, +{new_leads} leads",
|
||||
event="progress", source=src.id,
|
||||
)
|
||||
new_ph = len(self.graph.phenomena) - ph_before
|
||||
new_leads = sum(1 for l in self.graph.leads if l.status == "pending")
|
||||
_log(f"+{new_ph} phenomena, +{new_leads} leads", event="progress", elapsed=time.monotonic() - t0)
|
||||
total_ph = len(self.graph.phenomena) - ph_before
|
||||
total_leads = sum(1 for l in self.graph.leads if l.status == "pending")
|
||||
_log(
|
||||
f"Phase 1 total: +{total_ph} phenomena, {total_leads} pending leads",
|
||||
event="progress", elapsed=time.monotonic() - t0,
|
||||
)
|
||||
|
||||
# Phase 2: Hypothesis generation
|
||||
if resume_phase <= 2:
|
||||
@@ -803,39 +1178,26 @@ class Orchestrator:
|
||||
event="progress", elapsed=time.monotonic() - t0,
|
||||
)
|
||||
|
||||
# Phase 3: Hypothesis-directed investigation (iterative)
|
||||
# Phase 3: Strategist-driven investigation (DESIGN_STRATEGIST.md)
|
||||
if resume_phase <= 3:
|
||||
max_rounds = self.config.get("max_investigation_rounds", 5)
|
||||
for round_num in range(max_rounds):
|
||||
_log(f"Phase 3: Investigation Round {round_num}", event="phase")
|
||||
t0 = time.monotonic()
|
||||
strategist_cfg = self.config.get("strategist", {}) or {}
|
||||
strategist_enabled = strategist_cfg.get("enabled", True)
|
||||
if strategist_enabled:
|
||||
await self._phase3_strategist_loop()
|
||||
else:
|
||||
# Legacy fallback — keep the old hypothesis-directed
|
||||
# iterative loop available for runs that explicitly
|
||||
# disable the strategist (debugging, regression
|
||||
# comparison, or environments without the strategist
|
||||
# agent registered).
|
||||
await self._phase3_legacy_loop()
|
||||
|
||||
if self.graph.hypotheses_converged():
|
||||
_log("All hypotheses converged — stopping", event="progress")
|
||||
break
|
||||
|
||||
await self._generate_hypothesis_leads()
|
||||
|
||||
pending = await self.graph.get_pending_leads()
|
||||
if not pending:
|
||||
_log("No pending leads — round complete", event="progress")
|
||||
break
|
||||
|
||||
await self._dispatch_leads_parallel(pending)
|
||||
await self._judge_new_phenomena()
|
||||
|
||||
# Show hypothesis status update
|
||||
for h in self.graph.hypotheses.values():
|
||||
_log(f" {h.summary()}", event="hypothesis")
|
||||
_log(_progress_summary(self.graph), event="progress", elapsed=time.monotonic() - t0)
|
||||
|
||||
# Retry failed leads
|
||||
# Retry failed leads + Gap Analysis run regardless of which
|
||||
# Phase 3 variant was used — they operate on the leads/
|
||||
# hypothesis graph the strategist loop leaves behind.
|
||||
await self._retry_failed_leads()
|
||||
|
||||
# Gap analysis
|
||||
_log("Phase 3: Gap Analysis", event="phase")
|
||||
await self._run_gap_analysis()
|
||||
|
||||
self.graph.mark_remaining_inconclusive()
|
||||
|
||||
# Phase 4: Timeline construction
|
||||
@@ -865,8 +1227,15 @@ class Orchestrator:
|
||||
"6. Conclusions and Recommendations"
|
||||
)
|
||||
|
||||
image_stem = Path(self.graph.image_path).stem
|
||||
report_name = f"{image_stem}_forensic_report.md"
|
||||
# Multi-source case → name by case_id (stable across sources).
|
||||
# Legacy single-image runs without a Case → fall back to the
|
||||
# last active image's stem so old workflows still produce a
|
||||
# plausible filename.
|
||||
if self.graph.case and self.graph.case.case_id:
|
||||
stem = self.graph.case.case_id
|
||||
else:
|
||||
stem = Path(self.graph.image_path).stem or "case"
|
||||
report_name = f"{stem}_forensic_report.md"
|
||||
report_path = (self.run_dir / report_name) if self.run_dir else Path(report_name)
|
||||
try:
|
||||
report_path.write_text(report)
|
||||
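The two hard stops in the strategist loop above (the zero-yield streak and the optional budget caps) can be isolated as pure functions, which makes them easy to test without an orchestrator. The sketch below is illustrative only: `RoundOutcome`, `first_hard_stop_round` and `budget_exceeded` are hypothetical names, and the field names simply follow the attributes read from `closed` in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RoundOutcome:
    """Hypothetical stand-in for the closed-round record used in the diff."""
    new_phenomena_count: int
    new_edges_count: int
    status_flips: int


def first_hard_stop_round(outcomes: list[RoundOutcome], zero_yield_cap: int = 3) -> int | None:
    """Return the 1-based round at which the zero-yield streak reaches the cap."""
    streak = 0
    for round_num, o in enumerate(outcomes, start=1):
        total = o.new_phenomena_count + o.new_edges_count + o.status_flips
        if total == 0:
            streak += 1
            if streak >= zero_yield_cap:
                return round_num
        else:
            streak = 0  # any yield at all resets the streak
    return None


def budget_exceeded(budgets: dict, tool_calls: int, elapsed_min: float) -> bool:
    """Each cap is optional; a missing or falsy cap means unbounded."""
    tc_cap = budgets.get("tool_calls_total")
    if tc_cap and tool_calls >= tc_cap:
        return True
    wc_cap = budgets.get("wall_clock_minutes_max")
    if wc_cap and elapsed_min >= wc_cap:
        return True
    return False


# Two dry rounds, one productive round, then three dry rounds in a row.
DEMO = [
    RoundOutcome(1, 0, 0),
    RoundOutcome(0, 0, 0),
    RoundOutcome(0, 0, 0),
    RoundOutcome(0, 1, 0),
    RoundOutcome(0, 0, 0),
    RoundOutcome(0, 0, 0),
    RoundOutcome(0, 0, 0),
]
```

With `DEMO`, the streak is reset by round 4, so the cap of three only trips at round 7. An empty `budgets` dict never triggers `budget_exceeded`, matching the "leave it out of config to make it unbounded" behaviour described in the docstring.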
|
||||
@@ -6,6 +6,8 @@ requires-python = ">=3.14"
|
||||
dependencies = [
|
||||
"httpx[socks]>=0.28.1",
|
||||
"openai>=2.36.0",
|
||||
"pillow>=12.2.0",
|
||||
"pytesseract>=0.3.13",
|
||||
"pyyaml",
|
||||
"regipy>=6.2.1",
|
||||
]
|
||||
|
||||
@@ -32,10 +32,10 @@ async def main() -> None:
|
||||
config = yaml.safe_load(open("config.yaml"))
|
||||
agent_cfg = config["agent"]
|
||||
|
||||
# Load graph (edge_weights from config — applied to the loaded graph)
|
||||
# Load graph (edge_log_lr from config — applied to the loaded graph)
|
||||
graph = EvidenceGraph.load_state(
|
||||
state_path,
|
||||
edge_weights=config.get("hypothesis_edge_weights"),
|
||||
edge_log_lr=config.get("hypothesis_log_lr"),
|
||||
)
|
||||
print(f"Loaded: {graph.stats_summary()}")
|
||||
|
||||
@@ -49,7 +49,7 @@ async def main() -> None:
|
||||
thinking_enabled=agent_cfg.get("thinking_enabled", False),
|
||||
)
|
||||
|
||||
register_all_tools(graph.image_path, graph.partition_offset, graph)
|
||||
register_all_tools(graph)
|
||||
factory = AgentFactory(llm, graph)
|
||||
|
||||
# Run only the report agent
|
||||
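The rename from `edge_weights` to `edge_log_lr` in these hunks suggests that hypothesis confidence is now updated in log-odds space. The actual `update_hypothesis_confidence` is not shown in this diff, so the following is only a sketch of why such an update would remove the order dependence that DESIGN.md lists as P3: a sum of log-likelihood ratios is commutative. The edge type names and values in `EDGE_LOG_LR` are invented for illustration.

```python
import math
from itertools import permutations

# Hypothetical log-likelihood-ratio table; the real values come from
# config["hypothesis_log_lr"] and are not visible in this diff.
EDGE_LOG_LR = {"supports": 1.0, "weakly_supports": 0.5, "contradicts": -1.5}


def posterior(prior: float, edge_types: list[str]) -> float:
    """Add each edge's log-LR to the prior log-odds, then map back to [0, 1]."""
    log_odds = math.log(prior / (1.0 - prior))
    # fsum returns the correctly rounded sum, so the result does not depend
    # on the order in which edges arrive.
    log_odds += math.fsum(EDGE_LOG_LR[e] for e in edge_types)
    return 1.0 / (1.0 + math.exp(-log_odds))


def is_order_independent(prior: float, edge_types: list[str]) -> bool:
    """Check that every arrival order of the same edges gives the same result."""
    results = [posterior(prior, list(p)) for p in permutations(edge_types)]
    return all(math.isclose(r, results[0], rel_tol=0.0, abs_tol=1e-12) for r in results)
```

Mixing positive and negative edges, for example `["supports", "contradicts", "weakly_supports"]`, gives the same posterior in any order, whereas the older `weight*(1-conf)` / `weight*conf` rule described in DESIGN.md depends on arrival order.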
|
||||
File diff suppressed because it is too large
800
tool_registry.py
File diff suppressed because it is too large
156
tools/archive.py
Normal file
@@ -0,0 +1,156 @@
|
||||
"""Archive extraction tools — generic unzip for tree-mode evidence sources.
|
||||
|
||||
Mobile extractions (iOS / Android backups), archive sources, and shared
|
||||
work products all arrive as .zip files. The forensic agents work on the
|
||||
unpacked tree; this module is the single entry point for safely turning
|
||||
an archive into a directory.
|
||||
|
||||
Stdlib-only. No graph dependency.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import zipfile
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _is_within(base: Path, target: Path) -> bool:
|
||||
"""True when *target* resolves to a path inside *base* — symlink-safe."""
|
||||
try:
|
||||
base_r = base.resolve()
|
||||
target_r = target.resolve()
|
||||
except OSError:
|
||||
return False
|
||||
try:
|
||||
target_r.relative_to(base_r)
|
||||
except ValueError:
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool:
|
||||
"""True when any entry has the zip 'encrypted' flag bit set."""
|
||||
return any(info.flag_bits & 0x1 for info in zf.infolist())
|
||||
|
||||
|
||||
def _do_extract(
|
||||
zip_path: str,
|
||||
dest_dir: str,
|
||||
password: str | None = None,
|
||||
) -> str:
|
||||
"""Shared core for unzip_archive (async) and unzip_archive_sync.
|
||||
|
||||
Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files
|
||||
whose target already exists at the matching size are skipped). Returns
|
||||
a multi-line summary the agent can read directly.
|
||||
"""
|
||||
zp = Path(zip_path)
|
||||
if not zp.is_file():
|
||||
return f"Error: {zip_path} is not a file."
|
||||
|
||||
dest = Path(dest_dir)
|
||||
dest.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
extracted = 0
|
||||
skipped: list[str] = []
|
||||
total_bytes = 0
|
||||
pwd_bytes = password.encode("utf-8") if password else None
|
||||
|
||||
try:
|
||||
with zipfile.ZipFile(zp, "r") as zf:
|
||||
encrypted = _is_zip_encrypted(zf)
|
||||
if encrypted and pwd_bytes is None:
|
||||
return (
|
||||
f"Error: {zip_path} is password-protected. "
|
||||
f"Provide the password via case.yaml's "
|
||||
f"meta.password on this source, or pass `password=` "
|
||||
f"explicitly. Stdlib zipfile only supports the legacy "
|
||||
f"ZipCrypto algorithm — AES-encrypted zips (created by "
|
||||
f"7-Zip / WinZip) need an external tool like 7z."
|
||||
)
|
||||
for info in zf.infolist():
|
||||
name = info.filename
|
||||
# Block absolute paths and parent-escape attempts up front.
|
||||
if name.startswith(("/", "\\")) or ".." in Path(name).parts:
|
||||
skipped.append(f"escape: {name}")
|
||||
continue
|
||||
target = dest / name
|
||||
if not _is_within(dest, target):
|
||||
skipped.append(f"escape: {name}")
|
||||
continue
|
||||
# Symlink entries — skip rather than risk traversing out.
|
||||
if info.external_attr >> 16 & 0o120000 == 0o120000:
|
||||
skipped.append(f"symlink: {name}")
|
||||
continue
|
||||
if info.is_dir():
|
||||
target.mkdir(parents=True, exist_ok=True)
|
||||
continue
|
||||
# Skip if already extracted with matching size (idempotent rerun).
|
||||
if target.exists() and target.stat().st_size == info.file_size:
|
||||
continue
|
||||
target.parent.mkdir(parents=True, exist_ok=True)
|
||||
try:
|
||||
with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out:
|
||||
while True:
|
||||
chunk = src.read(65536)
|
||||
if not chunk:
|
||||
break
|
||||
out.write(chunk)
|
||||
except RuntimeError as e:
|
||||
# zipfile raises RuntimeError for bad-password / AES-encrypted.
|
||||
msg = str(e)
|
||||
if "Bad password" in msg or "password required" in msg:
|
||||
return (
|
||||
f"Error: bad or missing password for {zip_path}. "
|
||||
f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib "
|
||||
f"cannot decrypt it — use `7z x -p<pwd> ...` "
|
||||
f"externally and point the source path at the result."
|
||||
)
|
||||
raise
|
||||
extracted += 1
|
||||
total_bytes += info.file_size
|
||||
except zipfile.BadZipFile as e:
|
||||
return f"Error: {zip_path} is not a valid zip archive: {e}"
|
||||
except Exception as e:
|
||||
return f"Error extracting {zip_path}: {e}"
|
||||
|
||||
parts = [
|
||||
f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}",
|
||||
]
|
||||
if skipped:
|
||||
parts.append(f"Skipped {len(skipped)} unsafe entries:")
|
||||
for s in skipped[:10]:
|
||||
parts.append(f" - {s}")
|
||||
if len(skipped) > 10:
|
||||
parts.append(f" ... ({len(skipped) - 10} more)")
|
||||
return "\n".join(parts)
|
||||
|
||||
|
||||
async def unzip_archive(
|
||||
zip_path: str, dest_dir: str, password: str | None = None,
|
||||
) -> str:
|
||||
"""Extract *zip_path* into *dest_dir*. Idempotent on rerun.
|
||||
|
||||
Defensive: rejects entries with absolute paths, leading '..', or that
|
||||
would resolve outside *dest_dir* (the classic zip-slip vector). Symlink
|
||||
entries are skipped (we never follow symlinks into the host filesystem).
|
||||
Password-protected zips need the password argument (or
|
||||
``meta.password`` on the source in case.yaml) — stdlib ``zipfile``
|
||||
only handles the legacy ZipCrypto algorithm.
|
||||
"""
|
||||
return _do_extract(zip_path, dest_dir, password)
|
||||
|
||||
|
||||
def unzip_archive_sync(
|
||||
zip_path: str, dest_dir: str, password: str | None = None,
|
||||
) -> str:
|
||||
"""Synchronous variant of :func:`unzip_archive` for startup-time prepare_source.
|
||||
|
||||
Same behaviour, just no async wrapping — used before the event loop
|
||||
starts so we don't have to spin one up just to unpack a zip.
|
||||
"""
|
||||
return _do_extract(zip_path, dest_dir, password)
|
||||
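The zip-slip guard in `_do_extract` can be exercised in isolation. The following is a minimal sketch, not the module's code: `is_safe_member`, `build_sample_zip` and `classify_entries` are hypothetical helper names, the archive contents are made up, and `PurePosixPath` is used here (instead of `Path`) on the assumption that zip member names are `/`-separated.

```python
import io
import zipfile
from pathlib import Path, PurePosixPath


def is_safe_member(dest: Path, name: str) -> bool:
    """Return True when *name* would land inside *dest* after extraction."""
    # Reject absolute paths and any '..' segment before touching the filesystem.
    if name.startswith(("/", "\\")) or ".." in PurePosixPath(name).parts:
        return False
    # resolve() collapses symlinks and dot segments; relative_to() raises
    # ValueError when the resolved target is not under the resolved base.
    try:
        (dest / name).resolve().relative_to(dest.resolve())
    except (OSError, ValueError):
        return False
    return True


def build_sample_zip() -> bytes:
    """Build a hypothetical archive with one benign and one hostile entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("HomeDomain/Library/SMS/sms.db", b"benign")
        zf.writestr("../../etc/cron.d/evil", b"hostile")
    return buf.getvalue()


def classify_entries(archive: bytes, dest: Path) -> dict[str, bool]:
    """Map every member name to whether the guard would allow it."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {info.filename: is_safe_member(dest, info.filename) for info in zf.infolist()}
```

Calling `classify_entries(build_sample_zip(), Path.cwd())` flags the `../../etc/cron.d/evil` member as unsafe and lets the `HomeDomain/...` member through. Nothing is written to disk, so the check can run before any extraction starts.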
87
tools/media.py
Normal file
@@ -0,0 +1,87 @@
|
||||
"""Media plugin — OCR for image evidence.
|
||||
|
||||
DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run
|
||||
OCR locally for any image-bearing evidence. Tesseract via pytesseract is
|
||||
the default; if the runtime is missing those packages, the tool returns a
|
||||
clear install hint rather than failing silently.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MAX_OUTPUT = 8000
|
||||
|
||||
_INSTALL_HINT = (
|
||||
"Error: OCR runtime not available. Install with:\n"
|
||||
" pip install pytesseract pillow\n"
|
||||
" sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n"
|
||||
"(or the equivalent for your distribution). Then retry."
|
||||
)
|
||||
|
||||
|
||||
def _has_ocr_runtime() -> tuple[bool, str]:
|
||||
"""Return (available, reason). reason is empty when available."""
|
||||
try:
|
||||
import pytesseract # noqa: F401
|
||||
from PIL import Image # noqa: F401
|
||||
except ImportError as e:
|
||||
return False, f"missing python package: {e.name}"
|
||||
# Check the tesseract binary too.
|
||||
import shutil
|
||||
if shutil.which("tesseract") is None:
|
||||
return False, "tesseract binary not on PATH"
|
||||
return True, ""
|
||||
|
||||
|
||||
async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str:
|
||||
"""Extract text from an image via tesseract.
|
||||
|
||||
*lang* defaults to English + Simplified + Traditional Chinese, matching
|
||||
the multi-language artefacts the current case involves. Pass a single
|
||||
language code (e.g. ``"eng"``) to skip language packs that aren't
|
||||
installed.
|
||||
"""
|
||||
p = Path(file_path)
|
||||
if not p.is_file():
|
||||
return f"Error: {file_path} is not a file."
|
||||
available, reason = _has_ocr_runtime()
|
||||
if not available:
|
||||
return f"{_INSTALL_HINT}\n[detail: {reason}]"
|
||||
|
||||
import pytesseract
|
||||
from PIL import Image
|
||||
|
||||
try:
|
||||
img = Image.open(p)
|
||||
except Exception as e:
|
||||
return f"Error: could not open image {file_path}: {e}"
|
||||
|
||||
try:
|
||||
text = pytesseract.image_to_string(img, lang=lang)
|
||||
except pytesseract.TesseractError as e:
|
||||
msg = str(e)
|
||||
if "Failed loading language" in msg or "Error opening data file" in msg:
|
||||
return (
|
||||
f"Error: tesseract is installed but missing language pack(s) for {lang!r}. "
|
||||
f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a "
|
||||
f"different `lang`. Detail: {msg}"
|
||||
)
|
||||
return f"Error running tesseract: {msg}"
|
||||
except Exception as e:
|
||||
return f"Error during OCR: {e}"
|
||||
|
||||
size = p.stat().st_size
|
||||
header = (
|
||||
f"ocr: {file_path} ({size} bytes, lang={lang}, "
|
||||
f"{len(text.splitlines())} line(s))\n"
|
||||
)
|
||||
if len(text) > MAX_OUTPUT - len(header):
|
||||
body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]"
|
||||
else:
|
||||
body = text
|
||||
return header + body
|
||||
160
tools/mobile_android.py
Normal file
@@ -0,0 +1,160 @@
|
||||
"""Android plugin tools — partition survey + sector translation.
|
||||
|
||||
DESIGN.md §4.7 (Android): ``mmls`` partitions → per-partition image-mode source;
``fsstat`` per partition to classify ext4/raw/encrypted. The shared TSK
toolchain already handles ext4 reads, so once the agent picks a partition
offset the standard list_directory / extract_file / search_strings tools work.
Note that stock TSK does not list F2FS among its supported file systems, so an
F2FS partition is expected to surface as 'unknown' here and needs another reader.
|
||||
|
||||
Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but
|
||||
TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions``
|
||||
emits BOTH unit systems so the agent can plug the right ``partition_offset``
|
||||
value into ``set_active_partition``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import re
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MAX_OUTPUT = 8000
|
||||
|
||||
# Partitions worth flagging when we encounter them — informs the agent's
|
||||
# strategy. Not exhaustive; just opinionated hints.
|
||||
_PARTITION_HINTS: dict[str, str] = {
|
||||
"EFS": "modem firmware area; often contains IMEI / MAC / serial",
|
||||
"PARAM": "boot parameters; cmdline + flags",
|
||||
"BOOT": "kernel + initramfs (raw image)",
|
||||
"RECOVERY": "recovery image (raw)",
|
||||
"SYSTEM": "Android /system — read-only OS partition (ext4)",
|
||||
"CACHE": "downloaded OTA payloads; usually transient",
|
||||
"USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices",
|
||||
"PERSISTENT": "Samsung persistent partition; carrier/device flags",
|
||||
"STEADY": "Samsung steady-state config",
|
||||
"HIDDEN": "Samsung hidden partition; check before assuming empty",
|
||||
"CP_DEBUG": "modem debug logs",
|
||||
"TOMBSTONES": "userland crash dumps",
|
||||
}
|
||||
|
||||
|
||||
def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]:
|
||||
"""Parse mmls output, returning (sector_size_bytes, partitions).
|
||||
|
||||
mmls states ``Units are in N-byte sectors`` near the top; we extract N
|
||||
to translate between image-native units and the 512-byte units TSK
|
||||
tools accept via ``-o``.
|
||||
"""
|
||||
sector_size = 512
|
||||
m = re.search(r"Units are in (\d+)-byte sectors", output)
|
||||
if m:
|
||||
sector_size = int(m.group(1))
|
||||
|
||||
parts: list[dict] = []
|
||||
for line in output.splitlines():
|
||||
m = re.match(
|
||||
r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)",
|
||||
line,
|
||||
)
|
||||
if not m:
|
||||
continue
|
||||
_row, slot, start, end, length, desc = m.groups()
|
||||
if slot == "Meta" or slot.startswith("---"):
|
||||
continue
|
||||
parts.append({
|
||||
"slot": slot,
|
||||
"start_native": int(start),
|
||||
"end_native": int(end),
|
||||
"length_native": int(length),
|
||||
"description": desc.strip(),
|
||||
})
|
||||
return sector_size, parts
|
||||
|
||||
|
||||
async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]:
    """Run *cmd* and return (returncode, stdout, stderr); rc 124 signals a timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        # Reap the killed child so it does not linger as a zombie process.
        await proc.wait()
        return 124, "", f"timeout after {timeout}s"
    return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace")
|
||||
|
||||
|
||||
_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE)
|
||||
|
||||
|
||||
async def _classify_partition(image_path: str, sector_offset_512: int) -> str:
    """Run fsstat on a partition; return its reported type (e.g. 'Ext4', 'Yaffs2', 'FAT') or 'unknown'.

    'unknown' covers both a failed fsstat run (non-zero exit, including
    "Cannot determine file system type") and output without a
    "File System Type:" line. That typically means a raw image
    (BOOT/RECOVERY/RADIO/…) or encrypted data (modern userdata under FBE).
    """
    rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15)
    if rc != 0:
        return "unknown"
    m = _FS_TYPE_RE.search(out)
    if m:
        return m.group(1)
    return "unknown"
|
||||
|
||||
|
||||
async def probe_android_partitions(image_path: str) -> str:
|
||||
"""Survey every partition on an Android disk dump and return a table.
|
||||
|
||||
The agent reads this once to plan its work: which partitions are
|
||||
Ext4 or another type TSK can read (use TSK), which are raw (extract image / strings only),
|
||||
which are encrypted (skip until decrypted).
|
||||
"""
|
||||
p = Path(image_path)
|
||||
if not p.is_file():
|
||||
return f"Error: {image_path} is not a file."
|
||||
|
||||
rc, out, err = await _run(["mmls", str(p)], timeout=30)
|
||||
if rc != 0:
|
||||
return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}"
|
||||
|
||||
sector_size, parts = _parse_mmls_with_unit(out)
|
||||
if not parts:
|
||||
return f"No partitions detected in {image_path}."
|
||||
|
||||
lines = [
|
||||
f"Android partition survey: {image_path}",
|
||||
f" mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)",
|
||||
f" {len(parts)} data partitions",
|
||||
"",
|
||||
"| slot | name | start (native) | start (512-sector) | size | fs_type | hint |",
|
||||
"|---|---|---:|---:|---|---|---|",
|
||||
]
|
||||
for prt in parts:
|
||||
sector_512 = prt["start_native"] * sector_size // 512
|
||||
bytes_size = prt["length_native"] * sector_size
|
||||
# human-readable size
|
||||
if bytes_size >= 1 << 30:
|
||||
size_h = f"{bytes_size / (1 << 30):.1f} GB"
|
||||
elif bytes_size >= 1 << 20:
|
||||
size_h = f"{bytes_size / (1 << 20):.1f} MB"
|
||||
else:
|
||||
size_h = f"{bytes_size // 1024} KB"
|
||||
fs_type = await _classify_partition(str(p), sector_512)
|
||||
# Try to extract a friendly partition name from the description
|
||||
# (mmls description often includes the partition name uppercase).
|
||||
name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"])
|
||||
pname = name_match.group(0) if name_match else prt["description"][:20]
|
||||
hint = _PARTITION_HINTS.get(pname, "")
|
||||
lines.append(
|
||||
f"| {prt['slot']} | {pname} | {prt['start_native']} | "
|
||||
f"{sector_512} | {size_h} | {fs_type} | {hint} |"
|
||||
)
|
||||
|
||||
body = "\n".join(lines)
|
||||
if len(body) > MAX_OUTPUT:
|
||||
body = body[:MAX_OUTPUT] + "\n\n[truncated]"
|
||||
return body
|
||||
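The native-to-512 sector translation above can be checked against a small worked example. This is a sketch under stated assumptions: `SAMPLE_MMLS` is illustrative text in the usual `mmls` layout (the offsets and partition names are made up, not taken from the case image), and `parse_units`, `to_512_sectors` and `survey` are hypothetical names that mirror the parsing rule in `_parse_mmls_with_unit`.

```python
import re

# Illustrative mmls-style output; slots, offsets and names are invented.
SAMPLE_MMLS = """\
GUID Partition Table (EFI)
Offset Sector: 0
Units are in 4096-byte sectors

      Slot      Start        End          Length       Description
000:  Meta      0000000000   0000000000   0000000001   Safety Table
001:  -------   0000000000   0000000005   0000000006   Unallocated
002:  000       0000000006   0000005125   0000005120   EFS
003:  001       0000005126   0000786431   0000781306   USERDATA
"""

ROW_RE = re.compile(r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)")


def parse_units(output: str) -> int:
    """Return the sector size mmls reports, defaulting to 512 when absent."""
    m = re.search(r"Units are in (\d+)-byte sectors", output)
    return int(m.group(1)) if m else 512


def to_512_sectors(start_native: int, sector_size: int) -> int:
    """Translate an image-native sector index into 512-byte sector units."""
    return start_native * sector_size // 512


def survey(output: str) -> list[tuple[str, int, int]]:
    """Return (description, start_native, start_512) for each data partition."""
    sector_size = parse_units(output)
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue
        _row, slot, start, _end, _length, desc = m.groups()
        # Skip table metadata and unallocated gaps, as the parser above does.
        if slot == "Meta" or slot.startswith("---"):
            continue
        rows.append((desc.strip(), int(start), to_512_sectors(int(start), sector_size)))
    return rows
```

With 4096-byte sectors every native offset is multiplied by 8, so a partition starting at native sector 6 starts at 512-byte sector 48. Whether a given TSK invocation wants the native or the 512-based value depends on how the sector size is passed to the tool, which is why the survey table prints both columns.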
274
tools/mobile_ios.py
Normal file
@@ -0,0 +1,274 @@
|
||||
"""iOS extraction parsers — plist / sqlite / keychain / iDevice info.
|
||||
|
||||
DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth
|
||||
Kit, no graph dependency. Stdlib + sqlite3 only.
|
||||
|
||||
iOS extractions typically arrive as a zip containing domain-rooted trees
|
||||
(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary,
|
||||
binary/XML plists, and several SQLite databases (sms.db, AddressBook,
|
||||
keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import plistlib
|
||||
import re
|
||||
import sqlite3
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Output cap (chars) — keeps a single tool result under the LLM context budget.
|
||||
MAX_OUTPUT = 8000
|
||||
|
||||
|
||||
def _trunc(text: str, limit: int = MAX_OUTPUT) -> str:
|
||||
if len(text) <= limit:
|
||||
return text
|
||||
return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# plist
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _to_jsonable(obj):
|
||||
"""Make plist values JSON-serializable: bytes → hex preview, dates → iso."""
|
||||
import datetime
|
||||
if isinstance(obj, bytes):
|
||||
if len(obj) <= 64:
|
||||
return {"_bytes_hex": obj.hex()}
|
||||
return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)}
|
||||
if isinstance(obj, datetime.datetime):
|
||||
return obj.isoformat()
|
||||
if isinstance(obj, dict):
|
||||
return {str(k): _to_jsonable(v) for k, v in obj.items()}
|
||||
if isinstance(obj, (list, tuple)):
|
||||
return [_to_jsonable(v) for v in obj]
|
||||
return obj
|
||||
|
||||
|
||||
async def parse_plist(file_path: str) -> str:
|
||||
"""Parse a .plist file (XML or binary) and return its contents as JSON.
|
||||
|
||||
Both formats are handled transparently by ``plistlib.load``.
|
||||
"""
|
||||
p = Path(file_path)
|
||||
if not p.is_file():
|
||||
return f"Error: {file_path} is not a file."
|
||||
try:
|
||||
with open(p, "rb") as f:
|
||||
data = plistlib.load(f)
|
||||
except plistlib.InvalidFileException as e:
|
||||
return f"Error: {file_path} is not a valid plist ({e})"
|
||||
except Exception as e:
|
||||
return f"Error parsing plist {file_path}: {e}"
|
||||
|
||||
serial = _to_jsonable(data)
|
||||
rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str)
|
||||
header = f"plist: {file_path} ({p.stat().st_size} bytes)\n"
|
||||
return header + _trunc(rendered)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# sqlite
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
|
||||
|
||||
|
||||
async def sqlite_tables(db_path: str) -> str:
|
||||
"""List user tables in a sqlite file with row counts and column names."""
|
||||
p = Path(db_path)
|
||||
if not p.is_file():
|
||||
return f"Error: {db_path} is not a file."
|
||||
try:
|
||||
conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
|
||||
except sqlite3.OperationalError as e:
|
||||
return f"Error opening {db_path} (read-only): {e}"
|
||||
try:
|
||||
cur = conn.cursor()
|
||||
cur.execute(
|
||||
"SELECT name FROM sqlite_master "
|
||||
"WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
|
||||
)
|
||||
tables = [r[0] for r in cur.fetchall()]
|
||||
if not tables:
|
||||
return f"No user tables in {db_path}."
|
||||
lines = [f"sqlite: {db_path} ({len(tables)} tables)"]
|
||||
for name in tables:
|
||||
try:
|
||||
cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
|
||||
count = cur.fetchone()[0]
|
||||
except sqlite3.DatabaseError as e:
|
||||
count = f"(count failed: {e})"
|
||||
try:
|
||||
cur.execute(f"PRAGMA table_info(\"{name}\")")
|
||||
cols = [r[1] for r in cur.fetchall()]
|
||||
except sqlite3.DatabaseError:
|
||||
cols = []
|
||||
lines.append(f" {name}: {count} row(s); cols: {', '.join(cols)}")
|
||||
return _trunc("\n".join(lines))
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
async def sqlite_query(
|
||||
db_path: str,
|
||||
query: str,
|
||||
max_rows: int = 100,
|
||||
) -> str:
|
||||
"""Run a single read-only SELECT against a sqlite file.
|
||||
|
||||
Multi-statement queries and anything other than a SELECT are rejected
|
||||
(we open the database in read-only mode anyway, so writes would fail
|
||||
too — but the explicit check keeps the agent honest).
|
||||
"""
|
||||
    if not _SELECT_RE.match(query):
        return "Error: only single SELECT statements are allowed."
    # Tolerate trailing whitespace after a final ';' before checking for
    # a second statement.
    if ";" in query.strip().rstrip(";"):
        return "Error: multi-statement queries are not allowed."
|
||||
|
||||
p = Path(db_path)
|
||||
if not p.is_file():
|
||||
return f"Error: {db_path} is not a file."
|
||||
try:
|
||||
conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
|
||||
except sqlite3.OperationalError as e:
|
||||
return f"Error opening {db_path} (read-only): {e}"
|
||||
|
||||
try:
|
||||
cur = conn.cursor()
|
||||
try:
|
||||
cur.execute(query)
|
||||
except sqlite3.DatabaseError as e:
|
||||
return f"Error executing query: {e}"
|
||||
cols = [d[0] for d in cur.description] if cur.description else []
|
||||
rows = cur.fetchmany(max(1, int(max_rows)))
|
||||
lines = [
|
||||
f"sqlite query: {db_path}",
|
||||
f"columns: {cols}",
|
||||
f"rows ({len(rows)}, capped at {max_rows}):",
|
||||
]
|
||||
for row in rows:
|
||||
rendered = [
|
||||
(v.hex() if isinstance(v, bytes) else str(v))
|
||||
for v in row
|
||||
]
|
||||
lines.append(" " + " | ".join(rendered))
|
||||
return _trunc("\n".join(lines))
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# iOS keychain (keychain-2.db)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Standard iOS keychain tables. genp = generic passwords, inet = internet
|
||||
# passwords, cert = certificates, keys = key material. Forensic extractions
|
||||
# of locked keychains have ``data`` columns NULL but accounting metadata
|
||||
# (agrp, acct, svce) intact — already useful for attribution work.
|
||||
_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys")
|
||||
|
||||
|
||||
async def parse_ios_keychain(keychain_root: str) -> str:
|
||||
"""Locate and summarize iOS keychain entries under *keychain_root*.
|
||||
|
||||
*keychain_root* may be a path to ``keychain-2.db`` directly or to a
|
||||
directory that contains it (e.g. ``.../var/keychains``).
|
||||
"""
|
||||
root = Path(keychain_root)
|
||||
db: Path | None = None
|
||||
if root.is_file() and root.name == "keychain-2.db":
|
||||
db = root
|
||||
elif root.is_dir():
|
||||
candidate = root / "keychain-2.db"
|
||||
if candidate.is_file():
|
||||
db = candidate
|
||||
        else:
            # Fall back to a full recursive search; the first match wins.
            for found in root.rglob("keychain-2.db"):
                db = found
                break
|
||||
if db is None:
|
||||
return f"No keychain-2.db found under {keychain_root}."
|
||||
|
||||
try:
|
||||
conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True)
|
||||
except sqlite3.OperationalError as e:
|
||||
return f"Error opening {db}: {e}"
|
||||
|
||||
try:
|
||||
cur = conn.cursor()
|
||||
cur.execute(
|
||||
"SELECT name FROM sqlite_master "
|
||||
"WHERE type='table' AND name IN ({})".format(
|
||||
",".join("?" * len(_KEYCHAIN_TABLES))
|
||||
),
|
||||
_KEYCHAIN_TABLES,
|
||||
)
|
||||
present = [r[0] for r in cur.fetchall()]
|
||||
if not present:
|
||||
return f"keychain-2.db at {db} has no recognised tables."
|
||||
|
||||
lines = [f"keychain: {db}"]
|
||||
for name in present:
|
||||
cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
|
||||
count = cur.fetchone()[0]
|
||||
lines.append(f"\n[{name}] {count} row(s)")
|
||||
cur.execute(f"PRAGMA table_info(\"{name}\")")
|
||||
cols = [r[1] for r in cur.fetchall()]
|
||||
# Pick a useful subset of accounting columns when present.
|
||||
preferred = [
|
||||
c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr")
|
||||
if c in cols
|
||||
]
|
||||
if not preferred:
|
||||
preferred = cols[:5]
|
||||
sel = ", ".join(f'"{c}"' for c in preferred)
|
||||
cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30")
|
||||
for row in cur.fetchall():
|
||||
lines.append(" " + " | ".join(
|
||||
(v.hex() if isinstance(v, bytes) else str(v))
|
||||
for v in row
|
||||
))
|
||||
return _trunc("\n".join(lines))
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# iDevice_info.txt
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str:
|
||||
"""Read the standard iDevice_info.txt summary at the root of an iOS extraction.
|
||||
|
||||
The file is a flat ``Key: value`` dump from libimobiledevice / native
|
||||
extraction tools. We surface the first *max_chars* of content verbatim
|
||||
— the agent can search/extract specific keys via search_text_file if
|
||||
the head isn't enough.
|
||||
"""
|
||||
p = Path(file_path)
|
||||
if p.is_dir():
|
||||
# Be helpful: if the agent passed the extraction root, find the file.
|
||||
candidate = p / "iDevice_info.txt"
|
||||
if candidate.is_file():
|
||||
p = candidate
|
||||
if not p.is_file():
|
||||
return f"Error: {file_path} is not a file."
|
||||
try:
|
||||
with open(p, "r", encoding="utf-8", errors="replace") as f:
|
||||
content = f.read(max_chars)
|
||||
size = p.stat().st_size
|
||||
header = f"iDevice_info: {p} ({size} bytes)\n"
|
||||
if size > max_chars:
|
||||
content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars}]"
|
||||
return header + content
|
||||
except Exception as e:
|
||||
return f"Error reading {file_path}: {e}"
|
||||
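The read-only guarantee that `sqlite_tables`, `sqlite_query` and `parse_ios_keychain` rely on comes from opening the database through a `mode=ro` URI. The sketch below shows that behaviour on a throwaway database. `open_read_only` and `demo` are hypothetical names, the table is made up, and building the URI with `Path.as_uri()` is a variation on the source's f-string (assumed here to be safer for paths containing `?`, `#` or `%`), not what the module does.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open *db_path* so that any write attempt is refused by SQLite."""
    # as_uri() percent-encodes characters that SQLite would otherwise
    # interpret as URI syntax (for example '?' and '#').
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def demo() -> tuple[list[tuple], str]:
    """Create a tiny database, then read and try to modify it read-only."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "sms demo.db"  # hypothetical file name
        seed = sqlite3.connect(db_path)
        seed.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT)")
        seed.execute("INSERT INTO message (text) VALUES ('hello')")
        seed.commit()
        seed.close()

        conn = open_read_only(db_path)
        try:
            rows = conn.execute("SELECT id, text FROM message").fetchall()
            try:
                conn.execute("DELETE FROM message")
                outcome = "write succeeded"
            except sqlite3.OperationalError as exc:
                outcome = f"write refused: {exc}"
        finally:
            conn.close()
    return rows, outcome
```

The `SELECT` returns the seeded row while the `DELETE` is rejected by SQLite itself, so the `SELECT`-only regex in `sqlite_query` acts as a second, earlier line of defence rather than the only one.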
180
tools/parsers.py
@@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str:
|
||||
return f"[Error parsing Prefetch: {e}]"
|
||||
|
||||
|
||||
-async def list_extracted_dir(dir_path: str) -> str:
-    """List files in an extracted directory."""
+async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str:
+    """Smart summary of a (potentially huge) extracted tree.
+
+    Earlier versions dumped up to 200 random entries then truncated, which
+    leaves the agent blind on 10k+-file iOS extractions. The new layout
+    returns a compact summary that scales: total counts, extension
+    breakdown, top-level directories with their sizes, and the largest
+    files. For targeted lookups (e.g. find every ``*.sqlite`` under the
+    tree) the agent should use ``find_files`` instead.
+    """
     if not os.path.isdir(dir_path):
         return f"[Error: {dir_path} is not a directory]"

     try:
-        entries = []
-        for root, dirs, files in os.walk(dir_path):
+        total_files = 0
+        total_bytes = 0
+        ext_counts: dict[str, int] = {}
+        ext_bytes: dict[str, int] = {}
+        top_level_dirs: dict[str, dict] = {}
+        biggest: list[tuple[int, str]] = []  # (size, relpath)
+
+        dir_path_abs = os.path.abspath(dir_path)
+        for root, dirs, files in os.walk(dir_path_abs):
+            # Track top-level directory aggregates (cheap; no per-entry cost
+            # beyond the walk we're already doing).
+            rel_root = os.path.relpath(root, dir_path_abs)
+            if rel_root == ".":
+                top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs}
+                top_level_dirs.update(top_dirs)
+                top_key = None
+            else:
+                top_key = rel_root.split(os.sep, 1)[0]
+                if top_key not in top_level_dirs:
+                    top_level_dirs[top_key] = {"files": 0, "bytes": 0}
+
             for f in files:
                 full = os.path.join(root, f)
-                rel = os.path.relpath(full, dir_path)
-                size = os.path.getsize(full)
-                entries.append(f" {rel} ({size} bytes)")
-                if len(entries) > 200:
-                    entries.append(f" ... (truncated)")
-                    break
+                try:
+                    size = os.path.getsize(full)
+                except OSError:
+                    continue
+                total_files += 1
+                total_bytes += size
+                ext = os.path.splitext(f)[1].lower() or "(no ext)"
+                ext_counts[ext] = ext_counts.get(ext, 0) + 1
+                ext_bytes[ext] = ext_bytes.get(ext, 0) + size
+                if top_key is not None:
+                    top_level_dirs[top_key]["files"] += 1
+                    top_level_dirs[top_key]["bytes"] += size
+                # Maintain a top-10 largest list cheaply (bounded insertion).
+                if len(biggest) < 10:
+                    biggest.append((size, os.path.relpath(full, dir_path_abs)))
+                    biggest.sort(reverse=True)
+                elif size > biggest[-1][0]:
+                    biggest[-1] = (size, os.path.relpath(full, dir_path_abs))
+                    biggest.sort(reverse=True)

-        return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries)
+        def _human(n: int) -> str:
+            for unit in ("B", "KB", "MB", "GB"):
+                if n < 1024:
+                    return f"{n:.1f}{unit}" if unit != "B" else f"{n}B"
+                n /= 1024
+            return f"{n:.1f}TB"
+
+        lines = [
+            f"Directory: {dir_path}",
+            f" Total: {total_files} file(s), {_human(total_bytes)}",
+        ]
+
+        # Top-level directory layout (immediate children, sorted by file count).
+        if top_level_dirs:
+            lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):")
+            sorted_tlds = sorted(
+                top_level_dirs.items(), key=lambda kv: -kv[1]["files"],
+            )[:15]
+            for d, stats in sorted_tlds:
+                lines.append(
+                    f" {d}/ ({stats['files']} files, {_human(stats['bytes'])})"
+                )
+            if len(top_level_dirs) > 15:
+                lines.append(f" ... ({len(top_level_dirs) - 15} more top-level dirs)")
+
+        # Extension breakdown.
+        if ext_counts:
+            lines.append(f"\nExtension breakdown (top 15):")
+            for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]:
+                lines.append(
+                    f" {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}"
+                )
+
+        # Largest files (often the highest-value forensic targets).
+        if biggest:
+            lines.append("\nLargest files:")
+            for size, rel in biggest:
+                lines.append(f" {rel} ({_human(size)})")
+
+        lines.append(
+            f"\nNext step: call find_files with a pattern like "
+            f"'**/*.plist' or '**/keychain-2.db' to locate specific artefacts."
+        )
+
+        return "\n".join(lines)
     except Exception as e:
         return f"[Error listing {dir_path}: {e}]"
|
||||
|
||||
|
||||
async def find_files(
|
||||
root: str,
|
||||
pattern: str,
|
||||
max_results: int = 500,
|
||||
) -> str:
|
||||
"""Recursively find files under *root* whose path matches *pattern*.
|
||||
|
||||
Uses fnmatch-style globs against the *full relative path*; ``**`` is
|
||||
treated as "any number of path segments" (so ``**/*.plist`` finds
|
||||
every plist no matter how deep). Examples:
|
||||
|
||||
- ``**/sms.db`` — iOS SMS database
|
||||
- ``**/keychain-2.db`` — iOS keychain
|
||||
- ``**/ChatStorage.sqlite`` — WhatsApp app store
|
||||
- ``HomeDomain/Library/**`` — anchor at a known iOS domain root
|
||||
- ``**/*.{plist,sqlite,db}`` is not supported (fnmatch has no brace expansion); make one call per extension
|
||||
|
||||
Results are sorted by size descending — the biggest hits usually
|
||||
matter most. Capped at *max_results* to keep the LLM context bounded.
|
||||
"""
|
||||
import fnmatch
|
||||
|
||||
if not os.path.isdir(root):
|
||||
return f"[Error: {root} is not a directory]"
|
||||
|
||||
root_abs = os.path.abspath(root)
|
||||
# Convert ``**`` (any-depth) to fnmatch's ``*`` (any chars including /).
|
||||
# fnmatch doesn't natively distinguish segment vs path; expanding ``**``
|
||||
# to ``*`` and letting fnmatch match the full relpath is good enough for
|
||||
# forensic lookups.
|
||||
fn_pattern = pattern.replace("**", "*")
|
||||
|
||||
hits: list[tuple[int, str]] = []
|
||||
truncated = False
|
||||
try:
|
||||
for dirpath, _dirs, files in os.walk(root_abs):
|
||||
for f in files:
|
||||
full = os.path.join(dirpath, f)
|
||||
rel = os.path.relpath(full, root_abs)
|
||||
if fnmatch.fnmatch(rel, fn_pattern) or fnmatch.fnmatch(f, fn_pattern):
|
||||
try:
|
||||
size = os.path.getsize(full)
|
||||
except OSError:
|
||||
size = 0
|
||||
hits.append((size, rel))
|
||||
if len(hits) >= max_results * 4:
|
||||
# Hard upper bound to keep the walk cheap on huge trees.
|
||||
truncated = True
|
||||
break
|
||||
if truncated:
|
||||
break
|
||||
except Exception as e:
|
||||
return f"[Error searching {root}: {e}]"
|
||||
|
||||
hits.sort(reverse=True)
|
||||
if len(hits) > max_results:
|
||||
truncated = True
|
||||
hits = hits[:max_results]
|
||||
|
||||
lines = [
|
||||
f"find_files: pattern={pattern!r} under {root}",
|
||||
f" matches: {len(hits)}" + (" (truncated)" if truncated else ""),
|
||||
]
|
||||
if not hits:
|
||||
lines.append(" (no matches)")
|
||||
else:
|
||||
for size, rel in hits:
|
||||
lines.append(f" {rel} ({size} bytes)")
|
||||
return "\n".join(lines)
|
||||
|
||||
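Review note: the glob handling in `find_files` leans on two `fnmatch` properties that are easy to misremember, namely that `*` also matches `/`, and that braces are taken literally. The sketch below is a simplified stand-in for the matching step, not the committed code; `matches` is a hypothetical helper introduced only for illustration, and it assumes POSIX-style `/` separators.

```python
import fnmatch


def matches(rel: str, pattern: str) -> bool:
    """Hypothetical helper: rewrite ``**`` to ``*`` and try both the
    relative path and the basename, as the matching step does."""
    fn_pattern = pattern.replace("**", "*")
    base = rel.rsplit("/", 1)[-1]
    return fnmatch.fnmatch(rel, fn_pattern) or fnmatch.fnmatch(base, fn_pattern)


# ``*`` crosses directory separators, so a rewritten ``**`` reaches any depth.
deep_hit = matches("HomeDomain/Library/SMS/sms.db", "**/sms.db")
plist_hit = matches("a/b/c.plist", "**/*.plist")

# Braces are literal characters for fnmatch, so this pattern finds nothing.
brace_hit = matches("a/b/c.plist", "**/*.{plist,db}")

print(deep_hit, plist_hit, brace_hit)
```

In practice this means a multi-extension search needs one `find_files` call per extension.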
485
tools/strategy.py
Normal file
@@ -0,0 +1,485 @@
"""Strategist-loop tools — read-only views over graph state that let the
InvestigationStrategist agent decide whether to keep investigating or to
declare the investigation complete.

DESIGN_STRATEGIST.md §2. Four read-only views:

    graph_overview()          → hypotheses + sources + pending leads snapshot
    source_coverage(src_id)   → which artefact categories on this source have
                                been touched vs are still ✗
    marginal_yield(n_rounds)  → how much information the last N rounds added
    budget_status()           → tool calls / rounds / wall-clock against caps

These are pure render functions over the graph — they MUST NOT mutate state.
The strategist never writes phenomena/edges directly; all graph mutations
happen through worker agents that the strategist dispatches via propose_lead
(which is registered separately in tool_registry).
"""

from __future__ import annotations

import time
from typing import Any


# ---------------------------------------------------------------------------
# Expected artefact catalogue (per source type)
#
# These are SOFT HINTS — items the strategist might want to check on a given
# source type if any active hypothesis depends on them. The catalogue is
# intentionally compact; expand it in-place when a new forensic specialty
# joins the toolset. Each entry:
#
#   name       human-readable artefact category
#   detector   how to recognise that this category has been touched — either
#              a tool name OR a `<tool>@<path-substring>` pattern, joined with
#              `|` for alternatives. The matcher is substring on the tool name
#              and on the args' string representation.
#   value_for  one-line description of why this category might matter
# ---------------------------------------------------------------------------

EXPECTED_ARTEFACTS: dict[str, list[dict[str, str]]] = {
    "disk_image+windows": [
        {"name": "partition layout", "detector": "partition_info|mmls",
         "value_for": "deleted files, hidden partitions"},
        {"name": "filesystem walk", "detector": "list_directory|fls",
         "value_for": "directory tree, recoverable deleted entries"},
        {"name": "registry hives", "detector": "parse_registry_key|list_installed_software|get_user_activity",
         "value_for": "installed software, user activity, timezone"},
        {"name": "browser history", "detector": "list_directory@AppData|read_text_file@History|read_text_file@Bookmarks",
         "value_for": "URL access, downloads, web search terms"},
        {"name": "prefetch", "detector": "parse_prefetch|extract_file@Prefetch",
         "value_for": "program execution evidence"},
        {"name": "email/IM config", "detector": "get_email_config",
         "value_for": "user accounts, configured mail/IM clients"},
        {"name": "recycle bin", "detector": "list_directory@$Recycle|count_deleted_files",
         "value_for": "deleted file metadata and recovery"},
    ],
    "disk_image+android": [
        {"name": "partition probe", "detector": "probe_android_partitions",
         "value_for": "discover EFS / SYSTEM / USERDATA layout"},
        {"name": "system properties", "detector": "read_text_file@build.prop|read_text_file@default.prop",
         "value_for": "device model, OS version, CSC region"},
        {"name": "app inventory", "detector": "list_directory@data/app|list_directory@data/data",
         "value_for": "installed apps, package names"},
        {"name": "user data dbs", "detector": "list_directory@data/data|sqlite_query",
         "value_for": "messages, contacts, app-specific data"},
        {"name": "device identity", "detector": "search_strings@imei|search_strings@serial|search_strings@DRI",
         "value_for": "IMEI, serial, device fingerprint"},
    ],
    "mobile_extraction": [
        {"name": "device info", "detector": "read_idevice_info|read_text_file@iDevice_info",
         "value_for": "model, iOS version, IMEI, ICCID, Bluetooth MAC, UDID"},
        {"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb",
         "value_for": "contacts, owner identity"},
        {"name": "SMS / iMessage", "detector": "sqlite_query@sms.db",
         "value_for": "messaging content, OTP / verification codes"},
        {"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite|sqlite_query@WhatsApp",
         "value_for": "WhatsApp content, group membership, call records"},
        {"name": "WeChat", "detector": "sqlite_query@MM.sqlite|sqlite_query@wcdb|list_directory@WeChat",
         "value_for": "WeChat IDs, messages, follow targets"},
        {"name": "Call history", "detector": "sqlite_query@CallHistory|sqlite_query@call_history",
         "value_for": "incoming/outgoing call log"},
        {"name": "Safari history", "detector": "sqlite_query@History.db|read_text_file@Bookmarks.plist|parse_plist@Bookmarks",
         "value_for": "URL access, bookmarks, search queries"},
        {"name": "Photos library", "detector": "sqlite_query@Photos.sqlite|parse_plist@Photos",
         "value_for": "photo metadata, EXIF, geolocation, source app"},
        {"name": "iCloud accounts", "detector": "parse_plist@Accounts3|parse_ios_keychain",
         "value_for": "Apple ID, registered services, authentication tokens"},
        {"name": "app inventory", "detector": "list_directory@Bundle/Application|list_directory@Containers",
         "value_for": "installed apps, app-specific containers"},
        {"name": "Wi-Fi history", "detector": "parse_plist@com.apple.wifi|read_text_file@known_networks",
         "value_for": "connected SSIDs, keys, first/last seen times"},
    ],
    "media_collection": [
        {"name": "archive unpack", "detector": "unzip_archive|list_directory",
         "value_for": "extract images / docs for downstream analysis"},
        {"name": "OCR text", "detector": "ocr_image",
         "value_for": "screenshot text content (chat, transaction, IDs)"},
        {"name": "metadata", "detector": "read_binary_preview|search_strings",
         "value_for": "EXIF, embedded timestamps, device fingerprints"},
    ],
    "archive": [
        {"name": "archive unpack", "detector": "unzip_archive",
         "value_for": "expose contents for further analysis"},
    ],
}


def _key_for_source(src) -> str:
    """Return the EXPECTED_ARTEFACTS key for a source: 'disk_image+platform'
    when platform is set in meta, otherwise just the source type."""
    src_type = getattr(src, "type", "")
    if src_type == "disk_image":
        # ``or ""`` guards against a platform key that is present but None.
        meta = getattr(src, "meta", {}) or {}
        platform = (meta.get("platform") or "").lower()
        if platform:
            return f"disk_image+{platform}"
    return src_type


def _detector_matches(detector: str, tool_name: str, args_str: str) -> bool:
    """Return True if any '|'-separated branch of `detector` matches.

    A branch like ``sqlite_query@AddressBook.sqlitedb`` requires both the
    tool name (substring) AND the args (case-insensitive substring) to
    match. A branch like ``parse_prefetch`` is a tool-name-only check.
    """
    for branch in detector.split("|"):
        branch = branch.strip()
        if not branch:
            continue
        if "@" in branch:
            t, sub = branch.split("@", 1)
            if t in tool_name and sub.lower() in args_str.lower():
                return True
        else:
            if branch in tool_name:
                return True
    return False


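Review note: the detector grammar is easiest to judge from a few concrete calls. The block below is a self-contained sketch that restates the matching rule under the name `detector_matches` so it can run on its own; the tool names and argument strings are made-up examples, and `mmls_verbose` in particular is a hypothetical tool used only to show that tool names are compared by substring.

```python
def detector_matches(detector: str, tool_name: str, args_str: str) -> bool:
    """Sketch of the rule: any '|' branch may match; 'tool@sub' needs both
    a tool-name substring hit and a case-insensitive args substring hit."""
    for branch in (b.strip() for b in detector.split("|")):
        if not branch:
            continue
        tool, sep, sub = branch.partition("@")
        if tool not in tool_name:
            continue
        if not sep or sub.lower() in args_str.lower():
            return True
    return False


# Tool and path both match.
sms_hit = detector_matches(
    "sqlite_query@sms.db", "sqlite_query", "db_path=/x/Library/SMS/sms.db"
)
# Right path, wrong tool: no match.
wrong_tool = detector_matches(
    "sqlite_query@sms.db", "list_directory", "path=/x/Library/SMS/sms.db"
)
# Tool-name-only branch; substring matching also accepts longer tool names.
loose_hit = detector_matches("partition_info|mmls", "mmls_verbose", "")

print(sms_hit, wrong_tool, loose_hit)
```

The last call shows the trade-off of substring matching: it tolerates renamed or suffixed tools, at the cost of occasionally marking a category as touched by a tool that merely shares a name fragment.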
# ---------------------------------------------------------------------------
# graph_overview()
# ---------------------------------------------------------------------------

def graph_overview(graph) -> str:
    """Render hypotheses + sources + pending leads as the strategist's
    primary decision view.

    Annotates each hypothesis with the count of distinct sources that
    contribute incoming edges (edges of either sign are counted). A
    hypothesis with many edges but only one source is a strategist signal
    to seek cross-source corroboration.
    """
    lines: list[str] = ["# Investigation State", ""]

    # Hypotheses table.
    if graph.hypotheses:
        lines.append(f"## Hypotheses ({len(graph.hypotheses)})")
        lines.append("")
        lines.append(
            "| id | title | L | conf | status | edges_in | distinct_sources | recent_flip |"
        )
        lines.append("|----|-------|---|------|--------|---------:|-----------------:|--------------|")
        # Active hypotheses come first; within each group, sort by absolute
        # log-odds magnitude descending so the most decided ones lead.
        for hid, h in sorted(
            graph.hypotheses.items(),
            key=lambda kv: (kv[1].status != "active", -abs(kv[1].log_odds)),
        ):
            in_edges = graph._adj_rev.get(hid, [])
            edges_in = len(in_edges)
            # Distinct sources contributing edges (looked up via source
            # phenomenon's source_id; entity→entity edges have no source).
            distinct_sources: set[str] = set()
            for e in in_edges:
                src_node = graph.phenomena.get(e.source_id)
                if src_node is not None and src_node.source_id:
                    distinct_sources.add(src_node.source_id)
            # Did this hypothesis's status change in the last 2 rounds?
            recent = "no"
            recent_rounds = graph.investigation_rounds[-2:]
            for r in recent_rounds:
                before = r.hypothesis_status_snapshot_before.get(hid)
                after = r.hypothesis_status_snapshot_after.get(hid)
                if before and after and before != after:
                    recent = f"yes ({before}→{after} in R{r.round_number})"
                    break
            title = (h.title or "")[:60].replace("|", "/")
            lines.append(
                f"| {hid[:14]} | {title} | {h.log_odds:+.2f} | "
                f"{h.confidence:.2f} | {h.status} | {edges_in} | "
                f"{len(distinct_sources)} | {recent} |"
            )
        lines.append("")
    else:
        lines.append("## Hypotheses\n\n_(none yet — Phase 2 has not produced any)_\n")

    # Sources table.
    if graph.case and graph.case.sources:
        lines.append(f"## Sources ({len(graph.case.sources)})")
        lines.append("")
        lines.append(
            "| id | type | phenomena | identities | last_touched_in_round |"
        )
        lines.append("|----|------|----------:|-----------:|----------------------|")
        for src in graph.case.sources:
            ph_count = sum(
                1 for p in graph.phenomena.values() if p.source_id == src.id
            )
            id_count = sum(
                1 for e in graph.entities.values()
                for i in e.identifiers
                if any(
                    p.source_id == src.id
                    for p in graph.phenomena.values()
                    if p.id == i.get("phenomenon_id")
                )
            )
            # Latest round that produced a phenomenon on this source. This is
            # a proxy for "last touched": tool calls that yielded no
            # phenomenon are not counted.
            last_r = "—"
            for r in reversed(graph.investigation_rounds):
                if r.new_phenomena_count > 0:
                    # Heuristic: if any phenomenon created during this round
                    # was on this source, mark this round as the last touch.
                    in_round = [
                        p for p in graph.phenomena.values()
                        if p.source_id == src.id
                        and r.started_at <= p.created_at
                        and (not r.completed_at or p.created_at <= r.completed_at)
                    ]
                    if in_round:
                        last_r = f"R{r.round_number}"
                        break
            lines.append(
                f"| {src.id} | {src.type} | {ph_count} | {id_count} | {last_r} |"
            )
        lines.append("")

    # Pending leads.
    pending = [lead for lead in graph.leads if lead.status == "pending"]
    if pending:
        lines.append(f"## Pending Leads ({len(pending)})")
        lines.append("")
        lines.append("| id | from | target_agent | for_hypothesis | description |")
        lines.append("|----|------|--------------|----------------|-------------|")
        for lead in pending[:20]:
            desc = (lead.description or "")[:80].replace("|", "/")
            mh = lead.motivating_hypothesis or lead.hypothesis_id or "—"
            lines.append(
                f"| {lead.id} | {lead.proposed_by or '—'} | {lead.target_agent} | "
                f"{mh[:14] if mh != '—' else '—'} | {desc} |"
            )
        if len(pending) > 20:
            lines.append(f"\n_(+{len(pending) - 20} more pending leads not shown)_")
        lines.append("")
    else:
        lines.append("## Pending Leads\n\n_(none — no investigations queued)_\n")

    # Interpretation hint at the end, plain English.
    lines.append("---")
    lines.append(
        "**Interpretation hints**: A hypothesis with many edges but only one "
        "distinct_source has fragile cross-source independence — a single "
        "edge from a *different* source would do more for it than another "
        "edge from the same source (harmonic damping makes repeats cheap). "
        "Hypotheses in the active band (0.2 < conf < 0.8) are the ones a "
        "well-targeted lead can flip. recent_flip = 'yes' means belief is "
        "still moving on that hypothesis; 'no' across 2 rounds suggests "
        "stability."
    )

    return "\n".join(lines)


# ---------------------------------------------------------------------------
# source_coverage(source_id)
# ---------------------------------------------------------------------------

def source_coverage(graph, source_id: str) -> str:
    """Render which expected artefact categories have been touched on
    *source_id*, and which remain ✗.

    Output is markdown. The closing paragraph reminds the strategist that
    coverage hints are heuristics — investigate ✗ items only when an active
    hypothesis depends on them. This is the design's central guardrail
    against the system devolving into a fixed forensic checklist.
    """
    src = graph.case.get_source(source_id) if graph.case else None
    if src is None:
        return f"Error: source_id {source_id!r} not found in case."

    key = _key_for_source(src)
    expected = EXPECTED_ARTEFACTS.get(key, [])

    # Collect this source's invocation history.
    invs = [
        inv for inv in graph.tool_invocations.values()
        if inv.source_id == source_id
    ]

    # For each expected category, decide ✓ / ✗ + show example invocation if ✓.
    rows: list[tuple[str, str, str, str]] = []
    for entry in expected:
        name = entry["name"]
        detector = entry["detector"]
        value_for = entry["value_for"]
        matched: str | None = None
        for inv in invs:
            args_str = ""
            try:
                args_str = " ".join(f"{k}={v}" for k, v in (inv.args or {}).items())
            except Exception:
                # args was not a mapping; fall back to its plain repr.
                args_str = str(inv.args)
            if _detector_matches(detector, inv.tool, args_str):
                matched = f"{inv.tool}({args_str[:60]})"
                break
        mark = "✓" if matched else "✗"
        evidence = matched or "—"
        rows.append((mark, name, evidence, value_for))

    lines: list[str] = [
        f"# Coverage of source `{source_id}` ({src.label})",
        "",
        f"Source type: `{src.type}` / access_mode: `{src.access_mode}`",
        f"Invocations made against this source: **{len(invs)}**",
        "",
    ]
    if not expected:
        lines.append(
            f"_(no expected-artefact catalogue entry for source type `{key}` — "
            "coverage cannot be assessed against a baseline)_"
        )
    else:
        lines.append(
            "| ✓/✗ | category | example invocation | what it would tell us |"
        )
        lines.append("|-----|----------|---------------------|------------------------|")
        for mark, name, evidence, value_for in rows:
            lines.append(
                f"| {mark} | {name} | {evidence[:70].replace('|','/')} | {value_for} |"
            )
        n_covered = sum(1 for r in rows if r[0] == "✓")
        n_total = len(rows)
        lines.append("")
        lines.append(f"Coverage: **{n_covered}/{n_total}** ({n_covered*100//max(n_total,1)}%)")

    # Closing guardrail paragraph. Invocations that matched no catalogue
    # entry (possibly genuine novel exploration) are not listed in this view.
    lines.append("")
    lines.append("---")
    lines.append(
        "**Coverage hints are heuristics, not requirements.** Skip an item if "
        "the case theory makes it irrelevant — a financial-fraud case has no "
        "reason to OCR every photo. Investigate ✗ items only when they could "
        "materially affect an active hypothesis. If you propose a lead just "
        "because something is ✗, the strategist prompt is being misused."
    )
    return "\n".join(lines)


# ---------------------------------------------------------------------------
# marginal_yield(last_n_rounds)
# ---------------------------------------------------------------------------

def marginal_yield(graph, last_n_rounds: int = 2) -> str:
    """Render the last N investigation rounds' yield deltas.

    Yield columns:
    - new_phenomena: phenomena created during the round
    - new_edges: edges (any direction) added during the round
    - status_flips: hypotheses whose status changed during the round

    A row of zeros means that round didn't move the graph. Two consecutive
    such rows are strong evidence of diminishing returns; the strategist
    should consider declare_investigation_complete with reason
    marginal_yield_zero.
    """
    rounds = [r for r in graph.investigation_rounds if r.completed_at]
    if not rounds:
        return (
            "# Marginal Yield\n\n"
            "_(no completed investigation rounds yet — yield not applicable)_"
        )
    recent = rounds[-max(1, last_n_rounds):]
    lines = [f"# Marginal Yield (last {len(recent)} of {len(rounds)} rounds)", ""]
    lines.append("| round | new_phenomena | new_edges | status_flips |")
    lines.append("|-------|--------------:|----------:|-------------:|")
    yields: list[tuple[int, int, int]] = []
    for r in recent:
        yields.append((r.new_phenomena_count, r.new_edges_count, r.status_flips))
        lines.append(
            f"| R{r.round_number} | {r.new_phenomena_count} | "
            f"{r.new_edges_count} | {r.status_flips} |"
        )

    # Trend interpretation aid.
    lines.append("")
    # Require at least two rows before calling zero yield "confirmed", so a
    # single empty round falls through to the "too early" branch below.
    if len(yields) >= 2 and all(y == (0, 0, 0) for y in yields):
        trend = (
            "Yield is zero across these rounds — diminishing returns are "
            "confirmed. Strongly consider declare_investigation_complete "
            "(reason: marginal_yield_zero)."
        )
    elif len(yields) >= 2:
        first = yields[0][0] + yields[0][1] + yields[0][2]
        last = yields[-1][0] + yields[-1][1] + yields[-1][2]
        if last == 0 and first > 0:
            trend = (
                "Yield collapsed to zero in the most recent round. One more "
                "well-targeted probe is reasonable; another zero-yield round "
                "after that means stop."
            )
        elif last < first / 2 and first > 0:
            trend = (
                f"Decelerating ({last}/{first} ≈ "
                f"{int(100*last/first)}% of the earlier round). Diminishing "
                "returns are accumulating."
            )
        else:
            trend = "Yield is still active — further investigation is paying off."
    else:
        trend = (
            "Only one completed round — too early to call a trend. Run at "
            "least one more before considering completion."
        )
    lines.append(f"**Trend**: {trend}")
    return "\n".join(lines)


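Review note: the trend rule in `marginal_yield` is simple enough to restate as a pure function, which may make it easier to unit-test separately from the markdown rendering. The sketch below is one possible factoring, not the committed code; `classify_trend` and its return labels are hypothetical names, and it follows the docstring's reading that a zero-yield verdict needs at least two rounds.

```python
def classify_trend(yields: list[tuple[int, int, int]]) -> str:
    """Classify per-round (new_phenomena, new_edges, status_flips) tuples.

    Returns one of the hypothetical labels: 'too_early', 'zero',
    'collapsed', 'decelerating', 'active'.
    """
    totals = [sum(y) for y in yields]
    if len(totals) < 2:
        return "too_early"        # a single round cannot show a trend
    if all(t == 0 for t in totals):
        return "zero"             # nothing moved in any listed round
    first, last = totals[0], totals[-1]
    if first > 0 and last == 0:
        return "collapsed"        # latest round added nothing
    if first > 0 and last < first / 2:
        return "decelerating"     # latest round under half of the earliest
    return "active"


print(classify_trend([(3, 2, 1), (0, 0, 0)]))
print(classify_trend([(10, 4, 0), (3, 1, 0)]))
```

Keeping the classification separate from the rendered sentence would let the orchestrator act on the label directly instead of parsing strategist-facing prose.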
# ---------------------------------------------------------------------------
# budget_status()
# ---------------------------------------------------------------------------

def budget_status(graph, budgets: dict[str, Any] | None, start_time: float | None) -> str:
    """Render budget usage against config.yaml `budgets` block.

    Counters:
    - tool_calls: len(graph.tool_invocations)
    - strategist_rounds: len(graph.investigation_rounds)
    - wall_clock_minutes: time.monotonic() - start_time (when start_time is
      supplied; it must itself be a time.monotonic() reading, not a
      time.time() timestamp)
    """
    budgets = budgets or {}
    tool_calls_used = len(graph.tool_invocations)
    rounds_used = len(graph.investigation_rounds)
    minutes_used: float | None = None
    if start_time is not None:
        minutes_used = (time.monotonic() - start_time) / 60.0

    def _row(name: str, used: float, cap: Any) -> str:
        if cap is None:
            return f"| {name} | {used:g} | — | (unbounded) |"
        # A cap of 0 would divide by zero; report 0% in that case.
        pct = (used / cap) * 100 if cap else 0
        return f"| {name} | {used:g} | {cap} | {pct:.0f}% |"

    lines = ["# Budget Status", ""]
    lines.append("| metric | used | cap | pct |")
    lines.append("|--------|-----:|----:|----:|")
    lines.append(_row("tool_calls", tool_calls_used, budgets.get("tool_calls_total")))
    lines.append(_row("strategist_rounds", rounds_used, budgets.get("strategist_rounds_max")))
    if minutes_used is not None:
        lines.append(_row(
            "wall_clock_minutes", round(minutes_used, 1),
            budgets.get("wall_clock_minutes_max"),
        ))

    # Pacing hint.
    lines.append("")
    flags = []
    cap_calls = budgets.get("tool_calls_total")
    cap_rounds = budgets.get("strategist_rounds_max")
    if cap_calls and tool_calls_used / cap_calls >= 0.9:
        flags.append("tool_calls budget ≥ 90% used — favour declare_complete")
    if cap_rounds and rounds_used / cap_rounds >= 0.7:
        flags.append("strategist rounds ≥ 70% used — only propose leads with high expected yield")
    if flags:
        lines.append("**Budget warnings**:")
        for f in flags:
            lines.append(f"- {f}")
    else:
        lines.append(
            "Budget room remains. Standard rule: each propose_lead should "
            "name a specific hypothesis it expects to move; otherwise skip it."
        )
    return "\n".join(lines)
50
uv.lock
generated
@@ -170,6 +170,8 @@ source = { virtual = "." }
dependencies = [
    { name = "httpx", extra = ["socks"] },
    { name = "openai" },
    { name = "pillow" },
    { name = "pytesseract" },
    { name = "pyyaml" },
    { name = "regipy" },
]
@@ -184,6 +186,8 @@ dev = [
requires-dist = [
    { name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
    { name = "openai", specifier = ">=2.36.0" },
    { name = "pillow", specifier = ">=12.2.0" },
    { name = "pytesseract", specifier = ">=0.3.13" },
    { name = "pyyaml" },
    { name = "regipy", specifier = ">=6.2.1" },
]
@@ -222,6 +226,39 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
]

[[package]]
name = "pillow"
version = "12.2.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
wheels = [
    { url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
    { url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
    { url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
    { url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
    { url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
    { url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
    { url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
    { url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
    { url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
    { url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
    { url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
    { url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
    { url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
    { url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
    { url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
    { url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
    { url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
    { url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
    { url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
    { url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
    { url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
    { url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
    { url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
    { url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
    { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
]

[[package]]
name = "pluggy"
version = "1.6.0"
@@ -296,6 +333,19 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pytesseract"
|
||||
version = "0.3.13"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "packaging" },
|
||||
{ name = "pillow" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "pytest"
|
||||
version = "9.0.2"
|
||||
|
||||