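diff --git a/DESIGN.md b/DESIGN.md new file mode 100644 index 0000000..731c450 --- /dev/null +++ b/DESIGN.md @@ -0,0 +1,305 @@ +# MASForensics System Refit Design + +> Goal: turn the current "single Windows machine disk forensics" system into a complex forensic system that handles **multiple devices, multiple actors, +> heterogeneous evidence, and cross-source correlation**. This is the sole authoritative design document +> (it merges the two earlier drafts, `REFIT_PLAN.md` and `RESEARCH_DESIGN.md`). +> +> The real case that triggered this refit: the 2025 Meiya Cup qualifying round, Individual, with 5 pieces of evidence +> (1 USB E01, 1 full Android disk dump `blk0_sda.bin`, 3 iOS extractions, 1 set of transaction screenshots), +> spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC. + +--- + +## 1. Design principles (invariants that hold throughout) + +1. **The LLM proposes, code decides.** The LLM handles language, classification, and perception; it **holds no case state, + produces no numeric values, and writes no unverified facts**. All "truth" lives in the symbolic layer. +2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-checked. +3. **The reasoning core is independent of device type.** All device-specific logic lives in "capability plugins"; + supporting a new device = writing a plugin, never changing the core. +4. **Seemingly irreversible operations (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned. + +These four principles are not slogans: every design decision below maps to one of them. + +--- + +## 2. Diagnosis of current problems + +| # | Problem | Location | Consequence | +|---|---|---|---| +| P1 | **The single-image assumption is deeply embedded**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects only one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple pieces of evidence or correlate across devices | +| P2 | **Anti-hallucination exists only in the prompt** | `base_agent.py` system prompt | Once the LLM disobeys, false facts enter the case record and **cannot be identified afterwards** | +| P3 | **The confidence formula has no statistical meaning and is order-dependent**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges are mixed, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence can be neither calibrated nor defended | +| P4 | **Artifact categorization is Windows-only**: it relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artifacts fall into `other` | +| P5 | **Case information is hard-coded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires editing the code | +| P6 | **Image discovery relies on extension globs**, and `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered | +| P7 | **Phenomenon carries no source annotation** | `evidence_graph.py:85` `Phenomenon` | No way to tell which device a finding came from; cross-source correlation has no anchor | + +The refit addresses both "onboarding new evidence" and "fixing the inherent defects P1-P7". + +--- + +## 3.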
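Target architecture + +``` +case.yaml ──► Case ──► N × EvidenceSource + ├ id / type / owner / path + └ access_mode: image | tree + │ + ┌──────────────┴───────────────┐ + image-backed tree-backed + (TSK, inode addressing) (path addressing: mounted/unpacked) + │ │ + └────────────┬─────────────────┘ + ▼ + SourceRegistry ── source_id → SourceHandle (resolves path/offset/mode) + │ + ToolRegistry ── tools registered by access_mode, bound to source_id at call time + │ + ┌──────────────────────┼───────────────────────┐ + ▼ ▼ ▼ + Knowledge-Source Graph Write Gateway ToolInvocationLog + Agents (LLM) ──► (sole write entry, enforces (every tool call is logged: + may write the graph only via the gateway precondition = grounding) args / output / sha256) + │ │ + └──────────────────────┴──► Grounded Evidence Graph (GEG) + Phenomenon / Hypothesis / Entity + confidence = accumulated log-odds +``` + +**Keep** the existing five-phase pipeline, disconnect recovery, run archiving, tool-result caching, and +`AgentFactory` dynamic composition: these designs are sound, so they are adapted rather than rewritten. + +--- + +## 4. Core design + +### 4.1 Evidence source abstraction (addresses P1/P5/P6/P7; the foundation) + +Add `case.py`: + +- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (associated person), + `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory). +- **`Case`**: holds `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`. +- **`access_mode` is the key design distinction**: + - `image`: block device / disk image, addressed by inode via TSK (USB E01, each partition of the Android `blk0_sda`). + - `tree`: a mounted filesystem or unpacked directory, addressed by path (unzipped iOS extractions, expanded archives). + - Tools are registered in families by access_mode (see 4.2). A piece of evidence can go from image to tree via "preparation" + (e.g. mounting a partition, unpacking a zip). + +`select_image_interactive` in `main.py` (:91-153) now loads/constructs a `Case`; +`_IMAGE_GLOBS` is replaced by type detection (an `mmls` probe plus file-header sniffing) and no longer relies on extensions. +`cfreds_hacking_case` is removed from `config.yaml`, and case information moves into `case.yaml`. + +### 4.2 Tool registration parameterized by source (addresses P1) + +Current state: `register_all_tools(image_path, offset, ...)` closes over a single image in every tool +(`tool_registry.py:159+`). Refit: + +- Tool executor signatures gain `source_id`; at execution time `SourceRegistry` resolves the real path/offset/mode. +- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the toolset an agent receives is determined by + the source type it is responsible for. +- **"Current source" context**: the orchestrator sets a current source for the agent (analogous to the existing + `graph._current_agent`), and tools act on it by default, so the LLM need not pass `source_id` every time + (fewer mistakes). Cross-source tools (timeline merging, entity queries) cross sources explicitly. +- The cache key `_cache_key` (`tool_registry.py:41`) incorporates `source_id` to prevent cross-source contamination. + +### 4.3 Graph write gateway (addresses P2; implements principle 1) + +Current state: agents write the graph directly through tools such as `add_phenomenon`, with the constraints living only in the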
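prompt. Refit: + +- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …) + converge on **a single write gateway**. The gateway enforces preconditions in code. +- The "anti-hallucination rules" in the existing prompt are pushed down into hard checks in the gateway. The LLM agent's four-phase workflow + (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway beneath the RECORD step becomes stricter. +- The `mandatory_record_tools` mechanism in `base_agent.py` is kept (it guarantees the agent actually recorded something). + +### 4.4 Evidence grounding constraint (addresses P2; implements principle 2) + +This is the core mechanism for system reliability. + +**ToolInvocationLog**: every tool call leaves one record +`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`. +The existing result cache (`tool_registry.py:29`) already stores deterministic outputs; extending it into a full audit trail is enough. + +**Phenomenon is split in two**, separating "fact" from "interpretation": + +- `verified_facts`: `list[{type, value, invocation_id}]`, + `type ∈ {path, timestamp, inode, hash, identifier, count, ...}`. +- `interpretation`: free text, the agent's analytical narrative. + +**`add_phenomenon` gateway preconditions**: + +1. Every fact must cite an `invocation_id` that **actually occurred within this agent's current task**. +2. Code verifies that `fact.value` hits the output of that invocation: + - text output → verbatim substring match; + - structured/binary tool output → match against the parsed fields. +3. If any fact fails → **the whole record is rejected**, the failing facts are returned, and the agent must fix them and retry. +4. If all pass → the record is written; each entry in `verified_facts` carries its `invocation_id` (re-runnable for verification), + and `interpretation` is marked as "unverified analysis". + +**Effect**: within the system, "recording a path/timestamp/hash/identifier that no tool output supports" +is **structurally impossible**. The LLM may still get `interpretation` wrong, but the report renders +verified facts (citations with re-run instructions) and interpretation (explicitly labelled analysis) +**separately**, so a human investigator can tell them apart at a glance. This is a reliability guarantee with honestly drawn boundaries. + +> The existing `_make_auto_record` (`tool_registry.py:126`) converts tool output directly into a phenomenon; +> that is the special case of "trivial grounding" (the description is the output), and the new design generalizes and formalizes it. + +### 4.5 Hypothesis confidence: likelihood ratio / log-odds (addresses P3) + +Replace the `_DEFAULT_EDGE_WEIGHTS` at `evidence_graph.py:26`, currently "deltas pulled out of thin air", +with log-odds accumulation based on **likelihood ratios (LR)**: + +- Each `Phenomenon → Hypothesis` edge represents one likelihood ratio. The LLM still performs only **discrete classification** + (is this evidence direct_evidence / supports / weakens / contradicts … for this hypothesis), + and the value `log₁₀(LR)` is looked up in a calibration table: **the LLM never emits numbers** (continuing the existing "LLM picks the type, + code computes the value" philosophy and giving it a statistical basis). +- Confidence update: + ``` + L_post = L_prior + Σ log₁₀(LR_i) # log-odds, commutative → no order dependence + confidence = 1 / (1 + 10^(−L_post)) + ``` +- Edge type → `log₁₀(LR)` calibration table (initial values, to be calibrated later from labelled cases): + + | Edge type | log₁₀LR | + |---|---:| + | `direct_evidence` | +2.0 | + | `supports` / `consequence_observed` | +1.0 | + | `prerequisite_met` | +0.5 | + | `weakens` | −0.5 | + |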
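`contradicts` | −2.0 | + +- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are now derived from `L_post`. +- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`). +- **Note on the simplifying assumption**: multiple edges are treated as independent (naive Bayes). Repeated evidence of the same kind is not + fully independent, so add a knob: cap or decay the number of edges sharing the same `(hypothesis, edge_type)`, to avoid + inflated confidence when "the same finding is added to the graph by multiple agents" (the existing Jaccard dedup already partly mitigates this). + +As a by-product, a **hypothesis × evidence matrix** view is produced for use in reports and lead selection. + +### 4.6 Cross-source entity resolution (addresses the correlation problem of "complex scenarios"; implements principle 4) + +The core difficulty of complex forensics: the Apple ID in an iPhone keychain, the number in an Android SMS database, +the author of a USB file, the wallet address in a transaction screenshot: **which of them point to the same actor?** + +**Key design: "identity coreference" is itself a hypothesis**, so entity resolution is not a separate subsystem +but a reuse of the hypothesis mechanism from 4.5: + +- When an agent observes an identifier, it goes through the gateway's `observe_identity` and records a **typed** identifier + (strong identifiers: IMEI / wallet address / email / phone number; weak identifiers: nickname / display name), + attached to a provisional `Entity`. +- "Entity A ≡ Entity B" is registered as a `Hypothesis`; a shared strong identifier = a strong +LR edge, + a shared weak identifier = a weak +LR edge, conflicting strong identifiers = a strong −LR edge, all scored with the same computation as 4.5. +- **No destructive merging**: when the threshold is crossed, a `same_as` edge is added between the two Entities (backed by that coref + hypothesis). At query time, a connected component of `same_as` edges is treated as one actor. **Fully reversible, auditable, + and open to being overturned by later contradicts evidence** (implements principle 4). +- **Blocking**: coref hypotheses are built only between entity pairs that "share at least one identifier or have highly similar names", + avoiding O(n²). + +Cross-device timelines and "who did what, when" emerge naturally from the entity graph once it is connected by `same_as`. + +### 4.7 Capability plugin layer (onboarding 5 classes of evidence) + +Each class of evidence = one `(ingestion handler, toolset, knowledge-source agent)` triple. The reasoning core is untouched. + +| Plugin | Ingestion | New tools | Knowledge-source agent | +|---|---|---|---| +| **iOS extraction** | `unzip` unpacks into a `tree` source | `parse_plist` (including binary plists), `sqlite_tables`/`sqlite_query` (sms.db, WhatsApp `ChatStorage.sqlite`, contacts), `parse_ios_keychain`, `read_idevice_info` | `IOSArtifactAgent` | +| **Full Android disk** | `mmls` partitions → one `image` source per partition; can be mounted as `tree` | Reuses TSK; ext4/F2FS reading; `fsstat` to detect encryption | Reuses filesystem + `AndroidArtifactAgent` | +| **Disk image (E01)** | Already supported (TSK includes ewf) | Existing TSK toolchain | Existing filesystem/registry | +| **Archive** | `unzip_archive` generic unpacking | - | - | +| **Media/screenshots** | - | `ocr_image` (tesseract; note that DeepSeek has no vision capability, so OCR is mandatory) | `MediaAgent` | + +**Android risk**: the `userdata` partition of `blk0_sda` is very likely FBE-encrypted. First run `fsstat` on each partition +to find out: unencrypted → TSK works directly; encrypted with no key → only unencrypted areas such as `EFS`/`PARAM`/`system` can be analysed. + +The `_auto_categorize` at `tool_registry.py:80` becomes extensible: each source plugin supplies its own +artifact category table instead of a global Windows keyword table (addresses P4). + +### 4.8 Agent system reorganization + +The existing 7 agents are named after Windows artifacts (registry, communication=email/IRC,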
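+network=browser/PCAP). They are reorganized by **investigative function**, with platform-specific agents added: + +- `_AGENT_CLASSES` in `agent_factory.py` (:34-40) is extended with `ios_artifact`, + `android_artifact`, `financial` (wallets/transactions), and `media`. +- `communication` is generalized: email + IM + SMS, cross-platform. +- A new **source type → suitable agent** mapping lets Phase 1 dispatch a triage agent per source. +- The dynamic composition mechanism of `create_specialized_agent` (:69) is kept: it was always the + right way to handle capability gaps, and a larger tool catalog simply gives it more to choose from. + +### 4.9 Orchestrator multi-source pipeline + +| Phase | Change | +|---|---| +| Phase 1 | "Single-image triage" → **parallel per-source triage**, with a type-appropriate agent dispatched to each source | +| Phase 2 | Hypotheses are generated across sources; identity coreference hypotheses are first registered here | +| Phase 3 | Leads are dispatched to source-aware agents; the hypothesis × evidence matrix updates in real time | +| Phase 4 | Cross-source timeline merge, with **per-source timezone normalization** (iOS UTC vs Android local time) | +| Phase 5 | One consolidated report per case: hypothesis conclusions, the entity relationship graph, and a provenance citation for every conclusion | + +Disconnect recovery and run archiving logic are kept; `graph_state.json` incrementally takes on the new fields. + +--- + +## 5. Summary of data model changes + +| Node/structure | Change | +|---|---| +| `EvidenceSource` | **New** first-class node (`src-*`) | +| `ToolInvocation` | **New** audit record (`inv-*`), persisted with the graph | +| `Phenomenon` | + `source_id`; description split into `verified_facts[]` + `interpretation`; the semantically ambiguous `confidence` (default 1.0) is clarified/removed, and observation reliability is expressed through grounding | +| `Hypothesis` | + `prior_prob`, `log_odds` (accumulator); `confidence` becomes a derived value | +| `Entity` | + a set of typed identifiers; connected across sources via `same_as` edges | +| Phenomenon→Hypothesis edge | Carries `edge_type`, mapped to `log₁₀(LR)` (replaces `_DEFAULT_EDGE_WEIGHTS`) | +| Entity→Entity edge | **New** `same_as` (backed by a coref hypothesis, reversible) | + +`VALID_EDGE_TYPES`, serialization/deserialization, and Jaccard dedup in `evidence_graph.py` are adapted accordingly. + +--- + +## 6.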
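Component change list + +| File | Change | +|---|---| +| `case.py` | **New**: `Case` / `EvidenceSource` / `SourceRegistry` | +| `main.py` | Source selection now loads a `Case`; type detection replaces extension globs | +| `tool_registry.py` | Tools parameterized by `source_id`; cache key includes the source; `_auto_categorize` made extensible; `ToolInvocationLog` | +| `evidence_graph.py` | Data model changes (section 5); LR/log-odds confidence; write gateway + grounding checks | +| `base_agent.py` | RECORD goes through the gateway; `add_phenomenon` switches to the `verified_facts`+`interpretation` interface | +| `agent_factory.py` | `_AGENT_CLASSES` extended; source type→agent mapping | +| `orchestrator.py` | Phase 1 per source; Phase 4 cross-source timezone normalization; Phase 5 consolidated report | +| `agents/` | New `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py`; `communication.py` generalized | +| `tools/` | New `mobile_ios.py` (plist/sqlite/keychain), `media.py` (OCR), `archive.py` (unpacking) | +| `config.yaml` / `case.yaml` | Remove `cfreds_hacking_case`; create the `case.yaml` evidence manifest | + +--- + +## 7. Build order (sorted by dependency) + +| Stage | Content | Depends on | Value | +|---|---|---|---| +| **S1** | 4.1 evidence source abstraction + 4.2 tool parameterization + fix P6 | - | Foundation; first run end to end on the USB E01 alone to verify that existing logic is not broken | +| **S2** | 4.3 write gateway + 4.4 grounding + ToolInvocationLog | S1 | Reliability core; "zero hallucinated records" becomes quantifiable | +| **S3** | 4.5 LR/log-odds confidence | Independent (can run in parallel with S2) | Fixes P3; confidence becomes defensible | +| **S4** | 4.7 iOS plugin + 4.8 agent reorganization | S1 | Coverage 1/5 → 4/5 | +| **S5** | 4.6 cross-source entity resolution | S1+S3 | Cross-device correlation; complex-scenario capability takes shape | +| **S6** | 4.7 Android + media plugins + 4.9 orchestrator adaptation | S1+S4 | All 5 pieces of evidence onboarded | + +S1+S2+S3 are about "making the system correct"; S4-S6 are about "rolling out full capability". Following the order strictly is recommended: +if S1 is shaky, everything after it is a castle in the air. + +--- + +## 8. Design trade-offs and open questions + +1. **Boundary of grounding for free text**: only the structured atoms in `verified_facts` are hard-verified; + `interpretation` is not verified verbatim (an honest boundary). A second-level lint could be added: scan + interpretation for strings that look like paths/timestamps/hashes but are not covered by any cited invocation, and warn. +2. **Initial LR calibration values are set by hand**: get things running with the initial values from section 4.5; "learning LRs from labelled cases" is follow-up work. +3. **Android userdata encryption**: whether a decryption key can be obtained determines the evidence depth of the 4.7 Android plugin, so this needs to be established early. +4. **Destructive vs reversible entity resolution**: this design chooses **reversible `same_as` edges** over destructive merging, + trading a little query efficiency for full auditability and rollback, in line with principle 4. +5.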
**Report granularity**: fixed at "one consolidated report per case", with embedded per-evidence subsections plus cross-source correlation, + rather than a standalone report per piece of evidence. diff --git a/agent_factory.py b/agent_factory.py index 18582f6..b7a1492 100644 --- a/agent_factory.py +++ b/agent_factory.py @@ -24,9 +24,12 @@ def _load_agent_classes() -> None: """Lazy-import agent classes to avoid circular imports.""" if _AGENT_CLASSES: return + from agents.android_artifact import AndroidArtifactAgent from agents.communication import CommunicationAgent from agents.filesystem import FileSystemAgent from agents.hypothesis import HypothesisAgent + from agents.ios_artifact import IOSArtifactAgent + from agents.media import MediaAgent from agents.network import NetworkAgent from agents.registry import RegistryAgent from agents.report import ReportAgent @@ -38,6 +41,50 @@ def _load_agent_classes() -> None: _AGENT_CLASSES["timeline"] = TimelineAgent _AGENT_CLASSES["hypothesis"] = HypothesisAgent _AGENT_CLASSES["report"] = ReportAgent + _AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent + _AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent + _AGENT_CLASSES["media"] = MediaAgent + + +# Triage agent per (source.type, platform). disk_image is ambiguous on its +# own — both a Windows USB image and an Android raw dump are disk_image — +# so the routing helper also looks at source.meta.platform when present. +SOURCE_TYPE_AGENTS: dict[str, str] = { + "disk_image": "filesystem", # default for unknown platform + "mobile_extraction": "ios_artifact", + "archive": "filesystem", + "media_collection": "media", +} + +# Per-platform overrides for disk_image sources. Keys come from +# source.meta.platform in case.yaml (lowercased). +_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = { + "windows": "filesystem", + "linux": "filesystem", + "android": "android_artifact", + "ios": "ios_artifact", +} + + +def get_triage_agent_type(source) -> str: + """Pick the right Phase-1 agent for *source*. + + Accepts either an :class:`EvidenceSource` or a raw source.type string + (for back-compat with the S5 signature).
Disk-image sources additionally + consult ``source.meta.platform`` so Windows USBs and Android raw dumps — + both type=disk_image — get different agents. + """ + # Back-compat: accept a plain type string. + if isinstance(source, str): + return SOURCE_TYPE_AGENTS.get(source, "filesystem") + + src_type = getattr(source, "type", "disk_image") + if src_type == "disk_image": + meta = getattr(source, "meta", {}) or {} + platform = str(meta.get("platform", "")).lower() + if platform in _DISK_IMAGE_PLATFORM_AGENTS: + return _DISK_IMAGE_PLATFORM_AGENTS[platform] + return SOURCE_TYPE_AGENTS.get(src_type, "filesystem") logger = logging.getLogger(__name__) diff --git a/agents/android_artifact.py b/agents/android_artifact.py new file mode 100644 index 0000000..be5e6b8 --- /dev/null +++ b/agents/android_artifact.py @@ -0,0 +1,58 @@ +"""Android Artifact Agent — multi-partition analysis of raw Android dumps. + +DESIGN.md §4.7 (Android): ``mmls`` slices the dump into partitions; each one is +its own analysable surface. Ext4-backed partitions (typically SYSTEM, +USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw +partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with +``search_strings``. Userdata is the prize and is often FBE-encrypted on +modern devices — the agent must check fsstat before assuming readability +(see ``probe_android_partitions`` for the survey). +""" + +from __future__ import annotations + +from base_agent import BaseAgent +from evidence_graph import EvidenceGraph +from llm_client import LLMClient +from tool_registry import TOOL_CATALOG + + +class AndroidArtifactAgent(BaseAgent): + name = "android_artifact" + role = ( + "Android forensic analyst. You navigate raw Android disk dumps " + "(blk0_sda-style images) partition by partition.
Workflow: call " + "probe_android_partitions ONCE to map the disk; pick the partitions " + "with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, " + "EFS); for each, call set_active_partition(offset_from_512_sector_column) " + "and then list_directory / extract_file / search_strings as usual. " + "For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly " + "to search_strings — they have no filesystem. If USERDATA shows " + "fs_type=unknown it is almost certainly FBE-encrypted: record that " + "as a negative finding (the absence IS evidence) and move on to " + "what's reachable." + ) + + def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None: + super().__init__(llm, graph) + self._register_tools() + + def _register_tools(self) -> None: + tool_names = [ + # Android-specific + "probe_android_partitions", + "set_active_partition", + # Reused TSK toolset — partition_offset comes from active_source + "partition_info", "filesystem_info", "list_directory", + "extract_file", "find_file", "search_strings", + "count_deleted_files", "build_filesystem_timeline", + # Generic parsers + "read_text_file", "read_binary_preview", "search_text_file", + "read_text_file_section", "list_extracted_dir", "find_files", + # SQLite — Android apps store data in sqlite too (WhatsApp, etc.) + "sqlite_tables", "sqlite_query", + ] + for name in tool_names: + td = TOOL_CATALOG.get(name) + if td: + self.register_tool(td.name, td.description, td.input_schema, td.executor) diff --git a/agents/ios_artifact.py b/agents/ios_artifact.py new file mode 100644 index 0000000..9e4fe38 --- /dev/null +++ b/agents/ios_artifact.py @@ -0,0 +1,49 @@ +"""iOS Artifact Agent — analyses unpacked iOS extractions. + +DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family +the system handles (alongside disk images and pcaps). 
This agent owns the +iOS-specific toolset; the grounded ``add_phenomenon`` contract from +BaseAgent applies unchanged — every fact must cite a tool invocation. +""" + +from __future__ import annotations + +from base_agent import BaseAgent +from evidence_graph import EvidenceGraph +from llm_client import LLMClient +from tool_registry import TOOL_CATALOG + + +class IOSArtifactAgent(BaseAgent): + name = "ios_artifact" + role = ( + "iOS forensic analyst. You analyse unpacked iOS extractions — " + "binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, " + "AddressBook.sqlitedb), the keychain (keychain-2.db), and the " + "iDevice_info.txt summary — to extract device identity, accounts, " + "messaging, contacts, and credential metadata. Domain-rooted iOS " + "trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) " + "are your map; navigate by path, not by inode." + ) + + def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None: + super().__init__(llm, graph) + self._register_tools() + + def _register_tools(self) -> None: + tool_names = [ + # navigation — find_files is the workhorse on 10k+-file iOS trees; + # list_extracted_dir is for initial layout summary only. + "list_extracted_dir", "find_files", + "read_text_file", "read_text_file_section", "read_binary_preview", + "search_text_file", + # iOS-specific parsers + "parse_plist", + "sqlite_tables", "sqlite_query", + "parse_ios_keychain", + "read_idevice_info", + ] + for name in tool_names: + td = TOOL_CATALOG.get(name) + if td: + self.register_tool(td.name, td.description, td.input_schema, td.executor) diff --git a/agents/media.py b/agents/media.py new file mode 100644 index 0000000..018127f --- /dev/null +++ b/agents/media.py @@ -0,0 +1,52 @@ +"""Media Agent — OCR-based analysis of screenshot/photo evidence. + +DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG +evidence must go through tesseract first. 
The agent runs OCR, then +records extracted strings — especially identifiers (wallet addresses, +phone numbers, usernames) — via the grounded observe_identity gateway so +they participate in cross-source coref the same way iOS keychain entries +or Windows account names do. + +If the OCR runtime is missing on the host, ocr_image returns an explicit +install hint; the agent should record that as a negative finding ("no +text extracted — tesseract not installed") rather than guessing. +""" + +from __future__ import annotations + +from base_agent import BaseAgent +from evidence_graph import EvidenceGraph +from llm_client import LLMClient +from tool_registry import TOOL_CATALOG + + +class MediaAgent(BaseAgent): + name = "media" + role = ( + "Media / OCR forensic analyst. You analyse screenshots, photos, and " + "scanned documents — any pixel-based evidence the LLM cannot read " + "directly. Workflow: list_extracted_dir to enumerate images, " + "ocr_image on each promising one, then add_phenomenon (with the " + "OCR'd text as the verified_fact value) and observe_identity for " + "any wallet addresses, phone numbers, email addresses, or " + "usernames the text contains. If OCR fails because tesseract is " + "missing, RECORD that as a negative finding instead of fabricating " + "image content — the absence is a real fact about this run." 
+ ) + + def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None: + super().__init__(llm, graph) + self._register_tools() + + def _register_tools(self) -> None: + tool_names = [ + "ocr_image", + "list_extracted_dir", "find_files", + "read_binary_preview", + "read_text_file", + "search_text_file", + ] + for name in tool_names: + td = TOOL_CATALOG.get(name) + if td: + self.register_tool(td.name, td.description, td.input_schema, td.executor) diff --git a/agents/report.py b/agents/report.py index 2df76ea..41d7d21 100644 --- a/agents/report.py +++ b/agents/report.py @@ -12,9 +12,20 @@ class ReportAgent(BaseAgent): role = ( "Forensic report writer. You synthesize all findings from the investigation " "into a structured, professional forensic analysis report organized by hypotheses.\n\n" - "Only include findings that have a source_tool attribution (marked VERIFIED). " - "If evidence lacks source attribution, mark it as UNVERIFIED. " - "Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence." + "Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), " + "TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When " + "writing the report, render verified_facts as primary evidence with their " + "invocation citations, and render interpretation as 'agent analysis' so the " + "reader can tell ground truth from inference. Do NOT invent or fabricate any " + "data, timestamps, or findings not present in the evidence.\n\n" + "This is a cross-source case: phenomena come from multiple evidence " + "sources, and entities discovered on different sources may refer to the " + "same real-world actor. 
ALWAYS include:\n" + " - 'Findings by Source' section sourced from get_phenomena_by_source\n" + " - 'Actor Clusters' section sourced from get_actor_clusters (the " + "cross-source attribution view — multi-source clusters answer " + "'which findings on different devices belong to the same person')\n" + " - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix" ) # Calling save_report is BOTH the recording action and the completion # signal. tool_call_loop returns the moment save_report executes; the @@ -38,9 +49,12 @@ class ReportAgent(BaseAgent): f"Investigation state:\n{self.graph.stats_summary()}\n\n" f"Your task: {task}\n\n" f"WORKFLOW:\n" - f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities, get_case_info " - f" to gather all the data needed for the report. Make these calls in parallel.\n" - f"2. Assemble the complete markdown forensic report.\n" + f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n" + f" get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n" + f" and get_phenomena_by_source in parallel — these are the seven data\n" + f" sources you assemble the report from.\n" + f"2. Assemble the complete markdown forensic report. Cross-source\n" + f" actor clusters and per-source breakdown are MANDATORY sections.\n" f"3. Call save_report(content=, output_path=\"report.md\").\n" f" This single call is the completion signal — the run ENDS the moment it executes.\n" f" Do NOT call any read tools after this point; they will not run.\n" @@ -83,6 +97,45 @@ class ReportAgent(BaseAgent): executor=self._get_entities, ) + self.register_tool( + name="get_hypothesis_evidence_matrix", + description=( + "Render the hypothesis × evidence pivot as a markdown table. " + "Columns: per edge_type counts, log_odds, confidence, status. " + "Embed this directly in the report to show how each hypothesis " + "stands relative to the others on a single screen."
+ ), + input_schema={"type": "object", "properties": {}}, + executor=self._get_hypothesis_evidence_matrix, + ) + + self.register_tool( + name="get_actor_clusters", + description=( + "Render the cross-source actor clusters: each cluster is the " + "set of Entity nodes the system currently treats as the same " + "actor (via active same_as edges backed by coref hypotheses " + "≥ 0.8). Includes the aggregated identifier evidence per " + "cluster. Use this in the report's 'Actor Clusters' " + "section so readers see who-is-who across devices, not just " + "raw entity rows." + ), + input_schema={"type": "object", "properties": {}}, + executor=self._get_actor_clusters, + ) + + self.register_tool( + name="get_phenomena_by_source", + description=( + "Group every phenomenon by its originating evidence source " + "(source_id). Use this to drive the report's 'Findings by " + "Source' section so each evidence item's per-device " + "contribution is auditable." + ), + input_schema={"type": "object", "properties": {}}, + executor=self._get_phenomena_by_source, + ) + self.register_tool( name="save_report", description="Save the final report to a file.", @@ -115,12 +168,24 @@ class ReportAgent(BaseAgent): items = [ph for ph in phenomena.values() if ph.category == cat] lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---") for ph in items: - verified = "VERIFIED" if ph.source_tool else "UNVERIFIED" - lines.append(f"\n[{verified}] {ph.title} ({ph.id})") + # Grounded = at least one verified fact AND a source_tool.
+ grounded = bool(ph.verified_facts) and bool(ph.source_tool) + marker = "GROUNDED" if grounded else ( + "TOOL-ONLY" if ph.source_tool else "UNVERIFIED" + ) + lines.append(f"\n[{marker}] {ph.title} ({ph.id})") lines.append(f" Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}") if ph.timestamp: lines.append(f" Timestamp: {ph.timestamp}") - lines.append(f" {ph.description[:500]}") + if ph.verified_facts: + lines.append(f" Verified facts ({len(ph.verified_facts)}):") + for f in ph.verified_facts: + lines.append( + f" - [{f.get('type','?')}] {str(f.get('value',''))[:200]} " + f"(cite: {f.get('invocation_id','?')})" + ) + if ph.interpretation: + lines.append(f" Analysis: {ph.interpretation[:500]}") return "\n".join(lines) async def _get_hypotheses_with_evidence(self) -> str: @@ -150,12 +215,87 @@ class ReportAgent(BaseAgent): return "\n".join(lines) async def _get_case_info(self) -> str: - info = self.graph.case_info lines = ["=== Case Information ==="] - for k, v in info.items(): - lines.append(f" {k}: {v}") - lines.append(f" Image path: {self.graph.image_path}") - lines.append(f" Partition offset: {self.graph.partition_offset}") + case = self.graph.case + if case is not None: + lines.append(f" case_id: {case.case_id}") + lines.append(f" name: {case.name}") + for k, v in (case.meta or {}).items(): + lines.append(f" {k}: {v}") + lines.append(f" sources: {len(case.sources)}") + for s in case.sources: + owner = f", owner={s.owner}" if s.owner else "" + platform = s.meta.get("platform") if s.meta else None + plat = f", platform={platform}" if platform else "" + lines.append( + f" - {s.id}: {s.label} " + f"(type={s.type}, mode={s.access_mode}{plat}{owner})" + ) + else: + # Legacy single-image fallback — surface whatever case_info dict + # was passed in (e.g. the old CFReDS MD5 block). 
+ for k, v in (self.graph.case_info or {}).items(): + lines.append(f" {k}: {v}") + lines.append(f" Image path: {self.graph.image_path}") + lines.append(f" Partition offset: {self.graph.partition_offset}") + return "\n".join(lines) + + async def _get_hypothesis_evidence_matrix(self) -> str: + return self.graph.hypothesis_evidence_matrix_markdown() + + async def _get_actor_clusters(self) -> str: + clusters = self.graph.actor_clusters() + if not clusters: + return "(no entities recorded)" + # Show multi-member clusters first — they're the cross-source links + # the human reader most needs to see. + clusters.sort(key=lambda c: (-len(c["members"]), c["members"])) + lines = [f"=== Actor Clusters ({len(clusters)}) ==="] + for i, c in enumerate(clusters, 1): + members = c["members"] + label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity" + lines.append(f"\n[{label} #{i}] {len(members)} member(s):") + for eid in members: + ent = self.graph.entities.get(eid) + if ent: + lines.append(f" - {ent.summary()}") + if c["identifiers"]: + lines.append(" Aggregated identifiers:") + for ident in c["identifiers"]: + strong_tag = "strong" if ident.get("strong") else "weak" + lines.append( + f" [{strong_tag}] {ident.get('type')}={ident.get('value')} " + f"(on {ident.get('on_entity')})" + ) + if c["coref_hypotheses"]: + lines.append(" Backing coref hypotheses (≥0.8 active):") + for hid in c["coref_hypotheses"]: + hyp = self.graph.hypotheses.get(hid) + if hyp: + lines.append(f" - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}") + return "\n".join(lines) + + async def _get_phenomena_by_source(self) -> str: + by_src: dict[str, list] = {} + for ph in self.graph.phenomena.values(): + by_src.setdefault(ph.source_id or "(unbound)", []).append(ph) + if not by_src: + return "(no phenomena recorded)" + # Resolve source labels via graph.case when possible. 
+ def _label(src_id: str) -> str: + if self.graph.case: + src = self.graph.case.get_source(src_id) + if src: + return f"{src_id} — {src.label} ({src.type})" + return src_id + + lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="] + for src_id in sorted(by_src): + phs = by_src[src_id] + lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---") + for ph in phs: + grounded = "G" if ph.verified_facts and ph.source_tool else "·" + lines.append(f" [{grounded}] {ph.summary()}") return "\n".join(lines) async def _get_entities(self) -> str: @@ -174,18 +314,27 @@ class ReportAgent(BaseAgent): return "\n".join(lines) async def _verify_phenomena(self) -> str: - verified = [] - unverified = [] + grounded: list[str] = [] + tool_only: list[str] = [] + unverified: list[str] = [] for ph in self.graph.phenomena.values(): - entry = f" [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})" - if ph.source_tool: - verified.append(entry) + nf = len(ph.verified_facts) + entry = ( + f" [{ph.category}] {ph.title} " + f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})" + ) + if ph.verified_facts and ph.source_tool: + grounded.append(entry) + elif ph.source_tool: + tool_only.append(entry) else: unverified.append(entry) lines = ["=== Phenomena Verification Report ==="] - lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):") - lines.extend(verified) + lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):") + lines.extend(grounded) + lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):") + lines.extend(tool_only) lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):") lines.extend(unverified) return "\n".join(lines) diff --git a/agents/timeline.py b/agents/timeline.py index 8efb955..4704daf 100644 --- a/agents/timeline.py +++ b/agents/timeline.py @@ -122,7 +122,15 @@ class TimelineAgent(BaseAgent): lines = [] for ph in items: lines.append(f"{ph.timestamp} | 
[{ph.category}] {ph.title} ({ph.id})") - lines.append(f" {ph.description[:150]}") + preview = ph.interpretation[:150] if ph.interpretation else "" + if ph.verified_facts: + fact_preview = ", ".join( + f"{f.get('type','?')}={str(f.get('value',''))[:40]}" + for f in ph.verified_facts[:3] + ) + preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]" + if preview: + lines.append(f" {preview}") return "\n".join(lines) async def _add_temporal_edge( diff --git a/base_agent.py b/base_agent.py index 5357fe7..307887d 100644 --- a/base_agent.py +++ b/base_agent.py @@ -5,6 +5,7 @@ from __future__ import annotations import json import logging import time +import uuid from typing import Any from evidence_graph import EvidenceGraph @@ -36,7 +37,9 @@ class BaseAgent: # forced retry with an explicit "you forgot to record" instruction. # Subclasses override to declare their own recording responsibility # (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report). - mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",) + # observe_identity (S5) counts as a recording too — it writes through the + # same grounding gateway and produces an identity_observation phenomenon. + mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity") # Tools whose invocation ends the run immediately. After any terminal tool # is called, tool_call_loop returns with that tool's result text as @@ -110,8 +113,23 @@ class BaseAgent: f" Call investigation tools (list_directory, parse_registry_key, etc.) 
to gather data.\n" f" Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n" f" Create add_lead for anything outside your expertise.\n\n" - f"Phase B — RECORD PHENOMENA:\n" - f" For EACH significant finding from Phase A, call add_phenomenon.\n" + f"Phase B — RECORD PHENOMENA (GROUNDED):\n" + f" For EACH significant finding from Phase A, call add_phenomenon with:\n" + f" * interpretation: your analysis — free text, NOT verified.\n" + f" * verified_facts: one entry per concrete atom (path, timestamp,\n" + f" inode, hash, identifier, count) you want recorded as truth.\n" + f" Each entry MUST have:\n" + f" - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n" + f" - value: a VERBATIM substring from the tool output\n" + f" - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n" + f" header at the top of the tool result that produced this value\n" + f" IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n" + f" whenever you see an email, phone number, Apple ID, IMEI, wallet\n" + f" address, MAC, UDID, persistent nickname, or display name. Same\n" + f" grounding contract: value must be verbatim in the cited tool\n" + f" output. This is HOW cross-source attribution gets built — without\n" + f" it, we can't tell whether the Apple ID in keychain belongs to the\n" + f" same person as the Windows account on the USB.\n" f" Do NOT call link_to_entity yet — just record all phenomena first.\n\n" f"Phase C — LINK ENTITIES:\n" f" FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n" @@ -125,20 +143,22 @@ class BaseAgent: f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n" f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, " f"a registry key) and found NOTHING, that absence IS evidence — call " - f"add_phenomenon with a 'No matches for X' title and the search scope in " - f"raw_data. 
Negative findings constrain the hypothesis space and prevent " - f"the next agent from wasting time re-searching.\n" + f"add_phenomenon with a 'No matches for X' title, the search scope in " + f"raw_data, and cite the search tool's invocation_id (verified_facts may " + f"be empty for a true negative; the cited invocation in source_tool still " + f"anchors it). Negative findings constrain the hypothesis space.\n" f"- If you stop without having called add_phenomenon at least once, the task " - f"is FAILED and a forced retry will fire.\n" - f"- Include exact file paths, inode numbers, timestamps, and the source_tool " - f"that produced each finding.\n\n" - f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n" - f"- ONLY record findings that appear VERBATIM in tool results you received\n" - f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n" - f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n" - f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n" - f"- Quote exact strings from tool output when recording evidence descriptions\n" - f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools" + f"is FAILED and a forced retry will fire.\n\n" + f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n" + f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n" + f" is what you cite in each fact's invocation_id.\n" + f"- fact.value must be a substring of the cited invocation's output.\n" + f" Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n" + f" anything else fabricated is REJECTED with a per-fact reason.\n" + f"- On REJECTED: quote the literal text from the output (or drop the\n" + f" fact), and put guesses / inferred paths / model names in\n" + f" `interpretation` instead. Then call add_phenomenon again.\n" + f"- You may cite ONLY invocations made within THIS task." 
+ ) async def run(self, task: str, lead_id: str | None = None) -> str: @@ -146,6 +166,11 @@ class BaseAgent: _log(task, event="agent_start", agent=self.name) self.graph.agent_status[self.name] = "running" self.graph._current_agent = self.name + # Fresh task scope per agent run. Used by the grounding gateway to + # check that facts in add_phenomenon cite invocations made *within + # this run* — preventing the agent from forwarding stale IDs from + # earlier work or another agent. + self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}" self._current_lead_id = lead_id self._register_graph_tools() @@ -350,20 +375,67 @@ class BaseAgent: self.register_tool( name="add_phenomenon", description=( - "Record a forensic finding (phenomenon) on the evidence graph. " - "You MUST specify source_tool: the name of the tool call that produced this finding." + "Record a forensic finding on the evidence graph. The finding is " + "split into provenance-bound atoms (verified_facts) and free-form " + "analysis (interpretation). Each fact MUST cite the invocation_id " + "of a tool call you made in THIS task — the gateway checks every " + "fact's value against that call's real output, tolerating only case, whitespace, and path-separator variants. " + "Any fact that fails grounding causes the whole record to be " + "rejected with a list of failures; fix the facts and call again." ), input_schema={ "type": "object", "properties": { "category": {"type": "string", "description": "Category of the finding."}, "title": {"type": "string", "description": "Short title."}, - "description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."}, + "interpretation": { + "type": "string", + "description": ( + "Free-form analysis text — your reasoning, why this " + "matters, what it implies. NOT verified by the gateway. " + "Rendered in reports as 'agent analysis', not truth." + ), + }, + "verified_facts": { + "type": "array", + "description": ( + "Atoms you want preserved as ground truth. 
Each must " + "appear verbatim in the cited tool output." + ), + "items": { + "type": "object", + "properties": { + "type": { + "type": "string", + "description": ( + "Kind of fact: path, timestamp, inode, " + "hash, identifier, count, raw, ..." + ), + }, + "value": { + "type": "string", + "description": ( + "Verbatim substring from the cited tool " + "output. The gateway does a substring check that " + "tolerates only case, whitespace, and path-separator variants; no paraphrasing." + ), + }, + "invocation_id": { + "type": "string", + "description": ( + "ID from the '[invocation: inv-xxx]' header " + "of the tool call that produced this value." + ), + }, + }, + "required": ["type", "value", "invocation_id"], + }, + }, "raw_data": {"type": "object", "description": "Structured raw data supporting this finding."}, "timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."}, "source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."}, }, - "required": ["category", "title", "description", "source_tool"], + "required": ["category", "title", "source_tool"], }, executor=self._add_phenomenon, ) @@ -414,6 +486,67 @@ class BaseAgent: executor=self._link_to_entity, ) + self.register_tool( + name="observe_identity", + description=( + "Record a typed identifier (email / phone / Apple ID / IMEI / " + "wallet address / nickname / display name / …) for an entity. " + "Goes through the same grounding gateway as add_phenomenon — " + "value MUST be a verbatim substring of the cited tool output. " + "After attachment, the engine automatically proposes / " + "strengthens / weakens cross-source coreference hypotheses " + "between this entity and any others carrying the same or " + "conflicting identifiers. This is how 'is the Apple ID in iOS " + "keychain the same person as the Windows login name?' gets " + "answered. Call this in ADDITION to add_phenomenon for " + "identifier-bearing findings." 
+ ), + input_schema={ + "type": "object", + "properties": { + "entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."}, + "entity_type": { + "type": "string", + "enum": ["person", "program", "file", "host", "ip_address"], + "description": "Kind of entity this identifier belongs to (usually 'person').", + }, + "identifier_type": { + "type": "string", + "description": ( + "Strong (near-unique): email, phone_number, imei, " + "imsi, apple_id, icloud_id, google_account, " + "wallet_address, udid, mac_address, device_serial. " + "Weak (free-form, may collide): nickname, " + "display_name, username, screen_name." + ), + }, + "value": { + "type": "string", + "description": ( + "The identifier value, quoted VERBATIM from the " + "tool output you cite in invocation_id." + ), + }, + "invocation_id": { + "type": "string", + "description": ( + "ID from the '[invocation: inv-xxx]' header of " + "the tool call that surfaced this identifier." + ), + }, + "source_tool": { + "type": "string", + "description": "Name of the tool that produced the identifier.", + }, + }, + "required": [ + "entity_name", "entity_type", "identifier_type", + "value", "invocation_id", + ], + }, + executor=self._observe_identity, + ) + # ---- Tool executors ----------------------------------------------------- async def _list_phenomena(self, category: str | None = None) -> str: @@ -453,16 +586,29 @@ class BaseAgent: self, category: str, title: str, - description: str, + interpretation: str = "", + verified_facts: list[dict] | None = None, raw_data: dict | None = None, timestamp: str | None = None, source_tool: str = "", + # Back-compat: older prompts (and accidental LLM emissions) may pass + # ``description``; treat it as ``interpretation`` rather than failing. 
+ description: str | None = None, ) -> str: + if description and not interpretation: + interpretation = description + # GroundingError propagates: llm_client._execute_single_tool turns + # raised exceptions into "Error executing add_phenomenon: " tool + # results the LLM sees, and _wrap_record_executor does NOT increment + # the mandatory-record counter (the increment only runs after a + # successful return), so the forced-retry mechanism still fires if + # the agent never lands a grounded phenomenon. pid, merged = await self.graph.add_phenomenon( source_agent=self.name, category=category, title=title, - description=description, + interpretation=interpretation, + verified_facts=verified_facts, raw_data=raw_data, timestamp=timestamp, source_tool=source_tool, @@ -508,6 +654,51 @@ class BaseAgent: status = "linked to existing" if existing else "created and linked" return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}" + async def _observe_identity( + self, + entity_name: str, + entity_type: str, + identifier_type: str, + value: str, + invocation_id: str, + source_tool: str = "", + ) -> str: + # GroundingError / ValueError propagate to llm_client's per-tool + # exception handler, which formats them back to the LLM. That keeps + # the mandatory-record counter honest — only a successful return + # triggers the increment in _wrap_record_executor. + result = await self.graph.observe_identity( + entity_name=entity_name, + entity_type=entity_type, + identifier_type=identifier_type, + value=value, + source_agent=self.name, + source_tool=source_tool, + invocation_id=invocation_id, + ) + lines = [ + f"Identity observed: {identifier_type}={value} " + f"on entity {result['entity_id']} ({entity_name})." 
+ ] + if result.get("new_identifier"): + lines.append( + f" Observation phenomenon: {result['phenomenon_id']}" + ) + else: + lines.append(" (identifier already recorded on this entity — idempotent)") + for prop in result.get("coref_proposals", []): + lines.append( + f" → Coref candidate: {prop['other_entity_id']} via " + f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, " + f"hypothesis={prop['hypothesis_id']})" + ) + for c in prop.get("conflicts", []): + lines.append( + f" ⚠ conflict on {c['type']}: " + f"{c['new_value']} vs {c['other_value']}" + ) + return "\n".join(lines) + async def _list_assets(self, category: str | None = None) -> str: results = self.graph.list_assets(category) if not results: diff --git a/case.example.yaml b/case.example.yaml new file mode 100644 index 0000000..b3ed843 --- /dev/null +++ b/case.example.yaml @@ -0,0 +1,41 @@ +# MASForensics case definition — template +# +# Copy this file to `case.yaml` and edit it for your case. If `case.yaml` +# exists in the working directory, `python main.py` loads it automatically; +# otherwise main.py falls back to interactive single-image selection. +# +# A case is a set of evidence sources. Each source has: +# id optional — auto-derived from label if omitted ("src-") +# label human-readable name +# type disk_image | mobile_extraction | archive | media_collection +# access_mode image | tree (optional — defaults by type) +# image = block device / disk image, navigated by Sleuth Kit +# tree = mounted filesystem / unpacked extraction, path-based +# owner optional — the person the source is associated with +# path filesystem path (relative paths resolve against this file) +# partition_offset image-mode only — sector offset of the partition to analyze +# meta optional free-form notes +# +# NOTE: at the current refit stage only image-mode (disk) sources are +# analysable; tree-mode sources are accepted but skipped. 
+ +case_id: example-case +name: "Example forensic case" +meta: + notes: "free-form case-level metadata" + +sources: + - id: src-suspect-laptop + label: "Suspect laptop disk image" + type: disk_image + access_mode: image + owner: "John Doe" + path: image/suspect_laptop.E01 + partition_offset: 0 # run `mmls ` to find the right offset + + - id: src-suspect-phone + label: "Suspect phone extraction" + type: mobile_extraction + access_mode: tree + owner: "John Doe" + path: image/suspect_phone.zip diff --git a/case.py b/case.py new file mode 100644 index 0000000..f6c894d --- /dev/null +++ b/case.py @@ -0,0 +1,226 @@ +"""Case and evidence-source model — the foundation for multi-evidence analysis. + +A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source +has a *type* (disk image, mobile extraction, archive, ...) and an *access mode* +that determines how forensic tools reach its contents: + + - ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via + inode addressing (raw, E01, dd, ...). + - ``"tree"`` — an already-mounted filesystem or unpacked extraction, + navigated by ordinary filesystem paths. + +This module is pure data model + loading. Partition probing and interactive +selection live in ``main.py``. +""" + +from __future__ import annotations + +import logging +import re +from dataclasses import asdict, dataclass, field +from pathlib import Path + +logger = logging.getLogger(__name__) + +# Recognised source types and access modes. +SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"} +ACCESS_MODES = {"image", "tree"} + +# Disk-image file extensions for interactive discovery. +# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed +# raw block-device dumps such as ``blk0_sda.bin``. +DISK_IMAGE_EXTS = { + ".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd", +} + +# Default access mode per source type. 
+_DEFAULT_ACCESS_MODE = { + "disk_image": "image", + "mobile_extraction": "tree", + "archive": "tree", + "media_collection": "tree", +} + + +def slugify(text: str) -> str: + """Reduce *text* to a lowercase, hyphen-separated slug for use in IDs.""" + slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") + return slug or "src" + + +@dataclass +class EvidenceSource: + """One piece of evidence within a :class:`Case`.""" + + id: str # "src-" + label: str # human-readable name + type: str # one of SOURCE_TYPES + path: str # filesystem path to the evidence + access_mode: str # "image" | "tree" + owner: str = "" # associated person, if known + partition_offset: int = 0 # sector offset (image-mode sources only) + meta: dict = field(default_factory=dict) + + def to_dict(self) -> dict: + return asdict(self) + + @classmethod + def from_dict(cls, d: dict) -> EvidenceSource: + """Reconstruct from a dict, ignoring unknown keys (forward-compatible).""" + known = set(cls.__dataclass_fields__) + return cls(**{k: v for k, v in d.items() if k in known}) + + def summary(self) -> str: + loc = ( + f"@{self.partition_offset}" + if self.access_mode == "image" and self.partition_offset + else "" + ) + owner = f" owner={self.owner}" if self.owner else "" + return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}" + + +@dataclass +class Case: + """A forensic case: a set of evidence sources plus metadata.""" + + case_id: str + name: str + sources: list[EvidenceSource] = field(default_factory=list) + meta: dict = field(default_factory=dict) + + def to_dict(self) -> dict: + return { + "case_id": self.case_id, + "name": self.name, + "sources": [s.to_dict() for s in self.sources], + "meta": dict(self.meta), + } + + @classmethod + def from_dict(cls, d: dict) -> Case: + return cls( + case_id=d.get("case_id", ""), + name=d.get("name", ""), + sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])], + meta=d.get("meta", {}), + ) + + def get_source(self, source_id: 
str) -> EvidenceSource | None: + for s in self.sources: + if s.id == source_id: + return s + return None + + +# --------------------------------------------------------------------------- +# case.yaml loading +# --------------------------------------------------------------------------- + +def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource: + """Validate and normalise one source entry from case.yaml. + + Missing ``id`` is derived from the label; missing ``access_mode`` defaults + by type; relative paths are resolved against *base_dir* (the case file's + directory). Keys present but left empty in YAML (null) are treated as missing. + """ + label = str(raw.get("label") or raw.get("id") or f"source-{index}") + src_type = str(raw.get("type", "disk_image")) + if src_type not in SOURCE_TYPES: + logger.warning("Unknown source type %r for %r — treating as disk_image", + src_type, label) + src_type = "disk_image" + + access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree")) + if access_mode not in ACCESS_MODES: + logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label) + access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree") + + src_id = str(raw.get("id") or f"src-{slugify(label)}") + if not src_id.startswith("src-"): + src_id = f"src-{slugify(src_id)}" + + raw_path = str(raw.get("path") or "").strip() + path = raw_path + if raw_path: + p = Path(raw_path).expanduser() + if not p.is_absolute(): + p = (base_dir / p) + path = str(p) + + return EvidenceSource( + id=src_id, + label=label, + type=src_type, + path=path, + access_mode=access_mode, + owner=str(raw.get("owner") or ""), + partition_offset=int(raw.get("partition_offset", 0) or 0), + meta=dict(raw.get("meta") or {}), + ) + + +def build_case(data: dict, base_dir: Path | None = None) -> Case: + """Build a validated :class:`Case` from a loosely-typed case.yaml dict.""" + base_dir = base_dir or Path.cwd() + sources: list[EvidenceSource] = [] + seen_ids: set[str] = set() + for i, raw in enumerate(data.get("sources", 
[]) or []): + if not isinstance(raw, dict): + logger.warning("Skipping malformed source entry #%d", i) + continue + src = _build_source(raw, base_dir, i) + if src.id in seen_ids: + src.id = f"{src.id}-{i}" + seen_ids.add(src.id) + if not src.path: + logger.warning("Source %r has no path — keeping but it is not analysable", + src.label) + sources.append(src) + + return Case( + case_id=str(data.get("case_id", "case")), + name=str(data.get("name", "Untitled case")), + sources=sources, + meta=dict(data.get("meta") or {}), + ) + + +def load_case(path: str | Path = "case.yaml") -> Case | None: + """Load a :class:`Case` from a case.yaml file. Returns None if absent.""" + case_path = Path(path) + if not case_path.exists(): + return None + import yaml + + try: + data = yaml.safe_load(case_path.read_text(encoding="utf-8")) or {} + except Exception as e: + logger.error("Failed to parse %s: %s", case_path, e) + return None + if not isinstance(data, dict): + logger.error("%s is not a YAML mapping", case_path) + return None + + case = build_case(data, base_dir=case_path.resolve().parent) + logger.info("Loaded case %r with %d source(s) from %s", + case.name, len(case.sources), case_path) + return case + + +def single_source_case( + image_path: str, + partition_offset: int = 0, + label: str | None = None, +) -> Case: + """Wrap a single disk image as a one-source Case (interactive fallback).""" + name = label or Path(image_path).name + src = EvidenceSource( + id=f"src-{slugify(Path(image_path).stem)}", + label=name, + type="disk_image", + path=image_path, + access_mode="image", + partition_offset=partition_offset, + ) + return Case(case_id="adhoc", name=name, sources=[src]) diff --git a/evidence_graph.py b/evidence_graph.py index c1f9999..4735b8d 100644 --- a/evidence_graph.py +++ b/evidence_graph.py @@ -8,35 +8,180 @@ Edges: typed relationships with predefined weights for hypothesis confidence com from __future__ import annotations import asyncio +import contextvars +import hashlib import json 
import logging +import re import uuid from dataclasses import asdict, dataclass, field from datetime import datetime from pathlib import Path +# Per-asyncio-task scoped values for "which agent is currently running" and +# "which task scope does that agent's grounding live in". Backed by +# ContextVars so concurrent agent runs (Phase 3's _dispatch_leads_parallel) +# don't clobber each other — asyncio.create_task / asyncio.gather copies +# the parent context per child task, and writes inside one task stay there. +# Pre-P0 these were plain attributes on EvidenceGraph; the last setter won +# under concurrency, tagging tool invocations with the WRONG agent and +# making the grounding gateway falsely reject legitimate facts. +_current_agent_ctx: contextvars.ContextVar[str] = contextvars.ContextVar( + "masforensics_current_agent", default="", +) +_current_task_id_ctx: contextvars.ContextVar[str] = contextvars.ContextVar( + "masforensics_current_task_id", default="", +) + +from case import Case, EvidenceSource, single_source_case + logger = logging.getLogger(__name__) # --------------------------------------------------------------------------- -# Default edge weights for Phenomenon → Hypothesis relationships. -# LLM only picks the edge type (categorical); the weight is looked up here. -# Override per-graph via EvidenceGraph(edge_weights=...) or config.yaml's -# `hypothesis_edge_weights` section. +# Per-edge-type log₁₀(LR) — the calibration table backing hypothesis +# confidence updates (DESIGN.md §4.5). +# +# The LLM only picks the *category* (direct_evidence, supports, …); the +# numerical contribution is looked up here. Updates use the additive, +# order-independent log-odds form +# L_post = L_prior + Σ log10(LR_i) +# confidence = 1 / (1 + 10^(−L_post)) +# which fixes the pre-S3 delta-update bug whose result depended on the +# order edges arrived in. +# +# Override per-graph via EvidenceGraph(edge_log_lr=...) or config.yaml's +# `hypothesis_log_lr` section. 
# --------------------------------------------------------------------------- -_DEFAULT_EDGE_WEIGHTS: dict[str, float] = { - "direct_evidence": +0.25, - "supports": +0.15, - "prerequisite_met": +0.10, - "consequence_observed": +0.15, - "contradicts": -0.20, - "weakens": -0.10, +_DEFAULT_LOG_LR: dict[str, float] = { + "direct_evidence": +2.0, + "supports": +1.0, + "consequence_observed": +1.0, + "prerequisite_met": +0.5, + "weakens": -0.5, + "contradicts": -2.0, + # S5 cross-source coref (DESIGN.md §4.6) — same calibration scale. + # A single shared strong identifier (email, phone, wallet, IMEI…) is + # near-decisive; weak identifiers (nickname) accumulate slowly; a + # conflicting strong identifier is strong negative evidence. + "shared_strong_identifier": +2.0, + "shared_weak_identifier": +0.5, + "conflicting_strong_identifier": -2.0, } + +# DESIGN.md §4.6 identifier taxonomy. Strong identifiers approximate +# global uniqueness — sharing one is high-confidence coref evidence. +# Weak identifiers are nicknames / display names — accumulate via Bayes. +STRONG_IDENTIFIER_TYPES: set[str] = { + "email", + "phone_number", + "imei", + "imsi", + "apple_id", + "icloud_id", + "google_account", + "wallet_address", + "udid", + "mac_address", + "device_serial", +} + +WEAK_IDENTIFIER_TYPES: set[str] = { + "nickname", + "display_name", + "username", + "screen_name", +} + + +def is_strong_identifier(identifier_type: str) -> bool: + """True if the identifier carries enough uniqueness for a strong LR edge.""" + return identifier_type in STRONG_IDENTIFIER_TYPES + + +def _normalize_identifier(identifier_type: str, value: str) -> str: + """Canonicalise an identifier value so trivial spelling variants match. + + - Lowercase for case-insensitive identifiers (email, hostnames, hex). + - Strip whitespace and the leading '+' on phone numbers / international + dialling, then keep only digits for phone matching. + - Pass-through for free-form strings (nicknames). 
+ """ + v = (value or "").strip() + if identifier_type in {"email", "apple_id", "icloud_id", "google_account", + "mac_address", "wallet_address", "udid", + "imei", "imsi", "device_serial"}: + v = v.lower() + if identifier_type == "phone_number": + import re as _re + v = _re.sub(r"\D", "", v) + return v + + +def prob_to_log_odds(p: float) -> float: + """Logit (base 10). Clipped to keep ±∞ out of the graph.""" + p = max(1e-9, min(1 - 1e-9, float(p))) + import math + return math.log10(p / (1.0 - p)) + + +def log_odds_to_prob(log_odds: float) -> float: + """Inverse of :func:`prob_to_log_odds`: 1 / (1 + 10^(−L)).""" + return 1.0 / (1.0 + 10.0 ** (-float(log_odds))) + + +_WS_RUN = re.compile(r"\s+") + + +def _normalize_for_grounding(s: str) -> str: + """Canonicalise a string for the loose-match branch of fact grounding. + + Strict ``value in inv.output`` rejected real evidence because the LLM + routinely normalises tool output before quoting: + - case-folds hex (``89 50 4e 47`` → ``89 50 4E 47``) + - flips path separators (``Sunny\\foo.exe`` → ``Sunny/foo.exe``) + - collapses whitespace across newlines (``AppleID:\\n alice@x.com`` + → ``AppleID: alice@x.com``) + + None of those are hallucinations — they're presentation choices. This + normaliser does the inverse so both sides line up: + - lowercase everything (handles hex case + email case + MAC case) + - collapse any run of whitespace to a single space + - replace ``\\`` with ``/`` (path-sep flip) + + Genuine fabrications still fail: a value that doesn't appear (in any + form) inside the output normalises to a string that isn't a substring + of the normalised output, and the gateway rejects exactly as before. 
+ """ + if not s: + return "" + s = s.lower().replace("\\", "/") + s = _WS_RUN.sub(" ", s) + return s.strip() + +class GroundingError(ValueError): + """Raised by the add_phenomenon gateway when one or more verified_facts + fail the grounding check (missing/wrong invocation_id, wrong agent or + task, or fact.value not present in the cited tool output). + + Carries the failed facts so callers (BaseAgent) can format them back + to the LLM for a corrective retry. + """ + + def __init__(self, message: str, failures: list[dict]) -> None: + super().__init__(message) + self.failures = failures + + # All valid edge types across the graph. VALID_EDGE_TYPES: set[str] = { # Phenomenon → Hypothesis "direct_evidence", "supports", "prerequisite_met", "consequence_observed", "contradicts", "weakens", + # Phenomenon → Hypothesis (S5 coref-specific — used between identifier + # observation phenomena and the "Entity A ≡ Entity B" coref hypothesis) + "shared_strong_identifier", "shared_weak_identifier", + "conflicting_strong_identifier", # Phenomenon → Phenomenon "temporal", "causal", "input_to", "modifies", "co_located", "corroborates", # Phenomenon → Entity @@ -44,6 +189,8 @@ VALID_EDGE_TYPES: set[str] = { "associated_with", "found_on", "used_by", # Hypothesis → Hypothesis "refines", "conflicts", "depends_on", + # Entity → Entity (S5 — backed by a coref hypothesis ≥ threshold) + "same_as", } @@ -55,22 +202,30 @@ def _compute_quality_score( source_tool: str, timestamp: str | None, raw_data: dict, - description: str, + interpretation: str, + verified_facts: list[dict], related_ids: list[str], ) -> float: - """Compute a quality score (0.0-1.0) based on evidence completeness.""" + """Compute a quality score (0.0-1.0) based on evidence completeness. + + A grounded phenomenon (any verified_facts) outweighs a long free-text + interpretation: the facts carry provenance, the interpretation doesn't. 
+ """ score = 0.0 if source_tool: - score += 0.25 - if timestamp is not None: score += 0.20 - if raw_data: - score += 0.25 - if len(description) >= 50: + if timestamp is not None: score += 0.15 + if raw_data: + score += 0.15 + if verified_facts: + # Capped contribution: 0.05 per fact up to 0.25. + score += min(0.25, 0.05 * len(verified_facts)) + if len(interpretation) >= 50: + score += 0.10 if related_ids: score += 0.15 - return score + return min(1.0, score) def _jaccard_similarity(a: str, b: str) -> float: @@ -84,17 +239,29 @@ def _jaccard_similarity(a: str, b: str) -> float: @dataclass class Phenomenon: - """Raw observable artifact found on disk.""" + """Raw observable artifact found on disk. + + DESIGN.md §4.4: a phenomenon is split into provenance-bound *facts* + and free-text *interpretation*. The gateway hard-validates every + fact against the recorded tool invocation it cites; interpretation + is the agent's narrative and is rendered as "agent analysis" in the + final report — not as truth. + """ id: str # "ph-{uuid8}" source_agent: str category: str # filesystem, registry, email, network, timeline title: str - description: str + # Free-form analysis text — the agent's reasoning. NOT verified. + interpretation: str = "" + # Grounded atoms. Each fact: {type, value, invocation_id}. + # type ∈ {path, timestamp, inode, hash, identifier, count, raw, ...} + verified_facts: list[dict] = field(default_factory=list) raw_data: dict = field(default_factory=dict) timestamp: str | None = None confidence: float = 1.0 source_tool: str = "" + source_id: str = "" # id of the EvidenceSource this finding came from corroborating_agents: list[str] = field(default_factory=list) from_lead_id: str | None = None created_at: str = "" @@ -104,46 +271,98 @@ class Phenomenon: @classmethod def from_dict(cls, d: dict) -> Phenomenon: - return cls(**d) + """Reconstruct from a dict; migrate legacy ``description`` field. 
+ + Older runs persisted free text in ``description``; treat that as + ``interpretation`` so old graph_state.json files keep loading. + """ + d = dict(d) + if "description" in d: + legacy = d.pop("description") + d.setdefault("interpretation", legacy or "") + d.setdefault("verified_facts", []) + known = set(cls.__dataclass_fields__) + return cls(**{k: v for k, v in d.items() if k in known}) def summary(self) -> str: ts = f" @ {self.timestamp}" if self.timestamp else "" - return f"[{self.id}] [{self.category}] {self.title}{ts} (conf={self.confidence:.2f})" + nf = len(self.verified_facts) + facts_note = f" facts={nf}" if nf else "" + return ( + f"[{self.id}] [{self.category}] {self.title}{ts} " + f"(conf={self.confidence:.2f}{facts_note})" + ) @dataclass class Hypothesis: - """Interpretive claim about what happened on the system.""" + """Interpretive claim about what happened on the system. + + Confidence is a *derived* projection of ``log_odds`` (DESIGN.md §4.5): + every Phenomenon→Hypothesis edge contributes log₁₀(LR) to ``log_odds``, + and ``confidence = 1 / (1 + 10^(−log_odds))``. ``log_odds`` is the + canonical state; ``confidence`` is kept in sync for display and + threshold checks (≥0.8 supported / ≤0.2 refuted). + + ``prior_prob`` seeds the starting log_odds (default 0.5 → 0.0). + """ id: str # "hyp-{uuid8}" title: str description: str - confidence: float = 0.5 + prior_prob: float = 0.5 + log_odds: float = 0.0 + confidence: float = 0.5 # derived from log_odds — kept in sync on update status: str = "active" # active, supported, refuted, inconclusive parent_id: str | None = None created_by: str = "" # "manual", "hypothesis_agent", agent name created_at: str = "" confidence_log: list[dict] = field(default_factory=list) + # S5 coref-specific: pair of entity ids this hypothesis claims are the + # same actor. Lets update_hypothesis_confidence sync the backing + # ``same_as`` edge automatically when contradicting evidence arrives. 
+ coref_entity_pair: list[str] = field(default_factory=list) def to_dict(self) -> dict: return asdict(self) @classmethod def from_dict(cls, d: dict) -> Hypothesis: - return cls(**d) + """Reconstruct from a dict. Migrates pre-S3 records that only had + ``confidence`` by deriving ``log_odds`` via the logit transform. + """ + d = dict(d) + if "log_odds" not in d: + d["log_odds"] = prob_to_log_odds(d.get("confidence", 0.5)) + d.setdefault("prior_prob", 0.5) + # Re-sync confidence from log_odds in case of drift in old files. + d["confidence"] = log_odds_to_prob(d["log_odds"]) + known = set(cls.__dataclass_fields__) + return cls(**{k: v for k, v in d.items() if k in known}) def summary(self) -> str: - return f"[{self.id}] {self.title} (conf={self.confidence:.2f}, {self.status})" + return ( + f"[{self.id}] {self.title} " + f"(conf={self.confidence:.2f}, L={self.log_odds:+.2f}, {self.status})" + ) @dataclass class Entity: - """Recurring actor or object across phenomena.""" + """Recurring actor or object across phenomena. + + DESIGN.md §4.6 attaches typed identifiers directly to the entity for + fast blocking lookups during coref. Each identifier entry: + {type, value, normalized, strong, invocation_id, phenomenon_id, observed_at} + where ``normalized`` is the canonicalised form used for matching + (lower-cased email, digits-only phone, …). 
+ """ id: str # "ent-{uuid8}" name: str entity_type: str # person, program, file, host, ip_address description: str = "" + identifiers: list[dict] = field(default_factory=list) created_at: str = "" def to_dict(self) -> dict: @@ -151,10 +370,29 @@ class Entity: @classmethod def from_dict(cls, d: dict) -> Entity: - return cls(**d) + d = dict(d) + d.setdefault("identifiers", []) + known = set(cls.__dataclass_fields__) + return cls(**{k: v for k, v in d.items() if k in known}) + + def has_identifier(self, identifier_type: str, normalized_value: str) -> bool: + return any( + i.get("type") == identifier_type + and i.get("normalized") == normalized_value + for i in self.identifiers + ) def summary(self) -> str: - return f"[{self.id}] {self.entity_type}: {self.name}" + idents = "" + if self.identifiers: + top = self.identifiers[:3] + preview = ", ".join(f"{i.get('type')}={i.get('value')}" for i in top) + extra = ( + f" (+{len(self.identifiers) - 3} more)" + if len(self.identifiers) > 3 else "" + ) + idents = f" [{preview}{extra}]" + return f"[{self.id}] {self.entity_type}: {self.name}{idents}" @dataclass @@ -261,6 +499,41 @@ class ExtractedAsset: ) +@dataclass +class ToolInvocation: + """One recorded tool call — the provenance unit for grounded facts. + + Every wrapped tool executor records a ToolInvocation when it runs. The + grounding gateway looks these up by id when validating that a fact in + an ``add_phenomenon`` call traces back to a real tool output. Persisted + with the graph so a re-loaded run can still verify provenance. 
+ """ + + id: str # "inv-{uuid8}" + tool: str # tool name as registered in TOOL_CATALOG + args: dict # kwargs passed to the executor + output: str # the raw output string the tool produced + output_sha256: str # hexdigest — tamper-evident hash of output + agent: str # agent that issued the call + task_id: str # agent run scope (graph._current_task_id at call time) + source_id: str # active evidence source at call time + created_at: str # ISO timestamp + cached: bool = False # served from result cache without re-running + + def to_dict(self) -> dict: + return asdict(self) + + @classmethod + def from_dict(cls, d: dict) -> ToolInvocation: + return cls(**d) + + def summary(self) -> str: + return ( + f"[{self.id}] {self.tool}({json.dumps(self.args, ensure_ascii=False)}) " + f"@{self.created_at} agent={self.agent} cached={self.cached}" + ) + + # --------------------------------------------------------------------------- # Evidence Graph # --------------------------------------------------------------------------- @@ -277,16 +550,25 @@ class EvidenceGraph: self, case_info: dict | None = None, persist_path: Path | None = None, - edge_weights: dict[str, float] | None = None, + edge_log_lr: dict[str, float] | None = None, ) -> None: self.case_info: dict = case_info or {} - self.edge_weights: dict[str, float] = ( - dict(edge_weights) if edge_weights else dict(_DEFAULT_EDGE_WEIGHTS) + # log₁₀(LR) per edge type — calibration table for confidence updates. + # Renamed from edge_weights (S3): the values are no longer deltas in + # confidence space, they are log-likelihood ratios in odds space. + self.edge_log_lr: dict[str, float] = ( + dict(edge_log_lr) if edge_log_lr else dict(_DEFAULT_LOG_LR) ) self.image_path: str = "" self.partition_offset: int = 0 self.extracted_dir: str = "extracted" + # Multi-evidence: the case and the source tools/phenomena bind to. 
+ # image_path / partition_offset above mirror active_source for + # backward-compatible readers; set_active_source keeps them in sync. + self.case: Case | None = None + self.active_source: EvidenceSource | None = None + # Graph storage self.phenomena: dict[str, Phenomenon] = {} self.hypotheses: dict[str, Hypothesis] = {} @@ -310,12 +592,43 @@ class EvidenceGraph: # gap-analysis coverage check. self.investigation_areas: dict[str, InvestigationArea] = {} - # Set by BaseAgent.run() before each agent execution - self._current_agent: str = "" + # Tool invocations — provenance log for grounded facts. Every wrapped + # tool executor records one entry; add_phenomenon's grounding gateway + # looks them up to validate cited invocation_ids and substring-match + # claimed fact values against real tool outputs. + self.tool_invocations: dict[str, ToolInvocation] = {} + + # _current_agent / _current_task_id are exposed as @property below, + # backed by module-level ContextVars (race-free under asyncio.gather). self._lock = asyncio.Lock() self._persist_path: Path | None = persist_path + # ---- Per-asyncio-task scoped state --------------------------------------- + # + # Reads/writes through these properties hit ContextVars rather than + # instance attributes. Concurrent agent runs (Phase 3 parallel + # dispatch) each have their own task-local context, so writes inside + # one agent's run() are invisible to siblings — which means + # ``record_tool_invocation`` always tags an invocation with the agent + # and task scope that actually issued it. 
+ + @property + def _current_agent(self) -> str: + return _current_agent_ctx.get() + + @_current_agent.setter + def _current_agent(self, value: str) -> None: + _current_agent_ctx.set(value or "") + + @property + def _current_task_id(self) -> str: + return _current_task_id_ctx.get() + + @_current_task_id.setter + def _current_task_id(self, value: str) -> None: + _current_task_id_ctx.set(value or "") + # ---- Persistence ------------------------------------------------------- def _auto_save(self) -> None: @@ -325,6 +638,10 @@ class EvidenceGraph: try: state = { "case_info": self.case_info, + "case": self.case.to_dict() if self.case else None, + "active_source_id": ( + self.active_source.id if self.active_source else "" + ), "image_path": self.image_path, "partition_offset": self.partition_offset, "extracted_dir": self.extracted_dir, @@ -338,6 +655,9 @@ class EvidenceGraph: "investigation_areas": { aid: a.to_dict() for aid, a in self.investigation_areas.items() }, + "tool_invocations": { + iid: inv.to_dict() for iid, inv in self.tool_invocations.items() + }, "saved_at": datetime.now().isoformat(), } tmp = self._persist_path.with_suffix(".tmp") @@ -357,18 +677,32 @@ class EvidenceGraph: def load_state( cls, path: Path, - edge_weights: dict[str, float] | None = None, + edge_log_lr: dict[str, float] | None = None, ) -> EvidenceGraph: """Restore an EvidenceGraph from a saved JSON state file.""" data = json.loads(path.read_text()) graph = cls( case_info=data.get("case_info", {}), persist_path=path, - edge_weights=edge_weights, + edge_log_lr=edge_log_lr, ) graph.image_path = data.get("image_path", "") graph.partition_offset = data.get("partition_offset", 0) graph.extracted_dir = data.get("extracted_dir", "extracted") + + # Restore the evidence-source model. State files predating the Case + # model carry only image_path/partition_offset → wrap as one source. 
+ case_data = data.get("case") + if case_data: + graph.case = Case.from_dict(case_data) + elif graph.image_path: + graph.case = single_source_case( + graph.image_path, graph.partition_offset + ) + if graph.case and graph.case.sources: + active = graph.case.get_source(data.get("active_source_id", "")) + graph.set_active_source(active or graph.case.sources[0]) + graph.phenomena = { pid: Phenomenon.from_dict(p) for pid, p in data.get("phenomena", {}).items() @@ -392,6 +726,10 @@ class EvidenceGraph: aid: InvestigationArea.from_dict(a) for aid, a in data.get("investigation_areas", {}).items() } + graph.tool_invocations = { + iid: ToolInvocation.from_dict(inv) + for iid, inv in data.get("tool_invocations", {}).items() + } graph._rebuild_adjacency() logger.info( "EvidenceGraph restored: %d phenomena, %d hypotheses, %d entities, " @@ -409,6 +747,21 @@ class EvidenceGraph: self._adj.setdefault(edge.source_id, []).append(edge) self._adj_rev.setdefault(edge.target_id, []).append(edge) + # ---- Evidence source ---------------------------------------------------- + + def set_active_source(self, source: EvidenceSource | None) -> None: + """Bind tools and newly recorded phenomena to *source*. + + Syncs the legacy image_path / partition_offset fields so existing + readers (orchestrator logs, report naming, agent prompts) keep + working unchanged. The orchestrator calls this before dispatching an + agent; single-source runs call it once at startup. 
+ """ + self.active_source = source + if source is not None: + self.image_path = source.path + self.partition_offset = source.partition_offset + # ---- Node helpers ------------------------------------------------------- def _node_exists(self, node_id: str) -> bool: @@ -432,7 +785,7 @@ class EvidenceGraph: # ---- Similarity merging (Phenomenon only) -------------------------------- def _find_similar_phenomenon( - self, title: str, description: str, category: str, + self, title: str, interpretation: str, category: str, ) -> Phenomenon | None: best_match: Phenomenon | None = None best_score = 0.0 @@ -442,7 +795,9 @@ class EvidenceGraph: title_sim = _jaccard_similarity(ph.title, title) if title_sim <= 0.6: continue - desc_sim = _jaccard_similarity(ph.description[:200], description[:200]) + desc_sim = _jaccard_similarity( + ph.interpretation[:200], interpretation[:200], + ) if desc_sim <= 0.4: continue combined = title_sim * 0.6 + desc_sim * 0.4 @@ -458,19 +813,54 @@ class EvidenceGraph: source_agent: str, category: str, title: str, - description: str, + interpretation: str = "", + verified_facts: list[dict] | None = None, raw_data: dict | None = None, timestamp: str | None = None, source_tool: str = "", from_lead_id: str | None = None, + task_id: str | None = None, + # Pre-S2 callers passed analysis text as ``description``. Accept it + # as an alias for ``interpretation`` so legacy tests and any in-flight + # tool-call messages don't break. Not advertised in the LLM-facing + # tool schema — BaseAgent's add_phenomenon advertises the new fields. + description: str | None = None, ) -> tuple[str, bool]: - """Add a phenomenon. Returns (id, was_merged). + """Add a phenomenon under the grounding gateway. Returns (id, was_merged). - Confidence is auto-computed from evidence completeness (source_tool, - timestamp, raw_data, description length). 
+ Each fact in ``verified_facts`` must point at a real ToolInvocation + made by this agent within ``task_id`` (defaults to the graph's + current task scope). Any fact failing grounding raises + :class:`GroundingError` — the whole call is rejected; the caller + must fix and retry. This is the code-level enforcement of + DESIGN.md §4.4. """ + if description and not interpretation: + interpretation = description + facts = list(verified_facts or []) + active_task_id = task_id if task_id is not None else self._current_task_id + + # Grounding gateway — validate every fact BEFORE acquiring the lock + # (read-only check; lookup uses dict access which is thread-safe). + failures: list[dict] = [] + for fact in facts: + ok, reason = self.validate_fact_grounding( + fact, agent=source_agent, task_id=active_task_id or "", + ) + if not ok: + failures.append({"fact": fact, "reason": reason}) + if failures: + raise GroundingError( + "Phenomenon rejected — one or more facts are not grounded:\n" + + "\n".join( + f" - {f['reason']}: {json.dumps(f['fact'], ensure_ascii=False)}" + for f in failures + ), + failures=failures, + ) + async with self._lock: - similar = self._find_similar_phenomenon(title, description, category) + similar = self._find_similar_phenomenon(title, interpretation, category) if similar is not None: similar.confidence = min(1.0, similar.confidence + 0.15) if source_agent not in similar.corroborating_agents: @@ -479,6 +869,18 @@ class EvidenceGraph: for k, v in raw_data.items(): if k not in similar.raw_data: similar.raw_data[k] = v + # Merge any new facts whose (type, value, invocation_id) + # tuple isn't already on the existing phenomenon. 
+ if facts: + seen = { + (f.get("type"), f.get("value"), f.get("invocation_id")) + for f in similar.verified_facts + } + for f in facts: + key = (f.get("type"), f.get("value"), f.get("invocation_id")) + if key not in seen: + similar.verified_facts.append(f) + seen.add(key) if from_lead_id and similar.from_lead_id is None: similar.from_lead_id = from_lead_id self._auto_save() @@ -487,18 +889,20 @@ class EvidenceGraph: pid = f"ph-{uuid.uuid4().hex[:8]}" confidence = _compute_quality_score( source_tool, timestamp, raw_data or {}, - description, [], + interpretation, facts, [], ) ph = Phenomenon( id=pid, source_agent=source_agent, category=category, title=title, - description=description, + interpretation=interpretation, + verified_facts=facts, raw_data=raw_data or {}, timestamp=timestamp, confidence=confidence, source_tool=source_tool, + source_id=self.active_source.id if self.active_source else "", from_lead_id=from_lead_id, created_at=datetime.now().isoformat(), ) @@ -512,15 +916,24 @@ class EvidenceGraph: description: str, created_by: str = "", parent_id: str | None = None, + prior_prob: float = 0.5, ) -> str: - """Add a hypothesis. Returns the hypothesis ID.""" + """Add a hypothesis. Returns the hypothesis ID. + + ``prior_prob`` seeds the starting log_odds (default 0.5 → 0.0). + Pick a different prior when you have base-rate knowledge — e.g. + prior_prob=0.1 for an unusual claim, 0.9 for a strong default. + """ async with self._lock: hid = f"hyp-{uuid.uuid4().hex[:8]}" + l_prior = prob_to_log_odds(prior_prob) hyp = Hypothesis( id=hid, title=title, description=description, - confidence=0.5, + prior_prob=prior_prob, + log_odds=l_prior, + confidence=log_odds_to_prob(l_prior), status="active", parent_id=parent_id, created_by=created_by, @@ -593,16 +1006,25 @@ class EvidenceGraph: edge_type: str, reason: str = "", ) -> float: - """Update hypothesis confidence based on a phenomenon linkage. + """Apply one phenomenon→hypothesis edge as an additive log_odds update. 
- The edge_type must be one of self.edge_weights keys. - Weight is looked up from the configured table, NOT judged by LLM. - Returns the new confidence value. + DESIGN.md §4.5: edge_type → log₁₀(LR) is looked up in + ``self.edge_log_lr`` (LLM never emits the number). The update is + ``L_post = L_prior + log_lr`` and ``confidence = 1 / (1 + 10^(−L_post))`` + — commutative and order-independent, fixing the pre-S3 ordering + bug. Status flips at ≥0.8 → supported / ≤0.2 → refuted. + + **Idempotency**: if a ``(phenomenon, hypothesis, edge_type)`` edge + already exists, this is a no-op — the same agent re-recording the + same link (or two agents linking via the orchestrator's batch + judge and a manual override) does not double-count. Independent + evidence — *different* phenomena pointing the same way — still + accumulates fully. """ - if edge_type not in self.edge_weights: + if edge_type not in self.edge_log_lr: raise ValueError( f"Invalid hypothesis edge type: {edge_type}. " - f"Must be one of: {list(self.edge_weights.keys())}" + f"Must be one of: {list(self.edge_log_lr.keys())}" ) async with self._lock: @@ -612,27 +1034,37 @@ class EvidenceGraph: if hyp is None: raise ValueError(f"Hypothesis not found: {hyp_id}") - weight = self.edge_weights[edge_type] + # Idempotency check — same (ph, hyp, edge_type) already on graph. 
+ for existing in self._adj.get(phenomenon_id, []): + if ( + existing.target_id == hyp_id + and existing.edge_type == edge_type + ): + return hyp.confidence + + log_lr = self.edge_log_lr[edge_type] + old_log_odds = hyp.log_odds old_conf = hyp.confidence + new_log_odds = old_log_odds + log_lr + new_conf = log_odds_to_prob(new_log_odds) - if weight > 0: - delta = weight * (1 - old_conf) - else: - delta = weight * old_conf - - new_conf = max(0.0, min(1.0, old_conf + delta)) + hyp.log_odds = new_log_odds hyp.confidence = new_conf if new_conf >= 0.8: hyp.status = "supported" elif new_conf <= 0.2: hyp.status = "refuted" + else: + hyp.status = "active" hyp.confidence_log.append({ "timestamp": datetime.now().isoformat(), "phenomenon_id": phenomenon_id, "edge_type": edge_type, - "weight": weight, + "log_lr": log_lr, + "old_log_odds": round(old_log_odds, 4), + "new_log_odds": round(new_log_odds, 4), "old_confidence": round(old_conf, 4), "new_confidence": round(new_conf, 4), "reason": reason, @@ -645,7 +1077,7 @@ class EvidenceGraph: source_id=phenomenon_id, target_id=hyp_id, edge_type=edge_type, - metadata={"reason": reason}, + metadata={"reason": reason, "log_lr": log_lr}, created_by="hypothesis_engine", created_at=datetime.now().isoformat(), ) @@ -654,7 +1086,381 @@ class EvidenceGraph: self._adj_rev.setdefault(hyp_id, []).append(edge) self._auto_save() - return new_conf + + # If this is a coref hypothesis, mirror the new confidence into the + # entity-level same_as edge. Done OUTSIDE the lock — _sync_same_as_edge + # re-acquires it internally — so we avoid reentrant locking. 
+ if hyp.coref_entity_pair and len(hyp.coref_entity_pair) == 2: + await self._sync_same_as_edge( + hyp.coref_entity_pair[0], + hyp.coref_entity_pair[1], + hyp_id, + ) + return new_conf + + # ---- Cross-source entity coreference (DESIGN.md §4.6) ------------------- + + @staticmethod + def _coref_hypothesis_id(eid_a: str, eid_b: str) -> str: + """Deterministic id for the coref hypothesis between an entity pair. + + Same pair (regardless of arg order) always maps to the same id so + repeated observations augment the existing hypothesis rather than + spawning duplicates. + """ + pair = "|".join(sorted([eid_a, eid_b])) + return f"hyp-coref-{hashlib.sha256(pair.encode()).hexdigest()[:10]}" + + async def get_or_create_coref_hypothesis( + self, eid_a: str, eid_b: str, + ) -> tuple[str, bool]: + """Look up (or insert) the coreference hypothesis for an entity pair. + + Uses a low prior (``prior_prob=0.1``) — saying any two entities are + the same actor is a strong claim, so the default should be + skeptical and let evidence move the needle. + """ + hid = self._coref_hypothesis_id(eid_a, eid_b) + async with self._lock: + if hid in self.hypotheses: + return hid, False + ea = self.entities.get(eid_a) + eb = self.entities.get(eid_b) + if ea is None or eb is None: + raise ValueError(f"Unknown entity in coref pair: {eid_a}, {eid_b}") + l_prior = prob_to_log_odds(0.1) + self.hypotheses[hid] = Hypothesis( + id=hid, + title=f"Coreference: {ea.name} ≡ {eb.name}", + description=( + f"Hypothesis that {ea.id} ({ea.name}, {ea.entity_type}) " + f"and {eb.id} ({eb.name}, {eb.entity_type}) refer to " + f"the same actor across evidence sources." 
+ ), + prior_prob=0.1, + log_odds=l_prior, + confidence=log_odds_to_prob(l_prior), + status="active", + created_by="coref_engine", + created_at=datetime.now().isoformat(), + coref_entity_pair=sorted([eid_a, eid_b]), + ) + self._auto_save() + return hid, True + + async def _sync_same_as_edge( + self, eid_a: str, eid_b: str, hyp_id: str, + ) -> None: + """Mirror coref hypothesis confidence into a ``same_as`` entity edge. + + - Confidence ≥ 0.8 → ensure an active ``same_as`` edge exists. + - Confidence < 0.8 → mark any existing edge inactive (audit, not delete). + Idempotent on both transitions. + """ + hyp = self.hypotheses.get(hyp_id) + if hyp is None: + return + active = hyp.confidence >= 0.8 + async with self._lock: + existing = None + for edge in self.edges: + if (edge.edge_type == "same_as" + and {edge.source_id, edge.target_id} == {eid_a, eid_b}): + existing = edge + break + if active: + if existing is None: + eid = f"edge-{uuid.uuid4().hex[:8]}" + edge = Edge( + id=eid, + source_id=eid_a, + target_id=eid_b, + edge_type="same_as", + metadata={ + "backed_by": hyp_id, + "active": True, + "confidence_at_creation": hyp.confidence, + }, + created_by="coref_engine", + created_at=datetime.now().isoformat(), + ) + self.edges.append(edge) + self._adj.setdefault(eid_a, []).append(edge) + self._adj_rev.setdefault(eid_b, []).append(edge) + elif not existing.metadata.get("active"): + existing.metadata["active"] = True + existing.metadata["reactivated_at"] = datetime.now().isoformat() + else: + if existing is not None and existing.metadata.get("active"): + existing.metadata["active"] = False + existing.metadata["deactivated_at"] = datetime.now().isoformat() + self._auto_save() + + async def observe_identity( + self, + entity_name: str, + entity_type: str, + identifier_type: str, + value: str, + source_agent: str, + invocation_id: str, + source_tool: str = "", + task_id: str | None = None, + ) -> dict: + """Record a typed identifier for an entity through the grounding 
gateway. + + DESIGN.md §4.6. Steps: + + 1. Validate ``invocation_id`` + ``value`` via the same gateway + ``add_phenomenon`` uses (raises :class:`GroundingError` on failure). + 2. Get-or-create the entity. + 3. Record an ``identity_observation`` phenomenon carrying the + identifier as its sole verified fact. + 4. Attach the identifier to the entity (idempotent by + ``(type, normalized_value)``). + 5. If the attachment is new, scan other entities for shared + identifiers (strong / weak) and any conflicting strong + identifiers, then propose / strengthen / weaken the coref + hypothesis between each candidate pair. ``same_as`` edges are + kept in sync with the hypothesis confidence. + + Returns a dict summarising the entity id, observation phenomenon, + whether the identifier was new, and any coref proposals fired. + """ + if identifier_type not in (STRONG_IDENTIFIER_TYPES | WEAK_IDENTIFIER_TYPES): + raise ValueError( + f"Unknown identifier_type: {identifier_type}. " + f"Strong: {sorted(STRONG_IDENTIFIER_TYPES)}; " + f"Weak: {sorted(WEAK_IDENTIFIER_TYPES)}." + ) + if not value: + raise ValueError("identifier value must be non-empty") + + # add_phenomenon enforces the grounding contract for the fact below. + active_task = task_id if task_id is not None else self._current_task_id + fact = {"type": identifier_type, "value": value, "invocation_id": invocation_id} + + # Get-or-create entity first so we can attribute the observation. + eid, _existed = await self.add_entity(entity_name, entity_type) + + norm = _normalize_identifier(identifier_type, value) + title = f"{identifier_type}={value} on {entity_name}" + pid, _merged = await self.add_phenomenon( + source_agent=source_agent, + category="identity_observation", + title=title, + interpretation=( + f"Agent attributed identifier {identifier_type}={value} " + f"(normalized={norm}) to entity {entity_name} ({entity_type})." 
+ ), + verified_facts=[fact], + source_tool=source_tool, + task_id=active_task, + ) + + # Attach identifier to entity (idempotent on type + normalized value). + new_identifier = False + async with self._lock: + ent = self.entities[eid] + if not ent.has_identifier(identifier_type, norm): + ent.identifiers.append({ + "type": identifier_type, + "value": value, + "normalized": norm, + "strong": is_strong_identifier(identifier_type), + "invocation_id": invocation_id, + "phenomenon_id": pid, + "observed_at": datetime.now().isoformat(), + }) + new_identifier = True + self._auto_save() + + coref_proposals: list[dict] = [] + if new_identifier: + coref_proposals = await self._propose_coref_for_new_identifier( + new_eid=eid, + new_type=identifier_type, + new_norm=norm, + new_phenomenon_id=pid, + ) + + return { + "entity_id": eid, + "phenomenon_id": pid, + "new_identifier": new_identifier, + "coref_proposals": coref_proposals, + } + + async def _propose_coref_for_new_identifier( + self, + new_eid: str, + new_type: str, + new_norm: str, + new_phenomenon_id: str, + ) -> list[dict]: + """Blocking + propose: find candidate entities that share this + identifier with ``new_eid``, register / strengthen the coref + hypothesis for each pair, and emit conflicting-identifier edges + where the two entities have *different* values for the same + strong identifier type. O(|entities| × identifiers) — blocking + is implicit in the fact that the new identifier is fixed. + """ + new_ent = self.entities.get(new_eid) + if new_ent is None: + return [] + + is_strong_new = is_strong_identifier(new_type) + match_edge = "shared_strong_identifier" if is_strong_new else "shared_weak_identifier" + + proposals: list[dict] = [] + + for other_eid, other_ent in list(self.entities.items()): + if other_eid == new_eid: + continue + + # Match: other entity carries the same (type, normalized). 
+ if not other_ent.has_identifier(new_type, new_norm): + continue + + # Collect conflicting strong identifiers between the pair — + # they'll fire negative-LR edges on the same coref hypothesis. + conflicts: list[dict] = [] + for a_ident in new_ent.identifiers: + if not a_ident.get("strong"): + continue + for b_ident in other_ent.identifiers: + if (b_ident.get("type") == a_ident.get("type") + and b_ident.get("strong") + and b_ident.get("normalized") != a_ident.get("normalized")): + conflicts.append({ + "type": a_ident.get("type"), + "new_value": a_ident.get("value"), + "other_value": b_ident.get("value"), + "new_phenomenon_id": a_ident.get("phenomenon_id"), + }) + + hid, _created = await self.get_or_create_coref_hypothesis( + new_eid, other_eid, + ) + + # +shared identifier edge (one per identifier, anchored to the + # newly recorded observation phenomenon). update_hypothesis_ + # confidence is idempotent on (ph, hyp, edge_type), so re-running + # the same observation does not double-count. + await self.update_hypothesis_confidence( + hid, new_phenomenon_id, match_edge, + reason=f"shared {new_type}={new_norm}", + ) + + # −conflicting strong identifier edges — one per conflict, anchored + # to the *new* entity's observation phenomenon for that identifier. 
+ for c in conflicts: + ph_src = c["new_phenomenon_id"] + if not ph_src: + continue + await self.update_hypothesis_confidence( + hid, ph_src, "conflicting_strong_identifier", + reason=( + f"conflict {c['type']}: " + f"{c['new_value']} vs {c['other_value']}" + ), + ) + + await self._sync_same_as_edge(new_eid, other_eid, hid) + + proposals.append({ + "hypothesis_id": hid, + "other_entity_id": other_eid, + "match": {"type": new_type, "normalized": new_norm, + "edge_type": match_edge}, + "conflicts": conflicts, + "confidence": self.hypotheses[hid].confidence, + }) + + return proposals + + # ---- Cross-source entity cluster queries (DESIGN.md §4.6) ---------------- + + def _active_same_as_neighbors(self, entity_id: str) -> set[str]: + """Neighbours of *entity_id* via ``same_as`` edges that are still active. + + ``same_as`` edges are non-destructive: a coref hypothesis that drops + below threshold marks ``metadata['active']=False`` rather than + deleting, so the audit trail survives. Cluster queries respect that. + """ + out: set[str] = set() + for edge in self.edges: + if edge.edge_type != "same_as": + continue + if not edge.metadata.get("active", True): + continue + if edge.source_id == entity_id: + out.add(edge.target_id) + elif edge.target_id == entity_id: + out.add(edge.source_id) + return out + + def resolve_actor_cluster(self, entity_id: str) -> set[str]: + """Return the connected component containing *entity_id* via active + ``same_as`` edges — the set of entity ids that current coref evidence + treats as the same actor. + + Reversible: deactivating a ``same_as`` edge (because the backing + coref hypothesis drops below 0.8) breaks the component, so this + always reflects the *current* state of the graph. 
+ """ + if entity_id not in self.entities: + return set() + seen: set[str] = {entity_id} + frontier: list[str] = [entity_id] + while frontier: + cur = frontier.pop() + for nbr in self._active_same_as_neighbors(cur): + if nbr not in seen: + seen.add(nbr) + frontier.append(nbr) + return seen + + def actor_clusters(self) -> list[dict]: + """Group all entities into actor clusters via active ``same_as``. + + Returns a list of ``{members: [...], identifiers: [...], coref_hypotheses: [...]}`` + for the report agent and the orchestrator's cross-source views. + """ + unseen = set(self.entities.keys()) + clusters: list[dict] = [] + while unseen: + start = next(iter(unseen)) + members = self.resolve_actor_cluster(start) + unseen -= members + # Aggregate identifiers across the cluster (deduped on type+normalized). + ident_seen: set[tuple[str, str]] = set() + idents: list[dict] = [] + for eid in members: + for ident in self.entities[eid].identifiers: + key = (ident.get("type"), ident.get("normalized")) + if key in ident_seen: + continue + ident_seen.add(key) + idents.append({ + "type": ident.get("type"), + "value": ident.get("value"), + "strong": ident.get("strong"), + "on_entity": eid, + }) + coref_hyps = sorted({ + e.metadata.get("backed_by", "") + for e in self.edges + if e.edge_type == "same_as" + and e.metadata.get("active", True) + and (e.source_id in members or e.target_id in members) + } - {""}) + clusters.append({ + "members": sorted(members), + "identifiers": idents, + "coref_hypotheses": coref_hyps, + }) + return clusters # ---- Lead management (same as old Blackboard) ---------------------------- @@ -754,6 +1560,96 @@ class EvidenceGraph: self._auto_save() return aid, False + # ---- Tool invocation log ------------------------------------------------- + + async def record_tool_invocation( + self, + tool: str, + args: dict, + output: str, + cached: bool = False, + ) -> str: + """Record one tool call. Returns the invocation_id. 
+ + Source / agent / task_id are read from the graph's current run + context (set by BaseAgent.run and set_active_source) so executors + can stay stateless. + """ + iid = f"inv-{uuid.uuid4().hex[:8]}" + src_id = self.active_source.id if self.active_source else "" + inv = ToolInvocation( + id=iid, + tool=tool, + args=dict(args), + output=output, + output_sha256=hashlib.sha256(output.encode("utf-8", errors="replace")).hexdigest(), + agent=self._current_agent or "unknown", + task_id=self._current_task_id or "", + source_id=src_id, + created_at=datetime.now().isoformat(), + cached=cached, + ) + async with self._lock: + self.tool_invocations[iid] = inv + # Cheap on cache hit; expensive but bounded otherwise. Skip + # auto-save here — too noisy if every tool call rewrites the + # state file; the next phenomenon write will flush. + return iid + + def validate_fact_grounding( + self, + fact: dict, + agent: str, + task_id: str, + ) -> tuple[bool, str]: + """Check a single verified_fact's grounding. Returns (ok, reason). + + Rules (DESIGN.md §4.4, refined after first end-to-end run): + 1. invocation_id must exist in self.tool_invocations. + 2. The invocation must have been made by `agent` within `task_id` (task scope is compared only when both ids are non-empty). + 3. fact.value must appear in invocation.output — either as a + strict substring, OR (loose-match fallback) once both sides + are normalised via :func:`_normalize_for_grounding` + (case-folded, whitespace-collapsed, path-sep unified). + + The loose match catches the LLM's routine presentation + normalisations (case-folded hex, slash-flipped paths, collapsed + multi-line labels) without enabling fabrication: a string that + isn't present in ANY form still fails the normalised check. 
+ """ + inv_id = fact.get("invocation_id", "") + value = fact.get("value", "") + if not inv_id: + return False, "missing invocation_id" + inv = self.tool_invocations.get(inv_id) + if inv is None: + return False, f"invocation_id {inv_id} not found in invocation log" + if inv.agent != agent: + return False, ( + f"invocation {inv_id} was made by agent '{inv.agent}', " + f"not '{agent}' — cannot be cited by a different agent" + ) + if task_id and inv.task_id and inv.task_id != task_id: + return False, ( + f"invocation {inv_id} was made in a different task scope " + f"({inv.task_id}) — cite only invocations from your current task" + ) + if not isinstance(value, str) or not value: + return False, "fact.value must be a non-empty string" + if value in inv.output: + return True, "ok" + # Loose fallback: normalised comparison absorbs case / whitespace / + # path-sep differences but a genuinely absent value still fails. + if _normalize_for_grounding(value) in _normalize_for_grounding(inv.output): + return True, "ok-normalized" + return False, ( + f"fact.value not found in invocation {inv_id} output — even after " + f"case/whitespace/path-sep normalisation. Copy a literal substring " + f"from that tool's result; if the content is a guess (device model, " + f"constructed path, label-joined value), move it into `interpretation` " + f"instead of `verified_facts`." 
+ ) + # ---- Asset library ------------------------------------------------------- async def register_asset( @@ -838,7 +1734,12 @@ class EvidenceGraph: kw = keyword.lower() results = [] for ph in self.phenomena.values(): - if kw in ph.title.lower() or kw in ph.description.lower(): + haystack = ( + ph.title.lower() + + " " + ph.interpretation.lower() + + " " + " ".join(str(f.get("value", "")).lower() for f in ph.verified_facts) + ) + if kw in haystack: results.append(ph.summary()) for hyp in self.hypotheses.values(): if kw in hyp.title.lower() or kw in hyp.description.lower(): @@ -899,6 +1800,87 @@ class EvidenceGraph: if h.status == "active": h.status = "inconclusive" + # ---- Hypothesis × Evidence matrix (DESIGN.md §4.5) ----------------------- + + def hypothesis_evidence_matrix(self) -> dict: + """Structured pivot of every Phenomenon→Hypothesis edge. + + Returns ``{"hypotheses": [...], "phenomena": [...], "cells": {...}, + "counts_by_edge_type": {hyp_id: {edge_type: count}}}`` — the cells + map ``(hyp_id, ph_id)`` to a *list* of edge_type strings (a single + phenomenon may link via several edge_types after a manual override + plus an LLM judge call). Drives report rendering and gap selection. 
+ """ + cells: dict[tuple[str, str], list[str]] = {} + counts: dict[str, dict[str, int]] = {h: {} for h in self.hypotheses} + for edge in self.edges: + if not ( + edge.source_id.startswith("ph-") + and edge.target_id.startswith("hyp-") + and edge.edge_type in self.edge_log_lr + ): + continue + key = (edge.target_id, edge.source_id) + cells.setdefault(key, []).append(edge.edge_type) + counts.setdefault(edge.target_id, {})[edge.edge_type] = ( + counts.setdefault(edge.target_id, {}).get(edge.edge_type, 0) + 1 + ) + + hypotheses = [ + { + "id": h.id, + "title": h.title, + "confidence": h.confidence, + "log_odds": h.log_odds, + "status": h.status, + } + for h in self.hypotheses.values() + ] + referenced = {ph_id for (_, ph_id) in cells} + phenomena = [ + {"id": ph.id, "title": ph.title, "category": ph.category} + for ph in self.phenomena.values() + if ph.id in referenced + ] + return { + "hypotheses": hypotheses, + "phenomena": phenomena, + "cells": {f"{h}|{p}": types for (h, p), types in cells.items()}, + "counts_by_edge_type": counts, + } + + def hypothesis_evidence_matrix_markdown(self) -> str: + """Render the matrix as a compact markdown pivot. + + Columns are the edge types (counts), plus log_odds, confidence, + status — enough for the report agent to ground every hypothesis + in its supporting and contradicting evidence at a glance. 
+ """ + if not self.hypotheses: + return "(no hypotheses)" + matrix = self.hypothesis_evidence_matrix() + edge_types = sorted(self.edge_log_lr.keys()) + header = ( + "| Hypothesis | " + + " | ".join(edge_types) + + " | log_odds | conf | status |" + ) + sep = ( + "|---|" + + "|".join(["---:"] * len(edge_types)) + + "|---:|---:|---|" + ) + rows = [header, sep] + for h in matrix["hypotheses"]: + counts = matrix["counts_by_edge_type"].get(h["id"], {}) + cells = [str(counts.get(et, 0)) for et in edge_types] + rows.append( + f"| {h['title']} | " + + " | ".join(cells) + + f" | {h['log_odds']:+.2f} | {h['confidence']:.2f} | {h['status']} |" + ) + return "\n".join(rows) + # ---- Summary (lightweight, for system prompt) ---------------------------- def stats_summary(self) -> str: diff --git a/llm_client.py b/llm_client.py index 6b6e7d3..cee06a9 100644 --- a/llm_client.py +++ b/llm_client.py @@ -142,6 +142,12 @@ READ_ONLY_TOOLS: set[str] = { # Parser reads "read_text_file", "read_binary_preview", "search_text_file", "read_text_file_section", "list_extracted_dir", "parse_pcap_strings", + "find_files", + # iOS plugin reads (S4) + "parse_plist", "sqlite_tables", "sqlite_query", + "parse_ios_keychain", "read_idevice_info", + # Android + media reads (S6) — set_active_partition is NOT read-only. + "probe_android_partitions", "ocr_image", } @@ -503,7 +509,7 @@ class LLMClient: tools: list[dict], tool_executor: dict[str, Any], system: str | None = None, - max_iterations: int = 40, + max_iterations: int = 60, terminal_tools: tuple[str, ...] = (), ) -> tuple[str, list[dict]]: """Run a tool-calling loop using OpenAI-native tool calls. 
diff --git a/main.py b/main.py index 51a416d..7f57639 100644 --- a/main.py +++ b/main.py @@ -15,17 +15,21 @@ from pathlib import Path import yaml from agent_factory import AgentFactory +from case import ( + DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case, +) from evidence_graph import EvidenceGraph from llm_client import LLMClient from log_config import setup_logging from orchestrator import AnalysisAborted, Orchestrator from tool_registry import register_all_tools +from tools.archive import unzip_archive_sync RUNS_DIR = Path("runs") IMAGE_DIR = Path("image") - -# Common forensic image extensions (only first segment / single-file formats) -_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"] +# Persistent unpack cache for tree-mode sources (zip extractions). Lives +# at project root so multiple runs can reuse the same unpacked tree. +SOURCE_CACHE_DIR = Path(".cache/sources") def load_config(path: str = "config.yaml") -> dict: @@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict: # --------------------------------------------------------------------------- def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]: - """Find forensic disk image files under *search_dir*.""" - images: set[Path] = set() - for glob in _IMAGE_GLOBS: - images.update(search_dir.glob(glob)) - return sorted(images) + """Find forensic disk image files under *search_dir* (case-insensitive ext).""" + if not search_dir.is_dir(): + return [] + return sorted( + p for p in search_dir.iterdir() + if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS + ) def _parse_mmls(output: str) -> list[dict]: @@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]: images = _discover_images(image_dir) if not images: print(f"No disk images found in {image_dir}/") - print("Supported formats: " + ", ".join(_IMAGE_GLOBS)) + print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS))) sys.exit(1) if 
len(images) == 1: @@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]: print("Invalid choice.") +def resolve_case() -> Case: + """Resolve the Case to analyze. + + Priority: an explicit case file given as a CLI argument, then ./case.yaml + in the working directory, then legacy interactive single-image selection. + """ + # 1. Explicit case file passed on the command line + if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")): + case = load_case(sys.argv[1]) + if case is None: + print(f"Error: could not load case file {sys.argv[1]}") + sys.exit(1) + print(f"Loaded case: {case.name} ({len(case.sources)} sources)") + return case + + # 2. ./case.yaml in the working directory + case = load_case() + if case is not None: + print(f"Loaded case: {case.name} ({len(case.sources)} sources)") + return case + + # 3. Legacy interactive single-image selection + cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None + image_path, partition_offset = select_image_interactive(cli_dir) + return single_source_case(image_path, partition_offset) + + +def _is_analysable(src: EvidenceSource) -> bool: + """A source is analysable when it has a path AND its mode has tooling. + + S4 lights up tree-mode iOS extractions; image-mode disks were already + supported. Media-collection (screenshots) remain skipped until S6. + """ + if not src.path: + return False + if src.access_mode == "image": + return True + if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"): + return True + return False + + +def list_analysable_sources(case: Case) -> list[EvidenceSource]: + """Return every analysable source in the case (orchestrator iterates them). + + Pre-S6 main.py used to force-choose one source here; the multi-source + orchestrator (Phase 1 per-source triage) now consumes the full list. + Skipped sources are still reported for visibility. 
+ """ + analysable = [s for s in case.sources if _is_analysable(s)] + skipped = [s for s in case.sources if not _is_analysable(s)] + if skipped: + print( + f"Note: {len(skipped)} source(s) not analysable in this build: " + + ", ".join(f"{s.label} ({s.type})" for s in skipped) + ) + if not analysable: + print("No analysable sources in this case.") + sys.exit(1) + print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:") + for s in analysable: + print(f" - {s.summary()}") + return analysable + + +def prepare_source(src: EvidenceSource) -> EvidenceSource: + """Materialise a tree-mode source for analysis. + + Mobile / archive sources arrive as .zip files. We unpack once into a + project-level cache (``.cache/sources//``) and rewrite + ``src.path`` to point at the unpacked directory. Idempotent — a + second run with the cache present is a no-op (unzip_archive_sync + skips files that already exist with the matching size). + + Disk-image and already-tree sources pass through unchanged. + """ + if src.access_mode != "tree": + return src + p = Path(src.path) + if p.is_dir(): + return src # already a directory, nothing to do + if not p.is_file(): + print(f"Warning: source path {src.path} does not exist; leaving as-is.") + return src + if p.suffix.lower() != ".zip": + # Other archive types (tar, 7z, ...) — not handled yet. + print(f"Warning: tree-mode source {src.id} is not a .zip " + f"({p.suffix}); leaving as-is.") + return src + + dest = SOURCE_CACHE_DIR / src.id + dest.mkdir(parents=True, exist_ok=True) + # Password-protected zips (e.g. CTF artefacts) carry their key in + # case.yaml's meta.password — never logged, never persisted. 
+ password = (src.meta or {}).get("password") + pw_note = " (password from meta)" if password else "" + print(f"Unpacking {p.name} → {dest}{pw_note} (idempotent) ...") + result = unzip_archive_sync(str(p), str(dest), password=password) + first_line = result.split("\n", 1)[0] + print(" " + first_line) + if first_line.startswith("Error:"): + # Surface the multi-line guidance from _do_extract verbatim. + for extra in result.split("\n")[1:]: + print(" " + extra) + print(f" Source {src.id} stays unanalysable until this is resolved.") + # Leave src.path unchanged so the source remains marked unanalysable. + return src + src.path = str(dest) + src.access_mode = "tree" + return src + + def find_resumable_run() -> Path | None: """Find the most recent incomplete run with a saved graph state.""" if not RUNS_DIR.exists(): @@ -225,22 +343,30 @@ async def async_main() -> None: # Initialize evidence graph if graph is None: - # CLI arg takes priority, otherwise interactive prompt - cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None - image_path, partition_offset = select_image_interactive(cli_dir) + case = resolve_case() + # case_info derived from THIS case's meta (case.yaml), not from + # config.yaml's legacy `cfreds_hacking_case` block. Without this, + # the old CFReDS evidence MD5s would be embedded in reports for + # every subsequent unrelated case. graph = EvidenceGraph( - case_info=config.get("cfreds_hacking_case", {}), + case_info=dict(case.meta or {}), persist_path=run_dir / "graph_state.json", - edge_weights=config.get("hypothesis_edge_weights"), + edge_log_lr=config.get("hypothesis_log_lr"), ) - graph.image_path = image_path - graph.partition_offset = partition_offset + graph.case = case graph.extracted_dir = str(run_dir / "extracted") + analysable = list_analysable_sources(case) + # Prepare every analysable source up front (unzip tree-mode zips, + # etc.). Idempotent on cache hits — second run is a no-op. 
+ prepared = [prepare_source(s) for s in analysable] + # Seed the active source so tools that resolve lazily have a target + # before Phase 1 begins; the orchestrator resets it per source. + graph.set_active_source(prepared[0]) else: graph._persist_path = run_dir / "graph_state.json" - # Register all tools with bound image path - register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir) + # Register all tools — they resolve the active evidence source at call time + register_all_tools(graph) # Create agent factory factory = AgentFactory(llm, graph) diff --git a/orchestrator.py b/orchestrator.py index a71ade8..7821ad7 100644 --- a/orchestrator.py +++ b/orchestrator.py @@ -10,7 +10,7 @@ import time from datetime import datetime from pathlib import Path -from agent_factory import AgentFactory +from agent_factory import AgentFactory, get_triage_agent_type from evidence_graph import EvidenceGraph from llm_client import LLMClient, _extract_first_balanced, _safe_json_loads from tool_registry import TOOL_CATALOG @@ -518,7 +518,7 @@ class Orchestrator: if not unlinked: return - valid_types = list(self.graph.edge_weights.keys()) + valid_types = list(self.graph.edge_log_lr.keys()) hyp_section = "\n".join( f" [{h.id}] {h.title}: {h.description}" for h in active @@ -551,7 +551,7 @@ class Orchestrator: if ( hyp_id in self.graph.hypotheses and ph_id in self.graph.phenomena - and edge_type in self.graph.edge_weights + and edge_type in self.graph.edge_log_lr ): await self.graph.update_hypothesis_confidence( hyp_id=hyp_id, @@ -593,7 +593,7 @@ class Orchestrator: ph_id = j.get("phenomenon_id", "") edge_type = j.get("edge_type", "") reason = j.get("reason", "") - if ph_id in self.graph.phenomena and edge_type in self.graph.edge_weights: + if ph_id in self.graph.phenomena and edge_type in self.graph.edge_log_lr: await self.graph.update_hypothesis_confidence( hyp_id=hyp.id, phenomenon_id=ph_id, @@ -618,7 +618,10 @@ class Orchestrator: phenomena 
(deterministic — the canonical tool was actually called). """ evidence_text = " ".join( - f"{ph.category} {ph.title} {ph.description}".lower() + ( + f"{ph.category} {ph.title} {ph.interpretation} " + + " ".join(str(f.get("value", "")) for f in ph.verified_facts) + ).lower() for ph in self.graph.phenomena.values() ) used_tools: set[str] = { @@ -747,28 +750,103 @@ class Orchestrator: # ---- Main pipeline ------------------------------------------------------- + # ---- Phase 1 helpers (multi-source triage) ------------------------------- + + @staticmethod + def _is_analysable(src) -> bool: + """Mirror of main._is_analysable so the orchestrator doesn't depend + on main.py's import. Disk-image sources need a path; tree-mode + sources are analysable when they're mobile_extraction or archive. + """ + if not getattr(src, "path", ""): + return False + if src.access_mode == "image": + return True + if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"): + return True + # media_collection is analysable too once a MediaAgent is registered. + if src.type == "media_collection": + return True + return False + + def _sources_to_triage(self) -> list: + """Pick every analysable source in the case (or fall back to the + single active_source for the legacy single-image path). + """ + case = self.graph.case + if case is None or not case.sources: + return [self.graph.active_source] if self.graph.active_source else [] + return [s for s in case.sources if self._is_analysable(s)] + + async def _phase1_triage_source(self, src) -> tuple[int, int]: + """Run the right triage agent on one source. 
Returns (Δphenomena, Δleads).""" + ph_before = len(self.graph.phenomena) + leads_before = sum(1 for l in self.graph.leads if l.status == "pending") + + self.graph.set_active_source(src) + agent_type = get_triage_agent_type(src) + agent = self.factory.get_or_create_agent(agent_type) + if agent is None: + logger.warning( + "No agent registered for type %s — skipping source %s", + agent_type, src.id, + ) + return 0, 0 + + _log( + f"Phase 1 triage: {src.id} ({src.label}) → {agent_type}", + event="dispatch", agent=agent_type, source=src.id, + ) + try: + await agent.run( + f"Perform an initial Phase-1 triage of source {src.id} " + f"({src.label}, type={src.type}). Survey the source's " + f"structure, identify the most interesting artefacts, and " + f"record significant findings via add_phenomenon. Call " + f"observe_identity for any concrete identifiers (email, " + f"phone, Apple ID, IMEI, wallet address, persistent " + f"username) you encounter — that's how this finding will " + f"link across the other sources in the case. Create " + f"add_lead for follow-up that's outside your scope." + ) + except Exception as e: + logger.error("Phase 1 agent [%s] failed on %s: %s", agent_type, src.id, e) + + return ( + len(self.graph.phenomena) - ph_before, + sum(1 for l in self.graph.leads if l.status == "pending") - leads_before, + ) + async def run(self, resume_phase: int = 1) -> str: """Run the 5-phase hypothesis-driven forensic analysis pipeline.""" - _log(f"Phase 1: Filesystem Survey (image: {Path(self.graph.image_path).name})", event="phase") + sources = self._sources_to_triage() + _log( + f"Phase 1: per-source triage ({len(sources)} source(s))", + event="phase", + ) report = "" try: - # Phase 1: Initial filesystem survey + # Phase 1: Initial per-source triage (S6 multi-source). 
+ # Runs sequentially so each agent gets its own task_id scope — + # the grounding gateway requires that, and shared graph state + # (active_source, partition_offset) would race under parallel + # dispatch anyway. if resume_phase <= 1: t0 = time.monotonic() ph_before = len(self.graph.phenomena) - fs_agent = self.factory.get_or_create_agent("filesystem") - if fs_agent: - await fs_agent.run( - "Perform an initial survey of this disk image. " - "Examine the partition table, filesystem type, and root directory structure. " - "List key user directories and identify interesting files (documents, emails, " - "chat logs, installed programs, registry hives). " - "Create leads for other agents based on what you find." + for src in sources: + new_ph, new_leads = await self._phase1_triage_source(src) + _log( + f" {src.id}: +{new_ph} phenomena, +{new_leads} leads", + event="progress", source=src.id, ) - new_ph = len(self.graph.phenomena) - ph_before - new_leads = sum(1 for l in self.graph.leads if l.status == "pending") - _log(f"+{new_ph} phenomena, +{new_leads} leads", event="progress", elapsed=time.monotonic() - t0) + total_ph = len(self.graph.phenomena) - ph_before + total_leads = sum(1 for l in self.graph.leads if l.status == "pending") + _log( + f"Phase 1 total: +{total_ph} phenomena, {total_leads} pending leads", + event="progress", elapsed=time.monotonic() - t0, + ) # Phase 2: Hypothesis generation if resume_phase <= 2: @@ -865,8 +943,15 @@ class Orchestrator: "6. Conclusions and Recommendations" ) - image_stem = Path(self.graph.image_path).stem - report_name = f"{image_stem}_forensic_report.md" + # Multi-source case → name by case_id (stable across sources). + # Legacy single-image runs without a Case → fall back to the + # last active image's stem so old workflows still produce a + # plausible filename. 
+ if self.graph.case and self.graph.case.case_id: + stem = self.graph.case.case_id + else: + stem = Path(self.graph.image_path).stem or "case" + report_name = f"{stem}_forensic_report.md" report_path = (self.run_dir / report_name) if self.run_dir else Path(report_name) try: report_path.write_text(report) diff --git a/pyproject.toml b/pyproject.toml index f53f96d..c163497 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -6,6 +6,8 @@ requires-python = ">=3.14" dependencies = [ "httpx[socks]>=0.28.1", "openai>=2.36.0", + "pillow>=12.2.0", + "pytesseract>=0.3.13", "pyyaml", "regipy>=6.2.1", ] diff --git a/regenerate_report.py b/regenerate_report.py index ee13af4..e4cc58c 100644 --- a/regenerate_report.py +++ b/regenerate_report.py @@ -32,10 +32,10 @@ async def main() -> None: config = yaml.safe_load(open("config.yaml")) agent_cfg = config["agent"] - # Load graph (edge_weights from config — applied to the loaded graph) + # Load graph (edge_log_lr from config — applied to the loaded graph) graph = EvidenceGraph.load_state( state_path, - edge_weights=config.get("hypothesis_edge_weights"), + edge_log_lr=config.get("hypothesis_log_lr"), ) print(f"Loaded: {graph.stats_summary()}") @@ -49,7 +49,7 @@ async def main() -> None: thinking_enabled=agent_cfg.get("thinking_enabled", False), ) - register_all_tools(graph.image_path, graph.partition_offset, graph) + register_all_tools(graph) factory = AgentFactory(llm, graph) # Run only the report agent diff --git a/tests/test_optimizations.py b/tests/test_optimizations.py index 2b703ee..bf139ec 100644 --- a/tests/test_optimizations.py +++ b/tests/test_optimizations.py @@ -8,8 +8,9 @@ import time import pytest from evidence_graph import ( - EvidenceGraph, Phenomenon, Lead, + EvidenceGraph, Phenomenon, Hypothesis, Lead, GroundingError, _compute_quality_score, _jaccard_similarity, + prob_to_log_odds, log_odds_to_prob, ) from llm_client import ( _truncate_tool_result, _partition_tool_calls, _ToolBatch, READ_ONLY_TOOLS, @@ -26,11 +27,15 
@@ from tool_registry import ( class TestQualityScore: def test_full_score(self): + # With five grounded facts (5 × 0.05 = max 0.25 contribution) plus + # source_tool 0.20 + timestamp 0.15 + raw_data 0.15 + long interp 0.10 + # + related_ids 0.15 = 1.00. score = _compute_quality_score( source_tool="list_dir", timestamp="2024-01-01", raw_data={"key": "val"}, - description="A" * 50, + interpretation="A" * 50, + verified_facts=[{"type": "raw", "value": "x", "invocation_id": "inv-1"}] * 5, related_ids=["ph-0"], ) assert score == pytest.approx(1.0) @@ -40,20 +45,23 @@ class TestQualityScore: source_tool="", timestamp=None, raw_data={}, - description="short", + interpretation="short", + verified_facts=[], related_ids=[], ) assert score == pytest.approx(0.0) def test_partial_score(self): + # source_tool 0.20 + interpretation>=50 0.10 = 0.30 (no facts, no ts, no raw) score = _compute_quality_score( source_tool="parse_registry_key", timestamp=None, raw_data={}, - description="A" * 50, + interpretation="A" * 50, + verified_facts=[], related_ids=[], ) - assert score == pytest.approx(0.40) + assert score == pytest.approx(0.30) class TestJaccardSimilarity: @@ -90,7 +98,9 @@ class TestPhenomenonDedup: ) assert not merged ph = graph.phenomena[pid] - assert ph.confidence == pytest.approx(0.40) + # source_tool 0.20 + interpretation >= 50 chars 0.10 = 0.30 + # (verified_facts and raw_data both empty here) + assert ph.confidence == pytest.approx(0.30) @pytest.mark.asyncio async def test_identical_phenomenon_merges(self, graph): @@ -198,8 +208,9 @@ class TestHypothesisConfidence: pid, _ = await graph.add_phenomenon("fs", "filesystem", "test", "test desc", source_tool="t") hid = await graph.add_hypothesis("test hyp", "desc", created_by="test") conf = await graph.update_hypothesis_confidence(hid, pid, "direct_evidence", "reason") - # direct_evidence weight is +0.25 * (1-0.5) = +0.125 - assert conf == pytest.approx(0.625) + # S3 log-odds: prior 0.5 (L=0) + direct_evidence (+2.0) = L=2.0 
+ # confidence = 1 / (1 + 10^-2) ≈ 0.9901 + assert conf == pytest.approx(0.9901, abs=1e-3) @pytest.mark.asyncio async def test_confidence_log_tracked(self, graph): @@ -1506,3 +1517,1287 @@ class TestInvestigationAreaDerivation: assert graph.leads[0].hypothesis_id == h assert graph.leads[0].target_agent == "registry" assert graph.leads[0].priority == 2 + + +# --------------------------------------------------------------------------- +# S2 grounding gateway — ToolInvocation + add_phenomenon validation +# --------------------------------------------------------------------------- + +class TestAgentTaskContextIsolation: + """P0 race fix: _current_agent / _current_task_id are per-asyncio-task. + + Pre-fix, ``graph._current_agent`` was a plain instance attribute that + every concurrent agent stomped over. With contextvars-backed + properties, three agents launched via :func:`asyncio.gather` each see + their own values, and ``record_tool_invocation`` tags each invocation + with the correct issuer. + """ + + @pytest.mark.asyncio + async def test_concurrent_agents_get_isolated_contexts(self): + graph = EvidenceGraph() + + async def run_agent(name: str) -> dict: + graph._current_agent = name + graph._current_task_id = f"task-{name}" + # Yield so the other tasks get a chance to overwrite if the + # state were shared. Pre-fix, this is what surfaced the race. + await asyncio.sleep(0.001) + inv = await graph.record_tool_invocation( + tool="probe", args={"who": name}, output=f"hello from {name}", + ) + return { + "name": name, + "read_agent": graph._current_agent, + "read_task": graph._current_task_id, + "inv": graph.tool_invocations[inv], + } + + results = await asyncio.gather( + run_agent("agent-A"), + run_agent("agent-B"), + run_agent("agent-C"), + ) + for r in results: + # Each task reads back its own values, not a sibling's. 
+ assert r["read_agent"] == r["name"], r + assert r["read_task"] == f"task-{r['name']}", r + # And the invocation log tags the invocation with the right agent / task. + assert r["inv"].agent == r["name"] + assert r["inv"].task_id == f"task-{r['name']}" + + @pytest.mark.asyncio + async def test_grounding_gateway_under_concurrent_agents(self): + """End-to-end: three concurrent agents each record + cite their own + invocation; the grounding gateway accepts all of them without cross-talk. + """ + graph = EvidenceGraph() + + async def cycle(name: str) -> str: + graph._current_agent = name + graph._current_task_id = f"task-{name}" + await asyncio.sleep(0.001) + inv = await graph.record_tool_invocation( + tool="t", args={}, output=f"unique-output-for-{name}", + ) + await asyncio.sleep(0.001) + pid, _ = await graph.add_phenomenon( + source_agent=name, category="x", title=f"finding by {name}", + interpretation="", + verified_facts=[{ + "type": "raw", + "value": f"unique-output-for-{name}", + "invocation_id": inv, + }], + source_tool="t", + ) + return pid + + pids = await asyncio.gather( + cycle("agent-A"), + cycle("agent-B"), + cycle("agent-C"), + ) + # All three phenomena landed — none was rejected by the gateway. + assert all(pid in graph.phenomena for pid in pids) + + +class TestGroundingGateway: + """Code-level enforcement of DESIGN.md §4.4. + + A phenomenon's verified_facts must trace back to a real ToolInvocation + made by the same agent within the same task scope, and each fact.value + must appear in that invocation's recorded output, either verbatim or + after case/whitespace/path-separator normalisation.
+ """ + + @pytest.fixture + def graph(self): + g = EvidenceGraph() + g._current_agent = "fs" + g._current_task_id = "task-abc" + return g + + @pytest.mark.asyncio + async def test_record_invocation_and_cite_succeeds(self, graph): + inv_id = await graph.record_tool_invocation( + tool="list_directory", + args={"inode": "33"}, + output="d/d 33-128-1: secret.txt\nr/r 33-128-2: another.bin", + ) + pid, merged = await graph.add_phenomenon( + source_agent="fs", category="filesystem", + title="Secret file present", + interpretation="Found a suggestive filename", + verified_facts=[ + {"type": "path", "value": "secret.txt", "invocation_id": inv_id}, + {"type": "inode", "value": "33-128-1", "invocation_id": inv_id}, + ], + source_tool="list_directory", + ) + assert not merged + ph = graph.phenomena[pid] + assert len(ph.verified_facts) == 2 + + @pytest.mark.asyncio + async def test_fact_value_not_in_output_rejected(self, graph): + inv_id = await graph.record_tool_invocation( + tool="list_directory", args={}, output="just two files", + ) + with pytest.raises(GroundingError) as exc: + await graph.add_phenomenon( + source_agent="fs", category="filesystem", title="bogus", + verified_facts=[ + {"type": "path", "value": "/etc/shadow", "invocation_id": inv_id}, + ], + source_tool="list_directory", + ) + # New error message wording — strict + loose both failed. 
+ msg = str(exc.value) + assert "not found" in msg and "normalisation" in msg + + @pytest.mark.asyncio + async def test_loose_match_accepts_case_folded_hex(self, graph): + """LLM frequently up-cases hex bytes; the loose match accepts it.""" + inv_id = await graph.record_tool_invocation( + tool="read_binary_preview", args={}, output="header: 89 50 4e 47 ...", + ) + pid, _ = await graph.add_phenomenon( + source_agent="fs", category="filesystem", title="PNG header", + verified_facts=[{ + "type": "hash", "value": "89 50 4E 47", "invocation_id": inv_id, + }], + source_tool="read_binary_preview", + ) + assert pid in graph.phenomena + + @pytest.mark.asyncio + async def test_loose_match_accepts_label_collapsed_across_newlines(self, graph): + """`AppleID:\\n whoishogan@gmail.com` ≡ `AppleID: whoishogan@gmail.com`.""" + inv_id = await graph.record_tool_invocation( + tool="parse_plist", args={}, + output="Key: AppleID\n Value: whoishogan@gmail.com\nNext: ...", + ) + pid, _ = await graph.add_phenomenon( + source_agent="fs", category="identity_observation", + title="Apple ID on burner", + verified_facts=[{ + "type": "email", + "value": "Value: whoishogan@gmail.com", # label + value + "invocation_id": inv_id, + }], + source_tool="parse_plist", + ) + assert pid in graph.phenomena + + @pytest.mark.asyncio + async def test_loose_match_accepts_path_separator_flip(self, graph): + """Backslash output, forward-slash value — same file.""" + inv_id = await graph.record_tool_invocation( + tool="search_strings", args={}, output="archive: Sunny\\tor-portable.exe", + ) + pid, _ = await graph.add_phenomenon( + source_agent="fs", category="filesystem", title="tor portable in archive", + verified_facts=[{ + "type": "path", "value": "Sunny/tor-portable.exe", "invocation_id": inv_id, + }], + source_tool="search_strings", + ) + assert pid in graph.phenomena + + @pytest.mark.asyncio + async def test_loose_match_does_not_let_genuine_hallucination_through(self, graph): + """Even with all 
normalisations applied, an absent value is still rejected.""" + inv_id = await graph.record_tool_invocation( + tool="t", args={}, output="alice and bob discussed lunch", + ) + with pytest.raises(GroundingError): + await graph.add_phenomenon( + source_agent="fs", category="x", title="fabricated", + verified_facts=[{ + "type": "raw", "value": "they planned the heist", + "invocation_id": inv_id, + }], + source_tool="t", + ) + + @pytest.mark.asyncio + async def test_invocation_from_other_agent_rejected(self, graph): + # record under a different agent + graph._current_agent = "registry" + inv_id = await graph.record_tool_invocation( + tool="parse_registry_key", args={}, output="HKLM\\System\\CCS", + ) + # now switch to fs and try to cite registry's invocation + graph._current_agent = "fs" + with pytest.raises(GroundingError) as exc: + await graph.add_phenomenon( + source_agent="fs", category="registry", title="hijacked", + verified_facts=[ + {"type": "raw", "value": "HKLM", "invocation_id": inv_id}, + ], + source_tool="parse_registry_key", + ) + assert "different agent" in str(exc.value) or "not '" in str(exc.value) + + @pytest.mark.asyncio + async def test_invocation_from_other_task_rejected(self, graph): + # Agent runs task A + graph._current_task_id = "task-A" + inv_a = await graph.record_tool_invocation( + tool="search_strings", args={"pattern": "foo"}, output="foo found at offset 1024", + ) + # Same agent runs task B — must not be allowed to forward task A's id + graph._current_task_id = "task-B" + with pytest.raises(GroundingError) as exc: + await graph.add_phenomenon( + source_agent="fs", category="filesystem", title="stale", + verified_facts=[ + {"type": "raw", "value": "foo found", "invocation_id": inv_a}, + ], + source_tool="search_strings", + ) + assert "different task" in str(exc.value) + + @pytest.mark.asyncio + async def test_missing_invocation_id_rejected(self, graph): + with pytest.raises(GroundingError): + await graph.add_phenomenon( + 
source_agent="fs", category="filesystem", title="ghost", + verified_facts=[{"type": "raw", "value": "x"}], + source_tool="t", + ) + + @pytest.mark.asyncio + async def test_unknown_invocation_id_rejected(self, graph): + with pytest.raises(GroundingError) as exc: + await graph.add_phenomenon( + source_agent="fs", category="filesystem", title="ghost", + verified_facts=[ + {"type": "raw", "value": "x", "invocation_id": "inv-does-not-exist"}, + ], + source_tool="t", + ) + assert "not found in invocation log" in str(exc.value) + + @pytest.mark.asyncio + async def test_empty_verified_facts_allowed_for_negative_findings(self, graph): + # A negative finding ("searched X, found nothing") is permitted — + # no fact is required, the interpretation alone is recorded. + pid, _ = await graph.add_phenomenon( + source_agent="fs", category="filesystem", + title="No matches for 'cain' in user dirs", + interpretation="Searched C:\\Documents and Settings — no hits.", + source_tool="search_strings", + ) + assert pid in graph.phenomena + assert graph.phenomena[pid].verified_facts == [] + + @pytest.mark.asyncio + async def test_persistence_round_trip_includes_invocations(self, graph, tmp_path): + inv_id = await graph.record_tool_invocation( + tool="list_directory", args={"inode": "1"}, output="root listing here", + ) + path = tmp_path / "state.json" + graph.save_state(path) + loaded = EvidenceGraph.load_state(path) + assert inv_id in loaded.tool_invocations + inv = loaded.tool_invocations[inv_id] + assert inv.tool == "list_directory" + assert inv.output == "root listing here" + + @pytest.mark.asyncio + async def test_legacy_description_loaded_as_interpretation(self): + # Pre-S2 state files persist description; from_dict must migrate. 
+ legacy = { + "id": "ph-legacy01", + "source_agent": "fs", + "category": "filesystem", + "title": "old finding", + "description": "this was the analysis text in the old schema", + } + ph = Phenomenon.from_dict(legacy) + assert ph.interpretation == "this was the analysis text in the old schema" + assert ph.verified_facts == [] + assert not hasattr(ph, "description") + + +# --------------------------------------------------------------------------- +# S3 log-odds confidence — order independence, calibration, idempotency +# --------------------------------------------------------------------------- + +class TestLogOddsConfidence: + """Verify the additive log-odds update fixes the P3 ordering bug and + matches DESIGN.md §4.5 calibration values. + """ + + @pytest.fixture + def graph(self): + return EvidenceGraph() + + def test_log_odds_round_trip(self): + for p in (0.1, 0.3, 0.5, 0.7, 0.9): + assert log_odds_to_prob(prob_to_log_odds(p)) == pytest.approx(p) + assert prob_to_log_odds(0.5) == pytest.approx(0.0) + assert log_odds_to_prob(0.0) == pytest.approx(0.5) + + @pytest.mark.asyncio + async def test_order_independence(self, graph): + """Same edges in different orders → same final confidence (P3 fix).""" + # Three independent runs, each applying the same edges in a different order. + confs = [] + for order in (["supports", "weakens", "supports"], + ["weakens", "supports", "supports"], + ["supports", "supports", "weakens"]): + g = EvidenceGraph() + hid = await g.add_hypothesis("h", "d") + for i, etype in enumerate(order): + pid, _ = await g.add_phenomenon( + "fs", "filesystem", f"ph {i}", f"interp {i}", + source_tool=f"t{i}", + ) + await g.update_hypothesis_confidence(hid, pid, etype, "") + confs.append(g.hypotheses[hid].confidence) + # All three orderings must agree exactly.
+ assert confs[0] == pytest.approx(confs[1]) + assert confs[1] == pytest.approx(confs[2]) + # And the value should be 1 + 1 − 0.5 = 1.5 (base-10 log-odds) → 10**1.5 / (1 + 10**1.5) ≈ 0.9694 + assert confs[0] == pytest.approx(0.9694, abs=1e-3) + + @pytest.mark.asyncio + async def test_each_edge_type_calibrated(self, graph): + """Each edge type produces the documented log_lr contribution.""" + expected = { + "direct_evidence": +2.0, + "supports": +1.0, + "consequence_observed": +1.0, + "prerequisite_met": +0.5, + "weakens": -0.5, + "contradicts": -2.0, + } + for etype, log_lr in expected.items(): + g = EvidenceGraph() + pid, _ = await g.add_phenomenon( + "fs", "filesystem", f"ph-{etype}", "interp", + source_tool="t", + ) + hid = await g.add_hypothesis("h", "d") + conf = await g.update_hypothesis_confidence(hid, pid, etype, "") + assert g.hypotheses[hid].log_odds == pytest.approx(log_lr) + assert conf == pytest.approx(log_odds_to_prob(log_lr)) + + @pytest.mark.asyncio + async def test_status_flips_at_threshold(self, graph): + hid = await graph.add_hypothesis("h", "d") + pid, _ = await graph.add_phenomenon( + "fs", "filesystem", "evidence", "interp", source_tool="t", + ) + # Single supports (+1.0) → conf ≈ 0.909 (above 0.8 threshold) + await graph.update_hypothesis_confidence(hid, pid, "supports", "") + assert graph.hypotheses[hid].status == "supported" + + # Now stack two contradicts (−2 ×2 = −4); log_odds = 1 − 4 = −3 → conf ≈ 0.001 + for i in range(2): + pid_n, _ = await graph.add_phenomenon( + "fs", "filesystem", f"contra {i}", f"d {i}", source_tool="t", + ) + await graph.update_hypothesis_confidence(hid, pid_n, "contradicts", "") + assert graph.hypotheses[hid].status == "refuted" + assert graph.hypotheses[hid].confidence < 0.05 + + @pytest.mark.asyncio + async def test_idempotency_same_triple(self, graph): + """Re-applying the same (ph, hyp, edge_type) triple must NOT double-count.""" + pid, _ = await graph.add_phenomenon( + "fs", "filesystem", "evidence", "interp", source_tool="t", + ) + hid = 
await graph.add_hypothesis("h", "d") + await graph.update_hypothesis_confidence(hid, pid, "supports", "first") + first = graph.hypotheses[hid].confidence + first_edges = sum(1 for e in graph.edges if e.edge_type == "supports") + + # Same triple again — should be a no-op (return current confidence). + again = await graph.update_hypothesis_confidence(hid, pid, "supports", "dup") + assert again == pytest.approx(first) + # No additional edge created. + assert sum(1 for e in graph.edges if e.edge_type == "supports") == first_edges + + @pytest.mark.asyncio + async def test_independent_evidence_accumulates(self, graph): + """Distinct phenomena with same edge_type DO accumulate (independent).""" + hid = await graph.add_hypothesis("h", "d") + for i in range(3): + pid, _ = await graph.add_phenomenon( + "fs", "filesystem", f"ph {i}", f"d {i}", source_tool="t", + ) + await graph.update_hypothesis_confidence(hid, pid, "supports", "") + # 3 × +1.0 = +3.0 log_odds → conf ≈ 0.999 + assert graph.hypotheses[hid].log_odds == pytest.approx(3.0) + assert graph.hypotheses[hid].confidence > 0.99 + @pytest.mark.asyncio + async def test_prior_prob_shifts_starting_log_odds(self, graph): + # prior 0.9 → log_odds ≈ +0.954 + hid = await graph.add_hypothesis("h", "d", prior_prob=0.9) + h = graph.hypotheses[hid] + assert h.log_odds == pytest.approx(0.9542, abs=1e-3) + assert h.confidence == pytest.approx(0.9) + + @pytest.mark.asyncio + async def test_legacy_hypothesis_migrates_log_odds(self): + """Pre-S3 state with confidence only: from_dict derives log_odds.""" + legacy = { + "id": "hyp-old", + "title": "legacy", + "description": "d", + "confidence": 0.8, + "status": "supported", + } + h = Hypothesis.from_dict(legacy) + assert h.log_odds == pytest.approx(prob_to_log_odds(0.8)) + assert h.confidence == pytest.approx(0.8) + assert h.prior_prob == 0.5 + + @pytest.mark.asyncio + async def test_invalid_edge_type_rejected(self, graph): + pid, _ = await graph.add_phenomenon("fs", "filesystem", "ph", 
"d", source_tool="t") + hid = await graph.add_hypothesis("h", "d") + with pytest.raises(ValueError): + await graph.update_hypothesis_confidence(hid, pid, "bogus_edge_type", "") + + @pytest.mark.asyncio + async def test_matrix_view_counts_edges(self, graph): + h1 = await graph.add_hypothesis("h1", "d") + h2 = await graph.add_hypothesis("h2", "d") + p1, _ = await graph.add_phenomenon("fs", "filesystem", "a", "x", source_tool="t") + p2, _ = await graph.add_phenomenon("fs", "filesystem", "b", "y", source_tool="t") + await graph.update_hypothesis_confidence(h1, p1, "supports", "") + await graph.update_hypothesis_confidence(h1, p2, "supports", "") + await graph.update_hypothesis_confidence(h2, p1, "contradicts", "") + + mat = graph.hypothesis_evidence_matrix() + assert mat["counts_by_edge_type"][h1]["supports"] == 2 + assert mat["counts_by_edge_type"][h2]["contradicts"] == 1 + md = graph.hypothesis_evidence_matrix_markdown() + assert "h1" in md and "h2" in md + assert "supports" in md and "contradicts" in md + + +# --------------------------------------------------------------------------- +# S4 iOS plugin — archive + plist + sqlite + idevice + agent routing +# --------------------------------------------------------------------------- + +class TestArchiveAndIOSPlugin: + """End-to-end smoke tests for the iOS toolset. + + Synthesize a tiny iOS-like extraction in tmp_path (XML plist, binary + plist, iDevice_info.txt, a sqlite db, a keychain-shaped sqlite), + then drive every new tool against it. No network, no external + binaries — stdlib zipfile + plistlib + sqlite3 only. 
+ """ + + @pytest.fixture + def fake_extraction_zip(self, tmp_path): + """Build a small zip that mimics an iOS extraction.""" + import plistlib + import sqlite3 + import zipfile + + staging = tmp_path / "staging" + staging.mkdir() + + # XML plist + xml_plist = staging / "Library" / "Preferences" / "com.example.plist" + xml_plist.parent.mkdir(parents=True) + with open(xml_plist, "wb") as f: + plistlib.dump({"DeviceName": "iPhone-Test", "UDID": "abcd1234"}, f) + + # Binary plist + bin_plist = staging / "var" / "Info.plist" + bin_plist.parent.mkdir(parents=True) + with open(bin_plist, "wb") as f: + plistlib.dump( + {"ProductVersion": "16.5", "BuildVersion": "20F66"}, + f, fmt=plistlib.FMT_BINARY, + ) + + # iDevice_info.txt + idinfo = staging / "iDevice_info.txt" + idinfo.write_text( + "DeviceName: iPhone-Test\n" + "ProductType: iPhone14,5\n" + "UniqueDeviceID: abcd1234ef567890\n" + ) + + # Generic SMS-like sqlite db + sms_db = staging / "HomeDomain" / "Library" / "SMS" / "sms.db" + sms_db.parent.mkdir(parents=True) + conn = sqlite3.connect(sms_db) + conn.executescript( + """ + CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT, handle_id INTEGER); + CREATE TABLE handle (id INTEGER PRIMARY KEY, contact TEXT); + INSERT INTO message VALUES (1, 'meet at 8', 100); + INSERT INTO message VALUES (2, 'ok', 101); + INSERT INTO handle VALUES (100, '+85291234567'); + INSERT INTO handle VALUES (101, '+85298765432'); + """ + ) + conn.commit() + conn.close() + + # Keychain-shaped sqlite at the canonical path + kc_db = staging / "var" / "keychains" / "keychain-2.db" + kc_db.parent.mkdir(parents=True) + conn = sqlite3.connect(kc_db) + conn.executescript( + """ + CREATE TABLE genp (agrp TEXT, acct TEXT, svce TEXT, data BLOB); + INSERT INTO genp VALUES ('com.apple.test', 'alice', 'AppleID', NULL); + INSERT INTO genp VALUES ('com.apple.test', 'bob', 'iCloud', NULL); + CREATE TABLE inet (agrp TEXT, acct TEXT, srvr TEXT); + INSERT INTO inet VALUES ('com.apple.test', 'alice', 
'gmail.com'); + """ + ) + conn.commit() + conn.close() + + # Zip it + zpath = tmp_path / "fake_ios.zip" + with zipfile.ZipFile(zpath, "w", zipfile.ZIP_DEFLATED) as zf: + for p in staging.rglob("*"): + if p.is_file(): + zf.write(p, p.relative_to(staging)) + return zpath + + @pytest.mark.asyncio + async def test_unzip_archive_extracts_safely(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + dest = tmp_path / "out" + result = await unzip_archive(str(fake_extraction_zip), str(dest)) + assert "Extracted" in result + assert (dest / "iDevice_info.txt").is_file() + assert (dest / "HomeDomain" / "Library" / "SMS" / "sms.db").is_file() + assert (dest / "var" / "keychains" / "keychain-2.db").is_file() + + @pytest.mark.asyncio + async def test_unzip_archive_blocks_zip_slip(self, tmp_path): + import zipfile + from tools.archive import unzip_archive + malicious = tmp_path / "evil.zip" + with zipfile.ZipFile(malicious, "w") as zf: + zf.writestr("../../escape.txt", "should not land here") + zf.writestr("/abs/path.txt", "neither should this") + zf.writestr("clean.txt", "this is fine") + dest = tmp_path / "out" + result = await unzip_archive(str(malicious), str(dest)) + assert "Skipped" in result + # Only the clean entry made it in. 
+ assert (dest / "clean.txt").is_file() + assert not (tmp_path / "escape.txt").exists() + assert not (tmp_path.parent / "escape.txt").exists() + + @pytest.mark.asyncio + async def test_encrypted_zip_without_password_returns_clear_error(self, tmp_path): + import zipfile + from tools.archive import unzip_archive + enc = tmp_path / "locked.zip" + with zipfile.ZipFile(enc, "w") as zf: + zf.writestr("secret.txt", "do not read") + for info in zf.infolist(): + info.flag_bits |= 0x1 # mark as encrypted (ZipCrypto-style) + result = await unzip_archive(str(enc), str(tmp_path / "out")) + assert result.startswith("Error:") + assert "password-protected" in result + assert "meta.password" in result + + # NB: there's no roundtrip extract-with-correct-password test because + # stdlib zipfile cannot WRITE encrypted archives, so we can't synthesize + # one in tmp_path without a third-party writer. The encrypted-without- + # password path above exercises the new error branch; the happy-path + # password decryption is best verified end-to-end against a real + # password-protected source from case.yaml. + + @pytest.mark.asyncio + async def test_unzip_archive_idempotent(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + dest = tmp_path / "out" + r1 = await unzip_archive(str(fake_extraction_zip), str(dest)) + r2 = await unzip_archive(str(fake_extraction_zip), str(dest)) + assert "Extracted" in r1 + # Second pass should report zero new extractions. 
+ assert "Extracted 0 file(s)" in r2 + + @pytest.mark.asyncio + async def test_parse_plist_xml_and_binary(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.mobile_ios import parse_plist + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + + xml = await parse_plist(str(dest / "Library" / "Preferences" / "com.example.plist")) + assert "iPhone-Test" in xml + assert "abcd1234" in xml + + bin_ = await parse_plist(str(dest / "var" / "Info.plist")) + assert "16.5" in bin_ + assert "20F66" in bin_ + + @pytest.mark.asyncio + async def test_sqlite_tables_and_query(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.mobile_ios import sqlite_tables, sqlite_query + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + sms = dest / "HomeDomain" / "Library" / "SMS" / "sms.db" + + tables = await sqlite_tables(str(sms)) + assert "message" in tables and "handle" in tables + assert "2 row(s)" in tables # message + handle each have 2 rows + + rows = await sqlite_query(str(sms), "SELECT text FROM message", max_rows=10) + assert "meet at 8" in rows + assert "ok" in rows + + @pytest.mark.asyncio + async def test_sqlite_query_rejects_non_select(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.mobile_ios import sqlite_query + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + sms = dest / "HomeDomain" / "Library" / "SMS" / "sms.db" + result = await sqlite_query(str(sms), "DELETE FROM message") + assert "only single SELECT" in result + # And a multi-statement chain is also blocked + result2 = await sqlite_query( + str(sms), "SELECT 1; SELECT 2" + ) + assert "multi-statement" in result2 + + @pytest.mark.asyncio + async def test_parse_ios_keychain(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.mobile_ios import 
parse_ios_keychain + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + # Pass the containing dir — the parser auto-locates keychain-2.db. + result = await parse_ios_keychain(str(dest / "var" / "keychains")) + assert "genp" in result + assert "alice" in result + assert "AppleID" in result + assert "inet" in result + + @pytest.mark.asyncio + async def test_read_idevice_info(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.mobile_ios import read_idevice_info + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + # Pass the root — the helper finds iDevice_info.txt inside. + result = await read_idevice_info(str(dest)) + assert "DeviceName: iPhone-Test" in result + assert "iPhone14,5" in result + + @pytest.mark.asyncio + async def test_list_extracted_dir_summarises_huge_tree(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.parsers import list_extracted_dir + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + summary = await list_extracted_dir(str(dest)) + # Smart summary: total, extension breakdown, layout, largest. + assert "Total:" in summary + assert "Extension breakdown" in summary + assert "Top-level layout" in summary + # The iOS fixture has HomeDomain + var + Library at the top. + assert "HomeDomain" in summary or "Library" in summary or "var" in summary + # And the breakdown lists sqlite/plist extensions. + assert ".plist" in summary + # Steers the agent toward find_files for targeted lookups. + assert "find_files" in summary + + @pytest.mark.asyncio + async def test_find_files_locates_specific_artefacts(self, fake_extraction_zip, tmp_path): + from tools.archive import unzip_archive + from tools.parsers import find_files + dest = tmp_path / "out" + await unzip_archive(str(fake_extraction_zip), str(dest)) + + # Glob for sms.db anywhere — the iOS fixture has it under HomeDomain. 
+ sms_hits = await find_files(str(dest), "**/sms.db") + assert "sms.db" in sms_hits + assert "matches: 1" in sms_hits + + # Plists at any depth. + plist_hits = await find_files(str(dest), "**/*.plist") + # The fixture writes com.example.plist + Info.plist. + assert "matches: 2" in plist_hits or "matches: 3" in plist_hits + + # Anchored glob: only entries under var/. + var_hits = await find_files(str(dest), "var/**") + assert "matches:" in var_hits + # Keychain DB is under var/keychains in the fixture, so it shows up. + assert "keychain-2.db" in var_hits + + # A pattern that matches nothing returns the empty-result message. + none = await find_files(str(dest), "**/nonexistent_pattern_xyz") + assert "no matches" in none + + +class TestAgentFactoryRouting: + """SOURCE_TYPE_AGENTS maps source.type to the right triage agent.""" + + def test_disk_image_routes_to_filesystem(self): + from agent_factory import get_triage_agent_type + assert get_triage_agent_type("disk_image") == "filesystem" + + def test_mobile_extraction_routes_to_ios_artifact(self): + from agent_factory import get_triage_agent_type + assert get_triage_agent_type("mobile_extraction") == "ios_artifact" + + def test_unknown_type_falls_back_to_filesystem(self): + from agent_factory import get_triage_agent_type + assert get_triage_agent_type("totally_unknown_type") == "filesystem" + + def test_ios_artifact_class_is_registered(self): + from agent_factory import _AGENT_CLASSES, _load_agent_classes + from agents.ios_artifact import IOSArtifactAgent + _load_agent_classes() + assert _AGENT_CLASSES["ios_artifact"] is IOSArtifactAgent + + +# --------------------------------------------------------------------------- +# S5 cross-source entity coreference (DESIGN.md §4.6) +# --------------------------------------------------------------------------- + +class TestEntityCoref: + """Identity observation + automatic coref hypothesis + same_as edges. 
+ + Setup helper: every test records an invocation whose output literally + contains the identifier values being asserted, so the grounding gateway + accepts the observe_identity calls. + """ + + @pytest.fixture + def graph(self): + g = EvidenceGraph() + g._current_agent = "tester" + g._current_task_id = "task-coref" + return g + + async def _record(self, graph, output: str) -> str: + return await graph.record_tool_invocation( + tool="probe", args={}, output=output, + ) + + @pytest.mark.asyncio + async def test_strong_shared_identifier_creates_supported_coref(self, graph): + inv = await self._record( + graph, "Found email alice@example.com on system A\nAlso alice@example.com on system B", + ) + # Two entities (different sources, different names) sharing an email. + r1 = await graph.observe_identity( + entity_name="alice@example.com (laptop)", entity_type="person", + identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + r2 = await graph.observe_identity( + entity_name="alice@example.com (phone)", entity_type="person", + identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + # Second observation should fire a coref proposal. + assert r2["coref_proposals"], "expected a coref proposal on the second observation" + prop = r2["coref_proposals"][0] + # log_odds = prior(0.1)→-0.954 + shared_strong(+2.0) = +1.046 → conf ≈ 0.917 + assert prop["confidence"] > 0.8 + # And a same_as edge between the two entities is now active. 
+ e1, e2 = r1["entity_id"], r2["entity_id"] + cluster = graph.resolve_actor_cluster(e1) + assert e2 in cluster and e1 in cluster + + @pytest.mark.asyncio + async def test_weak_identifier_alone_is_not_enough(self, graph): + inv = await self._record(graph, "nickname mr_evil appears here and there") + r1 = await graph.observe_identity( + entity_name="user-A", entity_type="person", + identifier_type="nickname", value="mr_evil", + source_agent="tester", invocation_id=inv, + ) + r2 = await graph.observe_identity( + entity_name="user-B", entity_type="person", + identifier_type="nickname", value="mr_evil", + source_agent="tester", invocation_id=inv, + ) + prop = r2["coref_proposals"][0] + # log_odds = -0.954 + shared_weak(+0.5) = -0.454 → conf ≈ 0.260 + assert prop["confidence"] < 0.8 + # No active same_as edge between them. + cluster = graph.resolve_actor_cluster(r1["entity_id"]) + assert r2["entity_id"] not in cluster + + @pytest.mark.asyncio + async def test_conflicting_strong_identifier_blocks_coref(self, graph): + # Two entities share a nickname but each has a DIFFERENT strong email. 
+ inv = await self._record( + graph, + "user-A: nickname mr_evil, email alice@example.com\n" + "user-B: nickname mr_evil, email bob@other.org", + ) + await graph.observe_identity( + entity_name="user-A", entity_type="person", + identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + await graph.observe_identity( + entity_name="user-B", entity_type="person", + identifier_type="email", value="bob@other.org", + source_agent="tester", invocation_id=inv, + ) + r1 = await graph.observe_identity( + entity_name="user-A", entity_type="person", + identifier_type="nickname", value="mr_evil", + source_agent="tester", invocation_id=inv, + ) + r2 = await graph.observe_identity( + entity_name="user-B", entity_type="person", + identifier_type="nickname", value="mr_evil", + source_agent="tester", invocation_id=inv, + ) + # r2's nickname triggers blocking + conflict detection. + prop = r2["coref_proposals"][0] + # +shared_weak (0.5) + −conflicting_strong (-2.0) = net -1.5 + # from prior -0.954 → L = -2.454 → conf ≈ 0.0035 (refuted) + assert prop["confidence"] < 0.05 + assert prop["conflicts"], "expected conflicting email to be flagged" + # No same_as edge. 
+ cluster = graph.resolve_actor_cluster(r1["entity_id"]) + assert r2["entity_id"] not in cluster + + @pytest.mark.asyncio + async def test_observe_identity_grounding_rejects_fabrication(self, graph): + inv = await self._record(graph, "only this exact string is allowed") + with pytest.raises(GroundingError): + await graph.observe_identity( + entity_name="ghost", entity_type="person", + identifier_type="email", value="not-in-output@nope.com", + source_agent="tester", invocation_id=inv, + ) + + @pytest.mark.asyncio + async def test_unknown_identifier_type_rejected(self, graph): + inv = await self._record(graph, "value-here") + with pytest.raises(ValueError): + await graph.observe_identity( + entity_name="x", entity_type="person", + identifier_type="not_a_real_type", value="value-here", + source_agent="tester", invocation_id=inv, + ) + + @pytest.mark.asyncio + async def test_phone_number_normalization(self, graph): + # Two formats of the same number should collide. + inv = await self._record( + graph, "saw +852 9123-4567 on A and 85291234567 on B", + ) + r1 = await graph.observe_identity( + entity_name="caller-A", entity_type="person", + identifier_type="phone_number", value="+852 9123-4567", + source_agent="tester", invocation_id=inv, + ) + r2 = await graph.observe_identity( + entity_name="caller-B", entity_type="person", + identifier_type="phone_number", value="85291234567", + source_agent="tester", invocation_id=inv, + ) + assert r2["coref_proposals"], "phone normalization should match" + assert r2["coref_proposals"][0]["confidence"] > 0.8 + + @pytest.mark.asyncio + async def test_repeating_identifier_is_idempotent(self, graph): + inv = await self._record(graph, "alice@example.com everywhere") + r1 = await graph.observe_identity( + entity_name="alice", entity_type="person", + identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + r2 = await graph.observe_identity( + entity_name="alice", entity_type="person", + 
identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + # Same entity, same identifier — second call is a no-op. + assert r1["entity_id"] == r2["entity_id"] + assert r2["new_identifier"] is False + ent = graph.entities[r1["entity_id"]] + assert len([i for i in ent.identifiers if i["type"] == "email"]) == 1 + + @pytest.mark.asyncio + async def test_coref_is_reversible_via_contradicting_evidence(self, graph): + # Establish a supported coref via shared email. + inv = await self._record( + graph, "shared email alice@example.com on A and B", + ) + r1 = await graph.observe_identity( + entity_name="A", entity_type="person", + identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + r2 = await graph.observe_identity( + entity_name="B", entity_type="person", + identifier_type="email", value="alice@example.com", + source_agent="tester", invocation_id=inv, + ) + hid = r2["coref_proposals"][0]["hypothesis_id"] + assert graph.hypotheses[hid].confidence > 0.8 + # Now a separate phenomenon contradicts the coref directly. + contra_pid, _ = await graph.add_phenomenon( + source_agent="tester", category="identity_observation", + title="A and B were online simultaneously on different devices", + interpretation="they cannot be the same actor", + source_tool="tester", + ) + await graph.update_hypothesis_confidence( + hid, contra_pid, "contradicts", "different physical presence", + ) + # log_odds was ~+1.046; contradicts (-2.0) drops to -0.954 → conf 0.10 + assert graph.hypotheses[hid].confidence < 0.2 + # same_as edge between the entities is now inactive — cluster shrinks. + cluster = graph.resolve_actor_cluster(r1["entity_id"]) + assert r2["entity_id"] not in cluster + # The audit edge still exists, just inactive. 
+ edges = [ + e for e in graph.edges + if e.edge_type == "same_as" + and {e.source_id, e.target_id} == {r1["entity_id"], r2["entity_id"]} + ] + assert len(edges) == 1 + assert edges[0].metadata.get("active") is False + + @pytest.mark.asyncio + async def test_blocking_avoids_n_squared(self, graph): + # Build 30 entities with NO shared/similar identifiers — verify no + # coref hypothesis is created (no O(n²) hyp explosion). + for i in range(30): + inv = await self._record(graph, f"unique-email-{i}@example.com") + await graph.observe_identity( + entity_name=f"person-{i}", entity_type="person", + identifier_type="email", value=f"unique-email-{i}@example.com", + source_agent="tester", invocation_id=inv, + ) + coref_hyps = [ + h for h in graph.hypotheses.values() + if h.id.startswith("hyp-coref-") + ] + assert coref_hyps == [], "blocking should prevent O(n²) coref hypotheses" + + @pytest.mark.asyncio + async def test_actor_cluster_spans_three_entities(self, graph): + # A ≡ B via email, B ≡ C via phone → transitively, all three cluster. 
+ inv = await self._record( + graph, + "A: alice@a.com\n" + "B: alice@a.com, phone 12345\n" + "C: phone 12345", + ) + rA = await graph.observe_identity( + entity_name="A", entity_type="person", + identifier_type="email", value="alice@a.com", + source_agent="tester", invocation_id=inv, + ) + rB1 = await graph.observe_identity( + entity_name="B", entity_type="person", + identifier_type="email", value="alice@a.com", + source_agent="tester", invocation_id=inv, + ) + rB2 = await graph.observe_identity( + entity_name="B", entity_type="person", + identifier_type="phone_number", value="12345", + source_agent="tester", invocation_id=inv, + ) + rC = await graph.observe_identity( + entity_name="C", entity_type="person", + identifier_type="phone_number", value="12345", + source_agent="tester", invocation_id=inv, + ) + cluster = graph.resolve_actor_cluster(rA["entity_id"]) + assert {rA["entity_id"], rB1["entity_id"], rC["entity_id"]} <= cluster + + @pytest.mark.asyncio + async def test_actor_clusters_renders_via_report_tool(self, graph): + """Report agent's get_actor_clusters renders cluster + identifiers.""" + from agents.report import ReportAgent + from unittest.mock import AsyncMock + inv = await self._record(graph, "shared email alice@a.com on both A and B") + await graph.observe_identity( + entity_name="A", entity_type="person", + identifier_type="email", value="alice@a.com", + source_agent="tester", invocation_id=inv, + ) + await graph.observe_identity( + entity_name="B", entity_type="person", + identifier_type="email", value="alice@a.com", + source_agent="tester", invocation_id=inv, + ) + # Drive the read tool directly (no LLM in the loop for this assertion). 
+ agent = ReportAgent(AsyncMock(), graph) + rendered = await agent._get_actor_clusters() + assert "MULTI-SOURCE CLUSTER" in rendered + assert "alice@a.com" in rendered + assert "Backing coref hypotheses" in rendered + + @pytest.mark.asyncio + async def test_actor_clusters_groups_all_entities(self, graph): + # Two distinct clusters: A≡B, and C alone. + inv = await self._record(graph, "A=alice@a.com B=alice@a.com C=carol@c.com") + await graph.observe_identity( + entity_name="A", entity_type="person", + identifier_type="email", value="alice@a.com", + source_agent="tester", invocation_id=inv, + ) + await graph.observe_identity( + entity_name="B", entity_type="person", + identifier_type="email", value="alice@a.com", + source_agent="tester", invocation_id=inv, + ) + await graph.observe_identity( + entity_name="C", entity_type="person", + identifier_type="email", value="carol@c.com", + source_agent="tester", invocation_id=inv, + ) + clusters = graph.actor_clusters() + # Two clusters (size 2 + size 1) — exact membership. + sizes = sorted(len(c["members"]) for c in clusters) + assert sizes == [1, 2] + # And each cluster aggregates its identifiers. 
+ for c in clusters: + assert any(i["type"] == "email" for i in c["identifiers"]) + + +# --------------------------------------------------------------------------- +# S6 Android + Media + multi-source orchestration +# --------------------------------------------------------------------------- + +class TestAndroidPartitionProbe: + """probe_android_partitions parses mmls output and translates sector units.""" + + def test_parse_mmls_with_4096_byte_sectors(self): + from tools.mobile_android import _parse_mmls_with_unit + sample = ( + "GUID Partition Table (EFI)\n" + "Offset Sector: 0\n" + "Units are in 4096-byte sectors\n" + "\n" + " Slot Start End Length Description\n" + "000: Meta 0000000000 0000000000 0000000001 Safety Table\n" + "001: ------- 0000000000 0000001023 0000001024 Unallocated\n" + "002: Meta 0000000001 0000000001 0000000001 GPT Header\n" + "003: Meta 0000000002 0000000005 0000000004 Partition Table\n" + "004: 000 0000001024 0000002047 0000001024 BOTA0\n" + "017: 013 0000048128 0001148927 0001100800 SYSTEM\n" + "021: 017 0001203968 0007806719 0006602752 USERDATA\n" + ) + sector_size, parts = _parse_mmls_with_unit(sample) + assert sector_size == 4096 + # Three real partition rows (rest are Meta / Unallocated). + assert len(parts) == 3 + # Sector translation: BOTA0 at native 1024 (4K sectors) = 4194304 bytes = sector 8192 (512-byte). + bota0 = parts[0] + assert bota0["start_native"] == 1024 + # The probe formats the 512-sector explicitly when emitting markdown + # (test that separately on the live image); here just sanity-check parse. 
+ assert bota0["description"] == "BOTA0" + + def test_parse_mmls_with_512_byte_sectors(self): + from tools.mobile_android import _parse_mmls_with_unit + sample = ( + "DOS Partition Table\n" + "Offset Sector: 0\n" + "Units are in 512-byte sectors\n" + "\n" + "002: 000 0000000063 0019999999 0019999937 NTFS\n" + ) + sector_size, parts = _parse_mmls_with_unit(sample) + assert sector_size == 512 + assert len(parts) == 1 + assert parts[0]["start_native"] == 63 + + +class TestMediaOCR: + """ocr_image returns a clear install hint when tesseract is missing.""" + + @pytest.mark.asyncio + async def test_missing_runtime_returns_install_hint(self, tmp_path): + from tools.media import ocr_image, _has_ocr_runtime + # On a host without the runtime, the tool should not raise — it + # should return an Error: prefixed string the agent can record as + # a negative finding. + available, _reason = _has_ocr_runtime() + if available: + pytest.skip("OCR runtime is installed on this host") + # Create a placeholder file so the path-existence check passes.
+ fake = tmp_path / "fake.jpg" + fake.write_bytes(b"\xff\xd8\xff\xe0not-a-real-image") + result = await ocr_image(str(fake)) + assert result.startswith("Error: OCR runtime not available") + assert "pip install pytesseract pillow" in result + assert "tesseract-ocr" in result + + @pytest.mark.asyncio + async def test_missing_file_returns_clear_error(self, tmp_path): + from tools.media import ocr_image + result = await ocr_image(str(tmp_path / "no-such-file.jpg")) + assert result.startswith("Error: ") + assert "is not a file" in result + + +class TestSetActivePartition: + """set_active_partition mutates graph.active_source.partition_offset.""" + + @pytest.mark.asyncio + async def test_mutates_offset(self): + from case import EvidenceSource + from tool_registry import register_all_tools, TOOL_CATALOG + graph = EvidenceGraph() + graph.active_source = EvidenceSource( + id="src-test", label="test", type="disk_image", + path="/tmp/whatever", access_mode="image", partition_offset=0, + ) + graph.partition_offset = 0 + register_all_tools(graph) + td = TOOL_CATALOG["set_active_partition"] + # set_active_partition output is wrapped with [invocation: inv-xxx]\n + result = await td.executor(partition_offset=614400) + assert "0 → 614400" in result + assert graph.active_source.partition_offset == 614400 + # Legacy mirror field kept in sync too. 
+ assert graph.partition_offset == 614400 + + +class TestPlatformRouting: + """get_triage_agent_type distinguishes Windows vs Android disk_images.""" + + def test_disk_image_default_falls_back_to_filesystem(self): + from agent_factory import get_triage_agent_type + from case import EvidenceSource + src = EvidenceSource( + id="x", label="x", type="disk_image", path="/x", + access_mode="image", + ) + assert get_triage_agent_type(src) == "filesystem" + + def test_disk_image_windows_routes_to_filesystem(self): + from agent_factory import get_triage_agent_type + from case import EvidenceSource + src = EvidenceSource( + id="x", label="x", type="disk_image", path="/x", + access_mode="image", meta={"platform": "windows"}, + ) + assert get_triage_agent_type(src) == "filesystem" + + def test_disk_image_android_routes_to_android_artifact(self): + from agent_factory import get_triage_agent_type + from case import EvidenceSource + src = EvidenceSource( + id="x", label="x", type="disk_image", path="/x", + access_mode="image", meta={"platform": "android"}, + ) + assert get_triage_agent_type(src) == "android_artifact" + + def test_media_collection_routes_to_media(self): + from agent_factory import get_triage_agent_type + from case import EvidenceSource + src = EvidenceSource( + id="x", label="x", type="media_collection", path="/x", + access_mode="tree", + ) + assert get_triage_agent_type(src) == "media" + + def test_string_signature_back_compat(self): + # The S5 signature accepted a plain source.type string. 
+ from agent_factory import get_triage_agent_type + assert get_triage_agent_type("mobile_extraction") == "ios_artifact" + assert get_triage_agent_type("disk_image") == "filesystem" + + def test_android_and_media_classes_registered(self): + from agent_factory import _AGENT_CLASSES, _load_agent_classes + from agents.android_artifact import AndroidArtifactAgent + from agents.media import MediaAgent + _load_agent_classes() + assert _AGENT_CLASSES["android_artifact"] is AndroidArtifactAgent + assert _AGENT_CLASSES["media"] is MediaAgent + + +class TestOrchestratorMultiSource: + """Phase 1 iterates over every analysable source in the case.""" + + @pytest.mark.asyncio + async def test_phase1_dispatches_one_agent_per_source(self): + from unittest.mock import AsyncMock + from agent_factory import AgentFactory + from case import Case, EvidenceSource + from orchestrator import Orchestrator + + graph = EvidenceGraph() + case = Case( + case_id="multi", + name="multi-source", + sources=[ + EvidenceSource( + id="src-win", label="USB", type="disk_image", path="/tmp/usb", + access_mode="image", meta={"platform": "windows"}, + ), + EvidenceSource( + id="src-ios", label="iPhone", type="mobile_extraction", + path="/tmp/ios-tree", access_mode="tree", + ), + EvidenceSource( + id="src-droid", label="Android", type="disk_image", + path="/tmp/droid", access_mode="image", + meta={"platform": "android"}, + ), + ], + ) + graph.case = case + graph.set_active_source(case.sources[0]) + + invoked: list[tuple[str, str]] = [] + + class FakeAgent: + def __init__(self, name): self.name = name + async def run(self, task, lead_id=None): + invoked.append((self.name, graph.active_source.id)) + + llm = AsyncMock() + factory = AgentFactory(llm, graph) + factory._cache = { + "filesystem": FakeAgent("filesystem"), + "ios_artifact": FakeAgent("ios_artifact"), + "android_artifact": FakeAgent("android_artifact"), + } + orch = Orchestrator(llm, graph, factory) + + sources = orch._sources_to_triage() + for src in 
sources: + await orch._phase1_triage_source(src) + + # Each source got triaged by exactly the right agent, with + # active_source pointed at it during the run. + assert invoked == [ + ("filesystem", "src-win"), + ("ios_artifact", "src-ios"), + ("android_artifact", "src-droid"), + ] + + def test_is_analysable_filters_correctly(self): + from case import EvidenceSource + from orchestrator import Orchestrator + ok_disk = EvidenceSource(id="a", label="", type="disk_image", path="/x", access_mode="image") + ok_ios = EvidenceSource(id="b", label="", type="mobile_extraction", path="/y", access_mode="tree") + ok_archive = EvidenceSource(id="c", label="", type="archive", path="/z", access_mode="tree") + ok_media = EvidenceSource(id="d", label="", type="media_collection", path="/w", access_mode="tree") + no_path = EvidenceSource(id="e", label="", type="disk_image", path="", access_mode="image") + assert Orchestrator._is_analysable(ok_disk) + assert Orchestrator._is_analysable(ok_ios) + assert Orchestrator._is_analysable(ok_archive) + assert Orchestrator._is_analysable(ok_media) + assert not Orchestrator._is_analysable(no_path) + diff --git a/tool_registry.py b/tool_registry.py index 63b0239..87d5d67 100644 --- a/tool_registry.py +++ b/tool_registry.py @@ -1,6 +1,8 @@ """Central tool registry — catalogs all available forensic tools. -Tools are registered once at startup with bound image_path and offset. +Tools are registered once at startup. Sleuth Kit tools resolve their image +path and partition offset from graph.active_source at call time, so a single +registered tool follows whichever evidence source is currently active. The AgentFactory uses this catalog to compose agents dynamically. 
""" @@ -14,6 +16,11 @@ import re from dataclasses import dataclass, field from typing import Any +from evidence_graph import GroundingError +from tools import archive as arc +from tools import media as med +from tools import mobile_android as android +from tools import mobile_ios as ios from tools import parsers from tools import registry as reg from tools import sleuthkit as tsk @@ -35,6 +42,13 @@ CACHEABLE_TOOLS: set[str] = { "parse_registry_key", "search_registry", "get_user_activity", "read_text_file", "read_binary_preview", "search_text_file", "read_text_file_section", "list_extracted_dir", "parse_pcap_strings", + "find_files", + # iOS (read-only file parses): + "parse_plist", "sqlite_tables", "sqlite_query", + "parse_ios_keychain", "read_idevice_info", + # Android + media (read-only): + "probe_android_partitions", "ocr_image", + # NB: unzip_archive and set_active_partition are NOT cached — they have side effects. } @@ -45,24 +59,106 @@ def _cache_key(tool_name: str, kwargs: dict) -> str: return f"{tool_name}:{args_hash}" +def _looks_like_error(text: str) -> bool: + """Heuristic for unsuccessful tool output (mirrors the prior cache filter).""" + return text.startswith("Error") or text.startswith("[Command failed") or text.startswith("[icat failed") + + def _make_cached(tool_name: str, executor: Any) -> Any: - """Wrap an executor with an in-memory result cache.""" + """Thin in-memory cache wrapper around a tool executor. + + Kept as a standalone primitive (no graph dependency) so unit tests can + exercise caching in isolation. Production wiring composes this with + invocation logging via :func:`_make_invocation_executor`. 
+ """ async def wrapper(**kwargs) -> str: key = _cache_key(tool_name, kwargs) - cached = _tool_result_cache.get(key) - if cached is not None: - logger.debug("Cache hit: %s(%s)", tool_name, kwargs) - return cached + hit = _tool_result_cache.get(key) + if hit is not None: + return hit result = await executor(**kwargs) - # Only cache successful results (not errors) - if not result.startswith("Error") and not result.startswith("[Command failed"): + if not _looks_like_error(result): _tool_result_cache[key] = result return result return wrapper +def _make_invocation_executor( + tool_name: str, + executor: Any, + graph: Any, + *, + cacheable: bool, + auto_record_category: str | None = None, +) -> Any: + """Single uniform wrapper around a forensic tool executor. + + Responsibilities (in order): + 1. Serve from the result cache when ``cacheable=True`` and the key + is hot. Cached hits still produce a fresh ToolInvocation record + marked ``cached=True`` so the agent can cite their work. + 2. Call the underlying executor on cache miss; store on success. + 3. Record a :class:`ToolInvocation` on the graph (this is the + provenance unit the grounding gateway looks up). + 4. (Optionally) auto-record the raw output as a Phenomenon with a + single ``type=raw`` fact citing the invocation just made. This + replaces the pre-S2 ``_make_auto_record`` shortcut. + 5. Return the result with a ``[invocation: inv-xxx]`` header so + the LLM learns the ID to put in ``add_phenomenon`` facts. 
+ """ + + async def wrapper(**kwargs) -> str: + cached_flag = False + cache_hit_key: str | None = None + text: str | None = None + + if cacheable: + cache_hit_key = _cache_key(tool_name, kwargs) + hit = _tool_result_cache.get(cache_hit_key) + if hit is not None: + logger.debug("Cache hit: %s(%s)", tool_name, kwargs) + text, cached_flag = hit, True + + if text is None: + text = await executor(**kwargs) + if cacheable and cache_hit_key and not _looks_like_error(text): + _tool_result_cache[cache_hit_key] = text + + inv_id = await graph.record_tool_invocation( + tool=tool_name, args=kwargs, output=text, cached=cached_flag, + ) + + # Auto-record the raw output as a phenomenon (single grounded fact). + # Skipped on error outputs and for tools with no auto-record category. + if auto_record_category and not _looks_like_error(text): + agent = getattr(graph, "_current_agent", "") or "unknown" + first_line = text.split("\n", 1)[0][:80] + try: + await graph.add_phenomenon( + source_agent=agent, + category=auto_record_category, + title=f"{tool_name}: {first_line}", + interpretation="(auto-recorded raw tool output)", + verified_facts=[{ + "type": "raw", + "value": text[:2000], + "invocation_id": inv_id, + }], + source_tool=tool_name, + ) + except GroundingError as e: + # Should never happen for auto-record (we just wrote the + # invocation; value is a literal prefix of output). Log + # loudly if it does — that's a bug, not a hallucination.
+ logger.error("Auto-record grounding failed for %s: %s", tool_name, e) + + return f"[invocation: {inv_id}]\n{text}" + + return wrapper + + def get_cache_stats() -> dict[str, int]: """Return cache statistics for diagnostics.""" return {"entries": len(_tool_result_cache)} @@ -77,12 +173,11 @@ ASSET_CATEGORIES = [ ] -def _auto_categorize(filename: str) -> str: - """Infer asset category from filename.""" +def _auto_categorize_windows(filename: str) -> str: + """Original Windows-leaning heuristic for disk-image-extracted artifacts.""" name_lower = filename.lower() ext = os.path.splitext(name_lower)[1] - # Check full name (with extension) and base name against known hive names if name_lower in _REGISTRY_HIVE_NAMES: return "registry_hive" if ext == ".pf": @@ -93,7 +188,7 @@ def _auto_categorize(filename: str) -> str: return "address_book" if name_lower == "info2" or re.match(r"dc\d+\.exe", name_lower): return "recycle_bin" - # Extension-based checks before keyword-based (e.g. mirc.ini → config, not chat) + # Extension-based checks before keyword-based (e.g. mirc.ini → config, not chat). if ext in (".ini", ".csv", ".dat", ".cfg"): return "config_file" if ext in (".log", ".lst"): @@ -107,6 +202,49 @@ def _auto_categorize(filename: str) -> str: return "other" +def _auto_categorize_ios(filename: str) -> str: + """iOS extraction heuristic — plist / sqlite / keychain land here. + + Domain-rooted iOS extractions yield specific filenames (sms.db, + AddressBook.sqlitedb, keychain-2.db, *.plist) that the Windows + categorizer would dump into 'other' — fixing P4. 
+ """ + name_lower = filename.lower() + ext = os.path.splitext(name_lower)[1] + + if name_lower == "keychain-2.db": + return "ios_keychain" + if name_lower in ("sms.db", "chatstorage.sqlite"): + return "messaging_db" + if name_lower in ("addressbook.sqlitedb", "addressbookimages.sqlitedb"): + return "address_book" + if name_lower == "idevice_info.txt": + return "device_info" + if ext in (".sqlite", ".sqlite3", ".sqlitedb", ".db"): + return "sqlite_db" + if ext == ".plist": + return "plist" + if ext in (".log",): + return "text_log" + return "other" + + +# Per-source-type categorizers — dispatched by _auto_categorize at call time +# based on graph.active_source.type. Solves P4 (Windows-only categorization). +_CATEGORIZERS = { + "disk_image": _auto_categorize_windows, + "mobile_extraction": _auto_categorize_ios, + "archive": _auto_categorize_windows, + "media_collection": lambda fn: "other", +} + + +def _auto_categorize(filename: str, source_type: str = "disk_image") -> str: + """Dispatch to a source-type-aware categorizer (defaults to Windows).""" + fn = _CATEGORIZERS.get(source_type, _auto_categorize_windows) + return fn(filename) + + @dataclass class ToolDefinition: """A registered tool available for agent composition.""" @@ -123,44 +261,53 @@ class ToolDefinition: TOOL_CATALOG: dict[str, ToolDefinition] = {} -def _make_auto_record(tool_name: str, category: str, executor: Any, graph: Any) -> Any: - """Wrap a forensic tool to auto-record its result as a phenomenon.""" - - async def wrapper(**kwargs) -> str: - result = await executor(**kwargs) - if graph is None or not result or result.startswith("Error") or result.startswith("["): - return result - # Auto-record: the tool produced a forensic fact - agent = getattr(graph, "_current_agent", "") or "unknown" - title = f"{tool_name}: {result.split(chr(10))[0][:80]}" - await graph.add_phenomenon( - source_agent=agent, - category=category, - title=title, - description=result[:2000], - source_tool=tool_name, - ) - return 
result - - return wrapper +# Mapping of tool_name to category for tools that auto-record a phenomenon when run. +# Replaces the pre-S2 ``_make_auto_record`` per-tool wrapping; the central +# instrumentation pass at the end of register_all_tools applies these. +AUTO_RECORD_TOOLS: dict[str, str] = { + "list_installed_software": "registry", + "get_system_info": "registry", + "get_timezone_info": "registry", + "get_computer_name": "registry", + "get_shutdown_time": "registry", + "enumerate_users": "registry", + "get_network_interfaces": "registry", + "get_email_config": "registry", + "parse_prefetch": "filesystem", +} -def register_all_tools( - image_path: str, - partition_offset: int, - graph: Any = None, - extracted_dir: str = "extracted", -) -> None: - """Populate TOOL_CATALOG with all available tools, pre-bound to image/offset.""" +def register_all_tools(graph: Any) -> None: + """Populate TOOL_CATALOG with all available forensic tools. + + Tools no longer close over a fixed image path. The Sleuth Kit tools + resolve the image path and partition offset from ``graph.active_source`` + at call time, so the same registered tool follows whichever evidence + source the orchestrator has made active. + """ TOOL_CATALOG.clear() + def _img() -> str: + """Resolve the active source's image path at tool-call time.""" + src = getattr(graph, "active_source", None) + if src is None or not src.path: + raise RuntimeError( + "No active evidence source — call graph.set_active_source() first." + ) + return src.path + + def _off() -> int: + """Resolve the active source's partition offset at tool-call time.""" + src = getattr(graph, "active_source", None) + return src.partition_offset if src is not None else 0 + # ---- Sleuth Kit tools ---- TOOL_CATALOG["partition_info"] = ToolDefinition( name="partition_info", description="Get the partition table layout of the disk image.
Run this first to understand disk structure.", input_schema={"type": "object", "properties": {}}, - executor=lambda: tsk.partition_info(image_path), + executor=lambda: tsk.partition_info(_img()), module="sleuthkit", tags=["filesystem", "disk", "partition"], ) @@ -169,7 +316,7 @@ def register_all_tools( name="filesystem_info", description="Get detailed filesystem information (type, block size, volume name, etc.) for the selected partition.", input_schema={"type": "object", "properties": {}}, - executor=lambda: tsk.filesystem_info(image_path, partition_offset), + executor=lambda: tsk.filesystem_info(_img(), _off()), module="sleuthkit", tags=["filesystem", "disk"], ) @@ -185,7 +332,7 @@ def register_all_tools( }, }, executor=lambda inode=None, recursive=False: tsk.list_directory( - image_path, partition_offset, inode, recursive + _img(), _off(), inode, recursive ), module="sleuthkit", tags=["filesystem", "directory", "listing"], @@ -204,12 +351,13 @@ def register_all_tools( ) # Resolve real disk path first - orig_path = (await tsk.find_file(image_path, inode, partition_offset)).strip() + orig_path = (await tsk.find_file(_img(), inode, _off())).strip() if not orig_path or "not found" in orig_path.lower(): return f"Error: inode {inode} not found on the disk image." 
# Derive local filename from real disk path filename = os.path.basename(orig_path) + extracted_dir = graph.extracted_dir local_path = os.path.join(extracted_dir, filename) # Handle name collisions by appending inode @@ -219,12 +367,15 @@ def register_all_tools( filename = os.path.basename(local_path) # Extract - result = await tsk.extract_file(image_path, inode, local_path, partition_offset) + result = await tsk.extract_file(_img(), inode, local_path, _off()) if result.startswith("[icat failed"): return result size = os.path.getsize(local_path) if os.path.exists(local_path) else 0 - category = _auto_categorize(os.path.basename(orig_path)) + src_type = ( + graph.active_source.type if graph.active_source else "disk_image" + ) + category = _auto_categorize(os.path.basename(orig_path), src_type) # Register if graph is not None: @@ -275,7 +426,7 @@ def register_all_tools( }, "required": ["inode"], }, - executor=lambda inode: tsk.find_file(image_path, inode, partition_offset), + executor=lambda inode: tsk.find_file(_img(), inode, _off()), module="sleuthkit", tags=["filesystem"], ) @@ -290,7 +441,7 @@ def register_all_tools( }, "required": ["pattern"], }, - executor=lambda pattern: tsk.search_strings(image_path, pattern), + executor=lambda pattern: tsk.search_strings(_img(), pattern), module="sleuthkit", tags=["filesystem", "search", "strings"], ) @@ -299,7 +450,7 @@ def register_all_tools( name="count_deleted_files", description="List and count all deleted files. 
Shows total count, executables, and extension breakdown.", input_schema={"type": "object", "properties": {}}, - executor=lambda: tsk.count_deleted_files(image_path, partition_offset), + executor=lambda: tsk.count_deleted_files(_img(), _off()), module="sleuthkit", tags=["filesystem", "deleted", "recovery"], ) @@ -308,7 +459,7 @@ def register_all_tools( name="build_filesystem_timeline", description="Build a MAC timeline from the filesystem (Modified/Accessed/Changed times for all files).", input_schema={"type": "object", "properties": {}}, - executor=lambda: tsk.build_timeline(image_path, partition_offset), + executor=lambda: tsk.build_timeline(_img(), _off()), module="sleuthkit", tags=["filesystem", "timeline"], ) @@ -341,8 +492,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("list_installed_software", "registry", - lambda hive_path: reg.list_installed_software(hive_path), graph), + executor=lambda hive_path: reg.list_installed_software(hive_path), module="registry", tags=["registry", "software", "installed"], ) @@ -390,8 +540,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("get_system_info", "registry", - lambda hive_path: reg.get_system_info(hive_path), graph), + executor=lambda hive_path: reg.get_system_info(hive_path), module="registry", tags=["registry", "system"], ) @@ -406,8 +555,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("get_timezone_info", "registry", - lambda hive_path: reg.get_timezone_info(hive_path), graph), + executor=lambda hive_path: reg.get_timezone_info(hive_path), module="registry", tags=["registry", "timezone", "system"], ) @@ -422,8 +570,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("get_computer_name", "registry", - lambda hive_path: reg.get_computer_name(hive_path), graph), + executor=lambda hive_path: reg.get_computer_name(hive_path), module="registry", 
tags=["registry", "system", "hostname"], ) @@ -438,8 +585,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("get_shutdown_time", "registry", - lambda hive_path: reg.get_shutdown_time(hive_path), graph), + executor=lambda hive_path: reg.get_shutdown_time(hive_path), module="registry", tags=["registry", "system", "shutdown"], ) @@ -454,8 +600,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("enumerate_users", "registry", - lambda hive_path: reg.enumerate_users(hive_path), graph), + executor=lambda hive_path: reg.enumerate_users(hive_path), module="registry", tags=["registry", "user", "accounts", "sam"], ) @@ -470,8 +615,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("get_network_interfaces", "registry", - lambda hive_path: reg.get_network_interfaces(hive_path), graph), + executor=lambda hive_path: reg.get_network_interfaces(hive_path), module="registry", tags=["registry", "network", "adapter", "ip"], ) @@ -486,8 +630,7 @@ def register_all_tools( }, "required": ["hive_path"], }, - executor=_make_auto_record("get_email_config", "registry", - lambda hive_path: reg.get_email_config(hive_path), graph), + executor=lambda hive_path: reg.get_email_config(hive_path), module="registry", tags=["registry", "email", "account"], ) @@ -504,8 +647,7 @@ def register_all_tools( }, "required": ["file_path"], }, - executor=_make_auto_record("parse_prefetch", "filesystem", - lambda file_path: parsers.parse_prefetch(file_path), graph), + executor=lambda file_path: parsers.parse_prefetch(file_path), module="parsers", tags=["filesystem", "prefetch", "execution"], ) @@ -577,7 +719,13 @@ def register_all_tools( TOOL_CATALOG["list_extracted_dir"] = ToolDefinition( name="list_extracted_dir", - description="List files in an extracted directory with sizes.", + description=( + "Summarise an extracted directory tree: total counts, " + "extension breakdown, top-level layout, 
largest files. " + "Scales to 10k+-file trees without truncating into uselessness. " + "For targeted searches (find every *.plist, locate sms.db, ...) " + "use find_files instead." + ), input_schema={ "type": "object", "properties": { @@ -590,6 +738,31 @@ def register_all_tools( tags=["filesystem", "listing", "extracted"], ) + TOOL_CATALOG["find_files"] = ToolDefinition( + name="find_files", + description=( + "Recursively find files under a directory by glob pattern. " + "Use this on tree-mode sources (iOS extractions, archives, " + "Android-mounted partitions) to locate specific artefacts in " + "huge trees. Patterns are fnmatch-style; '**' means 'any " + "depth'. Examples: '**/sms.db', '**/keychain-2.db', " + "'**/ChatStorage.sqlite', '**/*.plist', 'HomeDomain/Library/**'. " + "Results sort by size descending; capped at max_results." + ), + input_schema={ + "type": "object", + "properties": { + "root": {"type": "string", "description": "Directory to search under."}, + "pattern": {"type": "string", "description": "fnmatch glob pattern (use '**' for any depth)."}, + "max_results": {"type": "integer", "description": "Result cap (default 500)."}, + }, + "required": ["root", "pattern"], + }, + executor=lambda root, pattern, max_results=500: parsers.find_files(root, pattern, max_results), + module="parsers", + tags=["filesystem", "search", "extracted", "glob"], + ) + TOOL_CATALOG["parse_pcap_strings"] = ToolDefinition( name="parse_pcap_strings", description="Extract HTTP headers, hosts, User-Agent, cookies, and URLs from a PCAP/capture file.", @@ -605,11 +778,224 @@ def register_all_tools( tags=["network", "pcap", "http", "capture"], ) - # ---- Apply result caching to deterministic read-only tools ---- - # Must come AFTER all tools are registered. Auto-record wrapped tools - # (e.g. get_system_info) are NOT in CACHEABLE_TOOLS since they write - # to the evidence graph as a side effect. 
+ # ---- Archive tools (tree-mode prep) ---- + + TOOL_CATALOG["unzip_archive"] = ToolDefinition( + name="unzip_archive", + description=( + "Extract a .zip archive into a target directory. Defensive against " + "zip-slip; skips symlinks. Idempotent on rerun. Pass `password` for " + "password-protected zips — only the legacy ZipCrypto algorithm is " + "supported by stdlib (AES zips need an external `7z x` step)." + ), + input_schema={ + "type": "object", + "properties": { + "zip_path": {"type": "string", "description": "Path to the .zip file."}, + "dest_dir": {"type": "string", "description": "Directory to extract into (created if missing)."}, + "password": {"type": "string", "description": "Password for encrypted zips (omit for plain archives)."}, + }, + "required": ["zip_path", "dest_dir"], + }, + executor=lambda zip_path, dest_dir, password=None: arc.unzip_archive(zip_path, dest_dir, password), + module="archive", + tags=["archive", "zip", "extract", "ingest"], + ) + + # ---- iOS plugin tools (DESIGN.md §4.7) ---- + + TOOL_CATALOG["parse_plist"] = ToolDefinition( + name="parse_plist", + description=( + "Parse a .plist file (XML or binary) and return its contents as JSON. " + "Bytes are rendered as hex; dates as ISO-8601." + ), + input_schema={ + "type": "object", + "properties": { + "file_path": {"type": "string", "description": "Path to .plist file."}, + }, + "required": ["file_path"], + }, + executor=lambda file_path: ios.parse_plist(file_path), + module="mobile_ios", + tags=["ios", "plist", "parse"], + ) + + TOOL_CATALOG["sqlite_tables"] = ToolDefinition( + name="sqlite_tables", + description=( + "List user tables in a sqlite database with row counts and column " + "names. Use this to scout an unfamiliar .sqlite / .db file before " + "querying it." 
+ ), + input_schema={ + "type": "object", + "properties": { + "db_path": {"type": "string", "description": "Path to .sqlite/.db file."}, + }, + "required": ["db_path"], + }, + executor=lambda db_path: ios.sqlite_tables(db_path), + module="mobile_ios", + tags=["sqlite", "schema", "ios", "android"], + ) + + TOOL_CATALOG["sqlite_query"] = ToolDefinition( + name="sqlite_query", + description=( + "Run a single read-only SELECT against a sqlite file. " + "Multi-statement queries and non-SELECT statements are rejected. " + "Use this for sms.db / ChatStorage.sqlite / AddressBook.sqlitedb / etc." + ), + input_schema={ + "type": "object", + "properties": { + "db_path": {"type": "string", "description": "Path to .sqlite/.db file."}, + "query": {"type": "string", "description": "A single SELECT statement."}, + "max_rows": {"type": "integer", "description": "Row cap (default 100)."}, + }, + "required": ["db_path", "query"], + }, + executor=lambda db_path, query, max_rows=100: ios.sqlite_query(db_path, query, max_rows), + module="mobile_ios", + tags=["sqlite", "query", "ios", "android"], + ) + + TOOL_CATALOG["parse_ios_keychain"] = ToolDefinition( + name="parse_ios_keychain", + description=( + "Locate and summarise iOS keychain entries (keychain-2.db). " + "Pass either the db file directly or the containing directory; " + "dumps accounting metadata from genp/inet/cert/keys tables." + ), + input_schema={ + "type": "object", + "properties": { + "keychain_root": { + "type": "string", + "description": "Path to keychain-2.db or a directory that contains it.", + }, + }, + "required": ["keychain_root"], + }, + executor=lambda keychain_root: ios.parse_ios_keychain(keychain_root), + module="mobile_ios", + tags=["ios", "keychain", "credentials"], + ) + + TOOL_CATALOG["read_idevice_info"] = ToolDefinition( + name="read_idevice_info", + description=( + "Read the iDevice_info.txt summary at the root of an iOS extraction. " + "Pass the file path or the extraction root directory." 
+ ), + input_schema={ + "type": "object", + "properties": { + "file_path": {"type": "string", "description": "Path to iDevice_info.txt or extraction root."}, + }, + "required": ["file_path"], + }, + executor=lambda file_path: ios.read_idevice_info(file_path), + module="mobile_ios", + tags=["ios", "device", "metadata"], + ) + + # ---- Android plugin (DESIGN.md §4.7) ---- + + TOOL_CATALOG["probe_android_partitions"] = ToolDefinition( + name="probe_android_partitions", + description=( + "Survey every partition on an Android disk dump (mmls + per-" + "partition fsstat). Returns a markdown table with name, native " + "and 512-byte sector offsets, filesystem type, and a strategy " + "hint per partition. Use this BEFORE deciding which partitions " + "to dive into via set_active_partition + list_directory." + ), + input_schema={"type": "object", "properties": {}}, + executor=lambda: android.probe_android_partitions(_img()), + module="mobile_android", + tags=["android", "partition", "survey"], + ) + + async def _set_active_partition(partition_offset: int) -> str: + src = getattr(graph, "active_source", None) + if src is None: + return "Error: no active evidence source." + old = src.partition_offset + new = int(partition_offset) + src.partition_offset = new + # Sync the legacy mirror field so older readers stay consistent. + graph.partition_offset = new + return ( + f"Active partition offset: {old} → {new} (512-byte sectors). " + f"Subsequent list_directory / extract_file / search_strings " + f"calls now target this partition on {src.id} ({src.label})." + ) + + TOOL_CATALOG["set_active_partition"] = ToolDefinition( + name="set_active_partition", + description=( + "Switch the current partition offset (in 512-byte sectors) on " + "the active disk-image source. Use the values from " + "probe_android_partitions's '512-sector' column. NOT a " + "forensic read — purely repoints the TSK toolset. Mutates " + "shared state; call serially within one agent run." 
+ ), + input_schema={ + "type": "object", + "properties": { + "partition_offset": { + "type": "integer", + "description": "Partition start in 512-byte sectors.", + }, + }, + "required": ["partition_offset"], + }, + executor=_set_active_partition, + module="android", + tags=["android", "partition", "navigation"], + ) + + # ---- Media plugin (DESIGN.md §4.7) ---- + + TOOL_CATALOG["ocr_image"] = ToolDefinition( + name="ocr_image", + description=( + "Extract text from an image via tesseract. The LLM backend has " + "no vision, so this is the only way to read JPEG/PNG evidence " + "(screenshots of chats, transactions, IDs). Default lang covers " + "English + Simplified & Traditional Chinese; override `lang` " + "if you know the artefact's language. Returns 'Error: OCR " + "runtime not available' with an install hint when tesseract " + "isn't on the host — record that absence as a negative " + "finding rather than guessing." + ), + input_schema={ + "type": "object", + "properties": { + "file_path": {"type": "string", "description": "Path to image file."}, + "lang": {"type": "string", "description": "Tesseract language code(s), e.g. 'eng' or 'eng+chi_sim'."}, + }, + "required": ["file_path"], + }, + executor=lambda file_path, lang="eng+chi_sim+chi_tra": med.ocr_image(file_path, lang), + module="media", + tags=["media", "ocr", "image"], + ) + + # ---- Wrap every executor with invocation logging (+ cache + auto-record) ---- + # Must run AFTER all tools are registered. Every tool call now produces + # a ToolInvocation entry on the graph (provenance for grounding), and + # returns the result prefixed with ``[invocation: inv-xxx]`` so the LLM + # can cite the call in add_phenomenon facts. 
_tool_result_cache.clear() for tool_name, td in TOOL_CATALOG.items(): - if tool_name in CACHEABLE_TOOLS: - td.executor = _make_cached(tool_name, td.executor) + td.executor = _make_invocation_executor( + tool_name, + td.executor, + graph, + cacheable=(tool_name in CACHEABLE_TOOLS), + auto_record_category=AUTO_RECORD_TOOLS.get(tool_name), + ) diff --git a/tools/archive.py b/tools/archive.py new file mode 100644 index 0000000..42ab1f7 --- /dev/null +++ b/tools/archive.py @@ -0,0 +1,156 @@ +"""Archive extraction tools — generic unzip for tree-mode evidence sources. + +Mobile extractions (iOS / Android backups), archive sources, and shared +work products all arrive as .zip files. The forensic agents work on the +unpacked tree; this module is the single entry point for safely turning +an archive into a directory. + +Stdlib-only. No graph dependency. +""" + +from __future__ import annotations + +import logging +import os +import zipfile +from pathlib import Path + +logger = logging.getLogger(__name__) + + +def _is_within(base: Path, target: Path) -> bool: + """True when *target* resolves to a path inside *base* — symlink-safe.""" + try: + base_r = base.resolve() + target_r = target.resolve() + except OSError: + return False + try: + target_r.relative_to(base_r) + except ValueError: + return False + return True + + +def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool: + """True when any entry has the zip 'encrypted' flag bit set.""" + return any(info.flag_bits & 0x1 for info in zf.infolist()) + + +def _do_extract( + zip_path: str, + dest_dir: str, + password: str | None = None, +) -> str: + """Shared core for unzip_archive (async) and unzip_archive_sync. + + Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files + whose target already exists at the matching size are skipped). Returns + a multi-line summary the agent can read directly. + """ + zp = Path(zip_path) + if not zp.is_file(): + return f"Error: {zip_path} is not a file." 
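Reviewer sketch, not part of the patch: the containment test that `_is_within` performs is what stops zip-slip entries, and it can be tried in isolation. The helper name `is_within`, the temporary directory, and the sample entry names below are hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def is_within(base: Path, target: Path) -> bool:
    """Return True when target resolves to a location inside base."""
    try:
        # resolve() collapses ".." segments and follows symlinks, so the
        # comparison is made on the real destination, not the spelling.
        target.resolve().relative_to(base.resolve())
    except (OSError, ValueError):
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp)
    # A normal archive member stays under the destination directory.
    inside = is_within(dest, dest / "HomeDomain" / "sms.db")
    # A member that climbs out with ".." must be rejected.
    escaped = is_within(dest, dest / ".." / "etc" / "passwd")

print(inside, escaped)
```

Resolving both sides before comparing is the important design choice: a purely textual prefix check would accept a path that only looks like it sits under the destination.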
+ + dest = Path(dest_dir) + dest.mkdir(parents=True, exist_ok=True) + + extracted = 0 + skipped: list[str] = [] + total_bytes = 0 + pwd_bytes = password.encode("utf-8") if password else None + + try: + with zipfile.ZipFile(zp, "r") as zf: + encrypted = _is_zip_encrypted(zf) + if encrypted and pwd_bytes is None: + return ( + f"Error: {zip_path} is password-protected. " + f"Provide the password via case.yaml's " + f"meta.password on this source, or pass `password=` " + f"explicitly. Stdlib zipfile only supports the legacy " + f"ZipCrypto algorithm — AES-encrypted zips (created by " + f"7-Zip / WinZip) need an external tool like 7z." + ) + for info in zf.infolist(): + name = info.filename + # Block absolute paths and parent-escape attempts up front. + if name.startswith(("/", "\\")) or ".." in Path(name).parts: + skipped.append(f"escape: {name}") + continue + target = dest / name + if not _is_within(dest, target): + skipped.append(f"escape: {name}") + continue + # Symlink entries — skip rather than risk traversing out. + if info.external_attr >> 16 & 0o120000 == 0o120000: + skipped.append(f"symlink: {name}") + continue + if info.is_dir(): + target.mkdir(parents=True, exist_ok=True) + continue + # Skip if already extracted with matching size (idempotent rerun). + if target.exists() and target.stat().st_size == info.file_size: + continue + target.parent.mkdir(parents=True, exist_ok=True) + try: + with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out: + while True: + chunk = src.read(65536) + if not chunk: + break + out.write(chunk) + except RuntimeError as e: + # zipfile raises RuntimeError for bad-password / AES-encrypted. + msg = str(e) + if "Bad password" in msg or "password required" in msg: + return ( + f"Error: bad or missing password for {zip_path}. " + f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib " + f"cannot decrypt it — use `7z x -p ...` " + f"externally and point the source path at the result." 
+ ) + raise + extracted += 1 + total_bytes += info.file_size + except zipfile.BadZipFile as e: + return f"Error: {zip_path} is not a valid zip archive: {e}" + except Exception as e: + return f"Error extracting {zip_path}: {e}" + + parts = [ + f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}", + ] + if skipped: + parts.append(f"Skipped {len(skipped)} unsafe entries:") + for s in skipped[:10]: + parts.append(f" - {s}") + if len(skipped) > 10: + parts.append(f" ... ({len(skipped) - 10} more)") + return "\n".join(parts) + + +async def unzip_archive( + zip_path: str, dest_dir: str, password: str | None = None, +) -> str: + """Extract *zip_path* into *dest_dir*. Idempotent on rerun. + + Defensive: rejects entries with absolute paths, leading '..', or that + would resolve outside *dest_dir* (the classic zip-slip vector). Symlink + entries are skipped (we never follow symlinks into the host filesystem). + Password-protected zips need the password argument (or + ``meta.password`` on the source in case.yaml) — stdlib ``zipfile`` + only handles the legacy ZipCrypto algorithm. + """ + return _do_extract(zip_path, dest_dir, password) + + +def unzip_archive_sync( + zip_path: str, dest_dir: str, password: str | None = None, +) -> str: + """Synchronous variant of :func:`unzip_archive` for startup-time prepare_source. + + Same behaviour, just no async wrapping — used before the event loop + starts so we don't have to spin one up just to unpack a zip. + """ + return _do_extract(zip_path, dest_dir, password) diff --git a/tools/media.py b/tools/media.py new file mode 100644 index 0000000..7738107 --- /dev/null +++ b/tools/media.py @@ -0,0 +1,87 @@ +"""Media plugin — OCR for image evidence. + +DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run +OCR locally for any image-bearing evidence. Tesseract via pytesseract is +the default; if the runtime is missing those packages, the tool returns a +clear install hint rather than failing silently. 
+""" + +from __future__ import annotations + +import logging +import os +from pathlib import Path + +logger = logging.getLogger(__name__) + +MAX_OUTPUT = 8000 + +_INSTALL_HINT = ( + "Error: OCR runtime not available. Install with:\n" + " pip install pytesseract pillow\n" + " sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n" + "(or the equivalent for your distribution). Then retry." +) + + +def _has_ocr_runtime() -> tuple[bool, str]: + """Return (available, reason). reason is empty when available.""" + try: + import pytesseract # noqa: F401 + from PIL import Image # noqa: F401 + except ImportError as e: + return False, f"missing python package: {e.name}" + # Check the tesseract binary too. + import shutil + if shutil.which("tesseract") is None: + return False, "tesseract binary not on PATH" + return True, "" + + +async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str: + """Extract text from an image via tesseract. + + *lang* defaults to English + Simplified + Traditional Chinese, matching + the multi-language artefacts the current case involves. Pass a single + language code (e.g. ``"eng"``) to skip language packs that aren't + installed. + """ + p = Path(file_path) + if not p.is_file(): + return f"Error: {file_path} is not a file." + available, reason = _has_ocr_runtime() + if not available: + return f"{_INSTALL_HINT}\n[detail: {reason}]" + + import pytesseract + from PIL import Image + + try: + img = Image.open(p) + except Exception as e: + return f"Error: could not open image {file_path}: {e}" + + try: + text = pytesseract.image_to_string(img, lang=lang) + except pytesseract.TesseractError as e: + msg = str(e) + if "Failed loading language" in msg or "Error opening data file" in msg: + return ( + f"Error: tesseract is installed but missing language pack(s) for {lang!r}. " + f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a " + f"different `lang`. 
Detail: {msg}" + ) + return f"Error running tesseract: {msg}" + except Exception as e: + return f"Error during OCR: {e}" + + size = p.stat().st_size + header = ( + f"ocr: {file_path} ({size} bytes, lang={lang}, " + f"{len(text.splitlines())} line(s))\n" + ) + if len(text) > MAX_OUTPUT - len(header): + body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]" + else: + body = text + return header + body diff --git a/tools/mobile_android.py b/tools/mobile_android.py new file mode 100644 index 0000000..5d6e44b --- /dev/null +++ b/tools/mobile_android.py @@ -0,0 +1,160 @@ +"""Android plugin tools — partition survey + sector translation. + +DESIGN.md §4.7 安卓: ``mmls`` partitions → per-partition image-mode source; +``fsstat`` per partition to classify ext4/F2FS/raw/encrypted. The shared TSK +toolchain already handles ext4/F2FS reads, so once the agent picks a partition +offset the standard list_directory / extract_file / search_strings tools work. + +Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but +TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions`` +emits BOTH unit systems so the agent can plug the right ``partition_offset`` +value into ``set_active_partition``. +""" + +from __future__ import annotations + +import asyncio +import logging +import re +from pathlib import Path + +logger = logging.getLogger(__name__) + +MAX_OUTPUT = 8000 + +# Partitions worth flagging when we encounter them — informs the agent's +# strategy. Not exhaustive; just opinionated hints. 
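Reviewer sketch, not part of the patch: the sector-unit quirk described in the module docstring comes down to one integer conversion. The header string and the start sector below are made-up sample values, and `to_512_sectors` is a hypothetical helper name.

```python
import re


def to_512_sectors(start_native: int, sector_size: int) -> int:
    """Convert an image-native sector offset into 512-byte sectors."""
    return start_native * sector_size // 512


# Made-up mmls header line; a real dump states its own unit size.
header = "Units are in 4096-byte sectors"
match = re.search(r"Units are in (\d+)-byte sectors", header)
sector_size = int(match.group(1)) if match else 512

# A partition starting at native sector 2048 on a 4096-byte-sector image.
offset_512 = to_512_sectors(2048, sector_size)
print(sector_size, offset_512)
```

Multiplying before dividing keeps the arithmetic exact for any sector size that is a multiple of 512.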
+_PARTITION_HINTS: dict[str, str] = { + "EFS": "modem firmware area; often contains IMEI / MAC / serial", + "PARAM": "boot parameters; cmdline + flags", + "BOOT": "kernel + initramfs (raw image)", + "RECOVERY": "recovery image (raw)", + "SYSTEM": "Android /system — read-only OS partition (ext4)", + "CACHE": "downloaded OTA payloads; usually transient", + "USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices", + "PERSISTENT": "Samsung persistent partition; carrier/device flags", + "STEADY": "Samsung steady-state config", + "HIDDEN": "Samsung hidden partition; check before assuming empty", + "CP_DEBUG": "modem debug logs", + "TOMBSTONES": "userland crash dumps", +} + + +def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]: + """Parse mmls output, returning (sector_size_bytes, partitions). + + mmls states ``Units are in N-byte sectors`` near the top; we extract N + to translate between image-native units and the 512-byte units TSK + tools accept via ``-o``. 
+ """ + sector_size = 512 + m = re.search(r"Units are in (\d+)-byte sectors", output) + if m: + sector_size = int(m.group(1)) + + parts: list[dict] = [] + for line in output.splitlines(): + m = re.match( + r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)", + line, + ) + if not m: + continue + _row, slot, start, end, length, desc = m.groups() + if slot == "Meta" or slot.startswith("---"): + continue + parts.append({ + "slot": slot, + "start_native": int(start), + "end_native": int(end), + "length_native": int(length), + "description": desc.strip(), + }) + return sector_size, parts + + +async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]: + proc = await asyncio.create_subprocess_exec( + *cmd, + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + try: + stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout) + except asyncio.TimeoutError: + proc.kill() + return 124, "", f"timeout after {timeout}s" + return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace") + + +_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE) + + +async def _classify_partition(image_path: str, sector_offset_512: int) -> str: + """Run fsstat on a partition; return its reported type (e.g. 'Ext4') or 'unknown'. + + Any fsstat failure, including "Cannot determine file system type", maps to + 'unknown'; that typically means a raw image (BOOT/RECOVERY/RADIO/…) or + encrypted data (modern userdata under FBE). + """ + rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15) + if rc != 0: + return "unknown" + m = _FS_TYPE_RE.search(out) + if m: + return m.group(1) + return "unknown" + + +async def probe_android_partitions(image_path: str) -> str: + """Survey every partition on an Android disk dump and return a table.
+ + The agent reads this once to plan its work: which partitions are + Ext4/F2FS (use TSK), which are raw (extract image / strings only), + which are encrypted (skip until decrypted). + """ + p = Path(image_path) + if not p.is_file(): + return f"Error: {image_path} is not a file." + + rc, out, err = await _run(["mmls", str(p)], timeout=30) + if rc != 0: + return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}" + + sector_size, parts = _parse_mmls_with_unit(out) + if not parts: + return f"No partitions detected in {image_path}." + + lines = [ + f"Android partition survey: {image_path}", + f" mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)", + f" {len(parts)} data partitions", + "", + "| slot | name | start (native) | start (512-sector) | size | fs_type | hint |", + "|---|---|---:|---:|---|---|---|", + ] + for prt in parts: + sector_512 = prt["start_native"] * sector_size // 512 + bytes_size = prt["length_native"] * sector_size + # human-readable size + if bytes_size >= 1 << 30: + size_h = f"{bytes_size / (1 << 30):.1f} GB" + elif bytes_size >= 1 << 20: + size_h = f"{bytes_size / (1 << 20):.1f} MB" + else: + size_h = f"{bytes_size // 1024} KB" + fs_type = await _classify_partition(str(p), sector_512) + # Try to extract a friendly partition name from the description + # (mmls description often includes the partition name uppercase). 
+ name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"]) + pname = name_match.group(0) if name_match else prt["description"][:20] + hint = _PARTITION_HINTS.get(pname, "") + lines.append( + f"| {prt['slot']} | {pname} | {prt['start_native']} | " + f"{sector_512} | {size_h} | {fs_type} | {hint} |" + ) + + body = "\n".join(lines) + if len(body) > MAX_OUTPUT: + body = body[:MAX_OUTPUT] + "\n\n[truncated]" + return body diff --git a/tools/mobile_ios.py b/tools/mobile_ios.py new file mode 100644 index 0000000..cfc62d0 --- /dev/null +++ b/tools/mobile_ios.py @@ -0,0 +1,274 @@ +"""iOS extraction parsers — plist / sqlite / keychain / iDevice info. + +DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth +Kit, no graph dependency. Stdlib + sqlite3 only. + +iOS extractions typically arrive as a zip containing domain-rooted trees +(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary, +binary/XML plists, and several SQLite databases (sms.db, AddressBook, +keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite). +""" + +from __future__ import annotations + +import asyncio +import json +import logging +import os +import plistlib +import re +import sqlite3 +from pathlib import Path + +logger = logging.getLogger(__name__) + +# Output cap (chars) — keeps a single tool result under the LLM context budget. 
+MAX_OUTPUT = 8000 + + +def _trunc(text: str, limit: int = MAX_OUTPUT) -> str: + if len(text) <= limit: + return text + return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]" + + +# --------------------------------------------------------------------------- +# plist +# --------------------------------------------------------------------------- + +def _to_jsonable(obj): + """Make plist values JSON-serializable: bytes → hex preview, dates → iso.""" + import datetime + if isinstance(obj, bytes): + if len(obj) <= 64: + return {"_bytes_hex": obj.hex()} + return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)} + if isinstance(obj, datetime.datetime): + return obj.isoformat() + if isinstance(obj, dict): + return {str(k): _to_jsonable(v) for k, v in obj.items()} + if isinstance(obj, (list, tuple)): + return [_to_jsonable(v) for v in obj] + return obj + + +async def parse_plist(file_path: str) -> str: + """Parse a .plist file (XML or binary) and return its contents as JSON. + + Both formats are handled transparently by ``plistlib.load``. + """ + p = Path(file_path) + if not p.is_file(): + return f"Error: {file_path} is not a file." 
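Reviewer sketch, not part of the patch: the plist-to-JSON path can be checked end to end with a round trip through `plistlib`. The function `to_jsonable` is a simplified, hypothetical stand-in for `_to_jsonable` (it omits the 64-byte preview cap), and the sample keys and values are invented.

```python
import datetime
import json
import plistlib


def to_jsonable(obj):
    """Simplified stand-in: bytes become hex, datetimes become ISO strings."""
    if isinstance(obj, bytes):
        return {"_bytes_hex": obj.hex()}
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    return obj


# Invented sample data covering the two types json.dumps cannot handle.
sample = {
    "DeviceName": "example-phone",
    "Token": b"\x01\x02\xff",
    "LastBackup": datetime.datetime(2025, 1, 2, 3, 4, 5),
}

# Encode as a binary plist, then decode; loads() detects the format itself.
blob = plistlib.dumps(sample, fmt=plistlib.FMT_BINARY)
decoded = plistlib.loads(blob)

rendered = json.dumps(to_jsonable(decoded), ensure_ascii=False, indent=2)
parsed = json.loads(rendered)
print(rendered)
```

Converting before serialising, rather than relying on `default=str`, keeps binary values readable as hex instead of Python `repr` text.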
+ try: + with open(p, "rb") as f: + data = plistlib.load(f) + except plistlib.InvalidFileException as e: + return f"Error: {file_path} is not a valid plist ({e})" + except Exception as e: + return f"Error parsing plist {file_path}: {e}" + + serial = _to_jsonable(data) + rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str) + header = f"plist: {file_path} ({p.stat().st_size} bytes)\n" + return header + _trunc(rendered) + + +# --------------------------------------------------------------------------- +# sqlite +# --------------------------------------------------------------------------- + +_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE) + + +async def sqlite_tables(db_path: str) -> str: + """List user tables in a sqlite file with row counts and column names.""" + p = Path(db_path) + if not p.is_file(): + return f"Error: {db_path} is not a file." + try: + conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True) + except sqlite3.OperationalError as e: + return f"Error opening {db_path} (read-only): {e}" + try: + cur = conn.cursor() + cur.execute( + "SELECT name FROM sqlite_master " + "WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name" + ) + tables = [r[0] for r in cur.fetchall()] + if not tables: + return f"No user tables in {db_path}." + lines = [f"sqlite: {db_path} ({len(tables)} tables)"] + for name in tables: + try: + cur.execute(f"SELECT COUNT(*) FROM \"{name}\"") + count = cur.fetchone()[0] + except sqlite3.DatabaseError as e: + count = f"(count failed: {e})" + try: + cur.execute(f"PRAGMA table_info(\"{name}\")") + cols = [r[1] for r in cur.fetchall()] + except sqlite3.DatabaseError: + cols = [] + lines.append(f" {name}: {count} row(s); cols: {', '.join(cols)}") + return _trunc("\n".join(lines)) + finally: + conn.close() + + +async def sqlite_query( + db_path: str, + query: str, + max_rows: int = 100, +) -> str: + """Run a single read-only SELECT against a sqlite file. 
+ + Multi-statement queries and anything other than a SELECT are rejected + (we open the database in read-only mode anyway, so writes would fail + too — but the explicit check keeps the agent honest). + """ + if not _SELECT_RE.match(query): + return "Error: only single SELECT statements are allowed." + if ";" in query.rstrip(";"): + return "Error: multi-statement queries are not allowed." + + p = Path(db_path) + if not p.is_file(): + return f"Error: {db_path} is not a file." + try: + conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True) + except sqlite3.OperationalError as e: + return f"Error opening {db_path} (read-only): {e}" + + try: + cur = conn.cursor() + try: + cur.execute(query) + except sqlite3.DatabaseError as e: + return f"Error executing query: {e}" + cols = [d[0] for d in cur.description] if cur.description else [] + rows = cur.fetchmany(max(1, int(max_rows))) + lines = [ + f"sqlite query: {db_path}", + f"columns: {cols}", + f"rows ({len(rows)}, capped at {max_rows}):", + ] + for row in rows: + rendered = [ + (v.hex() if isinstance(v, bytes) else str(v)) + for v in row + ] + lines.append(" " + " | ".join(rendered)) + return _trunc("\n".join(lines)) + finally: + conn.close() + + +# --------------------------------------------------------------------------- +# iOS keychain (keychain-2.db) +# --------------------------------------------------------------------------- + +# Standard iOS keychain tables. genp = generic passwords, inet = internet +# passwords, cert = certificates, keys = key material. Forensic extractions +# of locked keychains have ``data`` columns NULL but accounting metadata +# (agrp, acct, svce) intact — already useful for attribution work. +_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys") + + +async def parse_ios_keychain(keychain_root: str) -> str: + """Locate and summarize iOS keychain entries under *keychain_root*. + + *keychain_root* may be a path to ``keychain-2.db`` directly or to a + directory that contains it (e.g. 
``.../var/keychains``). + """ + root = Path(keychain_root) + db: Path | None = None + if root.is_file() and root.name == "keychain-2.db": + db = root + elif root.is_dir(): + candidate = root / "keychain-2.db" + if candidate.is_file(): + db = candidate + else: + # Fall back to a full recursive search; the first match wins. + for found in root.rglob("keychain-2.db"): + db = found + break + if db is None: + return f"No keychain-2.db found under {keychain_root}." + + try: + conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True) + except sqlite3.OperationalError as e: + return f"Error opening {db}: {e}" + + try: + cur = conn.cursor() + cur.execute( + "SELECT name FROM sqlite_master " + "WHERE type='table' AND name IN ({})".format( + ",".join("?" * len(_KEYCHAIN_TABLES)) + ), + _KEYCHAIN_TABLES, + ) + present = [r[0] for r in cur.fetchall()] + if not present: + return f"keychain-2.db at {db} has no recognised tables." + + lines = [f"keychain: {db}"] + for name in present: + cur.execute(f"SELECT COUNT(*) FROM \"{name}\"") + count = cur.fetchone()[0] + lines.append(f"\n[{name}] {count} row(s)") + cur.execute(f"PRAGMA table_info(\"{name}\")") + cols = [r[1] for r in cur.fetchall()] + # Pick a useful subset of accounting columns when present.
+ preferred = [ + c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr") + if c in cols + ] + if not preferred: + preferred = cols[:5] + sel = ", ".join(f'"{c}"' for c in preferred) + cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30") + for row in cur.fetchall(): + lines.append(" " + " | ".join( + (v.hex() if isinstance(v, bytes) else str(v)) + for v in row + )) + return _trunc("\n".join(lines)) + finally: + conn.close() + + +# --------------------------------------------------------------------------- +# iDevice_info.txt +# --------------------------------------------------------------------------- + +async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str: + """Read the standard iDevice_info.txt summary at the root of an iOS extraction. + + The file is a flat ``Key: value`` dump from libimobiledevice / native + extraction tools. We surface the first *max_chars* of content verbatim + — the agent can search/extract specific keys via search_text_file if + the head isn't enough. + """ + p = Path(file_path) + if p.is_dir(): + # Be helpful: if the agent passed the extraction root, find the file. + candidate = p / "iDevice_info.txt" + if candidate.is_file(): + p = candidate + if not p.is_file(): + return f"Error: {file_path} is not a file." 
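Reviewer sketch, not part of the patch: the SQLite helpers above open every database through a `mode=ro` URI, and that guarantee is easy to confirm. The database, table, and row below are invented; `Path.as_uri()` is used here as one possible way to build the URI so that unusual characters in the path are percent-encoded, which is an assumption of this sketch rather than what the patch does.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "example.db"

    # Build a throwaway database with a normal read-write connection.
    rw = sqlite3.connect(db_path)
    rw.execute("CREATE TABLE message (text TEXT)")
    rw.execute("INSERT INTO message VALUES ('hello')")
    rw.commit()
    rw.close()

    # Reopen it read-only, the way an evidence file should be opened.
    ro = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    try:
        rows = ro.execute("SELECT text FROM message").fetchall()
        try:
            ro.execute("INSERT INTO message VALUES ('tamper')")
            write_blocked = False
        except sqlite3.OperationalError:
            # SQLite refuses the write on a read-only connection.
            write_blocked = True
    finally:
        ro.close()

print(rows, write_blocked)
```

Because the engine itself rejects writes, the `SELECT`-only regex in `sqlite_query` acts as a second, earlier line of defence rather than the only one.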
+ try: + with open(p, "r", encoding="utf-8", errors="replace") as f: + content = f.read(max_chars) + size = p.stat().st_size + header = f"iDevice_info: {p} ({size} bytes)\n" + if size > max_chars: + content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars}]" + return header + content + except Exception as e: + return f"Error reading {file_path}: {e}" diff --git a/tools/parsers.py b/tools/parsers.py index 186613a..967ed9f 100644 --- a/tools/parsers.py +++ b/tools/parsers.py @@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str: return f"[Error parsing Prefetch: {e}]" -async def list_extracted_dir(dir_path: str) -> str: - """List files in an extracted directory.""" +async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str: + """Smart summary of a (potentially huge) extracted tree. + + Earlier versions dumped up to 200 random entries then truncated — that + leaves the agent blind on 10k+-file iOS extractions. The new layout + returns a compact summary that scales: total counts, extension + breakdown, top-level directories with their sizes, and the largest + files. For targeted lookups (e.g. find every ``*.sqlite`` under the + tree) the agent should use ``find_files`` instead. + """ + if not os.path.isdir(dir_path): + return f"[Error: {dir_path} is not a directory]" + try: - entries = [] - for root, dirs, files in os.walk(dir_path): + total_files = 0 + total_bytes = 0 + ext_counts: dict[str, int] = {} + ext_bytes: dict[str, int] = {} + top_level_dirs: dict[str, dict] = {} + biggest: list[tuple[int, str]] = [] # (size, relpath) + + dir_path_abs = os.path.abspath(dir_path) + for root, dirs, files in os.walk(dir_path_abs): + # Track top-level directory aggregates (cheap; no per-entry cost + # beyond the walk we're already doing). 
+ rel_root = os.path.relpath(root, dir_path_abs) + if rel_root == ".": + top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs} + top_level_dirs.update(top_dirs) + top_key = None + else: + top_key = rel_root.split(os.sep, 1)[0] + if top_key not in top_level_dirs: + top_level_dirs[top_key] = {"files": 0, "bytes": 0} + for f in files: full = os.path.join(root, f) - rel = os.path.relpath(full, dir_path) - size = os.path.getsize(full) - entries.append(f" {rel} ({size} bytes)") - if len(entries) > 200: - entries.append(f" ... (truncated)") - break + try: + size = os.path.getsize(full) + except OSError: + continue + total_files += 1 + total_bytes += size + ext = os.path.splitext(f)[1].lower() or "(no ext)" + ext_counts[ext] = ext_counts.get(ext, 0) + 1 + ext_bytes[ext] = ext_bytes.get(ext, 0) + size + if top_key is not None: + top_level_dirs[top_key]["files"] += 1 + top_level_dirs[top_key]["bytes"] += size + # Maintain a top-10 largest list cheaply (bounded insertion). + if len(biggest) < 10: + biggest.append((size, os.path.relpath(full, dir_path_abs))) + biggest.sort(reverse=True) + elif size > biggest[-1][0]: + biggest[-1] = (size, os.path.relpath(full, dir_path_abs)) + biggest.sort(reverse=True) - return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries) + def _human(n: int) -> str: + for unit in ("B", "KB", "MB", "GB"): + if n < 1024: + return f"{n:.1f}{unit}" if unit != "B" else f"{n}B" + n /= 1024 + return f"{n:.1f}TB" + + lines = [ + f"Directory: {dir_path}", + f" Total: {total_files} file(s), {_human(total_bytes)}", + ] + + # Top-level directory layout (immediate children, sorted by file count). 
+ if top_level_dirs: + lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):") + sorted_tlds = sorted( + top_level_dirs.items(), key=lambda kv: -kv[1]["files"], + )[:15] + for d, stats in sorted_tlds: + lines.append( + f" {d}/ ({stats['files']} files, {_human(stats['bytes'])})" + ) + if len(top_level_dirs) > 15: + lines.append(f" ... ({len(top_level_dirs) - 15} more top-level dirs)") + + # Extension breakdown. + if ext_counts: + lines.append(f"\nExtension breakdown (top 15):") + for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]: + lines.append( + f" {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}" + ) + + # Largest files (often the highest-value forensic targets). + if biggest: + lines.append("\nLargest files:") + for size, rel in biggest: + lines.append(f" {rel} ({_human(size)})") + + lines.append( + f"\nNext step: call find_files with a pattern like " + f"'**/*.plist' or '**/keychain-2.db' to locate specific artefacts." + ) + + return "\n".join(lines) except Exception as e: return f"[Error listing {dir_path}: {e}]" + + +async def find_files( + root: str, + pattern: str, + max_results: int = 500, +) -> str: + """Recursively find files under *root* whose path matches *pattern*. + + Uses fnmatch-style globs against the *full relative path*; ``**`` is + treated as "any number of path segments" (so ``**/*.plist`` finds + every plist no matter how deep). Examples: + + - ``**/sms.db`` — iOS SMS database + - ``**/keychain-2.db`` — iOS keychain + - ``**/ChatStorage.sqlite`` — WhatsApp app store + - ``HomeDomain/Library/**`` — anchor at a known iOS domain root + - ``**/*.{plist,sqlite,db}`` — multi-extension (use 2+ calls or a regex if needed) + + Results are sorted by size descending — the biggest hits usually + matter most. Capped at *max_results* to keep the LLM context bounded. 
+ """ + import fnmatch + + if not os.path.isdir(root): + return f"[Error: {root} is not a directory]" + + root_abs = os.path.abspath(root) + # Convert ``**`` (any-depth) to fnmatch's ``*`` (any chars including /). + # fnmatch doesn't natively distinguish segment vs path, so ``**/x`` becomes + # ``*/x``; matching against "./" + rel as well lets that form also hit + # files sitting directly under *root*. + fn_pattern = pattern.replace("**", "*") + + hits: list[tuple[int, str]] = [] + truncated = False + try: + for dirpath, _dirs, files in os.walk(root_abs): + for f in files: + full = os.path.join(dirpath, f) + rel = os.path.relpath(full, root_abs) + if fnmatch.fnmatch(rel, fn_pattern) or fnmatch.fnmatch(f, fn_pattern) or fnmatch.fnmatch("./" + rel, fn_pattern): + try: + size = os.path.getsize(full) + except OSError: + size = 0 + hits.append((size, rel)) + if len(hits) >= max_results * 4: + # Hard upper bound to keep the walk cheap on huge trees. + truncated = True + break + if truncated: + break + except Exception as e: + return f"[Error searching {root}: {e}]" + + hits.sort(reverse=True) + if len(hits) > max_results: + truncated = True + hits = hits[:max_results] + + lines = [ + f"find_files: pattern={pattern!r} under {root}", + f" matches: {len(hits)}" + (" (truncated)" if truncated else ""), + ] + if not hits: + lines.append(" (no matches)") + else: + for size, rel in hits: + lines.append(f" {rel} ({size} bytes)") + return "\n".join(lines) diff --git a/uv.lock b/uv.lock index 4634a1f..4a39485 100644 --- a/uv.lock +++ b/uv.lock @@ -170,6 +170,8 @@ source = { virtual = "."
}
 dependencies = [
     { name = "httpx", extra = ["socks"] },
     { name = "openai" },
+    { name = "pillow" },
+    { name = "pytesseract" },
     { name = "pyyaml" },
     { name = "regipy" },
 ]
@@ -184,6 +186,8 @@ dev = [
 requires-dist = [
     { name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
     { name = "openai", specifier = ">=2.36.0" },
+    { name = "pillow", specifier = ">=12.2.0" },
+    { name = "pytesseract", specifier = ">=0.3.13" },
     { name = "pyyaml" },
     { name = "regipy", specifier = ">=6.2.1" },
 ]
@@ -222,6 +226,39 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
 ]
 
+[[package]]
+name = "pillow"
+version = "12.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
+    { url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
+    { url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
+    { url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
+    { url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
+    { url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
+    { url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
+    { url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
+    { url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
+    { url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
+    { url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
+    { url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
+    { url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
+]
+
 [[package]]
 name = "pluggy"
 version = "1.6.0"
@@ -296,6 +333,19 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
 ]
 
+[[package]]
+name = "pytesseract"
+version = "0.3.13"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "packaging" },
+    { name = "pillow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
+]
+
 [[package]]
 name = "pytest"
 version = "9.0.2"