
BattleTag 6b485b98f7 fix(grounding): auto-rescue hallucinated invocation_id + list real ids in error

First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.

Two-layer fix in evidence_graph.validate_fact_grounding:

  Layer A (silent heal): when the cited inv-id misses, search the same
  agent / task's invocations for one whose output contains the value
  (strict or normalised substring). If exactly one matches, rewrite
  fact.invocation_id in place and accept. Multi-match is NOT auto-
  rescued — the candidate ids go back to the LLM so it picks deliberately.

  Layer B (informative retry): GroundingError now appends the agent's
  recent invocation ids and a brief tool-call summary, so the LLM has
  the real ids in front of it for the next attempt rather than
  fabricating again from memory.

Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).
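A minimal sketch of the rescue-then-explain flow described above follows. It does not reproduce the real evidence_graph code. `Invocation`, `Fact`, the `recent=5` window, and the normalisation rule (case-fold plus whitespace collapsing) are all illustrative assumptions. Candidates are filtered on agent and task_id before any matching, so isolation holds.

```python
import re
from dataclasses import dataclass


@dataclass
class Invocation:
    # Hypothetical record of one tool call
    id: str
    agent: str
    task_id: str
    tool: str
    output: str


@dataclass
class Fact:
    value: str
    invocation_id: str


class GroundingError(Exception):
    pass


def _norm(text: str) -> str:
    # Assumed normalisation: collapse whitespace runs and case-fold
    return re.sub(r"\s+", " ", text).strip().casefold()


def _contains(output: str, value: str) -> bool:
    # Strict substring first, then normalised substring
    return value in output or _norm(value) in _norm(output)


def validate_fact_grounding(fact, agent, task_id, invocations, recent=5):
    # Isolation: only this agent's invocations for this task are candidates
    own = [i for i in invocations if i.agent == agent and i.task_id == task_id]
    cited = {i.id: i for i in own}.get(fact.invocation_id)
    if cited is not None:
        if _contains(cited.output, fact.value):
            return fact
        # Assumed behaviour when the id exists but lacks the value
        raise GroundingError(f"value not found in output of {cited.id}")
    # Layer A: unknown id, so look for real outputs that contain the value
    matches = [i for i in own if _contains(i.output, fact.value)]
    if len(matches) == 1:
        fact.invocation_id = matches[0].id  # silent heal, value is still verified
        return fact
    if matches:
        ids = ", ".join(i.id for i in matches)
        raise GroundingError(f"{fact.invocation_id} does not exist; value appears in: {ids}")
    # Layer B: hand the real recent ids and tools back for the retry
    summary = "; ".join(f"{i.id} ({i.tool})" for i in own[-recent:])
    raise GroundingError(f"{fact.invocation_id} does not exist. Recent invocations: {summary}")
```

Multi-match raises instead of guessing. This keeps the choice with the LLM, as the commit message requires.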

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-21 02:14:20 -10:00


MASForensics

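Multi-Agent System for Digital Forensics: a multi-agent digital forensics system built on large language models.

Seven specialized agents work together to run an automated forensic analysis of a disk image and produce a structured forensic report. The agents never talk to each other directly. They collaborate through a shared EvidenceGraph (evidence knowledge graph).

Architecture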

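main.py                          Entry point: config loading, image selection, interruption recovery
  │
  ├── Orchestrator               Five-phase pipeline scheduling
  │     │
  │     ├── FileSystemAgent      Partitions/file systems, directories, deleted files, Prefetch
  │     ├── HypothesisAgent      Generates hypotheses and links existing evidence
  │     ├── RegistryAgent        Registry analysis (SYSTEM/SOFTWARE/SAM/NTUSER.DAT)
  │     ├── CommunicationAgent   Email, IRC/mIRC chat logs
  │     ├── NetworkAgent         Browser history, PCAP captures
  │     ├── TimelineAgent        Cross-category timeline correlation
  │     └── ReportAgent          Comprehensive report generation
  │
  ├── EvidenceGraph              Evidence knowledge graph with typed edges (auto-persisted)
  ├── AgentFactory               Role templates + dynamic agent composition
  ├── ToolRegistry               Tool catalog + result cache
  └── LLMClient                  Claude API client (async, tool use)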

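EvidenceGraph: the evidence knowledge graph

Three node types plus typed, weighted edges:

Node	Prefix	Meaning
`Phenomenon`	`ph-*`	An observable forensic artifact (one concrete finding)
`Hypothesis`	`hyp-*`	An explanatory hypothesis (a claim awaiting verification)
`Entity`	`ent-*`	A recurring entity such as a person, program, host, or IP address

The edge types and weights for Phenomenon → Hypothesis edges are hard-coded in HYPOTHESIS_EDGE_WEIGHTS: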

TODO

Once the current pipeline works end to end, look for an adaptive weighting scheme.

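Edge type	Weight	Meaning
`direct_evidence`	+0.25	The phenomenon is the very behavior the hypothesis describes
`supports`	+0.15	Consistent with the hypothesis but not conclusive
`consequence_observed`	+0.15	An outcome the hypothesis predicts has been observed
`prerequisite_met`	+0.10	A precondition of the hypothesis is satisfied
`weakens`	−0.10	Makes the hypothesis less likely
`contradicts`	−0.20	Directly refutes the hypothesis

Confidence update rule (the result always stays within [0, 1]):

Positive edge: delta = weight * (1 - old_conf)
Negative edge: delta = weight * old_conf
new_conf = old_conf + delta

Status changes automatically when a threshold is crossed:

- ≥ 0.8 → supported
- ≤ 0.2 → refuted
- still active when the run ends → inconclusive

The LLM only picks the edge type, which is a classification task. Code applies the weight table and decides status transitions, which avoids numeric hallucination.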
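The update rule and thresholds fit in a few lines. The weights are copied from the table above. The function names and the 0.5 starting confidence in the example are assumptions for illustration.

```python
# Weights copied from the edge table in this README
HYPOTHESIS_EDGE_WEIGHTS = {
    "direct_evidence": 0.25,
    "supports": 0.15,
    "consequence_observed": 0.15,
    "prerequisite_met": 0.10,
    "weakens": -0.10,
    "contradicts": -0.20,
}

SUPPORTED, REFUTED = 0.8, 0.2  # status thresholds from the text


def apply_edge(conf: float, edge_type: str) -> float:
    """Return the new confidence after one Phenomenon -> Hypothesis edge."""
    w = HYPOTHESIS_EDGE_WEIGHTS[edge_type]
    # Positive edges close part of the gap to 1; negative edges remove part of conf,
    # so the result never leaves [0, 1] as long as |w| <= 1.
    delta = w * (1 - conf) if w > 0 else w * conf
    return conf + delta


def status(conf: float, run_finished: bool = False) -> str:
    if conf >= SUPPORTED:
        return "supported"
    if conf <= REFUTED:
        return "refuted"
    return "inconclusive" if run_finished else "active"


conf = 0.5  # assumed starting confidence
for edge in ["direct_evidence", "supports", "direct_evidence", "direct_evidence"]:
    conf = apply_edge(conf, edge)
print(round(conf, 4), status(conf))  # → 0.8207 supported
```

Each positive edge moves a hypothesis only part of the remaining distance. Several independent edges are therefore needed to cross 0.8.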

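Each new Phenomenon is checked against existing ones by Jaccard similarity. If title similarity is > 0.6 and description similarity is > 0.4, the new Phenomenon counts as a duplicate. It is then merged into the existing node, which raises that node's confidence and appends the new agent to its corroborating_agents. This keeps the same finding from entering the graph twice.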
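Below is a small sketch of the duplicate check and merge. The 0.6 / 0.4 thresholds come from the text. The tokenisation (lower-cased whitespace split) and the size of the confidence bump are hypothetical, because this README specifies neither.

```python
CONFIDENCE_BUMP = 0.1  # hypothetical: the README does not say how much a merge raises confidence


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased whitespace tokens (assumed tokenisation)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_duplicate(new: dict, old: dict) -> bool:
    # Both thresholds must be exceeded
    return (jaccard(new["title"], old["title"]) > 0.6
            and jaccard(new["description"], old["description"]) > 0.4)


def add_or_merge(phenomena: list[dict], new: dict, agent: str) -> dict:
    """Merge into the first duplicate found, otherwise append a new node."""
    for ph in phenomena:
        if is_duplicate(new, ph):
            ph["confidence"] = min(1.0, ph["confidence"] + CONFIDENCE_BUMP)
            if agent not in ph["corroborating_agents"]:
                ph["corroborating_agents"].append(agent)
            return ph
    entry = {**new, "corroborating_agents": [agent]}
    phenomena.append(entry)
    return entry
```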

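Five-phase pipeline

Phase	Description
Phase 1	FileSystemAgent surveys the image, identifies partitions, file systems, and key paths, and produces the first batch of Phenomena
Phase 2	Hypothesis generation: reads `config.yaml:hypotheses` first; if none are configured, HypothesisAgent generates 3-7 hypotheses automatically from the Phase 1 phenomena
Phase 3	Hypothesis-driven investigation (5 rounds by default). Each round: generate leads for all active hypotheses in one pass → dispatch them concurrently by agent type (semaphore = 3) → judge how each new phenomenon relates to each hypothesis in one pass. Exits early once every hypothesis has converged. At the end, failed leads are retried once, followed by Gap Analysis
Phase 4	TimelineAgent builds a MAC timeline with `build_filesystem_timeline` and correlates it with Phenomenon timestamps
Phase 5	ReportAgent combines hypotheses, evidence, and entities into a Markdown report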
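The Phase 3 dispatch step can be sketched with `asyncio.Semaphore(3)`. In this sketch, leads are grouped by agent type, every lead runs as its own task, and a raised exception marks only that lead as failed. These choices are assumptions about the orchestrator, and `run_lead` stands in for an agent working one lead.

```python
import asyncio
from collections import defaultdict


async def dispatch_round(leads, run_lead, limit=3):
    """Run every lead concurrently, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)
    groups = defaultdict(list)  # agent type -> its leads
    for lead in leads:
        groups[lead["agent"]].append(lead)

    async def guarded(lead):
        async with sem:
            return await run_lead(lead)

    ordered = [lead for group in groups.values() for lead in group]
    results = await asyncio.gather(*(guarded(l) for l in ordered), return_exceptions=True)
    # An exception marks that lead failed instead of aborting the whole round
    for lead, res in zip(ordered, results):
        lead["status"] = "failed" if isinstance(res, Exception) else "done"
    return groups
```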

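Investigation Areas (hypothesis-derived)

At the end of Phase 2, the orchestrator makes a single LLM call that derives 5-12 InvestigationAreas from all active hypotheses. Each area has these fields:

- snake_case slug
- description
- suggested_agent
- expected_keywords
- expected_tools
- priority
- motivating_hypothesis_ids

Areas are stored in graph.investigation_areas and serialized to runs/<ts>/investigation_areas.json. They serve two purposes:

Phase 3 main-loop prompt: each hypothesis block is followed by "Expected areas: a, b, c". The LLM still chooses leads freely but gets soft guidance.
Phase 3 closing Gap Analysis: coverage is judged on two levels:
- Keyword match: scan Phenomenon titles/descriptions for area.expected_keywords
- Tool hit: check whether area.expected_tools were actually called

Every uncovered area automatically gets a lead. suggested_agent and priority are passed through, and motivating_hypothesis_ids[0] is copied into Lead.hypothesis_id to preserve provenance. Gap filling runs for at most 3 rounds.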
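A rough sketch of the two coverage checks and the resulting gap leads follows. This README does not say whether both checks must pass, so this version counts an area as covered if either one hits. A plain dict stands in for Lead, and the dataclass mirrors the field list above without being the project's actual class.

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationArea:
    slug: str
    suggested_agent: str
    priority: int
    expected_keywords: list[str] = field(default_factory=list)
    expected_tools: list[str] = field(default_factory=list)
    motivating_hypothesis_ids: list[str] = field(default_factory=list)


def is_covered(area, phenomena, called_tools) -> bool:
    # Keyword check: any expected keyword in any Phenomenon title/description
    text = " ".join(f"{p['title']} {p['description']}" for p in phenomena).lower()
    keyword_hit = any(k.lower() in text for k in area.expected_keywords)
    # Tool check: any expected tool was actually invoked during the run
    tool_hit = any(t in called_tools for t in area.expected_tools)
    return keyword_hit or tool_hit  # assumption: either signal counts as coverage


def gap_leads(areas, phenomena, called_tools) -> list[dict]:
    leads = []
    for area in areas:
        if is_covered(area, phenomena, called_tools):
            continue
        leads.append({
            "agent": area.suggested_agent,
            "priority": area.priority,
            # Provenance: the first motivating hypothesis becomes Lead.hypothesis_id
            "hypothesis_id": (area.motivating_hypothesis_ids or [None])[0],
            "description": f"Gap analysis: cover {area.slug}",
        })
    return leads
```

Plain substring matching is crude: a short keyword such as "irc" would also match "circuit". A real implementation may prefer token or word-boundary matching.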

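Manual override: config.yaml:investigation_areas is commented out by default, so the LLM derives all areas. Uncomment it to add areas that must always be investigated. These manual areas are written before the LLM's, and slug-based dedupe protects them from being overwritten: the LLM can only augment their keyword/tool lists. This is the key to adapting across cases and platforms, because Windows-specific areas are no longer hard-coded.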
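Here is a sketch of the slug-based merge. It assumes entries use the same keys as the commented-out config.yaml example (area, agent, priority, keywords, tools). The `source` field is hypothetical. Manual fields are never replaced; LLM-derived keywords and tools are appended only when they are new.

```python
def merge_areas(manual: list[dict], derived: list[dict]) -> list[dict]:
    """Manual areas win on every field; the LLM may only extend their lists."""
    merged = {a["area"]: {**a, "source": "manual"} for a in manual}
    for a in derived:
        slug = a["area"]
        if slug in merged:
            base = merged[slug]
            for key in ("keywords", "tools"):
                extra = [x for x in a.get(key, []) if x not in base.get(key, [])]
                base[key] = base.get(key, []) + extra  # new list, input left untouched
        else:
            merged[slug] = {**a, "source": "llm"}
    return list(merged.values())
```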

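Agent system

AgentFactory maintains 7 role templates (ROLE_TEMPLATES), and each template specifies a default tool set. HypothesisAgent and ReportAgent subclass BaseAgent and register extra dedicated tools. The other 5 agents are built directly from BaseAgent plus a tool list.

Agent workflow

BaseAgent.run enforces four stages through the system prompt:

A. INVESTIGATE   Check the graph state / Asset Library first, then call forensic tools
B. RECORD        Record every finding with add_phenomenon
C. LINK          Call link_to_entity as needed; never cite a ph-id from memory, always call list_phenomena first
D. ANSWER        Give the final answer only after the stages above are complete

The prompt also carries anti-hallucination rules:

- Only content that appears verbatim in tool output may be recorded.
- Timestamps, paths, and inodes must come from tool results.
- Truncated output must be marked [truncated].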

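Dynamic agent composition

AgentFactory.create_specialized_agent() fills capability gaps. It sends the tool catalog and the hypothesis description to the LLM. The LLM picks 3-8 tools and writes a role description. The factory then instantiates a new agent from that choice and caches it.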
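Below is a hedged sketch of the validation and caching step after the LLM replies. The JSON reply shape `{"role": ..., "tools": [...]}`, silently dropping tool names missing from the catalog, and caching by tool set are all assumptions. `make_agent` stands in for constructing a BaseAgent.

```python
import json

_AGENT_CACHE: dict[tuple[str, ...], object] = {}  # hypothetical cache keyed by tool set


def build_specialized_agent(llm_reply: str, catalog: set[str], make_agent):
    spec = json.loads(llm_reply)
    # Keep order, drop duplicates and any tool name the catalog does not contain
    tools = [t for t in dict.fromkeys(spec["tools"]) if t in catalog]
    if not 3 <= len(tools) <= 8:
        raise ValueError(f"need 3-8 known tools, got {len(tools)}")
    key = tuple(sorted(tools))
    if key not in _AGENT_CACHE:
        _AGENT_CACHE[key] = make_agent(spec["role"], tools)
    return _AGENT_CACHE[key]
```

Keying on the tool set means two gaps that need the same tools share one agent. The trade-off is that the first role description wins.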

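Tool system

At startup, tool_registry.py calls register_all_tools(image_path, partition_offset, graph), which registers every tool into the global TOOL_CATALOG in one pass.

Tool result cache

The CACHEABLE_TOOLS set marks the read-only, deterministic tools (partition_info, list_directory, parse_registry_key …). The image is read-only, so a call with the same args always returns the same output, and a cache hit is reused directly. Error results are never cached.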
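A minimal cache sketch follows. It assumes tools are async callables that raise on failure, so an error never reaches the store. The key serialises args with `json.dumps(sort_keys=True)`, which makes argument order irrelevant. The class name is hypothetical.

```python
import asyncio
import json

# Example members taken from the text; the real set is larger
CACHEABLE_TOOLS = {"partition_info", "list_directory", "parse_registry_key"}


class ToolResultCache:
    def __init__(self):
        self._store: dict[tuple[str, str], object] = {}
        self.hits = 0

    async def call(self, tool: str, args: dict, fn):
        if tool not in CACHEABLE_TOOLS:
            return await fn(**args)  # non-deterministic tools always run
        key = (tool, json.dumps(args, sort_keys=True))
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = await fn(**args)  # if this raises, nothing is cached
        self._store[key] = result
        return result


if __name__ == "__main__":
    async def list_directory(inode):
        return f"listing of {inode}"

    cache = ToolResultCache()
    asyncio.run(cache.call("list_directory", {"inode": 5}, list_directory))
```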

Asset Library

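EvidenceGraph.asset_library indexes every extracted file by inode, so nothing is extracted twice. Agents query it through the list_assets / find_extracted_file tools. Each new file is classified by filename into one of ten categories, such as registry_hive / chat_log / prefetch / network_capture / recycle_bin.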

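Forensic toolchain

Sleuth Kit (disk forensics): TSK commands are invoked as async subprocesses:

Tool	Purpose
`mmls`	Partition table analysis
`fsstat`	File system metadata
`fls`	Directory listing (including deleted files)
`icat`	Extract a file by inode
`srch_strings`	String search across the disk
`fls -m`	MAC timeline generation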
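Below is a sketch of the async subprocess pattern for these tools. The helper names, the 300 s timeout, and the example sector offset 63 are hypothetical. The flags are standard TSK options: `-o` is the sector offset, `-p` prints full paths, `-r` recurses, and `-m <mount>` emits mactime body format.

```python
import asyncio


def fls_command(image: str, offset: int, inode=None, mactime: bool = False) -> list[str]:
    cmd = ["fls", "-o", str(offset)]
    cmd += ["-r", "-m", "/"] if mactime else ["-p"]
    cmd.append(image)
    if inode is not None:
        cmd.append(str(inode))  # list a specific directory inode
    return cmd


def icat_command(image: str, offset: int, inode) -> list[str]:
    return ["icat", "-o", str(offset), image, str(inode)]


async def run_tsk(cmd: list[str], timeout: float = 300) -> str:
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # do not leave a runaway TSK process behind
        await proc.wait()
        raise
    if proc.returncode != 0:
        msg = err.decode(errors="replace").strip()
        raise RuntimeError(f"{cmd[0]} exited {proc.returncode}: {msg}")
    return out.decode(errors="replace")
```

Passing an argument list to `create_subprocess_exec` avoids shell quoting problems with paths from the image.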

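regipy (registry parsing): reads the SYSTEM / SOFTWARE / SAM / NTUSER.DAT binaries directly. It extracts system information, user accounts, network configuration, installed software, email accounts, shutdown time, and more.

File parsers:

- Prefetch binaries (.pf)
- PCAP string extraction (HTTP requests / Host / Cookie / UA)
- Generic text and binary reading
- Regex search
- Hex dump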
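A minimal sketch of string-level HTTP extraction from raw capture bytes follows. It does not parse packets or reassemble TCP streams, so it misses headers split across packets. The regexes and the function name are assumptions.

```python
import re

# Request lines can follow binary packet headers directly, so they are not anchored
REQUEST_RE = re.compile(rb"(GET|POST|HEAD|PUT|DELETE) (\S+) HTTP/1\.[01]")
# Header lines follow a CRLF, so ^ with MULTILINE finds them
HEADER_RE = re.compile(rb"^(Host|Cookie|User-Agent): ?([^\r\n]+)", re.M | re.I)


def http_strings(raw: bytes) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Pull HTTP request lines and Host/Cookie/User-Agent headers out of raw bytes."""
    requests = [(m.group(1).decode(), m.group(2).decode("latin-1"))
                for m in REQUEST_RE.finditer(raw)]
    headers = [(m.group(1).decode().title(), m.group(2).decode("latin-1"))
               for m in HEADER_RE.finditer(raw)]
    return requests, headers
```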

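Interruption recovery and run archiving

Three layers of protection:

EvidenceGraph auto-persistence: every write operation (add_phenomenon / add_hypothesis / add_edge / add_lead, etc.) is saved to disk automatically. The save is atomic: the data goes to a .tmp file, which is then renamed.
Agent-level fault tolerance: when a single agent fails, its lead is marked failed, and 3 consecutive failures trigger a graceful exit via AnalysisAborted. At the end of Phase 3, failed leads are retried once; retry=True prevents infinite loops.
Resume: on startup, main.py scans runs/*/graph_state.json and offers to resume any directory that has that file but no run_metadata.json. The graph's current state decides which phase to resume from.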
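The write-to-.tmp-then-rename pattern can be sketched as follows; the function name is hypothetical. If the process dies mid-write, only a stale .tmp is left, and the previous graph_state.json remains intact.

```python
import json
import os
from pathlib import Path


def atomic_write_json(path, data) -> None:
    """Write JSON so readers see either the old file or the new one, never half of it."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
    os.replace(tmp, path)  # atomic rename on POSIX; also overwrites the target on Windows
```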

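Run archive directory

runs/
  2026-04-02T14-30-00/
    config.yaml                    Config snapshot
    graph_state.json               Live graph state (used for resuming)
    phenomena.json                 Exported phenomena
    hypotheses.json                Hypotheses + confidence log
    entities.json                  Entities
    edges.json                     Edges
    leads.json                     Leads and their final status
    extracted/                     Files extracted from the image
    <image>_forensic_report.md     Forensic report
    run_metadata.json              Run metadata (duration, statistics, errors)
    masforensics.log               Run log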
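Against this layout, the resume check reduces to a glob. Here is a minimal sketch. It assumes run_metadata.json is written only when a run completes, and the function name is hypothetical. Choosing the resume phase from the graph state is left out.

```python
from pathlib import Path


def find_resumable_runs(runs_dir="runs") -> list[Path]:
    """Run directories that have graph_state.json but no run_metadata.json."""
    resumable = []
    for state in sorted(Path(runs_dir).glob("*/graph_state.json")):
        run_dir = state.parent
        # Assumed: run_metadata.json appears only once a run has finished
        if not (run_dir / "run_metadata.json").exists():
            resumable.append(run_dir)
    return resumable
```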

Quick start

Requirements

Python >= 3.14
The Sleuth Kit (installed on the system; provides mmls, fls, icat, and the other commands)
A disk image file

Installation

uv sync

Configuration

Edit config.yaml:

agent:
  base_url: "https://your-api-proxy.com"
  api_key: "sk-your-key"
  model: "claude-sonnet-4-6"
  max_tokens: 16384

max_investigation_rounds: 5          # maximum number of Phase 3 rounds

# hypotheses:                        # optional: specify initial hypotheses manually
#   - title: "The suspect actively performed network sniffing"
#     description: "..."

# investigation_areas:                 # optional: manual override (all LLM-derived by default)
#   - area: shutdown_time              #         via slug dedupe the LLM only augments
#     agent: registry                  #         the keyword/tool lists and never overwrites manual entries
#     priority: 3
#     keywords: [shutdown]
#     tools: [get_shutdown_time]

If hypotheses is not configured, HypothesisAgent generates them automatically.

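Running

python main.py                       # choose the image and partition interactively
python main.py /path/to/image/dir    # specify the image directory

After an interruption, the next launch detects the unfinished run and asks whether to resume it.

Regenerating only the report

After a full run, if you only want to change the prompts or fix the report:

python regenerate_report.py runs/<timestamp>

This skips Phases 1-4 and reruns ReportAgent directly from the existing graph_state.json.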

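Project structure

MASForensics/
├── main.py                  Entry point, image selection, interruption recovery
├── orchestrator.py          Five-phase pipeline scheduling
├── evidence_graph.py        Evidence knowledge graph + edge weight table + persistence
├── base_agent.py            Agent base class + built-in graph tools
├── agent_factory.py         Role templates + dynamic agent composition
├── tool_registry.py         Tool catalog + result cache + auto-classification
├── llm_client.py            LLM API client
├── log_config.py            Colored terminal logging + file logging
├── regenerate_report.py     Regenerates the report from an existing graph_state
├── config.yaml              Configuration + investigation areas + optional hypotheses
├── agents/
│   ├── hypothesis.py        HypothesisAgent (add_hypothesis, link)
│   ├── report.py            ReportAgent (comprehensive report, with its own read tools)
│   ├── timeline.py          TimelineAgent (kept for future extension)
│   └── ...                  filesystem/registry/communication/network (same as above)
├── tools/
│   ├── sleuthkit.py         Async TSK wrapper
│   ├── registry.py          regipy parsing
│   └── parsers.py           Prefetch / PCAP / generic file parsing
├── image/                   Disk images (supplied by the user)
├── runs/                    Run archives
└── tests/
    └── test_optimizations.py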

Dependencies

Package	Purpose
`httpx[socks]`	Async HTTP client (with SOCKS proxy support)
`pyyaml`	Config file parsing
`regipy`	Windows registry hive parsing
`pytest` / `pytest-asyncio`	Testing

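Default case

CFReDS Hacking Case (a standard NIST forensics training image):

Image: SCHARDT.001 (~4.6 GB, IBM hard disk, 8 segments)
OS: Windows XP
Scenario: forensic analysis of a computer involved in a suspected hacking incident
Full-image MD5: AEE4FCD9301C03B3B054623CA261959A (config.yaml lists the MD5 of each segment for verification)

Testing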

python -m pytest tests/ -v
