Compare commits

20 Commits

Author SHA1 Message Date
BattleTag
8b964b5dec docs(strategist) S8/9: DESIGN.md updates + DESIGN_STRATEGIST.md spec
DESIGN_STRATEGIST.md §11. The strategist refit is the first sub-design
big enough to need its own document, so it lives as a sibling to
DESIGN.md rather than inline.

DESIGN_STRATEGIST.md (new, 543 lines) covers:
  §0  Scope, non-goals, invariants preserved
  §1  Data model (Lead extension, InvestigationRound)
  §2  Six tools (graph_overview / source_coverage / marginal_yield /
      budget_status / propose_lead / declare_investigation_complete)
      with full input_schema
  §3  InvestigationStrategist agent class
  §4  Orchestrator Phase 3 loop pseudocode
  §5  Persistence + resume strategy
  §6  config schema
  §7  Test plan (8 scenarios)
  §8  9-step build order (matches commit history)
  §9  Risks + mitigations
  §10 Open questions
  §11 Required DESIGN.md updates (applied here)
  §12 What this design does NOT solve (exam-test coverage, vision-
      capable LLM, blockchain explorer, etc.)

DESIGN.md updates per §11:
  §4.5  Note harmonic damping is now landed
  §4.9  Phase 3 table row now points at the strategist loop +
        inline summary
  §5    Lead + InvestigationRound rows added to the data-model
        summary table

This commit closes the strategist refit. All 174 tests pass / 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:28:06 -10:00
BattleTag
388321ee30 feat(strategist) S7: strategist resume / open-round repair
DESIGN_STRATEGIST.md §5. Support resume from a crash mid-strategist-loop.

_resume_strategist_state inspects investigation_rounds for a tail entry
without completed_at — an "open" round, i.e. one that started but never
closed. Two repairs:

  1. Mark the round closed with strategist_action="interrupted_resume"
     so the run history reflects what actually happened.
  2. Walk that round's leads; any still in "assigned" state are
     re-marked as "failed" with failure_reason="interrupted before
     complete". The Retry-failed-leads + Gap-analysis passes that run
     after the strategist loop can pick them up.

Returns max(round_number) + 1 — the round at which to resume the loop.
On a clean graph (no prior rounds) returns 1 and makes no changes.

_phase3_strategist_loop now calls this helper before the main for-loop
and uses its return value as start_round, so a resume run lands at the
right round number rather than restarting from R1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:27:05 -10:00
BattleTag
093f3cec1f feat(strategist) S6: config.example.yaml schema for strategist + budgets
DESIGN_STRATEGIST.md §6. Document the strategist loop's tunables so
operators can override defaults without code changes.

config.yaml itself is gitignored (it carries the API key), so this
commit adds config.example.yaml as the tracked schema reference.
The runtime reads config.yaml; operators copy the example as a
starting point.

  strategist.enabled       — default true; false routes Phase 3 through
                             the legacy fixed-round loop instead.
  strategist.max_rounds    — orchestrator cap (default 10).
  strategist.hard_stop_marginal_yield_zero_rounds — safety net for
                             over-eager strategist + zero-yielding
                             workers (default 3).
  budgets.tool_calls_total — global tool-call hard cap.
  budgets.strategist_rounds_max — informational, surfaced via
                             budget_status (orchestrator enforces
                             via strategist.max_rounds instead).
  budgets.wall_clock_minutes_max — wall-clock hard cap.

Comment out any budget cap to make it unbounded — Orchestrator's
_budget_exceeded treats missing caps as no-op.

Legacy max_investigation_rounds is kept as the fallback used only when
strategist.enabled is false; documented inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:26:12 -10:00
BattleTag
a103c17bdb feat(strategist) S5: Phase 3 strategist loop in orchestrator
DESIGN_STRATEGIST.md §4. Replace the fixed-round hypothesis-directed
loop with a belief-driven strategist loop that runs the strategist
agent once per round and dispatches the leads it proposes.

New helpers on Orchestrator:
  _budget_exceeded()              hard budget caps (tool_calls,
                                  wall_clock_minutes), complementing
                                  strategist self-throttling.
  _execute_strategist_lead(lead)  dispatch one lead serially; the
                                  next strategist round sees the
                                  cumulative effect of this lead's
                                  graph mutations.
  _phase3_strategist_loop()       main loop. Open round, run strategist,
                                  exit on declare_complete or empty
                                  proposals, otherwise dispatch each
                                  lead, judge new phenomena, close round,
                                  apply yield/budget checks.
  _phase3_legacy_loop()           fallback when strategist.enabled is
                                  false. Identical to the
                                  pre-DESIGN_STRATEGIST behaviour.

The run() entry point branches on strategist_cfg.enabled (default
true) and always follows up with _retry_failed_leads() + Gap
Analysis + mark_remaining_inconclusive() regardless of variant.

Orchestrator.__init__ also wires graph.budgets and
graph.run_start_monotonic from config so the budget_status tool
sees real numbers.

Integration tests use a mock strategist + mock workers to verify
declare_complete, propose_lead -> worker dispatch, zero-yield-streak
hard stop, and budget-cap-stops-the-loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:25:04 -10:00
BattleTag
65745d21dc feat(strategist) S4: InvestigationStrategist agent
DESIGN_STRATEGIST.md §3. The smallest possible agent — its entire
output per round is one decision: propose 1-3 leads (each citing a
real hypothesis it expects to move) OR declare the investigation
complete with a reason.

Constraint surface:
  mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
  terminal_tools         = ("declare_investigation_complete",)

The agent inherits the BaseAgent forced-retry mechanism: if it returns
without calling either action tool, the orchestrator force-prompts a
RECORD-only retry. declare_complete being terminal means the
tool_call_loop short-circuits the moment the strategist decides
we're done.

_register_graph_tools overrides BaseAgent's default to skip
_register_graph_write_tools entirely — the strategist NEVER writes
phenomena, entities, edges, or hypotheses directly. All graph
mutations come from the workers it dispatches via leads. This keeps
the planning agent's responsibility surface narrow: read the graph,
choose what to do next, that's it.

Prompt walks through the workflow (call graph_overview / marginal_yield /
budget_status / source_coverage first, then take exactly
one terminal action) with decision criteria for propose vs stop.

Registered in agent_factory._AGENT_CLASSES["strategist"].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:22:05 -10:00
BattleTag
ff3a05d7ce feat(strategist) S3: propose_lead / declare_investigation_complete
DESIGN_STRATEGIST.md §2.5. The strategist's two write actions.

propose_lead validates motivating_hypothesis exists in the graph,
validates expected_evidence_type is a real edge type, validates
source_id refers to a real source in the case — fast specific
errors so the strategist gets fixable feedback rather than a
generic crash. On success, calls graph.add_lead with proposed_by=
"strategist" and round_number=graph.current_strategist_round so
the round-completion code can collect this round's leads.

declare_investigation_complete sets graph.strategist_complete_requested
which the orchestrator inspects after each strategist run to decide
whether to break the loop. reason must come from a closed enum so
the audit log is consistent.

EvidenceGraph gains two transient run-context fields:
  current_strategist_round       — set by orchestrator at start of round
  strategist_complete_requested  — flipped by declare_complete

These are intentionally NOT persisted — they're per-run flags, not
graph state.

Both tools are required to be in InvestigationStrategist.mandatory_record_tools
(added in S4) so the agent's forced-retry mechanism kicks in if
it returns without taking a documented decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:21:13 -10:00
BattleTag
6ebbc675c1 feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status
DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses
to ground its decision each round.

  graph_overview()      — hypotheses table (log_odds, conf, edges_in,
                          distinct_sources, recent_flip), sources table,
                          pending leads. distinct_sources is the
                          critical signal: a hypothesis with 23 edges
                          but only 1 distinct_source has fragile cross-
                          source independence and is a candidate for
                          a corroboration-seeking lead.
  source_coverage(src)  — per-source ✓/✗ against an expected-artefact
                          catalogue. Catalogue is heuristic hints,
                          NOT a forced checklist. Footer reminds the
                          strategist to investigate ✗ items only when
                          an active hypothesis depends on them — this
                          is the "exam-readiness exists but is not locked in" guardrail.
  marginal_yield(N)     — new phenomena / edges / status flips per
                          recent round. Two consecutive zero-yield
                          rounds = strong signal to declare complete.
  budget_status()       — usage vs caps (tool_calls, rounds, wall
                          clock). Pacing warnings at 70% / 90%.

tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type
table of (name, detector, value_for) entries. Detectors are
substring patterns on tool name + args; the matcher resolves at
call time against graph.tool_invocations. Catalogue covers iOS /
Android / Windows disk / media-collection / archive source types.

All four tools registered in tool_registry, listed as read-only in
llm_client.READ_ONLY_TOOLS for parallel execution. They go through
the invocation-logging wrapper so the strategist's reads are
themselves auditable (the wrapper does NOT cache them — graph
state changes between calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:19:54 -10:00
BattleTag
ca96f29849 feat(strategist) S1: Lead extension + InvestigationRound model
DESIGN_STRATEGIST.md §1. Foundation for the Phase 3 strategist loop.

Lead now carries four annotations that let the orchestrator measure
marginal yield per lead and dedupe strategist proposals:
  - proposed_by         (agent that proposed it: "strategist", "filesystem", …)
  - motivating_hypothesis (hyp-id the lead is meant to corroborate/refute)
  - expected_evidence_type (edge type the lead's worker should produce)
  - round_number        (0 = Phase 1 lead, ≥1 = strategist-proposed)

add_lead idempotently dedupes strategist proposals on
(motivating_hypothesis, expected_evidence_type, target_agent, source_id)
to prevent the "strategist loops on the same lead" failure mode.

New InvestigationRound dataclass records per-round provenance: before/
after hypothesis status snapshots, phenomena + edge count deltas, and
the strategist's decision_rationale. ``new_phenomena_count``,
``new_edges_count``, ``status_flips`` are derived properties that the
marginal_yield tool will use.

start_investigation_round / complete_investigation_round /
get_investigation_round / latest_round / leads_from_round complete the
lifecycle. complete is idempotent on already-closed rounds.

Lead.from_dict is forward-compat for state files written before this
commit. InvestigationRound persists as a top-level list in
graph_state.json (auto-save + load_state both wired).

EvidenceGraph also gains graph.budgets and graph.run_start_monotonic
fields that the budget_status view (S2) will read; orchestrator
populates them in S5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:18:35 -10:00
BattleTag
8020c24776 fix(graph): harmonic damping for repeated same-edge_type evidence
First full-case run (runs/2026-05-20T20-15-04/) produced hypotheses
with log_odds +31 (8 direct_evidence + 15 supports). That's the
naive-Bayes independence assumption breaking down: 15 different
phenomena all "supporting" the same hypothesis from one source are
not 15 independent pieces of evidence, they're highly correlated.
DESIGN.md §4.5 last bullet flagged this as an "unimplemented knob"; this
commit implements it.

Rule: the k-th edge of a given (hyp_id, edge_type) contributes
log_lr_base / k instead of log_lr_base. Cumulative is harmonic
sum H_N, bounded by ~ ln N. Single-edge hypotheses unaffected
(k=1 → /1 → no change). Replaying the 2026-05-20 graph's 108
edges under the new rule pulls the top hypothesis from +31.0 →
+8.75; the smallest active hypothesis from +4.0 → +2.08.

Also adds rank + log_lr_base to confidence_log entries so the
math is auditable from the persisted graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:16:37 -10:00
BattleTag
f04ccd4bc7 fix(base_agent): forced-retry iter cap 10→30 + narrow tools to record+read
Timeline agent on the 2026-05-20 full run produced 0 phenomena: initial
round hit max_iterations=60 cap before recording, forced retry then hit
max_iterations=10 cap because every grounding-rejected call burns one
iteration in the new gateway. Two changes restore depth without re-
introducing the original "agent wanders off and never records" failure:

  1. Raise retry cap 10 → 30. With grounding auto-rescue (prev commit)
     most rejections heal on the first retry, but some still need 2-3
     turns; 10 is empirically too tight, 30 leaves headroom.

  2. Narrow the retry tool surface to RECORD + graph-write +
     read-only-graph-query tools. Investigation tools (list_directory,
     sqlite_query, parse_registry_key) are dropped on retry so the agent
     can't restart its search loop — the retry is explicitly "record
     what you already found, then stop".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:15:08 -10:00
BattleTag
6b485b98f7 fix(grounding): auto-rescue hallucinated invocation_id + list real ids in error
First full-case run (runs/2026-05-20T20-15-04/) produced 83 GroundingError
rejections, almost all from a single failure mode: LLM cites a plausible-
looking inv-XXXXXXXX that doesn't exist, while the fact's value is in fact
present verbatim in one of its real tool outputs. The agent knew which
tool it read from, it just mis-typed the citation id.

Two-layer fix in evidence_graph.validate_fact_grounding:

  Layer A (silent heal): when the cited inv-id misses, search the same
  agent / task's invocations for one whose output contains the value
  (strict or normalised substring). If exactly one matches, rewrite
  fact.invocation_id in place and accept. Multi-match is NOT auto-
  rescued — the candidate ids go back to the LLM so it picks deliberately.

  Layer B (informative retry): GroundingError now appends the agent's
  recent invocation ids and a brief tool-call summary, so the LLM has
  the real ids in front of it for the next attempt rather than
  fabricating again from memory.

Both layers preserve the design invariant: the fact's value must still
be present in a real tool output — nothing new can land grounded that
wasn't already verifiable. Cross-agent / cross-task isolation is also
preserved (rescue candidates filtered on agent + task_id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:14:20 -10:00
BattleTag
81ade8f7ac feat(refit): complete S1-S6 — case abstraction, grounding, log-odds, plugins, coref, multi-source
Consolidates the long-running refit work (DESIGN.md as authoritative spec)
into a single baseline commit. Six stages landed together:

  S1  Case + EvidenceSource abstraction; tools parameterised by source_id
      (case.py, main.py multi-source bootstrap, .bin extension support)
  S2  Grounding gateway in add_phenomenon: verified_facts cite real
      ToolInvocation ids; substring / normalised match enforced; agent +
      task scope checked. Phenomenon.description split into verified_facts
      (grounded) + interpretation (free text). [invocation: inv-xxx]
      prefix on every wrapped tool result so the LLM can cite.
  S3  Confidence as additive log-odds: edge_type → log10(LR) calibration
      table; commutative updates; supported / refuted thresholds derived
      from log_odds; hypothesis × evidence matrix view.
  S4  iOS plugin: unzip_archive + parse_plist / sqlite_tables /
      sqlite_query / parse_ios_keychain / read_idevice_info;
      IOSArtifactAgent; SOURCE_TYPE_AGENTS routing.
  S5  Cross-source entity resolution: typed identifiers on Entity,
      observe_identity gateway, auto coref hypothesis with shared /
      conflicting strong/weak LR edges, reversible same_as edges,
      actor_clusters() view.
  S6  Android partition probe + AndroidArtifactAgent; MediaAgent with
      OCR fallback; orchestrator Phase 1 iterates every analysable
      source; platform-aware get_triage_agent_type; ReportAgent renders
      actor clusters + per-source breakdown.

142 unit tests / 1 skipped — full coverage of the new gateway, log-odds
math, coref hypothesis fall-out, and orchestrator multi-source dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:12:10 -10:00
BattleTag
444d58726a refactor: native tool calling + generic forced-retry + terminal exit
- llm_client: switch tool_call_loop from text-based <tool_call> regex
  to OpenAI-native tools=[...] / structured tool_calls field; accumulate
  delta.reasoning_content for DeepSeek thinking-mode echo-back; fold
  preserves system msg and aligns boundary to never orphan role:tool
- base_agent: generic forced-retry via mandatory_record_tools class attr
  (filesystem -> add_phenomenon, timeline -> add_temporal_edge,
  hypothesis -> add_hypothesis, report -> save_report); count via
  executor wrapper
- terminal_tools class attr + loop short-circuit: when a terminal tool
  is called, loop exits with its raw return as final_text. ReportAgent
  declares save_report as terminal - replaces the <answer>-tag stop
  signal that native tool calling broke
- _execute_*: return (raw, formatted) - terminal exit uses untruncated
  raw, conversation history uses 3000-char-capped formatted
- evidence_graph + orchestrator: LLM-derived InvestigationArea support
  (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS /
  _AREA_TOOLS); manual yaml block kept as optional seed
- strip <answer> references from agent prompts (no longer load-bearing)

Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures
(was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via
save_report (was max_iterations regression). 78/78 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 13:51:19 +08:00
BattleTag
0a2b344c84 fix: share _safe_json_loads with tool-call parser, not just orchestrator
Move _safe_json_loads from orchestrator.py to llm_client.py and have
_extract_tool_calls use it when parsing <tool_call> JSON blocks from
model output. orchestrator now imports it from llm_client.

Background: in the first full DeepSeek run (runs/2026-05-12T17-25-38),
~10 'Failed to parse tool call JSON' warnings appeared, all from regex
patterns where the LLM wrote \. or \* inside JSON string values:

  Failed to parse tool call JSON: {..., "pattern": "Outlook Express|...|\.dbx"}
  Failed to parse tool call JSON: {..., "pattern": "ethereal.*\.pcap"}
  Failed to parse tool call JSON: {..., "pattern": "lookatlan.*\.txt|..."}

These are exactly the kind of stray-backslash errors stage-1 sanitize
already handles for orchestrator JSON calls — but tool-call extraction
was using bare json.loads. Result: each failed tool call silently dropped
on the floor, the LLM never got a result, and at least one network agent
burned 14m26s spinning before hitting max_iterations=40.

Now the sanitize/log-on-failure path is shared. Verified against the
three failure cases from yesterday's log: all three now parse cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:29:21 +08:00
BattleTag
76df34ed79 docs: add TODO marker for adaptive edge weights
Note that the hard-coded HYPOTHESIS_EDGE_WEIGHTS table is a temporary
choice; an adaptive scheme should be explored once the full pipeline
is stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:14:23 +08:00
BattleTag
893f5b5de2 fix: address agent boundary / JSON robustness / Phase 4 no-op from CFReDS run
Issues found running the system end-to-end on the NIST CFReDS Hacking Case
disk image (SCHARDT.001, Mr. Evil). Four interconnected fixes:

1. HypothesisAgent boundary leak (two layers)
   B.1 Tool set: BaseAgent._register_graph_tools was registering
       add_phenomenon / add_lead / link_to_entity for every agent. With
       an empty graph in Phase 2, HypothesisAgent "compensated" by
       inventing phenomena, dispatching leads, and linking entities.
   B.2 Prompt leak: BaseAgent's shared system prompt hard-coded "Call
       investigation tools (list_directory, parse_registry_key, etc.)".
       HypothesisAgent hallucinated list_directory and wasted 2 LLM
       rounds on 'unknown tool' errors before backing off.

   Fix:
   - Split _register_graph_tools into _register_graph_read_tools +
     _register_graph_write_tools.
   - HypothesisAgent, ReportAgent, TimelineAgent override
     _register_graph_tools to skip write tools.
   - HypothesisAgent and TimelineAgent override _build_system_prompt
     with focused, role-specific workflows (no Phase A-D investigation
     boilerplate).

2. JSON parse failures in Phase 3 lead generation (5/6 hypotheses lost)
   DeepSeek emits JSON with stray backslashes (Windows path references)
   and occasional minor syntax slips. Old single-stage sanitize couldn't
   recover; per-hypothesis fallback silently swallowed each failure.

   Fix:
   - _safe_json_loads: progressive — stage 0 as-is, stage 1 escape stray
     \X (anything not in valid JSON escape set), log raw input on final
     failure for diagnosis.
   - New _call_llm_for_json helper: on parse failure, append the error
     to the prompt and re-call LLM (self-correcting retry, up to 2).
   - All 4 LLM-JSON callsites in orchestrator refactored to use it.

3. Phase 1 sometimes skipped add_phenomenon (LLM treated <answer> as deliverable)
   Strengthen BaseAgent's RECORDING REQUIREMENT — explicit "your <answer>
   is DISCARDED; only graph mutations propagate" plus a new rule:
   negative findings (searched X, found nothing) MUST also be recorded
   as phenomena, since they constrain the hypothesis space.

4. Phase 4 Timeline was a no-op
   TimelineAgent inherited BaseAgent's Phase A-D prompt and never called
   add_temporal_edge — produced 0 temporal edges. Override the prompt
   with concrete workflow (build_filesystem_timeline ->
   get_timestamped_phenomena -> 15-40 add_temporal_edge calls) and
   restrict tool set to read-only + its 3 temporal tools.

Verified end-to-end: HypothesisAgent now 8 tools (no writes), ReportAgent
13 (no graph writes), TimelineAgent 10 (read + temporal + timeline).
All 60 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:14:16 +08:00
BattleTag
0a966d8476 feat: switch LLM client to OpenAI SDK for DeepSeek compatibility
The previous LLMClient used raw httpx + Claude Messages API (/v1/messages,
x-api-key, Anthropic SSE event types). Incompatible with DeepSeek.

Rewrite LLMClient.__init__/chat/close to use openai.AsyncOpenAI:
- /v1/chat/completions endpoint, OpenAI message format
- Bearer auth, native SDK error types
- Stream chunks via async for + chunk.choices[0].delta.content

Tool calling protocol (ReAct text-based tags) and all surrounding helpers
(_apply_progressive_decay, _fold_old_messages, _partition_tool_calls,
tool_call_loop, etc.) are unchanged — endpoint-agnostic by design.

New optional config params surfaced to config.yaml.agent:
- reasoning_effort: "high" | "medium" | "low" — DeepSeek/o1-style depth
- thinking_enabled: bool — DeepSeek extra_body.thinking switch

main.py and regenerate_report.py pass these through to LLMClient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 17:13:54 +08:00
BattleTag
31812a72ee test: track tests/ directory in version control
tests/test_optimizations.py — 60 pytest cases covering:
- EvidenceGraph: quality scoring, Jaccard merge, async safety,
  hypothesis confidence updates, asset library
- llm_client: tool-result truncation, parallel batch execution,
  progressive context decay, message folding
- orchestrator: parallel dispatch, batched lead generation,
  batched judging
- tool_registry: result cache key derivation

FakeAgent.run signatures updated to BaseAgent.run(task, lead_id=None).

Previously listed in .gitignore (which is itself untracked, so the
ignore rule lives only locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:10:31 +08:00
BattleTag
74e6bde13a refactor: lead provenance, unified link path, SSOT cleanup, configurable weights
Five interrelated cleanups:

1. Lead -> Phenomenon provenance
   - Phenomenon.from_lead_id field on the dataclass
   - BaseAgent.run(lead_id=...) writes self._current_lead_id
   - _add_phenomenon auto-injects from agent state (LLM unaware)
   - Orchestrator dispatch passes lead.id; Phase 1/2-auto/4/5 stay None
   - Merge path preserves the first non-None lead_id on collision

2. Unified Phenomenon <-> Hypothesis link path
   - HypothesisAgent only adds hypotheses, never links
   - link_phenomenon_to_hypothesis tool + executor removed
   - All links go through Orchestrator._judge_new_phenomena
   - Phase 2 unconditionally judges after hypothesis generation
   - Gap Analysis judges after each dispatch round
   (Three previously-missing judge calls now in place.)

3. SSOT in agent subclasses
   - Remove RoleTemplate dataclass, ROLE_TEMPLATES dict,
     _instantiate_from_template method
   - Each agent subclass owns name, role, and tool list
   - agent_factory.py shrinks from 299 to 153 lines
   - All 7 agents now route through _AGENT_CLASSES (filesystem,
     registry, communication, network, timeline were previously dead
     subclasses overridden by templates)

4. Configurable edge weights
   - HYPOTHESIS_EDGE_WEIGHTS -> _DEFAULT_EDGE_WEIGHTS (private default)
   - EvidenceGraph(edge_weights=...) override via config.yaml
   - hypothesis_edge_weights section in config.yaml (commented example)
   - main.py and regenerate_report.py read and pass through

5. regenerate_report.py auto-picks the latest run/*/graph_state.json
   when no CLI arg is given (was a hardcoded date path)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:10:15 +08:00
BattleTag
fde96c7d9f docs: rewrite README for EvidenceGraph + 5-phase + 7-agent architecture
Previous README described a Blackboard-based 4-phase, 6-agent system.
The actual code uses:
- EvidenceGraph with typed weighted edges (Phenomenon/Hypothesis/Entity)
- 5 phases (explicit Hypothesis Generation between survey and investigation)
- 7 agents (added HypothesisAgent)

Documents the confidence update formula, Phenomenon Jaccard merging,
Asset Library inode dedup, tool-result caching, Gap Analysis coverage
check, auto-persistence, and the resume mechanism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:09:59 +08:00
30 changed files with 11089 additions and 933 deletions

317
DESIGN.md Normal file

@@ -0,0 +1,317 @@
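# MASForensics System Refit Design

> Goal: refit the current "single Windows disk forensics" system into a complex forensics system that can handle **multiple devices, multiple actors, heterogeneous evidence, and cross-source correlation**. This document is the single authoritative design document
> (it merges the two earlier drafts, `REFIT_PLAN.md` and `RESEARCH_DESIGN.md`).
>
> The real case that triggered this refit: the 2025 美亚杯 (Meiya Cup) qualifier, Individual track. 5 evidence items
> (1 USB E01, 1 full Android disk `blk0_sda.bin`, 3 iOS extractions, 1 set of transaction screenshots),
> spanning at least 3 people: LEUNG YL / CHAN MH / FUNG CC.

---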
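## 1. Design principles (invariants that run through the whole document)

1. **The LLM proposes, code adjudicates.** The LLM handles language, classification, and perception; it **does not hold case state, does not produce numeric values, and does not write unverified facts**. All "truth" lives in the symbolic layer.
2. **Every recorded fact can be re-derived from a single tool invocation.** Conclusions can be independently re-verified.
3. **The reasoning core is device-agnostic.** All device-specific logic lives in "capability plugins"; supporting a new device means writing a plugin, never changing the core.
4. **Seemingly irreversible operations (such as entity merging) are in fact reversible, evidence-backed assertions** that can be overturned.

These four are not slogans: every design decision below maps to one of them.

---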
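## 2. Diagnosis of current problems

| # | Problem | Location | Consequence |
|---|---|---|---|
| P1 | **The single-image assumption is deeply embedded**: tools are closures bound to `image_path`, the graph is single-source, and the main program selects only one image | `tool_registry.py:148` `register_all_tools`, `main.py:91-153` | Cannot ingest multiple evidence items or correlate across devices |
| P2 | **Anti-hallucination rules exist only in the prompt** | `base_agent.py` system prompt | Once the LLM disobeys, wrong facts enter the case record and **cannot be identified afterwards** |
| P3 | **The confidence formula has no statistical meaning and is order-dependent**: `delta=weight*(1-conf)` (positive) / `weight*conf` (negative); when positive and negative edges mix, the result depends on the order in which edges arrive | `evidence_graph.py:26-33` | Confidence cannot be calibrated or defended |
| P4 | **Artefact categorisation is Windows-only**: it relies on hive names / `.pf` / `mirc` keywords | `tool_registry.py:80-107` `_auto_categorize` | All iOS/Android artefacts fall into `other` |
| P5 | **Case information is hard-coded** as `cfreds_hacking_case` | `config.yaml:35-50` | Switching cases requires code changes |
| P6 | **Image discovery relies on extension globs**; `.bin` is not in the list | `main.py:28` `_IMAGE_GLOBS` | `blk0_sda.bin` is not discovered |
| P7 | **Phenomenon carries no source annotation** | `evidence_graph.py:85` `Phenomenon` | There is no way to tell which device a finding came from, so cross-source correlation has no anchor |

The refit both "onboards new evidence" and "fixes the inherent defects P1-P7".

---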
## 3. Target architecture

```
case.yaml ──► Case ──► N × EvidenceSource
                       ├ id / type / owner / path
                       └ access_mode: image | tree
                ┌──────────────┴───────────────┐
          image-backed                    tree-backed
     (TSK, inode-addressed)   (path-addressed: mounted/unpacked)
                │                              │
                └────────────┬─────────────────┘
        SourceRegistry ── source_id → SourceHandle (resolves path/offset/mode)
        ToolRegistry   ── tools registered by access_mode; source_id bound at call time
      ┌──────────────────────┼───────────────────────┐
      ▼                      ▼                       ▼
Knowledge-Source      Graph Write Gateway         ToolInvocationLog
Agents (LLM) ──►      (sole write path, enforces  (every tool call is logged:
gateway-only writes   precondition = grounding)   args / output / sha256)
      │                      │
      └──────────────────────┴──► Grounded Evidence Graph (GEG)
                                  Phenomenon / Hypothesis / Entity
                                  confidence = accumulated log-odds
```

**Keep** the existing five-phase pipeline, disconnect recovery, run archiving, tool-result caching, and
`AgentFactory` dynamic composition. These designs are sound; they are adapted, not rewritten.

---
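## 4. Core design

### 4.1 Evidence source abstraction (addresses P1/P5/P6/P7; the foundation)

Add `case.py`:

- **`EvidenceSource`** dataclass: `id`, `label`, `type`, `owner` (associated person),
  `path`, `access_mode`, `meta` (type-specific, e.g. partition offset / unpacked root directory).
- **`Case`**: holds `list[EvidenceSource]` plus case metadata, loaded from `case.yaml`.
- **`access_mode` is the key design distinction**:
  - `image`: block device / disk image, addressed by inode via TSK (USB E01, each partition of the Android `blk0_sda`).
  - `tree`: mounted filesystem or unpacked directory, addressed by path (iOS extractions after unzipping, archives after expansion).
- Tools are registered in families by access_mode (see 4.2). An evidence item can go from image to tree through "preparation"
  (e.g. partition mount, zip unpacking).

`select_image_interactive` in `main.py` (`:91-153`) changes to load/construct a `Case`.
`_IMAGE_GLOBS` becomes type probing (`mmls` probe + file-header sniffing) and no longer relies on extensions.
`config.yaml` drops `cfreds_hacking_case`; case information moves into `case.yaml`.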
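
A minimal sketch of the two data classes described in 4.1, assuming `case.yaml` has already been parsed into a plain dict. The field names come from the list above; the `from_dict` / `source` helpers, the validation rules, and every value in the example (ids, types, paths, owner) are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

ACCESS_MODES = ("image", "tree")


@dataclass
class EvidenceSource:
    id: str
    label: str
    type: str          # hypothetical values, e.g. "usb_e01", "ios_extraction"
    path: str
    access_mode: str   # "image" (TSK, inode-addressed) or "tree" (path-addressed)
    owner: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything that is not one of the two documented access modes.
        if self.access_mode not in ACCESS_MODES:
            raise ValueError(f"{self.id}: unknown access_mode {self.access_mode!r}")


@dataclass
class Case:
    name: str
    sources: List[EvidenceSource] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Case":
        # `raw` stands in for the parsed content of case.yaml (assumed layout).
        sources = [EvidenceSource(**entry) for entry in raw.get("sources", [])]
        ids = [s.id for s in sources]
        if len(ids) != len(set(ids)):
            raise ValueError("duplicate source id in case definition")
        return cls(name=raw.get("name", ""), sources=sources)

    def source(self, source_id: str) -> EvidenceSource:
        for s in self.sources:
            if s.id == source_id:
                return s
        raise KeyError(source_id)


case = Case.from_dict({
    "name": "demo",
    "sources": [
        {"id": "src-usb", "label": "USB stick", "type": "usb_e01",
         "path": "evidence/usb.E01", "access_mode": "image"},
        {"id": "src-ios1", "label": "iPhone extraction", "type": "ios_extraction",
         "path": "evidence/ios1/", "access_mode": "tree", "owner": "person-A",
         "meta": {"unpacked_root": "evidence/ios1/"}},
    ],
})
```

Validating `access_mode` at construction time keeps a malformed case file from reaching tool registration, where the two modes select different tool families.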
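### 4.2 Tool registration parameterised by source (addresses P1)

Current state: `register_all_tools(image_path, offset, ...)` closes a single image into every tool
(`tool_registry.py:159+`). Refit:

- Tool executor signatures gain `source_id`; at execution time the `SourceRegistry` resolves it to the real path/offset/mode.
- `TOOL_CATALOG` annotates tool applicability by `access_mode`; the tool set an agent receives is determined by
  the source types it is responsible for.
- **"Current source" context**: the orchestrator sets a current source for the agent (analogous to the existing
  `graph._current_agent`), and tools act on it by default, so the LLM need not pass `source_id` on every call
  (fewer errors). Cross-source tools (timeline merge, entity queries) are explicitly cross-source.
- The cache key `_cache_key` (`tool_registry.py:41`) incorporates `source_id` to prevent cross-source contamination.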
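
One way the last bullet could look, as a sketch only: the real `_cache_key` is not shown in this diff, so the function below, its name, and its JSON-plus-SHA-256 scheme are assumptions. The tool arguments in the example are also illustrative.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(source_id: str, tool: str, args: Dict[str, Any]) -> str:
    """Derive a deterministic key that is unique per (source, tool, args)."""
    payload = json.dumps(
        {"source_id": source_id, "tool": tool, "args": args},
        sort_keys=True,      # argument order must not change the key
        ensure_ascii=False,
        default=str,         # tolerate non-JSON values such as Path objects
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = cache_key("src-usb", "list_directory", {"path": "/", "recursive": False})
k2 = cache_key("src-usb", "list_directory", {"recursive": False, "path": "/"})
k3 = cache_key("src-android", "list_directory", {"path": "/", "recursive": False})
```

Here `k1` and `k2` are equal because only the argument order differs, while `k3` differs because the same call targets another source.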
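### 4.3 Graph write gateway (addresses P2; implements principle 1)

Current state: agents write the graph directly through tools such as `add_phenomenon`, and the constraints exist only in the prompt. Refit:

- All graph mutations (`add_phenomenon` / `add_hypothesis` / `link` / `observe_identity` …)
  converge on **a single write gateway**. The gateway enforces preconditions in code.
- The "anti-hallucination rules" in the existing prompt are pushed down into hard checks in the gateway. The LLM agent's four-stage workflow
  (INVESTIGATE→RECORD→LINK→ANSWER) is unchanged; what changes is that the gateway beneath the RECORD step becomes stricter.
- The `mandatory_record_tools` mechanism in `base_agent.py` is kept (it guarantees that the agent actually recorded something).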
### 4.4 证据落地约束 Grounding解决 P2落实原则 2
这是系统可靠性的核心机制。
**ToolInvocationLog**:每次工具调用留痕一条记录
`{invocation_id, source_id, tool, args, output, output_sha256, agent, ts}`
现有结果缓存(`tool_registry.py:29`)已存确定性输出,扩展为完整留痕即可。
**Phenomenon 一分为二**——把「事实」和「解读」分开:
- `verified_facts`: `list[{type, value, invocation_id}]`
`type ∈ {path, timestamp, inode, hash, identifier, count, ...}`
- `interpretation`: 自由文本agent 的分析叙述。
**`add_phenomenon` 网关前置条件**
1. 每个 fact 必须引用一次**本 agent 本任务内真实发生过的** `invocation_id`
2. 代码校验 `fact.value` 命中该次调用的输出:
- 文本输出 → 逐字 substring 匹配;
- 结构化/二进制工具输出 → 与解析后的字段匹配。
3. 任一 fact 不通过 → **整条拒绝写入**,返回失败的 factagent 须修正重试。
4. 通过 → 写入;`verified_facts` 每条带 `invocation_id`(可重跑复核),
`interpretation` 标记为「未核验分析」。
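**Effect**: recording a path, timestamp, hash, or identifier that no tool output supports
becomes **structurally impossible** in this system. The LLM can still write a wrong `interpretation`, but the report renders
verified facts (citations that carry a re-run instruction) and interpretation (analysis explicitly labeled as such)
**separately**, so a human investigator can tell them apart at a glance. This is a reliability guarantee with honestly drawn boundaries.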
> 现有 `_make_auto_record``tool_registry.py:126`)把工具输出直接转 phenomenon——
> 那是「平凡落地」的特例(描述即输出),新设计是它的一般化与形式化。
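The `add_phenomenon` preconditions can be sketched as follows. This is a simplified illustration for text outputs only; `Invocation`, `check_facts`, and the dict shape of a fact are hypothetical names, and the per-task scoping and structured-output matching described above are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    invocation_id: str
    agent: str
    output: str


def check_facts(facts: list[dict], log: list[Invocation], agent: str) -> list[dict]:
    """Return the facts that fail grounding; an empty list means the write is accepted."""
    by_id = {inv.invocation_id: inv for inv in log}
    rejected = []
    for fact in facts:
        inv = by_id.get(fact.get("invocation_id"))
        # Rule 1: the cited invocation must exist and belong to this agent.
        # Rule 2: the fact value must appear verbatim in that invocation's output.
        if inv is None or inv.agent != agent or str(fact["value"]) not in inv.output:
            rejected.append(fact)
    return rejected
```

A gateway built on this would refuse the whole phenomenon whenever the returned list is non-empty and hand the failing facts back to the agent for correction.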
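### 4.5 Hypothesis confidence: likelihood ratio / log-odds (addresses P3)
Replace the ad hoc deltas in `_DEFAULT_EDGE_WEIGHTS` (`evidence_graph.py:26`) with
log-odds accumulation based on **likelihood ratios (LR)**:
- Each `Phenomenon → Hypothesis` edge represents one likelihood ratio. The LLM still only does **discrete classification**
(is this evidence direct_evidence / supports / weakens / contradicts … for this hypothesis),
and the numeric `log₁₀(LR)` is looked up in a calibration table. **The LLM never emits numbers** (this continues the existing
"LLM picks the type, code computes the value" philosophy and gives it a statistical basis).
- Confidence update:
```
L_post = L_prior + Σ log₁₀(LR_i) # log-odds; addition commutes → no order dependence
confidence = 1 / (1 + 10^(-L_post))
```
- Edge type → `log₁₀(LR)` calibration table (initial values; can later be calibrated against labeled cases):
| Edge type | log₁₀(LR) |
|---|---:|
| `direct_evidence` | +2.0 |
| `supports` / `consequence_observed` | +1.0 |
| `prerequisite_met` | +0.5 |
| `weakens` | −0.5 |
| `contradicts` | −2.0 |
- Thresholds are unchanged (≥0.8 supported / ≤0.2 refuted); they are now derived from `L_post`.
- `prior_prob` becomes configurable (default 0.5 → `L_prior=0`).
- **Harmonic damping of same-type evidence** (landed 2026-05): the k-th edge with the same `(hypothesis, edge_type)`
contributes `log_lr_base / k`. The cumulative contribution is `log_lr_base · H_N` (harmonic series, ~ ln N).
This addresses the breakdown of the naive-Bayes independence assumption and the runaway L=+31 that appeared when
multiple agents entered the same finding into the graph (field data from 2026-05-20). A single edge is unchanged (k=1, damping factor 1.0).
**Structural signals** matter more than absolute values: to judge evidence depth, the strategist learns more from `distinct_sources` than from the confidence number.
A by-product is a **hypothesis × evidence matrix** view, used by the report and by lead selection.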
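The update rule and the harmonic damping can be put together in a few lines. This is a sketch under stated assumptions: the function names are made up for illustration, the table values follow the calibration table in this section with `weakens` and `contradicts` taken as negative, and damping is keyed on `edge_type` for a single hypothesis.

```python
import math
from collections import defaultdict

# log10(LR) per edge type (initial calibration values from this section).
LOG10_LR = {
    "direct_evidence": 2.0,
    "supports": 1.0,
    "consequence_observed": 1.0,
    "prerequisite_met": 0.5,
    "weakens": -0.5,
    "contradicts": -2.0,
}


def posterior_log_odds(edge_types: list[str], prior_prob: float = 0.5) -> float:
    """Accumulate log10 odds; the k-th edge of a given type contributes base / k."""
    total = math.log10(prior_prob / (1.0 - prior_prob))
    seen: dict[str, int] = defaultdict(int)
    for edge_type in edge_types:
        seen[edge_type] += 1
        total += LOG10_LR[edge_type] / seen[edge_type]
    return total


def confidence(l_post: float) -> float:
    """Map log10 odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + 10.0 ** (-l_post))
```

Three `supports` edges therefore add 1 + 1/2 + 1/3 rather than 3, and because every edge of one type shares the same base value, the sum does not depend on insertion order.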
### 4.6 跨源实体解析(解决「复杂场景」的关联难题,落实原则 4
复杂取证的核心难题iPhone keychain 里的 Apple ID、安卓短信库里的号码、
USB 文件作者、交易截图里的钱包地址——**哪些指向同一行为人?**
**关键设计:「身份共指」本身就是一条假设**——于是实体解析不是独立子系统,
而是 4.5 假设机制的复用:
- agent 观察到标识符即经网关 `observe_identity`,记一条**类型化**的标识符
强标识符IMEI / 钱包地址 / email / 电话号;弱标识符:昵称 / 显示名),
挂到暂定 `Entity`。
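- "Entity A ≡ Entity B" is registered as a `Hypothesis`: a shared strong identifier is a strong +LR edge,
a shared weak identifier is a weak +LR edge, and conflicting strong identifiers are a strong −LR edge, all scored by the same computation as 4.5.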
- **不做破坏性归并**:跨阈值时在两个 Entity 间加一条 `same_as` 边(由该 coref
假设背书)。查询时把 `same_as` 连通分量视作同一行为人。**完全可逆、可审计、
可被后续 contradicts 证据推翻**(落实原则 4
- **Blocking**:只在「至少共享一个标识符或名称高相似」的实体对间建 coref 假设,
避免 O(n²)。
跨设备时间线、「谁在何时做了什么」由 `same_as` 连通后的实体图自然涌现。
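The non-destructive merge described above amounts to computing connected components over `same_as` edges at query time. The sketch below uses a small union-find; `same_as_components` and the plain string entity ids are illustrative assumptions, not the project's API.

```python
def same_as_components(entities: list[str], same_as_edges: list[tuple[str, str]]) -> list[list[str]]:
    """Group entities into actors by following same_as edges (union-find)."""
    parent = {e: e for e in entities}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[str, set[str]] = {}
    for e in entities:
        groups.setdefault(find(e), set()).add(e)
    return sorted(sorted(g) for g in groups.values())
```

Because the entities themselves are never rewritten, dropping a `same_as` edge after contradicting evidence and recomputing is enough to split an actor again.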
### 4.7 能力插件层(接入 5 类证据)
每类证据 = 一个 `(摄取 handler, 工具集, 知识源 agent)` 三元组。推理核心不动。
| 插件 | 摄取 | 新工具 | 知识源 agent |
|---|---|---|---|
| **iOS 提取** | `unzip` 解包为 `tree` 源 | `parse_plist`(含二进制 plist)、`sqlite_tables`/`sqlite_query`(sms.db、WhatsApp `ChatStorage.sqlite`、通讯录)、`parse_ios_keychain`、`read_idevice_info` | `iOSArtifactAgent` |
| **安卓整盘** | `mmls` 分区→各分区 `image` 源;可 mount 为 `tree` | 复用 TSKext4/F2FS 读取;`fsstat` 探明加密 | 复用 filesystem + `AndroidArtifactAgent` |
| **磁盘镜像(E01)** | 已支持TSK 含 ewf | 现有 TSK 工具链 | 现有 filesystem/registry |
| **归档** | `unzip_archive` 通用解包 | —— | —— |
| **媒体/截图** | —— | `ocr_image`tesseract注意 DeepSeek 无视觉能力,必须走 OCR | `MediaAgent` |
**安卓风险**`blk0_sda` 的 `userdata` 分区大概率 FBE 加密。先 `fsstat` 各分区
探明未加密→TSK 直接用;加密且无密钥→只能分析 `EFS`/`PARAM`/`system` 等非加密区。
`tool_registry.py:80` 的 `_auto_categorize` 改为可扩展:分类由源插件提供自己的
工件分类表,而非全局 Windows 关键词表(解决 P4
### 4.8 Agent 体系重组
现有 7 个 agent 按 Windows 工件命名registry、communication=邮件/IRC、
network=浏览器/PCAP。改为按**调查职能**组织,并增加平台特定 agent
- `agent_factory.py` 的 `_AGENT_CLASSES`:34-40扩充新增 `ios_artifact`、
`android_artifact`、`financial`(钱包/交易)、`media`。
- `communication` 泛化:邮件 + IM + 短信,跨平台。
- 新增 **源类型 → 适任 agent** 映射,供 Phase 1 逐源派 triage agent。
- `create_specialized_agent`:69的动态组合机制保留——它本就是应对能力缺口的
正确手段,只是工具目录变大后选择空间更丰富。
### 4.9 编排器多源流水线
| 阶段 | 改造 |
|---|---|
| Phase 1 | 「单镜像初勘」→ **逐源并行 triage**,每源派类型适配的 agent |
| Phase 2 | 假设跨源生成;身份共指假设在此首次登记 |
| Phase 3 | **Strategist loop**: each round an LLM meta-agent reads the graph and decides between propose_lead and declare_complete; workers execute the leads; hypothesis edges are re-judged. See `DESIGN_STRATEGIST.md` |
| Phase 4 | 跨源时间线合并,**按源做时区归一**iOS UTC vs 安卓本地时间) |
| Phase 5 | 一案一份综合报告:含假设结论、实体关联图、每条结论的 provenance 引证 |
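**"LLM decides depth" in Phase 3** (landed after a 2026-05 field run showed that single-round triggering plus log-odds inflation left all 8 pending leads undispatched): scheduling moves from hard-coded decisions ("max_rounds=N, converged→stop") to an LLM meta-agent.
- New agent `InvestigationStrategist` (`agents/strategist.py`) takes one action per round: propose 1-3 leads, or declare_investigation_complete
- 4 read-only view tools, `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` (`tools/strategy.py`), expose scheduling signals to the LLM
- 2 decision tools that write, `propose_lead` / `declare_investigation_complete`, are the strategist's mandatory_record tools
- The orchestrator reads `config.yaml:strategist.*` and `config.yaml:budgets.*` to control max_rounds and hard caps
- See `[[DESIGN_STRATEGIST]]` for the full data model, prompt design, disconnect recovery, and risks/mitigations
Disconnect recovery and run archiving are retained; `graph_state.json` gains an `investigation_rounds[]` array that persists the strategist's decision for each round.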
---
## 5. 数据模型变更汇总
| 节点/结构 | 变更 |
|---|---|
| `EvidenceSource` | **新增**一等节点(`src-*` |
| `ToolInvocation` | **新增**留痕记录(`inv-*`),随 graph 持久化 |
| `Phenomenon` | + `source_id`description 拆为 `verified_facts[]` + `interpretation`;澄清/移除语义含混的 `confidence`(默认 1.0),观测的可靠性由 grounding 表达 |
| `Hypothesis` | + `prior_prob`、`log_odds`(累加量);`confidence` 改为派生值 |
| `Entity` | + 类型化标识符集合;通过 `same_as` 边跨源连通 |
| Phenomenon→Hypothesis 边 | 携带 `edge_type`,映射到 `log₁₀(LR)`(替换 `_DEFAULT_EDGE_WEIGHTS`);同 `(hyp, edge_type)` 的第 k 条边按 `1/k` 调和衰减 |
| Entity→Entity 边 | **新增** `same_as`(由 coref 假设背书,可逆) |
| `Lead` | + `proposed_by` / `motivating_hypothesis` / `expected_evidence_type` / `round_number`strategist 注解) |
| `InvestigationRound` | **新增**strategist 每轮决策的 provenance + before/after 快照 + 收益指标 |
`evidence_graph.py` 的 `VALID_EDGE_TYPES`、序列化/反序列化、Jaccard 去重相应适配。
---
## 6. 组件改动清单
| 文件 | 改动 |
|---|---|
| `case.py` | **新建**`Case` / `EvidenceSource` / `SourceRegistry` |
| `main.py` | 选源逻辑改为加载 `Case`;类型探测替代扩展名 glob |
| `tool_registry.py` | 工具按 `source_id` 参数化;缓存键含 source`_auto_categorize` 改可扩展;`ToolInvocationLog` |
| `evidence_graph.py` | 数据模型变更(第 5 节LR/对数几率置信度;写入网关 + grounding 校验 |
| `base_agent.py` | RECORD 走网关;`add_phenomenon` 改为 `verified_facts`+`interpretation` 接口 |
| `agent_factory.py` | `_AGENT_CLASSES` 扩充源类型→agent 映射 |
| `orchestrator.py` | Phase 1 逐源Phase 4 跨源时区归一Phase 5 综合报告 |
| `agents/` | 新增 `ios_artifact.py` / `android_artifact.py` / `financial.py` / `media.py``communication.py` 泛化 |
| `tools/` | 新增 `mobile_ios.py`plist/sqlite/keychain、`media.py`OCR、`archive.py`(解包) |
| `config.yaml` / `case.yaml` | 删除 `cfreds_hacking_case`;新建 `case.yaml` 证据清单 |
---
## 7. 构建顺序(按依赖排序)
| 阶段 | 内容 | 依赖 | 价值 |
|---|---|---|---|
| **S1** | 4.1 证据源抽象 + 4.2 工具参数化 + 修 P6 | —— | 地基;先只在 USB E01 上跑通验证不破坏现有逻辑 |
| **S2** | 4.3 写入网关 + 4.4 grounding + ToolInvocationLog | S1 | 可靠性核心;可量化「零幻觉录入」 |
| **S3** | 4.5 LR/对数几率置信度 | 独立(可与 S2 并行) | 修 P3置信度可辩护 |
| **S4** | 4.7 iOS 插件 + 4.8 agent 重组 | S1 | 覆盖率 1/5 → 4/5 |
| **S5** | 4.6 跨源实体解析 | S1+S3 | 跨设备关联,复杂场景能力成型 |
| **S6** | 4.7 安卓 + 媒体插件 + 4.9 编排器适配 | S1+S4 | 全 5 份证据接入 |
S1+S2+S3 是「把系统改对」S4-S6 是「把能力铺全」。建议严格按序——
S1 不稳,后面全是空中楼阁。
---
## 8. 设计取舍与未决问题
1. **grounding 对自由文本的边界**:只硬核验 `verified_facts` 里的结构化原子,
`interpretation` 不做逐字核验(诚实划界)。可加一个二级 lint扫描
interpretation 中形似路径/时间戳/哈希但未被任何引用调用覆盖的串并告警。
2. **LR 标定表初值人定**:先用第 4.5 节的初值跑通;「从标注案例学习 LR」是后续工作。
3. **安卓 userdata 加密**:能否取得解密密钥决定 4.7 安卓插件的证据深度——需尽早探明。
4. **实体解析的破坏性 vs 可逆**:本设计选**可逆的 `same_as` 边**而非破坏性归并——
牺牲一点查询效率换取完全可审计可回滚,符合原则 4。
5. **报告粒度**:定为「一案一份综合报告」,内嵌每证据小节 + 跨源关联,
而非每证据独立成篇。

543
DESIGN_STRATEGIST.md Normal file

@@ -0,0 +1,543 @@
# Strategist Loop —— Phase 3 信念驱动改造
> 这是 DESIGN.md 的补充设计文档,针对 §4.9 编排器 Phase 3 的具体重写。
>
> **触发动因**2026-05-20 第一次全 6-source 实战(`runs/2026-05-20T20-15-04/`
> 暴露 Phase 3 不工作——8 条 pending leads 一个都没派发,因为
> log-odds 通胀让所有 hypothesis 立即 converged。即使在「调和衰减」修复
> log-odds 数学后commit 在 `evidence_graph.py:update_hypothesis_confidence`
> Phase 3 在当前架构下仍然是「单轮触发、规则收敛」的机械流程——LLM
> 在调度层完全没有发言权。本设计把 Phase 3 改为 LLM 驱动的探索循环。
---
## 0. 范围
### 做什么
`orchestrator.py:Phase 3` 从「单轮、规则触发」改造为「strategist-loop、信念驱动」
新增一个 `InvestigationStrategist` agent + 4 个决策视图工具 + 2 个决策动作工具
+ 编排器循环改写。
### 不做什么
- 不改 Phase 1per-source triage 保持现状)
- 不改 Phase 2HypothesisAgent 不动strategist 可以**调用**它,但不替代)
- 不改 Phase 4/5timeline / report
- 不写专家级 per-source 检查清单(只在 `source_coverage` 工具里塞**软提示**清单)
- 不引入新的图节点类型leads 复用现有结构
### 保留的不变式
- DESIGN.md §4.3 grounding 网关,所有写入仍走它
- DESIGN.md §4.5 log-odds + 调和衰减
- DESIGN.md §4.4 verified_facts vs interpretation 划界
- 断连恢复(`graph_state.json` 序列化兼容)
### 设计原则
1. **"LLM 提议,代码裁决" 上移到调度层**DESIGN.md 第一原则现在只在事实层
grounding兑现调度层「该不该深入、深入哪里、何时停」目前是代码硬决策。
本设计让 LLM 持有调度决策权。
2. **应试能力存在但不被绑死**:系统的工具集和软提示清单覆盖应试场景所需的工件
类别;但是否查某个工件、查到什么深度,由 strategist 看具体案件性质决定,
不被预定义清单强制。
3. **可解释、可审计**:每一轮 strategist 决策、动机、产出收益都被记入持久化的
`InvestigationRound`,可事后复盘。
---
## 1. 数据模型变更
### 1.1 `Lead` 扩 4 字段
`evidence_graph.py:Lead` 现有 `(id, title, description, target_agent, source_id, status, …)`
新增:
```python
@dataclass
class Lead:
    # ... existing fields
    proposed_by: str = ""             # "strategist" | "filesystem" | ... (the proposing agent)
    motivating_hypothesis: str = ""   # hyp-id this lead is meant to corroborate/refute
    expected_evidence_type: str = ""  # one of edge_types: the edge type this lead should produce
    round_number: int = 0             # strategist round that produced this lead
```
`motivating_hypothesis` 是关键——它把 lead 和 hypothesis 显式挂钩,让事后能算
"这条 lead 跑完到底有没有改变假设状态",即 strategist 的边际收益度量。
### 1.2 新增 `InvestigationRound` 节点
记录每一轮 strategist 的决策本身——provenance 也要可审计:
```python
@dataclass
class InvestigationRound:
    id: str  # "round-001"
    round_number: int
    started_at: str
    completed_at: str = ""
    strategist_action: str = ""  # "propose_leads" | "declare_complete"
    leads_proposed: list[str] = field(default_factory=list)
    leads_executed: list[str] = field(default_factory=list)
    hypothesis_status_snapshot_before: dict = field(default_factory=dict)  # hyp_id → status
    hypothesis_status_snapshot_after: dict = field(default_factory=dict)
    new_phenomena_count: int = 0
    new_edges_count: int = 0
    decision_rationale: str = ""  # the strategist's own explanation
```
随 graph 序列化(加进 `to_dict`/`from_dict`)。
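A round-trip for this record could look like the sketch below. It uses an abbreviated copy of `InvestigationRound` (only some fields) and hypothetical helper names `round_to_dict` / `round_from_dict`; dropping unknown keys on load is an assumption made here so that older code can read newer state files.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class InvestigationRound:
    # Abbreviated for the example; the full field list is given above.
    id: str
    round_number: int
    started_at: str
    completed_at: str = ""
    strategist_action: str = ""
    leads_proposed: list[str] = field(default_factory=list)
    new_phenomena_count: int = 0


def round_to_dict(r: InvestigationRound) -> dict:
    """Plain-dict form suitable for json.dump."""
    return asdict(r)


def round_from_dict(d: dict) -> InvestigationRound:
    """Rebuild a round, ignoring keys this version does not know about."""
    known = {f.name for f in fields(InvestigationRound)}
    return InvestigationRound(**{k: v for k, v in d.items() if k in known})
```

An empty `completed_at` after loading is what marks a round as still open, which is the signal the resume logic relies on.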
---
## 2. 新工具
放在新文件 `tools/strategy.py`。按现有 `TOOL_CATALOG` 注册模式登记。
### 2.1 `graph_overview()` — 全局态势(只读)
**Signature**: `graph_overview() -> str`
**输出**markdown比 JSON 更易 LLM 解读):
```markdown
# Investigation State
## Hypotheses (8)
| id | title | L | conf | status | edges_in | distinct_sources | flipped_in_last_2_rounds |
|----|-------|---|------|--------|----------|------------------|---------------------------|
| hyp-83db8748 | Multi-Device Composite | +8.75 | 0.99 | supported | 23 | 1 | no |
| hyp-daa7c704 | Multiple Identity Aliases | +9.21 | 0.99 | supported | 11 | 3 | no |
| hyp-7fa9b13e | Sunny.zip contains timer_a | +2.08 | 0.99 | supported | 4 | 1 | yes (active→supported in R2) |
| ...
## Sources (6)
| id | type | phenomena | identities | last_touched_in_round |
| src-usb-leung | disk_image | 8 | 1 | R1 |
| ...
## Pending Leads (3)
| id | from | targeting | for_hypothesis | reason |
| lead-aaa | filesystem | src-ios-chan/Safari | hyp-83db8748 | Safari history likely contains device-switching evidence |
```
**关键标注**`distinct_sources` 一栏暴露了"这个假设只靠一个源支撑"——strategist
看到 23 边都来自 android 源会自动判断"需要从别处独立证据"。
### 2.2 `source_coverage(source_id: str)` — 单源覆盖度(只读)
**Signature**: `source_coverage(source_id: str) -> str`
**实现**:扫 `graph.tool_invocations`,过滤 `source_id == 该源`,按工具名 + 主要 args
分组。然后跟 `EXPECTED_ARTEFACTS[source_type]` 比对,未触达项打 ✗。
```python
# tools/strategy.py
EXPECTED_ARTEFACTS: dict[str, list[dict]] = {
    "disk_image+windows": [
        {"name": "filesystem layout", "detector": "fls|mmls", "value_for": "deleted files, hidden partitions"},
        {"name": "registry hives", "detector": "parse_registry_key", "value_for": "user activity, installed software"},
        {"name": "browser history", "detector": "list_directory@AppData/.../History", "value_for": "URL access, downloads"},
        {"name": "prefetch", "detector": "extract_file@Windows/Prefetch", "value_for": "program execution evidence"},
        # ...
    ],
    "mobile_extraction": [
        {"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb", "value_for": "contacts"},
        {"name": "SMS messages", "detector": "sqlite_query@sms.db", "value_for": "messaging content"},
        {"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite", "value_for": "WhatsApp content"},
        {"name": "Call history", "detector": "sqlite_query@CallHistoryDB", "value_for": "call records"},
        {"name": "Safari history", "detector": "sqlite_query@History.db|read_text@Bookmarks.plist", "value_for": "web browsing"},
        {"name": "Photos library", "detector": "sqlite_query@Photos.sqlite", "value_for": "photo metadata, EXIF, geolocation"},
        {"name": "iCloud accounts", "detector": "parse_plist@Accounts3.sqlite|parse_keychain", "value_for": "Apple ID, services"},
        {"name": "App inventory", "detector": "list_directory@var/containers/Bundle/Application", "value_for": "installed apps"},
    ],
    "disk_image+android": [...],
    "media_collection": [
        {"name": "OCR text", "detector": "ocr_image", "value_for": "screenshot text"},
        {"name": "EXIF metadata", "detector": "exif_image", "value_for": "device, timestamps, geolocation"},
    ],
}
```
**软提示语义**output 末尾必带一句:
> Coverage hints are heuristics, not requirements. Skip an item if the case theory
> makes it irrelevant. Investigate ✗ items only when they could materially affect
> an active hypothesis.
这一句是**"应试能力存在但不被绑死"的关键**——LLM 看到 ✗ 不会盲投,会先看
hypothesis 列表问"这个工件对当前任何 hypothesis 有意义吗"。
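The matching step behind `source_coverage` might be sketched as below. The detector grammar used here is an assumption read off the examples above: alternatives separated by `|`, each either `tool` or `tool@substring`, where the substring must occur in one of the call's argument values. `detector_hit`, `coverage_rows`, and the invocation dict shape are hypothetical.

```python
def detector_hit(detector: str, invocations: list[dict]) -> bool:
    """True if any alternative in the detector matches a recorded tool call."""
    for alternative in detector.split("|"):
        tool, _, needle = alternative.partition("@")
        for inv in invocations:
            if inv["tool"] != tool:
                continue
            # A bare tool name matches any call; otherwise require the substring.
            if not needle or any(needle in str(v) for v in inv["args"].values()):
                return True
    return False


def coverage_rows(source_id: str, expected: list[dict], invocations: list[dict]) -> list[tuple[str, bool]]:
    """One (artefact name, touched?) row per expected artefact for a single source."""
    mine = [inv for inv in invocations if inv["source_id"] == source_id]
    return [(item["name"], detector_hit(item["detector"], mine)) for item in expected]
```

Rows with `False` would be rendered as ✗ in the tool output, followed by the soft-hint sentence quoted above.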
### 2.3 `marginal_yield(last_n_rounds: int = 2)` — 边际收益(只读)
**Signature**: `marginal_yield(last_n_rounds: int = 2) -> str`
**实现**:扫最近 N 个 `InvestigationRound`,统计:
- 每轮新增 phenomena 数
- 每轮新增 P→H 边数
- 每轮 hypothesis status flips 数active→supported / 反向)
**输出**
```markdown
# Marginal Yield (last 2 rounds)
| round | new_phenomena | new_edges | status_flips |
| R3 | 5 | 7 | 1 |
| R4 | 2 | 1 | 0 |
Trend: decelerating (R4 yield 33% of R3).
Recommendation interpretation aid: yield trending to zero suggests diminishing
returns; consider declare_complete after one more probe.
```
最后一行是 LLM-friendly heuristic prose不是强制信号。
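A possible shape for the computation behind this tool is sketched here. The document does not define how the three counters combine into a single yield figure, so summing them is purely an assumption, as are the function names and the dict keys.

```python
def round_yield(r: dict) -> int:
    """Assumed combined yield: phenomena + edges + status flips."""
    return r["new_phenomena"] + r["new_edges"] + r["status_flips"]


def yield_trend(rounds: list[dict]) -> str:
    """Compare the last two rounds and label the direction."""
    if len(rounds) < 2:
        return "insufficient data"
    previous, last = round_yield(rounds[-2]), round_yield(rounds[-1])
    if last < previous:
        return "decelerating"
    if last > previous:
        return "accelerating"
    return "flat"


def zero_yield_streak(rounds: list[dict]) -> int:
    """Number of consecutive trailing rounds that produced nothing."""
    streak = 0
    for r in reversed(rounds):
        if round_yield(r) != 0:
            break
        streak += 1
    return streak
```

The streak helper is the kind of counter a hard stop after several empty rounds could be built on, while the trend label stays advisory for the LLM.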
### 2.4 `budget_status()` — 预算视图(只读)
**Signature**: `budget_status() -> str`
```markdown
# Budget Status
| metric | used | cap | pct |
| tool_calls | 1248 | 5000 | 25% |
| strategist_rounds | 3 | 10 | 30% |
| wall_clock_minutes | 142 | 360 | 39% |
Phase 1 used 89% of allocated. Phase 2 used 4%. Phase 3 (strategist) so far: 7%.
```
预算从 config.yaml 读,新增字段见 §6。无预算配置时进 unbounded 模式(仅靠
strategist 自宣 complete + hard safety cap
### 2.5 决策动作工具(写入)
注册到 strategist 的 `mandatory_record_tools`。Strategist 每轮必须 call 至少一个,
否则 forced-retry 触发(复用现有机制)。
**`propose_lead(...)`**
```python
{
    "name": "propose_lead",
    "input_schema": {
        "type": "object",
        "required": [
            "description", "target_agent",
            "motivating_hypothesis", "expected_evidence_type",
        ],
        "properties": {
            "description": {
                "type": "string",
                "description": "1-2 sentence specific investigation request, including target source/artefact",
            },
            "target_agent": {
                "type": "string",
                "enum": ["filesystem","registry","communication","network","ios_artifact","android_artifact","media"],
            },
            "source_id": {"type": "string", "description": "which source to investigate"},
            "motivating_hypothesis": {
                "type": "string",
                "description": "hyp-id this lead is meant to corroborate or refute",
            },
            "expected_evidence_type": {
                "type": "string",
                "enum": ["direct_evidence","supports","contradicts","weakens","prerequisite_met","consequence_observed"],
            },
            "rationale": {"type": "string", "description": "why this fills a real gap"},
        }
    }
}
```
**`declare_investigation_complete(...)`**
```python
{
    "name": "declare_investigation_complete",
    "input_schema": {
        "type": "object",
        "required": ["reason"],
        "properties": {
            "reason": {
                "type": "string",
                "enum": [
                    "marginal_yield_zero",
                    "budget_exhausted",
                    "all_hypotheses_resolved",
                    "coverage_saturated",
                    "other",
                ],
            },
            "rationale": {"type": "string"},
        }
    }
}
```
Terminal tool —— 调用即结束循环(复用现有 `terminal_tools` 机制)。
---
## 3. `InvestigationStrategist` agent
新文件 `agents/strategist.py`,约 150 行。
```python
class InvestigationStrategist(BaseAgent):
    name = "strategist"
    role = (
        "You are the investigation strategist. You do not run forensic tools yourself. "
        "Your job is to read the current evidence graph and decide ONE of:\n"
        "  (a) propose 1-3 new investigation leads that would materially affect an active hypothesis, or\n"
        "  (b) declare the investigation complete.\n"
        "\n"
        "Use graph_overview / source_coverage / marginal_yield / budget_status to ground your judgment. "
        "DO NOT propose a lead that just adds more same-direction evidence to an already-supported hypothesis "
        "(harmonic damping makes it ~useless). DO propose leads when:\n"
        "  - A hypothesis is supported by edges from only ONE source: get cross-source corroboration.\n"
        "  - A hypothesis is in the active band (0.2 < conf < 0.8): it needs the deciding evidence.\n"
        "  - A specific high-value artefact is uncovered on a source where the active hypotheses suggest it matters.\n"
        "\n"
        "Declare complete when marginal_yield is approaching zero AND no remaining active hypotheses have "
        "obvious investigation paths."
    )
    mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
    terminal_tools = ("declare_investigation_complete",)

    def _register_graph_tools(self):
        # Read-only tools: the strategist NEVER writes phenomena/edges directly.
        # All graph writes happen via the workers it dispatches.
        self._register_graph_read_tools()
        # No graph_write_tools.
        # Add strategy-specific tools:
        for tool_name in (
            "graph_overview", "source_coverage", "marginal_yield", "budget_status",
            "propose_lead", "declare_investigation_complete",
        ):
            td = TOOL_CATALOG[tool_name]
            self.register_tool(td.name, td.description, td.input_schema, td.executor)
```
注册到 `agent_factory._AGENT_CLASSES["strategist"]`
---
## 4. 编排器改造
### 4.1 删除/替换:现在的 Phase 3
`orchestrator.py:Phase 3` 当前逻辑(约 150 行):检查 leads → 派 worker →
检查 converged → 退出。**删除**。
### 4.2 新 Phase 3strategist loop
```python
async def _phase3_strategist_loop(self, run_dir: Path) -> None:
    """Belief-driven investigation: strategist proposes, workers execute, repeat."""
    _log("Phase 3: Strategist-Driven Investigation", event="phase")
    strategist = self.factory.get_or_create_agent("strategist")
    # §6 places max_rounds under the `strategist` section of config.yaml.
    max_rounds = self.config.get("strategist", {}).get("max_rounds", 10)
    for round_num in range(1, max_rounds + 1):
        # 1. Record round start + snapshot
        rid = await self.graph.start_investigation_round(round_num)
        # 2. Strategist run
        _log(f"Strategist Round {round_num}", event="phase")
        await strategist.run(
            f"Review the graph and decide the next investigation action. "
            f"This is round {round_num}/{max_rounds}. Budget used so far: see budget_status."
        )
        # 3. Did strategist declare complete?
        if self.graph.is_round_terminal(rid):
            _log(f"Strategist declared complete at round {round_num}", event="progress")
            break
        # 4. Collect new leads proposed this round
        new_leads = self.graph.leads_from_round(round_num)
        if not new_leads:
            _log(f"No leads proposed in round {round_num}; stopping", event="progress")
            break
        # 5. Dispatch each lead
        for lead in new_leads:
            await self._execute_lead(lead, round_num)
        # 6. Close round + record yield
        await self.graph.complete_investigation_round(rid)
        # 7. Hard budget check
        if self._budget_exceeded():
            _log(f"Budget exhausted at round {round_num}", event="progress")
            break
```
### 4.3 `_execute_lead` 复用现有 worker 派发逻辑
```python
async def _execute_lead(self, lead: Lead, round_num: int) -> None:
    agent_type = AGENT_ALIASES.get(lead.target_agent, lead.target_agent)
    worker = self.factory.get_or_create_agent(agent_type)
    if worker is None:
        logger.warning(f"No worker for lead {lead.id}: {agent_type}")
        return
    src = self.graph.case.get_source(lead.source_id) if lead.source_id else None
    if src:
        self.graph.set_active_source(src)
    _log(
        f"Round {round_num} dispatching: {lead.description}",
        event="dispatch", agent=agent_type,
    )
    await worker.run(
        f"Investigate this specific lead from the strategist:\n\n"
        f"REQUEST: {lead.description}\n"
        f"MOTIVATING HYPOTHESIS: {lead.motivating_hypothesis}\n"
        f"EXPECTED EVIDENCE TYPE: {lead.expected_evidence_type}\n"
        f"RATIONALE: {lead.rationale}\n\n"
        f"After investigating, record findings via add_phenomenon AND link relevant phenomena "
        f"to {lead.motivating_hypothesis} via the appropriate edge_type."
    )
    lead.status = "completed"
    self.graph._auto_save()
```
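### 4.4 Automatic hypothesis regeneration (optional, recommended)
New phenomena may give rise to **new hypotheses**, not only updates to existing ones. Let the strategist trigger this explicitly with
`propose_lead(target_agent="hypothesis", description="re-examine recent phenomena for new hypotheses")`.
This is the strategist's own decision rather than a timer; consistency is preferred over automatic scheduling. Note that the `target_agent` enum in §2.5 does not list `"hypothesis"`, so it would have to be added for this call to pass schema validation.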
---
## 5. 状态持久化
`graph_state.json` 新增顶层 key `investigation_rounds: list[InvestigationRound]`
`save_state` / `load_state` 处理。**断连恢复**时:
- 找最近一个未 completed 的 round → 视为该 round 失败
- 从下一个 round 重新开始
- 已完成 round 的 phenomena / edges 自然保留
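The resume rule above can be sketched on plain dicts. `repair_open_round` and the `"interrupted"` action label are hypothetical names chosen for the example; the repair of leads that belonged to the open round is not shown.

```python
def repair_open_round(rounds: list[dict], now: str) -> int:
    """Close a trailing open round as failed and return the round number to resume from."""
    if not rounds:
        return 1
    tail = rounds[-1]
    if not tail.get("completed_at"):
        # An open tail means the process died mid-round; record that fact
        # instead of pretending the round finished normally.
        tail["completed_at"] = now
        tail["strategist_action"] = "interrupted"
    return tail["round_number"] + 1
```

Only the tail needs inspecting under the assumption that rounds run strictly one after another, so at most one round can be open at a time.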
---
## 6. 配置
`config.yaml` 新增:
```yaml
strategist:
  enabled: true                            # false = fall back to the old Phase 3 logic (safety fallback)
  max_rounds: 10
  hard_stop_marginal_yield_zero_rounds: 3  # force a stop after 3 consecutive rounds with yield=0
budgets:
  tool_calls_total: 5000
  wall_clock_minutes_max: 480
```
---
## 7. 测试策略
新文件 `tests/test_strategist.py` 或加入 `test_optimizations.py`。最少要测:
1. Strategist 调 `declare_complete` 时 loop 立即退出
2. Strategist 调 `propose_lead` 时 lead 入 graph 且 round_number 正确
3. Round snapshot 正确捕获 before/after status
4. 预算耗尽时即使 strategist 还想继续也强制停
5. 断连恢复:中途中断后重启从下一 round 开始
6. `graph_overview` 输出包含 `distinct_sources` 标注
7. `source_coverage` 对未触达项标 ✗
8. `marginal_yield` 数字与 `confidence_log` 一致
不写 LLM 集成测试——strategist 行为通过 mock LLM 验证(已有这种模式见
`test_forced_record_retry_fires_when_zero_phenomena`)。
---
## 8. 实施顺序
按依赖排(**每步独立 commit**——结构性改造,单点回滚关键):
| 步 | 内容 | 依赖 | 工作量估算 |
|---|---|---|---|
| 1 | `Lead` 加 4 字段 + `InvestigationRound` 数据类 + 序列化 | — | 60 行 + 测试 |
| 2 | `graph_overview` / `source_coverage` / `marginal_yield` / `budget_status` 实现 | 1 | 250 行 + 测试 |
| 3 | `propose_lead` / `declare_investigation_complete` 工具 | 1 | 80 行 + 测试 |
| 4 | `InvestigationStrategist` agent class | 2, 3 | 120 行 + 测试 |
| 5 | 编排器 Phase 3 重写 | 4 | 150 行(替换 ~50 行旧)+ 测试 |
| 6 | config schema + 加载逻辑 | 5 | 30 行 |
| 7 | 断连恢复处理 | 5 | 40 行 + 测试 |
| 8 | 真实案件 smoke run小规模USB only | 7 | 0 代码 |
| 9 | 文档DESIGN.md §4.9 改写 + 本文件归档 | 8 | 文档 |
总:~800 行新代码 + 测试 + 文档。
---
## 9. 风险 + 缓解
| 风险 | 缓解 |
|---|---|
| Strategist 太保守(永远 declare_complete | 加 prompt 例子展示什么是"该深入的情况";测试时小样本验证 |
| Strategist 太激进(每轮都 propose 7+ leads | `propose_lead` 工具 schema 限制每轮最多 3-5 个prompt 强调"重质不重量" |
| 单 worker 跑不完 lead 导致预算雪崩 | worker 调用本身 max_iter 不变strategist 预算独立 |
| LLM 不理解 `distinct_sources` 这种暗示 | `graph_overview` 末尾加 1-2 句 plain-English 解读 "Hypothesis X has 23 edges but all from one source → cross-source corroboration would strengthen it" |
| Phase 1 触发产生的 leads 被 strategist 忽略 | strategist prompt 明确"先处理已有 pending leads再产新的" |
| 死循环strategist 反复产同样 lead | Lead 表上加 `(motivating_hyp, expected_type, source_id)` 三元组去重 |
| `EXPECTED_ARTEFACTS` 清单维护成本 | 故意保持"软提示"——清单不完整也不会破,只是某些深度需要更多 LLM 自觉 |
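The triple-based dedupe proposed in the table as a guard against repeated leads could be as small as this sketch. The helper names and the dict representation of a lead are assumptions for illustration.

```python
def dedupe_key(lead: dict) -> tuple[str, str, str]:
    """The (motivating_hyp, expected_type, source_id) triple named in the risk table."""
    return (
        lead["motivating_hypothesis"],
        lead["expected_evidence_type"],
        lead.get("source_id", ""),
    )


def accept_lead(lead: dict, seen: set[tuple[str, str, str]]) -> bool:
    """Accept a lead only if its triple has not been proposed before."""
    key = dedupe_key(lead)
    if key in seen:
        return False
    seen.add(key)
    return True
```

A rejected proposal could be reported back to the strategist as a tool error so that it picks a different angle instead of silently looping.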
---
## 10. 开放问题
1. **InvestigationRound 该不该自己跑 hypothesis agent**
倾向 strategist 用 lead 显式触发(一致性更好),不做定时触发。
2. **预算超用怎么办——硬停 vs 软警告?**
当前设计硬停;可加 "strategist 看到 budget < 10% 时只能 declare_complete"
的 schema enforcement。
3. **跨 source 边的"独立性奖励"是否纳入 log-odds**
上次衰减用了 `1/k`,没区分跨源 vs 同源。如果要纳入,公式应改为
`1/k_within_source × bonus_for_distinct_sources`。这是后续单独工程。
4. **Strategist 输出的 `rationale` 该不该走 grounding**
它不会写 phenomena`rationale` 字段可能包含具体值
"based on inv-12345...")。倾向不强制——这是元层判断,不是事实落地。
5. **现 Phase 3 的 `max_investigation_rounds` config 留还是删?**
建议留作 `strategist.enabled=false` 时的 fallback 旋钮。
---
## 11. 与 DESIGN.md 的关系
本文档落地后DESIGN.md 需要的对应更新:
- **§4.5**:补一段「同时也要看 log_odds 的**结构**——edges_in 数 / distinct_sources
是 strategist 判断是否深入的关键信号,不只是 confidence 数值」
- **§4.9 Phase 3**表格内容从「leads 派发到源感知 agent」改为
「strategist 循环:看图、提案、执行、复盘、停 / 续」
- **§8**(设计取舍):新增第 6 条:「调度层 LLM 化的取舍——strategist 决定深度,
但每轮预算受 `budgets.*` 硬限制;这是"LLM 提议、代码裁决"原则在调度层的兑现」
---
## 12. 备忘:本设计**不解决**的问题
- 应试题 8% 命中率的根因是**工具集不全**(无 vision、无 ZIP 暴力破解、无 VeraCrypt
挂载、无 blockchain explorer不是调度问题。strategist 让现有工具被用得更狠,
但不会凭空多出工具。
- LLM 编造 `invocation_id`(已修补,见 `feedback_grounding_pending` memory
log-odds 通胀(已修补:调和衰减)是本设计的**前置依赖**,不在本设计范围内。
- Per-edge-type 的更精细贝叶斯建模(如跨源独立性 bonus是独立工程。

238
README.md

@@ -2,43 +2,120 @@
Multi-Agent System for Digital Forensics — 基于大语言模型的多智能体电子取证系统。
系统通过 6 个专业化 Agent 协同工作,对磁盘镜像进行自动化取证分析,最终生成结构化的取证报告。
系统通过 7 个专业化 Agent 协同工作,对磁盘镜像进行自动化取证分析,最终生成结构化的取证报告。Agent 之间不直接通信,通过共享的 **EvidenceGraph**(证据知识图)协作。
## 架构
```
main.py 入口:配置加载、恢复检测、运行管理
main.py 入口:配置加载、镜像选择、断连恢复
├── Orchestrator 阶段流水线调度
├── Orchestrator 阶段流水线调度
│ │
│ ├── FileSystemAgent 磁盘结构、文件系统、删除文件、Prefetch
│ ├── RegistryAgent 注册表分析(系统/用户/网络/软件)
│ ├── CommunicationAgent 邮件、IRC 聊天记录
│ ├── FileSystemAgent 分区/文件系统、目录、删除文件、Prefetch
│ ├── HypothesisAgent 生成假设,链接已有证据
│ ├── RegistryAgent 注册表分析SYSTEM/SOFTWARE/SAM/NTUSER.DAT
│ ├── CommunicationAgent 邮件、IRC/mIRC 聊天记录
│ ├── NetworkAgent 浏览器历史、PCAP 抓包
│ ├── TimelineAgent 跨类别时间线关联
│ └── ReportAgent 综合报告生成
├── Blackboard 共享知识库Evidence + Lead
── LLMClient Claude API 调用ReAct 模式)
├── EvidenceGraph 带类型边的证据知识图(自动持久化
── AgentFactory 角色模板 + 动态 Agent 组合
├── ToolRegistry 工具目录 + 结果缓存
└── LLMClient Claude API 客户端异步、tool-use
```
Agent 之间不直接通信,通过 **Blackboard黑板** 共享发现Evidence和线索Lead
## EvidenceGraph证据知识图
## 调查流程
三类节点 + 类型化加权边:
| 节点 | 前缀 | 含义 |
|---|---|---|
| `Phenomenon` | `ph-*` | 可观测的取证产物(一条具体发现) |
| `Hypothesis` | `hyp-*` | 解释性假设(待验证的论断) |
| `Entity` | `ent-*` | 人、程序、主机、IP 等可复现的实体 |
Phenomenon → Hypothesis 的边类型与权重写死在 `HYPOTHESIS_EDGE_WEIGHTS`
# TODO
当前流程跑通以后,寻找自适应方案
| 边类型 | 权重 | 语义 |
|---|---:|---|
| `direct_evidence` | +0.25 | 现象就是假设所述行为本身 |
| `supports` | +0.15 | 与假设一致但非决定性 |
| `consequence_observed` | +0.15 | 观察到假设预期的结果 |
| `prerequisite_met` | +0.10 | 满足假设的前置条件 |
| `weakens` | −0.10 | 降低假设可能性 |
| `contradicts` | −0.20 | 直接反驳假设 |
置信度更新公式(收敛于 [0, 1]
- 正向边:`delta = weight * (1 - old_conf)`
- 负向边:`delta = weight * old_conf`
跨阈值自动转状态:≥ 0.8 → `supported`,≤ 0.2 → `refuted`,跑完仍 active → `inconclusive`。LLM 只负责挑边类型(分类任务),权重表与状态转移由代码裁决,避免数值幻觉。
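As a worked illustration of the README's update rule, the sketch below applies the weight table with `weakens` and `contradicts` taken as negative. `update_confidence` and `status_for` are hypothetical names; only `HYPOTHESIS_EDGE_WEIGHTS` is named in the text.

```python
HYPOTHESIS_EDGE_WEIGHTS = {
    "direct_evidence": 0.25,
    "supports": 0.15,
    "consequence_observed": 0.15,
    "prerequisite_met": 0.10,
    "weakens": -0.10,
    "contradicts": -0.20,
}


def update_confidence(old_conf: float, edge_type: str) -> float:
    """Positive edges close part of the gap to 1; negative edges shrink toward 0."""
    weight = HYPOTHESIS_EDGE_WEIGHTS[edge_type]
    if weight >= 0:
        delta = weight * (1.0 - old_conf)
    else:
        delta = weight * old_conf
    return old_conf + delta


def status_for(conf: float) -> str:
    """Threshold crossing as described: >= 0.8 supported, <= 0.2 refuted."""
    if conf >= 0.8:
        return "supported"
    if conf <= 0.2:
        return "refuted"
    return "active"
```

Starting from 0.5, one `direct_evidence` edge gives 0.5 + 0.25 × 0.5 = 0.625, and repeated edges approach but never exceed 1.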
新增 Phenomenon 时通过 Jaccard 相似度合并title > 0.6 且 description > 0.4 即视为重复,合并后提升置信度并追加 `corroborating_agents`),避免同一发现被重复入图。
## 五阶段流水线
| 阶段 | 说明 |
|------|------|
| **Phase 1** | FileSystemAgent 勘查磁盘镜像,识别分区、目录结构、关键文件,产出初始 Lead |
| **Phase 2** | 多轮线索追踪 — Lead 按 Agent 类型分组并行派发,最多 10 轮迭代 |
| **Phase 2.5** | 覆盖率缺口分析 — 对照 config.yaml 中的 10 个调查领域,自动补漏 |
| **Phase 3** | TimelineAgent 综合所有 evidence 建立事件时间线 |
| **Phase 4** | ReportAgent 生成 Markdown 格式取证报告 |
| **Phase 1** | FileSystemAgent 勘镜像,识别分区/文件系统/关键路径,产出首批 Phenomenon |
| **Phase 2** | 假设生成 — 优先读 `config.yaml:hypotheses`;未配置则由 HypothesisAgent 从 Phase 1 现象自动生成 3-7 个 |
| **Phase 3** | 假设驱动调查(默认 5 轮迭代)。每轮:一次性为所有 active 假设产出 leads → 按 agent 类型并发派发(信号量 = 3→ 一次性判定新现象与各假设的关系。所有假设收敛即提前退出。末尾:失败 lead 重试一次 + Gap Analysis |
| **Phase 4** | TimelineAgent `build_filesystem_timeline` 生成 MAC 时间线,与 Phenomenon 时间戳关联 |
| **Phase 5** | ReportAgent 综合假设、证据、实体,生成 Markdown 报告 |
## 取证工具链
### Investigation Areashypothesis-derived
### Sleuth Kit磁盘取证
Phase 2 末尾 orchestrator 调一次 LLM 从所有 active hypothesis 派生 5-12 个 **InvestigationArea**snake_case slug、description、suggested_agent、expected_keywords、expected_tools、priority、motivating_hypothesis_ids。Areas 存进 `graph.investigation_areas`,序列化到 `runs/<ts>/investigation_areas.json`。两个用途:
通过异步子进程调用 TSK 命令行工具:
1. **Phase 3 主循环提示** — 每个 hypothesis 块附 `Expected areas: a, b, c`LLM 仍自由选 lead 但有软引导
2. **Phase 3 末尾 Gap Analysis** — 两层判定覆盖情况:
- **关键词匹配**:扫 Phenomenon 标题/描述对照 area.expected_keywords
- **工具命中**:检查 area.expected_tools 是否实际调用过
未覆盖的 area 自动派 lead`suggested_agent` + `priority` + `motivating_hypothesis_ids[0]` 透传给 `Lead.hypothesis_id` 保留 provenance最多 3 轮补漏。
**手动 override**`config.yaml:investigation_areas` 默认注释掉,纯 LLM 派生。取消注释可添加强制必查的领域,会先于 LLM 写入并通过 slug-based dedupe 保护不被覆盖LLM 只会 augment keyword/tool 列表)。这是跨案件/跨平台适配的关键 —— 不再 hardcode Windows-specific 领域。
## Agent 体系
`AgentFactory` 维护 7 个角色模板(`ROLE_TEMPLATES`),每个模板指定默认工具集。`HypothesisAgent``ReportAgent``BaseAgent` 的子类(额外注册专用工具),其余 5 个 Agent 直接由 `BaseAgent` + 工具列表生成。
### Agent 工作流
`BaseAgent.run` 在 system prompt 中强制四阶段:
```
A. INVESTIGATE 先查图状态 / Asset Library再调取证工具
B. RECORD 每条发现写 add_phenomenon
C. LINK 按需 link_to_entity但禁止凭记忆引用 ph-id必须先 list_phenomena
D. ANSWER 以上完成后再给最终答复
```
prompt 内置**反幻觉规则**:只允许记录工具输出中逐字出现的内容;时间戳/路径/inode 必须来自工具返回;输出被截断须标 `[truncated]`
### 动态 Agent 组合
`AgentFactory.create_specialized_agent()` 应对能力缺口:将工具目录与假设描述喂给 LLM由其挑 3-8 个工具并写角色描述,工厂据此实例化新 Agent 并缓存。
## 工具系统
`tool_registry.py` 启动时调用 `register_all_tools(image_path, partition_offset, graph)`,将所有工具一次性注册到全局 `TOOL_CATALOG`
### 工具结果缓存
`CACHEABLE_TOOLS` 集合标记纯读取/确定性工具partition_info、list_directory、parse_registry_key …)。镜像只读,同 args 调用产出固定,命中缓存直接复用,错误结果不入缓存。
### Asset Library
`EvidenceGraph.asset_library` 按 inode 索引所有已提取文件,避免重复 extract。Agent 通过 `list_assets` / `find_extracted_file` 工具查询。新文件按文件名自动归类到 `registry_hive` / `chat_log` / `prefetch` / `network_capture` / `recycle_bin` 等十类之一。
### 取证工具链
**Sleuth Kit磁盘取证** — 异步子进程调用 TSK
| 工具 | 用途 |
|------|------|
@@ -49,47 +126,43 @@ Agent 之间不直接通信,通过 **Blackboard黑板** 共享发现E
| `srch_strings` | 磁盘字符串搜索 |
| `fls -m` | MAC 时间线生成 |
### regipy注册表解析
**regipy注册表解析** — 直接读 SYSTEM / SOFTWARE / SAM / NTUSER.DAT 二进制,提取系统信息、用户账户、网络配置、已安装软件、邮件账户、关机时间等。
直接解析 Windows 注册表 hive 二进制文件SYSTEM、SOFTWARE、SAM、NTUSER.DAT提取系统信息、用户账户、网络配置、已安装软件、邮件账户、关机时间等
**文件解析器** — Prefetch 二进制(`.pf`、PCAP 字符串提取HTTP 请求 / Host / Cookie / UA、通用文本与二进制读取、正则搜索、Hex dump
### 文件解析器
## 断连恢复与运行归档
- **Prefetch** — 二进制解析 Windows XP .pf 文件(运行次数、最后执行时间)
- **PCAP** — 从抓包文件提取 HTTP 请求、Host、Cookie、User-Agent
- **通用文本/二进制** — 按偏移读取、正则搜索、Hex dump
三层防护:
## 断连恢复与数据归档
1. **EvidenceGraph 自动持久化** — 每次 `add_phenomenon` / `add_hypothesis` / `add_edge` / `add_lead` 等写操作均自动落盘(原子写 `.tmp` 后 rename
2. **Agent 级容错** — 单 Agent 失败 → 该 lead 标 `failed`,连续 3 次失败触发 `AnalysisAborted` 优雅退出Phase 3 末尾对失败 lead 重试一次(`retry=True` 防无限循环)
3. **续跑**`main.py` 启动时扫 `runs/*/graph_state.json`,发现存在但缺 `run_metadata.json` 的目录即提示恢复,并按 graph 当前状态决定从哪一阶段续起
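The "write `.tmp`, then rename" persistence in item 1 can be sketched with the standard library. `atomic_write_json` is a hypothetical helper, and the `fsync` call is an extra durability step assumed here rather than something the README states.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace overwrites the target in one step, so a reader sees
        # either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Keeping the temp file in the target directory matters because a rename is only atomic within one filesystem.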
系统设计了三层防护,应对长时间运行中的网络中断:
1. **Blackboard 自动持久化** — 每次 add_evidence / add_lead 自动写盘(原子写入)
2. **Agent 级容错** — 单个 Agent 失败标记 Lead 为 failed不影响其他 Agent自动重试一次
3. **优雅退出** — 连续 3 次 Agent 失败后保存现有成果并干净退出
每次运行自动创建带时间戳的归档目录:
### 运行归档目录
```
runs/
2026-04-02T14-30-00/
config.yaml 配置快照
blackboard_state.json 实时状态(用于恢复
evidence.json 结构化证据导出
leads.json 线索及最终状态
report.md 取证报告
run_metadata.json 运行元数据(时长、统计、错误)
masforensics.log 运行日志
config.yaml 配置快照
graph_state.json 实时状态(续跑用)
phenomena.json 现象导出
hypotheses.json 假设 + 置信度日志
entities.json 实体
edges.json
leads.json 线索及最终状态
extracted/ 从镜像提取的文件
<image>_forensic_report.md 取证报告
run_metadata.json 运行元数据(时长、统计、错误)
masforensics.log 运行日志
```
中断后再次运行 `python main.py`,系统自动检测未完成的运行并提示恢复。
## 快速开始
### 环境要求
- Python >= 3.14
- The Sleuth Kit系统安装提供 `mmls``fls``icat` 等命令)
- 磁盘镜像文件置于 `image/` 目录
- 磁盘镜像文件
### 安装
@@ -99,50 +172,77 @@ uv sync
### 配置
编辑 `config.yaml`,填入 LLM API 地址和密钥
编辑 `config.yaml`
```yaml
agent:
base_url: "https://your-api-proxy.com"
api_key: "sk-your-key"
model: "claude-sonnet-4-6"
max_tokens: 4096
max_tokens: 16384
max_investigation_rounds: 5 # Phase 3 最大迭代轮数
# hypotheses: # 可选:手动指定初始假设
# - title: "嫌疑人主动实施网络嗅探"
# description: "..."
# investigation_areas: # 可选:手动 override默认全 LLM 派生)
# - area: shutdown_time # LLM 通过 slug dedupe 只 augment
# agent: registry # keyword/tool 列表,不覆盖 manual
# priority: 3
# keywords: [shutdown]
# tools: [get_shutdown_time]
```
`investigation_areas` 部分定义了必须覆盖的调查领域,可按需增减
未配置 `hypotheses` 时由 HypothesisAgent 自动生成
### 运行
```bash
python main.py
python main.py # 交互式选镜像与分区
python main.py /path/to/image/dir # 指定镜像目录
```
报告和所有结构化数据将保存在 `runs/<timestamp>/` 目录下
中断后再次运行会自动检测未完成的 run 并提示是否续跑
### 仅重生成报告
跑完一次后若只想换提示词或修复报告:
```bash
python regenerate_report.py runs/<timestamp>
```
跳过 Phase 1-4直接从已有 `graph_state.json` 重跑 ReportAgent。
## 项目结构
```
MASForensics/
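├── main.py entry point
├── orchestrator.py pipeline scheduling
├── blackboard.py shared knowledge base
├── llm_client.py LLM API client
├── base_agent.py Agent base class
├── config.yaml configuration file
├── main.py entry point, image selection, resume after interruption
├── orchestrator.py five-phase pipeline scheduling
├── evidence_graph.py evidence knowledge graph + edge weight table + persistence
├── base_agent.py Agent base class + built-in graph tools
├── agent_factory.py role templates + dynamic Agent composition
├── tool_registry.py tool catalog + result cache + automatic categorization
├── llm_client.py LLM API client
├── log_config.py colored terminal logging + file logging
├── regenerate_report.py regenerates the report from an existing graph_state
├── config.yaml configuration + investigation areas + optional hypotheses
├── agents/
│ ├── filesystem.py filesystem Agent
│ ├── registry.py registry Agent
│ ├── communication.py communication Agent
│ ├── network.py network Agent
│ ├── timeline.py timeline Agent
│ └── report.py report Agent
│ ├── hypothesis.py HypothesisAgent (add_hypothesis, link)
│ ├── report.py ReportAgent (synthesized report, with its own read tools)
│ ├── timeline.py TimelineAgent (kept for future extension)
│ └── ... filesystem/registry/communication/network (same as above)
├── tools/
│ ├── sleuthkit.py Sleuth Kit wrapper
│ ├── registry.py registry parsing (regipy)
│ └── parsers.py file format parsing
├── image/ disk images
├── extracted/ extracted files (generated at runtime)
└── runs/ run archives
│ ├── sleuthkit.py async TSK wrapper
│ ├── registry.py regipy parsing
│ └── parsers.py Prefetch / PCAP / generic file parsing
├── image/ disk images (supplied by the user)
├── runs/ run archives
└── tests/
└── test_optimizations.py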
```
## Dependencies
@@ -152,14 +252,16 @@ MASForensics/
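| `httpx[socks]` | Async HTTP client (with SOCKS proxy support) |
| `pyyaml` | Configuration file parsing |
| `regipy` | Windows registry hive parsing |
| `pytest` / `pytest-asyncio` | Testing |
## Current case
## Default case
The default configuration analyzes the **CFReDS Hacking Case** (the NIST standard forensic teaching image):
**CFReDS Hacking Case** (the NIST standard forensic teaching image):
- Image: SCHARDT.001 (~4.6GB, IBM hard disk, 8 segments)
- Image: SCHARDT.001 (~4.6 GB, IBM hard disk, 8 segments)
- System: Windows XP
- Scenario: forensic analysis of a computer suspected of being used for hacking
- Full-image MD5: `AEE4FCD9301C03B3B054623CA261959A` (`config.yaml` contains the per-segment MD5 values for verification)
## Testing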


@@ -1,150 +1,99 @@
"""Agent Factory — composes agents from tool registry and role templates.
"""Agent Factory — instantiates agents from registered classes.
Provides both pre-defined agent templates (filesystem, registry, etc.)
and LLM-driven dynamic agent composition for capability gaps.
Each agent type has a dedicated subclass under agents/ that owns its name,
role description, and tool list (single source of truth). The factory just
maps agent_type → class. Also supports LLM-driven dynamic composition for
capability gaps via create_specialized_agent().
"""
from __future__ import annotations
import json
import logging
from dataclasses import dataclass, field
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG, ToolDefinition
from tool_registry import TOOL_CATALOG
# Agent classes with custom tools — keyed by template name
_AGENT_CLASSES: dict[str, type] = {}
# Agent classes keyed by name. Populated lazily to avoid circular imports.
_AGENT_CLASSES: dict[str, type[BaseAgent]] = {}
def _load_agent_classes() -> None:
"""Lazy-import custom agent classes to avoid circular imports."""
"""Lazy-import agent classes to avoid circular imports."""
if _AGENT_CLASSES:
return
from agents.android_artifact import AndroidArtifactAgent
from agents.communication import CommunicationAgent
from agents.filesystem import FileSystemAgent
from agents.hypothesis import HypothesisAgent
from agents.ios_artifact import IOSArtifactAgent
from agents.media import MediaAgent
from agents.network import NetworkAgent
from agents.registry import RegistryAgent
from agents.report import ReportAgent
from agents.strategist import InvestigationStrategist
from agents.timeline import TimelineAgent
_AGENT_CLASSES["filesystem"] = FileSystemAgent
_AGENT_CLASSES["registry"] = RegistryAgent
_AGENT_CLASSES["communication"] = CommunicationAgent
_AGENT_CLASSES["network"] = NetworkAgent
_AGENT_CLASSES["timeline"] = TimelineAgent
_AGENT_CLASSES["hypothesis"] = HypothesisAgent
_AGENT_CLASSES["report"] = ReportAgent
_AGENT_CLASSES["ios_artifact"] = IOSArtifactAgent
_AGENT_CLASSES["android_artifact"] = AndroidArtifactAgent
_AGENT_CLASSES["media"] = MediaAgent
_AGENT_CLASSES["strategist"] = InvestigationStrategist
# Triage agent per (source.type, platform). disk_image is ambiguous on its
# own — both a Windows USB image and an Android raw dump are disk_image —
# so the routing helper also looks at source.meta.platform when present.
SOURCE_TYPE_AGENTS: dict[str, str] = {
    "disk_image": "filesystem",  # default for unknown platform
    "mobile_extraction": "ios_artifact",
    "archive": "filesystem",
    "media_collection": "media",
}

# Per-platform overrides for disk_image sources. Keys come from
# source.meta.platform in case.yaml (lowercased).
_DISK_IMAGE_PLATFORM_AGENTS: dict[str, str] = {
    "windows": "filesystem",
    "linux": "filesystem",
    "android": "android_artifact",
    "ios": "ios_artifact",
}


def get_triage_agent_type(source) -> str:
    """Pick the right Phase-1 agent for *source*.

    Accepts either an :class:`EvidenceSource` or a raw source.type string
    (for back-compat with the S5 signature). Disk-image sources additionally
    consult ``source.meta.platform`` so Windows USBs and Android raw dumps
    (both type=disk_image) get different agents.
    """
    # Back-compat: accept a plain type string.
    if isinstance(source, str):
        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
    src_type = getattr(source, "type", "disk_image")
    if src_type == "disk_image":
        meta = getattr(source, "meta", {}) or {}
        platform = str(meta.get("platform", "")).lower()
        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")
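A short usage sketch of the routing above. The two tables and the function body are repeated from the source so the block runs on its own; the `Source` dataclass is a hypothetical stand-in for `EvidenceSource`, whose real fields are not shown in this diff.

```python
from dataclasses import dataclass, field

SOURCE_TYPE_AGENTS = {
    "disk_image": "filesystem",
    "mobile_extraction": "ios_artifact",
    "archive": "filesystem",
    "media_collection": "media",
}
_DISK_IMAGE_PLATFORM_AGENTS = {
    "windows": "filesystem",
    "linux": "filesystem",
    "android": "android_artifact",
    "ios": "ios_artifact",
}


@dataclass
class Source:
    """Hypothetical stand-in for EvidenceSource (only type and meta are used)."""
    type: str
    meta: dict = field(default_factory=dict)


def get_triage_agent_type(source) -> str:
    if isinstance(source, str):  # back-compat: plain type string
        return SOURCE_TYPE_AGENTS.get(source, "filesystem")
    src_type = getattr(source, "type", "disk_image")
    if src_type == "disk_image":
        meta = getattr(source, "meta", {}) or {}
        platform = str(meta.get("platform", "")).lower()
        if platform in _DISK_IMAGE_PLATFORM_AGENTS:
            return _DISK_IMAGE_PLATFORM_AGENTS[platform]
    return SOURCE_TYPE_AGENTS.get(src_type, "filesystem")


android_dump = get_triage_agent_type(Source("disk_image", {"platform": "Android"}))
plain_disk = get_triage_agent_type(Source("disk_image"))
phone_tree = get_triage_agent_type(Source("mobile_extraction"))
legacy_call = get_triage_agent_type("media_collection")
unknown_type = get_triage_agent_type("something_else")
```

The platform lookup is case-insensitive, and anything unrecognised falls back to the filesystem agent.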
logger = logging.getLogger(__name__)
@dataclass
class RoleTemplate:
"""Pre-defined agent archetype."""
name: str
role: str
default_tools: list[str] # tool names from TOOL_CATALOG
tags: list[str] = field(default_factory=list)
# Pre-defined templates matching the original 6 agents + hypothesis agent.
ROLE_TEMPLATES: dict[str, RoleTemplate] = {
"filesystem": RoleTemplate(
name="filesystem",
role=(
"File system forensic analyst. You examine disk image partition layouts, "
"directory structures, file metadata, and recover deleted files. "
"You identify suspicious files, installed programs, and user data locations. "
"You also handle Recycle Bin forensics and Prefetch execution evidence."
),
default_tools=[
"partition_info", "filesystem_info", "list_directory",
"extract_file", "find_file", "search_strings",
"parse_prefetch", "count_deleted_files",
"read_text_file", "search_text_file", "read_binary_preview",
],
tags=["filesystem", "disk", "files", "deleted", "prefetch"],
),
"registry": RoleTemplate(
name="registry",
role=(
"Windows registry forensic analyst. You parse registry hive files "
"(SYSTEM, SOFTWARE, SAM, NTUSER.DAT) to extract system configuration, "
"user accounts, installed software, network settings, email accounts, "
"and other Windows artifacts."
),
default_tools=[
"extract_file", "list_directory",
"parse_registry_key", "list_installed_software",
"get_user_activity", "search_registry",
"get_system_info", "get_timezone_info", "get_computer_name",
"get_shutdown_time", "enumerate_users",
"get_network_interfaces", "get_email_config",
],
tags=["registry", "windows", "system", "user", "software"],
),
"communication": RoleTemplate(
name="communication",
role=(
"Communication forensic analyst. You analyze email files (.dbx, .pst), "
"IRC/mIRC chat logs, newsgroup data, and other messaging artifacts "
"to identify communication patterns and contacts."
),
default_tools=[
"list_directory", "extract_file",
"read_text_file", "read_binary_preview",
"list_extracted_dir", "search_strings",
"search_text_file", "read_text_file_section",
],
tags=["email", "chat", "irc", "messaging", "communication"],
),
"network": RoleTemplate(
name="network",
role=(
"Network forensic analyst. You analyze browser history, cookies, "
"network captures (PCAP), wireless artifacts, and other network-related "
"evidence to reconstruct online activities."
),
default_tools=[
"list_directory", "extract_file",
"read_text_file", "read_binary_preview",
"list_extracted_dir", "search_strings",
"search_text_file", "read_text_file_section",
"parse_pcap_strings",
],
tags=["network", "browser", "pcap", "http", "internet"],
),
"timeline": RoleTemplate(
name="timeline",
role=(
"Timeline correlation analyst. You build chronological timelines "
"by combining filesystem MAC times with evidence from other agents. "
"You identify temporal patterns and correlate events across categories."
),
default_tools=[
"build_filesystem_timeline",
],
tags=["timeline", "correlation", "temporal"],
),
"report": RoleTemplate(
name="report",
role=(
"Forensic report writer. You synthesize all evidence and hypotheses "
"into a comprehensive forensic analysis report with executive summary, "
"detailed findings organized by hypothesis, timeline of events, and conclusions."
),
default_tools=[], # Report agent uses only graph query tools
tags=["report", "summary", "writing"],
),
"hypothesis": RoleTemplate(
name="hypothesis",
role=(
"Hypothesis analyst. You review all phenomena discovered so far "
"and formulate investigative hypotheses about what happened on the system. "
"For each hypothesis, identify which existing phenomena support or contradict it."
),
default_tools=[], # Uses only graph query + hypothesis tools
tags=["hypothesis", "analysis", "reasoning"],
),
}
class AgentFactory:
"""Creates agents from templates or dynamically via LLM composition."""
"""Creates agents from registered classes or dynamically via LLM composition."""
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
self.llm = llm
@@ -152,40 +101,20 @@ class AgentFactory:
self._cache: dict[str, BaseAgent] = {}
def get_or_create_agent(self, agent_type: str) -> BaseAgent | None:
"""Get a cached agent or create one from a template."""
"""Get a cached agent or instantiate one from its registered class."""
if agent_type in self._cache:
return self._cache[agent_type]
template = ROLE_TEMPLATES.get(agent_type)
if template is None:
logger.warning("No template for agent type: %s", agent_type)
return None
# Use custom agent class if one exists, otherwise BaseAgent
_load_agent_classes()
agent_cls = _AGENT_CLASSES.get(agent_type)
if agent_cls is not None:
agent = agent_cls(self.llm, self.graph)
else:
agent = self._instantiate_from_template(template)
if agent_cls is None:
logger.warning("No agent class for type: %s", agent_type)
return None
agent = agent_cls(self.llm, self.graph)
self._cache[agent_type] = agent
return agent
def _instantiate_from_template(self, template: RoleTemplate) -> BaseAgent:
"""Create a BaseAgent from a role template, registering tools from the catalog."""
agent = BaseAgent(self.llm, self.graph)
agent.name = template.name
agent.role = template.role
for tool_name in template.default_tools:
td = TOOL_CATALOG.get(tool_name)
if td is None:
logger.warning("Tool '%s' not in catalog (template: %s)", tool_name, template.name)
continue
agent.register_tool(td.name, td.description, td.input_schema, td.executor)
return agent
async def create_specialized_agent(
self,
hypothesis_title: str,
@@ -220,18 +149,15 @@ class AgentFactory:
messages=[{"role": "user", "content": prompt}],
)
# Parse response — try to extract JSON
try:
config = json.loads(response)
except json.JSONDecodeError:
# Try to find JSON in the response
import re
match = re.search(r'\{.*\}', response, re.DOTALL)
if match:
config = json.loads(match.group())
else:
logger.error("Failed to parse agent composition response: %s", response[:300])
# Fallback: create a generic agent with all tools
return self._create_fallback_agent(capability_gap)
agent_name = config.get("agent_name", "specialized")
@@ -239,13 +165,11 @@ class AgentFactory:
strategy = config.get("strategy", "")
tool_names = config.get("tools", [])
# Validate tool names against catalog
valid_tools = [t for t in tool_names if t in TOOL_CATALOG]
if not valid_tools:
logger.warning("No valid tools selected by LLM, using fallback")
return self._create_fallback_agent(capability_gap)
# Build agent
agent = BaseAgent(self.llm, self.graph)
agent.name = agent_name
agent.role = f"{role_text}\n\nInvestigation Strategy:\n{strategy}"
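The parse-then-fallback handling of the LLM's composition response can be isolated as a small helper. This is a sketch under assumptions: the name `extract_json_object` is hypothetical, and unlike the hunk above it also guards the second `json.loads` call and rejects non-object results, which the original code does not show.

```python
import json
import re
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object found in text, or None if there is none."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the widest {...} span, as the factory code does.
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            return None
        try:
            parsed = json.loads(match.group())
        except json.JSONDecodeError:
            return None
    return parsed if isinstance(parsed, dict) else None


clean = extract_json_object('{"agent_name": "usb_analyst"}')
wrapped = extract_json_object('Here is the config: {"tools": ["find_file"]} Done.')
missing = extract_json_object("no json here")
broken = extract_json_object("{not valid} }")
```

A caller would treat `None` as the signal to build the fallback agent.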


@@ -0,0 +1,58 @@
"""Android Artifact Agent — multi-partition analysis of raw Android dumps.
DESIGN.md §4.7 (Android): ``mmls`` slices the dump into partitions; each one is
its own analysable surface. Ext4-backed partitions (typically SYSTEM,
USERDATA when not FBE-encrypted, EFS in some variants) yield to TSK; raw
partitions (BOOT, RECOVERY, RADIO, MODEM blobs) are best mined with
``search_strings``. Userdata is the prize and is often FBE-encrypted on
modern devices — the agent must check fsstat before assuming readability
(see ``probe_android_partitions`` for the survey).
"""
from __future__ import annotations
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
class AndroidArtifactAgent(BaseAgent):
    name = "android_artifact"
    role = (
        "Android forensic analyst. You navigate raw Android disk dumps "
        "(blk0_sda-style images) partition by partition. Workflow: call "
        "probe_android_partitions ONCE to map the disk; pick the partitions "
        "with fs_type=Ext4 or fs_type=F2FS (SYSTEM, USERDATA if readable, "
        "EFS); for each, call set_active_partition(offset_from_512_sector_column) "
        "and then list_directory / extract_file / search_strings as usual. "
        "For raw partitions (BOOT, RECOVERY, RADIO, TOMBSTONES) skip directly "
        "to search_strings; they have no filesystem. If USERDATA shows "
        "fs_type=unknown it is almost certainly FBE-encrypted: record that "
        "as a negative finding (the absence IS evidence) and move on to "
        "what's reachable."
    )

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

    def _register_tools(self) -> None:
        tool_names = [
            # Android-specific
            "probe_android_partitions",
            "set_active_partition",
            # Reused TSK toolset: partition_offset comes from active_source
            "partition_info", "filesystem_info", "list_directory",
            "extract_file", "find_file", "search_strings",
            "count_deleted_files", "build_filesystem_timeline",
            # Generic parsers
            "read_text_file", "read_binary_preview", "search_text_file",
            "read_text_file_section", "list_extracted_dir", "find_files",
            # SQLite: Android apps store data in sqlite too (WhatsApp, etc.)
            "sqlite_tables", "sqlite_query",
        ]
        for name in tool_names:
            td = TOOL_CATALOG.get(name)
            if td:
                self.register_tool(td.name, td.description, td.input_schema, td.executor)
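The per-partition decision the role text describes can be written as a small pure function. This is an illustrative sketch only: the `Partition` record and the action labels are hypothetical, since the real output format of `probe_android_partitions` is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical row from a partition survey (not the real tool output)."""
    name: str
    fs_type: str
    start_sector: int


def plan_partition(p: Partition) -> str:
    """Map one partition to the next step suggested by the role text."""
    fs = p.fs_type.lower()
    if fs in {"ext4", "f2fs"}:
        return "filesystem_tools"          # set_active_partition, then TSK tools
    if p.name.upper() == "USERDATA" and fs == "unknown":
        return "record_negative_finding"   # likely FBE-encrypted
    return "search_strings"                # raw blob, no filesystem


survey = [
    Partition("SYSTEM", "Ext4", 8192),
    Partition("BOOT", "unknown", 2048),
    Partition("USERDATA", "unknown", 65536),
]
plan = {p.name: plan_partition(p) for p in survey}
```

Keeping the decision separate from tool execution makes the encrypted-USERDATA case easy to test.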


@@ -1,12 +1,17 @@
"""Hypothesis Agent — analyzes phenomena and generates investigative hypotheses."""
"""Hypothesis Agent — generates investigative hypotheses from phenomena.
Generates hypotheses only. Phenomenon→Hypothesis linking is handled centrally
by Orchestrator._judge_new_phenomena. Tool set is restricted to read-only
graph queries + add_hypothesis to prevent the agent from creating phenomena,
leads, or entity links.
"""
from __future__ import annotations
import json
import logging
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph, HYPOTHESIS_EDGE_WEIGHTS
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
logger = logging.getLogger(__name__)
@@ -17,19 +22,19 @@ class HypothesisAgent(BaseAgent):
role = (
"Hypothesis analyst. You review all phenomena discovered so far "
"and formulate investigative hypotheses about what happened on this system. "
"Your ultimate goal: build the most complete picture of events that occurred. "
"For each hypothesis, identify which existing phenomena support or contradict it."
"Your ultimate goal: build the most complete picture of events that occurred."
)
mandatory_record_tools = ("add_hypothesis",)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_hypothesis_tools()
def _register_graph_tools(self) -> None:
"""Restrict to read-only graph tools. add_hypothesis is registered separately."""
self._register_graph_read_tools()
def _register_hypothesis_tools(self) -> None:
"""Register hypothesis-specific tools."""
valid_edge_types = list(HYPOTHESIS_EDGE_WEIGHTS.keys())
self.register_tool(
name="add_hypothesis",
description=(
@@ -53,42 +58,30 @@ class HypothesisAgent(BaseAgent):
executor=self._add_hypothesis,
)
self.register_tool(
name="link_phenomenon_to_hypothesis",
description=(
"Link an existing phenomenon to a hypothesis with a relationship type. "
f"Valid relationship types: {', '.join(valid_edge_types)}. "
"direct_evidence = the phenomenon IS the hypothesis. "
"supports = consistent with the hypothesis. "
"prerequisite_met = a necessary condition is satisfied. "
"consequence_observed = an expected result of the hypothesis is found. "
"contradicts = directly contradicts the hypothesis. "
"weakens = makes the hypothesis less likely."
),
input_schema={
"type": "object",
"properties": {
"phenomenon_id": {
"type": "string",
"description": "ID of the phenomenon (e.g. 'ph-a1b2c3d4').",
},
"hypothesis_id": {
"type": "string",
"description": "ID of the hypothesis (e.g. 'hyp-e5f6g7h8').",
},
"edge_type": {
"type": "string",
"enum": valid_edge_types,
"description": "The edge_type of the relationship.",
},
"reason": {
"type": "string",
"description": "The reason this relationship holds (1-2 sentences).",
},
},
"required": ["phenomenon_id", "hypothesis_id", "edge_type", "reason"],
},
executor=self._link_phenomenon_to_hypothesis,
def _build_system_prompt(self, task: str) -> str:
"""Focused prompt — no INVESTIGATE/RECORD/LINK workflow."""
return (
f"You are {self.name}, a forensic hypothesis analyst.\n"
f"Role: {self.role}\n\n"
f"Image: {self.graph.image_path}\n"
f"Current investigation state: {self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call list_phenomena and search_graph to review existing findings.\n"
f"2. For each hypothesis you want to record, call add_hypothesis (title + description).\n"
f"3. STOP after you have generated 3-7 hypotheses. Do not call any more tools.\n\n"
f"STRICT BOUNDARIES:\n"
f"- Your only mutation tool is add_hypothesis. Do NOT attempt list_directory, "
f"parse_registry_key, extract_file, or any disk-image investigation tools — "
f"they are not yours and you will get 'unknown tool' errors.\n"
f"- You CANNOT create phenomena, leads, or entity links. The orchestrator handles "
f"all phenomenon↔hypothesis linking after you finish.\n"
f"- Each hypothesis must be specific and testable. Avoid generic templates like "
f"'Unauthorized Remote Access' or 'Malware Deployment' unless concrete phenomena "
f"in the graph already point to them.\n"
f"- If the graph is empty, generate broad starting hypotheses and mark them "
f"clearly as exploratory in their description so downstream agents know they "
f"still need evidence."
)
async def _add_hypothesis(self, title: str, description: str) -> str:
@@ -98,33 +91,3 @@ class HypothesisAgent(BaseAgent):
created_by=self.name,
)
return f"Hypothesis created: {hid}{title} (confidence: 0.50)"
async def _link_phenomenon_to_hypothesis(
self,
phenomenon_id: str,
hypothesis_id: str,
edge_type: str = "",
reason: str = "",
# Common LLM misnaming — accept as fallbacks
relationship: str = "",
note: str = "",
) -> str:
edge_type = edge_type or relationship
reason = reason or note
if not edge_type:
return "Error: edge_type is required."
try:
new_conf = await self.graph.update_hypothesis_confidence(
hyp_id=hypothesis_id,
phenomenon_id=phenomenon_id,
edge_type=edge_type,
reason=reason,
)
weight = HYPOTHESIS_EDGE_WEIGHTS[edge_type]
direction = "+" if weight > 0 else ""
return (
f"Linked: {phenomenon_id} —[{edge_type}]→ {hypothesis_id} "
f"(weight: {direction}{weight}, new confidence: {new_conf:.3f})"
)
except ValueError as e:
return f"Error linking: {e}"

agents/ios_artifact.py Normal file

@@ -0,0 +1,49 @@
"""iOS Artifact Agent — analyses unpacked iOS extractions.
DESIGN.md §4.7/§4.8: tree-mode iOS sources are the third evidence family
the system handles (alongside disk images and pcaps). This agent owns the
iOS-specific toolset; the grounded ``add_phenomenon`` contract from
BaseAgent applies unchanged — every fact must cite a tool invocation.
"""
from __future__ import annotations
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
class IOSArtifactAgent(BaseAgent):
    name = "ios_artifact"
    role = (
        "iOS forensic analyst. You analyse unpacked iOS extractions: "
        "binary/XML plists, SQLite databases (sms.db, ChatStorage.sqlite, "
        "AddressBook.sqlitedb), the keychain (keychain-2.db), and the "
        "iDevice_info.txt summary, to extract device identity, accounts, "
        "messaging, contacts, and credential metadata. Domain-rooted iOS "
        "trees (HomeDomain, AppDomain*, ProtectedDomain, NetworkDomain) "
        "are your map; navigate by path, not by inode."
    )

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

    def _register_tools(self) -> None:
        tool_names = [
            # navigation: find_files is the workhorse on 10k+-file iOS trees;
            # list_extracted_dir is for initial layout summary only.
            "list_extracted_dir", "find_files",
            "read_text_file", "read_text_file_section", "read_binary_preview",
            "search_text_file",
            # iOS-specific parsers
            "parse_plist",
            "sqlite_tables", "sqlite_query",
            "parse_ios_keychain",
            "read_idevice_info",
        ]
        for name in tool_names:
            td = TOOL_CATALOG.get(name)
            if td:
                self.register_tool(td.name, td.description, td.input_schema, td.executor)
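For orientation, a table-listing helper in the spirit of `sqlite_tables` might look like the sketch below. Only the tool name comes from the source; the signature, the read-only URI mode, and the return shape are assumptions made for this example.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def sqlite_tables(db_path: Path) -> list[str]:
    """List table names, opening the database read-only so evidence is not modified."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "sms.db"
    # Build a tiny stand-in database; real iOS schemas are far larger.
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT)")
        conn.execute("CREATE TABLE handle (id INTEGER PRIMARY KEY, address TEXT)")
        conn.commit()
    tables = sqlite_tables(db)
```

Opening with `mode=ro` is a deliberate choice for forensic work: an accidental write raises an error instead of altering the extraction.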

agents/media.py Normal file

@@ -0,0 +1,52 @@
"""Media Agent — OCR-based analysis of screenshot/photo evidence.
DESIGN.md §4.7: the LLM backend has no vision capability, so JPEG/PNG
evidence must go through tesseract first. The agent runs OCR, then
records extracted strings — especially identifiers (wallet addresses,
phone numbers, usernames) — via the grounded observe_identity gateway so
they participate in cross-source coref the same way iOS keychain entries
or Windows account names do.
If the OCR runtime is missing on the host, ocr_image returns an explicit
install hint; the agent should record that as a negative finding ("no
text extracted — tesseract not installed") rather than guessing.
"""
from __future__ import annotations
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
class MediaAgent(BaseAgent):
    name = "media"
    role = (
        "Media / OCR forensic analyst. You analyse screenshots, photos, and "
        "scanned documents: any pixel-based evidence the LLM cannot read "
        "directly. Workflow: list_extracted_dir to enumerate images, "
        "ocr_image on each promising one, then add_phenomenon (with the "
        "OCR'd text as the verified_fact value) and observe_identity for "
        "any wallet addresses, phone numbers, email addresses, or "
        "usernames the text contains. If OCR fails because tesseract is "
        "missing, RECORD that as a negative finding instead of fabricating "
        "image content; the absence is a real fact about this run."
    )

    def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
        super().__init__(llm, graph)
        self._register_tools()

    def _register_tools(self) -> None:
        tool_names = [
            "ocr_image",
            "list_extracted_dir", "find_files",
            "read_binary_preview",
            "read_text_file",
            "search_text_file",
        ]
        for name in tool_names:
            td = TOOL_CATALOG.get(name)
            if td:
                self.register_tool(td.name, td.description, td.input_schema, td.executor)
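The "explicit install hint instead of guessing" behaviour can be sketched as a preflight check. The function name `ocr_preflight`, the returned dictionary shape, and the hint wording are hypothetical; the real `ocr_image` tool is not shown in this diff.

```python
import shutil
from typing import Callable, Optional


def ocr_preflight(which: Callable[[str], Optional[str]] = shutil.which) -> dict:
    """Report whether the tesseract binary is available before attempting OCR."""
    binary = which("tesseract")
    if binary is None:
        return {
            "ok": False,
            "finding": "no text extracted: tesseract not installed",
            "hint": "install the tesseract binary and re-run",
        }
    return {"ok": True, "binary": binary}


# The lookup is injectable so both branches can be exercised on any host.
missing = ocr_preflight(lambda _name: None)
present = ocr_preflight(lambda _name: "/usr/bin/tesseract")
```

An agent receiving the `ok: False` result would record the `finding` text as a negative finding rather than inventing image content.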


@@ -2,9 +2,6 @@
from __future__ import annotations
import json
import os
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
@@ -15,34 +12,60 @@ class ReportAgent(BaseAgent):
role = (
"Forensic report writer. You synthesize all findings from the investigation "
"into a structured, professional forensic analysis report organized by hypotheses.\n\n"
"IMPORTANT: Only include findings that have a source_tool attribution (marked VERIFIED). "
"If evidence lacks source attribution, mark it as UNVERIFIED. "
"Do NOT invent or fabricate any data, timestamps, or findings not present in the evidence.\n\n"
"CRITICAL: You MUST call save_report to write the final report."
"Phenomena are marked GROUNDED (verified_facts cite a real tool invocation), "
"TOOL-ONLY (source_tool set but no facts), or UNVERIFIED (neither). When "
"writing the report, render verified_facts as primary evidence with their "
"invocation citations, and render interpretation as 'agent analysis' so the "
"reader can tell ground truth from inference. Do NOT invent or fabricate any "
"data, timestamps, or findings not present in the evidence.\n\n"
"This is a cross-source case: phenomena come from multiple evidence "
"sources, and entities discovered on different sources may refer to the "
"same real-world actor. ALWAYS include:\n"
" - 'Findings by Source' section sourced from get_phenomena_by_source\n"
" - 'Actor Clusters' section sourced from get_actor_clusters (the "
"cross-source attribution view — multi-source clusters answer "
"'which findings on different devices belong to the same person')\n"
" - 'Hypothesis × Evidence Matrix' from get_hypothesis_evidence_matrix"
)
# Calling save_report is BOTH the recording action and the completion
# signal. tool_call_loop returns the moment save_report executes; the
# tool's return value becomes the agent's final_text. The forced-retry
# mechanism fires if save_report is never called.
mandatory_record_tools = ("save_report",)
terminal_tools = ("save_report",)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_graph_tools(self) -> None:
"""Restrict to read-only graph tools. Report agent does not mutate state."""
self._register_graph_read_tools()
def _build_system_prompt(self, task: str) -> str:
"""Report agent gets a clean prompt — no Phase A/B/C/D workflow."""
return (
f"You are a forensic report writer.\n"
f"Role: {self.role}\n\n"
f"Investigation state:\n{self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call get_hypotheses_with_evidence to get all hypotheses and their linked evidence\n"
f"2. Call get_all_phenomena to get detailed findings by category\n"
f"3. Call get_entities to get people, programs, and hosts\n"
f"4. Call get_case_info for case metadata\n"
f"5. Write the complete report directly in your <answer> block\n\n"
f"1. Call get_hypotheses_with_evidence, get_all_phenomena, get_entities,\n"
f" get_case_info, get_hypothesis_evidence_matrix, get_actor_clusters,\n"
f" and get_phenomena_by_source in parallel; these are the seven data\n"
f" sources you assemble the report from.\n"
f"2. Assemble the complete markdown forensic report. Cross-source\n"
f" actor clusters and per-source breakdown are MANDATORY sections.\n"
f"3. Call save_report(content=<full markdown>, output_path=\"report.md\").\n"
f" This single call is the completion signal — the run ENDS the moment it executes.\n"
f" Do NOT call any read tools after this point; they will not run.\n"
f" Do NOT write the report as free text outside of save_report; only the\n"
f" `content` argument of save_report is persisted.\n\n"
f"RULES:\n"
f"- Write the report DIRECTLY in <answer> — do NOT use save_report tool\n"
f"- Only include findings present in the evidence graph\n"
f"- Do NOT invent timestamps, file paths, or data not in the phenomena\n"
f"- The report must be complete — do not cut off mid-section\n"
f"- The report must be the complete markdown — do not cut off mid-section.\n"
f"- Only include findings present in the evidence graph.\n"
f"- Do NOT invent timestamps, file paths, or data not in the phenomena.\n"
f"- The `content` argument can be 10K+ chars. JSON-escape inner quotes (\\\") and\n"
f" backslashes (\\\\) and newlines (\\n) correctly.\n"
)
def _register_tools(self) -> None:
@@ -74,6 +97,45 @@ class ReportAgent(BaseAgent):
executor=self._get_entities,
)
self.register_tool(
name="get_hypothesis_evidence_matrix",
description=(
"Render the hypothesis × evidence pivot as a markdown table. "
"Columns: per edge_type counts, log_odds, confidence, status. "
"Embed this directly in the report to show how each hypothesis "
"stands relative to the others on a single screen."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_hypothesis_evidence_matrix,
)
self.register_tool(
name="get_actor_clusters",
description=(
"Render the cross-source actor clusters: each cluster is the "
"set of Entity nodes the system currently treats as the same "
"actor (via active same_as edges backed by coref hypotheses "
"≥ 0.8). Includes the aggregated identifier evidence per "
"cluster. Use this in the report's 'Entities / Actors' "
"section so readers see who-is-who across devices, not just "
"raw entity rows."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_actor_clusters,
)
self.register_tool(
name="get_phenomena_by_source",
description=(
"Group every phenomenon by its originating evidence source "
"(source_id). Use this to drive the report's 'Findings by "
"Source' section so each evidence item's per-device "
"contribution is auditable."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_phenomena_by_source,
)
self.register_tool(
name="save_report",
description="Save the final report to a file.",
@@ -106,12 +168,24 @@ class ReportAgent(BaseAgent):
items = [ph for ph in phenomena.values() if ph.category == cat]
lines.append(f"\n--- {cat.upper()} ({len(items)} entries) ---")
for ph in items:
verified = "VERIFIED" if ph.source_tool else "UNVERIFIED"
lines.append(f"\n[{verified}] {ph.title} ({ph.id})")
# Grounded = at least one verified fact AND a source_tool.
grounded = bool(ph.verified_facts) and bool(ph.source_tool)
marker = "GROUNDED" if grounded else (
"TOOL-ONLY" if ph.source_tool else "UNVERIFIED"
)
lines.append(f"\n[{marker}] {ph.title} ({ph.id})")
lines.append(f" Source: {ph.source_agent} | Tool: {ph.source_tool or 'N/A'}")
if ph.timestamp:
lines.append(f" Timestamp: {ph.timestamp}")
lines.append(f" {ph.description[:500]}")
if ph.verified_facts:
lines.append(f" Verified facts ({len(ph.verified_facts)}):")
for f in ph.verified_facts:
lines.append(
f" - [{f.get('type','?')}] {str(f.get('value',''))[:200]} "
f"(cite: {f.get('invocation_id','?')})"
)
if ph.interpretation:
lines.append(f" Analysis: {ph.interpretation[:500]}")
return "\n".join(lines)
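The three-way marker used above can be expressed as a standalone function, which makes the edge case explicit: facts without a `source_tool` still count as UNVERIFIED. The function name `evidence_marker` is hypothetical; the branching mirrors the hunk.

```python
from typing import Optional


def evidence_marker(verified_facts: list, source_tool: Optional[str]) -> str:
    """Classify a phenomenon the same way the report listing does."""
    if verified_facts and source_tool:
        return "GROUNDED"      # at least one verified fact AND a source tool
    if source_tool:
        return "TOOL-ONLY"     # tool attribution but no verified facts
    return "UNVERIFIED"        # no tool attribution at all


fact = {"type": "path", "value": "/etc/hosts", "invocation_id": "inv-1"}
markers = [
    evidence_marker([fact], "list_directory"),
    evidence_marker([], "list_directory"),
    evidence_marker([], None),
    evidence_marker([fact], None),
]
```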
async def _get_hypotheses_with_evidence(self) -> str:
@@ -141,12 +215,87 @@ class ReportAgent(BaseAgent):
return "\n".join(lines)
async def _get_case_info(self) -> str:
info = self.graph.case_info
lines = ["=== Case Information ==="]
for k, v in info.items():
lines.append(f" {k}: {v}")
lines.append(f" Image path: {self.graph.image_path}")
lines.append(f" Partition offset: {self.graph.partition_offset}")
case = self.graph.case
if case is not None:
lines.append(f" case_id: {case.case_id}")
lines.append(f" name: {case.name}")
for k, v in (case.meta or {}).items():
lines.append(f" {k}: {v}")
lines.append(f" sources: {len(case.sources)}")
for s in case.sources:
owner = f", owner={s.owner}" if s.owner else ""
platform = s.meta.get("platform") if s.meta else None
plat = f", platform={platform}" if platform else ""
lines.append(
f" - {s.id}: {s.label} "
f"(type={s.type}, mode={s.access_mode}{plat}{owner})"
)
else:
# Legacy single-image fallback — surface whatever case_info dict
# was passed in (e.g. the old CFReDS MD5 block).
for k, v in (self.graph.case_info or {}).items():
lines.append(f" {k}: {v}")
lines.append(f" Image path: {self.graph.image_path}")
lines.append(f" Partition offset: {self.graph.partition_offset}")
return "\n".join(lines)
async def _get_hypothesis_evidence_matrix(self) -> str:
return self.graph.hypothesis_evidence_matrix_markdown()
async def _get_actor_clusters(self) -> str:
clusters = self.graph.actor_clusters()
if not clusters:
return "(no entities recorded)"
# Show multi-member clusters first — they're the cross-source links
# the human reader most needs to see.
clusters.sort(key=lambda c: (-len(c["members"]), c["members"]))
lines = [f"=== Actor Clusters ({len(clusters)}) ==="]
for i, c in enumerate(clusters, 1):
members = c["members"]
label = "MULTI-SOURCE CLUSTER" if len(members) > 1 else "Single entity"
lines.append(f"\n[{label} #{i}] {len(members)} member(s):")
for eid in members:
ent = self.graph.entities.get(eid)
if ent:
lines.append(f" - {ent.summary()}")
if c["identifiers"]:
lines.append(" Aggregated identifiers:")
for ident in c["identifiers"]:
strong_tag = "strong" if ident.get("strong") else "weak"
lines.append(
f" [{strong_tag}] {ident.get('type')}={ident.get('value')} "
f"(on {ident.get('on_entity')})"
)
if c["coref_hypotheses"]:
lines.append(" Backing coref hypotheses (≥0.8 active):")
for hid in c["coref_hypotheses"]:
hyp = self.graph.hypotheses.get(hid)
if hyp:
lines.append(f" - {hid}: conf={hyp.confidence:.2f}, L={hyp.log_odds:+.2f}")
return "\n".join(lines)
async def _get_phenomena_by_source(self) -> str:
by_src: dict[str, list] = {}
for ph in self.graph.phenomena.values():
by_src.setdefault(ph.source_id or "(unbound)", []).append(ph)
if not by_src:
return "(no phenomena recorded)"
# Resolve source labels via graph.case when possible.
def _label(src_id: str) -> str:
if self.graph.case:
src = self.graph.case.get_source(src_id)
if src:
return f"{src_id} — {src.label} ({src.type})"
return src_id
lines = [f"=== Phenomena by Source ({len(by_src)} source(s)) ==="]
for src_id in sorted(by_src):
phs = by_src[src_id]
lines.append(f"\n--- {_label(src_id)} ({len(phs)} phenomena) ---")
for ph in phs:
grounded = "G" if ph.verified_facts and ph.source_tool else "·"
lines.append(f" [{grounded}] {ph.summary()}")
return "\n".join(lines)
async def _get_entities(self) -> str:
@@ -165,27 +314,42 @@ class ReportAgent(BaseAgent):
return "\n".join(lines)
async def _verify_phenomena(self) -> str:
verified = []
unverified = []
grounded: list[str] = []
tool_only: list[str] = []
unverified: list[str] = []
for ph in self.graph.phenomena.values():
entry = f" [{ph.category}] {ph.title} (agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'})"
if ph.source_tool:
verified.append(entry)
nf = len(ph.verified_facts)
entry = (
f" [{ph.category}] {ph.title} "
f"(agent: {ph.source_agent}, tool: {ph.source_tool or 'N/A'}, facts: {nf})"
)
if ph.verified_facts and ph.source_tool:
grounded.append(entry)
elif ph.source_tool:
tool_only.append(entry)
else:
unverified.append(entry)
lines = ["=== Phenomena Verification Report ==="]
lines.append(f"\nVERIFIED ({len(verified)} — have source_tool):")
lines.extend(verified)
lines.append(f"\nGROUNDED ({len(grounded)} — facts + source_tool):")
lines.extend(grounded)
lines.append(f"\nTOOL-ONLY ({len(tool_only)} — source_tool, no facts):")
lines.extend(tool_only)
lines.append(f"\nUNVERIFIED ({len(unverified)} — no source_tool):")
lines.extend(unverified)
return "\n".join(lines)
async def _save_report(self, content: str, output_path: str) -> str:
try:
os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
with open(output_path, "w") as f:
f.write(content)
return f"Report saved to {output_path} ({len(content)} chars)"
except Exception as e:
return f"Error saving report: {e}"
"""Save the report and return the content itself.
The content is returned (rather than a "saved to ..." status string)
so that when tool_call_loop short-circuits on this terminal tool,
`final_text` is the full markdown — orchestrator writes it to the
canonical report.md path under runs/<ts>/.
The output_path argument is kept for backward compat but the model's
chosen path is ignored — the orchestrator owns the persistence path.
"""
if not content:
return ""
return content

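The three-tier split that the new `_verify_phenomena` performs (grounded, tool-only, unverified) can be sketched in isolation. This is a minimal illustration, not the project's code: the `Phenomenon` dataclass and the `classify` / `bucket` helpers are hypothetical stand-ins, and only the `verified_facts` and `source_tool` field names come from the diff.

```python
from dataclasses import dataclass, field


@dataclass
class Phenomenon:
    # Hypothetical stand-in for the graph's phenomenon record.
    title: str
    source_tool: str = ""
    verified_facts: list[dict] = field(default_factory=list)


def classify(ph: Phenomenon) -> str:
    """Return the verification tier, mirroring the if/elif/else in the diff."""
    if ph.verified_facts and ph.source_tool:
        return "grounded"      # facts AND a source tool
    if ph.source_tool:
        return "tool_only"     # a source tool, but no grounded facts
    return "unverified"        # no source tool at all


def bucket(phenomena: list[Phenomenon]) -> dict[str, list[str]]:
    """Group phenomenon titles by tier, preserving input order."""
    tiers: dict[str, list[str]] = {"grounded": [], "tool_only": [], "unverified": []}
    for ph in phenomena:
        tiers[classify(ph)].append(ph.title)
    return tiers
```

Note that a record with facts but no `source_tool` falls into the unverified tier under this ordering, because both of the first two branches require `source_tool`.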
agents/strategist.py Normal file

@@ -0,0 +1,134 @@
"""InvestigationStrategist — the LLM that decides depth vs breadth.
DESIGN_STRATEGIST.md §3.
The strategist does NOT run forensic tools. Its job per round is exactly one
decision: propose 1-3 leads that would move an active hypothesis, OR declare
the investigation complete. It reads the graph through four read-only views
(graph_overview / source_coverage / marginal_yield / budget_status) and
expresses its decision through two write tools (propose_lead /
declare_investigation_complete).
This is the smallest possible agent in the system — the entire point is that
strategy decisions live in one agent so they're auditable and the rest of the
codebase doesn't carry implicit depth/breadth policy.
"""
from __future__ import annotations
import logging
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
logger = logging.getLogger(__name__)
class InvestigationStrategist(BaseAgent):
name = "strategist"
role = (
"Investigation strategist. You do not run forensic tools yourself. "
"Each round you take ONE decision: propose 1-3 new investigation leads "
"that would materially affect an active hypothesis, OR declare the "
"investigation complete. Your judgment is grounded in the graph "
"(hypotheses, sources, coverage, marginal yield, budget) — never in "
"speculation."
)
# At least one of these must be called every round, otherwise BaseAgent's
# forced RECORD retry kicks in and re-prompts the strategist to take a
# documented decision.
mandatory_record_tools = ("propose_lead", "declare_investigation_complete")
# declare_investigation_complete is terminal: calling it short-circuits the tool loop,
# which is what we want (strategist returns immediately on "done").
terminal_tools = ("declare_investigation_complete",)
# Strategist-specific tools, plus the read-only graph queries inherited
# from BaseAgent. NO graph write tools (no add_phenomenon / link_to_entity
# / observe_identity); the strategist must NOT mutate evidence directly.
_STRATEGY_TOOLS = (
"graph_overview",
"source_coverage",
"marginal_yield",
"budget_status",
"propose_lead",
"declare_investigation_complete",
)
def _register_graph_tools(self) -> None:
"""Strategist gets read-only graph queries + the six strategy tools.
It does NOT get write tools (no add_phenomenon, observe_identity,
link_to_entity, add_temporal_edge). Every graph mutation must come
from a dispatched worker, not from the planner.
"""
self._register_graph_read_tools()
for tool_name in self._STRATEGY_TOOLS:
td = TOOL_CATALOG.get(tool_name)
if td is None:
logger.warning(
"Strategist could not find tool %s in TOOL_CATALOG — "
"register_all_tools must run before agent instantiation.",
tool_name,
)
continue
self.register_tool(td.name, td.description, td.input_schema, td.executor)
def _build_system_prompt(self, task: str) -> str:
"""Strategist-specific prompt. Replaces the BaseAgent default which
walks an INVESTIGATE→RECORD→LINK workflow that is wrong for a
planner agent.
"""
return (
f"You are {self.name}, the investigation strategist.\n"
f"Role: {self.role}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW (do this exactly):\n"
f" 1. Call graph_overview FIRST. Look at: which hypotheses are\n"
f" active (conf 0.2-0.8) vs already supported/refuted; which\n"
f" ones have many edges but only 1 distinct_source; which had\n"
f" a recent_flip vs none in two rounds.\n"
f" 2. Call marginal_yield to see if the last rounds produced anything.\n"
f" 3. Call budget_status to know your runway.\n"
f" 4. For each candidate lead direction, call source_coverage on\n"
f" the relevant source to see what's been touched.\n"
f" 5. Take exactly ONE of these terminal actions:\n"
f" (a) Call propose_lead 1-3 times for leads that would\n"
f" materially move an active hypothesis. STOP after this.\n"
f" (b) Call declare_investigation_complete with a specific\n"
f" reason. STOP after this.\n"
f"\n"
f"DECISION CRITERIA — when to propose vs when to stop:\n"
f" PROPOSE when:\n"
f" - A hypothesis is supported only by ONE source — get\n"
f" cross-source corroboration. Same-source repeats are\n"
f" cheap (harmonic damping).\n"
f" - A hypothesis is in the active band (0.2 < conf < 0.8) —\n"
f" it needs the deciding evidence.\n"
f" - A high-value artefact is ✗ on source_coverage AND an\n"
f" active hypothesis depends on the kind of evidence that\n"
f" artefact would produce.\n"
f" STOP (declare_investigation_complete) when:\n"
f" - marginal_yield shows zero across 2+ rounds.\n"
f" - budget_status warns ≥90% on tool_calls or rounds.\n"
f" - all active hypotheses are resolved (supported or refuted).\n"
f" - coverage saturation: every ✗ on every source is irrelevant\n"
f" to active hypotheses.\n"
f"\n"
f"HARD RULES:\n"
f" - You CANNOT call investigation tools (list_directory,\n"
f" sqlite_query, parse_registry_key, extract_file, etc.) — your\n"
f" job is to direct workers, not to investigate yourself.\n"
f" - You CANNOT call write tools (add_phenomenon, observe_identity,\n"
f" link_to_entity, add_hypothesis, add_temporal_edge). All\n"
f" evidence mutations come from the workers you dispatch.\n"
f" - Every propose_lead MUST cite a real hyp-id from\n"
f" graph_overview's table — fabricated ids will be rejected.\n"
f" - Don't propose more than 3 leads in one round. Quality over\n"
f" quantity — a 4th lead almost always means you're not really\n"
f" sure what would move the graph.\n"
f" - Don't re-propose a lead that's already pending. The system\n"
f" deduplicates (motivating_hyp, expected_type, agent, source)\n"
f" so duplicates silently no-op, but they waste your budget."
)

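The last two hard rules in the prompt (unknown hypothesis ids are rejected, and duplicate leads are dropped on the `(motivating_hyp, expected_type, agent, source)` key) can be sketched as follows. This is an assumption-laden illustration: `LeadKey`, `LeadQueue`, and the returned status strings are hypothetical and are not taken from the repository; only the four key fields are named in the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeadKey:
    # The four fields the prompt says the system deduplicates on.
    motivating_hyp: str
    expected_type: str
    agent: str
    source: str


class LeadQueue:
    """Hypothetical pending-lead store that drops duplicate proposals."""

    def __init__(self, known_hypotheses: set[str]) -> None:
        self._known = known_hypotheses
        self._pending: dict[LeadKey, str] = {}

    def propose(self, key: LeadKey, description: str) -> str:
        # A lead must cite a hypothesis id that actually exists.
        if key.motivating_hyp not in self._known:
            return "rejected"
        # Same key already pending: no-op, the earlier lead wins.
        if key in self._pending:
            return "duplicate"
        self._pending[key] = description
        return "accepted"

    def __len__(self) -> int:
        return len(self._pending)
```

A frozen dataclass is hashable, so the key can index a dict directly and the duplicate check is a single lookup.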

@@ -1,14 +1,21 @@
"""Timeline Agent — correlates evidence across time."""
"""Timeline Agent — connects existing phenomena with temporal edges.
Operates on phenomena already in the graph. Does NOT investigate the disk
image itself. The agent's only useful output is the temporal edges it
creates between phenomena.
"""
from __future__ import annotations
import json
import logging
from base_agent import BaseAgent
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from tool_registry import TOOL_CATALOG
logger = logging.getLogger(__name__)
class TimelineAgent(BaseAgent):
name = "timeline"
@@ -17,29 +24,39 @@ class TimelineAgent(BaseAgent):
"MAC timestamps and correlate events across all phenomena categories in the "
"evidence graph to reconstruct the sequence of activities on the system."
)
mandatory_record_tools = ("add_temporal_edge",)
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
super().__init__(llm, graph)
self._register_tools()
def _register_graph_tools(self) -> None:
"""Restrict to read-only graph tools — Timeline does not add phenomena."""
self._register_graph_read_tools()
def _register_tools(self) -> None:
# Filesystem timeline tool from catalog
td = TOOL_CATALOG.get("build_filesystem_timeline")
if td:
self.register_tool(td.name, td.description, td.input_schema, td.executor)
# Custom tool to get all phenomena with timestamps for correlation
self.register_tool(
name="get_timestamped_phenomena",
description="Get all phenomena that have timestamps, sorted chronologically. Use for timeline correlation.",
description=(
"Get all phenomena that have timestamps, sorted chronologically. "
"Returns each phenomenon's id, category, title, and a short description "
"preview. Use this as your primary input for temporal correlation."
),
input_schema={"type": "object", "properties": {}},
executor=self._get_timestamped_phenomena,
)
# Tool to add temporal edges between phenomena
self.register_tool(
name="add_temporal_edge",
description="Add a temporal relationship between two phenomena (before, after, or concurrent).",
description=(
"Add a temporal relationship edge between two existing phenomena. "
"Use 'before' when source phenomenon happened before target, "
"'concurrent' when they occurred within seconds of each other."
),
input_schema={
"type": "object",
"properties": {
@@ -56,6 +73,42 @@ class TimelineAgent(BaseAgent):
executor=self._add_temporal_edge,
)
def _build_system_prompt(self, task: str) -> str:
"""Focused prompt — Timeline connects existing phenomena, doesn't investigate."""
return (
f"You are {self.name}, a forensic timeline correlation analyst.\n"
f"Role: {self.role}\n\n"
f"Image: {self.graph.image_path}\n"
f"Current state: {self.graph.stats_summary()}\n\n"
f"Your task: {task}\n\n"
f"WORKFLOW:\n"
f"1. Call build_filesystem_timeline once to materialize MAC times for the disk.\n"
f"2. Call get_timestamped_phenomena to see all phenomena with timestamps, "
f"sorted chronologically. THIS IS YOUR PRIMARY INPUT.\n"
f"3. For each meaningful temporal relationship between phenomena, call "
f"add_temporal_edge(source_id, target_id, relation). Use 'before' when "
f"source happened first (the common case); 'concurrent' for events within "
f"a few seconds of each other.\n"
f" Examples of meaningful connections:\n"
f" - 'Cain installer executed' (before) 'Cain.exe first execution'\n"
f" - 'WHOIS first lookup' (before) 'WHOIS second lookup'\n"
f" - 'Recon tool cluster' (before) 'Anti-forensics defrag'\n"
f" - 'Tool installation' (before) 'Tool execution'\n"
f"4. Aim for 15-40 temporal edges that connect the major events into a "
f"forensic story.\n"
f"5. STOP after recording all meaningful temporal edges. Do not call any more tools.\n\n"
f"STRICT BOUNDARIES:\n"
f"- Your job is to CONNECT existing phenomena, NOT to discover new ones. "
f"You CANNOT call add_phenomenon — the tool isn't yours.\n"
f"- Use ONLY phenomenon IDs returned by get_timestamped_phenomena or "
f"list_phenomena. NEVER fabricate IDs.\n"
f"- Connect events that tell a forensic story (recon -> exploit -> cover-up). "
f"Do not exhaustively pair every two phenomena; focus on causally-relevant "
f"sequences.\n"
f"- The orchestrator handles report writing in the next phase. Your only "
f"output that propagates is the temporal edges you create."
)
async def _get_timestamped_phenomena(self) -> str:
items = [
ph for ph in self.graph.phenomena.values()
@@ -69,7 +122,15 @@ class TimelineAgent(BaseAgent):
lines = []
for ph in items:
lines.append(f"{ph.timestamp} | [{ph.category}] {ph.title} ({ph.id})")
lines.append(f" {ph.description[:150]}")
preview = ph.interpretation[:150] if ph.interpretation else ""
if ph.verified_facts:
fact_preview = ", ".join(
f"{f.get('type','?')}={str(f.get('value',''))[:40]}"
for f in ph.verified_facts[:3]
)
preview = f"{preview} [facts: {fact_preview}]" if preview else f"[facts: {fact_preview}]"
if preview:
lines.append(f" {preview}")
return "\n".join(lines)
async def _add_temporal_edge(

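TimelineAgent declares `add_temporal_edge` as its only mandatory record tool. The bookkeeping that decides whether a forced RECORD retry is needed can be sketched as below. `RecordCounter`, `needs_retry`, and the toy `add_temporal_edge` executor are hypothetical names for illustration; the one behaviour taken from the diff is that the counter increments only after the wrapped executor returns successfully.

```python
import asyncio


class RecordCounter:
    """Hypothetical reduction of the mandatory-record bookkeeping."""

    def __init__(self, mandatory: tuple[str, ...]) -> None:
        self.mandatory = mandatory
        self.counts: dict[str, int] = {}

    def wrap(self, name, executor):
        async def wrapped(*args, **kwargs):
            result = await executor(*args, **kwargs)  # may raise
            # Reached only on success, so rejected calls are never counted.
            self.counts[name] = self.counts.get(name, 0) + 1
            return result
        return wrapped

    def needs_retry(self) -> bool:
        # Retry only if there are mandatory tools and none succeeded.
        return bool(self.mandatory) and not any(
            self.counts.get(t, 0) > 0 for t in self.mandatory
        )


async def add_temporal_edge(source_id: str, target_id: str) -> str:
    # Toy executor: rejects self-edges, otherwise "records" the edge.
    if source_id == target_id:
        raise ValueError("self-edge")
    return f"{source_id} -> {target_id}"


async def demo() -> tuple[bool, bool]:
    counter = RecordCounter(("add_temporal_edge",))
    tool = counter.wrap("add_temporal_edge", add_temporal_edge)
    try:
        await tool("ph-1", "ph-1")   # rejected, not counted
    except ValueError:
        pass
    before = counter.needs_retry()
    await tool("ph-1", "ph-2")       # succeeds, counted
    return before, counter.needs_retry()
```

An agent with an empty mandatory tuple never triggers the retry, which matches the "no-op" case the BaseAgent comments describe.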

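The grounding contract in the BaseAgent prompt (each fact's value must be a substring of the cited invocation's output, tolerating case, whitespace, and path-separator variants) can be sketched roughly as below. The real gateway lives in the evidence graph and is not shown in this diff, so `_normalize`, `check_facts`, and the exact normalization rules here are assumptions rather than the project's implementation.

```python
import re


def _normalize(text: str) -> str:
    """Fold case, unify path separators, collapse whitespace runs (assumed rules)."""
    text = text.lower().replace("\\", "/")
    return re.sub(r"\s+", " ", text).strip()


def check_facts(facts: list[dict], outputs: dict[str, str]) -> list[str]:
    """Return one failure reason per ungrounded fact; an empty list means all pass.

    `outputs` maps invocation ids made in the current task to raw tool output.
    """
    failures = []
    for i, fact in enumerate(facts):
        inv = fact.get("invocation_id", "")
        value = str(fact.get("value", ""))
        if inv not in outputs:
            failures.append(f"fact {i}: unknown invocation {inv!r}")
        elif not value.strip():
            failures.append(f"fact {i}: empty value")
        elif _normalize(value) not in _normalize(outputs[inv]):
            failures.append(f"fact {i}: value not found in output of {inv}")
    return failures
```

Returning a list of per-fact reasons, rather than a single boolean, matches the prompt's description of a rejection "with a per-fact reason" that the agent can act on before calling `add_phenomenon` again.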
@@ -5,6 +5,7 @@ from __future__ import annotations
import json
import logging
import time
import uuid
from typing import Any
from evidence_graph import EvidenceGraph
@@ -31,12 +32,30 @@ class BaseAgent:
name: str = "base"
role: str = "A forensic analysis agent."
# Tools the agent MUST invoke at least once for the run to count as productive.
# If none of these were called when tool_call_loop returns, run() fires a
# forced retry with an explicit "you forgot to record" instruction.
# Subclasses override to declare their own recording responsibility
# (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
# observe_identity (S5) counts as a recording too — it writes through the
# same grounding gateway and produces an identity_observation phenomenon.
mandatory_record_tools: tuple[str, ...] = ("add_phenomenon", "observe_identity")
# Tools whose invocation ends the run immediately. After any terminal tool
# is called, tool_call_loop returns with that tool's result text as
# final_text. Used by agents whose "completion" is a single explicit
# action rather than "model decides to stop calling tools". For multi-call
# agents (filesystem records many phenomena) leave empty.
terminal_tools: tuple[str, ...] = ()
def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
self.llm = llm
self.graph = graph
self._tools: dict[str, dict] = {} # name -> schema
self._executors: dict[str, Any] = {} # name -> async callable
self._record_call_counts: dict[str, int] = {}
self._work_log: list[str] = []
self._current_lead_id: str | None = None
def register_tool(
self,
@@ -51,7 +70,18 @@ class BaseAgent:
"description": description,
"input_schema": input_schema,
}
self._executors[name] = executor
if name in self.mandatory_record_tools:
self._executors[name] = self._wrap_record_executor(name, executor)
else:
self._executors[name] = executor
def _wrap_record_executor(self, name: str, executor: Any) -> Any:
"""Wrap a mandatory-record executor to count successful invocations."""
async def wrapped(*args, **kwargs):
result = await executor(*args, **kwargs)
self._record_call_counts[name] = self._record_call_counts.get(name, 0) + 1
return result
return wrapped
def get_tool_definitions(self) -> list[dict]:
"""Get tool definitions in Claude API format."""
@@ -83,37 +113,68 @@ class BaseAgent:
f" Call investigation tools (list_directory, parse_registry_key, etc.) to gather data.\n"
f" Only extract_file for forensically relevant files (user data, logs, configs, hives) — NOT system DLLs or OS files.\n"
f" Create add_lead for anything outside your expertise.\n\n"
f"Phase B — RECORD PHENOMENA:\n"
f" For EACH significant finding from Phase A, call add_phenomenon.\n"
f"Phase B — RECORD PHENOMENA (GROUNDED):\n"
f" For EACH significant finding from Phase A, call add_phenomenon with:\n"
f" * interpretation: your analysis — free text, NOT verified.\n"
f" * verified_facts: one entry per concrete atom (path, timestamp,\n"
f" inode, hash, identifier, count) you want recorded as truth.\n"
f" Each entry MUST have:\n"
f" - type: e.g. 'path', 'timestamp', 'inode', 'hash', 'identifier', 'count'\n"
f" - value: a VERBATIM substring from the tool output\n"
f" - invocation_id: the inv-xxx ID from the '[invocation: inv-xxx]'\n"
f" header at the top of the tool result that produced this value\n"
f" IDENTIFIERS — call observe_identity (in ADDITION to add_phenomenon)\n"
f" whenever you see an email, phone number, Apple ID, IMEI, wallet\n"
f" address, MAC, UDID, persistent nickname, or display name. Same\n"
f" grounding contract: value must be verbatim in the cited tool\n"
f" output. This is HOW cross-source attribution gets built — without\n"
f" it, we can't tell whether the Apple ID in keychain belongs to the\n"
f" same person as the Windows account on the USB.\n"
f" Do NOT call link_to_entity yet — just record all phenomena first.\n\n"
f"Phase C — LINK ENTITIES:\n"
f" FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
f" Then call link_to_entity for each relevant phenomenon.\n"
f" NEVER guess or fabricate a phenomenon ID. If an ID is not in list_phenomena output, it does not exist.\n\n"
f"Phase D — ANSWER:\n"
f" Only give your <answer> AFTER completing Phases B and C.\n\n"
f"IMPORTANT:\n"
f"- You MUST call add_phenomenon at least once before finishing\n"
f"- Complete each phase before starting the next\n"
f"- Other agents can ONLY see what you write to the graph\n"
f"- If you don't record findings, they are LOST\n"
f"- Include relevant file paths, inode numbers, timestamps, and raw data\n\n"
f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
f"- ONLY record findings that appear VERBATIM in tool results you received\n"
f"- NEVER invent or guess timestamps, file paths, inode numbers, or program names\n"
f"- If tool output was truncated, state '[truncated]' — do NOT fill in the missing data\n"
f"- If you are unsure whether something exists, call a tool to verify or create a lead — do NOT assume\n"
f"- Quote exact strings from tool output when recording evidence descriptions\n"
f"- Do NOT fabricate execution timestamps — only report timestamps returned by tools"
f"Phase D — STOP:\n"
f" Once all phenomena are recorded and entities linked, you are DONE.\n"
f" Do not call any more tools. The orchestrator picks up automatically.\n\n"
f"CRITICAL — RECORDING REQUIREMENT:\n"
f"- Only graph mutations propagate to other agents and the final report.\n"
f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
f"a registry key) and found NOTHING, that absence IS evidence — call "
f"add_phenomenon with a 'No matches for X' title, the search scope in "
f"raw_data, and cite the search tool's invocation_id (verified_facts may "
f"be empty for a true negative; the cited invocation in source_tool still "
f"anchors it). Negative findings constrain the hypothesis space.\n"
f"- If you stop without having called add_phenomenon at least once, the task "
f"is FAILED and a forced retry will fire.\n\n"
f"GROUNDING GATEWAY — STRUCTURALLY ENFORCED:\n"
f"- Every tool result begins with '[invocation: inv-xxxxxxxx]' — that ID\n"
f" is what you cite in each fact's invocation_id.\n"
f"- fact.value must be a substring of the cited invocation's output.\n"
f" Case, whitespace, and path-separator (/ ↔ \\) variants are tolerated;\n"
f" anything else fabricated is REJECTED with a per-fact reason.\n"
f"- On REJECTED: quote the literal text from the output (or drop the\n"
f" fact), and put guesses / inferred paths / model names in\n"
f" `interpretation` instead. Then call add_phenomenon again.\n"
f"- You may cite ONLY invocations made within THIS task."
)
async def run(self, task: str) -> str:
async def run(self, task: str, lead_id: str | None = None) -> str:
"""Run this agent with a specific task."""
_log(task, event="agent_start", agent=self.name)
self.graph.agent_status[self.name] = "running"
self.graph._current_agent = self.name
# Fresh task scope per agent run. Used by the grounding gateway to
# check that facts in add_phenomenon cite invocations made *within
# this run* — preventing the agent from forwarding stale IDs from
# earlier work or another agent.
self.graph._current_task_id = f"task-{uuid.uuid4().hex[:8]}"
self._current_lead_id = lead_id
self._register_graph_tools()
self._record_call_counts.clear()
system = self._build_system_prompt(task)
messages = [{"role": "user", "content": task}]
@@ -122,12 +183,75 @@ class BaseAgent:
ph_before = len(self.graph.phenomena)
try:
final_text, _ = await self.llm.tool_call_loop(
final_text, conversation = await self.llm.tool_call_loop(
messages=messages,
tools=self.get_tool_definitions(),
tool_executor=self._executors,
system=system,
terminal_tools=self.terminal_tools,
)
# Forced-record retry: if the agent has any mandatory recording
# tools but never invoked any of them, force one more round with
# an explicit "you forgot to record" instruction. The mandatory
# set is declared on the class — Timeline → add_temporal_edge,
# Hypothesis → add_hypothesis, ReportAgent → (). For agents with
# empty mandatory_record_tools this branch is a no-op.
registered_mandatory = [
t for t in self.mandatory_record_tools if t in self._executors
]
recorded_any = any(
self._record_call_counts.get(t, 0) > 0
for t in registered_mandatory
)
if registered_mandatory and not recorded_any:
missing = "/".join(registered_mandatory)
logger.warning(
"[%s] finished without calling any of [%s] — forcing RECORD retry",
self.name, missing,
)
conversation.append({
"role": "user",
"content": (
f"STOP. You produced an answer without ever calling "
f"{missing}. Your answer is DISCARDED — only graph "
f"mutations propagate to other agents and the final "
f"report.\n\n"
f"You MUST now call {missing} for every significant "
f"finding from your prior investigation, including "
f"exact identifiers, timestamps, and the source_tool "
f"that produced each finding. If you genuinely found "
f"NOTHING noteworthy, call the recording tool ONCE "
f"with a 'No significant findings' style entry "
f"summarizing what you searched.\n\n"
f"Do not run more investigation tools. Just record "
f"what you already found. Then end."
),
})
# Narrow the retry tool surface so the agent can't wander off
# to investigate again — only RECORD and read-only graph
# query tools survive. Each grounding-rejected call burns one
# iteration, so the cap is 30 (not the original 10): a
# Timeline agent writing ~10 temporal edges with one rejection
# apiece needs ~20 turns under the rewritten gateway.
retry_tool_names = set(registered_mandatory) | {
"list_phenomena", "list_assets", "search_graph",
"add_temporal_edge", "link_to_entity", "add_lead",
"add_hypothesis", "save_report",
}
retry_tools = [
td for td in self.get_tool_definitions()
if td["name"] in retry_tool_names
]
final_text, _ = await self.llm.tool_call_loop(
messages=conversation,
tools=retry_tools,
tool_executor=self._executors,
system=system,
max_iterations=30,
terminal_tools=self.terminal_tools,
)
self._work_log.append(f"[Task: {task[:80]}] -> {final_text[:150]}")
except Exception:
self.graph.agent_status[self.name] = "failed"
@@ -143,9 +267,17 @@ class BaseAgent:
# ---- Graph interaction tools --------------------------------------------
def _register_graph_tools(self) -> None:
"""Register tools for querying and writing to the evidence graph."""
"""Register graph query + mutation tools.
# --- Read tools ---
Subclasses can override to restrict the toolset. For example, a
read-only agent (hypothesis, report) overrides this to skip
_register_graph_write_tools.
"""
self._register_graph_read_tools()
self._register_graph_write_tools()
def _register_graph_read_tools(self) -> None:
"""Register read-only graph + asset query tools."""
self.register_tool(
name="list_phenomena",
@@ -211,25 +343,114 @@ class BaseAgent:
executor=self._get_hypothesis_status,
)
# --- Write tools ---
self.register_tool(
name="list_assets",
description=(
"List all files extracted from the disk image. "
"Shows filename, category, size, local path, and inode. "
"Check this before calling extract_file to avoid re-extraction."
),
input_schema={
"type": "object",
"properties": {
"category": {
"type": "string",
"enum": [
"registry_hive", "chat_log", "prefetch", "network_capture",
"config_file", "address_book", "recycle_bin", "executable",
"text_log", "other",
],
"description": "Filter by category. Omit to list all.",
},
},
},
executor=self._list_assets,
)
self.register_tool(
name="find_extracted_file",
description=(
"Find an already-extracted file by inode or filename. "
"Returns the local path so you can use it directly with "
"parse_registry_key, read_text_file, etc. without re-extracting."
),
input_schema={
"type": "object",
"properties": {
"inode": {"type": "string", "description": "Inode to look up."},
"filename": {"type": "string", "description": "Filename or partial name to search."},
},
},
executor=self._find_extracted_file,
)
def _register_graph_write_tools(self) -> None:
"""Register graph mutation tools (add_phenomenon, add_lead, link_to_entity, observe_identity)."""
self.register_tool(
name="add_phenomenon",
description=(
"Record a forensic finding (phenomenon) on the evidence graph. "
"You MUST specify source_tool: the name of the tool call that produced this finding."
"Record a forensic finding on the evidence graph. The finding is "
"split into provenance-bound atoms (verified_facts) and free-form "
"analysis (interpretation). Each fact MUST cite the invocation_id "
"of a tool call you made in THIS task — the gateway checks every "
"fact's value against that call's real output, byte-for-byte. "
"Any fact that fails grounding causes the whole record to be "
"rejected with a list of failures; fix the facts and call again."
),
input_schema={
"type": "object",
"properties": {
"category": {"type": "string", "description": "Category of the finding."},
"title": {"type": "string", "description": "Short title."},
"description": {"type": "string", "description": "Detailed description. Quote exact data from tool output."},
"interpretation": {
"type": "string",
"description": (
"Free-form analysis text — your reasoning, why this "
"matters, what it implies. NOT verified by the gateway. "
"Rendered in reports as 'agent analysis', not truth."
),
},
"verified_facts": {
"type": "array",
"description": (
"Atoms you want preserved as ground truth. Each must "
"appear verbatim in the cited tool output."
),
"items": {
"type": "object",
"properties": {
"type": {
"type": "string",
"description": (
"Kind of fact: path, timestamp, inode, "
"hash, identifier, count, raw, ..."
),
},
"value": {
"type": "string",
"description": (
"Verbatim substring from the cited tool "
"output. The gateway does a literal "
"string-in-string check — no paraphrasing."
),
},
"invocation_id": {
"type": "string",
"description": (
"ID from the '[invocation: inv-xxx]' header "
"of the tool call that produced this value."
),
},
},
"required": ["type", "value", "invocation_id"],
},
},
"raw_data": {"type": "object", "description": "Structured raw data supporting this finding."},
"timestamp": {"type": "string", "description": "Timestamp if any. ONLY use timestamps from tool output."},
"source_tool": {"type": "string", "description": "Name of the tool that produced this (e.g. 'list_directory')."},
},
"required": ["category", "title", "description", "source_tool"],
"required": ["category", "title", "source_tool"],
},
executor=self._add_phenomenon,
)
@@ -280,47 +501,65 @@ class BaseAgent:
executor=self._link_to_entity,
)
# --- Asset library tools ---
self.register_tool(
name="list_assets",
name="observe_identity",
description=(
"List all files extracted from the disk image. "
"Shows filename, category, size, local path, and inode. "
"Check this before calling extract_file to avoid re-extraction."
"Record a typed identifier (email / phone / Apple ID / IMEI / "
"wallet address / nickname / display name / …) for an entity. "
"Goes through the same grounding gateway as add_phenomenon — "
"value MUST be a verbatim substring of the cited tool output. "
"After attachment, the engine automatically proposes / "
"strengthens / weakens cross-source coreference hypotheses "
"between this entity and any others carrying the same or "
"conflicting identifiers. This is how 'is the Apple ID in iOS "
"keychain the same person as the Windows login name?' gets "
"answered. Call this in ADDITION to add_phenomenon for "
"identifier-bearing findings."
),
input_schema={
"type": "object",
"properties": {
"category": {
"entity_name": {"type": "string", "description": "Human-readable entity name (e.g. 'LEUNG YL', 'alice@example.com')."},
"entity_type": {
"type": "string",
"enum": [
"registry_hive", "chat_log", "prefetch", "network_capture",
"config_file", "address_book", "recycle_bin", "executable",
"text_log", "other",
],
"description": "Filter by category. Omit to list all.",
"enum": ["person", "program", "file", "host", "ip_address"],
"description": "Kind of entity this identifier belongs to (usually 'person').",
},
"identifier_type": {
"type": "string",
"description": (
"Strong (near-unique): email, phone_number, imei, "
"imsi, apple_id, icloud_id, google_account, "
"wallet_address, udid, mac_address, device_serial. "
"Weak (free-form, may collide): nickname, "
"display_name, username, screen_name."
),
},
"value": {
"type": "string",
"description": (
"The identifier value, quoted VERBATIM from the "
"tool output you cite in invocation_id."
),
},
"invocation_id": {
"type": "string",
"description": (
"ID from the '[invocation: inv-xxx]' header of "
"the tool call that surfaced this identifier."
),
},
"source_tool": {
"type": "string",
"description": "Name of the tool that produced the identifier.",
},
},
"required": [
"entity_name", "entity_type", "identifier_type",
"value", "invocation_id",
],
},
executor=self._list_assets,
)
self.register_tool(
name="find_extracted_file",
description=(
"Find an already-extracted file by inode or filename. "
"Returns the local path so you can use it directly with "
"parse_registry_key, read_text_file, etc. without re-extracting."
),
input_schema={
"type": "object",
"properties": {
"inode": {"type": "string", "description": "Inode to look up."},
"filename": {"type": "string", "description": "Filename or partial name to search."},
},
},
executor=self._find_extracted_file,
executor=self._observe_identity,
)
# ---- Tool executors -----------------------------------------------------
@@ -362,19 +601,33 @@ class BaseAgent:
self,
category: str,
title: str,
description: str,
interpretation: str = "",
verified_facts: list[dict] | None = None,
raw_data: dict | None = None,
timestamp: str | None = None,
source_tool: str = "",
# Back-compat: older prompts (and accidental LLM emissions) may pass
# ``description``; treat it as ``interpretation`` rather than failing.
description: str | None = None,
) -> str:
if description and not interpretation:
interpretation = description
# GroundingError propagates: llm_client._execute_single_tool turns
# raised exceptions into "Error executing add_phenomenon: <msg>" tool
# results the LLM sees, and _wrap_record_executor does NOT increment
# the mandatory-record counter (the increment only runs after a
# successful return), so the forced-retry mechanism still fires if
# the agent never lands a grounded phenomenon.
pid, merged = await self.graph.add_phenomenon(
source_agent=self.name,
category=category,
title=title,
description=description,
interpretation=interpretation,
verified_facts=verified_facts,
raw_data=raw_data,
timestamp=timestamp,
source_tool=source_tool,
from_lead_id=self._current_lead_id,
)
if merged:
return f"Phenomenon merged into existing: {pid} - {title} (corroboration boost)"
@@ -416,6 +669,51 @@ class BaseAgent:
status = "linked to existing" if existing else "created and linked"
return f"Entity {status}: {entity_name} ({entity_type}) ←[{edge_type}]— {phenomenon_id}"
async def _observe_identity(
self,
entity_name: str,
entity_type: str,
identifier_type: str,
value: str,
invocation_id: str,
source_tool: str = "",
) -> str:
# GroundingError / ValueError propagate to llm_client's per-tool
# exception handler, which formats them back to the LLM. That keeps
# the mandatory-record counter honest — only a successful return
# triggers the increment in _wrap_record_executor.
result = await self.graph.observe_identity(
entity_name=entity_name,
entity_type=entity_type,
identifier_type=identifier_type,
value=value,
source_agent=self.name,
source_tool=source_tool,
invocation_id=invocation_id,
)
lines = [
f"Identity observed: {identifier_type}={value} "
f"on entity {result['entity_id']} ({entity_name})."
]
if result.get("new_identifier"):
lines.append(
f" Observation phenomenon: {result['phenomenon_id']}"
)
else:
lines.append(" (identifier already recorded on this entity — idempotent)")
for prop in result.get("coref_proposals", []):
lines.append(
f" → Coref candidate: {prop['other_entity_id']} via "
f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
f"hypothesis={prop['hypothesis_id']})"
)
for c in prop.get("conflicts", []):
lines.append(
f" ⚠ conflict on {c['type']}: "
f"{c['new_value']} vs {c['other_value']}"
)
return "\n".join(lines)
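To illustrate the text `_observe_identity` hands back to the model, here is a minimal standalone sketch of the formatting step. The `result` dict mirrors the keys read above, but every value in it (`ent-12`, `ph-88`, `hyp-3`, `same_email`, and so on) is a made-up placeholder, and `format_identity_result` is a hypothetical helper rather than part of the codebase.

```python
def format_identity_result(result: dict, entity_name: str,
                           identifier_type: str, value: str) -> str:
    """Render an observe_identity-style result as the text the LLM would see."""
    lines = [
        f"Identity observed: {identifier_type}={value} "
        f"on entity {result['entity_id']} ({entity_name})."
    ]
    if result.get("new_identifier"):
        lines.append(f"  Observation phenomenon: {result['phenomenon_id']}")
    else:
        # Repeated observation of the same identifier is a no-op.
        lines.append("  (identifier already recorded on this entity)")
    for prop in result.get("coref_proposals", []):
        lines.append(
            f"  → Coref candidate: {prop['other_entity_id']} via "
            f"{prop['match']['edge_type']} (conf={prop['confidence']:.2f}, "
            f"hypothesis={prop['hypothesis_id']})"
        )
        for c in prop.get("conflicts", []):
            lines.append(
                f"    ⚠ conflict on {c['type']}: "
                f"{c['new_value']} vs {c['other_value']}"
            )
    return "\n".join(lines)


# Hypothetical graph result: one new identifier, one coref proposal with a conflict.
example_result = {
    "entity_id": "ent-12",
    "new_identifier": True,
    "phenomenon_id": "ph-88",
    "coref_proposals": [
        {
            "other_entity_id": "ent-7",
            "match": {"edge_type": "same_email"},
            "confidence": 0.9,
            "hypothesis_id": "hyp-3",
            "conflicts": [
                {"type": "phone_number", "new_value": "555-0100",
                 "other_value": "555-0199"},
            ],
        }
    ],
}

text = format_identity_result(example_result, "John Doe", "email", "jd@example.com")
print(text)
```

The sketch only covers rendering; grounding checks and the mandatory-record counter live elsewhere and are not modelled here.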
async def _list_assets(self, category: str | None = None) -> str:
results = self.graph.list_assets(category)
if not results:

41
case.example.yaml Normal file

@@ -0,0 +1,41 @@
# MASForensics case definition — template
#
# Copy this file to `case.yaml` and edit it for your case. If `case.yaml`
# exists in the working directory, `python main.py` loads it automatically;
# otherwise main.py falls back to interactive single-image selection.
#
# A case is a set of evidence sources. Each source has:
# id optional — auto-derived from label if omitted ("src-<slug>")
# label human-readable name
# type disk_image | mobile_extraction | archive | media_collection
# access_mode image | tree (optional — defaults by type)
# image = block device / disk image, navigated by Sleuth Kit
# tree = mounted filesystem / unpacked extraction, path-based
# owner optional — the person the source is associated with
# path filesystem path (relative paths resolve against this file)
# partition_offset image-mode only — sector offset of the partition to analyze
# meta optional free-form notes
#
# NOTE: at the current refit stage only image-mode (disk) sources are
# analysable; tree-mode sources are accepted but skipped.
case_id: example-case
name: "Example forensic case"
meta:
notes: "free-form case-level metadata"
sources:
- id: src-suspect-laptop
label: "Suspect laptop disk image"
type: disk_image
access_mode: image
owner: "John Doe"
path: image/suspect_laptop.E01
partition_offset: 0 # run `mmls <image>` to find the right offset
- id: src-suspect-phone
label: "Suspect phone extraction"
type: mobile_extraction
access_mode: tree
owner: "John Doe"
path: image/suspect_phone.zip

226
case.py Normal file

@@ -0,0 +1,226 @@
"""Case and evidence-source model — the foundation for multi-evidence analysis.
A :class:`Case` is a collection of :class:`EvidenceSource` entries. Each source
has a *type* (disk image, mobile extraction, archive, ...) and an *access mode*
that determines how forensic tools reach its contents:
- ``"image"`` — a block device / disk image, navigated by The Sleuth Kit via
inode addressing (raw, E01, dd, ...).
- ``"tree"`` — an already-mounted filesystem or unpacked extraction,
navigated by ordinary filesystem paths.
This module is pure data model + loading. Partition probing and interactive
selection live in ``main.py``.
"""
from __future__ import annotations
import logging
import re
from dataclasses import asdict, dataclass, field
from pathlib import Path
logger = logging.getLogger(__name__)
# Recognised source types and access modes.
SOURCE_TYPES = {"disk_image", "mobile_extraction", "archive", "media_collection"}
ACCESS_MODES = {"image", "tree"}
# Disk-image file extensions for interactive discovery.
# P6 fix: ``.bin`` (and vmdk/vhd) added — extension globbing previously missed
# raw block-device dumps such as ``blk0_sda.bin``.
DISK_IMAGE_EXTS = {
".001", ".dd", ".raw", ".img", ".bin", ".e01", ".iso", ".vmdk", ".vhd",
}
# Default access mode per source type.
_DEFAULT_ACCESS_MODE = {
"disk_image": "image",
"mobile_extraction": "tree",
"archive": "tree",
"media_collection": "tree",
}
def slugify(text: str) -> str:
"""Reduce *text* to a lowercase, hyphen-separated slug for use in IDs."""
slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
return slug or "src"
@dataclass
class EvidenceSource:
"""One piece of evidence within a :class:`Case`."""
id: str # "src-<slug>"
label: str # human-readable name
type: str # one of SOURCE_TYPES
path: str # filesystem path to the evidence
access_mode: str # "image" | "tree"
owner: str = "" # associated person, if known
partition_offset: int = 0 # sector offset (image-mode sources only)
meta: dict = field(default_factory=dict)
def to_dict(self) -> dict:
return asdict(self)
@classmethod
def from_dict(cls, d: dict) -> EvidenceSource:
"""Reconstruct from a dict, ignoring unknown keys (forward-compatible)."""
known = set(cls.__dataclass_fields__)
return cls(**{k: v for k, v in d.items() if k in known})
def summary(self) -> str:
loc = (
f"@{self.partition_offset}"
if self.access_mode == "image" and self.partition_offset
else ""
)
owner = f" owner={self.owner}" if self.owner else ""
return f"[{self.id}] {self.label} ({self.type}/{self.access_mode}{loc}){owner}"
@dataclass
class Case:
"""A forensic case: a set of evidence sources plus metadata."""
case_id: str
name: str
sources: list[EvidenceSource] = field(default_factory=list)
meta: dict = field(default_factory=dict)
def to_dict(self) -> dict:
return {
"case_id": self.case_id,
"name": self.name,
"sources": [s.to_dict() for s in self.sources],
"meta": dict(self.meta),
}
@classmethod
def from_dict(cls, d: dict) -> Case:
return cls(
case_id=d.get("case_id", ""),
name=d.get("name", ""),
sources=[EvidenceSource.from_dict(s) for s in d.get("sources", [])],
meta=d.get("meta", {}),
)
def get_source(self, source_id: str) -> EvidenceSource | None:
for s in self.sources:
if s.id == source_id:
return s
return None
# ---------------------------------------------------------------------------
# case.yaml loading
# ---------------------------------------------------------------------------
def _build_source(raw: dict, base_dir: Path, index: int) -> EvidenceSource:
"""Validate and normalise one source entry from case.yaml.
Missing ``id`` is derived from the label; missing ``access_mode`` defaults
by type; relative paths are resolved against *base_dir* (the case file's
directory).
"""
label = str(raw.get("label") or raw.get("id") or f"source-{index}")
src_type = str(raw.get("type", "disk_image"))
if src_type not in SOURCE_TYPES:
logger.warning("Unknown source type %r for %r — treating as disk_image",
src_type, label)
src_type = "disk_image"
access_mode = str(raw.get("access_mode") or _DEFAULT_ACCESS_MODE.get(src_type, "tree"))
if access_mode not in ACCESS_MODES:
logger.warning("Unknown access_mode %r for %r — defaulting", access_mode, label)
access_mode = _DEFAULT_ACCESS_MODE.get(src_type, "tree")
src_id = str(raw.get("id") or f"src-{slugify(label)}")
if not src_id.startswith("src-"):
src_id = f"src-{slugify(src_id)}"
raw_path = str(raw.get("path", "")).strip()
path = raw_path
if raw_path:
p = Path(raw_path).expanduser()
if not p.is_absolute():
p = (base_dir / p)
path = str(p)
return EvidenceSource(
id=src_id,
label=label,
type=src_type,
path=path,
access_mode=access_mode,
owner=str(raw.get("owner", "")),
partition_offset=int(raw.get("partition_offset", 0) or 0),
meta=dict(raw.get("meta", {})),
)
def build_case(data: dict, base_dir: Path | None = None) -> Case:
"""Build a validated :class:`Case` from a loosely-typed case.yaml dict."""
base_dir = base_dir or Path.cwd()
sources: list[EvidenceSource] = []
seen_ids: set[str] = set()
for i, raw in enumerate(data.get("sources", []) or []):
if not isinstance(raw, dict):
logger.warning("Skipping malformed source entry #%d", i)
continue
src = _build_source(raw, base_dir, i)
if src.id in seen_ids:
src.id = f"{src.id}-{i}"
seen_ids.add(src.id)
if not src.path:
logger.warning("Source %r has no path — keeping but it is not analysable",
src.label)
sources.append(src)
return Case(
case_id=str(data.get("case_id", "case")),
name=str(data.get("name", "Untitled case")),
sources=sources,
meta=dict(data.get("meta", {})),
)
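The ID rules spread across `_build_source` and `build_case` can be condensed into a short standalone sketch. `derive_source_ids` is a hypothetical helper written for illustration only; it keeps just the ID logic (derive from label, enforce the `src-` prefix, suffix duplicates with the entry index) and drops type, path, and access-mode handling.

```python
import re


def slugify(text: str) -> str:
    """Lowercase, hyphen-separated slug (same rule as case.slugify)."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "src"


def derive_source_ids(entries: list[dict]) -> list[str]:
    """Return the source IDs that the loading rules would assign, in order."""
    seen: set[str] = set()
    ids: list[str] = []
    for i, raw in enumerate(entries):
        label = str(raw.get("label") or raw.get("id") or f"source-{i}")
        src_id = str(raw.get("id") or f"src-{slugify(label)}")
        if not src_id.startswith("src-"):
            # Explicit IDs without the prefix are slugified and prefixed.
            src_id = f"src-{slugify(src_id)}"
        if src_id in seen:
            # Duplicate IDs are disambiguated with the entry index.
            src_id = f"{src_id}-{i}"
        seen.add(src_id)
        ids.append(src_id)
    return ids


ids = derive_source_ids([
    {"label": "Suspect laptop"},
    {"id": "Phone 1"},
    {"label": "Suspect laptop"},
])
print(ids)
```

Note that the index suffix only guards against the collision it just detected; a later entry whose explicit ID happens to equal an already suffixed one is not handled by this rule.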
def load_case(path: str | Path = "case.yaml") -> Case | None:
"""Load a :class:`Case` from a case.yaml file. Returns None if absent."""
case_path = Path(path)
if not case_path.exists():
return None
import yaml
try:
data = yaml.safe_load(case_path.read_text()) or {}
except Exception as e:
logger.error("Failed to parse %s: %s", case_path, e)
return None
if not isinstance(data, dict):
logger.error("%s is not a YAML mapping", case_path)
return None
case = build_case(data, base_dir=case_path.resolve().parent)
logger.info("Loaded case %r with %d source(s) from %s",
case.name, len(case.sources), case_path)
return case
def single_source_case(
image_path: str,
partition_offset: int = 0,
label: str | None = None,
) -> Case:
"""Wrap a single disk image as a one-source Case (interactive fallback)."""
name = label or Path(image_path).name
src = EvidenceSource(
id=f"src-{slugify(Path(image_path).stem)}",
label=name,
type="disk_image",
path=image_path,
access_mode="image",
partition_offset=partition_offset,
)
return Case(case_id="adhoc", name=name, sources=[src])

71
config.example.yaml Normal file

@@ -0,0 +1,71 @@
# MASForensics Configuration — template.
#
# Copy this file to `config.yaml` and fill in your API key. config.yaml is
# git-ignored so secrets don't land in commits. The two files share schema;
# only this template is tracked.
agent:
base_url: "https://api.deepseek.com"
api_key: "YOUR-API-KEY-HERE"
model: "deepseek-v4-pro"
max_tokens: 16384
reasoning_effort: "high" # DeepSeek/o1-style reasoning depth; omit to disable
thinking_enabled: true # DeepSeek extra_body.thinking switch
# Maximum rounds of hypothesis-directed investigation (Phase 3).
# Only consulted when strategist.enabled is false (legacy fallback path).
max_investigation_rounds: 1
# Phase 3 strategist loop (DESIGN_STRATEGIST.md). When enabled, the
# InvestigationStrategist agent decides each round whether to propose new
# leads or declare the investigation complete. When disabled, the legacy
# fixed-round investigation loop runs instead.
strategist:
enabled: true
max_rounds: 10
# Safety net: if the strategist keeps proposing leads but yield (new
# phenomena + edges + status flips) is zero for this many consecutive
# rounds, the orchestrator force-stops Phase 3 regardless.
hard_stop_marginal_yield_zero_rounds: 3
# Hard caps that bound the whole run. The strategist's budget_status tool
# reads these to pace its proposals; the orchestrator also enforces them
# as hard stops (DESIGN_STRATEGIST.md §4.2 step 7). Comment out any cap
# to make it unbounded.
budgets:
tool_calls_total: 5000
strategist_rounds_max: 10
wall_clock_minutes_max: 480
# Optional: override the per-edge-type log₁₀(LR) calibration table.
# Confidence updates accumulate these in odds space (additive, order-
# independent), then map back to probability via sigmoid. Single edge
# magnitudes: ≥ +0.602 lifts confidence above the 0.8 supported threshold,
# ≤ -0.602 drops it below the 0.2 refuted threshold.
# If omitted, evidence_graph._DEFAULT_LOG_LR is used.
# hypothesis_log_lr:
# direct_evidence: 2.0
# supports: 1.0
# consequence_observed: 1.0
# prerequisite_met: 0.5
# weakens: -0.5
# contradicts: -2.0
# Optional: manually specify initial hypotheses. If omitted, the
# HypothesisAgent auto-generates them from Phase 1 findings.
# hypotheses:
# - title: "..."
# description: "..."
# Investigation areas — LLM-derived from active hypotheses after Phase 2.
# Each entry below acts as a MANUAL OVERRIDE: it is seeded into the graph
# before the LLM derives areas, so manual entries always survive (slug-based
# dedupe; LLM only augments keyword/tool lists, never overwrites).
#
# investigation_areas:
# - area: shutdown_time
# description: "Last recorded shutdown time"
# agent: registry
# priority: 3
# keywords: [shutdown, last shutdown]
# tools: [get_shutdown_time]
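The `hypothesis_log_lr` comment above describes accumulating log₁₀ likelihood ratios in odds space and mapping back to a probability. A minimal sketch of that arithmetic follows, under two assumptions that the config does not state outright: the prior confidence is 0.5, and "sigmoid" means the base-10 logistic `1 / (1 + 10**-x)`. `confidence_from_log_lr` is a hypothetical name; the real update lives in `evidence_graph` and may differ.

```python
import math


def confidence_from_log_lr(log_lrs: list[float], prior: float = 0.5) -> float:
    """Combine per-edge log10 likelihood ratios into a posterior confidence."""
    prior_log_odds = math.log10(prior / (1.0 - prior))
    total = prior_log_odds + sum(log_lrs)   # additive, so order does not matter
    return 1.0 / (1.0 + 10.0 ** (-total))   # base-10 logistic back to probability


# A single edge of +0.602 (odds 4:1) lands at roughly the 0.8 threshold,
# and -0.602 (odds 1:4) at roughly the 0.2 threshold.
print(round(confidence_from_log_lr([0.602]), 3))
print(round(confidence_from_log_lr([-0.602]), 3))
# One "supports" (+1.0) and one "weakens" (-0.5) edge from the sample table.
print(round(confidence_from_log_lr([1.0, -0.5]), 3))
```

The 0.602 figure is log₁₀(4), which is why it lines up with the 0.8 and 0.2 thresholds when starting from even odds.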

File diff suppressed because it is too large


@@ -1,8 +1,10 @@
"""Custom LLM client using httpx for Claude Messages API via third-party proxy.
"""LLM client via the OpenAI SDK (works with DeepSeek's OpenAI-compatible API).
The proxy does not support Claude's native tool_use format (it strips the `tools`
field from requests). So we embed tool definitions in the system prompt and parse
structured JSON tool calls from the model's text output (ReAct-style).
Tool calling uses the OpenAI-native `tools=[...]` parameter. The model
returns structured tool_calls via the streaming protocol; we accumulate
them, dispatch to our executors, and feed results back as `role: "tool"`
messages. This eliminates the fragile "model writes JSON inside free
text" problem of the previous ReAct text mode.
"""
from __future__ import annotations
@@ -18,6 +20,7 @@ from dataclasses import dataclass, field
from typing import Any
import httpx
from openai import APIConnectionError, APIError, APITimeoutError, AsyncOpenAI
logger = logging.getLogger(__name__)
@@ -30,69 +33,81 @@ class LLMAPIError(Exception):
self.attempts = attempts
# Markers the model uses to signal tool calls and final answers
TOOL_CALL_TAG = "<tool_call>"
TOOL_CALL_END = "</tool_call>"
TOOL_RESULT_TAG = "<tool_result>"
TOOL_RESULT_END = "</tool_result>"
# Optional answer tags — kept for backward compat with prompts that wrap
# their final response in <answer>...</answer>. Native tool calling does
# not need these (no tool_calls = final), but if the model continues to
# emit them, we strip the tags so callers see clean text.
ANSWER_TAG = "<answer>"
ANSWER_END = "</answer>"
def _build_tools_prompt(tools: list[dict]) -> str:
"""Format tool definitions for inclusion in the system prompt."""
lines = ["You have access to the following tools:\n"]
for t in tools:
schema = t.get("input_schema", {})
props = schema.get("properties", {})
required = schema.get("required", [])
def _to_openai_tools(tools: list[dict]) -> list[dict]:
"""Convert internal tool definitions to OpenAI native function-tools format."""
return [
{
"type": "function",
"function": {
"name": t["name"],
"description": t["description"],
"parameters": t.get("input_schema", {"type": "object", "properties": {}}),
},
}
for t in tools
]
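As a quick illustration of the conversion, the sketch below copies `_to_openai_tools` into a standalone function and feeds it one internal definition modelled on `find_extracted_file` plus a schema-less tool. The second tool's name, `ping`, is invented for the example.

```python
def to_openai_tools(tools: list[dict]) -> list[dict]:
    """Standalone copy of _to_openai_tools for demonstration."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t["description"],
                # Tools without a schema get an empty object schema.
                "parameters": t.get("input_schema", {"type": "object", "properties": {}}),
            },
        }
        for t in tools
    ]


internal = [
    {
        "name": "find_extracted_file",
        "description": "Find an already-extracted file by inode or filename.",
        "input_schema": {
            "type": "object",
            "properties": {
                "inode": {"type": "string", "description": "Inode to look up."},
            },
        },
    },
    # Hypothetical tool with no input_schema at all.
    {"name": "ping", "description": "No-argument health check."},
]

converted = to_openai_tools(internal)
print(converted[1]["function"]["parameters"])
```

The internal `input_schema` is passed through unchanged as `parameters`, so any JSON Schema keywords it carries (such as `required` or `enum`) reach the API as written.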
params = []
for pname, pdef in props.items():
req = " (required)" if pname in required else ""
desc = pdef.get("description", "")
ptype = pdef.get("type", "string")
enum_vals = pdef.get("enum")
if enum_vals:
allowed = ", ".join(f'"{v}"' for v in enum_vals)
params.append(f" - {pname}: {ptype}{req}{desc} Allowed values: [{allowed}]")
else:
params.append(f" - {pname}: {ptype}{req}{desc}")
param_block = "\n".join(params) if params else " (no parameters)"
lines.append(f"## {t['name']}\n{t['description']}\nParameters:\n{param_block}\n")
def _extract_first_balanced(text: str, open_char: str, close_char: str) -> str | None:
"""Return the first balanced [...] or {...} substring, or None if no balanced pair.
lines.append(
"## How to use tools\n"
"To call a tool, output a JSON block wrapped in XML tags like this:\n"
f"{TOOL_CALL_TAG}\n"
'{"name": "tool_name", "arguments": {"param1": "value1"}}\n'
f"{TOOL_CALL_END}\n\n"
"You can call multiple tools in sequence. After each tool call, you will receive the result in:\n"
f"{TOOL_RESULT_TAG}\n...result...\n{TOOL_RESULT_END}\n\n"
"When you have finished your analysis and have a final answer, wrap it in:\n"
f"{ANSWER_TAG}\nyour final answer here\n{ANSWER_END}\n\n"
"Think step by step. Call tools to gather evidence before drawing conclusions.\n"
"You MUST call at least one tool before giving your final answer."
Stack-based — handles nested brackets correctly (regex with .*? would
truncate at the first inner closing bracket, regex with .* would over-eat
trailing text). Brackets inside JSON string literals are NOT skipped here,
so a lone bracket inside a string can shift the match; callers pass the
result through json.loads, which rejects any span that is not valid JSON.
"""
start = text.find(open_char)
if start < 0:
return None
depth = 0
for i in range(start, len(text)):
c = text[i]
if c == open_char:
depth += 1
elif c == close_char:
depth -= 1
if depth == 0:
return text[start:i + 1]
return None
def _safe_json_loads(text: str):
"""Parse JSON with progressive sanitization for LLM-produced output.
Tries (0) as-is, (1) escape stray backslashes outside valid JSON escapes
(\\" \\\\ \\/ \\b \\f \\n \\r \\t \\uXXXX). On final failure, logs raw
input (first 600 chars) so we can diagnose what the model emitted.
Used by orchestrator JSON callsites (_call_llm_for_json) and by
tool_call_loop when parsing tool_call arguments returned by the API.
"""
try:
return json.loads(text)
except json.JSONDecodeError:
pass
stage1 = re.sub(
r'\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})',
r'\\\\',
text,
)
return "\n".join(lines)
def _extract_tool_calls(text: str) -> list[dict]:
"""Extract tool call JSON blocks from model output."""
pattern = re.compile(
re.escape(TOOL_CALL_TAG) + r"\s*(.*?)\s*" + re.escape(TOOL_CALL_END),
re.DOTALL,
)
calls = []
for match in pattern.finditer(text):
raw = match.group(1).strip()
try:
parsed = json.loads(raw)
calls.append(parsed)
except json.JSONDecodeError:
logger.warning("Failed to parse tool call JSON: %s", raw[:200])
return calls
try:
return json.loads(stage1)
except json.JSONDecodeError as e:
logger.warning(
"_safe_json_loads failed after sanitize (%s); raw head[:600]=%r",
e, text[:600],
)
raise
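The sanitization stage is easiest to see on a Windows path, a common source of stray backslashes in model output. The sketch below is a standalone copy of the two-stage parse without the logging; the sample payloads are invented.

```python
import json
import re


def safe_json_loads(text: str):
    """Standalone copy of _safe_json_loads (logging omitted)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Double any backslash that does not start a valid JSON escape.
    stage1 = re.sub(
        r'\\(?!["\\/bfnrt]|u[0-9a-fA-F]{4})',
        r'\\\\',
        text,
    )
    return json.loads(stage1)


# "\W" and "\S" are not valid JSON escapes, so the as-is parse fails
# and the sanitized retry succeeds.
fixed = safe_json_loads(r'{"path": "C:\Windows\System32"}')
print(fixed["path"])

# Caveat: "\t" IS a valid escape, so this parses on the first try
# and the value contains a real tab character, not a backslash.
tabbed = safe_json_loads(r'{"path": "C:\temp"}')
print(repr(tabbed["path"]))
```

The second case shows the limit of the approach: sanitization only helps when the first parse fails, so paths whose segments begin with `b`, `f`, `n`, `r`, `t`, or a `uXXXX` pattern are silently misread.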
def _extract_answer(text: str) -> str | None:
@@ -127,6 +142,14 @@ READ_ONLY_TOOLS: set[str] = {
# Parser reads
"read_text_file", "read_binary_preview", "search_text_file",
"read_text_file_section", "list_extracted_dir", "parse_pcap_strings",
"find_files",
# iOS plugin reads (S4)
"parse_plist", "sqlite_tables", "sqlite_query",
"parse_ios_keychain", "read_idevice_info",
# Android + media reads (S6) — set_active_partition is NOT read-only.
"probe_android_partitions", "ocr_image",
# Strategist view tools (DESIGN_STRATEGIST.md §2) — pure renders.
"graph_overview", "source_coverage", "marginal_yield", "budget_status",
}
@@ -234,50 +257,41 @@ _DECAY_TIERS: list[tuple[int, int]] = [
def _apply_progressive_decay(messages: list[dict]) -> list[dict]:
"""Truncate tool results in older messages to save context space.
"""Truncate the `content` of older `role: "tool"` messages to save context.
Operates in-place-style on a copy. Only touches user messages that
contain <tool_result> blocks (these are the tool-result messages
generated by tool_call_loop).
Each `role: "tool"` message in the conversation corresponds to one tool
call's result. We rank these messages by recency and progressively
truncate older ones according to `_DECAY_TIERS`.
"""
# Count rounds from the end. A "round" is a (assistant, user) pair.
# messages alternate: [user, assistant, user, assistant, user, ...]
# The initial user message is index 0, then pairs start at index 1.
total = len(messages)
if total <= 10: # not enough messages to bother
if total <= 10:
return messages
result = []
# Count tool-result user messages from the end
tool_result_indices = [
i for i, m in enumerate(messages)
if m["role"] == "user" and TOOL_RESULT_TAG in m.get("content", "")
tool_msg_indices = [
i for i, m in enumerate(messages) if m.get("role") == "tool"
]
# Build a set of indices that need decay, mapped to their max_chars
decay_map: dict[int, int] = {}
n_tool_msgs = len(tool_result_indices)
for rank, idx in enumerate(reversed(tool_result_indices)):
rounds_ago = rank # 0 = most recent, 1 = second most recent, ...
for rank, idx in enumerate(reversed(tool_msg_indices)):
rounds_ago = rank
for threshold, max_chars in _DECAY_TIERS:
if rounds_ago < threshold:
decay_map[idx] = max_chars
break
result = []
for i, msg in enumerate(messages):
if i in decay_map:
max_chars = decay_map[i]
content = msg["content"]
content = msg.get("content", "") or ""
if len(content) > max_chars + 200:
# Truncate but preserve the tool_result tags structure
truncated = content[:max_chars]
# Count how many tool results are in this message
n_results = content.count(TOOL_RESULT_TAG)
truncated += (
f"\n... [context compressed: {len(content)} -> {max_chars} chars, "
f"{n_results} tool result(s)]"
truncated = (
content[:max_chars]
+ f"\n... [context compressed: {len(content)} -> {max_chars} chars]"
)
result.append({"role": msg["role"], "content": truncated})
new_msg = dict(msg)
new_msg["content"] = truncated
result.append(new_msg)
else:
result.append(msg)
else:
@@ -301,44 +315,51 @@ _FOLD_SUMMARY_SYSTEM = (
class LLMClient:
"""Calls Claude Messages API through a third-party proxy using raw httpx.
"""Async LLM client via the OpenAI SDK.
Uses prompt-based tool calling (ReAct pattern) since the proxy does not
support Claude's native tool_use format.
Works with any OpenAI-compatible endpoint (OpenAI, DeepSeek, ...).
Tool calling is OpenAI-native (structured tool_calls); see module docstring.
"""
def __init__(
self,
base_url: str,
api_key: str,
model: str = "claude-sonnet-4-6",
model: str = "deepseek-v4-pro",
max_tokens: int = 4096,
proxy: str | None = "auto",
reasoning_effort: str | None = None,
thinking_enabled: bool = False,
) -> None:
self.base_url = base_url.rstrip("/")
self.api_key = api_key
self.model = model
self.max_tokens = max_tokens
# proxy="auto": read from env; proxy=None/""/"none": no proxy; proxy="http://...": use it
self.reasoning_effort = reasoning_effort
self.thinking_enabled = thinking_enabled
# proxy="auto": read from env; proxy=None/""/"none": no proxy
if proxy == "auto":
proxy_url = os.environ.get("https_proxy") or os.environ.get("HTTPS_PROXY")
elif proxy and proxy.lower() != "none":
proxy_url = proxy
else:
proxy_url = None
self._client = httpx.AsyncClient(
http_client = (
httpx.AsyncClient(proxy=proxy_url, timeout=300.0)
if proxy_url else None
)
self._client = AsyncOpenAI(
api_key=self.api_key,
base_url=self.base_url,
headers={
"x-api-key": self.api_key,
"anthropic-version": "2023-06-01",
"content-type": "application/json",
},
timeout=300.0,
proxy=proxy_url,
http_client=http_client,
)
async def close(self) -> None:
await self._client.aclose()
await self._client.close()
async def chat(
self,
@@ -346,169 +367,304 @@ class LLMClient:
system: str | None = None,
max_retries: int = 5,
) -> str:
"""Send a streaming chat request and return the assembled text response.
"""Send a streaming chat completion and return the assembled text."""
full_messages: list[dict] = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
Uses SSE streaming to keep the connection alive and avoid gateway
timeouts (504/524) on long-running completions.
"""
import asyncio as _asyncio
payload: dict[str, Any] = {
kwargs: dict[str, Any] = {
"model": self.model,
"messages": full_messages,
"max_tokens": self.max_tokens,
"messages": messages,
"stream": True,
}
if system:
payload["system"] = system
if self.reasoning_effort:
kwargs["reasoning_effort"] = self.reasoning_effort
if self.thinking_enabled:
kwargs["extra_body"] = {"thinking": {"type": "enabled"}}
for attempt in range(max_retries):
logger.debug("LLM request (stream): %d messages (attempt %d)", len(messages), attempt + 1)
logger.debug(
"LLM request (stream): %d messages (attempt %d)",
len(messages), attempt + 1,
)
text_parts: list[str] = []
try:
async with self._client.stream(
"POST", "/v1/messages", json=payload,
) as resp:
# Check for HTTP errors before consuming stream
if resp.status_code >= 400:
body = await resp.aread()
raise httpx.HTTPStatusError(
f"Server error '{resp.status_code}' for url '{resp.url}'",
request=resp.request,
response=resp,
)
# Parse SSE events
async for line in resp.aiter_lines():
if not line.startswith("data: "):
continue
data_str = line[6:] # strip "data: " prefix
if data_str.strip() == "[DONE]":
break
try:
event = json.loads(data_str)
except json.JSONDecodeError:
continue
event_type = event.get("type", "")
if event_type == "content_block_delta":
delta = event.get("delta", {})
if delta.get("type") == "text_delta":
text_parts.append(delta["text"])
elif event_type == "message_stop":
break
elif event_type == "error":
err_msg = event.get("error", {}).get("message", "Unknown streaming error")
raise httpx.HTTPStatusError(
err_msg, request=resp.request, response=resp,
)
stream = await self._client.chat.completions.create(**kwargs)
async for chunk in stream:
if not chunk.choices:
continue
delta = chunk.choices[0].delta
if delta.content:
text_parts.append(delta.content)
text = "".join(text_parts)
logger.debug("LLM response (stream): %d chars", len(text))
return text
except (httpx.HTTPStatusError, httpx.ConnectError, httpx.ReadTimeout, httpx.RemoteProtocolError) as e:
except (APIConnectionError, APITimeoutError, APIError) as e:
if attempt < max_retries - 1:
wait = 2 ** attempt * 10
logger.warning("Request failed (%s), retrying in %ds...", e, wait)
await _asyncio.sleep(wait)
await asyncio.sleep(wait)
else:
raise LLMAPIError(
f"LLM API unreachable after {max_retries} attempts: {e}",
attempts=max_retries,
) from e
# Should not reach here, but just in case
return ""
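For reference, the retry delay `2 ** attempt * 10` with the default `max_retries=5` works out as follows. `backoff_schedule` is a hypothetical helper that only reproduces the arithmetic; the final attempt raises `LLMAPIError` instead of sleeping.

```python
def backoff_schedule(max_retries: int = 5) -> list[int]:
    """Seconds slept after each failed attempt except the last one."""
    # ** binds tighter than *, so this is (2 ** attempt) * 10.
    return [2 ** attempt * 10 for attempt in range(max_retries - 1)]


schedule = backoff_schedule()
print(schedule, sum(schedule))
```

With the defaults, a request that never succeeds spends 150 seconds sleeping between attempts, on top of whatever each attempt itself takes (up to the client timeout).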
async def _chat_with_tools(
self,
messages: list[dict],
openai_tools: list[dict],
max_retries: int = 5,
) -> tuple[str, str | None, list[dict]]:
"""Stream a chat completion with native tool calling enabled.
Returns:
(text_content, reasoning_content, raw_tool_calls).
- reasoning_content is non-None when DeepSeek thinking mode is
active; the caller MUST echo it back in the assistant message
on subsequent requests, or the API returns HTTP 400.
- raw_tool_calls is a list of {"id","name","arguments"} dicts;
arguments is the raw JSON string returned by the API.
"""
kwargs: dict[str, Any] = {
"model": self.model,
"messages": messages,
"max_tokens": self.max_tokens,
"stream": True,
"tools": openai_tools,
}
if self.reasoning_effort:
kwargs["reasoning_effort"] = self.reasoning_effort
if self.thinking_enabled:
kwargs["extra_body"] = {"thinking": {"type": "enabled"}}
for attempt in range(max_retries):
logger.debug(
"LLM request (stream+tools): %d messages, %d tools (attempt %d)",
len(messages), len(openai_tools), attempt + 1,
)
text_parts: list[str] = []
reasoning_parts: list[str] = []
tool_calls_acc: dict[int, dict] = {} # index -> {id, name, arguments}
try:
stream = await self._client.chat.completions.create(**kwargs)
async for chunk in stream:
if not chunk.choices:
continue
delta = chunk.choices[0].delta
if delta.content:
text_parts.append(delta.content)
# DeepSeek thinking-mode: reasoning_content is returned
# alongside content and MUST be echoed back on subsequent
# requests, otherwise the API rejects with HTTP 400.
rc = getattr(delta, "reasoning_content", None)
if rc:
reasoning_parts.append(rc)
if delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index
entry = tool_calls_acc.setdefault(
idx, {"id": None, "name": None, "arguments": ""},
)
if tc_delta.id:
entry["id"] = tc_delta.id
fn = tc_delta.function
if fn:
if fn.name:
entry["name"] = fn.name
if fn.arguments:
entry["arguments"] += fn.arguments
text = "".join(text_parts)
reasoning = "".join(reasoning_parts) or None
ordered = [tool_calls_acc[i] for i in sorted(tool_calls_acc)]
logger.debug(
"LLM response (stream+tools): %d chars, %d reasoning chars, %d tool calls",
len(text), len(reasoning or ""), len(ordered),
)
return text, reasoning, ordered
except (APIConnectionError, APITimeoutError, APIError) as e:
if attempt < max_retries - 1:
wait = 2 ** attempt * 10
logger.warning(
"Tool-call request failed (%s), retrying in %ds...", e, wait,
)
await asyncio.sleep(wait)
else:
raise LLMAPIError(
f"LLM API unreachable after {max_retries} attempts: {e}",
attempts=max_retries,
) from e
return "", None, []
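The accumulation of streamed tool-call fragments can be exercised without a network call. In the sketch below, the deltas are hand-built `SimpleNamespace` stand-ins that only mimic the attributes the loop reads (`index`, `id`, `function.name`, `function.arguments`); they are not real SDK objects, and `accumulate_tool_calls` is a hypothetical extraction of the inner loop.

```python
import json
from types import SimpleNamespace as NS


def accumulate_tool_calls(deltas) -> list[dict]:
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    acc: dict[int, dict] = {}
    for d in deltas:
        entry = acc.setdefault(d.index, {"id": None, "name": None, "arguments": ""})
        if d.id:
            entry["id"] = d.id
        fn = d.function
        if fn:
            if fn.name:
                entry["name"] = fn.name
            if fn.arguments:
                # Argument JSON arrives in pieces and is concatenated as text.
                entry["arguments"] += fn.arguments
    return [acc[i] for i in sorted(acc)]


# Two calls whose fragments arrive interleaved; IDs and payloads are made up.
stream = [
    NS(index=0, id="call_a", function=NS(name="read_text_file", arguments='{"pa')),
    NS(index=1, id="call_b", function=NS(name="list_extracted_dir", arguments="")),
    NS(index=0, id=None, function=NS(name=None, arguments='th": "a.txt"}')),
    NS(index=1, id=None, function=NS(name=None, arguments='{"dir": "/tmp"}')),
]

calls = accumulate_tool_calls(stream)
for call in calls:
    print(call["id"], call["name"], json.loads(call["arguments"]))
```

Keying on `index` rather than `id` matters because, in this pattern, only the first fragment of each call carries the `id` and `name`; later fragments are matched purely by position.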
async def tool_call_loop(
self,
messages: list[dict],
tools: list[dict],
tool_executor: dict[str, Any],
system: str | None = None,
max_iterations: int = 40,
max_iterations: int = 60,
terminal_tools: tuple[str, ...] = (),
) -> tuple[str, list[dict]]:
"""Run a ReAct-style tool-calling loop.
"""Run a tool-calling loop using OpenAI-native tool calls.
The model outputs <tool_call> blocks which we parse and execute,
feeding results back as <tool_result> blocks until the model
outputs an <answer> block.
The model returns structured `tool_calls` in its message; we
dispatch them through our executor dict and feed each result back
as a `role: "tool"` message with the matching `tool_call_id`. The
loop ends when:
- the model returns a message with no tool_calls (normal exit), or
- any tool in `terminal_tools` is called — in that case, the loop
short-circuits with that tool's result text as final_text. This
gives agents (notably ReportAgent) an explicit completion signal
that the old `<answer>` text tag used to provide.
Returns:
(final_text, all_messages)
(final_text, full_message_history)
"""
# Build system prompt with tool definitions
tools_prompt = _build_tools_prompt(tools)
full_system = f"{system}\n\n{tools_prompt}" if system else tools_prompt
terminal_set = set(terminal_tools)
openai_tools = _to_openai_tools(tools)
messages = list(messages) # don't mutate caller's list
_folded = False # Track whether we've already folded once this loop
# The caller may pass `messages` either as raw conversation (no system)
# together with `system=...`, OR as a complete history that already
# starts with the system message (retry path). Accept both shapes.
if messages and messages[0].get("role") == "system":
full_messages: list[dict] = list(messages)
else:
full_messages = []
if system:
full_messages.append({"role": "system", "content": system})
full_messages.extend(messages)
_folded = False
for i in range(max_iterations):
for _i in range(max_iterations):
# ── Context compression before each API call ──────────────
# Stage A: progressively decay old tool results
messages = _apply_progressive_decay(messages)
# Stage B: fold oldest messages into LLM summary if too long
if not _folded and len(messages) > _FOLD_THRESHOLD:
messages = await self._fold_old_messages(messages, full_system)
full_messages = _apply_progressive_decay(full_messages)
if not _folded and len(full_messages) > _FOLD_THRESHOLD:
full_messages = await self._fold_old_messages(full_messages)
_folded = True
elif _folded and len(messages) > _FOLD_THRESHOLD + _FOLD_KEEP_RECENT:
# Allow a second fold if messages grew back significantly
messages = await self._fold_old_messages(messages, full_system)
elif _folded and len(full_messages) > _FOLD_THRESHOLD + _FOLD_KEEP_RECENT:
full_messages = await self._fold_old_messages(full_messages)
text = await self.chat(messages, system=full_system)
text, reasoning, raw_tool_calls = await self._chat_with_tools(
full_messages, openai_tools,
)
# Check for final answer
answer = _extract_answer(text)
if answer is not None:
messages.append({"role": "assistant", "content": text})
return answer, messages
if not raw_tool_calls:
# Model produced a final response. Strip optional <answer>
# tags for backward compatibility with old prompts.
final_msg: dict[str, Any] = {"role": "assistant", "content": text}
if reasoning:
final_msg["reasoning_content"] = reasoning
full_messages.append(final_msg)
answer = _extract_answer(text)
return (answer if answer is not None else text), full_messages
# Check for tool calls
tool_calls = _extract_tool_calls(text)
# Parse arguments + build internal call dicts
parsed_calls: list[dict] = []
for rc in raw_tool_calls:
args_str = rc.get("arguments", "") or ""
try:
args = _safe_json_loads(args_str) if args_str.strip() else {}
except (json.JSONDecodeError, ValueError) as e:
logger.warning(
"Failed to parse arguments for tool %s: %s",
rc.get("name"), e,
)
args = {}
parsed_calls.append({
"id": rc.get("id"),
"name": rc.get("name", ""),
"arguments": args,
})
if not tool_calls:
# No tool calls and no answer tag — treat entire text as answer
messages.append({"role": "assistant", "content": text})
return text, messages
# Append the assistant turn with the raw tool_calls (and the
# DeepSeek-mandated reasoning_content echo-back), then execute.
asst_msg: dict[str, Any] = {
"role": "assistant",
"content": text or None,
"tool_calls": [
{
"id": rc.get("id"),
"type": "function",
"function": {
"name": rc.get("name", ""),
"arguments": rc.get("arguments", "") or "",
},
}
for rc in raw_tool_calls
],
}
if reasoning:
asst_msg["reasoning_content"] = reasoning
full_messages.append(asst_msg)
# Execute tool calls — read-only tools run in parallel
messages.append({"role": "assistant", "content": text})
result_parts = []
batches = _partition_tool_calls(tool_calls)
batches = _partition_tool_calls(parsed_calls)
t_batch_start = time.monotonic()
# Each entry: (tool_call_dict, raw_result, formatted_for_llm)
executed: list[tuple[dict, str, str]] = []
for batch in batches:
if batch.is_read_only and len(batch.calls) > 1:
batch_results = await self._execute_tool_batch_parallel(
results = await self._execute_tool_batch_parallel(
batch.calls, tool_executor, tools,
)
result_parts.extend(batch_results)
for tc, (raw, formatted) in zip(batch.calls, results):
executed.append((tc, raw, formatted))
else:
for tc in batch.calls:
result_parts.append(
await self._execute_single_tool(tc, tool_executor, tools)
raw, formatted = await self._execute_single_tool(
tc, tool_executor, tools,
)
executed.append((tc, raw, formatted))
# Emit folded tool-call summary for the terminal
t_batch_elapsed = time.monotonic() - t_batch_start
_emit_tool_call_summary(tool_calls, t_batch_elapsed)
_emit_tool_call_summary(parsed_calls, t_batch_elapsed)
# Feed results back as a user message
result_message = "\n\n".join(result_parts)
messages.append({"role": "user", "content": result_message})
# Append formatted tool results to the conversation (this is
# what the LLM sees on subsequent rounds — truncated for context
# economy).
for tc, _raw, formatted in executed:
full_messages.append({
"role": "tool",
"tool_call_id": tc["id"],
"content": formatted,
})
# Terminal-tool short-circuit: if the model called any tool in
# `terminal_tools`, end the loop immediately. The terminal tool's
# RAW result (untruncated) becomes final_text — the LLM may have
# produced a 20K-char report via save_report and we must not
# truncate it just because the LLM-facing copy is truncated.
if terminal_set:
for tc, raw, _formatted in executed:
name = tc.get("name", "")
if name in terminal_set:
logger.info(
"Terminal tool %s called — exiting tool_call_loop", name,
)
return raw, full_messages
logger.warning("Tool call loop hit max iterations (%d)", max_iterations)
return "[Max tool call iterations reached]", messages
return "[Max tool call iterations reached]", full_messages
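The terminal-tool short-circuit above depends on keeping two copies of every tool result: the raw executor output and the truncated copy shown to the model. A minimal sketch of that selection step follows, assuming a hypothetical `truncate` stand-in for `_truncate_tool_result` and a hypothetical helper name `pick_final`; neither is part of this diff.

```python
def truncate(text: str, limit: int = 40) -> str:
    """Hypothetical stand-in for _truncate_tool_result (real limit not shown in the diff)."""
    return text if len(text) <= limit else text[:limit] + "...[truncated]"


def pick_final(executed: list[tuple[dict, str, str]], terminal_tools: list[str]):
    """Return the RAW result of the first terminal tool that was called, else None.

    `executed` mirrors the (tool_call_dict, raw_result, formatted_for_llm)
    tuples built in the loop above.
    """
    terminal_set = set(terminal_tools)
    for call, raw, _formatted in executed:
        if call.get("name", "") in terminal_set:
            return raw
    return None


report = "x" * 100
executed = [
    ({"name": "read_file"}, "abc", "[read_file] " + truncate("abc")),
    ({"name": "save_report"}, report, "[save_report] " + truncate(report)),
]
final_text = pick_final(executed, ["save_report"])
```

Here `final_text` is the full 100-character report even though the model-facing copy was cut to 40 characters, which is the property the comment above asks for.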
async def _execute_single_tool(
self, tc: dict, tool_executor: dict[str, Any],
tools: list[dict] | None = None,
) -> str:
"""Execute a single tool call and return the formatted result."""
) -> tuple[str, str]:
"""Execute a single tool call.
Returns (raw_result, formatted_for_llm). `raw_result` is the
unmodified executor return (used by terminal-tool short-circuit as
final_text). `formatted_for_llm` is `[tool_name] {truncated}` and
is what gets fed back to the model as the tool message content.
"""
tool_name = tc.get("name", "")
tool_args = tc.get("arguments", {})
@@ -519,72 +675,106 @@ class LLMClient:
executor = tool_executor.get(tool_name)
if executor is None:
result_text = f"Error: unknown tool '{tool_name}'"
raw = f"Error: unknown tool '{tool_name}'"
else:
try:
result_text = await executor(**tool_args)
raw = await executor(**tool_args)
except Exception as e:
logger.error("Tool %s failed: %s", tool_name, e)
result_text = f"Error executing {tool_name}: {e}"
raw = f"Error executing {tool_name}: {e}"
return (
f"{TOOL_RESULT_TAG}\n"
f"[{tool_name}] {_truncate_tool_result(result_text)}\n"
f"{TOOL_RESULT_END}"
)
formatted = f"[{tool_name}] {_truncate_tool_result(raw)}"
return raw, formatted
async def _execute_tool_batch_parallel(
self, calls: list[dict], tool_executor: dict[str, Any],
tools: list[dict] | None = None,
) -> list[str]:
"""Execute multiple read-only tool calls concurrently."""
) -> list[tuple[str, str]]:
"""Execute multiple read-only tool calls concurrently.
Returns a list of (raw_result, formatted_for_llm) tuples in the
same order as `calls`.
"""
logger.info("Executing %d read-only tools in parallel", len(calls))
async def _run_one(tc: dict) -> str:
async def _run_one(tc: dict) -> tuple[str, str]:
tool_name = tc.get("name", "")
tool_args = tc.get("arguments", {})
if tools:
tool_args = _fix_tool_args(tool_name, tool_args, tools)
logger.info("Calling tool (parallel): %s(%s)", tool_name, json.dumps(tool_args, ensure_ascii=False))
logger.info(
"Calling tool (parallel): %s(%s)",
tool_name, json.dumps(tool_args, ensure_ascii=False),
)
executor = tool_executor.get(tool_name)
if executor is None:
result_text = f"Error: unknown tool '{tool_name}'"
raw = f"Error: unknown tool '{tool_name}'"
else:
try:
result_text = await executor(**tool_args)
raw = await executor(**tool_args)
except Exception as e:
logger.error("Tool %s failed: %s", tool_name, e)
result_text = f"Error executing {tool_name}: {e}"
return (
f"{TOOL_RESULT_TAG}\n"
f"[{tool_name}] {_truncate_tool_result(result_text)}\n"
f"{TOOL_RESULT_END}"
)
raw = f"Error executing {tool_name}: {e}"
formatted = f"[{tool_name}] {_truncate_tool_result(raw)}"
return raw, formatted
results = await asyncio.gather(*[_run_one(tc) for tc in calls])
return list(results)
async def _fold_old_messages(
self, messages: list[dict], system: str,
self, messages: list[dict],
) -> list[dict]:
"""Fold old messages into an LLM-generated summary (Stage B).
Keeps the most recent _FOLD_KEEP_RECENT messages intact and
replaces earlier ones with a single summary message.
Preserves the leading system message (if any), keeps the most
recent _FOLD_KEEP_RECENT messages intact, and replaces the older
middle slice with a single summary user message.
"""
n_to_fold = len(messages) - _FOLD_KEEP_RECENT
# Pin the system message — it must NEVER be summarized away.
system_msgs: list[dict] = []
body = messages
if messages and messages[0].get("role") == "system":
system_msgs = [messages[0]]
body = messages[1:]
n_to_fold = len(body) - _FOLD_KEEP_RECENT
if n_to_fold <= 2:
return messages
old_messages = messages[:n_to_fold]
recent_messages = messages[n_to_fold:]
# Pull the fold boundary forward so we never split an assistant turn
# from its matching tool results. The API rejects (HTTP 400) any
# `role: "tool"` message that does not immediately follow an
# `assistant` message with `tool_calls`. We advance the boundary while
# the head of `recent_messages` is a `role: "tool"` message, so orphaned
# tool results are folded together with the assistant turn that
# produced them.
while n_to_fold < len(body):
head = body[n_to_fold]
if head.get("role") == "tool":
n_to_fold += 1
continue
break
if n_to_fold >= len(body):
# Everything got folded — nothing recent to keep.
return system_msgs + [body[0]] if system_msgs else messages
old_messages = body[:n_to_fold]
recent_messages = body[n_to_fold:]
# Build a text dump of old messages for summarization
old_text_parts = []
for msg in old_messages:
role = msg["role"]
content = msg.get("content", "")
# Truncate each message for the summary prompt to avoid overload
role = msg.get("role", "?")
content = msg.get("content") or ""
# Render tool_calls (assistant turn) compactly.
if role == "assistant" and msg.get("tool_calls"):
tc_names = [
tc.get("function", {}).get("name", "?")
for tc in msg["tool_calls"]
]
content = (content + " " if content else "") + (
"called: " + ", ".join(tc_names)
)
if len(content) > 1000:
content = content[:1000] + "..."
old_text_parts.append(f"[{role}]: {content}")
@@ -608,7 +798,6 @@ class LLMClient:
logger.warning("Context folding failed: %s — keeping original messages", e)
return messages
# Replace old messages with a single summary
summary_message = {
"role": "user",
"content": (
@@ -616,4 +805,4 @@ class LLMClient:
f"messages in this conversation]\n\n{summary}"
),
}
return [summary_message] + recent_messages
return system_msgs + [summary_message] + recent_messages
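The fold-boundary logic in `_fold_old_messages` can be illustrated in isolation. The sketch below is not the project's code: `split_for_fold` is a hypothetical helper that reproduces only the two rules visible in the diff (pin a leading system message; never start the kept tail with a `role: "tool"` message) and leaves out the LLM summarisation step.

```python
def split_for_fold(messages: list[dict], keep_recent: int):
    """Split into (system_msgs, old, recent) without orphaning tool results."""
    system_msgs: list[dict] = []
    body = list(messages)
    # Rule 1: the system message is pinned and never summarised.
    if body and body[0].get("role") == "system":
        system_msgs, body = [body[0]], body[1:]

    n_to_fold = len(body) - keep_recent
    if n_to_fold <= 0:
        return system_msgs, [], body

    # Rule 2: a tool message must follow its assistant turn, so push the
    # boundary forward until the kept tail no longer starts with one.
    while n_to_fold < len(body) and body[n_to_fold].get("role") == "tool":
        n_to_fold += 1
    return system_msgs, body[:n_to_fold], body[n_to_fold:]


history = [
    {"role": "system", "content": "rules"},
    {"role": "user", "content": "start"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "a"}, {"id": "b"}]},
    {"role": "tool", "tool_call_id": "a", "content": "r1"},
    {"role": "tool", "tool_call_id": "b", "content": "r2"},
    {"role": "assistant", "content": "thinking"},
    {"role": "user", "content": "continue"},
]
system_msgs, old, recent = split_for_fold(history, keep_recent=3)
```

With `keep_recent=3` the naive boundary would land on the second tool result; the loop moves it one step so `recent` starts at the following assistant turn.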

163
main.py

@@ -15,17 +15,21 @@ from pathlib import Path
import yaml
from agent_factory import AgentFactory
from case import (
DISK_IMAGE_EXTS, Case, EvidenceSource, load_case, single_source_case,
)
from evidence_graph import EvidenceGraph
from llm_client import LLMClient
from log_config import setup_logging
from orchestrator import AnalysisAborted, Orchestrator
from tool_registry import register_all_tools
from tools.archive import unzip_archive_sync
RUNS_DIR = Path("runs")
IMAGE_DIR = Path("image")
# Common forensic image extensions (only first segment / single-file formats)
_IMAGE_GLOBS = ["*.001", "*.dd", "*.raw", "*.img", "*.E01", "*.iso"]
# Persistent unpack cache for tree-mode sources (zip extractions). Lives
# at project root so multiple runs can reuse the same unpacked tree.
SOURCE_CACHE_DIR = Path(".cache/sources")
def load_config(path: str = "config.yaml") -> dict:
@@ -38,11 +42,13 @@ def load_config(path: str = "config.yaml") -> dict:
# ---------------------------------------------------------------------------
def _discover_images(search_dir: Path = IMAGE_DIR) -> list[Path]:
"""Find forensic disk image files under *search_dir*."""
images: set[Path] = set()
for glob in _IMAGE_GLOBS:
images.update(search_dir.glob(glob))
return sorted(images)
"""Find forensic disk image files under *search_dir* (case-insensitive ext)."""
if not search_dir.is_dir():
return []
return sorted(
p for p in search_dir.iterdir()
if p.is_file() and p.suffix.lower() in DISK_IMAGE_EXTS
)
def _parse_mmls(output: str) -> list[dict]:
@@ -110,7 +116,7 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
images = _discover_images(image_dir)
if not images:
print(f"No disk images found in {image_dir}/")
print("Supported formats: " + ", ".join(_IMAGE_GLOBS))
print("Supported extensions: " + ", ".join(sorted(DISK_IMAGE_EXTS)))
sys.exit(1)
if len(images) == 1:
@@ -153,6 +159,118 @@ def select_image_interactive(image_dir: Path | None = None) -> tuple[str, int]:
print("Invalid choice.")
def resolve_case() -> Case:
"""Resolve the Case to analyze.
Priority: an explicit case file given as a CLI argument, then ./case.yaml
in the working directory, then legacy interactive single-image selection.
"""
# 1. Explicit case file passed on the command line
if len(sys.argv) > 1 and sys.argv[1].lower().endswith((".yaml", ".yml")):
case = load_case(sys.argv[1])
if case is None:
print(f"Error: could not load case file {sys.argv[1]}")
sys.exit(1)
print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
return case
# 2. ./case.yaml in the working directory
case = load_case()
if case is not None:
print(f"Loaded case: {case.name} ({len(case.sources)} sources)")
return case
# 3. Legacy interactive single-image selection
cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
image_path, partition_offset = select_image_interactive(cli_dir)
return single_source_case(image_path, partition_offset)
def _is_analysable(src: EvidenceSource) -> bool:
"""A source is analysable when it has a path AND its mode has tooling.
S4 lights up tree-mode iOS extractions; image-mode disks were already
supported. Media-collection sources (screenshots) remain skipped until S6.
"""
if not src.path:
return False
if src.access_mode == "image":
return True
if src.access_mode == "tree" and src.type in ("mobile_extraction", "archive"):
return True
return False
def list_analysable_sources(case: Case) -> list[EvidenceSource]:
"""Return every analysable source in the case (orchestrator iterates them).
Pre-S6 main.py used to force-choose one source here; the multi-source
orchestrator (Phase 1 per-source triage) now consumes the full list.
Skipped sources are still reported for visibility.
"""
analysable = [s for s in case.sources if _is_analysable(s)]
skipped = [s for s in case.sources if not _is_analysable(s)]
if skipped:
print(
f"Note: {len(skipped)} source(s) not analysable in this build: "
+ ", ".join(f"{s.label} ({s.type})" for s in skipped)
)
if not analysable:
print("No analysable sources in this case.")
sys.exit(1)
print(f"Analysing {len(analysable)} source(s) — orchestrator will triage each in Phase 1:")
for s in analysable:
print(f" - {s.summary()}")
return analysable
def prepare_source(src: EvidenceSource) -> EvidenceSource:
"""Materialise a tree-mode source for analysis.
Mobile / archive sources arrive as .zip files. We unpack once into a
project-level cache (``.cache/sources/<src.id>/``) and rewrite
``src.path`` to point at the unpacked directory. Idempotent — a
second run with the cache present is a no-op (unzip_archive_sync
skips files that already exist with the matching size).
Disk-image and already-tree sources pass through unchanged.
"""
if src.access_mode != "tree":
return src
p = Path(src.path)
if p.is_dir():
return src # already a directory, nothing to do
if not p.is_file():
print(f"Warning: source path {src.path} does not exist; leaving as-is.")
return src
if p.suffix.lower() != ".zip":
# Other archive types (tar, 7z, ...) — not handled yet.
print(f"Warning: tree-mode source {src.id} is not a .zip "
f"({p.suffix}); leaving as-is.")
return src
dest = SOURCE_CACHE_DIR / src.id
dest.mkdir(parents=True, exist_ok=True)
# Password-protected zips (e.g. CTF artefacts) carry their key in
# case.yaml's meta.password — never logged, never persisted.
password = (src.meta or {}).get("password")
pw_note = " (password from meta)" if password else ""
print(f"Unpacking {p.name} -> {dest}{pw_note} (idempotent) ...")
result = unzip_archive_sync(str(p), str(dest), password=password)
first_line = result.split("\n", 1)[0]
print(" " + first_line)
if first_line.startswith("Error:"):
# Surface the multi-line guidance from _do_extract verbatim.
for extra in result.split("\n")[1:]:
print(" " + extra)
print(f" Source {src.id} stays unanalysable until this is resolved.")
# Leave src.path unchanged so the source remains marked unanalysable.
return src
src.path = str(dest)
src.access_mode = "tree"
return src
def find_resumable_run() -> Path | None:
"""Find the most recent incomplete run with a saved graph state."""
if not RUNS_DIR.exists():
@@ -219,25 +337,36 @@ async def async_main() -> None:
model=agent_cfg["model"],
max_tokens=agent_cfg.get("max_tokens", 4096),
proxy=agent_cfg.get("proxy", "auto"),
reasoning_effort=agent_cfg.get("reasoning_effort"),
thinking_enabled=agent_cfg.get("thinking_enabled", False),
)
# Initialize evidence graph
if graph is None:
# CLI arg takes priority, otherwise interactive prompt
cli_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else None
image_path, partition_offset = select_image_interactive(cli_dir)
case = resolve_case()
# case_info derived from THIS case's meta (case.yaml), not from
# config.yaml's legacy `cfreds_hacking_case` block. Without this,
# the old CFReDS evidence MD5s would be embedded in reports for
# every subsequent unrelated case.
graph = EvidenceGraph(
case_info=config.get("cfreds_hacking_case", {}),
case_info=dict(case.meta or {}),
persist_path=run_dir / "graph_state.json",
edge_log_lr=config.get("hypothesis_log_lr"),
)
graph.image_path = image_path
graph.partition_offset = partition_offset
graph.case = case
graph.extracted_dir = str(run_dir / "extracted")
analysable = list_analysable_sources(case)
# Prepare every analysable source up front (unzip tree-mode zips,
# etc.). Idempotent on cache hits — second run is a no-op.
prepared = [prepare_source(s) for s in analysable]
# Seed the active source so tools that resolve lazily have a target
# before Phase 1 begins; the orchestrator resets it per source.
graph.set_active_source(prepared[0])
else:
graph._persist_path = run_dir / "graph_state.json"
# Register all tools with bound image path
register_all_tools(graph.image_path, graph.partition_offset, graph, graph.extracted_dir)
# Register all tools — they resolve the active evidence source at call time
register_all_tools(graph)
# Create agent factory
factory = AgentFactory(llm, graph)

File diff suppressed because it is too large


@@ -5,6 +5,9 @@ description = "Multi-Agent System for Digital Forensics"
requires-python = ">=3.14"
dependencies = [
"httpx[socks]>=0.28.1",
"openai>=2.36.0",
"pillow>=12.2.0",
"pytesseract>=0.3.13",
"pyyaml",
"regipy>=6.2.1",
]


@@ -13,8 +13,16 @@ from tool_registry import register_all_tools
async def main() -> None:
# Find the run to regenerate from
run_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("runs/2026-04-02T15-11-25")
# Find the run: CLI arg, or latest run with a graph_state.json
if len(sys.argv) > 1:
run_dir = Path(sys.argv[1])
else:
states = sorted(Path("runs").glob("*/graph_state.json"), reverse=True)
if not states:
print("No runs found in runs/")
return
run_dir = states[0].parent
print(f"Using latest run: {run_dir.name}")
state_path = run_dir / "graph_state.json"
if not state_path.exists():
@@ -24,8 +32,11 @@ async def main() -> None:
config = yaml.safe_load(open("config.yaml"))
agent_cfg = config["agent"]
# Load graph
graph = EvidenceGraph.load_state(state_path)
# Load graph (edge_log_lr from config — applied to the loaded graph)
graph = EvidenceGraph.load_state(
state_path,
edge_log_lr=config.get("hypothesis_log_lr"),
)
print(f"Loaded: {graph.stats_summary()}")
# LLM client with larger max_tokens for report
@@ -34,9 +45,11 @@ async def main() -> None:
api_key=agent_cfg["api_key"],
model=agent_cfg["model"],
max_tokens=16384,
reasoning_effort=agent_cfg.get("reasoning_effort"),
thinking_enabled=agent_cfg.get("thinking_enabled", False),
)
register_all_tools(graph.image_path, graph.partition_offset, graph)
register_all_tools(graph)
factory = AgentFactory(llm, graph)
# Run only the report agent

3677
tests/test_optimizations.py Normal file

File diff suppressed because it is too large

File diff suppressed because it is too large

156
tools/archive.py Normal file

@@ -0,0 +1,156 @@
"""Archive extraction tools — generic unzip for tree-mode evidence sources.
Mobile extractions (iOS / Android backups), archive sources, and shared
work products all arrive as .zip files. The forensic agents work on the
unpacked tree; this module is the single entry point for safely turning
an archive into a directory.
Stdlib-only. No graph dependency.
"""
from __future__ import annotations
import logging
import os
import zipfile
from pathlib import Path
logger = logging.getLogger(__name__)
def _is_within(base: Path, target: Path) -> bool:
"""True when *target* resolves to a path inside *base* — symlink-safe."""
try:
base_r = base.resolve()
target_r = target.resolve()
except OSError:
return False
try:
target_r.relative_to(base_r)
except ValueError:
return False
return True
def _is_zip_encrypted(zf: zipfile.ZipFile) -> bool:
"""True when any entry has the zip 'encrypted' flag bit set."""
return any(info.flag_bits & 0x1 for info in zf.infolist())
def _do_extract(
zip_path: str,
dest_dir: str,
password: str | None = None,
) -> str:
"""Shared core for unzip_archive (async) and unzip_archive_sync.
Pure stdlib + filesystem I/O — no asyncio. Idempotent on rerun (files
whose target already exists at the matching size are skipped). Returns
a multi-line summary the agent can read directly.
"""
zp = Path(zip_path)
if not zp.is_file():
return f"Error: {zip_path} is not a file."
dest = Path(dest_dir)
dest.mkdir(parents=True, exist_ok=True)
extracted = 0
skipped: list[str] = []
total_bytes = 0
pwd_bytes = password.encode("utf-8") if password else None
try:
with zipfile.ZipFile(zp, "r") as zf:
encrypted = _is_zip_encrypted(zf)
if encrypted and pwd_bytes is None:
return (
f"Error: {zip_path} is password-protected. "
f"Provide the password via case.yaml's "
f"meta.password on this source, or pass `password=` "
f"explicitly. Stdlib zipfile only supports the legacy "
f"ZipCrypto algorithm — AES-encrypted zips (created by "
f"7-Zip / WinZip) need an external tool like 7z."
)
for info in zf.infolist():
name = info.filename
# Block absolute paths and parent-escape attempts up front.
if name.startswith(("/", "\\")) or ".." in Path(name).parts:
skipped.append(f"escape: {name}")
continue
target = dest / name
if not _is_within(dest, target):
skipped.append(f"escape: {name}")
continue
# Symlink entries — skip rather than risk traversing out.
if ((info.external_attr >> 16) & 0o120000) == 0o120000:
skipped.append(f"symlink: {name}")
continue
if info.is_dir():
target.mkdir(parents=True, exist_ok=True)
continue
# Skip if already extracted with matching size (idempotent rerun).
if target.exists() and target.stat().st_size == info.file_size:
continue
target.parent.mkdir(parents=True, exist_ok=True)
try:
with zf.open(info, "r", pwd=pwd_bytes) as src, open(target, "wb") as out:
while True:
chunk = src.read(65536)
if not chunk:
break
out.write(chunk)
except RuntimeError as e:
# zipfile raises RuntimeError for bad-password / AES-encrypted.
msg = str(e)
if "Bad password" in msg or "password required" in msg:
return (
f"Error: bad or missing password for {zip_path}. "
f"If the zip is AES-encrypted (7-Zip/WinZip), stdlib "
f"cannot decrypt it — use `7z x -p<pwd> ...` "
f"externally and point the source path at the result."
)
raise
extracted += 1
total_bytes += info.file_size
except zipfile.BadZipFile as e:
return f"Error: {zip_path} is not a valid zip archive: {e}"
except Exception as e:
return f"Error extracting {zip_path}: {e}"
parts = [
f"Extracted {extracted} file(s), {total_bytes} bytes, into {dest}",
]
if skipped:
parts.append(f"Skipped {len(skipped)} unsafe entries:")
for s in skipped[:10]:
parts.append(f" - {s}")
if len(skipped) > 10:
parts.append(f" ... ({len(skipped) - 10} more)")
return "\n".join(parts)
async def unzip_archive(
zip_path: str, dest_dir: str, password: str | None = None,
) -> str:
"""Extract *zip_path* into *dest_dir*. Idempotent on rerun.
Defensive: rejects entries with absolute paths, any '..' path component, or that
would resolve outside *dest_dir* (the classic zip-slip vector). Symlink
entries are skipped (we never follow symlinks into the host filesystem).
Password-protected zips need the password argument (or
``meta.password`` on the source in case.yaml) — stdlib ``zipfile``
only handles the legacy ZipCrypto algorithm.
"""
return _do_extract(zip_path, dest_dir, password)
def unzip_archive_sync(
zip_path: str, dest_dir: str, password: str | None = None,
) -> str:
"""Synchronous variant of :func:`unzip_archive` for startup-time prepare_source.
Same behaviour, just no async wrapping — used before the event loop
starts so we don't have to spin one up just to unpack a zip.
"""
return _do_extract(zip_path, dest_dir, password)
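The entry checks in `_do_extract` (zip-slip and symlink rejection) can be exercised without touching a real archive. The sketch below is an illustration under assumptions: `unsafe_reason` and `is_symlink_entry` are hypothetical names, and the symlink test uses `stat.S_ISLNK` on the Unix mode stored in the upper 16 bits of `external_attr` rather than the bit mask used in the diff.

```python
import stat
import zipfile
from pathlib import Path
from typing import Optional


def unsafe_reason(dest: Path, name: str) -> Optional[str]:
    """Return 'escape' when an entry name would land outside dest, else None."""
    # Cheap lexical checks first: absolute paths and any '..' component.
    if name.startswith(("/", "\\")) or ".." in Path(name).parts:
        return "escape"
    # Then confirm the resolved target is still inside the resolved dest.
    try:
        (dest / name).resolve().relative_to(dest.resolve())
    except ValueError:
        return "escape"
    return None


def is_symlink_entry(info: zipfile.ZipInfo) -> bool:
    """True when the entry's stored Unix mode marks it as a symbolic link."""
    return stat.S_ISLNK(info.external_attr >> 16)


dest = Path("extract_dest")

link = zipfile.ZipInfo("link")
link.external_attr = (stat.S_IFLNK | 0o777) << 16

regular = zipfile.ZipInfo("file.txt")
regular.external_attr = (stat.S_IFREG | 0o644) << 16
```

Checking the name lexically and then again after `resolve()` is deliberately redundant: the first catches the common attack cheaply, the second covers paths that only escape once symlinks already present under `dest` are followed.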

87
tools/media.py Normal file

@@ -0,0 +1,87 @@
"""Media plugin — OCR for image evidence.
DESIGN.md §4.7: the model backend (DeepSeek) has no vision, so we MUST run
OCR locally for any image-bearing evidence. Tesseract via pytesseract is
the default; if the runtime is missing those packages, the tool returns a
clear install hint rather than failing silently.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
logger = logging.getLogger(__name__)
MAX_OUTPUT = 8000
_INSTALL_HINT = (
"Error: OCR runtime not available. Install with:\n"
" pip install pytesseract pillow\n"
" sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra\n"
"(or the equivalent for your distribution). Then retry."
)
def _has_ocr_runtime() -> tuple[bool, str]:
"""Return (available, reason). reason is empty when available."""
try:
import pytesseract # noqa: F401
from PIL import Image # noqa: F401
except ImportError as e:
return False, f"missing python package: {e.name}"
# Check the tesseract binary too.
import shutil
if shutil.which("tesseract") is None:
return False, "tesseract binary not on PATH"
return True, ""
async def ocr_image(file_path: str, lang: str = "eng+chi_sim+chi_tra") -> str:
"""Extract text from an image via tesseract.
*lang* defaults to English + Simplified + Traditional Chinese, matching
the multi-language artefacts the current case involves. Pass a single
language code (e.g. ``"eng"``) to skip language packs that aren't
installed.
"""
p = Path(file_path)
if not p.is_file():
return f"Error: {file_path} is not a file."
available, reason = _has_ocr_runtime()
if not available:
return f"{_INSTALL_HINT}\n[detail: {reason}]"
import pytesseract
from PIL import Image
try:
img = Image.open(p)
except Exception as e:
return f"Error: could not open image {file_path}: {e}"
try:
text = pytesseract.image_to_string(img, lang=lang)
except pytesseract.TesseractError as e:
msg = str(e)
if "Failed loading language" in msg or "Error opening data file" in msg:
return (
f"Error: tesseract is installed but missing language pack(s) for {lang!r}. "
f"Install the language data (e.g. tesseract-ocr-chi-sim) or pass a "
f"different `lang`. Detail: {msg}"
)
return f"Error running tesseract: {msg}"
except Exception as e:
return f"Error during OCR: {e}"
size = p.stat().st_size
header = (
f"ocr: {file_path} ({size} bytes, lang={lang}, "
f"{len(text.splitlines())} line(s))\n"
)
if len(text) > MAX_OUTPUT - len(header):
body = text[:MAX_OUTPUT - len(header)] + "\n[truncated]"
else:
body = text
return header + body
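The output cap at the end of `ocr_image` budgets the body against the header length. A minimal sketch of that arithmetic follows, with a hypothetical helper name `cap_with_header`; it mirrors the logic above, including the detail that the truncation marker is appended after the budget is spent, so a truncated result can exceed the limit by the marker's length.

```python
def cap_with_header(
    header: str,
    text: str,
    limit: int = 8000,
    marker: str = "\n[truncated]",
) -> str:
    """Prefix text with header, cutting the body so header + body fits limit.

    The marker is added on top of the budget, as in the code above.
    """
    budget = max(limit - len(header), 0)
    body = text[:budget] + marker if len(text) > budget else text
    return header + body


short = cap_with_header("h\n", "abc", limit=10)
long = cap_with_header("h\n", "x" * 20, limit=10)
```

If a hard ceiling matters, one option would be to subtract `len(marker)` from the budget as well; that is a design choice the diff does not make.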

160
tools/mobile_android.py Normal file

@@ -0,0 +1,160 @@
"""Android plugin tools — partition survey + sector translation.
DESIGN.md §4.7 (Android): ``mmls`` partitions → per-partition image-mode source;
``fsstat`` per partition to classify ext4/F2FS/raw/encrypted. The shared TSK
toolchain already handles ext4/F2FS reads, so once the agent picks a partition
offset the standard list_directory / extract_file / search_strings tools work.
Quirk: Samsung dumps (e.g. ``blk0_sda.bin``) use 4096-byte image sectors but
TSK tool flags accept 512-byte sectors by default. ``probe_android_partitions``
emits BOTH unit systems so the agent can plug the right ``partition_offset``
value into ``set_active_partition``.
"""
from __future__ import annotations
import asyncio
import logging
import re
from pathlib import Path
logger = logging.getLogger(__name__)
MAX_OUTPUT = 8000
# Partitions worth flagging when we encounter them — informs the agent's
# strategy. Not exhaustive; just opinionated hints.
_PARTITION_HINTS: dict[str, str] = {
"EFS": "modem firmware area; often contains IMEI / MAC / serial",
"PARAM": "boot parameters; cmdline + flags",
"BOOT": "kernel + initramfs (raw image)",
"RECOVERY": "recovery image (raw)",
"SYSTEM": "Android /system — read-only OS partition (ext4)",
"CACHE": "downloaded OTA payloads; usually transient",
"USERDATA": "/data — user apps, dbs, accounts; FBE-encrypted on modern devices",
"PERSISTENT": "Samsung persistent partition; carrier/device flags",
"STEADY": "Samsung steady-state config",
"HIDDEN": "Samsung hidden partition; check before assuming empty",
"CP_DEBUG": "modem debug logs",
"TOMBSTONES": "userland crash dumps",
}
def _parse_mmls_with_unit(output: str) -> tuple[int, list[dict]]:
"""Parse mmls output, returning (sector_size_bytes, partitions).
mmls states ``Units are in N-byte sectors`` near the top; we extract N
to translate between image-native units and the 512-byte units TSK
tools accept via ``-o``.
"""
sector_size = 512
m = re.search(r"Units are in (\d+)-byte sectors", output)
if m:
sector_size = int(m.group(1))
parts: list[dict] = []
for line in output.splitlines():
m = re.match(
r"\s*(\d{3}):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)",
line,
)
if not m:
continue
_row, slot, start, end, length, desc = m.groups()
if slot == "Meta" or slot.startswith("---"):
continue
parts.append({
"slot": slot,
"start_native": int(start),
"end_native": int(end),
"length_native": int(length),
"description": desc.strip(),
})
return sector_size, parts
async def _run(cmd: list[str], timeout: int = 30) -> tuple[int, str, str]:
proc = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
except asyncio.TimeoutError:
proc.kill()
await proc.wait()  # reap the killed child so it does not linger as a zombie
return 124, "", f"timeout after {timeout}s"
return proc.returncode or 0, stdout.decode("utf-8", "replace"), stderr.decode("utf-8", "replace")
_FS_TYPE_RE = re.compile(r"File System Type:\s*(\S+)", re.IGNORECASE)
async def _classify_partition(image_path: str, sector_offset_512: int) -> str:
"""Run fsstat on a partition; return 'Ext4'/'Yaffs2'/'FAT'/'unknown'/'inaccessible'.
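If fsstat fails (for example with "Cannot determine file system type") or
prints no type line, the result is 'unknown'; 'inaccessible' is not
currently returned. 'unknown' typically means a raw image
(BOOT/RECOVERY/RADIO/…) or encrypted data (modern userdata under FBE).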
"""
rc, out, _err = await _run(["fsstat", "-o", str(sector_offset_512), image_path], timeout=15)
if rc != 0:
return "unknown"
m = _FS_TYPE_RE.search(out)
if m:
return m.group(1)
return "unknown"
async def probe_android_partitions(image_path: str) -> str:
"""Survey every partition on an Android disk dump and return a table.
The agent reads this once to plan its work: which partitions are
Ext4/F2FS (use TSK), which are raw (extract image / strings only),
which are encrypted (skip until decrypted).
"""
p = Path(image_path)
if not p.is_file():
return f"Error: {image_path} is not a file."
rc, out, err = await _run(["mmls", str(p)], timeout=30)
if rc != 0:
return f"Error: mmls failed (rc={rc}): {err.strip() or out.strip()}"
sector_size, parts = _parse_mmls_with_unit(out)
if not parts:
return f"No partitions detected in {image_path}."
lines = [
f"Android partition survey: {image_path}",
f" mmls reports {sector_size}-byte sectors (TSK -o expects 512-byte sectors)",
f" {len(parts)} data partitions",
"",
"| slot | name | start (native) | start (512-sector) | size | fs_type | hint |",
"|---|---|---:|---:|---|---|---|",
]
for prt in parts:
sector_512 = prt["start_native"] * sector_size // 512
bytes_size = prt["length_native"] * sector_size
# human-readable size
if bytes_size >= 1 << 30:
size_h = f"{bytes_size / (1 << 30):.1f} GB"
elif bytes_size >= 1 << 20:
size_h = f"{bytes_size / (1 << 20):.1f} MB"
else:
size_h = f"{bytes_size // 1024} KB"
fs_type = await _classify_partition(str(p), sector_512)
# Try to extract a friendly partition name from the description
# (mmls description often includes the partition name uppercase).
name_match = re.search(r"[A-Z][A-Z0-9_]{2,}", prt["description"])
pname = name_match.group(0) if name_match else prt["description"][:20]
hint = _PARTITION_HINTS.get(pname, "")
lines.append(
f"| {prt['slot']} | {pname} | {prt['start_native']} | "
f"{sector_512} | {size_h} | {fs_type} | {hint} |"
)
body = "\n".join(lines)
if len(body) > MAX_OUTPUT:
body = body[:MAX_OUTPUT] + "\n\n[truncated]"
return body
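The sector translation that `probe_android_partitions` performs per row is plain integer arithmetic. A short sketch follows, assuming a hypothetical helper name `to_512_sectors` and a made-up mmls header line for the example input.

```python
import re


def to_512_sectors(mmls_output: str, start_native: int) -> int:
    """Convert a native-unit start sector into 512-byte sectors.

    The sector size is read from the 'Units are in N-byte sectors' line
    and defaults to 512 when that line is absent.
    """
    m = re.search(r"Units are in (\d+)-byte sectors", mmls_output)
    sector_size = int(m.group(1)) if m else 512
    return start_native * sector_size // 512


# Made-up header text for illustration only.
samsung_like = "GUID Partition Table (EFI)\nUnits are in 4096-byte sectors\n"
offset_512 = to_512_sectors(samsung_like, 2048)
```

For a 4096-byte-sector image, native sector 2048 sits at byte 2048 × 4096 = 8,388,608, which is 16384 in 512-byte units.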

274
tools/mobile_ios.py Normal file

@@ -0,0 +1,274 @@
"""iOS extraction parsers — plist / sqlite / keychain / iDevice info.
DESIGN.md §4.7 iOS plugin tools. All tree-mode, path-based — no Sleuth
Kit, no graph dependency. Stdlib + sqlite3 only.
iOS extractions typically arrive as a zip containing domain-rooted trees
(HomeDomain, AppDomain, etc.) with a flat ``iDevice_info.txt`` summary,
binary/XML plists, and several SQLite databases (sms.db, AddressBook,
keychain-2.db, app-specific stores like WhatsApp's ChatStorage.sqlite).
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import plistlib
import re
import sqlite3
from pathlib import Path
logger = logging.getLogger(__name__)
# Output cap (chars) — keeps a single tool result under the LLM context budget.
MAX_OUTPUT = 8000
def _trunc(text: str, limit: int = MAX_OUTPUT) -> str:
if len(text) <= limit:
return text
return text[:limit] + f"\n\n[Output truncated: {len(text)} chars total]"
# ---------------------------------------------------------------------------
# plist
# ---------------------------------------------------------------------------
def _to_jsonable(obj):
"""Make plist values JSON-serializable: bytes → hex preview, dates → iso."""
import datetime
if isinstance(obj, bytes):
if len(obj) <= 64:
return {"_bytes_hex": obj.hex()}
return {"_bytes_hex_preview": obj[:64].hex(), "_total_bytes": len(obj)}
if isinstance(obj, datetime.datetime):
return obj.isoformat()
if isinstance(obj, dict):
return {str(k): _to_jsonable(v) for k, v in obj.items()}
if isinstance(obj, (list, tuple)):
return [_to_jsonable(v) for v in obj]
return obj
async def parse_plist(file_path: str) -> str:
"""Parse a .plist file (XML or binary) and return its contents as JSON.
Both formats are handled transparently by ``plistlib.load``.
"""
p = Path(file_path)
if not p.is_file():
return f"Error: {file_path} is not a file."
try:
with open(p, "rb") as f:
data = plistlib.load(f)
except plistlib.InvalidFileException as e:
return f"Error: {file_path} is not a valid plist ({e})"
except Exception as e:
return f"Error parsing plist {file_path}: {e}"
serial = _to_jsonable(data)
rendered = json.dumps(serial, ensure_ascii=False, indent=2, default=str)
header = f"plist: {file_path} ({p.stat().st_size} bytes)\n"
return header + _trunc(rendered)
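
To see what the JSON rendering looks like for non-JSON plist values, here is a self-contained sketch. It uses a simplified stand-in called `to_jsonable` (without the 64-byte preview cap) and a made-up sample dictionary; both are assumptions for illustration, not the PR's code.

```python
import datetime
import json
import plistlib


def to_jsonable(obj):
    """Recursively convert plist values into JSON-friendly ones."""
    if isinstance(obj, bytes):
        return {"_bytes_hex": obj.hex()}
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    return obj


# Hypothetical sample values; real device plists carry other keys.
sample = {
    "DeviceName": "example",
    "Token": b"\x01\x02",
    "LastSeen": datetime.datetime(2024, 1, 2, 3, 4, 5),
}
# Round-trip through the binary format, as plistlib.load would read it from disk.
blob = plistlib.dumps(sample, fmt=plistlib.FMT_BINARY)
decoded = plistlib.loads(blob)
rendered = json.dumps(to_jsonable(decoded), indent=2, sort_keys=True)
print(rendered)
```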
# ---------------------------------------------------------------------------
# sqlite
# ---------------------------------------------------------------------------
_SELECT_RE = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
async def sqlite_tables(db_path: str) -> str:
"""List user tables in a sqlite file with row counts and column names."""
p = Path(db_path)
if not p.is_file():
return f"Error: {db_path} is not a file."
try:
conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
except sqlite3.OperationalError as e:
return f"Error opening {db_path} (read-only): {e}"
try:
cur = conn.cursor()
cur.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
)
tables = [r[0] for r in cur.fetchall()]
if not tables:
return f"No user tables in {db_path}."
lines = [f"sqlite: {db_path} ({len(tables)} tables)"]
for name in tables:
try:
cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
count = cur.fetchone()[0]
except sqlite3.DatabaseError as e:
count = f"(count failed: {e})"
try:
cur.execute(f"PRAGMA table_info(\"{name}\")")
cols = [r[1] for r in cur.fetchall()]
except sqlite3.DatabaseError:
cols = []
lines.append(f" {name}: {count} row(s); cols: {', '.join(cols)}")
return _trunc("\n".join(lines))
finally:
conn.close()
async def sqlite_query(
db_path: str,
query: str,
max_rows: int = 100,
) -> str:
"""Run a single read-only SELECT against a sqlite file.
Multi-statement queries and anything other than a SELECT are rejected
(we open the database in read-only mode anyway, so writes would fail
too — but the explicit check keeps the agent honest).
"""
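if not _SELECT_RE.match(query):
return "Error: only single SELECT statements are allowed."
# Strip surrounding whitespace first so a trailing ";\n" is not mistaken
# for a second statement.
if ";" in query.strip().rstrip(";"):
return "Error: multi-statement queries are not allowed."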
p = Path(db_path)
if not p.is_file():
return f"Error: {db_path} is not a file."
try:
conn = sqlite3.connect(f"file:{p}?mode=ro", uri=True)
except sqlite3.OperationalError as e:
return f"Error opening {db_path} (read-only): {e}"
try:
cur = conn.cursor()
try:
cur.execute(query)
except sqlite3.DatabaseError as e:
return f"Error executing query: {e}"
cols = [d[0] for d in cur.description] if cur.description else []
rows = cur.fetchmany(max(1, int(max_rows)))
lines = [
f"sqlite query: {db_path}",
f"columns: {cols}",
f"rows ({len(rows)}, capped at {max_rows}):",
]
for row in rows:
rendered = [
(v.hex() if isinstance(v, bytes) else str(v))
for v in row
]
lines.append(" " + " | ".join(rendered))
return _trunc("\n".join(lines))
finally:
conn.close()
# ---------------------------------------------------------------------------
# iOS keychain (keychain-2.db)
# ---------------------------------------------------------------------------
# Standard iOS keychain tables. genp = generic passwords, inet = internet
# passwords, cert = certificates, keys = key material. Forensic extractions
# of locked keychains have ``data`` columns NULL but accounting metadata
# (agrp, acct, svce) intact — already useful for attribution work.
_KEYCHAIN_TABLES = ("genp", "inet", "cert", "keys")
async def parse_ios_keychain(keychain_root: str) -> str:
"""Locate and summarize iOS keychain entries under *keychain_root*.
*keychain_root* may be a path to ``keychain-2.db`` directly or to a
directory that contains it (e.g. ``.../var/keychains``).
"""
root = Path(keychain_root)
db: Path | None = None
if root.is_file() and root.name == "keychain-2.db":
db = root
elif root.is_dir():
candidate = root / "keychain-2.db"
if candidate.is_file():
db = candidate
else:
# Fall back to a full recursive search and take the first hit.
for found in root.rglob("keychain-2.db"):
db = found
break
if db is None:
return f"No keychain-2.db found under {keychain_root}."
try:
conn = sqlite3.connect(f"file:{db}?mode=ro", uri=True)
except sqlite3.OperationalError as e:
return f"Error opening {db}: {e}"
try:
cur = conn.cursor()
cur.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name IN ({})".format(
",".join("?" * len(_KEYCHAIN_TABLES))
),
_KEYCHAIN_TABLES,
)
present = [r[0] for r in cur.fetchall()]
if not present:
return f"keychain-2.db at {db} has no recognised tables."
lines = [f"keychain: {db}"]
for name in present:
cur.execute(f"SELECT COUNT(*) FROM \"{name}\"")
count = cur.fetchone()[0]
lines.append(f"\n[{name}] {count} row(s)")
cur.execute(f"PRAGMA table_info(\"{name}\")")
cols = [r[1] for r in cur.fetchall()]
# Pick a useful subset of accounting columns when present.
preferred = [
c for c in ("agrp", "acct", "svce", "labl", "desc", "atyp", "srvr")
if c in cols
]
if not preferred:
preferred = cols[:5]
sel = ", ".join(f'"{c}"' for c in preferred)
cur.execute(f"SELECT {sel} FROM \"{name}\" LIMIT 30")
for row in cur.fetchall():
lines.append(" " + " | ".join(
(v.hex() if isinstance(v, bytes) else str(v))
for v in row
))
return _trunc("\n".join(lines))
finally:
conn.close()
# ---------------------------------------------------------------------------
# iDevice_info.txt
# ---------------------------------------------------------------------------
async def read_idevice_info(file_path: str, max_chars: int = 6000) -> str:
"""Read the standard iDevice_info.txt summary at the root of an iOS extraction.
The file is a flat ``Key: value`` dump from libimobiledevice / native
extraction tools. We surface the first *max_chars* of content verbatim
— the agent can search/extract specific keys via search_text_file if
the head isn't enough.
"""
p = Path(file_path)
if p.is_dir():
# Be helpful: if the agent passed the extraction root, find the file.
candidate = p / "iDevice_info.txt"
if candidate.is_file():
p = candidate
if not p.is_file():
return f"Error: {file_path} is not a file."
try:
with open(p, "r", encoding="utf-8", errors="replace") as f:
content = f.read(max_chars)
size = p.stat().st_size
header = f"iDevice_info: {p} ({size} bytes)\n"
if size > max_chars:
content += f"\n\n[Truncated: file is {size} bytes, showing first {max_chars} chars]"
return header + content
except Exception as e:
return f"Error reading {file_path}: {e}"
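
Since the docstring describes the file as a flat `Key: value` dump, a caller that needs individual fields could parse the returned text along these lines. This is a sketch under that assumption; `parse_key_value` and the sample keys are illustrative and not taken from a real extraction.

```python
def parse_key_value(text: str) -> dict[str, str]:
    """Parse flat ``Key: value`` lines; lines without a colon are skipped."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue
        # Split on the first colon only, so values such as MAC addresses survive.
        # The first occurrence of a key wins.
        info.setdefault(key.strip(), value.strip())
    return info


# Illustrative sample, not real device data.
sample = (
    "DeviceName: example-phone\n"
    "ProductVersion: 16.1\n"
    "WiFiAddress: aa:bb:cc:dd:ee:ff\n"
    "\n"
    "not a pair\n"
)
info = parse_key_value(sample)
print(info["WiFiAddress"])
```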


@@ -215,20 +215,178 @@ async def parse_prefetch(file_path: str) -> str:
return f"[Error parsing Prefetch: {e}]"
-async def list_extracted_dir(dir_path: str) -> str:
-"""List files in an extracted directory."""
async def list_extracted_dir(dir_path: str, max_entries: int = 200) -> str:
"""Smart summary of a (potentially huge) extracted tree.
Earlier versions dumped up to 200 random entries then truncated — that
leaves the agent blind on 10k+-file iOS extractions. The new layout
returns a compact summary that scales: total counts, extension
breakdown, top-level directories with their sizes, and the largest
files. For targeted lookups (e.g. find every ``*.sqlite`` under the
tree) the agent should use ``find_files`` instead.
"""
if not os.path.isdir(dir_path):
return f"[Error: {dir_path} is not a directory]"
try:
-entries = []
-for root, dirs, files in os.walk(dir_path):
total_files = 0
total_bytes = 0
ext_counts: dict[str, int] = {}
ext_bytes: dict[str, int] = {}
top_level_dirs: dict[str, dict] = {}
biggest: list[tuple[int, str]] = [] # (size, relpath)
dir_path_abs = os.path.abspath(dir_path)
for root, dirs, files in os.walk(dir_path_abs):
# Track top-level directory aggregates (cheap; no per-entry cost
# beyond the walk we're already doing).
rel_root = os.path.relpath(root, dir_path_abs)
if rel_root == ".":
top_dirs = {d: {"files": 0, "bytes": 0} for d in dirs}
top_level_dirs.update(top_dirs)
top_key = None
else:
top_key = rel_root.split(os.sep, 1)[0]
if top_key not in top_level_dirs:
top_level_dirs[top_key] = {"files": 0, "bytes": 0}
for f in files:
full = os.path.join(root, f)
-rel = os.path.relpath(full, dir_path)
-size = os.path.getsize(full)
-entries.append(f" {rel} ({size} bytes)")
-if len(entries) > 200:
-entries.append(f" ... (truncated)")
-break
try:
size = os.path.getsize(full)
except OSError:
continue
total_files += 1
total_bytes += size
ext = os.path.splitext(f)[1].lower() or "(no ext)"
ext_counts[ext] = ext_counts.get(ext, 0) + 1
ext_bytes[ext] = ext_bytes.get(ext, 0) + size
if top_key is not None:
top_level_dirs[top_key]["files"] += 1
top_level_dirs[top_key]["bytes"] += size
# Maintain a top-10 largest list cheaply (bounded insertion).
if len(biggest) < 10:
biggest.append((size, os.path.relpath(full, dir_path_abs)))
biggest.sort(reverse=True)
elif size > biggest[-1][0]:
biggest[-1] = (size, os.path.relpath(full, dir_path_abs))
biggest.sort(reverse=True)
-return f"Directory: {dir_path}\nFiles ({len(entries)}):\n" + "\n".join(entries)
def _human(n: int) -> str:
for unit in ("B", "KB", "MB", "GB"):
if n < 1024:
return f"{n:.1f}{unit}" if unit != "B" else f"{n}B"
n /= 1024
return f"{n:.1f}TB"
lines = [
f"Directory: {dir_path}",
f" Total: {total_files} file(s), {_human(total_bytes)}",
]
# Top-level directory layout (immediate children, sorted by file count).
if top_level_dirs:
lines.append(f"\nTop-level layout ({len(top_level_dirs)} dirs at root):")
sorted_tlds = sorted(
top_level_dirs.items(), key=lambda kv: -kv[1]["files"],
)[:15]
for d, stats in sorted_tlds:
lines.append(
f" {d}/ ({stats['files']} files, {_human(stats['bytes'])})"
)
if len(top_level_dirs) > 15:
lines.append(f" ... ({len(top_level_dirs) - 15} more top-level dirs)")
# Extension breakdown.
if ext_counts:
lines.append(f"\nExtension breakdown (top 15):")
for ext, count in sorted(ext_counts.items(), key=lambda kv: -kv[1])[:15]:
lines.append(
f" {ext}: {count} files, {_human(ext_bytes.get(ext, 0))}"
)
# Largest files (often the highest-value forensic targets).
if biggest:
lines.append("\nLargest files:")
for size, rel in biggest:
lines.append(f" {rel} ({_human(size)})")
lines.append(
f"\nNext step: call find_files with a pattern like "
f"'**/*.plist' or '**/keychain-2.db' to locate specific artefacts."
)
return "\n".join(lines)
except Exception as e:
return f"[Error listing {dir_path}: {e}]"
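
The bounded "largest files" list re-sorts on every insertion. An alternative worth considering is a min-heap, which keeps the same top-N result without repeated sorting. A minimal sketch, with `top_n_largest` as a hypothetical helper name:

```python
import heapq


def top_n_largest(files, n=10):
    """Return the n largest (size, relpath) pairs, biggest first, in one pass."""
    heap: list[tuple[int, str]] = []
    for item in files:
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # heap[0] is the smallest kept entry; swap it out for the bigger one.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


sizes = [(120, "a.db"), (5, "b.txt"), (9000, "c.sqlite"), (42, "d.plist")]
print(top_n_largest(sizes, n=2))
```

With N fixed at 10 the difference is small in practice; the heap version mainly matters if the cap is ever raised.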
async def find_files(
root: str,
pattern: str,
max_results: int = 500,
) -> str:
"""Recursively find files under *root* whose path matches *pattern*.
Uses fnmatch-style globs against the *full relative path*; ``**`` is
treated as "any number of path segments" (so ``**/*.plist`` finds
every plist no matter how deep). Examples:
- ``**/sms.db`` — iOS SMS database
- ``**/keychain-2.db`` — iOS keychain
- ``**/ChatStorage.sqlite`` — WhatsApp app store
- ``HomeDomain/Library/**`` — anchor at a known iOS domain root
- ``**/*.{plist,sqlite,db}`` — NOT supported: fnmatch has no brace alternation, so issue one call per extension
Results are sorted by size descending — the biggest hits usually
matter most. Capped at *max_results* to keep the LLM context bounded.
"""
import fnmatch
if not os.path.isdir(root):
return f"[Error: {root} is not a directory]"
root_abs = os.path.abspath(root)
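# Convert ``**`` (any-depth) to fnmatch's ``*`` (any chars including /).
# fnmatch doesn't natively distinguish segment vs path; expanding ``**``
# to ``*`` and letting fnmatch match the full relpath is good enough for
# forensic lookups.
fn_pattern = pattern.replace("**", "*")
# A leading ``**/`` must also match files sitting directly under *root*.
# After the expansion above it becomes ``*/``, which requires at least one
# separator, so keep a second pattern with that prefix stripped.
root_pattern = (
pattern[3:].replace("**", "*") if pattern.startswith("**/") else None
)
hits: list[tuple[int, str]] = []
truncated = False
try:
for dirpath, _dirs, files in os.walk(root_abs):
for f in files:
full = os.path.join(dirpath, f)
rel = os.path.relpath(full, root_abs)
if (
fnmatch.fnmatch(rel, fn_pattern)
or fnmatch.fnmatch(f, fn_pattern)
or (root_pattern is not None and fnmatch.fnmatch(rel, root_pattern))
):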
try:
size = os.path.getsize(full)
except OSError:
size = 0
hits.append((size, rel))
if len(hits) >= max_results * 4:
# Hard upper bound to keep the walk cheap on huge trees.
truncated = True
break
if truncated:
break
except Exception as e:
return f"[Error searching {root}: {e}]"
hits.sort(reverse=True)
if len(hits) > max_results:
truncated = True
hits = hits[:max_results]
lines = [
f"find_files: pattern={pattern!r} under {root}",
f" matches: {len(hits)}" + (" (truncated)" if truncated else ""),
]
if not hits:
lines.append(" (no matches)")
else:
for size, rel in hits:
lines.append(f" {rel} ({size} bytes)")
return "\n".join(lines)
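
The `**` handling in `find_files` rests on two `fnmatch` behaviours that are easy to verify on their own: `*` matches across `/`, and a pattern of the form `*/name` needs at least one separator in the path. A small sketch (the helper `expand` is hypothetical):

```python
import fnmatch


def expand(pattern: str) -> str:
    """Collapse ``**`` to ``*``, since fnmatch's ``*`` already crosses '/'."""
    return pattern.replace("**", "*")


# A nested file matches because '*' swallows the whole directory prefix.
nested = fnmatch.fnmatch("HomeDomain/Library/SMS/sms.db", expand("**/sms.db"))
# A file directly under the root has no separator for '*/' to match.
at_root = fnmatch.fnmatch("sms.db", expand("**/sms.db"))
# Dropping the leading '*/' lets the root-level file match.
stripped = fnmatch.fnmatch("sms.db", expand("**/sms.db")[2:])
print(nested, at_root, stripped)
```

This is why a leading `**/` deserves explicit treatment when root-level artefacts such as `iDevice_info.txt` matter.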

485
tools/strategy.py Normal file

@@ -0,0 +1,485 @@
"""Strategist-loop tools — read-only views over graph state that let the
InvestigationStrategist agent decide whether to keep investigating or to
declare the investigation complete.
DESIGN_STRATEGIST.md §2. Four read-only views:
graph_overview() → hypotheses + sources + pending leads snapshot
source_coverage(src_id) → which artefact categories on this source have
been touched vs are still ✗
marginal_yield(n_rounds) → how much information the last N rounds added
budget_status() → tool calls / rounds / wall-clock against caps
These are pure render functions over the graph — they MUST NOT mutate state.
The strategist never writes phenomena/edges directly; all graph mutations
happen through worker agents that the strategist dispatches via propose_lead
(which is registered separately in tool_registry).
"""
from __future__ import annotations
import time
from typing import Any
# ---------------------------------------------------------------------------
# Expected artefact catalogue (per source type)
#
# These are SOFT HINTS — items the strategist might want to check on a given
# source type if any active hypothesis depends on them. The catalogue is
# intentionally compact; expand it in-place when a new forensic specialty
# joins the toolset. Each entry:
#
# name human-readable artefact category
# detector how to recognise that this category has been touched — either
# a tool name OR a `<tool>@<path-substring>` pattern, joined with
# `|` for alternatives. The matcher is substring on the tool name
# and on the args' string representation.
# value_for one-line description of why this category might matter
# ---------------------------------------------------------------------------
EXPECTED_ARTEFACTS: dict[str, list[dict[str, str]]] = {
"disk_image+windows": [
{"name": "partition layout", "detector": "partition_info|mmls",
"value_for": "deleted files, hidden partitions"},
{"name": "filesystem walk", "detector": "list_directory|fls",
"value_for": "directory tree, recoverable deleted entries"},
{"name": "registry hives", "detector": "parse_registry_key|list_installed_software|get_user_activity",
"value_for": "installed software, user activity, timezone"},
{"name": "browser history", "detector": "list_directory@AppData|read_text_file@History|read_text_file@Bookmarks",
"value_for": "URL access, downloads, web search terms"},
{"name": "prefetch", "detector": "parse_prefetch|extract_file@Prefetch",
"value_for": "program execution evidence"},
{"name": "email/IM config", "detector": "get_email_config",
"value_for": "user accounts, configured mail/IM clients"},
{"name": "recycle bin", "detector": "list_directory@$Recycle|count_deleted_files",
"value_for": "deleted file metadata and recovery"},
],
"disk_image+android": [
{"name": "partition probe", "detector": "probe_android_partitions",
"value_for": "discover EFS / SYSTEM / USERDATA layout"},
{"name": "system properties", "detector": "read_text_file@build.prop|read_text_file@default.prop",
"value_for": "device model, OS version, CSC region"},
{"name": "app inventory", "detector": "list_directory@data/app|list_directory@data/data",
"value_for": "installed apps, package names"},
{"name": "user data dbs", "detector": "list_directory@data/data|sqlite_query",
"value_for": "messages, contacts, app-specific data"},
{"name": "device identity", "detector": "search_strings@imei|search_strings@serial|search_strings@DRI",
"value_for": "IMEI, serial, device fingerprint"},
],
"mobile_extraction": [
{"name": "device info", "detector": "read_idevice_info|read_text_file@iDevice_info",
"value_for": "model, iOS version, IMEI, ICCID, Bluetooth MAC, UDID"},
{"name": "AddressBook", "detector": "sqlite_query@AddressBook.sqlitedb",
"value_for": "contacts, owner identity"},
{"name": "SMS / iMessage", "detector": "sqlite_query@sms.db",
"value_for": "messaging content, OTP / verification codes"},
{"name": "WhatsApp messages", "detector": "sqlite_query@ChatStorage.sqlite|sqlite_query@WhatsApp",
"value_for": "WhatsApp content, group membership, call records"},
{"name": "WeChat", "detector": "sqlite_query@MM.sqlite|sqlite_query@wcdb|list_directory@WeChat",
"value_for": "WeChat IDs, messages, follow targets"},
{"name": "Call history", "detector": "sqlite_query@CallHistory|sqlite_query@call_history",
"value_for": "incoming/outgoing call log"},
{"name": "Safari history", "detector": "sqlite_query@History.db|read_text_file@Bookmarks.plist|parse_plist@Bookmarks",
"value_for": "URL access, bookmarks, search queries"},
{"name": "Photos library", "detector": "sqlite_query@Photos.sqlite|parse_plist@Photos",
"value_for": "photo metadata, EXIF, geolocation, source app"},
{"name": "iCloud accounts", "detector": "parse_plist@Accounts3|parse_ios_keychain",
"value_for": "Apple ID, registered services, authentication tokens"},
{"name": "app inventory", "detector": "list_directory@Bundle/Application|list_directory@Containers",
"value_for": "installed apps, app-specific containers"},
{"name": "Wi-Fi history", "detector": "parse_plist@com.apple.wifi|read_text_file@known_networks",
"value_for": "connected SSIDs, keys, first/last seen times"},
],
"media_collection": [
{"name": "archive unpack", "detector": "unzip_archive|list_directory",
"value_for": "extract images / docs for downstream analysis"},
{"name": "OCR text", "detector": "ocr_image",
"value_for": "screenshot text content (chat, transaction, IDs)"},
{"name": "metadata", "detector": "read_binary_preview|search_strings",
"value_for": "EXIF, embedded timestamps, device fingerprints"},
],
"archive": [
{"name": "archive unpack", "detector": "unzip_archive",
"value_for": "expose contents for further analysis"},
],
}
def _key_for_source(src) -> str:
"""Return the EXPECTED_ARTEFACTS key for a source: 'disk_image+platform'
when platform is set in meta, otherwise just the source type."""
src_type = getattr(src, "type", "")
if src_type == "disk_image":
platform = (getattr(src, "meta", {}) or {}).get("platform", "").lower()
if platform:
return f"disk_image+{platform}"
return src_type
def _detector_matches(detector: str, tool_name: str, args_str: str) -> bool:
"""Return True if any '|'-separated branch of `detector` matches.
A branch like ``sqlite_query@AddressBook.sqlitedb`` requires both the
tool name (substring) AND the args (substring) to match. A branch like
``parse_prefetch`` is a tool-name-only check.
"""
for branch in detector.split("|"):
branch = branch.strip()
if not branch:
continue
if "@" in branch:
t, sub = branch.split("@", 1)
if t in tool_name and sub.lower() in args_str.lower():
return True
else:
if branch in tool_name:
return True
return False
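
The detector grammar is easiest to understand from a few concrete calls. The sketch below is a standalone copy of the matcher for illustration (named `detector_matches`, condensed with `str.partition`); the tool names and argument strings are sample inputs only.

```python
def detector_matches(detector: str, tool_name: str, args_str: str) -> bool:
    """Standalone copy of the matcher, for illustration only."""
    for branch in detector.split("|"):
        branch = branch.strip()
        if not branch:
            continue
        # "tool@needle" needs both parts to match; a bare "tool" checks the name only.
        tool, sep, needle = branch.partition("@")
        if tool in tool_name and (not sep or needle.lower() in args_str.lower()):
            return True
    return False


detector = "sqlite_query@sms.db|read_idevice_info"
hit_sms = detector_matches(detector, "sqlite_query", "db_path=/x/HomeDomain/Library/SMS/sms.db")
miss_other_db = detector_matches(detector, "sqlite_query", "db_path=/x/AddressBook.sqlitedb")
hit_tool_only = detector_matches(detector, "read_idevice_info", "")
print(hit_sms, miss_other_db, hit_tool_only)
```

Because the tool-name check is a substring test, short detector names such as `fls` or `mmls` also match any tool whose name merely contains them; that looseness is worth keeping in mind when adding catalogue entries.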
# ---------------------------------------------------------------------------
# graph_overview()
# ---------------------------------------------------------------------------
def graph_overview(graph) -> str:
"""Render hypotheses + sources + pending leads as the strategist's
primary decision view.
Annotates each hypothesis with the count of distinct sources that
contribute supporting (positive-LR) edges. A hypothesis with many edges
but only one source is a strategist signal to seek cross-source
corroboration.
"""
lines: list[str] = ["# Investigation State", ""]
# Hypotheses table.
if graph.hypotheses:
lines.append(f"## Hypotheses ({len(graph.hypotheses)})")
lines.append("")
lines.append(
"| id | title | L | conf | status | edges_in | distinct_sources | recent_flip |"
)
lines.append("|----|-------|---|------|--------|---------:|-----------------:|--------------|")
# Sort active hypotheses first, then by absolute log-odds magnitude
# descending, so the strategist sees the still-open hypotheses at the
# top of the table, most decided first within each group.
for hid, h in sorted(
graph.hypotheses.items(),
key=lambda kv: (kv[1].status != "active", -abs(kv[1].log_odds)),
):
in_edges = graph._adj_rev.get(hid, [])
edges_in = len(in_edges)
# Distinct sources contributing edges (looked up via source
# phenomenon's source_id; entity→entity edges have no source).
distinct_sources: set[str] = set()
for e in in_edges:
src_node = graph.phenomena.get(e.source_id)
if src_node is not None and src_node.source_id:
distinct_sources.add(src_node.source_id)
# Did this hypothesis's status change in the last 2 rounds?
recent = "no"
recent_rounds = graph.investigation_rounds[-2:]
for r in recent_rounds:
before = r.hypothesis_status_snapshot_before.get(hid)
after = r.hypothesis_status_snapshot_after.get(hid)
if before and after and before != after:
recent = f"yes ({before}→{after} in R{r.round_number})"
break
title = (h.title or "")[:60].replace("|", "/")
lines.append(
f"| {hid[:14]} | {title} | {h.log_odds:+.2f} | "
f"{h.confidence:.2f} | {h.status} | {edges_in} | "
f"{len(distinct_sources)} | {recent} |"
)
lines.append("")
else:
lines.append("## Hypotheses\n\n_(none yet — Phase 2 has not produced any)_\n")
# Sources table.
if graph.case and graph.case.sources:
lines.append(f"## Sources ({len(graph.case.sources)})")
lines.append("")
lines.append(
"| id | type | phenomena | identities | last_touched_in_round |"
)
lines.append("|----|------|----------:|-----------:|----------------------|")
for src in graph.case.sources:
ph_count = sum(
1 for p in graph.phenomena.values() if p.source_id == src.id
)
id_count = sum(
1 for e in graph.entities.values()
for i in e.identifiers
if any(
p.source_id == src.id
for p in graph.phenomena.values()
if p.id == i.get("phenomenon_id")
)
)
# Latest round in which a tool invocation was made against this src.
last_r = ""
for r in reversed(graph.investigation_rounds):
if r.new_phenomena_count > 0:
# Heuristic: if any phenomenon created during this round
# was on this source, mark this round as the last touch.
in_round = [
p for p in graph.phenomena.values()
if p.source_id == src.id
and r.started_at <= p.created_at
and (not r.completed_at or p.created_at <= r.completed_at)
]
if in_round:
last_r = f"R{r.round_number}"
break
lines.append(
f"| {src.id} | {src.type} | {ph_count} | {id_count} | {last_r} |"
)
lines.append("")
# Pending leads.
pending = [l for l in graph.leads if l.status == "pending"]
if pending:
lines.append(f"## Pending Leads ({len(pending)})")
lines.append("")
lines.append("| id | from | target_agent | for_hypothesis | description |")
lines.append("|----|------|--------------|----------------|-------------|")
for l in pending[:20]:
desc = (l.description or "")[:80].replace("|", "/")
mh = l.motivating_hypothesis or l.hypothesis_id or ""
lines.append(
f"| {l.id} | {l.proposed_by or ''} | {l.target_agent} | "
f"{mh[:14] if mh != '' else ''} | {desc} |"
)
if len(pending) > 20:
lines.append(f"\n_(+{len(pending) - 20} more pending leads not shown)_")
lines.append("")
else:
lines.append("## Pending Leads\n\n_(none — no investigations queued)_\n")
# Interpretation hint at the end, plain English.
lines.append("---")
lines.append(
"**Interpretation hints**: A hypothesis with many edges but only one "
"distinct_source has fragile cross-source independence — a single "
"edge from a *different* source would do more for it than another "
"edge from the same source (harmonic damping makes repeats cheap). "
"Hypotheses in the active band (0.2 < conf < 0.8) are the ones a "
"well-targeted lead can flip. recent_flip = 'yes' means belief is "
"still moving on that hypothesis; 'no' across 2 rounds suggests "
"stability."
)
return "\n".join(lines)
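
The `distinct_sources` column is the part of this view that drives the corroboration hint, so it may help to see the counting rule on toy data. In this sketch `Phenomenon` and `Edge` are hypothetical stand-ins with only the fields the rule needs; the real graph classes are not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class Phenomenon:
    id: str
    source_id: str  # evidence source the phenomenon was observed on


@dataclass
class Edge:
    source_id: str  # id of the node the edge starts from


def distinct_sources(in_edges: list[Edge], phenomena: dict[str, Phenomenon]) -> set[str]:
    """Collect the evidence sources behind a hypothesis's incoming edges."""
    found: set[str] = set()
    for edge in in_edges:
        node = phenomena.get(edge.source_id)
        # Edges that do not start at a phenomenon (e.g. entity edges) contribute nothing.
        if node is not None and node.source_id:
            found.add(node.source_id)
    return found


phenomena = {
    "ph-1": Phenomenon("ph-1", "src-phone"),
    "ph-2": Phenomenon("ph-2", "src-phone"),
    "ph-3": Phenomenon("ph-3", "src-laptop"),
}
edges = [Edge("ph-1"), Edge("ph-2"), Edge("ph-3"), Edge("entity-9")]
print(sorted(distinct_sources(edges, phenomena)))
```

Four incoming edges collapse to two distinct sources here, which is the situation the interpretation hint describes.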
# ---------------------------------------------------------------------------
# source_coverage(source_id)
# ---------------------------------------------------------------------------
def source_coverage(graph, source_id: str) -> str:
"""Render which expected artefact categories have been touched on
*source_id*, and which remain ✗.
Output is markdown. The closing paragraph reminds the strategist that
coverage hints are heuristics — investigate ✗ items only when an active
hypothesis depends on them. This is the design's central guardrail
against the system devolving into a fixed forensic checklist.
"""
src = graph.case.get_source(source_id) if graph.case else None
if src is None:
return f"Error: source_id {source_id!r} not found in case."
key = _key_for_source(src)
expected = EXPECTED_ARTEFACTS.get(key, [])
# Collect this source's invocation history.
invs = [
inv for inv in graph.tool_invocations.values()
if inv.source_id == source_id
]
# For each expected category, decide ✓ / ✗ + show example invocation if ✓.
rows: list[tuple[str, str, str, str]] = []
for entry in expected:
name = entry["name"]
detector = entry["detector"]
value_for = entry["value_for"]
matched: str | None = None
for inv in invs:
args_str = ""
try:
args_str = " ".join(f"{k}={v}" for k, v in (inv.args or {}).items())
except Exception:
args_str = str(inv.args)
if _detector_matches(detector, inv.tool, args_str):
matched = f"{inv.tool}({args_str[:60]})"
break
mark = "✓" if matched else "✗"
evidence = matched or ""
rows.append((mark, name, evidence, value_for))
lines: list[str] = [
f"# Coverage of source `{source_id}` ({src.label})",
"",
f"Source type: `{src.type}` / access_mode: `{src.access_mode}`",
f"Invocations made against this source: **{len(invs)}**",
"",
]
if not expected:
lines.append(
f"_(no expected-artefact catalogue entry for source type `{key}` — "
"coverage cannot be assessed against a baseline)_"
)
else:
lines.append(
"| ✓/✗ | category | example invocation | what it would tell us |"
)
lines.append("|-----|----------|---------------------|------------------------|")
for mark, name, evidence, value_for in rows:
lines.append(
f"| {mark} | {name} | {evidence[:70].replace('|','/')} | {value_for} |"
)
n_covered = sum(1 for r in rows if r[0] == "✓")
n_total = len(rows)
lines.append("")
lines.append(f"Coverage: **{n_covered}/{n_total}** ({n_covered*100//max(n_total,1)}%)")
# Invocations that matched no expected entry (possibly genuine novel
# exploration) are not listed separately yet; only the guidance footer follows.
lines.append("")
lines.append("---")
lines.append(
"**Coverage hints are heuristics, not requirements.** Skip an item if "
"the case theory makes it irrelevant — a financial-fraud case has no "
"reason to OCR every photo. Investigate ✗ items only when they could "
"materially affect an active hypothesis. If you propose a lead just "
"because something is ✗, the strategist prompt is being misused."
)
return "\n".join(lines)
# ---------------------------------------------------------------------------
# marginal_yield(last_n_rounds)
# ---------------------------------------------------------------------------
def marginal_yield(graph, last_n_rounds: int = 2) -> str:
"""Render the last N investigation rounds' yield deltas.
Yield columns:
- new_phenomena: phenomena created during the round
- new_edges: edges (any direction) added during the round
- status_flips: hypotheses whose status changed during the round
A row of zeros means that round didn't move the graph. Two consecutive
such rows is strong evidence of diminishing returns; the strategist
should consider declare_investigation_complete with reason
marginal_yield_zero.
"""
rounds = [r for r in graph.investigation_rounds if r.completed_at]
if not rounds:
return (
"# Marginal Yield\n\n"
"_(no completed investigation rounds yet — yield not applicable)_"
)
recent = rounds[-max(1, last_n_rounds):]
lines = [f"# Marginal Yield (last {len(recent)} of {len(rounds)} rounds)", ""]
lines.append("| round | new_phenomena | new_edges | status_flips |")
lines.append("|-------|--------------:|----------:|-------------:|")
yields: list[tuple[int, int, int]] = []
for r in recent:
yields.append((r.new_phenomena_count, r.new_edges_count, r.status_flips))
lines.append(
f"| R{r.round_number} | {r.new_phenomena_count} | "
f"{r.new_edges_count} | {r.status_flips} |"
)
# Trend interpretation aid.
lines.append("")
if all(y == (0, 0, 0) for y in yields):
trend = (
"Yield is zero across these rounds — diminishing returns are "
"confirmed. Strongly consider declare_investigation_complete "
"(reason: marginal_yield_zero)."
)
elif len(yields) >= 2:
first = yields[0][0] + yields[0][1] + yields[0][2]
last = yields[-1][0] + yields[-1][1] + yields[-1][2]
if last == 0 and first > 0:
trend = (
"Yield collapsed to zero in the most recent round. One more "
"well-targeted probe is reasonable; another zero-yield round "
"after that means stop."
)
elif last < first / 2 and first > 0:
trend = (
f"Decelerating ({last}/{first} ≈ "
f"{int(100*last/first)}% of the earlier round). Diminishing "
"returns are accumulating."
)
else:
trend = "Yield is still active — further investigation is paying off."
else:
trend = (
"Only one completed round — too early to call a trend. Run at "
"least one more before considering completion."
)
lines.append(f"**Trend**: {trend}")
return "\n".join(lines)
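
The trend rules are simple enough to express as a pure function, which makes the thresholds testable without a graph. A sketch under that assumption; `classify_trend` and its label strings are hypothetical, and the branch order follows the rendering code.

```python
def classify_trend(yields: list[tuple[int, int, int]]) -> str:
    """Classify per-round (new_phenomena, new_edges, status_flips) tuples, oldest first."""
    if not yields:
        return "no_data"
    if all(y == (0, 0, 0) for y in yields):
        return "zero"
    if len(yields) < 2:
        return "too_early"
    first, last = sum(yields[0]), sum(yields[-1])
    if first > 0 and last == 0:
        return "collapsed"
    if first > 0 and last < first / 2:
        return "decelerating"
    return "active"


print(classify_trend([(5, 3, 1), (0, 0, 0)]))
print(classify_trend([(10, 0, 0), (3, 0, 0)]))
```

Only the first and last rounds of the window are compared, so with `last_n_rounds` above 2 any rounds in between do not influence the label.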
# ---------------------------------------------------------------------------
# budget_status()
# ---------------------------------------------------------------------------
def budget_status(graph, budgets: dict[str, Any] | None, start_time: float | None) -> str:
"""Render budget usage against config.yaml `budgets` block.
Counters:
- tool_calls: len(graph.tool_invocations)
- strategist_rounds: len(graph.investigation_rounds)
- wall_clock_minutes: time.monotonic() - start_time, so start_time must
itself be a time.monotonic() reading; the row is omitted when it is None
"""
budgets = budgets or {}
tool_calls_used = len(graph.tool_invocations)
rounds_used = len(graph.investigation_rounds)
minutes_used: float | None = None
if start_time is not None:
minutes_used = (time.monotonic() - start_time) / 60.0
def _row(name: str, used: float, cap: Any) -> str:
if cap is None:
return f"| {name} | {used:g} | — | (unbounded) |"
pct = (used / cap) * 100 if cap else 0
return f"| {name} | {used:g} | {cap} | {pct:.0f}% |"
lines = ["# Budget Status", ""]
lines.append("| metric | used | cap | pct |")
lines.append("|--------|-----:|----:|----:|")
lines.append(_row("tool_calls", tool_calls_used, budgets.get("tool_calls_total")))
lines.append(_row("strategist_rounds", rounds_used, budgets.get("strategist_rounds_max")))
if minutes_used is not None:
lines.append(_row(
"wall_clock_minutes", round(minutes_used, 1),
budgets.get("wall_clock_minutes_max"),
))
# Pacing hint.
lines.append("")
flags = []
cap_calls = budgets.get("tool_calls_total")
cap_rounds = budgets.get("strategist_rounds_max")
if cap_calls and tool_calls_used / cap_calls >= 0.9:
flags.append("tool_calls budget ≥ 90% used — favour declare_complete")
if cap_rounds and rounds_used / cap_rounds >= 0.7:
flags.append("strategist rounds ≥ 70% used — only propose leads with high expected yield")
if flags:
lines.append("**Budget warnings**:")
for f in flags:
lines.append(f"- {f}")
else:
lines.append(
"Budget room remains. Standard rule: each propose_lead should "
"name a specific hypothesis it expects to move; otherwise skip it."
)
return "\n".join(lines)
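The pacing thresholds in `budget_status()` can be checked in isolation. The sketch below is not part of the commit: `pacing_flags` is a hypothetical helper that repeats only the two threshold comparisons (90% of `tool_calls_total`, 70% of `strategist_rounds_max`), and the cap values in `example_budgets` are made up for illustration.

```python
from __future__ import annotations


def pacing_flags(tool_calls_used: int, rounds_used: int, budgets: dict) -> list[str]:
    """Hypothetical helper mirroring the two threshold checks in budget_status()."""
    flags = []
    cap_calls = budgets.get("tool_calls_total")
    cap_rounds = budgets.get("strategist_rounds_max")
    # A missing or zero cap is treated as unbounded, so no warning fires.
    if cap_calls and tool_calls_used / cap_calls >= 0.9:
        flags.append("tool_calls")
    if cap_rounds and rounds_used / cap_rounds >= 0.7:
        flags.append("strategist_rounds")
    return flags


# Illustrative caps only; the real values live in config.yaml.
example_budgets = {"tool_calls_total": 200, "strategist_rounds_max": 10}

print(pacing_flags(180, 3, example_budgets))  # 180/200 reaches the 90% threshold
print(pacing_flags(100, 7, example_budgets))  # 7/10 reaches the 70% threshold
print(pacing_flags(100, 7, {}))               # no caps configured, no warnings
```

Both comparisons are inclusive, so usage that lands exactly on 90% or 70% already triggers the warning.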

222
uv.lock generated

@@ -2,6 +2,15 @@ version = 1
revision = 3
requires-python = ">=3.14"
[[package]]
name = "annotated-types"
version = "0.7.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ee/67/531ea369ba64dcff5ec9c3402f9f51bf748cec26dde048a2f973a4eea7f5/annotated_types-0.7.0.tar.gz", hash = "sha256:aff07c09a53a08bc8cfccb9c85b05f1aa9a2a6f23728d790723543408344ce89", size = 16081, upload-time = "2024-05-20T21:33:25.928Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl", hash = "sha256:1f02e8b43a8fbbc3f3e0d4f0f4bfc8131bcb4eebe8849b8e5c773f3a1c582a53", size = 13643, upload-time = "2024-05-20T21:33:24.1Z" },
]
[[package]]
name = "anyio"
version = "4.13.0"
@@ -41,6 +50,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b2/fb/08b3f4bf05da99aba8ffea52a558758def16e8516bc75ca94ff73587e7d3/construct-2.10.70-py3-none-any.whl", hash = "sha256:c80be81ef595a1a821ec69dc16099550ed22197615f4320b57cc9ce2a672cb30", size = 63020, upload-time = "2023-11-29T08:44:46.876Z" },
]
[[package]]
name = "distro"
version = "1.9.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/fc/f8/98eea607f65de6527f8a2e8885fc8015d3e6f5775df186e443e0964a11c3/distro-1.9.0.tar.gz", hash = "sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed", size = 60722, upload-time = "2023-12-24T09:54:32.31Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2", size = 20277, upload-time = "2023-12-24T09:54:30.421Z" },
]
[[package]]
name = "h11"
version = "0.16.0"
@@ -110,12 +128,50 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
]
[[package]]
name = "jiter"
version = "0.14.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/6e/c1/0cddc6eb17d4c53a99840953f95dd3accdc5cfc7a337b0e9b26476276be9/jiter-0.14.0.tar.gz", hash = "sha256:e8a39e66dac7153cf3f964a12aad515afa8d74938ec5cc0018adcdae5367c79e", size = 165725, upload-time = "2026-04-10T14:28:42.01Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/4f/1e/354ed92461b165bd581f9ef5150971a572c873ec3b68a916d5aa91da3cc2/jiter-0.14.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:6f396837fc7577871ca8c12edaf239ed9ccef3bbe39904ae9b8b63ce0a48b140", size = 315277, upload-time = "2026-04-10T14:27:18.109Z" },
{ url = "https://files.pythonhosted.org/packages/a6/95/8c7c7028aa8636ac21b7a55faef3e34215e6ed0cbf5ae58258427f621aa3/jiter-0.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:a4d50ea3d8ba4176f79754333bd35f1bbcd28e91adc13eb9b7ca91bc52a6cef9", size = 315923, upload-time = "2026-04-10T14:27:19.603Z" },
{ url = "https://files.pythonhosted.org/packages/47/40/e2a852a44c4a089f2681a16611b7ce113224a80fd8504c46d78491b47220/jiter-0.14.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce17f8a050447d1b4153bda4fb7d26e6a9e74eb4f4a41913f30934c5075bf615", size = 344943, upload-time = "2026-04-10T14:27:21.262Z" },
{ url = "https://files.pythonhosted.org/packages/fc/1f/670f92adee1e9895eac41e8a4d623b6da68c4d46249d8b556b60b63f949e/jiter-0.14.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f4f1c4b125e1652aefbc2e2c1617b60a160ab789d180e3d423c41439e5f32850", size = 369725, upload-time = "2026-04-10T14:27:22.766Z" },
{ url = "https://files.pythonhosted.org/packages/01/2f/541c9ba567d05de1c4874a0f8f8c5e3fd78e2b874266623da9a775cf46e0/jiter-0.14.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:be808176a6a3a14321d18c603f2d40741858a7c4fc982f83232842689fe86dd9", size = 461210, upload-time = "2026-04-10T14:27:24.315Z" },
{ url = "https://files.pythonhosted.org/packages/ce/a9/c31cbec09627e0d5de7aeaec7690dba03e090caa808fefd8133137cf45bc/jiter-0.14.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:26679d58ba816f88c3849306dd58cb863a90a1cf352cdd4ef67e30ccf8a77994", size = 380002, upload-time = "2026-04-10T14:27:26.155Z" },
{ url = "https://files.pythonhosted.org/packages/50/02/3c05c1666c41904a2f607475a73e7a4763d1cbde2d18229c4f85b22dc253/jiter-0.14.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:80381f5a19af8fa9aef743f080e34f6b25ebd89656475f8cf0470ec6157052aa", size = 354678, upload-time = "2026-04-10T14:27:27.701Z" },
{ url = "https://files.pythonhosted.org/packages/7d/97/e15b33545c2b13518f560d695f974b9891b311641bdcf178d63177e8801e/jiter-0.14.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:004df5fdb8ecbd6d99f3227df18ba1a259254c4359736a2e6f036c944e02d7c5", size = 358920, upload-time = "2026-04-10T14:27:29.256Z" },
{ url = "https://files.pythonhosted.org/packages/ad/d2/8b1461def6b96ba44530df20d07ef7a1c7da22f3f9bf1727e2d611077bf1/jiter-0.14.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:cff5708f7ed0fa098f2b53446c6fa74c48469118e5cd7497b4f1cd569ab06928", size = 394512, upload-time = "2026-04-10T14:27:31.344Z" },
{ url = "https://files.pythonhosted.org/packages/e3/88/837566dd6ed6e452e8d3205355afd484ce44b2533edfa4ed73a298ea893e/jiter-0.14.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:2492e5f06c36a976d25c7cc347a60e26d5470178d44cde1b9b75e60b4e519f28", size = 521120, upload-time = "2026-04-10T14:27:33.299Z" },
{ url = "https://files.pythonhosted.org/packages/89/6b/b00b45c4d1b4c031777fe161d620b755b5b02cdade1e316dcb46e4471d63/jiter-0.14.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:7609cfbe3a03d37bfdbf5052012d5a879e72b83168a363deae7b3a26564d57de", size = 553668, upload-time = "2026-04-10T14:27:34.868Z" },
{ url = "https://files.pythonhosted.org/packages/ad/d8/6fe5b42011d19397433d345716eac16728ac241862a2aac9c91923c7509a/jiter-0.14.0-cp314-cp314-win32.whl", hash = "sha256:7282342d32e357543565286b6450378c3cd402eea333fc1ebe146f1fabb306fc", size = 207001, upload-time = "2026-04-10T14:27:36.455Z" },
{ url = "https://files.pythonhosted.org/packages/e5/43/5c2e08da1efad5e410f0eaaabeadd954812612c33fbbd8fd5328b489139d/jiter-0.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:bd77945f38866a448e73b0b7637366afa814d4617790ecd88a18ca74377e6c02", size = 202187, upload-time = "2026-04-10T14:27:38Z" },
{ url = "https://files.pythonhosted.org/packages/aa/1f/6e39ac0b4cdfa23e606af5b245df5f9adaa76f35e0c5096790da430ca506/jiter-0.14.0-cp314-cp314-win_arm64.whl", hash = "sha256:f2d4c61da0821ee42e0cdf5489da60a6d074306313a377c2b35af464955a3611", size = 192257, upload-time = "2026-04-10T14:27:39.504Z" },
{ url = "https://files.pythonhosted.org/packages/05/57/7dbc0ffbbb5176a27e3518716608aa464aee2e2887dc938f0b900a120449/jiter-0.14.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1bf7ff85517dd2f20a5750081d2b75083c1b269cf75afc7511bdf1f9548beb3b", size = 323441, upload-time = "2026-04-10T14:27:41.039Z" },
{ url = "https://files.pythonhosted.org/packages/83/6e/7b3314398d8983f06b557aa21b670511ec72d3b79a68ee5e4d9bff972286/jiter-0.14.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c8ef8791c3e78d6c6b157c6d360fbb5c715bebb8113bc6a9303c5caff012754a", size = 348109, upload-time = "2026-04-10T14:27:42.552Z" },
{ url = "https://files.pythonhosted.org/packages/ae/4f/8dc674bcd7db6dba566de73c08c763c337058baff1dbeb34567045b27cdc/jiter-0.14.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e74663b8b10da1fe0f4e4703fd7980d24ad17174b6bb35d8498d6e3ebce2ae6a", size = 368328, upload-time = "2026-04-10T14:27:44.574Z" },
{ url = "https://files.pythonhosted.org/packages/3b/5f/188e09a1f20906f98bbdec44ed820e19f4e8eb8aff88b9d1a5a497587ff3/jiter-0.14.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1aca29ba52913f78362ec9c2da62f22cdc4c3083313403f90c15460979b84d9b", size = 463301, upload-time = "2026-04-10T14:27:46.717Z" },
{ url = "https://files.pythonhosted.org/packages/ac/f0/19046ef965ed8f349e8554775bb12ff4352f443fbe12b95d31f575891256/jiter-0.14.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:8b39b7d87a952b79949af5fef44d2544e58c21a28da7f1bae3ef166455c61746", size = 378891, upload-time = "2026-04-10T14:27:48.32Z" },
{ url = "https://files.pythonhosted.org/packages/c4/c3/da43bd8431ee175695777ee78cf0e93eacbb47393ff493f18c45231b427d/jiter-0.14.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:78d918a68b26e9fab068c2b5453577ef04943ab2807b9a6275df2a812599a310", size = 360749, upload-time = "2026-04-10T14:27:49.88Z" },
{ url = "https://files.pythonhosted.org/packages/72/26/e054771be889707c6161dbdec9c23d33a9ec70945395d70f07cfea1e9a6f/jiter-0.14.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:b08997c35aee1201c1a5361466a8fb9162d03ae7bf6568df70b6c859f1e654a4", size = 358526, upload-time = "2026-04-10T14:27:51.504Z" },
{ url = "https://files.pythonhosted.org/packages/c3/0f/7bea65ea2a6d91f2bf989ff11a18136644392bf2b0497a1fa50934c30a9c/jiter-0.14.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:260bf7ca20704d58d41f669e5e9fe7fe2fa72901a6b324e79056f5d52e9c9be2", size = 393926, upload-time = "2026-04-10T14:27:53.368Z" },
{ url = "https://files.pythonhosted.org/packages/3c/a1/b1ff7d70deef61ac0b7c6c2f12d2ace950cdeecb4fdc94500a0926802857/jiter-0.14.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:37826e3df29e60f30a382f9294348d0238ef127f4b5d7f5f8da78b5b9e050560", size = 521052, upload-time = "2026-04-10T14:27:55.058Z" },
{ url = "https://files.pythonhosted.org/packages/0b/7b/3b0649983cbaf15eda26a414b5b1982e910c67bd6f7b1b490f3cfc76896a/jiter-0.14.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:645be49c46f2900937ba0eaf871ad5183c96858c0af74b6becc7f4e367e36e06", size = 553716, upload-time = "2026-04-10T14:27:57.269Z" },
{ url = "https://files.pythonhosted.org/packages/97/f8/33d78c83bd93ae0c0af05293a6660f88a1977caef39a6d72a84afab94ce0/jiter-0.14.0-cp314-cp314t-win32.whl", hash = "sha256:2f7877ed45118de283786178eceaf877110abacd04fde31efff3940ae9672674", size = 207957, upload-time = "2026-04-10T14:27:59.285Z" },
{ url = "https://files.pythonhosted.org/packages/d6/ac/2b760516c03e2227826d1f7025d89bf6bf6357a28fe75c2a2800873c50bf/jiter-0.14.0-cp314-cp314t-win_amd64.whl", hash = "sha256:14c0cb10337c49f5eafe8e7364daca5e29a020ea03580b8f8e6c597fed4e1588", size = 204690, upload-time = "2026-04-10T14:28:00.962Z" },
{ url = "https://files.pythonhosted.org/packages/dc/2e/a44c20c58aeed0355f2d326969a181696aeb551a25195f47563908a815be/jiter-0.14.0-cp314-cp314t-win_arm64.whl", hash = "sha256:5419d4aa2024961da9fe12a9cfe7484996735dca99e8e090b5c88595ef1951ff", size = 191338, upload-time = "2026-04-10T14:28:02.853Z" },
]
[[package]]
name = "masforensics"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "httpx", extra = ["socks"] },
{ name = "openai" },
{ name = "pillow" },
{ name = "pytesseract" },
{ name = "pyyaml" },
{ name = "regipy" },
]
@@ -129,6 +185,9 @@ dev = [
[package.metadata]
requires-dist = [
{ name = "httpx", extras = ["socks"], specifier = ">=0.28.1" },
{ name = "openai", specifier = ">=2.36.0" },
{ name = "pillow", specifier = ">=12.2.0" },
{ name = "pytesseract", specifier = ">=0.3.13" },
{ name = "pyyaml" },
{ name = "regipy", specifier = ">=6.2.1" },
]
@@ -139,6 +198,25 @@ dev = [
{ name = "pytest-asyncio", specifier = ">=1.3.0" },
]
[[package]]
name = "openai"
version = "2.36.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "anyio" },
{ name = "distro" },
{ name = "httpx" },
{ name = "jiter" },
{ name = "pydantic" },
{ name = "sniffio" },
{ name = "tqdm" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f4/a1/4d5e84cf51720fc1526cc49e10ac1961abcccb55b0efb3d970db1e9a2728/openai-2.36.0.tar.gz", hash = "sha256:139dea0edd2f1b30c33d46ae1a6929e03906254140318e4608e98fe8c566f2e7", size = 753003, upload-time = "2026-05-07T17:33:17.075Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9d/1c/5d43735b2553baae2a5e899dcbcd0670a86930d993184d72ca909bf11c9b/openai-2.36.0-py3-none-any.whl", hash = "sha256:143f6194b548dbc2c921af1f1b03b9f14c85fed8a75b5b516f5bcc11a2a50c63", size = 1302361, upload-time = "2026-05-07T17:33:15.063Z" },
]
[[package]]
name = "packaging"
version = "26.0"
@@ -148,6 +226,39 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
]
[[package]]
name = "pillow"
version = "12.2.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
{ url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
{ url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
{ url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
{ url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
{ url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
{ url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
{ url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
{ url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
{ url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
{ url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
{ url = "https://files.pythonhosted.org/packages/6a/7a/c253e3c645cd47f1aceea6a8bacdba9991bf45bb7dfe927f7c893e89c93c/pillow-12.2.0-cp314-cp314-win32.whl", hash = "sha256:632ff19b2778e43162304d50da0181ce24ac5bb8180122cbe1bf4673428328c7", size = 6479723, upload-time = "2026-04-01T14:45:17.797Z" },
{ url = "https://files.pythonhosted.org/packages/cd/8b/601e6566b957ca50e28725cb6c355c59c2c8609751efbecd980db44e0349/pillow-12.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:4e6c62e9d237e9b65fac06857d511e90d8461a32adcc1b9065ea0c0fa3a28150", size = 7217400, upload-time = "2026-04-01T14:45:20.529Z" },
{ url = "https://files.pythonhosted.org/packages/d6/94/220e46c73065c3e2951bb91c11a1fb636c8c9ad427ac3ce7d7f3359b9b2f/pillow-12.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:b1c1fbd8a5a1af3412a0810d060a78b5136ec0836c8a4ef9aa11807f2a22f4e1", size = 2554835, upload-time = "2026-04-01T14:45:23.162Z" },
{ url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
{ url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
{ url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
{ url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
{ url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
{ url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
{ url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
{ url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
{ url = "https://files.pythonhosted.org/packages/c9/e4/4b64a97d71b2a83158134abbb2f5bd3f8a2ea691361282f010998f339ec7/pillow-12.2.0-cp314-cp314t-win32.whl", hash = "sha256:6bb77b2dcb06b20f9f4b4a8454caa581cd4dd0643a08bacf821216a16d9c8354", size = 6482084, upload-time = "2026-04-01T14:45:47.568Z" },
{ url = "https://files.pythonhosted.org/packages/ba/13/306d275efd3a3453f72114b7431c877d10b1154014c1ebbedd067770d629/pillow-12.2.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6562ace0d3fb5f20ed7290f1f929cae41b25ae29528f2af1722966a0a02e2aa1", size = 7225152, upload-time = "2026-04-01T14:45:50.032Z" },
{ url = "https://files.pythonhosted.org/packages/ff/6e/cf826fae916b8658848d7b9f38d88da6396895c676e8086fc0988073aaf8/pillow-12.2.0-cp314-cp314t-win_arm64.whl", hash = "sha256:aa88ccfe4e32d362816319ed727a004423aab09c5cea43c01a4b435643fa34eb", size = 2556579, upload-time = "2026-04-01T14:45:52.529Z" },
]
[[package]]
name = "pluggy"
version = "1.6.0"
@@ -157,6 +268,62 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
]
[[package]]
name = "pydantic"
version = "2.13.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "annotated-types" },
{ name = "pydantic-core" },
{ name = "typing-extensions" },
{ name = "typing-inspection" },
]
sdist = { url = "https://files.pythonhosted.org/packages/18/a5/b60d21ac674192f8ab0ba4e9fd860690f9b4a6e51ca5df118733b487d8d6/pydantic-2.13.4.tar.gz", hash = "sha256:c40756b57adaa8b1efeeced5c196f3f3b7c435f90e84ea7f443901bec8099ef6", size = 844775, upload-time = "2026-05-06T13:43:05.343Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fd/7b/122376b1fd3c62c1ed9dc80c931ace4844b3c55407b6fb2d199377c9736f/pydantic-2.13.4-py3-none-any.whl", hash = "sha256:45a282cde31d808236fd7ea9d919b128653c8b38b393d1c4ab335c62924d9aba", size = 472262, upload-time = "2026-05-06T13:43:02.641Z" },
]
[[package]]
name = "pydantic-core"
version = "2.46.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9d/56/921726b776ace8d8f5db44c4ef961006580d91dc52b803c489fafd1aa249/pydantic_core-2.46.4.tar.gz", hash = "sha256:62f875393d7f270851f20523dd2e29f082bcc82292d66db2b64ea71f64b6e1c1", size = 471464, upload-time = "2026-05-06T13:37:06.98Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8d/74/228a26ddad29c6672b805d9fd78e8d251cd04004fa7eed0e622096cd0250/pydantic_core-2.46.4-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:428e04521a40150c85216fc8b85e8d39fece235a9cf5e383761238c7fa9b96fb", size = 2102079, upload-time = "2026-05-06T13:38:41.019Z" },
{ url = "https://files.pythonhosted.org/packages/ad/1f/8970b150a4b4365623ae00fc88603491f763c627311ae8031e3111356d6e/pydantic_core-2.46.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:23ace664830ee0bfe014a0c7bc248b1f7f25ed7ad103852c317624a1083af462", size = 1952179, upload-time = "2026-05-06T13:36:59.812Z" },
{ url = "https://files.pythonhosted.org/packages/95/30/5211a831ae054928054b2f79731661087a2bc5c01e825c672b3a4a8f1b3e/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ce5c1d2a8b27468f433ca974829c44060b8097eedc39933e3c206a90ee49c4a9", size = 1978926, upload-time = "2026-05-06T13:37:39.933Z" },
{ url = "https://files.pythonhosted.org/packages/57/e9/689668733b1eb67adeef047db3c2e8788fcf65a7fd9c9e2b46b7744fe245/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:7283d57845ecf5a163403eb0702dfc220cc4fbdd18919cb5ccea4f95ee1cdab4", size = 2046785, upload-time = "2026-05-06T13:38:01.995Z" },
{ url = "https://files.pythonhosted.org/packages/60/d9/6715260422ff50a2109878fd24d948a6c3446bb2664f34ee78cd972b3acd/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8daafc69c93ee8a0204506a3b6b30f586ef54028f52aeeeb5c4cfc5184fd5914", size = 2228733, upload-time = "2026-05-06T13:40:50.371Z" },
{ url = "https://files.pythonhosted.org/packages/18/ae/fdb2f64316afca925640f8e70bb1a564b0ec2721c1389e25b8eb4bf9a299/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:cd2213145bcc2ba85884d0ac63d222fece9209678f77b9b4d76f054c561adb28", size = 2307534, upload-time = "2026-05-06T13:37:21.531Z" },
{ url = "https://files.pythonhosted.org/packages/89/1d/8eff589b45bb8190a9d12c49cfad0f176a5cbd1534908a6b5125e2886239/pydantic_core-2.46.4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7a5f930472650a82629163023e630d160863fce524c616f4e5186e5de9d9a49b", size = 2099732, upload-time = "2026-05-06T13:39:31.942Z" },
{ url = "https://files.pythonhosted.org/packages/06/d5/ee5a3366637fee41dee51a1fc91562dcf12ddbc68fda34e6b253da2324bb/pydantic_core-2.46.4-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:c1b3f518abeca3aa13c712fd202306e145abf59a18b094a6bafb2d2bbf59192c", size = 2129627, upload-time = "2026-05-06T13:37:25.033Z" },
{ url = "https://files.pythonhosted.org/packages/94/33/2414be571d2c6a6c4d08be21f9292b6d3fdb08949a97b6dfe985017821db/pydantic_core-2.46.4-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1a7dd0b3ee80d90150e3495a3a13ac34dbcbfd4f012996a6a1d8900e91b5c0fb", size = 2179141, upload-time = "2026-05-06T13:37:14.046Z" },
{ url = "https://files.pythonhosted.org/packages/7b/79/7daa95be995be0eecc4cf75064cb33f9bbbfe3fe0158caf2f0d4a996a5c7/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:3fb702cd90b0446a3a1c5e470bfa0dd23c0233b676a9099ddcc964fa6ca13898", size = 2184325, upload-time = "2026-05-06T13:36:53.615Z" },
{ url = "https://files.pythonhosted.org/packages/9f/cb/d0a382f5c0de8a222dc61c65348e0ce831b1f68e0a018450d31c2cace3a5/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:b8458003118a712e66286df6a707db01c52c0f52f7db8e4a38f0da1d3b94fc4e", size = 2323990, upload-time = "2026-05-06T13:40:29.971Z" },
{ url = "https://files.pythonhosted.org/packages/05/db/d9ba624cc4a5aced1598e88c04fdbd8310c8a69b9d38b9a3d39ce3a61ed7/pydantic_core-2.46.4-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:372429a130e469c9cd698925ce5fc50940b7a1336b0d82038e63d5bbc4edc519", size = 2369978, upload-time = "2026-05-06T13:37:23.027Z" },
{ url = "https://files.pythonhosted.org/packages/f2/20/d15df15ba918c423461905802bfd2981c3af0bfa0e40d05e13edbfa48bc3/pydantic_core-2.46.4-cp314-cp314-win32.whl", hash = "sha256:85bb3611ff1802f3ee7fdd7dbff26b56f343fb432d57a4728fdd49b6ef35e2f4", size = 1966354, upload-time = "2026-05-06T13:38:03.499Z" },
{ url = "https://files.pythonhosted.org/packages/fc/b6/6b8de4c0a7d7ab3004c439c80c5c1e0a3e8d78bbae19379b01960383d9e5/pydantic_core-2.46.4-cp314-cp314-win_amd64.whl", hash = "sha256:811ff8e9c313ab425368bcbb36e5c4ebd7108c2bbf4e4089cfbb0b01eff63fac", size = 2072238, upload-time = "2026-05-06T13:39:40.807Z" },
{ url = "https://files.pythonhosted.org/packages/32/36/51eb763beec1f4cf59b1db243a7dcc39cbb41230f050a09b9d69faaf0a48/pydantic_core-2.46.4-cp314-cp314-win_arm64.whl", hash = "sha256:bfec22eab3c8cc2ceec0248aec886624116dc079afa027ecc8ad4a7e62010f8a", size = 2018251, upload-time = "2026-05-06T13:37:26.72Z" },
{ url = "https://files.pythonhosted.org/packages/e8/91/855af51d625b23aa987116a19e231d2aaef9c4a415273ddc189b79a45fee/pydantic_core-2.46.4-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:af8244b2bef6aaad6d92cda81372de7f8c8d36c9f0c3ea36e827c60e7d9467a0", size = 2099593, upload-time = "2026-05-06T13:39:47.682Z" },
{ url = "https://files.pythonhosted.org/packages/fb/1b/8784a54c65edb5f49f0a14d6977cf1b209bba85a4c77445b255c2de58ab3/pydantic_core-2.46.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:5a4330cdbc57162e4b3aa303f588ba752257694c9c9be3e7ebb11b4aca659b5d", size = 1935226, upload-time = "2026-05-06T13:40:40.428Z" },
{ url = "https://files.pythonhosted.org/packages/e8/e7/1955d28d1afc56dd4b3ad7cc0cf39df1b9852964cf16e5d13912756d6d6b/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:29c61fc04a3d840155ff08e475a04809278972fe6aef51e2720554e96367e34b", size = 1974605, upload-time = "2026-05-06T13:37:32.029Z" },
{ url = "https://files.pythonhosted.org/packages/93/e2/3fedbf0ba7a22850e6e9fd78117f1c0f10f950182344d8a6c535d468fdd8/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:c50f2528cf200c5eed56faf3f4e22fcd5f38c157a8b78576e6ba3168ec35f000", size = 2030777, upload-time = "2026-05-06T13:38:55.239Z" },
{ url = "https://files.pythonhosted.org/packages/f8/61/46be275fcaaba0b4f5b9669dd852267ce1ff616592dccf7a7845588df091/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:0cbe8b01f948de4286c74cdd6c667aceb38f5c1e26f0693b3983d9d74887c65e", size = 2236641, upload-time = "2026-05-06T13:37:08.096Z" },
{ url = "https://files.pythonhosted.org/packages/60/db/12e93e46a8bac9988be3c016860f83293daea8c716c029c9ace279036f2f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:617d7e2ca7dcb8c5cf6bcb8c59b8832c94b36196bbf1cbd1bfb56ed341905edd", size = 2286404, upload-time = "2026-05-06T13:40:20.221Z" },
{ url = "https://files.pythonhosted.org/packages/e2/4a/4d8b19008f38d31c53b8219cfedc2e3d5de5fe99d90076b7e767de29274f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7027560ee92211647d0d34e3f7cd6f50da56399d26a9c8ad0da286d3869a53f3", size = 2109219, upload-time = "2026-05-06T13:38:12.153Z" },
{ url = "https://files.pythonhosted.org/packages/88/70/3cbc40978fefb7bb09c6708d40d4ad1a5d70fd7213c3d17f971de868ec1f/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:f99626688942fb746e545232e7726926f3be91b5975f8b55327665fafda991c7", size = 2110594, upload-time = "2026-05-06T13:40:02.971Z" },
{ url = "https://files.pythonhosted.org/packages/9d/20/b8d36736216e29491125531685b2f9e61aa5b4b2599893f8268551da3338/pydantic_core-2.46.4-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:fc3e9034a63de20e15e8ade85358bc6efc614008cab72898b4b4952bea0509ff", size = 2159542, upload-time = "2026-05-06T13:39:27.506Z" },
{ url = "https://files.pythonhosted.org/packages/1d/a2/367df868eb584dacf6bf82a389272406d7178e301c4ac82545ab98bc2dd9/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:97e7cf2be5c77b7d1a9713a05605d49460d02c6078d38d8bef3cbe323c548424", size = 2168146, upload-time = "2026-05-06T13:38:31.93Z" },
{ url = "https://files.pythonhosted.org/packages/c1/b8/4460f77f7e201893f649a29ab355dddd3beee8a97bcb1a320db414f9a06e/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:3bf92c5d0e00fefaab325a4d27828fe6b6e2a21848686b5b60d2d9eeb09d76c6", size = 2306309, upload-time = "2026-05-06T13:37:44.717Z" },
{ url = "https://files.pythonhosted.org/packages/64/c4/be2639293acd87dc8ddbcec41a73cee9b2ebf996fe6d892a1a74e88ad3f7/pydantic_core-2.46.4-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:3ecbc122d18468d06ca279dc26a8c2e2d5acb10943bb35e36ae92096dc3b5565", size = 2369736, upload-time = "2026-05-06T13:37:05.645Z" },
{ url = "https://files.pythonhosted.org/packages/30/a6/9f9f380dbb301f67023bf8f707aaa75daadf84f7152d95c410fd7e81d994/pydantic_core-2.46.4-cp314-cp314t-win32.whl", hash = "sha256:e846ae7835bf0703ae43f534ab79a867146dadd59dc9ca5c8b53d5c8f7c9ef02", size = 1955575, upload-time = "2026-05-06T13:38:51.116Z" },
{ url = "https://files.pythonhosted.org/packages/40/1f/f1eb9eb350e795d1af8586289746f5c5677d16043040d63710e22abc43c9/pydantic_core-2.46.4-cp314-cp314t-win_amd64.whl", hash = "sha256:2108ba5c1c1eca18030634489dc544844144ee36357f2f9f780b93e7ddbb44b5", size = 2051624, upload-time = "2026-05-06T13:38:21.672Z" },
{ url = "https://files.pythonhosted.org/packages/f6/d2/42dd53d0a85c27606f316d3aa5d2869c4e8470a5ed6dec30e4a1abe19192/pydantic_core-2.46.4-cp314-cp314t-win_arm64.whl", hash = "sha256:4fcbe087dbc2068af7eda3aa87634eba216dbda64d1ae73c8684b621d33f6596", size = 2017325, upload-time = "2026-05-06T13:40:52.723Z" },
]
[[package]]
name = "pygments"
version = "2.20.0"
@@ -166,6 +333,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151, upload-time = "2026-03-29T13:29:30.038Z" },
]
[[package]]
name = "pytesseract"
version = "0.3.13"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "packaging" },
{ name = "pillow" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9f/a6/7d679b83c285974a7cb94d739b461fa7e7a9b17a3abfd7bf6cbc5c2394b0/pytesseract-0.3.13.tar.gz", hash = "sha256:4bf5f880c99406f52a3cfc2633e42d9dc67615e69d8a509d74867d3baddb5db9", size = 17689, upload-time = "2024-08-16T02:33:56.762Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7a/33/8312d7ce74670c9d39a532b2c246a853861120486be9443eebf048043637/pytesseract-0.3.13-py3-none-any.whl", hash = "sha256:7a99c6c2ac598360693d83a416e36e0b33a67638bb9d77fdcac094a3589d4b34", size = 14705, upload-time = "2024-08-16T02:36:10.09Z" },
]
[[package]]
name = "pytest"
version = "9.0.2"
@@ -243,6 +423,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/65/eb/db13ab9b8d54e04f42b6619acca417ee37b07eb141a54884d13d20d7459e/regipy-6.2.1-py3-none-any.whl", hash = "sha256:b03110e5c4e12385e1ba53c032ccd120c6dcde1b71afb8c3b7aa4717a5a24e43", size = 134861, upload-time = "2026-01-22T15:26:05.653Z" },
]
[[package]]
name = "sniffio"
version = "1.3.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a2/87/a6771e1546d97e7e041b6ae58d80074f81b7d5121207425c964ddf5cfdbd/sniffio-1.3.1.tar.gz", hash = "sha256:f4324edc670a0f49750a81b895f35c3adb843cca46f0530f79fc1babb23789dc", size = 20372, upload-time = "2024-02-25T23:20:04.057Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl", hash = "sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2", size = 10235, upload-time = "2024-02-25T23:20:01.196Z" },
]
[[package]]
name = "socksio"
version = "1.0.0"
@@ -251,3 +440,36 @@ sdist = { url = "https://files.pythonhosted.org/packages/f8/5c/48a7d9495be3d1c65
wheels = [
{ url = "https://files.pythonhosted.org/packages/37/c3/6eeb6034408dac0fa653d126c9204ade96b819c936e136c5e8a6897eee9c/socksio-1.0.0-py3-none-any.whl", hash = "sha256:95dc1f15f9b34e8d7b16f06d74b8ccf48f609af32ab33c608d08761c5dcbb1f3", size = 12763, upload-time = "2020-04-17T15:50:31.878Z" },
]
[[package]]
name = "tqdm"
version = "4.67.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
]
[[package]]
name = "typing-extensions"
version = "4.15.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
]
[[package]]
name = "typing-inspection"
version = "0.4.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/55/e3/70399cb7dd41c10ac53367ae42139cf4b1ca5f36bb3dc6c9d33acdb43655/typing_inspection-0.4.2.tar.gz", hash = "sha256:ba561c48a67c5958007083d386c3295464928b01faa735ab8547c5692e87f464", size = 75949, upload-time = "2025-10-01T02:14:41.687Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/dc/9b/47798a6c91d8bdb567fe2698fe81e0c6b7cb7ef4d13da4114b41d239f65d/typing_inspection-0.4.2-py3-none-any.whl", hash = "sha256:4ed1cacbdc298c220f1bd249ed5287caa16f34d44ef4e9c3d0cbad5b521545e7", size = 14611, upload-time = "2025-10-01T02:14:40.154Z" },
]