feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status

DESIGN_STRATEGIST.md §2. Four read-only view tools that the strategist uses to ground its decision each round.

graph_overview(): hypotheses table (log_odds, conf, edges_in, distinct_sources, recent_flip), sources table, and pending leads. distinct_sources is the critical signal: a hypothesis with 23 edges but only 1 distinct source has fragile cross-source independence, which makes it a candidate for a corroboration-seeking lead.

source_coverage(src): per-source ✓/✗ against an expected-artefact catalogue. The catalogue is a set of heuristic hints, NOT a forced checklist. A footer reminds the strategist to investigate ✗ items only when an active hypothesis depends on them. This is the "able to work the checklist, but not bound to it" guardrail.

marginal_yield(N): new phenomena / edges / status flips per round over the last N rounds. Two consecutive zero-yield rounds are a strong signal to declare the investigation complete.

budget_status(): usage vs caps (tool_calls, rounds, wall clock), with pacing warnings at 70% / 90%.

tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type table of (name, detector, value_for) entries. Detectors are substring patterns matched against tool name + args; the matcher resolves them at call time against graph.tool_invocations. The catalogue covers iOS, Android, Windows disk, media-collection, and archive source types.

All four tools are registered in tool_registry and listed as read-only in llm_client.READ_ONLY_TOOLS so they can run in parallel. They go through the invocation-logging wrapper, so the strategist's reads are themselves auditable. The wrapper does NOT cache them, because graph state changes between calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
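The decision signals described above are simple enough to sketch. The block below is a minimal, hypothetical illustration, not the tools/strategy.py implementation: `Invocation`, `Artefact`, `artefact_touched`, `coverage_lines`, `is_fragile`, `should_declare_complete`, and `pacing_hint` are invented names, and the data shapes are assumed. It only mirrors the stated behaviour: substring detectors resolved at call time against logged invocations, the single-distinct-source fragility check (the `edges_in > 1` threshold is an assumption), the two-consecutive-zero-yield stop rule, and 70% / 90% pacing thresholds.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Invocation:
    """Hypothetical stand-in for a logged ToolInvocation on the graph."""
    tool_name: str
    args: dict


@dataclass
class Artefact:
    """Hypothetical (name, detector, value_for) catalogue entry."""
    name: str
    detector: str   # substring pattern matched against tool name + arg values
    value_for: str  # free-text note on what the artefact is useful for


def artefact_touched(artefact: Artefact, invocations: Sequence[Invocation]) -> bool:
    """Resolve a detector at call time: any logged call whose tool name or
    argument values contain the detector substring counts as coverage."""
    needle = artefact.detector.lower()
    for inv in invocations:
        haystack = " ".join([inv.tool_name, *map(str, inv.args.values())]).lower()
        if needle in haystack:
            return True
    return False


def coverage_lines(artefacts: Sequence[Artefact], invocations: Sequence[Invocation]) -> list:
    """Render one ✓/✗ line per artefact. These are hints, not a checklist."""
    return [
        f"{'✓' if artefact_touched(a, invocations) else '✗'} {a.name}"
        for a in artefacts
    ]


def is_fragile(edges_in: int, distinct_sources: int) -> bool:
    """Several supporting edges but only one distinct source: weak independence,
    so the hypothesis is a candidate for a corroboration-seeking lead.
    (The edges_in > 1 threshold is an assumption for this sketch.)"""
    return edges_in > 1 and distinct_sources == 1


def should_declare_complete(per_round_yield: Sequence[tuple]) -> bool:
    """per_round_yield holds (new_phenomena, new_edges, status_flips) per round,
    oldest first. True only if the last two rounds both yielded nothing."""
    last_two = per_round_yield[-2:]
    return len(last_two) == 2 and all(sum(r) == 0 for r in last_two)


def pacing_hint(used: float, cap: float) -> Optional[str]:
    """Return a pacing warning once usage crosses 70% or 90% of a cap."""
    if cap <= 0:
        return None
    fraction = used / cap
    if fraction >= 0.9:
        return "over 90% of cap: wind down and prepare to declare complete"
    if fraction >= 0.7:
        return "over 70% of cap: prioritise the highest-value leads"
    return None


if __name__ == "__main__":
    calls = [Invocation("sqlite_query", {"path": "/private/var/mobile/Library/SMS/sms.db"})]
    catalogue = [Artefact("SMS", "sms.db", "messaging"), Artefact("Photos", "photos.sqlite", "media")]
    print(coverage_lines(catalogue, calls))  # ['✓ SMS', '✗ Photos']
```

Resolving detectors against the invocation log at call time, rather than storing a coverage flag, keeps the report consistent with whatever has actually been run, which fits the no-caching rule for these view tools.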
2026-05-21 02:19:54 -10:00
parent ca96f29849
commit 6ebbc675c1
4 changed files with 660 additions and 0 deletions
--- a/tool_registry.py
+++ b/tool_registry.py
@@ -24,6 +24,7 @@ from tools import mobile_ios as ios
 from tools import parsers
 from tools import registry as reg
 from tools import sleuthkit as tsk
+from tools import strategy as strat

 logger = logging.getLogger(__name__)

@@ -985,6 +986,97 @@ def register_all_tools(graph: Any) -> None:
        tags=["media", "ocr", "image"],
    )

+    # ---- Strategist-loop view tools (DESIGN_STRATEGIST.md §2) ----
+    # Pure read-only renders over graph state. The strategist agent uses
+    # these to decide whether to keep investigating or to declare complete.
+    # They go through invocation logging like every other tool (so the
+    # strategist's reads are auditable) but are NOT cacheable — graph
+    # state changes between calls and a stale snapshot would mislead.
+
+    async def _exec_graph_overview() -> str:
+        return strat.graph_overview(graph)
+
+    TOOL_CATALOG["graph_overview"] = ToolDefinition(
+        name="graph_overview",
+        description=(
+            "Top-level investigation state: hypotheses (with log-odds, "
+            "confidence, edges_in, distinct_sources contributing, recent "
+            "status flips), sources (phenomena/identity counts, last-touched "
+            "round), and pending leads. Always call this first when deciding "
+            "the next strategist action."
+        ),
+        input_schema={"type": "object", "properties": {}},
+        executor=_exec_graph_overview,
+        module="strategy",
+        tags=["strategy", "overview", "read-only"],
+    )
+
+    async def _exec_source_coverage(source_id: str) -> str:
+        return strat.source_coverage(graph, source_id)
+
+    TOOL_CATALOG["source_coverage"] = ToolDefinition(
+        name="source_coverage",
+        description=(
+            "Per-source artefact coverage report: which expected categories "
+            "have been touched (✓) vs not (✗) on the given source. Coverage "
+            "items are heuristic hints, not requirements — investigate ✗ "
+            "items only when an active hypothesis depends on them."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "source_id": {"type": "string", "description": "Source id, e.g. 'src-ios-chan'."},
+            },
+            "required": ["source_id"],
+        },
+        executor=_exec_source_coverage,
+        module="strategy",
+        tags=["strategy", "coverage", "read-only"],
+    )
+
+    async def _exec_marginal_yield(last_n_rounds: int = 2) -> str:
+        return strat.marginal_yield(graph, int(last_n_rounds))
+
+    TOOL_CATALOG["marginal_yield"] = ToolDefinition(
+        name="marginal_yield",
+        description=(
+            "How much information the last N investigation rounds added: "
+            "new phenomena, new edges, and hypothesis status flips per round. "
+            "Two consecutive zero-yield rounds signal decisive diminishing "
+            "returns: call declare_investigation_complete with reason "
+            "marginal_yield_zero."
+        ),
+        input_schema={
+            "type": "object",
+            "properties": {
+                "last_n_rounds": {"type": "integer", "description": "How many recent rounds to summarise (default 2)."},
+            },
+        },
+        executor=_exec_marginal_yield,
+        module="strategy",
+        tags=["strategy", "yield", "read-only"],
+    )
+
+    async def _exec_budget_status() -> str:
+        return strat.budget_status(
+            graph,
+            getattr(graph, "budgets", None),
+            getattr(graph, "run_start_monotonic", None),
+        )
+
+    TOOL_CATALOG["budget_status"] = ToolDefinition(
+        name="budget_status",
+        description=(
+            "Budget vs caps: tool_calls, strategist_rounds, wall_clock_minutes. "
+            "Includes pacing hints when usage crosses 70% / 90% thresholds. "
+            "Use this to decide whether to keep proposing leads or to wind down."
+        ),
+        input_schema={"type": "object", "properties": {}},
+        executor=_exec_budget_status,
+        module="strategy",
+        tags=["strategy", "budget", "read-only"],
+    )
+
    # ---- Wrap every executor with invocation logging (+ cache + auto-record) ----
    # Must run AFTER all tools are registered. Every tool call now produces
    # a ToolInvocation entry on the graph (provenance for grounding), and