feat(strategist) S2: graph_overview / source_coverage / marginal_yield / budget_status

DESIGN_STRATEGIST.md §2. Four read-only view tools the strategist uses to ground its decision each round.

graph_overview(): hypotheses table (log_odds, conf, edges_in, distinct_sources, recent_flip), sources table, pending leads. distinct_sources is the critical signal: a hypothesis with 23 edges but only 1 distinct_source has fragile cross-source independence and is a candidate for a corroboration-seeking lead.

source_coverage(src): per-source ✓/✗ against an expected-artefact catalogue. The catalogue is a set of heuristic hints, NOT a forced checklist. The footer reminds the strategist to investigate ✗ items only when an active hypothesis depends on them. This is the "checklist skill is available, but the strategist is not bound to it" guardrail.

marginal_yield(N): new phenomena / edges / status flips per recent round. Two consecutive zero-yield rounds are a strong signal to declare the investigation complete.

budget_status(): usage vs caps (tool_calls, rounds, wall clock). Pacing warnings at 70% / 90%.

tools/strategy.py also exports EXPECTED_ARTEFACTS, a per-source-type table of (name, detector, value_for) entries. Detectors are substring patterns on tool name + args; the matcher resolves at call time against graph.tool_invocations. The catalogue covers iOS / Android / Windows disk / media-collection / archive source types.

All four tools are registered in tool_registry and listed as read-only in llm_client.READ_ONLY_TOOLS for parallel execution. They go through the invocation-logging wrapper so the strategist's reads are themselves auditable (the wrapper does NOT cache them, because graph state changes between calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 02:19:54 -10:00
parent ca96f29849
commit 6ebbc675c1
4 changed files with 660 additions and 0 deletions
--- a/tests/test_optimizations.py
+++ b/tests/test_optimizations.py
@@ -3214,3 +3214,84 @@ class TestInvestigationRound:
        assert r.decision_rationale == "probe complete"
        assert hid in r.hypothesis_status_snapshot_before

+    @pytest.mark.asyncio
+    async def test_strategy_tool_helpers(self):
+        """Smoke-test the strategy tool renders against a small graph.
+        Renders are markdown strings — we assert structural anchors rather
+        than full text so the test is robust to wording tweaks.
+        """
+        from tools import strategy
+        from case import Case, EvidenceSource
+
+        graph = EvidenceGraph()
+        src = EvidenceSource(
+            id="src-test", label="test iOS", type="mobile_extraction",
+            access_mode="tree", path="/tmp/x",
+        )
+        graph.case = Case(case_id="c", name="c", sources=[src])
+        graph.set_active_source(src)
+        graph._current_agent = "ios_artifact"
+        graph._current_task_id = "task-1"
+
+        # graph_overview on empty graph still renders.
+        ov = strategy.graph_overview(graph)
+        assert "# Investigation State" in ov
+        assert "_(none yet" in ov  # hypotheses section
+        assert "src-test" in ov
+
+        # Source coverage — every entry should be ✗ initially.
+        cov = strategy.source_coverage(graph, "src-test")
+        assert "Coverage: **0/" in cov
+        assert "✗" in cov
+        assert "Coverage hints are heuristics" in cov
+
+        # Record one invocation that matches the AddressBook detector.
+        await graph.record_tool_invocation(
+            tool="sqlite_query",
+            args={"db_path": "/x/var/mobile/Library/AddressBook/AddressBook.sqlitedb"},
+            output="contact list",
+        )
+        cov2 = strategy.source_coverage(graph, "src-test")
+        assert ("Coverage: **1/" in cov2) or ("Coverage: **2/" in cov2)
+
+        # marginal_yield: no rounds → empty render.
+        my = strategy.marginal_yield(graph)
+        assert "no completed investigation rounds" in my
+
+        # budget_status with no budgets shows unbounded.
+        bs = strategy.budget_status(graph, None, None)
+        assert "tool_calls" in bs
+        assert "(unbounded)" in bs
+
+        # budget_status with budgets + pacing hint.
+        bs2 = strategy.budget_status(
+            graph,
+            {"tool_calls_total": 1, "strategist_rounds_max": 1},
+            None,
+        )
+        assert "≥ 90%" in bs2  # already over 90% (1 of 1 tool calls used)
+
+    @pytest.mark.asyncio
+    async def test_marginal_yield_after_two_rounds(self):
+        """Verify marginal_yield captures phenomena/edge/status deltas."""
+        from tools import strategy
+
+        graph = EvidenceGraph()
+        hid = await graph.add_hypothesis("h", "d")
+
+        rid1 = await graph.start_investigation_round(1)
+        pid, _ = await graph.add_phenomenon(
+            "fs", "filesystem", "p1", "interp", source_tool="t",
+        )
+        await graph.update_hypothesis_confidence(hid, pid, "direct_evidence", "")
+        await graph.complete_investigation_round(rid1)
+
+        rid2 = await graph.start_investigation_round(2)
+        await graph.complete_investigation_round(rid2)
+
+        out = strategy.marginal_yield(graph, last_n_rounds=2)
+        assert "R1" in out and "R2" in out
+        assert "Trend" in out
+        assert ("collapsed" in out or "Decelerating" in out
+                or "Diminishing" in out or "diminishing" in out)
+
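The commit message describes detectors as substring patterns matched against tool name + args, resolved at call time against the recorded invocations. The following is a minimal, self-contained sketch of that idea, not the code in tools/strategy.py (which this diff does not show). The `ArtefactHint` type, the two-entry `CATALOGUE`, the dict shape of an invocation, and the helper names `haystack`, `coverage`, and `render` are all assumptions made for illustration; only the `Coverage: **n/` prefix and the ✓/✗ marks are taken from the test assertions above.

```python
import json
from typing import NamedTuple


class ArtefactHint(NamedTuple):
    """Hypothetical stand-in for one (name, detector, value_for) entry."""
    name: str       # human-readable artefact label
    detector: str   # substring searched for in "tool name + args"
    value_for: str  # what the artefact is useful for


# Hypothetical two-entry catalogue; the real EXPECTED_ARTEFACTS is not shown here.
CATALOGUE = [
    ArtefactHint("Contacts", "AddressBook.sqlitedb", "identity / contacts"),
    ArtefactHint("SMS", "sms.db", "communications"),
]


def haystack(invocation: dict) -> str:
    """Flatten one invocation into the text a detector is matched against."""
    return invocation["tool"] + " " + json.dumps(invocation["args"], sort_keys=True)


def coverage(catalogue, invocations):
    """Return (name, covered) pairs, recomputed from the invocations each call."""
    stacks = [haystack(inv) for inv in invocations]
    return [
        (hint.name, any(hint.detector in text for text in stacks))
        for hint in catalogue
    ]


def render(rows) -> str:
    """Render a small markdown summary with ✓/✗ per catalogue entry."""
    hit = sum(1 for _, covered in rows if covered)
    lines = [f"Coverage: **{hit}/{len(rows)}**"]
    lines += [f"- {'✓' if covered else '✗'} {name}" for name, covered in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    recorded = [{
        "tool": "sqlite_query",
        "args": {"db_path": "/x/var/mobile/Library/AddressBook/AddressBook.sqlitedb"},
    }]
    print(render(coverage(CATALOGUE, recorded)))
```

Because coverage is derived from the invocation list on every call instead of being stored, a sketch like this stays consistent with the note that reads are not cached: recording one more invocation changes the next render without any extra bookkeeping.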