refactor: native tool calling + generic forced-retry + terminal exit

- llm_client: switch tool_call_loop from text-based <tool_call> regex parsing to OpenAI-native tools=[...] and the structured tool_calls field; accumulate delta.reasoning_content for DeepSeek thinking-mode echo-back; fold preserves the system message and aligns its boundary so it never orphans a role:tool message
- base_agent: generic forced retry via the mandatory_record_tools class attr (filesystem -> add_phenomenon, timeline -> add_temporal_edge, hypothesis -> add_hypothesis, report -> save_report); calls are counted via an executor wrapper
- terminal_tools class attr + loop short-circuit: when a terminal tool is called, the loop exits with its raw return as final_text. ReportAgent declares save_report as terminal. This replaces the <answer>-tag stop signal that native tool calling broke
- _execute_*: return (raw, formatted). Terminal exit uses the untruncated raw; conversation history uses the 3000-char-capped formatted
- evidence_graph + orchestrator: LLM-derived InvestigationArea support (hypothesis-driven coverage check, replaces hardcoded _AREA_KEYWORDS / _AREA_TOOLS); manual yaml block kept as optional seed
- strip <answer> references from agent prompts (no longer load-bearing)

Verified on CFReDS image across 4 smoke runs: 0 JSON parse failures (was 3); 22 temporal edges from Phase 4 (was 0); ReportAgent exits via save_report (was max_iterations regression). 78/78 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
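
The llm_client side of the terminal-tool exit is not part of the diff shown here, so the following is only a minimal sketch of how the short-circuit and the (raw, formatted) split described in the message could fit together. The names execute_tool and run_tool_calls, the shape of the tool-call dicts, and the plain slice used for truncation are assumptions made for illustration; only the 3000-character cap, terminal_tools, and final_text come from the commit message.

```python
import asyncio
from typing import Awaitable, Callable, Optional

MAX_HISTORY_CHARS = 3000  # cap mentioned in the commit message

# Hypothetical executor type: takes tool arguments, returns the raw result text.
Executor = Callable[[dict], Awaitable[str]]


async def execute_tool(executor: Executor, args: dict) -> tuple[str, str]:
    """Return (raw, formatted): raw is untruncated, formatted is capped for history."""
    raw = await executor(args)
    formatted = raw[:MAX_HISTORY_CHARS]  # assumed truncation strategy: plain slice
    return raw, formatted


async def run_tool_calls(
    tool_calls: list[dict],
    executors: dict[str, Executor],
    terminal_tools: tuple[str, ...] = (),
) -> tuple[Optional[str], list[dict]]:
    """Execute one assistant turn's tool calls.

    Returns (final_text, tool_messages). final_text is the raw result of the
    first terminal tool that was called, or None if none was called.
    """
    tool_messages: list[dict] = []
    for call in tool_calls:
        raw, formatted = await execute_tool(executors[call["name"]], call["arguments"])
        # History only ever sees the capped text.
        tool_messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": formatted}
        )
        if call["name"] in terminal_tools:
            # Short-circuit: the caller should stop looping and return raw.
            return raw, tool_messages
    return None, tool_messages
```

In this sketch, tool calls that follow a terminal tool in the same turn are simply skipped, which is acceptable only because the loop is about to exit; whether the real tool_call_loop behaves the same way is not stated in the commit.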
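
The message also says that fold keeps the system message and never orphans a role:tool message. The fold implementation is not in this diff, so the sketch below only illustrates the boundary rule under a loud assumption: that folding means dropping the oldest non-system messages and keeping the last keep_last. The function name fold_messages and the keep_last parameter are hypothetical, and the real fold may summarize history or move the boundary the other way.

```python
def fold_messages(messages: list[dict], keep_last: int) -> list[dict]:
    """Trim old history, keep system messages, never start on an orphaned tool result."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return system + rest

    cut = len(rest) - keep_last
    # A role:tool message is only valid after the assistant message that
    # carried the matching tool_calls. If the cut lands on tool results whose
    # assistant message is being dropped, move the boundary past them.
    while cut < len(rest) and rest[cut]["role"] == "tool":
        cut += 1
    return system + rest[cut:]
```

Moving the boundary forward drops the orphaned results; an equally plausible design moves it backward to keep the assistant message and all of its results.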
@@ -31,11 +31,26 @@ class BaseAgent:
     name: str = "base"
     role: str = "A forensic analysis agent."
 
+    # Tools the agent MUST invoke at least once for the run to count as productive.
+    # If none of these were called when tool_call_loop returns, run() fires a
+    # forced retry with an explicit "you forgot to record" instruction.
+    # Subclasses override to declare their own recording responsibility
+    # (timeline → add_temporal_edge, hypothesis → add_hypothesis, report → save_report).
+    mandatory_record_tools: tuple[str, ...] = ("add_phenomenon",)
+
+    # Tools whose invocation ends the run immediately. After any terminal tool
+    # is called, tool_call_loop returns with that tool's result text as
+    # final_text. Used by agents whose "completion" is a single explicit
+    # action rather than "model decides to stop calling tools". For multi-call
+    # agents (filesystem records many phenomena) leave empty.
+    terminal_tools: tuple[str, ...] = ()
+
     def __init__(self, llm: LLMClient, graph: EvidenceGraph) -> None:
         self.llm = llm
         self.graph = graph
         self._tools: dict[str, dict] = {}  # name -> schema
         self._executors: dict[str, Any] = {}  # name -> async callable
+        self._record_call_counts: dict[str, int] = {}
         self._work_log: list[str] = []
         self._current_lead_id: str | None = None
 
@@ -52,7 +67,18 @@ class BaseAgent:
             "description": description,
             "input_schema": input_schema,
         }
-        self._executors[name] = executor
+        if name in self.mandatory_record_tools:
+            self._executors[name] = self._wrap_record_executor(name, executor)
+        else:
+            self._executors[name] = executor
+
+    def _wrap_record_executor(self, name: str, executor: Any) -> Any:
+        """Wrap a mandatory-record executor to count successful invocations."""
+        async def wrapped(*args, **kwargs):
+            result = await executor(*args, **kwargs)
+            self._record_call_counts[name] = self._record_call_counts.get(name, 0) + 1
+            return result
+        return wrapped
 
     def get_tool_definitions(self) -> list[dict]:
         """Get tool definitions in Claude API format."""
@@ -91,20 +117,19 @@ class BaseAgent:
             f" FIRST call list_phenomena to get the current IDs — do NOT rely on memory.\n"
             f" Then call link_to_entity for each relevant phenomenon.\n"
             f" NEVER guess or fabricate a phenomenon ID. If an ID is not in list_phenomena output, it does not exist.\n\n"
-            f"Phase D — ANSWER:\n"
-            f" Only give your <answer> AFTER completing Phases B and C.\n\n"
+            f"Phase D — STOP:\n"
+            f" Once all phenomena are recorded and entities linked, you are DONE.\n"
+            f" Do not call any more tools. The orchestrator picks up automatically.\n\n"
             f"CRITICAL — RECORDING REQUIREMENT:\n"
-            f"- Your <answer> block is DISCARDED by the orchestrator. Only graph mutations propagate.\n"
-            f"- Other agents and the final report read ONLY the evidence graph "
-            f"(phenomena, entities, edges).\n"
-            f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you end.\n"
+            f"- Only graph mutations propagate to other agents and the final report.\n"
+            f"- You MUST call add_phenomenon for EVERY significant finding BEFORE you stop.\n"
             f"- NEGATIVE findings count too. If you searched X (a directory, a pattern, "
             f"a registry key) and found NOTHING, that absence IS evidence — call "
             f"add_phenomenon with a 'No matches for X' title and the search scope in "
             f"raw_data. Negative findings constrain the hypothesis space and prevent "
             f"the next agent from wasting time re-searching.\n"
-            f"- If you produce <answer> without having called add_phenomenon at least once, "
-            f"the task is FAILED regardless of what you wrote in <answer>.\n"
+            f"- If you stop without having called add_phenomenon at least once, the task "
+            f"is FAILED and a forced retry will fire.\n"
             f"- Include exact file paths, inode numbers, timestamps, and the source_tool "
             f"that produced each finding.\n\n"
             f"ANTI-HALLUCINATION RULES — STRICTLY ENFORCED:\n"
@@ -124,6 +149,7 @@ class BaseAgent:
         self._current_lead_id = lead_id
 
         self._register_graph_tools()
+        self._record_call_counts.clear()
 
         system = self._build_system_prompt(task)
         messages = [{"role": "user", "content": task}]
@@ -132,12 +158,60 @@ class BaseAgent:
         ph_before = len(self.graph.phenomena)
 
         try:
-            final_text, _ = await self.llm.tool_call_loop(
+            final_text, conversation = await self.llm.tool_call_loop(
                 messages=messages,
                 tools=self.get_tool_definitions(),
                 tool_executor=self._executors,
                 system=system,
+                terminal_tools=self.terminal_tools,
             )
+
+            # Forced-record retry: if the agent has any mandatory recording
+            # tools but never invoked any of them, force one more round with
+            # an explicit "you forgot to record" instruction. The mandatory
+            # set is declared on the class — Timeline → add_temporal_edge,
+            # Hypothesis → add_hypothesis, ReportAgent → (). For agents with
+            # empty mandatory_record_tools this branch is a no-op.
+            registered_mandatory = [
+                t for t in self.mandatory_record_tools if t in self._executors
+            ]
+            recorded_any = any(
+                self._record_call_counts.get(t, 0) > 0
+                for t in registered_mandatory
+            )
+            if registered_mandatory and not recorded_any:
+                missing = "/".join(registered_mandatory)
+                logger.warning(
+                    "[%s] finished without calling any of [%s] — forcing RECORD retry",
+                    self.name, missing,
+                )
+                conversation.append({
+                    "role": "user",
+                    "content": (
+                        f"STOP. You produced an answer without ever calling "
+                        f"{missing}. Your answer is DISCARDED — only graph "
+                        f"mutations propagate to other agents and the final "
+                        f"report.\n\n"
+                        f"You MUST now call {missing} for every significant "
+                        f"finding from your prior investigation, including "
+                        f"exact identifiers, timestamps, and the source_tool "
+                        f"that produced each finding. If you genuinely found "
+                        f"NOTHING noteworthy, call the recording tool ONCE "
+                        f"with a 'No significant findings' style entry "
+                        f"summarizing what you searched.\n\n"
+                        f"Do not run more investigation tools. Just record "
+                        f"what you already found. Then end."
+                    ),
+                })
+                final_text, _ = await self.llm.tool_call_loop(
+                    messages=conversation,
+                    tools=self.get_tool_definitions(),
+                    tool_executor=self._executors,
+                    system=system,
+                    max_iterations=10,
+                    terminal_tools=self.terminal_tools,
+                )
+
             self._work_log.append(f"[Task: {task[:80]}] -> {final_text[:150]}")
         except Exception:
             self.graph.agent_status[self.name] = "failed"
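
The forced-retry check in the last hunk can be tried in isolation. The sketch below pulls the counting wrapper and the retry decision out of BaseAgent into a small stand-alone class; RecordTracker, register, and needs_retry are hypothetical names used only for this illustration, and the logic mirrors the diff above without being the project's actual code.

```python
import asyncio
from typing import Any, Awaitable, Callable

AsyncExecutor = Callable[..., Awaitable[Any]]


class RecordTracker:
    """Counts successful calls to mandatory recording tools (illustrative helper)."""

    def __init__(self, mandatory: tuple[str, ...]) -> None:
        self.mandatory = mandatory
        self.counts: dict[str, int] = {}
        self.executors: dict[str, AsyncExecutor] = {}

    def register(self, name: str, executor: AsyncExecutor) -> None:
        # Only mandatory tools are wrapped; everything else is stored as-is.
        if name in self.mandatory:
            executor = self._wrap(name, executor)
        self.executors[name] = executor

    def _wrap(self, name: str, executor: AsyncExecutor) -> AsyncExecutor:
        async def wrapped(*args: Any, **kwargs: Any) -> Any:
            # If the executor raises, the increment below is skipped, so only
            # calls that return normally are counted.
            result = await executor(*args, **kwargs)
            self.counts[name] = self.counts.get(name, 0) + 1
            return result

        return wrapped

    def needs_retry(self) -> bool:
        # Same rule as run(): retry only if at least one mandatory tool is
        # registered and none of the registered ones was ever called.
        registered = [t for t in self.mandatory if t in self.executors]
        return bool(registered) and not any(
            self.counts.get(t, 0) > 0 for t in registered
        )
```

One consequence of this rule is that an agent with several mandatory tools passes the check after calling any one of them, and an agent with an empty tuple never triggers a retry.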