# EGPv2 Diagnosis Subset (60 episodes)

This subset is designed for workflow debugging, not headline scoring.

## Design goals

- Stress `SQ1` to see whether the pipeline drifts from device-health diagnosis into generic occupancy or intrusion narratives.
- Stress `SQ3` and `SQ4` on `L2/L3` cases to expose chunk-selection drift, evidence hallucination, and weak supervisor corrections.
- Add a small `SQ2/SQ5` control pack to check whether errors are specific to the new workflow rather than universal.
- Add 4 `TN` episodes to separate pure-normal false alarms from near-negative `FP` failures.

## Composition

- `SQ1`: 12 episodes
  - 6 `TP` + 6 `FP`
  - one paired `TP/FP` example for each `DF-01 ... DF-06`
- `SQ3`: 16 episodes
  - 8 `TP` + 8 `FP`
  - intrusion / behavior / elderly / child coverage
  - mostly `L2/L3`
- `SQ4`: 16 episodes
  - 8 `TP` + 8 `FP`
  - fire-gas / behavior / elderly / child coverage
  - mostly `L2/L3`
- `SQ2`: 6 episodes
  - intrusion / fire-gas / water-damage controls
- `SQ5`: 6 episodes
  - emergency-planning controls
- `TN sanity`: 4 episodes

## Recommended run order

1. Run this 60-episode subset first.
2. Inspect `results.jsonl` and focus on:
   - `egpv2_trace.triage_parsed`
   - `egpv2_trace.investigator_parsed`
   - `egpv2_trace.supervisor_parsed`
   - final `model_response`
3. If the same module-level error repeats, fix prompts or signal extraction before any larger run.
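
The inspection in step 2 can be partly automated. The following is a minimal sketch, not part of the EGPv2 tooling. It assumes that `results.jsonl` holds one JSON object per line, that `egpv2_trace` is an object containing the three `*_parsed` fields, and that the file sits in the run's output directory. The path and the helper names (`load_records`, `missing_stages`, `summarize`) are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Trace fields named in the run-order checklist.
TRACE_KEYS = ("triage_parsed", "investigator_parsed", "supervisor_parsed")


def load_records(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def missing_stages(record):
    """Return the checklist fields that are absent or empty in one record."""
    trace = record.get("egpv2_trace") or {}
    missing = [key for key in TRACE_KEYS if not trace.get(key)]
    if not record.get("model_response"):
        missing.append("model_response")
    return missing


def summarize(records):
    """Count how often each checklist field is missing across records."""
    counts = Counter()
    for record in records:
        counts.update(missing_stages(record))
    return counts


if __name__ == "__main__":
    # Assumed location; adjust to wherever the run wrote its results.
    path = Path("results/qwen35_egpv2_diag60/results.jsonl")
    if path.exists():
        for stage, n in summarize(load_records(path)).most_common():
            print(f"{stage}: missing or empty in {n} record(s)")
```

A stage that is missing in many records points at a parsing or prompt problem in that module, which is the kind of repeated module-level error step 3 says to fix first.
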

## PowerShell command

```powershell
python EGPv2/run_egpv2.py `
  --model Qwen/Qwen3.5-9B `
  --api-base http://localhost:8000/v1 `
  --no_thinking `
  --max-tokens 4096 `
  --workers 4 `
  --output-dir results/qwen35_egpv2_diag60 `
  --episode-id $(Get-Content EGPv2/diagnosis_subset_60.txt)
```

## Group breakdown

### SQ1 Drift Pack (12)

- `SQ1_TP_C_0005` / `SQ1_FP_C_0085` : `DF-01`, `L3`
- `SQ1_TP_A_0006` / `SQ1_FP_A_0083` : `DF-02`, `L3`
- `SQ1_TP_B_0000` / `SQ1_FP_B_0088` : `DF-03`, `L2`
- `SQ1_TP_A_0036` / `SQ1_FP_A_0080` : `DF-04`, `L2`
- `SQ1_TP_B_0011` / `SQ1_FP_B_0092` : `DF-05`, `L1`
- `SQ1_TP_A_0004` / `SQ1_FP_C_0081` : `DF-06`, `L2`

### SQ3 Hard Pack (16)

- `SQ3_TP_B_0457` / `SQ3_FP_C_0592` : `INS-01`, `L2`
- `SQ3_TP_A_0433` / `SQ3_FP_B_0583` : `INS-05`, `L3`
- `SQ3_TP_A_0478` / `SQ3_FP_B_0575` : `BA-03`, `L2`
- `SQ3_TP_B_0452` / `SQ3_FP_C_0642` : `BA-01`, `L3`
- `SQ3_TP_D_0464` / `SQ3_FP_D_0620` : `EL-03`, `L2`
- `SQ3_TP_D_0443` / `SQ3_FP_D_0565` : `EL-07`, `L3`
- `SQ3_TP_C_0447` / `SQ3_FP_C_0614` : `CH-02`, `L2`
- `SQ3_TP_C_0444` / `SQ3_FP_C_0581` : `CH-04`, `L2`

### SQ4 Hard Pack (16)

- `SQ4_TP_B_0721` / `SQ4_FP_B_0885` : `FG-02`, `L2`
- `SQ4_TP_A_0720` / `SQ4_FP_A_0857` : `FG-01`, `L3`
- `SQ4_TP_B_0768` / `SQ4_FP_C_0861` : `BA-03`, `L2`
- `SQ4_TP_B_0722` / `SQ4_FP_B_0916` : `BA-01`, `L3`
- `SQ4_TP_D_0745` / `SQ4_FP_D_0878` : `EL-03`, `L2`
- `SQ4_TP_D_0752` / `SQ4_FP_D_0851` : `EL-02`, `L3`
- `SQ4_TP_C_0737` / `SQ4_FP_C_0854` : `CH-01`, `L2`
- `SQ4_TP_C_0727` / `SQ4_FP_C_0880` : `CH-04`, `L2`

### SQ2/SQ5 Controls (12)

- `SQ2_TP_B_0192` / `SQ2_FP_A_0329` : `INS-02`, `L2`
- `SQ2_TP_D_0206` / `SQ2_FP_D_0299` : `FG-03`, `L1`
- `SQ2_TP_B_0220` / `SQ2_FP_C_0307` : `WD-03`, `L2`
- `SQ5_TP_B_1054` / `SQ5_FP_B_1116` : `INS-04`, `L3`
- `SQ5_TP_B_1037` / `SQ5_FP_B_1142` : `FG-02`, `L2`
- `SQ5_TP_D_1012` / `SQ5_FP_B_1124` : `WD-01`, `L1`

### TN Sanity Pack (4)

- `SQ1_TN_A_0135`
- `SQ3_TN_A_0665`
- `SQ4_TN_A_0961`
- `SQ5_TN_A_1173`
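
Before launching a run, it can be worth checking that `EGPv2/diagnosis_subset_60.txt` still matches the composition described in this README. The sketch below is an illustration under assumptions: it assumes the file holds one episode ID per line, and that every ID follows the `SQ<n>_<TP|FP|TN>_<letter>_<4 digits>` shape seen in the lists above. The meaning of the letter is not interpreted. `tally` and `check_composition` are hypothetical helper names.

```python
import re
from collections import Counter
from pathlib import Path

# ID shape inferred from the episode lists in this README (an assumption).
ID_PATTERN = re.compile(r"^(SQ\d)_(TP|FP|TN)_([A-Z])_(\d{4})$")

# Non-TN episode counts per group, taken from the Composition section.
EXPECTED = {"SQ1": 12, "SQ2": 6, "SQ3": 16, "SQ4": 16, "SQ5": 6}
EXPECTED_TN = 4


def tally(ids):
    """Return (non-TN count per group, TN count, list of problems)."""
    per_group = Counter()
    tn_count = 0
    problems = []
    if len(ids) != len(set(ids)):
        problems.append("duplicate IDs present")
    for episode_id in ids:
        match = ID_PATTERN.match(episode_id)
        if match is None:
            problems.append(f"unparseable ID: {episode_id}")
            continue
        group, label = match.group(1), match.group(2)
        if label == "TN":
            tn_count += 1
        else:
            per_group[group] += 1
    return per_group, tn_count, problems


def check_composition(ids):
    """Compare a list of IDs against the documented composition."""
    per_group, tn_count, problems = tally(ids)
    for group, expected in EXPECTED.items():
        if per_group[group] != expected:
            problems.append(f"{group}: expected {expected}, found {per_group[group]}")
    if tn_count != EXPECTED_TN:
        problems.append(f"TN: expected {EXPECTED_TN}, found {tn_count}")
    return problems


if __name__ == "__main__":
    path = Path("EGPv2/diagnosis_subset_60.txt")
    if path.exists():
        ids = [line.strip() for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
        issues = check_composition(ids)
        print("\n".join(issues) if issues else f"OK: {len(ids)} episode IDs match the README")
```

TN episodes are counted separately from their `SQ` prefix because the Composition section lists them as their own 4-episode pack rather than as part of each group's total.
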