Harbor Multi-Turn Dashboard

MULTI-TURN USER-AGENT INTERACTION SUMMARY

Functional Reward by Round

Average functional reward across tasks by round. Missing task-round rewards are excluded from each average.
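
As an illustration of the exclusion rule above, here is a minimal sketch of a per-round average that skips missing task-round rewards. The function name `mean_reward_by_round`, the `(task, round)` key layout, and the use of `None` for a missing reward are assumptions made for this example, not the dashboard's actual data model.

```python
from collections import defaultdict
from typing import Optional


def mean_reward_by_round(
    rewards: dict[tuple[str, int], Optional[float]],
) -> dict[int, float]:
    """Average reward per round, ignoring missing (None) entries.

    Hypothetical helper: keys are (task_id, round_number) pairs.
    """
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for (_task, rnd), reward in rewards.items():
        if reward is None:
            # A missing task-round reward does not count toward the average.
            continue
        totals[rnd] += reward
        counts[rnd] += 1
    # Rounds with no recorded rewards at all are omitted from the result.
    return {rnd: totals[rnd] / counts[rnd] for rnd in sorted(counts)}


example_rewards = {
    ("task-a", 1): 1.0,
    ("task-b", 1): 0.0,
    ("task-a", 2): 1.0,
    ("task-b", 2): None,  # missing: excluded from the round 2 average
}
reward_means = mean_reward_by_round(example_rewards)
```

Under these assumptions, the round 2 average is taken over one task rather than two, so the denominator can differ from round to round.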

MeanJudge by Round

Average correctness judge score by round, after averaging replicate scores within each task.
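
The two-stage averaging described above can be sketched as follows: replicate scores are first averaged within each task, and the resulting task means are then averaged within the round. The function name `mean_judge_by_round` and the nested `round -> task -> replicate scores` layout are assumptions for illustration only.

```python
from statistics import mean


def mean_judge_by_round(
    scores: dict[int, dict[str, list[float]]],
) -> dict[int, float]:
    """Mean judge score per round, averaging replicates within each task first.

    Hypothetical helper: scores[round][task_id] is a list of replicate scores.
    """
    result: dict[int, float] = {}
    for rnd, per_task in sorted(scores.items()):
        # Stage 1: one mean per task, so tasks with more replicates
        # do not carry more weight than tasks with fewer.
        task_means = [mean(reps) for reps in per_task.values() if reps]
        # Stage 2: average the task-level means across tasks.
        if task_means:
            result[rnd] = mean(task_means)
    return result


example_scores = {
    1: {"task-a": [1.0, 0.0], "task-b": [1.0]},
}
judge_means = mean_judge_by_round(example_scores)
```

In this example the task means are 0.5 and 1.0, so the round value is 0.75. Pooling all three replicate scores directly would instead give about 0.67, which is why the order of averaging matters.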

SSR by Round

Stable Solve Rate by round, using the fixed task denominator and the 0.85 correctness threshold.
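
The sketch below shows one possible reading of a thresholded solve rate with a fixed denominator. It assumes that a task counts as solved in a round when its task-level judge score is at least 0.85, and that the denominator is the total number of tasks in the job even when a task has no score for that round. Both points, the `>=` comparison, and the name `solve_rate_by_round` are assumptions. The dashboard text does not spell out what "stable" requires, so this sketch does not model any additional stability condition.

```python
THRESHOLD = 0.85  # correctness threshold stated in the dashboard description


def solve_rate_by_round(
    task_scores: dict[int, dict[str, float]],
    all_tasks: set[str],
) -> dict[int, float]:
    """Fraction of all tasks whose score meets the threshold, per round.

    Hypothetical helper: task_scores[round][task_id] is a task-level score.
    Tasks without a score in a round count as unsolved (assumption).
    """
    denominator = len(all_tasks)
    if denominator == 0:
        return {}
    rates: dict[int, float] = {}
    for rnd, per_task in sorted(task_scores.items()):
        solved = sum(
            1
            for task in all_tasks
            if task in per_task and per_task[task] >= THRESHOLD
        )
        # The denominator stays fixed across rounds.
        rates[rnd] = solved / denominator
    return rates


tasks = {"task-a", "task-b", "task-c", "task-d"}
example_task_scores = {
    1: {"task-a": 0.9, "task-b": 0.5},
    2: {"task-a": 0.9, "task-b": 0.85, "task-c": 0.2},
}
solve_rates = solve_rate_by_round(example_task_scores, tasks)
```

With a fixed denominator of four tasks, round 1 yields 1/4 and round 2 yields 2/4, even though only two and three tasks respectively have scores in those rounds. This differs from the functional reward average, which drops missing entries.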

Tasks

Job-level task outcomes. Select a task to inspect its native multi-turn timeline.