Pipeline-mode hard benchmark with fully isolated runs and strict task chaining. Data is normalized from container run summaries.
| # | Model | T0 | T1 | T2 | T3 | T4 | T5 | Mean Reward | Pipeline | ๐ |
|---|
YatCC-Hard-Pro is built from raw container-runs-summary-pipeline.json records, then normalized to the same leaderboard schema as YatCC and YatCC-Hard.
Mean Reward uses weighted task aggregation with weights [5%, 20%, 20%, 15%, 30%, 10%]. Empty or missing task outputs are treated as 0.
Powered by RAMP