YatCC-Hard-Pro Leaderboard

Pipeline-mode hard benchmark with fully isolated runs and strict task chaining. Data is normalized from container run summaries.

# | Model | T0 | T1 | T2 | T3 | T4 | T5 | Mean Reward | Pipeline

🧪 About YatCC-Hard-Pro

YatCC-Hard-Pro is built from raw container-runs-summary-pipeline.json records, then normalized to the same leaderboard schema as YatCC and YatCC-Hard.

Mean Reward uses weighted task aggregation with weights [5%, 20%, 20%, 15%, 30%, 10%]. Empty or missing task outputs are treated as 0.
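
The aggregation described above can be sketched in a few lines of Python. This is an illustration only, not the leaderboard's actual code: the function name `mean_reward`, the list-based input, and the assumption that the weights apply to T0 through T5 in order are all hypothetical. It also assumes that weights are not renormalized when a task is missing, which is how "treated as 0" is read here.

```python
# Task weights as stated above: 5%, 20%, 20%, 15%, 30%, 10%.
# Assumption: they apply to T0..T5 in that order.
TASK_WEIGHTS = [0.05, 0.20, 0.20, 0.15, 0.30, 0.10]


def mean_reward(task_rewards):
    """Weighted sum of per-task rewards (hypothetical helper).

    `task_rewards` is a sequence of up to six numbers; an entry that is
    None or absent (sequence too short) contributes 0 to the total.
    """
    total = 0.0
    for index, weight in enumerate(TASK_WEIGHTS):
        reward = task_rewards[index] if index < len(task_rewards) else None
        if reward is not None:
            total += weight * reward
    return total


if __name__ == "__main__":
    # T2 is missing (None) and T4 scored 0, so only T0, T1, T3, T5 contribute.
    print(mean_reward([1.0, 0.5, None, 1.0, 0.0, 1.0]))
```

Under these assumptions, a model that fails T4 forfeits 30% of the achievable Mean Reward, so the later pipeline stages dominate the ranking.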

Powered by RAMP