Ship-Bench: Benchmarks

Ship-Bench tests whether AI agents and coding tools can actually ship realistic software. It evaluates LLMs across a full agentic software development life cycle (SDLC) workflow: planning, architecture, UX/design, implementation, and QA.