
Burger.AI was a finalist project at the Cornell AI NYC Hackathon (February 2026). The system is an automated pipeline for testing agentic financial AI: models that plan and call tools in a loop rather than returning a single reply.
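The plan-then-call-tools loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Burger.AI's implementation: `fake_model` stands in for a real LLM call, and the `TOOLS` registry is hypothetical.

```python
import json

# Hypothetical tool registry; a real harness would wrap live financial APIs.
TOOLS = {
    "get_balance": lambda account: {"account": account, "balance": 1250.00},
}

def fake_model(history):
    # Stub for an LLM: plans one tool call, then answers once the
    # tool observation is present in the conversation history.
    if not any(msg["role"] == "tool" for msg in history):
        return {"tool": "get_balance", "args": {"account": "ACC-1"}}
    return {"answer": "Balance for ACC-1 is $1250.00"}

def run_agent(prompt, max_steps=5):
    """Drive the loop: ask the model, execute any tool call, feed the
    result back, and stop when the model emits a final answer."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        step = fake_model(history)
        if "answer" in step:  # model is done planning/acting
            return step["answer"], history
        result = TOOLS[step["tool"]](**step["args"])  # execute the tool
        history.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not terminate within max_steps")

answer, trace = run_agent("What is my balance?")
```

The point of wrapping the loop in a harness like this is that every tool call passes through one choke point, which is exactly where the guardrail hooks below attach.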
A central idea was bridging the intent–action gap with explicit guardrails. We used pre-tool and post-tool hooks to validate JSON payloads, redact PII before it left controlled paths, and keep tool use aligned with policy.
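A minimal sketch of such hooks, assuming a regex-based redactor and an allow-list policy; the tool names, PII patterns, and error handling here are illustrative choices, not the project's actual rules.

```python
import json
import re

# Hypothetical policy: only these tools may be invoked.
ALLOWED_TOOLS = {"get_balance", "transfer_funds"}

# Illustrative PII patterns (US SSN and email).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pre_tool_hook(tool_name, raw_payload):
    """Runs before the tool: enforce the allow-list and require a
    well-formed JSON object as the payload."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} not permitted by policy")
    try:
        payload = json.loads(raw_payload)
    except json.JSONDecodeError as e:
        raise ValueError(f"malformed JSON payload: {e}") from None
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    return payload

def post_tool_hook(result_text):
    """Runs after the tool: redact PII before the result re-enters
    the model's context."""
    redacted = SSN_RE.sub("[SSN]", result_text)
    return EMAIL_RE.sub("[EMAIL]", redacted)

payload = pre_tool_hook("get_balance", '{"account": "ACC-1"}')
safe = post_tool_hook("Owner jane@example.com, SSN 123-45-6789")
# safe == "Owner [EMAIL], SSN [SSN]"
```

Because both hooks sit on the single path every tool call takes, a malformed payload or a disallowed tool fails closed before execution, and PII is scrubbed before it can leak into subsequent model turns.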
Against that harness, baseline pass rates of 36% (Claude Sonnet 4.6) and 59% (GPT-4o-mini) improved to roughly 98% on the OWASP-oriented criteria we targeted, a concrete signal that structure around tool calls matters as much as model choice.