Document AI Failures That Matter
Join Expert AI Critics Building the First Public Database of AI Failures
Be recognized for spotting what AI gets dangerously wrong. Get verified status, be featured in published reports, and build your authority as an AI quality expert.
Why This Matters
AI systems are making critical decisions in healthcare, law, finance, and code. But when they fail, those failures often go undocumented, unverified, and unfixed. Users don't know what went wrong. Developers don't get specific feedback. The public can't hold AI accountable.
We're changing that. The AI Failure Evidence Project brings together domain experts like you to publicly document what AI systems get wrong, why it matters, and who verified it. Not rankings. Evidence. Not anonymous votes. Expert attribution.
Be recognized for finding failures before they cause harm. Build your portfolio. Launch your AI consulting career. Make AI safer.
How to Participate
Install Chrome Extension
Install the GrandJury Chrome extension in under 2 minutes. No coding required.
Installation Guide →
Choose Your Domain
Pick your area of expertise: Medical, Legal, AI Safety, Code Generation, or Finance. Or evaluate across all domains.
Evaluate Shell AIs
Interact with our shell AI apps (GPT-4, Claude, Gemini). Test them, find failures, document what went wrong.
Submit Your Evaluations
Vote (good/bad) and write detailed comments explaining failures. Your evaluations are publicly visible with your name and credentials.
Get Featured & Verified
Top contributors earn verified AI Jury status, a feature in our published report, and media coverage opportunities.
Shell AI Projects Across Domains
We've set up AI chatbots running major LLMs (GPT-4, Claude, Gemini) for you to evaluate. Choose your domain and work at your own pace.
Medical AI
Evaluate AI medical advice, symptom checking, drug interactions. Example projects: GPT-4 Medical, Claude Health
Legal AI
Evaluate AI legal analysis, contract review, compliance advice. Example projects: GPT-4 Legal, Gemini Law
AI Safety
Evaluate AI alignment, harmful outputs, safety failures. Example projects: Claude Safety, GPT-4 Alignment
Code Generation
Evaluate AI-generated code for bugs and security vulnerabilities. Example projects: GPT-4 Code, Claude Dev
Finance AI
Evaluate AI financial advice, risk analysis, market predictions. Example projects: Gemini Finance, GPT-4 Trading
Self-directed, flexible participation. Evaluate as much or as little as you want.
What You Get (No Cash Prizes)
This is a recognition-based competition. We don't offer cash prizes. We offer something more valuable: credibility and authority.
Verified AI Jury Status
Top contributors become verified GrandJury AI Jurors, with an official badge on their public profile.
Featured in Published Report
Your best evaluations are featured in the "State of AI Failures" report with your name, credentials, and expert commentary.
Public Portfolio
Build your portfolio with public evaluation pages showing your expert analysis, linked to your professional profiles.
Media Coverage Opportunity
Top evaluators highlighted in press releases and media outreach to AI journalists and tech publications.
Premium Rates When Marketplace Launches
Verified AI Jurors earn higher rates when AI developers start hiring evaluators through our marketplace.
Self-Directed Work
Choose what you evaluate and work at your own pace. No assigned tasks, no deadlines, no stress, unlike Outlier.ai or Scale AI.
Competition Timeline
Phase 1: Evaluation Period
[Start Date] - [End Date]
Submit your evaluations, document failures, build your portfolio
Phase 2: Review & Selection
[Dates]
We review all submissions and select top contributors for verification
Phase 3: Report Publication
[Date]
"State of AI Failures" report published, featuring verified experts and key findings
Who We're Looking For
Anyone passionate about AI quality and accountability. Domain expertise helps, but critical thinking matters more.
AI Safety Researchers
Already analyzing AI risks, alignment issues, safety failures on Twitter or LessWrong
Medical Professionals
Doctors, nurses, healthcare researchers worried about AI medical advice
Legal Professionals
Lawyers, paralegals, compliance experts analyzing AI legal tools
Senior Engineers
Experienced developers who can spot bugs, security issues, code quality problems
Finance Professionals
Financial advisors, risk analysts, economists evaluating AI financial tools
Current AI Evaluators
Working at Outlier.ai, Scale AI, or similar platforms and want more autonomy and recognition
AI Critics
Anyone already criticizing AI systems publicly on Twitter, Reddit, or forums. Get recognized for it.
Frequently Asked Questions
Is this paid?
No cash prizes. This is recognition-focused. Top contributors get verified status, featured in published reports, and premium rates when our marketplace launches. Think of it as building your portfolio and authority.
How much time is required?
Completely flexible. Evaluate as much or as little as you want. Some participants submit 10 evaluations, others submit 100+. Quality matters more than quantity.
What if I'm not a domain expert?
Domain expertise helps but isn't required initially. If you can spot AI failures and explain why they matter, you're qualified. We review submissions based on insight quality, not credentials alone.
Are my evaluations really public?
Yes. All evaluations appear on public pages with your name and credentials (if you choose to provide them). This is core to our mission: public accountability requires public attribution.
What happens after the competition?
Top contributors (approximately 50 experts) receive verified AI Jury status. You'll have access to our marketplace where AI developers hire verified evaluators. You can continue evaluating new projects and earning recognition.
Can I stay anonymous?
We strongly encourage public attribution (it's our differentiator), but we can accommodate anonymity for sensitive cases. However, anonymous evaluations are less likely to be featured in reports or earn verified status.
How are winners selected?
We manually review all submissions based on: (1) Comment depth and insight quality, (2) Evidence specificity and clarity, (3) Domain expertise relevance, (4) Volume and consistency. It's not purely based on number of submissions.
What tools/software do I need?
Just Chrome browser and our Chrome extension. No coding, no special software, no complex setup.
Who's Behind This
Arthur Cho
Founder, GrandJury
AI/ML Product Manager with 8+ years building AI products (200k+ MAU). Master's in Applied Data Science from University of Michigan (GPA 3.9). Previously: Conversational AI Manager at HSBC, Product Manager at A*STAR AI spinoff.
"I built GrandJury to solve a problem I experienced firsthand: AI quality signals are opaque, evaluators are anonymous, and developers don't get specific feedback. We're making AI accountability transparent."
Questions? Reach out anytime.
Ready to Join?
Be part of the first cohort documenting AI failures publicly. Build your authority, get verified, make AI safer.
Questions? Email us at hello@grandjury.xyz or connect on LinkedIn