Document AI Failures That Matter

Join Expert AI Critics Building the First Public Database of AI Failures

Be recognized for spotting what AI gets dangerously wrong. Earn verified status, get featured in published reports, and build your authority as an AI quality expert.

Why This Matters

AI systems are making critical decisions in healthcare, law, finance, and code. But when they fail, those failures often go undocumented, unverified, and unfixed. Users don't know what went wrong. Developers don't get specific feedback. The public can't hold AI accountable.

We're changing that. The AI Failure Evidence Project brings together domain experts like you to publicly document what AI systems get wrong, why it matters, and who verified it. Not rankings. Evidence. Not anonymous votes. Expert attribution.

Be recognized for finding failures before they cause harm. Build your portfolio. Launch your AI consulting career. Make AI safer.

How to Participate

🔌
Step 1

Install Chrome Extension

Install the GrandJury Chrome extension in under 2 minutes. No coding required.

Installation Guide →
🎯
Step 2

Choose Your Domain

Pick your area of expertise: Medical, Legal, AI Safety, Code Generation, or Finance. Or evaluate across all domains.

💬
Step 3

Evaluate Shell AIs

Interact with our shell AI apps (GPT-4, Claude, Gemini). Test them, find failures, document what went wrong.

✍️
Step 4

Submit Your Evaluations

Vote (good/bad) and write detailed comments explaining failures. Your evaluations are publicly visible with your name and credentials.

Step 5

Get Featured & Verified

Top contributors earn verified AI Jury status, a feature in our published report, and media coverage opportunities.

Shell AI Projects Across Domains

We've set up AI chatbots running major LLMs (GPT-4, Claude, Gemini) for you to evaluate. Choose your domain and work at your own pace.

🏥

Medical AI

Evaluate AI medical advice, symptom checking, drug interactions. Example projects: GPT-4 Medical, Claude Health

⚖️

Legal AI

Evaluate AI legal analysis, contract review, compliance advice. Example projects: GPT-4 Legal, Gemini Law

🛡️

AI Safety

Evaluate AI alignment, harmful outputs, safety failures. Example projects: Claude Safety, GPT-4 Alignment

💻

Code Generation

Evaluate AI-generated code for bugs and security vulnerabilities. Example projects: GPT-4 Code, Claude Dev

💰

Finance AI

Evaluate AI financial advice, risk analysis, market predictions. Example projects: Gemini Finance, GPT-4 Trading

Self-directed, flexible participation. Evaluate as much or as little as you want.

What You Get (No Cash Prizes)

This is a recognition-based competition. We don't offer cash prizes. We offer something more valuable: credibility and authority.

Verified AI Jury Status

Top contributors become verified GrandJury AI Jury members with an official badge on their public profile.

📄

Featured in Published Report

Your best evaluations are featured in the "State of AI Failures" report with your name, credentials, and expert commentary.

📁

Public Portfolio

Build your portfolio with public evaluation pages showing your expert analysis, linked to your professional profiles.

📰

Media Coverage Opportunity

Top evaluators are highlighted in press releases and media outreach to AI journalists and tech publications.

💎

Premium Rates When Marketplace Launches

Verified AI Jury members earn higher rates when AI developers start hiring evaluators through our marketplace.

🎨

Self-Directed Work

Choose what you evaluate and work at your own pace. No assigned tasks, no deadlines, no stress. Unlike Outlier.ai or Scale AI.

Competition Timeline

1

Phase 1: Evaluation Period

[Start Date] - [End Date]

Submit your evaluations, document failures, build your portfolio

2

Phase 2: Review & Selection

[Dates]

We review all submissions and select top contributors for verification

3

Phase 3: Report Publication

[Date]

"State of AI Failures" report published, featuring verified experts and key findings

Who We're Looking For

Anyone passionate about AI quality and accountability. Domain expertise helps, but critical thinking matters more.

AI Safety Researchers

Already analyzing AI risks, alignment issues, and safety failures on Twitter or LessWrong

Medical Professionals

Doctors, nurses, healthcare researchers worried about AI medical advice

Legal Professionals

Lawyers, paralegals, compliance experts analyzing AI legal tools

Senior Engineers

Experienced developers who can spot bugs, security issues, code quality problems

Finance Professionals

Financial advisors, risk analysts, economists evaluating AI financial tools

Current AI Evaluators

Evaluators working at Outlier.ai, Scale AI, or similar platforms who want more autonomy and recognition

AI Critics

Anyone already criticizing AI systems publicly on Twitter, Reddit, or forums. Get recognized for it.

Frequently Asked Questions

Is this paid?

No cash prizes. This is recognition-focused. Top contributors earn verified status, a feature in published reports, and premium rates when our marketplace launches. Think of it as building your portfolio and authority.

How much time is required?

Completely flexible. Evaluate as much or as little as you want. Some participants submit 10 evaluations, others submit 100+. Quality matters more than quantity.

What if I'm not a domain expert?

Domain expertise helps but isn't required initially. If you can spot AI failures and explain why they matter, you're qualified. We review submissions based on insight quality, not credentials alone.

Are my evaluations really public?

Yes. All evaluations appear on public pages with your name and credentials (if you choose to provide them). This is core to our mission: public accountability requires public attribution.

What happens after the competition?

Top contributors (approximately 50 experts) receive verified AI Jury status. You'll have access to our marketplace where AI developers hire verified evaluators. You can continue evaluating new projects and earning recognition.

Can I stay anonymous?

We strongly encourage public attribution (it's our differentiator), but we can accommodate anonymity for sensitive cases. However, anonymous evaluations are less likely to be featured in reports or earn verified status.

How are winners selected?

We manually review all submissions based on: (1) Comment depth and insight quality, (2) Evidence specificity and clarity, (3) Domain expertise relevance, (4) Volume and consistency. It's not purely based on number of submissions.

What tools/software do I need?

Just the Chrome browser and our Chrome extension. No coding, no special software, no complex setup.

Who's Behind This

👤

Arthur Cho

Founder, GrandJury

AI/ML Product Manager with 8+ years building AI products (200k+ MAU). Master's in Applied Data Science from the University of Michigan (GPA 3.9). Previously: Conversational AI Manager at HSBC, Product Manager at an A*STAR AI spinoff.

I built GrandJury to solve a problem I experienced: AI quality signals are opaque, evaluators are anonymous, and developers don't get specific feedback. We're making AI accountability transparent.

Questions? Reach out anytime.

Ready to Join?

Be part of the first cohort documenting AI failures publicly. Build your authority, get verified, make AI safer.

Questions? Email us at hello@grandjury.xyz or connect on LinkedIn.