OpenAI Launches Safety Bug Bounty Program Targeting AI Agent Vulnerabilities

Thank you for reading this post, don't forget to subscribe!

Felix Pinkston
Mar 25, 2026 17:33

OpenAI expands its security efforts with a new Safety Bug Bounty program focused on agentic risks, prompt injection attacks, and data exfiltration in AI products.

OpenAI has launched a public Safety Bug Bounty program aimed at identifying AI abuse and safety risks across its product suite, marking a significant expansion of the company’s approach to securing increasingly autonomous AI systems. The program, announced March 25, 2026, specifically targets vulnerabilities in agentic AI products that could lead to real-world harm.

The new initiative complements OpenAI’s existing Security Bug Bounty by accepting submissions that pose meaningful abuse and safety risks even when they don’t qualify as traditional security vulnerabilities. Researchers who identify issues will have their submissions triaged by both Safety and Security teams, with reports routed between programs based on scope.

Agentic Risks Take Center Stage

The program’s scope reveals OpenAI’s growing concern about AI agents operating with increasing autonomy. Key focus areas include third-party prompt injection attacks where malicious text can hijack a user’s agent—including Browser, ChatGPT Agent, and similar products—to perform harmful actions or leak sensitive information. To qualify for rewards, such attacks must be reproducible at least 50% of the time.

Other in-scope vulnerabilities include agentic products performing disallowed actions on OpenAI’s website at scale, exposure of proprietary information related to model reasoning, and bypasses of anti-automation controls or account trust signals.

What’s Out of Scope

Standard jailbreaks won’t qualify for this program. OpenAI explicitly excludes general content-policy bypasses without demonstrable safety impact—getting a model to use rude language or return easily searchable information doesn’t count. However, the company runs periodic private campaigns focused on specific harm types, including recent programs targeting biorisk content in ChatGPT Agent and GPT-5.

The company will consider edge cases on a case-by-case basis if researchers identify flaws that create direct paths to user harm with actionable remediation steps.

Industry Implications

This launch signals that major AI developers are taking agentic safety seriously as these systems gain capabilities to browse the web, execute code, and interact with external services. The Model Context Protocol (MCP) risks mentioned in the program scope suggest OpenAI is particularly focused on how agents interact with third-party tools and data sources.

For the broader AI ecosystem, this program establishes a framework that other companies may follow as autonomous agents become more prevalent. Researchers interested in participating can apply through OpenAI’s Bugcrowd portal, with the company emphasizing its commitment to working alongside ethical hackers to secure AI systems before vulnerabilities can be exploited at scale.

Image source: Shutterstock

Source link