OpenAI insiders have signed an open letter demanding a “right to warn” the public about AI risks

June 5, 2024

Employees from some of the world’s leading AI companies published an unusual proposal on Tuesday, demanding that the companies grant them “a right to warn about advanced artificial intelligence.” 

Whom do they want to warn? You. The public. Anyone who will listen. 

The 13 signatories are current and former employees of OpenAI and Google DeepMind. They believe AI has huge potential to do good, but they’re worried that without proper safeguards, the tech can enable a wide range of harms.

“I’m scared. I’d be crazy not to be,” Daniel Kokotajlo, a signatory who quit OpenAI in April after losing faith that the company’s leadership would handle its technology responsibly, told me this week. Several other safety-conscious employees have recently left for similar reasons, intensifying concerns that OpenAI isn’t taking the risks of the tech seriously enough. (Disclosure: Vox Media is one of several publishers that has signed partnership agreements with OpenAI. Our reporting remains editorially independent.)

It may be tempting to view the new proposal as just another open letter put out solely by “doomers” who want to press pause on AI because they worry it will go rogue and wipe out all of humanity. That’s not all that this is. The signatories share the concerns of both the “AI ethics” camp, which worries more about present AI harms like racial bias and misinformation, and the “AI safety” camp, which worries more about AI as a future existential risk. 

These camps are sometimes pitted against each other. The goal of the new proposal is to change the incentives of leading AI companies by making their activities more transparent to outsiders — and that would benefit everyone. 

The signatories are calling on AI companies to let them voice their concerns about the technology — to the companies’ boards, to regulators, to independent expert organizations, and, if necessary, directly to the public — without retaliation. Six of the signatories are anonymous, including four current and two former OpenAI employees, precisely because they fear being retaliated against. The proposal is endorsed by some of the biggest names in the field: Geoffrey Hinton (often called “the godfather of AI”), Yoshua Bengio, and Stuart Russell.

To be clear, the signatories are not saying they should be free to divulge intellectual property or trade secrets, but as long as they protect those, they want to be able to raise concerns about risks. To ensure whistleblowers are protected, they want the companies to set up an anonymous process by which employees can report their concerns “to the company’s board, to regulators, and to an appropriate independent organization with relevant expertise.” 

An OpenAI spokesperson told Vox that current and former employees already have forums to raise their thoughts through leadership office hours, Q&A sessions with the board, and an anonymous integrity hotline.

“Ordinary whistleblower protections [that exist under the law] are insufficient because they focus on illegal activity, whereas many of the risks we are concerned about are not yet regulated,” the signatories write in the proposal. They have retained a pro bono lawyer, Lawrence Lessig, who previously advised Facebook whistleblower Frances Haugen and whom the New Yorker once described as “the most important thinker on intellectual property in the Internet era.”

Another of their demands: no more nondisparagement agreements that prevent company insiders from voicing risk-related concerns. Former OpenAI employees have long felt muzzled because, upon leaving, the company had them sign offboarding agreements with nondisparagement provisions. After Vox reported on employees who felt pressured to sign or else surrender their vested equity in the company, OpenAI said it was in the process of removing nondisparagement agreements.

Those agreements were so unusually restrictive that they raised alarm bells even for employees leaving the company on good terms, like Jacob Hilton, one of the signatories of the “right to warn” proposal. He wasn’t particularly worried about OpenAI’s approach to safety during his years working there, but when he left in early 2023 to pursue research elsewhere, the offboarding agreement gave him pause.

“It basically threatened to take away a large fraction of my compensation unless I signed a nonsolicitation and nondisparagement agreement,” Hilton told me. “I felt that having these agreements apply so broadly would have a chilling effect on the ability of former employees to raise reasonable criticisms.” 

Ironically, OpenAI’s attempt to silence him is what made him speak out. 

Hilton signed the new proposal, he said, because companies need to know that employees will call them out if they talk a big game about safety in public, as OpenAI has done, only to contradict it behind closed doors.

“Public commitments will often be written by employees of the company who really do care, but then the company doesn’t have a lot of incentive to stick to the commitments if the public won’t find out [about violations],” Hilton said. That’s where the new proposal comes in. “It’s about creating a structure where the company is incentivized to stick to its public commitments.” 

This is about changing incentives for the whole AI industry 

AI safety researchers often worry about AI models becoming misaligned — pursuing goals in ways that aren’t aligned with our values. But you know what’s really hard to align? Humans. Especially when all the incentives are pushing them in the wrong direction.

Those who finish second are rarely remembered in Silicon Valley; being first out of the gate is rewarded. The culture of competition means there’s a strong incentive to build cutting-edge AI systems fast. And the profit imperative means there’s also a strong incentive to commercialize those systems and release them into the world. 

OpenAI employees have increasingly noticed this. Jan Leike, who helmed the company’s alignment team until he quit last month, said in an X post that “safety culture and processes have taken a backseat to shiny products.”

Carroll Wainwright, who worked under Leike, quit last week for similar reasons. “Over the past six months or so, I’ve become more and more concerned that the incentives that push OpenAI to do things are not well set up,” he told me. “There are very, very strong incentives to maximize profit, and the leadership has succumbed to some of these incentives at a cost to doing more mission-aligned work.”

So the big question is: How can we change the underlying incentive structure that drives all actors in the AI industry?

For a while, there was hope that setting up AI companies with unusual governance structures would do the trick. OpenAI, for example, started as a nonprofit, with a board whose mission was not to keep shareholders happy but to safeguard the best interests of humanity. Wainwright said that’s part of why he was excited to work there: He figured this structure would keep the incentives in order. 

But OpenAI soon found that to run large-scale AI experiments these days, you need a ton of computing power — more than 300,000 times what you needed a decade ago — and that’s incredibly expensive. To stay at the cutting edge, it had to create a for-profit arm and partner with Microsoft. OpenAI wasn’t alone in this: The rival company Anthropic, which former OpenAI employees spun up because they wanted to focus more on safety, started out by arguing that we need to change the underlying incentive structure in the industry, including the profit incentive, but it ended up joining forces with Amazon. 

As for the board that’s tasked with safeguarding humanity’s best interests? It sounds nice in theory, but OpenAI’s board drama last November — when the board tried to fire CEO Sam Altman only to see him quickly claw his way back to power — proved it doesn’t work. 

“I think it showed that the board does not have the teeth one might have hoped it had,” Wainwright told me. “It made me question how well the board can hold the organization accountable.”

Hence this statement in the “right to warn” proposal: “AI companies have strong financial incentives to avoid effective oversight, and we do not believe bespoke structures of corporate governance are sufficient to change this.”

If bespoke governance won’t work, what will?

Regulation is an obvious answer, and there’s no question that more of that is needed. But that on its own may not be enough. Lawmakers often don’t understand quickly developing technologies well enough to regulate them with much sophistication. There’s also the threat of regulatory capture. 

This is why company insiders want the right to warn the public. They’ve got a front-row seat to the developing technology and they understand it better than anyone. If they’re at liberty to speak out about the risks they see, companies may be more incentivized to take those risks seriously. That would be beneficial for everyone, no matter what kind of AI risk keeps them up at night.
