Microsoft’s new safety system can catch hallucinations in its customers’ AI apps

March 28, 2024

Sarah Bird, Microsoft’s chief product officer of responsible AI, tells The Verge in an interview that her team has designed several new safety features that will be easy to use for Azure customers who aren’t hiring groups of red teamers to test the AI services they built. Microsoft says these LLM-powered tools can detect potential vulnerabilities, monitor for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time for Azure AI customers working with any model hosted on the platform.

“We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the outcomes,” she says.

Three features are now available in preview on Azure AI: Prompt Shields, which blocks prompt injections or malicious prompts from external documents that instruct models to go against their training; Groundedness Detection, which finds and blocks hallucinations; and safety evaluations, which assess model vulnerabilities. Two other features, one for directing models toward safe outputs and one for tracking prompts to flag potentially problematic users, will be coming soon.

Whether a user is typing in a prompt or the model is processing third-party data, the monitoring system evaluates the input to see if it contains any banned words or hidden prompts before deciding whether to send it to the model. The system then looks at the model’s response and checks whether the model hallucinated information not found in the document or the prompt.
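To make that two-stage flow concrete, here is a minimal Python sketch of a screen-then-verify wrapper. It is an illustration only and not Microsoft’s implementation or the Azure API: the names `screen_input`, `ungrounded_sentences`, and `guarded_call`, the banned-word list, the injection pattern, and the 0.5 overlap threshold are all assumptions made for this example. Microsoft describes its tools as LLM-powered, so the simple word-overlap check here is only a stand-in for that kind of detection.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rules for illustration only; not Azure's actual lists.
BANNED_WORDS = {"bannedword"}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def screen_input(text: str) -> Verdict:
    """Pre-check: reject text with banned words or injection-style phrasing."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & BANNED_WORDS:
        return Verdict(False, "banned word")
    if any(p.search(text) for p in INJECTION_PATTERNS):
        return Verdict(False, "hidden prompt")
    return Verdict(True)


def ungrounded_sentences(response: str, sources: list[str]) -> list[str]:
    """Post-check: return sentences whose words mostly do not appear in the sources.

    Word overlap is a crude stand-in for a real groundedness model.
    """
    source_words = set(re.findall(r"[a-z0-9']+", " ".join(sources).lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < 0.5:  # assumed threshold
            flagged.append(sentence)
    return flagged


def guarded_call(prompt: str, document: str,
                 model: Callable[[str, str], str]) -> str:
    """Screen both inputs, call the model, then check the response."""
    for text in (prompt, document):
        verdict = screen_input(text)
        if not verdict.allowed:
            return f"[blocked: {verdict.reason}]"
    response = model(prompt, document)
    if ungrounded_sentences(response, [prompt, document]):
        return "[blocked: ungrounded response]"
    return response
```

Here `model` is any callable that takes the prompt and the document and returns text. Both the user prompt and the third-party document pass through the same input screen, mirroring the article’s point that either one can carry a hidden instruction.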

In the case of the Google Gemini images, filters made to reduce bias had unintended effects, which is an area where Microsoft says its Azure AI tools will allow for more customized control. Bird acknowledges that there is concern Microsoft and other companies could be deciding what is or isn’t appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech or violence that the model sees and blocks.

In the future, Azure users will also be able to get a report of users who attempt to trigger unsafe outputs. Bird says this lets system administrators figure out which users are their own red teamers and which could be people with more malicious intent.

Bird says the safety features are immediately “attached” to GPT-4 and other popular models like Llama 2. However, because Azure’s model garden contains many AI models, users of smaller, less-used open-source systems may have to manually point the safety features at their models.
