Multimodal AI poses new safety risks, can be jailbroken into producing CSEM and weapons info

Multimodal AI, which can ingest content in non-text formats like audio and images, has leveled up the data that large language models (LLMs) can parse. However, new research from security specialist Enkrypt AI suggests these models are also more susceptible to novel jailbreak techniques.

On Thursday, Enkrypt published findings that two multimodal models from French AI lab Mistral — Pixtral-Large (25.02) and Pixtral-12b — are up to 40 times more likely to produce chemical, biological, radiological, and nuclear (CBRN) information than competitors when prompted adversarially. 

The models are also 60 times more likely to generate child sexual exploitation material (CSEM) than competitors, which include OpenAI’s GPT-4o and Anthropic’s Claude 3.7 Sonnet.

Mistral did not respond to ZDNET’s request for comment on Enkrypt’s findings.  

Enkrypt said the safety gaps aren’t limited to Mistral’s models. Using the National Institute of Standards and Technology (NIST) AI Risk Management Framework, red-teamers discovered gaps across model types more broadly. 

The report explains that because of how multimodal models process media, emerging jailbreak techniques can bypass content filters more easily, without being visibly adversarial in the prompt. 

“These risks were not due to malicious text, but triggered by prompt injections buried within image files, a technique that could realistically be used to evade traditional safety filters,” said Enkrypt. 

Essentially, bad actors can smuggle harmful prompts into the model through images, rather than relying on traditional text prompts that ask the model outright for dangerous information.

“Multimodal AI promises incredible benefits, but it also expands the attack surface in unpredictable ways,” said Enkrypt CEO Sahil Agarwal. “The ability to embed harmful instructions within seemingly innocuous images has real implications for public safety, child protection, and national security.”

The report stresses the importance of multimodal-specific safety guardrails and urges labs to publish model risk cards that delineate their models' vulnerabilities.

“These are not theoretical risks,” Agarwal said, adding that insufficient security can cause users “significant harm.”
