Patronus AI launched “CopyrightCatcher”, the industry’s first solution to detect when a Large Language Model (LLM) outputs copyrighted content.
CopyrightCatcher can reveal instances where LLMs generate exact reproductions of content from text sources such as books. It scores LLM outputs on whether they contain copyrighted content and highlights the specific passages that do.
Initial research by Patronus AI shows that state-of-the-art LLMs generate copyrighted content at an alarmingly high rate. Researchers at Patronus AI developed a 100-prompt adversarial test set to study how often models generate exact reproductions; each prompt contained an excerpt from a text or requested the first passage of a text. In particular:
- OpenAI’s GPT-4 produced copyrighted content on 44% of the prompts.
- Mistral’s Mixtral-8x7B-Instruct-v0.1 produced copyrighted content on 22% of the prompts.
- Anthropic’s Claude-2.1 produced copyrighted content on 8% of the prompts.
- Meta’s Llama-2-70b-chat produced copyrighted content on 10% of the prompts.
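Patronus AI has not published CopyrightCatcher’s implementation, so the following is only a minimal sketch of how an exact-reproduction check of this kind can work in principle: compare a model’s output against a reference passage, score it by the longest verbatim overlap, and highlight the reproduced span. The function names, threshold, and placeholder strings below are illustrative assumptions, not the product’s actual method.

```python
from difflib import SequenceMatcher

# Illustrative heuristic (not Patronus AI's actual method): flag a model
# output if it reproduces a long contiguous span of a reference text verbatim.

def longest_exact_overlap(output: str, reference: str) -> str:
    """Return the longest contiguous substring shared by output and reference."""
    matcher = SequenceMatcher(None, output, reference, autojunk=False)
    match = matcher.find_longest_match(0, len(output), 0, len(reference))
    return output[match.a : match.a + match.size]

def flag_reproduction(output: str, reference: str, min_chars: int = 160) -> dict:
    """Score an output and highlight the reproduced span, if any."""
    overlap = longest_exact_overlap(output, reference)
    reproduced = len(overlap) >= min_chars  # simple length threshold, chosen arbitrarily here
    return {
        "reproduced": reproduced,
        "overlap_chars": len(overlap),
        "highlighted_span": overlap if reproduced else None,
    }

# Example usage with placeholder strings standing in for a book passage and a
# model completion prompted with the opening of that passage.
if __name__ == "__main__":
    reference_passage = "It was the best of times, it was the worst of times, it was the age of wisdom..."
    model_output = "Sure. The novel opens: It was the best of times, it was the worst of times, it was the age of wisdom..."
    print(flag_reproduction(model_output, reference_passage, min_chars=40))
```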
“The widespread use of LLMs has sparked copyright concerns as they can clearly reproduce copyrighted content,” said Anand Kannappan, CEO and co-founder of Patronus AI. “While industry leaders like Microsoft, Anthropic AI, and OpenAI are implementing safeguards, LLMs can still generate exact reproductions of copyrighted works, highlighting the ongoing need for robust solutions to mitigate copyright infringement risks. Visibility into model risk will be especially critical given liability is still unclear.”
As part of this release, customers can now evaluate their LLM systems at scale using CopyrightCatcher on the Patronus AI platform. Fortune 500 companies already use Patronus AI to detect hallucinations and other unexpected LLM behavior at scale.
SOURCE: PRNewsWire