While tech giants and media headlines remain hyper-focused on the flashy battle to generate artificial video from scratch (fueled by platforms like OpenAI’s Sora and Google’s Veo), a quiet, multimillion-dollar counter-revolution is taking place. The real data crisis for global enterprises isn’t a lack of new video; it’s that they are drowning in millions of hours of existing, unsearchable footage.
From sports broadcast vaults and security feeds to streaming libraries and corporate content stores, video has historically been “dark matter” to computers. Traditional Large Language Models (LLMs) cannot natively read video; instead, they chop clips into isolated screenshots, stripping away the critical dimensions of time, context, and causality.
Dismantling this fundamental barrier, video-native AI pioneer TwelveLabs has announced a massive $100 million Series B funding round.
Co-led by NEA and NAVER Ventures, with strategic participation from Amazon, the fresh capital will be used to move past isolated video understanding and build what the company calls the world’s first full-stack Video Cognition System, ushering in an era it defines as “Video Superintelligence.”
Inside the Tech: A Video-Native Substrate
Founded in 2021 by a team that met while serving in South Korea’s military cyber operations command, TwelveLabs built its architecture from the ground up on a contrarian hypothesis: machine intelligence must be trained on recorded reality in motion, not just written language.
Rather than force-fitting text-based models to watch video, TwelveLabs deploys a dual-layered, video-native foundation distributed via its own APIs and Amazon Bedrock:
- Marengo 3.0 (Perception): Billed as the world’s most advanced video embedding model. Marengo maps raw footage directly into a single semantic layer, capturing speech, complex soundscapes, physical motion, and on-screen text simultaneously over time.
- Pegasus 1.5 (Reasoning): Acts as a domain-specific markup language for video. It converts Marengo’s semantic data into structured information, such as precise scene boundaries, entity tracking, and temporal segments, allowing external LLMs to reason across footage seamlessly.
The critical architectural breakthrough here is Persistent Memory. Traditional models inspect a video at the exact moment of a query, forgetting the data the second the session ends. TwelveLabs understands a video once, converts it into a durable data representation, and keeps it addressable down to the exact frame. The archive ceases to be passive storage and becomes an active, compounding intelligence.
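To make the "understand once, query many times" idea concrete, here is a minimal Python sketch. It is not TwelveLabs' API: the `Segment` and `VideoMemory` names, the three-number toy embeddings, and the frame ranges are all assumptions made up for illustration. The only point is that ingestion happens once, while every later query runs against the stored, frame-addressable representation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One indexed span of video, addressable by frame range (hypothetical)."""
    video_id: str
    start_frame: int
    end_frame: int
    embedding: tuple  # toy stand-in for a learned video embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VideoMemory:
    """Hypothetical persistent index: ingest once, query as often as needed."""

    def __init__(self):
        self._segments = []

    def ingest(self, segments):
        # In a real system this is where the expensive video analysis would run.
        self._segments.extend(segments)

    def search(self, query_embedding, top_k=1):
        # Queries only touch the stored representation, never the raw footage.
        ranked = sorted(
            self._segments,
            key=lambda s: cosine(query_embedding, s.embedding),
            reverse=True,
        )
        return ranked[:top_k]


memory = VideoMemory()
memory.ingest([
    Segment("match_01", 0, 299, (0.9, 0.1, 0.0)),
    Segment("match_01", 300, 749, (0.1, 0.9, 0.2)),
    Segment("promo_07", 0, 119, (0.0, 0.2, 0.9)),
])

best = memory.search((0.0, 1.0, 0.1))[0]
print(best.video_id, best.start_frame, best.end_frame)  # match_01 300 749
```

In this toy version the index lives in a Python list; a production system would presumably persist it in a vector store, but the division of labor between ingestion and search would look similar.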
The Macro Impact on the Marketing and Advertising Industry
As Amazon enters the fray as a core investor (simultaneously establishing AWS as TwelveLabs’ preferred cloud and optimizing video inference workloads for AWS Trainium chips), the deal sends a shockwave directly through the Marketing and Advertising ecosystem.
The Realization of Absolute Contextual Ad Placement
As privacy regulations eliminate third-party cookies, the advertising world is pivoting hard toward contextual targeting. However, matching an ad to video content historically relied on surface-level metadata, basic titles, or audio transcripts.
With Video Superintelligence, ad-tech networks can dynamically match campaigns based on highly precise visual narratives and emotional contexts. For instance, an automotive brand can programmatically insert a commercial at the exact second a video features a road trip scene, or a sportswear brand can isolate a specific cross-sport athletic movement across millions of hours of streaming content without relying on manual human tags.
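The road-trip example can be sketched in a few lines of Python. Everything here is hypothetical: the `Scene` record, its label sets, and the rule of placing the ad break at the end of the best-matching scene are assumptions chosen for illustration, not a description of any ad-tech product. The sketch assumes a video-understanding layer has already produced scene boundaries and descriptive labels.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """A detected scene with timestamps in seconds and descriptive labels."""
    start_s: float
    end_s: float
    labels: set


def pick_ad_slot(scenes, campaign_labels, min_overlap=2):
    """Return the end time of the scene sharing the most labels with the
    campaign, or None if no scene shares at least `min_overlap` labels."""
    best, best_overlap = None, 0
    for scene in scenes:
        overlap = len(scene.labels & campaign_labels)
        if overlap >= min_overlap and overlap > best_overlap:
            best, best_overlap = scene, overlap
    return best.end_s if best else None


scenes = [
    Scene(0.0, 42.5, {"kitchen", "dialogue"}),
    Scene(42.5, 97.0, {"road trip", "highway", "car interior"}),
    Scene(97.0, 130.0, {"beach", "sunset"}),
]

slot = pick_ad_slot(scenes, {"road trip", "highway", "suv"})
print(slot)  # 97.0
```

Label overlap is the simplest possible scoring rule; a real matcher would more likely compare embeddings, but the output is the same kind of thing: a timestamp derived from what is on screen rather than from a title or transcript.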
Streamlining Post-Production and Creative Resurfacing
For creative agencies and brand managers, video production is heavily bogged down by asset retrieval debt. Creative teams spend thousands of billable hours scrubbing through raw footage, B-roll reels, and historical campaign archives looking for a specific visual theme, setting, or background asset. Turning massive media archives into machine-readable memory allows agencies to search their entire historical library using natural language prompts (“Find all clips of a red sports car driving through a rainy city street at dusk”), slashing post-production cycle times from weeks to seconds.
How This Shapes Everyday Business Strategy
For media conglomerates, global brands, and sports networks looking to monetize their digital footprints, TwelveLabs’ infrastructure alters day-to-day data economics:
- Unlocking Extractive Monetization for Broadcasters: Sports franchises and legacy broadcasters sitting on decades of unlabeled game tape can use TwelveLabs to build automated highlight engines. Producers can extract highly nuanced, multi-variable events, such as “every match-winning goal scored in the final five minutes where the crowd is cheering,” and turn them into hyper-targeted content packages for fans and sponsors in moments.
- Proactive Brand Safety Monitoring at Scale: Brands face immense reputational risk if their digital advertisements run alongside toxic, unsafe, or violent video content on user-generated platforms. Moving past basic text-tag filtering, TwelveLabs allows brand safety engines to continuously analyze the live visual feed of streaming channels, pausing ad delivery as soon as on-screen content violates a brand’s compliance rules.
- Fueling the Next Fleet of Visual AI Agents: As businesses prepare for autonomous digital workers to manage customer operations, these agents require a “visual cortex” to interact with the physical world. Providing a stable, API-accessible video data layer means enterprises can build custom agents capable of analyzing product assembly lines, auditing retail storefront traffic patterns, or automatically generating localized video recaps.
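Once footage has been turned into structured records, the highlight and brand-safety scenarios above reduce to ordinary filtering. The Python sketch below illustrates that under loud assumptions: the `Event` schema, its field names, and the sample data are invented for this example, and the "final five minutes" rule ignores stoppage time.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One structured event extracted from footage (hypothetical schema)."""
    video_id: str
    minute: int                 # match-clock minute of the event
    kind: str                   # e.g. "goal", "crowd_incident"
    match_winning: bool = False
    crowd_cheering: bool = False
    risk_tags: set = field(default_factory=set)


def late_winning_goals(events, match_length=90, window=5):
    """Match-winning goals in the final `window` minutes with a cheering crowd."""
    cutoff = match_length - window
    return [
        e for e in events
        if e.kind == "goal"
        and e.match_winning
        and e.crowd_cheering
        and e.minute >= cutoff
    ]


def should_pause_ads(event, blocked_tags):
    """True if the event carries any tag the brand has blocked."""
    return bool(event.risk_tags & blocked_tags)


events = [
    Event("final_2019", 88, "goal", match_winning=True, crowd_cheering=True),
    Event("final_2019", 30, "goal", crowd_cheering=True),
    Event("derby_2021", 89, "goal", match_winning=True, crowd_cheering=False),
    Event("derby_2021", 60, "crowd_incident", risk_tags={"violence"}),
]

highlights = late_winning_goals(events)
print([(e.video_id, e.minute) for e in highlights])  # [('final_2019', 88)]

blocked = {"violence", "hate"}
print([should_pause_ads(e, blocked) for e in events])  # [False, False, False, True]
```

The hard part, of course, is producing reliable `Event` records from raw video in the first place; the sketch only shows why that structured layer makes the downstream business logic trivial.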
The Bottom Line
The last decade of artificial intelligence succeeded because it made written text programmable, turning static documents into active workflows.
TwelveLabs’ $100 million Series B raise and its deep infrastructure alliance with Amazon signal that the next frontier of enterprise value lies in making recorded reality programmable. By transforming passive video files into structured, computational memory, the company is betting that the future of digital media belongs to those who stop trying to generate more synthetic noise and start using autonomous systems to unlock the hidden value within the footage they already own.