In the race for artificial intelligence, a counter-movement is gaining ground, prioritizing not the largest model, but the most efficient one.

SenseTime’s new-generation “SenseNova 6.7 Flash-Lite” model is slashing token consumption by 60 percent, signaling a broader industry pivot away from building bigger AI models and toward systems that can operate within real-world enterprise constraints of cost and data sovereignty.
“In Asia, when I look at the AI boom, sovereignty matters more than the models,” Hans Dekkers, General Manager of IBM Asia Pacific, said in a recent interview, noting that 99% of enterprise data remains untouched by AI due to data exposure concerns.
The SenseNova model achieves its efficiency through a native multimodal architecture that removes an intermediate visual-to-text conversion layer, directly interpreting complex documents and charts. This contrasts with the brute-force scaling seen in models from competitors like Tencent and DeepSeek; SenseTime instead targets specific, high-value enterprise workflows.
The move reflects a growing enterprise strategy to deploy dozens of smaller, specialized models instead of a single, all-powerful one. This shift creates a new competitive front in AI: orchestration platforms that can manage a mix of models, a market IBM is targeting and where efficient, low-cost models like SenseTime’s could see significant demand.
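The orchestration idea described above can be illustrated with a toy sketch: a router that, given a task, selects the cheapest registered model capable of handling it. This is purely hypothetical code for illustration; the model names, capabilities, and prices are invented and do not correspond to any vendor's actual API or pricing.

```python
# Hypothetical sketch of a multi-model orchestration layer.
# All model names and prices below are invented for illustration.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    capabilities: set          # task types this model can handle
    cost_per_1k_tokens: float  # illustrative price, not real pricing

REGISTRY = [
    Model("general-llm",       {"chat", "search", "summarize", "vision"}, 0.030),
    Model("lightweight-search", {"search", "summarize"},                  0.012),
    Model("doc-vision",        {"vision"},                                0.018),
]

def route(task: str) -> Model:
    """Return the cheapest registered model whose capabilities cover the task."""
    candidates = [m for m in REGISTRY if task in m.capabilities]
    if not candidates:
        raise ValueError(f"no registered model handles task {task!r}")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route("search").name)  # the cheapest capable model wins, not the largest
```

The point of the sketch is the selection logic: when an efficient, low-cost model covers a given workflow, the router prefers it over a larger general-purpose one, which is the dynamic the article describes.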
While much of the market focuses on benchmark performance, enterprises face a structural mismatch when trying to deploy large, general-purpose AI. Regulatory fragmentation, particularly across Asia Pacific, makes data sovereignty a primary operational constraint. Companies are often reluctant to expose proprietary data to external, monolithic models, creating a barrier to AI adoption. “The choice is not between compliance and innovation… it’s about maintaining control across your entire digital architecture,” Dekkers said. This hesitation has left the vast majority of valuable enterprise data siloed and unused by AI systems, representing a massive, untapped market for tools that can work within these boundaries.
The demand for efficient, low-power AI is creating new opportunities in what GSI Technology’s Didier Lasserre calls “multibillion-dollar markets.” GSI’s Gemini II associative processing unit (APU) provides a clear case study. In a recent defense proof-of-concept, the chip achieved a time-to-first-token of roughly three seconds using just 30 watts of system power, a critical metric for drone surveillance. This performance in a power-constrained environment led directly to a contract win and is being leveraged for a new smart city project. The success of specialized hardware like GSI’s, which is funded by a 22% growth in its legacy SRAM business, proves the viability of building targeted AI solutions that prioritize latency and efficiency over raw scale, the very niche SenseTime is targeting with its lightweight model.
SenseTime’s approach with SenseNova 6.7 Flash-Lite fits directly into this emerging paradigm. By building a model that is inherently cheaper to run—slashing token usage in information search tasks by 60%—the company is betting that enterprises will favor cost savings and control over the prestige of using the largest available model. This is part of a larger trend toward a “bring your own model” environment, where companies use orchestration platforms to deploy the best tool for each specific job, whether it’s a global giant like GPT-4, a regional player from Alibaba, or a specialized internal system. In this context, the most valuable player may not be the one with the biggest model, but the one that provides the most efficient and compliant solution for a specific business problem.
This article is for informational purposes only and does not constitute investment advice.