Meta's Muse Spark scores 52 on AI index, nearly tripling Llama 4 performance

After a year-long absence from the frontier of AI research, Meta has returned with a proprietary model that re-establishes it as a top contender.

Meta Platforms Inc. on Wednesday debuted Muse Spark, a proprietary artificial intelligence model that shows a nearly threefold performance leap over its predecessor and signals a major strategic shift away from the company’s open-source roots. The new model, the first from Meta’s recently formed Superintelligence Labs, aims to place the company back in direct competition with systems from OpenAI, Google, and Anthropic after its previous flagship, Llama 4, failed to meet expectations.

“This is the most powerful model that meta has released,” Alexandr Wang, Meta’s Chief AI Officer, said in a post on X, the social network favored by the machine learning community. Wang noted the model supports “tool-use, visual chain of thought, & multi-agent orchestration,” positioning it as the foundation for a “personal superintelligence.”

The launch marks a measurable comeback for Meta, which had been absent from the top tier of AI performance for over a year. According to the Artificial Analysis Intelligence Index v4.0, Muse Spark achieved a score of 52, a dramatic improvement from the 18 scored by Llama 4 Maverick in 2025. The new score places Muse Spark in the top five models globally, trailing only Gemini 3.1 Pro Preview and GPT-5.4, which both scored 57, and Claude Opus 4.6 at 53.
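The "nearly threefold" figure follows directly from the index scores quoted above. A minimal Python check, using only the numbers reported in this article (the variable names are illustrative):

```python
# Artificial Analysis Intelligence Index v4.0 scores as quoted in this article.
scores = {
    "Gemini 3.1 Pro Preview": 57,
    "GPT-5.4": 57,
    "Claude Opus 4.6": 53,
    "Muse Spark": 52,
    "Llama 4 Maverick": 18,
}

# Ratio of the new model's score to its predecessor's.
improvement = scores["Muse Spark"] / scores["Llama 4 Maverick"]

# Distance in index points from the highest score listed.
gap_to_leader = max(scores.values()) - scores["Muse Spark"]

print(f"Muse Spark vs. Llama 4 Maverick: {improvement:.2f}x")
print(f"Points behind the top score: {gap_to_leader}")
```

The ratio comes out just under 3, which is why the gain is described as nearly, rather than fully, threefold.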

For investors, the release signals that Meta’s multibillion-dollar overhaul of its AI division, including a $14.3 billion investment for a 49% stake in data-labeling firm Scale AI, is beginning to yield results. The move to a proprietary model, however, raises questions about the future of the popular open-source Llama family, which drew more than one million downloads a day and offered businesses an estimated 88% cost reduction compared to proprietary APIs.

A Return to Frontier Performance

Meta’s internal benchmarks, corroborated by independent auditing from Artificial Analysis, show Muse Spark’s strength in multimodal reasoning, particularly where visual information and logic intersect. In the CharXiv Reasoning benchmark for figure understanding, Muse Spark scored 86.4, well ahead of GPT-5.4’s 82.8 and Gemini 3.1 Pro’s 80.2. The model also scored 80.5% on the MMMU Pro vision benchmark, making it the second-most capable vision model on the market, surpassed only by Gemini 3.1 Pro Preview.

The model’s efficiency is another key factor. Muse Spark used just 58 million output tokens to complete the Intelligence Index benchmark, less than half the 120 million used by GPT-5.4 and roughly 37% of the 157 million required by Claude Opus 4.6. Meta attributes this to a process called “thought compression,” which penalizes the model for excessive thinking time during training, forcing it to find more efficient reasoning pathways.
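Meta has not published how “thought compression” is implemented, so the following is only a sketch of the general idea of penalizing long reasoning during training, alongside a check of the token figures quoted above. The function `length_penalized_reward`, its `token_budget` and `penalty_per_token` parameters, and the linear penalty are all assumptions made for illustration, not Meta’s method.

```python
def length_penalized_reward(task_reward, thinking_tokens,
                            token_budget, penalty_per_token=0.001):
    """Hypothetical reward shaping: charge a linear cost for every
    reasoning token beyond a budget. Not Meta's published formula."""
    excess = max(0, thinking_tokens - token_budget)
    return task_reward - penalty_per_token * excess


# Output tokens (in millions) used on the Intelligence Index,
# as quoted in this article.
tokens_used = {"Muse Spark": 58, "GPT-5.4": 120, "Claude Opus 4.6": 157}

# Muse Spark's token use as a fraction of each model's token use.
ratios = {
    name: tokens_used["Muse Spark"] / count
    for name, count in tokens_used.items()
}

for name, ratio in ratios.items():
    print(f"Muse Spark used {ratio:.0%} of the tokens used by {name}")

# Two correct answers (task_reward = 1.0), one concise and one verbose.
print(length_penalized_reward(1.0, thinking_tokens=800, token_budget=1000))
print(length_penalized_reward(1.0, thinking_tokens=1500, token_budget=1000))
```

Under this assumed scheme, a correct answer that stays within the budget keeps its full reward, while an equally correct but longer one earns less, which is the kind of pressure toward shorter reasoning that the article describes.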

From Open Source Leader to Proprietary Challenger

The decision to launch Muse Spark as a proprietary model, confined to Meta’s apps and a private API preview, marks a significant departure. The Llama series, particularly Llama 2 and 3, became foundational infrastructure for thousands of developers and businesses, creating a global ecosystem. While a Meta spokesperson stated that existing Llama models will remain available, the company did not comment on future open-source development.

The shift comes as the open-weight landscape becomes increasingly competitive. Chinese models from Alibaba and Zhipu AI began to outpace Llama 4 on some benchmarks in late 2025, eroding Meta’s leadership in the space it had once dominated. While Wang has hinted at plans to “open-source future versions,” the initial proprietary release suggests Meta is prioritizing performance and control as it re-enters the frontier AI race.

The company’s shares, trading at a forward P/E ratio of 24, have yet to fully price in the potential revenue from a competitive proprietary model, with analysts watching closely to see if Muse Spark can translate benchmark wins into tangible product advantages.

This article is for informational purposes only and does not constitute investment advice.