A new report from Guotai Haitong argues the biggest bottleneck for Embodied AI is no longer algorithms but a massive data shortage, creating a new ‘picks and shovels’ investment cycle.

A paradigm shift from language-based AI to physically interactive “world models” is creating an investment boom in the data infrastructure needed to train them. The primary bottleneck for embodied AI is no longer algorithms but an immense data gap, with data demand swelling to the exabyte scale, according to a new Guotai Haitong report. This positions data collection, simulation, and processing firms as the core “picks and shovels” play for the next wave of artificial intelligence.
“The companies first to fill this data gap will act as the 'shovel sellers' of the physical AI era, commanding significant valuation premiums,” the Guotai Haitong report stated.
The data requirements for embodied intelligence are orders of magnitude greater than for large language models. While LLMs are trained on petabyte-scale text and image datasets, robots that interact with the world require exabyte-scale data that captures the physics of interaction: force, touch, and friction. This specialized, high-quality data is critically scarce, creating a fundamental bottleneck for the entire robotics industry.
This scarcity is forcing a re-evaluation of the robotics value chain. The focus is shifting from the robot hardware itself to the data infrastructure providers who can solve the collection and processing problem. This trend could trigger a significant inflow of capital into a new sub-sector of AI stocks focused on data tools and services, potentially benefiting them more than the robotics manufacturers in the short term.
To bridge the data gap, the industry is pursuing three main paths, each with distinct trade-offs:
Real-World Data: Collected via teleoperation rigs and motion-capture suits worn by human operators, this method provides the highest-fidelity data because it contains genuine physical interactions. However, its cost is prohibitive, it is difficult to scale, and it cannot cover every edge case. Companies like 1X Technologies prioritize this path, arguing it is the only way to cross the "Sim2Real" gap.
Synthetic & Simulation Data: This method uses physics engines to generate massive, perfectly labeled datasets in virtual environments. It is cheap and scalable, with firms like Galaxy General aiming for a 99:1 synthetic-to-real data ratio. Its primary weakness is the "Sim2Real" gap, where models trained in simulation fail to perform reliably in the real world due to subtle physical differences.
Video Data: A newer approach that uses the vast repository of internet video to teach models. Companies like Tesla and Figure AI are pivoting to this method, believing the sheer scale of video data outweighs its lack of direct physical properties. The challenge lies in "up-dimensioning" 2D video into 3D action, a complex technical hurdle.
The current consensus is that a hybrid approach will become the industry standard: simulation and video data for mass pre-training, followed by fine-tuning on smaller, high-quality real-world datasets.
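To make the 99:1 synthetic-to-real ratio mentioned above concrete, the following is a minimal Python sketch of how a pre-training batch might be assembled at that ratio. Only the ratio itself comes from the article. The function names, pool sizes, and sampling scheme are hypothetical illustrations, not any company's actual pipeline, and the sketch omits video data and the later real-world fine-tuning stage.

```python
import random


def mix_counts(total, synthetic_parts=99, real_parts=1):
    """Split `total` samples by a synthetic:real ratio (hypothetical helper)."""
    n_real = round(total * real_parts / (synthetic_parts + real_parts))
    return total - n_real, n_real


def sample_pretraining_mix(synthetic_pool, real_pool, total, seed=0):
    """Draw a shuffled batch at the default 99:1 ratio, sampling with replacement."""
    rng = random.Random(seed)
    n_synth, n_real = mix_counts(total)
    batch = rng.choices(synthetic_pool, k=n_synth) + rng.choices(real_pool, k=n_real)
    rng.shuffle(batch)
    return batch


# Placeholder pools: each item is tagged with its source (assumed for illustration).
synthetic_pool = [("sim", i) for i in range(500)]
real_pool = [("real", i) for i in range(20)]

batch = sample_pretraining_mix(synthetic_pool, real_pool, total=1000)
n_real_in_batch = sum(1 for source, _ in batch if source == "real")
print(n_real_in_batch)  # 10 real samples out of 1000
```

At this ratio, a batch of 1,000 samples contains only 10 real-world ones, which illustrates why proponents of this path see it as a way to stretch scarce real-world data.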
This strategic divergence is visible across the industry. Tesla has famously abandoned teleoperation for its Optimus robot, relying instead on video from its vehicle fleet. Figure AI, backed by OpenAI and Microsoft, has launched "Project Go-Big" to explore transferring skills from human videos to its robots with zero-shot learning.
Conversely, startups like Zhiyuan Robotics (智元机器人) in China are reportedly training their large models entirely on real-world data. This highlights the strategic bets being made on which data source will ultimately prove most effective.
The trend extends beyond robotics. Indian fintech giant Paytm, despite its large-scale AI ambitions, has no plans to build its own data centers. Instead, it will rent compute capacity from providers like NVIDIA and run its proprietary models on third-party infrastructure, as CEO Vijay Shekhar Sharma confirmed on the company's Q4 FY26 earnings call. This strategy supports the "shovel-seller" thesis: even major tech players are choosing to be customers of the core infrastructure providers rather than competitors to them.
The market is already rewarding the "shovel sellers." As seen with Europe's AI-fueled unicorn surge and massive funding rounds like the $2 billion raised by China's Moonshot AI, investors are pouring capital into companies that provide foundational capabilities. According to the Guotai Haitong report, investment is concentrating in four key areas.
For investors, this means the most promising opportunities in the embodied AI space may not be the companies building the robots, but the ones selling the essential data and tools required to make them intelligent.
This article is for informational purposes only and does not constitute investment advice.