JD.com Mobilizes Up to 600,000 People for AI Data Harvest
On March 16, Chinese e-commerce giant JD.com announced plans to build the world's largest data collection center for embodied intelligence, launching a massive campaign to address the "data famine" crippling the robotics industry. The initiative intends to mobilize over 100,000 of its own employees and up to 500,000 external personnel, including 100,000 citizens in the city of Suqian alone. The goal is to accumulate over 10 million hours of real-world physical interaction data within two years, providing the raw material needed to train sophisticated robots for complex tasks. This "human sea" tactic represents a brute-force attempt to solve what has become the primary bottleneck for AI robotics, where high-quality training data is now more critical than model architecture or raw computing power.
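For a rough sense of scale, the quoted figures can be turned into a per-person workload. The sketch below is a back-of-the-envelope calculation, not anything JD.com has published: it assumes the full headcount of 600,000 is reached and that every participant contributes equally, neither of which the announcement states. The constant and function names are illustrative.

```python
# Figures quoted in the announcement.
TARGET_HOURS = 10_000_000   # "over 10 million hours"
INTERNAL_STAFF = 100_000    # "over 100,000 of its own employees"
EXTERNAL_MAX = 500_000      # "up to 500,000 external personnel"
YEARS = 2                   # "within two years"
WEEKS_PER_YEAR = 52


def hours_per_person(total_hours: float, headcount: int) -> float:
    """Average hours each participant would need to contribute.

    Assumption (not from the source): the load is spread evenly.
    """
    return total_hours / headcount


headcount = INTERNAL_STAFF + EXTERNAL_MAX
per_person = hours_per_person(TARGET_HOURS, headcount)
minutes_per_week = per_person / (YEARS * WEEKS_PER_YEAR) * 60

print(f"Headcount: {headcount:,}")
print(f"Hours per person over {YEARS} years: {per_person:.1f}")
print(f"Minutes per person per week: {minutes_per_week:.1f}")
```

Under those assumptions, the target works out to under 17 hours per participant over the whole two-year period, which suggests the hard part is coordination and quality control rather than the time asked of any individual.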
The project is deeply integrated with China's industrial ambitions, particularly within the Yizhuang Economic and Technical Development Zone in Beijing. The zone, which is home to over 300 robotics companies and a 10 billion yuan industry, provides the hardware and testing grounds. JD's initiative aims to supply the "brain" by generating massive datasets from its own real-world logistics, industrial, and retail scenarios, creating a closed-loop system from data collection to hardware iteration.
Logistics Network to Solve Robotics' High-Cost Data Problem
JD.com's strategy turns its core business into a competitive advantage in the AI arms race. Unlike pure software companies or robotics startups, JD has a sprawling physical supply chain that offers a vast, continuous source of complex real-world interactions. This approach directly tackles the two major hurdles in robotics data acquisition: the "Sim-to-Real" gap and prohibitive costs. Many startups rely on virtual simulations, but behavior learned in simulation often fails to transfer to the real world because simulators cannot perfectly replicate nuanced physics such as friction or the deformation of flexible materials.
The alternative, in which human operators remotely control robots while their actions are recorded, is effective but does not scale economically. Industry estimates place the cost of capturing and cleaning a single high-quality, complex interaction task at several hundred dollars. By integrating data collection into the daily operations of its couriers and warehouse workers, JD aims to bypass this bottleneck. This model, similar to how Tesla uses its Gigafactories to train its Optimus robots, transforms a company's existing operational infrastructure into a proprietary data production line, creating a significant barrier to entry for competitors who lack such physical-world access.
Experts Question if 10 Million Hours Can Solve Quality Bottleneck
Despite the project's grand scale, industry experts question whether quantity can translate into the quality needed for a breakthrough. The core challenge in robotics is not a lack of video but a scarcity of "state-action pairs" that include precise physical feedback, such as force, torque, and tactile data. Simply recording a courier delivering a package provides visual data for a robot's world model but is almost useless for training its control policy, which governs decisions such as how firmly to grip an object without crushing it.
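The distinction between video and "state-action pairs" can be made concrete with a small data-structure sketch. The field names, units, and the filtering rule below are assumptions chosen for illustration; they do not describe JD.com's data format or any published standard.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StateActionPair:
    """One timestep of recorded interaction (hypothetical schema)."""
    frame_id: str                                  # reference to a video frame
    action: Optional[Sequence[float]] = None       # commanded motion, e.g. joint targets
    force_n: Optional[Sequence[float]] = None      # contact force, newtons
    torque_nm: Optional[Sequence[float]] = None    # joint torque, newton-metres
    tactile: Optional[Sequence[float]] = None      # fingertip sensor readings


def usable_for_control(sample: StateActionPair) -> bool:
    """Illustrative rule: a sample can train a control policy only if it
    records an action and at least one physical feedback channel."""
    feedback = (sample.force_n, sample.torque_nm, sample.tactile)
    return sample.action is not None and any(f is not None for f in feedback)


# A courier filmed delivering a package: visual data only.
courier_video = StateActionPair(frame_id="courier_000123")

# A robot gripping a parcel, with the command and force feedback recorded.
robot_grasp = StateActionPair(
    frame_id="robot_000456",
    action=[0.10, -0.25, 0.40],
    force_n=[2.3],
    torque_nm=[0.8],
)

for name, sample in [("courier_video", courier_video), ("robot_grasp", robot_grasp)]:
    print(name, "usable for control policy:", usable_for_control(sample))
```

In this toy schema the courier footage still has value as input to a world model, but it is filtered out of control-policy training because nothing records what was commanded or what the hand felt.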
JD.com appears to be aware of this challenge, specifying that its plan includes collecting "1 million hours of robot body data" in the first year. This suggests a hybrid approach, combining broad human-centric video for general understanding with more targeted, high-fidelity data from robots performing tasks. However, fundamental issues remain, including the lack of a universal data standard: data collected for one type of robot is often incompatible with another because their hardware configurations differ. As JD pushes the industry into a new phase of asset-heavy competition, its success will depend on solving not just the data volume problem but the much harder challenges of data quality, standardization, and compliance.