Why AI is getting physical

Physical AI: Taking human-robot collaboration to the next level, from the Capgemini Research Institute, explores how physical AI is transforming robotics, the value it unlocks, and how organizations can scale deployments safely and effectively.

Three converging drivers

In 2026, physical AI has emerged as a defining theme in the global technology narrative, driven by a convergence of forces (see Figure 2). Deployments are already taking shape – particularly in manufacturing and logistics – generating real‑world operational data that is fueling rapid iteration and technical progress.

At the same time, market forces are converging, from rising operational pressures driven by labor shortages to surging venture investment, making this a pivotal moment for acceleration and scale.

Fundamental physical AI advances

Foundation models are redefining robot intelligence

For robots to operate autonomously in unpredictable, real‑world settings, they require a far more substantive understanding of the physical world – its underlying physics, dynamics, and cause‑and‑effect structure. Historically, robotic systems have lacked this foundation. Earlier AI-driven approaches, including conventional vision systems, relied on discrete components for perception, planning, and control but lacked a unifying, generalizable, physics-aware model to integrate them. The result was a set of brittle architectures that failed in real-world conditions.

Real‑world autonomy demands a far broader set of cognitive capabilities: the ability to interpret complex visual scenes, understand human instructions, reason toward goals, and anticipate the physical implications of actions. Large language models (LLMs) have significantly advanced reasoning and semantic understanding, but they cannot perceive or model the physical world. A new class of multimodal and physics‑aware AI models is emerging to close this gap and provide robots with the core intelligence required to operate in complex real‑world environments.

Multimodal foundation models

Multimodal foundation models are large‑scale AI systems trained across diverse data types – including images, text, audio, video, and tactile inputs – to strengthen contextual understanding and enable broader generalization across tasks. A wide range of actors is developing these models, from large technology organizations and robotics startups to industry players building domain‑specific models.

Vision‑language‑action (VLA) models

VLA models unify perception, language understanding, and motor control into a single architecture. By training on large‑scale datasets spanning visual inputs, textual instructions, and action trajectories, VLA models are demonstrating early generalization across tasks and environments, with reliability remaining an active research challenge.

Examples include NVIDIA’s Isaac GR00T, Google DeepMind’s Gemini Robotics models, and VLA models from robotics startups including Physical Intelligence, Skild AI, and TorqueAGI. In addition, Hugging Face, an open‑source AI platform, is developing LeRobot, an open‑source library that provides ready‑to‑use datasets, reusable training pipelines, and pretrained models – including VLA models – to lower the practical barriers to training robots.

World models

World models provide robots with predictive, physics‑aware reasoning. These systems learn internal representations of the physical world, enabling robots to anticipate outcomes, plan, and reason about physical interactions.

Examples include World Labs, founded by AI pioneer Fei‑Fei Li, which has raised over $1 billion in funding and developed the Marble world model; AMI Labs, a new venture co‑founded by AI pioneer Yann LeCun, which has also raised over $1 billion in funding; and Runway, an AI startup known for creative video generation that is now expanding its world models into robotics. In parallel, NVIDIA’s Cosmos Platform provides a data and training pipeline designed to support the development and integration of world models as they emerge, illustrating how the broader industry is converging around this architectural layer.

Advances in simulation are compressing robot training cycles

Instead of learning primarily through slow, expensive, and risky physical trials, robots can now practice at scale in synthetic environments, rapidly iterating across millions of scenarios, edge cases, and failure modes before deployment. This compresses the learning cycle for robotics and reduces dependence on scarce real‑world data. Real‑world training and validation remain essential, but simulation cuts the number of physical trials required, helping lower cost and shorten development cycles.

For example, NVIDIA’s Isaac Sim is a robotics simulation application and synthetic data generation tool that allows developers to design, test, and train AI-driven robots in physics-based, photorealistic virtual environments.

The AI-robot-data flywheel is accelerating progress

A reinforcing AI-robot-data flywheel is now emerging in which improvements in AI enhance robot performance, deployed robots generate new real‑world data, and that data informs further model development. While significant gaps in real‑world data remain, this flywheel is starting to improve performance, generalization, and scalability in physical AI.

Mind Robotics, an industrial robotics spin‑out from EV manufacturer Rivian, uses data from Rivian’s high‑volume production operations to train robots that are deployed back into Rivian’s plants, generating new data for further refinement – illustrating how data flywheels are taking shape in practice.

Robotics ecosystem advances

Breakthroughs in compute are enabling edge inference in real time

Physical AI systems require powerful onboard compute to handle perception, reasoning, and action in real time. Recent advancements in compute power are making this feasible. NVIDIA’s Jetson AGX Thor modules, Qualcomm’s Robotics RB5 platform, and Netherlands-based Axelera AI’s Metis AI platform, for example, aim to bring high-performance AI processing to the edge, enabling robots to run advanced AI models locally. At the same time, training large models and orchestrating fleets continue to rely heavily on cloud infrastructure, making hybrid edge-cloud architectures the standard approach for deploying physical AI systems.

Battery innovations are increasing robot uptime

Running compute-heavy AI models onboard places high demands on energy, making battery performance critical to autonomy. Advances in battery chemistry, packaging, and thermal and safety engineering are helping ease runtime constraints for mobile and humanoid robots. Ongoing research on solid-state batteries aims to improve energy performance and safety further. Chinese EV maker XPENG, for example, is exploring solid-state battery technology for humanoid robots to improve energy density and safety, and support compute‑heavy onboard AI tasks.

Falling hardware costs and new business models are widening access to robotics

Declining costs of key hardware components – notably sensors, actuators, and electric motors – are making advanced robotics more economically viable. Humanoid production costs, for example, have fallen roughly 30‑fold over the past decade – from about $3 million to around $100,000 (with wide variation across lower‑end and cutting‑edge models), driven by advances in AI reasoning, actuator design, and battery systems.
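The roughly 30‑fold figure follows directly from the two price points quoted above. As a quick check, the sketch below recomputes the ratio and, under the added assumption of a smooth decline over exactly ten years (the source says only "the past decade"), derives the implied average annual decline. The variable names are illustrative.

```python
# Price points quoted in the text (USD).
cost_then = 3_000_000
cost_now = 100_000

# Assumption: "the past decade" is treated as exactly 10 years.
years = 10

fold_reduction = cost_then / cost_now
# Constant annual rate r such that cost_then * (1 - r) ** years == cost_now.
annual_decline = 1 - (cost_now / cost_then) ** (1 / years)

print(f"Fold reduction: {fold_reduction:.0f}x")
print(f"Implied average annual decline: {annual_decline:.1%}")
```

Under these assumptions the implied decline is a little under 30% per year. The actual path was likely uneven, and the wide variation across models noted above means the result is indicative only.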

Actuators are the largest cost component, accounting for around 50% of production costs. At the same time, new business models such as robotics‑as‑a‑service (RaaS) and flexible leasing are making robotics accessible without heavy upfront capital.

Connectivity breakthroughs are unlocking real‑time autonomy and spatial awareness for robots

Advancements in connectivity are unlocking the edge intelligence and spatial awareness that robots require to learn, navigate, and act effectively in real‑world environments. 5G networks provide the reliable, low‑latency, high‑bandwidth connectivity that allows robots to perform perception and control at the edge, while precise wireless geolocation gives them an accurate, continuous understanding of where objects and assets are – something earlier positioning methods could not reliably deliver.

One example of innovation in wireless positioning comes from ZaiNar, a US-based startup developing technology that turns existing 5G, Wi‑Fi, and other wireless networks into sub‑meter‑accurate sensing layers, enabling continuous, real‑time positioning without new hardware or compute.

Economic drivers

Economic and demographic pressures are accelerating adoption

Labor shortages are a global challenge as aging populations shrink the workforce. In Europe, the ratio of people aged 65+ to those of working age is expected to rise from 28 to 43 per 100 by 2050. The core 25–54 age group is expected to decline by 35 million, cutting the labor force by about 10 million even as older age groups grow. In the US, one in five Americans is expected to be 65 or older by 2030 as the population continues aging. China’s working‑age population is also declining rapidly, and by 2035 more than 30% of its population will be aged 65 or older.

At the same time, labor costs continue to rise across major economies as employers compete for workers. These combined demographic and economic forces are strengthening the business case for automation.

Surging VC investments are fueling advances in physical AI and robotics

Venture capital (VC) investment in robotics hit an all‑time high in 2025, with robotics companies raising a record $40.7 billion, accounting for 9% of total VC funding and placing the sector among the leading investment categories, alongside AI software. Investment in world models alone surged from $1.4 billion in 2024 to $6.9 billion in 2025.
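Two derived numbers help put these figures in context: the total VC funding implied by robotics' 9% share, and the year‑over‑year multiple for world‑model investment. The short sketch below computes both from the figures quoted above. The implied total is an inference rather than a sourced figure, and because the 9% share is presumably rounded, it should be read as approximate.

```python
# Figures quoted in the text (USD billions).
robotics_vc_2025 = 40.7
robotics_share_of_total = 0.09  # reported as 9%, presumably rounded
world_models_2024 = 1.4
world_models_2025 = 6.9

# Total VC funding implied by the robotics share (an inference, not a sourced figure).
implied_total_vc = robotics_vc_2025 / robotics_share_of_total

# Year-over-year multiple for world-model investment.
world_model_multiple = world_models_2025 / world_models_2024

print(f"Implied total VC funding in 2025: ~${implied_total_vc:.0f} billion")
print(f"World-model investment multiple: {world_model_multiple:.1f}x")
```

On these inputs, world‑model investment grew nearly fivefold in a single year.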

In addition, major humanoid robotics startups secured some of the largest rounds in the industry, including Figure ($1.5 billion), 1X Technologies ($1.0 billion), Apptronik ($734 million), Agility Robotics ($400 million), and Neura Robotics ($124 million), signaling strong investor conviction in general‑purpose embodied AI.

The full report

The report Physical AI: Taking human-robot collaboration to the next level draws on a global survey of 1,678 senior executives across 15 industries, complemented by in-depth interviews with experts.

Featuring:

Why physical AI is a game-changer for industry
The growing imperative to adopt physical AI
Scaling physical AI
Humanoids set the stage for general-purpose robotics
Recommendations for accelerating the physical AI revolution
