This is a sponsored interview with NVIDIA robotics product lead Spencer Huang about how robotics, or physical AI, is being built. The core message is that robotics needs a three-computer stack: one computer to train models, one to simulate and evaluate them, and one deployed on the robot for real-world inference. The conversation repeatedly argues that video and language data are not enough; robots need synthetic, contact, action, and physical interaction data to close the sim-to-real gap.
The interview centers on NVIDIA’s view of robotics as a full-stack problem rather than just a humanoid-robot hardware race. Spencer Huang frames the company’s approach around Jensen Huang’s “three computer solution”: a training computer for building the brain/model, a simulation computer for testing the model in a proxy world, and an edge/deployment computer for running the model on the actual robot. In that framing, DGX is for training, Omniverse and Cosmos are for simulation and world modeling, and IGX and Jetson AGX are for deployment in the physical world. A major theme is that physical AI differs from LLMs because there is not yet a rich corpus of “real world” interaction data for touch, contact, elasticity, manipulation, and force feedback. …
Near term, this looks like a bullish catalyst for NVIDIA’s robotics narrative into GTC, with the market likely to react to demos and ecosystem announcements more than to immediate revenue. The tactical risk is that expectations outpace what is commercially ready, especially for dexterous or surgical use cases.
Over the next few months, the base case is incremental validation of the robotics stack through simulation tools, benchmarks, and developer adoption rather than a sudden commercialization jump. The thesis strengthens if more tasks move from demo to hardware-in-the-loop and then to repeatable deployment.
Structurally, the interview argues that robotics will become a major compute platform and that NVIDIA intends to own much of the enabling infrastructure. The durable question is whether physical AI really follows the same scaling logic as digital AI, or whether hardware, safety, and interaction complexity slow the regime shift.
Video data alone is insufficient for physical AI because it supports semantic understanding but carries no information about how objects physically interact with each other.
The speaker distinguishes between semantic understanding (what video models provide) and physical interaction data, which he treats as the missing piece that defines physical AI.
Simulated/synthetic data can compensate for the lack of real-world physical interaction data needed to train robotic AI models.
The speaker argues that real physical interaction data (contact data) doesn't exist in large quantities, unlike text data for LLMs, so simulation must fill the gap.
World models like Cosmos trained on the dynamics of the world will be game changers for robotics by enabling neural simulation for data generation, policy evaluation, and onboard reasoning.
The speaker explains that world models trained on physical dynamics can be used for data generation, policy evaluation, and eventually onboard reasoning for robots, similar to how Alpamayo works for autonomous vehicles.
Can you explain NVIDIA's approach to robotics at a high level?
Spencer explains NVIDIA’s “three computer” stack for robotics: one computer trains the brain/model, one simulates the world for testing and skill development, and a third is deployed in the real world on hardware like IGX and Jetson AGX. He frames this as moving from brain training to simulation to physical deployment.
Why isn't video data enough for physical AI?
He says video models mainly provide semantic understanding of how objects relate in the world, but they do not capture how the world responds when you physically interact with it. The missing piece is contact and interaction data, such as how a finger, hand, or tool behaves against soft or rigid materials.
How do you know when simulated data is good enough?
He calls that a hard, almost million-dollar question and says synthetic data is more art than science. Because physical AI lacks the kind of established real-data corpora that helped train LLMs, teams are still figuring out what counts as a good demonstration, what modalities matter, and which data dimensions actually improve the model.