Dwarkesh Patel interviews Eric Jang about rebuilding AlphaGo from scratch, focusing on the core mechanics of Go, Monte Carlo tree search, value/policy networks, and how search turns a raw model into a much stronger player. The conversation then broadens into what AlphaGo implies for scaling laws, reinforcement learning, distillation, off-policy training, and automated AI research.
This is a long-form technical interview with Eric Jang, introduced as the former VP of AI at 1X Technologies and previously a senior research scientist at Google DeepMind Robotics, discussing his sabbatical project of reconstructing and improving AlphaGo. He starts with why AlphaGo fascinated him: it solved a seemingly intractable search problem with deep learning, and modern tooling has made a project that once required a large research team and millions of dollars achievable on rented compute. A large portion of the discussion explains Go itself in detail, including the board, stone capture, suicide rules, Tromp-Taylor rules, endgame scoring, and why computer Go uses an unambiguous ruleset. From there, Eric builds the conceptual bridge to AlphaGo: Go is a deterministic game with an enormous branching factor and depth, so naive search is impossible. …
Near term, the actionable read is that search-plus-value pipelines can still produce large gains quickly if the problem has a strong verifier and a good initialization. The immediate risk is overgeneralizing AlphaGo-style heuristics to open-ended LLM reasoning before the target environment is sufficiently structured.
Over the next several months, the base case is continued progress from distillation, better priors, and more compute-efficient search loops rather than a brand-new algorithmic breakthrough. Validation will come from whether improved labels and value estimates keep compounding without collapsing off-distribution.
Structurally, the transcript argues that many hard problems may be better viewed as search problems that can be compressed into learned forward passes plus lightweight planning. If that holds broadly, the durable regime shift is toward systems that learn to imitate better search, not just raw end-to-end predictors.
AlphaGo was profound because deep learning solved a search problem that was long believed intractable for brute-force methods.
Eric frames AlphaGo as using deep learning to solve a class of search problems historically considered intractable.
KataGo achieved a roughly 40x reduction in compute needed to train a strong Go bot tabula rasa compared with earlier systems.
He explicitly states the 40x figure and says KataGo is very strong.
Monte Carlo tree search improves Go strength by combining policy priors with value estimates and a visit-count-driven exploration rule.
This is the technical core of his explanation of PUCT and MCTS.
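The selection rule described above can be sketched in a few lines. This is a minimal illustration of the commonly published PUCT form, not Eric's implementation: the `Node` class, the `c_puct` default of 1.5, and the convention that each child's value is stored from the perspective of the player choosing at the parent are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; field names are illustrative."""
    prior: float                 # policy network probability for the move leading here
    visit_count: int = 0         # how many simulations passed through this node
    value_sum: float = 0.0       # sum of backed-up value estimates
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self) -> float:
        # Mean value estimate; unvisited nodes default to 0 (an assumption).
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploration bonus: large for high-prior, rarely visited children,
    # shrinking as the child's visit count grows.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + u


def select_child(parent: Node, c_puct: float = 1.5):
    # Pick the (move, child) pair with the highest value-plus-exploration score.
    return max(parent.children.items(),
               key=lambda kv: puct_score(parent, kv[1], c_puct))


# Usage: an unvisited move with a decent prior can outrank a well-explored one.
root = Node(prior=1.0, visit_count=10)
root.children["a"] = Node(prior=0.6, visit_count=8, value_sum=4.0)
root.children["b"] = Node(prior=0.4)
best_move, _ = select_child(root)
```

In this toy tree the unvisited move is selected first because its exploration bonus outweighs the other move's mean value; once its visit count rises and its value estimate turns out poor, the score drops and the search returns to the stronger line.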
Why is AlphaGo interesting, and why did you choose it for your sabbatical project?
He says AlphaGo captivated him because it showed how far deep learning could go on a problem long considered intractable for search. He also wanted to understand how a relatively small network could amortize such deep game-tree simulation, especially after seeing the early breakthroughs in 2014-2016.
When does a Go game end?
He says the game ends either when a player resigns or when both players pass consecutively.
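As a minimal sketch of that termination rule, assuming moves are recorded as strings and that `"pass"` and `"resign"` are the markers used (both names are hypothetical, chosen only for this example):

```python
def game_over(moves: list[str]) -> bool:
    """Return True if the move history ends the game.

    Assumes `moves` is the full history in order, with the hypothetical
    markers "pass" and "resign" for non-placement moves.
    """
    if not moves:
        return False
    if moves[-1] == "resign":
        return True
    # Two consecutive passes (one by each player) end the game.
    return len(moves) >= 2 and moves[-1] == "pass" and moves[-2] == "pass"
```

A single pass does not end the game: the opponent may still play, and play continues until both players pass back to back or one resigns.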
How do you crack Go with AI, and how does AlphaGo work?
He says the approach is to first build intuition around the search process used to choose moves, then add deep learning to make that search efficient and tractable. He frames the rest of the explanation as an implementation-minded walkthrough of AlphaGo's move selection.