About the Video

About the Speaker: Andrej Karpathy

<aside> ⚠️

AI Warning

These section overviews, key points and quotes are comprehensively processed with AI.

This is explained further here in the Video Database.

As always, AI-generated content makes mistakes.

These are sanity-checked, formatted, and lightly edited by a human, though not as thoroughly as the content itself, and said human also makes mistakes.

</aside>

Talking Points

[0:00] Introduction: Software is Changing Again

[1:19] Software 1.0, 2.0, and the Emergence of 3.0

Karpathy defines Software 1.0 as traditional code, Software 2.0 as neural network weights (with Hugging Face as its GitHub), and introduces Software 3.0 as LLMs that are programmable in natural language like English, representing a new kind of computer.

Software 1.0: Traditional code written by humans (e.g., Python, C++).
Software 2.0: Neural network weights, tuned by data, not directly written (e.g., AlexNet).
Hugging Face is presented as an equivalent of GitHub for Software 2.0 models.
Software 3.0: LLMs programmed with prompts, remarkably written in English.

"Software 1.0 is the code you write on the computer. Software 2.0 are basically neural networks." "And I think what's changed, and I think it's a fundamental change, is that neural networks became programmable with large language models... It's a new kind of computer. And in my mind, it's worth giving it the designation of a Software 3.0." "And remarkably, these prompts are written in English. So it's kind of a very interesting programming language."

ycaistartup2025.006.jpeg
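To make the three paradigms concrete, here is a minimal sketch of one task (sentiment classification) written three ways. Everything in it is an illustrative assumption rather than code from the talk: the word lists, the tiny perceptron standing in for a neural network, and the prompt wording are all made up for this example.

```python
# Software 1.0: the behavior is spelled out by hand in code.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "hate", "terrible"}

def classify_v1(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

# Software 2.0: the behavior lives in weights fitted to labeled data.
# A bag-of-words perceptron stands in for a real neural network here.
def train_v2(examples, epochs=10):
    weights = {}
    for _ in range(epochs):
        for text, label in examples:  # label is +1 or -1
            words = text.lower().split()
            score = sum(weights.get(w, 0.0) for w in words)
            predicted = 1 if score >= 0 else -1
            if predicted != label:
                # Nudge the weights of the words seen toward the correct label.
                for w in words:
                    weights[w] = weights.get(w, 0.0) + label
    return weights

def classify_v2(weights, text: str) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Software 3.0: the "program" is an English prompt handed to an LLM.
# No model is called here; this only builds the prompt text.
def build_prompt_v3(text: str) -> str:
    return (
        "Classify the sentiment of the review as 'positive' or 'negative'.\n\n"
        f"Review: {text}\nSentiment:"
    )
```

The point of the comparison is where the "source" lives: in hand-written rules, in learned weights, or in a natural-language instruction.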

[5:15] The Shift from C++ to Neural Networks at Tesla

At Tesla, the Autopilot system saw a gradual replacement of C++ code (Software 1.0) with increasingly capable neural networks (Software 2.0), illustrating how new software paradigms can subsume older ones.

Initially, Tesla Autopilot had significant C++ code and some neural nets.
Over time, neural networks grew, and C++ code was deleted as functionality migrated to Software 2.0.
This is an example of a new software paradigm "eating through the stack."

"And I kind of observed that over time as we made the Autopilot better, basically the neural network grew in capability and size. And in addition to that, all this C++ code was being deleted." "And so the Software 2.0 stack quite literally ate through the software stack of the Autopilot."

[6:45] Understanding LLMs: A New Computing Paradigm

LLMs can be understood through analogies to utilities (providing "intelligence" on demand) and fabs (requiring significant capital), but most accurately as new, complex operating systems.

LLMs as Utilities: Like electricity, trained by labs and served via APIs with demands for low latency and high uptime.
LLMs as Fabs: CAPEX intensive, involving deep tech trees and R&D secrets.
LLMs as Operating Systems: The most fitting analogy; they are complex software ecosystems with parallels to Windows/macOS (closed) and Linux (open-source potential).
The LLM acts as a CPU, the context window as RAM, orchestrating compute and memory.

"LLM labs, like OpenAI, Gemini, Anthropic, etc., they spend CAPEX to train the LLMs, and this is kind of equivalent to building out a grid, and then there's OPEX to serve that intelligence over APIs to all of us." "But actually, I think the analogy that makes the most sense, perhaps, is that, in my mind, LLMs have very strong analogies to operating systems..." "So the LLM is a new kind of computer. It's kind of like a CPU equivalent. The context windows are kind of like the memory."

ycaistartup2025.016.jpeg
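The CPU/RAM analogy can be sketched as a bounded working memory that the model reads on every step. This is a toy illustration under loud assumptions: `ContextWindow`, its word-count "tokenizer", and the oldest-first eviction policy are all hypothetical and not how any particular LLM product manages context.

```python
from collections import deque

class ContextWindow:
    """Toy 'RAM': a token-budgeted buffer that evicts the oldest messages."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()

    @staticmethod
    def count_tokens(text: str) -> int:
        # Crude assumption: one whitespace-separated word is one token.
        return len(text.split())

    def used(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def write(self, message: str) -> None:
        self.messages.append(message)
        # Evict from the front until the budget fits; always keep the newest message.
        while self.used() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()

def fake_llm(context: ContextWindow) -> str:
    """Stand-in for the 'CPU': it can only act on what is currently in the window."""
    return f"I can see {len(context.messages)} message(s)."
```

Anything evicted from the window is simply gone from the model's point of view, which is why the memory half of the analogy matters as much as the compute half.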

[12:30] LLMs as 1960s Computing and a Shift in Access

Current LLM development mirrors 1960s computing with expensive, centralized mainframes accessed via time-sharing, but uniquely, this powerful technology is first available to consumers rather than large institutions.

LLM compute is expensive, leading to centralized cloud models and time-sharing.
Interacting with LLMs via text is like using an OS terminal; a general GUI is yet to be invented.
Personal computing for LLMs hasn't fully arrived, though local inference on devices like Mac Minis shows early signs.
Unlike early computers for military/government, LLMs are democratized, with consumers often getting access first.

"we're kind of like in this 1960s-ish era, where LLM compute is still very expensive for this new kind of a computer, and that forces the LLMs to be centralized in the cloud..." "whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal." "And so it's really fascinating to me that we have a new magical computer, and it's like helping me boil an egg. It's not helping the government to do something really crazy..."

ycaistartup2025.020.jpeg
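For readers unfamiliar with time-sharing, the idea is that one expensive, centralized machine is sliced among many users in turn. The round-robin scheduler below is a generic sketch of that idea; the function name, the "units of work" (think tokens to generate), and the slice size are assumptions for illustration and do not describe how any LLM provider actually schedules requests.

```python
from collections import deque

def time_share(requests_by_user: dict, slice_size: int = 2) -> list:
    """Round-robin: each user gets up to `slice_size` units of work per turn."""
    queue = deque(requests_by_user.items())
    schedule = []
    while queue:
        user, remaining = queue.popleft()
        work = min(slice_size, remaining)
        schedule.append((user, work))
        if remaining > work:
            # Unfinished work goes to the back of the line.
            queue.append((user, remaining - work))
    return schedule
```

For example, `time_share({"alice": 3, "bob": 1})` interleaves the two users instead of finishing all of Alice's work before Bob gets a turn.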

[15:40] Summary: The Dawn of Programming New Computers

LLMs are complex, early-stage operating systems, distributed like utilities via time-sharing, and their unprecedented broad availability means it's now time for everyone to start programming these new computers.

LLMs are akin to circa 1960s operating systems.
They are currently available via time-sharing, distributed like a utility.
Unusually for a new technology, they are in the hands of everyone, not just governments or large corporations.

"LLMs are complicated operating systems. They're circa 1960s computing, and we're redoing computing all over again. And they're currently available via time-sharing and distributed like a utility." "And now it is our time to enter the industry and program these computers. It's crazy."

ycaistartup2025.023.jpeg

[16:50] The Psychology of LLMs: "Fallible People Spirits"

LLMs can be thought of as "fallible people spirits"—stochastic simulators with emergent human-like psychology, possessing deep knowledge but also cognitive deficits like hallucination and amnesia.

LLMs are stochastic simulations (autoregressive transformers) with emergent human-like psychology.
Strengths: Deep knowledge, vast memory (akin to "Rain Man").
Weaknesses: Hallucinations, jagged intelligence, anterograde amnesia (context window is only working memory), gullibility and security vulnerabilities.
Programming them involves leveraging superpowers while working around their deficits.

"The way I like to think about LLMs is that they're kind of like people spirits. They are stochastic simulations of people, and the simulator in this case happens to be an autoregressive transformer." "So they certainly have superpowers and stuff in some respect, but they also have a bunch of, I would say, cognitive deficits."

ycaistartup2025.032.jpeg
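One way to picture "working around" anterograde amnesia: if the model only knows what is in its prompt, any lasting memory has to be kept outside the model and re-supplied on every call. The sketch below assumes a hypothetical `stateless_model` stand-in (no real LLM is called) and a made-up `NOTE:` convention for injected memory.

```python
NAME_PREFIX = "NOTE: user name is "

def stateless_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: it knows only what the prompt contains."""
    for line in prompt.splitlines():
        if line.startswith(NAME_PREFIX):
            return "Hello again, " + line[len(NAME_PREFIX):] + "!"
    return "Hello! I don't know who you are."

class Session:
    """Keeps persistent notes outside the model and prepends them to every prompt."""

    def __init__(self):
        self.notes = []

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def ask(self, question: str) -> str:
        memory = "\n".join(f"NOTE: {n}" for n in self.notes)
        return stateless_model(memory + "\n" + question)
```

Calling `stateless_model` directly forgets everything between calls, while a `Session` that has been told `remember("user name is Ada")` carries that fact into each new prompt.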