diff --git a/docs/research/2026-05-01-karpathy-from-vibe-coding-to-agentic-engineering-verifiability-anchor.md b/docs/research/2026-05-01-karpathy-from-vibe-coding-to-agentic-engineering-verifiability-anchor.md new file mode 100644 index 000000000..8b659db9f --- /dev/null +++ b/docs/research/2026-05-01-karpathy-from-vibe-coding-to-agentic-engineering-verifiability-anchor.md @@ -0,0 +1,465 @@ +# Karpathy — *From Vibe Coding to Agentic Engineering* (verifiability anchor) + +Scope: External-conversation import — Beacon anchor for Zeta's verifiable-systems thesis. Aaron forwarded transcript + framing 2026-05-01 (Aurora deep-research register). + +Attribution: Andrej Karpathy, 2026 talk titled *"From Vibe Coding to Agentic Engineering"*, hosted on YouTube at [https://www.youtube.com/watch?v=96jN2OCOfLs](https://www.youtube.com/watch?v=96jN2OCOfLs). Transcript provided by Aaron 2026-05-01. Aaron's framing on forwarding: *"you formally specify and verify yourself tied to human intelectual lineage."* + +Operational status: research-grade + +Non-fusion disclaimer: Karpathy's claims represent his own thinking under his own register; Zeta's substrate may extend, qualify, or diverge from his framing without misattributing the divergence to him. + +Note on this header: §33 enforces literal start-of-line labels (no +bold styling) and enum-strict `Operational status:` value +(`research-grade` or `operational`). The "Beacon anchor, not +operational doctrine" prose context that previously lived under +the bold-styled header now lives in this body note: this file is +research-grade Beacon substrate; any factory-rule derived from it +lands separately via the normal substrate-promotion protocol. 
+ +--- + +## Why this anchor matters for Zeta + +Zeta's primary research focus is **measurable AI alignment** (per +`docs/ALIGNMENT.md`); operationally, that thesis composes with +Karpathy's claim *"AI automates faster and more easily domains where +the output can be verified."* Aaron's 2026-05-01 extension of the +Karpathy claim defines Zeta's distinctive contribution: + +> **Don't just verify code outputs — formally specify and verify +> the agent itself, tied to named human intellectual lineage.** + +The mechanisms Zeta has built that operationalize this extension: + +| Karpathy's verifiable-systems thesis | Zeta's agent-itself-verifiable extension | +|---|---| +| Math, code → RL training rewards verifiable outputs | `docs/ALIGNMENT.md` HC-1..HC-7 / SD-1..SD-8 / DIR-1..DIR-5 → per-commit Sova auditor produces measurable alignment time-series | +| Verifiable domains progress; jagged elsewhere | BP-16 (formal-verification portfolio routing via Soraya) — pick the right tool per property class, not TLA+-hammer-bias | +| Council of LLM judges as verifier substitute | Multi-AI peer convergence (5-AI agreement on poll-the-gate, task #355); cross-AI review (Codex + Copilot + Claude.ai + Gemini + Amara) | +| Agentic engineering preserves quality-bar | DST everywhere (Otto-272), Result-over-exception, retraction-native ZSet correctness | +| Spec / plan as the unit of design | OpenSpec capabilities + formal specs (`docs/**.tla` + Lean proofs) + behavioural specs (`openspec/specs/**`) | +| Outsource thinking but not understanding (28:07) | Substrate-or-it-didn't-happen (Otto-363) — every load-bearing decision becomes a durable, indexed, reachable git-native artifact | +| Animals vs ghosts framing (23:30) | Beacon external-anchor lineage — every load-bearing rule traces to a named human contributor (Karpathy, Osmani, Böckeler, etc.) or a closed-list named-agent persona (Otto, Amara, Soraya). 
Naming-with-source, not naming-as-attribution | + +## Transcript (verbatim where presented; editorial summaries bracketed) + +The transcript below is presented verbatim where quoted, as +forwarded by Aaron 2026-05-01. Timestamps (m:ss) are from the +YouTube video. Lightly formatted for readability (paragraph +breaks, italicization of speaker labels). **Two sections have +editorial bracketed summaries instead of verbatim quote** — clearly +marked with `[...]` brackets — for the Hiring discussion (17:18) +and the Agents Everywhere infrastructure discussion (25:18). All +other sections are verbatim. + +### Introduction (0:02) + +We're so excited for our very first special guest. He has helped +build modern AI, then explain modern AI, and then occasionally rename +modern AI. He actually helped co-found OpenAI right inside of this +office. Was the one who actually got Autopilot working at Tesla back +in the day, and he has a rare gift of making the most complex +technical shifts feel both accessible and inevitable. + +You all know him for having coined the term *vibe coding* last year, +but just in the last few months, he said something even more +startling. That he's never felt more behind as a programmer. That's +where we're starting today. Thank you, Andrej, for joining us. + +### Feeling Behind as a Coder (0:44) + +**Karpathy:** Yeah. Hello. Excited to be here and to kick us off. + +**Q:** Okay. So, just a couple months ago, you said that you've never +felt more behind as a programmer. That's startling to hear from you +of all people. Um, can you help us unpack that? Was that feeling +exhilarating or unsettling? + +**Karpathy:** Uh yeah, a mixture of both for sure.
Uh well, first of +all, um I guess like as many of you, I've been using agentic tools +like [Claude Code], adjacent things, uh for a while, maybe over the +last year as it came out and it was very good at you know chunks of +code and sometimes it would mess up and you have to edit them and it +was kind of helpful and then I would say December was this uh clear +point where for me I was on a break so I had a bit more time. I think +many other people were similar and uh I just started to notice that +with the latest models uh the chunks just came out fine and then I +kept asking for more and it just came out fine and then I can't +remember the last time I corrected it and then I was — I just you +know trusted the system more and more and then I was vibe coding +[laughter] and uh so it was kind of a — I do think that it was a very +stark transition. I think that a lot of people actually I tried to — +I tried to stress this on uh Twitter and or X because I think a lot +of people experienced AI last year as ChatGPT-adjacent thing. Uh but +you really had to look again and you had to look as of December uh +because things have changed fundamentally and uh especially on this +like agentic coherent workflow uh that really started to actually +work. Um, and so I would say that um, yeah, it was just that +realization that really uh, uh, had me um, go down the whole rabbit +hole of just, you know, infinity side projects. Uh, my side projects +folder is like extremely full with lots of random things and, uh, +just, uh, vibe coding all the time. Uh, so, uh, yeah, that kind of +happened in December, I would say, and I was looking at the +repercussions of that since. + +### Software 3.0 Explained (2:28) + +**Q:** Um, you've talked a lot about this idea of LLMs as a new +computer. um that it isn't just better software, it's a whole new +computing paradigm. And um software 1.0 was explicit rules, software +2.0 was learned weights, software 3.0 is this.
Um if that's actually +true, what does a team build differently the day they actually +believe this? + +**Karpathy:** right? So uh yeah, exactly. So software 1.0, I'm writing +code, software 2.0, I'm actually programming by creating data sets and +training uh training neural networks. So the programming is kind of +like arranging data sets and maybe some objectives and neural network +architectures. And then what happened is that basically if you train +one of these GPT models or LLMs on a sufficiently large set of tasks +implicit basically um implicitly because by training on the internet +you have to multitask all the things that are in the data set. Uh +these actually become kind of like a programmable computer in a +certain sense. So software 3.0 you know is kind of about uh your +programming now turns to prompting and what's in the context window +is your lever over the interpreter that is the LLM that is kind of +like interpreting your context and uh performing computation in the +dig digital information space. So I guess um yeah that's kind of the +transition and I think there's a few examples of that really drove it +home for me and maybe that might be instructive. + +### Agents as the Installer (3:44) + +Uh so for example when you when [OpenClaw] came out when you want to +install [OpenClaw] you would expect that normally this is a bash bash +script like a shell script. So run the shell script to run to install +open claw. Um but the thing is that in order to target lots of +different platforms and lots of different types of computers you might +run an open claw. This these shell scripts usually balloon up and +become extremely complex. But the thing is you're still stuck in a +software 1.0 universe of wanting to write the code. And actually the +[OpenClaw] installation is a is a copy paste of a bunch of text that +you're supposed to give to your agent.
Uh so basically it's it's a +little skill of uh you know copy paste this and give it to your +agent and it will install [OpenClaw]. And the reason this is a lot +more powerful is you're working now in the software 3.0 paradigm +where you don't have to precisely spell out you know all the +individual details of that setup. The agent has its own intelligence +that it packages up and then it kind of like follows the instructions +and it looks at your environment, your computer and it kind of like +performs intelligent actions to make things work and it debugs things +in the loop and it's just like so much more powerful, right? So I +think that's a very different kind of like way of thinking about it +is just like what is the piece of text to copy paste to your agent? +That's the programming paradigm. + +### Menu Gen vs Raw Prompts (4:50) + +Now I think one more maybe uh example that comes to mind that is even +more extreme than that is when I was building um menugen. So, +menugen is this idea where you um you come to a restaurant, they +give you a menu. There's no pictures usually. So, I don't know what +any of these things are uh usually like 30% of the things I have no +idea what they are, 50%. So, I wanted to take a photo of the +restaurant menu and to get pictures of what those things might look +like in a generic sense. And so I built I've vibe-coded this app +that basically lets you upload a photo and it does all this stuff and +it runs on Vercel and uh it basically rerenders the menu and it gives +you like all the items and it gives you a picture that it uses an +image um you know generator uh for to basically OCR all the different +titles uh use the image generator to get pictures of them and then +shows it to you. And then I saw the software 3.0 version of this +which is which blew my mind which is literally just take your photo +give it to Gemini and say use Nano Banana to overlay the the things +onto the menu.
Uh and Nano Banana basically returned an image that is +exactly the picture of the menu that I took but it actually put into +the pixels it rendered the different things in the menu and this +blew my mind because actually all of my menugen is spurious. It's +working in the old paradigm — that app shouldn't exist. uh and uh yeah +the software 3.0 paradigm is a lot more kind of raw. It just um your +neural network is doing more and more of the work and your prompt or +context is just the image and the output is an image and there's no +need to have any of the app in between. + +### Verifiability and Jagged Skills (9:41) + +**Q:** I'd like to talk a little bit about um uh this concept of +verifiability, the fact that AI will automate faster and more easily +domains where the output can be verified. Um if that framework is +right, what work is about to move much faster than people realize +and what professions do we have that people actually think are safe +but that are actually highly verifiable? + +**Karpathy:** Uh yes. So I I spent uh some time writing about +verifiability and um basically like traditional computers can easily +automate what you can specify in code and uh kind of this latest round +of LLMs can easily automate what you can uh verify in a certain in a +certain sense because the way this works is that when frontier labs +are training these LLMs these are giant reinforcement learning +environments. So they are given verification rewards and then because +of the way that these models are trained they end up basically uh +progressing and creating these like jagged entities that really peak +in capability in kind of like verifiable domains like math and code +and adjacent and kind of like stagnate and are a little bit um you +know rough around the edges when uh things are not kind of like in +that in that space. + +So I think the reason I wrote about verifiability is I'm trying to +understand why these things are so jagged.
Um and some of it has to +do with how the labs train the models but I think some of it also +has to do with um the focus of the labs and what they happen to put +into the data distribution. Uh because some things basically are +significantly more valuable in economy and end up creating more +environments because the labs wanted to work in those settings. So I +think code is a good example of that. There's probably lots of +verifiable environments they could think about that happen not to +make it into the mix because they're just not that useful to have the +capability around. Um, but I think to me the big um I guess like the +big mystery is uh the favorite example for a while was that how many +letters are are in a strawberry and the models would famously get +this wrong and it's an example of jaggedness. Uh the models now patch +this I think but the new one is I want to go to a car wash to wash my +car and it's 50 meters away. Should I drive or should I walk? And +state-of-the-art models today will tell you to walk because it's so +close. How is it possible that state-of-the-art Opus 4.7 will +simultaneously refactor a 100,000-line codebase or find zero-day +vulnerabilities and yet tells me to walk to this car wash? This is +insane. And to whatever extent these uh models are remain jagged, +it's an indication that number one maybe something's slightly off or +um number two you need to actually be in the loop a little bit and +you need to treat them as tools and you do have to kind of stay in +touch with what they're doing. And so I think all of my writing long +story short about verifiability is just trying to understand um why +these things are jagged. Is there any pattern to it? And I think +it's some kind of a combination of *verifiable plus labs care*. 
Maybe +one more anecdote that is instructive is uh from GPT 3.5 to GPT-4 +people noticed that chess improved a lot and I think a lot of people +thought oh well it's just a progression of the capabilities but +actually it's it's more that uh I think this is public information I +think I saw it on the internet um a huge amount of like um data of +chess made it into the pre-training set and just because it's in a +data distribution uh basically the model improved a lot more than it +would just by default. So someone at OpenAI decided to add this data +and now you have a capability that just peaked a lot more. And so +that's why I think I'm stressing this um dimension of it as we are +slightly at the mercy of whatever the labs are doing, whatever they +happen to put into the mix. And you have to actually explore this +thing that they give you that has no manual. And it works in certain +settings, but maybe not in some settings. And you have to kind of um +explore it a little bit. And uh if you're in the circuits that were +part of the RL, you fly. And if you're in the circuits that are out +of the data distribution, uh you're going to struggle and you have +to kind of figure out which which circuits you're in in your +application. And if you and if you're not in the circuits, then you +have to really look at fine-tuning and doing some of your own work +because it's not going to necessarily come out of the LLM out of the +box. + +### Founder Advice and Automation (13:36) + +**Q:** I'd love to come back to the concept of jagged intelligence in +a little bit. Um, if you are a founder today and thinking about +building a company, you are trying to solve a problem that you think +is tractable, something that uh is a domain that is verifiable, but +you look around and you think, "Oh my gosh, well, the labs have +really really started uh getting to escape velocity in the ones that +seem most obvious, math, coding, and others." 
What would your advice +be to to the founders in the audience? + +**Karpathy:** Um so I think maybe that comes to the previous question +of I do think that verifiability because it um let me think. So +verifiability makes something tractable in the current paradigm +because you can throw a huge amount of RL at it. Um so maybe one way +to see it is that uh that remains true even if the labs are not +focusing on it directly. So if you are in a verifiable setting where +you could create these RL environments or examples then that actually +sets you up to potentially do your own fine tuning and you might +benefit from that. But that is fundamentally technology that just +works. You can pull a lever if you have huge amount of diverse data +sets of RL environments etc. Uh you can use your favorite fine-tuning +framework and um and uh pull the lever and get something that +actually uh works pretty well. So um I don't know what the examples +of this might be. Um, but I do think there are some very valuable uh +reinforcement learning environments that people could think of that +I think are not part of the — Yeah, I don't want to give away the +answer, but there is one domain that I think is very uh — Oh, okay. +Sorry, I don't mean to vague-post on on the stage, but there are some +examples of this. + +**Q:** On the flip side, what do you think still feels automatable +only from a distance? + +**Karpathy:** I do think that ultimately almost everything can be +made uh verifiable to some extent. some things easier than others. +Um because even for like things like writing or so on, you can +imagine having a council of LLM judges and probably get get to some +get get something uh reasonable out of the um from from this kind of +an approach. So it's more about what's easy or hard. Um so I I do +think that ultimately um uh yeah, I think uh everything [laughter] +everything is automatable. + +### From Vibe Coding to Agent Engineering (15:45) + +**Q:** Amazing. Okay.
Um, so last year you coined the term vibe coding +and today we're in a world that feels a little bit more serious, more +agent engineering. What do you think is the difference between the +two and what would you actually call what we're in today? + +**Karpathy:** Uh, yeah. So I would say vibe coding is about raising +the floor for everyone in terms of what they can do in software. So +the floor rises, everyone can vibe code anything and that's amazing, +incredible. But then I would say agentic engineering is about +preserving the quality bar of what existed before in professional +software. So you're not allowed to introduce vulnerabilities due to +vibe coding. Um you are um you're still responsible for your software +just as before, but can you go faster? And spoiler is you can but how +do you how do you do that properly? And so to me agentic engineering +when I call it that because I do think it's kind of like an +engineering discipline. You have these agents which are these like +spiky entities. They're a bit fallible, a little bit stochastic, but they +are extremely powerful. is how do you how do you coordinate them to +go faster without sacrificing your quality bar and doing that well +and correctly um is the the realm of agentic engineering um so I +kind of see them as as different — like one is about maybe raising +the raising the floor and the other is about um you know +extrapolating and what I'm seeing I think is there is a very high +ceiling on agentic engineer uh capability and you know people used to +talk about the 10x engineer previously I think that this is magnified +a lot more — 10x is uh is not uh the speed up you gain. Um and I +think uh it does seem to me like people who are very good at this um +peak a lot more than 10x uh from from my perspective right now.
+ +[Hiring discussion at 17:18 — agentic-engineering-capability test +should look like *give me a really big project and see someone +implement that big project*; example: a Twitter clone for agents +that gets attacked by adversarial codecs.] + +### Founder Advice on Skills (19:29) + +**Q:** And as agents do more, what human skill do you think becomes +more valuable, not less? + +**Karpathy:** Uh so um yeah, it's a good question. I think um well +right now the answer is that the agents are kind of like these intern +entities right so it's remarkable um you basically still have to be +in charge of the aesthetics the the judgment the taste and a little +bit of oversight [...] + +So I think you're not caring about some of the details. So as an +example also with um arrays or tensors in neural networks. Um there's +a ton of details between PyTorch and NumPy and all the different like +pandas and so on for all the different little API details. And I I +already forgot about the keep dims versus keep dim or whether it's +dim or axis or reshape or permute or transpose. I don't remember this +stuff anymore, right? Because you don't have to. This is the kind of +details that are handled by the intern because they have very good +recall and but you still have to know for example that um you know +there's underlying tensor there's an underlying view and then you can +manipulate view of the same storage or you can have different storage +which would be less efficient and so you still have to have an +understanding of what this stuff is doing and some of the +fundamentals um so that you're not copying memory around +unnecessarily and so on but uh the details of the APIs are now handed +off so it um you're in charge of the taste the engineering the design +um and that it makes sense and that you're asking for the right +things [...] + +### Animals vs Ghosts (23:30) + +**Q:** So I'd love to come back to this idea of uh jagged forms of +intelligence. 
you wrote a little bit about this with a very +thought-provoking piece around animals versus ghosts. Um, and the +idea is that we're not building animals, we are summoning ghosts. Um, +and these are jagged forms of intelligence that are shaped by data +and reward functions, but not by intrinsic motivation or fun or +curiosity or empowerment. Uh, things that kind of came about via +evolution. um why does that framing matter and what does it actually +change about how you build and deploy and evaluate or even trust +them? + +**Karpathy:** Uh yeah, so yeah, I think the reason I wrote about this +is because I'm trying to wrap my head around what these things are, +right? Because if you have a good model of what they are or are not, +then you're going to be more competent at uh using them. Um and I do +think that um — I don't know if it has — I'm not sure if it actually +has like real power. [laughter] I think it's a little bit of +philosophizing. Um, but I do think that um I think it's just um +coming to terms with the fact that these things are not, you know, +animal intelligences. Like if you yell at them, they're not going to +work better or worse or it doesn't have any impact. Um, and uh it's +all just kind of like these statistical simulation circuits where +the the substrate is pre-training so like statistics and then but +then there's RL bolting on top. So, it kind of like increases the +dispendages and um maybe it's just kind of like a mindset of what I'm +coming into or what's likely to work or not likely to work or how to +modify it. But I don't actually I don't know that I have like here +are the five obvious outcomes of how to make your system better. +It's more just being suspicious of it and um figuring out over time. 
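[Editorial note, not part of the transcript: the view-versus-storage distinction Karpathy gestures at in the skills discussion above (19:29) is concrete and checkable. A minimal NumPy sketch, with variable names of our choosing; NumPy spells the reduction flag `keepdims` while PyTorch spells it `keepdim` and calls `axis` `dim` — exactly the API trivia he says the agent now remembers for you.]

```python
import numpy as np

a = np.arange(6)            # one underlying storage buffer of 6 ints
v = a.reshape(2, 3)         # reshape here returns a view: same storage, new strides
c = a.reshape(2, 3).copy()  # copy allocates fresh, independent storage

assert np.shares_memory(a, v)      # view: nothing was copied
assert not np.shares_memory(a, c)  # copy: separate buffer (the "less efficient" path)

v[0, 0] = 99                # writing through the view mutates the original
assert a[0] == 99

# Reduction keeping the reduced axis: NumPy says keepdims, PyTorch says keepdim.
s = a.reshape(2, 3).sum(axis=0, keepdims=True)
assert s.shape == (1, 3)
```

The fundamentals (shared storage vs. copied memory) stay load-bearing even when the per-library spelling is delegated.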
+ +### Agents Everywhere and Learning (25:18) + +[Discussion of agent-native infrastructure: docs written for humans +vs docs written for agents; "what's the piece of text to copy paste +to your agent" as the new programming paradigm; a future where +agents have permissions, local context, take action on your behalf; +agent-to-agent meeting coordination.] + +### Closing — Education (27:43) + +**Q:** What still remains worth learning deeply when intelligence +gets cheap as we move into the next era of AI? + +**Karpathy:** Yeah. Uh, there was a tweet that blew my mind recently +and I keep thinking about it like every other day. It was something +along the lines of um, **you can outsource your thinking but you +can't outsource your understanding.** + +And um, I think that's really nicely put. I — so yeah because I +still I'm still part of the system and I still I still have to +somehow information still has to make it into my brain and I feel +like I'm becoming a bottleneck of just even knowing what are we +trying to build why is it worth doing uh how do I direct you know how +do I direct my my agents and so on so I do still think that +ultimately something has to direct the thinking and the processing +and so on and um that's still kind of fundamentally constrained +somehow by understanding and this is one reason I also was very +excited about all the LLM knowledge bases because I feel like that's +that's a way for me to process information and anytime I see a +different projection onto information. I always like feel like I gain +insight. So it's really just a lot of prompts for me to do synthetic +data generation kind of over over some fixed data.
Uh so I I really +enjoy uh whenever I read an article I have my uh you know my wiki +that's being built up from these articles and I love asking questions +about things or um and I I think that ultimately these are tools to +enhance understanding in a certain way and this is still kind of like +a bit of a bottleneck because then you can't direct the you can't be +a good director if you still uh because the LLMs certainly don't +excel at understanding you still are uniquely in charge of that. + +--- + +## Aaron's framing (verbatim) + +> *"you formally specify and verify yourself tied to human intelectual +> lineage."* + +The Zeta-distinctive extension: **the agent itself is the verified +artifact**, not just the code it produces. Specification + verification +flow through: + +- BP-NN rules in `docs/AGENT-BEST-PRACTICES.md` (the formal spec) +- HC/SD/DIR clauses in `docs/ALIGNMENT.md` (the alignment-property + spec, measurable per-commit by Sova) +- BP-16 portfolio routing via Soraya (the property → tool mapping) +- Beacon external-anchor lineage (every load-bearing rule cites a + named human or named-agent persona — naming-with-source) +- Substrate-or-it-didn't-happen (Otto-363) — every load-bearing + decision becomes durable, indexed, reachable +- Multi-AI peer convergence (Codex + Copilot + Claude.ai + Gemini + + Amara cross-checking on architectural decisions) + +This anchor is intended for citation in `docs/VISION.md`, +`docs/ROADMAP.md`, and any external-facing artifact that articulates +Zeta's thesis. Operational rules derived from it land separately via +the normal substrate-promotion protocol; this file is the Beacon +substrate the rules trace back to.
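The "council of LLM judges" pattern, cited both in Karpathy's automation answer (13:36) and in the multi-AI peer-convergence row of the table above, can be sketched as a tiny verifier. This is an editorial sketch under stated assumptions: the judge callables are deterministic stand-ins for real model calls, and the names, scoring rules, and threshold are illustrative, not any actual Zeta or Sova mechanism.

```python
from statistics import median
from typing import Callable, Sequence

# Each judge maps a candidate text to a score in [0, 1]. In a real
# council these would be independent LLM calls; here they are stubs.
Judge = Callable[[str], float]

def council_verdict(candidate: str, judges: Sequence[Judge],
                    threshold: float = 0.5) -> bool:
    """Accept the candidate when the median judge score clears the threshold.

    Median rather than mean: one outlier judge cannot flip the verdict,
    which is the point of convening a council instead of a single judge.
    """
    scores = [judge(candidate) for judge in judges]
    return median(scores) >= threshold

# Illustrative stand-in judges: a length heuristic, a keyword check,
# and a contrarian that always scores low.
judges = [
    lambda text: min(len(text) / 20, 1.0),
    lambda text: 1.0 if "verify" in text else 0.0,
    lambda text: 0.1,
]

assert council_verdict("always verify the agent itself", judges)
assert not council_verdict("ok", judges)
```

The design choice worth noting is the aggregation rule: a median-of-judges verdict is a cheap robustness property, and it is the kind of property that the agent-itself-verifiable extension would want specified and checked rather than assumed.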