James Jung

AI Engineer Europe 2026 — Session Summaries and Links

Europe’s first flagship AI Engineer conference ran April 8–10, 2026 at the Queen Elizabeth II Centre in London. It drew over 1,000 AI engineers, founders, and researchers across 11 technical tracks and 23 hands-on workshops: a smaller, sharper event than the San Francisco World’s Fair, with a distinctly different editorial tone.

Same deal as the Google Cloud Next and Code with Claude posts: AI summarized the sessions so you can scan what’s worth your time and pick what to watch. All recordings are on the AIE Europe 2026 YouTube playlist.


🏴󠁧󠁢󠁥󠁮󠁧󠁿 AI Engineer Europe 2026

Conference: AI Engineer Europe 2026
Dates: April 8–10, 2026
Location: Queen Elizabeth II Centre, London, UK
Scale: 1,000+ attendees · 100+ speakers · 11 technical tracks · 23 workshops
Organizer: swyx and the AI Engineer team

All session recordings: YouTube playlist


The Big Picture

If the 2025 AI Engineer World’s Fair was about “what can agents do?”, this London edition was about “what does it cost when you build fast but wrong?” The recurring counter-narrative across sessions: AI doesn’t lower the cost of bad code — it amplifies it. Agents are optimized to write code that runs, not code that is maintainable, and the entropy accumulates faster than ever. That message came from the infrastructure side (Armin Ronacher on agent-legible codebases), the model side (Sarah Chieng on fast models needing slow developers), the tooling side (Matt Pocock on why software fundamentals matter more now), and the research side (Raia Hadsell on where AI actually goes next).

The other dominant story: MCP is becoming the connective tissue of the agent ecosystem, reaching 110 million SDK downloads per month within 12 months of launch. David Soria Parra laid out a roadmap where MCP stops being only a tool-connectivity layer and starts distributing domain knowledge itself: semantic context agents can reason over, not just endpoints to call.

And Malte Ubl made the European case: the real value in AI engineering isn’t being built in the foundation model labs. It’s being built in the application layer, and that’s where European teams are winning.


Day 1 — April 8

The New Application Layer — Malte Ubl, CTO Vercel

📺 Watch

The economic framing that set the tone for the whole conference. Ubl’s thesis: AI agents have shattered the old ceiling on what software is economically viable to build. Before, a lot of automation that should have existed couldn’t be justified, because the cost of writing it by hand made too many projects unviable. Now we’re running a live experiment in economic elasticity: as software gets cheaper to build, we build more of it. Developer demand isn’t shrinking; it’s expanding.

But the more interesting half of the talk was about infrastructure. At Vercel, humans are now a minority: over 60% of page views in the week before the talk were generated by AI agents. That changes what you need to build. Ubl’s three design principles for the agent era: API-first (every UI needs a parallel CLI or API that an agent can use), sandboxed execution (keep the agent harness separate from the execution environment, or the security nightmares of 1999 come back), and agent archetypes (practical patterns such as compressed research, information surfacing, and toil elimination; Vercel’s internal support agent achieves 90% deflection). He also highlighted his own open-source tool “just bash”, a TypeScript-based bash interpreter with nanosecond startup times for sandboxed agentic execution.
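To make the API-first principle concrete: keep one core operation and put both the human-facing handler and an agent-usable CLI on top of it, so neither interface has logic the other lacks. A minimal Python sketch; `create_invoice`, `handle_form`, and the `invoices` CLI are hypothetical names, not anything Vercel showed.

```python
import argparse
import json
import sys


def create_invoice(customer: str, amount_cents: int) -> dict:
    """Core operation shared by every interface (hypothetical example)."""
    if amount_cents <= 0:
        raise ValueError(f"amount_cents must be positive, got {amount_cents}")
    return {"customer": customer, "amount_cents": amount_cents, "status": "draft"}


def handle_form(form: dict) -> str:
    """Human path: a web form handler that renders a readable message."""
    invoice = create_invoice(form["customer"], int(form["amount_cents"]))
    return f"Draft invoice for {invoice['customer']}: {invoice['amount_cents'] / 100:.2f}"


def cli(argv: list[str]) -> str:
    """Agent path: the same operation, returning machine-readable JSON."""
    parser = argparse.ArgumentParser(prog="invoices")
    parser.add_argument("customer")
    parser.add_argument("--amount-cents", type=int, required=True)
    args = parser.parse_args(argv)
    return json.dumps(create_invoice(args.customer, args.amount_cents))


if __name__ == "__main__":
    print(cli(sys.argv[1:]))
```

An agent would run something like `python invoices.py acme --amount-cents 1250` and get JSON back, while the web form calls exactly the same function underneath.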

On Europe specifically: Ubl pushed back on the narrative that Europe is losing the AI race. The AI SDK (led by Berlin-based Lars Grammel), the Pi coding agent (Austrian-built), and OpenRouter are all European-origin application-layer infrastructure. His argument: when models commoditize, the value shifts to the engineers building the stable, model-agnostic layer on top, and that’s where European teams are competing.


State of the Claw — Peter Steinberger (OpenClaw Foundation / OpenAI)

📺 Watch

In under five months, OpenClaw has become the fastest-growing open-source project in GitHub history: nearly 30,000 commits, 2,000 contributors, and engineers from Nvidia, Microsoft, Red Hat, Tencent, and ByteDance now active. Steinberger (the project’s creator, now also at OpenAI) calls running it “a company on hard mode”: it’s all volunteer-driven, with the bus factor as the primary engineering risk.

The security section was the most interesting. OpenClaw has logged 1,142 security advisories, about 16.6 per day. Steinberger’s read: the higher the criticality score on any individual report, the more likely it is to be “slop” generated by automated agents chasing credit in the security community. Most CVSS 10.0 issues require users to actively fight the recommended security setup to be exposed at all. The lesson he draws: you can’t anticipate every possible misuse, but you can document the correct setup clearly and make it the enforced default. Follow the docs and you’re fine.
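One way to read “document the correct setup and make it the default” as config: the safe setup is what you get with no arguments, and the unsafe one requires an explicit, greppable opt-out. A minimal Python sketch; `GatewayConfig` and its fields are hypothetical and are not OpenClaw’s actual configuration schema.

```python
from dataclasses import dataclass
from typing import Optional

LOOPBACK_HOSTS = {"127.0.0.1", "::1", "localhost"}


@dataclass(frozen=True)
class GatewayConfig:
    """Hypothetical agent-gateway settings; the defaults are the documented safe setup."""

    bind_host: str = "127.0.0.1"  # reachable only from this machine
    auth_token: Optional[str] = None  # required once the gateway is exposed
    i_understand_this_is_unsafe: bool = False  # explicit, greppable opt-out

    def validate(self) -> None:
        """Refuse an exposed, unauthenticated gateway unless explicitly told otherwise."""
        exposed = self.bind_host not in LOOPBACK_HOSTS
        if exposed and not self.auth_token and not self.i_understand_this_is_unsafe:
            raise ValueError(
                f"bind_host={self.bind_host!r} exposes the gateway without an auth_token; "
                "set auth_token or pass i_understand_this_is_unsafe=True"
            )
```

The opt-out flag is deliberately verbose: anyone exposed by it had to type it, and it shows up in a search of the config.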

On the future: Steinberger wants agents that are truly ubiquitous — talking from any room, intelligently using whatever display is nearby. He’s also exploring “dreaming” for agents: a scheduled process that garbage-collects memory logs during idle time, consolidating long-term knowledge without explicit user instruction. (Anthropic shipped something similar at Code with Claude the following month.) His closing argument: the human edge that remains is taste and system design — knowing which tasks are worth building and when to say no. AI generates code trivially; the challenge has become what to build and how to keep it coherent.
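A toy version of the “dreaming” idea, assuming memory is a raw session log plus a long-term key/value store: an idle-time pass promotes facts that recur (or updates facts already remembered), keeps only the latest value per key, and then clears the raw log. This is a hypothetical sketch (`AgentMemory`, `MemoryEntry`, `dream`, and the promotion rule are all illustrative), not how OpenClaw or Anthropic implement it.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    key: str  # what the note is about, e.g. "user.timezone"
    value: str
    seen: int  # position in the session log; higher means more recent


@dataclass
class AgentMemory:
    long_term: dict[str, MemoryEntry] = field(default_factory=dict)
    session_log: list[MemoryEntry] = field(default_factory=list)

    def dream(self, min_mentions: int = 2) -> int:
        """Idle-time consolidation pass. Returns how many long-term entries were written."""
        counts: dict[str, int] = {}
        latest: dict[str, MemoryEntry] = {}
        for entry in self.session_log:
            counts[entry.key] = counts.get(entry.key, 0) + 1
            if entry.key not in latest or entry.seen > latest[entry.key].seen:
                latest[entry.key] = entry
        written = 0
        for key, entry in latest.items():
            # Promote facts that recur, and always refresh facts already remembered.
            if counts[key] >= min_mentions or key in self.long_term:
                self.long_term[key] = entry
                written += 1
        self.session_log.clear()  # the "garbage collection" step: raw logs are dropped
        return written
```

A real system would schedule `dream()` when the agent is idle and would likely use a model, not a counter, to decide what is worth keeping; the counter just keeps the sketch deterministic.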


The Future of MCP — David Soria Parra (Anthropic)

📺 Watch

The stat that opened this talk: MCP hit 110 million SDK downloads per month within 12 months of launch. Soria Parra framed the current moment as the transition from “demos to production” — 2024 was the demo era, 2025 was coding agents, 2026 is the year agents go to production.

The key concept Soria Parra introduced is MCP Applications: agents that ship their own interface, served over an MCP server and deployable anywhere (Claude, ChatGPT, VS Code, Cursor) without modification. This requires proper semantics: client and server both need to understand each other, including how to render UI, not just how to exchange data. He highlighted MCP Elicitation as one mechanism for this: the part of the protocol that lets an MCP server request structured input, described by a schema, from the user through the host client.
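For reference, this is roughly what an elicitation round-trip looks like, based on my reading of the MCP specification (as far as I know, elicitation arrived in the 2025-06-18 revision): the server sends an `elicitation/create` request with a `message` and a flat `requestedSchema`, and the client answers with an `action` of `accept`, `decline`, or `cancel`, plus `content` when accepted. The deploy question and the `read_elicitation_result` validator are hypothetical; the official MCP SDKs handle this plumbing for you.

```python
from typing import Optional

# Python types accepted for each primitive JSON Schema type used in requestedSchema.
PY_TYPES = {"string": str, "boolean": bool, "number": (int, float), "integer": int}


def build_elicitation_request(request_id: int) -> dict:
    """Server -> client: ask the user, through the host, for structured input."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "elicitation/create",
        "params": {
            "message": "Which environment should this deployment target?",
            "requestedSchema": {  # flat object, primitive-typed properties only
                "type": "object",
                "properties": {
                    "environment": {"type": "string", "enum": ["staging", "production"]},
                    "confirm": {"type": "boolean"},
                },
                "required": ["environment", "confirm"],
            },
        },
    }


def read_elicitation_result(request: dict, response: dict) -> Optional[dict]:
    """Return the user's answers, or None if they declined or cancelled."""
    result = response["result"]
    if result["action"] != "accept":  # "decline" or "cancel"
        return None
    schema = request["params"]["requestedSchema"]
    content = result.get("content", {})
    for name in schema.get("required", []):
        if name not in content:
            raise ValueError(f"missing required field {name!r}")
    for name, value in content.items():
        spec = schema["properties"].get(name)
        if spec is None:
            raise ValueError(f"unexpected field {name!r}")
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        wrong_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if wrong_bool or not isinstance(value, PY_TYPES[spec["type"]]):
            raise ValueError(f"field {name!r} should be a {spec['type']}, got {value!r}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"field {name!r} must be one of {spec['enum']}, got {value!r}")
    return content
```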

The bigger 2026 roadmap move: MCP stops being purely a tool-connectivity layer and becomes a knowledge-distribution layer. The same protocol that connects tools will carry domain knowledge: structured semantic context that agents can reason over, not just API endpoints to call. The vision is an agent that knows how your organization works, what your data means, and what constraints govern your domain, all delivered via MCP alongside the tools. Soria Parra described 2026 agents as capable of “applying a wide range of skills, composing complex calls using MCP and CLI, and connecting to various services”: general knowledge workers, not just coding assistants.


Day 2 — April 9

Beyond the Chatbot: The Frontier of Intelligence — Raia Hadsell, VP of Research, Google DeepMind

📺 Watch

The talk that had the longest time horizon of anything at the conference. Hadsell (PhD under Yann LeCun, now leading 1,200 scientists across 10 global DeepMind labs) opened with a provocation: while the industry obsesses over generative models, the most important and underrated work in AI is in embedding models. The framing device was the “Jennifer Aniston cell”, a neuroscience finding that specific neurons fire for a single concept regardless of how the information arrives (photo, voice, written name). DeepMind is building that same cross-modal robustness into Gemini Embeddings 2: truly omnimodal (text, video, audio, documents), with a single vector representing up to 8,000 tokens, 128 seconds of video, or 80 seconds of audio. Matryoshka Representation Learning packs the most important information into the leading dimensions, so the same embedding can be truncated to 256 dimensions for fast retrieval or used at full length for precision: same model, different resolution.
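The Matryoshka trick is simple to apply at query time: take the leading dimensions, re-normalize, use the short vectors for a cheap first pass, then rerank a shortlist at full resolution. A minimal pure-Python sketch; it assumes the embeddings came from a model trained with Matryoshka Representation Learning (truncating an ordinary embedding this way throws away far more information), and the `search` defaults are illustrative, not Gemini settings.

```python
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def truncate(v: list[float], dims: int) -> list[float]:
    """Keep the leading `dims` coordinates and re-normalize to unit length."""
    return normalize(v[:dims])


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def search(query: list[float], corpus: dict[str, list[float]],
           short_dims: int = 256, shortlist: int = 100, k: int = 10) -> list[str]:
    # Stage 1: cheap scan of the whole corpus at low resolution.
    q_short = truncate(query, short_dims)
    coarse = sorted(corpus, key=lambda d: cosine(q_short, truncate(corpus[d], short_dims)),
                    reverse=True)
    # Stage 2: rerank only the shortlist with the full-length vectors.
    q_full = normalize(query)
    fine = sorted(coarse[:shortlist], key=lambda d: cosine(q_full, normalize(corpus[d])),
                  reverse=True)
    return fine[:k]
```

In a real index you would precompute and store the truncated corpus vectors instead of truncating them inside the scoring loop; that storage saving is most of the point.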

The weather section showed the practical ceiling of this: AI weather models trained on 40 years of global data now outperform physics-based supercomputer simulations. GraphCast predicted Hurricane Lee’s Nova Scotia landfall 9 days out; conventional models managed 6. GenCast beat ECMWF’s ENS, the leading operational ensemble forecast, on 97% of evaluated targets, and produces a 15-day global forecast in 8 minutes on a single chip. FGN (Functional Generative Network) predicts cyclones directly (trajectory, wind speed, eye formation) and is already deployed at the US National Hurricane Center.

The capstone was Genie 3: a world model that generates high-definition, consistent, interactive 3D environments in real time. The demo walked down the Camden Canal in London and restyled the entire world via a prompt from inside the simulation, with the world keeping its memory and physical consistency as you move through it. Hadsell’s framing for where this goes: new forms of entertainment and education built on real-time, promptable world simulation.


Fast Models Need Slow Developers — Sarah Chieng, Cerebras

📺 Check the AIE Europe playlist

The best-titled talk at the conference, and a deliberate provocation from the maker of the fastest inference hardware available. Chieng’s argument: inference speed has been a forcing function for bad developer habits. When models respond in milliseconds, developers iterate less carefully, rely on vibes over evals, and don’t catch the errors that slower iteration would surface. The research she cited: clean code measurably amplifies AI productivity gains. Code that follows clear conventions — one function, one task, clear names, no hidden side effects, error messages that describe specific problems — lets AI tools work faster and produce more reliable outputs.

The practical claim: without evals and maintainability discipline, the 2026 AI engineering stack is just a faster way to ship worse software. Fast inference makes this problem worse, not better, because it removes the natural friction that previously slowed down bad decisions. “AI speed without human control doesn’t produce innovation. It produces uncontrollable junk.” The remedy is the same things that always mattered: clean design, clear responsibilities in the code, and systematic testing.
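Chieng’s “evals, not vibes” point in its smallest form: a fixed set of labeled cases that every prompt or model change must pass above a threshold before it ships. A hypothetical Python sketch; the ticket-classification task, `run_evals`, `gate`, and the 0.95 threshold are illustrative, not from the talk.

```python
from typing import Callable

EvalCase = tuple[str, str]  # (input text, expected label)


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> tuple[float, list[str]]:
    """Score a model on fixed cases; return (pass rate, human-readable failures)."""
    failures = []
    for text, expected in cases:
        got = model(text)
        if got != expected:
            failures.append(f"{text!r}: expected {expected!r}, got {got!r}")
    return 1 - len(failures) / len(cases), failures


def gate(model: Callable[[str], str], cases: list[EvalCase], threshold: float = 0.95) -> bool:
    """Block a prompt or model change unless the pass rate clears the threshold."""
    score, failures = run_evals(model, cases)
    for failure in failures:
        print("FAIL", failure)
    print(f"pass rate {score:.0%} (threshold {threshold:.0%})")
    return score >= threshold
```

In practice `model` would wrap your LLM call, and the case list would grow every time production surfaces a new failure, which is exactly the slow, deliberate loop fast inference tempts you to skip.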


Building pi in a World of Slop — Mario Zechner

📺 Watch

The contrarian case against the current AI coding agent stack, from the creator of libGDX. Zechner’s provocation: most AI coding tools are optimized to generate code that looks right, not code that is right — and the result is an accelerating wave of “slop” that is technically functional but unmaintainable, untestable, and incoherent at the system level.

His response was Pi: a minimalist agentic coding tool built on exactly four primitives — read, write, edit, bash. No scaffolding, no magic, no framework abstractions. The constraint is deliberate: by forcing the agent to operate with the same tools a careful human developer would use, Pi produces code that a human can actually understand and audit. Zechner’s argument isn’t that agents are bad — it’s that the current generation of tools creates a false impression of productivity by hiding complexity rather than eliminating it.
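To show how small that surface is, here is a hypothetical four-tool dispatcher in Python. It is not Pi’s code (Pi is written in TypeScript and its real tool schemas differ); the names just mirror the four primitives, and `edit` refuses ambiguous or missing matches so the model has to be precise. Note that `bash` runs on the host with only a timeout, so this sketch provides no sandboxing.

```python
import subprocess
from pathlib import Path


def read(path: str) -> str:
    return Path(path).read_text()


def write(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def edit(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; refuse ambiguous or missing matches."""
    text = Path(path).read_text()
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for old text in {path}, found {count}")
    Path(path).write_text(text.replace(old, new, 1))
    return f"edited {path}"


def bash(command: str, timeout: float = 30.0) -> str:
    proc = subprocess.run(["bash", "-c", command], capture_output=True, text=True,
                          timeout=timeout)
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}


def dispatch(call: dict) -> str:
    """Run one model-emitted tool call of the form {"name": ..., "args": {...}}."""
    tool = TOOLS.get(call["name"])
    if tool is None:
        return f"unknown tool {call['name']!r}; available: {sorted(TOOLS)}"
    try:
        return tool(**call["args"])
    except Exception as exc:  # report failures back to the model instead of crashing the loop
        return f"error: {exc}"
```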


The Friction Is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro (Earendil)

📺 Watch

The companion talk to Zechner’s Pi keynote, from Ronacher (creator of Flask, Jinja2, and Click) and Poncela Cubeiro. Where Zechner attacked the tooling, Ronacher attacked the architecture. His case: codebases need to be agent-legible, designed to be read by AI without ambiguity, and most codebases today are not.

The properties of an agent-legible codebase: each function has exactly one task with a clear name, the written code matches actual behavior with no hidden side effects, and error messages describe specific problems instead of generic failures. The argument for why this matters now: AI agents traverse codebases non-linearly and don’t hold context the way a human who wrote the code does. Ambiguous or side-effect-heavy code creates hallucinated understanding that compounds with each agent hop. Clean code isn’t a style preference — it’s an agent reliability requirement.
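The three properties are easiest to see in a small example: one function reads and parses, another validates, neither touches global state, and every error says exactly what is wrong and where. Contrast that with a single `load_and_validate()` that also fills a module-level cache and raises “invalid config” on any failure. An illustrative Python sketch of those properties (the config format and function names are hypothetical), not code from the talk.

```python
import json
from pathlib import Path


def load_config(path: Path) -> dict:
    """Read and parse a JSON config file. Does nothing else: no caching, no globals."""
    try:
        return json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"{path}: invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc


def parse_port(config: dict) -> int:
    """Return the validated 'port' setting from an already-parsed config."""
    if "port" not in config:
        raise ValueError("config is missing required key 'port'")
    port = config["port"]
    # bool is a subclass of int, so True/False must be rejected explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise TypeError(f"'port' must be an integer, got {type(port).__name__}: {port!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"'port' must be between 1 and 65535, got {port}")
    return port
```

An agent that hits `'port' must be between 1 and 65535, got 70000` can fix the problem in one step; one that hits `invalid config` has to guess.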

The “Friction Is Your Judgment” framing: the points where writing clean code feels slow and annoying are exactly the points where human judgment is most needed. That friction is you noticing that something is genuinely complicated and needs thought. Automating away the friction doesn’t remove the complexity — it hides it until it breaks in production. The goal isn’t to remove human judgment from coding; it’s to make the parts that require human judgment as visible and undeniable as possible.


Day 3 — April 10

It Ain’t Broke: Why Software Fundamentals Matter More Than Ever — Matt Pocock (AI Hero)

📺 Watch

Pocock (creator of the TypeScript learning platform Total TypeScript, now building AI Hero) gave the fundamentalist counterweight to the vibe-coding hype. The core claim: the basics of software engineering — type safety, clear interfaces, separation of concerns, testability — are not quaint pre-AI artifacts. They are what allow AI tools to work well on your codebase.

The observation driving the talk: AI tools that autocomplete, generate, and refactor are dramatically better on codebases with strong type systems and clear contracts than on loose, dynamically typed, side-effect-heavy code. TypeScript with strict mode isn’t slowing down AI tooling; it’s giving it a reliable model of the code to reason over. Teams ignoring fundamentals in the rush to ship with AI assistance are accumulating the fastest technical debt in software history.

Pocock’s practical advice: invest in TypeScript strict mode, keep functions small and pure, write meaningful tests (not just for coverage), and treat type errors as free evals on AI-generated code. The developers getting the most out of AI tooling are the ones who have maintained discipline on the fundamentals — not the ones who abandoned it.


/Spec27 Launch — Safe Intelligence

(Expo session — no standalone recording; see the Safe Intelligence launch post)

Safe Intelligence launched /Spec27 on the opening day of the conference and used the three days to workshop it with the engineering, founder, and product community in attendance. Spec27 is a validation framework aimed at the most pressing challenge in agent deployment: evaluating whether an agent is actually doing what you think it’s doing, at the specification level rather than the output level.

The core problem Spec27 addresses: most agent evals test whether outputs look right to a human reviewer, which doesn’t scale and doesn’t catch the failure modes that matter in production. Spec27 formalizes the expected behavior of an agent as a machine-readable specification, then tests the agent’s behavior against the spec across a distribution of inputs — including edge cases and adversarial inputs that human review would miss. The approach is validation-first: write the spec before you build the agent, and use it throughout development as a continuous harness.
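A minimal sketch of the spec-first idea, assuming a hypothetical refund agent: the spec is a list of named properties over (input, output) pairs, and the checker runs the agent over hand-picked edge cases plus seeded random inputs and reports every violated property. Spec27’s actual spec format and API aren’t described here, so everything below (`REFUND_SPEC`, `sample_orders`, `check`) is illustrative.

```python
import random
from typing import Callable

Order = dict  # {"total": float, "age_days": int, "requested": float}
Property = tuple[str, Callable[[Order, float], bool]]

REFUND_SPEC: list[Property] = [
    ("refund is never negative", lambda o, r: r >= 0),
    ("refund never exceeds order total", lambda o, r: r <= o["total"]),
    ("no refunds after 30 days", lambda o, r: o["age_days"] <= 30 or r == 0),
]


def sample_orders(n: int, seed: int = 0) -> list[Order]:
    """Edge cases a reviewer might miss, plus n seeded random orders."""
    rng = random.Random(seed)
    edge_cases = [
        {"total": 0.0, "age_days": 0, "requested": 0.0},
        {"total": 50.0, "age_days": 30, "requested": 50.0},  # last day of the window
        {"total": 50.0, "age_days": 31, "requested": 50.0},  # just past the window
        {"total": 20.0, "age_days": 3, "requested": 500.0},  # asks for more than was paid
        {"total": 20.0, "age_days": 3, "requested": -5.0},  # adversarial negative request
    ]
    random_cases = [
        {"total": round(rng.uniform(1, 200), 2), "age_days": rng.randint(0, 60),
         "requested": round(rng.uniform(-50, 300), 2)}
        for _ in range(n)
    ]
    return edge_cases + random_cases


def check(agent: Callable[[Order], float], spec: list[Property],
          orders: list[Order]) -> list[str]:
    """Return one message per (order, property) pair the agent violates."""
    violations = []
    for order in orders:
        refund = agent(order)
        for name, holds in spec:
            if not holds(order, refund):
                violations.append(f"{name}: order={order} refund={refund}")
    return violations
```

Running `check(agent, REFUND_SPEC, sample_orders(500))` on every change turns the spec into a continuous harness; an empty result is evidence the agent respects the spec on that sample, not proof that it always will.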


Key Takeaways

1. “Agents in production” is the new mandate — and it’s harder than it sounds. The transition from demo to production requires evals, governance, and architectural discipline that most teams haven’t built yet. David Soria Parra and Safe Intelligence’s /Spec27 both point at the same gap: systematic validation of agent behavior is the missing piece.

2. The code quality argument runs in both directions. AI makes it cheaper to write code. It also makes the cost of bad code higher — because AI amplifies whatever is already in your codebase. Clean functions, clear interfaces, and strong type systems are no longer optional hygiene. They’re agent reliability infrastructure.

3. MCP is becoming a knowledge layer, not just a tools layer. 110M downloads/month validates the protocol. The next phase — distributing domain knowledge via MCP, not just tool connections — is the architectural shift worth watching.

4. The application layer is where the value accrues. Not the model, not the chip, not even the orchestration framework. The stable, model-agnostic application layer that users experience. That’s where the durable businesses and technical moats are forming.

5. The human edge is taste, judgment, and system design. Multiple speakers landed here from very different starting points. AI can’t know which tasks are worth building, when to say no, or how to keep a system coherent over time. Cultivating that judgment is the career investment that compounds.

6. The “fast models need slow developers” thesis will age well. Every cycle of faster inference will tempt teams to iterate less carefully. The teams who resist that temptation — who maintain evals, write specs, and keep their codebases legible — will widen their advantage over teams who don’t.

7. Dreaming / memory consolidation is a real pattern. Both Peter Steinberger (OpenClaw) and Anthropic (Code with Claude, one month later) independently shipped or discussed scheduled processes that review agent sessions and consolidate memory. This is becoming a standard pattern for long-running agents.

