#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions
Manage episode 483844761 series 3585930
Nicolay here,
I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous by a few basis points.
If you don’t know how the LLM performs on your task, you are just duct taping LLMs into your systems.
If your LLM-powered app can’t survive a malformed emoji, you’re shipping liability, not software.
Today, I sat down with Vaibhav (co-founder of Boundary) to dissect BAML—a DSL that treats every LLM call as a typed function.
It’s like swapping duct-taped Python scripts for a purpose-built compiler.
Vaibhav advocates for building primitives from first principles.
One principle stood out: LLMs are just functions; build like that from day 1. Wrap them, test them, and loop in a human only where it counts.
Once you adopt that frame, reliability patterns fall into place: fallback heuristics, model swaps, classifiers—same playbook we already use for flaky APIs.
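That playbook can be sketched in a few lines. This is not BAML code; it's a minimal Python sketch with an invented, flaky stand-in for a typed LLM call (`classify_sentiment`) and a deterministic heuristic fallback, just to show the shape of retries-then-fallback:

```python
import random

def classify_sentiment(text: str) -> str:
    """Stand-in for a typed LLM call; flaky like a real one."""
    if random.random() < 0.05:          # ~5% of calls blow up
        raise RuntimeError("model returned garbage")
    return "positive" if "good" in text else "negative"

def keyword_fallback(text: str) -> str:
    """Deterministic heuristic used when the model call keeps failing."""
    return "positive" if "good" in text else "negative"

def with_retries(fn, fallback, text: str, retries: int = 2) -> str:
    """Retry the primary function, then fall back - same pattern as any flaky API."""
    for _ in range(retries):
        try:
            return fn(text)
        except RuntimeError:
            continue
    return fallback(text)               # last resort: cheap heuristic
```

Swapping `fn` for a different model client, or `fallback` for a classifier, is the same one-line change.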
We also cover:
- Why JSON constraints are the wrong hammer—and how Schema-Aligned Parsing fixes it
- Whether “durable” should be a first-class keyword (think async/await for crash-safety)
- Shipping multi-language AI pipelines without forcing a Python microservice
- Token-bloat surgery, symbol tuning, and the myth of magic prompts
- How to keep humans sharp when 98% of agent outputs are already correct
💡 Core Concepts
- Schema-Aligned Parsing (SAP)
- Parse first, panic later. The model may emit Markdown, half-baked YAML, or rogue quotes; SAP coerces the output into your declared type or raises. No silent corruption.
- Symbol Tuning
- Labels eat tokens and often don't help accuracy (in some cases they even hurt). Rename PasswordReset to C7 and keep the description human-readable.
- Durable Execution
- Durable execution refers to a computing paradigm where program execution state persists despite failures, interruptions, or crashes. It ensures that operations resume exactly where they left off, maintaining progress even when systems go down.
- Prompt Compression
- Every extra token is latency, cost, and entropy. Axe filler words until the prompt reads like assembly. If output degrades, you cut too deep—back off one line.
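BAML's real SAP implementation lives in its Rust compiler and is far more sophisticated; the toy Python sketch below only illustrates the core idea, with the fence-stripping regex and coercion rules invented for illustration:

```python
import json
import re

def sap_parse(raw: str, target: type):
    """Toy schema-aligned parse: strip Markdown fences, coerce shape,
    raise on a genuinely bad cast. Not BAML's actual algorithm."""
    # Models often wrap JSON in ```json ... ``` fences; strip them first.
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    value = json.loads(cleaned)
    if target is list and not isinstance(value, list):
        return [value]                          # scalar -> one-element list
    if target is int:
        if isinstance(value, list) and len(value) == 1:
            value = value[0]                    # [42] -> 42
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"cannot coerce {value!r} to int")
        return int(value)
    if isinstance(value, target):
        return value
    raise TypeError(f"cannot coerce {value!r} to {target.__name__}")
```

The key property: every coercion is deterministic, and a bad cast raises instead of silently corrupting downstream state.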
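Symbol tuning is simple enough to sketch directly. The categories and descriptions below are invented for illustration; the point is that the prompt only ever sees opaque symbols plus human-readable descriptions, so the label spelling can neither bloat tokens nor bias the model:

```python
# Hypothetical intent categories; names and descriptions are made up.
CATEGORIES = {
    "C0": "User forgot their password and wants to reset it",
    "C1": "User is asking about billing or invoices",
    "C2": "User reports a bug or crash",
}
SYMBOL_TO_NAME = {"C0": "PasswordReset", "C1": "Billing", "C2": "BugReport"}

def build_prompt(message: str) -> str:
    """Expose only opaque symbols plus descriptions in the prompt."""
    lines = [f"{sym}: {desc}" for sym, desc in CATEGORIES.items()]
    return ("Classify the message into exactly one symbol.\n"
            + "\n".join(lines)
            + f"\nMessage: {message}\nAnswer with the symbol only.")

def decode(symbol: str) -> str:
    """Map the model's symbol back to the real label after the call."""
    return SYMBOL_TO_NAME[symbol.strip()]
```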
📶 Connect with Vaibhav:
📶 Connect with Nicolay:
- Newsletter
- X / Twitter
- Bluesky
- Website
- My Agency Aisbach (for AI implementation & strategy)
⏱️ Important Moments
- New DSL vs. Python Glue [00:54]
- Why bolting yet another microservice onto your stack is cowardice; BAML compiles instead of copies.
- Three-Nines on Flaky Models [04:27]
- Designing retries, fallbacks, and human overrides when GPT eats dirt 5% of the time.
- Native Go SDK & OpenAPI Fatigue [06:32]
- Killing thousand-line generated clients; typing go get instead.
- “LLM = Pure Function” Mental Model [15:58]
- Replace mysticism with f(input) → output; unit-test like any other function.
- Tool-Calling as a Switch Statement [18:19]
- Multi-tool orchestration boils down to switch(action) {…}—no cosmic “agent” needed.
- Sneak Peek—durable Keyword [24:49]
- Crash-safe workflows without shoving state into S3 and praying.
- Symbol Tuning Demo [31:35]
- Swapping verbose labels for C0, C1 slashes token cost and bias in one shot.
- Inside SAP Coercion Logic [47:31]
- Int arrays to ints, scalars to lists, bad casts raise—deterministic, no LLM in the loop.
- Frameworks vs. Primitives Rant [52:32]
- Why BAML ships primitives and leaves the “batteries” to you—less magic, more control.
🛠️ Tools & Tech Mentioned
📚 Recommended Resources
🔮 What's Next
Next week, we continue digging into getting generative AI into production, talking to Paul Iusztin.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify.
If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com.
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
♻️ Here's the deal: I'm committed to bringing you detailed, practical insights about AI development and implementation. In return, I have two simple requests:
- Hit subscribe right now to help me understand what content resonates with you
- If you found value in this post, share it with one other developer or tech professional who's working with AI
That's our agreement - I deliver actionable AI insights, you help grow this. ♻️