Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)

Vanishing Gradients

Content provided by Hugo Bowne-Anderson. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Hugo Bowne-Anderson or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

3M ago 41:27


While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.

Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.

We talk through:

Treating LLM workflows as ETL pipelines for unstructured text
Error analysis: why you need humans reviewing the first 50–100 traces
Guardrails like retries, validators, and “gleaning”
How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs
Cheap vs. expensive models: when to swap for savings
Where agents fit in (and where they don’t)
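
As a rough illustration of the "retries and validators" and "cheap vs. expensive models" points above, here is a minimal Python sketch. It is not taken from the episode: the function names, the JSON output format, and the stub models are all hypothetical, and a real pipeline would call an actual LLM where the stubs are.

```python
import json
from typing import Callable, Optional

# A "model" here is any function from document text to raw output text.
# This type alias and the stubs below are assumptions made for illustration.
Model = Callable[[str], str]


def validate(raw: str, required_keys: set) -> Optional[dict]:
    """Return the parsed record if raw is a JSON object with all required keys, else None."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or not required_keys.issubset(record):
        return None
    return record


def extract_with_guardrails(
    document: str,
    models: list,
    required_keys: set,
    max_retries: int = 2,
) -> Optional[dict]:
    """Try each model in order (cheapest first), retrying when validation fails."""
    for model in models:
        for _ in range(max_retries):
            record = validate(model(document), required_keys)
            if record is not None:
                return record
    # Every model failed validation; the caller can route this to human review.
    return None


# Hypothetical stand-ins for real LLM calls.
def cheap_model(document: str) -> str:
    return "Sure! Here is a summary of the document."  # not valid JSON


def strong_model(document: str) -> str:
    return json.dumps({"title": document[:20], "summary": document})


result = extract_with_guardrails(
    "A short report.", [cheap_model, strong_model], {"title", "summary"}
)
print(result)
```

In this sketch the more expensive model is only called for documents where the cheaper one fails validation, which is one simple way to trade cost against reliability.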

If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.

LINKS

🎓 Learn more:

Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — https://maven.com/s/course/d56067f338

63 episodes

#Tech #HugoBowneAnderson #DataScience #MachineLearning