#046 Building a Search Database From First Principles
Modern search is broken. Too many pieces are glued together:
- Vector databases for semantic search
- Text engines for keywords
- Rerankers to fix the results
- LLMs to understand queries
- Metadata filters for precision
Each piece works well alone.
Together, they often become a mess.
When you glue these systems together, you create:
- Data Consistency Gaps: Your vector store knows about documents your text engine doesn't. Which one is right?
- Timing Mismatches: New content appears in one system before another. Users see different results depending on which path their query takes.
- Complexity Explosion: Every new component has to connect to each existing one, so integration points grow much faster than the component count. Three components means three connections. Five means ten.
- Performance Bottlenecks: Each hop between systems adds latency. A 200ms search becomes 800ms after passing through four components.
- Brittle Chains: When one system fails, your entire search breaks. More pieces mean more breaking points.
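The connection counts under Complexity Explosion match the number of pairs among n components, n(n-1)/2, assuming every component has to be integrated with every other one. A short Python check of that arithmetic:

```python
from math import comb


def integration_points(components: int) -> int:
    """Pairwise connections when every component must be wired to every other one."""
    return comb(components, 2)  # n * (n - 1) / 2


for n in range(2, 7):
    # Adding the n-th component adds n - 1 new connections, not a doubling.
    print(n, integration_points(n))
```

Three components give 3 connections and five give 10; each extra component adds one integration point per component already in the stack.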
I recently built a system with query-specific post-filters that also had to return a fixed number of results to the user.
Often, the query had to be run multiple times to reach that number.
The result: unpredictable latency; high load on the backend, with some queries hammering the database 10+ times; and a relevance cliff, where results 1-6 looked great but the later ones were poor matches.
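The re-query pattern described above can be reconstructed as a toy loop. This is a sketch under assumptions, not the system from the episode: `top_k` is a hypothetical stand-in for an unfiltered vector-database call, the index is an in-memory list, and doubling the fetch size on each retry is just one common policy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    doc_id: int
    score: float  # similarity to the query, precomputed for this toy example
    category: str


def top_k(index: list[Doc], k: int) -> list[Doc]:
    """Hypothetical stand-in for an ANN query: the k best docs, with no filter applied."""
    return sorted(index, key=lambda d: d.score, reverse=True)[:k]


def search_with_post_filter(
    index: list[Doc],
    keep: Callable[[Doc], bool],
    wanted: int,
    k: int = 10,
    max_rounds: int = 20,
) -> tuple[list[Doc], int]:
    """Fetch, post-filter, and re-query with a larger k until `wanted` hits survive."""
    rounds = 0
    while True:
        rounds += 1  # each round is one more round trip to the database
        hits = [d for d in top_k(index, k) if keep(d)]
        if len(hits) >= wanted or k >= len(index) or rounds >= max_rounds:
            return hits[:wanted], rounds
        k *= 2  # not enough survivors: double the fetch size and try again


# 1,000 docs with strictly decreasing scores; only every 50th one passes the filter.
index = [Doc(i, 1 - i / 1000, "rare" if i % 50 == 0 else "common") for i in range(1000)]
hits, rounds = search_with_post_filter(index, lambda d: d.category == "rare", wanted=10)
print(len(hits), rounds)  # 10 7
```

The number of round trips depends on how rare matching documents are, which is where the unpredictable latency comes from. The survivors are also pulled from progressively deeper in the ranking, which is one way a relevance cliff shows up.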
Today on How AI Is Built, we are talking to Marek Galovic from TopK.
We talk about how they built a new search database with modern components, starting from one question: "How would search work if we built it today?"
Cloud storage is cheap. Compute is fast. Memory is plentiful.
One system that handles vectors, text, and filters together - not three systems duct-taped into one.
One pass handles everything:
Vector search + Text search + Filters → Single sorted result
Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency.
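The one-pass idea can be illustrated with a brute-force toy in Python. This is not TopK's engine or API: the `Doc` fields, the blending weight `alpha`, and the term-overlap score (used here in place of BM25) are all assumptions for illustration. Filtering, vector scoring, text scoring, and top-k ranking all happen in a single scan.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    doc_id: str
    text: str
    vector: list[float]
    year: int


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(terms: list[str], text: str) -> float:
    """Fraction of query terms found in the text; a crude stand-in for BM25."""
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(
    docs: list[Doc],
    query_vec: list[float],
    query_text: str,
    keep: Callable[[Doc], bool],
    k: int,
    alpha: float = 0.5,
) -> list[tuple[float, str]]:
    """Filter, score (vector + text), and rank in a single pass over the documents."""
    terms = query_text.lower().split()
    scored = (
        (alpha * cosine(d.vector, query_vec) + (1 - alpha) * keyword_score(terms, d.text), d.doc_id)
        for d in docs
        if keep(d)  # the filter runs inside the same pass, not after ranking
    )
    return heapq.nlargest(k, scored)  # single sorted top-k result


docs = [
    Doc("a", "rust search engine", [1.0, 0.0], 2024),
    Doc("b", "python search tutorial", [0.9, 0.1], 2019),
    Doc("c", "rust kernels for arm", [0.0, 1.0], 2025),
    Doc("d", "gardening tips", [1.0, 0.0], 2024),
]
print(hybrid_search(docs, [1.0, 0.0], "rust search", lambda d: d.year >= 2020, k=2))
# [(1.0, 'a'), (0.5, 'd')]
```

Because the filter is applied before ranking rather than after, every returned result already satisfies it and there is no need to re-query to top up a short list. A real engine would replace the linear scan with indexes and optimized kernels; this sketch only shows the shape of the computation.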
The goal is to do search in 5 lines of code.
Marek Galovic:
Nicolay Gerold:
00:00 Introduction to TopK and Snowflake Comparison
00:35 Architectural Patterns and Custom Formats
01:30 Query Execution Engine Explained
02:56 Distributed Systems and Rust
04:12 Query Execution Process
06:56 Custom File Formats for Search
11:45 Handling Distributed Queries
16:28 Consistency Models and Use Cases
26:47 Exploring Database Versioning and Snapshots
27:27 Performance Benchmarks: Rust vs. C/C++
29:02 Scaling and Latency in Large Datasets
29:39 GPU Acceleration and Use Cases
31:04 Optimizing Search Relevance and Hybrid Search
34:39 Advanced Search Features and Custom Scoring
38:43 Future Directions and Research in AI
47:11 Takeaways for Building AI Applications
All episodes
#056 Building Solo: How One Engineer Uses AI Agents to Ship Production Code 1:12:24
#055 Embedding Intelligence: AI's Move to the Edge 1:05:35
#054 Building Frankenstein Models with Model Merging and the Future of AI 1:06:55
#053 AI in the Terminal: Enhancing Coding with Warp 1:04:30
#052 Don't Build Models, Build Systems That Build Models 59:22
#051 Build systems that can be debugged at 4am by tired humans with no context 1:05:51
#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 1:06:57
#050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 11:00
#049 BAML: The Programming Language That Turns LLMs into Predictable Functions 1:02:38
#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions 1:12:34
#048 Why Your AI Agents Need Permission to Act, Not Just Read 57:02
#047 Architecting Information for Search, Humans, and Artificial Intelligence 57:21
#046 Building a Search Database From First Principles 53:28
#045 RAG As Two Things - Prompt Engineering and Search 1:02:43
#044 Graphs Aren't Just For Specialists Anymore 1:03:34
#043 Knowledge Graphs Won't Fix Bad Data 1:10:58
#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs 1:33:43
#041 Context Engineering, How Knowledge Graphs Help LLMs Reason 1:33:34
#040 Vector Database Quantization, Product, Binary, and Scalar 52:11
#039 Local-First Search, How to Push Search To End-Devices 53:08
#038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It 1:14:23
#037 Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces 49:12
#036 How AI Can Start Teaching Itself - Synthetic Data Deep Dive 48:10
#035 A Search System That Learns As You Use It (Agentic RAG) 45:29
#034 Rethinking Search Inside Postgres, From Lexemes to BM25 47:15
#033 RAG's Biggest Problems & How to Fix It (ft. Synthetic Data) 51:25
#032 Improving Documentation Quality for RAG Systems 46:36
#031 BM25 As The Workhorse Of Search; Vectors Are Its Visionary Cousin 54:04
#030 Vector Search at Scale, Why One Size Doesn't Fit All 36:25
#029 Search Systems at Scale, Avoiding Local Maxima and Other Engineering Lessons 54:46
#028 Training Multi-Modal AI, Inside the Jina CLIP Embedding Model 49:21
#027 Building the database for AI, Multi-modal AI, Multi-modal Storage 44:53
#026 Embedding Numbers, Categories, Locations, Images, Text, and The World 46:43
#025 Data Models to Remove Ambiguity from AI and Search 58:39
#024 How ColPali is Changing Information Retrieval 54:56
#023 The Power of Rerankers in Modern Search 42:28
#022 The Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It) 46:05
#021 The Problems You Will Encounter With RAG At Scale And How To Prevent (or fix) Them 50:08
#020 The Evolution of Search, Finding Search Signals, GenAI Augmented Retrieval 52:15
#019 Data-driven Search Optimization, Analysing Relevance 51:13
#018 Query Understanding: Doing The Work Before The Query Hits The Database 53:01
#017 Unlocking Value from Unstructured Data, Real-World Applications of Generative AI 36:27
#016 Data Processing for AI, Integrating AI into Data Pipelines, Spark 46:25