#024 How ColPali is Changing Information Retrieval
ColPali makes us rethink how we approach document processing.
ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This approach eliminates the need for extensive text extraction and preprocessing, handling messy real-world data more effectively than traditional methods.
In this episode, Jo Bergum, chief scientist at Vespa, shares his insights on how ColPali is changing the way we approach complex document formats like PDFs and HTML pages.
Introduction to ColPali:
- Combines late interaction scoring from ColBERT with a vision language model (PaliGemma)
- Encodes screenshots of document pages as multi-vector representations
- Enables searching across complex document formats (PDFs, HTML)
- Eliminates need for extensive text extraction and preprocessing
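To make the late interaction idea concrete, here is a minimal sketch of MaxSim-style scoring in plain Python. It assumes each query token and each page patch has already been embedded as a small vector; the function names (`maxsim_score`, `rank_pages`) and the toy two-dimensional vectors are illustrative assumptions, not part of ColPali or Vespa.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]) -> float:
    """Late interaction score: for each query vector, take its best-matching
    page vector (maximum dot product), then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(
    query_vectors: Sequence[Vector], pages: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every page against the query and sort from best to worst."""
    scored = [(name, maxsim_score(query_vectors, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): two query token vectors, two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "invoice": [[1.0, 0.0], [0.0, 1.0]],
    "chart": [[0.5, 0.5], [0.5, 0.5]],
}

ranking = rank_pages(query, pages)
print(ranking)
```

In this toy setup the "invoice" page ranks first because each query vector finds an exact match among its patch vectors. A real system would get the vectors from the model and would not score every page by brute force.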
Advantages of ColPali:
- Handles messy, real-world data better than traditional methods
- Considers both textual and visual elements in documents
- Potential applications in various domains (finance, medical, legal)
- Scalable to large document collections with proper optimization
Jo Bergum:
- Vespa
- X (Twitter)
- PDF Retrieval with Vision Language Models
- Scaling ColPali to billions of PDFs with Vespa
Nicolay Gerold:
00:00 Messy Data in AI
01:19 Challenges in Search Systems
03:41 Understanding Representational Approaches
08:18 Dense vs Sparse Representations
19:49 Advanced Retrieval Models and ColPali
30:59 Exploring Image-Based AI Progress
32:25 Challenges and Innovations in OCR
33:45 Understanding ColPali and MaxSim
38:13 Scaling and Practical Applications of ColPali
44:01 Future Directions and Use Cases