The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes
We discuss the Information Retrieval publication "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes" by Nils Reimers and Iryna Gurevych, which explores how Dense Passage Retrieval performance degrades as the index size grows and how it compares to traditional sparse, keyword-based methods.
Timestamps:
00:00 Co-host introduction
00:26 Paper introduction
02:18 Dense vs. Sparse retrieval
05:46 Theoretical analysis of false positives (1)
08:17 What are low- vs. high-dimensional representations?
11:49 Theoretical analysis of false positives (2)
20:10 First results: growing the MS MARCO index
28:35 Adding random strings to the index
39:17 Discussion, takeaways
44:26 Will dense retrieval replace or coexist with sparse methods?
50:50 Sparse, Dense and Attentional Representations for Text Retrieval
Referenced work:
"Sparse, Dense and Attentional Representations for Text Retrieval" by Yi Luan et al., 2020.
All episodes
AGI vs ASI: The future of AI-supported decision making with Louis Rosenberg 54:42
EXAONE 3.0: An Expert AI for Everyone (with Hyeongu Yun) 24:57
Zeta-Alpha-E5-Mistral: Finetuning LLMs for Retrieval (with Arthur Câmara) 19:35
ColPali: Document Retrieval with Vision-Language Models only (with Manuel Faysse) 34:48
Using LLMs in Information Retrieval (w/ Ronak Pradeep) 22:15
Designing Reliable AI Systems with DSPy (w/ Omar Khattab) 59:57
The Power of Noise (w/ Florin Cuconasu) 11:45
Benchmarking IR Models (w/ Nandan Thakur) 21:55
Baking the Future of Information Retrieval Models 27:05
Hacking JIT Assembly to Build Exascale AI Infrastructure 38:04
The Promise of Language Models for Search: Generative Information Retrieval 1:07:31
Task-aware Retrieval with Instructions 1:11:13
Generating Training Data with Large Language Models w/ Special Guest Marzieh Fadaee 1:16:14
ColBERT + ColBERTv2: late interaction at a reasonable inference cost 57:30
Evaluating Extrapolation Performance of Dense Retrieval: How does DR compare to cross encoders when it comes to generalization? 58:30
Open Pre-Trained Transformer Language Models (OPT): What does it take to train GPT-3? 47:12
Few-Shot Conversational Dense Retrieval (ConvDR) w/ special guest Antonios Krasakis 1:23:11
Transformer Memory as a Differentiable Search Index: memorizing thousands of random doc ids works!? 1:01:40
Learning to Retrieve Passages without Supervision: finally unsupervised Neural IR? 59:10
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes 54:13
Shallow Pooling for Sparse Labels: the shortcomings of MS MARCO 1:07:17