Content provided by Demetrios. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

FrugalGPT: Better Quality and Lower Cost for LLM Applications // Lingjiao Chen // MLOps Podcast #172

1:02:58
 

Manage episode 374955569 series 3241972

MLOps Coffee Sessions #172 with Lingjiao Chen, FrugalGPT: Better Quality and Lower Cost for LLM Applications.

This episode is sponsored by QuantumBlack. We are now accepting talk proposals for our next LLM in Production virtual conference on October 3rd. Apply to speak here: https://go.mlops.community/NSAX1O

// Abstract
There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, and J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction, or improve the accuracy over GPT-4 by 4% at the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.

// Bio
Lingjiao Chen is a Ph.D. candidate in the Computer Sciences department at Stanford University. He is broadly interested in machine learning, data management, and optimization. Working with Matei Zaharia and James Zou, he is currently exploring the fast-growing marketplaces of artificial intelligence and data. His work has been published at premier conferences and journals such as ICML, NeurIPS, SIGMOD, and PVLDB, and has been partially supported by a Google fellowship.
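The LLM-cascade strategy described in the abstract can be sketched in a few lines: try cheap models first and fall back to more expensive ones only when a scorer is not confident in the cheap answer. The sketch below is a minimal illustration of the idea, not FrugalGPT's actual implementation; the model names, per-query costs, generators, and the confidence scorer are all hypothetical stand-ins (FrugalGPT learns its scorer and model ordering from data).

```python
def cascade(query, models, threshold=0.8):
    """Query models in order of increasing cost; return the first answer
    whose confidence score clears `threshold`, plus the total cost paid.
    Each model is a tuple: (name, cost_per_query, generate, score)."""
    total_cost = 0.0
    for name, cost_per_query, generate, score in models:
        answer = generate(query)
        total_cost += cost_per_query
        if score(query, answer) >= threshold:
            return answer, total_cost
    # No model was confident: keep the last (strongest) model's answer.
    return answer, total_cost

# Toy stand-ins: a cheap model that is only confident on short queries,
# and a strong model that is always confident. Both just uppercase the
# query so the example stays self-contained and runnable.
cheap = ("cheap-llm", 0.01,
         lambda q: q.upper(),
         lambda q, a: 0.9 if len(q) < 10 else 0.3)
strong = ("strong-llm", 1.00,
          lambda q: q.upper(),
          lambda q, a: 0.95)

answer, cost = cascade("hi", [cheap, strong])                # cheap model suffices
answer2, cost2 = cascade("a much longer query", [cheap, strong])  # falls through
```

Here the short query is answered for 0.01 units of cost, while the long query pays for both models (1.01). The cost savings come from routing the bulk of easy queries to the cheap tier, which is the effect the paper measures at scale.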
// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: https://lchen001.github.io/
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance paper: https://arxiv.org/abs/2305.05176

--------------- ✌️Connect With Us ✌️ ---------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Lingjiao on LinkedIn:

Timestamps:
[00:00] Lingjiao's preferred coffee
[00:35] Takeaways
[02:41] Sponsor ad: Nayur Khan of QuantumBlack
[05:27] Lingjiao's research at Stanford
[07:51] Day-to-day research overview
[10:11] Inventing data-management-inspired abstractions research
[13:58] Agnostic approach to data management
[15:56] FrugalGPT
[18:59] Just another data provider
[19:51] FrugalGPT breakdown
[26:33] First step of optimizing the prompts
[28:04] Prompt overlap
[29:06] Query concatenation
[32:30] Money saving
[35:04] Economizing the prompts
[38:52] Questions to accommodate
[41:33] LLM cascade
[47:25] FrugalGPT saves cost and improves performance
[51:37] End-user implementation
[52:31] Completion cache
[56:33] Using a vector store
[1:00:51] Wrap up


431 episodes

