Serving ML Models at a High Scale with Low Latency // Manoj Agarwal // MLOps Meetup #48
MLOps community meetup #48! Last Wednesday, we talked to Manoj Agarwal, Software Architect at Salesforce.
// Abstract:
Serving machine learning models is a scalability challenge at many companies. Most applications need only a small number of machine learning models (often fewer than 100) to serve predictions. Cloud platforms that offer model serving, on the other hand, do support hundreds of thousands of models, but they provision separate hardware for each customer. Salesforce faces a challenge that very few companies deal with: for cost-effectiveness, it must run hundreds of thousands of models on underlying infrastructure shared by multiple tenants.
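These notes do not describe how Salesforce actually places models on shared hardware. As a hedged illustration of the general idea behind the "Model Sharding with Replication" topic in the timestamps, the sketch below swaps in rendezvous (highest-random-weight) hashing, a standard technique, to map each model to a fixed number of hosts in one shared pool. The host names, model IDs, and the `place_model` function are hypothetical, chosen only for this example.

```python
import hashlib


def _score(model_id: str, host: str) -> int:
    """Deterministic pseudo-random weight for a (model, host) pair."""
    digest = hashlib.sha256(f"{model_id}|{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_model(model_id: str, hosts: list[str], replicas: int) -> list[str]:
    """Pick `replicas` hosts for a model using rendezvous (HRW) hashing.

    Every router computes the same answer without a central lookup table,
    and removing a host only moves models that had a replica on that host.
    """
    if not 0 < replicas <= len(hosts):
        raise ValueError("replicas must be between 1 and len(hosts)")
    # Rank all hosts by their weight for this model; the top N hold replicas.
    ranked = sorted(hosts, key=lambda h: _score(model_id, h), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    pool = [f"host-{i}" for i in range(8)]  # hypothetical shared serving pool
    for model in ("tenantA/lead-scoring", "tenantB/case-routing"):
        print(model, "->", place_model(model, pool, replicas=2))
```

The appeal of this choice for a very large model count is that no per-model placement table has to be stored or synchronized: placement is a pure function of the model ID and the current host list, and replication gives each model more than one host to fail over to.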
// Takeaways:
This talk explains how Salesforce hosts hundreds of thousands of models on a multi-tenant infrastructure to support low-latency predictions.
// Bio:
Manoj Agarwal is a Software Architect on the Einstein Platform team at Salesforce. Salesforce Einstein was released in 2016 and is integrated with all the major Salesforce clouds. Today, Einstein delivers more than 80 billion predictions per day across the Sales, Service, Marketing, and Commerce Clouds.
// Relevant Links:
https://engineering.salesforce.com/flow-scheduling-for-the-einstein-ml-platform-b11ec4f74f97
https://engineering.salesforce.com/ml-lake-building-salesforces-data-platform-for-machine-learning-228c30e21f16
----------- Connect With Us ✌️-------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Manoj on LinkedIn: https://www.linkedin.com/in/agarwalmk/
Timestamps:
[00:00] Happy birthday Manoj!
[00:41] Salesforce blog post about Einstein and ML Infrastructure
[02:55] Intro to Serving Large Number of Models with Low Latency
[03:34] Manoj's background
[04:22] Machine Learning Engineering: 99% engineering + 1% machine learning - Alexey Grigorev on Twitter
[04:37] Salesforce Einstein
[06:42] Machine Learning: Big Picture
[07:05] Feature Engineering
[07:30] Model Training
[08:53] Model Serving Requirements
[13:01] Do you standardize how models are packaged for serving, and if so, what standards does Salesforce require and enforce for model packaging?
[14:29] Support Multiple Frameworks
[16:16] Is it easy to just throw a software library in there?
[27:06] Along with that metadata, can you break down how that works?
[28:27] Low Latency
[32:30] Model Sharding with Replication
[33:58] What would you do to speed up the transformation code that runs before scoring?
[35:55] Model Serving Scaling
[37:06] Noisy Neighbor: Shuffle Sharding
[39:29] If all Salesforce models were grouped into model types based on what they provide, what would the big categories be, and which is the biggest?
[46:27] Retraining the model: Is that handled by your team, or is it distributed, with your team focused mainly on this kind of engineering and another team handling the machine learning side?
[50:13] How do you ensure that different models created by different teams of data scientists expose the same data for analysis?
[52:08] Are you using Kubernetes or another orchestration engine?
[53:03] How do you ensure that different models expose the same information?
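The notes list "Noisy Neighbor: Shuffle Sharding" as a topic but give no details of Salesforce's implementation. As a hedged sketch of the generic technique only: each tenant is deterministically assigned a small random subset (a "shuffle shard") of workers from a shared pool, so a misbehaving tenant can saturate only its own workers, and another tenant loses all of its capacity only if its shard is exactly the same set. The worker names, tenant IDs, and both functions are hypothetical.

```python
import hashlib
import math
import random


def shuffle_shard(tenant_id: str, workers: list[str], shard_size: int) -> list[str]:
    """Deterministically pick `shard_size` workers for a tenant.

    The tenant ID seeds a private PRNG, so the same tenant always maps to the
    same subset, while different tenants get (mostly) different subsets.
    """
    if not 0 < shard_size <= len(workers):
        raise ValueError("shard_size must be between 1 and len(workers)")
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(workers, shard_size))


def full_overlap_probability(num_workers: int, shard_size: int) -> float:
    """Chance that another tenant's shard is exactly the same set of workers,
    assuming shards are drawn uniformly at random."""
    return 1 / math.comb(num_workers, shard_size)


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(8)]  # hypothetical worker pool
    for tenant in ("acme", "globex", "initech"):
        print(tenant, shuffle_shard(tenant, workers, shard_size=2))
    print(full_overlap_probability(8, 2))
```

With 8 workers and shards of 2 there are `math.comb(8, 2) == 28` possible shards, so a random other tenant shares both workers with a noisy tenant with probability 1/28. Growing the pool to 100 workers with shards of 5 raises the number of combinations to 75,287,520, which is why shuffle sharding isolates tenants far better than splitting the pool into a few fixed groups.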