Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal
https://go.dok.community/slack
From DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE)
Many ML pipelines depend on shared filesystems for input, output, and intermediate data storage. Standards such as CSI have made it possible for applications in Kubernetes to access a variety of data storage systems. Yet data scientists still have to deal with low-level details of data access to run their pipelines in Kubernetes. Datashim is a framework that manages the lifecycle of a Dataset object, a custom resource (defined through a CustomResourceDefinition) that represents a source of data. Datashim takes care of the details of data access, while Kubernetes pods access the data declaratively by referencing a Dataset in their specifications. This talk describes Datashim and the Dataset object, discusses its use in ML pipelines, and demonstrates how its pluggable architecture is designed for the development of caching, scheduling, and governance plugins. Datashim is an incubating project of the LF AI & Data Foundation.
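To make the declarative idea concrete, here is a minimal sketch in Python that builds two manifests as plain dictionaries: a Dataset pointing at an S3-compatible bucket, and a pod that refers to that Dataset by name only. It is an illustration, not the project's reference example. The API group/version, the `spec.local` field names, and the `dataset.0.id` / `dataset.0.useas` label keys are assumptions based on recollection of the Datashim README and should be checked against the current project documentation. The helper names, bucket, endpoint, and secret name are hypothetical.

```python
import json

# ASSUMPTION: API group/version and field names are recalled from the
# Datashim README and may differ between releases. Verify before use.
DATASET_API_VERSION = "datashim.io/v1alpha1"


def make_dataset(name, bucket, endpoint, secret_name):
    """Build a Dataset manifest for an S3-compatible bucket (illustrative)."""
    return {
        "apiVersion": DATASET_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "local": {
                "type": "COS",               # assumed type name for object storage
                "secret-name": secret_name,  # assumed key; credentials live in a Secret
                "endpoint": endpoint,
                "bucket": bucket,
            }
        },
    }


def make_pod(name, image, dataset_name):
    """Build a Pod manifest that references the Dataset through labels only.

    The pod spec contains no volumes or mount paths; under the assumptions
    above, the framework is expected to inject them at admission time.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {
                "dataset.0.id": dataset_name,  # assumed label key
                "dataset.0.useas": "mount",    # assumed label key and value
            },
        },
        "spec": {"containers": [{"name": "main", "image": image}]},
    }


if __name__ == "__main__":
    # Hypothetical names and endpoint, for illustration only.
    dataset = make_dataset(
        "example-dataset", "training-data", "http://s3.example.invalid", "s3-credentials"
    )
    pod = make_pod("trainer", "python:3.11", "example-dataset")
    print(json.dumps(dataset, indent=2))
    print(json.dumps(pod, indent=2))
```

The point of the sketch is the division of labor: storage details sit in the Dataset, and the pod carries only a reference to it, so the same pod definition can be reused against a different data source by changing the Dataset alone.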
Srikumar Venugopal is a Research Scientist at IBM Research Europe in Dublin, Ireland. His research interests lie in cloud computing and large-scale distributed systems, specifically middleware, resource management, and scalability. He is the co-founder and current lead of the Datashim project.
243 episodes
All episodes
Implementing Data & Databases on K8s within the Dutch Government | DoKC Town Hall 44:54
Unsticking Ourselves from Glue: Migrating PayIt’s Data Pipelines to Argo Workflows and Hera | DoKC Town Hall 23:17
Repel Boarders! How to find a Kubernetes operator that really protects your data | DoKC Town Hall 19:22
DoK + Apache Spark | DoKC Town Hall 19:52
DoK @ Comcast - Deliver Business Outcomes & Improved DevX with Data Services on K8s | DoKC Town Hall 16:43
DoK Talks - What is Kafka? The rise of one of the world's most used streaming data technologies // Abbey Russell 15:28
DoK Talks - (almost)Everything you need to know about stateful cloud native network applications // W Watson 43:39
The Outer Nerd #001 - Dungeons & Dragons - Why should you care? // Abhi Vaidyanatha, Fabian Met & Chase Christensen 58:25
DoK Talks #155 - Databases at the edge with K3s and ARM devices // Sergio Méndez 49:40
DoK Talks #154 - StatefulSets in K8 // Srinivas Karnati 31:55
Data-driven Diversity, Equity, and Inclusion // Lisa-Marie Namphy, Melissa Logan, Tiffany Jachja, Audra Montenegro & Cortney Nickerson (DoK Day North America 2022) 19:50
Formula 1 telemetry processing using Apache Kafka on Kubernetes // Paolo Patierno (DoK Day North America 2022) 15:36
Choosing Kubernetes for Stateful Applications // Akshay Ram & Peter Schuurman (DoK Day North America 2022) 18:31
Kubernetes 360º - Data driven observability - from Secrets to logs // Ben Hirschberg (DoK Day North America 2022) 17:11
Shifting Left Stateful Applications In Kubernetes // Viktor Farcic (DoK Day North America 2022) 15:52
Medical - Healthcare Data on Kubernetes // Olyvia Rakshit & Prasad Dorbala (DoK Day North America 2022) 13:41
Highly Available Postgres Clusters In Kubernetes // John Long & Jonathan Gonzalez (DoK Day North America 2022) 15:04
Inter-Cluster PostgreSQL on Kubernetes // Julian Fischer (DoK Day North America 2022) 17:07
Open Source Databases on Kubernetes- Best Practices // Peter Zaitsev (DoK Day North America 2022) 16:04
The Kubernetes Native Database // Jeffrey Carpenter (DoK Day North America 2022) 16:26
Databases on Kubernetes: Why are they important? // With Bhavin Shah, Xing Yang, Gabriele Bartolini & Patrick McFadin (DoK Day North America 2022) 34:51
Data streaming on Kubernetes // Yaniv Ben Hemo (DoK Day North America 2022) 13:51
Architecting Your First Event Driven Serverless Streaming Applications on K8 // Timothy Spann (DoK Day North America 2022) 13:29
Fybrik - A Kubernetes based platform for governed data use // Flora Gilboa-Solomon, Alexey Roytman, Maryna Strelchuk & Barry Hijkoop (DoK Day North America 2022) 20:59
The Challenges of Data Processing On Kubernetes - A look at Spark, Flink, Dask, and Ray // Holden Karau (DoK Day North America 2022) 20:09
Scaling our SaaS offering to thousands of clusters // Dax McDonald (DoK Day North America 2022) 21:04
Why we decided to migrate our Jaeger storage to ClickHouse on Kubernetes // Arul Jegadish Francis (DoK Day North America 2022) 13:48
Building a Digital Factory for the Sheet Metal Industry // Elie Assi (From the DoK Day North America 2022) 20:48
How we built our Big Data Stack (almost) entirely on top of Kubernetes // Neylson Crepalde (From DoK Day NA 2022) 16:00

Dok Talks #153 - CRD Panel // Eyar Zilberman & Álvaro Hernández 58:05
Dok #152-Running PostgreSQL in Kubernetes:from day 0 to day 2 with CloudNativePG // Gabriele Bartolini 1:03:50
Dok Talks #148 - Cost and Kubernetes // Chris Love 45:25
Dok Talks #151 - Analytics with Apache Superset and ClickHouse // Vijay Anand Ramakrishnan 33:00
Dok Talks #150 - Building a Simple Postgres Async Streaming Cluster // Julian Fischer 1:04:45
DoK Talks #149 - Overcoming challenges with protecting and migrating data in multi-cloud K8s environments // Sebastian Glab & Martin Phan 47:40
DoK Talks #147 - Evaluating Cloud Native Storage Vendors // Dinesh Majrekar 1:00:03
Dok Talks #146 - OpenFeature - Making feature flags a commodity // Oleg Nenashev 1:01:30
DoK Talks #145 - Making Hard Things Easy is Hard // Kurt Rinehart 57:40
DoK Talks #144 - We will Dok You! - The journey to adopt stateful workloads on k8s // Guy Menahem 1:06:30
DoK Talks #142 - Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your Stateful Workload // Peter Schuurman 58:45
DoK Talks #144 - Mastering MongoDB on Kubernetes, the power of operators // Arek Borucki 1:00:50
DoK Specials - Why are Operators paramount to running stateful workloads on Kubernetes? 53:36
DoK Talks #141 - Dossier: multi-tenant distributed Jupyter Notebooks // Iacoppo Colonnelli & Dario Tranchitella 1:00:10
DoK Talks #140 - Data protection of stateful environment // Timothy Dewin 42:35
DoK Talks #139 - Private DBaaS on Kubernetes // Sergey Pronin 53:25
DoK Talks #138 - Build your own social media analytics with Apache Kafka // Jakub Scholz 56:25
DoK Talks #137 - How to build your own “Doordash” app // Yaniv Ben Hemo 57:50
DoK Talks #136 - Building a mesh for databases from scratch and why // Maxwell Miao 47:45
DoK Specials - Learn by doing in the DoK Community // Bart Farrell 15:55
DoK Talks #135 - DoK isn't just Database on Kubernetes // Patrick McFadin 46:00
DoK Talks #134 - Introducing CloudNativePG // Gabriele Bartolini & Leonardo Cecchi 1:05:20
Dok Talks #133 - My First 90 days with Clickhouse // Alkin Tezuysal 47:20
DoK Specials - DEI Panel - We can do better 57:55
DoK Talks #132 - Time-series on SQL Server on Kubernetes on ARM64… without SQL Server! // Álvaro Hernández 1:05:15


Why run Postgres in Kubernetes (DoK Day EU 2022) // Gabriele Bartolini 10:02
What we've learned from running a PostgreSQL managed service on Kubernetes (DoK Day EU 2022) // Oleksii Kliukin 11:06
Weathering The Cloud Storm- Modern Data Management Patterns for Reliability and Availability (DoK Day EU 2022) // Denis Magda 10:46
Using Kubernetes to deliver a “serverless” service (DoK Day EU 2022) // Jim Walker 20:21
The many uses of Kubernetes cross cluster migration of persistent data (DoK Day EU 2022) // Ryan Kaw 7:39
The future of data on Kubernetes with Adobe and CNCF (DoK Day EU 2022) // Joseph Sandoval, Xing Yang & Sylvain Kalache 17:29
The Data on Kubernetes Landscape (DoK Day EU 2022) // Melissa Logan & Sylvain Kalache 10:25
Testing the Mettle- Evaluating data solutions for large-scale production to check who stacks up (DoK Day EU 2022) // Dinesh Majrekar 9:26
Serverless Event Streaming Applications as Functions on K8 (DoK Day EU 2022) // Timothy Spann 8:43
Running Kafka on Kubernetes, across three clouds at Adobe (DoK Day EU 2022) // Adi Muraru 16:48
Running a database on local NVMes on Kubernetes (DoK Day EU 2022) // Tomáš Nožička & Maciej Zimnoch 9:42
PV TrashCan - Protection against accidental deletion of PVs or Namespaces (DoK Day EU 2022) // Veda Talakad, Aditya Kulkarni & Aditya Dani 11:07
Protecting data with CSI Volume Snapshots on Kubernetes (DoK Day EU 2022) // Grant Griffiths 11:10
Operator Lifecycle Management (DoK Day EU 2022) // Julian Fischer 15:21
Microservices and Kubernetes for your Full Data Lifecycle (DoK Day EU 2022) // Steve Pousty 14:26
Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Developers (DoK Day EU 2022) // Arsh Sharma, Lapo Elisacci & Ramiro Berrelleza 14:02
Kanister & Kopia - An Open-Source Data Protection Match Made in Heaven (DoK Day EU 2022) // Pavan Navarathna 13:38
Is your database in Kubernetes production ready (DoK Day EU 2022) // Mykola Marzhan 15:21
Growing up fast - Kubernetes and Real-Time Analytic Applications (DoK Day EU 2022) // Robert Hodges 15:30
Graph in Kubernetes Panel (DoK Day EU 2022) // Wey Gu, Cheukting Ho & Feynman Zhou 20:15
From Laptop to Cloud. Developing Cloud-Native Applications with Containerized Databases (DoK Day EU 2022) // Nic Vermandé 17:16
Disaggregated Container Attached Storage - Yet Another Topology with What Purpose (DoK Day EU 2022) // Nick Connolly 9:32
Datashim - a framework for declarative management of datasets on Kubernetes (DoK Day EU 2022) // Srikumar Venugopal 15:36
Build your own social media analytics with Apache Kafka (DoK Day EU 2022) // Jakub Scholz 10:22
Autoscaling Stateful Workloads in Kubernetes (DoK Day EU 2022) // Mohammad Fahim Abrar & Md. Kamol Hasan 10:14
Dok Talks #131 - How to win friends and influence businesses // Fabian Met 1:00:48
Dok Talks #130- Leaning on Kubernetes Portability to Manage Databases Anywhere // Robert Hodges 1:04:45