
Content provided by The Data Bros and The Firebolt Data Bros. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Data Bros and The Firebolt Data Bros or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

How ZoomInfo transitioned from data graveyards to ROI-driven data projects

39:46
 
 


Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.

The Data Engineering Show is handcrafted by our friends over at: fame.so
Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of The Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.
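Check out our three most downloaded episodes:
Zach Wilson on What Makes a Great Data Engineer
Joe Reis and Matt Housley on The Fundamentals of Data Engineering
Bill Inmon, The Godfather of Data Warehousing…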

58 episodes


All episodes

 
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Yingjun Wu, founder and CEO of RisingWave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space.

What you’ll learn:
Yingjun's journey from academic research in stream processing to founding RisingWave, and the challenges of building trust in a new database system.
How RisingWave's architecture, using S3 as primary storage, scales in seconds, while other systems can take hours to scale.
The competitive landscape of stream processing, with RisingWave's Postgres compatibility providing a significant advantage in ease of use.
How one major company reduced its CPU requirements from 20,000 to just 600 by switching from a traditional stream processing system to RisingWave.
The rising importance of Apache Iceberg as a destination for stream processing output, helping companies avoid vendor lock-in.
How streaming systems fit into modern data stacks, especially as companies seek to avoid being locked into proprietary systems.

Yingjun Wu is the founder and CEO of RisingWave, a stream processing system built in Rust and designed with a cloud-native architecture. With a PhD focused on stream processing and database systems, Yingjun previously worked at Redshift and IBM Research before founding RisingWave. His company has developed a system that achieves significant performance and resource efficiency advantages over traditional stream processing solutions, while maintaining Postgres compatibility for ease of use.

Episode Highlights:

The Origins of RisingWave (00:30)
Yingjun shares his background in stream processing from his PhD days and explains how his experience at Redshift revealed the need for better stream processing solutions, especially since many data warehouse workloads involve data ingested from streaming sources like Kinesis or Kafka.

Building a System from Scratch (04:10)
Yingjun describes the challenging first 2-3 years of developing RisingWave without customers, highlighting how trust is a major barrier for new database systems. After 2.5 years, they secured their first customers, including a startup and several larger companies, which helped establish RisingWave's credibility.

The Current Stream Processing Landscape (07:47)
Benjamin asks about the current stream processing space, with Yingjun positioning RisingWave as a leader, particularly for SQL-based workloads. He highlights several key advantages of RisingWave, including its Rust-based implementation and S3-based storage architecture.

S3 as Primary Storage (10:27)
Yingjun explains their decision to use S3 as primary storage from day one, despite its slowness and expense. He discusses how they've optimized for these challenges and says he would still make the same architectural choice today because of benefits like simplified state management and superior elastic scaling.

The Business Model (13:52)
RisingWave offers open-source, cloud, and on-premise versions of its product. Yingjun notes that many highly regulated industries require on-premise deployment, including customers in the banking and aerospace sectors.

Typical Users and Competitive Advantages (15:01)
When asked about their typical users, Yingjun explains that they compete directly with Flink but have an advantage in ease of use thanks to Postgres compatibility. Their users are either new to stream processing or are migrating from systems like Spark Streaming or Flink because of performance issues or development complexity.

Apache Iceberg Integration (19:25)
Yingjun discusses how Apache Iceberg is emerging as an important destination for RisingWave output, as companies seek to avoid vendor lock-in with proprietary data warehouses. He explains how RisingWave typically performs ETL functions before data is sent to Iceberg tables.

The Future of Data Management (32:06)
The conversation concludes with a discussion about Iceberg becoming a "single source of truth" for data, with multiple specialized query engines potentially accessing the same data. Yingjun and Eldad share perspectives on how this shift away from proprietary data lock-in is changing the data ecosystem.

Episode Resources:
RisingWave website
Yingjun Wu on LinkedIn
 
In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources.

What you’ll learn:
How Apache Gravitino differs from others like Unity Catalog and Polaris by being able to support multiple catalog systems.
What the “Push-Down Permission Management” security model is and how to implement it across different data systems.
How to maintain consistent governance across various query engines like Spark, Trino, and Flink.
Why interoperability, flexibility, and the open-source ecosystem are becoming more important dynamics of data infrastructure than performance benchmarking.
How to evaluate new data tools based on their real-world adoption rather than social media hype.

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts.

Lisa Cao is a Product Manager at DataStrato, specializing in AI/ML product partnerships and developer relations. With deep expertise in data catalog technologies and open-source ecosystems, she plays a key role in developing Apache Gravitino, an ASF incubating project that provides a unified governance and security layer for diverse data systems. Her work in developing extensible catalog frameworks has helped organizations manage complex data environments across multiple platforms.

Episode Highlights:

What is Apache Gravitino? (01:24)
Apache Gravitino is a meta-catalog that serves as a unified data governance and security layer for managing different data systems. Lisa shares that Gravitino was the first to release an Iceberg REST catalog and ended up open-sourcing it for the general community; as time passed, Polaris and Unity Catalog were also announced as open source. She highlights that although Gravitino, Polaris, and Unity Catalog are very similar, Gravitino differs in that it can support multiple catalogs.

Unifying AI/ML and Big Data Stack (03:15)
Gravitino offers more than just a catalog of data: its model catalogs are a first step toward merging the worlds of AI/ML and big data. Lisa describes the goal of effective model management: a system that can store and manage different types of data models, track changes to the models, and control access to them.

Simplifying Data Governance (10:49)
Think of Gravitino as a “traffic cop” that helps manage and secure data from multiple sources. It is crucial to have a system that provides unified access control across all data sources, allowing teams to manage access and data governance so that ML teams don't have to worry about access. Lisa says that Apache Gravitino makes data accessible to different teams and users while making sure it is secured and governed appropriately.

Gravitino’s Query Engine Solution (21:34)
Every query engine has its own way of managing data, which makes it difficult to switch between engines because everything has to be reconfigured. Lisa highlights that Gravitino solves the problem by providing a single layer of data governance that works across multiple query engines.

Navigating the Fast-Paced World of Data Engineering (24:41)
Lisa talks about how fast the data engineering space is moving and shares some advice for keeping up:
Don’t try to learn everything at once.
Don't get too deep into every tool.
Look for real-world adoption.
She warns that social media hype can amplify the messaging around new tools, making it seem as if everyone is using them when actual adoption is hard to verify.

Episode Resources:
Apache Gravitino website
 
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. Together, they:
Talk about the journey of DuckDB, an open-source analytical database system designed as a universal data wrangling tool.
Explain how DuckDB differs from SQLite, highlighting the analytical and transactional use cases.
Discuss DuckDB’s special feature and its approach to innovation, including creating its own Parquet reader.
Explore the simple and efficient extension ecosystem of DuckDB, which lets developers add custom functionality without compromising its core stability.
Consider Hannes' perspective on the role of AI in databases.
Delve into the system’s infrastructure, design choices, and the team's dedication to ensuring a reliable database system.

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts.

Hannes Mühleisen is the CEO of DuckDB Labs and a professor in the Netherlands, renowned for co-creating DuckDB, an open-source analytical database system. With a background in database architecture and research from the CWI Database Architectures group, he has pioneered the development of DuckDB as a universal data wrangling tool that can run everywhere from phones to space satellites. Under his leadership, DuckDB has reached 10 million downloads monthly and become a go-to solution for analytical database needs. His commitment to keeping DuckDB lightweight, portable, and hardware-agnostic while maintaining high performance has changed how developers approach analytical database solutions. As both an academic and a technology leader, Hannes brings unique insights into database architecture, open-source development, and the future of analytical data processing.

Episode Highlights:

The Purpose of DuckDB (01:04)
Hannes describes what DuckDB is and what it is designed to do: a tool that understands SQL and is specifically designed to simplify complex analytical use cases.

SQLite vs DuckDB (02:53)
Hannes compares the two tools, stating that SQLite is an amazing system meant for transactional use cases rather than analytical queries, while DuckDB is designed specifically for analytical use cases.

The Importance of Collaboration (08:14)
Hannes stresses the need for community collaboration, since the database engine space has on the order of a hundred brilliant people trying to solve the same problems. He shares his admiration for a team in Munich, praising them for implementing concepts that had only been described in papers.

The Component-Based Architecture of DuckDB (11:25)
Hannes highlights a special feature of DuckDB: it can be used as a component. He explains that the in-process architecture succeeds because of the in-memory data sharing it enables.

The Parquet Reader Journey (17:51)
Hannes explains how he built his own Parquet reader out of necessity, although he would have preferred not to. He recounts how Uwe Korn from Germany donated an existing reader to the Arrow project, after which it became so tied to Arrow that the two could no longer be used independently. Hannes adds that a competent Parquet reader has no choice but to become a database engine, which he finds one of the interesting things about its development.

The Role of AI in Database Interaction (22:41)
Hannes says he doesn’t think AI has a place inside a database engine; rather, he sees it being applied to optimization, noting that the researchers who built their careers on optimization are out of jobs. He explains that AI should be used for assistance tasks and not for total execution.

SQL - A Defined Interface (29:20)
Hannes introduces the relational API, a tool for building queries programmatically, which he says simplifies the programmer's work. While he agrees that a well-defined interface is important for components like databases, he also argues that SQL can provide relatively well-defined behavior within a single system.

The Golden Age of Databases (38:57)
Hannes concludes the episode by thanking Firebolt and other engineers for taking on core engine work. He shares his excitement about the golden age of databases, in which teams are showcasing what is possible.

Quotes:
“DuckDB is a universal data wrangling tool. It is a relational data management system that speaks SQL designed to do well on analytical use cases.”
“We call ourselves the SQLite for analytics because it explains the original design goal of DuckDB very well.”
“Within the database engine space, we are all working to solve the same problems, and that's like, a hundred of us on the planet.”
“It actually turns out in order to make a competent parquet reader, you do need query execution. There is just no way around it.”
“I really like this golden age of databases we are in and personally, as somebody who really likes tables and SQL, I'm quite happy to see things like Firebolt and others really working on core engine stuff.”
 
In this episode of The Data Engineering Show, the bros sit with Daniel Pálma, Head of Marketing at Estuary. Join them as they:
Talk about Daniel’s career transition from data engineering to marketing and how his data engineering background has strengthened his marketing work.
Discuss the role of AI in the evolution of data movement, making it faster and easier to create data pipelines.
Shed light on the challenges of vector databases and structured data in AI applications.
Delve into the future of Apache Iceberg and data lakehouses, highlighting their current challenges.
Share insights on the golden age of data, expressing the need for more data engineers, data analysts, and data practitioners in the data space.

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts.

Daniel Pálma serves as Head of Marketing at Estuary, bringing a unique blend of technical expertise and marketing acumen to the data integration space. With nearly a decade of experience as a data engineer across startups, enterprises, and consulting roles, Daniel made a strategic pivot to marketing to help bridge the gap between complex technical solutions and their practical applications for data practitioners. His background in data engineering enables him to deeply understand customers' challenges and create authentic, education-focused marketing content that resonates with technical audiences. Daniel’s thought leadership and content creation in the data engineering space, combined with his hands-on technical experience, position him as a valuable voice in conversations about the evolution of data infrastructure and integration technologies.
 
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Chad Sanderson, CEO and co-founder of Gable AI, to explore the world of data change management. Join them as they:
Delve into the challenges of data quality, how it degrades over time, and the one-sided data quality checks on the “last mile” of the data supply chain.
Talk about how Gable works through a three-layer flow: identify data production points, trace the data flow, and communicate the impact of changes before they reach production.
Explain why the gap between data producers and consumers needs to be bridged, and how Gable emphasizes effective communication and a shared understanding of data change management across teams.
Shed light on how AI can enhance data management by extracting semantics from code and effectively managing the translation output.
Discuss Chad’s vision for 2025, which is to help companies start to care about data and about how changes made to data affect other people.

Chad Sanderson is the CEO and co-founder of Gable AI, a data change management platform. Chad has over a decade of experience in the data engineering and infrastructure space, holding significant roles at major companies like Microsoft, Oracle, and Sephora, where he focused on data quality and governance challenges. He is a former Head of Data at Convoy, a LinkedIn writer, and a published author. He lives in Seattle, Washington, and is the Chief Operator of the Data Quality Camp. His journey from data scientist to data engineer and ultimately to CEO was driven by a desire to transform how organizations manage and utilize data. Gable AI addresses the complexities of the data supply chain by providing tools for code scanning, data contracts, and governance as code, enabling teams to proactively manage data changes and their impact.

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube.

Episode Resources:
Gable AI website
Chad Sanderson on LinkedIn
 
Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. For Wouter, BI is much more than a software solution: it is all about change management and aligning leadership with data projects.

They discuss:
From Excel to Expert: From basic Excel tasks to full mastery of BI tools like QlikView, Wouter has blended his technical and philosophical approaches to data to become a bona fide expert.
Data Strategy as Transformation: Good change management principles have to be adhered to if a BI project is going to bear fruit. Focus on leadership alignment, KPI clarity, and user empowerment instead of simply implementing software.
Challenges of Starting Small: Wouter offers tips for smaller companies on bootstrapping their data journey using existing tools, practical education, and even Gen AI.
Balancing Scales: Small startups face a very different set of challenges from large enterprises. Wouter’s combination of philosophy and pragmatism brings fresh takes to building effective data solutions.
 
In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist.

Topics include:
Foundations of Data Engineering: Zach Wilson emphasizes the importance of core, tech-agnostic skills in data modeling, quality assurance, and storytelling. Drawing on his experiences at Airbnb and in education, he argues that effective data engineering hinges on creating robust data models, quality controls, and persuasive narratives rather than expertise in any single tool or language.
Bridging Academia and Practice: Matthew Housley and Joe Reis delve into the need for better data education, emphasizing hands-on experience and data fundamentals over tool-specific training, and advocate for apprenticeships and real-world collaborations in educational settings.
Legacy Meets Modern in Data Engineering: Krishnan Viswanathan reflects on recurring themes in data engineering and the importance of adapting legacy approaches to new data needs, underscoring the challenges and benefits of vendor-built versus in-house solutions.

Join the Bros for a well-rounded exploration of current themes in data engineering, filled with practical advice for data professionals at any stage of their journey.
 
In this episode of The Data Engineering Show, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows.

Together they cover:
Automated Data Pipelines: Ryanne explains how Hoptimator allows users to create and manage data pipelines using just a simple SQL SELECT query, streamlining the process of setting up Kafka topics, Flink jobs, and schemas.
Integration with Kubernetes: The project uses Kubernetes to handle infrastructure tasks, treating Kubernetes as a database for managing state. This integration simplifies the orchestration of data workflows and automates routine tasks.
Consumer-Driven Model: Ryanne discusses the shift from a producer-driven to a consumer-driven data model, emphasizing the importance of understanding and addressing consumer needs to reduce engineering complexity and optimize data systems.
Future of Data Engineering: The conversation touches on the ongoing experimental nature of Hoptimator and its potential to transform data engineering practices, highlighting its impact on LinkedIn's data infrastructure.
 
SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production databases safely, and why SQL is here to stay.
 
Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.
 
Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.
 
Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods.
 
Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and about RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you.
 
There are two types of data influencers on LinkedIn: (1) those who talk directly about the products and companies they work for, and (2) those who provide more general guidance, tips and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with almost 100K followers, her content seems to be resonating with many data folks. She talked to the bros about her approach to data advocacy as well as the power of notebooks, especially when they become broader and enable collaboration.
 
Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show: Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills you’ll be limited to the rigid capabilities of your stack. But without data engineering skills you’ll find it hard to be cost-effective and see the bigger picture.
 