This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future, and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io ...
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting. SEASON 1 DATA BROS Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading hig ...
Little Fluffy PolyClouds: The Data Engineering Playbook is your essential guide to building cloud-agnostic data infrastructure. We provide practical, step-by-step strategies for designing and deploying resilient data systems across all major platforms, including AWS, Azure, and GCP.
Discussions around Data Engineering
Databases and data engineering episodes of Software Engineering Daily
"Unlocking the Power of Data: A Guide for Leaders and Executives." As a leader or executive, you know the importance of data in driving business decisions and staying ahead of the competition. But with the increasing amount of data generated daily, it can be overwhelming to know where to start and how to use this valuable asset effectively. This blog, with multiple topics, addresses the technical terminology in data engineering and analytics on the cloud.
From Context to Semantics: How Metadata Powers Agentic AI
1:06:17
Summary: In this episode Suresh Srinivas and Sriharsha Chintalapani explore how metadata platforms are evolving from human-centric catalogs into the foundational context layer for AI and agentic systems. They discuss the origins and growth of OpenMetadata and Collate, why “context” is necessary but “semantics” is critical for precise AI outcomes, an…
The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft
25:46
In this episode of the Data Engineering Show, host Benjamin Wagner sits down with Ritesh Varyani, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers, while simultaneously integrating AI across infrastructure and user-facing analytics. What You'll Learn: How to a…
The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff
19:30
The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards. Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for ML o…
From Data Engineering to AI Engineering: Where the Lines Blur
26:59
Summary: In this solo episode of the Data Engineering Podcast, host Tobias Macey reflects on how AI has transformed the practice and pace of data engineering over time. Starting from its origins in the Hadoop and cloud warehouse era, he explores the discipline's evolution through ML engineering and MLOps to today's blended boundaries between data, M…
Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics
58:48
Summary: In this episode Michael Toy, co-creator of Malloy, talks about rethinking how we work with data beyond SQL. Michael shares the origins of Malloy from his and Lloyd Tabb’s experience at Looker, why SQL’s mental model often fights human problem solving, and how Malloy aims to be a composable, maintainable language that treats SQL as the assem…
Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers
34:24
The evolution of Intercom’s data infrastructure reveals how a well-built orchestration system can scale to serve global needs. With thousands of DAGs powering analytics, AI and customer operations, the team’s approach combines technical depth with organizational insight. In this episode, András Gombosi, Senior Engineering Manager of Data Infra and …
Blurring Lines: Data, AI, and the New Playbook for Team Velocity
1:00:57
Summary: In this crossover episode, Max Beauchemin explores how multiplayer, multi‑agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just‑in‑time retrieval via MCP and CLIs lets agents gather what they n…
How Covestro Turns Airflow Into a Simulation Toolbox with Anja Mackenzie
23:10
Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability. In this episode, Anja MacKenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry. …
60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu
19:55
What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. M…
State, Scale, and Signals: Rethinking Orchestration with Durable Execution
51:46
Summary: In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and…
Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin
21:16
The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration showcases how secure, auditable workflows can scale even within the constraints of regulatory environments. In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team lev…
The AI Data Paradox: High Trust in Models, Low Trust in Data
51:35
Summary: In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems,…
How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar
23:48
The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat. In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems. We discuss how the team restructures its data platform with Airflow to strengthen gove…
Bridging the AI–Data Gap: Collect, Curate, Serve
50:40
Summary: In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle …
How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov
24:28
The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data. In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entit…
Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access
1:05:00
Summary: In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vect…
Episode 3: The Pipeline Pit Crew: Monitoring, Troubleshooting, and Optimizing Your AWS Data
12:36
Keep your data pipelines running smoothly! This episode covers Domain 3 (22% of the DEA-C01 exam). We dive into setting up alarms with CloudWatch, troubleshooting stuck jobs with Glue Logs, optimizing performance and cost in Redshift, and ensuring data quality with AWS Glue DataBrew.
By James
Episode 4: The Data Fortress: Securing and Governing Data for the DEA-C01
12:20
Lock down your data platform! This is the final domain, Domain 4 (18% of the DEA-C01 exam). We cover essential security best practices: using IAM and Lake Formation for access control, enforcing encryption with KMS (at rest and in transit), and securing network access via VPC and Security Groups.
By James
Episode 2: AWS Data Store Mastery
14:16
Where should you put your data? We tackle Domain 2 (26% of the DEA-C01 exam) by comparing Redshift, DynamoDB, and RDS. Learn how to design optimal schemas, use the AWS Glue Data Catalog, and implement S3 Lifecycle Policies to manage data lifespan and control costs.
By James
Episode 1: Mastering the AWS Data Assembly Line
18:05
This is the essential guide to Domain 1: Data Ingestion and Transformation—the biggest section (34%) of the AWS Certified Data Engineer - Associate (DEA-C01) exam! We break down the core components of a successful data pipeline. Learn to compare Batch vs. Streaming with services like Kinesis and DMS, master ETL/ELT using AWS Glue and EMR, and orche…
Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo
29:36
The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines. In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.…
The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies
1:04:16
Summary: In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and cr…
Transforming Data Pipelines at XENA Intelligence with Naseem Shah
28:32
The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities. In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed A…
Context Engineering as a Discipline: Building Governed AI Analytics
51:58
Summary: In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his journey from initial skepticism to embracing agentic AI as model and a…
Scaling Geospatial Workflows With Airflow at Overture Maps Foundation and Wherobots with Alex Iannicelli and Daniel Smith
24:03
Using Airflow to orchestrate geospatial data pipelines unlocks powerful efficiencies for data teams. The combination of scalable processing and visual observability streamlines workflows, reduces costs and improves iteration speed. In this episode, Alex Iannicelli, Staff Software Engineer at Overture Maps Foundation, and Daniel Smith, Senior Soluti…
Block Bad Data Before the Write with Nike’s Ashok Singamaneni
20:20
By The Firebolt Data Bros
The Data Model That Captures Your Business: Metric Trees Explained
1:01:05
Summary: In this episode of the Data Engineering Podcast Vijay Subramanian, founder and CEO of Trace, talks about metric trees - a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has…
Scaling Airflow for Enterprise Data Platforms at PepsiCo with Kunal Bhattacharya
19:04
PepsiCo’s data platform drives insights across finance, marketing and data science. Delivering stability, scalability and developer delight is central to its success, and engineering leadership plays a key role in making this possible. In this episode, Kunal Bhattacharya, Senior Manager of Data Platform Engineering at PepsiCo, shares how his team m…
From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
56:31
Summary: In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting…
Building a Unified Data Platform at Pattern with William Graham
24:09
The orchestration of data workflows at scale requires both flexibility and security. At Pattern, decoupling scheduling from orchestration has reshaped how data teams manage large-scale pipelines. In this episode, we are joined by William Graham, Senior Data Engineer at Pattern, who explains how his team leverages Apache Airflow alongside their open…
How Astronomer Turns Proactive Monitoring Into Customer Success with Collin McNulty
25:34
The evolution of Airflow continues to shape data orchestration and monitoring strategies. Leveraging it beyond traditional ETL use cases opens powerful new possibilities for proactive support and internal operations. In this episode, we are joined by Collin McNulty, Sr. Director of Global Support at Astronomer, who shares insights from his journey …
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
52:58
Summary: In this episode of the AI Engineering Podcast Mark Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Ma…
Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal
21:38
In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how they revolutionized their search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performanc…
Overcoming Data Engineering Challenges at Daiichi Sankyo Europe GmbH with Evgenii Prusov
19:26
The shift to a unified data platform is reshaping how pharmaceutical companies manage and orchestrate data. Establishing standards across regions and teams ensures scalability and efficiency in handling large-scale analytics. In this episode, Evgenii Prusov, Senior Data Platform Engineer of Daiichi Sankyo Europe GmbH, joins us to discuss building a…
Duck Lake: Simplifying the Lakehouse Ecosystem
1:10:41
Summary: In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake is focused on simplicity and flexibility, and offers a unified catalog and table format compared to other lakehouse formats like I…
Building a Data-Driven Beauty and Wellness Marketplace at StyleSeat with Paschal Onuorah
23:05
StyleSeat is revolutionizing how beauty and wellness professionals grow their businesses through data-driven tools. From streamlining scheduling to optimizing marketing, their platform empowers professionals to focus on their craft while expanding their client base. In this episode, Paschal Onuorah, Senior Data Engineer at StyleSeat, shares how the…