Player FM - Internet Radio Done Right
Added eight years ago
Content provided by The Data Flowcast. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Data Flowcast or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.
GDPR, Self-Service Data, and Infrastructure Automation with Typeform
Welcome back to the Airflow Podcast. This week, we met up with Albert Franzi and Carlos Escura from Typeform. Typeform is a tool for building beautiful interactive forms for a wide variety of use cases, including customer surveys, employee engagement, product feedback, and market research, to name a few. In our conversation, we discussed Airflow as a tool for GDPR compliance, the concept of self-service data and how it allows your data operations team to function as a data platform team, and some of the more specialized infrastructure tooling that the Typeform team has built to support their internal teams.

For folks interested, our team at Astronomer is growing rapidly and we're on the hunt for new folks to join in a variety of different roles. If you're passionate about Airflow and interested in building the future of data engineering, please get in touch. You can check our current job postings at careers.astronomer.io, but we're constantly updating our listings to accommodate new hiring needs. Please feel free to email me directly at pete@astronomer.io if you're passionate about what we're doing and think you'd be a good addition to the team.

Mentioned Resources:
Dag Factory: https://github.com/ajbosco/dag-factory
Astronomer Careers: https://careers.astronomer.io

Guest Profiles:
Albert Franzi: https://www.linkedin.com/in/albertfranzi/?originalSubdomain=es
Carlos Escura: https://www.linkedin.com/in/carlosescura/en-us/
68 episodes
The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
All episodes

Building the Future of Airflow Execution at Astronomer with Ian Buss and Piotr Chomiak (22:25)
The evolution of orchestration in Airflow continues with innovations that address both scalability and security. From improving executor reliability to enabling remote execution, these advancements reshape how organizations manage data pipelines. In this episode, we’re joined by Ian Buss, Principal Software Engineer at Astronomer, and Piotr Chomiak, Principal Product Manager at Astronomer, who share insights into the Astro Executor and remote execution.

Key Takeaways:
(00:00) Introduction.
(04:13) How product leadership drives scalability for enterprise needs.
(08:23) Architectural changes that improve reliability and remove bottlenecks.
(10:15) Metrics that enhance visibility into system performance.
(12:54) The role of remote execution in addressing security requirements.
(15:56) Differences between open-source solutions and managed offerings.
(19:04) Broad industry adoption and applicability of remote execution.
(20:39) Future advancements in language support and multi-tenancy.

Resources Mentioned:
Ian Buss: https://www.linkedin.com/in/ian-buss/
Piotr Chomiak: https://www.linkedin.com/in/piotr-chomiak-b1955624/
Astronomer | Website: https://www.astronomer.io
Apache Airflow: https://airflow.apache.org/
Airflow Slack Community: https://airflow.apache.org/community/
Beyond Analytics conference: https://astronomer.io/beyond/dataflowcast

Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning…

Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille (24:17)
Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable. In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.

Key Takeaways:
(00:00) Introduction.
(02:13) Overview of the company’s operations and global presence.
(04:00) The tech stack and structure of the data engineering team.
(04:24) Running nearly 2,000 DAGs in production using Airflow.
(05:42) How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.
(07:05) Details on the Kubernetes-based Airflow setup using Helm charts.
(09:31) Transition from GitSync to NFS for DAG syncing due to performance issues.
(14:11) Making every team member Airflow-literate through local installation.
(17:56) Using custom libraries and plugins to extend Airflow functionality.

Resources Mentioned:
Sébastien Crocquevieille: https://www.linkedin.com/in/scroc/
Numberly | LinkedIn: https://www.linkedin.com/company/numberly/
Numberly | Website: https://numberly.com/
Apache Airflow: https://airflow.apache.org/
Grafana: https://grafana.com/
Apache Kafka: https://kafka.apache.org/
Helm Chart for Apache Airflow: https://airflow.apache.org/docs/helm-chart/stable/index.html
Kubernetes: https://kubernetes.io/
GitLab: https://about.gitlab.com/
KubernetesPodOperator – Airflow: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

How Moniepoint Group Uses Airflow for Exposure Monitoring with Adeolu Adegboye (21:32)
Managing financial data at scale requires precise orchestration and proactive monitoring to maintain operational efficiency. In this episode, we are joined by Adeolu Adegboye, Data Engineer at Moniepoint Group, who shares how his team uses data pipelines and workflow automation to manage high volumes of transactions, ensure timely alerts and support diverse stakeholders across the business.

Key Takeaways:
(00:00) Introduction.
(02:48) The role of data engineering in supporting all business operations.
(04:17) Leveraging workflow orchestration to manage daily processes.
(05:20) Proactively monitoring for anomalies to prevent potential issues.
(08:12) Simplifying complex insights for non-technical teams.
(13:01) Improving efficiency through dynamic and parallel workflows.
(14:19) Optimizing system performance to handle large-scale operations.
(17:19) Exploring creative and innovative uses for workflow automation.

Resources Mentioned:
Adeolu Adegboye: https://www.linkedin.com/in/adeolu-adegboye/
Moniepoint Group | LinkedIn: https://www.linkedin.com/company/moniepoint-inc/
Moniepoint Group | Website: https://www.moniepoint.com
Apache Airflow: https://airflow.apache.org/
ClickHouse: https://clickhouse.com/
Grafana: https://grafana.com/
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

Inside Bosch’s Airflow 3 Revolution: Remote Execution with Jens Scheffler (28:02)
The evolution of Airflow has reached a milestone with the introduction of remote execution in Airflow 3, enabling flexible orchestration across distributed environments. In this episode, Jens Scheffler, Test Execution Cluster Technical Architect at Bosch, shares insights on how his team’s need for large-scale, cross-environment testing influenced the development of the Edge Executor and shaped this major release.

Key Takeaways:
(02:39) The role of remote execution in supporting large-scale testing needs.
(04:44) How community support contributed to the Edge Executor’s development.
(08:41) Navigating network and infrastructure limitations within secure environments.
(13:25) Transitioning from database-heavy processes to an API-driven model.
(14:16) How the new task SDK in Airflow 3 improves distributed task execution.
(16:54) What is required to set up and configure the Edge Executor.
(19:36) Managing multiple queues to optimize tasks across different environments.
(23:30) Examples of extreme distance use cases for edge execution.

Resources Mentioned:
Jens Scheffler: https://www.linkedin.com/in/jens-scheffler/
Bosch | LinkedIn: https://www.linkedin.com/company/bosch/
Bosch | Website: https://www.bosch.com/
Apache Airflow: https://airflow.apache.org/
Edge Executor (Edge3 Provider Package): https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html
Astronomer’s Astro Executor: https://www.astronomer.io/docs/astro/astro-executor/
Beyond Analytics Conference: https://astronomer.io/beyond/dataflowcast

Inside Modern Data Infrastructure at Massdriver with Cory O’Daniel and Jake Ferriero (31:24)
Managing modern data platforms means navigating a web of complex infrastructure, competing team needs and evolving security standards. For data teams to truly thrive, infrastructure must become both accessible and compliant without sacrificing velocity or reliability. In this episode, we’re joined by Cory O’Daniel, CEO and Co-Founder at Massdriver, and Jacob Ferriero, Senior Software Engineer at Astronomer, to unpack what it takes to make data platform engineering scalable, sustainable and secure. They share lessons from years of experience working with DevOps, ML teams and platform engineers and discuss how Airflow fits into the orchestration layer of today’s data stacks.

Key Takeaways:
(03:27) Making infrastructure accessible without deep ops knowledge.
(07:23) Distinct personas and responsibilities across data teams.
(09:53) Infrastructure hurdles specific to ML workloads.
(11:13) Compliance and governance shaping platform design.
(13:27) Tooling mismatches between teams cause friction.
(15:13) Airflow’s orchestration role within broader system architecture.
(22:10) Creating reusable infrastructure patterns for consistency.
(24:13) Enabling secure access without slowing down development.
(26:55) Opportunities to improve Airflow with event-driven and reliability tooling.

Resources Mentioned:
Cory O’Daniel: https://www.linkedin.com/in/coryodaniel/
Massdriver | LinkedIn: https://www.linkedin.com/company/massdriver/
Massdriver | Website: https://www.massdriver.cloud/
Jacob Ferriero: https://www.linkedin.com/in/jacob-ferriero/
Astronomer: https://www.linkedin.com/company/astronomer/
Apache Airflow: https://airflow.apache.org/
Prequel: https://www.prequel.co/

The Future of Airflow Telemetry with Bolke de Bruin (21:55)
Telemetry has the potential to guide the future of Airflow, but only if it’s implemented transparently and with community trust. In this episode, we’re joined by Bolke de Bruin, Director at Metyis and a long-time Airflow PMC member. Bolke discusses how telemetry has been handled in the past, why it matters now and what it will take to get it right.

Key Takeaways:
(03:20) The role of foundations in establishing credibility and sustainability.
(04:52) Why data collection is critical to open-source project direction.
(07:24) Lessons learned from previous approaches to user data collection.
(10:23) The current state of telemetry in the project.
(10:53) Community trust as a prerequisite for technical implementation.
(12:54) The importance of managing sensitive data within trusted ecosystems.
(16:37) Ethical considerations in balancing participation and access.
(18:45) Forward-looking ideas for improving workflow design and usability.

Resources Mentioned:
Bolke de Bruin: https://www.linkedin.com/in/bolke/
Metyis | LinkedIn: https://www.linkedin.com/company/metyis/
Metyis | Website: http://www.metyis.com
Apache Airflow: https://airflow.apache.org/
Airflow Summit: https://airflowsummit.org/
Airflow Dev List: https://lists.apache.org/list.html?dev@airflow.apache.org
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/

Transforming the Airflow UI for Cloudera’s Users with Shubham Raj (22:28)
Contributing to open-source projects can be daunting, but it can also unlock unexpected innovation. This episode showcases how one engineer’s journey with Apache Airflow led to impactful UI enhancements and infrastructure solutions at scale. Shubham Raj, Software Engineer II at Cloudera, shares how his team built a drag-and-drop DAG editor for non-coders, and how his contributions helped shape the Airflow 3.0 UI and introduced features like external XCom control and bulk APIs.

Key Takeaways:
(02:30) Day-to-day responsibilities building platforms that simplify orchestration.
(05:27) Factors that make onboarding into large open-source projects accessible.
(07:35) The value of improved user interfaces for task state visibility and control.
(09:49) Enabling faster debugging by exposing internal data through APIs.
(13:00) Balancing frontend design goals with backend functionality.
(14:19) Creating workflow editors that lower the barrier to entry.
(16:54) Supporting a variety of task types within a visual DAG builder.
(19:32) Common infrastructure challenges faced by orchestration users.
(20:37) Addressing dependency management across distributed environments.

Resources Mentioned:
Shubham Raj: https://www.linkedin.com/in/shubhamrajofficial/
Cloudera | LinkedIn: https://www.linkedin.com/company/cloudera/
Cloudera | Website: https://www.cloudera.com/
Apache Airflow: https://airflow.apache.org/
2023 Airflow Summit: https://airflowsummit.org/

Streamlining Thousands of Data Pipelines at Lyft with Yunhao Qing (19:34)
Managing data pipelines at scale is not just a technical challenge. It is also an organizational one. At Lyft, success means empowering dozens of teams to build with autonomy while enforcing governance and best practices across thousands of workflows. In this episode, we speak with Yunhao Qing, Software Engineer at Lyft, about building a governed data-engineering platform powered by Airflow that balances flexibility, standardization and scale.

Key Takeaways:
(03:17) Supporting internal teams with a centralized orchestration platform.
(04:54) Migrating to a managed service to reduce infrastructure overhead.
(06:04) Embedding platform-level governance into custom components.
(08:02) Consolidating and regulating the creation of custom code.
(09:48) Identifying and correcting inefficient workflow patterns.
(11:17) Replacing manual workarounds with native platform features.
(14:32) Preparing teams for major version upgrades.
(16:03) Leveraging asset-based scheduling for smarter triggers.
(18:13) Envisioning GenAI and semantic search for future productivity.

Resources Mentioned:
Yunhao Qing: https://www.linkedin.com/in/yunhao-qing
Lyft | LinkedIn: https://www.linkedin.com/company/lyft/
Lyft | Website: https://www.lyft.com/
Apache Airflow: https://airflow.apache.org/
Astronomer: https://www.astronomer.io/
Kubernetes: https://kubernetes.io/

Transforming Customer Education in Data Engineering at Astronomer with Marc Lamberti (22:19)
Understanding the complexities of Apache Airflow can be daunting for newcomers and seasoned data engineers alike. But with the right guidance, mastering the tool becomes an achievable milestone. In this episode, Marc Lamberti, Head of Customer Education at Astronomer, joins us to share his journey from Udemy instructor to driving education at Astronomer, and how he's helping over 100,000 learners demystify Airflow.

Key Takeaways:
(02:36) Early exposure to Airflow while addressing inefficiencies in data workflows.
(04:10) Common barriers to implementing open source tools in enterprise settings.
(06:18) The shift from part-time teaching to a full-time focus on Airflow education.
(07:53) A modular, guided approach to structuring educational content.
(09:57) The value of highlighting underused Airflow features for broader adoption.
(12:35) Certifications as a method to assess readiness and uncover knowledge gaps.
(13:25) Coverage of essential Airflow concepts in the Fundamentals exam.
(16:07) The DAG Authoring exam’s emphasis on practical, advanced features.
(20:08) A call for more visible integration of Airflow with AI workflows.

Resources Mentioned:
Marc Lamberti: https://www.linkedin.com/in/marclamberti/
Astronomer | LinkedIn: https://www.linkedin.com/company/astronomer/
Astronomer Academy: https://academy.astronomer.io/
Airflow Fundamentals Certification: https://www.astronomer.io/certification/
DAG Authoring Certification: https://academy.astronomer.io/plan/astronomer-certification-dag-authoring-for-apache-airflow-exam
The Complete Hands-On Introduction to Airflow: https://www.udemy.com/course/the-complete-hands-on-course-to-master-apache-airflow/?utm_source=adwords&utm_medium=udemyads&utm_campaign=Search_DSA_Beta_Prof_la.EN_cc.ROW-English&campaigntype=Search&portfolio=ROW-English&language=EN&product=Course&test=&audience=DSA&topic=&priority=Beta&utm_content=deal4584&utm_term=_._ag_162511579404_._ad_696197165418_._kw__._de_c_._dm__._pl__._ti_dsa-1677053911088_._li_9061346_._pd__._&matchtype=&gad_source=1&gad_campaignid=21168154305&gbraid=0AAAAADROdO3MpljfP-gssiYSmDEPdhZV9&gclid=Cj0KCQjw097CBhDIARIsAJ3-nxdjZA6G5-Y0-akk6Huksy2PLb04t92J4iNfUSIbMdrSAla_tb-o2N8aArOeEALw_wcB&couponCode=PMNVD3025

Embracing Data Mesh and SQL Sensors for Scalable Workflows at lastminute.com with Alberto Crespi (30:09)
The flexibility of Airflow plays a pivotal role in enabling decentralized data architectures and empowering cross-functional teams. In this episode, we speak with Alberto Crespi, Data Architect at lastminute.com, who shares how his team scales Airflow across 12 teams while supporting both vertical and horizontal structures under a data mesh approach.

Key Takeaways:
(02:17) Defining responsibilities within data architecture teams.
(04:15) Consolidating multiple orchestrators into a single solution.
(07:00) Scaling Airflow environments with shared infrastructure and DevOps practices.
(10:59) Managing dependencies and readiness using SQL sensors.
(14:23) Enhancing visibility and response through Slack-integrated monitoring.
(19:28) Extending Airflow’s flexibility to run legacy systems.
(22:28) Integrating transformation tools into orchestrated pipelines.
(25:54) Enabling non-engineers to contribute to pipeline development.
(27:33) Fostering adoption through collaboration and communication.

Resources Mentioned:
Alberto Crespi: https://www.linkedin.com/in/crespialberto/
lastminute.com | Website: https://lastminute.com
Apache Airflow: https://airflow.apache.org/
dbt Labs: https://www.getdbt.com/
Astronomer Cosmos: https://github.com/astronomer/astronomer-cosmos
GitLab
Slack: https://slack.com/
Kubernetes: https://kubernetes.io/
Confluence: https://www.atlassian.com/software/confluence
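For listeners new to the SQL sensor idea named in this episode, here is a minimal sketch of the general pattern: poll a query until it reports that upstream data is ready, then let downstream work proceed. It uses only Python's standard library and is not Airflow's own sensor implementation or lastminute.com's setup. The function `wait_for_rows`, its parameters, and the `bookings_daily` table are hypothetical names chosen for illustration.

```python
import sqlite3
import time


def wait_for_rows(conn, sql, poke_interval=0.1, timeout=2.0):
    """Poll `sql` until the first cell of its first row is truthy.

    Returns True when the condition is met, False if `timeout` seconds pass first.
    (Hypothetical helper; a real sensor would also handle retries and errors.)
    """
    deadline = time.monotonic() + timeout
    while True:
        row = conn.execute(sql).fetchone()
        if row is not None and row[0]:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


# In-memory database standing in for a warehouse table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings_daily (load_date TEXT)")
check = "SELECT COUNT(*) FROM bookings_daily WHERE load_date = '2024-01-01'"

# No rows yet, so the check times out.
not_ready = wait_for_rows(conn, check, poke_interval=0.01, timeout=0.05)

# Once the upstream load lands, the same check succeeds.
conn.execute("INSERT INTO bookings_daily VALUES ('2024-01-01')")
ready = wait_for_rows(conn, check, poke_interval=0.01, timeout=0.05)

print(not_ready, ready)
```

The readiness rule lives in the SQL itself, so a team can express "my upstream data has arrived" without writing new orchestration code, which is roughly the dependency-management use the takeaway at 10:59 points to.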

The AI-Ready Pipeline: Reimagining Airflow at Veyer® Logistics with Anu Pabla (23:21)
Innovation in orchestration is redefining how engineers approach both traditional ETL pipelines and emerging AI workloads. Understanding how to harness Airflow’s flexibility and observability is essential for teams navigating today’s evolving data landscape. In this episode, Anu Pabla, Principal Engineer at The ODP Corporation, joins us to discuss her journey from legacy orchestration patterns to AI-native pipelines and why she sees Airflow as the future of AI workload orchestration.

Key Takeaways:
(03:43) Engaging with external technology communities fosters innovation.
(05:05) Mentoring early-career engineers builds confidence in a complex tech landscape.
(07:51) Orchestration patterns continue to evolve with modern data needs.
(08:41) Managing AI workflows requires structured and flexible orchestration.
(10:35) High-quality, meaningful data remains foundational across use cases.
(15:08) Community-driven open source tools offer lasting value.
(16:59) Self-healing systems support both legacy and AI pipelines.
(20:20) Orchestration platforms can drive future AI-native workloads.

Resources Mentioned:
Anu Pabla: https://www.linkedin.com/in/atomicap/
The ODP Corporation: https://www.linkedin.com/company/the-odp-corporation/
The ODP Corporation | Website: https://www.theodpcorp.com/homepage
Apache Airflow: https://airflow.apache.org/
LlamaIndex: https://www.llamaindex.ai/

Streamlining AI and ML Operations at IBM with BJ Adesoji and Ryan Yackel (24:44)
The orchestration layer is foundational to building robust AI- and ML-powered data pipelines, especially in complex hybrid enterprise environments. IBM’s partnership with Astronomer reflects a strategic alignment to simplify and scale Airflow-based workflows across industries. In this episode, we’re joined by IBM’s Senior Product Manager, BJ Adesoji, and GTM PM and Growth Leader, Ryan Yackel. We discuss how IBM customers are using Airflow in production, the challenges they face at scale and what the new IBM–Astronomer collaboration unlocks.

Key Takeaways:
(03:09) The growing importance of orchestration tools in enterprise environments.
(04:48) How organizations are expanding orchestration beyond traditional use cases.
(05:24) Common patterns across industries adopting orchestration platforms.
(07:16) Why orchestration is essential for supporting business-critical workloads.
(10:00) The role of orchestration in compliance and regulatory processes.
(13:02) Challenges enterprises face when managing orchestration infrastructure.
(14:58) Opportunities to simplify and centralize orchestration at scale.
(19:11) The value of integrating orchestration with broader data toolchains.
(20:54) How AI is shaping the future of orchestrated data workflows.

Resources Mentioned:
BJ Adesoji: https://www.linkedin.com/in/bj-soji/
Ryan Yackel: https://www.linkedin.com/in/ryanyackel/
IBM | LinkedIn: https://www.linkedin.com/company/databand-ai/
IBM Databand: https://www.ibm.com/products/databand
IBM DataStage: https://www.ibm.com/products/datastage
IBM watsonx.governance: https://www.ibm.com/products/watsonx-governance
IBM Knowledge Catalog: https://www.ibm.com/products/knowledge-catalog
Apache Airflow: https://airflow.apache.org/
watsonx Orchestrate: https://www.ibm.com/products/watsonx-orchestrate
Domino: https://domino.ai/
Astronomer: https://www.astronomer.io/
Snowflake: https://www.snowflake.com/en/
dbt Labs: https://www.getdbt.com/
Amazon SageMaker: https://aws.amazon.com/sagemaker/
Cloudera: https://www.cloudera.com/
MongoDB: https://www.mongodb.com/
The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI

Inside the Custom Framework for Managing Airflow Code at Wix with Gil Reich 31:02
Efficient orchestration and maintainability are crucial for data engineering at scale. Gil Reich, Data Developer for Data Science at Wix, shares how his team reduced code duplication, standardized pipelines, and improved Airflow task orchestration using a Python-based framework built within the data science team. In this episode, Gil explains how this internal framework simplifies DAG creation, improves documentation accuracy, and enables consistent task generation for machine learning pipelines. He also shares lessons from complex DAG optimization and maintaining testable code.
Key Takeaways:
(03:23) Code duplication creates long-term problems.
(08:16) Frameworks bring order to complex pipelines.
(09:41) Shared functions cut down repetitive code.
(17:18) Auto-generated docs stay accurate by design.
(22:40) On-demand DAGs support real-time workflows.
(25:08) Task-level sensors improve run efficiency.
(27:40) Combine local runs with automated tests.
(30:09) Clean code helps teams scale faster.
Resources Mentioned:
Gil Reich https://www.linkedin.com/in/gilreich/
Wix | LinkedIn https://www.linkedin.com/company/wix-com/
Wix | Website https://www.wix.com/
DS DAG Framework https://airflowsummit.org/slides/2024/92-refactoring-dags.pdf
Apache Airflow https://airflow.apache.org/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations. #AI #Automation #Airflow #MachineLearning…
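The episode summary does not include the framework's code, so the following is only a minimal sketch of the general idea behind "shared functions cut down repetitive code" and "auto-generated docs stay accurate by design". One helper generates the same task chain for every model, and the documentation is derived from that same task list. Every name here (`TaskSpec`, `build_model_pipeline`, `render_docs`, the step names) is a hypothetical chosen for illustration, not Wix's framework or an Airflow API, and the sketch uses plain Python so it runs without Airflow installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical, framework-neutral description of one pipeline task."""
    task_id: str
    upstream: tuple = ()


# Assumed step names; a real framework would map each step to an operator.
STANDARD_STEPS = ("extract_features", "train", "evaluate")


def build_model_pipeline(model_name, steps=STANDARD_STEPS):
    """Generate a linear chain of tasks for one model from shared logic."""
    tasks = []
    previous = None
    for step in steps:
        task_id = f"{model_name}__{step}"
        upstream = (previous,) if previous else ()
        tasks.append(TaskSpec(task_id=task_id, upstream=upstream))
        previous = task_id
    return tasks


def render_docs(tasks):
    """Derive documentation from the task list so it cannot drift from the code."""
    lines = []
    for task in tasks:
        deps = ", ".join(task.upstream) or "none"
        lines.append(f"{task.task_id} (depends on: {deps})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two pipelines share one definition instead of two copied files.
    for model in ("churn", "ranking"):
        print(render_docs(build_model_pipeline(model)))
```

Because both the tasks and their documentation come from `build_model_pipeline`, changing the shared step list updates every pipeline and its docs in one place, which is the kind of duplication reduction the episode describes.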

Modernizing Legacy Data Systems With Airflow at Procter & Gamble with Adonis Castillo Cordero 22:13
Legacy architecture and AI workloads pose unique challenges at scale, especially in a global enterprise with complex data systems. In this episode, we explore strategies to proactively monitor and optimize pipelines while minimizing downstream failures. Adonis Castillo Cordero, Senior Automation Manager at Procter & Gamble, joins us to share actionable best practices for dependency mapping, anomaly detection and architecture simplification using Apache Airflow.
Key Takeaways:
(03:13) Integrating legacy data systems into modern architecture.
(05:51) Designing workflows for real-time data processing.
(07:57) Mapping dependencies early to avoid pipeline failures.
(09:02) Building automated monitoring into orchestration frameworks.
(12:09) Detecting anomalies to prevent performance bottlenecks.
(15:24) Monitoring data quality to catch silent failures.
(17:02) Prioritizing responses based on impact severity.
(18:55) Simplifying dashboards to highlight critical metrics.
Resources Mentioned:
Adonis Castillo Cordero https://www.linkedin.com/in/adoniscc/
Procter & Gamble | LinkedIn https://www.linkedin.com/company/procter-and-gamble/
Procter & Gamble | Website http://www.pg.com
Apache Airflow https://airflow.apache.org/
OpenLineage https://openlineage.io/
Azure Monitor https://azure.microsoft.com/en-us/products/monitor/
AWS Lookout for Metrics https://aws.amazon.com/lookout-for-metrics/
Monte Carlo https://www.montecarlodata.com/
Great Expectations https://greatexpectations.io/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations. #AI #Automation #Airflow #MachineLearning…

Building an End-to-End Data Observability System at Netflix with Joseph Machado 38:54
Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy. Joseph Machado, Senior Data Engineer at Netflix, joins us to share practical insights gleaned from supporting Netflix’s Ads business as well as over a decade of experience in the data engineering space. He discusses implementing audit publish patterns, building observability dashboards, defining in-band and separate data quality checks, and optimizing data validation across large-scale systems.
Key Takeaways:
(03:14) Supporting data privacy and engineering efficiency within data systems.
(10:41) Validating outputs with reconciliation checks to catch transformation issues.
(16:06) Applying standardized patterns for auditing, validating and publishing data.
(19:28) Capturing historical check results to monitor system health and improvements.
(21:29) Treating data quality and availability as separate monitoring concerns.
(26:26) Using containerization strategies to streamline pipeline executions.
(29:47) Leveraging orchestration platforms for better visibility and retry capability.
(31:59) Managing business pressure without sacrificing data quality practices.
(35:46) Starting simple with quality checks and evolving toward more complex frameworks.
Resources Mentioned:
Joseph Machado https://www.linkedin.com/in/josephmachado1991/
Netflix | LinkedIn https://www.linkedin.com/company/netflix/
Netflix | Website https://www.netflix.com/browse
Start Data Engineering https://www.startdataengineering.com/
Apache Airflow https://airflow.apache.org/
dbt Labs https://www.getdbt.com/
Great Expectations https://greatexpectations.io/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations. #AI #Automation #Airflow #MachineLearning…
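The audit publish pattern and reconciliation checks named in the Netflix episode are described only at a high level, so here is a rough sketch of how the two ideas might fit together, assuming in-memory lists of row dictionaries stand in for staging and production tables. The function names (`reconcile_totals`, `no_null_ids`, `audit_then_publish`) and the `id` and `amount` columns are hypothetical and do not reflect Netflix's implementation.

```python
def reconcile_totals(source_rows, output_rows, key="amount", tolerance=1e-9):
    """Reconciliation check: the output total should match the source total."""
    source_total = sum(row[key] for row in source_rows)
    output_total = sum(row[key] for row in output_rows)
    return abs(source_total - output_total) <= tolerance


def no_null_ids(rows):
    """Basic quality check: every row must carry a non-null id."""
    return all(row.get("id") is not None for row in rows)


def audit_then_publish(staged_rows, source_rows, published_table):
    """Replace the published table with staged rows only if every audit passes.

    Returns the per-check results so a caller could store them as history.
    """
    results = {
        "no_null_ids": no_null_ids(staged_rows),
        "totals_reconcile": reconcile_totals(source_rows, staged_rows),
    }
    if all(results.values()):
        published_table.clear()
        published_table.extend(staged_rows)
    return results


if __name__ == "__main__":
    source = [
        {"id": 1, "amount": 2},
        {"id": 1, "amount": 3},
        {"id": 2, "amount": 5},
    ]
    staged = [{"id": 1, "amount": 5}, {"id": 2, "amount": 5}]  # aggregated by id
    published = []
    print(audit_then_publish(staged, source, published))
    print(published)
```

Since a failed audit leaves `published_table` untouched, downstream readers keep seeing the last good data, and the returned results dictionary could be appended to a history table to track check outcomes over time.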
Welcome to Player FM!
Player FM scans the web for high-quality podcasts for you to enjoy right now. It is the best podcast app, and it works on Android, iPhone, and the web. Sign up to sync subscriptions across devices.