התחל במצב לא מקוון עם האפליקציה Player FM !
פודקאסטים ששווה להאזין
בחסות


1 #52: Navigating the effect of AI on marketing jobs and the job market with Sue Keith, Landrum Talent Solutions 19:09
Consistent, Complete Distributed Stream Processing ft. Guozhang Wang
Manage episode 424666800 series 2510642
Stream processing has become an important part of the big data landscape as a new programming paradigm to implement real-time data-driven applications. One of the biggest challenges for streaming systems is to provide correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in streaming processing in Apache Kafka in order to shed light on what a reimagined, modern infrastructure looks like.
In his white paper titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics in his paper:
- Streaming correctness challenges
- Stream processing with Kafka
- Exactly-once in Kafka Streams
For context, accurate, real-time data stream processing is more friendly to modern organizations that are composed of vertically separated engineering teams. Unlike in the past, stream processing was considered as an auxiliary system to normal batch processing oriented systems, often bearing issues around consistency and completeness. While modern streaming engines, such as ksqlDB and Kafka Streams are designed to be authoritative, as the source of truth, and are no longer treated as an approximation, by providing strong correctness guarantees. There are two major umbrellas of the correctness of guarantees:
- Consistency: Ensuring unique and extant records
- Completeness: Ensuring the correct order of records, also referred to as exactly-once semantics.
Guozhang also answers the question of why he wrote this academic paper, as he believes in the importance of knowledge sharing across the community and bringing industry experience back to academia (the paper is also published in SIGMOD 2021, one of the most important conference proceedings in the data management research area). This will help foster the next generation of industry innovation and push one step forward in the data streaming and data management industry. In Guozhang's own words, "Academic papers provide you this proof of concept design, which gets groomed into a big system."
EPISODE LINKS
- White Paper: Rethinking Distributed Stream Processing in Apache Kafka
- Blog: Rethinking Distributed Stream Processing in Apache Kafka
- Enabling Exactly-Once in Kafka Streams
- Why Kafka Streams Does Not Use Watermarks ft. Matthias Sax
- Streams and Tables: Two Sides of the Same Coin
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get $60 of free Confluent Cloud usage (details)
265 פרקים
Manage episode 424666800 series 2510642
Stream processing has become an important part of the big data landscape as a new programming paradigm to implement real-time data-driven applications. One of the biggest challenges for streaming systems is to provide correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in streaming processing in Apache Kafka in order to shed light on what a reimagined, modern infrastructure looks like.
In his white paper titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics in his paper:
- Streaming correctness challenges
- Stream processing with Kafka
- Exactly-once in Kafka Streams
For context, accurate, real-time data stream processing is more friendly to modern organizations that are composed of vertically separated engineering teams. Unlike in the past, stream processing was considered as an auxiliary system to normal batch processing oriented systems, often bearing issues around consistency and completeness. While modern streaming engines, such as ksqlDB and Kafka Streams are designed to be authoritative, as the source of truth, and are no longer treated as an approximation, by providing strong correctness guarantees. There are two major umbrellas of the correctness of guarantees:
- Consistency: Ensuring unique and extant records
- Completeness: Ensuring the correct order of records, also referred to as exactly-once semantics.
Guozhang also answers the question of why he wrote this academic paper, as he believes in the importance of knowledge sharing across the community and bringing industry experience back to academia (the paper is also published in SIGMOD 2021, one of the most important conference proceedings in the data management research area). This will help foster the next generation of industry innovation and push one step forward in the data streaming and data management industry. In Guozhang's own words, "Academic papers provide you this proof of concept design, which gets groomed into a big system."
EPISODE LINKS
- White Paper: Rethinking Distributed Stream Processing in Apache Kafka
- Blog: Rethinking Distributed Stream Processing in Apache Kafka
- Enabling Exactly-Once in Kafka Streams
- Why Kafka Streams Does Not Use Watermarks ft. Matthias Sax
- Streams and Tables: Two Sides of the Same Coin
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get $60 of free Confluent Cloud usage (details)
265 פרקים
Όλα τα επεισόδια
×
1 Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates 11:25

1 How to use Data Contracts for Long-Term Schema Management 57:28

1 How to use Python with Apache Kafka 31:57

1 Next-Gen Data Modeling, Integrity, and Governance with YODA 55:55

1 Migrate Your Kafka Cluster with Minimal Downtime 1:01:30

1 Real-Time Data Transformation and Analytics with dbt Labs 43:41

1 What is the Future of Streaming Data? 41:29

1 What can Apache Kafka Developers learn from Online Gaming? 55:32


1 How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems 50:01

1 What is Data Democratization and Why is it Important? 47:27

1 Git for Data: Managing Data like Code with lakeFS 30:42

1 Using Kafka-Leader-Election to Improve Scalability and Performance 51:06

1 Real-Time Machine Learning and Smarter AI with Data Streaming 38:56
ברוכים הבאים אל Player FM!
Player FM סורק את האינטרנט עבור פודקאסטים באיכות גבוהה בשבילכם כדי שתהנו מהם כרגע. זה יישום הפודקאסט הטוב ביותר והוא עובד על אנדרואיד, iPhone ואינטרנט. הירשמו לסנכרון מנויים במכשירים שונים.