32 subscribers
התחל במצב לא מקוון עם האפליקציה Player FM !
פודקאסטים ששווה להאזין
בחסות


1 Throwing good parties and building community (w/ Priya Parker) 38:16
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools
Manage episode 424666751 series 2510642
Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.
The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant—ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.
Why Use Apache Flink?
The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.
Why use ksqlDB/Kafka Streams?
Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases—those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed on production stream processing applications requiring large scale and state.
There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter.
Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering employment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way such as the tyranny of state and the Turing completeness of SQL.
EPISODE LINKS
- The Future of SQL: Databases Meet Stream Processing
- Building Real-Time Event Streams in the Cloud, On Premises
- Kafka Streams 101 course
- ksqlDB 101 course
- Watch the video version of this podcast
- Kris Jenkins’ Twitter
- Streaming Audio Playlist
- Join the Confluent Community
- Learn more on Confluent Developer
- Use PODCAST100 for additional $100 of Confluent Cloud usage (details)
פרקים
1. Intro (00:00:00)
2. The world of stream processing (00:02:06)
3. Flink vs ksqlDB (00:06:26)
4. Example use case (00:18:34)
5. SQL was built for static data (00:20:03)
6. Concept of event time (00:25:51)
7. Session based window joins (00:29:30)
8. Processing streaming data with SQL (00:35:47)
9. Scaling Kafka Streams/ksqlDB (00:39:47)
10. Exactly-once semantics (00:45:39)
11. Choosing stream processing tools (00:48:15)
12. It's a wrap (00:53:52)
265 פרקים
Manage episode 424666751 series 2510642
Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.
The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant—ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.
Why Use Apache Flink?
The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.
Why use ksqlDB/Kafka Streams?
Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases—those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed on production stream processing applications requiring large scale and state.
There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter.
Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering employment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way such as the tyranny of state and the Turing completeness of SQL.
EPISODE LINKS
- The Future of SQL: Databases Meet Stream Processing
- Building Real-Time Event Streams in the Cloud, On Premises
- Kafka Streams 101 course
- ksqlDB 101 course
- Watch the video version of this podcast
- Kris Jenkins’ Twitter
- Streaming Audio Playlist
- Join the Confluent Community
- Learn more on Confluent Developer
- Use PODCAST100 for additional $100 of Confluent Cloud usage (details)
פרקים
1. Intro (00:00:00)
2. The world of stream processing (00:02:06)
3. Flink vs ksqlDB (00:06:26)
4. Example use case (00:18:34)
5. SQL was built for static data (00:20:03)
6. Concept of event time (00:25:51)
7. Session based window joins (00:29:30)
8. Processing streaming data with SQL (00:35:47)
9. Scaling Kafka Streams/ksqlDB (00:39:47)
10. Exactly-once semantics (00:45:39)
11. Choosing stream processing tools (00:48:15)
12. It's a wrap (00:53:52)
265 פרקים
همه قسمت ها
×
1 Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates 11:25

1 How to use Data Contracts for Long-Term Schema Management 57:28

1 How to use Python with Apache Kafka 31:57

1 Next-Gen Data Modeling, Integrity, and Governance with YODA 55:55

1 Migrate Your Kafka Cluster with Minimal Downtime 1:01:30

1 Real-Time Data Transformation and Analytics with dbt Labs 43:41

1 What is the Future of Streaming Data? 41:29

1 What can Apache Kafka Developers learn from Online Gaming? 55:32

1 How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems 50:01

1 What is Data Democratization and Why is it Important? 47:27

1 Git for Data: Managing Data like Code with lakeFS 30:42

1 Using Kafka-Leader-Election to Improve Scalability and Performance 51:06

1 Real-Time Machine Learning and Smarter AI with Data Streaming 38:56

1 The Present and Future of Stream Processing 31:19

1 Top 6 Worst Apache Kafka JIRA Bugs 1:10:58

1 Learn How Stream-Processing Works The Simplest Way Possible 31:29

1 Building and Designing Events and Event Streams with Apache Kafka 53:06

1 Rethinking Apache Kafka Security and Account Management 41:23

1 Real-time Threat Detection Using Machine Learning and Apache Kafka 29:18

1 Improving Apache Kafka Scalability and Elasticity with Tiered Storage 29:32

1 Decoupling with Event-Driven Architecture 38:38

1 If Streaming Is the Answer, Why Are We Still Doing Batch? 43:58

1 Security for Real-Time Data Stream Processing with Confluent Cloud 48:33

1 Running Apache Kafka in Production 58:44

1 Build a Real Time AI Data Platform with Apache Kafka 37:18

1 Optimizing Apache JVMs for Apache Kafka 1:11:42


1 Application Data Streaming with Apache Kafka and Swim 39:10

1 International Podcast Day - Apache Kafka Edition | Streaming Audio Special 1:02:22


1 Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka 34:07

1 Reddit Sentiment Analysis with Apache Kafka-Based Microservices 35:23

1 Capacity Planning Your Apache Kafka Cluster 1:01:54

1 Streaming Real-Time Sporting Analytics for World Table Tennis 34:29

1 Real-Time Event Distribution with Data Mesh 48:59

1 Apache Kafka Security Best Practices 39:10

1 What Could Go Wrong with a Kafka JDBC Connector? 41:10

1 Apache Kafka Networking with Confluent Cloud 37:22

1 Event-Driven Systems and Agile Operations 53:22

1 Streaming Analytics and Real-Time Signal Processing with Apache Kafka 1:06:33

1 Blockchain Data Integration with Apache Kafka 50:59

1 Automating Multi-Cloud Apache Kafka Cluster Rollouts 48:29

1 Common Apache Kafka Mistakes to Avoid 1:09:43

1 Tips For Writing Abstracts and Speaking at Conferences 48:56

1 How I Became a Developer Advocate 29:48

1 Data Mesh Architecture: A Modern Distributed Data Model 48:42

1 Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools 55:55

1 Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB 33:56


1 Scaling Apache Kafka Clusters on Confluent Cloud ft. Ajit Yagaty and Aashish Kohli 49:07

1 Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic 34:41


1 Optimizing Apache Kafka's Internals with Its Co-Creator Jun Rao 48:54

1 Using Event-Driven Design with Apache Kafka Streaming Applications ft. Bobby Calderwood 51:09

1 Monitoring Extreme-Scale Apache Kafka Using eBPF at New Relic 38:25

1 Confluent Platform 7.1: New Features + Updates 10:01

1 Scaling an Apache Kafka Based Architecture at Therapie Clinic 1:10:56

1 Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs 23:13
ברוכים הבאים אל Player FM!
Player FM סורק את האינטרנט עבור פודקאסטים באיכות גבוהה בשבילכם כדי שתהנו מהם כרגע. זה יישום הפודקאסט הטוב ביותר והוא עובד על אנדרואיד, iPhone ואינטרנט. הירשמו לסנכרון מנויים במכשירים שונים.