32 subscribers
התחל במצב לא מקוון עם האפליקציה Player FM !
פודקאסטים ששווה להאזין
בחסות


What Could Go Wrong with a Kafka JDBC Connector?
Manage episode 424666741 series 2510642
Java Database Connectivity (JDBC) is the Java API used to connect to a database. As one of the most popular Kafka connectors, it's important to prevent issues with your integrations.
In this episode, we'll cover how a JDBC connection works, and common issues with your database connection.
Why the Kafka JDBC Connector?
When it comes to streaming database events into Apache Kafka®, the JDBC connector usually represents the first choice for its flexibility and the ability to support a wide variety of databases without requiring custom code. As an experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of streaming Kafka data pipeline with JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture.
The JDBC connector is a Java API for Kafka Connect, which streams data between databases and Kafka. If you want to stream data from a rational database into Kafka, once per day or every two hours, the JDBC connector is a simple, batch processing connector to use. You can tell the JDBC connector which query you’d like to execute against the database, and then the connector will take the data into Kafka.
The connector works well with out-of-the-box basic data types, however, when it comes to a database-specific data type, such as geometrical columns and array columns in PostgresSQL, these don’t represent well with the JDBC connector. Perhaps, you might not have any results in Kafka because the column is not within the connector’s supporting capability. Francesco shares other cases that would cause the JDBC connector to go wrong, such as:
- Infrequent snapshot times
- Out-of-order events
- Non-incremental sequences
- Hard deletes
To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector has enhanced metadata, timestamps of the operation, access to all logs, and provides sequence numbers for you to speak the language of a DBA.
They also talk about the governance tool, which Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka started his current role as a developer advocate.
EPISODE LINKS
- Kafka Connect Deep Dive – JDBC Source Connector
- JDBC Source Connector: What could go wrong?
- Metadata parser
- Debezium Documentation
- Database Migration with Apache Kafka and Apache Kafka Connect
- Watch the video version of this podcast
- Francesco Tisiot’s Twitter
- Kris Jenkins’ Twitter
- Streaming Audio Playlist
- Join the Confluent Community
- Learn more on Confluent Developer
פרקים
1. Intro (00:00:00)
2. Game of Thrones Sentiment Analysis (00:06:48)
3. Kafka Integration with JDBC Connector (00:11:34)
4. JDBC Connector – Polling Time (00:16:28)
5. Change Data Capture with Debezium (00:20:18)
6. Manage Data Flows with ksqlDB (00:30:01)
7. metadata-parser (00:32:41)
8. Tips on Getting Started with Debezium (00:34:54)
9. It's a wrap (00:39:22)
265 פרקים
Manage episode 424666741 series 2510642
Java Database Connectivity (JDBC) is the Java API used to connect to a database. As one of the most popular Kafka connectors, it's important to prevent issues with your integrations.
In this episode, we'll cover how a JDBC connection works, and common issues with your database connection.
Why the Kafka JDBC Connector?
When it comes to streaming database events into Apache Kafka®, the JDBC connector usually represents the first choice for its flexibility and the ability to support a wide variety of databases without requiring custom code. As an experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of streaming Kafka data pipeline with JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture.
The JDBC connector is a Java API for Kafka Connect, which streams data between databases and Kafka. If you want to stream data from a rational database into Kafka, once per day or every two hours, the JDBC connector is a simple, batch processing connector to use. You can tell the JDBC connector which query you’d like to execute against the database, and then the connector will take the data into Kafka.
The connector works well with out-of-the-box basic data types, however, when it comes to a database-specific data type, such as geometrical columns and array columns in PostgresSQL, these don’t represent well with the JDBC connector. Perhaps, you might not have any results in Kafka because the column is not within the connector’s supporting capability. Francesco shares other cases that would cause the JDBC connector to go wrong, such as:
- Infrequent snapshot times
- Out-of-order events
- Non-incremental sequences
- Hard deletes
To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector has enhanced metadata, timestamps of the operation, access to all logs, and provides sequence numbers for you to speak the language of a DBA.
They also talk about the governance tool, which Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka started his current role as a developer advocate.
EPISODE LINKS
- Kafka Connect Deep Dive – JDBC Source Connector
- JDBC Source Connector: What could go wrong?
- Metadata parser
- Debezium Documentation
- Database Migration with Apache Kafka and Apache Kafka Connect
- Watch the video version of this podcast
- Francesco Tisiot’s Twitter
- Kris Jenkins’ Twitter
- Streaming Audio Playlist
- Join the Confluent Community
- Learn more on Confluent Developer
פרקים
1. Intro (00:00:00)
2. Game of Thrones Sentiment Analysis (00:06:48)
3. Kafka Integration with JDBC Connector (00:11:34)
4. JDBC Connector – Polling Time (00:16:28)
5. Change Data Capture with Debezium (00:20:18)
6. Manage Data Flows with ksqlDB (00:30:01)
7. metadata-parser (00:32:41)
8. Tips on Getting Started with Debezium (00:34:54)
9. It's a wrap (00:39:22)
265 פרקים
모든 에피소드
×
1 Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates 11:25

1 How to use Data Contracts for Long-Term Schema Management 57:28

1 How to use Python with Apache Kafka 31:57

1 Next-Gen Data Modeling, Integrity, and Governance with YODA 55:55

1 Migrate Your Kafka Cluster with Minimal Downtime 1:01:30

1 Real-Time Data Transformation and Analytics with dbt Labs 43:41

1 What is the Future of Streaming Data? 41:29

1 What can Apache Kafka Developers learn from Online Gaming? 55:32

1 How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems 50:01

1 What is Data Democratization and Why is it Important? 47:27

1 Git for Data: Managing Data like Code with lakeFS 30:42

1 Using Kafka-Leader-Election to Improve Scalability and Performance 51:06

1 Real-Time Machine Learning and Smarter AI with Data Streaming 38:56

1 The Present and Future of Stream Processing 31:19

1 Top 6 Worst Apache Kafka JIRA Bugs 1:10:58

1 Learn How Stream-Processing Works The Simplest Way Possible 31:29

1 Building and Designing Events and Event Streams with Apache Kafka 53:06

1 Rethinking Apache Kafka Security and Account Management 41:23

1 Real-time Threat Detection Using Machine Learning and Apache Kafka 29:18

1 Improving Apache Kafka Scalability and Elasticity with Tiered Storage 29:32

1 Decoupling with Event-Driven Architecture 38:38

1 If Streaming Is the Answer, Why Are We Still Doing Batch? 43:58

1 Security for Real-Time Data Stream Processing with Confluent Cloud 48:33

1 Running Apache Kafka in Production 58:44

1 Build a Real Time AI Data Platform with Apache Kafka 37:18

1 Optimizing Apache JVMs for Apache Kafka 1:11:42


1 Application Data Streaming with Apache Kafka and Swim 39:10

1 International Podcast Day - Apache Kafka Edition | Streaming Audio Special 1:02:22


1 Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka 34:07

1 Reddit Sentiment Analysis with Apache Kafka-Based Microservices 35:23

1 Capacity Planning Your Apache Kafka Cluster 1:01:54

1 Streaming Real-Time Sporting Analytics for World Table Tennis 34:29

1 Real-Time Event Distribution with Data Mesh 48:59

1 Apache Kafka Security Best Practices 39:10

1 What Could Go Wrong with a Kafka JDBC Connector? 41:10

1 Apache Kafka Networking with Confluent Cloud 37:22

1 Event-Driven Systems and Agile Operations 53:22

1 Streaming Analytics and Real-Time Signal Processing with Apache Kafka 1:06:33

1 Blockchain Data Integration with Apache Kafka 50:59

1 Automating Multi-Cloud Apache Kafka Cluster Rollouts 48:29

1 Common Apache Kafka Mistakes to Avoid 1:09:43

1 Tips For Writing Abstracts and Speaking at Conferences 48:56

1 How I Became a Developer Advocate 29:48

1 Data Mesh Architecture: A Modern Distributed Data Model 48:42

1 Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools 55:55

1 Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB 33:56


1 Scaling Apache Kafka Clusters on Confluent Cloud ft. Ajit Yagaty and Aashish Kohli 49:07

1 Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic 34:41


1 Optimizing Apache Kafka's Internals with Its Co-Creator Jun Rao 48:54

1 Using Event-Driven Design with Apache Kafka Streaming Applications ft. Bobby Calderwood 51:09

1 Monitoring Extreme-Scale Apache Kafka Using eBPF at New Relic 38:25

1 Confluent Platform 7.1: New Features + Updates 10:01

1 Scaling an Apache Kafka Based Architecture at Therapie Clinic 1:10:56

1 Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs 23:13

1 Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole 42:58

1 Handling 2 Million Apache Kafka Messages Per Second at Honeycomb 41:36


1 Serverless Stream Processing with Apache Kafka ft. Bill Bejeck 42:23

1 The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps 46:32


1 Intro to Event Sourcing with Apache Kafka ft. Anna McDonald 30:14

1 Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela 31:01


1 Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra 30:40

1 From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine 29:50

1 Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik 34:51

1 Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris 34:59

1 Running Hundreds of Stream Processing Applications with Apache Kafka at Wise 31:08
ברוכים הבאים אל Player FM!
Player FM סורק את האינטרנט עבור פודקאסטים באיכות גבוהה בשבילכם כדי שתהנו מהם כרגע. זה יישום הפודקאסט הטוב ביותר והוא עובד על אנדרואיד, iPhone ואינטרנט. הירשמו לסנכרון מנויים במכשירים שונים.