What Could Go Wrong with a Kafka JDBC Connector?

Streaming Audio: Apache Kafka® & Real-Time Data

Added six years ago

Content provided by Confluent, founded by the original creators of Apache Kafka®. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Confluent or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

3 years ago 41:10

The JDBC connector is a Kafka Connect connector, built on Java's JDBC API, that streams data between databases and Kafka. If you want to stream data from a relational database into Kafka once per day or every two hours, the JDBC connector is a simple, batch-oriented choice. You tell the connector which query to execute against the database, and it loads the results into Kafka.
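As a rough illustration of the setup described above, the sketch below builds a JDBC source connector configuration as JSON, the shape typically submitted to a Kafka Connect worker. The property names follow the Confluent JDBC source connector; the connector name, connection details, query, and topic are placeholders assumed for the example, not values from the episode.

```python
import json

# Poll every two hours, matching the batch cadence mentioned above.
TWO_HOURS_MS = 2 * 60 * 60 * 1000

# Every value below is a placeholder chosen for illustration.
jdbc_source_config = {
    "name": "orders-jdbc-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
        "connection.user": "connect_user",
        "connection.password": "change-me",
        # "bulk" reloads the full query result on every poll.
        "mode": "bulk",
        "query": "SELECT id, customer_id, total FROM orders",
        "poll.interval.ms": str(TWO_HOURS_MS),
        "topic.prefix": "shop-orders",
    },
}

# Serialize to the JSON body a Connect worker would receive.
payload = json.dumps(jdbc_source_config, indent=2)
print(payload)
```

Bulk mode is the simplest option but re-reads everything on each poll; the incrementing and timestamp modes reduce that cost at the price of the pitfalls discussed in the episode.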

The connector works well with basic data types out of the box. Database-specific data types, such as geometry and array columns in PostgreSQL, are not represented well by the JDBC connector, and you might get no results in Kafka at all because the column type falls outside what the connector supports. Francesco shares other cases where the JDBC connector can go wrong, such as:

Infrequent snapshot times
Out-of-order events
Non-incremental sequences
Hard deletes
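The failure modes listed above can be reproduced with a toy simulation. This is a minimal sketch that assumes a query-based poll keyed on an incrementing `id` column; the table, the `poll_incrementing` helper, and the row values are hypothetical and only mimic what a polling connector would see.

```python
def poll_incrementing(table, last_seen_id):
    """Return copies of rows with id > last_seen_id, like an incrementing-mode query."""
    rows = sorted(
        (dict(r) for r in table.values() if r["id"] > last_seen_id),
        key=lambda r: r["id"],
    )
    new_last = rows[-1]["id"] if rows else last_seen_id
    return rows, new_last


# In-memory stand-in for a database table.
table = {
    1: {"id": 1, "status": "new"},
    2: {"id": 2, "status": "new"},
}

first_batch, last_id = poll_incrementing(table, 0)

# Between two polls: row 1 is updated twice, row 2 is hard-deleted,
# and row 3 is inserted.
table[1]["status"] = "paid"
table[1]["status"] = "shipped"
del table[2]
table[3] = {"id": 3, "status": "new"}

second_batch, last_id = poll_incrementing(table, last_id)

print([r["id"] for r in first_batch])   # rows seen by the first poll
print([r["id"] for r in second_batch])  # rows seen by the second poll
```

Only the new row shows up in the second poll. Both updates to row 1 and the deletion of row 2 never reach Kafka, which is the gap that log-based change data capture is meant to close.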

To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector provides enhanced metadata, timestamps for each operation, access to all logs, and sequence numbers, so you can speak the language of a DBA.
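To make the contrast with polling concrete, here is a small sketch of consuming Debezium-style change events. The envelope fields (`op`, `before`, `after`, `source`, `ts_ms`) follow Debezium's documented event shape; the sample events, the `lsn` values, and the `apply_change` helper are assumptions made for illustration.

```python
def apply_change(state, event):
    """Apply one Debezium-style change event to an in-memory replica of a table."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        state[row["id"]] = row
    elif op == "d":  # delete: only "before" is populated
        state.pop(event["before"]["id"], None)
    return state


# Hypothetical events, deliberately listed out of order.
events = [
    {"op": "d", "before": {"id": 1, "status": "paid"}, "after": None,
     "ts_ms": 1700000002000, "source": {"lsn": 103}},
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"},
     "ts_ms": 1700000000000, "source": {"lsn": 101}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"},
     "ts_ms": 1700000001000, "source": {"lsn": 102}},
]

# The log sequence number gives a total order, so out-of-order arrival can be repaired.
ordered = sorted(events, key=lambda e: e["source"]["lsn"])

replica = {}
for event in ordered:
    apply_change(replica, event)

print([e["op"] for e in ordered])
print(replica)
```

Every intermediate state and the final delete are visible as separate events, and the sequence number in `source` makes the ordering explicit.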

They also talk about the governance tool that Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka led to his current role as a developer advocate.

Chapters

1. Intro (00:00:00)

2. Game of Thrones Sentiment Analysis (00:06:48)

3. Kafka Integration with JDBC Connector (00:11:34)

4. JDBC Connector – Polling Time (00:16:28)

5. Change Data Capture with Debezium (00:20:18)

6. Manage Data Flows with ksqlDB (00:30:01)

7. metadata-parser (00:32:41)

8. Tips on Getting Started with Debezium (00:34:54)

9. It's a wrap (00:39:22)

265 episodes

#Tech #Tech News #News #Confluent #Event Stream Processing #Data #Event Driven Architecture #Open Source #Data In Motion #Kafka Cloud Native #Data Mesh #Data Pipeline #Serverless Kafka #Podcasting Education #Confluent, original creators of Apache Kafka® #original creators of Apache Kafka® #Apache Kafka® #Cloud IT #Real Time

All episodes

Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates (11:25)

2 years ago

Apache Kafka® 3.5 is here with the capability of previewing migrations from ZooKeeper clusters to KRaft mode. Follow along as Danica Fine highlights key release updates.

Kafka Core:
KIP-833 provides an updated timeline for KRaft.
KIP-866 is now in preview and allows migration from an existing ZooKeeper cluster to KRaft mode.
KIP-900 introduces a way to bootstrap the KRaft controllers with SCRAM credentials.
KIP-903 prevents a data loss scenario by preventing replicas with stale broker epochs from joining the ISR list.
KIP-915 streamlines the process of downgrading Kafka's transaction and group coordinators by introducing tagged fields.

Kafka Connect:
KIP-710 provides the option to use a REST API for internal server communication, enabled by setting `dedicated.mode.enable.internal.rest` to true.
KIP-875 offers support for native offset management in Kafka Connect. Connect cluster administrators can now read offsets for both source and sink connectors. This KIP also adds a new STOPPED state for connectors, enabling users to shut down connectors and maintain connector configurations without utilizing resources.
KIP-894 makes the `IncrementalAlterConfigs` API available for use in MirrorMaker 2 (MM2), adding a new `use.incremental.alter.configs` configuration that takes the values “requested,” “never,” and “required.”
KIP-911 adds a new source tag for metrics generated by the `MirrorSourceConnector` to help monitor mirroring deployments.

Kafka Streams:
KIP-399 improves Kafka Streams' error-handling capabilities by addressing serialization errors that occur before message production and extending the interface for custom error handling.
KIP-889 introduces versioned state stores in Kafka Streams for temporal join semantics in stream-to-table joins.
KIP-904 simplifies table aggregation in Kafka Streams by changing the serialization format to enable one-step aggregation and reduce noise from events with old and new keys/values.
KIP-914 modifies how versioned state stores are used in Kafka Streams. Versioned state stores may affect different DSL processors in varying ways; see the documentation for details.

Kafka Client:
KIP-881 is now complete and introduces new client-side assignor logic for rack-aware balancing of Kafka consumers.
KIP-887 adds the `EnvVarConfigProvider` implementation to Kafka so custom configurations stored in environment variables can be injected into the system by providing the map returned by `System.getenv()`.
KIP-641 introduces the `RecordReader` interface to Kafka's clients module, replacing the deprecated MessageReader Scala trait.

EPISODE LINKS
See release notes for Apache Kafka 3.5
Read the blog to learn more
Download and get started with Apache Kafka 3.5
Watch the video version of this podcast
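For the KIP-875 items in the Kafka 3.5 notes above, the sketch below shows how the new connector operations might be addressed over the Kafka Connect REST API. It only builds the requests and does not send them. The worker address and connector name are assumptions, and the endpoint paths reflect my reading of KIP-875, so check them against the Connect documentation for your version.

```python
from urllib.request import Request

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker address
CONNECTOR = "orders-jdbc-source"       # hypothetical connector name


def stop_request(base_url, name):
    """Build (but do not send) the request that moves a connector to STOPPED."""
    return Request(f"{base_url}/connectors/{name}/stop", method="PUT")


def offsets_request(base_url, name):
    """Build (but do not send) the request that reads a connector's offsets."""
    return Request(f"{base_url}/connectors/{name}/offsets", method="GET")


stop = stop_request(CONNECT_URL, CONNECTOR)
offsets = offsets_request(CONNECT_URL, CONNECTOR)

print(stop.get_method(), stop.full_url)
print(offsets.get_method(), offsets.full_url)

# To actually send a request against a running worker:
#   from urllib.request import urlopen
#   with urlopen(offsets) as response:
#       print(response.read().decode("utf-8"))
```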

A Special Announcement from Streaming Audio (1:18)

2 years ago

After recording 64 episodes and featuring 58 amazing guests, the Streaming Audio podcast series has amassed over 130,000 plays on YouTube in the last year. We're extremely proud of these achievements and feel that it's time to take a well-deserved break. Streaming Audio will be taking a vacation! We want to express our gratitude to you, our valued listeners, for spending 10,000 hours with us on this incredible journey. Rest assured, we will be back with more episodes! In the meantime, feel free to revisit some of our previous episodes. For instance, you can listen to Anna McDonald share her stories about the worst Apache Kafka® bugs she’s ever seen, or listen to Jun Rao offer his expert advice on running Kafka in production. And who could forget the charming backstory behind Mitch Seymour's Kafka storybook, Gently Down the Stream? These memorable episodes brought us joy, and we're thrilled to have shared them with you. As we reflect on our accomplishments with pride, we also look forward to an exciting future. Until we meet again, happy listening! EPISODE LINKS Top 6 Worst Apache Kafka JIRA Bugs Running Apache Kafka in Production Learn How Stream-Processing Works The Simplest Way Possible Watch the video version of this podcast Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

How to use Data Contracts for Long-Term Schema Management (57:28)

2 years ago

Have you ever struggled with managing data long term, especially as the schema changes over time? In order to manage and leverage data across an organization, it’s essential to have well-defined guidelines and standards in place around data quality, enforcement, and data transfer. To get started, Abraham Leal (Customer Success Technical Architect, Confluent) suggests that organizations associate their Apache Kafka® data with a data contract (schema). A data contract is an agreement between a service provider and data consumers. It defines the management and intended usage of data within an organization. In this episode, Abraham talks to Kris about how to use data contracts and schema enforcement to ensure long-term data management. When an organization sends and stores critical and valuable data in Kafka, more often than not it would like to leverage that data in various valuable ways for multiple business units. Kafka is particularly suited for this use case, but it can be problematic later on if the governance rules aren’t established up front. With schema registry, evolution is easy due to its robust security guarantees. When managing data pipelines, you can also use GitOps automation features for an extra control layer. It allows you to be creative with topic versioning, upcasting/downcasting the data collected, and adding quality assurance steps at the end of each run to ensure your project remains reliable. Abraham explains that Protobuf and Avro are the best formats to use rather than XML or JSON because they are built to handle schema evolution. In addition, they have a much lower overhead per-record, so you can save bandwidth and data storage costs by adopting them. There’s so much more to consider, but if you are thinking about implementing or integrating with your data quality team, Abraham suggests that you use schema registry heavily from the beginning. If you have more questions, Kris invites you to join the conversation. 
You can also watch the KOR Financial Current talk Abraham mentions or take Danica Fine’s free course on how to use schema registry on Confluent Developer.

EPISODE LINKS
OS project
KOR Financial Current Talk
The Key Concepts of Schema Registry
Schema Evolution and Compatibility
Schema Registry Made Simple by Confluent Cloud ft. Magesh Nandakumar
Kris Jenkins’ Twitter
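As a rough illustration of the schema evolution this episode discusses, the sketch below checks one common compatibility rule for Avro record schemas: a field added in a new version needs a default so that records written with the old version can still be read. This is a deliberately simplified check with hypothetical schemas, not the full rule set that Schema Registry applies.

```python
def added_fields_missing_defaults(old_schema, new_schema):
    """List fields that are new in new_schema and declare no default value."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


# Hypothetical Avro record schemas, written as plain dicts.
order_v1 = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "total", "type": "double"},
    ],
}

# Adds a field with a default: old records can still be decoded.
order_v2_ok = {
    "type": "record",
    "name": "Order",
    "fields": order_v1["fields"] + [
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}

# Adds a field without a default: old records have nothing to fill it with.
order_v2_bad = {
    "type": "record",
    "name": "Order",
    "fields": order_v1["fields"] + [
        {"name": "currency", "type": "string"},
    ],
}

print(added_fields_missing_defaults(order_v1, order_v2_ok))
print(added_fields_missing_defaults(order_v1, order_v2_bad))
```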

How to use Python with Apache Kafka (31:57)

2 years ago

Can you use Apache Kafka® and Python together? What’s the current state of Python support? And what are the best options to get started? In this episode, Dave Klein joins Kris to talk about all things Kafka and Python: the libraries, the tools, and the pros and cons. He also talks about the new course he just launched to support Python programmers entering the event-streaming world.

Dave has been an active member of the Kafka community for many years and noticed that there were a lot of Kafka resources for Java but few for Python. So he decided to create a course to help people get started using Python and Kafka together. Historically, Java has had the most documentation, and people have often missed how good the Python support is for Kafka users. Python and Kafka are an ideal fit for machine learning applications and data engineering in general, and there are many use cases for building streaming and machine learning pipelines. In fact, a survey of the most popular languages in the Kafka community found that Python came in second after Java. That’s how Dave got the idea to create a course for newcomers.

In this course, Dave combines video lectures with code-heavy exercises to show developers what the code looks like and how to structure its classes and functions, with hands-on practice using the library. He also covers building a producer and a consumer and using the admin client. And, of course, there is a module that covers working with the schemas supported by the Kafka library. Dave explains that Python opens up a world of opportunity and is ripe for expansion. So if you are ready to dive in, head over to developer.confluent.io to learn more about Dave’s course.

EPISODE LINKS
Blog: Getting Started with Python for Apache Kafka
Course: Introduction to Apache Kafka for Python Developers
Step-by-step guide: Building a Python client application for Kafka
Coding in Motion
Building and Designing Events and Event Streams with Apache Kafka
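To give a flavor of the producer code such a course covers, here is a minimal sketch using the `confluent-kafka` Python package. The topic name, broker address, and event shape are assumptions for illustration. The package is imported lazily inside `send_events`, so the encoding helper can be tried without a broker or the library installed.

```python
import json


def encode_event(event):
    """Turn a dict into the (key, value) byte pair that will be produced."""
    key = str(event["order_id"]).encode("utf-8")
    value = json.dumps(event, sort_keys=True).encode("utf-8")
    return key, value


def on_delivery(err, msg):
    """Delivery report callback, invoked from poll() or flush()."""
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")


def send_events(events, topic="orders", bootstrap="localhost:9092"):
    """Produce events to Kafka; topic and broker address are placeholders."""
    from confluent_kafka import Producer  # third-party: pip install confluent-kafka

    producer = Producer({"bootstrap.servers": bootstrap})
    for event in events:
        key, value = encode_event(event)
        producer.produce(topic, key=key, value=value, on_delivery=on_delivery)
        producer.poll(0)  # serve delivery callbacks without blocking
    producer.flush()      # wait for outstanding messages


sample = {"order_id": 7, "total": 12.5}
print(encode_event(sample))
# With a broker running: send_events([sample])
```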

Next-Gen Data Modeling, Integrity, and Governance with YODA (55:55)

2 years ago

In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale. Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets. The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data. This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation. To address these issues, they developed YODA, an internal tool that combines excellent developer experience, DBT, Databricks, Airflow, Looker and more, with a strong CI/CD and orchestration layer. Yotpo is a B2B, SaaS e-commerce marketing platform that provides businesses with the necessary tools for accurate customer analytics, remarketing, support messaging, and more. ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles. 
EPISODE LINKS
Current 2022 Talk: Next Gen Data Modeling in the Open Data Platform
Data Mesh 101
Data Mesh Architecture: A Modern Distributed Data Model

Migrate Your Kafka Cluster with Minimal Downtime (1:01:30)

2 years ago

Migrating Apache Kafka® clusters can be challenging, especially when moving large amounts of data while minimizing downtime. Michael Dunn (Solutions Architect, Confluent) has worked in the data space for many years, designing and managing systems to support high-volume applications. He has helped many organizations strategize, design, and implement successful Kafka cluster migrations between different environments. In this episode, Michael shares some tips about Kafka cluster migration with Kris, including the pros and cons of the different tools he recommends. Michael explains that there are many reasons why companies migrate their Kafka clusters. For example, they may want to modernize their platforms, move to a self-hosted cloud server, or consolidate clusters. He tells Kris that creating a plan and selecting the right tool before getting started is critical for reducing downtime and minimizing migration risks. The good news is that a few tools can facilitate moving large amounts of data, topics, schemas, applications, connectors, and everything else from one Apache Kafka cluster to another. Kafka MirrorMaker/MirrorMaker2 (MM2) is a stand-alone tool for copying data between two Kafka clusters. It uses source and sink connectors to replicate topics from a source cluster into the destination cluster. Confluent Replicator allows you to replicate data from one Kafka cluster to another. Replicator is similar to MM2, but the difference is that it’s been battle-tested. Cluster Linking is a powerful tool offered by Confluent that allows you to mirror topics from an Apache Kafka 2.4/Confluent Platform 5.4 source cluster to a Confluent Platform 7+ cluster in a read-only state, and is available as a fully-managed service in Confluent Cloud. At the end of the day, Michael stresses that coupled with a well-thought-out strategy and the right tool, Kafka cluster migration can be relatively painless. 
Following his advice, you should be able to keep your system healthy and stable before and after the migration is complete.

EPISODE LINKS
MirrorMaker 2
Replicator
Cluster Linking
Schema Migration
Multi-Cluster Apache Kafka with Cluster Linking

Real-Time Data Transformation and Analytics with dbt Labs (43:41)

2 years ago

dbt is known as being part of the Modern Data Stack for ELT processes. Being in the MDS, dbt Labs believes in having the best of breed for every part of the stack. Oftentimes folks are using an EL tool like Fivetran to pull data from the database into the warehouse, then using dbt to manage the transformations in the warehouse. Analysts can then build dashboards on top of that data, or execute tests. It’s possible for an analyst to adapt this process for use with a microservice application using Apache Kafka® and the same method to pull batch data out of each and every database; however, in this episode, Amy Chen (Partner Engineering Manager, dbt Labs) tells Kris about a better way forward for analysts willing to adopt the streaming mindset: Reusable pipelines using dbt models that immediately pull events into the warehouse and materialize as materialized views by default. dbt Labs is the company that makes and maintains dbt. dbt Core is the open-source data transformation framework that allows data teams to operate with software engineering’s best practices. dbt Cloud is the fastest and most reliable way to deploy dbt. Inside the world of event streaming, there is a push to expand data access beyond the programmers writing the code, and towards everyone involved in the business. Over at dbt Labs they’re attempting something of the reverse— to get data analysts to adopt the best practices of software engineers, and more recently, of streaming programmers. They’re improving the process of building data pipelines while empowering businesses to bring more contributors into the analytics process, with an easy to deploy, easy to maintain platform. It offers version control to analysts who traditionally don’t have access to git, along with the ability to easily automate testing, all in the same place. 
In this episode, Kris and Amy explore:
How to revolutionize testing for analysts with two of dbt’s core functionalities
What streaming in a batch-based analytics world should look like
What can be done to improve workflows
How to democratize access to data for everyone in the business

EPISODE LINKS
Learn more about dbt labs
An Analytics Engineer’s Guide to Streaming
Panel discussion: If Streaming Is the Answer, Why Are We Still Doing Batch?
All Current 2022 sessions and slides

What is the Future of Streaming Data? (41:29)

2 years ago

What’s the next big thing in the future of streaming data? In this episode, Greg DeMichillie (VP of Product and Solutions Marketing, Confluent) talks to Kris about the future of stream processing in environments where the value of data lies in their ability to intercept and interpret data. Greg explains that organizations typically focus on the infrastructure containers themselves, and not on the thousands of data connections that form within. When they finally realize that they don't have a way to manage the complexity of these connections, a new problem arises: how do they approach managing such complexity? That’s where Confluent and Apache Kafka® come into play - they offer a consistent way to organize this seemingly endless web of data so they don't have to face the daunting task of figuring out how to connect their shopping portals or jump through hoops trying different ETL tools on various systems. As more companies seek ways to manage this data, they are asking some basic questions: How to do it? Do best practices exist? How can we get help? The next question for companies who have already adopted Kafka is a bit more complex: "What about my partners?” For example, companies with inventory management systems use supply chain systems to track product creation and shipping. As a result, they need to decide which emails to update, if they need to write custom REST APIs to sit in front of Kafka topics, etc. Advanced use cases like this raise additional questions about data governance, security, data policy, and PII, forcing companies to think differently about data. Greg predicts this is the next big frontier as more companies adopt Kafka internally. And because they will have to think less about where the data is stored and more about how data moves, they will have to solve problems to make managing all that data easier. 
If you're an enthusiast of real-time data streaming, Greg invites you to attend the Kafka Summit (London) in May and Current (Austin, TX) for a deeper dive into the world of Apache Kafka-related topics now and beyond.

EPISODE LINKS
What’s Ahead of the Future of Data Streaming?
If Streaming Is the Answer, Why Are We Still Doing Batch?
All Current 2022 sessions and slides
Kafka Summit London 2023
Current 2023

What can Apache Kafka Developers learn from Online Gaming? (55:32)

2 years ago

What can online gaming teach us about making large-scale event management more collaborative in real-time? Ben Gamble (Developer Relations Manager, Aiven) has come to the world of real-time event streaming from an unusual source: the video games industry. And if you stop to think about it, modern online games are complex, distributed real-time data systems with decades of innovative techniques to teach us.

In this episode, Ben talks with Kris about integrating gaming concepts with Apache Kafka®. Using Kafka’s state management stream processing, Ben has built systems that can handle real-time event processing at a massive scale, including interesting approaches to conflict resolution and collaboration.

Building latency into a system is one way to mask data processing time. Ben says that you can efficiently hide latency issues and prioritize performance improvements by setting an initial target and then optimizing from there. If you measure before optimizing, you can add an extra layer to manage user expectations better. Tricks like adding a visual progress bar give the appearance of progress but actually hide latency and improve the overall user experience.

To effectively handle challenging activities, like resolving conflicts and atomic edits, Ben suggests “slicing” (or nano batching) to break down tasks into small, related chunks. Slicing allows each task to be evaluated separately, thus producing timely outcomes that resolve potential background conflicts without the user knowing.

Ben also explains how he uses pooling to make collaboration seamless. Pooling is a process that links open requests with potential matches. Similar to booking seats on an airplane, seats are assigned when requests are made. As these types of connections are handled through a Kafka event stream, the initial open requests are eventually fulfilled when seats become available.

According to Ben, real-world tools that facilitate collaboration (such as Google Docs and Slack) work similarly. Just like multi-player gaming systems, multiple users can comment or chat in real-time and users perceive instant responses because of the techniques ported over from the gaming world.

As Ben sees it, the proliferation of these types of concepts across disciplines will also benefit a more significant number of collaborative systems. Despite being long established for gamers, these patterns can be implemented in more business applications to improve the user experience significantly.

EPISODE LINKS
Going Multiplayer With Kafka (Current 2022)
Building a Dependable Real-Time Betting App with Confluent Cloud and Ably
Event Streaming Patterns
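The “slicing” (nano batching) idea from this episode can be sketched in a few lines. The version below groups edit events into fixed-width time slices and resolves conflicts inside each slice. The 50 ms slice width, the event fields, and the last-writer-wins policy are all assumptions chosen for illustration, not details given in the episode.

```python
from collections import defaultdict


def slice_events(events, slice_ms=50):
    """Group events into fixed-width time slices keyed by slice index."""
    slices = defaultdict(list)
    for event in events:
        slices[event["ts_ms"] // slice_ms].append(event)
    return dict(slices)


def resolve_slice(events):
    """Keep the latest edit per target within one slice (assumed policy)."""
    winners = {}
    for event in sorted(events, key=lambda e: e["ts_ms"]):
        winners[event["target"]] = event
    return winners


# Hypothetical concurrent edits from three users.
edits = [
    {"user": "ann", "target": "cell-A1", "value": "10", "ts_ms": 5},
    {"user": "bob", "target": "cell-A1", "value": "12", "ts_ms": 30},
    {"user": "cy", "target": "cell-B2", "value": "x", "ts_ms": 40},
    {"user": "ann", "target": "cell-A1", "value": "11", "ts_ms": 70},
]

resolved = {
    index: resolve_slice(batch)
    for index, batch in sorted(slice_events(edits).items())
}

for index, winners in resolved.items():
    print(index, {target: e["value"] for target, e in winners.items()})
```

Each slice is small enough to resolve quickly, so conflicting edits are settled in the background while users keep working.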

Apache Kafka 3.4 - New Features & Improvements (5:13)

2 years ago

Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent), shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect. In Kafka Core: KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that consumer was a part of. KIP-830 includes a new configuration setting that allows you to disable the JMX reporter for environments where it’s not being used. KIP-854 introduces changes to clean up producer IDs more efficiently, to avoid excess memory usage. It introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs. KIP-866 (early access) provides a bridge to migrate between existing Zookeeper clusters to new KRaft mode clusters, enabling the migration of existing metadata from Zookeeper to KRaft. KIP-876 adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is 1 hour. KIP-881 , an extension of KIP-392, makes it so that consumers can now be rack-aware when it comes to partition assignments and consumer rebalancing. In Kafka Streams: KIP-770 updates some Kafka Streams configs and metrics related to the record cache size. KIP-837 allows users to multicast result records to every partition of downstream sink topics and adds functionality for users to choose to drop result records without sending. And finally, for Kafka Connect: KIP-787 allows users to run MirrorMaker2 with custom implementations for the Kafka resource manager and makes it easier to integrate with your ecosystem. Tune in to learn more about the Apache Kafka 3.4 release! 
EPISODE LINKS See release notes for Apache Kafka 3.4 Read the blog to learn more Download Apache Kafka 3.4 and get started Watch the video version of this podcast Join the Community…

How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems (50:01)

2 years ago

How can you use OpenTelemetry to gain insight into your Apache Kafka® event systems? Roman Kolesnev, Staff Customer Innovation Engineer at Confluent, is a member of the Customer Solutions & Innovation Division Labs team working to build business-critical OpenTelemetry applications so companies can see what’s happening inside their data pipelines. In this episode, Roman joins Kris to discuss tracing and monitoring in distributed systems using OpenTelemetry. He talks about how monitoring each step of the process individually is critical to discovering potential delays or bottlenecks before they cause problems. That includes keeping track of timestamps, latency information, exceptions, and other data points that could help with troubleshooting. Tracing each request and its journey to completion in Kafka gives companies data that provides insight into system performance and reliability. Furthermore, using this data allows engineers to quickly identify errors or anticipate potential issues before they become significant problems. With greater visibility comes better control over application health, all made possible by OpenTelemetry's unified APIs and services. As described on the OpenTelemetry.io website, "OpenTelemetry is a Cloud Native Computing Foundation incubating project. Formed through a merger of the OpenTracing and OpenCensus projects." It provides a vendor-agnostic way for developers to instrument their applications across different platforms and programming languages while adhering to standard semantic conventions so the traces/information can be streamed to compatible systems following similar specs. By leveraging OpenTelemetry, organizations can ensure their applications and systems are secure and perform optimally. It will quickly become an essential tool for large-scale organizations that need to efficiently process massive amounts of real-time data. 
With its ability to scale independently, robust analytics capabilities, and powerful monitoring tools, OpenTelemetry is set to become the go-to choice for observability in stream processing in the future. Roman explains that the OpenTelemetry APIs for Kafka are still in development and not yet available as open source. The code is complete and tested but has never run in production. But if you want to learn more about the nuts and bolts, he invites you to connect with him on the Confluent Community Slack channel. You can also check out Monitoring Kafka without instrumentation with eBPF - Antón Rodríguez to learn more about a similar approach for domain monitoring. EPISODE LINKS OpenTelemetry java instrumentation OpenTelemetry collector Distributed Tracing for Kafka with OpenTelemetry Monitoring Kafka without instrumentation with eBPF Kris Jenkins' Twitter Watch the video Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get $100 of free Confluent Cloud usage ( details )…
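To make the idea of tracing a request across Kafka hops more concrete, here is a minimal sketch of carrying trace context in message headers. It is not the code discussed in the episode. It only assumes the W3C Trace Context `traceparent` format (version, trace ID, span ID, flags), which OpenTelemetry propagators use by default; the function names are hypothetical and no OpenTelemetry library is involved.

```python
import secrets


def new_traceparent(sampled=True):
    """Build a W3C Trace Context value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent):
    """Keep the trace ID of the parent, mint a new span ID for the next hop."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def with_trace_header(headers, traceparent):
    """Return Kafka-style headers (a list of (key, bytes) tuples) that carry
    the trace context, replacing any earlier traceparent entry."""
    kept = [(key, value) for key, value in headers if key != "traceparent"]
    return kept + [("traceparent", traceparent.encode("ascii"))]
```

A producer would attach the header when sending, and each consumer would derive a child value before producing onward, so every hop shares one trace ID while recording its own span.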

What is Data Democratization and Why is it Important? 47:27

2 years ago

Data democratization allows everyone in an organization to have access to the data they need, and the tools needed to use this data effectively. In short, data democratization enables better business decisions. In this episode, Rama Ryali, a Senior IT and Data Executive, chats with Kris Jenkins about the importance of data democratization in modern systems. Rama explains that tech has unprecedented control over data and ignores basic business needs. Tech’s influence has largely gone unchecked and has led to a disconnect that often forces businesses to hire outside vendors for help turning their data into information they can use. In his role at RightData, Rama worked closely with Marketing, Sales, Customers, and Leadership to develop a no-code unified data platform that is accessible to everyone and fosters data democratization. So what is data democracy anyway? Rama explains that data democratization is the process of making data more accessible and open to a wider audience in a unified, no-code UI. It involves making sure that data is available to people who need it, regardless of their technical expertise or background. This enables businesses to make data-driven decisions faster and reduces the costs associated with acquiring, processing, and storing information. In addition, by allowing more people access to data, organizations can better collaborate and access tools that allow them to gain valuable insights into their operations and gain a competitive edge in the marketplace. In a perfect world, complicated tools supported by SQL, Excel, etc., with static views of data, will be replaced by a UI that anyone can use to analyze real-time streaming data. Kris coined a phrase, “data socialization,” which describes the way that these types of tools can enable human connections across all areas of the organization, not just tech. 
Rama acknowledges that Excel, SQL, and other dev-heavy platforms will never go away, but the future of data democracy will allow businesses to unlock the maximum value of data through an iterative, democratic process where people talk about what the data is, what matters to other people, and how to transmit it in a way that makes sense. EPISODE LINKS RightData LinkedIn The 5 W’s of Metadata by Rama Ryali Real-Time Machine Learning and Smarter AI with Data Streaming Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Git for Data: Managing Data like Code with lakeFS 30:42

3 years ago

Is it possible to manage and test data like code? lakeFS is an open-source data version control tool that transforms object storage into Git-like repositories, offering teams a way to use the same workflows for code and data. In this episode, Kris sits down with guest Adi Polak, VP of DevX at Treeverse, to discuss how lakeFS can be used to facilitate better management and testing of data. At its core, lakeFS provides teams with better data management. A theoretical data engineer on a large team runs a script to delete some data, but a bug in the script accidentally deletes a lot more data than intended. Application engineers can check out the main branch, effectively erasing their mistakes, but without a tool like lakeFS, this data engineer would be in a lot of trouble. Polak is quick to explain that lakeFS isn’t built on Git. The source code behind an application is usually a few dozen megabytes, while lakeFS is designed to handle petabytes of data; however, it does use Git-like semantics to create and access versions so adoption is quick and simple. Another big challenge that lakeFS helps teams tackle is reproducibility. Troubleshooting when and where a corruption in the data first appeared can be a tricky task for a data engineer when data is constantly updating. With lakeFS, engineers can refer to snapshots to see where the product was corrupted, and roll back to that exact state. lakeFS also assists teams with reprocessing of historical data. With lakeFS, data can be reprocessed on an isolated branch before merging, to ensure the reprocessed data is exposed atomically. It also makes it easier to access the different versions of reprocessed data using any tag or a historical commit ID. Tune in to hear more about the benefits of lakeFS. EPISODE LINKS Adi Polak's Twitter lakeFS Git-for-data GitHub repo What is a Merkle Tree? If Streaming Is the Answer, Why Are We Still Doing Batch? 
Current 2022 sessions and slides Sign up for updates on Current 2023 Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Using Kafka-Leader-Election to Improve Scalability and Performance 51:06

3 years ago

How does leader election work in Apache Kafka®? For the past 2 ½ years, Adithya Chandra, Staff Software Engineer at Confluent, has been working on Kafka scalability and performance, specifically partition leader election. In this episode, he gives Kris Jenkins a deep dive into the power of leader election in Kafka replication, why we need it, how it works, what can go wrong, and how it's being improved. Adithya explains that you can configure a certain number of replicas to be distributed across Kafka brokers and then set one of them as the elected leader - the others become followers. This leader-based model proves efficient because clients only have to write to the leader, who handles the replication process internally. But what happens when a broker goes offline, when a replica reassignment occurs, or when a broker shuts down? Adithya explains that when these triggers occur, one of the followers becomes the elected leader, and all the other replicas take their cue from the new leader. This failover reassignment ensures that messages are replicated effectively and efficiently with multiple copies across different brokers. Adithya explains how you can select a broker as the preferred election leader. The preferred leader then becomes the new leader in failure events. This reduces latency and ensures messages consistently write to the same broker for easier tracking and debugging. Leader failover cannot cover all failures, Adithya says. If a broker can’t be reached externally but can talk to other brokers in the cluster, leader failover won’t be triggered. If a broker experiences transient disk or network issues, the leader election process might fail, and the broker will not be elected as a leader. In both cases, manual intervention is required. Leadership priority is an important feature of Confluent Cloud that allows you to prioritize certain brokers over others and specify which broker is most likely to become the leader in case of a failover. 
This way, we can prioritize certain brokers to ensure that the most reliable broker handles more important and sensitive replication tasks. Additionally, this feature ensures that replication remains consistent and available even in an unexpected failure event. Improvements to this component of Kafka will enable it to be applied to a wide variety of scenarios. On-call engineers can use it to mitigate single-broker performance issues while debugging. Network and storage health solutions can use it to prioritize brokers. Adithya explains that preferred leader election and leadership failover ensure data is available and consistent during failure scenarios so that Kafka replication can run smoothly and efficiently. EPISODE LINKS Data Plane: Replication Protocol Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra Watch the video Kris Jenkins’ Twitter Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…
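The failover rule described above can be sketched as a small selection function. This is a simplified illustration, not Kafka's controller code: it assumes the first replica in the assignment list is the preferred leader, and that the new leader is the first assigned replica that is both in sync and on a live broker. The function name and the broker IDs in the example are hypothetical.

```python
def elect_leader(replicas, in_sync, live_brokers):
    """Pick a leader for one partition.

    replicas:     assigned broker IDs in order; replicas[0] is the preferred leader
    in_sync:      set of broker IDs whose replica is currently in sync
    live_brokers: set of broker IDs that are currently reachable
    Returns the chosen broker ID, or None if no eligible replica exists.
    """
    for broker in replicas:
        if broker in in_sync and broker in live_brokers:
            return broker
    return None  # no eligible replica: the partition is offline


# Broker 1 is preferred; when it goes offline, broker 2 takes over.
assignment = [1, 2, 3]
healthy = elect_leader(assignment, {1, 2, 3}, {1, 2, 3})
failover = elect_leader(assignment, {1, 2, 3}, {2, 3})
```

Because the walk always starts from the front of the assignment list, leadership returns to the preferred broker once it is back in sync and a preferred election is triggered.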

Real-Time Machine Learning and Smarter AI with Data Streaming 38:56

3 years ago

Are bad customer experiences really just data integration problems? Can real-time data streaming and machine learning be democratized in order to deliver a better customer experience? Airy, an open-source data-streaming platform, uses Apache Kafka® to help business teams deliver better results to their customers. In this episode, Airy CEO and co-founder Steffen Hoellinger explains how his company is expanding the reach of stream-processing tools and ideas beyond the world of programmers. Airy originally built Conversational AI (chatbot) software and other customer support products for companies to engage with their customers in conversational interfaces. Asynchronous messaging created a large amount of traffic, so the company adopted Kafka to ingest and process all messages & events in real time. In 2020, the co-founders decided to open source the technology, positioning Airy as an open source app framework for conversational teams at large enterprises to ingest and process conversational and customer data in real time. The decision was rooted in their belief that all bad customer experiences are really data integration problems, especially at large enterprises where data often is siloed and not accessible to machine learning models and human agents in real time. (Who hasn’t had the experience of entering customer data into an automated system, only to have the same data requested eventually by a human agent?) Airy is making data streaming universally accessible by supplying its clients with real-time data and offering integrations with standard business software. For engineering teams, Airy can reduce development time and increase the robustness of solutions they build. Data is now the cornerstone of most successful businesses, and real-time use cases are becoming more and more important. 
Open-source app frameworks like Airy are poised to drive massive adoption of event streaming over the years to come, across companies of all sizes, and maybe, eventually, down to consumers. EPISODE LINKS Learn how to deploy Airy Open Source - or sign up for an Airy Cloud test instance Google Case Study about Airy & TEDi, a 2,000 store retailer Become an Expert in Conversational Engineering Supercharging conversational AI with human agent feedback loops Integrating all Communication and Customer Data with Airy and Confluent How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka Real-Time Threat Detection Using Machine Learning and Apache Kafka Watch the video Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get $100 of free Confluent Cloud usage ( details )…

International Podcast Day - Apache Kafka Edition | Streaming Audio Special 1:02:22

3 years ago

What’s your favorite podcast? Would you like to find some new ones? In celebration of International Podcast Day, Kris Jenkins invites 12 experts from the Apache Kafka® community to talk about their favorite podcasts. Unlike other episodes where guests educate developers and tell stories about Kafka, its surrounding technological ecosystem, or the Cloud, this special episode provides a glimpse into what these guests have learned through listening to podcasts that you might also find interesting. Through a virtual international tour, Kris chatted with Bill Bejeck (Integration Architect, Confluent), Nikoleta Verbeck (Senior Solutions Engineer, CSID, Confluent), Ben Stopford (Lead Technologist, OCTO, Confluent), Noelle Gallagher (Video Producer, Editor), Danica Fine (Senior Developer Advocate, Confluent), Tim Berglund (VP, Developer Relations, StarTree), Ben Ford (Founder and CEO, Commando Development), Jeff Bean (Group Manager, Technical Marketing, Confluent), Domenico Fioravanti (Director of Engineering, Therapie Clinic), Francesco Tisiot (Senior Developer Advocate, Aiven), Robin Moffatt (Principal, Developer Advocate, Confluent), and Simon Aubury (Principal Data Engineer, ThoughtWorks). They share recommendations covering a wide range of topics such as building distributed systems, travel, data engineering, Greek mythology, data mesh, economics, and music and the arts. EPISODE LINKS Common Apache Kafka Mistakes to Avoid Flink vs Kafka Streams/ksqlDB Why Data Mesh ft. Ben Stopford Practical Data Pipeline ft. Danica Fine What Could Go Wrong with a Kafka JDBC Connector? Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt Serverless Stream Processing with Apache Kafka ft. Bill Bejeck Scaling an Apache Kafka-Based Architecture at Therapie Clinic Event-Driven Systems and Agile Operations Real-Time Stream Processing, Monitoring, and Analytics with Apache Kafka Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

How to Build a Reactive Event Streaming App - Coding in Motion 1:26

3 years ago

How do you build an event-driven application that can react to real-time data streams as they happen? Kris Jenkins (Senior Developer Advocate, Confluent) will be hosting another fun, hands-on programming workshop, Coding in Motion: Watching the River Flow, to demonstrate how you can build a reactive event streaming application with Apache Kafka® and ksqlDB, using Python. As a developer advocate, Kris often speaks at conferences, and the presentation will be available on-demand through the organizer’s YouTube channel. The desire to read comments and be able to interact with the community motivated Kris to set up a real-time event streaming application that would notify him on his mobile phone. During the workshop, Kris will demonstrate the end-to-end process of using Python to process and stream data from YouTube’s REST API into a Kafka topic, analyze the data with ksqlDB, and then stream data out via Telegram. After the workshop, you’ll be able to use the recipe to build your own event-driven data application. EPISODE LINKS Coding in Motion: Building a Reactive Data Streaming App Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka 34:07

3 years ago

Processing real-time event streams enables countless use cases big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events. In this episode, Simon shares his Confluent Hackathon ’22 winning project—a wildlife monitoring system to observe population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges. Open-source, object detection models for TensorFlow, which appropriately are collected into "model zoos," meant that Simon didn't have to provide his own object identification as part of the project, which would have made it untenable. Instead, he was able to utilize the open-source models, which are essentially neural nets pretrained on relevant data sets—in his case, backyard animals. Simon's system, which consists of around 200 lines of code, employs a Kafka producer running a while loop, which connects to a camera feed using a Python library. For each frame brought down, object masking is applied in order to crop and reduce pixel density, and then the frame is compared to the models mentioned above. A Python dictionary containing probable found objects is sent to a Kafka broker for processing; the images themselves aren't sent. (Note that Simon's system is also capable of alerting if a specific, rare animal is detected.) On the broker, Simon uses ksqlDB and windowing to smooth the data in case the frames were inconsistent for some reason (it may look back over thirty seconds, for example, and find the highest number of animals per type). 
Finally, the data is sent to a Kibana dashboard for analysis, through a Kafka Connect sink connector. Simon’s system is an extremely low-cost system that can simulate the behaviors of more expensive, proprietary systems. And the concepts can easily be applied to many other use cases. For example, you could use it to estimate traffic at a shopping mall to gauge optimal opening hours, or you could use it to monitor the queue at a coffee shop, counting both queued patrons as well as impatient patrons who decide to leave because the queue is too long. EPISODE LINKS Real-Time Wildlife Monitoring with Apache Kafka Wildlife Monitoring Github ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together Event-Driven Architecture - Common Mistakes and Valuable Lessons Motion in Motion: Building an End-to-End Motion Detection and Alerting System with Apache Kafka and ksqlDB Watch the video version of this podcast Kris Jenkins’ Twitter Learn more on Confluent Developer Use PODCAST100 to get $100 of free Confluent Cloud usage ( details )…
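The smoothing step described above (taking the highest count per animal type over a thirty-second span) can be illustrated in plain Python. This is only a sketch of the windowing idea, not Simon's ksqlDB query: it assumes fixed, non-overlapping (tumbling) windows, and the function name and sample animals are hypothetical.

```python
from collections import defaultdict


def max_count_per_window(detections, window_seconds=30):
    """Smooth noisy per-frame detections.

    detections: iterable of (timestamp_seconds, {animal: count}) pairs,
                one pair per processed camera frame.
    Returns {(window_start, animal): highest count seen in that window}.
    """
    result = defaultdict(int)
    for timestamp, counts in detections:
        # Align each frame to the start of its tumbling window.
        window_start = int(timestamp // window_seconds) * window_seconds
        for animal, count in counts.items():
            key = (window_start, animal)
            result[key] = max(result[key], count)
    return dict(result)


frames = [
    (0, {"kookaburra": 1}),
    (10, {"kookaburra": 2, "possum": 1}),
    (29, {"kookaburra": 1}),
    (31, {"possum": 3}),
]
smoothed = max_count_per_window(frames)
```

Taking the maximum rather than the latest value means a single frame where an animal is briefly hidden does not drag the reported count down.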

Reddit Sentiment Analysis with Apache Kafka-Based Microservices 35:23

3 years ago

How do you analyze Reddit sentiment with Apache Kafka® and microservices? Bringing the fresh perspective of someone who is both new to Kafka and the industry, Shufan Liu, nascent Developer Advocate at Confluent, discusses projects he has worked on during his summer internship—a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to quickly get up to speed with the tools in the Kafka ecosystem and to start building something productive early on in your journey. Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog— Data Enrichment in Existing Data Pipelines Using Confluent Cloud . The second project Shufan presents is a sentiment analysis system that gathers data from a given subreddit, then assigns the data a sentiment score. He points out that its results would be hard to duplicate manually by simply reading through a subreddit—you really need the assistance of AI. 
The project consists of four microservices:
- A user input service that collects requests in a Kafka topic, which consist of the desired subreddit, along with the dates between which data should be collected
- An API polling service that fetches the requests from the user input service, collects the relevant data from the Reddit API, then appends it to a new topic
- A sentiment analysis service that analyzes the appended topic from the API polling service using the Python library NLTK; it calculates averages with ksqlDB
- A results-displaying service that consumes from a topic with the calculations
Interesting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it. EPISODE LINKS Data Enrichment in Existing Data Pipelines Using Confluent Cloud Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Capacity Planning Your Apache Kafka Cluster 1:01:54

3 years ago

How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance? When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data) begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself, determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, and potential cost. Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby; this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures. Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars). Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. 
Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic, or in how many consumers they'll actually need, for example. Latency must also be considered: for example, if the compression on the producer's side doesn't match compression on the consumer's side, latency will increase. Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to be well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts. Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective. EPISODE LINKS Capacity Planning and Sizing for Kafka Streams Tales from the Frontline of Apache Kafka DevOps Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more on Confluent Developer Use PODCAST100 to get $100 of free Cloud usage ( details )…
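The point about thinking in network traffic rather than message counts can be shown with back-of-the-envelope arithmetic. The sketch below is an illustration, not Jason's method: the function name, the peak multiplier, and the sample figures are all assumptions, and it ignores compression, protocol overhead, and retention limits.

```python
def estimate_throughput(messages_per_day, avg_message_bytes,
                        peak_factor=1.0, replication_factor=3,
                        consumer_groups=1):
    """Turn a daily message count into rough bytes-per-second figures."""
    seconds_per_day = 86_400
    avg_in = messages_per_day * avg_message_bytes / seconds_per_day
    peak_in = avg_in * peak_factor  # how much busier the peak is than average
    return {
        "avg_ingress_bytes_per_sec": avg_in,
        "peak_ingress_bytes_per_sec": peak_in,
        # Each follower replica fetches a full copy from the leader.
        "peak_replication_bytes_per_sec": peak_in * (replication_factor - 1),
        # Each consumer group reads the full stream once.
        "peak_egress_bytes_per_sec": peak_in * consumer_groups,
        # Raw bytes written per day across all replicas.
        "disk_bytes_per_day": messages_per_day * avg_message_bytes * replication_factor,
    }


# Hypothetical workload: 86.4 million 1 KB messages a day, peaks 3x average.
estimate = estimate_throughput(86_400_000, 1_000, peak_factor=3.0,
                               replication_factor=3, consumer_groups=2)
```

Even this crude model shows why message counts mislead: in this example, replication and consumer reads together move four times the peak ingress traffic across the network.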

Streaming Real-Time Sporting Analytics for World Table Tennis 34:29

3 years ago

Reimagining a data architecture to provide real-time data flow for sporting events can be complicated, especially for organizations with as much data as World Table Tennis (WTT). Vatsan Rama (Director of IT, ITTF Group) shares why real-time data is essential in the sporting world and how his team reengineered their data system in 18 months, moving from a solely on-premises infrastructure to a cloud-native data system that uses Confluent Cloud with Apache Kafka® as its central nervous system. World Table Tennis is a business created by the International Table Tennis Federation (ITTF) to manage the official professional Table Tennis series of events and its commercial rights. World Table Tennis is also leading the sport's digital transformation and commercializes its software application for real-time event scoring worldwide. Previously, ITTF scoring was processed manually with a desktop-based, on-venue results system (OVR), an on-premises solution to process match data that calculated rankings and records, then sent event information to other systems, such as scoreboards. To provide match status in real time, which makes the sport more engaging for fans and adds a competitive edge for players, Vatsan reengineered their OVR system to allow instant data sync between on-premises competition systems and the cloud. The redesign started by establishing an event-driven architecture with Kafka that consolidates all legacy data sources, including records in Excel along with some handwritten forms (some dating back 90 years, even including records from the 1930 World Championship). To reduce operational overhead and maintenance, the team decided to stream data through fully managed Kafka as a service on Azure, for a scalable, distributed infrastructure. 
Vatsan shares that multiple table tennis events can run in parallel globally, and every time an umpire marks scores in a table, the data moves from the venue into Confluent Cloud, and then the score and rankings are sent to betting organizations and individuals on their mobile apps. EPISODE LINKS Event Processing Application Fully Managed Apache Kafka on Azure Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Real-Time Event Distribution with Data Mesh 48:59

3 years ago

Inheriting software in the banking sector can be challenging. Perhaps the only thing harder is inheriting software built by a committee of banks. How do you keep it running, while improving it, refactoring it, and planning a bigger future for it? In this episode, Jean-Francois Garet (Technical Architect, Symphony) shares his experience at Symphony as he helps it evolve from an inherited, monolithic, single-tenant architecture to an event mesh for seamless event-streaming microservices. He talks about the journey they’ve taken so far, and the foundations they’ve laid for a modern data mesh. Symphony is the leading markets’ infrastructure and technology platform, which provides a full communication stack (chat, voice and video meetings, file and screen sharing) for the financial industry. Jean-Francois shares that its initial system was inherited from one of the founding institutions—and features the highest level of security to ensure confidentiality of business conversations, coupled with compliance with regulations covering financial transactions. However, its stacks are monolithic and single tenant. To modernize Symphony's architecture for real-time data, Jean-Francois and team have been exploring various approaches over the last four years. They started breaking down the monolith into microservices, and also made a move towards multitenancy by setting up an event mesh. However, they experienced a mix of success and failure in both attempts. To continue the evolution of the system, while maintaining business deliveries, the team started to focus on event streaming for asynchronous communications, as well as connecting the microservices for real-time data exchange. As they had prior Apache Kafka® usage in the company, the team decided to go with managed Kafka on the cloud as their streaming platform. 
The team has a set of principles in mind for the development of their event-streaming functionality:
- Isolate product domains
- Reach eventual consistency with event streaming
- Clear contracts for the event streams, for both producers and consumers
- Multiregion and global data sharing
Jean-Francois shares that data mesh is ultimately what they are hoping to achieve with their platform: to provide governance around data and make data available as a product for self service. As of now, though, their focus is achieving real-time event streams with event mesh. EPISODE LINKS The Definitive Guide to Building a Data Mesh with Event Streams Data Mesh 101 What is Data Mesh? ft. Zhamak Dehghani Data Mesh Architecture Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Apache Kafka Security Best Practices 39:10

3 years ago

Security is a primary consideration for any system design, and Apache Kafka® is no exception. Out of the box, Kafka has relatively little security enabled. Rajini Sivaram (Principal Engineer, Confluent, and co-author of “Kafka: The Definitive Guide” ) discusses how Kafka has gone from a system that included no security to providing an extensible and flexible platform for any business to build a secure messaging system. She shares considerations, important best practices, and features Kafka provides to help you design a secure modern data streaming system. In order to build a secure Kafka installation, you need to securely authenticate your users, whether you are using Kerberos (SASL/GSSAPI), SASL/PLAIN, SCRAM, or OAuth. Verifying that your users can authenticate, and non-users can’t, is a primary requirement for any connected system. But authentication is only one part of the security story. We also need to address other areas. Kafka added support for fine-grained access control using ACLs with a pluggable authorizer several years ago. Over time, this was extended to support prefixed ACLs to make ACLs more manageable in large organizations. Now on its second generation authorizer, Kafka is easily extendable to support other forms of authorization, like integrating with a corporate LDAP server to provide group or role-based access control. Even if you’ve set up your system to use secure authentication and each user is authorized using a series of ACLs, if the data is viewable by anyone listening, how secure is your system? That’s where encryption comes in. Using TLS, Kafka can encrypt your data in transit. Security has gone from a nice-to-have to being a requirement of any modern-day system. Kafka has followed a similar path from zero security to having a flexible and extensible system that helps companies of any size pick the right security path for them. 
Be sure to also check out the newest Apache Kafka Security course on Confluent Developer for an in-depth explanation along with other recommendations. EPISODE LINKS An Introduction to Apache Kafka Security: Securing Real-Time Data Streams Kafka Security course Kafka: The Definitive Guide v2 Security Overview Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…
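The difference between literal and prefixed ACLs mentioned in this description can be illustrated with a small sketch. This is not Kafka's authorizer code: the `Acl` class, the `matches` and `is_allowed` helpers, and the principal and topic names are all hypothetical, and the sketch assumes a simplified model with allow rules only, where access is denied unless some rule matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    """A simplified allow rule (hypothetical model, not Kafka's own class)."""
    principal: str      # e.g. "User:orders-svc"
    operation: str      # e.g. "READ" or "WRITE"
    pattern_type: str   # "LITERAL" or "PREFIXED"
    resource_name: str  # a topic name, or a topic-name prefix


def matches(acl: Acl, topic: str) -> bool:
    """A literal rule needs an exact name; a prefixed rule needs a common prefix."""
    if acl.pattern_type == "LITERAL":
        return acl.resource_name == topic
    if acl.pattern_type == "PREFIXED":
        return topic.startswith(acl.resource_name)
    raise ValueError(f"unknown pattern type: {acl.pattern_type}")


def is_allowed(acls, principal: str, operation: str, topic: str) -> bool:
    """Deny by default: allow only if at least one rule matches (assumption)."""
    return any(
        a.principal == principal and a.operation == operation and matches(a, topic)
        for a in acls
    )


# Illustrative rules: one prefixed rule covers every topic a team owns,
# where literal rules would need one entry per topic.
EXAMPLE_ACLS = [
    Acl("User:orders-svc", "WRITE", "PREFIXED", "orders."),
    Acl("User:audit", "READ", "LITERAL", "orders.created"),
]
```

With these rules, `User:orders-svc` may write to any topic whose name starts with `orders.`, while `User:audit` may read only `orders.created`. That is the manageability benefit for large organizations: new topics under an existing prefix need no new rule.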

What Could Go Wrong with a Kafka JDBC Connector? 41:10

3 years ago

Java Database Connectivity (JDBC) is the Java API used to connect to a database. Because the JDBC connector is one of the most popular Kafka connectors, it's important to prevent issues with your integrations. In this episode, we'll cover how a JDBC connection works and common issues with your database connection.

Why the Kafka JDBC Connector? When it comes to streaming database events into Apache Kafka®, the JDBC connector is usually the first choice for its flexibility and its ability to support a wide variety of databases without requiring custom code. An experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of building a streaming Kafka data pipeline with the JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture.

The JDBC connector is a Kafka Connect plugin that uses the JDBC API to stream data between databases and Kafka. If you want to stream data from a relational database into Kafka once per day or every two hours, the JDBC connector is a simple, batch processing connector to use. You tell the JDBC connector which query you’d like to execute against the database, and the connector then loads the results into Kafka. The connector works well out of the box with basic data types. Database-specific data types, however, such as geometric columns and array columns in PostgreSQL, are not represented well by the JDBC connector. You might even see no results in Kafka because the column type is not one the connector supports.

Francesco shares other cases that can cause the JDBC connector to go wrong, such as:

Infrequent snapshot times
Out-of-order events
Non-incremental sequences
Hard deletes

To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector provides enhanced metadata, timestamps of each operation, access to all logs, and sequence numbers, so you can speak the language of a DBA. Kris and Francesco also talk about the governance tool that Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka led to his current role as a developer advocate.

EPISODE LINKS Kafka Connect Deep Dive – JDBC Source Connector JDBC Source Connector: What could go wrong? Metadata parser Debezium Documentation Database Migration with Apache Kafka and Apache Kafka Connect Watch the video version of this podcast Francesco Tisiot’s Twitter Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more on Confluent Developer…
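Two of the failure cases listed in this description, hard deletes and changes that do not advance an incrementing column, can be reproduced with a small simulation. The sketch below is not the connector's implementation: it assumes a query-based source that polls for rows whose `id` is above the last value seen, and it uses an in-memory SQLite table. The `orders` table and the `poll_incrementing` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def poll_incrementing(conn, last_id):
    """Fetch rows with id above the stored offset, as a polling source might."""
    rows = conn.execute(
        "SELECT id, status FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    # Advance the offset only when new rows were seen.
    new_last_id = rows[-1][0] if rows else last_id
    return rows, new_last_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)", [(1, "new"), (2, "new")]
)

# First poll: both rows are new, so both are picked up.
first_batch, offset = poll_incrementing(conn, 0)

# Changes that happen between two polls:
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")     # in-place update
conn.execute("DELETE FROM orders WHERE id = 2")                    # hard delete
conn.execute("INSERT INTO orders (id, status) VALUES (3, 'new')")  # new row

# Second poll: only the new row is visible. The update and the delete
# never produce an event, because neither adds a row with a higher id.
second_batch, offset = poll_incrementing(conn, offset)
```

After the second poll, downstream consumers still believe order 1 is `new` and order 2 exists. A log-based change data capture tool such as Debezium avoids this by reading the database's change log instead of querying table state.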

Apache Kafka Networking with Confluent Cloud 37:22

3 years ago

Setting up reliable cloud networking for your Apache Kafka® infrastructure can be complex. There are many factors to consider: cost, security, scalability, and availability. Drawing on extensive experience building cloud-native Kafka solutions on Confluent Cloud, Justin Lee (Principal Solutions Engineer, Enterprise Solutions Engineering, Confluent) and Dennis Wittekind (Customer Success Technical Architect, Customer Success Engineering, Confluent) talk about the different networking options on Confluent Cloud, including AWS Transit Gateway, AWS PrivateLink, and Azure Private Link, and discuss when and why you might choose one over the other.

To build a secure cloud-native Kafka network, you need to consider information security and compliance requirements. These requirements may vary depending on your industry, location, and regulatory environment. For example, in financial organizations, transaction data or personally identifiable information (PII) must not be accessible over the internet. In this case, your network architecture may require private networking, which means you have to choose between private endpoints or a peering connection between your infrastructure and your Kafka clusters in the cloud.

What are the differences between the networking solutions? Dennis and Justin talk about some of the benefits and drawbacks of different network architectures. For example, Transit Gateways offered by AWS are often a good fit for organizations with large, disparate network architectures, while Private Link is sometimes preferred for its security benefits. They also discuss the management overhead involved in administering different network architectures.

Dennis and Justin also highlight their recently launched course on Confluent Developer, the Confluent Cloud Networking course. This hands-on course covers basic networking and cloud computing concepts that help you get a clearer picture of the configurations and collaborate with your networking teams.
EPISODE LINKS Cloud Networking course Manage Networking Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Event-Driven Systems and Agile Operations 53:22

3 years ago

How do the principles of chaotic, agile operations in the military apply to software development and event-driven systems? A former Royal Marine, Ben Ford (Founder and CEO, Commando Development) is also a software developer with many years of experience building event streaming architectures across financial services and startups. He shares principles that the military employs in chaotic conditions and explains how they can be applied to event streaming and agile development.

According to Ben, the operational side of the military is highly emergent and reactive to the situation at hand, much like real-time, event-driven systems. Having spent the last five years researching, adapting, and applying these principles to technology leadership, he identifies parallels between these concepts and operations ranging from DevOps to organizational architecture, and even the development of data streaming applications.

One of the concepts Ben and Kris talk through is Colonel John Boyd’s OODA loop, which has four stages:

Observe: observing the incoming events and information
Orient: reflecting on the events and how they apply to your current situation
Decide: deciding which path to take, then testing it and identifying the potential outcomes
Act: acting on the decision, which also serves as a test and generates further observations

This feedback loop helps you put events in context and quickly make the most appropriate decision, while understanding that changes can be made as more data becomes available. Ben and Kris also chat about their experience of building an event system together in the early days, before the release of Apache Kafka®, and more.
EPISODE LINKS Building Real-Time Data Systems the Hard Way Mission Ctrl Mission Command: The Doctrine of Empowerment Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Streaming Analytics and Real-Time Signal Processing with Apache Kafka 1:06:33

3 years ago

Imagine you can process and analyze real-time event streams for intelligence to mitigate cyber threats or keep soldiers constantly alerted to risks and precautions they should take based on events. In this episode, Jeffrey Needham (Senior Solutions Engineer, Advanced Technology Group, Confluent) shares use cases on how Apache Kafka® can be used for real-time signal processing to mitigate risk before it arises. He also explains the classic Kafka transactional processing defaults and the distinction between transactional and analytic processing. Jeffrey is part of the customer solutions and innovations division (CSID), which designs event streaming platforms and innovations to improve productivity for organizations by pushing the envelope of Kafka for real-time signal processing.

What is signal intelligence? Jeffrey explains that it's not always affiliated with the military. Signal processing improves your operational or situational awareness by making sense of petabyte-scale datasets of clickstream data, or of telemetry coming in from sensors, which could be a satellite or sensor arrays along a water pipeline. In other words, it means bringing in event data from external sources to analyze, and then finding the pattern in the series of events to make informed decisions.

Conventional On-Line Analytical Processing (OLAP) or data warehouse platforms evolved out of the transaction processing model. However, when analytics or even AI processing is applied to a data set, these algorithms never look at a single column or row, but look for patterns within millions of rows of transactionally derived data. Transaction-centric solutions are designed to update and delete specific rows and columns in an “ACID” compliant manner. That capability is less critical when the analytic goal is to look for a pattern within millions or even billions of rows, which makes these solutions inefficient and usually unaffordable at scale.

Kafka was designed as a step forward from classic transaction processing technologies, and it can also be configured in a way that’s optimized for signal processing of high-velocity, noisy, or jittery data streams, in order to make sense, in real time, of a dynamic, non-transactional environment. With its immutable, append-only commit logs, Kafka functions as a flight data recorder, which remains resilient even when network communications, or COMMs, are poor or nonexistent.

Jeffrey shares the disconnected edge project he has been working on: smart soldier, which runs Kafka on Raspberry Pi and x64-based handhelds. These devices are ergonomically integrated on each squad member to provide real-time visibility into the soldiers’ activities or situations. COMMs permitting, the topic data is then mirrored upstream and aggregated at multiple tiers (mobile command post, battalion, HQ) to provide ever-increasing views of the entire battlefield, or whatever the sensor array is monitoring, including the all-important supply chain. Jeffrey also shares a couple of other use cases on how Kafka can be used for signal intelligence, including cybersecurity and protecting national critical infrastructure.

EPISODE LINKS Using Kafka for Analytic Processing Watch the video version of this podcast Streaming Audio Playlist Learn more on Confluent Developer Use PODCAST100 to get $100 of free Confluent Cloud usage ( details )…

Blockchain Data Integration with Apache Kafka 50:59

3 years ago

How is Apache Kafka® relevant to blockchain technology and cryptocurrency? Fotios Filacouris (Staff Solutions Engineer, Confluent) has been working with Kafka for close to five years, primarily designing architectural solutions for financial services, and he also has expertise in blockchain. In this episode, he joins Kris to discuss how blockchain and Kafka are complementary, and he highlights some of the emerging use cases he has seen that use Kafka in conjunction with traditional distributed ledger technology (DLT) as well as blockchain technologies.

According to Fotios, Kafka and the notion of blockchain share many traits, such as immutability, replication, distribution, and the decoupling of applications. This complementary relationship means that they can function well together if you are looking to extend the functionality of a given DLT through sidechain or off-chain activities, such as analytics, integrations with traditional enterprise systems, or even the integration of certain chains and ledgers.

Based on Fotios’ observations, Kafka has become an essential piece of the puzzle in many blockchain-related use cases, including settlement, logging, analytics and risk, and volatility calculations. For example, a bitcoin trading application may use Kafka Streams to provide analytics on top of the price action of various crypto assets. Fotios has also seen use cases where a crypto platform leverages Kafka as its infrastructure layer for real-time logging and analytics.

EPISODE LINKS Modernizing Banking Architectures with Apache Kafka New Kids On the Bloq Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Automating Multi-Cloud Apache Kafka Cluster Rollouts 48:29

3 years ago

To ensure safe and efficient deployment of Apache Kafka® clusters across multiple cloud providers, Confluent rolled out a large-scale cluster management solution. Rashmi Prabhu (Staff Software Engineer & Eng Manager, Fleet Management Platform, Confluent) and her team have been building the Fleet Management Platform for Confluent Cloud. In this episode, she delves into what Fleet Management is, and how the cluster management service streamlines Kafka operations in the cloud while providing a seamless developer experience.

When it comes to performing operations at large scale in the cloud, manual processes work well if the scenario involves only a handful of clusters. However, as a business grows, a cloud footprint may scale 10x and will require upgrades to a significantly larger cluster fleet. Additionally, the process should be automated in order to accelerate feature releases while ensuring safe and mature operations.

Fleet Management lets you manage and automate software rollouts and relevant cloud operations within the Kafka ecosystem at scale, including cloud-native Kafka, ksqlDB, Kafka Connect, Schema Registry, and other cloud-native microservices. The automation service can consistently operate applications across multiple teams, and can also manage Kubernetes infrastructure at scale. The existing Fleet Management stack can successfully handle thousands of concurrent upgrades in the Confluent ecosystem.

When building out the Fleet Management Platform, Rashmi and the team kept these key considerations in mind:

Rollout Controls and DevX: Wide deployment and distribution of changes across the fleet of target assets; improved developer experience for ease of use, with rollout strategy support, deployment policies, a dynamic control workflow, and manual approval support on an as-needed basis.
Safety: Built-in features that prioritize the security and safety of the fleet, with access control and audits on operations. There is active monitoring and paced rollouts, as well as automated pauses and resumes to reduce the time to react upon failure. There’s also an error threshold, and controls to allow a healthy balance of risk vs. pace.
Visibility: A close to real-time, wide-angle view of the fleet state, along with insights into workflow progress, historical operations on the clusters, live notification on workflows, drift detection across assets, and so much more.

EPISODE LINKS Optimize Fleet Management Software Engineer - Fleet Management Watch the video version of this podcast Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more with Kafka tutorials, resources, and guides at Confluent Developer Live demo: Intro to Event-Driven Microservices with Confluent Use PODCAST100 to get an additional $100 of free Confluent Cloud usage ( details )…

Common Apache Kafka Mistakes to Avoid 1:09:43

3 years ago

What are some of the common mistakes that you have seen with Apache Kafka® record production and consumption? Nikoleta Verbeck (Principal Solutions Architect at Professional Services, Confluent) has a role that specifically tasks her with performance tuning as well as troubleshooting Kafka installations of all kinds. Based on her field experience, she put together a comprehensive list of common issues with recommendations for building, maintaining, and improving Kafka systems that are applicable across use cases.

Kris and Nikoleta begin by discussing the fact that it is common for those migrating to Kafka from other message brokers to implement too many producers, rather than one per service. The Kafka producer is thread safe and one producer instance can talk to multiple topics, unlike with traditional message brokers, where you may tend to use a client per topic.

Monitoring is an unqualified good in any Kafka system. Nikoleta notes that it is better to monitor as thoroughly as possible from the start of your installation, even if you don't think you will ultimately need so much detail, because it will pay off in the long run. A major advantage of monitoring is that it lets you predict your resource growth in a more orderly fashion and helps you use your current resources more efficiently. Nikoleta mentions the many dashboards that her team has built to accommodate leading monitoring platforms such as Prometheus, Grafana, New Relic, Datadog, and Splunk.

They also discuss a number of useful elements that are optional in Kafka, so people tend to be unaware of them. Compression is the first of these, and Nikoleta absolutely recommends that you enable it. Another is producer callbacks, which you can use to catch exceptions. A third is setting a `ConsumerRebalanceListener`, which notifies you about rebalancing events, letting you prepare for any issues that may result from them.
Other topics covered in the episode are batching and the `linger.ms` Kafka producer setting, how to figure out your units of scale, and the metrics tool Trogdor. EPISODE LINKS 5 Common Pitfalls when Using Apache Kafka Kafka Internals course linger.ms producer configs. Fault Injection—Trogdor From Apache Kafka to Performance in Confluent Cloud Kafka Compression Interface ConsumerRebalanceListener Watch the video version of this podcast Nikoleta Verbeck’s Twitter Kris Jenkins’ Twitter Streaming Audio Playlist Join the Confluent Community Learn more on Confluent Developer Use PODCAST100 to get $100 of free Confluent Cloud usage ( details )…
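As a rough illustration of the producer settings named in this description, the sketch below collects them in a plain Python dictionary and renders them in properties-file form. `compression.type`, `linger.ms`, and `batch.size` are standard Kafka producer configuration names, but the values are placeholders chosen for illustration, not recommendations from the episode, and the `to_properties` helper is hypothetical.

```python
# Illustrative producer settings; the values are assumptions to tune per workload.
producer_settings = {
    # Compression is off by default; enabling it reduces network and disk usage.
    "compression.type": "lz4",
    # Wait up to this many milliseconds so more records can join a batch.
    "linger.ms": 20,
    # Upper bound, in bytes, on the size of one batch per partition.
    "batch.size": 65536,
}


def to_properties(settings):
    """Render settings as key=value lines, sorted by key for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


properties_text = to_properties(producer_settings)
```

A larger `linger.ms` trades a little latency for bigger batches, which also tends to make compression more effective because each compressed batch holds more records.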
