Capacity Planning Your Apache Kafka Cluster

Streaming Audio: Apache Kafka® & Real-Time Data


Added six years ago

Content provided by Confluent, founded by the original creators of Apache Kafka®. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Confluent or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://he.player.fm/legal.


3 years ago, 1:01:54


How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance?

When Jason Bell (Principal Engineer, Dataworks, and founder of Synthetica Data) begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself, determining its volume as well as its contents: Is it JSON, plain text, or images? He then determines whether Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, and potential cost.

Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby—this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.

Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars).
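As a rough illustration of turning a daily message count and a peak time into throughput figures, here is a minimal Python sketch. The function name, the workload numbers, and the idea of a single "peak factor" multiplier are assumptions made for this example; they are not taken from the episode.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def throughput_estimate(messages_per_day, avg_message_bytes, peak_factor):
    """Return (average, peak) producer ingress in bytes per second.

    peak_factor is an assumed multiplier describing how much busier
    the peak period is than the daily average.
    """
    average = messages_per_day * avg_message_bytes / SECONDS_PER_DAY
    return average, average * peak_factor

# Hypothetical workload: 86.4 million 1 kB messages per day, peak at 3x average.
avg_bps, peak_bps = throughput_estimate(86_400_000, 1_000, 3.0)
print(f"average: {avg_bps / 1e6:.1f} MB/s, peak: {peak_bps / 1e6:.1f} MB/s")
```

A batch-oriented workload would likely have a much higher peak factor than a continuous stream with the same daily total, which is why the peak figure, not the average, is the one to size against.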

Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic or how many consumers they'll actually need. Latency must also be considered: for example, it will increase if the compression on the producer's side doesn't match the compression on the consumer's side.
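To make the difference between message counts and overall network traffic concrete, the following Python sketch estimates cluster traffic from producer ingress. It assumes that every message is copied to each follower replica and read once by each consumer group, and it ignores compression, protocol overhead, and retries. All names and figures are hypothetical.

```python
import math

def network_estimate(ingress_bps, replication_factor, consumer_groups):
    """Estimate network traffic (bytes/s) generated by a given producer ingress."""
    # Followers fetch one copy each from the partition leader.
    replication = ingress_bps * (replication_factor - 1)
    # Each consumer group reads the full stream once.
    egress = ingress_bps * consumer_groups
    return {
        "producer_ingress": ingress_bps,
        "replication": replication,
        "consumer_egress": egress,
        "total": ingress_bps + replication + egress,
    }

def consumers_needed(peak_msgs_per_sec, per_consumer_msgs_per_sec, partitions):
    """Consumers one group needs at peak, capped by the partition count."""
    wanted = math.ceil(peak_msgs_per_sec / per_consumer_msgs_per_sec)
    # Consumers beyond the partition count would sit idle in a group.
    return min(wanted, partitions)

traffic = network_estimate(1_000_000, replication_factor=3, consumer_groups=2)
print(traffic["total"])
print(consumers_needed(3_000, 800, partitions=12))
```

Under these assumptions, 1 MB/s of producer traffic becomes several times that on the network, and the measured per-consumer rate (an input here) has to come from testing rather than guesswork.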

Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to be well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.
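One way a retry and dead letter queue strategy might be expressed is through Kafka Connect's `errors.*` sink connector settings. The Python sketch below only builds the JSON payload that would be sent to the Connect REST API. The connector name, connector class, topic names, and timeout values are hypothetical placeholders, not recommendations from the episode.

```python
import json

# Hypothetical sink connector definition; only the errors.* keys are the point.
connector = {
    "name": "orders-sink",  # hypothetical connector name
    "config": {
        "connector.class": "com.example.HypotheticalSinkConnector",  # placeholder
        "topics": "orders",  # hypothetical source topic
        # Keep running on bad records instead of failing the task.
        "errors.tolerance": "all",
        # Retry a failed operation for up to 5 minutes, backing off to 1 minute.
        "errors.retry.timeout": "300000",
        "errors.retry.delay.max.ms": "60000",
        # Route records that still fail to a dead letter queue topic.
        "errors.deadletterqueue.topic.name": "dlq-orders-sink",
        "errors.deadletterqueue.topic.replication.factor": "3",
        # Attach failure context (exception, original topic, offset) as headers.
        "errors.deadletterqueue.context.headers.enable": "true",
        "errors.log.enable": "true",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

With `errors.tolerance` set to `all`, failures no longer stop the connector, so the dead letter queue topic itself needs monitoring and a retention plan, since it can grow quickly.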

Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective.


Chapters

1. Intro (00:00:00)

2. Kafka Cluster Capacity Planning—where to begin (00:10:12)

3. Put it in the cloud (00:16:55)

4. Three is the magic number: Kafka leader and 2 followers (00:18:17)

5. Multi-data center architectures (00:23:09)

6. Extensive testing is necessary (00:25:21)

7. Kafka message volumes (00:28:46)

8. Kafka Connect sink connectors (00:36:56)

9. Kafka Streams vs ksqlDB for DevOps (00:51:08)

10. It's a wrap! (00:59:56)


#Tech #Tech News #News #Confluent #Event Stream Processing #Data #Event Driven Architecture #Open Source #Data In Motion #Kafka Cloud Native #Data Mesh #Data Pipeline #Serverless Kafka #Podcasting Education #Confluent, original creators of Apache Kafka® #original creators of Apache Kafka® #Apache Kafka® #Cloud IT #Real Time
