Understanding Kafka Producer and its Configurations

Src: https://kafka.apache.org/

Before diving into the producer configurations, let us understand how the producer sends data to Kafka. Kafka stores messages in topics, which are further divided into partitions. Producers send messages to a topic, optionally with a key. Messages with the same key always go to the same partition. If a message has no key, the producer spreads messages across the partitions, classically in a round-robin fashion (since Kafka 2.4 the default partitioner instead sticks to one partition until the current batch is full, then moves on). Producers can send data synchronously or asynchronously.
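To make the partition choice concrete, here is a minimal sketch of the idea in plain Java. It does not use the Kafka client: the class `PartitionSketch` and its method `choose` are made up for illustration, and `Arrays.hashCode` is only a stand-in for the murmur2 hash that the real producer applies to the serialized key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Toy partition chooser (hypothetical class, not part of the Kafka client).
public class PartitionSketch {
    private final AtomicInteger counter = new AtomicInteger(0);
    private final int numPartitions;

    public PartitionSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Keyed records: the same key always maps to the same partition.
    // Keyless records (key == null): rotate through the partitions in turn.
    public int choose(byte[] key) {
        if (key == null) {
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
        // Stand-in hash; the real client uses murmur2 on the serialized key.
        return Math.floorMod(Arrays.hashCode(key), numPartitions);
    }

    public static void main(String[] args) {
        PartitionSketch chooser = new PartitionSketch(3);
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        // The same key lands on the same partition every time.
        System.out.println(chooser.choose(key) == chooser.choose(key));
        // Keyless records walk through partitions 0, 1, 2.
        System.out.println(chooser.choose(null) + " " + chooser.choose(null) + " " + chooser.choose(null));
    }
}
```

The point of the sketch is the split between the two paths: a key pins a record to one partition, which is what preserves per-key ordering, while keyless records are free to be spread around.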

By default, data is sent asynchronously. When send() is called, the message is not sent immediately; instead, it is collected in a buffer and sent later as part of a batch. Batching improves throughput at the cost of some latency, because a batch of messages is sent at once instead of one message at a time.

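buffer.memory: Total memory, in bytes, that the producer can use to buffer records waiting to be sent to the broker. This pool is shared by all partitions; the batches for each partition are allocated from it. Default value is 33554432 bytes (32 MB).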

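queue.buffering.max.messages: Maximum number of unsent messages that can be queued. Note that this property belongs to the older Scala producer and to librdkafka-based clients; the Java producer bounds its buffer with buffer.memory instead.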

batch.size: Maximum size, in bytes, of a batch of records for a single partition. Records are grouped together until the batch reaches this size, at which point it is sent. This helps when data is generated faster than it can be sent out. Default value is 16384 bytes.

linger.ms: Time the producer waits before sending a batch, giving more records a chance to join it. When data is produced slowly, sending each record immediately results in many small requests and lowers throughput. Adding a small delay with linger.ms increases throughput at the cost of latency. The default has long been 0 (Kafka 4.0 raised it to 5).

**** Records are sent when either the batch size is reached or the linger time has elapsed, whichever comes first. Even with linger.ms set to 0, records are still batched, because records that arrive while a previous request is in progress accumulate in the buffer. How well records batch also depends on the partitioning strategy. ****
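The "batch size or linger time, whichever comes first" rule can be sketched with a toy accumulator. This is plain Java with no Kafka dependency; `BatchSketch` and its methods are hypothetical names, and the real producer's accumulator is considerably more involved (per-partition queues, memory pooling, compression).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "batch.size or linger.ms, whichever comes first" rule.
// Hypothetical class; not the Kafka client's actual accumulator.
public class BatchSketch {
    private final int batchSizeBytes;
    private final long lingerMs;
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private long firstAppendMs = -1;

    public BatchSketch(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    // Buffer a record; nothing is sent here.
    public void append(byte[] record, long nowMs) {
        if (pending.isEmpty()) {
            firstAppendMs = nowMs; // the linger clock starts with the first record
        }
        pending.add(record);
        pendingBytes += record.length;
    }

    // A batch is ready when it is full OR it has waited at least lingerMs.
    public boolean readyToSend(long nowMs) {
        if (pending.isEmpty()) {
            return false;
        }
        boolean full = pendingBytes >= batchSizeBytes;
        boolean lingered = nowMs - firstAppendMs >= lingerMs;
        return full || lingered;
    }

    // Hand over the buffered records and start a fresh batch.
    public List<byte[]> drain() {
        List<byte[]> out = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        firstAppendMs = -1;
        return out;
    }

    public static void main(String[] args) {
        BatchSketch batch = new BatchSketch(16384, 5);
        batch.append(new byte[100], 0);
        System.out.println(batch.readyToSend(1));  // false: not full, linger not elapsed
        System.out.println(batch.readyToSend(5));  // true: linger time reached
        batch.drain();
        batch.append(new byte[16384], 10);
        System.out.println(batch.readyToSend(10)); // true: batch is full
    }
}
```

With a linger of 0 this model reports a batch as ready as soon as it holds one record, which mirrors the "send as soon as possible" behaviour; any batching then comes only from records piling up while an earlier request is still in progress.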

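acks: Controls how many acknowledgements the producer requires before it considers a request complete. With a value of 0, the producer does not wait for any acknowledgement before sending more data. With a value of 1, the producer receives the acknowledgement from the partition leader once the leader has written the record. If set to “all”, the acknowledgement is sent only after all in-sync replicas of the partition have received the data. This setting determines message durability.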

**** Durability is achieved through replication. With acks set to 0 or 1, the producer can consider a record delivered before the data has been replicated. If the leader then goes down, that data can be lost. Setting acks=all maintains data durability. ****
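As a rough illustration, a durability-oriented configuration might be assembled like this. The property names are the producer's real configuration keys; the class name `DurableProducerConfig` and the broker address are placeholders. In a real application the resulting Properties object would be passed to the KafkaProducer constructor from the kafka-clients library, which is left out here so the sketch runs with the JDK alone.

```java
import java.util.Properties;

// Sketch of durability-oriented producer settings (hypothetical class name).
public class DurableProducerConfig {
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait until all in-sync replicas have the record before acknowledging.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = build("localhost:9092");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Switching "all" to "1" or "0" trades durability for lower latency, as described above.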

compression.type: Data produced by the producer is not compressed by default (value = none). The compression types supported by Kafka are gzip, snappy, lz4 and zstd.
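The producer compresses whole batches rather than individual records, so similar records batched together tend to compress well. The sketch below illustrates that effect with the JDK's own gzip stream, not with the Kafka client; `CompressionSketch` and the sample JSON records are invented for the example, and the sizes it prints will depend on the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustrates why compressing a whole batch tends to pay off.
// Uses the JDK's gzip, not the Kafka client (hypothetical class name).
public class CompressionSketch {
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the output is complete.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Build a "batch" of 500 similar, made-up records.
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            batch.append("{\"event\":\"page_view\",\"user\":").append(i).append("}\n");
        }
        byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + compressed.length);
    }
}
```

Which codec is the better fit depends on the workload: the choice is a trade-off between CPU cost and compression ratio, so it is worth measuring with your own data.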

request.timeout.ms: Maximum time the client waits for the response to a request. Default value is 30 seconds (30000 ms). If no response is received in that time, the client retries the request, up to the number of times set by retries, until a response is received or the retry limit is reached.

retries: Number of times the producer will retry sending a message when a send fails with a transient error. Default value is 2147483647 (in Kafka 2.1 and later; older versions default to 0).

max.in.flight.requests.per.connection: This is an important property, because it can affect the ordering of messages in the partition/log. It is the maximum number of unacknowledged requests the producer will send on a single connection before blocking. Default value is 5. If this value is greater than 1, retries are greater than 0, and idempotence is not enabled, records can end up in the log in a different order than they were sent. Suppose the value is set to three and we send three batches to the same partition. The first batch fails to write and the other two are written successfully. The producer then retries the first batch, so it appears at the end of the log, after the second and third batches.
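The three-batch scenario can be replayed with a small simulation. This is a toy model in plain Java, not the Kafka client: `InFlightSketch.simulate` is a hypothetical helper that assumes batch 1 fails exactly once and that every other send succeeds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy simulation of the reordering scenario (hypothetical class, no Kafka client involved).
public class InFlightSketch {
    // Sends batches 1..batches to one partition with up to maxInFlight
    // unacknowledged batches at a time. Batch 1 fails on its first attempt
    // and is retried. Returns the order in which batches land in the log.
    public static List<Integer> simulate(int batches, int maxInFlight) {
        List<Integer> log = new ArrayList<>();
        Deque<Integer> toSend = new ArrayDeque<>();
        for (int b = 1; b <= batches; b++) {
            toSend.addLast(b);
        }
        boolean batch1AlreadyFailed = false;
        while (!toSend.isEmpty()) {
            // Put up to maxInFlight batches on the wire without waiting for acks.
            List<Integer> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !toSend.isEmpty()) {
                inFlight.add(toSend.pollFirst());
            }
            // The broker processes them in order; a failed batch is queued for retry.
            List<Integer> failed = new ArrayList<>();
            for (int b : inFlight) {
                if (b == 1 && !batch1AlreadyFailed) {
                    batch1AlreadyFailed = true;
                    failed.add(b);
                } else {
                    log.add(b);
                }
            }
            // Retries go back to the front of the send queue.
            for (int i = failed.size() - 1; i >= 0; i--) {
                toSend.addFirst(failed.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(simulate(3, 3)); // [2, 3, 1]: batch 1 ends up last
        System.out.println(simulate(3, 1)); // [1, 2, 3]: order preserved
    }
}
```

With only one request in flight, the retry of batch 1 completes before batch 2 is ever sent, which is why a value of 1 preserves ordering at the cost of throughput.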

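enable.idempotence: Default value is false in older clients (true since Kafka 3.0). It should be set to true if retries > 0. There can be a case where the message was written successfully but the acknowledgement never reached the producer. In that case, the producer sends the message again, resulting in a duplicate record. If enable.idempotence is set to true, the producer attaches a producer ID and a sequence number to each batch, so when a batch is retried the broker can check whether it has already been written and discard the duplicate. Idempotence requires acks=all, retries greater than 0, and max.in.flight.requests.per.connection of at most 5.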
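The duplicate check can be pictured with a toy broker that remembers the last sequence number it accepted from each producer. This is a heavy simplification for illustration only: `IdempotenceSketch` is a hypothetical class, and a real broker tracks sequences per producer and partition and also rejects out-of-order sequences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy broker that drops retried duplicates (hypothetical and greatly simplified).
public class IdempotenceSketch {
    private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    // Returns true if the record was appended, false if it was a duplicate.
    public boolean append(long producerId, int sequence, String value) {
        int last = lastSeqByProducer.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already written: acknowledge again, but do not append
        }
        lastSeqByProducer.put(producerId, sequence);
        log.add(value);
        return true;
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        IdempotenceSketch broker = new IdempotenceSketch();
        broker.append(7L, 0, "order-created");
        // The ack was lost, so the producer retries with the same sequence number.
        boolean appendedAgain = broker.append(7L, 0, "order-created");
        System.out.println(appendedAgain); // false: the duplicate is discarded
        System.out.println(broker.log());  // [order-created]
    }
}
```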

These are the main producer configurations for tuning Kafka's performance to suit your application. By default, Kafka is configured to provide high availability and low latency. If you found this useful, please like it, and if you have any questions, leave them in the comments section.

◾Software Developer ◾Medium Writer. https://web.cs.dal.ca/~ruminder
