I have written a Kafka consumer that consumes a stream of encrypted records (~1 MB each) and decrypts them before uploading them to an S3 bucket. It takes ~20 minutes to process 1000 records; if I remove the decryption logic and run the same consumer, it takes less than 3 minutes for the same 1000 records.
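For context, each record is handled roughly like this (a simplified sketch, not my exact code; the AES/GCM cipher, Base64 transport encoding, AWS SDK v2 client, and the bucket/key names are all placeholder assumptions, since the exact encryption scheme is not the point):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.util.Base64;

public class RecordProcessor {
    private final S3Client s3;
    private final SecretKey key;

    public RecordProcessor(S3Client s3, SecretKey key) {
        this.s3 = s3;
        this.key = key;
    }

    // Decrypt a single ~1 MB record and upload the plaintext to S3.
    // The cipher (AES/GCM with the IV prepended) and the bucket/key names
    // are placeholders; this decrypt step is what accounts for the slowdown.
    public void process(String encryptedValue, long offset) throws Exception {
        byte[] payload = Base64.getDecoder().decode(encryptedValue);

        byte[] iv = new byte[12];
        System.arraycopy(payload, 0, iv, 0, iv.length);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] plaintext = cipher.doFinal(payload, iv.length, payload.length - iv.length);

        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-bucket")                // placeholder bucket
                        .key("records/" + offset)           // placeholder object key
                        .build(),
                RequestBody.fromBytes(plaintext));
    }
}
```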
The following are the configs I am currently using.
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
check.crcs = true
client.dns.lookup = use_all_dns_ips
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1000000
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 655360
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
send.buffer.bytes = 131072
session.timeout.ms = 10000
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
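For reference, the consumer itself is built roughly like this (a sketch; only a handful of properties are set explicitly and everything else is left at the defaults listed above; the broker address, group id, and topic name are placeholders):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;

public class ConsumerFactory {

    // Builds a consumer with only the essential overrides; all other
    // settings stay at the defaults shown in the config dump above.
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "decrypt-to-s3-group");       // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("encrypted-topic"));       // placeholder topic
        return consumer;
    }
}
```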
The topic has 10 partitions. I tried consuming with multiple consumers (1 to 10), all assigned to the same consumer group, but no matter how many consumers I use, roughly the same amount of data is consumed in the same amount of time.
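This is roughly how the consumers are scaled out (a sketch; each thread gets its own KafkaConsumer, all sharing the same group id so the 10 partitions are split between them; `ConsumerFactory.create()` is the sketch above and the per-record processing call is a placeholder):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConsumerGroupRunner {

    // Starts numConsumers consumers in the same consumer group
    // (at most 10 do useful work, one per partition).
    public static void run(int numConsumers) {
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
        for (int i = 0; i < numConsumers; i++) {
            pool.submit(() -> {
                try (KafkaConsumer<String, String> consumer = ConsumerFactory.create()) {
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> record : records) {
                            processRecord(record);   // decrypt + upload to S3 (see the sketch above)
                        }
                    }
                }
            });
        }
    }

    static void processRecord(ConsumerRecord<String, String> record) {
        // placeholder for the decrypt-and-upload step
    }
}
```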
How do I make the consumers faster? And can Apache Spark help with this?