I have written a Kafka consumer that consumes a stream of encrypted records (~1 MB each) and decrypts them before uploading them to an S3 bucket. It takes ~20 minutes to process 1000 records; if I remove the decryption logic and run the same consumer, it takes less than 3 minutes for the same 1000 records.
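For context, each record is handled roughly like this (a simplified sketch, not my exact code; the AES/GCM cipher, Base64 transport encoding, AWS SDK v2 client, and the bucket/key names are all placeholder assumptions, since the exact encryption scheme is not the point):

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.util.Base64;

public class RecordProcessor {
    private final S3Client s3;
    private final SecretKey key;

    public RecordProcessor(S3Client s3, SecretKey key) {
        this.s3 = s3;
        this.key = key;
    }

    // Decrypt a single ~1 MB record and upload the plaintext to S3.
    // The cipher (AES/GCM with the IV prepended) and the bucket/key names
    // are placeholders; this decrypt step is what accounts for the slowdown.
    public void process(String encryptedValue, long offset) throws Exception {
        byte[] payload = Base64.getDecoder().decode(encryptedValue);

        byte[] iv = new byte[12];
        System.arraycopy(payload, 0, iv, 0, iv.length);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] plaintext = cipher.doFinal(payload, iv.length, payload.length - iv.length);

        s3.putObject(PutObjectRequest.builder()
                        .bucket("my-bucket")                // placeholder bucket
                        .key("records/" + offset)           // placeholder object key
                        .build(),
                RequestBody.fromBytes(plaintext));
    }
}
```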
The following are the configs I am currently using.
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
check.crcs = true
client.dns.lookup = use_all_dns_ips
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1000000
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 655360
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
send.buffer.bytes = 131072
session.timeout.ms = 10000
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
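For reference, the consumer itself is built roughly like this (a sketch; only a handful of properties are set explicitly and everything else is left at the defaults listed above; the broker address, group id, and topic name are placeholders):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;

public class ConsumerFactory {

    // Builds a consumer with only the essential overrides; all other
    // settings stay at the defaults shown in the config dump above.
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");      // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "decrypt-to-s3-group");       // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("encrypted-topic"));       // placeholder topic
        return consumer;
    }
}
```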
The topic has 10 partitions. I tried consuming with multiple consumers (1 to 10), all assigned to the same consumer group, but no matter how many consumers I use, roughly the same amount of data is consumed in the same amount of time.
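This is roughly how the consumers are scaled out (a sketch; each thread gets its own KafkaConsumer, all sharing the same group id so the 10 partitions are split between them; `ConsumerFactory.create()` is the sketch above and the per-record processing call is a placeholder):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConsumerGroupRunner {

    // Starts numConsumers consumers in the same consumer group
    // (at most 10 do useful work, one per partition).
    public static void run(int numConsumers) {
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
        for (int i = 0; i < numConsumers; i++) {
            pool.submit(() -> {
                try (KafkaConsumer<String, String> consumer = ConsumerFactory.create()) {
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                        for (ConsumerRecord<String, String> record : records) {
                            processRecord(record);   // decrypt + upload to S3 (see the sketch above)
                        }
                    }
                }
            });
        }
    }

    static void processRecord(ConsumerRecord<String, String> record) {
        // placeholder for the decrypt-and-upload step
    }
}
```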
How do I make the consumers faster? And can Apache Spark help with this?