Are there "optimal" Kafka configuration settings that you, or your customers, have found to improve performance?
It depends on how fast you are processing the data. max.poll.records, max.partition.fetch.bytes, and max.poll.interval.ms are all related. You can experiment with the following settings to improve performance:
- max.poll.records=5000 (default=500)
o As you raise max.poll.records, your Kafka consumer must still process each batch, commit its offsets, and call poll() again within max.poll.interval.ms (default=300000, i.e. 5 minutes). If your consumer does not “keep up,” you will see a Kafka consumer warning message and the consumer group will rebalance.
- fetch.max.bytes=209715200 (52428800*4; default=52428800)
- max.partition.fetch.bytes=5242880 (default=1048576)
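
As a rough illustration of how the settings listed above relate, the following Python sketch collects them in a dictionary and checks whether a full batch of max.poll.records can be processed within max.poll.interval.ms. The helper names (batch_fits_poll_interval, to_properties) and the seconds_per_record parameter are hypothetical, introduced only for this example. They are not part of Kafka or of the SDKs, and the per-record processing time is something you would need to measure on your own system.

```python
# Values suggested in the list above. The keys are standard Kafka consumer settings.
TUNED_CONSUMER_CONFIG = {
    "max.poll.records": 5000,
    "fetch.max.bytes": 52428800 * 4,       # 209715200 bytes (200 MiB)
    "max.partition.fetch.bytes": 5242880,  # 5 MiB
    "max.poll.interval.ms": 300000,        # Kafka default (5 minutes)
}


def batch_fits_poll_interval(config, seconds_per_record):
    """Hypothetical helper: True if a full batch of max.poll.records can be
    processed before max.poll.interval.ms expires.

    seconds_per_record is an assumed, measured average processing time.
    """
    worst_case_ms = config["max.poll.records"] * seconds_per_record * 1000
    return worst_case_ms < config["max.poll.interval.ms"]


def to_properties(config):
    """Hypothetical helper: render the settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print(to_properties(TUNED_CONSUMER_CONFIG))
    # At 10 ms per record, 5000 records take about 50 s, well inside 5 minutes.
    print(batch_fits_poll_interval(TUNED_CONSUMER_CONFIG, 0.01))
```

If the check returns False for your measured processing time, either lower max.poll.records or raise max.poll.interval.ms before applying the larger batch size.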
We have also included recommended base configurations in the Python and Java SDK (link to the config files). Please tune the following configurations to your system, if needed, based on this article here.