What are helpful metrics to troubleshoot Kafka connection issues?

Plotting the following consumer fetch metrics over the time period in which the problem occurred, paying particular attention to fetch-latency-avg and fetch-latency-max, can help identify issues related to unexpected disconnections of your consumer:

  • fetch-latency-avg: The average time taken for a fetch request.
  • fetch-latency-max: The maximum time taken for a fetch request.
  • fetch-rate: The number of fetch requests per second.
  • fetch-size-avg: The average number of bytes fetched per request.
  • fetch-size-max: The maximum number of bytes fetched per request.
  • fetch-throttle-time-avg: The average throttle time in ms. When quotas are enabled, the broker may delay fetch requests to throttle a consumer that has exceeded its limit. This metric indicates how much throttling time has been added to fetch requests on average.
  • fetch-throttle-time-max: The maximum throttle time in ms.
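
If you export these metrics as periodic samples, a small script can point you to the moments worth inspecting on the plot. The following is only a minimal sketch: the function name find_suspect_samples, the (timestamp, metrics) sample layout, and the latency limit are assumptions made for illustration and are not part of Kafka. It also assumes the latency values are reported in milliseconds.

```python
def find_suspect_samples(samples, latency_limit_ms):
    """Return (timestamp, reasons) pairs for samples that deserve a closer look.

    `samples` is a list of (timestamp, metrics) pairs, where `metrics` maps
    metric names from the list above to numeric values. This layout is an
    assumption of the sketch, not a Kafka API.
    """
    suspects = []
    for timestamp, metrics in samples:
        reasons = []
        # A very slow fetch may precede a timeout or disconnection.
        if metrics.get("fetch-latency-max", 0.0) > latency_limit_ms:
            reasons.append("fetch-latency-max above limit")
        # Any non-zero throttle time means the broker delayed fetch requests.
        if metrics.get("fetch-throttle-time-max", 0.0) > 0:
            reasons.append("broker throttled fetch requests")
        if reasons:
            suspects.append((timestamp, reasons))
    return suspects


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    samples = [
        ("12:00", {"fetch-latency-max": 120.0, "fetch-throttle-time-max": 0.0}),
        ("12:01", {"fetch-latency-max": 31000.0, "fetch-throttle-time-max": 0.0}),
        ("12:02", {"fetch-latency-max": 140.0, "fetch-throttle-time-max": 250.0}),
    ]
    for timestamp, reasons in find_suspect_samples(samples, latency_limit_ms=30000):
        print(timestamp, "; ".join(reasons))
```

Pick a latency limit that matches your own consumer timeout settings; the 30000 ms used here is only a placeholder.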

Here is a resource that further details Kafka consumer metrics.
