Merge #1

Merged
merged 2 commits into from
Mar 26, 2018

Conversation

VictorZeng
Member

No description provided.

Robert Casey added 2 commits January 7, 2018 14:11
**New functionality:**
  - (Experimental) Implemented a Java-based in-memory data store. Right now it is only lightly integrated into the front-end, but over time it will replace the current implementation. It allowed me to implement some new functionality:
  - Report whether a consumer-group is currently active
    - This will eventually allow us to report on inactive consumer-groups
  - (Experimental) Report a Burrow-like consumer-group status calculation via a REST endpoint (/consumergroup), with Burrow's rules adjusted slightly. The rules I implemented here are:
    - Evaluate per consumer-group topic-partition:
      - Rule 0:  If there are no committed offsets, then there is nothing to calculate and the period is OK.
      - Rule 1:  If the difference between now and the last offset timestamp is greater than the difference between the last and first offset timestamps, the consumer has stopped committing offsets for that partition (error)
      - Rule 2:  If the consumer offset decreases from one interval to the next, the partition is marked as a rewind (error)
      - Rule 3:  If the lag is ever zero for the partition over the stored period, the period is OK
      - Rule 4:  If the consumer offset does not change, and the lag is non-zero, it's an error (partition is stalled)
      - Rule 5:  If the consumer offsets are moving, but the lag is consistently increasing, it's a warning (consumer is slow)
    - Roll up all topic-partition statuses per consumer-group and report a consumer-group status:
      - Set consumer-group status to ERROR if any topic-partition status is STOP
      - Set consumer-group status to ERROR if any topic-partition status is REWIND
      - Set consumer-group status to ERROR if any topic-partition status is STALL
      - Set consumer-group status to WARN if any topic-partition status is WARN
      - Set consumer-group status to OK if none of the above rules match
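
The rules and roll-up above can be sketched in Java roughly as follows. This is a minimal illustration, not the code in this PR: the class, enum, and method names are hypothetical, samples are assumed to be ordered oldest-first, the rules are assumed to be checked in numbered order with the first match winning, and "consistently increasing" is read as the lag growing in every interval.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of the status rules; names are not taken from the project. */
final class ConsumerGroupStatusSketch {

    enum PartitionStatus { OK, WARN, STOP, REWIND, STALL }

    enum GroupStatus { OK, WARN, ERROR }

    /** One stored observation for a consumer-group topic-partition. */
    static final class Sample {
        final long consumerOffset;
        final long lag;
        final long timestampMs;

        Sample(long consumerOffset, long lag, long timestampMs) {
            this.consumerOffset = consumerOffset;
            this.lag = lag;
            this.timestampMs = timestampMs;
        }
    }

    /** Evaluates one topic-partition; samples are assumed to be ordered oldest-first. */
    static PartitionStatus evaluatePartition(List<Sample> samples, long nowMs) {
        // Rule 0: no committed offsets, nothing to calculate.
        if (samples.isEmpty()) {
            return PartitionStatus.OK;
        }
        Sample first = samples.get(0);
        Sample last = samples.get(samples.size() - 1);

        // Rule 1: the time since the last commit exceeds the span of the stored window.
        if (nowMs - last.timestampMs > last.timestampMs - first.timestampMs) {
            return PartitionStatus.STOP;
        }
        // Rule 2: the consumer offset went backwards between two intervals.
        for (int i = 1; i < samples.size(); i++) {
            if (samples.get(i).consumerOffset < samples.get(i - 1).consumerOffset) {
                return PartitionStatus.REWIND;
            }
        }
        // Rule 3: the consumer caught up at least once in the window.
        for (Sample s : samples) {
            if (s.lag == 0) {
                return PartitionStatus.OK;
            }
        }
        // Rule 4: offsets never moved (they are non-decreasing here) and lag is non-zero.
        if (first.consumerOffset == last.consumerOffset) {
            return PartitionStatus.STALL;
        }
        // Rule 5: offsets are moving, but lag grew in every interval.
        for (int i = 1; i < samples.size(); i++) {
            if (samples.get(i).lag <= samples.get(i - 1).lag) {
                return PartitionStatus.OK;
            }
        }
        return PartitionStatus.WARN;
    }

    /** Rolls partition statuses up into a single consumer-group status. */
    static GroupStatus rollUp(Collection<PartitionStatus> statuses) {
        boolean sawWarn = false;
        for (PartitionStatus status : statuses) {
            switch (status) {
                case STOP:
                case REWIND:
                case STALL:
                    return GroupStatus.ERROR;
                case WARN:
                    sawWarn = true;
                    break;
                default:
                    break;
            }
        }
        return sawWarn ? GroupStatus.WARN : GroupStatus.OK;
    }
}
```

Under these assumptions, a group whose partitions evaluate to OK and WARN rolls up to WARN, and a single STOP, REWIND, or STALL partition is enough to make the whole group ERROR. Note that with only one stored sample the window span in Rule 1 is zero, so any later `nowMs` yields STOP; a real implementation may want a minimum window size.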

**Of course some of the bugs you were seeing were fixed as well:**
  - Synchronized all SQLite DB activity. SQLite allows only one writer at a time on the DB file.
  - This fixed all DB create/update/delete issues, at the expense of sometimes blocking a DB operation while another one is in progress. This is unavoidable with SQLite. The long-term fix will be to replace SQLite with a more appropriate DB engine.
  - Fixed an issue where LogEndOffset and Lag could display incorrect values.
  - Added retry logic around building the ZkUtils object. This fixed the issue where we would not reconnect to ZooKeeper if the ZooKeeper service went down and was then restored.
  - Updated some dependency versions.
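
As an illustration of the retry fix mentioned above, here is a minimal sketch of a generic retry wrapper in Java. It is not the code from this PR: `RetrySketch`, `buildWithRetry`, and its parameters are hypothetical, and the factory stands in for whatever call builds the ZkUtils object.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not the project's actual implementation. */
final class RetrySketch {

    /**
     * Calls the factory until it succeeds or maxAttempts is reached,
     * sleeping delayMs between attempts. Rethrows the last failure.
     */
    static <T> T buildWithRetry(Callable<T> factory, int maxAttempts, long delayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        throw lastFailure;
    }
}
```

A caller would pass the connection-building code as the factory, for example `RetrySketch.buildWithRetry(() -> connect(), 5, 1000L)`, where `connect()` is a placeholder for the real construction call. A fixed delay is used here for brevity; exponential backoff would be a reasonable alternative if the service can stay down for a long time.
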
@VictorZeng merged commit 4721c8c into DarkPhoenixs:master Mar 26, 2018