adds optional per table metrics #5030
base: main
For a subset of metrics in the tablet server and scan server, this adds optional tableId tags to meters. In a follow-on change the compactor could be updated to emit per-table metrics; however, its current code is very process-oriented, and that change should be in its own commit. Each server process will automatically remove meters for tables that were deleted or that it has not been servicing in a while.
```java
 * currently have no table metrics object in the cache. It will also remove any per-table metrics
 * objects from the cache that have been inactive for a while or whose table was deleted.
 */
public synchronized void refresh() {
```
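A minimal, JDK-only sketch of the cache behavior described in the `refresh()` javadoc and the PR description. It is not the PR's code: `PerTableMetricsCache`, the `hostedTableIds`/`existingTableIds` parameters, and the injectable clock are hypothetical, and `T` stands in for a per-table metrics object whose `onRemove` callback would unregister its tagged meters.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: one metrics object per table id, evicted when the table is deleted
// or has not been active for longer than a threshold.
class PerTableMetricsCache<T> {
  private static final class Holder<M> {
    final M metrics;
    volatile long lastActiveNanos;

    Holder(M metrics, long now) {
      this.metrics = metrics;
      this.lastActiveNanos = now;
    }
  }

  private final Map<String, Holder<T>> cache = new ConcurrentHashMap<>();
  private final Function<String, T> factory; // creates metrics (and meters) for a table id
  private final Consumer<T> onRemove; // would unregister the table's meters
  private final LongSupplier clock; // nanosecond clock, injectable for testing
  private final long inactiveNanos;

  PerTableMetricsCache(Function<String, T> factory, Consumer<T> onRemove, LongSupplier clock,
      Duration inactiveAfter) {
    this.factory = factory;
    this.onRemove = onRemove;
    this.clock = clock;
    this.inactiveNanos = inactiveAfter.toNanos();
  }

  // Returns the table's metrics, creating them on first use and marking the table active.
  T get(String tableId) {
    long now = clock.getAsLong();
    Holder<T> holder = cache.computeIfAbsent(tableId, id -> new Holder<>(factory.apply(id), now));
    holder.lastActiveNanos = now;
    return holder.metrics;
  }

  int size() {
    return cache.size();
  }

  // Adds entries for hosted tables missing from the cache, then removes entries whose
  // table was deleted or that have been inactive longer than the threshold.
  synchronized void refresh(Set<String> hostedTableIds, Set<String> existingTableIds) {
    hostedTableIds.forEach(this::get);
    long now = clock.getAsLong();
    Iterator<Map.Entry<String, Holder<T>>> it = cache.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Holder<T>> e = it.next();
      boolean deleted = !existingTableIds.contains(e.getKey());
      boolean inactive = now - e.getValue().lastActiveNanos > inactiveNanos;
      if (deleted || inactive) {
        it.remove();
        onRemove.accept(e.getValue().metrics);
      }
    }
  }
}
```

In this sketch hosted tables are re-marked active on every refresh, so only tables the server has stopped hosting age out, while deleted tables are dropped on the next refresh regardless of activity.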
I'm curious if it might be better to evaluate whether per-table metrics should be added or removed when a tablet is hosted or unhosted in the ScanServer and TabletServer. There are explicit mechanisms in the TabletServer for hosting and unhosting tablets. In the ScanServer we have the TabletMetadataLoader for hosting a tablet, and we could add an evictionListener to the tabletMetadataCache to handle tablet removal. Thoughts on that?
It probably would be better to do that for all cases. In this change it's only done for the tablet server on tablet load; the other three cases you mentioned are not handled.
For the scan server I could not find a good place to register on tablet load; I will circle back and see what I can find. For now it's probably OK that the scan server does not register on load because it has no gauges, so when a scan happens it will touch meters, which will load the metrics. However, that is shaky ground: if gauges were ever used, they may not be loaded until the timer task kicks in. It would also be good to push this code into TabletHostingServer so that the metrics code can interact with the same code for each server type.
If all four cases are covered with callbacks, then we could run the timer task less frequently. The unload case I was leaving entirely to the timer task to catch.
Made some changes related to this in afa401f. I was able to optimize and centralize the code for detecting changes in the set of table ids. With those changes, a tablet being loaded can be handled efficiently by detecting whether anything needs to be done. However, a tablet being unloaded is hard to handle efficiently: when one tablet is unloaded, other hosted tablets may still have the same table id, so every tablet would need to be scanned on each unload to see if anything needs to be done. I decided not to do anything for this case and leave it to the periodic timer task, though I was able to centralize that timer task and make it more efficient.
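A JDK-only sketch of the load/unload asymmetry described in that comment, using hypothetical names (`HostedTableTracker`, `tabletLoaded`, `recompute`) rather than the code in afa401f: a load callback needs only a set-membership check, while noticing that a table is no longer hosted requires looking at every hosted tablet, so that work is left to a periodic pass.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical tracker of the table ids a server currently hosts tablets for.
class HostedTableTracker {
  private final Set<String> knownTableIds = ConcurrentHashMap.newKeySet();
  private final Consumer<String> onNewTable; // e.g. create per-table metrics

  HostedTableTracker(Consumer<String> onNewTable) {
    this.onNewTable = onNewTable;
  }

  // Load callback: cheap, because Set.add reports whether the table id is new.
  void tabletLoaded(String tableId) {
    if (knownTableIds.add(tableId)) {
      onNewTable.accept(tableId);
    }
  }

  // Periodic pass over the table ids of all hosted tablets. An unload cannot be handled
  // incrementally because other hosted tablets may share the unloaded tablet's table id.
  // Returns the table ids that are no longer hosted.
  synchronized Set<String> recompute(Collection<String> tableIdsOfHostedTablets) {
    Set<String> current = new HashSet<>(tableIdsOfHostedTablets);
    Set<String> noLongerHosted = new HashSet<>(knownTableIds);
    noLongerHosted.removeAll(current);
    knownTableIds.removeAll(noLongerHosted);
    current.forEach(this::tabletLoaded); // picks up tables missed by load callbacks
    return noLongerHosted;
  }
}
```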
The static analysis checks in the build had a really good find: a bug where I forgot to check the future for the scheduled task.
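Why an unchecked future matters for a periodic task, sketched with the JDK only: if a task scheduled with `ScheduledExecutorService` throws, its later runs are suppressed and the exception is visible only through the returned `ScheduledFuture`. `ScheduledTaskWatcher` and `failureOf` are hypothetical names; the check used in the PR is not shown here.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;

// Hypothetical helper: reports why a periodic task stopped running, if it failed.
class ScheduledTaskWatcher {
  // A periodic ScheduledFuture normally completes only if the task was cancelled or threw.
  // When the task throws, subsequent runs are suppressed and get() rethrows the cause.
  static Optional<Throwable> failureOf(ScheduledFuture<?> future) {
    if (!future.isDone() || future.isCancelled()) {
      return Optional.empty(); // still running, or stopped on purpose
    }
    try {
      future.get(); // does not block: the future is already done
      return Optional.empty();
    } catch (ExecutionException e) {
      return Optional.of(e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Optional.of(e);
    }
  }
}
```

A server would typically poll this (or block on `get()` in a dedicated thread) and log or halt when the metrics timer task dies, rather than letting per-table metrics silently stop refreshing.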
```java
return perTableMetrics.computeIfAbsent(tableId, tid -> {
  List<Meter> meters = new ArrayList<>();
  T tableMetrics = newPerTableMetrics(registry, tableId, meters::add,
      List.of(Tag.of(TABLE_ID_TAG_NAME, tid.canonical())));
```
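The `meters::add` argument above appears to let each per-table metrics object report every meter it registers, so that all of a table's meters can later be removed together. A JDK-only illustration of that pattern follows; `ToyRegistry` and `TableScanMetrics` are hypothetical stand-ins, not Micrometer (with Micrometer, removal would go through `MeterRegistry.remove(Meter)`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Toy stand-in for a meter registry (hypothetical): meters are keyed by name plus a
// tableId tag, so each table gets its own time series.
class ToyRegistry {
  final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  LongAdder counter(String name, String tableId, Consumer<String> registered) {
    String meterId = name + "{tableId=" + tableId + "}";
    registered.accept(meterId); // report the meter so its owner can remove it later
    return counters.computeIfAbsent(meterId, k -> new LongAdder());
  }
}

// Per-table metrics, created once per table id, remembering which meters they registered.
class TableScanMetrics {
  final List<String> meterIds = new ArrayList<>();
  final LongAdder queries;
  final LongAdder entriesRead;

  TableScanMetrics(ToyRegistry registry, String tableId) {
    queries = registry.counter("scan.queries", tableId, meterIds::add);
    entriesRead = registry.counter("scan.entries.read", tableId, meterIds::add);
  }

  // Removes every meter this table registered, e.g. when the table is deleted.
  void removeFrom(ToyRegistry registry) {
    meterIds.forEach(registry.counters::remove);
  }
}
```

As in the excerpt, a `ConcurrentHashMap.computeIfAbsent` keyed by table id ensures each table's metrics (and meters) are created only once.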
I'm thinking it might be more useful to have a tableName tag and use the table name instead of the tableId.
```diff
@@ -169,7 +169,7 @@
 <dependency>
   <groupId>io.micrometer</groupId>
   <artifactId>micrometer-bom</artifactId>
-  <version>1.12.2</version>
+  <version>1.13.6</version>
```
FYI, some things will have to change in the accumulo-testing Terraform contrib code when this is merged because of the version change.
`@@ -63,8 +81,55 @@ private long getTotalEntriesWritten() {`
```java
  return FileCompactor.getTotalEntriesWritten();
}

public static class TableMetrics {
```
I did some digging to see whether Micrometer has any support for dynamic tags. I found micrometer-metrics/micrometer#4097, which essentially lets you create templates for Meters that are registered when you supply the tags. I'm wondering if you had seen this and, if not, whether it would change your implementation here. I'm thinking we could create the templates (MeterProvider in Micrometer) when the servers start up, then just apply and remove them based on the table id tags.
closes #4511