Skip to content

Conversation

@tlrx
Copy link
Member

@tlrx tlrx commented Jan 29, 2020

This pull request is a first step to add instrumentation to the CacheDirectory added in #50693.

It adds a new mutable IndexInputStats object that allows to track various information about how CacheBufferedIndexInput interacts with the underlying cache file to satisfy the read operations. It keep tracks of small/large forward/backward seekings as well as the total number of bytes read from the IndexInput and the total number of bytes read/written from/to the CacheFile.

Note that the stats do not reflect the exact usage that Lucene does of the IndexInputs opened through a CacheDirectory: IndexInputStats is not aware of the read operations that are directly served at a higher level by the internal BufferedIndexInput's buffer. Instead it tracks what really hit the disk which is, I think, what is the most interesting for us at this stage.

This pull request does not expose the information through a REST API, this will be done in a follow up pull request once the low level instrumentation is validated.

@tlrx tlrx added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jan 29, 2020
@tlrx tlrx requested a review from DaveCTurner January 29, 2020 17:19
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I like the simplicity of tracking these stats directly within the CacheDirectory. I left a few comments and thoughts.

@tlrx
Copy link
Member Author

tlrx commented Jan 31, 2020

Thanks @DaveCTurner, I've applied your feedback and pushed some changes. I'm sure you have more comments, can you please have another look? Thanks!

@tlrx tlrx requested a review from DaveCTurner January 31, 2020 13:22
final String fileName = cacheFileReference.getFileName();
final byte[] copyBuffer = new byte[Math.toIntExact(Math.min(COPY_BUFFER_SIZE, end - start))];
try (IndexInput input = in.openInput(cacheFileReference.getFileName(), ioContext)) {
logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] of file [{}] to cache file", start, end, fileName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think this will not include the shard ID so we might struggle to interpret the logs when there are lots of shards to search. I think logging cacheFileReference itself gives us everything we need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I pushed 9de8453

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request for a bit more logging detail, otherwise LGTM.

@tlrx tlrx requested a review from DaveCTurner January 31, 2020 13:58
private int readDirectly(long start, long end, byte[] buffer, int offset) throws IOException {
final String fileName = cacheFileReference.getFileName();
final byte[] copyBuffer = new byte[Math.toIntExact(Math.min(COPY_BUFFER_SIZE, end - start))];
logger.trace(() -> new ParameterizedMessage("direct reading of range [{}-{}] from file [{}]", start, end, fileName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, here too :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tlrx tlrx requested a review from DaveCTurner January 31, 2020 14:14
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@tlrx
Copy link
Member Author

tlrx commented Jan 31, 2020

@elasticmachine run elasticsearch-ci/2

A CCR test failed previously with what seems a connection issue

@tlrx tlrx merged commit 229b953 into elastic:feature/searchable-snapshots Jan 31, 2020
@tlrx tlrx deleted the add-instrumentation-step-1 branch January 31, 2020 16:04
@tlrx
Copy link
Member Author

tlrx commented Jan 31, 2020

Thanks David!

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 4, 2020
This commit builds on elastic#51637, adding tracking of the total time spent fetching
data from the blob store and a linear regression model for these fetches.
tlrx added a commit that referenced this pull request Feb 6, 2020
This commit adds a REST API that exposes the various CacheDirectory 
stats added in #51637. It adds the necessary action, transport action 
and request and response objects as well as a new qa:rest project for 
REST tests. The REST endpoint is _searchable_snapshots/stats and 
can be filtered by index names.
DaveCTurner added a commit that referenced this pull request Feb 24, 2020
)

This commit builds on #51637, adding tracking of the total time spent fetching
data from the blob store.

Relates #50999.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants