Improve Snapshot Logging by joshua-adams-1 · Pull Request #137470 · elastic/elasticsearch

joshua-adams-1 · 2025-10-31T16:18:43Z

Convert the start/end times reported in snapshot logging from raw
milliseconds into human-readable date strings

Relates to: ES-12794


joshua-adams-1 · 2025-10-31T16:20:35Z

@DiannaHohensee I am assigning you as the reviewer since you created the ticket. The acceptance criteria were a bit vague: are these the only instances of snapshot logging where you wanted human-readable dates? I searched through the snapshots/ repo and found these two files, but it's highly likely I've missed some!

As a note, I will push separate PRs to address the other two issues from the ticket:

Stop logging when there are no snapshots left
Add a log message indicating that all snapshots have finished or been paused


elasticsearchmachine · 2025-11-03T12:16:48Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

burqen

It's great to have more human-friendly logs! I do have some comments, though.

burqen · 2025-11-05T10:47:08Z

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

-                    "[{}] Writing [{}] of size [{}b] to [{}] took [{}ms]",
+                    "[{}] Writing [{}] of size [{}b] to [{}]. Started at [{}], ended at [{}] and took [{}ms]",


When calculating the human-readable form of the start and end times, we risk those values differing from the timestamps provided by the logger, since the logger might be configured with a different ZoneId (log4j can be configured by the user). This would be confusing for users. A safer approach would be to print only the elapsed time, but in human-readable form.

This could be done using a TimeValue.
Example here: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/ClusterStateObserver.java#L143

I tried to figure out whether there is a way to get the zone ID from log4j, but there doesn't seem to be one, at least according to ChatGPT.

Good catch! Coming back to this PR, I don't think ES-12794 was intended to modify this log message (it was intended to apply only within the SnapshotShutdownProgressTracker). However, there is definite value in using the TimeValue class to improve the logging format, so I will incorporate that in the next revision.
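
To illustrate the concern raised in this thread: the same instant renders differently depending on the zone used to format it. The following is a minimal sketch using only java.time, not code from this PR. The class and method names are hypothetical, and the +09:00 offset merely stands in for whatever zone a user might configure for log4j.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class ZoneMismatchSketch {
    // Hypothetical helper: render epoch millis as a local date-time in the given zone.
    static String render(long epochMillis, ZoneId zone) {
        return DateTimeFormatter.ISO_LOCAL_DATE_TIME.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long startedAt = 0L; // the Unix epoch, chosen only to keep the example predictable

        // What the message body would contain if the code formats in UTC.
        String inMessage = render(startedAt, ZoneOffset.UTC);
        // What a logger configured with a different zone would show for the same instant.
        String inLoggerZone = render(startedAt, ZoneOffset.ofHours(9));

        System.out.println("message body:  " + inMessage);
        System.out.println("logger's zone: " + inLoggerZone);
        System.out.println("identical? " + inMessage.equals(inLoggerZone));
    }
}
```

Because neither string carries a zone marker, a reader has no way to tell which one is "right", which is the argument for either printing only elapsed time or labelling the zone explicitly.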

burqen · 2025-11-05T12:16:39Z

server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java

-                Node shutdown cluster state update received at [{} millis]. \
-                Finished signalling shard snapshots to pause at [{} millis]. \
+                Node shutdown cluster state update received at [{}] [{} millis]. \
+                Finished signalling shard snapshots to pause at [{}] [{} millis]. \


The same comment about converting to clock time outside of the logger applies here.

In this case the value of wall-clock time seems higher than in the write-blob case, since we are reporting on a past event rather than on something that just happened. Maybe it's worth it in this case?

burqen · 2025-11-05T12:20:33Z

server/src/main/java/org/elasticsearch/common/date/DateUtils.java

+    public static String convertMillisToDateTime(long millis) {
+        DateTimeFormatter formatter = DateTimeFormatter.ISO_LOCAL_DATE_TIME.withZone(ZoneId.of("UTC"));
+        return formatter.format(Instant.ofEpochMilli(millis));
+    }


If we need to do this conversion to wall-clock time (see my other comments), I think we should try to use the formatters in org.elasticsearch.common.time rather than using Java's DateTimeFormatter directly, especially since we get the millis from an org.elasticsearch.common.time.TimeProvider (ThreadPool).

There is a DateUtils class already that I think would be a good fit, rather than creating a new one, see org.elasticsearch.common.time.DateUtils.

Good idea. I have removed the unnecessary util method and inlined a DateFormatter.

burqen · 2025-11-05T12:21:44Z

server/src/test/java/org/elasticsearch/common/date/DateUtilsTests.java

+import java.time.ZoneId;
+import java.time.format.DateTimeFormatter;
+
+public class DateUtilsTests extends ESTestCase {


Nice test coverage 👍

... and it's gone 🤣


joshua-adams-1 · 2025-11-07T13:35:07Z

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

                blobStoreSnapshotMetrics.incrementCountersForPartUpload(partBytes, uploadTimeInMillis);
                logger.trace(
-                    "[{}] Writing [{}] of size [{}b] to [{}] took [{}ms]",
+                    "[{}] Writing [{}] of size [{}b] to [{}] took [{}/{}ms]",


Improving the log. Example: ...took [1.4s/1400ms]
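
A minimal sketch of how a combined `[human-readable/raw ms]` value like the one in the example could be produced. It assumes a simplified, hypothetical `humanReadable` helper built on the JDK only. It is not Elasticsearch's TimeValue and does not claim to reproduce its unit selection or rounding rules.

```java
import java.util.Locale;

class ElapsedFormatSketch {
    // Hypothetical, simplified stand-in for a human-readable duration:
    // milliseconds below one second, seconds below one minute, minutes otherwise.
    static String humanReadable(long millis) {
        if (millis < 1_000) {
            return millis + "ms";
        }
        if (millis < 60_000) {
            return String.format(Locale.ROOT, "%.1fs", millis / 1_000.0);
        }
        return String.format(Locale.ROOT, "%.1fm", millis / 60_000.0);
    }

    // Builds the trailing clause of the log message: readable form first, raw millis second.
    static String tookClause(long millis) {
        return String.format(Locale.ROOT, "took [%s/%dms]", humanReadable(millis), millis);
    }

    public static void main(String[] args) {
        System.out.println(tookClause(1_400)); // prints "took [1.4s/1400ms]"
    }
}
```

Keeping the raw millisecond value next to the readable one means existing log parsers that expect a plain number still have something exact to match on.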

burqen

This is a lot cleaner, I think. Using UTC for the date formatting is also probably a reasonable middle ground. Approving 👍

joshua-adams-1 · 2025-11-10T09:41:46Z

server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java

    );

    private static final Logger logger = LogManager.getLogger(SnapshotShutdownProgressTracker.class);
+    private static final DateFormatter DATE_TIME_FORMATTER = DateFormatter.forPattern("strict_date_optional_time");


Using the same approach as here

I haven't delved into the possibilities, but looking at the superficial outcome, a log message from the testing you added looks like this:

[2025-11-11T02:25:35,945][INFO ][o.e.s.SnapshotShutdownProgressTracker][testTrackerPauseTimestamp] Current active shard snapshot stats on data node [local-node-id-for-test]. Node shutdown cluster state update received at [1970-01-01T00:00:00.000Z]. Finished signalling shard snapshots to pause at [1970-01-01T00:00:00.000Z]. Time between the node shutdown cluster state update and signalling shard snapshots to pause is [0 millis]Number shard snapshots running [0]. Number shard snapshots waiting for master node reply to status update request [0] Shard snapshot completion stats since shutdown began: Done [0]; Failed [0]; Aborted [0]; Paused [0]

Specifically, [1970-01-01T00:00:00.000Z]. Is there any way to get a UTC label on this so we know (without looking at the code) what timezone is being reported?

N.B. I used DateFormatter.formatMillis() here, which uses UTC by default.

N.B. The Z at the end is actually the result of ZoneOffset.UTC.toString(). However, this isn't very user-friendly, so I have changed the log message to read:

... Node shutdown cluster state update received at [1970-01-01T00:00:00.000Z UTC] ...

Works for me, thanks 👍
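
The labelled output discussed in this thread can be approximated with plain java.time. This is a sketch only, assuming a hypothetical helper named `withUtcLabel`. The PR itself uses Elasticsearch's DateFormatter, which is not reproduced here.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class UtcLabelSketch {
    // ISO-style timestamp with millisecond precision. The 'X' pattern letter prints "Z"
    // for a zero offset, which matches ZoneOffset.UTC.toString().
    private static final DateTimeFormatter ISO_MILLIS_UTC =
        DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSX").withZone(ZoneOffset.UTC);

    // Hypothetical helper: append an explicit " UTC" so the zone is obvious to readers
    // who do not know that a trailing "Z" means UTC.
    static String withUtcLabel(long epochMillis) {
        return ISO_MILLIS_UTC.format(Instant.ofEpochMilli(epochMillis)) + " UTC";
    }

    public static void main(String[] args) {
        System.out.println("received at [" + withUtcLabel(0L) + "]");
        // prints "received at [1970-01-01T00:00:00.000Z UTC]"
    }
}
```

The suffix is redundant for anyone familiar with ISO 8601, but it costs four characters and removes the need to look at the code.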

DiannaHohensee

Looks good to me, just a nit and wondering if we can tack on UTC without too much trouble.

Thanks, Anton, for doing the review heavy lifting.

DiannaHohensee · 2025-11-11T02:32:59Z

server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java

-                Finished signalling shard snapshots to pause at [{} millis]. \
+                Node shutdown cluster state update received at [{}]. \
+                Finished signalling shard snapshots to pause at [{}]. \
+                Time between the node shutdown cluster state update and signalling shard snapshots to pause is [{} millis]\


This line is missing a period and a space (". ") at the end, so the output looks like this:

... Time between the node shutdown cluster state update and signalling shard snapshots to pause is [0 millis]Number shard snapshots running [0]. ...
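
The missing separator follows from how a trailing `\` behaves in a Java text block: it suppresses the line break without inserting anything in its place. A small sketch with shortened, hypothetical message strings (text blocks require Java 15 or later):

```java
import java.util.Locale;

class TextBlockContinuationSketch {
    // The backslash joins the two lines directly, so the sentences run together.
    static final String MISSING_SEPARATOR = """
        pause is [%d millis]\
        Number shard snapshots running [%d].""";

    // Adding ". " before the backslash restores the sentence boundary.
    // The space survives because it is not trailing whitespace: the backslash follows it.
    static final String WITH_SEPARATOR = """
        pause is [%d millis]. \
        Number shard snapshots running [%d].""";

    static String render(String template) {
        return String.format(Locale.ROOT, template, 0, 0);
    }

    public static void main(String[] args) {
        System.out.println(render(MISSING_SEPARATOR));
        System.out.println(render(WITH_SEPARATOR));
    }
}
```

The first line printed reproduces the run-together `[0 millis]Number` seen in the test output, and the second shows the corrected form.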

DiannaHohensee · 2025-11-11T02:53:47Z

server/src/main/java/org/elasticsearch/snapshots/SnapshotShutdownProgressTracker.java

    );

    private static final Logger logger = LogManager.getLogger(SnapshotShutdownProgressTracker.class);
+    private static final DateFormatter DATE_TIME_FORMATTER = DateFormatter.forPattern("strict_date_optional_time");




Improve Snapshot Logging

cb318cb


joshua-adams-1 requested a review from DiannaHohensee October 31, 2025 16:18

joshua-adams-1 self-assigned this Oct 31, 2025

joshua-adams-1 added the >non-issue and :Distributed/Snapshot/Restore labels Oct 31, 2025

elasticsearchmachine added the v9.3.0 label Oct 31, 2025

elasticsearchmachine and others added 4 commits October 31, 2025 16:26

[CI] Auto commit changes from spotless

229e3ab

Merge branch 'main' into snapshot-logging-improvements

053d536

Update unit tests

741ed12

Merge branch 'snapshot-logging-improvements' of https://github.com/jo…

2e0f6b5

…shua-adams-1/elasticsearch into snapshot-logging-improvements

joshua-adams-1 marked this pull request as ready for review November 3, 2025 12:16

elasticsearchmachine added the Team:Distributed Coordination (obsolete) label Nov 3, 2025

burqen requested changes Nov 5, 2025


joshua-adams-1 and others added 6 commits November 7, 2025 12:26

Remove DateUtils and use TimeValue

2cc1aa5

Merge branch 'main' into snapshot-logging-improvements

d8d949a

Use custom DATE_TIME_FORMATTER

ced5d1e

[CI] Auto commit changes from spotless

9e4ba81

Spotless apply

961e9e9

Merge branch 'snapshot-logging-improvements' of https://github.com/jo…

2065287

…shua-adams-1/elasticsearch into snapshot-logging-improvements

joshua-adams-1 commented Nov 7, 2025


Merge branch 'main' into snapshot-logging-improvements

a5652f8

joshua-adams-1 requested a review from burqen November 7, 2025 16:13

burqen approved these changes Nov 10, 2025


joshua-adams-1 commented Nov 10, 2025


DiannaHohensee approved these changes Nov 11, 2025


joshua-adams-1 added 3 commits November 11, 2025 14:01

Add UTC to log message

1e9a0ab

Merge branch 'snapshot-logging-improvements' of https://github.com/jo…

84e916b

…shua-adams-1/elasticsearch into snapshot-logging-improvements

Merge branch 'main' into snapshot-logging-improvements

9163bf6

joshua-adams-1 added 2 commits November 12, 2025 09:18

Merge branch 'main' into snapshot-logging-improvements

ef3f972

Merge branch 'main' into snapshot-logging-improvements

48baaeb

joshua-adams-1 merged commit 1ce5c7c into elastic:main Nov 13, 2025
34 checks passed

joshua-adams-1 deleted the snapshot-logging-improvements branch November 13, 2025 11:37


4 participants
