Conversation
|
|
||
| private static Map<String, Object> prepareMap(Result esqlResult, String query) { | ||
| Map<String, Object> messageFields = new HashMap<>(); | ||
| messageFields.put("elasticsearch.slowlog.message", esqlResult.executionInfo().clusterAliases()); |
There was a problem hiding this comment.
This looks a bit surprising, message contains cluster aliases, is it intentional?
There was a problem hiding this comment.
I'm not sure we need that at all.
SearchSlowLog has searchContext.indexShard().shardId() in there, maybe we want something different
There was a problem hiding this comment.
I'm removing it for now, in case we'll reintroduce it
| messageFields.put("elasticsearch.slowlog.planning.took_millis", esqlResult.executionInfo().planningTookTime().millis()); | ||
| messageFields.put("elasticsearch.slowlog.search_type", "ESQL"); | ||
| String source = escapeJson(query); | ||
| messageFields.put("elasticsearch.slowlog.source", source); |
There was a problem hiding this comment.
I think source is a bit ambiguous. Should we call it query here?
There was a problem hiding this comment.
Yeah, I don't like it very much, query would be much better.
I'm using source only because it's what Search does, but I'm not sure we want to be consistent with it
There was a problem hiding this comment.
I thought about this, source for _search is the actual request body, while in our case it's only the query string.
I'll rename it as "query" for now.
| return; // TODO review, it happens in some tests, not sure if it's a thing also in prod | ||
| } | ||
| long tookInNanos = esqlResult.executionInfo().overallTook().nanos(); | ||
| var queryWarnThreshold = clusterSettings.get(ESQL_SLOWLOG_THRESHOLD_QUERY_WARN_SETTING).nanos(); |
There was a problem hiding this comment.
NIT: org.elasticsearch.common.settings.Setting#get logic is a bit tricky.
We should copy the value to the field and watch for updates using org.elasticsearch.common.settings.AbstractScopedSettings#initializeAndWatch
| final String node1, node2; | ||
| if (randomBoolean()) { | ||
| internalCluster().ensureAtLeastNumDataNodes(2); | ||
| node1 = randomDataNode().getName(); | ||
| node2 = randomValueOtherThan(node1, () -> randomDataNode().getName()); | ||
| } else { | ||
| node1 = randomDataNode().getName(); | ||
| node2 = randomDataNode().getName(); | ||
| } |
ivancea
left a comment
There was a problem hiding this comment.
Looks good! Added some comments/questions
| TimeValue.timeValueNanos(-1), | ||
| TimeValue.timeValueMillis(-1), |
There was a problem hiding this comment.
Is there some reason for the default to be -1ns instead of -1ms? I guess both would work the same, right?
There was a problem hiding this comment.
I think it's the same. let me change it
| // TODO: this logic assumes a single phase execution model, so it may need to altered once INLINESTATS is made CCS compatible | ||
| execInfo.markEndPlanning(); | ||
| if (execInfo.isCrossClusterSearch()) { | ||
| execInfo.markEndPlanning(); |
There was a problem hiding this comment.
Was this a bug? Was it working when not CCS?
There was a problem hiding this comment.
Not really.
This method was introduced with CCS implementation, and planningTookTime was only used in that contex.
Now we need it also for other purposes, so I'll set it also for non-CCS queries.
| latch.countDown(); | ||
| } | ||
| })); | ||
| latch.await(30, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Are we failing the test if the countdown isn't 0? I think await() does nothing about it, right?
Not sure what will happen here exactly if the query fails.
I see other test doing an assert like:
assertEquals("All requests must respond, requests: " + requests, 0, latch.getCount());
There was a problem hiding this comment.
Please use safeAwait(latch); instead. It should handle correctly the cases when operation times out.
There was a problem hiding this comment.
I see other test doing an assert like:
One more assert won't hurt, I'm adding it.
|
|
||
| private static final Logger queryLogger = LogManager.getLogger(SLOWLOG_PREFIX + ".query"); | ||
|
|
||
| private static final ToXContent.Params FORMAT_PARAMS = new ToXContent.MapParams(Collections.singletonMap("pretty", "false")); |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @luigidellaquila, I've created a changelog YAML for you. |
costin
left a comment
There was a problem hiding this comment.
Left some comments.
It would be really useful to see an example of what the esql slow logs looks like (I assume it will be part of the doc).
| planTelemetryManager.publish(planTelemetry, true); | ||
| slowLog.onQueryPhase(x, request.query()); |
There was a problem hiding this comment.
To help with encapsulation, please create two internal methods called when the query is successfully executed or is failing and move the metrics, telemetry and now the slow log there (e.g. onQuerySuccesful(), onQueryFailure())
| this(settings, null); | ||
| } | ||
|
|
||
| public void onQueryPhase(Result esqlResult, String query) { |
There was a problem hiding this comment.
This method and onQueryFailure are very similar - same code should be reused across both of them.
| jsonFields.put("elasticsearch.slowlog.error.message", exception.getMessage() == null ? "" : exception.getMessage()); | ||
| jsonFields.put("elasticsearch.slowlog.error.type", exception.getClass().getName()); | ||
| jsonFields.put("elasticsearch.slowlog.took", took); | ||
| jsonFields.put("elasticsearch.slowlog.took_millis", took / 1_000_000); | ||
| return new ESLogMessage().withFields(jsonFields); | ||
| } | ||
|
|
||
| private static Map<String, Object> prepareMap(Result esqlResult, String query, boolean success, User user) { | ||
| Map<String, Object> messageFields = new HashMap<>(); | ||
| if (user != null) { | ||
| messageFields.put("user.name", user.principal()); | ||
| } | ||
| if (esqlResult != null) { | ||
| messageFields.put("elasticsearch.slowlog.took", esqlResult.executionInfo().overallTook().nanos()); | ||
| messageFields.put("elasticsearch.slowlog.took_millis", esqlResult.executionInfo().overallTook().millis()); | ||
| messageFields.put("elasticsearch.slowlog.planning.took", esqlResult.executionInfo().planningTookTime().nanos()); | ||
| messageFields.put("elasticsearch.slowlog.planning.took_millis", esqlResult.executionInfo().planningTookTime().millis()); | ||
| } | ||
| String source = escapeJson(query); | ||
| messageFields.put("elasticsearch.slowlog.success", success); | ||
| messageFields.put("elasticsearch.slowlog.search_type", "ESQL"); | ||
| messageFields.put("elasticsearch.slowlog.query", source); | ||
| return messageFields; |
There was a problem hiding this comment.
Extract constants out of these repetitive strings including during their construction, such as the common prefix ("elasticsearch.slowlog")
Also there's a repetition on took/took_millis, which seems to be overridden always in of method.
|
Thanks for the feedback folks! @costin I added two examples of slow log output to the PR description. |
jfreden
left a comment
There was a problem hiding this comment.
Looks good! Just a comment on combining the SlowLogFieldProviders.
| } | ||
|
|
||
| @Override | ||
| public Map<String, String> queryFields() { |
| return fields.stream() | ||
| .flatMap(f -> f.indexFields().entrySet().stream()) | ||
| .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)); | ||
| SlowLogFieldProvider slowLogFieldProvider = new SlowLogFieldProvider() { |
There was a problem hiding this comment.
The point of this is that it will merge all field providers provided by the plugins into one (in practice I think that's only the one in the security plugin). So this should ideally have an implementation for merging queryFields too and then this instance should be passed to the PluginService (instead of the list of providers).
If you don't want to implement the merge for indexFields and searchFields in the provider without index settings you could add an empty default implementation in the interface in the same way you did for queryFields.
jfreden
left a comment
There was a problem hiding this comment.
Thanks for implementing the suggested changes. LGTM pending CI and the comment I left above.
|
Thanks @jfreden! |
|
SlowLog -> QueryLog The intention is to make it more generic in the future, so that we can log queries not based on the execution time, but also based on other criteria. |
|
Docs PR here elastic/docs-content#816 |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Adding Query Log to ES|QL.
It can be configured at cluster level, using the following cluster settings (TimeValue):
By default, queries will be logged to
Example log in case of success:
Example log in case of failure:
TODO