ES|QL - Add parsing, preanalysis and analysis timing information to profile#139540
Conversation
|
Hi @carlosdelest, I've created a changelog YAML for you. |
…preanalysis-profiling' into enhancement/esql-analysis-preanalysis-profiling
…preanalysis-profiling' into enhancement/esql-analysis-preanalysis-profiling
…preanalysis-profiling' into enhancement/esql-analysis-preanalysis-profiling
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| assert ThreadPool.assertCurrentThreadPool(ThreadPool.Names.SEARCH); | ||
|
|
||
| executionInfo.markBeginPreAnalysis(); | ||
| PreAnalyzer.PreAnalysis preAnalysis = preAnalyzer.preAnalyze(parsed); |
There was a problem hiding this comment.
I'd like to suggest we split preanalysis into preanalysis as above and dependency-resolution (as FC calls) as it becoming a significant dependency for us.
| private transient TimeSpan.Builder preAnalysisTimeSpanBuilder; // Builder for preAnalysisTimeSpan | ||
| private transient TimeSpan analysisTimeSpan; // time elapsed for plan analysis | ||
| private transient TimeSpan.Builder analysisTimeSpanBuilder; // Builder for analysisTimeSpan | ||
| private TimeValue overallTook; |
There was a problem hiding this comment.
EsqlExecutionInfo has grown to almost 800 lines and has multiple responsibilities now (such as ccs info tracking, profiling, status reportin, isPartial, isStopped). Currently we are thinking about a ways to manage that.
I wonder if that would make sense to group all transient timers and define them in a new private final transient PlanningProfile field? This would allow to move some of this class complexity elsewhere.
There was a problem hiding this comment.
Makes sense - I've created a PlanningProfile class and moved planning related profiling there.
|
|
||
| public TimeSpan parsingTimeSpan() { | ||
| return parsingTimeSpan; | ||
| } |
There was a problem hiding this comment.
All of the markBegin** and markEnd** do the same things: verify the state and perform the action.
I wonder if we could introduce a component with Builder and Span that verifies those state transition.
I imagine something like:
class TimeSpanRecorder {
private TimeSpan.Builder builder;
private TimeSpan span;
public void start() {/*assert is called first time and start a builder*/}
public void stop() {/*assert is called first time and create a span*/}
public TimeSpan get() {..}
}
All usages could be simplified to:
executionInfo.planningProfile().analysis().start();
...
executionInfo.planningProfile().analysis().stop();
| parsingMarker = new TimeSpanMarker(PARSING, false, parsing); | ||
| preAnalysisMarker = new TimeSpanMarker(PRE_ANALYSIS, false, preAnalysis); | ||
| dependencyResolutionMarker = new TimeSpanMarker(DEPENDENCY_RESOLUTION, true, dependencyResolution); | ||
| analysisMarker = new TimeSpanMarker(ANALYSIS, true, analysis); |
There was a problem hiding this comment.
I believe dependencyResolutionMarker and analysisMarker are having allowMultipleCalls=true because of the possible analysis retry. I wonder if in this case we should sum the the duration instead of discarding the first attempt? The query is getting longer because of extra attempts.
Possibly we want to record the fact of retrying and total FieldCaps call count as well, but I believe this is out of scope of this change.
There was a problem hiding this comment.
I believe dependencyResolutionMarker and analysisMarker are having allowMultipleCalls=true because of the possible analysis retry.
Correct
I wonder if in this case we should sum the the duration instead of discarding the first attempt?
We're doing that - in case start() is invoked again when multiple invocations are allowed, we keep the original starting time.
We could count etc but let's get this change in and we can refine later if needed - at the very least we will get a longer time in dependency resolution / analysis that will hint about having multiple retriees.
There was a problem hiding this comment.
We're doing that - in case start() is invoked again when multiple invocations are allowed, we keep the original starting time.
I am not sure this is correct:
maker1.start();
maker1.stop();
...
maker2.start();
maker2.stop();
...
maker1.start();
maker1.stop();
This way the maker1 would keep the entire duration from first start to the last stop call, not the combined duration of first and second intervals.
This PR is a big improvement on its own, I do not mind changing repeated start/stop behavior later.
There was a problem hiding this comment.
I see your point - I was thinking on total duration, not in summing up the individual steps.
Yep, let's refine that behavior later 👍
…ysis-preanalysis-profiling # Conflicts: # server/src/main/resources/transport/upper_bounds/9.3.csv
* upstream/main: (253 commits) Adds ST_SIMPLIFY geo spatial function (elastic#136309) Take control of max clause count verification in Lucene searcher (elastic#139752) [ML] Unmute Inference Test (elastic#139765) Parameterize the vector operation benchmark tests (elastic#139735) Fix node reduction pushdown tests for release tests (elastic#139548) Fix flakiness in TSDataGenerationHelper (elastic#139759) CPS: Copy existing resolved index expressions when constructing a new `SearchRequest` from an existing one (elastic#139596) Add release notes for v9.1.9 release (elastic#139674) Add lucene query for wildcards on high cardinality keyword fields. (elastic#139746) Suppress Tika entitlement warnings from AWT (elastic#139711) Check field storage when synthetic source is enabled, in tests (elastic#139715) Refactor VectorSimilarityType to know about its corresponding Function (elastic#139678) Merge fixes from patch branch back into main (elastic#139721) Define native bulk operations for vector square distance (elastic#139198) Use LongUpDownCounter for Linked Project Error Metrics (elastic#139657) ESQL: Add javadoc that explains version-aware planning (elastic#139706) Add helper to pick node for reindex relocation (elastic#139081) Fix auth serialization randomized version test (elastic#139182) ES|QL - Add parsing, preanalysis and analysis timing information to profile (elastic#139540) Mute org.elasticsearch.persistent.ClusterPersistentTasksCustomMetadataTests testMinVersionSerialization elastic#139741 ...
Adds parsing, preanalysis and analysis timing information to the profile output:
{ "profile": { "query": { "start_millis": 1765995832363, "stop_millis": 1765995832370, "took_millis": 7, "took_nanos": 7435667 }, "planning": { "start_millis": 1765995832363, "stop_millis": 1765995832366, "took_millis": 3, "took_nanos": 2865708 }, "parsing": { "start_millis": 1765995832363, "stop_millis": 1765995832364, "took_millis": 1, "took_nanos": 750125 }, "preanalysis": { "start_millis": 1765995832364, "stop_millis": 1765995832364, "took_millis": 0, "took_nanos": 31791 }, "dependency_resolution": { "start_millis": 1765995832364, "stop_millis": 1765995832365, "took_millis": 1, "took_nanos": 1165791 }, "analysis": { "start_millis": 1765995832365, "stop_millis": 1765995832365, "took_millis": 0, "took_nanos": 403458 } }dependency_resolution include a number of operations related to dependency resolution:
This helps understanding the amount of time ES|QL spends doing non planning operations, including specifically field_caps checks.