[6.8] Fix delete_expired_data/nightly maintenance when many model snapshots need deleting #57174
Conversation
|
Pinging @elastic/ml-core (:ml) |
| List<JobSnapshotId> snapshotIds = new ArrayList<>(); | ||
| for (SearchHit hit : searchResponse.getHits()) { | ||
| modelSnapshots.add(ModelSnapshot.fromJson(hit.getSourceRef())); | ||
| JobSnapshotId idPair = new JobSnapshotId( |
There was a problem hiding this comment.
The logic here is different to the later branches as it doesn't have the new model snapshot retention options added in #56125
| ForecastRequestStats forecastRequestStats = ForecastRequestStats.LENIENT_PARSER.apply(parser, null); | ||
| if (forecastRequestStats.getExpiryTime().toEpochMilli() < cutoffEpochMs) { | ||
| forecastsToDelete.add(forecastRequestStats); | ||
| String expiryTime = stringFieldValueOrNull(hit, ForecastRequestStats.EXPIRY_TIME.getPreferredName()); |
There was a problem hiding this comment.
In 6.8 the doc value could be a Long rather than a String. It's why TimeField has this case:
What this means in practice is that a user running 6.8 who first used ML in 5.x will end up seeing the warning on lines 139-140 repeatedly and won't get any cleanup.
It's still OK to use stringFieldValueOrNull() to extract fields mapped as keyword or text from hits, but for fields mapped as date in 6.8 the code needs to handle both Long and String.
There was a problem hiding this comment.
Good point thanks.
I'm curious though why does this code throw if the object is a Long? This is from the 6.8 branch
There was a problem hiding this comment.
The tricky part is the condition checks the value is not a long. Thus, the logic there is that prior to 6.0, we expect a long. If it's not, then something's gone wrong. Otherwise, we fall through the last return of the method. Pretty confusing, I know.
|
I pushed another commit changing the date parsing. Nano second date format was not supported in 6.8 so I dropped the parsing of the fractional component and now expect the doc_value object to be either a String or a Long |
The queries performed by the expired data removers pull back entire documents where only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000.
If the user is suffering with many accumulated snapshots that were not cleaned up due to #47103 the size of this search response could be very large. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.
Backport of #57041