@jloleysens jloleysens commented Sep 16, 2021

Summary

Fix #112164

The following was reported for successive runs of a large CSV export on CI. TL;DR: the CSV row count varied from run to run and was always below the expected total of 4675.

Data
Run scan outputs:
------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 171
   │ proc [kibana] results 0
------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 171
   │ proc [kibana] results 0
------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 171
   │ proc [kibana] results 0
-------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 146
   │ proc [kibana] results 0
-------------------
   │ proc [kibana] searchBody {
   │ proc [kibana]   "fields": [
   │ proc [kibana]     {
   │ proc [kibana]       "field": "*",
   │ proc [kibana]       "include_unmapped": "true"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "customer_birth_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "order_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "products.created_on",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "sort": [
   │ proc [kibana]     {
   │ proc [kibana]       "order_date": {
   │ proc [kibana]         "order": "desc",
   │ proc [kibana]         "unmapped_type": "boolean"
   │ proc [kibana]       }
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "track_total_hits": true,
   │ proc [kibana]   "script_fields": {},
   │ proc [kibana]   "stored_fields": [
   │ proc [kibana]     "*"
   │ proc [kibana]   ],
   │ proc [kibana]   "runtime_mappings": {},
   │ proc [kibana]   "_source": false,
   │ proc [kibana]   "query": {
   │ proc [kibana]     "bool": {
   │ proc [kibana]       "must": [],
   │ proc [kibana]       "filter": [
   │ proc [kibana]         {
   │ proc [kibana]           "range": {
   │ proc [kibana]             "order_date": {
   │ proc [kibana]               "format": "strict_date_optional_time",
   │ proc [kibana]               "gte": "2019-04-27T23:56:51.374Z",
   │ proc [kibana]               "lte": "2019-08-23T16:18:51.821Z"
   │ proc [kibana]             }
   │ proc [kibana]           }
   │ proc [kibana]         }
   │ proc [kibana]       ],
   │ proc [kibana]       "should": [],
   │ proc [kibana]       "must_not": []
   │ proc [kibana]     }
   │ proc [kibana]   }
   │ proc [kibana] }
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 147
   │ proc [kibana] results 0
   │ proc [kibana] this.csvRowCount 4647
-------------------
   │ proc [kibana] searchBody {
   │ proc [kibana]   "fields": [
   │ proc [kibana]     {
   │ proc [kibana]       "field": "*",
   │ proc [kibana]       "include_unmapped": "true"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "customer_birth_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "order_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "products.created_on",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "sort": [
   │ proc [kibana]     {
   │ proc [kibana]       "order_date": {
   │ proc [kibana]         "order": "desc",
   │ proc [kibana]         "unmapped_type": "boolean"
   │ proc [kibana]       }
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "track_total_hits": true,
   │ proc [kibana]   "script_fields": {},
   │ proc [kibana]   "stored_fields": [
   │ proc [kibana]     "*"
   │ proc [kibana]   ],
   │ proc [kibana]   "runtime_mappings": {},
   │ proc [kibana]   "_source": false,
   │ proc [kibana]   "query": {
   │ proc [kibana]     "bool": {
   │ proc [kibana]       "must": [],
   │ proc [kibana]       "filter": [
   │ proc [kibana]         {
   │ proc [kibana]           "range": {
   │ proc [kibana]             "order_date": {
   │ proc [kibana]               "format": "strict_date_optional_time",
   │ proc [kibana]               "gte": "2019-04-27T23:56:51.374Z",
   │ proc [kibana]               "lte": "2019-08-23T16:18:51.821Z"
   │ proc [kibana]             }
   │ proc [kibana]           }
   │ proc [kibana]         }
   │ proc [kibana]       ],
   │ proc [kibana]       "should": [],
   │ proc [kibana]       "must_not": []
   │ proc [kibana]     }
   │ proc [kibana]   }
   │ proc [kibana] }
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 139
   │ proc [kibana] results 0
----------------
   │ proc [kibana] searchBody {
   │ proc [kibana]   "fields": [
   │ proc [kibana]     {
   │ proc [kibana]       "field": "*",
   │ proc [kibana]       "include_unmapped": "true"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "customer_birth_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "order_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "products.created_on",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "sort": [
   │ proc [kibana]     {
   │ proc [kibana]       "order_date": {
   │ proc [kibana]         "order": "desc",
   │ proc [kibana]         "unmapped_type": "boolean"
   │ proc [kibana]       }
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "track_total_hits": true,
   │ proc [kibana]   "script_fields": {},
   │ proc [kibana]   "stored_fields": [
   │ proc [kibana]     "*"
   │ proc [kibana]   ],
   │ proc [kibana]   "runtime_mappings": {},
   │ proc [kibana]   "_source": false,
   │ proc [kibana]   "query": {
   │ proc [kibana]     "bool": {
   │ proc [kibana]       "must": [],
   │ proc [kibana]       "filter": [
   │ proc [kibana]         {
   │ proc [kibana]           "range": {
   │ proc [kibana]             "order_date": {
   │ proc [kibana]               "format": "strict_date_optional_time",
   │ proc [kibana]               "gte": "2019-04-27T23:56:51.374Z",
   │ proc [kibana]               "lte": "2019-08-23T16:18:51.821Z"
   │ proc [kibana]             }
   │ proc [kibana]           }
   │ proc [kibana]         }
   │ proc [kibana]       ],
   │ proc [kibana]       "should": [],
   │ proc [kibana]       "must_not": []
   │ proc [kibana]     }
   │ proc [kibana]   }
   │ proc [kibana] }
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 42
   │ proc [kibana] results 0
   │ proc [kibana] this.csvRowCount 454
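
To make the shortfall explicit, the per-page "results N" lines of a run can be summed and compared with the expected total of 4675. The following is a minimal sketch; the helper names (sumPageSizes, missingRows) are illustrative and not part of the reporting code.

```typescript
// Expected row count, taken from the PR description.
const EXPECTED_TOTAL = 4675;

// Add up every "results N" line in a chunk of Kibana log output.
// Lines such as "this.csvRowCount ..." or "total hits ..." are ignored.
function sumPageSizes(log: string): number {
  let total = 0;
  for (const line of log.split('\n')) {
    const match = /results (\d+)\s*$/.exec(line);
    if (match) total += Number(match[1]);
  }
  return total;
}

// How many rows a run is short of the expected total.
function missingRows(log: string, expected: number = EXPECTED_TOTAL): number {
  return expected - sumPageSizes(log);
}

// One of the scroll runs above: nine full pages, then 147 hits, then an empty page.
const scrollRun = [
  ...Array.from({ length: 9 }, () => '   │ proc [kibana] results 500'),
  '   │ proc [kibana] results 147',
  '   │ proc [kibana] results 0',
].join('\n');

console.log(sumPageSizes(scrollRun)); // 4647, matching the logged this.csvRowCount
console.log(missingRows(scrollRun)); // 28
```

Every scroll run above returned nine full pages of 500, so the missing rows all come from the short final page.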

It appears that this was reproducible only when using the _scroll endpoint. After switching to using point in time per the recommendation in the docs, we are getting consistent CSV row counts again:

   │ proc [kibana] total hits 4675
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 175
   │ proc [kibana] this.csvRowCount 4675

The docs recommend against using scroll to page through more than 10,000 hits, but in this case we were spanning fewer than half that. We should determine when this regression was introduced, as it is likely the result of a change in ES (still investigating).
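
For illustration, paging with a point in time (PIT) plus search_after generally follows the loop below. This is a sketch under stated assumptions, not the code in this PR: the PitClient interface and its three methods are hypothetical stand-ins for the open-PIT, search, and close-PIT requests, and the sort simply mirrors the order_date sort from the search body above.

```typescript
// Hypothetical minimal client (an assumption for this sketch, not a real library API).
interface Hit {
  sort: unknown[]; // sort values returned with each hit
}

interface PitClient {
  openPit(index: string, keepAlive: string): Promise<string>;
  search(body: {
    pit: { id: string; keep_alive: string };
    size: number;
    sort: unknown[];
    search_after?: unknown[];
  }): Promise<{ hits: Hit[]; pitId?: string }>;
  closePit(id: string): Promise<void>;
}

// Page through every hit inside one point in time and return the row count.
async function countRowsWithPit(
  client: PitClient,
  index: string,
  pageSize = 500
): Promise<number> {
  const keepAlive = '1m';
  let pitId = await client.openPit(index, keepAlive);
  let rows = 0;
  let searchAfter: unknown[] | undefined;
  try {
    while (true) {
      const { hits, pitId: nextPitId } = await client.search({
        pit: { id: pitId, keep_alive: keepAlive },
        size: pageSize,
        sort: [{ order_date: 'desc' }],
        ...(searchAfter ? { search_after: searchAfter } : {}),
      });
      // Carry forward the most recent PIT id if the response supplies one.
      if (nextPitId) pitId = nextPitId;
      if (hits.length === 0) break;
      rows += hits.length;
      // The last hit's sort values are the cursor for the next page.
      searchAfter = hits[hits.length - 1].sort;
    }
  } finally {
    // Release the point in time even if a search request fails.
    await client.closePit(pitId);
  }
  return rows;
}
```

The loop stops on the first empty page. An implementation could instead stop once the reported total hit count has been reached, which would be consistent with the PIT run above showing no trailing "results 0" line.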

How to test locally

(This can be automated by running the functional test "generates a report from a new search with data: default".)

  1. Set up ES with the data archive from x-pack/test/functional/es_archives/reporting/ecommerce
  2. Start Kibana
  3. Issue the following request:
curl 'http://localhost:5620/api/reporting/generate/csv_searchsource' \
  -H 'Connection: keep-alive' \
  -H 'sec-ch-ua: "Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"' \
  -H 'Content-Type: application/json' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36' \
  -H 'kbn-version: 7.16.0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'Accept: */*' \
  -H 'Origin: http://localhost:5620' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: http://localhost:5620/app/discover?_t=1631716621016' \
  -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8' \
  -u elastic:changeme \
  --data-raw $'{"jobParams":"(browserTimezone:UTC,columns:\u0021(),objectType:search,searchSource:(fields:\u0021((field:\'*\',include_unmapped:true)),filter:\u0021((meta:(field:order_date,index:\'5193f870-d861-11e9-a311-0fa548c5f953\',params:()),range:(order_date:(format:strict_date_optional_time,gte:\'2019-04-04T23:56:51.374Z\',lte:\'2019-08-29T16:18:51.821Z\')))),index:\'5193f870-d861-11e9-a311-0fa548c5f953\',parent:(filter:\u0021(),index:\'5193f870-d861-11e9-a311-0fa548c5f953\',query:(language:kuery,query:\'\')),sort:\u0021((order_date:desc)),trackTotalHits:\u0021t),title:\'Discover search [2021-09-15T14:48:11.140+00:00]\',version:\'7.16.0\')"}' \
  --compressed

Release note

Does this need a public release note?

@jloleysens jloleysens added release_note:fix zDeprecated Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead v8.0.0 Team:AppServices v7.16.0 v7.15.1 labels Sep 16, 2021
@jloleysens
Contributor Author

@elasticmachine merge upstream

@jloleysens jloleysens added release_note:skip Skip the PR/issue when compiling release notes and removed release_note:fix labels Sep 16, 2021
@tsullivan
Member

> After switching to using point in time per the recommendation in the docs, we are getting consistent CSV row counts again:

I looked back in the PR branch that added the current CSV export implementation: #88303. In the first commit / original implementation, the plan was to use PIT instead of the ES _scroll API. Unfortunately that plan was scrapped when we had test failures for exporting non-timebased data and unsorted data. I think we should come up with a plan to identify when PIT should be used to export the CSV. My guess is that 99% of the time, using PIT is the "right way" to do it.

@kibanamachine
Contributor

💚 Build Succeeded

@jloleysens jloleysens closed this Sep 23, 2021