KQL expression parsing is slow #76811

@kobelb

Parsing even a simple KQL expression string is slow, and parsing a complex one is disastrously slow:

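const Benchmark = require('benchmark');
// Assumes `esKuery` is already in scope (for example, imported from Kibana's data plugin).

const suite = new Benchmark.Suite();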

suite
  .add('parse simple KQL', function () {
    esKuery.fromKueryExpression(
      'not fleet-agent-actions.attributes.sent_at: * and fleet-agent-actions.attributes.agent_id:1234567'
    );
  })
  .add('parse complex KQL', function () {
    esKuery.fromKueryExpression(
      `((alert.attributes.alertTypeId:.index-threshold and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:siem.signals and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:siem.notifications and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:metrics.alert.threshold and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:metrics.alert.inventory.threshold and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:logs.alert.document.count and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_cluster_health and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_license_expiration and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_cpu_usage and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_nodes_changed and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_logstash_version_mismatch and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm 
or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_kibana_version_mismatch and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:monitoring_alert_elasticsearch_version_mismatch and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:apm.transaction_duration and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:apm.transaction_duration_anomaly and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:apm.error_rate and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:xpack.uptime.alerts.monitorStatus and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:xpack.uptime.alerts.tls and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)) or (alert.attributes.alertTypeId:xpack.uptime.alerts.durationAnomaly and alert.attributes.consumer:(alerts or builtInAlerts or siem or infrastructure or logs or monitoring or apm or uptime)))`
    );
  })
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({ async: true, minSamples: 10 });
Benchmark output:

parse simple KQL x 999 ops/sec ±2.00% (74 runs sampled)
parse complex KQL x 4.08 ops/sec ±2.22% (15 runs sampled)

@elastic/kibana-app-arch, it appears that most of the slowness comes directly from the code that pegjs generates. Have you all investigated the performance of KQL previously? It also looks like we're on version 0.9.0 of pegjs. Is there any chance that an upgrade to 0.10.0 would improve the performance? There's also a --cache option, which has fixed performance issues for at least one person; however, that result seems at odds with the documentation:

--cache — makes the parser cache results, avoiding exponential parsing time in pathological cases but making the parser slower
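
To make the trade-off in that documentation line concrete, here is a minimal sketch of result caching (packrat-style memoization) in a hand-written PEG-style parser. Everything in it is an assumption for illustration: the toy grammar, `makeParser`, and the call counter are hypothetical and unrelated to the real KQL grammar or to the code pegjs generates. It only shows why caching can help when a grammar backtracks heavily, at the cost of a map lookup on every rule invocation.

```javascript
// Toy PEG-style grammar (hypothetical, NOT the KQL grammar):
//   Expr <- Term '+' Expr / Term '-' Expr / Term
//   Term <- '(' Expr ')' / 'a'
// On nested parentheses, Expr re-parses the same Term once per alternative,
// so the work roughly triples with every nesting level unless results are cached.
function makeParser(input, useCache) {
  const cache = new Map(); // "rule@pos" -> end position, or -1 for failure
  let calls = 0; // number of rule bodies actually executed

  function memo(rule, pos, body) {
    if (!useCache) {
      calls++;
      return body(pos);
    }
    const key = rule + '@' + pos;
    if (cache.has(key)) return cache.get(key);
    calls++;
    const result = body(pos);
    cache.set(key, result);
    return result;
  }

  // Each rule returns the position after a successful match, or -1 on failure.
  function term(pos) {
    return memo('Term', pos, (p) => {
      if (input[p] === '(') {
        const end = expr(p + 1);
        // If the parenthesised form fails, the 'a' alternative cannot match
        // either, because the character at p is '('.
        return end !== -1 && input[end] === ')' ? end + 1 : -1;
      }
      return input[p] === 'a' ? p + 1 : -1;
    });
  }

  function expr(pos) {
    return memo('Expr', pos, (p) => {
      for (const op of ['+', '-']) {
        const afterTerm = term(p); // re-parsed for each alternative
        if (afterTerm !== -1 && input[afterTerm] === op) {
          const rest = expr(afterTerm + 1);
          if (rest !== -1) return rest;
        }
      }
      return term(p);
    });
  }

  return {
    parse: () => expr(0) === input.length,
    getCalls: () => calls,
  };
}

// Compare the number of rule executions with and without the cache.
const nested = '('.repeat(10) + 'a' + ')'.repeat(10);
for (const useCache of [false, true]) {
  const parser = makeParser(nested, useCache);
  const ok = parser.parse();
  console.log(`cache=${useCache} ok=${ok} ruleCalls=${parser.getCalls()}`);
}
```

Whether the real KQL grammar backtracks in a comparable way is not established here; that would need profiling of the generated parser.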

If we're unable to drastically improve the performance of KQL, it'll be unsuitable for a lot of scenarios. Parsing KQL expressions server-side can block the event loop and prevent all other operations from running. The Fleet team has had to switch from using a KQL string to building the KueryNode manually, and we can potentially adopt this approach elsewhere. However, we can't use this everywhere if we want end users to be able to specify the KQL strings.
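
As a rough illustration of the "build the node manually" workaround, the sketch below assembles a small expression tree directly from function calls, so no string parsing happens at runtime. The helpers (`literal`, `fn`, `is`, `and`, `not`, `buildAgentActionsFilter`) and the node shape are hypothetical stand-ins, not Kibana's KueryNode API, and the wildcard is represented as a plain literal for brevity.

```javascript
// Hypothetical node builders. The object shape is an assumption made for
// illustration and is not guaranteed to match Kibana's KueryNode.
const literal = (value) => ({ type: 'literal', value });
const fn = (name, ...args) => ({ type: 'function', function: name, arguments: args });

const is = (field, value) => fn('is', literal(field), literal(value));
const and = (...nodes) => fn('and', ...nodes);
const not = (node) => fn('not', node);

// Roughly the tree for the "simple" expression from the benchmark:
//   not fleet-agent-actions.attributes.sent_at: * and
//   fleet-agent-actions.attributes.agent_id:1234567
function buildAgentActionsFilter(agentId) {
  return and(
    not(is('fleet-agent-actions.attributes.sent_at', '*')),
    is('fleet-agent-actions.attributes.agent_id', agentId)
  );
}

console.log(JSON.stringify(buildAgentActionsFilter('1234567'), null, 2));
```

This only works when the code controls the shape of the query. It does not help where end users type arbitrary KQL strings, which still have to go through the parser.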

CC'ing a few people...

@elastic/kibana-platform because SavedObjectsClient#find supports a KQL expression string
@elastic/kibana-alerting-services because your authorization is building a complex KQL expression string here
@elastic/siem because you all are using KQL to query Elasticsearch data-indices

Related: #69649, #89473
