*** {eql-ref}/pipes.html#sort[`sort`]
*** {eql-ref}/pipes.html#unique[`unique`]
*** {eql-ref}/pipes.html#unique-count[`unique_count`]

[discrete]
[[eql-how-sequence-queries-handle-matches]]
==== How sequence queries handle matches

<<eql-sequences,Sequence queries>> don't find all potential matches for a
sequence. Finding every potential match would be too slow and costly for large
event data sets. Instead, a sequence query handles pending sequence matches as a
{wikipedia}/Finite-state_machine[state machine]:

* Each event item in the sequence query is a state in the machine.
* Only one pending sequence can be in each state at a time.
* If two pending sequences are in the same state at the same time, the most
recent sequence overwrites the older one.
* If the query includes <<eql-by-keyword,`by` fields>>, the query uses a
separate state machine for each unique `by` field value.

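As a rough illustration of the last three rules, the following Python sketch stores pending sequences in a dictionary keyed by a `(by_value, state)` tuple. Because a key can hold only one value, each state of each per-value machine holds at most one pending sequence, and assigning to an existing key overwrites the older sequence. This is not how Elasticsearch stores pending sequences. The `pending` dictionary, the `start_sequence` helper, and the state label `"A"` (standing for the first state) are hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch, not Elasticsearch's implementation.
# Pending sequences are keyed by (by-field value, state). A dict key holds only
# one value, so each state of each per-value machine holds at most one sequence.
pending: Dict[Tuple[str, str], List[int]] = {}


def start_sequence(by_value: str, event_id: int) -> None:
    """Put a new one-event sequence in state "A" for `by_value`.

    If a sequence is already pending in that state, the assignment replaces
    it, so the most recent sequence overwrites the older one.
    """
    pending[(by_value, "A")] = [event_id]


start_sequence("root", 1)
start_sequence("root", 2)    # overwrites [1] in state "A" for "root"
start_sequence("elkbee", 6)  # separate machine: "root" is unaffected

print(pending)  # {('root', 'A'): [2], ('elkbee', 'A'): [6]}
```
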
.*Example*
[%collapsible]
====
A data set contains the following `process` events in ascending chronological
order:

[source,js]
----
{ "index" : { "_id" : "1" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "2" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "3" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "4" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "5" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "6" } }
{ "user": { "name": "elkbee" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "7" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "8" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "9" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
{ "index" : { "_id" : "10" } }
{ "user": { "name": "elkbee" }, "process": { "name": "cat" }, ...}
{ "index" : { "_id" : "11" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
----
// NOTCONSOLE

An EQL sequence query searches the data set:

[source,eql]
----
sequence by user.name
  [process where process.name == "attrib"]
  [process where process.name == "bash"]
  [process where process.name == "cat"]
----

The query's event items correspond to the following states:

* State A: `[process where process.name == "attrib"]`
* State B: `[process where process.name == "bash"]`
* Complete: `[process where process.name == "cat"]`

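For contrast, the following Python sketch enumerates every potential match for the query above, which is the exhaustive approach that sequence queries avoid. It assumes each event is reduced to a hypothetical `(_id, user.name, process.name)` tuple, with the elided fields dropped, and that each event item only compares `process.name`. The function name `all_potential_matches` is also made up for this example. On this data set it finds 9 potential matches, 8 for `root` and 1 for `elkbee`, while the state machines described below complete only 2 sequences. The number of combinations to check grows quickly with the number of events per `by` value.

```python
from itertools import combinations

# The example events as hypothetical (_id, user.name, process.name) tuples,
# in ascending chronological order. Elided fields are dropped.
EVENTS = [
    (1, "root", "attrib"), (2, "root", "attrib"), (3, "elkbee", "bash"),
    (4, "root", "bash"), (5, "root", "bash"), (6, "elkbee", "attrib"),
    (7, "root", "attrib"), (8, "elkbee", "bash"), (9, "root", "cat"),
    (10, "elkbee", "cat"), (11, "root", "cat"),
]

# process.name for each event item in the query, in order.
ITEMS = ["attrib", "bash", "cat"]


def all_potential_matches(events, items):
    """Return every (user, [_id, ...]) combination that matches the items in
    order for a single user. This is the exhaustive search that sequence
    queries avoid."""
    matches = []
    for user in sorted({user for _, user, _ in events}):
        user_events = [e for e in events if e[1] == user]
        # combinations() keeps the input order, so each candidate is chronological.
        for combo in combinations(user_events, len(items)):
            if all(e[2] == item for e, item in zip(combo, items)):
                matches.append((user, [e[0] for e in combo]))
    return matches


matches = all_potential_matches(EVENTS, ITEMS)
print(len(matches))  # 9
```
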
To find matching sequences, the query uses separate state machines for each
unique `user.name` value. Pending sequence matches move through each machine's
states as follows:

[source,txt]
----
{ "index" : { "_id" : "1" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
// Creates sequence [1] in state A for the "root" user.
//
// root: A=[1]

{ "index" : { "_id" : "2" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
// Creates sequence [2] in state A for "root", overwriting sequence [1].
//
// root: A=[2]

{ "index" : { "_id" : "3" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
// Nothing happens. The "elkbee" user has no pending sequence to move from
// state A to state B.

{ "index" : { "_id" : "4" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
// Sequence [2] moves from state A to state B for "root" and becomes [2, 4].
// State A for "root" is now empty.
//
// root: A=[]
// root: B=[2, 4]

{ "index" : { "_id" : "5" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
// Nothing happens. State A for "root" is empty.

{ "index" : { "_id" : "6" } }
{ "user": { "name": "elkbee" }, "process": { "name": "attrib" }, ...}
// Creates sequence [6] in state A for "elkbee".
//
// elkbee: A=[6]

{ "index" : { "_id" : "7" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
// Creates sequence [7] in state A for "root".
// Sequence [2, 4] remains in state B for "root".
//
// root: A=[7]
// root: B=[2, 4]

{ "index" : { "_id" : "8" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
// Sequence [6, 8] moves to state B for "elkbee".
// State A for "elkbee" is now empty.
//
// elkbee: A=[]
// elkbee: B=[6, 8]

{ "index" : { "_id" : "9" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
// Sequence [2, 4, 9] is complete for "root".
// State B for "root" is now empty.
// Sequence [7] remains in state A for "root".
//
// root: A=[7]
// root: B=[]

{ "index" : { "_id" : "10" } }
{ "user": { "name": "elkbee" }, "process": { "name": "cat" }, ...}
// Sequence [6, 8, 10] is complete for "elkbee".
// States A and B for "elkbee" are now empty.
//
// elkbee: A=[]
// elkbee: B=[]

{ "index" : { "_id" : "11" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
// Nothing happens. State B for "root" is empty.
----
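To check the walkthrough, the following Python sketch replays the same eleven events through a simplified state machine that follows the rules listed earlier. It is an illustration under stated assumptions, not Elasticsearch's implementation: events are hypothetical `(_id, user.name, process.name)` tuples with the elided fields dropped, each event item is reduced to a `process.name` comparison, and the function name `run_sequence` is made up for this example. Checking later items first, so that one event can't both create and advance the same sequence, is also a choice made for this sketch. The completed sequences and final states it prints agree with the comments in the trace.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# The example events as hypothetical (_id, user.name, process.name) tuples,
# in ascending chronological order. Elided fields are dropped.
EVENTS = [
    (1, "root", "attrib"), (2, "root", "attrib"), (3, "elkbee", "bash"),
    (4, "root", "bash"), (5, "root", "bash"), (6, "elkbee", "attrib"),
    (7, "root", "attrib"), (8, "elkbee", "bash"), (9, "root", "cat"),
    (10, "elkbee", "cat"), (11, "root", "cat"),
]

# process.name for each event item in the query, in order.
ITEMS = ["attrib", "bash", "cat"]


def run_sequence(events, items):
    """Simplified sketch of the per-user state machines described above.

    For each user, slot 0 is state A and slot 1 is state B. Each slot holds
    at most one pending sequence (a list of _id values) or None.
    Returns the completed sequences and the final slots for each user.
    """
    last = len(items) - 1
    machines: Dict[str, List[Optional[List[int]]]] = defaultdict(lambda: [None] * last)
    completed: List[Tuple[str, List[int]]] = []
    for event_id, user, name in events:
        states = machines[user]  # one state machine per `by` value
        # Check later items first so an event can't advance a sequence it
        # created in the same step.
        for i in range(last, -1, -1):
            if items[i] != name:
                continue
            if i == 0:
                states[0] = [event_id]  # new sequence overwrites state A
            elif states[i - 1] is not None:
                seq = states[i - 1] + [event_id]
                states[i - 1] = None  # the sequence leaves its previous state
                if i == last:
                    completed.append((user, seq))
                else:
                    states[i] = seq  # overwrites any older sequence here
    return completed, dict(machines)


completed, final_states = run_sequence(EVENTS, ITEMS)
print(completed)     # [('root', [2, 4, 9]), ('elkbee', [6, 8, 10])]
print(final_states)  # {'root': [[7], None], 'elkbee': [None, None]}
```
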
====