rrdom: an ad-hoc DOM for rrweb session data #419

Anyone familiar with rrweb's session data knows that a session already contains enough information for further analysis. However, running performant analysis on raw session data is not that simple.

For example, if we want to check 'whether the user has clicked a button that matches the CSS selector `.btn-submit-form`', we have to:
1. replay the session
2. listen for click events
3. when a click event fires, check whether its target element matches `.btn-submit-form`
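Steps 2 and 3 can be sketched against the recorded data to show where the difficulty lies. This is a minimal TypeScript sketch, not rrweb code: `RecordedEvent`, `ClickHit`, and `findClicks` are hypothetical names, the event shape is simplified, and the numeric constants are assumed to mirror rrweb's `EventType.IncrementalSnapshot`, `IncrementalSource.MouseInteraction`, and `MouseInteractions.Click` enums (check them against the rrweb version you record with).

```typescript
// Assumed to mirror rrweb's enum values; verify against your rrweb version.
const INCREMENTAL_SNAPSHOT = 3; // EventType.IncrementalSnapshot
const MOUSE_INTERACTION = 2; // IncrementalSource.MouseInteraction
const CLICK = 2; // MouseInteractions.Click

// A simplified view of a recorded rrweb event (hypothetical shape).
interface RecordedEvent {
  type: number;
  timestamp: number;
  data: { source?: number; type?: number; id?: number };
}

interface ClickHit {
  timestamp: number;
  targetId: number;
}

// Step 2: pick out the click events. Each one carries only the target
// node's numeric id, not its tag name, classes, or attributes.
function findClicks(events: RecordedEvent[]): ClickHit[] {
  const hits: ClickHit[] = [];
  for (const e of events) {
    if (
      e.type === INCREMENTAL_SNAPSHOT &&
      e.data.source === MOUSE_INTERACTION &&
      e.data.type === CLICK &&
      e.data.id !== undefined
    ) {
      hits.push({ timestamp: e.timestamp, targetId: e.data.id });
    }
  }
  return hits;
}
```

Step 3 then has to turn each `targetId` into an element that can be tested against `.btn-submit-form`, and that lookup is only possible against a DOM rebuilt by replaying the session.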

Since replay relies heavily on the DOM API, it needs a browser environment, which becomes a performance bottleneck when we analyze many sessions at scale.

So in this proposal, I suggest implementing an ad-hoc DOM called 'rrdom' as an environment to replay the session.

## Why not jsdom?

[jsdom](https://github.com/jsdom/jsdom) is a feature-rich DOM implementation written in JS that runs in Node.js. Because jsdom aims to follow the web standards closely, it is quite complex and carries a lot of overhead for our use case.

## Which DOM APIs we should implement

We only need DOM manipulation APIs and CSS query selector APIs in rrdom.
To achieve this, we need a tree data structure to represent the DOM and provide the following public APIs:

- appendChild
- removeChild
- setAttribute
- querySelector
- querySelectorAll

## How to do performant analysis with rrdom

The steps stay the same, but the first step (replaying the session) becomes much faster because it runs in rrdom instead of a browser.

None of the visual effects of replay will be present, but we can still match CSS selectors when certain interactive events occur.
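The core of such a replay can be sketched as a loop that applies events in time order to an id-to-node mirror and tests the click target whenever a click occurs. This is a minimal sketch under stated assumptions: the `ReplayEvent` shapes are hypothetical simplifications of rrweb's mutation and mouse-interaction data, removing a node here does not remove its subtree, and the class-only `hasClass` check stands in for full selector matching.

```typescript
// Hypothetical, simplified event shapes; real rrweb events carry more fields.
type ReplayEvent =
  | { kind: "add"; timestamp: number; id: number; tagName: string; attributes: Record<string, string> }
  | { kind: "remove"; timestamp: number; id: number }
  | { kind: "attributes"; timestamp: number; id: number; attributes: Record<string, string | null> }
  | { kind: "click"; timestamp: number; id: number };

interface MirrorNode {
  tagName: string;
  attributes: Map<string, string>;
}

// Stand-in for selector matching: checks a single class name.
function hasClass(node: MirrorNode, className: string): boolean {
  return (node.attributes.get("class") ?? "").split(/\s+/).includes(className);
}

// Replays events in timestamp order and returns the timestamps of clicks
// whose target had `className` at the moment of the click.
function clicksOnClass(events: ReplayEvent[], className: string): number[] {
  const mirror = new Map<number, MirrorNode>();
  const hits: number[] = [];
  const ordered = [...events].sort((a, b) => a.timestamp - b.timestamp);
  for (const e of ordered) {
    switch (e.kind) {
      case "add":
        mirror.set(e.id, { tagName: e.tagName, attributes: new Map(Object.entries(e.attributes)) });
        break;
      case "remove":
        mirror.delete(e.id); // subtree removal omitted in this sketch
        break;
      case "attributes": {
        const node = mirror.get(e.id);
        if (!node) break;
        for (const [name, value] of Object.entries(e.attributes)) {
          // In this simplified format, null means the attribute was removed.
          if (value === null) node.attributes.delete(name);
          else node.attributes.set(name, value);
        }
        break;
      }
      case "click": {
        const node = mirror.get(e.id);
        if (node && hasClass(node, className)) hits.push(e.timestamp);
        break;
      }
    }
  }
  return hits;
}
```

No layout, painting, or CSSOM work happens in this loop, which is where the speed-up over a browser replay is expected to come from.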

## Record a time-sensitive index

Sometimes we may want to build an index for the session. The index should contain some general information like:

1. clicked `.btn-submit-form` at 1500ms.
2. input at `input[name="email"]` at 3000ms.

This is useful if we want to build a UI that shows the events with a target descriptor (because rrweb only records the target's numeric id, which is not user-friendly).

It is also useful for backend filtering, such as 'show me the sessions that contain a click event on `.btn-submit-form`'.

The one thing to be careful about is that this index is time-sensitive. For example, if the button element with id 1 is clicked at 1500ms and again at 3000ms, it may have different class names or attributes at each moment, so each entry must describe the target as it was at the time of its event.
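One way to respect this is to copy the target's description into the index entry at event time, so later mutations cannot change it. The TypeScript sketch below assumes hypothetical `ElementState` and `IndexEntry` shapes and helper names (`describe`, `indexEvent`, `sessionsWithClickOn`); the descriptor format `tag#id.class1.class2` is also an illustrative choice, and the last function shows the kind of backend filter mentioned above.

```typescript
// Hypothetical shapes for illustration only.
interface ElementState {
  tagName: string;
  attributes: Map<string, string>;
}

interface IndexEntry {
  timestamp: number;
  action: "click" | "input";
  targetId: number;
  descriptor: string; // e.g. "button.btn.btn-submit-form", frozen at `timestamp`
  classes: string[]; // copy of the target's classes at `timestamp`
}

// Builds a readable descriptor from the element's state right now.
function describe(el: ElementState): { descriptor: string; classes: string[] } {
  const id = el.attributes.get("id");
  const classes = (el.attributes.get("class") ?? "").split(/\s+/).filter(Boolean);
  const descriptor = el.tagName + (id ? `#${id}` : "") + classes.map((c) => `.${c}`).join("");
  return { descriptor, classes };
}

// Called during replay at the moment the event happens; the entry holds
// copies, so later attribute changes on `el` do not affect it.
function indexEvent(
  timestamp: number,
  action: IndexEntry["action"],
  targetId: number,
  el: ElementState,
): IndexEntry {
  return { timestamp, action, targetId, ...describe(el) };
}

// Backend-style filter: ids of sessions with a click on an element that
// had `className` at the time of the click.
function sessionsWithClickOn(index: Map<string, IndexEntry[]>, className: string): string[] {
  const result: string[] = [];
  index.forEach((entries, sessionId) => {
    if (entries.some((e) => e.action === "click" && e.classes.includes(className))) {
      result.push(sessionId);
    }
  });
  return result;
}
```

Storing plain strings and arrays also keeps the index serializable, so it can be built once per session and queried later without replaying again.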

## Isomorphic or not

The first version of rrdom will be written in JS, so we can use it in both Node.js and the browser.

It is also possible to implement rrdom in a language like Rust, which may provide better performance for large-scale analysis.
