re_datastore: generalized deduplication #446

Open
Tracked by #1898
teh-cmc opened this issue Dec 2, 2022 · 1 comment
Labels
🏹 arrow concerning arrow ⛃ re_datastore affects the datastore itself

teh-cmc commented Dec 2, 2022

Generalized deduplication

  • deduplicate across timelines and across multiple calls

  • automagically deduplicate within a single component table

  • note: probably never worth doing in real-time, but can be interesting while serializing
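To make the second bullet concrete, here is a minimal sketch of content-hash deduplication within a single table. Everything in it is an assumption for illustration: `DedupTable`, its fields, and the use of raw byte payloads are hypothetical and do not reflect the actual `re_datastore` types. It also uses std's `DefaultHasher`, whose algorithm is not guaranteed to be stable across Rust releases, so a serialized format would need a stable hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a component table that stores each distinct cell once.
#[derive(Default)]
pub struct DedupTable {
    /// Unique cell payloads, in first-seen order.
    cells: Vec<Vec<u8>>,
    /// Content hash -> indices into `cells` whose payloads share that hash.
    by_hash: HashMap<u64, Vec<usize>>,
    /// One entry per inserted row, pointing at the cell that holds its data.
    rows: Vec<usize>,
}

/// Hashes the raw bytes of a cell (in-process only, see the note above).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl DedupTable {
    /// Inserts a row and returns the index of the (possibly shared) cell.
    pub fn insert(&mut self, payload: &[u8]) -> usize {
        let hash = content_hash(payload);
        let cells = &mut self.cells;
        let candidates = self.by_hash.entry(hash).or_default();

        // Compare the actual bytes so that a hash collision never merges
        // two different cells.
        let existing = candidates
            .iter()
            .copied()
            .find(|&i| cells[i].as_slice() == payload);

        let idx = match existing {
            Some(i) => i,
            None => {
                cells.push(payload.to_vec());
                let new_idx = cells.len() - 1;
                candidates.push(new_idx);
                new_idx
            }
        };
        self.rows.push(idx);
        idx
    }

    /// Returns the payload for a given row.
    pub fn get(&self, row: usize) -> &[u8] {
        &self.cells[self.rows[row]]
    }

    /// Number of rows inserted so far.
    pub fn num_rows(&self) -> usize {
        self.rows.len()
    }

    /// Number of distinct payloads actually stored.
    pub fn num_unique_cells(&self) -> usize {
        self.cells.len()
    }
}
```

Inserting the same payload twice returns the same cell index, so the table grows by one row but no new cell. The hash lookup plus byte comparison on every insert is the kind of cost that makes this more attractive at serialization time than in real time.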

teh-cmc commented Apr 18, 2023

Interestingly, like cell sizes, the hash of a cell's contents is one of those things that could be computed in the clients' batching threads.
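A rough sketch of what that could look like, using only std threads and channels. The names `HashedCell` and `batch_and_hash` are hypothetical, cells are modeled as plain byte vectors, and `DefaultHasher` is a placeholder for whatever stable hash would actually be used; none of this is the real client batching code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

/// Hypothetical cell, annotated on the client side before it reaches the store.
#[derive(Debug)]
pub struct HashedCell {
    pub payload: Vec<u8>,
    /// Size of the payload in bytes, computed off the store's hot path.
    pub size_bytes: usize,
    /// Hash of the payload's contents, usable later as a deduplication key.
    pub content_hash: u64,
}

/// Feeds `payloads` through a batching thread and returns the annotated cells
/// in the order they were submitted.
pub fn batch_and_hash(payloads: Vec<Vec<u8>>) -> Vec<HashedCell> {
    let (raw_tx, raw_rx) = mpsc::channel::<Vec<u8>>();
    let (out_tx, out_rx) = mpsc::channel::<HashedCell>();

    // The batching thread pays for hashing and sizing, not the store.
    let batcher = thread::spawn(move || {
        for payload in raw_rx {
            let mut hasher = DefaultHasher::new();
            payload.hash(&mut hasher);
            let cell = HashedCell {
                size_bytes: payload.len(),
                content_hash: hasher.finish(),
                payload,
            };
            if out_tx.send(cell).is_err() {
                break; // The receiving side went away, nothing left to do.
            }
        }
    });

    for payload in payloads {
        raw_tx.send(payload).expect("batching thread is alive");
    }
    drop(raw_tx); // Closing the channel lets the batching thread finish.

    batcher.join().expect("batching thread panicked");
    out_rx.into_iter().collect()
}
```

With the hash already attached to each cell, the store would only need a lookup to detect duplicates rather than rehashing the contents itself.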

Another interesting tidbit: this is really not that different from the ongoing work regarding DataType deduplication (#1809).
