
Redis indexation #102

Open
rehia opened this issue Jan 28, 2017 · 9 comments

@rehia
Contributor

rehia commented Jan 28, 2017

In #94, I was talking about some ways to improve redis implementation performance, mostly when there are a large number of events (or snapshots).
The problem with the current implementation is the scan, which usually needs to go over every event (or snapshot) key of an aggregate, and then do the necessary filtering to keep only the needed events.

I made a quick test with my production data.
I have an aggregate with 46K+ events, out of a total of 117K+ in the store.
I tried to retrieve all its keys (only listing keys, no GET or MGET) using 2 strategies (see the sketch below):

  • A plain SCAN over all the keys. It took 6181ms.
  • All the aggregate's keys put in a sorted set (each member's score being the event version), then a ZRANGE over all of them. It took 415ms.
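
Roughly, the two strategies look like this (a minimal sketch with the node_redis callback API; the key layout `events:{aggregateId}:{revision}` and the index key name are assumptions for illustration, not the actual eventstore key format):

```js
var redis = require('redis');
var client = redis.createClient();

// Strategy 1: walk the keyspace with SCAN (what the current implementation does).
function scanAll(client, pattern, callback) {
  var keys = [];
  (function next(cursor) {
    client.scan(cursor, 'MATCH', pattern, 'COUNT', '1000', function (err, reply) {
      if (err) return callback(err);
      keys = keys.concat(reply[1]);                      // reply = [nextCursor, pageOfKeys]
      if (reply[0] === '0') return callback(null, keys); // cursor back at 0: done
      next(reply[0]);
    });
  })('0');
}

// Strategy 2: read the aggregate's sorted set (member = event key, score = event version).
function zrangeAll(client, indexKey, callback) {
  client.zrange(indexKey, 0, -1, callback);
}

console.time('scan');
scanAll(client, 'events:myAggregate:*', function (err, keys) {
  console.timeEnd('scan');                               // ~6181ms on the data set above
  console.time('zrange');
  zrangeAll(client, 'index:events:myAggregate', function (err2, members) {
    console.timeEnd('zrange');                           // ~415ms on the same data set
    client.quit();
  });
});
```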

I think that sorted sets could be a good option to index aggregate keys, for at least 2 reasons:

  • It seems more efficient to scan all of an aggregate's keys at once.
  • It will be more efficient to get the keys between 2 revisions of an aggregate (as you need to do when reloading state from a snapshot and the events that follow it); see the sketch below.
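
The second point would become a single ZRANGEBYSCORE call instead of a scan plus filtering. A hedged sketch, with the same illustrative key names as above:

```js
// Fetch the event keys between a snapshot revision and a target revision.
// The index key name and the "score = event version" convention are assumptions.
function getEventKeysBetween(client, aggregateId, fromRevision, toRevision, callback) {
  client.zrangebyscore(
    'index:events:' + aggregateId,
    fromRevision,          // e.g. the revision right after the snapshot
    toRevision,            // or '+inf' for "up to the latest event"
    callback               // -> (err, eventKeys), already ordered by version
  );
}

// usage: keys for revisions 100..200 of aggregate "order-42"
// getEventKeysBetween(client, 'order-42', 100, 200, function (err, keys) { ... });
```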

This means that the database needs to be indexed first, and the index maintained afterwards. So whenever a client connects, we need to check whether the store has already been indexed; if not, scan all the aggregates, scan their event and snapshot keys, and finally index those keys (events and snapshots in separate sorted sets, obviously). Once that's done, each time an event is added, its key should be added to the corresponding sorted set.
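
A rough sketch of both sides of this, reusing the illustrative key layout and the scanAll helper from the sketch above (none of these function names are the actual module API):

```js
// 1) One-off build step: walk the existing event keys and fill one sorted set
//    per aggregate. Key layout events:{aggregateId}:{revision} is an assumption.
//    Snapshots would get a parallel index:snapshots:{aggregateId} sorted set.
function buildIndex(client, done) {
  scanAll(client, 'events:*', function (err, keys) {
    if (err) return done(err);
    var multi = client.multi();
    keys.forEach(function (key) {
      var parts = key.split(':');                     // ['events', aggregateId, revision]
      var aggregateId = parts[1];
      var revision = parseInt(parts[2], 10);
      multi.zadd('index:events:' + aggregateId, revision, key);
    });
    multi.set('index:ready', '1');                    // the "store is indexed" marker
    multi.exec(done);
  });
}

// 2) Incremental update: whenever an event is stored, add its key to the index.
function onEventStored(client, aggregateId, revision, eventKey, callback) {
  client.zadd('index:events:' + aggregateId, revision, eventKey, callback);
}
```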

What do you think about this solution? I'll try to work on an implementation. But if you have any remarks, objections or suggestions, don't hesitate!

@adrai
Contributor

adrai commented Jan 29, 2017

Sounds like a breaking change for the redis implementation... (we'd have to mention this at least in the release notes)
As far as I know your production data is the biggest for redis... so you're the expert ;-)

@rehia
Contributor Author

rehia commented Feb 10, 2017

I managed to make something work, but I need to do more tests.
I was also wondering whether I should modify the existing redis implementation, or apply the Open/Closed Principle and extend it by adding a new database type (called indexedRedis?), compatible with the existing implementation.
What do you think about it?

@adrai
Contributor

adrai commented Feb 10, 2017

I think it's ok to replace the current implementation if we can prove the new one is better in terms of performance etc...

@rehia
Contributor Author

rehia commented Feb 10, 2017

Sure, but the store needs to be indexed before any usage; this is done on init.
And it can take quite a long time (I indexed my 122K+ events store in 170 seconds on my MacBook Pro 😅, because of the full scan over all events), so depending on the size of the db it might have to be done outside the application (using a dedicated node task).
So I was thinking about the best option not to break compatibility (or at least not to cause an overhead the first time the application starts with this option). It could be done with a dedicated option, or with another implementation.

@adrai
Contributor

adrai commented Feb 11, 2017

But is the already stored data compatible with the new implementation?
So really only this indexing task is needed?
Then perhaps doing the indexing in init is ok... but we could offer a script to run this indexing in another process?

@rehia
Contributor Author

rehia commented Feb 11, 2017

Yes sure, it's fully compatible.
The events are not updated. Only some keys are added to better index events and snapshots.
But once you switch to an indexed redis store, the indexation is done. You can go back to a non-indexed redis store, but you will have to do so manually (by removing a dedicated key).

What I can do, when you want to use an indexed redis store, is:

  • check whether the database is already indexed (using a dedicated key); if not, you will have to trigger the indexation manually through a call to an index() method on the store (maybe in a script)
  • if you don't run the indexation, the common scan() method will keep being used, based on the SCAN redis command
  • once the indexation is done, the scan() method will use the indexes (see the sketch after this list)
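
Something like this, roughly (names are placeholders, not the real store API; scanAll is the helper from the first sketch):

```js
// Check the dedicated marker key and pick the lookup strategy accordingly.
function getEventKeys(client, aggregateId, callback) {
  client.exists('index:ready', function (err, indexed) {
    if (err) return callback(err);
    if (indexed) {
      // indexed store: a single sorted-set read, ordered by event version
      client.zrange('index:events:' + aggregateId, 0, -1, callback);
    } else {
      // not indexed yet: fall back to the current SCAN-based lookup
      scanAll(client, 'events:' + aggregateId + ':*', callback);
    }
  });
}
```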

Why not offer a script for this; I was just wondering how to discover all the options needed to connect to the redis instance. Maybe giving a common example in the documentation would be a first step forward (a rough sketch follows).
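
For example, a documentation sketch of such a script could pass the connection options via environment variables, which sidesteps the discovery question (everything here is illustrative, and buildIndex is the helper sketched earlier in the thread):

```js
// reindex.js: REDIS_HOST=127.0.0.1 REDIS_PORT=6379 node reindex.js
var redis = require('redis');

var client = redis.createClient({
  host: process.env.REDIS_HOST || '127.0.0.1',
  port: parseInt(process.env.REDIS_PORT || '6379', 10),
  password: process.env.REDIS_PASSWORD,
  db: parseInt(process.env.REDIS_DB || '0', 10)
});

buildIndex(client, function (err) {
  if (err) {
    console.error('indexing failed:', err);
    process.exit(1);
  }
  console.log('store indexed');
  client.quit();
});
```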

You're OK with that?

@adrai
Contributor

adrai commented Feb 11, 2017 via email

@TyGuy
Contributor

TyGuy commented Sep 5, 2018

@rehia do you still have interest in this, or plans to execute on it?

We are sitting with around 30k events in redis, and noticing it slowing down as well. I think this solution (indexing on aggregateId and using sorted sets) would be awesome!

If you don't have plans to continue it or work on it, would you be willing to share a WIP branch with your implementation? If/when I find some time, I could pick up where you left off, and carry this forward.

@rehia
Contributor Author

rehia commented Sep 5, 2018

@TyGuy hi!

Sorry, but I couldn't find the time to start anything on this. Feel free to take it on, if you have time for it.

The solution I suggested could be a good one, but as usual, it must first be validated in the field.

I hope you'll find more time than I did 😅 Good luck!
