Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make EntityManager Clone #3

Open
gterzian opened this issue Nov 6, 2023 · 12 comments
Open

Make EntityManager Clone #3

gterzian opened this issue Nov 6, 2023 · 12 comments

Comments

@gterzian
Copy link

gterzian commented Nov 6, 2023

Since the EntityRepository is Clone, you could remove the Arc around the EntityManager and instead make it Clone as well, since the DocHandle is Clone.

Besides removing the arc, if you were ever to start using DocHandle.changed somewhere, you would get what I think is a more intuitive kind of result than if you were to use the same refcounted doc handle everywhere: with the current setup it could lead to some updates apparently not propagating(because each doc handle keeps track of the latest state it has seen, and the arc would make this a single global thing as opposed to a local thing, see this note).

@teohhanhui
Copy link
Contributor

teohhanhui commented Nov 6, 2023

Thank you for your suggestion. I've pondered this some, but I think the current shared EntityManager is correct. The only way to get a DocHandle from the EntityManager is the doc method, where we're already cloning.

It'd in fact be problematic to have more than one EntityManager for the same document as it wouldn't be much of a manager at all. Currently the only shared state is the DocHandle, but I think that could change.

In the case of DefaultEntityRepository it's probably fine as it's just a way to query the document (so long as transactions are fully consistent between clones of DocHandles, I think?)

@gterzian
Copy link
Author

gterzian commented Nov 7, 2023

It'd in fact be problematic to have more than one EntityManager for the same document as it wouldn't be much of a manager at all.

Right, but that's not something that is inherently guaranteed by the current use of the Arc, since the DocHandle itself is clonable(so one could create multiple EntityManager to the same doc, each with a clone of the doc handle), which is why I think you may as well remove it.

Why does the EntityRepository need to be Clone?

transactions are fully consistent between clones of DocHandle

Yes they are, it's only the use of changed on multiple Arc<DocHandle> that could turn out confusing, so at this point it's only a hypothetical problem.

@teohhanhui
Copy link
Contributor

since the DocHandle itself is clonable(so one could create multiple EntityManager to the same doc, each with a clone of the doc handle)

Sure, but at that point the user is actively choosing to create a new EntityManager (no ambiguity like in the case of Clone), so they should already know each EntityManager is independent. And the changed problem does not arise as they're cloned DocHandles.

Why does the EntityRepository need to be Clone?

IIRC it doesn't need to be, and I'm okay with removing it to allow for a possible future where DefaultEntityRepository might need to hold some shared state that doesn't have an obvious Clone semantics.

@gterzian
Copy link
Author

gterzian commented Nov 7, 2023

Ok, I think it would be clearer to remove the Clone on the repository, and then also remove the arc around the entity manager.

@teohhanhui
Copy link
Contributor

teohhanhui commented Nov 7, 2023

remove the arc around the entity manager

That's the thing. The EntityManager is supposed to be shared. If we remove the Arc the user would be forced to create new EntityManagers, which is exactly what we don't want to happen.

The only alternative is to add a lifetime parameter to DefaultEntityRepository<'a, T> and pass in &'a EntityManager?

@gterzian
Copy link
Author

gterzian commented Nov 7, 2023

I think I don't completely understand the API you are sketching.

To be clear on my side:

  1. Sharing DocHandle, indirectly via a shared EntityManager, is ok but could lead to unexpected behavior if you ever start using the changed method, because the idea is that each clone of a DocHandle keeps it's own state with regards to whether the doc has changed since the doc handle last used it. So if you use an Arc<DocHandle>, and clone it in different threads, one thread may not wake-up on changed if the other thread already saw the last change.
    So if you expose changed via the API of the EntityManager, and then shared it in an arc, it could be confusing.

  2. Besides the above, which in some ways does not matter for the current use, it appears strange to me that an non-clonable EntityManager is used to wrapped a clonable DocHandle.

The point of the DocHandle is to make it easy for users to share the same automerge document across threads. So the DocHandle is Clone because cloning it is an easy way to use it in different places. And each clone keeps it's own state with regards to the last version of the document seen, so that changed works as expected from the point of view of one thread/task. So the underlying document is shared, but not the state pertaining to changed.

So, even though you are not using changed, it might still be a good ok to keep the use of the DocHandle "consistent" with what I describe above. And putting it inside an Arc<EntityManager> is similar to using an Arc<DocHandle>.

So that's it on my side :)

Now, with regards to what you are trying to do:

The EntityManager is supposed to be shared. If we remove the Arc the user would be forced to create new EntityManagers, which is exactly what we don't want to happen.

Why is it supposed to be shared?

Looking at the example, it seems you hand it over to the repository, and then transact on it, and then expect the repository to find the book transacted. If you were to transact in another task or thread, the expectation would break down(unless you used something like changed to synchronize).

So to me it appears that, similar to a DocHandle, you want the underlying document to be shared, not necessarily the thing wrapping it.

Currently the only shared state is the DocHandle, but I think that could change.

You can later add arcs to other shared state internally, as opposed to putting the thing itself in an arc.

It'd in fact be problematic to have more than one EntityManager for the same document as it wouldn't be much of a manager at all.

Please explain more about what you want it to achieve as a "manager".

@teohhanhui
Copy link
Contributor

Thanks for your patience in trying to explain all this to me. I'm pretty sure I must have been missing something.

Anyway, I'll have to take a long hard look at the code and get back to you.

@teohhanhui
Copy link
Contributor

Earlier you said:

it's only the use of changed on multiple Arc<DocHandle> that could turn out confusing, so at this point it's only a hypothetical problem.

But now you also said:

If you were to transact in another task or thread, the expectation would break down(unless you used something like changed to synchronize).

Does using multiple cloned DocHandles (not Arc<DocHandle> - as far as I can tell we never use that anywhere) lead to out-of-sync views when updating / querying the document from different threads / tasks?

@teohhanhui
Copy link
Contributor

If I understand the code in https://github.com/automerge/automerge-repo-rs/blob/847d0ff53ace756599eaf5380740b9c2ab356b1b/src/dochandle.rs correctly, last_heads is only relevant when using changed, so it's not relevant for our use case. And even if someone calls changed on the DocHandle we return from EntityManager::doc, that's already cloned. It will be the caller's conscious decision if they do decide to put it behind a shared pointer.

So I'm not really sure what's the problem...

@gterzian
Copy link
Author

gterzian commented Nov 9, 2023

Does using multiple cloned DocHandles (not Arc - as far as I can tell we never use that anywhere) lead to out-of-sync views when updating / querying the document from different threads / tasks?

Using cloned DocHandles would be correct, and it's using Arc<DocHandle> that could lead to unexpected behavior.

Arc - as far as I can tell we never use that anywhere

An Arc<EntityManager> that internally owns a DocHandle is similar, if you were to internally use changed on the handle.

So I'm not really sure what's the problem

At this point it's not a problem yet, but it may become a problem later, for example if you do anything with changed inside the entity manager. The problem at this point is rather one of design I would say. Putting the doc handle inside the Arc<EntityManager> goes against how I would use the doc handle, which is supposed to be owned by a given thread/task.

@teohhanhui
Copy link
Contributor

Is it correct to say that DocHandle should be !Sync then?

@gterzian
Copy link
Author

gterzian commented Nov 9, 2023

Is it correct to say that DocHandle should be !Sync then?

That pretty much sums it up :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants