Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix loose db namespace and move packed-buffer handling into loose ref store #263

Closed
6 tasks done
Byron opened this issue Nov 26, 2021 · 0 comments
Closed
6 tasks done

Comments

@Byron
Copy link
Member

Byron commented Nov 26, 2021

The loose ref db needs to loose its built-in namespace as it' won't work well in a multi-threaded context unless each thread has their own loose ref DB. The latter is possible, but will stop working once ref-table arrives. So we need to refactor.

Currently the loose ref DB is Sync, but only awkwardly usable due to its packed-buffer being outsourced into the callers responsibility.

git-repository works around this by putting packed-refs into a thread-local cache. The latter isn't optimal as it might well be shared across threads.

The new store should be Sync, be prepared for ref-table (and to unify both loose and ref under one API), and store packed-refs in a thread-safe fashion (if the feature is enabled in git-features, that is).

This is the architecture that was developed during the discovery in #259 .

https://github.com/Byron/gitoxide/blob/38e36a69f5fa6aae2f4cfee7faf11a10cea0f1d0/experiments/odb-redesign/src/lib.rs#L608-L676

Tasks

  • thread-local ref namespaces in git-repository
  • general ref store (basis)
    • learned that it's currently a little unclear how ref-table will work and handle its shared state. Will it offload this to parameters like the file store? Or will it keep things internal right away? The latter is the preference currently, and if so it would serve as its own handle while being able to efficiently update its internal data structures after writing.
  • file db with shared mutable state for it to be doing the right thing by default
    • keep previous packed parameter accessible in a somewhat lower level method for those who build abstractions on top.
    • make using cached shared packs possible, similar to snapshots.
  • get latest version of this back into link-git of radicle-link - in progress - using mailing list based workflow. Now just a few patches without the upgrade, so nothing else to be done here.
Byron added a commit that referenced this issue Nov 26, 2021
If the `threading` feature is set, the `threading` module will contain thread-safe primitives
for shared ownership and mutation, otherwise these will be their single threaded counterparts.

This way, single-threaded applications don't have to pay for threaded primitives.
Byron added a commit that referenced this issue Nov 26, 2021
Previously these were shared in the shared Repo instance, which makes
threaded applications impossible to remain deterministic across multiple
connections.

Now they are local to the thread, which allowed some methods to remove
their Result<> as they cannot fail anymore, the reason for this being
a breaking change.
Byron added a commit that referenced this issue Nov 27, 2021
…hip (#263)

Note that by that definition, I think the current `Easy` implementation
does it wrongly in some places, particularly when references are
handled.
@Byron Byron mentioned this issue Nov 27, 2021
11 tasks
Byron added a commit that referenced this issue Nov 27, 2021
…hip (#263)

Note that by that definition, I think the current `Easy` implementation
does it wrongly in some places, particularly when references are
handled.
Byron added a commit that referenced this issue Nov 27, 2021
Byron added a commit that referenced this issue Nov 27, 2021
This also takes over the existing modifiable buffer implementation
and gives it interior mutability to allow more parallel read-only
paths.
Byron added a commit that referenced this issue Nov 27, 2021
…n is overkill (#263)

Instead, it's better to simply let the Loose db handle its packed refs
buffer itself, because that's the only way it will ever be used, instead
of adding too much layering and duplication of what's essentially
boilerplate.
Byron added a commit that referenced this issue Nov 27, 2021
…which seems to work except for one caveat: Progress now demands to be
Sync even though there is no need and it shouldn't have to think that.

Have to go back and try to fix that, as it ultimately bubbles up to
every method that uses such method generically, including trait
bounds which really shouldn't have to be that strict.
Byron added a commit that referenced this issue Nov 27, 2021
Amazing, and unexpected, glad that worked out after all.
Byron added a commit that referenced this issue Nov 27, 2021
…Sync` (#263)

This helps to assure that thread-local computations always work with the
kind of types we provide. The ones that are carrying out actions are
notably not `Sync` anymore.

We cater to that by defining our bounds accordingly, but for those
who want to use other utilities that need Sync, using types like
`Repository` and `thread_local!()` is the only way to make this
work.
Byron added a commit that referenced this issue Nov 27, 2021
…uable (#263)

Because that way, the error type can be simpler as it contains errors
related to the packed buffer being initialized.

Now it's strangely on the side, but maybe one can also re-arrange errors
entirely.

Seems harder than putting the packed buffer where it belongs.
Byron added a commit that referenced this issue Nov 27, 2021
…only (#263)

That way, file databases can't be repositioned anymore, it's recommended
to recreate it if that's desired.
Byron added a commit that referenced this issue Nov 27, 2021
This is all thanks to the magic of ref-guard mapping, which indeed
transfers pretty well into the land of single-threads.
Byron added a commit that referenced this issue Nov 27, 2021
The packed buffer is now handled internally while loading it on demand.
When compiled with `git-features/parallel` the `file::Store` remains
send and sync.

The packed refs buffer is shared across clones and it's recommended
to clone one `file::Store` instance per thread, each of which can
use its own namespace.
Byron added a commit that referenced this issue Nov 28, 2021
…ked-refs buffer (#263)

It is instead read from the internally synchronized buffer, shared
across all instances.
Byron added a commit that referenced this issue Nov 28, 2021
All that without blocking on shared state.
The idea is that the first transaction can safe time, which at least
helps in the single-threaded case.
Byron added a commit that referenced this issue Nov 28, 2021
The latter is useful when iterating without blocking everyone else
or having issues with deadlocks if an iterator tries to find a reference
while the iteration is ongoing.
Byron added a commit that referenced this issue Nov 28, 2021
Byron added a commit that referenced this issue Nov 28, 2021
…refixed(…)` respectively (#263)

This way, it's possible to keep shared ownership of the packed buffer
while allowing the exact same iterator machinery to work as before.
Byron added a commit that referenced this issue Nov 28, 2021
Byron added a commit that referenced this issue Nov 28, 2021
That way, we avoid races around the mtime precision, which on some
systems isn't high enough, to provide a consistent view right after
a change.
Byron added a commit that referenced this issue Nov 28, 2021
`Reference::log_iter(…)` now is a platform instead of a forward iterator,
which requires a call to `.all()` to return the forward iterator like
previously.

`Reference::log_iter_rev(…)` was removed in favor of
`Reference::log_iter(…).rev()`.
Byron added a commit that referenced this issue Nov 28, 2021
The latter now returns a standard Platform to iterate over all
reflog entries from oldest to newest or vice versa.
@Byron Byron changed the title Fix loose db namespace and move packed-buffer handling into 'general' store Fix loose db namespace and move packed-buffer handling into loose ref store Nov 28, 2021
Byron added a commit that referenced this issue Nov 28, 2021
…ore (#263)

Instead this could be done by running all tests for the file store with
the type replaced to be the general store.

This means we have a normal run that runs with the default types, and
a general-store run that replaces the type used with the one for the
general store and it should all be homogenous. The latter is certainly
possible now that the loose ref store sports interior mutability
and a few platforms to get rid of some state.
Byron added a commit that referenced this issue Nov 28, 2021
…ut it (#263)

The question is really how reftable works and if all state can be shared
and sync, or if some should rather be thread-local (similar to what the
ODB is going to do).
Byron added a commit that referenced this issue Nov 28, 2021
At this low level, it's important to be clear about RefLogs and rather
force the caller to specify the ref-log mode. Technically it depends
on a few factors, `git-repository` deals with them, but certainly
shouldn't default to anything without being clear.
Byron added a commit that referenced this issue Nov 28, 2021
Byron added a commit that referenced this issue Nov 28, 2021
That way, abstractions can still be built that have other ways of
managing the packed-refs buffer, allowing it to stay more persistent.
Byron added a commit that referenced this issue Nov 28, 2021
…fer()` (#263)

This makes much clearer what it actually does, as previously it might
have been a stored packed buffer as well.
Byron added a commit that referenced this issue Nov 28, 2021
These methods allow using an own packed buffer, usually obtained through
`cached_packed_buffer()`.
Byron added a commit that referenced this issue Nov 29, 2021
…ow_packed() (#263)

This allows a stable/non-changing buffer to be used.
Byron added a commit that referenced this issue Nov 29, 2021
This makes the API complete as now there is a methods that uses the
internal buffer, or the provided one, where both can have its benefits.
Byron added a commit that referenced this issue Nov 29, 2021
@Byron Byron closed this as completed Nov 30, 2021
Byron added a commit that referenced this issue Dec 1, 2021
It's much slower than the read lock version, for sure, and inhibits
concurrency even more than what libgit2 does.
Byron added a commit that referenced this issue Dec 1, 2021
…cks (#263)

Here is an explanation for this:
radicle-dev/radicle-link#792 (comment)

It turns out the upgradable locks are only great if there is heaviest
contention, as they can be a little faster.
Byron added a commit that referenced this issue Dec 2, 2021
)

Multiple writers can come to the same conclusion and try to reload
a packed buffer. Now they will reassuure themselves this is still needed
after getting exclusive write access.

Notably, git doesn't do that as it uses snapshots, and libgit2 also
doesn't seem to do that. Technically this is a bit slower in the common
case due to the extra state call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant