Change Log, Checkpoints and Run-Locks #166
Conversation
@khaeru, this PR and the open questions are likely interesting to you, too. Please chime in if you know the answers/have some preferences for those :) I imagine people would not be happy to wait an unknown amount of time on someone else releasing a lock, so my guess is: if you need a lock but can't get it, we should immediately raise an error.
Okay, will try! Here is some info that I hope is useful, maybe not direct answers to questions but context that might make the answers/requirements more clear. First some things I am not 100% clear on, but just stating here so I can be corrected if I've misunderstood:
Some comments about requirements from the message_ix side:
In terms of locking/versioning/etc., here are some example cases and what I think we would expect:
In general:
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #166 +/- ##
=======================================
+ Coverage 87.0% 87.1% +0.1%
=======================================
Files 230 235 +5
Lines 8170 8711 +541
=======================================
+ Hits 7112 7593 +481
- Misses 1058 1118 +60
No! Checkpoints connect a particular run (-id) to a transaction; they serve as points to roll back to, but I think this might be irrelevant for the message integration anyway.
The revert logic is always the same: select the relevant objects from the version table at the particular transaction, then delete all current data and insert the old data. For the checkpoint we get
OK, good; I think this approach is preferable for me as well. Example cases 1 and 2 seem doable with the same logic, but that logic is out of scope for this PR. For case 3: okay, the timeout needs to be implemented; I will need to look into that.
I agree; unfortunately, all the names I can think of are taken. PostgreSQL even has "checkpoints" as well, so in fact all these terms ('transact(ion)', 'commit', 'checkpoint') have (sometimes multiple) other meanings.
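The revert logic described here (select the versions valid at a transaction, delete all current data, reinsert the old data) can be sketched with stdlib sqlite3. The table and column layout (a `region` table plus a `region_version` table with `transaction_id`/`end_transaction_id` columns) is an illustrative assumption, not ixmp4's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE region_version (
    id INTEGER,
    name TEXT,
    transaction_id INTEGER,       -- transaction that wrote this version
    end_transaction_id INTEGER    -- transaction that superseded it (NULL = current)
);
""")

# transaction 1: create "Region 1"; transaction 2: create "Region 2"
con.execute("INSERT INTO region VALUES (1, 'Region 1')")
con.execute("INSERT INTO region_version VALUES (1, 'Region 1', 1, NULL)")
con.execute("INSERT INTO region VALUES (2, 'Region 2')")
con.execute("INSERT INTO region_version VALUES (2, 'Region 2', 2, NULL)")
# transaction 3: delete "Region 1" -> close its version row
con.execute("DELETE FROM region WHERE id = 1")
con.execute("UPDATE region_version SET end_transaction_id = 3 WHERE id = 1")

def revert(con, target_tx):
    """Select the versions valid at target_tx, wipe current data, reinsert."""
    rows = con.execute(
        "SELECT id, name FROM region_version "
        "WHERE transaction_id <= ? "
        "AND (end_transaction_id IS NULL OR end_transaction_id > ?)",
        (target_tx, target_tx),
    ).fetchall()
    con.execute("DELETE FROM region")
    con.executemany("INSERT INTO region VALUES (?, ?)", rows)

revert(con, 2)  # roll back to the state after transaction 2
names = sorted(n for (n,) in con.execute("SELECT name FROM region"))
# -> ['Region 1', 'Region 2']
```

Reverting to transaction 3 with the same function would instead restore only "Region 2", since "Region 1"'s version row was closed at that transaction.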
Hi, the first version of this feature is available on this branch.
Further, I need clarification on some issues from @danielhuppmann and @khaeru. Details follow.

Regions and Units

As of now, regions and units are tracked as versioned objects, but not rolled back with a run. This means the following code will throw a

region1 = platform.regions.create("Region 1")
platform.regions.create("Region 2")
run = platform.runs.create(model, scenario)
data_with_region1 = pd.DataFrame(...)  # Has IAMC data that references "Region 1"
data_with_region2 = pd.DataFrame(...)  # Has IAMC data that references "Region 2"

with run.transact():
    run.iamc.add(data_with_region1)
    run.checkpoints.create("Add data with region 1")
    run.iamc.add(data_with_region2)
    run.checkpoints.create("Add data with region 2")
    run.iamc.remove(data_with_region1)
    run.checkpoints.create("Remove data with region 1")

platform.regions.delete(region1)

with run.transact():
    # Revert to "Add data with region 1"
    run.checkpoints[1].revert()  # Will throw an error!!

This seems good to me, because if incorrect regions were deleted, we do not want to implicitly create them again when rolling back a run.

Automatic Rollback

Right now, every time a run exits the transact() context, it is rolled back to the last checkpoint:

with run.transact():
    run.iamc.add(test_data_1)
    run.checkpoints.create("Add test data 1")
    run.iamc.add(test_data_2)

# the run will now contain only test_data_1
assert run.iamc.tabulate() == test_data_1

Is that right, or should the data stay?

Client-Side Locking Mechanism

The run lock is now implemented in the facade layer using the two backend methods lock() and unlock(). Hence, the transact() context manager looks like this:

@contextmanager
def transact(self) -> Generator[None, None, None]:
    self._model = self.backend.runs.lock(self._model.id)
    self.owns_lock = True
    yield
    checkpoint_df = self.checkpoints.tabulate()
    checkpoint_transaction = int(checkpoint_df["transaction__id"].max())
    assert self._model.lock_transaction is not None
    if checkpoint_transaction > self._model.lock_transaction:
        self.backend.runs.revert(self._model.id, checkpoint_transaction)
    else:
        self.backend.runs.revert(self._model.id, self._model.lock_transaction)
    self._model = self.backend.runs.unlock(self._model.id)
    self.owns_lock = False

A more sophisticated and secure locking mechanism is a lot more work, as it requires me to keep track of the users that have locked runs. Postponed for an indefinite time, alright?

LockRequired Exception

Since the client now keeps track of which locks it owns, we can throw exceptions in the facade layer when an operation is attempted that requires a lock.
A list of operations in the optimization part will have to be worked out as well; can you think of anything else? Did I understand the assignment correctly?

Run Meta Indicators/Data

These are not covered by the rollback logic or the LockRequired exception; should they be? I'm sure as soon as I post this comment I will think of something else, but for now: please advise!
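A minimal sketch of such a facade-level guard, raising when a write operation is attempted without the lock (the class, decorator, and method names here are hypothetical illustrations, not ixmp4's actual API):

```python
from functools import wraps

class LockRequired(Exception):
    """Raised when a write operation is attempted without holding the run lock."""

def requires_lock(method):
    # Decorator for facade methods that must only run while the client
    # owns the run lock (i.e. inside `with run.transact(): ...`).
    @wraps(method)
    def guarded(self, *args, **kwargs):
        if not self.owns_lock:
            raise LockRequired(
                f"{method.__name__}() requires the run lock; "
                "use `with run.transact(): ...` to acquire it"
            )
        return method(self, *args, **kwargs)
    return guarded

class Run:
    def __init__(self):
        self.owns_lock = False  # set/cleared by transact() in the real facade
        self.data = []

    @requires_lock
    def add(self, records):
        self.data.extend(records)
```

Calling `Run().add([...])` outside a transact block would then raise LockRequired instead of silently writing.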
As mentioned, message_ix/current ixmp test suites and usage don't need this, so no opinion.
Stay? This is what I meant above about "following semantics of the standard library". See How to use the connection context manager in the docs for the
Yes, that's fine. If something gets inadvertently locked by another user (and it shouldn't if we're encouraging users to use the context manager), it's already a great improvement that
This is more for @glatterf42, I think. Just to clarify, one point above was that creating a Run instance at all should be safe against collisions. That is to say, rather than locking an existing Run while certain things are done to it, two simultaneous attempts to create new (not-yet-existing) Runs that want the same or similar identifiers should be safe. If that's already achieved some other way than the locking mechanism you've written, e.g. by lower-level transaction control in the DB layer, then we're fine.
I'm not aware of people using these through
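One way the DB layer can make simultaneous Run creation safe without the run lock is a uniqueness constraint on the identifier columns; a toy sqlite sketch (the schema is illustrative, not ixmp4's actual one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE runs (
    model TEXT,
    scenario TEXT,
    version INTEGER,
    UNIQUE (model, scenario, version)  -- DB-level guard against collisions
)
""")

def create_run(con, model, scenario):
    # Pick the next version number. Under concurrency, two clients may pick
    # the same number; the UNIQUE constraint then makes one INSERT fail
    # loudly instead of both "succeeding" with colliding identifiers.
    (current,) = con.execute(
        "SELECT COALESCE(MAX(version), 0) FROM runs "
        "WHERE model = ? AND scenario = ?",
        (model, scenario),
    ).fetchone()
    con.execute("INSERT INTO runs VALUES (?, ?, ?)", (model, scenario, current + 1))
    return current + 1

v1 = create_run(con, "m", "s")  # -> 1
v2 = create_run(con, "m", "s")  # -> 2
```

A second client inserting the same (model, scenario, version) triple would get an IntegrityError from the database rather than a silent duplicate.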
Thanks @meksor, this is very nice!
This makes sense.
Would it make sense that the
Agree with @khaeru, that's ok for our use case.
Please add creation and deletion of meta indicators to the operations that are change-logged and require a lock. Adding the optimization items to the change-log-and-locks logic can be left for a follow-up PR (led by @glatterf42), in my opinion.
This PR is meant to contain the three major parts of what we internally often call the "versioning" feature.
Change Log
The change log is an in-database representation of all transactions run on a given ixmp4 platform.
This requires the generation of "version tables" from database models (by adding columns and removing constraints) and a central "transaction" table tracking all transactions committed, referenced by each version table.
This part of the feature is logically/structurally built around the concepts in a database, so it might actually cover more functionality than needed (e.g. there is no way to update a region, but the system tracks update statements on the region table).
Currently this is handled by sqlalchemy-history, as it has support for alembic migrations; it can be replaced with sqlalchemy-continuum, which would enable database-native versioning for postgres via postgres triggers. Enabling and disabling these features on a per-platform basis (to avoid performance problems with sqlite) is still a little unclear to me and will need a little more time investment.
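For illustration, here is a toy sketch of what "generating a version table from a database model" means in practice: copy the model's columns while dropping uniqueness (so several versions of one row can coexist) and add transaction-tracking columns. This mimics what such libraries do; it is not their actual API:

```python
def version_table_columns(table_name, columns):
    """Derive a version-table definition from a model's column definitions.

    `columns` is a list of dicts like
    {"name": ..., "type": ..., "unique": ..., "primary_key": ...}.
    """
    versioned = [
        # copy each original column, but without UNIQUE/PK constraints,
        # since the version table holds many rows per original row
        {**col, "unique": False, "primary_key": False}
        for col in columns
    ]
    versioned += [
        # transaction-tracking columns referencing the central transaction table
        {"name": "transaction_id", "type": "INTEGER", "unique": False, "primary_key": False},
        {"name": "end_transaction_id", "type": "INTEGER", "unique": False, "primary_key": False},
        {"name": "operation_type", "type": "INTEGER", "unique": False, "primary_key": False},
    ]
    return f"{table_name}_version", versioned

# example: a region model with a unique name column
vname, vcols = version_table_columns("region", [
    {"name": "id", "type": "INTEGER", "unique": False, "primary_key": True},
    {"name": "name", "type": "TEXT", "unique": True, "primary_key": False},
])
# vname -> "region_version"; no column in vcols keeps UNIQUE or PRIMARY KEY
```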
The tricky parts:
Most of the overhead can be alleviated by using database-native table triggers, but these are hard to write and maintain, so I would suggest relying on a library for that as well.
Versioning of the data points table puts me in front of a choice. They can either:
a) prevent deletes entirely for all regions, variables and units, hiding the unused ones when SELECTing, or
b) version every single table in the database that is subject to a relation and query the version variant of the related table when rolling back.
I went with option b).
Run Checkpoints
To be able to distinguish different workflow steps in a sea of "transaction" records (one ixmp4 api call might create multiple records) we will need to add another table to ixmp4 which references run__id and transaction__id and stores a user-supplied message.
We now have four relevant tables:
- the original table xy,
- the version table xy_version,
- the transaction table transactions,
- the new checkpoints table, which may also include a timestamp and user information.
The last table (checkpoints) can be used to query all workflow steps for a specific run and to revert to said steps by looking at which transaction record they point to and replaying or rolling back the change history.
This would probably look kind of like this:
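For instance, a hypothetical sqlite sketch of the checkpoints table and the per-run query (column names follow the description above; this is not ixmp4's actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE checkpoints (
    id INTEGER PRIMARY KEY,
    run__id INTEGER NOT NULL,          -- the run this workflow step belongs to
    transaction__id INTEGER NOT NULL,  -- last transaction included in the step
    message TEXT NOT NULL,             -- user-supplied description of the step
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
)
""")
con.executemany(
    "INSERT INTO checkpoints (run__id, transaction__id, message) VALUES (?, ?, ?)",
    [(1, 5, "Add data with region 1"), (1, 9, "Add data with region 2")],
)

# all workflow steps for run 1, oldest first; reverting to a step means
# rolling the change history back to its transaction__id
steps = con.execute(
    "SELECT transaction__id, message FROM checkpoints "
    "WHERE run__id = 1 ORDER BY transaction__id"
).fetchall()
# -> [(5, 'Add data with region 1'), (9, 'Add data with region 2')]
```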
Run Locks
To make sure that a user does not include unintended changes by a different user, a locking mechanism is needed for each "run" object/row. A user shall be able to acquire a lock, preventing all other users from writing to objects related to the run.
Still unclear to me:
If a user tries to acquire the lock to a locked run, should the operation:
a) block until the other user releases the lock
b) fail immediately
Also, I remember the uploads crashing and not releasing the lock being a problem in the past. How do we want to handle this in the future?
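For comparison, the two acquisition policies can be sketched with a plain threading.Lock standing in for the per-run lock (a simplification for illustration; the real lock lives in the database). A bounded timeout on option a) also addresses the crashed-upload problem, since a waiter is never stalled forever:

```python
import threading

class RunLockedError(Exception):
    """Another user holds the lock on this run."""

run_lock = threading.Lock()  # stand-in for the per-run lock

def acquire_blocking(lock, timeout=30.0):
    # option a): wait for the holder, but bounded, so a crashed client
    # that never released the lock cannot stall us indefinitely
    if not lock.acquire(timeout=timeout):
        raise RunLockedError("timed out waiting for the run lock")

def acquire_fail_fast(lock):
    # option b): raise immediately if the lock is taken
    if not lock.acquire(blocking=False):
        raise RunLockedError("run is locked by another user")
```

Either way, the caller gets a clear RunLockedError instead of hanging on an unknown wait.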