chore(dao/command): Add transaction decorator to try to enforce "unit of work" by john-bodley · Pull Request #24969 · apache/superset

john-bodley · 2023-08-11T23:44:29Z

SUMMARY

This is a PR I've had on the back burner for many months but have struggled with on numerous occasions, often in part due to the flaky/delicate tests (and their associated frameworks). The initial desire was to fulfill the approach outlined in [SIP-99B] Proposal for (re)defining a "unit of work", but alas I failed, in part due to the challenges of untangling Superset logic that is inherently not conducive to treating a command as a "unit of work".

Why is that? It's complicated, but asynchronous logic does not help: a Celery task running within the confines of another command needs to read previously committed state, given the READ COMMITTED isolation level. Issues like this could likely be overcome by having two commands, prepare and execute, as opposed to a single execute command.

The TL;DR is that this PR should likely be interpreted as the first phase of SIP-99B. The general framework holds, i.e., DAOs no longer commit and a transaction decorator is used to wrap any command which performs an INSERT, UPDATE, or DELETE.
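To make the pattern concrete, here is a minimal sketch of the "DAOs write, commands commit" idea. It is not the decorator from this PR: it uses the standard-library `sqlite3` module in place of SQLAlchemy's `db.session`, and the `chart` table, `dao_create_chart`, `create_chart_command`, and `failing_command` names are all hypothetical.

```python
import functools
import sqlite3

# Assumption: a module-level sqlite3 connection stands in for Superset's db.session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chart (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()


def transaction(func):
    """Commit if the wrapped command succeeds; roll back and re-raise otherwise."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            conn.commit()
            return result
        except Exception:
            conn.rollback()
            raise

    return wrapped


def dao_create_chart(name):
    """Hypothetical DAO helper: it writes, but never commits."""
    cursor = conn.execute("INSERT INTO chart (name) VALUES (?)", (name,))
    return cursor.lastrowid


@transaction
def create_chart_command(name):
    """Hypothetical command: the decorator makes it the unit of work."""
    return dao_create_chart(name)


@transaction
def failing_command(name):
    """Writes through the DAO, then fails; the decorator discards the write."""
    dao_create_chart(name)
    raise ValueError("validation failed after the write")


def chart_names():
    return [row[0] for row in conn.execute("SELECT name FROM chart ORDER BY id")]
```

Calling `create_chart_command("a")` persists the row, whereas a call to `failing_command("b")` raises and leaves no trace of the second row, because the only commit or rollback happens at the command boundary.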

Finally, I apologize for the size of the PR. I struggled to downsize the footprint, but once you start enforcing that DAOs should not commit, the number of files touched begins to snowball.

Regrettably my time (for now) working on Apache Superset is likely drawing to a close, so for completeness I thought there was merit in sharing the incremental diff for what I was hoping to achieve in case @michael-s-molina @villebro et al. wanted to carry the baton on.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

Has associated issue: [SIP-99B] Proposal for (re)defining a "unit of work" #25108
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

john-bodley · 2023-08-11T23:53:52Z

superset/daos/chart.py

We're really inconsistent with our error handling. The BaseDAO.delete method wraps all SQLAlchemyError errors as DAODeleteFailedError whereas here they are left as is.

john-bodley · 2024-05-21T21:17:32Z

superset/examples/energy.py

Obvious comment and thus not needed.

john-bodley · 2024-05-21T21:17:50Z

superset/examples/helpers.py

Obvious comment.

john-bodley · 2024-05-21T21:18:24Z

superset/examples/tabbed_dashboard.py

Obvious comment.

john-bodley · 2024-05-21T21:18:39Z

superset/examples/world_bank.py

Obvious comment.

john-bodley · 2024-06-20T22:55:36Z

superset/commands/key_value/create.py

@michael-s-molina what's your thinking about flushing? Sadly there's no DAO used here and thus it's required given that on the next line the entry.id is used, but in general should the DAO flush or should it be left up to the caller who is context aware?

@john-bodley @michael-s-molina should we create a DAO for the Key Value entities, and leave permalinks et al as the commands? This could clarify things. I can do that refactor if needed.

> @michael-s-molina what's your thinking about flushing? Sadly there's no DAO used here and thus it's required given that on the next line the entry.id is used, but in general should the DAO flush or should it be left up to the caller who is context aware?

There's no right or wrong answer, but I prefer to execute flush operations only when necessary to minimize database load. That means the command is responsible for calling flush when necessary.

> @john-bodley @michael-s-molina should we create a DAO for the Key Value entities, and leave permalinks et al as the commands? This could clarify things. I can do that refactor if needed.

This would definitely improve things and make the code more similar to the rest of the application.

@john-bodley @michael-s-molina here's a PR to convert the KV commands into a DAO: #29344

john-bodley · 2024-06-20T23:05:51Z

superset/commands/sql_lab/execute.py

@michael-s-molina here's a better example where db.session.flush() is called in the command given that it's not invoked by the underlying DAO.
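The reason a flush is needed before reading `entry.id` can be illustrated with a small analogue. This sketch uses the standard-library `sqlite3` module rather than SQLAlchemy: sending the INSERT to the database plays the role of `db.session.flush()`, and the `key_value` table and `insert_entry` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_value (id INTEGER PRIMARY KEY, value TEXT)")
conn.commit()


def insert_entry(value):
    """Send the INSERT (the analogue of a flush) and return the generated id.

    Nothing is committed here; that decision is left to the caller.
    """
    cursor = conn.execute("INSERT INTO key_value (value) VALUES (?)", (value,))
    return cursor.lastrowid


first_id = insert_entry("a")
conn.commit()  # the caller decides this unit of work is done

pending_id = insert_entry("b")  # the id is already known to the caller...
conn.rollback()  # ...but the row is discarded because it was never committed

committed_rows = conn.execute("SELECT COUNT(*) FROM key_value").fetchone()[0]
```

The generated key is available as soon as the statement has been sent, which is all the calling code needs; whether the row ultimately persists is a separate decision made at the transaction boundary.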

john-bodley · 2024-06-21T21:40:19Z

tests/unit_tests/utils/lock_tests.py

This isn't needed for these tests and causes issues when used with a nested transaction when we want to rollback to a previous SAVEPOINT.

We do need to ensure that the lock is committed to the metastore somehow. But let me do the key value DAO refactor first, that might help clean up this test. In the interim, feel free to relax/disable the test if needed.

I think I disagree with this change (I may not have been able to accurately communicate why this is needed). But no worries, I will address this in #29344 after this PR lands and try to document the logic better.
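The SAVEPOINT concern raised at the start of this thread can be reproduced in isolation. The sketch below is an assumption-laden analogue using the standard-library `sqlite3` module, not the actual test fixture: a commit issued inside a savepoint ends the enclosing transaction, so a later rollback to that savepoint has nothing to roll back to. The `lock_entry` table and the helper names are hypothetical.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN, SAVEPOINT,
# COMMIT and ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE lock_entry (name TEXT)")


def run_in_savepoint(commit_inside):
    """Insert a row inside a savepoint, then try to roll back to that savepoint."""
    conn.execute("BEGIN")
    conn.execute("SAVEPOINT test_case")
    conn.execute("INSERT INTO lock_entry (name) VALUES ('x')")
    if commit_inside:
        # COMMIT ends the whole transaction and releases every savepoint.
        conn.execute("COMMIT")
    try:
        conn.execute("ROLLBACK TO SAVEPOINT test_case")
        outcome = "rolled back"
    except sqlite3.OperationalError:
        outcome = "savepoint gone"
    if conn.in_transaction:
        conn.execute("ROLLBACK")  # close the still-open outer transaction
    return outcome


def row_count():
    return conn.execute("SELECT COUNT(*) FROM lock_entry").fetchone()[0]
```

Without the inner commit the insert is undone and the table stays empty. With it, the rollback fails and the row leaks out of the test, which is the kind of interference described above.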

john-bodley · 2024-06-23T16:02:40Z

tests/integration_tests/fixtures/unicode_dashboard.py

As the name suggests the one_or_none() method could return None.

john-bodley · 2024-06-23T16:04:42Z

tests/integration_tests/tags/dao_tests.py

This mimics the logic in the code.

codecov · 2024-06-26T04:02:57Z

Codecov Report

Attention: Patch coverage is 89.57816% with 42 lines in your changes missing coverage. Please review.

Project coverage is 83.89%. Comparing base (76d897e) to head (51cb1f6).
Report is 1094 commits behind head on master.

Files with missing lines	Patch %	Lines
superset/extensions/pylint.py	0.00%	8 Missing ⚠️
superset/cli/update.py	0.00%	3 Missing ⚠️
superset/daos/tag.py	50.00%	3 Missing ⚠️
superset/cli/examples.py	0.00%	2 Missing ⚠️
superset/cli/main.py	0.00%	2 Missing ⚠️
superset/cli/test.py	0.00%	2 Missing ⚠️
superset/commands/database/ssh_tunnel/update.py	81.81%	2 Missing ⚠️
superset/commands/importers/v1/examples.py	0.00%	2 Missing ⚠️
superset/sqllab/sql_json_executer.py	33.33%	2 Missing ⚠️
superset/commands/dataset/duplicate.py	96.66%	1 Missing ⚠️
... and 15 more

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #24969       +/-   ##
===========================================
+ Coverage   60.48%   83.89%   +23.40%     
===========================================
  Files        1931      518     -1413     
  Lines       76236    37468    -38768     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31434    -14680     
+ Misses      28017     6034    -21983     
+ Partials     2105        0     -2105

Flag	Coverage Δ
hive	`49.10% <47.89%> (-0.06%)`	⬇️
javascript	`?`
mysql	`77.41% <88.83%> (?)`
postgres	`77.52% <89.08%> (?)`
presto	`53.72% <47.89%> (-0.08%)`	⬇️
python	`83.89% <89.57%> (+20.40%)`	⬆️
sqlite	`76.99% <86.10%> (?)`
unit	`59.65% <56.32%> (+2.02%)`	⬆️

john-bodley · 2024-06-27T02:23:47Z

tests/integration_tests/charts/data/api_tests.py

This test only seems to fail for the test-postgres-presto workflow.

Unrelated comment: at some point we should replace Presto with Trino, as that's really where the broader community is at right now.

michael-s-molina

Thank you for all the hard work here @john-bodley. Even though we were not able to fully implement SIP-99B, this PR is a step in the right direction and removes a lot of unnecessary code. I left some first-pass comments:

tests/integration_tests/databases/api_tests.py

superset/sqllab/sql_json_executer.py

tests/integration_tests/celery_tests.py

michael-s-molina · 2024-06-27T11:59:52Z

superset/utils/decorators.py

+
+            try:
+                result = func(*args, **kwargs)
+                db.session.commit()  # pylint: disable=consider-using-transaction


Because we were not able to use begin_nested here, do you see any point where previously we had only a flush that could potentially be rolled back and now we have a @transaction which will effectively commit? Something like:

Previously:

    CommandA:
        try:
            do_something()
            CommandB()
            commit()
        except Exception:
            rollback()

    CommandB:
        do_something()
        flush()

Now:

    @transaction
    CommandA:
        do_something()
        CommandB()

    @transaction
    CommandB:
        do_something()

@michael-s-molina given that these @transaction decorators are defined at the "unit of work" level I think we're ok, i.e., I'm not sure where we ever had nested commands where one never committed and the outer explicitly rolled back.

I think for now we should consider commands as the unit of work, meaning we should assume they always commit at the end. If this is not the case we should probably introduce a sort-of notion of a sub-command, that doesn't commit. But let's leave that for a follow-up.
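The scenario discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not Superset code: `sqlite3` stands in for the SQLAlchemy session, the decorator is a simplified commit-or-rollback wrapper, and `command_a`, `command_b`, and the `audit` table are hypothetical.

```python
import functools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (step TEXT)")
conn.commit()


def transaction(func):
    """Simplified wrapper: commit on success, roll back and re-raise on failure."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
            conn.commit()
            return result
        except Exception:
            conn.rollback()
            raise

    return wrapped


@transaction
def command_b():
    conn.execute("INSERT INTO audit (step) VALUES ('b')")


@transaction
def command_a():
    conn.execute("INSERT INTO audit (step) VALUES ('a')")
    command_b()  # the inner decorator commits everything pending, including 'a'
    raise RuntimeError("CommandA fails after CommandB has returned")


try:
    command_a()
except RuntimeError:
    pass  # the outer rollback ran, but there was nothing left to undo

steps = [row[0] for row in conn.execute("SELECT step FROM audit ORDER BY rowid")]
```

Both rows survive the failure of the outer command, because the inner commit has already made them durable. That is why treating every command as a committing unit of work only holds if decorated commands are not nested in ways that rely on an outer rollback.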

michael-s-molina · 2024-06-27T12:44:05Z

superset/commands/database/update.py

            database.set_sqlalchemy_uri(database.sqlalchemy_uri)
            ssh_tunnel = self._handle_ssh_tunnel(database)
            self._refresh_catalogs(database, original_database_name, ssh_tunnel)
        except SSHTunnelError:  # pylint: disable=try-except-raise


I believe in this case you don't need the try/except, as there's no event logging or anything else in the except block.

superset/commands/sql_lab/execute.py

villebro

This is a HUGE step in the right direction, and finally introduces a coherent pattern for dealing with complex ORM handling during the request lifecycle. Given that this fundamentally changes how the backend operates, I fear there may be significant risk of regressions here. However, those should be easy to fix now that we have consistent flushing, committing and rolling back. If nothing else, these potential regressions will highlight critical gaps in our test coverage. Therefore, I feel the benefits of this change far outweigh the intermediate regression risks it introduces.

villebro · 2024-06-28T07:11:04Z

pyproject.toml

 commands =
    superset db upgrade
    superset init
+    superset load-test-users


Random observation that's not directly related to this PR: I've always felt it's weird that the core application has functionality for loading test users. I feel at some point we should break that out into the test suite.

villebro · 2024-06-28T07:12:20Z

scripts/permissions_cleanup.py

    for pvm in pvms:
        pvms_dict[(pvm.permission, pvm.view_menu)].append(pvm)
    duplicates = [v for v in pvms_dict.values() if len(v) > 1]
-    len(duplicates)


What on earth was this? 🤔

villebro · 2024-06-28T07:57:30Z

tests/integration_tests/dashboards/commands_tests.py

        }
        command = v1.ImportDashboardsCommand(contents, overwrite=True)
        command.run()
-        command.run()


michael-s-molina

Thank you @john-bodley for addressing the comments. I agree with @villebro that the benefits greatly outweigh the risks here.

pull-request-size bot added the size/XL label Aug 11, 2023

john-bodley changed the title ~~John bodley dao nested session~~ chore(dao): Use nested sessions Aug 11, 2023

john-bodley force-pushed the john-bodley--dao-nested-session branch 2 times, most recently from de2c324 to 37f0b24 Compare August 16, 2023 23:54

john-bodley force-pushed the john-bodley--dao-nested-session branch from 37f0b24 to 4e51d4d Compare August 19, 2023 05:53

john-bodley mentioned this pull request Aug 19, 2023

fix: Ensure SQLAlchemy sessions are closed #25031

Merged

9 tasks

john-bodley force-pushed the john-bodley--dao-nested-session branch from 4e51d4d to b5c2e81 Compare August 19, 2023 05:55

john-bodley force-pushed the john-bodley--dao-nested-session branch 4 times, most recently from 7b201dd to ef7bcd1 Compare January 30, 2024 01:32

john-bodley changed the title ~~chore(dao): Use nested sessions~~ chore(dao/command): Use nested sessions Jan 30, 2024

john-bodley force-pushed the john-bodley--dao-nested-session branch 11 times, most recently from f03d7b0 to 10dc755 Compare January 31, 2024 00:38

john-bodley force-pushed the john-bodley--dao-nested-session branch from 10dc755 to 092aa45 Compare February 13, 2024 05:08

github-actions bot added the api Related to the REST API label Feb 13, 2024

pull-request-size bot added size/XXL and removed size/XL labels Feb 13, 2024

john-bodley force-pushed the john-bodley--dao-nested-session branch 2 times, most recently from 82de210 to 7b651ef Compare February 13, 2024 23:16

john-bodley force-pushed the john-bodley--dao-nested-session branch from c2e2027 to 19afeb3 Compare May 21, 2024 19:56

john-bodley force-pushed the john-bodley--dao-nested-session branch from 19afeb3 to 69a6e66 Compare June 12, 2024 16:15

john-bodley mentioned this pull request Jun 12, 2024

chore: Remove the need for explicit bubble up of certain exceptions #29235

Merged

9 tasks

john-bodley force-pushed the john-bodley--dao-nested-session branch from 69a6e66 to c71a3cb Compare June 13, 2024 18:57

john-bodley mentioned this pull request Jun 18, 2024

fix(key-value): use flush instead of commit #29286

Merged

9 tasks

john-bodley force-pushed the john-bodley--dao-nested-session branch 3 times, most recently from 51ce868 to ff38e15 Compare June 18, 2024 23:09

john-bodley mentioned this pull request Jun 24, 2024

chore(key-value): convert command to dao #29344

Merged

9 tasks

michael-s-molina requested changes Jun 27, 2024

chore(dao/command): Add transaction decorator to try to enforce "unit of work" 51cb1f6

villebro approved these changes Jun 28, 2024

michael-s-molina approved these changes Jun 28, 2024

justinpark mentioned this pull request Jul 10, 2024

fix(dataset): Obscure error message on creation #29554

Open

9 tasks

betodealmeida mentioned this pull request Aug 30, 2024

feat: allow certain exceptions to commit #30067

Closed

9 tasks

ablanchard mentioned this pull request Sep 10, 2024

[4.1.0rc2] sqlalchemy InvalidRequestError: This nested transaction is inactive when trying to activate embedding on a dashboard #30216

Closed

3 tasks

betodealmeida mentioned this pull request Sep 25, 2024

feat: turn off autoflush on integration tests #30394

Closed

9 tasks
