I am following the Dagster University course Dagster Essentials and encountered a race condition at the end of Lesson 4: Asset dependencies.
The assets have the following lineage:
`taxi_trips` and `taxi_zones` both write data to the DuckDB database, while `manhattan_stats` and `trips_by_week` read from it. DuckDB allows either a single read/write connection or multiple read-only connections at the same time. In my case, a race condition was triggered because multiple read/write connections were open at the same time.
Setting `read_only=True` when connecting to the database in `manhattan_stats` and `trips_by_week` reduces the problem. However, a race condition is still possible between `taxi_trips` and `taxi_zones`, since both of them need a read/write connection. This can be worked around with IOManagers, or by using `op_tags` and limiting the parallel execution of assets with that tag to 1, but these are more advanced features.
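For reference, here is a rough sketch of what the `op_tags` workaround could look like. The tag key and value (`duckdb` / `write`) are names I made up for illustration, and the run config assumes the default multiprocess executor and its `tag_concurrency_limits` option; I have not checked this against the course project.

```python
# Hypothetical tag: any key/value pair works as long as the assets
# and the concurrency limit below agree on it.
TAG_KEY, TAG_VALUE = "duckdb", "write"
DUCKDB_WRITE_TAG = {TAG_KEY: TAG_VALUE}

# Run config that allows at most one step carrying the tag to execute
# at a time (assumes the default multiprocess executor).
RUN_CONFIG = {
    "execution": {
        "config": {
            "multiprocess": {
                "tag_concurrency_limits": [
                    {"key": TAG_KEY, "value": TAG_VALUE, "limit": 1},
                ],
            },
        },
    },
}

# Intended use (not executed here, requires dagster):
#   @asset(op_tags=DUCKDB_WRITE_TAG)
#   def taxi_trips(): ...
#
#   @asset(op_tags=DUCKDB_WRITE_TAG)
#   def taxi_zones(): ...
#
#   job = define_asset_job("all_assets", config=RUN_CONFIG)
```

With this in place, `taxi_trips` and `taxi_zones` would run one after the other, while untagged assets could still run in parallel.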
So maybe a solution would be to use read-only connections for `manhattan_stats` and `trips_by_week`, and to add a warning that if an error occurs due to a DuckDB connection issue, re-executing the pipeline should be sufficient?
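As an alternative to re-executing the whole pipeline by hand, the connection attempt itself could be retried. Below is a minimal sketch of such a helper using only the standard library. `retry_on_error` and its parameters are hypothetical names, not part of Dagster or DuckDB, and I am assuming that the lock error surfaces as `duckdb.IOException`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_error(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    delay_seconds: float = 0.5,
) -> T:
    """Call fn, pausing a little longer after each failure listed in retry_on."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds * attempt)
    raise ValueError("max_attempts must be at least 1")


# Intended use inside a reading asset (not executed here, requires duckdb;
# the exception type is an assumption):
#   conn = retry_on_error(
#       lambda: duckdb.connect(database_path, read_only=True),
#       retry_on=(duckdb.IOException,),
#   )
```

This only hides short-lived lock conflicts. If two writers hold their connections for a long time, the retries can still run out, so it complements a concurrency limit rather than replacing it.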
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
Thank you @kornerc - your proposition of using read-only connectors and concurrency limits for partitioned assets is quite elegant. I agree that it would be worthwhile to add a note of this in the lesson, as I recall a few other community members running into this issue too.
I will make a ticket for us internally to update the lesson accordingly, and will update here once complete. Thank you!