FEAT: Add stream_without_transaction #4526
Hi, please correct me if I'm misunderstanding, but I believe this is more of a pagination helper than streaming functionality. If I remember correctly, the repo stream is meant to wrap the cursor functionality of the database, so it is quite different. If I'm not misinterpreting your suggestion, there are some nice libraries that help with pagination, such as flop: https://github.com/woylie/flop.
Hi Greg, You are right, this is more of a paginated stream. Though I see
The function is fairly small but currently the only way to stream a whole table is to use either Maybe if the function should not be added to Ecto itself, we should update the docs so users are aware of the implications of long-running transactions. I personally believe this is a nice QOL addition to Ecto. Could also update
I believe streaming and retrieving things in smaller chunks are being conflated a bit here. The idea behind stream is that you take a single query and get its results in chunks. SQL databases like Postgres and MySQL support this using cursors. The transaction restriction is actually not imposed by Ecto but by those databases. They require a transaction because a single statement can't be split across multiple transactions. The streaming in your proposal is more of a choice you made to implement batching. It actually fetches the results in batches using more than one query. Instead of using
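To make the distinction concrete, here is a minimal sketch of what batching with more than one query looks like as a lazy stream. It is not Ecto code and not the code from this PR: `BatchSketch`, `batched_stream/2` and `fetch_page` are hypothetical names, and an in-memory list stands in for the table so the example runs on its own. In a real application `fetch_page` would presumably run something like `Repo.all(from q in query, offset: ^offset, limit: ^limit)` with a stable `order_by`.

```elixir
defmodule BatchSketch do
  # Hypothetical helper, not part of Ecto.
  # `fetch_page` is assumed to run ONE query per call and return a list of rows.
  def batched_stream(fetch_page, batch_size) when batch_size > 0 do
    Stream.resource(
      # Start at offset 0.
      fn -> 0 end,
      fn
        # A short page was already emitted, so there is nothing left to fetch.
        :done ->
          {:halt, :done}

        offset ->
          case fetch_page.(offset, batch_size) do
            [] -> {:halt, offset}
            rows when length(rows) < batch_size -> {rows, :done}
            rows -> {rows, offset + batch_size}
          end
      end,
      # No cursor or transaction to clean up: every page was its own query.
      fn _acc -> :ok end
    )
  end
end

# In-memory stand-in for a table (assumption for illustration only).
table = Enum.to_list(1..7)
fetch_page = fn offset, limit -> Enum.slice(table, offset, limit) end

BatchSketch.batched_stream(fetch_page, 3) |> Enum.to_list() |> IO.inspect()
# prints [1, 2, 3, 4, 5, 6, 7]
```

Because each page is a separate query, rows inserted or deleted between pages can be skipped or returned twice, which is one way the results can differ from a single cursor-backed query.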
Good point. Yeah, it is more batching; my function should be named Should I go ahead and close this here now then, or would you want me to add some gotchas/pitfalls to
I think it's good advice, but it is very specific to Postgres with regard to vacuuming. I'm not sure if we usually put that advice in Ecto SQL or here. I will take a closer look.
Maybe you already know about this, but one of the more insidious things I ran into with long transactions was this: https://www.cybertec-postgresql.com/en/streaming-replication-conflicts-in-postgresql/. Basically, you have a replica that is using a row in a transaction, and the main server sends you an update to delete that row. If your query isn't done after a certain amount of time, it's cancelled.
For streaming outside of a transaction, I resorted to simply using a cursor in Postgres. So far I have not had issues. But for other databases (cough, SQLite, cough) we don't have cursors.
Agreed with everything @greg-rychlewski said. This is not the same as the streaming functionality and it emits different results. Given that this can be implemented fully on top of Ecto Repos and does not require new functionality at the adapter level, I suggest implementing it as a library. :) Thank you.
Hi,
This is a draft proposal to add a new function called `stream_without_transaction`.
Currently `Repo.stream` must be run inside a transaction for Postgres. This can be an issue for multiple reasons:
Say we have a huge users table and we want to sync each user's subscription status once per day. We would probably write it as follows:
The issue is that the transaction runs for a very long time, and long-running transactions are a problem: tuples that become dead after the transaction started cannot be vacuumed until the transaction completes. This matters when you have tables that are updated frequently and need to be vacuumed often, so long transactions contribute to table bloat.
The new function `stream_without_transaction` solves that. We can use it with two approaches:
For smaller tables we can just use offset, because the cost of using offset is negligible for small tables:
`next` function which returns a new query that will get us the next "page"
I have not bothered with adding tests/docs as I wanted to first see whether the maintainers agree with adding such a function.
Let me know what you think! :)
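For readers following along, the `next`-function idea from the proposal can be sketched in plain Elixir roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `KeysetSketch`, `stream/3`, `fetch` and `next` are hypothetical names, and the "query" is reduced to the last seen id so the example runs without a database. With Ecto, `fetch` would presumably be `Repo.all/1`, and `next` would return a query along the lines of `from u in User, where: u.id > ^last.id, order_by: u.id, limit: ^batch_size`, where `User` is an assumed schema.

```elixir
defmodule KeysetSketch do
  # Hypothetical helper, not part of Ecto.
  # `state` plays the role of the current query.
  # `fetch.(state)` runs one query and returns a list of rows.
  # `next.(state, last_row)` returns the state for the following page.
  def stream(initial_state, fetch, next) do
    initial_state
    |> Stream.unfold(fn state ->
      case fetch.(state) do
        # An empty page means the table is exhausted.
        [] -> nil
        rows -> {rows, next.(state, List.last(rows))}
      end
    end)
    # Flatten the pages so callers see individual rows.
    |> Stream.flat_map(& &1)
  end
end

# In-memory stand-in for a users table (assumption for illustration only).
users = for id <- 1..10, do: %{id: id}
batch_size = 4

# Equivalent of "WHERE id > last_id ORDER BY id LIMIT batch_size".
fetch = fn last_id ->
  users |> Enum.filter(&(&1.id > last_id)) |> Enum.take(batch_size)
end

# The next page starts after the last row we saw.
next = fn _last_id, last_row -> last_row.id end

ids = KeysetSketch.stream(0, fetch, next) |> Enum.map(& &1.id)
IO.inspect(ids)
# prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Unlike offset, this form does not rescan skipped rows on every page, but it does need a unique, ordered column to page on, and it issues one extra query at the end to discover the empty page.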