Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider reworking Chapter 3.5 "3.2 Choosing A Database Crate" #191

Closed
weiznich opened this issue Mar 12, 2023 · 5 comments
Closed

Consider reworking Chapter 3.5 "3.2 Choosing A Database Crate" #191

weiznich opened this issue Mar 12, 2023 · 5 comments

Comments

@weiznich
Copy link

This part of the chapter is out-dated and contains quite a few interesting claims:

Firstly in the Compile-time Safety section it claims that both diesel and sqlx check your queries at compile time. That's true for trivial queries, but not for more complex ones or dynamic queries. Sqlx obviously cannot check queries that are constructed at runtime. This includes common things like having an IN expression with a dynamic value list. In addition there are known cases where the Sqlx query checking reports the wrong results, as that for example happens for LEFT JOIN. There the nullability of expressions coming from the joined table is inferred incorrectly. In contrast diesel is able to check these cases correctly as well. In addition diesel also allows to build compile time checked queries that include conditional only at runtime known parts. Claiming that these guarantees seems to be a bit misleading.

Secondly for the "Query Interface" part you describe that diesel uses a DSL while the other two crates use plain SQL. This section is likely that one that is influenced most by personal taste. While that it is correct that diesel uses a DSL, I do not thing that this gives the reader a honest way to evaluate how the diesel provided DSL looks like. Yes, diesel provides a DSL, but one that's close tho SQL. That means that yes you cannot transfer that code easily to any other framework without rewriting query queries. But you don't need to spend a huge amount of work to learn that DSL if you already know SQL. In addition transferring plain SQL queries from one framework to another one also normally requires basically rewriting at least all the locations that emit a query.
Later you claim that with diesel you might need to end up to write SQL for expression complex queries as a downside. You link the "Extending Diesel" guide there, which specifically shows how to extend diesel in such a way that you getting the compile time grantees given by diesel for custom extensions as well.

Thirdly the "Async Support" section. There is quite a lot to write here. So let's first start with the easy things: Diesel supports async now via the diesel-async crate. So as of today (and quite some time back) that section is factually incorrect. That written: You write that diesel does not roll out async support in the near future (at the point of writing). That was even back than not correct. Given the discussion in the linked issue I would even say that it was back something between a gross simplification and a clear lie. Please correct that!
On a more fundamental review this section does a really bad job explaining why you even need an async database interface in the first place. It talks about some general reasons why you might want to use an async web framework, but it does not really explain why your database library needs to be async as well. It states nebulously that using an async database library "give you less headaches", which is just not correct. There are well known solutions to prevent the outlined problem via an async connection pool (See deadpool-diesel) or similar patterns. Point is: It is relatively easy to prevent that from happening and there are existing crates for that. Beside of that: As that topic comes up again and again and again from a certain async hyper focused kind of people: Do you have any numbers to backup that claim at all? I haven't seen a good rust/diesel specific benchmark for that yet, which make all these async performance claims about database connections feel like magic fairy powder.

All that written: I do not request you to update your whole book there. It's totally fine to say: I'm using sqlx because I'm liking their interface more or something like that. Just do not try to make up claims about alternatives that are untrue.

@LukeMathWalker
Copy link
Owner

LukeMathWalker commented Mar 12, 2023

Firstly in the Compile-time Safety [...]

Happy to point that sqlx's mechanisms have some limitations, but I disagree that the examples you bring up disqualify the statement "sqlx provides compile-time guarantees on query correctness".

Yes, diesel provides a DSL, but one that's close tho SQL.

I see where you are coming from, but a DSL is still a DSL. It can be easy to learn, because it's close to SQL, but it's still not the same as working with SQL as its primary interface.
You are rehashing the same arguments used by almost all ORMs, no matter the language. And I fundamentally disagree with the tradeoff being made and the value it provides to users.
Is it a subjective statement, based on personal experience using both libraries? 100% yes, as many others in the book when performing similar choices both in terms of crate selection and architectural patterns. Which is why "opinionated introduction" is right there on the book cover, instead of having to be a prefix to every paragraph.

Later you claim that with diesel you might need to end up to write SQL for expression complex queries as a downside. You link the "Extending Diesel" guide there, which specifically shows how to extend diesel in such a way that you getting the compile time grantees given by diesel for custom extensions as well.

The point being made is not about compile-time guarantees - it's about making sure that users don't get the false impression that by learning the DSL they will never have to interface with SQL. It's a classic example of the 200% problem - you had one problem (database queries) and you transformed it into two problems (learning SQL and learning a custom DSL).

Diesel supports async now via the diesel-async crate. So as of today (and quite some time back) that section is factually incorrect.

Happy to make a correction.
But please, represent facts fairly: up until October 2022 (i.e. 5 months ago), diesel-async was a project licensed using AGPLv3 with a README that gave away strong "this is still an experiment" vibes. I'm glad to see it's now in a place where you are recommending it to others, but that's definitely not "quite some time back".

That written: You write that diesel does not roll out async support in the near future (at the point of writing). That was even back than not correct. Given the discussion in the linked issue I would even say that it was back something between a gross simplification and a clear lie. Please correct that!

That section will go away given the correction above, but I disagree that it was "a gross simplification or a clear lie". I obviously can't know what you meant, but given what you wrote in that thread I stand by my interpretation.

It states nebulously that using an async database library "give you less headaches", which is just not correct.

There is a linked footnote, exactly where that sentence stands, with a much more detailed explanation of what the issue is:

Async runtimes are based around the assumptions that futures, when polled, will yield control back to the executor "very quickly". If you run blocking IO code by mistake on the same threadpool used by the runtime to poll asynchronous tasks you get yourself in troubles - e.g. your application might mysteriously hang under load. You have to be careful and always make sure that blocking IO is performed on a separate threadpool using functions like tokio::spawn_blocking or async_std::spawn_blocking.

It looks like we have a different definition of nebulous.

There are well known solutions to prevent the outlined problem via an async connection pool (See deadpool-diesel) or similar patterns.

The well-known solution is a crate that up until December 2022 averaged less than 50 downloads per day?
The well-known solution, when the book was written and pretty much every time this discussion pops up in the Rust community, is "be really careful and use spawn_blocking". I'm happy to see there is now something more structured that is less likely to see users end up with mysterious hangs.


I'll make what I believe to be the necessary corrections ahead of publishing the next revision (~in a month or two).

Said that, I really don't appreciate the tone and many of the adjectives you used to characterise my work, my perspective, or my intention. I'm happy to amend other factual issues in the book that might have arisen since the ecosystem shifted since it was first published, but I have honestly no intention whatsoever of engaging on a debate with you here (or elsewhere).

@weiznich
Copy link
Author

First of all thanks for your fast answer.

I see where you are coming from, but a DSL is still a DSL. It can be easy to learn, because it's close to SQL, but it's still not the same as working with SQL as its primary interface.
You are rehashing the same arguments used by almost all ORMs, no matter the language. And I fundamentally disagree with the tradeoff being made and the value it provides to users.
Is it a subjective statement, based on personal experience using both libraries? 100% yes, as many others in the book when performing similar choices both in terms of crate selection and architectural patterns. Which is why "opinionated introduction" is right there on the book cover, instead of having to be a prefix to every paragraph.

As already written before: I'm totally fine with your own conclusion to prefer a SQL based solution there. What I'm looking for is more another text clarification that this is personal taste thing. It's fine to have that on the book cover, but people tend to not reading the whole thing but just pointing to parts claiming wild things based on extracts.

The point being made is not about compile-time guarantees - it's about making sure that users don't get the false impression that by learning the DSL they will never have to interface with SQL. It's a classic example of the 200% problem - you had one problem (database queries) and you transformed it into two problems (learning SQL and learning a custom DSL).

Well in that case my objections about DSL vs SQL are important. This argument would be relevant if your would need to learn a DSL that is totally unrelated to SQL, but that's not the case for diesels DSL. Again if you know SQL you should be able to pick up the DSL in a few hours. So it's not a 200% problem. (To be clear: I do believe that you need to know at least basic SQL anyway to write efficient code with diesel as well)
Otherwise: I do not thing you compare the same thing here. You compare building an abstracting with writing a complete query. If you just want to write a query with diesel you would use diesel::sql_query, if you want to write an abstraction with one of the other crate you would need to write the basic query builder + whatever corresponds to this diesel extension.

Happy to point that sqlx's mechanisms have some limitations, but I disagree that the examples you bring up disqualify the statement "sqlx provides compile-time guarantees on query correctness".

I'm not claiming that sqlx provides no compile time guarantees. I'm just saying that you cannot say that the guarantees provided by both crates are comparable. You probably shouldn't summarize that as both crates provide these guarantees. Maybe just be more detailed there and state that sqlx provides these guarantees in that cases, while diesel provides additional guarantees in these cases? Or summarize it in the table as sqlx: compile time guarantees for static queries, diesel: compile time guarantees for static and dynamic queries?

I think my largest problem here is that on the one hand you are quite opinionated about a DSL that's really close to SQL vs SQL, but on the other hand you just skip over these differences in what guarantees are provided.

That section will go away given the correction above, but I disagree that it was "a gross simplification or a clear lie". I obviously can't know what you meant, but given what you wrote in that thread I stand by my interpretation.

Please provide a link where I or a different diesel core team member wrote that this is not planed.

Other than I consider that as gross simplification to write that something is not planed as the linked discussion clearly outlines technical problems and states: These need to be solved and to our best knowledge they cannot be solved with the current language limitations. We also stated several times that we are open for solutions/contributions there and discussed several solutions with potential contributors. There is an important difference between something is not planed and something is not possible due to language limitations/nobody has figured out how to make it work given a set of constraints. (Also it's always a gross simplification to "summarize" a discussion spanning pages in a single sentence)
I will not argue about the point that diesel did not provide an async interface back than and it would be totally fine if you just wrote that. I just consider that "not planned" part as wrong. You should not have written that without actually checking this claim with the maintainers.

But please, represent facts fairly: up until October 2022 (i.e. 5 months ago), diesel-async was a project licensed using AGPLv3 with a README that gave away strong "this is still an experiment" vibes. I'm glad to see it's now in a place where you are recommending it to others, but that's definitely not "quite some time back".

First of all: Using AGPLv3 as license or stating that it is experimental is something different that not existing or not planed.

To be clear here: It's still not in a state that I consider as good due to the language level issues, but at that point it's on par to what other crates promote as "that's production ready" regardless of the issues around cancellation safety and transactions. (To be super clear here: The issues outlined in this comment are still not solved by any crate, especially if you consider cancellation as well). That's an issue that in unsolved for all async database connection crates and that cannot be solved without language level changes.

The well-known solution is a crate that up until December 2022 averaged less than 50 downloads per day?
The well-known solution, when the book was written and pretty much every time this discussion pops up in the Rust community, is "be really careful and use spawn_blocking". I'm happy to see there is now something more structured that is less likely to see users end up with mysterious hangs.

Ok, let's break that down: deadpool-diesel is obviously an example for a crate implementing this solution. Sure it wasn't super popular back than when you wrote the book. My point here is more to demonstrate that such an abstraction is more or less trivial to build.
Just to link a solutions that is much older than your book or the even older than async/await in rust: https://github.com/TechEmpower/FrameworkBenchmarks/blob/ef202bb6ef535086ccf94d0a4064548fe41b4ca8/frameworks/Rust/actix/src/db.rs. So it's not that there is "now" as solution that prevents this problems. It was already there at the time you wrote your book.
To state that again: My argument here is that this isn't the blocker you make people believe with your wording. Solutions for this are known for long time.

There is a linked footnote, exactly where that sentence stands, with a much more detailed explanation of what the issue is:

   Async runtimes are based around the assumptions that futures, when polled, will yield control back to the executor "very quickly". If you run blocking IO code by mistake on the same threadpool used by the runtime to poll asynchronous tasks you get yourself in troubles - e.g. your application might mysteriously hang under load. You have to be careful and always make sure that blocking IO is performed on a separate threadpool using functions like [tokio::spawn_blocking](https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html) or [async_std::spawn_blocking](https://docs.rs/async-std/1.6.3/async_std/task/fn.spawn_blocking.html).

It looks like we have a different definition of nebulous.

I'm aware of that footnote. My point is if that's the only reason why you require an async database library that reasoning is quite weak for the reasons outlined above. At the time of your writing there were known solutions to handle this problem with diesel, so I did not consider that as your full reasoning why you believe an async database library is advantageous. These potential other reasons are why I called your reasoning nebulous. Seems like I assumed to much in that case.

Said that, I really don't appreciate the tone and many of the adjectives you used to characterise my work, my perspective, or my intention. I'm happy to amend other factual issues in the book that might have arisen since the ecosystem shifted since it was first published, but I have honestly no intention whatsoever of engaging on a debate here (or elsewhere).

I'm sorry that you feel that way about this post, but please consider my perspective as well. I feel my work misrepresented by promoting things I do not consider as correct and stating things I've never said. I hoped that you or someone else would correct that on your own, but as your work is often used as the source to claim things about diesel that are not correct I finally filled this issue. Hopefully we can resolve this without trying to blame each other.

@Bodobolero
Copy link

Bodobolero commented Mar 12, 2023

I am a reader of the book and from the whole approach of the book it was clear from the beginning that it is an opinionated book.
I still love the book due to its approach.
I think one shouldn‘t be offended as an author of another crate that was not picked by the book author and also accept that someone who takes the extra time to write a book like this (which is a huge effort) can not spend endless time on researching all the facts and follow all the opiniated discussions about crates, their histories, revisions etc.
As I reader I can only say that I fully understand that I can not take every opinion in the book as the total truth but need to research the crate ecosystem myself.

Thus pursuing every written quotation about your own work like an evangelist in a tone with orders like „Please correct that!“ seems wrong to me too, @weiznich

Best regards, and thanks to Luca for writing the book

@weiznich
Copy link
Author

@Bodobolero It's not about criticizing the conclusions Luca has made or even being picky about why they did not choose diesel. It's a totally fine decision to use sqlx there and I can even understand how Luca came to that conclusion.

I filled this issue mainly for the following reasons:

  • Things have changed, diesel-async exists since quite a while
  • Make Luca aware of the fact that the wording around claiming that async diesel is not planed is misleading and that I feel my work misrepresented here. I personally would just have stated that there is no async diesel version and just skipped any speculation around whether that will change or why that's the case. Otherwise they would need to summarize that discussion in greater detail or just check with me or any other diesel core team member why that's the case.
  • Make Luca aware of the different guarantees provided by diesel and sqlx. I feel that these fact might be important to decide which crate to use for some use-case. This is a recommendation based on the feedback I get from some diesel users.

In addition there is some general discussion about async in there. I must admit that I believe that some people generally tend to assume that async will make things better for no obvious reason. That's why I'm always asking for numbers there.

That written, there is one thing I disagree in your comment: Spending time and effort on researching is exactly what I would expect from someone that writes a book about something. In fact due to my own career I know how much effort it takes to get such things correct and what others expect there. I know that this take more time and effort than most people expect, but exactly that's what differentiate a good writing from something else.

@LukeMathWalker
Copy link
Owner

LukeMathWalker commented Mar 12, 2023

Quoting straight from the comment you linked:

Again this does not mean we are not willing to implement an async interface only that we are not able to do that without changes to rust itself.

I assume you'd be more satisfied with interpreting that as "they are not planning to move forward with an async interface until some of what they consider key blockers are removed at the language level". But, on a practical level, it doesn't change the conclusion - there was no first-party support for an async interface in diesel and it was not going to materialize in the medium/short-term. Conclusion which proved correct in hindsight - diesel-async came out more than two years after.

But I fundamentally agree with you - writing a sufficiently detailed summary of that discussion would have required more space than I believed to be worth dedicating to the subject in that section of the book, which is why there is a direct link to the conversation.

I have nothing else to add on the other points being mentioned that was not answered in my first comment and, as already mentioned, I'm not interested in engaging in a debate here considering how you chose to articulate your positions and opinions. I'll lock the thread.

Repository owner locked and limited conversation to collaborators Mar 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants