Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically set PRAGMAs using connection query params #85

Conversation

luislavena
Copy link
Contributor

Introduce the flexibility to adjust certain PRAGMAs of the SQLite3 connection without having to hardcode those in your codebase (and wait for compilation).

This allows applications to use DATABASE_URL to dynamically fine tune their SQLite3 configuration.

The change complements #setup_connection that offers, via code, the option to perform queries on setup of each connection.

Only a few PRAGMAs necessary to allow more performant concurrent reads and reduce write locking.

These pragmas are detected and combined in a single SQL string to reduce to 1 the number of calls to sqlite3_exec C function.

There is no validation of supplied values as SQLite3 automatically ignores incorrect values for these pragmas.

Closes #84

References:

Introduce the flexibility to adjust certain PRAGMAs of the SQLite3
connection without having to hardcode those in your codebase (and wait
for compilation).

This allows applications to use `DATABASE_URL` to dynamically fine tune
their SQLite3 configuration.

The change complements `#setup_connection` that offers, via code, the
option to perform queries on setup of each connection.

Only a few PRAGMAs necessary to allow more performant concurrent reads
and reduce write locking.

These pragmas are detected and combined in a single SQL string to reduce
to 1 the number of calls to `sqlite3_exec` C function.

There is no validation of supplied values as SQLite3 automatically
ignores incorrect values for these pragmas.

Closes crystal-lang#84

References:
- https://www.sqlite.org/pragma.html
Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

What's the reason to prefix the query string arguments with an _?

The hash could be omitted but since it's only created if pragmas is used and not exposed I'm find starting as is.

@luislavena
Copy link
Contributor Author

luislavena commented Jan 16, 2023

Hello @bcardiff, didn't think it fully as I was following a bit what mattn/go-sqlite3 did, but also to differentiate those from connection pool options (Eg. initial_pool_size, max_pool_size, retry_attempts, etc), but I can change it to be exactly as the pragmas, which could simplify the code (and remove the case statement):

ALLOWED_PRAGMAS = {"busy_timeout", "cache_size", "foreign_keys", "journal_mode",  ...}
# ...

URI::Params.parse(query) do |key, value|
  next unless key.in?(ALLOWED_PRAGMAS)

  pragmas[key] = value
end

re: pragmas Hash, I wanted to avoid nesting too deep String.build, HTTP::Params.parse and case key, but I could change if we want to avoid that 😊

I also did that in case someone had the same pragma multiple times in the query parameters and that resulted in multiple SQL queries 😁

Thank you for your promptly response and let me know about the other changes.
❤️ ❤️ ❤️

@bcardiff
Copy link
Member

Starting with the hash is fine, no need to change.

In crystal-mysql we recently added an encoding argument, but without any prefix as _.

I'm not sure what would be a good convention going forward.
Maybe we should differentiate with _, x_, {driver}_ when the querystring represent a param of the specific driver.
Another alternative for this particular case is that maybe pragma_{name} is a good pattern for all supported pragmas.

If there is no clear better alternative we can go ahead and keep _.

@luislavena
Copy link
Contributor Author

Hello @bcardiff, I had some time to think about the PR, some notes:

  1. Prepending {driver}_, x_, or _ (my original code) seems to me too verbose. At the end you're already using the driver, specified by the scheme of the URI.
  2. On the specific case of pragma_, there are only a few of the pragmas that could be used via the C API (busy_timeout being the one I can think of).
  3. On my original assessment of URI#query, if the developer is already tweaking the connection pool options, query will be present and the Hash with the options will be allocated, even if there is no pragma. This will happen on every new connection being created, so a bigger pool, more pragma hashes will be allocated and then GC'ed.
  4. On my assessment of case versus simplified in?, while the former being verbose, it is also faster and could allow us better treatment of options in the future if needed (right now there is no sanitization as SQLite3 will ignore unknown values)

If is OK with you, I would suggest:

  1. Drop the _ to make it transparent that busy_timeout option is busy_timeout PRAGMA, nothing to learn or map in your head.
  2. Keep the hash that collects set pragmas, that way can combine multiple exec calls and only do once per pragma (last value wins)
  3. Keep the case, if we want to introduce validation of the provided values we could adjust it that way more easily.

Let me know what you think of this plan and I will introduce the changes.

Thank you in advance for your time!

❤️ ❤️ ❤️

@bcardiff
Copy link
Member

Drop the _ to make it transparent that busy_timeout option is busy_timeout PRAGMA, nothing to learn or map in your head.

Sounds good. My motivation to have some prefix is that we could validate the options names and instead of ignoring when they don't match a known pragma we could tell the user that a typo was made. So far we've not been doing that because we add one option at a time and it didn't occur to me. So again, sounds good to drop the prefix since we are not going to start validating all the other options probably.

Keep the hash that collects set pragmas, that way can combine multiple exec calls and only do once per pragma (last value wins)

Unless we have a shared memory cache I don't see how this can be done. crystal-db passes only the connection string.
Either way it sound like an early optimization to me. If anything, I would start by not using a hash.

Keep the case, if we want to introduce validation of the provided values we could adjust it that way more easily.

These would be validation of values, not keys right? I think we can defer for another PR if needed.


It seems that the main thing to decide is whether or not to keep the prefix. I'm fine dropping them if you feel they are too verbose.

@bcardiff
Copy link
Member

Btw, to avoid the hash in a neat way I was thinking of

require "uri"

query = "foo=1&bar=first&bar=second&quz=ignore"
params = extract(query, foo: nil, bar: nil)

puts typeof(params) # => NamedTuple(foo: String | Nil, bar: String | Nil)
puts params         # => {foo: "1", bar: "second"}

def extract(query, **default : **T) forall T
  res = default

  URI::Params.parse(query) do |key, value|
    {% begin %}
      case key
      {% for key in T %}
      when {{ key.stringify }}
        res = res.merge({{key.id}}: value)
      {% end %}
      end
    {% end %}
  end  

  res
end

some copy of the values could be avoided, but it does the work.

No longer prefix PRAGMAS with `_`, so the mapping between the real
SQLite3 pragmas and the usage in the URI is more direct.

Use macros instead of case to detect pragmas from URI params and return
those as NamedTuple.
@luislavena
Copy link
Contributor Author

Hello @bcardiff, thanks for your patience coming back to this.

Unless we have a shared memory cache I don't see how this can be done. crystal-db passes only the connection string.
Either way it sound like an early optimization to me. If anything, I would start by not using a hash.

Apologies for not making it clear, thought it was but re-reading my response seems didn't explain it.

The reason I was using the Hash was so under certain scenarios with double variables in the URI (Eg. ?foo=10&foo=20), the last one detected from the URI should be the definitive value. I've seen this in a few cases as quick overrides for testing before changing a long URI of configuration options 😓

This was done to avoid having the same PRAGMA multiple times when performing the call to SQLite3. Very naive approach that looking at your suggestion clearly addresses.

On that subject, your macro approach uses less memory and performs a bit faster than the Hash approach:

 extract (hash) 857.68k (  1.17µs) (± 1.28%)  721B/op   1.17× slower
extract (macro)   1.00M (995.58ns) (± 1.45%)  550B/op        fastest
Source code
require "benchmark"
require "uri"

def extract(query, **default : **T) forall T
  res = default

  URI::Params.parse(query) do |key, value|
    {% begin %}
      case key
      {% for key in T %}
      when {{ key.stringify }}
        res = res.merge({{key.id}}: value)
      {% end %}
      end
    {% end %}
  end

  res
end

def hash_extract(query)
  res = Hash(String, String).new

  URI::Params.parse(query) do |key, value|
    case key
    when "busy_timeout"
      res["busy_timeout"] = value
    when "cache_size"
      res["cache_size"] = value
    when "foreign_keys"
      res["foreign_keys"] = value
    when "journal_mode"
      res["journal_mode"] = value
    end
  end

  res
end

params = "busy_timeout=1000&garbage=foo&cache_size=-1000&foreign_keys=1&journal_mode=wal&busy_timeout=5000"

Benchmark.ips do |x|
  x.report("extract (hash)") { hash_extract(params) }
  x.report("extract (macro)") {
    extract(params,
      busy_timeout: nil,
      cache_size: nil,
      foreign_keys: nil,
      journal_mode: nil,
      synchronous: nil,
      wal_autocheckpoint: nil,
    )
  }
end

As for the validation of input, we might want to introduce something that provides that support part of crystal-db for others to use, but I would say we can talk about that in a different PR. 😉

I decided to push additional changes to:

  1. Remove the underscore as prefix
  2. Use your macro approach

I hope is all good.

Thank you again for your time and your suggestions.
❤️ ❤️ ❤️

Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor fix to the readme and we are good to go. I will commit those directly and wait a bit before merging.

README.md Outdated Show resolved Hide resolved
@luislavena
Copy link
Contributor Author

One minor fix to the readme and we are good to go. I will commit those directly and wait a bit before merging.

Ah, great catch! missed when I was doing the search and replace! 🤦🏽

Thank you!
❤️ ❤️ ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for PRAGMAs via query params?
2 participants