Invalid memory access when using `#scalar` or `#exec` directly in multi-thread mode (`-Dpreview_mt`) #87
This is on aarch64, right?
@bcardiff this is on x86_64, using 1.7.3 under Alpine 3.17. Also used static binaries from the release under Ubuntu 22.04.
In response to running

```diff
diff --git spec/pool_spec.cr spec/pool_spec.cr
index 4971b94..77f34c4 100644
--- spec/pool_spec.cr
+++ spec/pool_spec.cr
@@ -5,13 +5,15 @@ describe DB::Pool do
     channel = Channel(Nil).new
     fibers = 5
     max_n = 50
-    with_db "#{DB_FILENAME}?max_pool_size=#{fibers}" do |db|
+    with_db "#{DB_FILENAME}?#{"journal_mode=wal&busy_timeout=5000&" if {{ flag?(:preview_mt) }}}max_pool_size=#{fibers}" do |db|
       db.exec "create table numbers (n integer, fiber integer)"
       fibers.times do |f|
         spawn do
           (1..max_n).each do |n|
-            db.exec "insert into numbers (n, fiber) values (?, ?)", n, f
+            db.using_connection do |conn|
+              conn.exec "insert into numbers (n, fiber) values (?, ?)", n, f
+            end
             sleep 0.01
           end
           channel.send nil
```

SQLite3 databases are blocked by default for insert when accessed from multiple threads/processes, so I had to change
Without the

Simplified example:

```crystal
require "./src/sqlite3"

COUNT = 10

puts "preview_mt: ", {{ flag?(:preview_mt) }}

db = DB.open("sqlite3:data.db?journal_mode=wal&busy_timeout=5000")

COUNT.times do |i|
  spawn do
    rand_id = rand(1..500)
    {% if flag?(:use_conn) %}
      db.using_connection do |conn|
        puts conn.scalar("SELECT age FROM contacts WHERE id = ? LIMIT 1;", rand_id).as(Int64)
      end
    {% else %}
      puts db.scalar("SELECT age FROM contacts WHERE id = ? LIMIT 1;", rand_id).as(Int64)
    {% end %}
  end
end

sleep 5
Fiber.yield
db.close
```

When run with
And here is the backtrace when it crashed in Ubuntu 22.04:
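As an aside, the locking behavior that motivated the `journal_mode=wal&busy_timeout=5000` change in the spec above can be reproduced in a standalone sketch. Python's stdlib `sqlite3` bindings are used here purely as an analogy (this is not code from the thread): a second connection's insert fails immediately while another connection holds the write lock, which is what `busy_timeout` papers over.

```python
import os
import sqlite3
import tempfile

# Why concurrent inserts fail on SQLite without a busy timeout:
path = os.path.join(tempfile.mkdtemp(), "demo.db")

a = sqlite3.connect(path, isolation_level=None)  # autocommit; we manage locks
a.execute("create table numbers (n integer, fiber integer)")
a.execute("begin immediate")  # grab the write lock and hold it

# timeout=0 is the analog of having no busy_timeout: fail immediately
b = sqlite3.connect(path, timeout=0)
try:
    b.execute("insert into numbers (n, fiber) values (1, 0)")
    locked = False
except sqlite3.OperationalError as exc:
    locked = "locked" in str(exc)

print("second writer blocked:", locked)

a.execute("rollback")
a.close()
b.close()
```

With a nonzero timeout (SQLite's `busy_timeout`), the second writer would instead retry until the lock is released or the timeout expires.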
Hello, doing a bit of digging on the weird MISUSE error and the Invalid memory access; it seems to be related to the usage of

Inspecting what happens in

Noticed that inverting

Is it possible that the statement object is getting GC'ed before anything?

(Apologies for the noise, but trying to dump as many details as I collect while investigating this here, in case someone else is looking at it.)
I think so, in MT. The statements are cached in a query hash. But maybe there is some race condition in MT which might cause issues.
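The suspected race on the statement cache can be illustrated with a toy model. This is a simplified Python analog with invented names, not crystal-db's actual code: an unsynchronized check-then-build-then-insert sequence lets several threads build the same cache entry concurrently.

```python
import threading
import time

# Simplified, unsynchronized fetch-or-build cache (analog of a query hash).
class RacyCache:
    def __init__(self):
        self.store = {}
        self.builds = 0

    def fetch(self, key, build):
        value = self.store.get(key)   # check ...
        if value is None:
            value = build()           # ... build (slow, no lock held) ...
            self.builds += 1
            self.store[key] = value   # ... insert
        return value

def slow_build():
    time.sleep(0.05)  # simulate preparing a statement
    return object()

cache = RacyCache()
barrier = threading.Barrier(8)

def worker():
    barrier.wait()  # release all threads at once to maximize overlap
    cache.fetch("select 1", slow_build)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("entry built", cache.builds, "times")  # more than once
```

Every thread observes the key as missing before any of them finishes building, so the entry is prepared multiple times; with raw C resources (like prepared statement handles) the extra copies also leak or get double-freed.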
Ah! I will take a look at that this weekend and report back! Thank you for your prompt response!
I tried putting a Mutex around

```diff
diff --git a/src/db/string_key_cache.cr b/src/db/string_key_cache.cr
index f2cae6292..ef77d67d5 100644
--- a/src/db/string_key_cache.cr
+++ b/src/db/string_key_cache.cr
@@ -1,21 +1,28 @@
 module DB
   class StringKeyCache(T)
     @cache = {} of String => T
+    @mutex = Mutex.new

     def fetch(key : String) : T
-      value = @cache.fetch(key, nil)
-      value = @cache[key] = yield unless value
-      value
+      @mutex.synchronize do
+        value = @cache.fetch(key, nil)
+        value = @cache[key] = yield unless value
+        value
+      end
     end

     def each_value
-      @cache.each do |_, value|
-        yield value
+      @mutex.synchronize do
+        @cache.each do |_, value|
+          yield value
+        end
       end
     end

     def clear
-      @cache.clear
+      @mutex.synchronize do
+        @cache.clear
+      end
     end
   end
 end
```
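For comparison, here is a Python analog (an assumption about the intent, not crystal-db's actual code) of the Mutex-wrapped cache in the patch above: holding one lock across the whole check-build-insert sequence means the entry is built exactly once no matter how many threads race on it.

```python
import threading
import time

# Synchronized fetch-or-build cache: lock covers check, build and insert.
class SyncCache:
    def __init__(self):
        self.store = {}
        self.builds = 0
        self.lock = threading.Lock()

    def fetch(self, key, build):
        with self.lock:
            value = self.store.get(key)
            if value is None:
                value = build()
                self.builds += 1
                self.store[key] = value
            return value

def slow_build():
    time.sleep(0.05)  # the former race window, now serialized
    return object()

cache = SyncCache()
barrier = threading.Barrier(8)

def worker():
    barrier.wait()
    cache.fetch("select 1", slow_build)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("entry built", cache.builds, "times")  # exactly once
```

The trade-off is that builders are serialized while holding the lock; a finer-grained design could build outside the lock and discard losers, but for prepared statements that would recreate the double-prepare problem.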
Applying the swap of the LibSQLite.reset you mention here helps reduce the errors, but they are still there. Next potential culprit is in the
I think I got it. So far it hasn't crashed here. Can you try these crystal-db and crystal-sqlite3 changes?

On top of the non-thread-safe structures, I found an issue in the pool.
Hello @bcardiff, switched to

However, when attempting to run this using the HTTP server from my example, I'm still getting API misuse from SQLite:
Repo used to reproduce this: https://github.com/luislavena/test-crystal-sqlite3-db

I'm wondering the following (bear with me on my ignorance of the internals of crystal-db): statements are created in the context of a connection, but the connection can be checked out from the pool independently of the statement we are going to execute, right? When we do

This means that all the time the connection of a statement is the same as the connection checked out from the pool, so when I do

Is it possible that when the prepared statement is retrieved from the pool, the connection is no longer available?

When compiled and run in Ubuntu 22.04, I managed to get a different backtrace:
Attempted to debug this further with

Will continue to investigate after a walk around the park 😀
The PoolPreparedStatement should handle checking out connections, whether or not the statement has already been prepared. Actually, the ones that are known to have the statement prepared already will be given priority. Pool#checkout receives a candidates list, but it is not a hard restriction.

In that process, closed connections should be discarded. Knowing if a connection is closed requires trying to use it. I assume sqlite connections can't be lost from sqlite's side, since there are no sockets involved. Maybe I'm wrong here. If I'm wrong, maybe there is a missing mapping between a sqlite error and a ResourceConnectionLost error. This last kind of error is retried when doing db#query directly. This is the main difference between doing a query over conn vs db.
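The checkout-with-candidates behavior can be sketched with a toy pool. This is a hypothetical Python model with invented names (TinyPool, conn_a, ...), not crystal-db's API: connections known to already have the statement prepared are passed as candidates and preferred, but the candidate list is not a hard restriction, so any idle connection can be returned.

```python
import threading
from collections import deque

# Minimal connection pool: checkout prefers candidates, falls back to FIFO.
class TinyPool:
    def __init__(self, conns):
        self.idle = deque(conns)
        self.lock = threading.Lock()

    def checkout(self, candidates=()):
        with self.lock:
            for conn in candidates:      # prefer candidates ...
                if conn in self.idle:
                    self.idle.remove(conn)
                    return conn
            return self.idle.popleft()   # ... but any idle connection will do

    def release(self, conn):
        with self.lock:
            self.idle.append(conn)

pool = TinyPool(["conn_a", "conn_b", "conn_c"])
print(pool.checkout(candidates=["conn_b"]))  # conn_b: candidate is idle
print(pool.checkout())                       # conn_a: plain FIFO fallback
```

When the fallback path is taken, the statement still has to be prepared on the returned connection, which is why the candidate preference is a performance hint rather than a correctness requirement.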
Another thing we could try is to disable the PoolPreparedStatement cache as a way to reduce the amount of things involved. There is no current way to disable it, but it might be worth having that for other scenarios as well.
I disabled the statement cache (in a dirty way) and was able to get a different backtrace. This is using the http example with
So,
With MULTITHREAD I am still seeing the same yy_reduce error (I'm using the native sqlite3 of macOS).

With SERIALIZED the error changed again. Among PoolRetryAttemptsExceeded and accept: Too many open files (Socket::Error) I got:
What hinted me in this direction was dotnet/macios#5791 (comment).

Could it be that, after fixing the MT issues in crystal-db that we already discovered, we are now missing some proper way of using sqlite3 in MT?
Hello @bcardiff, thanks for your patience.
That is actually not necessary if the SQLite version you're using was already compiled with threadsafe mode on, see
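As a quick way to check what thread-safety mode a SQLite build reports, here is a sketch using Python's stdlib `sqlite3` bindings (an analogy for doing the same check from Crystal, not code from this thread). SQLITE_THREADSAFE is a compile-time setting of the library; Python surfaces it as the DB-API `threadsafety` level.

```python
import sqlite3

# DB-API levels: 0 = single-thread, 1 = the module may be shared across
# threads, 3 = serialized (connections themselves may be shared).
# Note: before Python 3.11 this attribute was hardcoded to 1; on newer
# versions it reflects the underlying SQLITE_THREADSAFE compile option.
print("DB-API threadsafety level:", sqlite3.threadsafety)
print("linked SQLite version:", sqlite3.sqlite_version)
```

Most distribution builds ship with serialized mode compiled in, which is why checking the compile-time setting first, as described above, makes a runtime override unnecessary.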
I checked that compile-time thread safety was on, and left the default. Then tested against the

```console
$ shards install
Resolving dependencies
Fetching https://github.com/crystal-lang/crystal-sqlite3.git
Fetching https://github.com/crystal-lang/crystal-db.git
Using db (0.11.0 at 0897477)
Using sqlite3 (0.19.0 at d759ae0)
```

But when attempting to terminate the server, I received a new exception:
Around this line:

```crystal
def do_close
  super
  check LibSQLite3.close(self)
end
```

Which might indicate that

I also got the "too many open files" error when trying to read the DWARF information to decode the backtrace in Ubuntu:
But similar to the previous case, all this is coming from

Since I was already playing, I decided to try something different: use the existing

```console
$ shards install
Resolving dependencies
Fetching https://github.com/crystal-lang/crystal-sqlite3.git
Fetching https://github.com/ysbaddaden/syn.git
Fetching https://github.com/crystal-lang/crystal-db.git
Using db (0.11.0 at 76eba76)
Using sqlite3 (0.19.0 at d759ae0)
Using syn (0.1.0 at d87e275)
```

And it didn't crash with 5, 10 or even 50 concurrent connections 😁

Added an atomic counter to determine how many connections were used in those cases and managed to get a maximum of 12 connections (with

I'm wondering if it would be possible to test this against the dummy driver and see if the pool/cache is the culprit.
It seems we need to finalize statements before closing the connection. I thought that was optional, since their docs say
I'm a bit surprised this didn't come up before. But if disabling the statement cache solved the memory issues, then it seems we need to keep digging in that direction (and maybe allow disabling the statement cache 😅, which some might want in order to reduce the memory footprint).
Update: we are closing all the statements when closing the connection. Could it be that, due to MT, we are trying to use a connection that is being discarded? 🤔 💭
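That bug class, one thread discarding a connection while another still tries to use it, can be reproduced in a hypothetical standalone sketch (Python's stdlib `sqlite3` bindings as an analogy). Python's bindings guard the use-after-close and raise ProgrammingError; through the raw C API the same sequence is undefined behavior or SQLITE_MISUSE, matching the symptoms in this issue.

```python
import sqlite3
import threading

# Use-after-close across threads: the closer wins the race here on purpose.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("create table t (n integer)")

errors = []

def use_after_close():
    try:
        conn.execute("insert into t values (1)")
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))

conn.close()  # the "pool" discards the connection ...
worker = threading.Thread(target=use_after_close)
worker.start()
worker.join()  # ... while another thread still holds a reference to it

print("errors:", errors)
```

In a managed-binding world the failure is a clean exception; with raw handles the statement or connection memory may already be freed, which is consistent with the invalid memory access reported here.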
I'm thinking that somehow the connections are getting GC'ed after the prepared statements that were for that connection were removed. My gut check is that the DB Prepared Pool (

Design question: is the idea of having the pool of prepared statements to pick one from an available connection? Do we have an idea how much we are winning by this?

Thank you for your patience and effort troubleshooting this ❤️ ❤️ ❤️
Hey @luislavena, @beta-ziliani found a missing synchronize call in the crystal-db patch. Would you mind checking again with

> I think I got it. So far it hasn't crashed here. Can you try these crystal-db and crystal-sqlite3 changes?

🤞 for me it didn't crash
@bcardiff @beta-ziliani I'm happy to inform that with these changes the explosions are gone! 👏🏽 🎉 🥳 I was able to perform a long-running bench (30 mins at 4 and 8 workers) without leaks or exceptions! Thank you for investigating this!
@bcardiff opened a separate issue to investigate some performance bottleneck (#89), but I consider this issue solved. Thank you again, you both! (@beta-ziliani too) 🙏🏽
@straight-shoota didn't want to close this until the changes mentioned ( |
The fix-mt branch was merged in #90. Closing again. Thanks @luislavena for all the efforts through this!
Hello!

Been testing out `preview_mt` mode with SQLite and encountered some issues. It has been hard to reproduce this, but I was able to get a small test program:

`data.db` contains a simple schema with 500 records in it:

When compiled and executed with `-Dpreview_mt`, I randomly receive something like the following:

Sometimes receiving `bad parameter or other API misuse`, other times `unable to close due to unfinalized statements or unfinished backups`.

I was able to work around this by using a `using_connection` block instead:

Which points to there possibly being an issue in how `#exec`, `#scalar` and possibly other calls are handled by `DB::Database` and passed to one of the connections in the pool.

But even so, I will still randomly get errors (small adaptation to my example on my site).

Since I'm blowing things up, I decided to put more dynamite into the issue and tried the ysbaddaden/pool shard:

And then:

To my surprise, these random exceptions are gone and performance is even higher:

Tried running this project's own specs with `-Dpreview_mt` and it fails with the same issue, so I'm not sure where to start looking.

Apologies for the rambling, but I wanted to make sure I included all the information I think is connected and relevant; I hope it's not too much.

Any ideas where I should start looking first?

Thank you in advance.

❤️ ❤️ ❤️