-
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault when Rails enters parallell testing mode #39
Comments
@buffpojken thanks for the report and the stack trace. This issue is failing in the I noticed you're using version 3.1.1 of rgeo-proj4. We recently released version 4 which includes a few fixes to the C extension and fixed a bug where segfaults would occasionally happen. Are you able to upgrade to test if it's still occurring in the new version? |
@keithdoggett, I'm taking the ball from @buffpojken since his plate is fuller than mine atm.
|
@Pe-co thanks for trying that. I tried to replicate in proj4 with a test like this def test_create_parallel
epsg_codes = (2000..2049).to_a
Parallel.each(epsg_codes, in_parallel: 8) do |epsg_code|
proj = RGeo::CoordSys::Proj4.create(epsg_code)
assert_equal("EPSG:#{epsg_code}", proj.auth_name)
end
end but couldn't reproduce the issue. I also created a rails app from scratch and added enough tests to trigger parallel tests, but still couldn't get a seg fault. A few questions:
|
This is the rgeo initializer we use
Proj version is - Rel. 9.0.1, June 15th, 2022 The segfault is happening when minitest is spawning child processes not during actual testing? With some more local testing, I have found that the store register call in the initializer seems to the be trigger. If I skip that line I'm able to get a clean run of my tests. I assume a couple of test or fixtures that need to use the SpatialFactoryStore is needed as well. |
Thanks for that extra info @Pe-co and apologies about the delay, I was away the last few weeks. I'll try to replicate with that version of Proj. I've been thinking about this some and I am a little confused why this issue is happening with multiprocessing. Issues like this typically happen when trying multi-threading (ex. rgeo/rgeo#307), but I'm not sure why this issue would only be happening when we have multiple processes. Maybe there's some kind of shared resource that Proj writes to that could be causing the issue. Maybe the way minitest spawns additional processes is causing issues. |
I also noticed that you only define a spherical factory in the spatial factory store which doesn't use proj. Do you have a lot of other SRIDs being used in your app or is it just 4326 for geographical data and one other projection? |
Input can come in various formats and gets converted to a few that we persist. I don't think the tests use more than 4326 and 3006. We do not declare an RGeo::Geographic.spherical_factory for any other srid than 4326, but we have an open RGeo::Geos.factory that can be fed with other srid. I think the error triggers even with that part disabled. |
|
I tried to replicate the issue in a fresh rails app with the following setup. I'm also using proj 9.0.1. # db/schema.rb
ActiveRecord::Schema[7.0].define(version: 2023_04_17_124002) do
# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"
enable_extension "postgis"
create_table "buildings", force: :cascade do |t|
t.string "name"
t.geometry "boundary", limit: {:srid=>3006, :type=>"st_polygon"}
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
create_table "geo_buildings", force: :cascade do |t|
t.string "name"
t.geography "boundary", limit: {:srid=>4326, :type=>"st_polygon", :geographic=>true}
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
end
end # config/initializers/spatial_factory_store.rb
# frozen_string_literal: true
RGeo::ActiveRecord::SpatialFactoryStore.instance.tap do |config|
config.register(RGeo::Geographic.spherical_factory(srid: 4326, uses_lenient_assertions: true),
{ sql_type: 'geography' })
end # test/models/building_test.rb
require 'test_helper'
class BuildingTest < ActiveSupport::TestCase
1000.times do |i|
define_method "test_number_#{i}" do
building = Building.new
polygon = "POLYGON ((0 0, 0 #{i}, #{i} #{i}, #{i} 0, 0 0))"
building.boundary = polygon
building.name = i.to_s
assert_equal(building.boundary.as_text, polygon)
building.save
building = Building.last
assert_equal(building.boundary.as_text, polygon)
geo_fac = RGeo::Geographic.spherical_factory(srid: 4326)
geo_building = GeoBuilding.new
polygon = geo_fac.parse_wkt("POLYGON ((0.0 0.0, 0.0 #{(i % 5) + 1}.0, #{(i % 5) + 1}.0 #{(i % 5) + 1}.0, #{(i % 5) + 1}.0 0.0, 0.0 0.0))")
geo_building.boundary = polygon
geo_building.name = i.to_s
assert_equal(geo_building.boundary, polygon)
geo_building.save
geo_building = GeoBuilding.last
assert_equal(geo_building.boundary, polygon)
end
end
end # Gemfile
source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
ruby '3.1.2'
# Bundle edge Rails instead: gem "rails", github: "rails/rails", branch: "main"
gem 'rails', '~> 7.0.4', '>= 7.0.4.3'
# Use postgresql as the database for Active Record
gem 'pg', '~> 1.1'
# Use the Puma web server [https://github.com/puma/puma]
gem 'puma', '~> 5.0'
# Build JSON APIs with ease [https://github.com/rails/jbuilder]
# gem "jbuilder"
# Use Redis adapter to run Action Cable in production
# gem "redis", "~> 4.0"
# Use Kredis to get higher-level data types in Redis [https://github.com/rails/kredis]
# gem "kredis"
# Use Active Model has_secure_password [https://guides.rubyonrails.org/active_model_basics.html#securepassword]
# gem "bcrypt", "~> 3.1.7"
# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
gem 'tzinfo-data', platforms: %i[mingw mswin x64_mingw jruby]
# Reduces boot times through caching; required in config/boot.rb
gem 'bootsnap', require: false
# Use Active Storage variants [https://guides.rubyonrails.org/active_storage_overview.html#transforming-images]
# gem "image_processing", "~> 1.2"
# Use Rack CORS for handling Cross-Origin Resource Sharing (CORS), making cross-origin AJAX possible
# gem "rack-cors"
gem 'activerecord-postgis-adapter'
gem 'rgeo'
gem 'rgeo-proj4'
group :development, :test do
# See https://guides.rubyonrails.org/debugging_rails_applications.html#debugging-with-the-debug-gem
gem 'debug', platforms: %i[mri mingw x64_mingw]
end
group :development do
# Speed up commands on slow machines / big apps [https://github.com/rails/spring]
# gem "spring"
end The tests are mainly nonsense but I wanted to test writing and reading spatial data. Unfortunately I'm still not able to replicate the issue with this. Are you able to reproduce the issue in a minimal rails app? Or even if you modify your local proj4 gem to see what data is being passed into the |
Sorry for the late reply, I have not found time to try to reproduce the issue in clean project, but here are 2 prints with the inputs to the _create function. I get to the function 2 times before the segfault Second time: I did not expect the 4055 value, I don't know what causes that to be passed to proj-4. |
If I run the tests with the registered instance commented out of the initializer, i get a bunch of prints, with both 4326, 3006 and 4055. |
@Pe-co do you think you could share an example app link that reproduces the error for you ? That would help us a lot to fix the error. Here is an example app guide to set you up quickly! |
I think that creating an app that recreates the issue could be time consuming, and it does not look like I will spend that time in a near future. I will let you know if I make some process. Sorry for not being more helpful, I do understand that fixing problems you cannot reproduce is painful. |
I sunk a few hours into it today, and managed to get a repro in a separate project, but its not reproducing stably. I still need to spend some more time minimizing to see what is actually needed. |
Thank you very much for using this time. As we are not paid for maintaining RGeo either, these hours you are spending are really precious to us maintainers, and the whole community using the Gem. If you want to share your ongoing progress, or are stuck at some point, we are here. The more stable the reproduction is, the better, of course. But if you have a 1/2 times reproduction, it would already be enough IMHO. |
Running the tests in this project gives a stable reproduction for me, https://github.com/Pe-co/rgeo_test_app. |
I am reproducing locally 🎉 I'll be offline the next few days, but I (hopefully) downloaded everything to try and find this segfault! |
Thanks for following up on this @Pe-co this is very helpful |
@Pe-co @keithdoggett Here is a smaller set of the reproduction: https://github.com/rgeo/rgeo-proj4-segv-issue-39. Basically we should use the thread-safe proj api. I won't have time to open the PR myself soon, so if there is any emergency feel free to take it from here. A few more notes on a potential implementation:
|
@BuonOmo I should be able to look into it now that you've found a more reproducible version of it. I was still having issues reproducing it locally consistently, but your repo looks good. Thanks for the working digging into that. |
We would segv when using proj and then fork a process. This is fixed by using proj's thread-safe context. Fixes #39
We would segv when using proj and then fork a process. This is fixed by using proj's thread-safe context. Fixes #39
We would segv when using proj and then fork a process. This is fixed by using proj's thread-safe context. Fixes #39
We would segv when using proj and then fork a process. This is fixed by using proj's thread-safe context. Fixes #39
We would segv when using proj and then fork a process. This is fixed by using proj's thread-safe context. Fixes #39
Hello 👋 has this been fixed? still getting Segmentation fault here. Running on Ruby 3.3.0 and
|
Hey @TamerShlash unfortunately this issue has not been fixed. We fixed a different, but related issue and @BuonOmo found a way to reproduce/fix the issue, but the change is fairly major and neither of us have had much time to commit to working on it recently. If you want to take a shot at making the changes we'd be more than happy to facilitate and help with the PR though. |
Hi @keithdoggett ! I saw the attempted fix PR as well and have a rough understanding of the issue (project context is being accessed by forked threads if I'm not mistaken). I also had an attempt at fixing that locally but without much success. I don't have much experience with C extensions but am happy to help, can you explain what is the scope of the change? Would it help taking a look at the way proj4rb handles contexts? I know they have a different use case. |
Hi @TamerShlash, Thanks for digging up the topic. I indeed found a fix, but it was only for another issue I found along the way... I don't have a clear remember about this issue, nor the time right no to handle it, but I'll try to help as much as I can. About the scope of the change. It might be related to the proj context: I changed it not to used the default one, and I tried forking (threads and processes) to see if it was still failing, and it worked for my use case. So my guesse would be that there is a specific weird case of forking that requires creating a new context. I think it would be important to first find that case, to be sure we're doing right, and have a test to prove it! Then the scope might be a bit large, but it wouldn't scare that much as it might be a lot of copy-pasta, and only initialization changes and fork hooks (see https://github.com/rails/rails/blob/main/activesupport/lib/active_support/fork_tracker.rb for a reference). I think the hardest is the reproduction, and if you have it and still do not feel confident about the fix, we could pair on this as I think it is at most a 3h work. However, it may also not be related to context, we really need a reproduction. About the proj4rb gem, I think you had a good intuition. The context.rb file is the one we are interested in I think. And their solution is basically to have one thread one context, and an object finalizer that ensures no leak with the proj contexts created |
Steps to reproduce
Include rgeo in a Rails 7 project. Add enough tests to trigger tests running in parallell (default is 50 tests). RGeo will segfault with the attached stack trace.
Expected behavior
Tests to run as usual
Actual behavior
Segfault in proj4-communication.
System configuration
Ruby version:
3.1.0
OS:
MacOS
Stack trace
/Users/pekkaakerstrom/.rvm/gems/ruby-3.1.2/gems/rgeo-proj4-3.1.1/lib/rgeo/coord_sys/proj4.rb:191: [BUG] Segmentation fault at 0x0000000104c38a70 ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
The text was updated successfully, but these errors were encountered: