Incorrect dependency data on rspec old releases #1906

Closed
deivid-rodriguez opened this issue Feb 14, 2019 · 14 comments

Comments

@deivid-rodriguez
Member

I noticed that the dependency data on old RSpec releases does not seem correct. For example,

curl https://rubygems.org/api/v2/rubygems/rspec/versions/1.2.3.json | jq

Rubygems.org considers hoe and cucumber as runtime dependencies of RSpec.

{
  "name": "rspec",
  "downloads": 367815598,
  "version": "1.2.3",
  "version_downloads": 2736,
  "platform": "ruby",
  "authors": "RSpec Development Team",
  "info": "Behaviour Driven Development for Ruby.",
  "licenses": null,
  "metadata": {},
  "sha": "2d95a5af310bf7aa0db7f1b33ee54cef33f9c843f5c91791af8bf072587825c5",
  "project_uri": "https://rubygems.org/gems/rspec",
  "gem_uri": "https://rubygems.org/gems/rspec-1.2.3.gem",
  "homepage_uri": "https://github.com/rspec",
  "wiki_uri": "",
  "documentation_uri": "http://relishapp.com/rspec",
  "mailing_list_uri": "http://rubyforge.org/mailman/listinfo/rspec-users",
  "source_code_uri": "https://github.com/rspec/rspec",
  "bug_tracker_uri": "",
  "changelog_uri": null,
  "dependencies": {
    "development": [
      {
        "name": "cucumber",
        "requirements": ">= 0.2.2"
      },
      {
        "name": "hoe",
        "requirements": ">= 1.12.1"
      }
    ],
    "runtime": [
      {
        "name": "cucumber",
        "requirements": ">= 0.2.2"
      },
      {
        "name": "hoe",
        "requirements": ">= 1.12.1"
      }
    ]
  },
  "built_at": "2009-04-13T03:00:00.000Z",
  "created_at": "2009-07-25T17:56:42.000Z",
  "description": "Behaviour Driven Development for Ruby.",
  "downloads_count": 2736,
  "number": "1.2.3",
  "summary": "rspec 1.2.3",
  "rubygems_version": ">= 0",
  "ruby_version": null,
  "prerelease": false,
  "requirements": null
}

However, if I look for the gemspec using the gem client, I see they are only development dependencies

gem specification --remote -v 1.2.3 rspec

--- !ruby/object:Gem::Specification
name: rspec
version: !ruby/object:Gem::Version
  version: 1.2.3
platform: ruby
authors:
- RSpec Development Team
autorequire: 
bindir: bin
cert_chain: []
date: 2009-04-13 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: cucumber
  type: :development
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 0.2.2
  version_requirement: 
- !ruby/object:Gem::Dependency
  name: hoe
  type: :development
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.12.1
  version_requirement: 
description: Behaviour Driven Development for Ruby.
email:
- [email protected]
executables: []
extensions: []
extra_rdoc_files: []
files: []
homepage: http://rspec.info
licenses: []
metadata: 
post_install_message: 
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.0.2
signing_key: 
specification_version: 2
summary: rspec 1.2.3
test_files: []
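The mismatch between the two outputs above can be checked mechanically. The following is a minimal sketch, not the rubygems.org implementation: it assumes the API's "dependencies" object has been trimmed and pasted inline (no network call), and it rebuilds the gemspec's two development dependencies by hand with `Gem::Specification`. The variable names are only illustrative.

```ruby
require "json"
require "rubygems"

# Trimmed copy of the "dependencies" object from the API response above.
api_response = <<~JSON
  {
    "dependencies": {
      "development": [
        {"name": "cucumber", "requirements": ">= 0.2.2"},
        {"name": "hoe", "requirements": ">= 1.12.1"}
      ],
      "runtime": [
        {"name": "cucumber", "requirements": ">= 0.2.2"},
        {"name": "hoe", "requirements": ">= 1.12.1"}
      ]
    }
  }
JSON

# Hand-built stand-in for the gemspec returned by `gem specification --remote`.
spec = Gem::Specification.new do |s|
  s.name = "rspec"
  s.version = "1.2.3"
  s.add_development_dependency "cucumber", ">= 0.2.2"
  s.add_development_dependency "hoe", ">= 1.12.1"
end

api_runtime  = JSON.parse(api_response).dig("dependencies", "runtime").map { |d| d["name"] }.sort
spec_runtime = spec.runtime_dependencies.map(&:name).sort

# Names the API reports as runtime dependencies but the gemspec does not.
extraneous = api_runtime - spec_runtime
puts "runtime deps only in API data: #{extraneous.inspect}"
```

For rspec 1.2.3 the gemspec has no runtime dependencies at all, so both names reported by the API end up in `extraneous`.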

This is causing bundler to make unnecessary requests to rubygems.org when running bundle install. See https://github.com/bundler/bundler/issues/6914.

If we find out the root cause of this we could probably run a script to fix all affected data. Any ideas welcome!

@sonalkr132
Member

We seem to have created quite a few development dependencies on 2009-09-02, and only the versions they belong to (3699 total) are affected by this.

> Dependency.where("created_at < ?", "2009-09-04").where("created_at > ?", "2009-09-03").count
 => 23 
> Dependency.where("created_at < ?", "2009-09-02").where("created_at > ?", "2009-09-01").count
 => 30
> Dependency.where("created_at < ?", "2009-09-03").where("created_at > ?", "2009-09-02").count
 => 5469 
> Dependency.where("created_at < ?", "2009-09-02 06:40").where("created_at > ?", "2009-09-02 05:00").count
 => 5354 
> Dependency.select('distinct version_id').where("created_at < ?", "2009-09-02 07:00").where("created_at > ?", "2009-09-02 04:00").count
 => 3699 
> Version.where("created_at < ?", "2009-09-02 06:40").where("created_at > ?", "2009-09-02 05:00").count
 => 0 

Nothing out of the ordinary seems to have happened according to the rubygems.org git history, and we were probably still in early development. Given the blip in the count, this looks like the result of some ad hoc operation on the DB.

I will check a random set for any further mismatches.

@deivid-rodriguez
Member Author

But the example I gave has a different creation date, hasn't it? Not sure if it's related, but 80 versions of RSpec were pushed on the 25th of July, 2009. Some of them (all?) have this problem.

@sonalkr132
Member

the example I gave has a different creation date, hasn't it?

The version has a different creation date, yes. I am pointing to the abnormal creation time of the dependencies, though (versions and dependencies are separate models/tables). If no versions were created between 2009-09-02 05:00 and 2009-09-02 06:40, there shouldn't be any dependencies created in that period either.

 > Dependency.where(version_id: Version.find_by(full_name: 'rspec-1.2.3').id).map { |d| [d.scope, d.created_at]}
 => [["runtime", Sat, 25 Jul 2009 17:56:42 UTC +00:00],
["runtime", Sat, 25 Jul 2009 17:56:42 UTC +00:00], 
["development", Wed, 02 Sep 2009 05:02:15 UTC +00:00], 
["development", Wed, 02 Sep 2009 05:02:15 UTC +00:00]] 
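What the console output above shows can be restated as a small check: which dependency rows were created later than the version they belong to. This is only a sketch in plain Ruby, assuming the four rows are copied by hand into a hypothetical `Row` struct (not the real `Dependency` model) and taking the version's `created_at` from the API response quoted earlier.

```ruby
# Hypothetical stand-in for a row of the dependencies table.
Row = Struct.new(:scope, :created_at)

# created_at of rspec-1.2.3 as reported by the API ("2009-07-25T17:56:42.000Z").
version_created_at = Time.utc(2009, 7, 25, 17, 56, 42)

# The four rows printed by the console query above.
rows = [
  Row.new("runtime",     Time.utc(2009, 7, 25, 17, 56, 42)),
  Row.new("runtime",     Time.utc(2009, 7, 25, 17, 56, 42)),
  Row.new("development", Time.utc(2009, 9, 2, 5, 2, 15)),
  Row.new("development", Time.utc(2009, 9, 2, 5, 2, 15)),
]

# Rows added after the version itself was created.
added_later = rows.select { |row| row.created_at > version_created_at }
by_scope = added_later.group_by(&:scope).transform_values(&:size)

puts "rows created after the version, by scope: #{by_scope.inspect}"
```

Here only the development rows postdate the version; the runtime rows share its timestamp.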

This info isn't that useful, though: we have other anomalous dependencies which weren't created in this period. I wrote a script and it gave appengine-tools-0.0.3 (version id: 100358; the versions table starts at 72374) as the cutoff. I have verified 2000 versions by id after this and a random set of 1000 versions from the rest; they didn't have any mismatch. A total of 3521 versions need fixing. Interestingly, the majority of the anomalous dependencies are hoe and newgem.
I don't think this data was originally created on our site; the initial versions have alphabetic ordering.

If we find out the root cause of this we could probably run a script to fix all affected data.

I doubt anyone is going to remember what went wrong in the import or why the data was massaged later. I will send in a rake task for updating the dependencies table for versions before 100358.

@deivid-rodriguez
Member Author

Thanks so much for the clarification @sonalkr132, I don't know the rubygems.org data model well so I probably don't make much sense! 😅

So, how is the data going to get fixed? The redundant runtime dependencies will be deleted, right?

@eregon

eregon commented Apr 13, 2020

Any news on this? It seems to create a lot of extra requests to RubyGems.org.

@sonalkr132
Member

I was going to regenerate the versions file but never really got to it, sorry. There were other unrelated issues which cause mismatches, and I had to verify that they are fixed. Hopefully I will get to it this weekend.
Thank you for your patience.

@deivid-rodriguez
Member Author

@sonalkr132 Just to be clear on the planned fix, it'd be to remove all the incorrect runtime dependencies from the DB, and then regenerate the versions file to pick that up. Is that it?

@sonalkr132
Member

sonalkr132 commented Apr 14, 2020

yes, download the gemspec (https://rubygems.org/quick/Marshal.4.8/#{version_full_name}.gemspec.rz) -> update dependencies entries in table -> regenerate versions file (single step).
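The middle step (update the dependency entries) boils down to a set difference. Here is a rough sketch of that comparison only, under stated assumptions: the download and unmarshalling of the gemspec and the regeneration of the versions file are left out, `extraneous_runtime_ids` is a hypothetical helper and not the actual rake task, and the row ids and names are the rspec-1.2.3 ones that appear in the task's log output later in this thread.

```ruby
# Hypothetical helper: given the runtime dependency names from the gemspec and
# the runtime rows stored in the DB (id => gem name), return the ids of rows
# that have no counterpart in the gemspec and would therefore be deleted.
def extraneous_runtime_ids(spec_runtime_names, db_runtime_rows)
  db_runtime_rows.reject { |_id, name| spec_runtime_names.include?(name) }.keys
end

# rspec-1.2.3: the gemspec lists no runtime dependencies, the DB has two rows.
spec_runtime_names = []
db_runtime_rows = { "106044" => "cucumber", "106045" => "hoe" }

ids_to_delete = extraneous_runtime_ids(spec_runtime_names, db_runtime_rows)
puts "deleting dependencies with ids: #{ids_to_delete.inspect}"
```

Rows whose name does appear in the gemspec are kept, so a version with genuine runtime dependencies would be left untouched.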

It seems to create a lot of extra requests to RubyGems.org.

I doubt @eregon's issue has anything to do with this, though. The latest version affected by this issue was released on 11 September 2009 (appengine-tools 0.0.3), and I doubt any of the gems used by eregon would depend on versions/gems released before this.
Most likely, eregon's issue is about the limited number of outbound (NAT/public) IPs used by the GitHub build infra. Either GitHub needs to add more IPs or we need to whitelist their existing IPs. The issue may have started recently because more people have started using GitHub builds.

EDIT: never mind this^. found the issue, will update when we fix.

@deivid-rodriguez
Member Author

@sonalkr132 Just pinging you about this to make sure you still have it in mind, and offering help if there's something I can do.

@sonalkr132
Member

Thanks for the reminder. As I mentioned above, it would be best if we regenerate the versions file after ensuring the other issues with the compact index are fixed. I have to bother others to get this task done as I don't have the right access.
You must have noticed the progress we made with compact_index; there too I have to ping andre again for access to the gem on rg.org (technically, I can add myself, but that would not be ethical).
I am looking into #1566 as of now. Thank you for your patience with this.

@deivid-rodriguez
Member Author

Oh, I wasn't aware of that issue, thanks for pointing me to it 👍.

I understand this requires time and coordination. I'm not in a hurry to fix this, but I'm excited to see this stuff moving forward :)

Anyways, thanks for the transparency and for keeping me posted. If I can help in any way, just let me know.

@sonalkr132
Member

See #2343

Total of 3521 versions need fixing.

I had failed to exclude a few cases, which means this number was higher than the actual count.

@deivid-rodriguez
Member Author

[extraneous_dependencies:clean] spec and db run deps don't match for: rspec-1.2.3 spec: [] db: {"106044"=>"cucumber", "106045"=>"hoe"}
[extraneous_dependencies:clean] deleting dependencies with ids: ["106044", "106045"]

🎉

Thanks so much for this amazing work!

@deivid-rodriguez
Member Author

This has been fixed now, so I'm happily closing this ticket :)
