Dependency parser to find repos #1686

Open
10 tasks
schneems opened this issue Oct 5, 2022 · 0 comments

Problem

People don’t know what repos to subscribe to. My best suggestion is to look in their dependencies (such as Gemfile.lock, package.json, or pom.xml).

For now I’m asking them to manually look in their dependencies and then search for repos on CodeTriage. I would love to automate this process.

I’m thinking it looks like this: the user uploads a file. Based on the filename, we run an extraction script that returns the file's dependencies in a machine-readable format (JSON to STDOUT, with any errors or warnings going to STDERR).

$ cat Gemfile.lock | parse_gemfile.rb
{ "repos": [ { "name": "activerecord" } ], "language": "ruby" }

Then we can take that information and check whether a project with that language and name exists in the database.

Ruby tooling - Task zero

We can use the lockfile parser that ships with Ruby to do this for Ruby. See heroku/heroku-buildpack-ruby for some inspiration.

  • Ruby (Gemfile.lock)

In addition to this code, I would also like a test file set up where we can put future parser tests. That will make future PRs easier to test (contributors can copy an existing test instead of learning Ruby/RSpec if they don't know it).

Place the test in test/dependency_parsers/ruby_parser_test.rb

I still want the code implemented as a script. Put it at lib/dependency_parser/ruby/parse.rb. The input to the script will be the file contents via STDIN. I want JSON to STDOUT and errors to STDERR. Failures should result in a non-zero exit code.
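A minimal sketch of what that script could look like, assuming `Bundler::LockfileParser` is the lockfile parser meant above. The names `parse_gemfile_lock`, `run_ruby_parser`, and `LockfileInputError` are made up for illustration, and listing every gem in the lockfile (rather than only direct dependencies) is an assumption:

```ruby
require "bundler"
require "json"

# Hypothetical error class for input we cannot turn into a dependency list.
class LockfileInputError < StandardError; end

# Returns the hash that gets serialized to STDOUT.
def parse_gemfile_lock(contents)
  raise LockfileInputError, "Gemfile.lock input is empty" if contents.strip.empty?

  begin
    parser = Bundler::LockfileParser.new(contents)
  rescue Bundler::BundlerError => e
    # Bundler raises its own errors for unreadable lockfiles (e.g. merge conflicts).
    raise LockfileInputError, "Gemfile.lock could not be parsed: #{e.message}"
  end

  # Assumption: report every gem in the lockfile, not only direct dependencies.
  names = parser.specs.map(&:name).uniq.sort
  raise LockfileInputError, "Gemfile.lock lists no gems" if names.empty?

  { "repos" => names.map { |name| { "name" => name } }, "language" => "ruby" }
end

# Reads the lockfile from `stdin`, writes JSON to `stdout` and errors to `stderr`.
# Returns the exit status the script should finish with.
def run_ruby_parser(stdin, stdout, stderr)
  stdout.puts JSON.generate(parse_gemfile_lock(stdin.read))
  0
rescue LockfileInputError => e
  stderr.puts "Error: #{e.message}"
  1
end
```

As a script, the file would end with `exit run_ruby_parser($stdin, $stdout, $stderr)`, so that `cat Gemfile.lock | ruby lib/dependency_parser/ruby/parse.rb` prints the JSON or fails with a non-zero status. Keeping the IO objects as arguments lets a test pass in `StringIO` instead of spawning a process.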

Dependency/lockfile parsing for other languages - Task one

Ideally we won’t need to install other tooling. For example, installing Maven would be overkill: it takes a bunch of time to install and eats up a bunch of disk space, which would slow down deploys.

I'm thinking we need to extract the information in Ruby, Bash, or JavaScript, as these are the three main languages already in the app.

We need a parser for these languages:

  • JavaScript (package.json)
  • Java
  • Python
  • Golang
  • Rust
  • PHP
  • Swift
  • Elixir
  • Your favorite language I didn't mention here.

Each of these can be in a different PR. Put them in a folder under lib/ named dependency_parser/<language>/parse.<extension>. When sending a PR please give me an example input and expected output.

The script needs to defensively check for invalid input and output a helpful error message if the input:

  • Is formatted incorrectly
  • Is missing critical information
  • Is empty

If a script cannot continue, it needs to exit with a non-zero status code.
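As an illustration of those three checks, here is a hedged sketch of a package.json parser written in Ruby using only the standard library. The names `parse_package_json`, `run_package_json_parser`, and `PackageJsonError` are hypothetical, and treating a file with neither `dependencies` nor `devDependencies` as "missing critical information" is an assumption:

```ruby
require "json"

# Hypothetical error class for input the parser refuses to handle.
class PackageJsonError < StandardError; end

# Returns the hash that gets serialized to STDOUT.
def parse_package_json(contents)
  # Check 1: empty input.
  raise PackageJsonError, "package.json input is empty" if contents.strip.empty?

  # Check 2: incorrectly formatted input.
  begin
    data = JSON.parse(contents)
  rescue JSON::ParserError => e
    raise PackageJsonError, "package.json is not valid JSON: #{e.message}"
  end
  raise PackageJsonError, "package.json must contain a JSON object" unless data.is_a?(Hash)

  # Check 3: missing critical information (assumed to mean no dependency sections).
  sections = data.values_at("dependencies", "devDependencies").grep(Hash)
  names = sections.flat_map(&:keys).uniq.sort
  raise PackageJsonError, "package.json lists no dependencies" if names.empty?

  { "repos" => names.map { |name| { "name" => name } }, "language" => "javascript" }
end

# Reads from `stdin`, writes JSON to `stdout` and errors to `stderr`.
# Returns the exit status the script should finish with.
def run_package_json_parser(stdin, stdout, stderr)
  stdout.puts JSON.generate(parse_package_json(stdin.read))
  0
rescue PackageJsonError => e
  stderr.puts "Error: #{e.message}"
  1
end
```

The script itself would end with `exit run_package_json_parser($stdin, $stdout, $stderr)`. Each failure path writes one specific message to STDERR and returns 1, which gives a PR three ready-made "example input and expected output" pairs.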

If test/dependency_parsers/ruby_parser_test.rb already exists, copy it to test/dependency_parsers/ruby_parser_.rb and try to fill it out to the best of your ability. If it does not, please provide me with an input to your script and an expected output.

Upload and Storage - Part 2

A project might not be on CodeTriage yet but could be added later. As a future-proofing measure, we can ask users if they want us to store their dependency file.

  • We need a webpage in Rails where people can upload a file.

For now we can store the whole raw contents in a new table. It should be linked to a specific user and include the original filename, its contents, and a label/name (i.e. people might want "my side project" or "my work project").

Give people a UI so they can manage their uploaded files (CRUD).

Integration with CodeTriage - Part 3

We need a way to consume the scripts in a way that users can utilize:

  • After a lockfile is uploaded, pass it to the appropriate script based on a mapping of filename to script.
  • Execute that script
  • Then, based on the output, query the database for matching repos
  • Show them to the user on the results page
  • Bonus: If there are multiple dependencies, map the name of the dependency label to the suggested repo.
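
The first two steps above could be sketched roughly as follows. This is only an illustration: `PARSER_COMMANDS`, `extract_dependencies`, and `DependencyParserError` are hypothetical names, and the mapping holds a full command per filename so that scripts written in Bash or JavaScript could be registered the same way.

```ruby
require "json"
require "open3"
require "rbconfig"

# Hypothetical error class raised when no script applies or a script fails.
class DependencyParserError < StandardError; end

# Hypothetical mapping of uploaded filename to the command that parses it.
PARSER_COMMANDS = {
  "Gemfile.lock" => [RbConfig.ruby, "lib/dependency_parser/ruby/parse.rb"],
}.freeze

# Pipes `contents` to the matching script via STDIN and returns its parsed JSON.
def extract_dependencies(filename, contents, commands: PARSER_COMMANDS)
  command = commands[File.basename(filename)]
  raise DependencyParserError, "No parser registered for #{filename}" if command.nil?

  stdout, stderr, status = Open3.capture3(*command, stdin_data: contents)
  # A non-zero exit status means the script could not continue; surface its STDERR.
  raise DependencyParserError, stderr.strip unless status.success?

  JSON.parse(stdout)
rescue JSON::ParserError => e
  raise DependencyParserError, "Parser for #{filename} did not print valid JSON: #{e.message}"
end
```

The returned hash carries the `language` and the `repos` names, which can then drive the database query and the results page. That part depends on the app's models and is not sketched here.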