Skip to content

GSOC 2020

Michael Herzog edited this page Feb 25, 2020 · 44 revisions

AboutCode is submitting its candidature for the Google Summer of Code in 2020 as a mentoring org. This page contain all the information for students and anyone else interested in participating and helping with the program.

AboutCode is a family of FOSS projects to uncover data ... about software code:

  • where does the code come from? which software package?
  • what is its license? copyright?
  • is the code secure, maintained, well coded?

All these are questions that are important to answer: there are million of free and open source software components available on the web for reuse.

Knowing where a software package comes from, what is its license and if it is vulnerable and what's its licensing should be a problem of the past such that everyone can safely consume more free and open source software.

Join us to make it so!

Our tools are used to help detect and report the origin and license of source code, packages and binaries as well as discover software and package dependencies, and in the future track security vulnerabilities, bugs and other important software package attributes. This is a suite of command line tools, web-based and API servers and desktop applications.

Table of Contents

AboutCode projects are...

  • ScanCode Toolkit is a popular command line tool to scan code for licenses, copyrights and packages, used by many organizations and FOSS projects, small and large.

  • Scancode Workbench (formerly AboutCode Manager) is a JavaScript, Electron-based desktop application to review scan results and document your origin and license conclusions.

  • AboutCode Toolkit is a command line tool to document and inventory known packages and licenses and generate attribution docs, typically using the results of analyzed and reviewed scans.

  • TraceCode Toolkit is a command line tool to find which source code file is used to create a compiled binary and trace and graph builds.

  • DeltaCode is a command line tool to compare scans and determine if and where there are material differences that affect licensing.

  • ConAn: a command line tool to analyze the code in Docker and container images

  • VulnerableCode: an emerging server-side application to collect and track known package vulnerabilities.

  • license-expression: a library to parse, analyze, simplify and render boolean license expression (such as SPDX)

We also work closely, contribute and co-started several other orgs and projects:

  • Package URL which is an emerging standard to reference software packages of all types with simple, readable and concise URLs.

  • SPDX aka. Software Package Data Exchange, a spec to document the origin and licensing of packages.

  • ClearlyDefined to review and help FOSS projects improve their licensing and documentation clarity.

Contact

Join the chat online or by IRC at https://gitter.im/aboutcode-org/discuss Introduce yourself and start the discussion!

For personal issues, you can contact the primary org admin directly: @pombredanne and [email protected]

Please ask questions the smart way: http://www.catb.org/~esr/faqs/smart-questions.html

Technology

Discovering the origin of code is a vast topic. We primarily use Python for this and some C/C++ (and eventually some Rust and Go) for performance sensitive code and Electron/JavaScript for GUI.

Our domain includes text analysis and processing (for instance for copyrights and licenses detection), parsing (for package manifest formats), binary analysis (to detect the origin and license of binaries, which source code they come from, etc.) as well as web based tools and APIs (to expose the tools and libraries as web services) and low-level data structures for efficient matching (such as Aho- Corasick and other automata).

Skills

Incoming students will need the following skills:

  • Intermediate to strong Python programming. For some projects, strong C/C++ and/or Rust is needed too.
  • Familiarity with git as a version control system
  • Ability to set up your own development environment
  • An interest in FOSS licensing and software code and origin analysis

We are happy to help you get up to speed, but the more you are able to demonstrate ability and skills in advance, the more likely we are to choose your application!

About your project application

We expect your application to be in the range of 1000 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job. Your proposal should contain at least the following information, plus anything you think is relevant:

  • Your name

  • Title of your proposal

  • Abstract of your proposal

  • Detailed description of your idea including explanation on why is it innovative and what it will contribute to the project

  • hint: explain your data structures and you planned main processing flows in details.

  • Description of previous work, existing solutions (links to prototypes, bibliography are more than welcome)

  • Mention the details of your academic studies, any previous work, internships

  • Relevant skills that will help you to achieve the goal (programming languages, frameworks)?

  • Any previous open-source projects (or even previous GSoC) you have contributed to and links.

  • Do you plan to have any other commitments during GSoC that may affect your work? Any vacations/holidays? Will you be available full time to work on your project? (Hint: do not bother applying if this is not a serious full time commitment during the GSoC time frame)

Join the chat online or by IRC at https://gitter.im/aboutcode-org/discuss introduce yourself and start the discussion!

The best way to demonstrate your capability would be to submit a small patch ahead of the project selection for an existing issue or a new issue.
We will always consider and prefer a project submissions where you have submitted a patch over any other submission without a patch.

You can pick any project idea from the list below. If you have other ideas that are not in this list, contact the team first to make sure it makes sense.

Our Project ideas

[NOTE: this is being updated and is not complete as of 2020-02-20]

Here is a list of candidate project ideas for your consideration. Your own ideas are welcomed too! Please chat about them to increase your chances of success!

ScanCode Projects

ScanCode Workbench Projects

TraceCode Projects

Conan and Other Projects

Mentoring

We welcome new mentors to help with the program and require some good unerstanding of the project codebase and domain to join as a mentor. Contact the team on Gitter.

Clone this wiki locally