GSoC 2020
AboutCode.org is pleased to announce that Google has chosen us as a mentor organization for Google Summer of Code 2020! This page contains information for students and anyone else interested in participating in and helping with the program.
AboutCode is a family of FOSS projects to uncover data ... about software code:
- where does the code come from? which software package?
- what is its license? copyright?
- is the code secure, maintained, well coded?
All these questions are important to answer: there are millions of free and open source software components available on the web for reuse.
Knowing where a software package comes from, what its license is, and whether it is vulnerable should be a problem of the past, so that everyone can safely consume more free and open source software.
Join us to make it so!
Our tools are used to help detect and report the origin and license of source code, packages and binaries as well as discover software and package dependencies, and in the future track security vulnerabilities, bugs and other important software package attributes. This is a suite of command line tools, web-based and API servers and desktop applications.
- ScanCode Toolkit is a popular command line tool to scan code for licenses, copyrights and packages, used by many organizations and FOSS projects, small and large.
- ScanCode Workbench (formerly AboutCode Manager) is a JavaScript, Electron-based desktop application to review scan results and document your origin and license conclusions.
- AboutCode Toolkit is a command line tool to document and inventory known packages and licenses and generate attribution docs, typically using the results of analyzed and reviewed scans.
- TraceCode Toolkit is a command line tool to find which source code files are used to create a compiled binary by tracing and graphing a build.
- DeltaCode is a command line tool to compare scans and determine if and where there are material differences that affect licensing.
- ConAn is a command line tool to analyze the code in Docker and container images.
- VulnerableCode is an emerging server-side application to collect and track known package vulnerabilities.
- license-expression is a library to parse, analyze, simplify and render boolean license expressions (such as SPDX).
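To give a feel for what parsing a boolean license expression involves, here is a minimal recursive-descent sketch. It assumes SPDX-style keys joined by `AND`, `OR` and parentheses, with `AND` binding tighter than `OR`, and it does not handle `WITH` exceptions. It is not the license-expression library's API: `tokenize`, `parse_expression` and `license_keys` are hypothetical names used only for illustration.

```python
import re
from typing import List, Set, Tuple, Union

# A parsed expression is either a license key (str) or a tuple of
# (operator, operands) where operator is "AND" or "OR".
Node = Union[str, Tuple[str, list]]

# Parentheses, or a run of characters allowed in SPDX-style keys.
TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+\-]+)")


def tokenize(text: str) -> List[str]:
    """Split an expression into parentheses, operators and license keys."""
    tokens = []
    text = text.strip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return token

    def parse_or(self) -> Node:
        # or_expr := and_expr ("OR" and_expr)*
        operands = [self.parse_and()]
        while self.peek() is not None and self.peek().upper() == "OR":
            self.take()
            operands.append(self.parse_and())
        return operands[0] if len(operands) == 1 else ("OR", operands)

    def parse_and(self) -> Node:
        # and_expr := atom ("AND" atom)*  (AND binds tighter than OR)
        operands = [self.parse_atom()]
        while self.peek() is not None and self.peek().upper() == "AND":
            self.take()
            operands.append(self.parse_atom())
        return operands[0] if len(operands) == 1 else ("AND", operands)

    def parse_atom(self) -> Node:
        # atom := license_key | "(" or_expr ")"
        token = self.take()
        if token == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token == ")" or token.upper() in ("AND", "OR"):
            raise ValueError(f"unexpected token {token!r}")
        # Keys are lowercased for comparison (an assumption of this sketch).
        return token.lower()


def parse_expression(text: str) -> Node:
    parser = _Parser(tokenize(text))
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree


def license_keys(node: Node) -> Set[str]:
    """Collect the distinct license keys referenced by a parsed expression."""
    if isinstance(node, str):
        return {node}
    _, operands = node
    keys: Set[str] = set()
    for operand in operands:
        keys |= license_keys(operand)
    return keys
```

For example, `parse_expression("MIT OR (Apache-2.0 AND BSD-3-Clause)")` returns `("OR", ["mit", ("AND", ["apache-2.0", "bsd-3-clause"])])`. Simplification (such as reducing `mit OR mit` to `mit`) and rendering back to text would operate on this same tree.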
We have also co-founded and contributed to important projects for other organizations:
- Package URL, an emerging standard to reference software packages of all types with simple, readable and concise URLs.
- SPDX (Software Package Data Exchange), a spec to document the origin and licensing of packages.
- ClearlyDefined, to review and help FOSS projects improve their licensing and documentation clarity.
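A Package URL has the general shape `pkg:type/namespace/name@version?qualifiers#subpath`. The sketch below parses that shape using only the standard library, loosely following the right-to-left splitting described in the purl specification. It skips validation and the per-type normalization rules, and the `PackageURL` dataclass and `parse_purl` function here are hypothetical stand-ins, not the API of the reference purl libraries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass
class PackageURL:
    type: str
    name: str
    namespace: Optional[str] = None
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def parse_purl(purl: str) -> PackageURL:
    remainder = purl.strip()

    # 1. Subpath: everything after the rightmost '#', with empty, '.'
    #    and '..' segments dropped.
    subpath = None
    if "#" in remainder:
        remainder, raw_subpath = remainder.rsplit("#", 1)
        segments = [unquote(s) for s in raw_subpath.split("/") if s not in ("", ".", "..")]
        subpath = "/".join(segments) or None

    # 2. Qualifiers: '&'-separated key=value pairs after the rightmost '?'.
    qualifiers: Dict[str, str] = {}
    if "?" in remainder:
        remainder, raw_qualifiers = remainder.rsplit("?", 1)
        for pair in raw_qualifiers.split("&"):
            key, _, value = pair.partition("=")
            if key and value:  # pairs with an empty value are discarded
                qualifiers[key.lower()] = unquote(value)

    # 3. Scheme: must be 'pkg', separated by the first ':'.
    scheme, sep, remainder = remainder.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a package URL: {purl!r}")

    # 4. Type: the first path segment, lowercased.
    remainder = remainder.strip("/")
    pkg_type, sep, remainder = remainder.partition("/")
    if not sep or not pkg_type:
        raise ValueError(f"missing type or name: {purl!r}")
    pkg_type = pkg_type.lower()

    # 5. Version: everything after the rightmost '@'.
    version = None
    if "@" in remainder:
        remainder, raw_version = remainder.rsplit("@", 1)
        version = unquote(raw_version)

    # 6. Name is the last segment; anything before it is the namespace.
    namespace_part, _, raw_name = remainder.strip("/").rpartition("/")
    if not raw_name:
        raise ValueError(f"missing name: {purl!r}")
    namespace = "/".join(unquote(s) for s in namespace_part.split("/") if s) or None

    return PackageURL(type=pkg_type, name=unquote(raw_name), namespace=namespace,
                      version=version, qualifiers=qualifiers, subpath=subpath)
```

For instance, `parse_purl("pkg:npm/%40angular/animation@12.3.1")` yields type `npm`, namespace `@angular`, name `animation` and version `12.3.1`; the percent-encoded `@` in the namespace is why the version is split on the rightmost `@`.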
Join the chat online or by IRC at https://gitter.im/aboutcode-org/discuss and introduce yourself to start the discussion!
For personal issues, you can contact the primary org admin directly: @pombredanne and [email protected]
Please ask questions the smart way: http://www.catb.org/~esr/faqs/smart-questions.html
Discovering the origin of code is a vast topic. We primarily use Python, with some C/C++, Rust and Go for performance-sensitive code. We use Electron and JavaScript for ScanCode Workbench.
Our domain includes text analysis and processing (for instance, for copyright and license detection), parsing (for package manifest formats), binary analysis (to detect the origin and license of binaries, primarily based on the corresponding source code), Web-based tools and APIs (to expose the tools and libraries as Web Services) and low-level data structures for efficient matching (such as Aho-Corasick and other automata).
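As a rough illustration of why an Aho-Corasick automaton is useful for efficient matching (every pattern is found in a single pass over the input), here is a minimal pure-Python sketch. It matches over word tokens rather than characters, an assumption chosen because license and copyright text is naturally split into words; it is not ScanCode's actual implementation, and the `AhoCorasick` class and its `search` method are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence, Tuple


class AhoCorasick:
    """Multi-pattern matcher over sequences of hashable symbols (e.g. words)."""

    def __init__(self, patterns: Dict[str, Sequence[Hashable]]):
        # Trie stored as parallel lists indexed by node id; node 0 is the root.
        self.goto: List[Dict[Hashable, int]] = [{}]
        self.fail: List[int] = [0]
        self.output: List[List[Tuple[str, int]]] = [[]]  # (pattern id, length)
        for pattern_id, symbols in patterns.items():
            self._add(pattern_id, list(symbols))
        self._build_failure_links()

    def _add(self, pattern_id: str, symbols: List[Hashable]) -> None:
        node = 0
        for symbol in symbols:
            if symbol not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
                self.goto[node][symbol] = len(self.goto) - 1
            node = self.goto[node][symbol]
        self.output[node].append((pattern_id, len(symbols)))

    def _build_failure_links(self) -> None:
        # Breadth-first order guarantees a node's failure target (always
        # shallower) is complete before the node's children use it.
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for symbol, child in self.goto[node].items():
                queue.append(child)
                # Walk failure links until some node has an edge for `symbol`.
                target = self.fail[node]
                while target and symbol not in self.goto[target]:
                    target = self.fail[target]
                self.fail[child] = self.goto[target].get(symbol, 0)
                # A match ending here also ends every match of the failure target.
                self.output[child].extend(self.output[self.fail[child]])

    def search(self, text: Sequence[Hashable]) -> List[Tuple[int, str]]:
        """Return (start index, pattern id) for every occurrence, in one pass."""
        matches = []
        node = 0
        for i, symbol in enumerate(text):
            while node and symbol not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(symbol, 0)
            for pattern_id, length in self.output[node]:
                matches.append((i - length + 1, pattern_id))
        return matches
```

Usage, with overlapping patterns to show how failure links report a shorter match that ends inside a longer one:

```python
ac = AhoCorasick({
    "mit-grant": "permission is hereby granted".split(),
    "granted": ["granted"],
    "gpl": "gnu general public license".split(),
})
text = "permission is hereby granted free of charge under the gnu general public license".split()
print(ac.search(text))  # [(0, 'mit-grant'), (3, 'granted'), (9, 'gpl')]
```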
Incoming students will need the following skills:
- Intermediate to strong Python programming. For some projects, strong C/C++ and/or Rust may be needed.
- Familiarity with git as a version control system
- Ability to set up your own development environment
- An interest in FOSS licensing and software composition analysis.
We are happy to help you get up to speed, but the more you are able to demonstrate ability and skills in advance, the more likely we are to choose your application!
We expect your application to be in the range of 1000 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job. Your proposal should contain at least the following information, plus anything you think is relevant:
- Your name
- Title of your proposal
- Abstract of your proposal
- Detailed description of your idea, including an explanation of why it is innovative and what it will contribute to the project
- Hint: explain your data structures and your planned main processing flows in detail.
- Description of previous work and existing solutions (links to prototypes and bibliography are more than welcome)
- Details of your academic studies, any previous work and internships
- Relevant skills that will help you achieve the goal (programming languages, frameworks)
- Any previous open source projects (or even previous GSoC projects) you have contributed to, with links
- Do you plan to have any other commitments during GSoC that may affect your work? Any vacations/holidays? Will you be available full time to work on your project? (Hint: do not bother applying if this is not a serious full-time commitment during the GSoC time frame.)
Join the chat online or by IRC at https://gitter.im/aboutcode-org/discuss and introduce yourself to start the discussion!
The best way to demonstrate your capability is to submit a small patch for an existing issue or a new issue ahead of the project selection. We will always consider and prefer project submissions where you have submitted a patch over any other submission without a patch.
You can pick any project idea from the list below. If you have other ideas that are not in this list, contact the team first to make sure it makes sense.
[NOTE: this is being updated and is not yet complete as of 2020-02-25]
Here is a list of candidate project ideas for your consideration. Your own ideas are welcome too! Please chat about them to increase your chances of success!
- Improve Copyright detection accuracy and speed in ScanCode
- Improve file classification in ScanCode
- Improve License detection accuracy and speed in ScanCode
- Build ScanCode Installers
- Leverage ClearlyDefined scans to improve license detection
- Add additional package parsers
- Improve Workbench Database Schema
- Improve Workbench UI
- Improve Workbench "Conclusions"
- Add Reporting to Workbench
- View License text and local file contents