Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CodeQL Workflow for Code Security Analysis #3476

Merged
merged 8 commits into from
Nov 6, 2023
Merged

Conversation

b4yuan
Copy link
Contributor

@b4yuan b4yuan commented Oct 28, 2023

Summary

This pull request introduces a CodeQL workflow to enhance the security analysis of this repository.

What is CodeQL

CodeQL is a static analysis tool that helps identify and mitigate security vulnerabilities. It is primarily intra-function but does provide some support for inter-function analysis. By integrating CodeQL into a GitHub Actions workflow, it can proactively identify and address potential issues before they become security threats.

For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

What this PR does

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that

  • Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
  • Runs daily.
  • Excludes queries with a high false positive rate or low-severity findings.
  • Does not display results for git submodules, focusing only on our own codebase.

Validation

To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Using the workflow results

If this pull request is merged, the CodeQL workflow will be automatically run on every push to the main branch and on every pull request to the main branch. To view the results of these code scans, follow these steps:

  1. Under the repository name, click on the Security tab.
  2. In the left sidebar, click Code scanning alerts.

Is this a good idea?

We are researchers at Purdue University in the USA. We are studying the potential benefits and costs of using CodeQL on open-source repositories of embedded software.

We wrote up a report of our findings so far. The TL;DR is “CodeQL outperforms the other freely-available static analysis tools, with fairly low false positive rates and lots of real defects”. You can read about the report here: https://arxiv.org/abs/2310.00205

Review of engineering hazards

License: see the license at https://github.com/github/codeql-cli-binaries/blob/main/LICENSE.md:

Here's what you may also do with the Software, but only with an Open Source Codebase and subject to the License Restrictions provisions below:

Perform analysis on the Open Source Codebase.

If the Open Source Codebase is hosted and maintained on GitHub.com, generate CodeQL databases for or during automated analysis, CI, or CD.

False positives: We find that around 20% of errors are false positives, but that these FPs are polarized and only a few rules contribute to most FPs. We find that the top rules contributing to FPs are: cpp/uninitialized-local, cpp/missing-check-scanf, cpp/suspicious-pointer-scaling, cpp/unbounded-write, cpp/constant-comparison, and cpp/inconsistent-null-check. Adding a filter to filter out certain rules that contribute to a high FP rate can be done simply in the workflow file.

Add CodeQL Workflow for Code Security Analysis

This pull request introduces a CodeQL workflow to enhance the security analysis of our repository. CodeQL is a powerful static analysis tool that helps identify and mitigate security vulnerabilities in our codebase. By integrating this workflow into our GitHub Actions, we can proactively identify and address potential issues before they become security threats.

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that
- Runs on every push and pull request to the main branch.
- Excludes queries with a high false positive rate or low-severity findings.
- Does not display results for third-party code, focusing only on our own codebase.

Testing:
To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Deployment:
Once this pull request is merged, the CodeQL workflow will be active and automatically run on every push and pull request to the main branch. To view the results of these code scans, please follow these steps:
1. Under the repository name, click on the Security tab.
2. In the left sidebar, click Code scanning alerts.

Additional Information:
- You can further customize the workflow to adapt to your specific needs by modifying the workflow file.
- For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation.

Signed-off-by: Brian <[email protected]>
Add CodeQL Workflow for Code Security Analysis

This pull request introduces a CodeQL workflow to enhance the security analysis of our repository. CodeQL is a powerful static analysis tool that helps identify and mitigate security vulnerabilities in our codebase. By integrating this workflow into our GitHub Actions, we can proactively identify and address potential issues before they become security threats.

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that
- Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
- Runs daily.
- Excludes queries with a high false positive rate or low-severity findings.
- Does not display results for git submodules, focusing only on our own codebase.

Testing:
To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Deployment:
Once this pull request is merged, the CodeQL workflow will be active and automatically run on every push and pull request to the main branch. To view the results of these code scans, please follow these steps:
1. Under the repository name, click on the Security tab.
2. In the left sidebar, click Code scanning alerts.

Additional Information:
- You can further customize the workflow to adapt to your specific needs by modifying the workflow file.
- For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

Signed-off-by: Brian <[email protected]>
Add CodeQL Workflow for Code Security Analysis

This pull request introduces a CodeQL workflow to enhance the security analysis of our repository. CodeQL is a powerful static analysis tool that helps identify and mitigate security vulnerabilities in our codebase. By integrating this workflow into our GitHub Actions, we can proactively identify and address potential issues before they become security threats.

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that
- Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
- Runs daily.
- Excludes queries with a high false positive rate or low-severity findings.
- Does not display results for git submodules, focusing only on our own codebase.

Testing:
To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Deployment:
Once this pull request is merged, the CodeQL workflow will be active and automatically run on every push and pull request to the main branch. To view the results of these code scans, please follow these steps:
1. Under the repository name, click on the Security tab.
2. In the left sidebar, click Code scanning alerts.

Additional Information:
- You can further customize the workflow to adapt to your specific needs by modifying the workflow file.
- For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

Signed-off-by: Brian <[email protected]>
@raysan5
Copy link
Owner

raysan5 commented Oct 29, 2023

@b4yuan It seems the check process failed.

In any case, how does this PR compare to #3152?

@b4yuan
Copy link
Contributor Author

b4yuan commented Oct 29, 2023

@raysan5 check process failed because there are some errors found by CodeQL. If the analysis finds any errors, it will invoke fail_on_error.py which prevents the workflow from 'passing.'

This PR is different than #3152 because it allows for rule filtering; uploads the results as an artifact; and allows you to view the results of the analysis in the Security tab under Code scanning alers.

Add CodeQL Workflow for Code Security Analysis

This pull request introduces a CodeQL workflow to enhance the security analysis of our repository. CodeQL is a powerful static analysis tool that helps identify and mitigate security vulnerabilities in our codebase. By integrating this workflow into our GitHub Actions, we can proactively identify and address potential issues before they become security threats.

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that
- Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
- Runs daily.
- Excludes queries with a high false positive rate or low-severity findings.
- Does not display results for git submodules, focusing only on our own codebase.

Testing:
To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Deployment:
Once this pull request is merged, the CodeQL workflow will be active and automatically run on every push and pull request to the main branch. To view the results of these code scans, please follow these steps:
1. Under the repository name, click on the Security tab.
2. In the left sidebar, click Code scanning alerts.

Additional Information:
- You can further customize the workflow to adapt to your specific needs by modifying the workflow file.
- For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

Signed-off-by: Brian <[email protected]>
@raysan5
Copy link
Owner

raysan5 commented Nov 1, 2023

@jorgectf Please, could you take a look to this PR? Is it similar to yours?

Are the shell and python scripts really required?

@raysan5
Copy link
Owner

raysan5 commented Nov 3, 2023

@b4yuan After some thinking about this PR, I'm afraid I'm not merging it. Actually, the PR seems generated in an automatic way, I saw you pushed similar PRs to multiple repos. Personally it does not inspire confidence to me. Sorry.

If there was a more detailed explanation why this addition is specifically needed and beneficial for raylib I could reconsider my position but at this moment I can't see why raylib project specifically needs to integrate this kind of system.

@b4yuan Are you using raylib in any specific project or in any specific environment where this kind of CodeQL validations are required? Why did you consider it is specifically beneficial for raylib project? Are you pushing CodeQL to multiple repos as some kind of commercial strategy for a company?

Feel free to answer me here or just send me a private mail to: [email protected]

Thanks.

@raysan5 raysan5 closed this Nov 3, 2023
@b4yuan
Copy link
Contributor Author

b4yuan commented Nov 3, 2023

@raysan5 Thanks for the comments. While the PR itself is automated, they were only raised after we tested CodeQL on our own forks of each repositories. We actually ran CodeQL on these repositories (raylib included, of course) months ago and raised PRs for bugs we found. Specifically in raylib's case, we raised #3021 to fix a bug we found with CodeQL.

The goal here is to improve open-source embedded projects. Even if you think that raylib doesn't need a security analysis, we would suggest your consideration, as it is hard to judge where the open-source code is running. There are many stories of unexpected use cases that resulted in security problems down the road.

In the paper: https://arxiv.org/abs/2310.00205, we present our findings here and why we think that CodeQL is a beneficial tool that devs should leverage. We are not commercially interested in any way; we are only interested in improving code quality.

@orcmid
Copy link
Contributor

orcmid commented Nov 3, 2023

@b4yuan @raysan5
Wow, the abstract tells it all. In particular, I gather that CodeQL is a GitHub facility, so there is no commercial interest other than GitHub's in having secure code in repositories under its care.

I have received warning notices, usually on dependencies that have security issues. It takes a little work to determine whether my usage (of a JavaScript framework for example) is transitively vulnerable or not. Just the same, I'd rather have that checking than not.

I don't use GitHub workflows, so I might have to rethink that on some code that I will need very strong safety assurance about. Thanks for the information. I am mindful that these fixtures do not prove code is safe, only that it appears not to be.

PS: I have been oblivious to the arrival of the Security tab on my repository views. There's much to digest here, and much that can be useful for raylib whether or not CodeQL is set up.

PPS: It appears that one can do some of this on a fork, so maybe there's a benign way to do this without negotiating with repo owners. I am not likely to try this, though.

@raysan5
Copy link
Owner

raysan5 commented Nov 3, 2023

@b4yuan thank you very much for your answers and excuse if my tone could have been a bit rude.

I checked the generated reports and I even reviewed some of the issues (64d64cc) but I noticed many of them are not that relevant and also related to raylib external libraries, where I can't do much from my side.

Some questions (and excuse my insistence on this topic):

  • Would it be possible to avoid the check on src/external libraries? or that's not recommended?
  • Would it be possible to avoid the shell and python scripts and integrate its logic directly on the workflow?
  • I saw the check running takes some time, does it has any cost over running time quota on an open-source project like raylib?
  • I don't think raylib is so critical to run it on every push, would it be ok to just run it manually when desired or once a week?

Again, if you feel more confortable answering in private, you can write me to [email protected]

Thanks for your answers.

@b4yuan
Copy link
Contributor Author

b4yuan commented Nov 3, 2023

@orcmid @raysan5 That's correct, CodeQL was developed by GitHub.

Regarding dependencies... I also want to add that third party lib paths can be filtered out in this section:

    # Filter out rules with low severity or high false positve rate
    # Also filter out warnings in third-party code
    - name: Filter out unwanted errors and warnings
      uses: advanced-security/filter-sarif@v1
      with:
        patterns: |
          -**:cpp/path-injection
          -**:cpp/world-writable-file-creation
          -**:cpp/poorly-documented-function
          -**:cpp/potentially-dangerous-function
          -**:cpp/use-of-goto
          -**:cpp/integer-multiplication-cast-to-long
          -**:cpp/comparison-with-wider-type
          -**:cpp/leap-year/*
          -**:cpp/ambiguously-signed-bit-field
          -**:cpp/suspicious-pointer-scaling
          -**:cpp/suspicious-pointer-scaling-void
          -**:cpp/unsigned-comparison-zero
          -**/cmake*/Modules/**
         #dir/path/here

and CodeQL would not analyze those directories. (I looked for git submodules in raylib, but didn't seem to find any. If there are any third party directory paths that you want to be filtered out, let me know, and I can update the workflow accordingly.) This workflow uploads all of CodeQL's errors to the Code scanning alerts section under the Security tab. From there, you can mark bugs that CodeQL finds as 'will not fix', 'false positive', etc. If you find that a rule yields especially many false positives, you can filter that rule altogether in the above snippet. (We've also suggested the rules that are top contributors to false positives in the PR body, which you may find helpful).

Of course, it is up to you to weigh the benefits that CodeQL brings (since we have found a merged bug fix in this repo before found by CodeQL, this suggests that it likely would add value), compared to the drawbacks (perhaps false positive rate, maintenance, etc). We are very transparent with the drawbacks in the paper, but we find that improving false positive rate can be improved with rule filtering, and we don't believe maintenance will consume too many resources, though, again, that is up to the judgement of the devs. Not only have we already configured CodeQL for this repo (and tested that it works on our fork), the streamlined way that GitHub presents CodeQL analysis findings in its Security tab enables this tool to be used much more easily.

@b4yuan
Copy link
Contributor Author

b4yuan commented Nov 3, 2023

@raysan5 Looks like we posted at the same time lol.

Would it be possible to avoid the check on src/external libraries? or that's not recommended?

Yes, I can update this shortly.

Would it be possible to avoid the shell and python scripts and integrate its logic directly on the workflow?

I'll look into this as well.

I saw the check running takes some time, does it has any cost over running time quota on an open-source project like raylib?

From https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration: "GitHub Actions usage is free for standard GitHub-hosted runners in public repositories, and for self-hosted runners. For private repositories, each GitHub account receives a certain amount of free minutes and storage for use with GitHub-hosted runners, depending on the account's plan. Any usage beyond the included amounts is controlled by spending limits. For more information, see "About billing for GitHub Actions."

As long as resources are available, it should be free for an open-source repo like raylib.

I don't think raylib is so critical to run it on every push, would it be ok to just run it manually when desired or once a week?

Running on every push is already commented out; it's just there if you wanted it to be triggered on every push. Currently the workflow is scheduled to run daily, but I can also make changes to update it weekly as well.

Will update/ping when your asks are addressed.

@raysan5 raysan5 reopened this Nov 3, 2023
@raysan5
Copy link
Owner

raysan5 commented Nov 3, 2023

@b4yuan thanks for your answers. Let's see if the following concerns could be addressed:

  • External libraries used by raylib, located in raylib/src/external directory, but note that most of those libraries are single-file header-only libraries and are actually build with raylib modules, not sure if the system could avoid those checks because, technically, on compilation those libraries are merged inside the modules code.
  • External shell/python scripts. I did a quick check and the shell script should be avoidable, about the python script, not sure if it can be integrated in some way on the workflow...
  • Running time. If possible I would launch it once-a-week, in any case, I can also launch it manually when required, right?

I'll try to review the workflow more carefully when merged to understand the process and be able to maintain it.

Thanks.

EDIT: I see the Errors detected actually belong to an external library (GLFW), I'm afraid I can't do much there beside reporting them. Is it possible to skip that fail on error?

@orcmid
Copy link
Contributor

orcmid commented Nov 3, 2023

@raysan5 @b4yuan

  • External libraries used by raylib, located in raylib/src/external directory, but note that most of those libraries are single-file header-only libraries and are actually build with raylib modules, not sure if the system could avoid those checks because, technically, on compilation those libraries are merged inside the modules code.

EDIT: I see the Errors detected actually belong to an external library (GLFW), I'm afraid I can't do much there beside reporting them. Is it possible to skip that fail on error?

These are interesting dependency cases. In some sense there need to be warnings but they have to be handled at the origin. That they can be false positives or irrelevant because of what the dependency is can be troublesome. The standing can change over time, depending on changes in raylib, so suppressing them has to be cautious.

Is there a way to make them warnings rather than fail/error cases? Then we can hope the originator provides a fix assuming its not a false positive.

PS: There is a different though related issue concerning raylib externals. The raylib release cadence does not adjust to fixes and update in the external dependencies. Rather, a raylib release simply incorporates the latest dependency releases as they are at the time. There has not been, as far as I know, any effort to introduce a raylib security release to quickly promulgate fixes to security vulnerabilities in an external. Perhaps it has never come up.

@b4yuan
Copy link
Contributor Author

b4yuan commented Nov 3, 2023

@raysan5 @orcmid

At the moment I have excluded src/external//glfw from the CodeQL results. I've also incorporated the build script into the workflow. To remove fail_on_error.py is probably possible, but not without making the workflow file unnecessarily long and confusing. What do you think?

(I also can't view the results in the Security tab of raylib. Let me know if the errors from the external library are no longer being analyzed.)

@raysan5
Copy link
Owner

raysan5 commented Nov 3, 2023

@b4yuan Thanks for the review! I just run it again and it seems no error is detected but still it seems to report some error:

image

image

(I filtered by Severity: Error)

What implies removing fail_on_error.py? Reporting the process finished correctly despite containing the errors? If that's the case, I think it can be removed.

@b4yuan
Copy link
Contributor Author

b4yuan commented Nov 3, 2023

The analysis step will pass even if there are errors found, because it finished 'correctly'. fail_on_error.py parses the results, and if there are any errors, forces the entire workflow to fail. This may come in handy visually when inspecting PRs, weekly checks, etc.

(I will test this on our fork and see what might be going on). Alternatively, if you think this step is excessive, the script can be removed altogether.

@jorgectf
Copy link

jorgectf commented Nov 4, 2023

👋 Late to the convo, but just wanted to highlight a setting that might replace the script sharing the same objective: https://github.com/raysan5/raylib/settings/security_analysis

Protection rules under Code Scanning:

image

@raysan5
Copy link
Owner

raysan5 commented Nov 4, 2023

@jorgectf Oh! Thank you very much for your answer! Actually I was wondering that same thing, it was a bit strange to me the need of an error-scanning script to set the fail in the process...

@b4yuan Given the new details provided, could then the script just be removed from the workflow?

@b4yuan
Copy link
Contributor Author

b4yuan commented Nov 6, 2023

Thanks @raysan5 @jorgectf for bringing this up. Updated with the removal of the python script.

@raysan5
Copy link
Owner

raysan5 commented Nov 6, 2023

Running workflow...

@raysan5 raysan5 merged commit b216e2f into raysan5:master Nov 6, 2023
2 checks passed
@raysan5
Copy link
Owner

raysan5 commented Nov 6, 2023

@b4yuan @jorgectf thank you very much for working on this improvement, reviewing it and explaining all the details. I'm merging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants