Automatically exempt package versions that contain binaries that are too large to scan with Virus Total #48

Closed
mkevenaar opened this issue Apr 29, 2021 · 36 comments
Labels
5 - Released The area addressed in the ticket has been released in the product and is generally available. Bug Tickets that represent defects/bugs. package-scanner

Comments

@mkevenaar

mkevenaar commented Apr 29, 2021

With the new changes on the moderation workflow, all files that are bigger than 200MB are flagged for moderation.

For me alone that would be 10-15 packages, with around 100 releases per year.

As far as I know, there are a lot of packages that have (binary) files bigger than 200MB.

I think an automated exemption for files >= 200MB would be a good way to go, perhaps with a warning that the files haven't been scanned because of their size, or something like that.

┆Issue is synchronized with this Gitlab issue by Unito
┆Milestone: 1.0.0

@svelez

svelez commented Apr 29, 2021

Just chiming in that my (single) package is hitting this workflow as well.

It does surprise me that a change introducing a manual validation step for a potentially large number of packages made it into the workflow. Still, if technical or cost limitations prevent virus scanning of large packages, I am not certain that automatically exempting all packages above a certain size would be desirable.

This would make it too easy for someone wishing to distribute malware to artificially inflate the size of a trojan horse.... and we all know how well users read warnings... and how effective they are when they happen too frequently.

To be honest, I can't think of a good solution, but it is a bummer that approval can now take longer, and that at the end of the day, the end user can't really have that much more confidence in the safety of the package.

@numericalfreedom

Also suffering:
https://community.chocolatey.org/packages/ggu-software/20.21.010
https://community.chocolatey.org/packages/ggu-software-international/20.21.010

@flcdrg
Member

flcdrg commented May 21, 2021

I'm testing out a possible workaround for this.

The VirusTotal v3 API doesn't appear to have upload size limits. So, theoretically, if you upload the files to VirusTotal before you push your Chocolatey package (e.g. by using the vt CLI - https://github.com/VirusTotal/vt-cli), and VirusTotal has had enough time to analyse them, then maybe that will allow Chocolatey to pass the virus scanning stage.

The package I'm trying this out on is https://community.chocolatey.org/packages/azure-functions-core-tools-3/3.0.3477

I'll post back here once the scanning status has updated to let you know if it worked or not.

@flcdrg
Member

flcdrg commented May 21, 2021

Looks like that worked!

@numericalfreedom

@flcdrg, what would be the exact procedure with vt-cli? Does VirusTotal save some information in the checked .msi file, or record the successful check in a database? If I check the package before uploading, is that sufficient before pushing its .nupkg file to Chocolatey?

@flcdrg
Member

flcdrg commented May 23, 2021

I used vt scan file path-to-file.exe --apikey xxxxx to upload the file to VirusTotal.

VirusTotal indexes its results by sha256 hash, so you can query on the sha value and it will return the status of that file.
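The hash-based lookup described above can be sketched roughly as follows. This is a minimal illustration, not how Chocolatey or vt-cli do it: it assumes the public VirusTotal v3 endpoint `GET /api/v3/files/{hash}` with an `x-apikey` header, and the helper names `sha256_of_file` and `lookup_report` are made up for this example.

```python
import hashlib
import json
import urllib.error
import urllib.request

# Assumption: the public VirusTotal v3 "get a file report" endpoint.
VT_FILES_URL = "https://www.virustotal.com/api/v3/files/{}"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large installers need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lookup_report(file_hash, api_key):
    """Return the last_analysis_stats dict, or None if VirusTotal has no record."""
    request = urllib.request.Request(
        VT_FILES_URL.format(file_hash), headers={"x-apikey": api_key}
    )
    try:
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:  # hash unknown to VirusTotal
            return None
        raise
    return body["data"]["attributes"]["last_analysis_stats"]
```

A caller would hash the local installer with `sha256_of_file(...)` and pass the result to `lookup_report(...)` together with their own API key.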

@numericalfreedom

@flcdrg This seems to be a good proposal. But:

  1. The desired malware scan should be seamlessly included in the Choco test chain, which is run on pure Window$ (no Cygwin)

  2. VirusTotal technology is already included in Micro$oft Defender (cooperation)

  3. Micro$oft Defender can also be operated in command line mode and could simply be executed in the download directory of the test platform

"How to use Microsoft Defender Antivirus with Command Prompt on Windows 10"
https://www.windowscentral.com/how-use-windows-defender-command-prompt-windows-10

Something like this could simply be included in the test script:

MpCmdRun -Scan -ScanType 3 -File C:\Users\username\Downloads -Timeout 1

The Choco community could maybe also help to improve Window$ Defender

@flcdrg
Member

flcdrg commented May 23, 2021

I'm not using cygwin (as far as I know?).

Sure, my workaround is just that. Hopefully the Chocolatey team can improve the process for everyone, but in the meantime it seems to be a way to get packages that reference large files automatically approved again.

I've modified the au scripts used by my Redgate packages (as they get updated every few days) to push to VirusTotal. I'll be interested to see if that works.
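A pre-push step of that kind might look roughly like the sketch below. The actual au scripts are PowerShell and are not shown in this thread, so this Python version is only an illustration: the function names are hypothetical, the 200MB threshold is assumed to mean 200 * 1024 * 1024 bytes, and the only CLI invocation used is the `vt scan file ... --apikey ...` form quoted earlier.

```python
import os
import subprocess

# Assumption: "200MB" interpreted as 200 * 1024 * 1024 bytes.
SIZE_THRESHOLD_BYTES = 200 * 1024 * 1024


def files_needing_preupload(paths, threshold=SIZE_THRESHOLD_BYTES):
    """Pick the files that are bigger than the scanner's size threshold."""
    return [path for path in paths if os.path.getsize(path) > threshold]


def vt_scan_command(path, api_key):
    """Build the vt-cli invocation quoted in this thread."""
    return ["vt", "scan", "file", path, "--apikey", api_key]


def preupload(paths, api_key, threshold=SIZE_THRESHOLD_BYTES, run=subprocess.run):
    """Upload every oversized file to VirusTotal before the package is pushed."""
    for path in files_needing_preupload(paths, threshold):
        run(vt_scan_command(path, api_key), check=True)
```

The `run` parameter exists only so the upload can be stubbed out in a dry run; by default it calls the real `vt` executable, which must be on the PATH.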

@numericalfreedom

@flcdrg AND Chocolatey developers

Here is my tested proposal for simply circumventing the serious issue with the virus scan.

VirusTotal technology is already included in Micro$oft Defender (due to cooperation) !!! The Choco community could maybe also help to improve Window$ Defender.

Micro$oft Defender can also be operated in command line mode and could simply be executed in the download directory of the test platform. Check this: "How to use Microsoft Defender Antivirus with Command Prompt on Windows 10"
https://www.windowscentral.com/how-use-windows-defender-command-prompt-windows-10

Something like this could simply be included in the test script:

MpCmdRun -Scan -ScanType 3 -File C:\Users\username\Downloads -Timeout 1

Here is the demo on my latest (suffering) uploads:

https://community.chocolatey.org/packages/ggu-software/20.21.011
https://community.chocolatey.org/packages/ggu-software-international/20.21.011

Check on ggu-software-20-21-011:

"C:\Program Files\Windows Defender\MpCmdRun.exe" -Scan -ScanType 3 -DisableRemediation -File C:\Users\Master\Chocolatey\ggu-software-20-21-011\COMPLETE_GGU_SOFTWARE_20_21_011.msi
Scan starting...
Scan finished.
Scanning C:\Users\Master\Chocolatey\ggu-software-20-21-011\COMPLETE_GGU_SOFTWARE_20_21_011.msi found no threats.

Check on ggu-software-international-20-21-011:

C:\Users\Master\Chocolatey\ggu-software-international-20-21-011>"C:\Program Files\Windows Defender\MpCmdRun.exe" -Scan -ScanType 3 -DisableRemediation -File C:\Users\Master\Chocolatey\ggu-software-international-20-21-011\COMPLETE_GGU_SOFTWARE_INTERNATIONAL_20_21_011.msi
Scan starting...
Scan finished.
Scanning C:\Users\Master\Chocolatey\ggu-software-international-20-21-011\COMPLETE_GGU_SOFTWARE_INTERNATIONAL_20_21_011.msi found no threats.

The Micro$oft Defender command line check could easily be added to the package validation sequence as a third step. The virus definitions can also be updated easily:

"C:\Program Files\Windows Defender\MpCmdRun.exe" -SignatureUpdate
Signature update started . . .
Signature update finished. No updates needed

This verifies that Micro$oft Defender has the latest virus and malware definitions loaded.
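For illustration, the scan command shown above could be wrapped like this. It is only a sketch under assumptions: the executable path is the one used in the transcript, the flags are exactly those from the transcript, and the exit code reading (0 meaning nothing found) is my understanding of Microsoft's MpCmdRun documentation rather than something confirmed in this thread. The function names are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath

# Path as used in the transcript above; it may differ on other systems.
MPCMDRUN = r"C:\Program Files\Windows Defender\MpCmdRun.exe"


def defender_scan_command(target):
    """Build a custom-scan (ScanType 3) command for one file or directory."""
    # Precaution: insist on a full Windows path rather than a relative one.
    if not PureWindowsPath(target).is_absolute():
        raise ValueError("expected an absolute Windows path: %r" % (target,))
    return [
        MPCMDRUN, "-Scan", "-ScanType", "3",
        "-DisableRemediation", "-File", str(target),
    ]


def scan_is_clean(target, run=subprocess.run):
    """Run the scan; assumption: exit code 0 means no threats were found."""
    result = run(defender_scan_command(target))
    return result.returncode == 0
```

The `run` parameter allows the call to be replaced by a stub on machines where Defender is not installed.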

A number of package updates are stalled on Chocolatey.org because this issue has no solution yet. Please reconsider using Micro$oft Defender, which protects the devices of many users who do not believe in the Kaspersky and Norton story.

@pauby
Member

pauby commented May 25, 2021

@numericalfreedom How will that work where Microsoft Defender is not available?

@numericalfreedom

@pauby Microsoft Defender can eventually be installed. Since Windows 7, Microsoft Defender is around and should theoretically be available on the package testing platforms. I am not informed about all the details. Here some alternative suggestions:

  1. Migrating the virtual machine of the testing system to Windows 10 could be a relatively simple solution.

  2. A more elegant trick could mean including the malware check into the script of "choco push", requiring that potentially infected .msi or .exe installation media have to reside with the same sha256 hash as in the indicated remote location in the .nuget directory and could be cross checked against malware on the fly.

The Microsoft Defender service must be running for the command line mode to work. Also, the command line mode requires an absolute path to the directory or file to be checked. The check step is very error-prone if anything deviates from these rules.

These are just some ideas from an external observer. I would be very happy about a quick solution, because both of my packages have grown to over 700MB and fail the malware check, which makes my client nervous.

@pauby
Member

pauby commented May 27, 2021

@numericalfreedom

Microsoft Defender can eventually be installed. Since Windows 7, Microsoft Defender is around and should theoretically be available on the package testing platforms. I am not informed about all the details.

This requires Windows Defender to be available. My question was really around what to do if it's not. Having it available is not always an option so we need something that doesn't rely on it.

Migrating the virtual machine of the testing system to Windows 10 could be a relatively simple solution.

Your suggestion is that we move our testing infrastructure to Windows and Windows Defender without understanding the full consequences of doing so. This isn't an option at this time, nor would it be a relatively simple solution.

A more elegant trick could mean including the malware check into the script of "choco push", requiring that potentially infected .msi or .exe installation media have to reside with the same sha256 hash as in the indicated remote location in the .nuget directory and could be cross checked against malware on the fly.

We do have the option of using whatever virus scanner you need within the licensed editions of Chocolatey when the package is downloaded. I appreciate you are talking about the push stage, and my suggestion for that feature would be to raise another issue here for that to be considered, as we are both straying off-topic here.

@pauby
Member

pauby commented May 27, 2021

@flcdrg This could be something we consider for a future version of Package Scanner.

@pauby pauby added 0 - Backlog Where tickets start after being triaged. This means the ticket has targeted milestone/labels. package-scanner labels May 27, 2021
@numericalfreedom

numericalfreedom commented May 27, 2021

@pauby Thank you for your instructive comments.

The way @flcdrg proposes could work fine. I was wrong: vt-cli works directly from the Windows command line prompt; VT just recommends the use of Cygwin for performance.

If I upload my huge .msi files (over 700MB) to VT, I receive a lengthy http:// URL for checking back on the results. I do not know how to finally make "both ends meet" between VT and the triage in Choco. For small files, vt scan file returns a sha256 hash, but this would also have to be checked during the third verification step. Quo vadis, Choco?

@numericalfreedom

numericalfreedom commented May 27, 2021

Looks like that worked!

@flcdrg The package failed the virus check step also.

@flcdrg
Member

flcdrg commented May 27, 2021

I'm not sure whether there is still a size limit. I've raised an issue over in the vt-cli repo about a weird error I'm getting trying to upload a 1GB file - VirusTotal/vt-cli#33

@flcdrg
Member

flcdrg commented May 27, 2021

FYI the file I uploaded the first time (that worked) was larger than 200MB but less than 1GB

@numericalfreedom

@flcdrg The VT part triggered by a command line operation is clear to me. If VT has sufficient time to check a large file, the final result is a sha256 hash. But the question is: how does the third validation step use this information? If the validation script triggers a VT check with a command line operation, the check begins with uploading the file again. In order to make "both ends meet", VT should keep the sha256 value along with the check result for a while, and the validation routine should have the sha256 in the .nuspec in order to be able to check the malware status quickly. That would require extending the .nuspec specification.

@flcdrg
Member

flcdrg commented May 28, 2021

@numericalfreedom I don't think any of that is necessary. I don't know how the Chocolatey scanning stage functions internally, but it seems like it is smart enough to figure out that a sha256 of the same file exists and if the results are fine then it will pass.
It presumably calculates that SHA itself and then queries the VirusTotal API.
No need to extend the nuspec to include it.
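To make that presumed flow concrete, here is a small sketch of what a scanner *might* do. It is a guess for illustration only, since the internals of the Chocolatey package scanner are not described in this thread. The function name, the return values, and the `lookup` callback (which takes a sha256 and returns a stats dict or None) are all hypothetical.

```python
def decide_scan_step(size_bytes, file_hash, lookup, upload_limit_bytes):
    """Guess at a scanner's decision for one file.

    lookup(file_hash) is a caller-supplied function returning a dict of
    detection counts, or None when the hash is unknown to the scan service.
    """
    stats = lookup(file_hash)
    if stats is not None:
        # A report already exists (for example because the maintainer
        # uploaded the file beforehand), so the file size no longer matters.
        flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
        return "fail" if flagged else "pass"
    if size_bytes <= upload_limit_bytes:
        return "upload"  # small enough for the scanner to submit itself
    return "manual-review"  # too large to submit and no existing report
```

Under this reading, pre-uploading a large file simply turns the "manual-review" branch into a "pass" or "fail", with no change to the .nuspec.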

@flcdrg
Member

flcdrg commented May 29, 2021

Just found out in the vt-cli linked issue that the maximum upload size for that tool (and possibly the VirusTotal v3 API) is 650MB.
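As a rough sketch of what that limit means for an uploader: as far as I know, the VirusTotal v3 API accepts small files directly via `POST /api/v3/files` and asks clients to request a dedicated upload URL via `GET /api/v3/files/upload_url` for bigger ones. The 32MB boundary below comes from my reading of the VirusTotal documentation, not from this thread, and both limits are assumed to be counted in units of 1024 * 1024 bytes. The function name and return values are made up for this example.

```python
MB = 1024 * 1024  # assumption: limits are counted in mebibytes

DIRECT_UPLOAD_LIMIT = 32 * MB   # assumption, taken from the VirusTotal docs
MAX_UPLOAD_LIMIT = 650 * MB     # the limit reported in the vt-cli issue


def upload_route(size_bytes):
    """Classify a file by how (or whether) it could be uploaded."""
    if size_bytes <= DIRECT_UPLOAD_LIMIT:
        return "direct"        # POST /api/v3/files
    if size_bytes <= MAX_UPLOAD_LIMIT:
        return "upload_url"    # GET /api/v3/files/upload_url, then POST there
    return "too_large"         # cannot be submitted at all
```

A 700MB installer would land in the "too_large" bucket, which matches the failures reported above for files of that size.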

@flcdrg
Member

flcdrg commented May 29, 2021

Adding another data point:
The Red Gate SQL Toolbelt download is 214MB. This version was automatically approved today, which would be because the au scripts are uploading the file to VirusTotal beforehand.

@numericalfreedom

Just found out in the vt-cli linked issue that the maximum upload size for that tool (and possibly the VirusTotal v3 API) is 650MB.

If a file larger than 650MB is uploaded to VT with vt-cli, a lengthy message appears at the end with an http:// URL that allows retrieving the hash via a POST request, so that the result of the malware test can be checked on VT with 'vt file sha256 ...'. The malware testing at VT seems to take some time.
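Because the analysis takes a while, any automated check would need to poll for the result instead of asking once. A minimal sketch, assuming a caller-supplied `lookup` function that returns a report for a sha256 or None while none exists yet; the function name, the retry count, and the delay are arbitrary illustrative choices.

```python
import time


def wait_for_report(file_hash, lookup, attempts=10, delay_seconds=60,
                    sleep=time.sleep):
    """Poll lookup(file_hash) until a report exists or attempts run out."""
    for attempt in range(attempts):
        report = lookup(file_hash)
        if report is not None:
            return report
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the scan service time to finish
    return None  # still no report; the caller decides what to do next
```

Returning None rather than raising leaves the decision (retry later, fall back to manual review) to the caller.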

@pauby
Member

pauby commented Jun 1, 2021

@numericalfreedom This isn't the correct place for this. Please add these requests to the package moderation comments on the package page.

@numericalfreedom

@pauby Sorry, I know. I have also done that in parallel. The comment is meant to remind the developers that all packages over about 200MB are currently suffering and await urgent remediation of this issue. There is no way the community can help here.

@TheCakeIsNaOH
Member

So, there are a relatively large number of packages with individual binaries over 200mb, but under 650mb. I probably use 15-20 of these personally.

Packages with files over 650mb are rarer (I use maybe two), and the ones that I know of tend not to be updated frequently.

Therefore, I think allowing the package scanner to upload binaries over 200mb would be fairly effective at reducing the number of packages needing to be moderated manually and reducing the queue. Maybe the current API setup would work and just increasing the limit on the scanner end would be enough. Perhaps there is a newer API call or endpoint to upgrade to. Or, if the API that the vt-cli uses is not publicly usable, perhaps you could contact VirusTotal to get access to it?

@pauby
Member

pauby commented Jun 2, 2021

@numericalfreedom I understand that, however that's what this issue represents. The requests for package moderation go on the package page itself. We have many priorities and this is something we will look at as soon as we can but, being realistic, it's not going to happen in the immediate future.

@TheCakeIsNaOH It was suggested above (I think by @flcdrg) that there is a v3 API that could be used to eliminate this limit.

@majkinetor

majkinetor commented Jun 3, 2021

I think the 200MB limit is too small in this day and age. It was introduced years ago, and there was also that gallery bug with larger packages.

So

  1. Years have passed, so the size should go up because, in the Electron age, todo apps can be 100MB+
  2. Packages with multiple architectures are particularly problematic, as the limit is shared among them.
  3. The workaround for a single architecture fitting in is workable although not nice; for multi-arch it is highly problematic because it boils down to multiple packages or non-embedded packages (which are next to junk, for me at least, and I don't use them except for a few tools like browsers)
  4. The gallery bug is fixed

I think around 500MB would be a good next limit. Even smaller changes would be beneficial, but not below 350MB IMO. Ideally, a value of 1 GB would be nice for the foreseeable future.

@gep13
Member

gep13 commented Jun 3, 2021

@majkinetor I think you are talking about a different issue here. If I am not mistaken, you are talking about the limit that is imposed on an embedded package that is being pushed to the Chocolatey Community Repository. This issue is talking about a limit on the size of the files that are being pushed to VirusTotal. These are not the same things.

I think what you are describing deserves its own issue.

@pauby
Member

pauby commented Jun 4, 2021

@gep13 That was my fault. I pointed @majkinetor to this issue. Apologies.

@rackerbenoit

It is an issue for these packages too. I confirmed with Octopus support. https://community.chocolatey.org/packages/OctopusDeploy

@flcdrg
Member

flcdrg commented Jan 26, 2022

I was talking to Octopus about this last year and mentioned the vt-cli workaround, which should work for them. I note they've got a different problem with https://community.chocolatey.org/packages/OctopusDeploy.Tentacle/6.1.1284#versionhistory at the moment though (unrelated to virus scanning)

@sync-by-unito sync-by-unito bot added Bug Tickets that represent defects/bugs. 2 - Working Tickets that are currently being worked on. and removed 0 - Backlog Where tickets start after being triaged. This means the ticket has targeted milestone/labels. labels Jun 17, 2022
@sync-by-unito sync-by-unito bot added 3 - Review This is for tickets that need to be reviewed prior to being complete. and removed 2 - Working Tickets that are currently being worked on. labels Jun 28, 2022
@sync-by-unito sync-by-unito bot closed this as completed Jul 14, 2022
@sync-by-unito sync-by-unito bot added 4 - Done Tickets that have been completed and are ready for release. and removed 3 - Review This is for tickets that need to be reviewed prior to being complete. labels Jul 14, 2022
@sync-by-unito sync-by-unito bot changed the title [package-scanner] Files bigger than 200MB Automatically exempt package versions that contain binaries that are too large to scan with Virus Total Sep 1, 2022
@sync-by-unito sync-by-unito bot added 5 - Released The area addressed in the ticket has been released in the product and is generally available. and removed 4 - Done Tickets that have been completed and are ready for release. labels Sep 1, 2022
10 participants