
Hotfix: Virus Scanning - Not functioning PROD (Test as well) #3825

Closed
CarlyCotton opened this issue Oct 23, 2024 · 1 comment
Labels: Bug (Something isn't working) · Complex (More advanced issue and needs special attention)

Comments

CarlyCotton commented Oct 23, 2024

Issue Description
File uploads are not being scanned; they are hanging in the pending state.

Acceptance Criteria

  • Investigate and resolve.
  • Test with small files and with files close to the 15 MB limit.

Logs from PROD:
[screenshot of PROD logs]

Additional Context

  • Please note we are seeing similar behavior in the Test environment as well.
  • Early investigation suggests that a timeout is occurring on large files (>1 MB).
  • Consider tuning resources to reduce resource spikes.
  • Consider increasing the connection timeout and the DB poll frequency.
  • The connection timeout is currently set to 30 seconds.
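
For reference, a minimal sketch of where these values live when initializing the `clamscan` package; the host, port, and timeout shown are assumptions, not the project's actual configuration:

```ts
import NodeClam from "clamscan";

// Hypothetical values; the real host/port come from the deployment config.
async function initScanner() {
  return new NodeClam().init({
    clamdscan: {
      host: "clamav-service", // assumed service host name
      port: 3310,             // ClamAV's default TCP port
      timeout: 30000,         // the current 30 s connection timeout, in milliseconds
    },
  });
}
```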
@CarlyCotton CarlyCotton added the Business Items under Business Consideration label Oct 23, 2024
@ninosamson ninosamson added the Bug Something isn't working label Oct 23, 2024
@Joshua-Lakusta Joshua-Lakusta changed the title Virus Scanning - Not functioning PROD Virus Scanning - Not functioning PROD (Test as well) Oct 23, 2024
@ninosamson ninosamson added Complex More advanced issue and needs special attention. Dev & Architecture Development and Architecture and removed Business Items under Business Consideration labels Oct 24, 2024
@andrewsignori-aot andrewsignori-aot removed the Dev & Architecture Development and Architecture label Oct 24, 2024
@andrewsignori-aot andrewsignori-aot self-assigned this Oct 28, 2024
github-merge-queue bot pushed a commit that referenced this issue Oct 30, 2024
The error was reproducible on a local development machine using 17 KB
files. Any file larger than 16 KB causes the error, which indicates that
the default chunk size allocated for the stream buffer was exceeded.

Readable streams have two reading modes: [flowing and
paused](https://nodejs.org/api/stream.html#two-reading-modes).
The connection to the `clamav` server was open but was not receiving data
for certain files, which led to the timeout. Adding
`passthroughStream.resume();` resulted in the data being sent immediately.
The `passthroughStream.readableFlowing` property was `null`, the first of
the three possible states ([source](https://nodejs.org/api/stream.html#three-states)),
which means no mechanism for consuming the stream's data was provided:
- readable.readableFlowing === null
- readable.readableFlowing === false
- readable.readableFlowing === true

In the passthrough example from the `clamscan` package, the
`passthrough` stream is piped to an output stream, which also resumes the
stream, as mentioned below.

> All [Readable](https://nodejs.org/api/stream.html#class-streamreadable)
> streams begin in paused mode but can be switched to flowing mode in one
> of the following ways:
>
> - Adding a ['data'](https://nodejs.org/api/stream.html#event-data) event handler.
> - Calling the [stream.resume()](https://nodejs.org/api/stream.html#readableresume) method.
> - Calling the [stream.pipe()](https://nodejs.org/api/stream.html#readablepipedestination-options) method to send the data to a [Writable](https://nodejs.org/api/stream.html#class-streamwritable).
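
A minimal sketch of the behavior described above, using generic Node.js streams rather than the actual queue-consumers code; the passthrough stays paused until something consumes it, and `resume()` switches it to flowing mode:

```ts
import { PassThrough, Readable } from "stream";

// Simulate an upload larger than the 16 KB default buffer being piped
// into the scanner's passthrough stream.
const passthroughStream = new PassThrough();
const fileStream = Readable.from([Buffer.alloc(17 * 1024)]);
fileStream.pipe(passthroughStream);

console.log(passthroughStream.readableFlowing); // null: paused, no consumer attached

// The fix: switch the stream to flowing mode so the buffered data is
// pushed out immediately instead of stalling once the buffer fills.
passthroughStream.resume();
console.log(passthroughStream.readableFlowing); // true: flowing
```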

Local test results:
 - a 4MB file is processed in less than 3 seconds.
- a 15 MB file is processed in approximately 15 seconds.
- queued 50 files of 15 MB and it had a total processing time of 15
minutes with no errors.

No timeouts were adjusted in this PR, and we may not adjust any if the
tests on OpenShift go well for this change.
The target of this change is to fix the current production issue only.
@ninosamson ninosamson changed the title Virus Scanning - Not functioning PROD (Test as well) Hotfix: Virus Scanning - Not functioning PROD (Test as well) Oct 30, 2024
dheepak-aot pushed a commit that referenced this issue Oct 30, 2024
dheepak-aot added a commit that referenced this issue Oct 30, 2024
## Cherry pick commits

The following commits are cherry-picked and committed into the
`file-integration-fix/cherry-pick-branch` branch from the `main` branch.

- #3833 - SIN Validation - Gender Field Limit - SHA: 8040205
- #3831 - SINF restriction bridge mapping and rename to SINR - SHA: 3ed8c2e
- #3825 - Virus Scanning - Not functioning PROD - SHA: 5bb5050
- #3745 - Modify process that reads SIN & CRA verification response files - SHA: 20fcda4
andrewsignori-aot added a commit that referenced this issue Oct 31, 2024
When a socket is opened to send file content to `clamav` using the
passthrough stream and no content is sent, the connection stays open for
an indefinite amount of time (over 5 minutes).
Changed queue-consumers to throw an error and remove the job, **the same
behavior as when the file is not found**.

_We can have a separate ticket to apply validation to the file upload
to avoid accepting empty files, but I considered it outside this hotfix
issue._
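
A sketch of the new queue-consumer behavior; the job shape and helper signatures are placeholders, not the repo's actual API:

```ts
// Placeholder types standing in for the real queue and file models.
interface ScanJob {
  data: { fileName: string };
  discard(): Promise<void>; // Bull-style: prevent further retries
}
interface StoredFile {
  fileName: string;
  fileContent: Buffer;
}

async function processVirusScanJob(
  job: ScanJob,
  getFile: (name: string) => Promise<StoredFile | null>,
  virusScan: (file: StoredFile) => Promise<void>,
): Promise<void> {
  const file = await getFile(job.data.fileName);
  // An empty payload leaves the clamav socket open indefinitely, so fail
  // fast and remove the job (same behavior as when the file is not found).
  if (!file || file.fileContent.length === 0) {
    await job.discard();
    throw new Error(`File ${job.data.fileName} is missing or empty; nothing to scan.`);
  }
  await virusScan(file);
}
```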

ClamAV debug log when an empty file is sent for scanning.
```
2024-10-31 08:32:55 node-clam: Initially testing socket/tcp connection to clamscan server.
2024-10-31 08:32:55 node-clam: Attempting to establish socket/TCP connection for "_ping"
2024-10-31 08:32:55 node-clam: using remote server: 172.18.0.6:3310
2024-10-31 08:32:55 node-clam: Established connection to clamscan server!
2024-10-31 08:32:55 node-clam: PONG!
2024-10-31 08:32:55 node-clam: Established connection to clamscan server!
2024-10-31 08:32:55 node-clam: Done with the full pipeline.
2024-10-31 08:32:55 node-clam: Socket/Host connection closed.
2024-10-31 08:33:00 node-clam: ClamAV has been scanning for 5 seconds...
...
2024-10-31 08:35:25 node-clam: ClamAV has been scanning for 150 seconds...
...
2024-10-31 08:45:30 node-clam: ClamAV has been scanning for 755 seconds...
...
```

andrewsignori-aot commented Nov 1, 2024

@Joshua-Lakusta @CarlyCotton, while testing this (now or in the future), please ensure that files with content greater than 16 KB are used. Files smaller than 16 KB would have worked even before this fix, which would lead to a false positive.
During dev testing, we used files of 15 KB, 17 KB, 4 MB, and 15 MB.

Files with no content will not be scanned right now, as per the conversation in the Teams chat.

We are accepting file uploads even if there is no content at all in the file (zero bytes). This can cause an issue in the virus scan (because there is nothing to be scanned), and the questions are:

- Should we prevent the user from uploading empty files, assuming they are empty by mistake?
- Should the system accept files with no content and assume they will not be scanned for viruses (there is nothing to be scanned)?

The problem is only mitigated at the virus scan stage in the hotfix: an empty file, if sent to the virus scan, will be rejected. Hence the idea would be to take one of the two actions mentioned above to prevent the file from being queued for virus scanning. Does it make sense?
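
If the first option is chosen, a minimal sketch of rejecting empty uploads at the API boundary; the `UploadedFile` shape is an assumption about the upload handler, not the project's actual code:

```ts
// Hypothetical upload guard: reject zero-byte files before they are
// persisted and queued for virus scanning.
interface UploadedFile {
  originalname: string;
  size: number; // size in bytes, as reported by the upload middleware
}

function assertNonEmptyUpload(file: UploadedFile): void {
  if (file.size === 0) {
    // A real handler would return an HTTP 400/422 instead of a bare Error.
    throw new Error(`File ${file.originalname} is empty and cannot be scanned.`);
  }
}
```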
