Unzip Federal Restrictions File #3869

Closed
4 of 6 tasks
ninosamson opened this issue Oct 31, 2024 · 2 comments

Comments

@ninosamson
Collaborator

ninosamson commented Oct 31, 2024

User Story
The federal restrictions file is delivered to SIMS as a zip file. SIMS needs to unzip it and then import the .txt file inside. The current import does not expect a zipped file.

Acceptance Criteria

  • Unzip the federal restrictions file when it arrives in SIMS from the SFTP so the .TXT file can be processed.
  • Archive the zip file during this process.

Technical

  • Only one file is expected per zip file.
  • Use the out-of-box node lib.
  • If possible, add zip decompression to the common method that reads the files, so any zip file can be decompressed.
  • Nice to have: create an E2E test that uses a mocked zip file.
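
For the single-file case in the notes above, a stdlib-only reader could look roughly like the sketch below. Node's built-in `zlib` can inflate raw deflate data but does not parse the ZIP container, so the local file header is read by hand here. The names `readFirstZipEntry` and `buildMockZip` are hypothetical. The sketch handles only stored and deflated entries and rejects archives that use a trailing data descriptor. The mock builder omits the CRC and the central directory, so its output is readable by this sketch only, not by general zip tools.

```typescript
import { deflateRawSync, inflateRawSync } from "node:zlib";

const LOCAL_HEADER_SIGNATURE = 0x04034b50; // bytes "PK\x03\x04", little-endian
const LOCAL_HEADER_SIZE = 30; // fixed part of the local file header
const STORED = 0;
const DEFLATED = 8;

interface ZipEntry {
  name: string;
  data: Buffer;
}

// Reads the first entry of a ZIP archive from its local file header.
// Assumption: one file per archive, sizes present in the local header.
function readFirstZipEntry(archive: Buffer): ZipEntry {
  if (
    archive.length < LOCAL_HEADER_SIZE ||
    archive.readUInt32LE(0) !== LOCAL_HEADER_SIGNATURE
  ) {
    throw new Error("Not a ZIP archive: local file header not found.");
  }
  const flags = archive.readUInt16LE(6);
  if (flags & 0x0008) {
    // Sizes live in a trailing data descriptor; not handled in this sketch.
    throw new Error("Archives using a data descriptor are not supported.");
  }
  const method = archive.readUInt16LE(8);
  const compressedSize = archive.readUInt32LE(18);
  const nameLength = archive.readUInt16LE(26);
  const extraLength = archive.readUInt16LE(28);
  const dataStart = LOCAL_HEADER_SIZE + nameLength + extraLength;
  if (dataStart + compressedSize > archive.length) {
    throw new Error("Truncated ZIP archive.");
  }
  const name = archive.toString(
    "utf8",
    LOCAL_HEADER_SIZE,
    LOCAL_HEADER_SIZE + nameLength,
  );
  const raw = archive.subarray(dataStart, dataStart + compressedSize);
  if (method === STORED) {
    return { name, data: Buffer.from(raw) };
  }
  if (method === DEFLATED) {
    return { name, data: inflateRawSync(raw) };
  }
  throw new Error(`Unsupported compression method: ${method}`);
}

// Test fixture only: builds a header plus deflated data, with no CRC
// and no central directory.
function buildMockZip(fileName: string, content: Buffer): Buffer {
  const name = Buffer.from(fileName, "utf8");
  const compressed = deflateRawSync(content);
  const header = Buffer.alloc(LOCAL_HEADER_SIZE); // zero-filled
  header.writeUInt32LE(LOCAL_HEADER_SIGNATURE, 0);
  header.writeUInt16LE(20, 4); // version needed to extract (2.0)
  header.writeUInt16LE(DEFLATED, 8);
  header.writeUInt32LE(compressed.length, 18);
  header.writeUInt32LE(content.length, 22);
  header.writeUInt16LE(name.length, 26);
  return Buffer.concat([header, name, compressed]);
}
```

A mocked archive for a test could then be produced with `buildMockZip("restrictions.txt", Buffer.from("..."))` and passed to `readFirstZipEntry`. A maintained library is the safer choice for real archives, since it also covers the central directory, CRC checks and data descriptors.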
@ninosamson ninosamson self-assigned this Oct 31, 2024
@ninosamson ninosamson changed the title Unzip Federal Restrictions File Hotfix: Unzip Federal Restrictions File Oct 31, 2024
@ninosamson ninosamson changed the title Hotfix: Unzip Federal Restrictions File HOTFIX: Unzip Federal Restrictions File Oct 31, 2024
@ninosamson ninosamson changed the title HOTFIX: Unzip Federal Restrictions File Unzip Federal Restrictions File Oct 31, 2024
@ninosamson ninosamson added the Business Items under Business Consideration label Oct 31, 2024
@CarlyCotton CarlyCotton self-assigned this Oct 31, 2024
@CarlyCotton
Collaborator

Just a note for me to grab a sample and update ACs to be more fulsome

@Joshua-Lakusta Joshua-Lakusta added Dev & Architecture Development and Architecture and removed Business Items under Business Consideration labels Nov 5, 2024
@andrewsignori-aot andrewsignori-aot removed the Dev & Architecture Development and Architecture label Nov 14, 2024
@sh16011993 sh16011993 self-assigned this Dec 3, 2024
@ninosamson ninosamson assigned dheepak-aot and unassigned sh16011993 Dec 12, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 21, 2024
## Federal Restrictions - Unzip ZIP files coming from SFTP

- [x] Adjusted the existing regex to match federal restrictions files
with a `.zip` or `.ZIP` extension.
- [x] Used the [adm-zip](https://github.com/cthackers/adm-zip) library to
extract the `.zip` file.
- [x] Updated the `sftp-integration-base` to use encoding `null` only
when reading the compressed file, to avoid data corruption.
- [x] For the federal restrictions integration, the first file inside
the downloaded archive is processed, on the assumption that the `.zip`
archive always contains exactly one file.
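
The first and third items can be illustrated with a minimal sketch. The pattern below is hypothetical (the actual regex and file-name prefix are not shown in this thread) and only demonstrates case-insensitive extension matching. The second helper shows why a compressed file has to be kept as raw bytes: decoding binary data as UTF-8 text and encoding it again does not give back the same bytes.

```typescript
// Hypothetical pattern: matches names ending in .zip or .ZIP.
const ZIP_FILE_PATTERN = /\.zip$/i;

function isZipFile(fileName: string): boolean {
  return ZIP_FILE_PATTERN.test(fileName);
}

// Returns true when the bytes are unchanged after being decoded as UTF-8
// text and encoded again. Invalid UTF-8 sequences, common in compressed
// data, are replaced during decoding, so the round trip alters them.
function survivesUtf8RoundTrip(bytes: Buffer): boolean {
  return Buffer.from(bytes.toString("utf8"), "utf8").equals(bytes);
}

// ZIP signature followed by two bytes that are not valid UTF-8.
const binarySample = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe]);
const textSample = Buffer.from("plain text record", "utf8");
```

Here `survivesUtf8RoundTrip(binarySample)` is false while `survivesUtf8RoundTrip(textSample)` is true, which is the kind of corruption that reading the archive without a text encoding avoids.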

## Technical Investigations and performance findings

### APPROACH 1

Based on the documentation and on testing, the Node.js built-in
`zlib` module can compress and extract gzip (.gz) files.
It cannot perform the same operations on .zip files.
Extracting .zip with zlib gunzip (not supported)


![image](https://github.com/user-attachments/assets/4ce66725-8ec0-4377-8983-cc424e0e9e19)


![image](https://github.com/user-attachments/assets/12b4d6c9-e17e-4f65-97cf-22843520e191)

Extracting .gz with Zlib Gunzip (Works Perfectly)


![image](https://github.com/user-attachments/assets/3b485ac8-5c36-4457-9034-fc2ed083316b)
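
The behaviour in the screenshots can be reproduced with a short sketch using only the built-in module. The helper name `tryGunzip` and the sample data are assumptions made for illustration, not the code that was tested. The second buffer is not a full archive, only the first bytes a `.zip` file starts with, which is enough for gunzip to reject it.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Returns the decompressed bytes, or null when the input is not gzip data.
function tryGunzip(input: Buffer): Buffer | null {
  try {
    return gunzipSync(input);
  } catch {
    return null;
  }
}

const original = Buffer.from("sample restrictions record\n", "utf8");

// A real gzip stream: starts with the gzip magic bytes 1f 8b.
const gzipData = gzipSync(original);

// The start of a ZIP local file header ("PK\x03\x04"), which is a
// different container format from gzip.
const zipLikeData = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00]);

const fromGzip = tryGunzip(gzipData); // decompressed bytes
const fromZip = tryGunzip(zipLikeData); // null, gunzip rejects the header
```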

### APPROACH 2 - Third party library

https://github.com/cthackers/adm-zip 

Tested code (not the final code)


![image](https://github.com/user-attachments/assets/e815c661-2c8d-411a-9813-e4649ddfc47c)

It also provides a non-blocking method to read data (`getDataAsync`).
 
It works perfectly.


![image](https://github.com/user-attachments/assets/3c37594a-b719-43d0-a0de-e35f77fd6c14)

Tested the upload with a 139 MB file containing around 140,000 records.
The library took 666 ms to read the file.

![image](https://github.com/user-attachments/assets/049965e3-2d38-4162-9c4f-7569b1384e33)
github-merge-queue bot pushed a commit that referenced this issue Dec 27, 2024
…obs with more resource demand (#4184)

## Issue

While running the federal restrictions file with a bulk volume of data
(139 MB file), we noticed that resource usage went beyond the maximum
limits for both CPU and memory.


![image](https://github.com/user-attachments/assets/6fa5174b-d2d2-41e7-9531-296f78f390e7)


![image](https://github.com/user-attachments/assets/ebf10fea-5a8c-40d2-9d4e-32272b592011)

## Solution

To meet the demand of jobs that require more resources (currently
federal restrictions is the only such job), the CPU and memory limits
and requests have been increased slightly.

## Outcome (Federal Restrictions file 139MB)


![image](https://github.com/user-attachments/assets/7d9be43a-f1d9-4b34-8eb8-8472af561555)
@dheepak-aot
Collaborator

While running the federal restrictions file with a bulk volume of data (139 MB file), we noticed that resource usage went beyond the maximum limits for both CPU and memory.

image.png

image.png

To meet the demand of jobs that require more resources (currently federal restrictions is the only such job), the CPU and memory limits and requests have been increased slightly.

image.png
