
API: Extremely Poor Docker Resource Utilization Efficiency #2730


Closed
palisadoes opened this issue Dec 2, 2024 · 158 comments
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@palisadoes
Contributor

Describe the bug

We run a demonstration instance of Talawa-API on a GoDaddy VPS server running Ubuntu. It has the following resources:

  1. 1 core
  2. 2 GB of RAM
  3. 40 GB of disk

Other information:

  1. The demo instance is intended to create an evaluation environment for new GitHub contributors and users alike as they decide whether to use Talawa. The DB of the demo instance gets reset every day.
  2. Talawa API runs natively on this VPS server with acceptable performance for one user. The load average is approximately 1, which is the target value for a system with only 1 core.
  3. When Talawa API runs on the server using Docker, the load average reaches 130 and the swap process is the top CPU consumer. The system is so overloaded that only one SSH session at a time is achievable.

The purpose of this issue is to find ways to tune all Talawa-API Dockerfile and app configurations to lower its CPU and RAM utilization by at least 75%.

  1. With the current Docker performance, very few developers or end users will want to try Talawa themselves.
  2. This has been a recurring issue with Talawa API. The poor performance threatens the success of our current MongoDB-based MVP.

To Reproduce
Steps to reproduce the behavior:

  1. Run Talawa-API on a system
  2. See excessive resource utilization

Expected behavior

  1. Acceptable usage information such that it can run easily on a mid-range laptop without impacting its performance

Actual behavior

  1. Poor performance

Screenshots

image

Additional details
Add any other context or screenshots about the feature request here.

Potential internship candidates

Please read this if you are planning to apply for a Palisadoes Foundation internship

@palisadoes palisadoes added the bug Something isn't working label Dec 2, 2024
@github-actions github-actions bot added feature request unapproved Unapproved for Pull Request labels Dec 2, 2024
@prayanshchh
Contributor

Can you please assign me? I want to work on this issue, but I will need guidance.

@varshith257
Member

This is mostly related to reducing the Docker image size.

@varshith257 varshith257 removed the unapproved Unapproved for Pull Request label Dec 3, 2024
@palisadoes palisadoes changed the title Extremely Poor Docker Resource Utilization Efficiency API: Extremely Poor Docker Resource Utilization Efficiency Dec 4, 2024
@palisadoes palisadoes added good first issue Good for newcomers and removed feature request labels Dec 4, 2024
@prayanshchh
Contributor

prayanshchh commented Dec 6, 2024

Different ways to approach this issue:

1. Multi-Stage Builds
Using a multi-stage build can help separate the build and runtime environments, ensuring that only production-ready artifacts are included in the final image. This can be achieved by:

  • Installing dependencies and building the application in the first stage.
  • Copying only the necessary files (e.g., dist, node_modules) into a minimal runtime stage.

2. Optimizing Base Images
Switching to optimized base images can dramatically reduce size:

Baseline Image (Full Node.js): ~900 MB
Using Multi-Stage with Slim: ~400–500 MB
Using Multi-Stage with Alpine: ~250–300 MB
With Distroless: ~150–200 MB

3. Using Compression Tools
Tools like docker-slim can further compress the final image by analyzing and stripping unused dependencies and files:
With docker-slim: ~100–150 MB.

Please suggest a method that doesn't impact compatibility with the codebase.
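For concreteness, approaches 1 and 2 above can be combined in one Dockerfile. This is only a sketch assuming a typical Node.js layout; the build script, `dist/` output path, and entry file are illustrative, not necessarily Talawa-API's actual ones:

```dockerfile
# Stage 1: build — full toolchain available, discarded from the final image
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build            # assumed to emit compiled output into dist/

# Stage 2: runtime — only production artifacts are copied in
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev        # runtime dependencies only, no devDependencies
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```

From there, a tool like docker-slim (approach 3) can usually shrink the result further (e.g. `docker-slim build <image>`, where `<image>` is the tag built above), though the slimmed image needs testing to confirm nothing required was stripped.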

@palisadoes
Contributor Author

@prayanshchh

Please investigate the best solution and propose it after testing on your system. It's not just RAM, but also ways to reduce the CPU overhead.

@prayanshchh
Contributor

alright sir

@vasujain275
Contributor

@palisadoes @prayanshchh

The main problem I found with the API is that we have to run it in dev mode in the production Docker environment, because our build process for the Talawa API is broken, so we can't use npm run start. If we resolve the build issue, we can drastically improve the performance and security of the Docker container.

I think @varshith257 also tried to solve the build process issue a few months back; any updates on that?

@palisadoes
Contributor Author

Would this PR by @adithyanotfound provide any insights?

@palisadoes
Contributor Author

@vasujain275 Why do you say the build process is broken? Can you create an issue for someone else to try to fix it?

@prayanshchh
Contributor

Would this PR by @adithyanotfound provide any insights?

Yes, this helps. I will start my work on this in two days; I have end-semester exams.

@prayanshchh
Contributor

I am unassigning myself from the issue due to lack of progress.

@prayanshchh prayanshchh removed their assignment Dec 14, 2024
@PurnenduMIshra129th

@palisadoes please assign me.

@PurnenduMIshra129th

@palisadoes What is the load average when the API runs without Docker, i.e. what is its performance? I need this because I want to focus only on improving the Docker performance. If Docker is not the cause, I will have to use a profiler to find the exact issue: is it related to the Docker container, or to unoptimized queries in the code?

@PurnenduMIshra129th

PurnenduMIshra129th commented Dec 17, 2024

@palisadoes For now I have limited its CPU and memory usage, added a multi-stage build, and used a lightweight base image. But I think this will only handle up to a certain number of users. To handle load effectively, can I use Kubernetes or another service, so that pods scale up when load increases, reducing CPU usage and improving performance? If not, does the VPS where the container is hosted provide such a mechanism? One more doubt: how do I put more load on this API, given that at testing time I am the only user?
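For reference, CPU and memory caps of the kind described above can be declared directly in the compose file rather than on the `docker run` command line. The service name, image, and values below are illustrative, not Talawa-API's actual configuration:

```yaml
services:
  api:
    image: talawa-api          # illustrative service/image names
    deploy:
      resources:
        limits:
          cpus: "1.0"          # cap the container at one core's worth of CPU
          memory: 512M         # container is OOM-killed if it exceeds this
```

Recent versions of `docker compose up` honor `deploy.resources.limits` outside Swarm; older Compose setups may need the legacy `cpus:` / `mem_limit:` keys instead.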

@vasujain275
Contributor

vasujain275 commented Dec 17, 2024

@palisadoes For now I have limited its CPU and memory usage, added a multi-stage build, and used a lightweight base image. But I think this will only handle up to a certain number of users. To handle load effectively, can I use Kubernetes or another service, so that pods scale up when load increases, reducing CPU usage and improving performance? If not, does the VPS where the container is hosted provide such a mechanism?

  1. We don't need k8s.
  2. Multi-stage builds and a lightweight base image will not help; we already have multi-stage builds with Alpine images. The main issue is our build process.
  3. @palisadoes Due to my end-semester exams I am not able to create that GraphQL build error issue, which is the main performance blocker here, right now. I will get to it in 2–3 days once my exams end. Sorry for the delay.
  4. I think we should close the Docker-performance-related issues, as they create unnecessary confusion. Our Docker images are well optimised. The main issue is that we are running our API in dev mode inside them; once the build is fixed we can modify the Dockerfiles and see the performance improvements.

@PurnenduMIshra129th

PurnenduMIshra129th commented Dec 17, 2024

I don't fully understand what you mean by a build-related issue. Are you saying that unnecessary node modules or something similar end up in the build when the Docker image is built, and that these are causing the issue? I need further clarity. Also, you commented above that you are not able to run npm run start, but it is working fine for me because the API service starts.

@palisadoes
Contributor Author

@palisadoes For now I have limited its CPU and memory usage, added a multi-stage build, and used a lightweight base image. But I think this will only handle up to a certain number of users. To handle load effectively, can I use Kubernetes or another service, so that pods scale up when load increases, reducing CPU usage and improving performance? If not, does the VPS where the container is hosted provide such a mechanism?

1. We don't need k8s.

2. Multi-stage builds and a lightweight base image will not help; we already have multi-stage builds with Alpine images. The main issue is our build process.

3. @palisadoes Due to my end-semester exams I am not able to create that GraphQL build error issue, which is the main performance blocker here, right now. I will get to it in 2–3 days once my exams end. Sorry for the delay.

4. I think we should close the Docker-performance-related issues, as they create unnecessary confusion. Our Docker images are well optimised. The main issue is that we are running our API in dev mode inside them; once the build is fixed we can modify the Dockerfiles and see the performance improvements.

OK.

@PurnenduMIshra129th

@palisadoes I ran a load test on the server, with and without Docker, configured for a duration of 30 seconds at 2 req/sec, i.e. 60 requests in total. In that scenario both had an equal success rate. When I ran the same test for the same duration at a higher rate, 5 req/sec (150 requests in 30 seconds), the server performed slightly better without Docker. However, the server can't handle 150 requests in 30 seconds: many requests remained in processing and never completed, and only 40 were successful. If you want to run Docker on a low-end server for a small user base (say 50 to 60 requests per 60 seconds, on a mid-range device with 4 GB of RAM and 4 cores), it will handle the requests easily if Talawa-API reduces its excessive CPU work; even if we also limit CPU usage it will cope, though with some slowness. What do you say?

@palisadoes
Contributor Author

@PurnenduMIshra129th please coordinate with @vasujain275

There appear to be multiple causes. The application is clearly overusing resources.

Here is additional information.

@PurnenduMIshra129th

@vasujain275 Yes, you are correct, the build process is broken. After the build it does not work properly. Also, when I try to run npm run prod it fails with multiple errors. Do you have any thoughts on this? Should we use import instead of require?

@palisadoes
Contributor Author

Here is the issue. The importation process must have no errors no matter how many times it's run. The data must be valid.

@palisadoes
Contributor Author

Can someone open issues to fix the localhost references in the Admin app that aren't in a test or Docker file, so that we don't get the localhost-related errors in the web console? This is what's there:

src/index.tsx://     url: 'ws://localhost:4000/subscriptions',
src/components/AddOn/support/services/Plugin.helper.ts:    const result = await fetch(`http://localhost:${process.env.PORT}/store`);
src/components/AddOn/support/services/Plugin.helper.ts:    const result = await fetch(`http://localhost:3005/installed`);
src/App.tsx:    const result = await fetch(`http://localhost:3005/installed`);
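One way to eliminate hardcoded hosts like these is to resolve endpoints through a single helper fed by configuration. A minimal sketch, assuming the configured base URL would come from the env file that the setup script generates (the parameter and default shown here are illustrative, not Talawa-Admin's actual config):

```typescript
// Build backend URLs from configuration instead of hardcoding hosts.
// In the real app `base` would come from env config written by the setup
// script; localhost remains only as a local-development fallback.
function endpoint(path: string, base?: string): string {
  return `${base ?? "http://localhost:4000"}${path}`;
}

// Call sites then stop embedding hosts, e.g.:
//   fetch(endpoint("/installed"))
//   fetch(endpoint("/store", configuredApiBase))
```

This keeps every host reference in one place, so switching from localhost to a deployed URL is a configuration change rather than a code edit.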

@palisadoes
Contributor Author

How can we make the home page load faster?

@palisadoes
Contributor Author

There is a serious flaw in the design of Talawa Admin.

  1. GraphQL queries are made from the client directly to the API
  2. In some cases they are made through the admin server

This can cause XSS errors. It's also not clear how authorization is handled in this heterogeneous scenario.

To get around the issue, the API and Admin servers need to run on the same IP address. This is not practical. The apps will often reside on separate devices.

The solution seems to be only allowing connections to the API from the Admin app and not web clients. Web clients should only talk to the web app.

Is this a best practice? What is advisable?
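One common pattern for the restriction described above is to bind the API to loopback and put a reverse proxy in front of it that only accepts traffic from the Admin host. A hedged nginx sketch, where the addresses and ports are placeholders, not the project's real topology:

```nginx
# Only the Admin server may reach the API; browsers and other clients
# are refused before the request ever reaches the application.
server {
    listen 4000;
    allow 10.0.0.5;      # placeholder: the Admin server's address
    deny  all;           # everyone else receives 403

    location / {
        proxy_pass http://127.0.0.1:4001;   # API bound to loopback only
    }
}
```

Whether this is appropriate depends on whether the Admin app proxies all GraphQL traffic; if web clients must still query the API directly, an origin allowlist plus proper authorization checks is the usual alternative.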

@adithyanotfound

How can we make the home page load faster?

Maybe we can use lazy loading.

@CHIRANTH-24

Can someone open issues to fix the localhost references in the Admin app that aren't in a test or Docker file, so that we don't get the localhost-related errors in the web console? This is what's there:

src/index.tsx://     url: 'ws://localhost:4000/subscriptions',
src/components/AddOn/support/services/Plugin.helper.ts:    const result = await fetch(`http://localhost:${process.env.PORT}/store`);
src/components/AddOn/support/services/Plugin.helper.ts:    const result = await fetch(`http://localhost:3005/installed`);
src/App.tsx:    const result = await fetch(`http://localhost:3005/installed`);

@palisadoes
I would like to open the issue.
The env is for production, right?

@palisadoes
Contributor Author

palisadoes commented Feb 13, 2025

@CHIRANTH-24

  1. This isn't related to docker at all.
  2. We need to ensure that these files in the develop and develop-postgres branches of Talawa-Admin don't reference localhost.
  3. The setup script and .env.sample must not be updated. This is handled by the setup script.

@PurnenduMIshra129th

@palisadoes The MongoDB data importation error can be solved. I solved that issue previously; it was related to not being able to connect to Mongo from outside the Docker environment. I will make a PR for this issue again.

@PurnenduMIshra129th

@palisadoes @vasujain275 @varshith257 Can you explain why there are so many entrypoint steps for initializing Mongo inside the compose file? Are they all really necessary? In my case they always produce errors. If you tell me what our exact requirements are, I can try a different approach; these entrypoints and the init-mongo.sh script are causing issues and are very confusing. We could adjust the coding style to make them work correctly. Also, for the replica-set feature, Docker provides some commands that would be really helpful going forward.

@PurnenduMIshra129th

@palisadoes It is failing due to Windows-style (CRLF) line endings, which are not supported on Unix; we have to fix this. I found a solution: before running the script, we have to convert it to a compatible (LF) version.
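The conversion described above is a one-liner. A minimal sketch, using `init-mongo.sh` as a stand-in for whichever script fails (`dos2unix init-mongo.sh` does the same thing where that tool is installed):

```shell
# Simulate a script saved on Windows: each line ends in CRLF.
printf 'echo hello\r\n' > init-mongo.sh

# Strip the trailing carriage returns so Unix shells can run it.
sed -i 's/\r$//' init-mongo.sh

# Now runs without '\r'-related "command not found" errors.
sh init-mongo.sh
```

Adding a `.gitattributes` rule such as `*.sh text eol=lf` prevents the problem from recurring for contributors on Windows.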

@palisadoes
Contributor Author

@PurnenduMIshra129th

Please work with @prayanshchh and @VanshikaSabharwal on the solution via slack.

@VanshikaSabharwal

Yes @palisadoes, I saw the two errors below and solved them:

  • The API cannot be accessed because we have to use docker-compose.dev.yaml for the API and docker-compose.prod.yaml for Admin. So I updated app.tsx of the API to authorize all origins.

Previous Error while accessing http://localhost:4000:
Image

After:
Image
Image

Before and After Code(talawa-api) -> app.tsx:

Image

  • This code is temporary and we will have to update it due to security concerns later.

  • The next thing I did was to update both Docker files of the API and Admin, adding some code that previously failed with this error due to improper formatting and linting issues:

Previous Error:
services.app.networks.talawa-shared-network Additional property external is not allowed

Before and After Code(talawa-api):

Image

Image

Before and After Code(talawa-admin):

Image

Result:

Image

  • Because I successfully made a network between the API and Admin, they can now communicate with each other and are accessible. Before, they were not.

Previous Error while trying to access api from admin:

Image
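Since allowing all origins is flagged above as a temporary measure, the eventual fix is usually an explicit origin allowlist. A sketch of the decision logic in plain TypeScript, shaped like the origin check that CORS middleware such as Express's cors package accepts; the listed origins are placeholders, not the project's real hosts:

```typescript
// Placeholder allowlist; in practice this would come from configuration.
const allowedOrigins = new Set([
  "http://localhost:4321",       // Admin app in local development (hypothetical port)
  "https://admin.example.com",   // deployed Admin app (hypothetical host)
]);

// Decide whether a request's Origin header should be allowed.
// Requests with no Origin (curl, server-to-server) are permitted here;
// tighten that if the API should only ever be reached by browsers.
function isOriginAllowed(origin: string | undefined): boolean {
  if (origin === undefined) return true;
  return allowedOrigins.has(origin);
}
```

This keeps the dev and prod compose setups working without opening the API to every origin on the internet.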

@VanshikaSabharwal

@palisadoes @prayanshchh I also found that when I went inside the Docker Mongo image, the replica set was created and the database names were imported, but not the data from the sample data. The data inside venues is empty for some reason.

Image
Image

@VanshikaSabharwal

@palisadoes
Conclusion:

  • Work Done

    • Both Docker files, for the API and Admin, run fine and are accessible both separately and together.
    • A connection was made between the API and Admin by updating the Docker compose file.
  • To Do

    • Data is not actually getting imported into MongoDB even after creation of the replica set; only the database names are created, and those databases are empty. This is why we see the User not found error: the data needed to log in/sign up does not exist.
    • We are using docker-compose.dev.yaml for the API and docker-compose.prod.yaml for Admin, which is why we see the Unauthorized access error. We should use either .prod or .dev for both to maintain consistency; if we continue to use .prod on one and .dev on the other, several errors will occur in the future.

@VanshikaSabharwal

@palisadoes If we can fix the data import, which I think @prayanshchh is working on somewhere, and settle on a consistent Docker compose file, either .dev or .prod, then we can proceed with deployment and see whether any more errors appear.

@prayanshchh
Contributor

prayanshchh commented Feb 14, 2025

  1. @VanshikaSabharwal I am working on the data import, but even with the present config you can't import twice, as it shows a duplicate key error. If you do it the first time, it works normally. This won't happen once my PR gets merged, but for now it still shouldn't be a problem for you.

  2. Did you try using the Docker dev file for both the API and Admin? We still have issues; I shared the issue link on Slack for reference.

@PurnenduMIshra129th

@prayanshchh @VanshikaSabharwal Is the data import problem due to the Windows line-ending style? And do you need help with any part?

@prayanshchh
Contributor

@prayanshchh @VanshikaSabharwal Is the data import problem due to the Windows line-ending style? And do you need help with any part?

No, the problem was a duplicate key error, which occurred when we ran the import command while data was already present. We just have to clean the DB before every import.

@prayanshchh
Contributor

prayanshchh commented Feb 14, 2025

@prayanshchh @VanshikaSabharwal Is the data import problem due to the Windows line-ending style? And do you need help with any part?

The bigger issue is that even with data present in the DB, login is still not working; I am trying to debug that.

@VanshikaSabharwal

@prayanshchh did you actually go inside the Mongo container and check whether there is data or not?

@prayanshchh
Contributor

Yes, I shared the screenshots with you on Slack.

@palisadoes
Contributor Author

The server is working now, but the develop sample data seems to be either corrupted or the code is faulty. You can't manage organizations as the SuperAdmin

other.mp4

@PurnenduMIshra129th

@palisadoes As this issue is about Docker efficiency, and this has been verified to work properly on other systems, can I unassign myself from it? Otherwise I am not able to contribute to new issues.

@palisadoes
Contributor Author

Closing

@PurnenduMIshra129th

Thanks
