
Possible memory leak? #2933

Open · NicoAdrian opened this issue Oct 9, 2024 · 9 comments

NicoAdrian commented Oct 9, 2024

Description

Memory usage keeps growing during the test, eventually causing my system to run out of RAM.
The RAM increase stops once all the users have been spawned (I run the test with 30k users).
Stopping the test doesn't free the memory, and neither does launching another one.
However, launching another test under the same conditions doesn't increase RAM further.
I'm using FastHttpClient, FYI.

Command line

/usr/local/bin/locust --processes -1 --class-picker --web-host 0.0.0.0 --logfile=/var/log/locust

Locustfile contents

@task
def req(self):
    t0 = time()
    with self.client.get(self.url, headers={"User-Agent": self.ua}, catch_response=True) as resp:
        response_time = time() - t0
        if response_time > 2:
            resp.failure(f"Request took too long: {response_time:.3f}")
        if resp.status_code >= 400:
            self.errors += 1
            resp.failure(f"Got HTTP {resp.status_code}")
            if self.errors == 5:
                logging.warning("Too many errors, stopping user")
                raise StopUser
        else:
            self.errors = 0

Python version

3.11

Locust version

2.31.8

Operating system

Linux 6.1.97-104.177.amzn2023.x86_64

NicoAdrian added the bug label Oct 9, 2024
AdityaS8804 commented

Hey,
How about we modify FastHttpSession in fasthttp.py to explicitly call self.client.close() when the test stops? That would ensure any remaining HTTP connections or data held in memory are freed.
I'd like to contribute to this issue. Please let me know if this is a valid approach. Your inputs are much appreciated.

cyberw (Collaborator) commented Oct 22, 2024

I think that makes a lot of sense. A user can never come back to life once it is stopped, so it should close its connection and clean up any resources as soon as possible.

Not sure exactly where to implement it though. on_stop isn't good, because it could be overridden in a subclass. __del__() could work, but I'm not sure it happens soon enough (and by that time we're probably close to closing the connection anyway). But you're welcome to try it out!
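
To illustrate the on_stop concern with a minimal sketch (class names here are made up, not from the thread): if framework-level cleanup lives in on_stop, any subclass that overrides it without calling super() silently skips the cleanup.

from locust import FastHttpUser, task

class CleanedUpUser(FastHttpUser):
    abstract = True

    def on_stop(self):
        # hypothetical framework-style cleanup placed in on_stop
        print("closing connections / freeing resources here")

class MyUser(CleanedUpUser):
    host = "http://localhost"  # placeholder

    @task
    def noop(self):
        pass

    def on_stop(self):
        # a subclass override that forgets super().on_stop()
        # silently skips the cleanup above, which is why on_stop
        # is a fragile place for framework-level resource cleanup
        pass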

NicoAdrian (Author) commented

OK, fine, but how can 30k users take ~30 GB of RAM? That's roughly 1 MB per user. Has anyone experienced this before? I mean, is this the expected behaviour?

cyberw (Collaborator) commented Oct 22, 2024

Oh no, that's definitely not the expected behaviour. If you can give me a minimal example that reproduces this, I'll have a look. Can you see the same behaviour on any other platforms?

If you weren't stopping/starting tons of users, it's unlikely to be resolved by explicitly closing sessions either.

There's one thing you might want to look into. This line may create a lot of unique failure messages, which is bad because they are stored individually (probably not 30 GB of data, but still :)
resp.failure(f"Request took too long: {response_time:.3f}")

NicoAdrian (Author) commented Oct 22, 2024

> Oh no, that's definitely not the expected behaviour. If you can give me a minimal example that reproduces this, I'll have a look. Can you see the same behaviour on any other platforms?
> If you weren't stopping/starting tons of users, it's unlikely to be resolved by explicitly closing sessions either.
> There's one thing you might want to look into. This line may create a lot of unique failure messages, which is bad because they are stored individually (probably not 30 GB of data, but still :)
> resp.failure(f"Request took too long: {response_time:.3f}")

Can confirm I don't stop a lot of users (like a dozen, out of 30k). I will try commenting out the "request took too long" line to see if that helps.
Here is my full locustfile.py (somewhat edited, for business reasons):

EDIT: I can't test this on other platforms, just Linux (CentOS).

import datetime
import logging
import re
from random import random
from time import sleep, time
from urllib.parse import quote

from locust import FastHttpUser, constant_pacing, events, task
from locust.exception import StopUser

PROXY_HOST = "someproxy.net"
PROXY_PORT = 8080

@events.init_command_line_parser.add_listener
def on_init_command_line_parser(parser):
    parser.add_argument("--test-id", default=datetime.datetime.now().strftime("%Y-%m-%d %H:%M"), help="Test ID")
    parser.add_argument("--env", default="blue", choices=["blue", "green"], help="Environment (blue/green)")
    parser.add_argument("--use-proxy", action="store_true")



class BaseUser(FastHttpUser):
    abstract = True
    network_timeout = 15

    def __init__(self, environment):
        if environment.parsed_options.use_proxy is True:
            self.proxy_host = PROXY_HOST
            self.proxy_port = PROXY_PORT
        super().__init__(environment)
        self.errors = 0
        self.host = self.host.format(env=environment.parsed_options.env)
        self.ua = f"Mozilla/5.0 Test_perf_test_id_{environment.parsed_options.test_id}__{int(random() * 10**16)}"

    @task
    def req(self):
        t0 = time()
        with self.client.get(self.url, headers={"User-Agent": self.ua}, catch_response=True) as resp:
            response_time = time() - t0
            if response_time > 2:
                resp.failure(f"Request took too long: {response_time:.3f}")
            if resp.status_code >= 400:
                self.errors += 1
                resp.failure(f"Got HTTP {resp.status_code}")
                if self.errors >= 5:
                    logging.warning("Too many errors, stopping user")
                    raise StopUser
            else:
                self.errors = 0


class DashUser(BaseUser):
    abstract = True
    wait_time = constant_pacing(2)

    def on_start(self):
        sleep(random())
        with self.client.get(
            f"{self.host}?{self.query}",
            headers={"User-Agent": self.ua},
            allow_redirects=False,
            catch_response=True,
        ) as resp:
            if resp.status_code != 302:
                resp.failure(f"on_start failed: {resp.status_code}")
                raise StopUser
            else:
                self.url = f"{self.host}/" + resp.headers["Location"]


class SomeUser(DashUser):
    # just strings
    query = "foo=bar"
    pass

cyberw (Collaborator) commented Oct 22, 2024

I need you to narrow it down further. Remove everything in that locustfile that isn't needed to reproduce the issue. Does it happen with just a basic FastHttpUser making a single request? If there is no problem then, keep adding stuff back until you see it again.

I'm assuming there are no errors logged?
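
A minimal locustfile along those lines might look like this (a sketch; the host is just a placeholder):

from locust import FastHttpUser, task

class BareUser(FastHttpUser):
    # placeholder host; point it at whatever you are testing against
    host = "http://localhost"

    @task
    def single_request(self):
        # nothing but a single plain GET: if memory still grows with this,
        # the leak is not in the custom locustfile logic
        self.client.get("/")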

andreabisello commented

Sorry for the off-topic question: what is StopUser? Why can't I find anything on docs.locust.io when searching for StopUser, other than a reference in the changelog? Is raising StopUser (which I found via from locust.exception import StopUser) a good way to destroy users when they're no longer required?
Thank you.

flbraun commented Nov 12, 2024

Just wanted to confirm that I've also observed a memory leak with FastHttpClient, back in November 2023. We have a "special" test where we create a new FastHttpSession for every task, to include handshake overhead in the test.

My memory on this is a bit foggy because it was such a long time ago, but I think it was the response body that was never properly discarded. Fortunately, I was able to work around the leak with a manual invocation of the garbage collector.

Maybe these are some clues to look into? I didn't have time to properly debug it back then.

Minimal code example:

import gc
from http import HTTPStatus

from locust import FastHttpUser, task
from locust.contrib.fasthttp import FastHttpSession


class MyUser(FastHttpUser):
    host = 'https://jsonplaceholder.typicode.com'

    @task
    def get_todos(self):
        new_session = FastHttpSession(self.environment, self.host, self)

        with new_session.request('GET', '/todos/1', catch_response=True) as response:
            if response.status_code != HTTPStatus.OK:
                response.failure('unexpected status code')
            else:
                response.success()

        # manually invoke GC to discard new_session
        gc.collect()

NicoAdrian (Author) commented

> I need you to narrow it down further. Remove everything in that locustfile that isn't needed to reproduce the issue. Does it happen with just a basic FastHttpUser making a single request? If there is no problem then, keep adding stuff back until you see it again.
> I'm assuming there are no errors logged?

No errors logged.

> Sorry for the off-topic question: what is StopUser? Why can't I find anything on docs.locust.io when searching for StopUser, other than a reference in the changelog? Is raising StopUser (which I found via from locust.exception import StopUser) a good way to destroy users when they're no longer required? Thank you.

Yeah, it just kills a user, so it doesn't make any more requests for this test session.
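
A minimal sketch of that behavior (host and endpoint are placeholders): raising StopUser ends only the current user's task loop, while the rest of the simulated users keep running.

from locust import FastHttpUser, task
from locust.exception import StopUser

class QuittingUser(FastHttpUser):
    host = "http://localhost"  # placeholder

    @task
    def one_shot(self):
        self.client.get("/")
        # this user stops scheduling tasks for the rest of the test run;
        # all other users are unaffected
        raise StopUser()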

> Just wanted to confirm that I've also observed a memory leak with FastHttpClient, back in November 2023. [...] Fortunately, I was able to work around the leak with a manual invocation of the garbage collector. [...]

I will try to manually trigger GC as you did. Will keep you posted, thanks.
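
A sketch of one way to trigger GC globally instead of per task (the 30-second interval is arbitrary), using an init listener that spawns a background greenlet:

import gc

import gevent
from locust import events

@events.init.add_listener
def start_gc_greenlet(environment, **kwargs):
    def collect_periodically():
        while True:
            gevent.sleep(30)  # arbitrary interval
            gc.collect()

    # run the collector in the background for the lifetime of the locust process
    gevent.spawn(collect_periodically)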
