Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Jest detects open handles #928

Open
bergundy opened this issue Oct 17, 2022 · 22 comments
Open

[Bug] Jest detects open handles #928

bergundy opened this issue Oct 17, 2022 · 22 comments
Assignees
Labels
bug Something isn't working

Comments

@bergundy
Copy link
Member

Getting this warning when running a simple test with the test environment:

 $ ./node_modules/.bin/jest --detectOpenHandles

...
Jest has detected the following 1 open handle potentially keeping Jest from exiting:

  ●  neon threadsafe function
...

Code for reproduction:

import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker, Runtime, DefaultLogger, LogEntry } from '@temporalio/worker';
import { someWorkflow } from './workflows';

let testEnv: TestWorkflowEnvironment;

beforeAll(async () => {
  // Use console.log instead of console.error to avoid red output
  // Filter INFO log messages for clearer test output
  Runtime.install({
    logger: new DefaultLogger('WARN', (entry: LogEntry) => console.log(`[${entry.level}]`, entry.message)),
  });

  testEnv = await TestWorkflowEnvironment.createTimeSkipping();
});

afterAll(async () => {
  await testEnv?.teardown();
});

test('some random test', async () => {
  const { client, nativeConnection } = testEnv;
  const worker = await Worker.create({
    connection: nativeConnection,
    taskQueue: 'test',
    workflowsPath: require.resolve('./workflows'),
    activities: {
      // ...
    },
  });

  await worker.runUntil(async () => {
    const result = await client.workflow.execute(someWorkflow, {
      args: [],
      workflowId: 'money-transfer-test-workflow',
      taskQueue: 'test',
    });

    expect(result).toEqual('test');
  });
});
@bergundy bergundy added the bug Something isn't working label Oct 17, 2022
@lorensr
Copy link
Contributor

lorensr commented Oct 17, 2022

Val did some investigating in temporalio/samples-typescript#189

@mjameswh
Copy link
Contributor

Test case can be reduced to this:

test(`core-bridge registerErrors() doesn't leave open handles`, async () => {
    const native = require("@temporalio/core-bridge");

    native.registerErrors({
        IllegalStateError: () => ({}),
        ShutdownError: () => ({}),
        TransportError: () => ({}),
        UnexpectedError: () => ({}),
        GracefulShutdownPeriodExpiredError: () => ({}),
    });
});

The issue is specifically with registerErrors(). For example, that one doesn't trigger the error message:

test(`core-bridge getTimeOfDay() doesn't leave open handles`, async () => {
    const native = require("@temporalio/core-bridge");
    native.getTimeOfDay();
}

@mjameswh
Copy link
Contributor

The issue relates to usage of the .root() function to keep references to error constructors from Rust land.

pub fn register_errors(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    // ...

    let mapping = cx.argument::<JsObject>(0)?;
    let shutdown_error = mapping
        .get::<JsFunction, _, _>(&mut cx, "ShutdownError")?
        .root(&mut cx);   // <----

Neon's Root object internally register a ThreadSafeFunction, which is a special type of resource provided by Node's napi allowing native code to call a JavaScript function from any thread. It is that ThreadSafeFunction resource that Jest detects as being left open.

@mjameswh
Copy link
Contributor

mjameswh commented Nov 28, 2022

Opened a ticket in Neon: neon-bindings/neon#948

@igorsechyn
Copy link

We hit the same issue in our test setup. Weirdly it causes issues in CircleCI as jest just won't exit despite using --forceExit flag. We upgraded to the latest version of sdk and the issue still persists. It is intermittent, so it is hard to pinpoint it to temporal tests, but this behaviour started around the same time as we added temporal tests

@mandriv
Copy link

mandriv commented Feb 28, 2023

Is there any known workaround?

@Irvenae
Copy link

Irvenae commented Feb 28, 2023

Since the new version we are hitting an issue with Jest on CircleCI as well.
Running into Too long with no output (exceeded 10m0s): context deadline exceeded

@mjameswh mjameswh changed the title [Bug] Jest detects open file handles [Bug] Jest detects open handles Feb 28, 2023
@mjameswh
Copy link
Contributor

Is there any known workaround?

I'm not aware of any workaround regarding Jest's warning about open handle. This warning message can simply be ignored. This may also cause a 10 seconds delay on process termination, which should generally not be that much of an issue.

@mjameswh
Copy link
Contributor

mjameswh commented Feb 28, 2023

Since the new version we are hitting an issue with Jest on CircleCI as well. Running into Too long with no output (exceeded 10m0s): context deadline exceeded

We hit the same issue in our test setup. Weirdly it causes issues in CircleCI as jest just won't exit despite using --forceExit flag. We upgraded to the latest version of sdk and the issue still persists. It is intermittent, so it is hard to pinpoint it to temporal tests, but this behaviour started around the same time as we added temporal tests

I don't think this is related to the "detected open handle" bug. Wondering if this might be related, though there's unfortunately very little context to work with.

@mandriv @Irvenae Let assume this is a different issue... Could you please provide more details on these situations? Here are some ideas info that might help figure this out, but feel free to add whatever that you think could be useful:

  • Temporal SDK/Jest/Node versions?
  • Collecting coverage metrics?
  • Is that behaviour happening consistently or is it intermittent?
  • If you run these tests locally using the very same command as when these tests are being run in CircleCI, does execution completes normally and process stops by itself?
  • Do you very get long period of time where the process stop outputting anything? How long
  • Can you still reproduce the error with a reduced test suite?

@Irvenae
Copy link

Irvenae commented Mar 1, 2023

  • Temporal SDK v1.6.0, Jest 29.3.0, node v16.15.0
  • No coverage metrics
  • intermittent
  • Locally tests complete (tried it 10 times)

This is the log we get in circleCI:

2023-02-28T09:49:27.110Z [WARN] This program is running inside a containerized environment with a memory constraint (eg. docker --memory 70226m). Node itself does not consider this memory constraint in how it manages its heap memory. There is consequently a high probability that the process will crash due to running out of memory. To increase reliability, we recommend adding '--max-old-space-size=52669' to your node arguments. Refer to https://docs.temporal.io/application-development/worker-performance for more advice on tuning your workers.
2023-02-28T09:49:27.145998Z  INFO temporal_sdk_core::ephemeral_server: Downloading https://temporal.download/assets/temporalio/sdk-java/releases/download/v1.17.0/temporal-test-server_1.17.0_linux_amd64.tar.gz to /tmp/temporal-test-server-sdk-typescript-1.6.0
2023-02-28T09:49:27.871Z [INFO] Creating worker {
  options: {
    namespace: 'default',
    identity: '454@f5f8c41c3a04',
    shutdownGraceTime: '10s',
    maxConcurrentActivityTaskExecutions: 100,
    maxConcurrentLocalActivityExecutions: 100,
    enableNonLocalActivities: true,
    maxConcurrentWorkflowTaskExecutions: 100,
    stickyQueueScheduleToStartTimeout: '5m',
    maxHeartbeatThrottleInterval: '60s',
    defaultHeartbeatThrottleInterval: '30s',
    isolateExecutionTimeout: '4294967295ms',
    workflowThreadPoolSize: 1,
    maxCachedWorkflows: 1761,
    enableSDKTracing: false,
    showStackTraceSources: true,
    reuseV8Context: true,
    debugMode: true,
    interceptors: { activityInbound: [Array], workflowModules: [Array] },
    sinks: { logger: [Object] },
    workflowsPath: '/root/repo/packages/temporal-features/src/tests/allWorkflows.ts',
    bundlerOptions: { webpackConfigHook: [Function: webpackConfigHook] },
    activities: {
      cancelJob: [Getter],
      createItem: [Getter],
      createItemWithData: [Getter],
      deleteItem: [Getter],
      findItems: [Getter],
      findItemsInAllVersions: [Getter],
      getFileJson: [Getter],
      getItem: [Getter],
      persistData: [Getter],
      setItem: [Getter],
      createOrMemoizeJob: [Getter],
      getJobResult: [Getter],
      getResult: [Getter],
      queryWorkflow: [Getter],
      signalWithStartWorkflowForItem: [Getter],
      signalWorkflow: [Getter],
      startAndQueryWorkflowForItem: [Getter]
    },
    taskQueue: 'test',
    connection: NativeConnection {
      nativeClient: [External: 7fe4b80045b0],
      referenceHolders: Set(0) {}
    },
    shutdownGraceTimeMs: 10000,
    stickyQueueScheduleToStartTimeoutMs: 300000,
    isolateExecutionTimeoutMs: 4294967295,
    maxHeartbeatThrottleIntervalMs: 60000,
    defaultHeartbeatThrottleIntervalMs: 30000,
    loadedDataConverter: {
      payloadConverter: [DefaultPayloadConverter],
      failureConverter: [DefaultFailureConverter],
      payloadCodecs: []
    }
  }
}
2023-02-28T09:49:36.018Z [INFO] asset workflow-bundle-a2ca801512c61451083d.js 22.2 MiB [emitted] [immutable] (name: main)
2023-02-28T09:49:36.018Z [INFO] runtime modules 1.25 KiB 6 modules
2023-02-28T09:49:36.018Z [INFO] modules by path ./packages/ 4.44 MiB 726 modules
2023-02-28T09:49:36.019Z [INFO] modules by path ./node_modules/ 3.54 MiB 293 modules
2023-02-28T09:49:36.019Z [INFO] modules by path ./tooling/tool-version-sync/lib/*.js 4.99 KiB
2023-02-28T09:49:36.019Z [INFO]   ./tooling/tool-version-sync/lib/index.js 1.2 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO]   ./tooling/tool-version-sync/lib/toolVersions.js 1.54 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO]   ./tooling/tool-version-sync/lib/toolMemoryLimits.js 2.25 KiB [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] __temporal_custom_payload_converter (ignored) 15 bytes [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] __temporal_custom_failure_converter (ignored) 15 bytes [built] [code generated]
2023-02-28T09:49:36.019Z [INFO] webpack 5.75.0 compiled successfully in 7586 ms
2023-02-28T09:49:36.116Z [INFO] Workflow bundle created { size: '22.12MB' }
2023-02-28T09:49:37.170023Z  INFO temporal_sdk_core::worker: Initializing worker task_queue=test namespace=default
2023-02-28T09:49:37.172Z [INFO] Worker state changed { state: 'RUNNING' }
(node:454) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)

Too long with no output (exceeded 10m0s): context deadline exceeded

In this run it only runs 1 test file. In this test file we only create 1 timeSkipping test env, which we reuse for 12 tests.
I checked the memory and we only use at a max 25% Mem of the instance so OOM is out of the question.
The MaxListenersExceededWarning can be ignored, sth in our code we need to resolve.

A successful run would have 3 min later the Worker state changed { state: 'STOPPING' } and etc. We only print info logs I will try to reproduce with debug logs as well.

@Irvenae
Copy link

Irvenae commented Mar 1, 2023

I might have been on the wrong track, I today reran this 10 times on CircleCI but no failure. I know for sure this happens when we have multiple tests running, but there it could be OOM as well (so setting node options and/or going to a single Jest worker to see if it is resolved). Anyway, I am also further investigating to potentially pinpoint it better.

@igorsechyn
Copy link

FWIW, we found that we were either running out CPU or Memory, when executing in CircleCI. The only thing that helped us is to reduce number of jest workers. we had not had a single failure since then.

@mjameswh
Copy link
Contributor

mjameswh commented Mar 1, 2023

I checked the memory and we only use at a max 25% Mem of the instance so OOM is out of the question.

Not sure if this is the issue here, but I personally recommend to always explicitly set Node's --max-old-space-size (see this page for some details).

Based on your log, it appears you are executing inside a containerized environment with a memory constraint. Node itself doesn't play great in such cases, because it doesn't know about those constraints and instead configure its heap allocation and garbage collection to based on the machine's total memory.

The opposite is also possible: by default, Node will configure its heap allocation limit and garbage collection algorithms to 25% of available memory, up to a limit of 4GB (assuming node 14 to 18, with 64 bits cpu). That means that your Node process may not be taking advantage of all the resources available to it, which could explain that it is not performing as it should.

Both cases can be resolved by explicitly setting and properly tuning Node's --max-old-space-size.

@NivOclear
Copy link

so any progress regarding this issue ? this not just a warning in our cicd its break the jest test and not passing them

@Irvenae
Copy link

Irvenae commented Apr 18, 2023

We have resolved it by using a bigger machine and reducing the number of jest workers and setting --max-old-space-size (taking into account the nr of workers).

I am now evaluating AVA tbh, because attaching the debugger is spotty with Jest.

@Kannndev
Copy link

I am not sure if this is relevant, but i will add here just to get others opinion as well,

When I run the test with createLocal it works but when i use createTimeSkipping test times out even after 60s. The example test is

import { DefaultLogger, LogEntry, Runtime, Worker } from '@temporalio/worker';

import { TestWorkflowEnvironment } from '@temporalio/testing';
import { WorkflowCoverage } from '@temporalio/nyc-test-coverage';
import { example } from './workflow';

describe('Example workflow with mocks', () => {
  let testEnv: TestWorkflowEnvironment;
  const workflowCoverage = new WorkflowCoverage();

  beforeAll(async () => {
    // Use console.log instead of console.error to avoid red output
    // Filter INFO log messages for clearer test output
    Runtime.install({
      logger: new DefaultLogger('WARN', (entry: LogEntry) => console.log(`[${entry.level}]`, entry.message)),
    });

    testEnv = await TestWorkflowEnvironment.createLocal();
  });

  afterAll(async () => {
    await testEnv?.teardown();
  });

  afterAll(() => {
    workflowCoverage.mergeIntoGlobalCoverage();
  });

  it('successfully completes the Workflow with a mocked Activity', async () => {
    const { client, nativeConnection } = testEnv;
    const taskQueue = 'test';

    const worker = await Worker.create(
      workflowCoverage.augmentWorkerOptions({
        connection: nativeConnection,
        taskQueue: 'test',
        workflowsPath: require.resolve('../workflows'),
        activities: {
          greet: async () => 'Hello, Temporal!',
        },
      })
    );

    await worker.runUntil(async () => {
      const result = await client.workflow.execute(example, {
        args: ['Temporal'],
        workflowId: 'test',
        taskQueue,
      });
      expect(result).toEqual('Hello, Temporal!');
    });
  });
});

Why is this behaviour happening ?,

@bergundy
Copy link
Member Author

When I run the test with createLocal it works but when i use createTimeSkipping test times out even after 60s. The example test is

Thanks for the feedback.

We haven't had the capacity to investigate this further for now, we'll need more work to understand what's going on here.

@jbsil
Copy link

jbsil commented Feb 1, 2024

We've found a workaround for this when running Workers using the TestWorkflowEnvironment in jest with --detectOpenHandles. This does not appear to work for activities run using the MockActivityEnvironment, where the open handles still persist.

the summary of the worker workaround is:

  1. when you create your TestWorkflowEnvironment, store it on global, e.g. global.TemporalEnvironment: TestWorkflowEnvironment
  2. as you Worker.create each of your workers, store the Worker (await the call to create) into something on global, e.g. global.workers: Worker[]
  3. as you worker.run() each of your workers, store the run promise as well into something on global, e.g. global.workerPromises: Promise[]
  4. setup a jest globalTeardown to clean up all of this stuff
  5. on each of the global.workers, call .shutdown(), which will cause the workers to (eventually) shut down. do not try to await this
  6. instead, wait for all of the worker run promises, e.g. await Promise.all(global.workerPromises)
  7. finally, teardown the TWE, e.g. await global.TemporalEnvironment.teardown()
    e.g. teardown.ts
export default async function teardown() {
  // ask each worker to start shutting down
  global.workers.forEach((worker) => {
    // do not wait for this, it is non-blocking
    worker.shutdown();
  });
  // the promises created by `Worker.run` will resolve once the worker has actually shutdown
  await Promise.all(global.workerPromises);
  // fyi - teardown the environment won't work until after the workers disconnect
  await global.TemporalEnvironment?.teardown();
}

To get your typing correct, you can declare global, but note that you have to var your variables, let and const wont work.
e.g.

declare global {
  var TemporalEnvironment: TestWorkflowEnvironment;
  var workers: Worker[];
  var workerPromises: Promise[];
}

We have found that the most "sane" configuration of this is to use a jest globalSetup to start the TestWorkflowEnvironment and all of the workers involved, because it more closely mirrors what will be happening in production (the server and workers will be running before a workflow payload arrives). But, you don't have to do that part.

In our worker repos, using Node 18 and temporalio/* @ 1.9.0, this closes all of our handles and lets jest exit gracefully.

We're still unable to work around the open handles from the MockActivityEnvironment, though, which is disappointing.

@jbsil
Copy link

jbsil commented Feb 1, 2024

It turns out that the handles opened by MockActivityEnvironment are in the Runtime. That means that you can
jest config

{
  ...,
  "globalTeardown": "tests/teardown.ts"
}

teardown.ts

import { Runtime } from '@temporalio/worker';
export default async function() {
  await Runtime.getInstance().shutdown();
}

and your open handles will get cleaned up. There is some delay (a few seconds) after the tests finish while the Runtime shuts down, but it stops jest from complaining!
Hurray!

@mjameswh
Copy link
Contributor

mjameswh commented Feb 2, 2024

Thanks a lot @jbsil for sharing your findings on this issue. It will certainly help a few of our users.

I have to admit that I'm very intrigued by why this solution actually resolves the symptom, as I'm pretty sure that this will not force unloading of Neon's global ThreadSafeFunction, but glad to know that it works. We're planning to dig into more into Jest related issues soon; I will experiment more with your workaround at that moment.

@ilijaNL
Copy link

ilijaNL commented May 4, 2024

This is still ongoing issue, sometimes it also takes sometime before everything is terminated according to jest runner.

@jamespsterling
Copy link

We are seeing this for MockActivityEnvironment,

Jest has detected the following 1 open handle potentially keeping Jest from exiting:

    neon threadsafe function

      34 |
      35 |   beforeEach(async () => {
    > 36 |     env = new MockActivityEnvironment({ attempt: 2 });
         |           ^
      37 |
      38 |     const compiledSubject = await compileTemplate({
      39 |       mergeFields: allMergeFields,

      at ../../../../node_modules/@temporalio/core-bridge/index.js:16:14
      at Function.create (../../../../node_modules/@temporalio/worker/src/runtime.ts:202:31)
      at Function.instance (../../../../node_modules/@temporalio/worker/src/runtime.ts:194:17)
      at Activity.makeActivityLogger (../../../../node_modules/@temporalio/worker/src/activity.ts:82:34)
      at new Activity (../../../../node_modules/@temporalio/worker/src/activity.ts:61:12)
      at new MockActivityEnvironment (../../../../node_modules/@temporalio/testing/src/index.ts:416:21)
      at Object.<anonymous> (src/lib/actions/emails/prepareEmailAndMessageData.spec.ts:36:11)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests