feat: Updated ADAPTER_LLMW_MAX_POLLS to 120 for 1 hour extraction #904

chandrasekharan-zipstack · 2024-12-18T07:52:38Z

What

Updated ADAPTER_LLMW_MAX_POLLS in the env to reduce the maximum extraction wait (timeout) from ~8.3 hours to ~1 hour

Why

After the move to LLMW v2, multi-page extractions of up to 1500 pages are expected to complete within ~1 hour
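The hour figures can be checked with a little arithmetic. The old sample.env comment budgets 500 mins for 1000 polls, which implies a 30 s poll interval; that interval is an assumption here, not something read from the SDK. A rough sketch:

```python
# Assumption: a 30 s poll interval, inferred from the old sample.env
# comment (500 mins for 1000 polls); not read from the SDK.
POLL_INTERVAL_SECS = 30
MAX_PAGES = 1500  # max pages limit quoted in sample.env


def budget_secs(max_polls: int, interval: int = POLL_INTERVAL_SECS) -> int:
    """Worst-case wait before polling gives up, in seconds."""
    return max_polls * interval


old = budget_secs(1000)  # previous ADAPTER_LLMW_MAX_POLLS
new = budget_secs(120)   # ADAPTER_LLMW_MAX_POLLS after this PR

print(f"old: {old / 3600:.1f} h, new: {new / 3600:.1f} h")
# old: 8.3 h, new: 1.0 h
print(f"per-page budget: old {old / MAX_PAGES:.1f} s, new {new / MAX_PAGES:.1f} s")
# per-page budget: old 20.0 s, new 2.4 s
```

Under that assumption, a 1500-page document has to average about 2.4 s per page on v2 to finish within the new budget, versus the ~20 s per page assumed for v1.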

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No. The reduced maximum extraction wait is intended only for LLMW v2.

Env Config

# ~60 mins (assuming it'll be enough to process 1500 pages with LLMW v2)
ADAPTER_LLMW_MAX_POLLS=120
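ADAPTER_LLMW_MAX_POLLS bounds the number of status checks rather than the time directly; the worst-case wait is roughly the poll count times the poll interval. A minimal sketch of a bounded polling loop under that reading, assuming a fixed 30 s interval (inferred from the old 500 mins / 1000 polls comment); poll_until_done and is_done are hypothetical names, not the LLMWhisperer SDK's API:

```python
import time
from typing import Callable

# Assumption: a fixed poll interval of 30 s, inferred from the old
# sample.env comment (500 mins / 1000 polls); not read from the SDK.
POLL_INTERVAL_SECS = 30
MAX_POLLS = 120  # value of ADAPTER_LLMW_MAX_POLLS after this PR


def poll_until_done(
    is_done: Callable[[], bool],
    max_polls: int = MAX_POLLS,
    interval: float = POLL_INTERVAL_SECS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical bounded polling loop.

    Returns the number of status checks it took, or raises TimeoutError
    after max_polls checks (worst case: max_polls * interval seconds asleep).
    """
    for attempt in range(1, max_polls + 1):
        if is_done():
            return attempt
        sleep(interval)
    raise TimeoutError(
        f"not done after {max_polls} polls (~{max_polls * interval / 60:.0f} mins)"
    )
```

In real use sleep stays as time.sleep; it is injectable only so the loop can be exercised without actually waiting.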

Related Issues or PRs

Notes on Testing

No explicit testing or validation was done, though the existing logic that consumes these envs works.

Checklist

I have read and understood the Contribution Guidelines.

sonarqubecloud · 2024-12-18T07:53:12Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2024-12-18T07:53:13Z

filepath	function	passed	SUBTOTAL
worker/src/unstract/worker/clients/test_docker.py	test_logs	1	1
worker/src/unstract/worker/clients/test_docker.py	test_cleanup	1	1
worker/src/unstract/worker/clients/test_docker.py	test_cleanup_skip	1	1
worker/src/unstract/worker/clients/test_docker.py	test_client_init	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_image_exists	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_image	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_container_run_config	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_container_run_config_without_mount	1	1
worker/src/unstract/worker/clients/test_docker.py	test_run_container	1	1
TOTAL		9	9

ritwik-g · 2024-12-18T08:18:09Z

backend/sample.env

-# 500 mins to allow 1500 (max pages limit) * 20 (approx time in sec to process a page)
-ADAPTER_LLMW_MAX_POLLS=1000
+# ~60 mins (assuming it'll be enough to process 1500 pages with LLMW v2)
+ADAPTER_LLMW_MAX_POLLS=120


@chandrasekharan-zipstack But what about V1? Doesn't it use the same env?

@ritwik-g You're right, v1 uses the same envs. Should we discourage v1 usage by changing this env (reducing the max wait time / max possible pages) and instead require v2 for larger page counts?
Realistically, I doubt any user has extractions that run this long.

Worst case, we'll either have to leave this env as-is and act on it after we sunset v1, or introduce a new set of envs for v2 and update those (this involves changes in the SDK, so I'm not a fan of it).
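For the second option, one way to keep v1 on the old budget while giving v2 its own is a version-specific variable that falls back to the shared one. A minimal sketch only; ADAPTER_LLMW_V2_MAX_POLLS and get_max_polls are hypothetical names, and the real change would live in the SDK:

```python
import os
from typing import Mapping

DEFAULT_MAX_POLLS = 1000  # old sample.env value, kept as the v1 default


def get_max_polls(version: str, env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical lookup: a v2-only override falls back to the shared env."""
    shared = int(env.get("ADAPTER_LLMW_MAX_POLLS", DEFAULT_MAX_POLLS))
    if version == "v2":
        # Hypothetical variable; not an existing Unstract/SDK setting.
        return int(env.get("ADAPTER_LLMW_V2_MAX_POLLS", shared))
    return shared
```

With this shape, v1 deployments keep reading ADAPTER_LLMW_MAX_POLLS unchanged, which is why it avoids the v1 regression raised above, at the cost of one more env to document.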

harini-venkataraman · 2024-12-19T11:05:30Z

backend/sample.env

-# 500 mins to allow 1500 (max pages limit) * 20 (approx time in sec to process a page)
-ADAPTER_LLMW_MAX_POLLS=1000
+# ~60 mins (assuming it'll be enough to process 1500 pages with LLMW v2)
+ADAPTER_LLMW_MAX_POLLS=120


@chandrasekharan-zipstack Won't this cause a timeout if it exceeds the 15-minute max we have set in gunicorn?

@harini-venkataraman The gunicorn timeout only applies to extractions triggered from the web UI. This large setting is mainly intended for async, pipeline-based extractions, where gunicorn timeouts do not play a role.
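The distinction can be sketched as follows: a synchronous web UI request is cut off by whichever is shorter, the gunicorn timeout or the polling budget, while an async pipeline run is bounded only by the polling budget. The 15-minute gunicorn timeout comes from the question above; the 30 s poll interval and the function name are assumptions for illustration:

```python
GUNICORN_TIMEOUT_SECS = 15 * 60  # "max 15min" set in gunicorn, per the review comment
POLL_INTERVAL_SECS = 30          # assumption: inferred from the old sample.env comment


def effective_limit_secs(max_polls: int, via_web_ui: bool) -> int:
    """Hypothetical: upper bound on how long a single extraction can run."""
    polling_budget = max_polls * POLL_INTERVAL_SECS
    if via_web_ui:
        # The request ends when either the gunicorn timeout or the polls run out.
        return min(polling_budget, GUNICORN_TIMEOUT_SECS)
    # Per the reply, gunicorn timeouts do not apply to async pipeline runs.
    return polling_budget
```

Under these assumptions, 120 polls give a pipeline run up to 3600 s, while a UI-triggered extraction is still capped at 900 s by gunicorn.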

Updated ADAPTER_LLMW_MAX_POLLS to 120 for 1 hour extraction

2361a14

chandrasekharan-zipstack requested review from ritwik-g and a team December 18, 2024 07:52

chandrasekharan-zipstack self-assigned this Dec 18, 2024

chandrasekharan-zipstack requested review from pk-zipstack and removed request for a team December 18, 2024 07:52

ritwik-g reviewed Dec 18, 2024


harini-venkataraman reviewed Dec 19, 2024

