[Do not merge] When chunk_size=0, skip vector db #649

gaya3-zipstack · 2024-09-03T16:33:59Z

What

The current implementation always uses the indexed nodes when fetching context for prompts. However, when chunk_size=0 the entire context has to be sent anyway, so we can send the extracted text directly instead of fetching the chunk from the vector DB.

Why

This improves response time for prompts when chunk_size=0, as the vector DB need not be accessed.

How

When chunk_size=0, the context can be fetched from the extracted text present in the container file system
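
A minimal sketch of the branching described above, not the actual prompt-service code. The function name `fetch_context` and the `retrieve_from_vector_db` callable are hypothetical and used only for illustration; `extract_file_path` mirrors the name visible in the diff excerpt.

```python
from pathlib import Path
from typing import Callable


def fetch_context(
    chunk_size: int,
    extract_file_path: str,
    retrieve_from_vector_db: Callable[[], str],
) -> str:
    """Return the prompt context for the given chunking profile.

    Hypothetical helper: names and signature are assumptions, not the
    prompt-service API.
    """
    if chunk_size == 0:
        # No chunking: the whole document is the context, so read the
        # extracted text directly and skip the vector DB lookup.
        return Path(extract_file_path).read_text(encoding="utf-8")
    # Chunked profile: retrieve context from the indexed nodes as before.
    return retrieve_from_vector_db()
```

Passing the retrieval step as a callable keeps the chunk_size > 0 path unchanged and makes it easy to check that the vector DB is never touched when chunk_size=0.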

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Maybe. Testing is required around Prompt Studio for chunk sizes = 0 and > 0. Testing with plugins is also suggested.

Database Migrations

None

Env Config

NA

Relevant Docs

Related Issues or PRs

https://zipstack.atlassian.net/browse/UN-1418

SDK PR

[Do not merge] When chunk_size=0, skip vector db unstract-sdk#99

Dependencies Versions

Notes on Testing

Screenshots

Profile with chunk_size=0

Manual indexing on a document. After indexing completes, no nodes are added to the vector DB, as shown.

Prompt run after manual indexing. After the prompt run, there are still no records in the vector DB, yet the prompt answers are correct because the context is picked up from the extracted text.

Running a prompt before manual indexing (dynamic indexing kicks in).

Manually remove the extracted file after indexing, then run the prompt. This gives an error saying the extracted file is missing.

Now re-index manually; the extracted file is re-created. Then run the prompt.
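
The missing-file scenario above can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `ExtractedFileMissingError` and `read_extracted_text` are hypothetical names, and only the message text "Extracted file not present." comes from the diff.

```python
from pathlib import Path


class ExtractedFileMissingError(Exception):
    """Hypothetical error type for a missing extracted-text file."""


def read_extracted_text(extract_file_path: str) -> str:
    """Read the extracted text, or raise a descriptive error if it is gone."""
    try:
        return Path(extract_file_path).read_text(encoding="utf-8")
    except FileNotFoundError as e:
        # Raise (rather than return) the error so the caller's error
        # handling can report it; re-indexing re-creates the file.
        raise ExtractedFileMissingError(
            f"Extracted file not present: {extract_file_path}"
        ) from e
```

After a manual re-index writes the file again, the same call succeeds, which matches the last step of the scenario.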

Profile with chunk_size=1024

Manual indexing on a document. After indexing completes, nodes are added to the vector DB, as shown.

Prompt run after manual indexing. The prompt run works fine, picking context from the vector DB.

Running a prompt before manual indexing (dynamic indexing kicks in, as there are no records in the vector DB).

Dynamic indexing kicked in and the prompt run worked fine.

Manually remove the records from the vector DB.

On running the prompt, we see an error.

Manually re-index and run the prompt again; the prompt should work fine. Nodes are added to the vector DB.

Checklist

I have read and understood the Contribution Guidelines.

github-actions · 2024-09-03T16:34:28Z

filepath	function	passed	SUBTOTAL
worker/src/unstract/worker/clients/test_docker.py	test_logs	1	1
worker/src/unstract/worker/clients/test_docker.py	test_cleanup	1	1
worker/src/unstract/worker/clients/test_docker.py	test_cleanup_skip	1	1
worker/src/unstract/worker/clients/test_docker.py	test_client_init	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_image_exists	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_image	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_container_run_config	1	1
worker/src/unstract/worker/clients/test_docker.py	test_get_container_run_config_without_mount	1	1
worker/src/unstract/worker/clients/test_docker.py	test_run_container	1	1
TOTAL		9	9

gaya3-zipstack · 2024-09-03T16:37:22Z

prompt-service/src/unstract/prompt_service/main.py

+                        RunLevel.RUN,
+                        "Extracted file not present.",
+                    )
+                    return APIError(message=msg)


@chandrasekharan-zipstack Currently this error is showing up as html. How do I throw the error in the right format?


chandrasekharan-zipstack · 2024-09-04T03:28:32Z

prompt-service/src/unstract/prompt_service/main.py

+                        RunLevel.RUN,
+                        "Extracted file not present.",
+                    )
+                    return APIError(message=msg)


Let's huddle for this, didn't get you here @gaya3-zipstack

chandrasekharan-zipstack · 2024-09-04T03:31:03Z

prompt-service/src/unstract/prompt_service/main.py

-                            f"{msg} {output[PSKeys.VECTOR_DB]} for doc_id {doc_id}"
+                try:
+                    # Read from extract_file_path and set that as context
+                    with open(extract_file_path) as file:


@gaya3-zipstack does this work because the volumes are shared between backend and prompt-service? Was this change tested against a tool run in a workflow or a pipeline?

sonarqubecloud · 2024-09-04T16:07:24Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

gaya3-zipstack added 3 commits August 30, 2024 22:33

Fix for not accessing vector db when chunk size=0

b438ba9

Merge remote-tracking branch 'origin' into fix/chunk-size-0

52dedb4

Skip vector db when chunk_size=0

fbb8a21

gaya3-zipstack requested review from Deepak-Kesavan, harini-venkataraman, chandrasekharan-zipstack and hari-kuriakose September 3, 2024 16:33

gaya3-zipstack mentioned this pull request Sep 3, 2024

[Do not merge] When chunk_size=0, skip vector db Zipstack/unstract-sdk#99

Draft

gaya3-zipstack commented Sep 3, 2024


chandrasekharan-zipstack reviewed Sep 4, 2024


gaya3-zipstack marked this pull request as draft September 4, 2024 09:33

gaya3-zipstack changed the title ~~When chunk_size=0, skip vector db~~ [Do not merge] When chunk_size=0, skip vector db Sep 4, 2024

Raise APIError

33695d0
