Currently, document chunks are stored individually in our vector database (PGVector), i.e. the only relationship we record is the one between a chunk and its original document.
We should expand this to extract the document layout (headers, footers, tables, images, captions, …) and the relationships (chunk --> page --> file, previous_chunk --> chunk --> next_chunk, …) and store them in a database; see our scheme.
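To make the proposed relationships concrete, here is a minimal sketch of what a stored chunk record could look like. It is not the project's actual schema: `ChunkRecord`, `link_chunks`, and all field names are hypothetical, and the layout labels are simply the ones listed above. It assumes the layout extraction step already yields elements in reading order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ChunkRecord:
    """Hypothetical record for one chunk and its structural relationships."""
    chunk_id: int
    file_id: str        # chunk --> page --> file
    page_number: int
    layout_type: str    # e.g. "header", "footer", "table", "image", "caption"
    text: str
    previous_chunk_id: Optional[int] = None  # previous_chunk --> chunk
    next_chunk_id: Optional[int] = None      # chunk --> next_chunk


def link_chunks(file_id: str, elements: List[Tuple[int, str, str]]) -> List[ChunkRecord]:
    """Build linked records from (page_number, layout_type, text) tuples.

    The tuples are assumed to be in reading order already.
    """
    records = [
        ChunkRecord(chunk_id=i, file_id=file_id, page_number=page,
                    layout_type=kind, text=text)
        for i, (page, kind, text) in enumerate(elements)
    ]
    # Wire up the previous/next pointers between neighbouring chunks.
    for i, record in enumerate(records):
        if i > 0:
            record.previous_chunk_id = records[i - 1].chunk_id
        if i < len(records) - 1:
            record.next_chunk_id = records[i + 1].chunk_id
    return records


# Example: three layout elements spread over two pages of one file.
records = link_chunks("report.pdf", [
    (1, "header", "Quarterly report"),
    (1, "text", "Revenue grew in the third quarter."),
    (2, "table", "region | revenue"),
])
```

With records like these, a retrieval hit on one chunk could be expanded to its neighbours or to the whole page by following the stored identifiers, which is not possible when only the chunk-to-document link is kept.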
jacopo-chevallard changed the title from "Extract and save document/chunk structure and relationships" to "Extract and store document/chunk structure and relationships" on Nov 4, 2024