Feature/DocumentStore #2106

Merged 61 commits on May 6, 2024
Changes from 16 commits
114b185
datasource: initial commit
vinodkiran Mar 22, 2024
bc92e1b
datasource: datasource details and chunks
vinodkiran Mar 23, 2024
7a411f2
datasource: Document Store Node
vinodkiran Mar 23, 2024
3171170
Merge branch 'main' into FEATURE/Datasource
vinodkiran Mar 24, 2024
a0d323a
Merge branch 'main' into FEATURE/Datasource
vinodkiran Mar 26, 2024
c01d1f9
more changes
vinodkiran Mar 28, 2024
e3c96d1
Document Store - Base functionality
vinodkiran Mar 31, 2024
48987a1
Document Store Loader Component
vinodkiran Apr 1, 2024
b78a1d0
Document Store Loader Component
vinodkiran Apr 2, 2024
73ffa74
before merging the modularity PR
vinodkiran Apr 3, 2024
7a116e0
Merge branch 'main' into FEATURE/Datasource
vinodkiran Apr 3, 2024
e675e37
after merging the modularity PR
vinodkiran Apr 3, 2024
e3101ee
preview mode
vinodkiran Apr 4, 2024
3b21afd
initial draft PR
vinodkiran Apr 5, 2024
1c20a89
Merge branch 'refs/heads/main' into FEATURE/Datasource
vinodkiran Apr 5, 2024
6fd01ae
fixes
vinodkiran Apr 5, 2024
bb4202b
minor updates and fixes
vinodkiran Apr 6, 2024
5229085
Merge branch 'refs/heads/main' into FEATURE/Datasource
vinodkiran Apr 8, 2024
f873460
preview with loader and splitter
vinodkiran Apr 8, 2024
aaa9272
preview with credential
vinodkiran Apr 8, 2024
0c42894
show stored chunks
vinodkiran Apr 10, 2024
63945c5
Merge branch 'refs/heads/main' into FEATURE/Datasource
vinodkiran Apr 11, 2024
14800af
preview update...
vinodkiran Apr 12, 2024
7412e76
Merge branch 'refs/heads/main' into FEATURE/Datasource
vinodkiran Apr 12, 2024
1856369
edit config
vinodkiran Apr 12, 2024
5412c0e
save, preview and other changes
vinodkiran Apr 13, 2024
458f8b1
save, preview and other changes
vinodkiran Apr 13, 2024
ca52066
save, process and other changes
vinodkiran Apr 14, 2024
f6a0706
save, process and other changes
vinodkiran Apr 14, 2024
ccc5f0e
Merge branch 'refs/heads/main' into FEATURE/Datasource
vinodkiran Apr 14, 2024
e5a86c5
Merge branch 'refs/heads/main' into FEATURE/Datasource
vinodkiran Apr 15, 2024
47924e3
alpha1 - for internal testing
vinodkiran Apr 16, 2024
43b90e5
rerouting urls
vinodkiran Apr 16, 2024
2914b0a
bug fix on new loader create
vinodkiran Apr 16, 2024
22163a5
pagination support for chunks
vinodkiran Apr 21, 2024
cf84342
delete document store
vinodkiran Apr 21, 2024
52b2d19
Merge branch 'main' into feature/Datasource
HenryHengZJ Apr 25, 2024
18e75fb
Update pnpm-lock.yaml
HenryHengZJ Apr 25, 2024
73ff491
Merge branch 'main' into feature/Datasource
HenryHengZJ Apr 26, 2024
c473195
doc store card view
HenryHengZJ Apr 27, 2024
bf22598
Update store files to use updated storage functions, Document Store T…
vinodkiran Apr 27, 2024
4fb1e72
ui changes
HenryHengZJ Apr 29, 2024
029f7cf
Merge branch 'main' into feature/Datasource
HenryHengZJ Apr 29, 2024
68037e4
add expanded chunk dialog, improve ui
HenryHengZJ Apr 30, 2024
b180f25
change throw Error to InternalError
HenryHengZJ Apr 30, 2024
5da5044
Bug Fixes and removal of subFolder, adding of view chunks for store
vinodkiran Apr 30, 2024
235a124
lint fixes
vinodkiran Apr 30, 2024
dae134d
Merge remote-tracking branch 'origin/FEATURE/Datasource' into FEATURE…
vinodkiran Apr 30, 2024
389fa88
merge changes
vinodkiran Apr 30, 2024
e2b4daa
DocumentStoreStatus component
vinodkiran Apr 30, 2024
56de0f8
ui changes for doc store
HenryHengZJ Apr 30, 2024
f073ef6
add remove metadata key field, add custom document loader
HenryHengZJ May 1, 2024
ba6b951
add chatflows used doc store chips
HenryHengZJ May 1, 2024
71324c5
add types/interfaces to DocumentStore Services
vinodkiran May 1, 2024
ac19f7f
document loader list dialog title bar color change
HenryHengZJ May 1, 2024
353af9a
update interfaces
HenryHengZJ May 1, 2024
85becf4
Whereused Chatflow Name and Added chunkNo to retain order of created …
vinodkiran May 2, 2024
9f462f6
Merge branch 'main' into feature/Datasource
HenryHengZJ May 2, 2024
1e0f964
use typeorm order chunkNo, ui changes
HenryHengZJ May 2, 2024
3308cb1
Merge branch 'main' into feature/Datasource
HenryHengZJ May 2, 2024
5301db5
Merge branch 'main' into feature/Datasource
HenryHengZJ May 6, 2024
@@ -0,0 +1,100 @@
import { ICommonObject, IDatabaseEntity, INode, INodeData, INodeOptionsValue, INodeOutputsValue, INodeParams } from '../../../src/Interface'
import { DataSource } from 'typeorm'
import { Document } from '@langchain/core/documents'

class DocStore_DocumentLoaders implements INode {
    label: string
    name: string
    version: number
    description: string
    type: string
    icon: string
    category: string
    baseClasses: string[]
    inputs: INodeParams[]
    outputs: INodeOutputsValue[]
    badge: string

    constructor() {
        this.label = 'Document Store'
        this.name = 'documentStore'
        this.version = 1.0
        this.type = 'Document'
        this.icon = 'dstore.svg'
        this.badge = 'NEW'

        this.category = 'Document Loaders'
        this.description = `Load data from pre-configured document stores`
        this.baseClasses = [this.type]
        this.inputs = [
            {
                label: 'Select Store',
                name: 'selectedStore',
                type: 'asyncOptions',
                loadMethod: 'listStores'
            }
        ]
        this.outputs = [
            {
                label: 'Document',
                name: 'document',
                description: 'Array of document objects containing metadata and pageContent',
                baseClasses: [...this.baseClasses, 'json']
            },
            {
                label: 'Text',
                name: 'text',
                description: 'Concatenated string from pageContent of documents',
                baseClasses: ['string', 'json']
            }
        ]
    }

    //@ts-ignore
    loadMethods = {
        async listStores(_: INodeData, options: ICommonObject): Promise<INodeOptionsValue[]> {
            const returnData: INodeOptionsValue[] = []

            const appDataSource = options.appDataSource as DataSource
            const databaseEntities = options.databaseEntities as IDatabaseEntity

            if (appDataSource === undefined || !appDataSource) {
                return returnData
            }

            const stores = await appDataSource.getRepository(databaseEntities['DocumentStore']).find()
            for (const store of stores) {
                if (store.status === 'SYNC') {
                    const obj = {
                        name: store.id,
                        label: store.name,
                        description: store.description
                    }
                    returnData.push(obj)
                }
            }
            return returnData
        }
    }

    async init(nodeData: INodeData, _: string, options: ICommonObject): Promise<any> {
        // if (options.isUpsert) {
        const selectedStore = nodeData.inputs?.selectedStore as string
        const appDataSource = options.appDataSource as DataSource
        const databaseEntities = options.databaseEntities as IDatabaseEntity
        const chunks = await appDataSource
            .getRepository(databaseEntities['DocumentStoreFileChunk'])
            .find({ where: { storeId: selectedStore } })

        const finalDocs = []
        for (const chunk of chunks) {
            finalDocs.push(new Document({ pageContent: chunk.pageContent, metadata: JSON.parse(chunk.metadata) }))
        }
        return finalDocs
        // }
        //
        // return {}
    }
}

module.exports = { nodeClass: DocStore_DocumentLoaders }
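Reviewer sketch (not part of the diff): the `init` method above turns stored chunk rows into documents by calling `JSON.parse` on each row's `metadata` string, and the node declares a `Text` output described as a concatenation of `pageContent`. The sketch below shows one way that conversion could be written defensively. `ChunkRow`, `PlainDocument`, `chunksToDocuments` and `documentsToText` are hypothetical names introduced here, the plain object stands in for the LangChain `Document` class, and the newline separator is an assumption.

```typescript
// Stand-in for a stored chunk row; only the two fields used by init() are modelled.
interface ChunkRow {
    pageContent: string
    metadata: string // JSON string, as stored in the chunk table
}

// Stand-in for the LangChain Document shape (assumption for this sketch).
interface PlainDocument {
    pageContent: string
    metadata: Record<string, unknown>
}

// Hypothetical helper: parse metadata defensively so a single malformed row
// does not abort loading the whole store.
function chunksToDocuments(chunks: ChunkRow[]): PlainDocument[] {
    return chunks.map((chunk) => {
        let metadata: Record<string, unknown> = {}
        try {
            const parsed: unknown = JSON.parse(chunk.metadata)
            // Only accept plain JSON objects as metadata.
            if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed)) {
                metadata = parsed as Record<string, unknown>
            }
        } catch {
            // Keep the empty metadata object when the stored string is not valid JSON.
        }
        return { pageContent: chunk.pageContent, metadata }
    })
}

// Hypothetical helper for the 'Text' output: join pageContent values with newlines.
function documentsToText(docs: PlainDocument[]): string {
    return docs.map((doc) => doc.pageContent).join('\n')
}
```

Whether a malformed row should be skipped, kept with empty metadata (as here), or reported as an error is a design choice the diff does not settle.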
15 changes: 15 additions & 0 deletions packages/components/nodes/documentloaders/DocumentStore/dstore.svg
154 changes: 154 additions & 0 deletions packages/components/src/documentStoreProcessor.ts
@@ -0,0 +1,154 @@
import { CharacterTextSplitter, MarkdownTextSplitter, RecursiveCharacterTextSplitter } from 'langchain/text_splitter'
import path from 'path'
import { PDFLoader } from 'langchain/document_loaders/fs/pdf'
import { TextLoader } from 'langchain/document_loaders/fs/text'
import fs from 'fs'
import { TextSplitter } from 'langchain/dist/text_splitter'
import { DocxLoader } from 'langchain/document_loaders/fs/docx'

export class DocumentStoreProcessor {
    constructor() {}
    public async splitIntoChunks(id: string, config: any, uploadedFiles: any[]): Promise<object> {
        let splitter: TextSplitter
        switch (config.splitter) {
            case 'character-splitter':
                splitter = new CharacterTextSplitter({
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50,
                    separator: config.separator ?? '\n'
                })

                break
            case 'recursive-splitter':
                splitter = new RecursiveCharacterTextSplitter({
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50
                })

                break
            case 'token-splitter':
                break
            case 'code-splitter':
                splitter = RecursiveCharacterTextSplitter.fromLanguage(config.codeLanguage, {
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50
                })
                break
            case 'html-to-markdown-splitter':
                break
            case 'markdown-splitter':
                splitter = new MarkdownTextSplitter({
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50
                })
                break
        }

        for (let i = 0; i < uploadedFiles.length; i++) {
            let fileObj = uploadedFiles[i]
            if (fileObj.status === 'NEW') {
                const filename = fileObj.name
                let loader = null
                const blob = await fs.openAsBlob(fileObj.path)
                switch (path.extname(filename)) {
                    case '.pdf':
                        loader = new PDFLoader(blob)
                        break
                    default:
                        loader = new TextLoader(blob)
                        break
                }
                // @ts-ignore
                if (splitter && loader) {
                    const docs = await loader.loadAndSplit(splitter)
                    fileObj.chunks = docs
                    fileObj.totalChunks = docs.length
                }
            }
        }

        return { id: id, uploadedFiles: uploadedFiles }
    }

    public async split(config: any, fileObj: any): Promise<object> {
        let splitter: TextSplitter
        switch (config.splitter) {
            case 'character-splitter':
                splitter = new CharacterTextSplitter({
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50,
                    separator: config.separator ?? '\n'
                })

                break
            case 'recursive-splitter':
                splitter = new RecursiveCharacterTextSplitter({
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50
                })

                break
            case 'token-splitter':
                break
            case 'code-splitter':
                splitter = RecursiveCharacterTextSplitter.fromLanguage(config.codeLanguage, {
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50
                })
                break
            case 'html-to-markdown-splitter':
                break
            case 'markdown-splitter':
                splitter = new MarkdownTextSplitter({
                    chunkSize: config.chunkSize ?? 1000,
                    chunkOverlap: config.chunkOverlap ?? 50
                })
                break
        }

        const filename = fileObj.name
        let loader = null
        const blob = await fs.openAsBlob(fileObj.path)
        switch (path.extname(filename)) {
            case '.pdf': {
                let legacy = config.pdfLegacyBuild
                if (config.pdfUsage === 'perFile') {
                    loader = new PDFLoader(blob, {
                        splitPages: false,
                        pdfjs: () =>
                            // @ts-ignore
                            legacy ? import('pdfjs-dist/legacy/build/pdf.js') : import('pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js')
                    })
                } else {
                    loader = new PDFLoader(blob, {
                        pdfjs: () =>
                            // @ts-ignore
                            legacy ? import('pdfjs-dist/legacy/build/pdf.js') : import('pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js')
                    })
                }
                break
            }
            case '.docx': {
                loader = new DocxLoader(blob)
                break
            }
            default:
                loader = new TextLoader(blob)
                break
        }
        let chunks: any[] = []
        // @ts-ignore
        if (splitter && loader) {
            chunks = await loader.loadAndSplit(splitter)
        }
        const totalChunks = chunks.length
        // if -1, return all chunks
        if (config.previewChunkCount === -1) config.previewChunkCount = totalChunks
        // return all docs if the user ask for more than we have
        if (totalChunks <= config.previewChunkCount) config.previewChunkCount = totalChunks
        // return only the first n chunks
        if (totalChunks > config.previewChunkCount) chunks = chunks.slice(0, config.previewChunkCount)

        return { chunks: chunks, totalChunks: totalChunks, previewChunkCount: config.previewChunkCount }
    }
}
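Reviewer sketch (not part of the diff): `split` resolves `chunkSize` and `chunkOverlap` with defaults of 1000 and 50, then clamps `previewChunkCount` with three consecutive `if` statements that also mutate `config`. The same rules could be expressed as small pure functions, roughly as below. `SplitConfig`, `resolveChunkOptions`, `resolvePreviewCount` and `previewChunks` are hypothetical names introduced for illustration, and the sketch deliberately leaves the caller's config untouched, which differs from the diff.

```typescript
// Hypothetical, narrowed view of the `config: any` argument used in the diff.
interface SplitConfig {
    chunkSize?: number
    chunkOverlap?: number
    previewChunkCount: number
}

// Apply the same defaults the diff uses (1000 characters, 50 overlap).
function resolveChunkOptions(config: SplitConfig): { chunkSize: number; chunkOverlap: number } {
    return {
        chunkSize: config.chunkSize ?? 1000,
        chunkOverlap: config.chunkOverlap ?? 50
    }
}

// -1 means "all chunks"; any other request is capped at the number available.
function resolvePreviewCount(requested: number, totalChunks: number): number {
    if (requested === -1) return totalChunks
    return Math.min(requested, totalChunks)
}

// Return the preview slice together with the counts, without mutating the input.
function previewChunks<T>(chunks: T[], requested: number): { chunks: T[]; totalChunks: number; previewChunkCount: number } {
    const totalChunks = chunks.length
    const previewChunkCount = resolvePreviewCount(requested, totalChunks)
    return { chunks: chunks.slice(0, previewChunkCount), totalChunks, previewChunkCount }
}
```

For example, `previewChunks(['a', 'b', 'c'], 2)` keeps the first two chunks and reports `totalChunks` as 3. The splitter `switch` itself appears twice in the class, so a shared factory along similar lines might remove that duplication, but that would depend on the LangChain imports and is not sketched here.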
1 change: 1 addition & 0 deletions packages/components/src/index.ts
@@ -7,3 +7,4 @@ dotenv.config({ path: envPath, override: true })
export * from './Interface'
export * from './utils'
export * from './speechToText'
export * from './documentStoreProcessor'
29 changes: 29 additions & 0 deletions packages/server/src/Interface.ts
@@ -242,3 +242,32 @@ export interface IUploadFileSizeAndTypes {
    fileTypes: string[]
    maxUploadSize: number
}

// DocumentStore related
export enum DocumentStoreStatus {
    EMPTY_SYNC = 'EMPTY',
    SYNC = 'SYNC',
    SYNCING = 'SYNCING',
    STALE = 'STALE',
    NEW = 'NEW'
}

export interface IDocumentStore {
    id: string
    name: string
    description: string
    subFolder: string
    files: string // JSON string
    metrics: string // JSON string
    whereUsed: string // JSON string
    updatedDate: Date
    createdDate: Date
    status: DocumentStoreStatus
}
export interface IDocumentStoreFileChunk {
    id: string
    docId: string
    storeId: string
    pageContent: string
    metadata: string
}
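Reviewer sketch (not part of the diff): the loader node's `listStores` compares `store.status` against the string literal `'SYNC'`, while the server defines that value in `DocumentStoreStatus`. The sketch below shows how the filter could lean on the shared status values instead of a bare string. To keep the example runnable without enum support, the statuses are mirrored as a const object. `STORE_STATUS`, `StoreRow`, `StoreOption` and `toSelectableOptions` are hypothetical names, and only the fields that `listStores` reads are modelled.

```typescript
// Mirror of the DocumentStoreStatus values from the diff, as a const object
// (an assumption made so the sketch does not depend on TypeScript enum emit).
const STORE_STATUS = {
    EMPTY_SYNC: 'EMPTY',
    SYNC: 'SYNC',
    SYNCING: 'SYNCING',
    STALE: 'STALE',
    NEW: 'NEW'
} as const

type StoreStatus = (typeof STORE_STATUS)[keyof typeof STORE_STATUS]

// Hypothetical subset of IDocumentStore: only the fields listStores reads.
interface StoreRow {
    id: string
    name: string
    description: string
    status: StoreStatus
}

// Shape of the options pushed into returnData by listStores.
interface StoreOption {
    name: string
    label: string
    description: string
}

// Keep only fully synced stores and map them to dropdown options.
function toSelectableOptions(stores: StoreRow[]): StoreOption[] {
    return stores
        .filter((store) => store.status === STORE_STATUS.SYNC)
        .map((store) => ({ name: store.id, label: store.name, description: store.description }))
}
```

Note that the `EMPTY_SYNC` key maps to the value `'EMPTY'`, so any code comparing raw strings has to use `'EMPTY'`, not `'EMPTY_SYNC'`.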