Squashed commit of the following:
commit 8769619
Author: Jin Hai <[email protected]>
Date:   Sun May 12 13:40:47 2024 +0800

    Update readme (infiniflow#741)

    ### What problem does this PR solve?

    Update readme.

    ### Type of change

    - [x] Documentation Update

    Signed-off-by: Jin Hai <[email protected]>

commit ffe5737
Author: KevinHuSh <[email protected]>
Date:   Sat May 11 19:47:53 2024 +0800

    Build the index in batches. (infiniflow#733)

    ### What problem does this PR solve?

    Build the index in batches.

    ### Type of change

    - [x] Refactoring

commit 04a9e95
Author: KevinHuSh <[email protected]>
Date:   Sat May 11 16:04:28 2024 +0800

    let file in knowledgebases visible in file manager (infiniflow#714)

    ### What problem does this PR solve?

    Let file in knowledgebases visible in file manager.
    infiniflow#162

    ### Type of change

    - [x] New Feature (non-breaking change which adds functionality)

commit 91b4a18
Author: balibabu <[email protected]>
Date:   Sat May 11 16:03:07 2024 +0800

    Make the app name configurable even after the project is built (infiniflow#731)

    ### What problem does this PR solve?

    Make the app name configurable even after the project is built infiniflow#730

    ### Type of change

    - [x] New Feature (non-breaking change which adds functionality)

commit 33eaf6f
Author: Ikko Eltociear Ashimine <[email protected]>
Date:   Fri May 10 12:22:40 2024 +0900

    docs: update README_ja.md (infiniflow#707)

    ### What problem does this PR solve?

    _Briefly describe what this PR aims to solve. Include background context
    that will help reviewers understand the purpose of the PR._

    ### Type of change

    - [x] Documentation Update

commit d65ba3e
Author: balibabu <[email protected]>
Date:   Fri May 10 10:38:39 2024 +0800

    feat: delete the added model infiniflow#503 and display an error message when the requested file fails to parse infiniflow#684  (infiniflow#708)

    ### What problem does this PR solve?

    feat: delete the added model infiniflow#503
    feat: display an error message when the requested file fails to parse
    infiniflow#684

    ### Type of change

    - [x] New Feature (non-breaking change which adds functionality)

commit bef1bbd
Author: CKLogic <[email protected]>
Date:   Fri May 10 09:48:50 2024 +0800

    Update README with Detailed WebUI Service Launch Instructions (infiniflow#694)

    ### What problem does this PR solve?

    Improve README by detailing Launch Service from Source section

    This commit enhances the README document by adding comprehensive steps
    for running the WebUI service in the 'Launch Service from Source'
    section. It aims to provide clearer guidance for users attempting to
    start the service from the source code, making the setup process more
    accessible and understandable.

    Key changes include:
    - Detailed instructions for setting up and running the WebUI service.
    - Necessary prerequisites for launching the service from source.

    This update ensures that users have all the information they need to
    successfully launch the service, improving the overall usability of our
    project.

    ### Type of change

    - [x] Documentation Update

commit 6b36f31
Author: writinwaters <[email protected]>
Date:   Fri May 10 09:48:24 2024 +0800

    Minor editorial updates (infiniflow#700)

    ### What problem does this PR solve?

    Editorial updates only.

    ### Type of change

    - [x] Documentation Update

commit 648a2ba
Author: KevinHuSh <[email protected]>
Date:   Thu May 9 15:32:24 2024 +0800

    fix disabled doc is still retrievable (infiniflow#695)

    ### What problem does this PR solve?

    Fix: a disabled doc was still retrievable.

    ### Type of change

    - [x] Bug Fix (non-breaking change which fixes an issue)

commit 9392b8b
Author: writinwaters <[email protected]>
Date:   Thu May 9 12:37:45 2024 +0800

    0509 faq (infiniflow#693)

    ### What problem does this PR solve?

    Editorial updates only.

    ### Type of change

    - [x] Documentation Update

commit 4153a36
Author: KevinHuSh <[email protected]>
Date:   Thu May 9 11:35:08 2024 +0800

    truncate text to fit in embedding model (infiniflow#692)

    ### What problem does this PR solve?

    ### Type of change

    - [x] Refactoring

commit bca63ad
Author: GYH <[email protected]>
Date:   Thu May 9 11:32:36 2024 +0800

    Update faq.md (infiniflow#685)

    ### What problem does this PR solve?

    Updated FAQ: How to upgrade RAGFlow

    ### Type of change

    - [x] Documentation Update

commit 793e29f
Author: balibabu <[email protected]>
Date:   Thu May 9 11:30:15 2024 +0800

    fix: fix uploaded file time error infiniflow#680 (infiniflow#690)

    ### What problem does this PR solve?

    fix: fix uploaded file time error infiniflow#680
    feat: support preview of word and excel infiniflow#684

    ### Type of change

    - [x] Bug Fix (non-breaking change which fixes an issue)

commit 99be226
Author: KevinHuSh <[email protected]>
Date:   Wed May 8 20:00:14 2024 +0800

    fix coordinate error (infiniflow#686)

    ### What problem does this PR solve?

    infiniflow#683

    ### Type of change

    - [x] Bug Fix (non-breaking change which fixes an issue)

commit 7ddb2f1
Author: KevinHuSh <[email protected]>
Date:   Wed May 8 15:20:45 2024 +0800

    make sure to raise exception if redis is not there (infiniflow#674)

    ### What problem does this PR solve?

    ### Type of change

    - [x] Refactoring

commit c28f7b5
Author: KevinHuSh <[email protected]>
Date:   Wed May 8 13:58:41 2024 +0800

    make sure  the error will be recorded. (infiniflow#672)

    ### What problem does this PR solve?

    ### Type of change

    - [x] Refactoring
fkzhao committed May 12, 2024
1 parent 4b5e2e9 commit b9faab9
Showing 70 changed files with 4,106 additions and 4,781 deletions.
49 changes: 37 additions & 12 deletions README.md
@@ -28,6 +28,17 @@

 [RAGFlow](https://demo.ragflow.io) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

+## 📌 Latest Updates
+
+- 2024-05-08 Integrates LLM DeepSeek-V2.
+- 2024-04-26 Adds file management.
+- 2024-04-19 Supports conversation API ([detail](./docs/conversation_api.md)).
+- 2024-04-16 Integrates an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding), and [FastEmbed](https://github.com/qdrant/fastembed), which is designed specifically for light and speedy embedding.
+- 2024-04-11 Supports [Xinference](./docs/xinference.md) for local LLM deployment.
+- 2024-04-10 Adds a new layout recognition model for analyzing legal documents.
+- 2024-04-08 Supports [Ollama](./docs/ollama.md) for local LLM deployment.
+- 2024-04-07 Supports Chinese UI.
+
 ## 🌟 Key Features

 ### 🍭 **"Quality in, quality out"**
@@ -56,17 +67,6 @@
 - Multiple recall paired with fused re-ranking.
 - Intuitive APIs for seamless integration with business.

-## 📌 Latest Features
-
-- 2024-05-08 Integrates LLM DeepSeek.
-- 2024-04-26 Adds file management.
-- 2024-04-19 Supports conversation API ([detail](./docs/conversation_api.md)).
-- 2024-04-16 Integrates an embedding model 'bce-embedding-base_v1' from [BCEmbedding](https://github.com/netease-youdao/BCEmbedding), and [FastEmbed](https://github.com/qdrant/fastembed), which is designed specifically for light and speedy embedding.
-- 2024-04-11 Supports [Xinference](./docs/xinference.md) for local LLM deployment.
-- 2024-04-10 Adds a new layout recognition model for analyzing Laws documentation.
-- 2024-04-08 Supports [Ollama](./docs/ollama.md) for local LLM deployment.
-- 2024-04-07 Supports Chinese UI.
-
 ## 🔎 System Architecture

 <div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -114,12 +114,14 @@

 3. Build the pre-built Docker images and start up the server:

+> Running the following commands automatically downloads the *dev* version RAGFlow Docker image. To download and run a specified Docker version, update `RAGFLOW_VERSION` in **docker/.env** to the intended version, for example `RAGFLOW_VERSION=v0.5.0`, before running the following commands.
 ```bash
 $ cd ragflow/docker
 $ chmod +x ./entrypoint.sh
 $ docker compose up -d
 ```
-> Please note that running the above commands will automatically download the development version docker image of RAGFlow. If you want to download and run a specific version of docker image, please find the RAGFLOW_VERSION variable in the docker/.env file, change it to the corresponding version, for example, RAGFLOW_VERSION=v0.5.0, and run the above commands.


 > The core image is about 9 GB in size and may take a while to load.
@@ -247,6 +249,29 @@ $ chmod +x ./entrypoint.sh
 $ bash ./entrypoint.sh
 ```

+7. Start the WebUI service
+```bash
+$ cd web
+$ npm install --registry=https://registry.npmmirror.com --force
+$ vim .umirc.ts
+# Modify proxy.target to 127.0.0.1:9380
+$ npm run dev
+```
+
+8. Deploy the WebUI service
+```bash
+$ cd web
+$ npm install --registry=https://registry.npmmirror.com --force
+$ umi build
+$ mkdir -p /ragflow/web
+$ cp -r dist /ragflow/web
+$ apt install nginx -y
+$ cp ../docker/nginx/proxy.conf /etc/nginx
+$ cp ../docker/nginx/nginx.conf /etc/nginx
+$ cp ../docker/nginx/ragflow.conf /etc/nginx/conf.d
+$ systemctl start nginx
+```
+
 ## 📚 Documentation

 - [FAQ](./docs/faq.md)
25 changes: 13 additions & 12 deletions README_ja.md
@@ -28,6 +28,19 @@

 [RAGFlow](https://demo.ragflow.io) は、深い文書理解に基づいたオープンソースの RAG (Retrieval-Augmented Generation) エンジンである。LLM(大規模言語モデル)を組み合わせることで、様々な複雑なフォーマットのデータから根拠のある引用に裏打ちされた、信頼できる質問応答機能を実現し、あらゆる規模のビジネスに適した RAG ワークフローを提供します。

+## 📌 最新情報
+
+- 2024-05-08 LLM DeepSeek-V2を統合しました。
+- 2024-04-26 「ファイル管理」機能を追加しました。
+- 2024-04-19 会話 API をサポートします ([詳細](./docs/conversation_api.md))。
+- 2024-04-16 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) から埋め込みモデル「bce-embedding-base_v1」を追加します。
+- 2024-04-16 [FastEmbed](https://github.com/qdrant/fastembed) は、軽量かつ高速な埋め込み用に設計されています。
+- 2024-04-11 ローカル LLM デプロイメント用に [Xinference](./docs/xinference.md) をサポートします。
+- 2024-04-10 メソッド「Laws」に新しいレイアウト認識モデルを追加します。
+- 2024-04-08 [Ollama](./docs/ollama.md) を使用した大規模モデルのローカライズされたデプロイメントをサポートします。
+- 2024-04-07 中国語インターフェースをサポートします。
+
+
 ## 🌟 主な特徴

 ### 🍭 **"Quality in, quality out"**
@@ -56,18 +69,6 @@
 - 複数の想起と融合された再ランク付け。
 - 直感的な API によってビジネスとの統合がシームレスに。

-## 📌 最新の機能
-
-- 2024-05-08
-- 2024-04-26 「ファイル管理」機能を追加しました。
-- 2024-04-19 会話 API をサポートします ([詳細](./docs/conversation_api.md))。
-- 2024-04-16 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) から埋め込みモデル「bce-embedding-base_v1」を追加します。
-- 2024-04-16 [FastEmbed](https://github.com/qdrant/fastembed) は、軽量かつ高速な埋め込み用に設計されています。
-- 2024-04-11 ローカル LLM デプロイメント用に [Xinference](./docs/xinference.md) をサポートします。
-- 2024-04-10 メソッド「Laws」に新しいレイアウト認識モデルを追加します。
-- 2024-04-08 [Ollama](./docs/ollama.md) を使用した大規模モデルのローカライズされたデプロイメントをサポートします。
-- 2024-04-07 中国語インターフェースをサポートします。
-
 ## 🔎 システム構成

 <div align="center" style="margin-top:20px;margin-bottom:20px;">
43 changes: 32 additions & 11 deletions README_zh.md
@@ -28,6 +28,17 @@

 [RAGFlow](https://demo.ragflow.io) 是一款基于深度文档理解构建的开源 RAG(Retrieval-Augmented Generation)引擎。RAGFlow 可以为各种规模的企业及个人提供一套精简的 RAG 工作流程,结合大语言模型(LLM)针对用户各类不同的复杂格式数据提供可靠的问答以及有理有据的引用。

+## 📌 近期更新
+
+- 2024-05-08 集成大模型 DeepSeek
+- 2024-04-26 增添了'文件管理'功能.
+- 2024-04-19 支持对话 API ([更多](./docs/conversation_api.md)).
+- 2024-04-16 集成嵌入模型 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) 和 专为轻型和高速嵌入而设计的 [FastEmbed](https://github.com/qdrant/fastembed)
+- 2024-04-11 支持用 [Xinference](./docs/xinference.md) 本地化部署大模型。
+- 2024-04-10 为‘Laws’版面分析增加了底层模型。
+- 2024-04-08 支持用 [Ollama](./docs/ollama.md) 本地化部署大模型。
+- 2024-04-07 支持中文界面。
+
 ## 🌟 主要功能

 ### 🍭 **"Quality in, quality out"**
@@ -56,17 +67,6 @@
 - 基于多路召回、融合重排序。
 - 提供易用的 API,可以轻松集成到各类企业系统。

-## 📌 新增功能
-
-- 2024-05-08 集成大模型 DeepSeek
-- 2024-04-26 增添了'文件管理'功能.
-- 2024-04-19 支持对话 API ([更多](./docs/conversation_api.md)).
-- 2024-04-16 集成嵌入模型 [BCEmbedding](https://github.com/netease-youdao/BCEmbedding) 和 专为轻型和高速嵌入而设计的 [FastEmbed](https://github.com/qdrant/fastembed)
-- 2024-04-11 支持用 [Xinference](./docs/xinference.md) 本地化部署大模型。
-- 2024-04-10 为‘Laws’版面分析增加了底层模型。
-- 2024-04-08 支持用 [Ollama](./docs/ollama.md) 本地化部署大模型。
-- 2024-04-07 支持中文界面。
-
 ## 🔎 系统架构

 <div align="center" style="margin-top:20px;margin-bottom:20px;">
@@ -247,7 +247,28 @@ $ docker compose -f docker-compose-base.yml up -d
 $ chmod +x ./entrypoint.sh
 $ bash ./entrypoint.sh
 ```
+7. 启动WebUI服务
+```bash
+$ cd web
+$ npm install --registry=https://registry.npmmirror.com --force
+$ vim .umirc.ts
+# 修改proxy.target为127.0.0.1:9380
+$ npm run dev
+```

+8. 部署WebUI服务
+```bash
+$ cd web
+$ npm install --registry=https://registry.npmmirror.com --force
+$ umi build
+$ mkdir -p /ragflow/web
+$ cp -r dist /ragflow/web
+$ apt install nginx -y
+$ cp ../docker/nginx/proxy.conf /etc/nginx
+$ cp ../docker/nginx/nginx.conf /etc/nginx
+$ cp ../docker/nginx/ragflow.conf /etc/nginx/conf.d
+$ systemctl start nginx
+```
 ## 📚 技术文档

 - [FAQ](./docs/faq.md)
43 changes: 29 additions & 14 deletions api/apps/document_app.py
@@ -23,7 +23,7 @@
 from flask import request
 from flask_login import login_required, current_user

-from api.db.db_models import Task
+from api.db.db_models import Task, File
 from api.db.services.file2document_service import File2DocumentService
 from api.db.services.file_service import FileService
 from api.db.services.task_service import TaskService, queue_tasks
@@ -33,7 +33,7 @@
 from api.db.services.knowledgebase_service import KnowledgebaseService
 from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
 from api.utils import get_uuid
-from api.db import FileType, TaskStatus, ParserType
+from api.db import FileType, TaskStatus, ParserType, FileSource
 from api.db.services.document_service import DocumentService
 from api.settings import RetCode
 from api.utils.api_utils import get_json_result
@@ -59,12 +59,19 @@ def upload():
             return get_json_result(
                 data=False, retmsg='No file selected!', retcode=RetCode.ARGUMENT_ERROR)

+    e, kb = KnowledgebaseService.get_by_id(kb_id)
+    if not e:
+        raise LookupError("Can't find this knowledgebase!")
+
+    root_folder = FileService.get_root_folder(current_user.id)
+    pf_id = root_folder["id"]
+    FileService.init_knowledgebase_docs(pf_id, current_user.id)
+    kb_root_folder = FileService.get_kb_folder(current_user.id)
+    kb_folder = FileService.new_a_file_from_kb(kb.tenant_id, kb.name, kb_root_folder["id"])
+
     err = []
     for file in file_objs:
         try:
-            e, kb = KnowledgebaseService.get_by_id(kb_id)
-            if not e:
-                raise LookupError("Can't find this knowledgebase!")
             MAX_FILE_NUM_PER_USER = int(os.environ.get('MAX_FILE_NUM_PER_USER', 0))
             if MAX_FILE_NUM_PER_USER > 0 and DocumentService.get_doc_count(kb.tenant_id) >= MAX_FILE_NUM_PER_USER:
                 raise RuntimeError("Exceed the maximum file number of a free user!")
@@ -99,6 +106,8 @@ def upload():
             if re.search(r"\.(ppt|pptx|pages)$", filename):
                 doc["parser_id"] = ParserType.PRESENTATION.value
             DocumentService.insert(doc)
+
+            FileService.add_file_from_kb(doc, kb_folder["id"], kb.tenant_id)
         except Exception as e:
             err.append(file.filename + ": " + str(e))
     if err:
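
The reordering in `upload()` can be read as hoisting a loop-invariant lookup: the knowledge base is resolved once before the loop, while failures for individual files are still collected one by one. A minimal sketch of that pattern, assuming hypothetical `lookup_kb` and `store_file` callables that stand in for `KnowledgebaseService.get_by_id` and the per-file work (neither is the project's real API):

```python
from typing import Callable, Iterable, List, Tuple


def upload_all(
    kb_id: str,
    filenames: Iterable[str],
    lookup_kb: Callable[[str], Tuple[bool, object]],  # hypothetical stand-in
    store_file: Callable[[object, str], None],        # hypothetical stand-in
) -> List[str]:
    """Resolve the knowledge base once, then process files one at a time."""
    found, kb = lookup_kb(kb_id)  # hoisted out of the loop: one lookup per request
    if not found:
        raise LookupError("Can't find this knowledgebase!")

    errors: List[str] = []
    for name in filenames:
        try:
            store_file(kb, name)
        except Exception as exc:  # keep going and report every failing file
            errors.append(f"{name}: {exc}")
    return errors
```

With this shape, a missing knowledge base is detected before any file is touched, rather than being caught inside the loop and reported once per file.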
@@ -228,11 +237,13 @@ def rm():
     req = request.json
     doc_ids = req["doc_id"]
     if isinstance(doc_ids, str): doc_ids = [doc_ids]
+    root_folder = FileService.get_root_folder(current_user.id)
+    pf_id = root_folder["id"]
+    FileService.init_knowledgebase_docs(pf_id, current_user.id)
     errors = ""
     for doc_id in doc_ids:
         try:
             e, doc = DocumentService.get_by_id(doc_id)
-
             if not e:
                 return get_data_error_result(retmsg="Document not found!")
             tenant_id = DocumentService.get_tenant_id(doc_id)
@@ -241,21 +252,25 @@ def rm():

             ELASTICSEARCH.deleteByQuery(
                 Q("match", doc_id=doc.id), idxnm=search.index_name(tenant_id))
-            DocumentService.increment_chunk_num(
-                doc.id, doc.kb_id, doc.token_num * -1, doc.chunk_num * -1, 0)
+
+            DocumentService.clear_chunk_num(doc_id)
+            b, n = File2DocumentService.get_minio_address(doc_id=doc_id)
+
             if not DocumentService.delete(doc):
                 return get_data_error_result(
                     retmsg="Database error (Document removal)!")

-            informs = File2DocumentService.get_by_document_id(doc_id)
-            if not informs:
-                MINIO.rm(doc.kb_id, doc.location)
-            else:
-                File2DocumentService.delete_by_document_id(doc_id)
+            f2d = File2DocumentService.get_by_document_id(doc_id)
+            FileService.filter_delete([File.source_type == FileSource.KNOWLEDGEBASE, File.id == f2d[0].file_id])
+            File2DocumentService.delete_by_document_id(doc_id)
+
+            MINIO.rm(b, n)
         except Exception as e:
             errors += str(e)

-    if errors: return server_error_response(e)
+    if errors:
+        return get_json_result(data=False, retmsg=errors, retcode=RetCode.SERVER_ERROR)
+
     return get_json_result(data=True)


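
One plausible reason for replacing `server_error_response(e)` in `rm()` is a Python 3 scoping rule: the name bound by `except ... as e` is deleted when the `except` block ends, so reading `e` afterwards fails. The sketch below reproduces both shapes with hypothetical `report_old` and `report_new` helpers that parse integers instead of deleting documents; it illustrates the language behaviour only, not the project's code.

```python
def report_old(items):
    """Old shape: reads `e` after the except block has ended."""
    errors = ""
    for item in items:
        try:
            int(item)
        except Exception as e:
            errors += str(e)
    if errors:
        return str(e)  # `e` was unbound when the except block finished
    return "ok"


def report_new(items):
    """New shape: returns the accumulated message text instead."""
    errors = ""
    for item in items:
        try:
            int(item)
        except Exception as e:
            errors += str(e)
    if errors:
        return errors
    return "ok"
```

Calling `report_old(["x"])` raises `UnboundLocalError` (a subclass of `NameError`), whereas `report_new(["x"])` returns the collected message.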
13 changes: 8 additions & 5 deletions api/apps/file_app.py
@@ -26,7 +26,7 @@
 from api.db.services.file2document_service import File2DocumentService
 from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
 from api.utils import get_uuid
-from api.db import FileType
+from api.db import FileType, FileSource
 from api.db.services import duplicate_name
 from api.db.services.file_service import FileService
 from api.settings import RetCode
@@ -45,7 +45,7 @@ def upload():

     if not pf_id:
         root_folder = FileService.get_root_folder(current_user.id)
-        pf_id = root_folder.id
+        pf_id = root_folder["id"]

     if 'file' not in request.files:
         return get_json_result(
@@ -132,7 +132,7 @@ def create():
     input_file_type = request.json.get("type")
     if not pf_id:
         root_folder = FileService.get_root_folder(current_user.id)
-        pf_id = root_folder.id
+        pf_id = root_folder["id"]

     try:
         if not FileService.is_parent_folder_exist(pf_id):
@@ -176,7 +176,8 @@ def list():
     desc = request.args.get("desc", True)
     if not pf_id:
         root_folder = FileService.get_root_folder(current_user.id)
-        pf_id = root_folder.id
+        pf_id = root_folder["id"]
+        FileService.init_knowledgebase_docs(pf_id, current_user.id)
     try:
         e, file = FileService.get_by_id(pf_id)
         if not e:
@@ -199,7 +200,7 @@ def list():
 def get_root_folder():
     try:
         root_folder = FileService.get_root_folder(current_user.id)
-        return get_json_result(data={"root_folder": root_folder.to_json()})
+        return get_json_result(data={"root_folder": root_folder})
     except Exception as e:
         return server_error_response(e)

@@ -250,6 +251,8 @@ def rm():
                 return get_data_error_result(retmsg="File or Folder not found!")
             if not file.tenant_id:
                 return get_data_error_result(retmsg="Tenant not found!")
+            if file.source_type == FileSource.KNOWLEDGEBASE:
+                continue

             if file.type == FileType.FOLDER.value:
                 file_id_list = FileService.get_all_innermost_file_ids(file_id, [])
10 changes: 10 additions & 0 deletions api/apps/llm_app.py
@@ -142,6 +142,16 @@ def add_llm():
     return get_json_result(data=True)


+@manager.route('/delete_llm', methods=['POST'])
+@login_required
+@validate_request("llm_factory", "llm_name")
+def delete_llm():
+    req = request.json
+    TenantLLMService.filter_delete(
+        [TenantLLM.tenant_id == current_user.id, TenantLLM.llm_factory == req["llm_factory"], TenantLLM.llm_name == req["llm_name"]])
+    return get_json_result(data=True)
+
+
 @manager.route('/my_llms', methods=['GET'])
 @login_required
 def my_llms():
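
The new `delete_llm` endpoint passes three conditions to `filter_delete`. Assuming those conditions are combined with AND (the diff does not show `filter_delete` itself), only the row that matches tenant, factory and model name together is removed. A minimal in-memory sketch of that behaviour follows; `LLMRow` and `delete_llm_rows` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LLMRow:  # hypothetical stand-in for a TenantLLM record
    tenant_id: str
    llm_factory: str
    llm_name: str


def delete_llm_rows(
    rows: List[LLMRow], tenant_id: str, llm_factory: str, llm_name: str
) -> List[LLMRow]:
    """Keep every row except those matching all three fields (AND semantics assumed)."""
    target = (tenant_id, llm_factory, llm_name)
    return [r for r in rows if (r.tenant_id, r.llm_factory, r.llm_name) != target]
```

Under that assumption, a request naming a model that another tenant also registered leaves the other tenant's row untouched, because `tenant_id` comes from `current_user.id` rather than from the request body.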
8 changes: 8 additions & 0 deletions api/db/__init__.py
@@ -83,3 +83,11 @@ class ParserType(StrEnum):
     NAIVE = "naive"
     PICTURE = "picture"
     ONE = "one"
+
+
+class FileSource(StrEnum):
+    LOCAL = ""
+    KNOWLEDGEBASE = "knowledgebase"
+    S3 = "s3"
+
+KNOWLEDGEBASE_FOLDER_NAME=".knowledgebase"
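
`FileSource.LOCAL` is the empty string, which has a consequence worth keeping in mind: a string-backed enum member follows `str` truthiness, so `LOCAL` is falsy. The sketch below uses the standard library's `enum.Enum` with a `str` mixin as a stand-in for the project's `StrEnum` (an assumption, since the diff does not show where `StrEnum` is imported from), plus a hypothetical `deletable` helper that mirrors the skip added in `file_app.py`.

```python
from enum import Enum


class FileSource(str, Enum):  # stand-in for the StrEnum-based class in the diff
    LOCAL = ""
    KNOWLEDGEBASE = "knowledgebase"
    S3 = "s3"


def deletable(source_types):
    """Hypothetical helper: drop entries that originate from a knowledge base."""
    return [s for s in source_types if s != FileSource.KNOWLEDGEBASE]


# Members compare equal to their plain string values.
is_kb = FileSource.KNOWLEDGEBASE == "knowledgebase"

# LOCAL is the empty string, so it is falsy like any empty str.
local_is_truthy = bool(FileSource.LOCAL)
```

A check such as `if file.source_type:` would therefore treat local files as having no source at all; comparing against a specific member, as the diff does, avoids that ambiguity.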