Refine README #118

Merged 3 commits on Mar 11, 2024
104 changes: 88 additions & 16 deletions README.md
@@ -1,14 +1,64 @@
English | [简体中文](./README_zh.md)
<div align="center">
<a href="https://ragflow.io/">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/f034fb27-b3bf-401b-b213-e1dfa7448d2a" width="320" alt="ragflow logo">
</a>
</div>


## System Environment Preparation
<p align="center">
<a href="./README.md">English</a> |
<a href="./README_zh.md">简体中文</a>
</p>

### Install docker
<p align="center">
<a href="https://ragflow.io" target="_blank">
<img alt="Static Badge" src="https://img.shields.io/badge/RAGFLOW-LLM-white?&labelColor=dd0af7"></a>
<a href="https://hub.docker.com/r/infiniflow/ragflow" target="_blank">
<img src="https://img.shields.io/badge/docker_pull-ragflow:v1.0-brightgreen"
alt="docker pull ragflow:v1.0"></a>
<a href="https://github.com/infiniflow/ragflow/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
</a>
</p>

If your machine doesn't have *Docker* installed, please refer to [Install Docker Engine](https://docs.docker.com/engine/install/)
[RAGFLOW](http://ragflow.io) is a knowledge management platform built on a custom-built document understanding engine and LLMs.
It gives reasoned, well-founded answers to your questions. Clone this repository to deploy your own knowledge management
platform and empower your business with AI.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b24a7a5f-4d1d-4a30-90b1-7b0ec558b79d" width="1000"/>
</div>

# Features
- **Custom-built document understanding engine.** Our deep learning engine is designed for analyzing and searching various types of documents across different domains.
- For documents from different domains and with different purposes, the engine applies different analysis and search strategies.
- Easily intervene in and adjust the data processing procedure when results fall short of expectations.
- Multimedia document understanding is supported using OCR and multi-modal LLMs.
- **State-of-the-art table structure and layout recognition.** Precisely extract and understand documents, including table content. [README](./deepdoc/README.md)
- For PDF files, layout and table structures, including rows, columns and their spans, are recognized.
- Tables that span multiple pages are merged.
- Table structure components are reconstructed into HTML tables.
- **Querying database dump data is supported.** After uploading tables from any database, you can search any data record just by asking.
- Instead of using SQL to query a database, everyone can get the data they want just by asking in natural language.
- The number of uploaded records is not limited.
- Some extra descriptions of the column headers should be provided.
- **Reasoned and well-founded answers.** The part of the document cited in the LLM's answer is provided and pointed out in the original document.
- The answers are based on retrieved results, to which we apply vector-keyword hybrid search and reranking.
- The part of the document cited in the answer is presented in the most expressive way.
- For PDF files, the cited parts can be located in the original PDF.


### OS Setups
Firstly, you need to check the following command:
# Release Notification
**Star us on GitHub, and be notified of new releases instantly!**
![star-us](https://github.com/langgenius/dify/assets/100913391/95f37259-7370-4456-a9f0-0bc01ef8642f)

# Installation
## System Requirements
Check the minimum system requirements before starting the installation.
- CPU >= 2 cores
- RAM >= 8GB
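
The minimums above can be checked from a script before installing. The following is a minimal sketch, not part of RAGFlow: the names `meets_minimum` and `detect_resources` are hypothetical, "8GB" is interpreted as 8 GiB (an assumption), and the RAM lookup relies on `os.sysconf` names that are available on Linux.

```python
import os

MIN_CPU_CORES = 2             # from the requirements listed above
MIN_RAM_BYTES = 8 * 1024**3   # "8GB" interpreted as 8 GiB (assumption)


def meets_minimum(cpu_cores: int, ram_bytes: int) -> bool:
    """Return True when the given resources satisfy the stated minimums."""
    return cpu_cores >= MIN_CPU_CORES and ram_bytes >= MIN_RAM_BYTES


def detect_resources() -> tuple[int, int]:
    """Detect CPU cores and physical RAM in bytes (Linux sysconf names)."""
    cores = os.cpu_count() or 0
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, ram
```

Calling `meets_minimum(*detect_resources())` gives a quick yes/no answer. Note that a machine sold as "8GB" may report slightly less usable memory than 8 GiB, so the threshold may need loosening in practice.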

Then run the following command and check the value of `vm.max_map_count`:
```bash
121:/ragflow# sysctl vm.max_map_count
vm.max_map_count = 262144
@@ -24,7 +74,11 @@ Add or update the following line in the file:
vm.max_map_count=262144
```
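
The same check can be done programmatically. This is a sketch under assumptions: the helper names are hypothetical, the procfs path applies to Linux only, and treating 262144 as a minimum (rather than an exact required value) is an assumption based on the value shown above.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # value shown in the README above


def parse_max_map_count(sysctl_output: str) -> int:
    """Parse a line such as 'vm.max_map_count = 262144' into an int."""
    key, _, value = sysctl_output.strip().partition("=")
    if key.strip() != "vm.max_map_count" or not value.strip().isdigit():
        raise ValueError(f"unexpected sysctl output: {sysctl_output!r}")
    return int(value.strip())


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current value directly from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def is_sufficient(value: int) -> bool:
    """Assumption: any value at or above 262144 is acceptable."""
    return value >= REQUIRED_MAX_MAP_COUNT
```

For example, `is_sufficient(parse_max_map_count("vm.max_map_count = 262144"))` evaluates to `True`.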

## Here we go!
## Install docker

If your machine doesn't have *Docker* installed, please refer to [Install Docker Engine](https://docs.docker.com/engine/install/)

## Quick Start
> If you want to change the basic settings, such as the port or password, please refer to [.env](./docker/.env) before starting the system.

> If you change anything in [.env](./docker/.env), please check [service_conf.yaml](./docker/service_conf.yaml) which is a
@@ -37,10 +91,13 @@ vm.max_map_count=262144
> [OpenAI](https://platform.openai.com/login?launch), [通义千问/QWen](https://dashscope.console.aliyun.com/model),
> [智谱AI/ZhipuAI](https://open.bigmodel.cn/)
```bash
121:/ragflow# cd docker
121:/# git clone https://github.com/infiniflow/ragflow.git
121:/# cd ragflow/docker
121:/ragflow/docker# docker compose up -d
```
After about half a minute, use the following command to check the server status. If you see the following output,
> The core image is about 15GB, so please be patient the first time you pull it.

After all the images are pulled and the containers are running, use the following command to check the server status. If you see the following output,
_**Hallelujah!**_ You have successfully launched the system.
```bash
121:/ragflow# docker logs -f ragflow-server
@@ -58,10 +115,25 @@ _**Hallelujah!**_ You have successfully launched the system.
INFO:werkzeug:Press CTRL+C to quit

```
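
The status check above can also be scripted. This is a minimal sketch, assuming the werkzeug line at the end of the log excerpt is a usable "ready" signal (the README does not state this explicitly). `server_started` and `fetch_logs` are hypothetical helper names, not part of RAGFlow.

```python
import subprocess

# Last log line shown in the excerpt above; using it as a ready marker is an assumption.
READY_MARKER = "INFO:werkzeug:Press CTRL+C to quit"


def server_started(log_text: str) -> bool:
    """Return True if the log output contains the werkzeug ready line."""
    return any(line.strip() == READY_MARKER for line in log_text.splitlines())


def fetch_logs(container: str = "ragflow-server") -> str:
    """Collect the container's logs via `docker logs` (stdout and stderr)."""
    result = subprocess.run(
        ["docker", "logs", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr
```

With Docker available, `server_started(fetch_logs())` reports whether the marker line has appeared yet.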
Open your browser and enter the IP address of your server. If you see the following in your browser, _**Hallelujah**_ again!
> The default serving port is 80. To change it, please refer to [ragflow.conf](./nginx/ragflow.conf)
> and change the *listen* value.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b24a7a5f-4d1d-4a30-90b1-7b0ec558b79d" width="1000"/>
</div>
Open your browser and enter the IP address of your server. _**Hallelujah**_ again!
> The default serving port is 80. To change it, please refer to [docker-compose.yml](./docker-compose.yaml)
> and change the left part of *'80:80'*.
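
In a Compose port mapping of the form `HOST:CONTAINER`, the left part is the port exposed on the host and the right part is the port inside the container. The sketch below illustrates why only the left part changes; the helper names are hypothetical, and it handles only the plain two-part form, not mappings with an IP address or protocol suffix.

```python
def split_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a 'HOST:CONTAINER' port mapping into two integers."""
    host, container = mapping.split(":")
    return int(host), int(container)


def with_host_port(mapping: str, new_host_port: int) -> str:
    """Return the mapping with only the host (left) port replaced."""
    _, container = split_port_mapping(mapping)
    return f"{new_host_port}:{container}"
```

For instance, `with_host_port("80:80", 8080)` yields `"8080:80"`: the service becomes reachable on host port 8080 while the container still listens on 80.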

# Configuration
If you need to change the default settings of the system when you deploy it, there are several ways to configure it.
Please refer to [README](./docker/README.md) and set the configuration manually.
After changing anything, please run *docker-compose up -d* again.

# RoadMap

- [ ] File manager.
- [ ] Support URLs. Crawl web and extract the main content.


# Contributing

For those who'd like to contribute code, see our [Contribution Guide](https://github.com/infiniflow/ragflow/blob/main/CONTRIBUTING.md).

# License

This repository is available under the [Ragflow Open Source License](LICENSE), which is essentially Apache 2.0 with a few additional restrictions.
15 changes: 5 additions & 10 deletions api/apps/llm_app.py
@@ -15,18 +15,12 @@
#
from flask import request
from flask_login import login_required, current_user

from api.db.services import duplicate_name
from api.db.services.llm_service import LLMFactoriesService, TenantLLMService, LLMService
from api.db.services.user_service import TenantService, UserTenantService
from api.utils.api_utils import server_error_response, get_data_error_result, validate_request
from api.utils import get_uuid, get_format_time
from api.db import StatusEnum, UserTenantRole, LLMType
from api.db.services.knowledgebase_service import KnowledgebaseService
from api.db.db_models import Knowledgebase, TenantLLM
from api.settings import stat_logger, RetCode
from api.db import StatusEnum, LLMType
from api.db.db_models import TenantLLM
from api.utils.api_utils import get_json_result
from rag.llm import EmbeddingModel, CvModel, ChatModel
from rag.llm import EmbeddingModel, ChatModel


@manager.route('/factories', methods=['GET'])
@@ -119,4 +113,5 @@ def list():

return get_json_result(data=res)
except Exception as e:
return server_error_response(e)
return server_error_response(e)

2 changes: 1 addition & 1 deletion api/ragflow_server.py
@@ -40,7 +40,7 @@
/_/ |_| \__,_/ \__, //_/ /_/ \____/ |__/|__/
/____/

""")
""", flush=True)
stat_logger.info(
f'project base: {utils.file_utils.get_project_base_directory()}'
)
80 changes: 80 additions & 0 deletions docker/README.md
@@ -0,0 +1,80 @@

# Docker Environment Variable

See [.env](./.env) for some important variables.

## MYSQL_PASSWORD
The MySQL password can be changed with this variable, but you must change *mysql.password* in [service_conf.yaml](./service_conf.yaml) at the same time.


## MYSQL_PORT
The exposed port number of the MySQL Docker container. It is useful if you want to access the database from outside the Docker containers.

## MINIO_USER
The user name for [MinIO](https://github.com/minio/minio). Any change must also be made to *minio.user* in [service_conf.yaml](./service_conf.yaml).

## MINIO_PASSWORD
The user password for [MinIO](https://github.com/minio/minio). Any change must also be made to *minio.password* in [service_conf.yaml](./service_conf.yaml).


## SVR_HTTP_PORT
The serving port of the API server.
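
Three of the variables above (*MYSQL_PASSWORD*, *MINIO_USER*, *MINIO_PASSWORD*) have to match values in service_conf.yaml. The following sketch shows one way such a consistency check could look. It is not part of RAGFlow: the helper names are hypothetical, the `.env` parser handles only plain `KEY=VALUE` lines, and the service configuration is assumed to be already loaded into a nested dict (for example with a YAML library).

```python
# Pairs documented above: .env variable -> (section, key) in service_conf.yaml
SYNCED_KEYS = {
    "MYSQL_PASSWORD": ("mysql", "password"),
    "MINIO_USER": ("minio", "user"),
    "MINIO_PASSWORD": ("minio", "password"),
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def find_mismatches(env: dict[str, str], service_conf: dict) -> list[str]:
    """List the .env variables whose value differs from service_conf."""
    mismatches = []
    for var, (section, key) in SYNCED_KEYS.items():
        if var in env and str(service_conf.get(section, {}).get(key)) != env[var]:
            mismatches.append(var)
    return mismatches
```

An empty list from `find_mismatches` means the two files agree on the synced values; any variable name it returns needs to be updated in one of the two files.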


# Service Configuration
[service_conf.yaml](./service_conf.yaml) is used by the *API server* and *task executor*. It's the most important configuration of the system.

## ragflow

### host
The IP address used by the API server.

### port
The serving port of API server.

## mysql

### name
The database name in mysql used by this system.

### user
The database user name.

### password
The database password. Any change must also be made to *MYSQL_PASSWORD* in [.env](./.env).

### port
The serving port of MySQL inside the container. Any change must also be made in [docker-compose.yml](./docker-compose.yml).

### max_connections
The maximum number of database connections.

### stale_timeout
The timeout duration in seconds.

## minio

### user
The user name of MinIO. Any change must also be made to *MINIO_USER* in [.env](./.env).

### password
The password of MinIO. Any change must also be made to *MINIO_PASSWORD* in [.env](./.env).

### host
The serving IP and port inside the Docker container. A change here does not take effect until the minio part of [docker-compose.yml](./docker-compose.yml) is changed as well.

## user_default_llm
Newly signed-up users use the LLM configured in this section. Otherwise, users need to configure their own LLM in *setting*.

### factory
The LLM supplier. "通义千问", "OpenAI" and "智谱AI" are supported.

### api_key
The API key for the LLM vendor you specified.
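
The two fields of *user_default_llm* can be sanity-checked before starting the system. This is a sketch under assumptions: `validate_user_default_llm` is a hypothetical helper, not part of RAGFlow, and it takes the section as an already-loaded dict. The supported factory names are the three listed above.

```python
SUPPORTED_FACTORIES = {"通义千问", "OpenAI", "智谱AI"}  # as listed above


def validate_user_default_llm(section: dict) -> list[str]:
    """Return a list of problems found in a user_default_llm section."""
    problems = []
    factory = section.get("factory")
    if factory not in SUPPORTED_FACTORIES:
        problems.append(f"unsupported factory: {factory!r}")
    if not section.get("api_key"):
        problems.append("api_key is empty")
    return problems
```

An empty list means the section names a supported factory and has a non-empty key. The check does not verify that the key is actually valid with the vendor.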

## oauth
The OAuth configuration, which allows users to sign up and sign in to the system with a third-party account.

### github
Go to [Github](https://github.com/settings/developers) and register a new application; the *client_id* and *secret_key* will be provided.