Modify contents of the doc directory #1473

Merged · 2 commits · Nov 8, 2021
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

<p align="center">
<br>
- <img src='doc/serving_logo.png' width = "600" height = "130">
+ <img src='doc/images/serving_logo.png' width = "600" height = "130">
<br>
<p>

@@ -47,7 +47,7 @@ We consider deploying deep learning inference service online to be a user-facing
[Serving Examples](./python/examples/).

<p align="center">
<img src="doc/demo.gif" width="700">
<img src="doc/images/demo.gif" width="700">
</p>


4 changes: 2 additions & 2 deletions README_CN.md
@@ -2,7 +2,7 @@

<p align="center">
<br>
- <img src='doc/serving_logo.png' width = "600" height = "130">
+ <img src='doc/images/serving_logo.png' width = "600" height = "130">
<br>
<p>

@@ -48,7 +48,7 @@ Paddle Serving aims to help deep learning developers easily deploy online inference services
- Provides rich pre- and post-processing capabilities so users can reuse the relevant code across training, deployment, and other stages, bridging the gap between AI developers and application developers; see [Serving Examples](./python/examples/) for details

<p align="center">
<img src="doc/demo.gif" width="700">
<img src="doc/images/demo.gif" width="700">
</p>

<h2 align="center">教程</h2>
2 changes: 1 addition & 1 deletion doc/ABTEST_IN_PADDLE_SERVING.md
@@ -4,7 +4,7 @@

This document uses a text classification task based on the IMDB dataset to show how to build an A/B Test framework with Paddle Serving. The structural relationship between the client and servers in the example is shown in the figure below.

<img src="abtest.png" style="zoom:25%;" />
<img src="images/abtest.png" style="zoom:25%;" />

Note: A/B Test is only applicable to RPC mode, not web mode.

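
For reference, the variant mechanism above amounts to registering several server groups with relative weights on the client side. A minimal Python sketch, assuming the `add_variant` and `need_variant_tag` API described in the full A/B test document; addresses, tags, and weights are illustrative:

```python
# Sketch of an A/B client: traffic is split between two server groups by
# relative weight (30/70 here). Ports, tags, and weights are illustrative.
from paddle_serving_client import Client

client = Client()
client.load_client_config("imdb_bow_client_conf/serving_client_conf.prototxt")
client.add_variant("v1", ["127.0.0.1:8000"], 30)   # 30% of traffic
client.add_variant("v2", ["127.0.0.1:9000"], 70)   # 70% of traffic
client.connect()

feed = {"words": [1, 2, 3]}  # word ids produced by the IMDB reader
# With need_variant_tag, the response also reports which variant served the
# request (per the A/B test document), which lets you verify the split.
result = client.predict(feed=feed, fetch=["prediction"], need_variant_tag=True)
print(result)
```

Over many requests, the reported tags should approach the configured 30/70 split.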
2 changes: 1 addition & 1 deletion doc/ABTEST_IN_PADDLE_SERVING_CN.md
@@ -4,7 +4,7 @@

This document uses a text classification task based on the IMDB dataset to show how to build an A/B Test framework with Paddle Serving. The Client and Server structures in the example are shown in the figure below.

<img src="abtest.png" style="zoom:33%;" />
<img src="images/abtest.png" style="zoom:33%;" />

Note: A/B Test is only applicable to RPC mode, not web mode.

2 changes: 1 addition & 1 deletion doc/BERT_10_MINS.md
@@ -115,7 +115,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}]

We tested the performance of the Paddle Serving-based Bert-As-Service on V100 GPUs and compared it with the TensorFlow-based Bert-As-Service. From the user-configuration perspective, we used the same batch size and concurrency for stress testing. The overall throughput obtained on 4 V100s is as follows.

- ![4v100_bert_as_service_benchmark](4v100_bert_as_service_benchmark.png)
+ ![4v100_bert_as_service_benchmark](images/4v100_bert_as_service_benchmark.png)

<!--
yum install -y libXext libSM libXrender
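
For reference, the curl request shown in the hunk header above can also be issued from Python. A minimal sketch, assuming a web service named `bert` listening on port 9292 (both the port and the service name are assumptions for illustration):

```python
# Sketch of the HTTP call to the Bert-As-Service web endpoint; the URL is
# an assumption for illustration, mirroring the curl command shown above.
import requests

url = "http://127.0.0.1:9292/bert/prediction"
payload = {"feed": [{"words": "hello"}], "fetch": ["pooled_output"]}
resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # sentence embedding for "hello"
```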
2 changes: 1 addition & 1 deletion doc/BERT_10_MINS_CN.md
@@ -111,4 +111,4 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}]

We tested the performance of the Paddle Serving-based Bert-As-Service on V100 GPUs and compared it with the TensorFlow-based Bert-As-Service. Using the same batch size and concurrency for stress testing from the user-configuration perspective, the overall throughput on 4 V100s is as follows.

- ![4v100_bert_as_service_benchmark](4v100_bert_as_service_benchmark.png)
+ ![4v100_bert_as_service_benchmark](images/4v100_bert_as_service_benchmark.png)
16 changes: 8 additions & 8 deletions doc/C++DESIGN.md
@@ -45,11 +45,11 @@ Models that can be predicted using the Paddle Inference Library, models saved du

### 3.4 Server Interface

- ![Server Interface](server_interface.png)
+ ![Server Interface](images/server_interface.png)

### 3.5 Client Interface

- <img src='client_inferface.png' width = "600" height = "200">
+ <img src='images/client_inferface.png' width = "600" height = "200">

### 3.6 Client io used during Training

@@ -66,7 +66,7 @@ def save_model(server_model_folder,

## 4. Paddle Serving Underlying Framework

- ![Paddle-Serving Overall Architecture](framework.png)
+ ![Paddle-Serving Overall Architecture](images/framework.png)

**Model Management Framework**: Connects model files of multiple machine learning platforms and provides a unified inference interface
**Business Scheduling Framework**: Abstracts the calculation logic of various inference models, provides a general DAG scheduling framework, and connects different operators through a DAG to complete a prediction service together. This abstraction allows users to conveniently implement their own calculation logic while facilitating operator sharing. (When users build their own prediction services, a large part of the work is building DAGs and providing operators.)
@@ -102,31 +102,31 @@ class FluidFamilyCore {
With reference to the model-computation abstraction of the TensorFlow framework, the business logic is abstracted into a DAG, driven by configuration, generating a workflow and skipping C++ code compilation. Each concrete step of the service corresponds to a specific OP, and an OP can declare the upstream OPs it depends on. Message passing between OPs is handled uniformly by thread-level bus and channel mechanisms. For example, the flow of a simple prediction service can be abstracted into 3 steps (read request data -> call the prediction interface -> write back the prediction result), implemented as 3 OPs: ReaderOp -> ClassifyOp -> WriteOp
- ![Infer Service](predict-service.png)
+ ![Infer Service](images/predict-service.png)
For the dependencies between OPs and how to build workflows from OPs, refer to [从零开始写一个预测服务](CREATING.md) (Simplified Chinese)
Server instance perspective
- ![Server instance perspective](server-side.png)
+ ![Server instance perspective](images/server-side.png)
#### 4.2.2 Paddle Serving Multi-Service Mechanism
- ![Paddle Serving multi-service](multi-service.png)
+ ![Paddle Serving multi-service](images/multi-service.png)
Paddle Serving instances can load multiple models at the same time, and each model uses a Service (and its configured workflow) to undertake services. You can refer to [service configuration file in Demo example](../tools/cpp_examples/demo-serving/conf/service.prototxt) to learn how to configure multiple services for the serving instance
#### 4.2.3 Hierarchical relationship of business scheduling
From the client's perspective, a Paddle Serving service can be divided into three levels: Service, Endpoint, and Variant from top to bottom.
- ![Call hierarchy relationship](multi-variants.png)
+ ![Call hierarchy relationship](images/multi-variants.png)
One Service corresponds to one inference model, and there is one endpoint under the model. Different versions of the model are implemented through multiple variants under the endpoint:
The same model prediction service can configure multiple variants, and each variant has its own downstream IP list. The client code can assign relative weights to the variants to adjust the traffic ratio (refer to the description of variant_weight_list in [Client Configuration](CLIENT_CONFIGURE.md) section 3.2).
- ![Client-side proxy function](client-side-proxy.png)
+ ![Client-side proxy function](images/client-side-proxy.png)
## 5. User Interface
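
The ReaderOp -> ClassifyOp -> WriteOp flow in section 4.2 can be pictured as operators that declare their upstream dependencies and exchange messages over channels. The following toy Python model illustrates that scheduling idea only; the real engine is the C++ framework described above:

```python
# Toy model of the DAG idea: each OP declares its upstream OP and passes
# messages downstream through a thread-safe channel (a queue here).
import queue

class Op:
    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream       # the OP this one depends on
        self.out = queue.Queue()       # channel to downstream OPs

    def process(self, msg):
        raise NotImplementedError

    def run(self):
        msg = self.upstream.out.get() if self.upstream else None
        self.out.put(self.process(msg))

class ReaderOp(Op):
    def process(self, _):
        return {"words": "hello"}      # parse the request payload

class ClassifyOp(Op):
    def process(self, msg):
        return dict(msg, score=0.9)    # stand-in for model inference

class WriteOp(Op):
    def process(self, msg):
        return {"result": msg}         # serialize the response

reader = ReaderOp("reader")
classify = ClassifyOp("classify", upstream=reader)
write = WriteOp("write", upstream=classify)
for op in (reader, classify, write):   # run in topological order
    op.run()
print(write.out.get())
```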
16 changes: 8 additions & 8 deletions doc/C++DESIGN_CN.md
@@ -47,11 +47,11 @@ PaddlePaddle is the machine learning framework open-sourced by Baidu, with broad support for various deep learning

### 3.4 Server Interface

- ![Server Interface](server_interface.png)
+ ![Server Interface](images/server_interface.png)

### 3.5 Client Interface

- <img src='client_inferface.png' width = "600" height = "200">
+ <img src='images/client_inferface.png' width = "600" height = "200">

### 3.6 Client io used during Training

@@ -68,7 +68,7 @@ def save_model(server_model_folder,

## 4. Paddle Serving Underlying Framework

- ![Paddle-Serving Overall Architecture](framework.png)
+ ![Paddle-Serving Overall Architecture](images/framework.png)

**Model Management Framework**: connects model files from multiple machine learning platforms and exposes a unified inference interface upward
**Business Scheduling Framework**: abstracts the computation logic of different inference models and provides a general DAG scheduling framework that chains operators into a DAG to jointly complete one prediction service. This abstraction lets users implement their own computation logic conveniently while making operators easy to share. (When users build their own prediction services, a large part of the work is building the DAG and providing operator implementations.)
@@ -104,31 +104,31 @@ class FluidFamilyCore {
Following the model-computation abstraction of the TensorFlow framework, the business logic is abstracted into a DAG, driven by configuration to generate a workflow, skipping C++ code compilation. Each concrete step of the service corresponds to one OP, and each OP can declare the upstream OPs it depends on. Message passing between OPs is implemented uniformly by thread-level bus and channel mechanisms. For example, the flow of a simple prediction service can be abstracted into 3 steps (read request data -> call the prediction interface -> write back the prediction result), implemented as 3 OPs: ReaderOp -> ClassifyOp -> WriteOp
- ![Inference Service](predict-service.png)
+ ![Inference Service](images/predict-service.png)
For the dependencies between OPs and how to build a workflow from OPs, see the relevant chapters of [从零开始写一个预测服务](CREATING.md)
Server instance perspective
- ![Server instance perspective](server-side.png)
+ ![Server instance perspective](images/server-side.png)
#### 4.2.2 Paddle Serving Multi-Service Mechanism
- ![Paddle Serving multi-service](multi-service.png)
+ ![Paddle Serving multi-service](images/multi-service.png)
A Paddle Serving instance can load multiple models at the same time, with each model serving requests through one Service (and its configured workflow). Refer to the [service configuration file in the demo example](../tools/cpp_examples/demo-serving/conf/service.prototxt) to learn how to configure multiple services for a serving instance
#### 4.2.3 Hierarchical Relationship of Business Scheduling
From the client's perspective, a Paddle Serving service can be divided into three levels, Service, Endpoint, and Variant, from top to bottom
- ![Call hierarchy](multi-variants.png)
+ ![Call hierarchy](images/multi-variants.png)
One Service corresponds to one inference model, with one endpoint under the model. Different versions of the model are implemented through multiple variants under the endpoint:
The same model prediction service can configure multiple variants, each with its own downstream IP list. The client code can assign relative weights to the variants to adjust the traffic ratio (see the description of variant_weight_list in section 3.2 of [Client Configuration](CLIENT_CONFIGURE.md)).
- ![Client-side proxy function](client-side-proxy.png)
+ ![Client-side proxy function](images/client-side-proxy.png)
## 5. User Interface
2 changes: 1 addition & 1 deletion doc/CUBE_LOCAL.md
@@ -88,7 +88,7 @@ this step is not necessary, but it can help you to verify if the model is ready.
```
If you succeed, you will see this:
<p align="center">
<img src="cube-cli.png" width="700">
<img src="images/cube-cli.png" width="700">
</p>

If each key has a corresponding value in the output, the delivery was successful. This file can also be used by the general kv infer op in Serving to perform cube queries.
2 changes: 1 addition & 1 deletion doc/CUBE_LOCAL_CN.md
@@ -91,7 +91,7 @@ cd cube

If the execution succeeds, you will see the following result:
<p align="center">
<img src="cube-cli.png" width="700">
<img src="images/cube-cli.png" width="700">
</p>


12 changes: 6 additions & 6 deletions doc/DESIGN_DOC.md
@@ -39,7 +39,7 @@ Paddle Serving provides RPC and HTTP protocol for users. For HTTP service, we re

<p align="center">
<br>
- <img src='user_groups.png' width = "700" height = "470">
+ <img src='images/user_groups.png' width = "700" height = "470">
<br>
<p>

@@ -96,7 +96,7 @@ Distributed Sparse Parameter Indexing is commonly seen in advertising and recomm

<p align="center">
<br>
- <img src='cube_eng.png' width = "450" height = "230">
+ <img src='images/cube_eng.png' width = "450" height = "230">
<br>
<p>

@@ -116,7 +116,7 @@ The core execution engine of Paddle Serving is a Directed acyclic graph(DAG). In

<p align="center">
<br>
- <img src='design_doc.png'>
+ <img src='images/design_doc.png'>
<br>
<p>

@@ -132,7 +132,7 @@ After sufficient offline evaluation of the model, online A/B test is usually nee

<p align="center">
<br>
- <img src='abtest.png' width = "345" height = "230">
+ <img src='images/abtest.png' width = "345" height = "230">
<br>
<p>

@@ -188,15 +188,15 @@ the end-to-end deep learning model can not solve all the problems at present. Us
### 5.1 Network Communication Mechanism
The network framework of Pipeline Serving uses gRPC and the gRPC gateway. The gRPC service receives RPC requests, and the gRPC gateway receives RESTful API requests and forwards them to the gRPC service through a reverse proxy server. Therefore, the network layer of Pipeline Serving accepts both RPC and RESTful API requests.
<center>
- <img src='pipeline_serving-image1.png' height = "250" align="middle"/>
+ <img src='images/pipeline_serving-image1.png' height = "250" align="middle"/>
</center>

### 5.2 Core Design And Use Cases

The core design of Pipeline Serving is a graph execution engine whose basic processing units are OPs and Channels; a set of directed acyclic graphs can be realized through composition. For design and usage details, see [Pipeline Serving](PIPELINE_SERVING.md)

<center>
- <img src='pipeline_serving-image2.png' height = "300" align="middle"/>
+ <img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>

----
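
The distributed sparse parameter indexing service described above shards an embedding table that is too large for one machine and fans each lookup out to the shards concurrently. A toy Python model of that idea (illustration only, not the cube API):

```python
# Toy model of distributed sparse parameter indexing: keys are hash-sharded
# across nodes, and one lookup queries all shards concurrently.
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]   # each dict stands in for one node

def put(key, embedding):
    shards[hash(key) % NUM_SHARDS][key] = embedding

def lookup(keys):
    def query_shard(shard_id):
        wanted = [k for k in keys if hash(k) % NUM_SHARDS == shard_id]
        return {k: shards[shard_id].get(k) for k in wanted}
    with ThreadPoolExecutor(NUM_SHARDS) as pool:   # concurrent shard queries
        results = list(pool.map(query_shard, range(NUM_SHARDS)))
    merged = {}
    for partial in results:
        merged.update(partial)
    return merged

put("user_1001", [0.1, 0.2])
put("item_42", [0.3, 0.4])
print(lookup(["user_1001", "item_42"]))
```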
12 changes: 6 additions & 6 deletions doc/DESIGN_DOC_CN.md
@@ -42,7 +42,7 @@ Paddle Serving provides RPC and HTTP protocols for its users. For the HTTP proto

<p align="center">
<br>
- <img src='user_groups.png' width = "700" height = "470">
+ <img src='images/user_groups.png' width = "700" height = "470">
<br>
<p>

@@ -99,7 +99,7 @@ fetch_var {
Why use the distributed sparse parameter indexing service provided by Paddle Serving? 1) In some recommendation scenarios, a model's input features can reach hundreds of billions in scale, and a single machine cannot hold a terabyte-scale model in memory, so distributed storage is required. 2) The distributed sparse parameter indexing service provided by Paddle Serving can issue concurrent requests to multiple nodes, completing the inference service with low latency.
<p align="center">
<br>
- <img src='cube_eng.png' width = "450" height = "230">
+ <img src='images/cube.png' width = "450" height = "230">
<br>
<p>
Distributed sparse parameter indexing usually appears in advertising and recommendation, and works with distributed training to form a complete integrated offline-online deployment. The figure below explains the flow: after the product's online service receives a user request, it sends the request to the inference service, while the system logs the request for training-log processing and joining. The offline distributed training system performs incremental model training on the streaming training logs; the incrementally produced model is delivered to the distributed sparse parameter indexing service, while the corresponding dense model parameters are delivered to the online inference service. The online service consists of two parts: after extracting features from the user request, it sends the features that require sparse parameter indexing to the distributed sparse parameter indexing service, and then runs the subsequent deep learning model computation on the returned sparse parameters to complete the prediction.
@@ -118,7 +118,7 @@ C++ Serving uses [better-rpc](https://github.com/apache/incubator-brpc) for its underlying
The core execution engine of C++ Serving is a directed acyclic graph, where each node represents one stage of the inference service, for example computing the model's prediction score. A DAG helps concurrent nodes make full use of the computing resources within a deployment instance and shortens latency. For example, when the same input needs to be fed to two different models and the two models' scores are combined by weighted summation, the two scoring passes can run concurrently thanks to the DAG's topology.
<p align="center">
<br>
- <img src='design_doc.png'>
+ <img src='images/design_doc.png'>
<br>
<p>

@@ -136,7 +136,7 @@ Paddle Serving uses a symmetric encryption algorithm to encrypt the model; when the service loads the model

<p align="center">
<br>
- <img src='abtest.png' width = "345" height = "230">
+ <img src='images/abtest.png' width = "345" height = "230">
<br>
<p>

@@ -189,13 +189,13 @@ imdb_service.run_server()
### 5.1 Network Framework
The network framework of Pipeline Serving uses gRPC and the gRPC gateway. The gRPC service receives RPC requests, and the gRPC gateway receives RESTful API requests and forwards them to the gRPC service through a reverse proxy server. That is, the network layer of Pipeline Serving accepts both RPC and RESTful API requests.
<center>
- <img src='pipeline_serving-image1.png' height = "250" align="middle"/>
+ <img src='images/pipeline_serving-image1.png' height = "250" align="middle"/>
</center>

### 5.2 Core Design and Use Cases
The core design of Pipeline Serving is a graph execution engine whose basic processing units are OPs and Channels; a set of directed acyclic graphs can be realized through composition. For design and usage details, see [Pipeline Serving Design and Implementation](PIPELINE_SERVING_CN.md)
<center>
- <img src='pipeline_serving-image2.png' height = "300" align="middle"/>
+ <img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
----

2 changes: 1 addition & 1 deletion doc/GRPC_IMPL_CN.md
@@ -18,7 +18,7 @@

With the gRPC interface, the Client can be invoked from different languages on Win/Linux/MacOS platforms. The gRPC interface implementation structure is as follows:

- ![](https://github.com/PaddlePaddle/Serving/blob/develop/doc/grpc_impl.png)
+ ![](images/grpc_impl.png)

## 1. Comparison with the bRPC Interface

8 changes: 4 additions & 4 deletions doc/PIPELINE_SERVING.md
@@ -18,7 +18,7 @@ Paddle Serving provides a user-friendly programming framework for multi-model co
The Server side is built on the <b>RPC Service</b> and the <b>graph execution engine</b>. The relationship between them is shown in the following figure.

<div align=center>
- <img src='pipeline_serving-image1.png' height = "250" align="middle"/>
+ <img src='images/pipeline_serving-image1.png' height = "250" align="middle"/>
</div>

### 1.1 RPC Service
@@ -61,7 +61,7 @@ The graph execution engine consists of OPs and Channels, and the connected OPs s
- For cases where large data needs to be transferred between OPs, consider using an external RAM DB for global storage and transfer the data by passing index keys through the Channel.

<div align=center>
- <img src='pipeline_serving-image2.png' height = "300" align="middle"/>
+ <img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</div>


@@ -80,7 +80,7 @@
- The following illustration shows the design of the Channel in the graph execution engine: an input buffer and an output buffer align data between multiple OP inputs and multiple OP outputs, with a queue in the middle acting as a buffer.

<div align=center>
- <img src='pipeline_serving-image3.png' height = "500" align="middle"/>
+ <img src='images/pipeline_serving-image3.png' height = "500" align="middle"/>
</div>


@@ -323,7 +323,7 @@ All examples of pipelines are in [examples/pipeline/](../python/examples/pipelin
Here, we build a simple imdb model ensemble example to show how to use Pipeline Serving. The relevant code can be found in the `python/examples/pipeline/imdb_model_ensemble` folder. The Server-side structure in the example is shown in the following figure:

<div align=center>
- <img src='pipeline_serving-image4.png' height = "200" align="middle"/>
+ <img src='images/pipeline_serving-image4.png' height = "200" align="middle"/>
</div>

### 3.1 Files required for pipeline deployment
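
The imdb_model_ensemble structure shown above wires two model OPs to a shared RequestOp and merges their outputs before the ResponseOp. A minimal server-side sketch, assuming the Op/RequestOp/ResponseOp/PipelineServer classes from this document; the endpoints, config paths, fetch names, and combine logic are illustrative assumptions:

```python
# Sketch of the imdb_model_ensemble graph: RequestOp feeds two model OPs,
# whose scores are averaged by a custom OP before ResponseOp replies.
# Endpoints, config paths, and fetch names are illustrative assumptions.
from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp, PipelineServer

read_op = RequestOp()                       # entry OP: unpacks the request
bow_op = Op(name="bow",
            input_ops=[read_op],
            server_endpoints=["127.0.0.1:9393"],
            fetch_list=["prediction"],
            client_config="imdb_bow_client_conf/serving_client_conf.prototxt")
cnn_op = Op(name="cnn",
            input_ops=[read_op],
            server_endpoints=["127.0.0.1:9292"],
            fetch_list=["prediction"],
            client_config="imdb_cnn_client_conf/serving_client_conf.prototxt")

class CombineOp(Op):
    def preprocess(self, input_data):
        # average the two upstream model scores into one prediction
        scores = [d["prediction"] for d in input_data.values()]
        return {"prediction": sum(scores) / len(scores)}

combine_op = CombineOp(name="combine", input_ops=[bow_op, cnn_op])
response_op = ResponseOp(input_ops=[combine_op])   # exit OP: packs the reply

server = PipelineServer()
server.set_response_op(response_op)
server.prepare_server("config.yml")   # ports and worker counts live here
server.run_server()
```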