Commit 1787a6d

Add Write Models Docs (Oneflow-Inc#203)
* refine layer docs
* refine model docs
* add more layer docs
* refine layer docs
* add write models
* refine
* refine docs
* fix tokenizer docs
* update tokenization docs
* refine and update docs
* update docs
* fix comments
* make format
* fix conflict
* update README and changelog
* update link and merge main
1 parent ce95b9b commit 1787a6d

14 files changed, +183 -68 lines

README.md (+2 -2)

```diff
@@ -57,11 +57,11 @@ LiBai is a large-scale open-source model training toolbox based on OneFlow. The
 
 ## Installation
 
-See [Installation instructions](https://libai.readthedocs.io/en/latest/tutorials/Installation.html).
+See [Installation instructions](https://libai.readthedocs.io/en/latest/tutorials/get_started/Installation.html).
 
 ## Getting Started
 
-See [Getting Started](https://libai.readthedocs.io/en/latest/tutorials/Getting_Started.html) for the basic usage of LiBai.
+See [Quick Run](https://libai.readthedocs.io/en/latest/tutorials/get_started/quick_run.html) for the basic usage of LiBai.
 
 ## Documentation
```

README_zh-CN.md (+2 -2)

```diff
@@ -55,10 +55,10 @@ LiBai is a large-scale open-source model training toolbox based on OneFlow; the main branch
 </details>
 
 ## Installation
-Please refer to the [LiBai installation documentation](https://libai.readthedocs.io/en/latest/tutorials/Installation.html) for installation.
+Please refer to the [LiBai installation documentation](https://libai.readthedocs.io/en/latest/tutorials/get_started/Installation.html) for installation.
 
 ## Getting Started
-Please refer to the [quick start documentation](https://libai.readthedocs.io/en/latest/tutorials/Getting_Started.html) to learn the basic usage of LiBai; richer tutorials and a complete user guide will follow.
+Please refer to the [quick start documentation](https://libai.readthedocs.io/en/latest/tutorials/get_started/quick_run.html) to learn the basic usage of LiBai; richer tutorials and a complete user guide will follow.
 
 ## Documentation
 Please refer to the [LiBai documentation](https://libai.readthedocs.io/en/latest/index.html) for the usage of the interfaces in LiBai.
```

changelog.md (+2 -2)

```diff
@@ -22,5 +22,5 @@
 - Support 3D parallel [T5](https://arxiv.org/abs/1910.10683) model
 - Support 3D parallel [Vision Transformer](https://arxiv.org/abs/2010.11929)
 - Support Data parallel [Swin Transformer](https://arxiv.org/abs/2103.14030) model
-- Support finetune task in [projects](/projects/)
-- Support text classification task in [projects](/projects/)
+- Support finetune task in [QQP project](/projects/QQP/)
+- Support text classification task in [text classification project](/projects/text_classification/)
```

docs/source/modules/libai.layers.rst (+1)

```diff
@@ -8,6 +8,7 @@ libai.layers
         VocabEmbedding,
         SinePositionalEmbedding,
         PatchEmbedding,
+        drop_path,
         DropPath,
         build_activation,
         Linear,
```

docs/source/modules/libai.tokenizer.rst (+3 -1)

```diff
@@ -3,7 +3,9 @@ libai.tokenizer
 
 .. currentmodule:: libai.tokenizer
 .. automodule:: libai.tokenizer
+    :member-order: bysource
     :members:
         BertTokenizer,
         GPT2Tokenizer,
-        GoogleT5Tokenizer
+        GoogleT5Tokenizer,
+        PreTrainedTokenizer,
```

docs/source/tutorials/basics/Distributed_Configuration.md (+1 -1)

````diff
@@ -45,7 +45,7 @@ from .common.train import train
 train.dist.pipeline_parallel_size = 8
 ```
 
-**Note:** For models that have been configured with pipeline parallelism (e.g., BERT, GPT-2, T5 and ViT), you can simply update the distributed config to execute pipeline parallel training on them. If you need to train your own model with a pipeline parallel strategy, please refer to [Write Models]() for more details about configuring your own model with pipeline parallelism.
+**Note:** For models that have been configured with pipeline parallelism (e.g., BERT, GPT-2, T5 and ViT), you can simply update the distributed config to execute pipeline parallel training on them. If you need to train your own model with a pipeline parallel strategy, please refer to [Write Models](https://libai.readthedocs.io/en/latest/tutorials/basics/Write_Models.html) for more details about configuring your own model with pipeline parallelism.
 
 #### **Data Parallel + Tensor Parallel for 2D Parallel Training on 8 GPUs**
````

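As an illustration of the 2D-parallel heading in the diff context above, here is a minimal sketch of such a config on 8 GPUs. Only ``pipeline_parallel_size`` appears in this diff; the ``data_parallel_size`` and ``tensor_parallel_size`` field names are assumptions.

```python
# config.py -- a hedged sketch of 2D (data + tensor) parallelism on 8 GPUs
from .common.train import train

train.dist.data_parallel_size = 2      # assumed field: 2 data-parallel groups
train.dist.tensor_parallel_size = 4    # assumed field: split tensors over 4 GPUs per group
train.dist.pipeline_parallel_size = 1  # field shown in the diff above; no pipeline stages here
```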
docs/source/tutorials/basics/Write_Models.md (new file, +59)

@@ -0,0 +1,59 @@

# Write Models

In this section, we will introduce how to implement a new model entirely from scratch and make it compatible with LiBai.

## Construct Models in LiBai

LiBai uses [LazyConfig](https://libai.readthedocs.io/en/latest/tutorials/Config_System.html) for a more flexible config system, which means you can simply import your own model in your config file and train it with LiBai.

For an image classification task, the input data is usually a batch of images and labels. The following code shows how to build a toy model for this task:
```python
# toy_model.py
import oneflow as flow
import oneflow.nn as nn


class ToyModel(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)
        self.loss_func = nn.CrossEntropyLoss()

    def forward(self, images, labels=None):
        x = self.features(images)
        x = self.avgpool(x)
        x = flow.flatten(x, 1)
        x = self.classifier(x)

        if labels is not None and self.training:
            losses = self.loss_func(x, labels)
            return {"losses": losses}
        else:
            return {"prediction_scores": x}
```

**Note:**
- For classification models, the ``forward`` function must take ``images`` and ``labels`` as arguments, which correspond to the output of ``__getitem__`` in LiBai's built-in datasets. Please refer to [imagenet.py](https://github.com/Oneflow-Inc/libai/blob/main/libai/data/datasets/imagenet.py) for more details about the dataset.
- This toy model returns ``losses`` during training and ``prediction_scores`` during inference; both must be returned as a ``dict``, which means you should implement the loss function inside your model, e.g., ``self.loss_func = nn.CrossEntropyLoss()`` as in the ``ToyModel`` above (a minimal usage sketch follows this note).
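As a quick, illustrative sketch of the dict-style outputs described in the note (shapes and class count are arbitrary, not part of the committed file):

```python
# illustrative only: exercise ToyModel's dict-style outputs
import oneflow as flow

from toy_model import ToyModel

model = ToyModel(num_classes=10)
images = flow.randn(4, 3, 32, 32)     # a fake batch of 4 RGB images
labels = flow.randint(0, 10, (4,))    # fake class indices

model.train()
print(model(images, labels))          # {"losses": <scalar tensor>}

model.eval()
print(model(images))                  # {"prediction_scores": tensor of shape (4, 10)}
```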
## Import the Model in Your Config

With the ``LazyConfig`` system, you can simply import the model in your config file. The following code shows how to use ``ToyModel`` in a config file:
```python
# config.py
from libai.config import LazyCall
from toy_model import ToyModel

model = LazyCall(ToyModel)(
    num_classes=1000
)
```

docs/source/tutorials/basics/index.rst (+1)

```diff
@@ -10,4 +10,5 @@ Basics
     Training.md
     Train_and_Eval_Command_Line.md
     Build_New_Project_on_LiBai.md
+    Write_Models.md
     Distributed_Configuration.md
```

libai/layers/activation.py (+4)

```diff
@@ -47,6 +47,10 @@ def forward(self, x: flow.Tensor) -> flow.Tensor:
 
 
 def build_activation(activation: Optional[Activation]):
+    """
+    Fetch an activation layer by name, e.g.,
+    ``build_activation("gelu")`` returns an ``nn.GELU()`` module.
+    """
     if not activation:
         return Passthrough()
```

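A small usage sketch of the helper documented above, assuming ``build_activation`` is importable from ``libai.layers`` (as listed in the updated ``libai.layers.rst``):

```python
import oneflow as flow
from libai.layers import build_activation

act = build_activation("gelu")    # per the docstring above, returns an nn.GELU() module
out = act(flow.randn(2, 8))       # use it like any other nn.Module
print(out.shape)                  # oneflow.Size([2, 8])
```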
libai/layers/droppath.py (+2)

```diff
@@ -18,6 +18,8 @@
 
 
 def drop_path(x, drop_prob: float = 0.5, training: bool = False):
+    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""
+
     if drop_prob == 0.0 or not training:
         return x
     keep_prob = 1 - drop_prob
```

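A brief, hedged sketch of the functional and module forms of stochastic depth; only ``drop_path``'s signature appears in this diff, so the ``DropPath`` constructor argument is an assumption:

```python
import oneflow as flow
from libai.layers import DropPath, drop_path

x = flow.randn(4, 197, 768)                      # e.g. ViT tokens: (batch, tokens, dim)

# functional form, signature as shown in the diff above
y = drop_path(x, drop_prob=0.1, training=True)

# module form, typically placed on a residual branch;
# the positional drop-probability argument is an assumption
layer = DropPath(0.1)
layer.train()
z = layer(x)
```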
libai/layers/transformer_layer.py (+18 -15)

```diff
@@ -53,21 +53,6 @@ class TransformerLayer(nn.Module):
             https://arxiv.org/pdf/1909.08053.pdf.
             Default: ``False``.
         layer_idx: the layer index, which determines the placement.
-
-    Inputs:
-        * **hidden_states**: [bsz, seq_length, hidden_size], (S(0), B).
-        * **attention_mask**: [bsz, 1, seq_length, seq_length], (S(0), B),
-          the combination of key padding mask and casual mask of hidden states.
-        * **encoder_states**: [bsz, seq_length, hidden_size], (S(0), B), encoder output,
-          this will be used in cross attention.
-        * **encoder_attention_mask**: [bsz, 1, seq_length, seq_length],
-          (S(0), B) key padding mask of encoder states.
-        * **past_key_value**: tuple of key and value, each shape is
-          [src_len, bsz, num_heads, head_size], For decoder layer,
-          the past_key_value contains the states both from
-          self attention and cross attention.
-        * **use_cache**: it will be set to `True`, when the model is in the inference phase and
-          used for incremental decoding.
     """
 
     def __init__(
@@ -149,6 +134,24 @@ def forward(
         past_key_value=None,
         use_cache=False,
     ):
+        """
+        Args:
+            hidden_states: shape is (batch_size, seq_length, hidden_size),
+                sbp signature is (S(0), B).
+            attention_mask: the combination of key padding mask and causal mask of hidden states,
+                with shape (batch_size, 1, seq_length, seq_length) and sbp
+                signature (S(0), B).
+            encoder_states: encoder output with shape (batch_size, seq_length, hidden_size)
+                and sbp signature (S(0), B), which will be used in cross attention.
+            encoder_attention_mask: key padding mask of encoder states with shape
+                (batch_size, 1, seq_length, seq_length) and sbp signature (S(0), B).
+            past_key_value: tuple of key and value, each of shape
+                (seq_length, bsz, num_heads, head_size). For a decoder layer,
+                past_key_value contains the states from both self attention
+                and cross attention.
+            use_cache: it will be set to `True` when the model is in the inference phase and
+                used for incremental decoding.
+        """
         # Change placement for pipeline parallelism
         hidden_states = hidden_states.to_global(placement=dist.get_layer_placement(self.layer_idx))
```

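To make the documented shapes concrete, a shape-only sketch of the forward inputs; local tensors are used here for illustration, whereas in LiBai these are global tensors carrying the sbp signatures noted above, and the layer's constructor is not shown in this diff:

```python
import oneflow as flow

batch_size, seq_length, hidden_size = 4, 128, 768

# (batch_size, seq_length, hidden_size); sbp (S(0), B) in the global view
hidden_states = flow.randn(batch_size, seq_length, hidden_size)

# (batch_size, 1, seq_length, seq_length); combined key padding + causal mask
attention_mask = flow.ones(batch_size, 1, seq_length, seq_length)

# layer(hidden_states, attention_mask=attention_mask) would then run one transformer block
```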
libai/models/build.py (+1 -1)

```diff
@@ -26,7 +26,7 @@
 
 
 def build_model(cfg):
-    """Build the whole model architecture, defined by ``cfg.model.model_name``.
+    """Build the whole model architecture, defined by ``cfg.model``.
     Note that it does not load any weights from ``cfg``.
     """
     if "_target_" in cfg:  # LazyCall
```

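As a hedged illustration of how the ``model`` field of a LazyConfig is materialized, reusing the ``ToyModel`` from Write_Models.md above; the ``libai.models`` re-export of ``build_model`` is an assumption (the function itself lives in ``libai/models/build.py``):

```python
from libai.config import LazyCall
from libai.models import build_model   # assumed export location

from toy_model import ToyModel         # the toy model defined in Write_Models.md

model_cfg = LazyCall(ToyModel)(num_classes=1000)  # what `model = ...` looks like in a config
model = build_model(model_cfg)                    # instantiates ToyModel(num_classes=1000); no weights loaded
```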
libai/tokenizer/__init__.py (+1 -1)

```diff
@@ -14,7 +14,7 @@
 # limitations under the License.
 
 from .build import TOKENIZER_REGISTRY, build_tokenizer
-from .tokenization_base import PreTrainedTokenizer
 from .tokenization_bert import BertTokenizer
 from .tokenization_gpt2 import GPT2Tokenizer
 from .tokenization_t5 import GoogleT5Tokenizer
+from .tokenization_base import PreTrainedTokenizer
```
