docs: add documentation for with and without records on from_hub (#5515)

# Description

Closes #<issue_number>

**Type of change**

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- Refactor (change restructuring the codebase without changing functionality)
- Improvement (change adding some improvement to an existing functionality)
- Documentation update

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: Natalia Elvira <[email protected]>
Co-authored-by: nataliaElv <[email protected]>
argilla-io · Sep 19, 2024 · e1b2e6e
1 parent da95b38
commit e1b2e6e
Showing 3 changed files with 107 additions and 10 deletions.
diff --git a/argilla/docs/how_to_guides/dataset.md b/argilla/docs/how_to_guides/dataset.md
@@ -140,6 +140,9 @@ new_dataset.create()
 
 ## Define dataset settings
 
+!!! tip
+    Instead of defining your own custom settings, you can use some of our pre-built templates for text classification, ranking and rating. Learn more [here](../reference/argilla/settings/settings.md#creating-settings-using-built-in-templates).
+
 ### Fields
 
 The fields in a dataset consist of one or more data items requiring annotation. Currently, Argilla supports plain text and markdown through the `TextField` and images through the `ImageField`, though we plan to introduce additional field types in future updates.

diff --git a/argilla/docs/how_to_guides/import_export.md b/argilla/docs/how_to_guides/import_export.md
@@ -120,28 +120,44 @@ dataset = rg.Dataset.from_hub(repo_id="<my_org>/<my_dataset>")
 
 The `rg.Dataset.from_hub` method loads the configuration and records from the dataset repo. If you only want to load records, you can pass a `datasets.Dataset` object to the `rg.Dataset.log` method. This enables you to configure your own dataset and reuse existing Hub datasets. See the [guide on records](record.md) for more information.
 
+
 !!! note "With or without records"
 
-    The example above will pull the dataset's `Settings` and records from the hub. If you only want to pull the dataset's configuration, you can set the `with_records` parameter to `False`. This is useful if you're just interested in a specific dataset template or you want to make changes in the dataset settings and/or records.
+    The example above will pull the dataset's `Settings` and records from the hub. If you only want to pull the dataset's configuration, you can set the `with_records` parameter to `False`. This is useful if you're just interested in a specific dataset template or you want to make changes in the records.
 
     ```python
     dataset = rg.Dataset.from_hub(repo_id="<my_org>/<my_dataset>", with_records=False)
     ```
 
-    With the dataset's configuration, you could then make changes to the dataset. For example, you could adapt the dataset's settings for a different task:
-
-    ```python
-    dataset.settings.questions = [rg.TextQuestion(name="answer")]
-    dataset.update()
-    ```
-
     You could then load the records with the `load_dataset` function of the `datasets` package and pass the resulting dataset to the `rg.Dataset.records.log` method.
 
     ```python
     hf_dataset = load_dataset("<my_org>/<my_dataset>")
-    dataset.records.log(hf_dataset)
+    dataset.records.log(hf_dataset) # (1)
     ```
 
+    1. You could also use the `mapping` parameter to map record field names to Argilla field and question names.
+
+
+#### Import settings from Hub
+
+When importing datasets from the hub, Argilla will load the settings in one of three ways:
+
+1. If the dataset was pushed to hub by Argilla, then the settings will be loaded from the hub via the configuration file.
+2. If the dataset was created by another source, then Argilla will define the settings based on the dataset's features in `datasets.Features`. For example, it creates a `TextField` for a text feature or a `LabelQuestion` for a class-label feature.
+3. You can pass a custom `rg.Settings` object to the `rg.Dataset.from_hub` method via the `settings` parameter. This will override the settings loaded from the hub.
+
+```python
+settings = rg.Settings(
+    fields=[rg.TextField(name="text")],
+    questions=[rg.TextQuestion(name="answer")]
+) # (1)
+
+dataset = rg.Dataset.from_hub(repo_id="<my_org>/<my_dataset>", settings=settings)
+```
+
+1. The settings that you pass to the `rg.Dataset.from_hub` method will override the settings loaded from the hub, and need to align with the dataset being loaded.
+
 ### Local Disk
 
 #### Export to Disk

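Reviewer note on the "Import settings from Hub" section added above: the three cases can be read as a precedence order. The sketch below illustrates that order in plain Python, assuming that an explicit `settings` argument wins, then a configuration file found in the repo, then inference from the dataset features. `SketchSettings`, `infer_from_features`, and `resolve_settings` are hypothetical names used only for illustration; they are not Argilla APIs, and the feature-to-field rule is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchSettings:
    """Hypothetical stand-in for rg.Settings: only names and provenance."""
    fields: List[str]
    questions: List[str]
    source: str


def infer_from_features(features: Dict[str, str]) -> SketchSettings:
    # Assumed rule: string features become text fields,
    # class-label features become label questions.
    fields = [name for name, kind in features.items() if kind == "string"]
    questions = [name for name, kind in features.items() if kind == "class_label"]
    return SketchSettings(fields, questions, source="inferred from features")


def resolve_settings(
    custom: Optional[SketchSettings] = None,
    hub_config: Optional[SketchSettings] = None,
    features: Optional[Dict[str, str]] = None,
) -> SketchSettings:
    if custom is not None:       # case 3: explicit settings override everything
        return custom
    if hub_config is not None:   # case 1: configuration file pushed by Argilla
        return hub_config
    return infer_from_features(features or {})  # case 2: fall back to features


features = {"text": "string", "label": "class_label"}
inferred = resolve_settings(features=features)
custom = SketchSettings(["text"], ["answer"], source="custom")
overridden = resolve_settings(custom=custom, features=features)
```

In this sketch `inferred` carries one text field and one label question, while `overridden` is simply the custom object. As the diff notes, custom settings still need to align with the columns of the dataset being loaded.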
diff --git a/argilla/docs/reference/argilla/settings/settings.md b/argilla/docs/reference/argilla/settings/settings.md
@@ -30,7 +30,85 @@ dataset.create()
 
 ```
 
-> To define the settings for fields, questions, metadata, vectors, or distribution, refer to the [`rg.TextField`](fields.md), [`rg.LabelQuestion`](questions.md), [`rg.TermsMetadataProperty`](metadata_property.md), and [`rg.VectorField`](vectors.md), [`rg.TaskDistribution`](task_distribution.md) class documentation.
+To define the settings for fields, questions, metadata, vectors, or distribution, refer to the [`rg.TextField`](fields.md), [`rg.LabelQuestion`](questions.md), [`rg.TermsMetadataProperty`](metadata_property.md), [`rg.VectorField`](vectors.md), and [`rg.TaskDistribution`](task_distribution.md) class documentation.
+
+### Creating settings using built-in templates
+
+Argilla provides built-in templates for creating settings for common dataset types. To use a template, use the class methods of the `Settings` class. There are three built-in templates available for classification, ranking, and rating tasks. Template settings also include default guidelines and mappings.
+
+#### Classification Task
+
+You can define a classification task using the `rg.Settings.for_classification` class method. This will create settings with a single field and a label question. Use the `field_type` parameter to choose between a `text` and an `image` field.
+
+```python
+settings = rg.Settings.for_classification(labels=["positive", "negative"])
+```
+
+This will return a `Settings` object with the following settings:
+
+```python
+settings = Settings(
+    guidelines="Select a label for the document.",
+    fields=[rg.TextField(name="text")],  # an image field is used instead when field_type="image"
+    questions=[LabelQuestion(name="label", labels=labels)],
+    mapping={"input": "text", "output": "label", "document": "text"},
+)
+```
+
+#### Ranking Task
+
+You can define a ranking task using the `rg.Settings.for_ranking` class method. This will create settings with an instruction field, two response fields, and a ranking question.
+
+```python
+settings = rg.Settings.for_ranking()
+```
+
+This will return a `Settings` object with the following settings:
+
+```python
+settings = Settings(
+    guidelines="Rank the responses.",
+    fields=[
+        rg.TextField(name="instruction"),
+        rg.TextField(name="response1"),
+        rg.TextField(name="response2"),
+    ],
+    questions=[RankingQuestion(name="ranking", values=["response1", "response2"])],
+    mapping={
+        "input": "instruction",
+        "prompt": "instruction",
+        "chosen": "response1",
+        "rejected": "response2",
+    },
+)
+```
+
+#### Rating Task
+
+You can define a rating task using the `rg.Settings.for_rating` class method. This will create settings with an instruction field, a response field, and a rating question.
+
+```python
+settings = rg.Settings.for_rating()
+```
+
+This will return a `Settings` object with the following settings:
+
+```python
+settings = Settings(
+    guidelines="Rate the response.",
+    fields=[
+        rg.TextField(name="instruction"),
+        rg.TextField(name="response"),
+    ],
+    questions=[RatingQuestion(name="rating", values=[1, 2, 3, 4, 5])],
+    mapping={
+        "input": "instruction",
+        "prompt": "instruction",
+        "output": "response",
+        "score": "rating",
+    },
+)
+```
 
 ---
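
Reviewer note on the template `mapping` values shown above: a minimal sketch of what such a mapping could do to one row, assuming the keys are source column names and the values are Argilla field or question names (the reading suggested by the `mapping` annotation in `import_export.md`). `apply_mapping` is a hypothetical helper written for this note, not part of Argilla.

```python
from typing import Any, Dict

# Mapping copied from the classification template in the diff above.
CLASSIFICATION_MAPPING = {"input": "text", "output": "label", "document": "text"}


def apply_mapping(row: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename mapped columns; columns without a mapping keep their name."""
    return {mapping.get(column, column): value for column, value in row.items()}


row = {"input": "I loved this film", "output": "positive"}
record = apply_mapping(row, CLASSIFICATION_MAPPING)
print(record)  # {'text': 'I loved this film', 'label': 'positive'}
```

Both `input` and `document` point at `text`, so a source dataset can use either column name. If a row contained both, this sketch would keep whichever appears last.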