
v2.2.0

@jfcalvo jfcalvo released this 19 Sep 14:59
e1b2e6e

🌟 Release highlights

Important

Argilla server 2.2.0 adds support for background jobs. Background jobs let the server offload work that would take too long to run at request time. For this reason, Argilla now relies on Redis and Python RQ workers.

To upgrade your Argilla instance to version 2.2.0, you need an available Redis server. See the Redis get-started documentation or the Argilla server configuration documentation for more information.

If you deployed Argilla server using the docker-compose.yaml file, download it again to pick up the latest changes, which set up Redis and the Argilla workers.

Workers are needed to process Argilla's background jobs. You can run Argilla workers with the following command:

python -m argilla_server worker
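
Before starting the server and workers, it can help to confirm that the Redis host is reachable at all. The following is a minimal sketch using only the Python standard library. The function name `redis_is_reachable` and the default URL `redis://localhost:6379/0` are assumptions for illustration, not part of Argilla; adjust the URL to match your deployment.

```python
import socket
from urllib.parse import urlparse


def redis_is_reachable(redis_url: str = "redis://localhost:6379/0", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in redis_url succeeds.

    This only checks that something accepts connections on that port; it does
    not verify that the service is actually Redis. The default URL is an
    assumption for illustration.
    """
    parsed = urlparse(redis_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379  # 6379 is the standard Redis port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved.
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_is_reachable())
```

A check like this is only a first-pass diagnostic; the Argilla server and workers still need to be configured with the same Redis URL.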

ChatField: working with text conversations in Argilla

chat_field.mp4

You can now work with text conversations natively in Argilla using the new ChatField. It is especially designed to make it easier to build datasets for conversational Large Language Models (LLMs), displaying conversational data in the form of a chat.

Here's how you can create a dataset with a ChatField:

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings(
	fields=[rg.ChatField(name="chat")],
	questions=[...]
)

dataset = rg.Dataset(
	name="chat_dataset",
	settings=settings,
	workspace="my_workspace",
	client=client
)

dataset.create()

record = rg.Record(
	fields={
		"chat": [
			{"role": "user", "content": "Hello World, how are you?"},
			{"role": "assistant", "content": "I'm doing great, thank you!"}
		]
	}
)

dataset.records.log([record])
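
When chat data comes from an external source, it may be worth checking its shape before building records. The sketch below assumes the message format shown above (a list of dictionaries with string `role` and `content` values). The helper `validate_chat` is hypothetical and not part of the Argilla SDK.

```python
def validate_chat(messages):
    """Check that messages is a non-empty list of {"role": str, "content": str} dicts.

    Hypothetical helper for illustration; it mirrors the message shape used in
    the ChatField example and returns the messages unchanged when they are valid.
    """
    if not isinstance(messages, list) or not messages:
        raise ValueError("chat must be a non-empty list of messages")
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f"message {index} is not a dict")
        for key in ("role", "content"):
            if key not in message:
                raise ValueError(f"message {index} is missing the '{key}' key")
            if not isinstance(message[key], str):
                raise ValueError(f"message {index} has a non-string '{key}'")
    return messages


chat = validate_chat(
    [
        {"role": "user", "content": "Hello World, how are you?"},
        {"role": "assistant", "content": "I'm doing great, thank you!"},
    ]
)
```

The validated list can then be passed as the value of the `chat` field when constructing a record.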

Read more about how to use this new field type here and here.

Adjust task distribution settings

You can now modify task distribution settings at any time, and Argilla will automatically recalculate the completed and pending records. When you update this setting, records will be removed from or added to the pending queues of your team accordingly.

You can make this change in the dataset settings page or using the SDK:

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = client.datasets("my_dataset")
dataset.settings.distribution.min_submitted = 2
dataset.update()
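
To make the effect of this setting concrete, here is a small pure-Python sketch of the recalculation, assuming a record counts as completed once it has at least `min_submitted` submitted responses. The function `split_by_min_submitted` and the record IDs are hypothetical; the real recalculation happens on the Argilla server.

```python
def split_by_min_submitted(submitted_counts, min_submitted):
    """Split record IDs into (completed, pending) lists.

    submitted_counts maps a record ID to its number of submitted responses.
    A record is treated as completed when that number reaches min_submitted
    (an assumption stated in the text above).
    """
    completed = sorted(r for r, n in submitted_counts.items() if n >= min_submitted)
    pending = sorted(r for r, n in submitted_counts.items() if n < min_submitted)
    return completed, pending


# Hypothetical data: number of submitted responses per record.
counts = {"rec-1": 2, "rec-2": 1, "rec-3": 0}

print(split_by_min_submitted(counts, min_submitted=1))
print(split_by_min_submitted(counts, min_submitted=2))
```

Raising `min_submitted` from 1 to 2 moves `rec-2` from the completed list back to the pending list, which is the kind of change the team's queues would reflect.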

Track team progress from the SDK

The Argilla SDK now provides a way to retrieve data on annotation progress. This feature allows you to monitor the number of completed and pending records in a dataset and also the number of responses made by each user:

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = client.datasets("my_dataset")

progress = dataset.progress(with_users_distribution=True)

The expected output looks like this:

{
    "total": 100,
    "completed": 50,
    "pending": 50,
    "users": {
        "user1": {
           "completed": { "submitted": 10, "draft": 5, "discarded": 5},
           "pending": { "submitted": 5, "draft": 10, "discarded": 10},
        },
        "user2": {
           "completed": { "submitted": 20, "draft": 10, "discarded": 5},
           "pending": { "submitted": 2, "draft": 25, "discarded": 0},
        },
        ...
    }
}
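
As a usage sketch, the snippet below summarizes a progress result, assuming it is a plain dictionary with the shape shown above. The helper `summarize_progress` is hypothetical, and the sample values are copied from the expected output rather than from a real dataset.

```python
def summarize_progress(progress):
    """Return (completion_percentage, submitted_responses_per_user).

    Assumes progress has "total" and "completed" keys and, optionally, a
    "users" mapping with "completed" and "pending" buckets per user.
    """
    total = progress["total"]
    percentage = 100.0 * progress["completed"] / total if total else 0.0
    submitted_per_user = {
        user: sum(stats.get(bucket, {}).get("submitted", 0) for bucket in ("completed", "pending"))
        for user, stats in progress.get("users", {}).items()
    }
    return percentage, submitted_per_user


# Sample data copied from the expected output above.
sample = {
    "total": 100,
    "completed": 50,
    "pending": 50,
    "users": {
        "user1": {
            "completed": {"submitted": 10, "draft": 5, "discarded": 5},
            "pending": {"submitted": 5, "draft": 10, "discarded": 10},
        },
        "user2": {
            "completed": {"submitted": 20, "draft": 10, "discarded": 5},
            "pending": {"submitted": 2, "draft": 25, "discarded": 0},
        },
    },
}

percentage, submitted_per_user = summarize_progress(sample)
print(f"{percentage:.1f}% complete", submitted_per_user)
```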

Read more about this feature here.

Automatic settings inference

When you import a dataset using the from_hub method, Argilla will automatically infer the settings, such as the fields and questions, based on the dataset Features. This will save you time and effort when working with datasets from the Hub.

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = rg.Dataset.from_hub("yahma/alpaca-cleaned")

Task templates

We've added pre-built templates for common dataset types, including text classification, ranking, and rating tasks. These templates provide a starting point for your dataset creation, with pre-configured settings. You can use these templates to get started quickly, without having to configure everything from scratch.

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings.for_classification(labels=["positive", "negative"])

dataset = rg.Dataset(
	name="my_dataset",
	settings=settings,
	client=client,
	workspace="my_workspace",
)

dataset.create()

Read more about templates here.

Full Changelog: v2.1.0...v2.2.0