Releases: argilla-io/argilla

Release 2.1.0

05 Sep 15:11

🌟 Release highlights

Image Field

Screenshot showing Argilla's new Image Field and Dark Mode
Argilla now supports multimodal datasets with the introduction of a native ImageField. This new type of field allows you to work seamlessly with image data, making it easier to annotate and curate datasets that combine text and images.

Here's an example of a dataset with an image field:

import argilla as rg

client = rg.Argilla(...)

settings = rg.Settings(
	fields = [
		rg.ImageField(name="image"),
		rg.TextField(name="caption")
	],
	questions = [
		rg.LabelQuestion(
			name="good_or_bad", 
			title="Is the caption good or bad",
			labels=["good", "bad"]
		),
		rg.TextQuestion(name="comments")
	]
)

dataset = rg.Dataset(name="image_captions", settings=settings)
dataset.create()

record = rg.Record(
	fields= {
	  "image": "https://docs.argilla.io/dev/assets/logo.svg", 
	  "caption": "This is the Argilla logo"
	}
)
dataset.records.log([record])

Dark Mode

Does Argilla seem too bright for you? You can now try our new Dark Mode: a theme designed to reduce eye strain and give the app a new, modern look. You can enable Dark Mode under "My Settings".

Spanish Translation

We're committed to making Argilla accessible to a broader audience. With the addition of Spanish translation, we're taking another step towards breaking language barriers and enabling more teams to collaborate on data curation projects.
There's nothing you need to do to enable it: Argilla will automatically switch to Spanish when your browser's main language is set to Spanish. ¡Disfrutadla! (Enjoy it!)
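As a rough illustration of the automatic switch, the sketch below picks the UI language from the first entry of a browser's Accept-Language header. It is not Argilla's frontend code: the `pick_ui_language` helper, the header-based input and the fallback to English are assumptions; English, German and Spanish are the languages mentioned in these release notes.

```python
# Hypothetical sketch: choose the UI language from the browser's main language.
# Not Argilla's implementation; the fallback to English is an assumption.
SUPPORTED_LANGUAGES = {"en", "de", "es"}
DEFAULT_LANGUAGE = "en"


def pick_ui_language(accept_language: str) -> str:
    """Return the UI language for an Accept-Language value such as 'es-ES,es;q=0.9'."""
    # Browsers list languages in order of preference, so the first entry is the main one.
    main = accept_language.split(",")[0].split(";")[0].strip()
    primary = main.split("-")[0].lower()  # "es-ES" -> "es"
    return primary if primary in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


print(pick_ui_language("es-ES,es;q=0.9,en;q=0.8"))  # es
print(pick_ui_language("fr-FR,fr;q=0.9"))  # en
```

Only the main language matters here, which mirrors the behavior described above; a browser whose first language is unsupported falls back to the default.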

Import any dataset from the Hugging Face Hub

The from_hub method just got a major boost! You can now pass your own settings, so you can use this method with almost any dataset from the Hugging Face Hub, not just Argilla datasets.

Here's how easy it is to import a dataset from the Hub:

import argilla as rg

client = rg.Argilla(...)

settings = rg.Settings(
    fields=[
        rg.TextField(name="input"),
    ],
    questions=[
        rg.TextQuestion(name="output"),
    ],
)

dataset = rg.Dataset.from_hub(
    repo_id="yahma/alpaca-cleaned",
    settings=settings,
)

Other Notable Fixes and Improvements

  • Adaptable text areas for TextQuestions, providing a better user experience in the UI.
  • Enhanced messaging for empty queues, keeping you informed when no records are available in the UI.

Full Changelog: v2.0.1...v2.1.0

v2.0.1

13 Aug 14:35

What's Changed

🧹 Patch release of bug fixes and minor documentation and messaging improvements. Enjoy your summer while we change the world in v2.1.0.

Fixed

  • Fixed error when creating optional fields. (#5362)
  • Fixed error creating integer and float metadata with visible_for_annotators. (#5364)
  • Fixed error when logging records with suggestions or responses for non-existent questions. (#5396 by @maxserras)
  • Fixed error from conflicts in testing suite when running tests in parallel. (#5349)
  • Fixed error in response model when creating a response with a None value. (#5343)

Changed

  • Changed from_hub method to raise an error when a dataset with the same name exists. (#5258)
  • Changed log method when ingesting records with no known keys to raise a descriptive error. (#5356)
  • Changed code snippets to add new datasets. (#5395)

Added

  • Added Google Analytics to the documentation site. (#5366)
  • Added frontend skeletons to progress metrics to optimise load time and improve user experience. (#5391)
  • Added documentation for methods in the Python SDK API reference. (#5400)

Full Changelog: v2.0.0...v2.0.1

v2.0.0

31 Jul 06:49
c23126f

🔆 Release highlights

One Dataset to rule them all

The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable Dataset class.

With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

Important

If you want to continue using your legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be converted, as they are already compatible with the Argilla v2 format.

New SDK & documentation

We've redesigned our SDK to adapt it to the new single Dataset and Record classes and, most importantly, to improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making it simpler and faster to configure your dataset and get it up and running.

Here's an example of what creating a Dataset looks like:

import argilla as rg
from datasets import load_dataset

# connect to your Argilla instance
client = rg.Argilla(
    api_url="<api_url>",
    api_key="<api_key>",
    # headers={"Authorization": f"Bearer {HF_TOKEN}"}
)

# configure dataset settings
settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this review fit?",
            labels=["positive", "negative"],
        )
    ],
)

# create the dataset in your Argilla instance
dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

# get some data from the hugging face hub and load the records
data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})

To learn more about this SDK and how it works, check out our revamped documentation: https://argilla-io.github.io/argilla/latest

We built this new documentation site from scratch, applying the Diátaxis framework and UX principles in the hope of making this version cleaner and the information easier to find.

New UI layout

We have also redesigned part of our UI for Argilla 2.0:

  • We've redistributed the information on the Home page.
  • Datasets no longer have Tasks; they have Questions.
  • A clearer way to see your team's progress on each dataset.
  • Annotation guidelines and your progress are now accessible at all times within the dataset page.
  • Dataset pages also have a new flexible layout, so you can resize the different panels and expand or collapse the guidelines and progress.
  • SpanQuestions are now supported in the bulk view.

Automatic task distribution

Argilla 2.0 also comes with an automated way to split the task of annotating a dataset among a team. Here's how it works in a nutshell:

  • An owner or an admin can set the minimum number of submitted responses expected for each record.
  • When a record reaches that threshold, its status changes to complete and it's automatically removed from the pending queue of all team members.
  • A dataset is 100% complete when all records have the status complete.
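The rules above can be modeled in a few lines of plain Python. This is a simplified sketch, not Argilla's server logic: the `RecordState` class and its helpers are hypothetical, and the assumption that a user no longer sees records they have already submitted is ours.

```python
# Hypothetical model of the task distribution rules; not Argilla's server code.
from dataclasses import dataclass, field


@dataclass
class RecordState:
    submitted_by: set = field(default_factory=set)  # users who submitted a response

    def status(self, min_submitted: int) -> str:
        # A record becomes complete once it reaches the min_submitted threshold.
        return "complete" if len(self.submitted_by) >= min_submitted else "pending"


def pending_queue(records, user, min_submitted):
    """Indexes of records still in `user`'s pending queue (assumption: own submissions are excluded)."""
    return [
        i
        for i, record in enumerate(records)
        if record.status(min_submitted) == "pending" and user not in record.submitted_by
    ]


def dataset_progress(records, min_submitted):
    """Fraction of records whose status is complete; 1.0 means 100% complete."""
    if not records:
        return 0.0
    complete = sum(r.status(min_submitted) == "complete" for r in records)
    return complete / len(records)


records = [RecordState({"ana", "ben"}), RecordState({"ana"}), RecordState()]
print(pending_queue(records, "ana", min_submitted=2))  # [2]
print(pending_queue(records, "ben", min_submitted=2))  # [1, 2]
print(dataset_progress(records, min_submitted=2))  # one of three records is complete
```

With `min_submitted=2`, the first record leaves everyone's queue as soon as its second response is submitted, which is the behavior described in the list above.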

By default, the minimum number of submitted responses is 1, but you can create a dataset with a different value:

import argilla as rg

settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"]
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3)
)

You can also change the value of an existing dataset as long as it has no responses. You can do this from the General tab inside the Dataset Settings page in the UI or from the SDK:

import argilla as rg

client = rg.Argilla(...)

dataset = client.datasets("my_dataset")

dataset.settings.distribution.min_submitted = 4

dataset.update()

To learn more, check our guide on how to distribute the annotation task.

Easily deploy in Hugging Face Spaces

We've streamlined the deployment of an Argilla Space in the Hugging Face Hub. Now, there's no need to manage users and passwords. Follow these simple steps to create your Argilla Space:

  • Select the Argilla template.
  • Choose your hardware and persistent storage options (if you prefer something other than the recommended ones).
  • If you are creating a Space inside an organization, enter your Hugging Face Hub username under username to get the owner role.
  • Leave password empty if you'd like to use Hugging Face OAuth to sign in to Argilla.
  • Select whether the Space will be public or private.
  • Create Space! 🎉

Now you and your teammates can simply sign in to Argilla using Hugging Face OAuth!
Learn more about deploying Argilla in Hugging Face Spaces.

Full Changelog: v1.29.1...v2.0.0

v1.29.1

22 Jul 08:27
Pre-release

Full Changelog: v1.29.0...v1.29.1

v2.0.0rc2

05 Jul 08:34
1e6cb47
Pre-release

Full Changelog: v2.0.0rc1...v2.0.0rc2

v2.0.0rc1

21 Jun 10:02
Pre-release

🔆 Release highlights

One Dataset to rule them all

The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable Dataset class.

With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

Important

If you want to continue using legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDatasets do not need to be converted, as they are already compatible with the Argilla v2 format.

New SDK

We've redesigned our SDK to adapt it to the new single Dataset class and, most importantly, to improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making the process to configure your dataset and get it up and running much simpler and faster.

To learn more about this new SDK, you can check:

New UI layout

We have also revamped our UI for Argilla 2.0:

  • We've redistributed the information on the Home page.
  • Datasets no longer have Tasks; they have Questions.
  • Annotation guidelines and your progress are now accessible at all times within the dataset page.
  • Dataset pages also have a new flexible layout, so you can resize the different panes and expand or collapse the guidelines and progress.
  • SpanQuestions are now supported in the bulk view.

New documentation

This new version of Argilla comes hand-in-hand with a revamped documentation: https://argilla-io.github.io/argilla/latest

We have applied the Diátaxis framework and UX principles in the hope of making this version cleaner and the information easier to find. Let us know what you think!

Share your thoughts with us!

Note

This is a release candidate ahead of the official Argilla 2.0 release. Try it out and let us know what you think.
Find us on Discord or open a GitHub issue here.

v1.29.0

30 May 15:46

🔆 Release highlights

Warning

This will be the last release of Argilla v1. Starting from Argilla 2.0.0, we will only support FeedbackDatasets, which will be renamed to Dataset. All other dataset types (DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text) will be deprecated. In the next release, we will provide more information and documentation on how to migrate all your datasets to Argilla 2.0 Datasets.

Improved record search

Your search matches are now highlighted, so you can easily see the results of your search. We’ve also added a selector for datasets with more than one record field, so you can choose whether to search across all fields or in a specific one.
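To make the "all fields or a specific one" behavior concrete, here is a small stand-alone sketch over a record represented as a plain dict. It only illustrates the idea: Argilla's search runs in its backend, and the `search_record` helper, the `<mark>` tags and the case-insensitive matching are assumptions.

```python
# Illustrative only: Argilla's search runs in its backend, not like this.
import re
from typing import Dict, Optional


def search_record(fields: Dict[str, str], query: str, field_name: Optional[str] = None) -> Dict[str, str]:
    """Search all fields (field_name=None) or one field; return matching fields with highlights."""
    if not query:
        return {}
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    names = list(fields) if field_name is None else [field_name]
    return {
        # Wrap every case-insensitive match in <mark> tags, keeping the original casing.
        name: pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", fields[name])
        for name in names
        if pattern.search(fields[name])
    }


record = {"prompt": "What is Argilla?", "response": "Argilla is a tool for data curation."}
print(search_record(record, "argilla"))  # both fields match
print(search_record(record, "argilla", field_name="response"))  # only the response field
```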

Record information and metadata in the UI

You can now check all the information and metadata associated with each record directly in the UI.

Full Changelog: v1.28.0...v1.29.0

v1.28.0

09 May 15:13

🔆 Release highlights

Improved suggestions

Multiple scores support for MultiLabelQuestion and RankingQuestion

MultiLabelQuestion and RankingQuestion now take one score per suggested label / value, making the scores easier to interpret. Learn more about suggestions and their scores here.

Warning

If you upgrade to this version, all previous scores in suggestions for MultiLabelQuestion, RankingQuestion and SpanQuestion will be set to NULL, as they are not valid in the new schema. Please make sure you upload the scores again if you want to use them.

See scores next to their label / value

Scores are now shown next to their label / value in all questions, making them more visible and easier to interpret.

Suggestions first - 🌟 Community request: #4647

Now you can order labels in MultiLabelQuestion so that suggestions are always shown first. This keeps the most relevant labels at hand. Plus, if you’ve added scores to your labels, they will be sorted by score in descending order. To enable this, go to the Dataset Settings page > Questions and enable “Suggestions first” for the desired question.
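The ordering described above, combined with the one-score-per-suggested-label rule, can be sketched as a pure function. This is not Argilla's UI code: the `order_labels` helper is hypothetical, and keeping the suggestion's own order when no scores are given is an assumption.

```python
# Hypothetical sketch of "Suggestions first" ordering; not Argilla's UI code.
def order_labels(labels, suggested, scores=None):
    """Return the question's labels with the suggested ones first.

    labels:    all labels of the question, in their configured order
    suggested: the labels in the suggestion
    scores:    optional, one score per suggested label (same order as `suggested`)
    """
    if scores is None:
        first = list(suggested)  # assumption: keep the suggestion's own order
    else:
        if len(scores) != len(suggested):
            raise ValueError("expected exactly one score per suggested label")
        # Highest score first; sorted() stays stable with reverse=True, so ties keep their order.
        pairs = sorted(zip(suggested, scores), key=lambda pair: pair[1], reverse=True)
        first = [label for label, _ in pairs]
    rest = [label for label in labels if label not in suggested]
    return first + rest


labels = ["sports", "politics", "tech", "health"]
print(order_labels(labels, ["tech", "sports"], scores=[0.4, 0.9]))
# ['sports', 'tech', 'politics', 'health']
```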

SpanQuestion improvements

Pre-selection highlight

We’ve improved the way selections are shown. While you drag your mouse, a highlight now shows what the final selection will look like. This speeds up selection and makes the difference between token-level and character-level selection visible.

Note

Remember that character-level spans are activated by holding Shift while doing the selection.
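The difference between the two selection modes can be illustrated with a small sketch that snaps a dragged character range to whole tokens unless Shift is held. Splitting tokens on whitespace is an assumption made for brevity (Argilla's tokenization may differ), and `snap_to_tokens` and `preview_selection` are hypothetical helpers.

```python
# Illustrative sketch: whitespace tokens are an assumption; Argilla's tokenizer may differ.
import re


def snap_to_tokens(text: str, start: int, end: int) -> tuple:
    """Expand the character range [start, end) to cover every token it touches."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    touched = [(s, e) for s, e in tokens if s < end and e > start]
    if not touched:
        return start, end
    return touched[0][0], touched[-1][1]


def preview_selection(text: str, start: int, end: int, shift: bool = False) -> str:
    """Text the pre-selection highlight would cover."""
    if shift:  # holding Shift keeps the exact characters
        return text[start:end]
    s, e = snap_to_tokens(text, start, end)
    return text[s:e]


text = "Argilla supports overlapping spans"
print(preview_selection(text, 10, 20))  # supports overlapping
print(preview_selection(text, 10, 20, shift=True))  # pports ove
```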

New label selector

We’ve improved the way the label selector works in the SpanQuestion when overlapping spans are enabled so it’s easier to add or correct labels. Simply click on the desired span to activate the selector and click on the label(s) that you want to add or remove.

Persistent storage warning

We’ve added a warning for Argilla instances deployed on Hugging Face Spaces to alert you to possible data loss when persistent storage is not enabled.

To learn more about this warning and how to disable it, go to our docs.

Changelog 1.28.0

Added

  • Added suggestion multi score attribute. (#4730)
  • Added order by suggestion first. (#4731)
  • Added multi selection entity dropdown for span annotation overlap. (#4735)
  • Added pre selection highlight for span annotation. (#4726)
  • Added banner when persistent storage is not enabled. (#4744)
  • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)

Changed

  • Changed how the Hugging Face Space and user are shown on the sign-in page. (#4748)

Fixed

  • Fixed reversed Korean characters. (#4753)
  • Fixed requirements for the version of the wrapt library conflicting with Python 3.11. (#4693)

Full Changelog: v1.27.0...v1.28.0

v1.27.0

18 Apr 14:21

🔆 Release highlights

Overlapping spans

We are finally releasing a much expected feature: overlapping spans. This allows you to draw more than one span over the same token(s)/character(s).

To try them out, set up a SpanQuestion with the argument allow_overlapping=True like this:

import argilla as rg

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="spans",
            labels=["label1", "label2", "label3"],
            field="text",
            allow_overlapping=True,
        )
    ],
)

Learn more about configuring this and other question types here.
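What "overlapping" means can be made concrete with a small validation sketch over character offsets. This is not Argilla's validation code: the `Span` tuple, the `validate_spans` helper and its `allow_overlapping` flag are hypothetical stand-ins for the option described above.

```python
# Hypothetical validation sketch; Argilla's own checks may differ.
from typing import List, NamedTuple


class Span(NamedTuple):
    label: str
    start: int  # inclusive character offset
    end: int  # exclusive character offset


def validate_spans(spans: List[Span], allow_overlapping: bool = False) -> List[Span]:
    """Sort spans by position and reject overlaps unless they are allowed."""
    ordered = sorted(spans, key=lambda s: (s.start, s.end))
    if not allow_overlapping:
        # Once spans are sorted by start, any overlap shows up between neighbours.
        for prev, cur in zip(ordered, ordered[1:]):
            if cur.start < prev.end:
                raise ValueError(f"{cur.label} overlaps {prev.label}")
    return ordered


text = "Bill Gates founded Microsoft"
spans = [Span("PERSON", 0, 10), Span("ORG", 19, 28), Span("FIRST_NAME", 0, 4)]
ordered = validate_spans(spans, allow_overlapping=True)
print([text[s.start:s.end] for s in ordered])  # ['Bill', 'Bill Gates', 'Microsoft']
```

Here "Bill" and "Bill Gates" share the same characters, so the list is only accepted when overlapping is enabled; without it, the first overlapping pair raises an error.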

Global progress bars

We’ve included a new column on our home page that shows the global progress of your datasets, so that you can see at a glance which datasets are closest to completion.

These bars show progress by grouping records based on the status of their responses:

  • Submitted: Records where all responses have the submitted status.
  • Discarded: Records where all responses have the discarded status.
  • Conflicting: Records with at least one submitted and one discarded response.
  • Left: All other records that have no submitted or discarded responses. These may be pending or have draft responses.
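The grouping above can be expressed as a small function over each record's response statuses. This is a sketch, not Argilla's code; in particular, the notes don't say where a record with, say, one submitted and one draft response goes, so counting every record outside the first three groups as Left is an assumption.

```python
# Sketch of the progress grouping rules; not Argilla's code.
from collections import Counter

BUCKETS = ("submitted", "discarded", "conflicting", "left")


def progress_bucket(statuses):
    """Group one record by the statuses of its responses."""
    if "submitted" in statuses and "discarded" in statuses:
        return "conflicting"
    if statuses and all(s == "submitted" for s in statuses):
        return "submitted"
    if statuses and all(s == "discarded" for s in statuses):
        return "discarded"
    # Assumption: everything else (no responses, drafts, mixes like submitted + draft) is left.
    return "left"


def progress_bars(records):
    """Fraction of records in each bucket."""
    counts = Counter(progress_bucket(statuses) for statuses in records)
    total = len(records)
    return {b: (counts[b] / total if total else 0.0) for b in BUCKETS}


records = [["submitted", "submitted"], ["discarded"], ["submitted", "discarded"], ["draft"], []]
print(progress_bars(records))
# {'submitted': 0.2, 'discarded': 0.2, 'conflicting': 0.2, 'left': 0.4}
```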

Suggestions got a new look

We’ve improved the way suggestions are shown in the UI to make their purpose clearer: each suggestion is now marked with a sparkle icon ✨.

The behavior is still the same:

  • suggested values will appear as pre-filled responses, marked with the sparkle icon.
  • make changes to the incorrect suggestions, then save as a draft or submit.
  • the icon will stay to mark the suggestions so you can compare the final response with the suggested one.

Increased label limits

We’ve increased the limit of labels you can use in Label, Multilabel and Span questions to 500. If you need to go beyond that number, you can set up a custom limit using the following environment variables:

  • ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS to set the limits in label and multi label questions.
  • ARGILLA_SPAN_OPTIONS_MAX_ITEMS to set the limit in span questions.

Warning

The UI has been optimized to support up to 1000 labels. If you go beyond this limit, the UI may not be as responsive.

Learn more about this and other environment variables here.

Argilla auf Deutsch!

Thanks to our contributor @paulbauriegel, you can now use Argilla fully in German! If that is your browser's main language, there is nothing you need to do: the UI will automatically detect it and switch to German.

Would you like to translate Argilla to your own language? Reach out to us and we'll help you!

Changelog 1.27.0

Added

  • Added support for overlapping spans in the FeedbackDataset. (#4668)
  • Added allow_overlapping parameter for span questions. (#4697)
  • Added overall progress bar on Datasets table (#4696)
  • Added German language translation (#4688)

Changed

  • New UI design for suggestions (#4682)

Fixed

  • Improved performance for more than 250 labels. (#4702)

Full Changelog: v1.26.1...v1.27.0

v1.26.1

27 Mar 13:16

1.26.1

Added

  • Added support for automatic detection of RTL languages. (#4686)
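The notes don't describe how RTL detection works. One common heuristic, shown here as an assumption rather than Argilla's method, is the "first strong character" rule from the Unicode Bidirectional Algorithm: the direction of the first strongly directional character decides the direction of the text.

```python
# Sketch of the "first strong character" heuristic from the Unicode
# Bidirectional Algorithm; Argilla's actual detection may differ.
import unicodedata


def is_rtl(text: str) -> bool:
    """True if the first strongly directional character is right-to-left."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left letters (e.g. Hebrew) and Arabic letters
            return True
        if bidi == "L":  # left-to-right letters such as Latin
            return False
    return False  # no strong characters: digits, punctuation or empty text


print(is_rtl("مرحبا بالعالم"))  # True
print(is_rtl("Hello, world"))  # False
```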

Full Changelog: v1.26.0...v1.26.1