Merge pull request #709 from snipsco/release/0.18.0
Release 0.18.0
adrienball authored Nov 26, 2018
2 parents 4e57cc2 + 4f338c6 commit e7beb69
Showing 50 changed files with 2,735 additions and 1,652 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,15 @@
# Changelog
All notable changes to this project will be documented in this file.

## [0.18.0] - 2018-11-26
### Added
- New YAML format to create dataset
- Verbose mode in CLI

### Changed
- Bump `snips-nlu-ontology` to `0.62.0` to improve memory usage


## [0.17.4] - 2018-11-20
### Added
- Add a `--config` argument in the metrics CLI
@@ -175,6 +184,7 @@ several commands.
- Fix compiling issue with `bindgen` dependency when installing from source
- Fix issue in `CRFSlotFiller` when handling builtin entities

[0.18.0]: https://github.com/snipsco/snips-nlu/compare/0.17.4...0.18.0
[0.17.4]: https://github.com/snipsco/snips-nlu/compare/0.17.3...0.17.4
[0.17.3]: https://github.com/snipsco/snips-nlu/compare/0.17.2...0.17.3
[0.17.2]: https://github.com/snipsco/snips-nlu/compare/0.17.1...0.17.2
14 changes: 14 additions & 0 deletions docs/source/api.rst
@@ -96,6 +96,20 @@ Configurations
:members:


Dataset
-------

.. module:: snips_nlu.dataset

.. autoclass:: Dataset
:members:

.. autoclass:: Intent
:members:

.. autoclass:: Entity
:members:

Result and output format
------------------------

70 changes: 15 additions & 55 deletions docs/source/cli.rst
@@ -14,70 +14,30 @@ is typically used by running ``snips-nlu <command> [args]`` or alternatively
Creating a dataset
------------------

As seen in the :ref:`tutorial` section, a command allows you to generate a
dataset from a :ref:`language <languages>` and a list of text files describing
:ref:`intents <intent>` and :ref:`entities <slot>`:
As seen in the :ref:`tutorial <tutorial>` section, a command allows you to generate a
dataset from a :ref:`language <languages>` and a list of YAML files containing
data for :ref:`intents <intent>` and :ref:`entities <slot>`:

.. code-block:: bash
snips-nlu generate-dataset en intent_1.txt intent_2.txt entity_1.txt
snips-nlu generate-dataset en my_first_intent.yaml my_second_intent.yaml my_entity.yaml
This will print a Json string to the standard output. If you want to store the
dataset directly in a Json file, you just have to pipe the previous command like
below:

.. code-block:: bash
snips-nlu generate-dataset en intent_1.txt intent_2.txt entity_1.txt > dataset.json
Each intent file corresponds to a single intent, and the name of the file must
start with ``intent_``. The same is true for entity files, which must start
with ``entity_``.

An intent file is a text file in which each row corresponds to an utterance.
Slots, along with their corresponding slot type (entity), can be defined using
the following syntax:

.. code-block:: console
.. note::

Find me a flight from [departure:city](Paris) to [destination:city](London)
Find me a flight from [departure:city](Moscow) [departureDate:snips/datetime](tomorrow around 9pm)
You don't have to use separate files for each intent and entity. You could
for instance merge all intents together in a single ``intents.yaml`` file,
or even merge all intents and entities in a single ``dataset.yaml`` file.

In this example, there are three different slots -- ``departure``,
``destination`` and ``departureDate`` -- and two different entities -- ``city``
and ``snips/datetime`` (which is a :ref:`builtin entity <builtin_entity_resolution>`).
Check :ref:`this section <entity_vs_slot_name>` to have more details about the
difference between slots and entities.

An entity file is a comma separated text file in which each row corresponds to
an entity value, optionally followed with its :ref:`synonyms <synonyms>`. The syntax used
is the following:

.. code-block:: console
bedroom
garden,yard,backyard
Here, the entity (room) has two values which are ``"bedroom"`` and ``"garden"``.
Two synonyms, ``"yard"`` and ``"backyard"``, are defined for ``"garden"``.
If a value or a synonym contains a comma, the value must be put between
double quotes ``"``.

If the value contains double quotes, it must be doubled
to be escaped like this: ``"A value with a "","" in it"`` which corresponds
to the actual value ``A value with a "," in it``.

.. Note::
This will print a JSON string to the standard output. If you want to store the
dataset directly in a JSON file, redirect the output of the previous command
as shown below:

By default entities are generated as :ref:`automatically extensible <auto_extensible>`,
i.e. the recognition will accept additional values than the ones listed in
the entity file. This behavior can be changed by adding at the beginning of
the entity file the following:
.. code-block:: bash
.. code-block:: bash
snips-nlu generate-dataset en my_first_intent.yaml my_second_intent.yaml my_entity.yaml > dataset.json
# automatically_extensible=false
Check the :ref:`Training Dataset Format <dataset>` section for more details
about the format used to describe the training data.
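
As a quick sanity check on a generated ``dataset.json``, the file can be loaded and summarized with a few lines of Python. This is only a minimal sketch: it assumes the JSON has top-level ``language``, ``intents`` and ``entities`` keys, with each utterance stored as a list of text chunks. The inline sample, the ``searchFlight`` intent and the helper names ``summarize_dataset`` and ``utterance_text`` are hypothetical and are not part of the ``snips-nlu`` API.

```python
import json

# Hypothetical sample standing in for the content of a generated dataset.json.
# The layout (language / intents / entities) is an assumption of this sketch.
SAMPLE_DATASET_JSON = """
{
  "language": "en",
  "intents": {
    "searchFlight": {
      "utterances": [
        {"data": [
          {"text": "find me a flight to "},
          {"text": "Paris", "entity": "city", "slot_name": "destination"}
        ]}
      ]
    }
  },
  "entities": {
    "city": {
      "data": [{"value": "Paris", "synonyms": []}],
      "use_synonyms": true,
      "automatically_extensible": true
    }
  }
}
"""


def summarize_dataset(dataset):
    """Return the language, utterance count per intent, and sorted entity names."""
    return {
        "language": dataset["language"],
        "utterances_per_intent": {
            name: len(intent["utterances"])
            for name, intent in dataset["intents"].items()
        },
        "entities": sorted(dataset["entities"]),
    }


def utterance_text(utterance):
    """Rebuild the raw sentence by joining the text of every chunk."""
    return "".join(chunk["text"] for chunk in utterance["data"])


if __name__ == "__main__":
    # For a real file, use: dataset = json.load(open("dataset.json"))
    dataset = json.loads(SAMPLE_DATASET_JSON)
    print(summarize_dataset(dataset))
```

Replacing the inline sample with the content of the redirected ``dataset.json`` file gives a quick overview of what the YAML files produced before training on it.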

.. _training_cli:
