Release 0.17.0 #670

Merged
merged 72 commits into from
Oct 5, 2018
ed8b9b0
Merge pull request #645 from snipsco/master
adrienball Aug 8, 2018
b54dc33
Add Pull Request template
adrienball Aug 10, 2018
a3aba72
Update template after review
adrienball Aug 16, 2018
875f92d
Merge pull request #647 from snipsco/task/pull-request-template
adrienball Aug 16, 2018
e0709c3
Install enum34 only for Python < 3.4
adrienball Aug 16, 2018
20a960b
Merge pull request #650 from snipsco/fix/enum34
adrienball Aug 16, 2018
a0861d3
Use unicode string helpers everywhere
adrienball Aug 17, 2018
4d76d61
Use fitted_required decorator
adrienball Aug 17, 2018
f15d7a7
Do not use CRFs for intents with no slots
adrienball Aug 17, 2018
eec9b4e
Add tests
adrienball Aug 17, 2018
99d6c24
Merge pull request #651 from snipsco/enhancement/optimize-slot-filling
adrienball Aug 22, 2018
afcc4f9
Merge pull request #654 from snipsco/master
adrienball Aug 22, 2018
cc3fc96
Remove unit_name from processing unit serialization since it's persis…
ClemDoum Aug 28, 2018
e8ab2c6
Merge pull request #655 from snipsco/fix/unit-name-in-serialization
adrienball Aug 30, 2018
8fbce77
Merge pull request #658 from snipsco/master
ClemDoum Aug 30, 2018
0610b3d
Merge pull request #661 from snipsco/master
ClemDoum Sep 7, 2018
169ec58
Remove default handler
ClemDoum Sep 7, 2018
c0512f7
Merge pull request #662 from snipsco/task/remove-default-logging-handler
adrienball Sep 17, 2018
fee9c86
Update system requirements
adrienball Sep 17, 2018
76b247a
Fix issue with new version of Sphinx
adrienball Oct 2, 2018
9fa74e5
Merge pull request #665 from snipsco/task/update-requirements
adrienball Oct 2, 2018
f5d8049
Improve error handling when loading resources
adrienball Jul 26, 2018
b570440
Clean download CLI
adrienball Jul 27, 2018
2272c08
Add builtin entities download CLI
adrienball Jul 27, 2018
a06e9a9
Refactor builtin entity parsing API
adrienball Aug 2, 2018
6ddceab
Persist and load gazetteer entities from directory
adrienball Aug 2, 2018
52fe427
Update tox.ini
adrienball Aug 2, 2018
51b40f1
Add test
adrienball Aug 2, 2018
b2c50cf
Fix issue with overlapping builtin entities in DeterministicIntentParser
adrienball Aug 2, 2018
4c31d35
Fix download-entity cli
adrienball Aug 6, 2018
4801340
Remove threshold parameter for gazetteer entities
adrienball Aug 21, 2018
85312b4
Iterate on directory + bump ontology
ClemDoum Sep 12, 2018
eb1fd9c
Fix issue with items deduplication function
adrienball Sep 18, 2018
2389d1c
Implement custom entities parser
ClemDoum Aug 13, 2018
c651777
Include tests in linting tests
ClemDoum Aug 13, 2018
26ae079
Fix Linting
ClemDoum Aug 13, 2018
eaf296b
Return None as the default required resources for processing units
adrienball Aug 24, 2018
319b63c
Fix issues with CustomEntityParser usage and stemming
adrienball Aug 24, 2018
9ba0992
Refactor processing units
ClemDoum Aug 27, 2018
c4edd24
Implement custom entity parser serialiazation in NLU engine
ClemDoum Aug 27, 2018
0913158
Fix linting
ClemDoum Aug 28, 2018
2223413
Rename "parser" module to "entity_parser"
ClemDoum Aug 28, 2018
4f3a158
Finish the draft of implementation of the CustomEntityParser and tests
ClemDoum Aug 28, 2018
91d0f4f
Fix custom entity parser serialization
ClemDoum Aug 28, 2018
53fd121
Change EntityMatchFactory into CustomEntityMatchFactory
ClemDoum Aug 29, 2018
86056dc
Do resolution with the custom intent parser and remove entities utter…
ClemDoum Aug 29, 2018
4f53f76
Fix issues with builtin and custom entity parsers
ClemDoum Sep 6, 2018
7b2a07e
Fix linting
adrienball Sep 19, 2018
1566ee5
Remove MLUnit and MLUnitConfig
adrienball Sep 19, 2018
d8b384a
Fix small issues
adrienball Sep 19, 2018
d578d05
Fix issues with custom entity parser and tokenization
adrienball Sep 21, 2018
5277e88
Fix bug with entity feature name affecting intent classification
adrienball Sep 28, 2018
756b25a
Fix bug with CustomEntityMatchFactory
adrienball Sep 28, 2018
b74e386
Fix default generic features
adrienball Sep 28, 2018
0c121bd
Fix issue when building a custom entity parser with stemmed values
adrienball Sep 28, 2018
ae648ed
Fix entity scope of CustomEntityMatch CRF feature
adrienball Oct 1, 2018
92225ed
Improve CustomEntityParser serialization
adrienball Oct 1, 2018
c9e459a
Update snips-nlu-ontology and snips-nlu-utils dependencies
adrienball Oct 2, 2018
09f8c6b
Fix abstractclassmethod
ClemDoum Oct 3, 2018
ba7a565
Fix FileNotFoundError in python 2
ClemDoum Oct 3, 2018
0918226
Fix builtin entity path bug in Python 2.7
ClemDoum Oct 3, 2018
67fb413
Update documentation
adrienball Oct 3, 2018
cb53d07
Fix linting issues
adrienball Oct 3, 2018
8b5ff6e
Add default configuration for italian
adrienball Oct 3, 2018
d8afcbd
Update __about__ file and bump model version to 0.17.0
adrienball Oct 3, 2018
10d9acb
Fix CI
adrienball Oct 4, 2018
2d4aa15
Fix tests
adrienball Oct 4, 2018
0c60b3f
Merge pull request #668 from snipsco/task/custom-gazetteer-entities
adrienball Oct 4, 2018
95c27ae
Add sample datasets
adrienball Oct 4, 2018
ffb6287
Update README
adrienball Oct 4, 2018
fe1c1a6
Merge pull request #669 from snipsco/task/improve-readme
adrienball Oct 5, 2018
fbe7e4f
Update Changelog
adrienball Oct 5, 2018
18 changes: 4 additions & 14 deletions .appveyor.yml
@@ -23,19 +23,9 @@ install:
build: false

test_script:
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_de-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_en-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_es-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_fr-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_ja-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_ko-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu download snips_nlu_zh-0.2.0 --direct"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_de de"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_en en"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_es es"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_fr fr"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_ja ja"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_ko ko"
- "%PYTHON%\\python.exe -m snips_nlu link snips_nlu_zh zh"
- "%PYTHON%\\python.exe -m snips_nlu download-all-languages"
- "%PYTHON%\\python.exe -m snips_nlu download-entity snips/musicAlbum fr"
- "%PYTHON%\\python.exe -m snips_nlu download-entity snips/musicArtist fr"
- "%PYTHON%\\python.exe -m snips_nlu download-entity snips/musicTrack fr"
- "%PYTHON%\\python.exe -m unittest discover"
- "%PYTHON%\\python.exe snips_nlu_samples/sample.py"
7 changes: 7 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,7 @@
**Description**:
- ...

**Checklist**:
- [ ] My PR is ready for code review
- [ ] I have added some tests, if applicable, and run the whole test suite, including [linting tests](../linting_test.py)
- [ ] I have updated the documentation, if applicable
Binary file added .img/benchmarks.png
2 changes: 1 addition & 1 deletion .travis.yml
@@ -23,4 +23,4 @@ after_success:
- tox -e coverage-report
- codecov

cache: pip
cache: pip
13 changes: 12 additions & 1 deletion CHANGELOG.md
@@ -1,8 +1,18 @@
# Changelog
All notable changes to this project will be documented in this file.

## [0.17.0] - 2018-10-05
### Added
- Support for 3 new builtin entities in French: `snips/musicAlbum`, `snips/musicArtist` and `snips/musicTrack`
- Minimal support for Italian

### Changed
- model version `0.16.0` => `0.17.0`

### Fixed
- Bug with entity feature name in intent classification

## [0.16.5] - 2018-0906
## [0.16.5] - 2018-09-06
### Fixed
- Segfault in CRFSuite when the `CRFSlotFiller` is fitted only on empty utterances

@@ -140,6 +150,7 @@ several commands.
- Fix compiling issue with `bindgen` dependency when installing from source
- Fix issue in `CRFSlotFiller` when handling builtin entities

[0.17.0]: https://github.com/snipsco/snips-nlu/compare/0.16.5...0.17.0
[0.16.5]: https://github.com/snipsco/snips-nlu/compare/0.16.4...0.16.5
[0.16.4]: https://github.com/snipsco/snips-nlu/compare/0.16.3...0.16.4
[0.16.3]: https://github.com/snipsco/snips-nlu/compare/0.16.2...0.16.3
171 changes: 128 additions & 43 deletions README.rst
@@ -22,15 +22,84 @@ Snips NLU

`Snips NLU <https://snips-nlu.readthedocs.io>`_ (Natural Language Understanding) is a Python library that parses sentences written in natural language and extracts structured information from them.

Check out our `blog post`_ to get more details about why we built Snips NLU and how it works under the hood.
Summary
-------

- `What is Snips NLU about ?`_
- `Getting Started`_

- `System requirements`_
- `Installation`_
- `Language Resources`_
- `API Usage`_

- `Sample code`_
- `Command Line Interface`_
- `Sample datasets`_
- `Benchmarks`_
- `Documentation`_
- `FAQ`_
- `Related content`_
- `How do I contribute ?`_
- `Licence`_

What is Snips NLU about ?
-------------------------

Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Anytime a user interacts with an AI using natural language, their words need to be translated into a machine-readable description of what they meant.

The NLU engine first detects what the intention of the user is (a.k.a. `intent`_), then extracts the parameters (called `slots`_) of the query. The developer can then use this to determine the appropriate action or response.


Let’s take an example to illustrate this, and consider the following sentence:

.. code-block:: text

"What will be the weather in paris at 9pm?"

Properly trained, the Snips NLU engine will be able to extract structured data such as:

.. code-block:: json

{
"intent": {
"intentName": "searchWeatherForecast",
"probability": 0.95
},
"slots": [
{
"value": "paris",
"entity": "locality",
"slotName": "forecast_locality"
},
{
"value": {
"kind": "InstantTime",
"value": "2018-02-08 20:00:00 +00:00"
},
"entity": "snips/datetime",
"slotName": "forecast_start_datetime"
}
]
}

In this case, the identified intent is ``searchWeatherForecast`` and two slots were extracted, a locality and a datetime. As you can see, Snips NLU does an extra step on top of extracting entities: it resolves them. The extracted datetime value has indeed been converted into a handy ISO format.
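
As an illustration of how a developer might consume such a result, here is a minimal sketch that treats the output above as a plain Python dictionary. The ``summarize`` helper is hypothetical and not part of Snips NLU; only the field names are taken from the JSON shown above.

```python
import json

# The parse result shown above, as a JSON string.
RESULT_JSON = """
{
  "intent": {"intentName": "searchWeatherForecast", "probability": 0.95},
  "slots": [
    {"value": "paris", "entity": "locality", "slotName": "forecast_locality"},
    {
      "value": {"kind": "InstantTime", "value": "2018-02-08 20:00:00 +00:00"},
      "entity": "snips/datetime",
      "slotName": "forecast_start_datetime"
    }
  ]
}
"""


def summarize(result):
    """Hypothetical helper: return (intent name, {slot name: value})."""
    slots = {}
    for slot in result["slots"]:
        value = slot["value"]
        # In the example above, the resolved datetime is a nested dict
        # while the locality is a raw string; unwrap the nested case.
        if isinstance(value, dict):
            value = value["value"]
        slots[slot["slotName"]] = value
    return result["intent"]["intentName"], slots


intent_name, slots = summarize(json.loads(RESULT_JSON))
print(intent_name, slots)
```

The intent name can then drive the choice of action, with the slot values passed as its parameters.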

Check out our `blog post`_ to get more details about why we built Snips NLU and how it works under the hood. We also published a `paper on arxiv`_, presenting the machine learning architecture of the Snips Voice Platform.


Getting Started
---------------

-------------------
System requirements
-------------------
- 64-bit Linux, MacOS >= 10.11, 64-bit Windows
- Python 2.7 or Python >= 3.4

- Python 2.7 or Python >= 3.4, < 3.7
- RAM: Snips NLU will typically use between 100MB and 200MB of RAM, depending on the language and the size of the dataset.
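
The Python constraint above (2.7, or 3.4 up to but excluding 3.7) can be checked before installation with a few lines of standard-library code. This is only a sketch: ``is_supported`` is a hypothetical helper, not something shipped with Snips NLU.

```python
import sys


def is_supported(version_info):
    """Return True for Python 2.7 or for 3.4 <= Python < 3.7."""
    major_minor = tuple(version_info[:2])
    return major_minor == (2, 7) or (3, 4) <= major_minor < (3, 7)


if not is_supported(sys.version_info):
    print("Warning: this Python version is outside the range listed above")
```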


------------
Installation
------------

@@ -46,10 +115,11 @@ distribution. To do so, `Rust <https://www.rust-lang.org/en-US/install.html>`_
and `setuptools_rust <https://github.com/PyO3/setuptools-rust>`_ must be
installed before running the ``pip install snips-nlu`` command.

------------------
Language resources
------------------

Snips NLU relies on `language resources`_ that must be downloaded before the
Snips NLU relies on `external language resources`_ that must be downloaded before the
library can be used. You can fetch resources for a specific language by
running the following command:

@@ -71,51 +141,41 @@ Once the resources have been fetched, they can be loaded in Python using:

load_resources("en")

The list of supported languages is available
`here <https://snips-nlu.readthedocs.io/en/latest/languages.html>`_.
The list of supported languages is available at
`this address <https://snips-nlu.readthedocs.io/en/latest/languages.html>`_.

A simple example
----------------
API Usage
---------

Let’s take an example to illustrate the main purpose of this lib, and consider the following sentence:
----------------------
Command Line Interface
----------------------

.. code-block:: text
The easiest way to test the abilities of this library is through the command line interface.

"What will be the weather in paris at 9pm?"
First, start by training the NLU with one of the `sample datasets`_:

Properly trained, the Snips NLU engine will be able to extract structured data such as:
.. code-block:: sh

snips-nlu train path/to/dataset.json path/to/output_trained_engine

.. code-block:: json
Where ``path/to/dataset.json`` is the path to the dataset which will be used during training, and ``path/to/output_trained_engine`` is the location where the trained engine should be persisted once the training is done.

{
"intent": {
"intentName": "searchWeatherForecast",
"probability": 0.95
},
"slots": [
{
"value": "paris",
"entity": "locality",
"slotName": "forecast_locality"
},
{
"value": {
"kind": "InstantTime",
"value": "2018-02-08 20:00:00 +00:00"
},
"entity": "snips/datetime",
"slotName": "forecast_start_datetime"
}
]
}
After that, you can start parsing sentences interactively by running:

.. code-block:: sh

snips-nlu parse path/to/trained_engine

Where ``path/to/trained_engine`` corresponds to the location where you have stored the trained engine during the previous step.


-----------
Sample code
-----------

Here is a sample code that you can run on your machine after having
installed `snips-nlu`, fetched the english resources and downloaded this
`sample dataset`_:
installed `snips-nlu`, fetched the english resources and downloaded one of the `sample datasets`_:

.. code-block:: python

@@ -141,19 +201,39 @@ installed `snips-nlu`, fetched the english resources and downloaded this
It trains an NLU engine on a sample weather dataset and then parses
a weather query.

Sample datasets
---------------

Here is a list of some datasets that can be used to train a Snips NLU engine:

- `Lights dataset <sample_datasets/lights_dataset.json>`_: "Turn on the lights in the kitchen", "Set the light to red in the bedroom"
- `Beverage dataset <sample_datasets/beverage_dataset.json>`_: "Prepare two cups of cappucino", "Make me a cup of tea"
- `Flights dataset <sample_datasets/flights_dataset.json>`_: "Book me a flight to go to boston this weekend", "book me some tickets from istanbul to moscow in three days"

Benchmarks
----------

In January 2018, we reproduced an `academic benchmark`_ published in the summer of 2017. In that article, the authors assessed the performance of API.ai (now Dialogflow, Google), Luis.ai (Microsoft), IBM Watson, and `Rasa NLU`_. For fairness, we used an updated version of Rasa NLU and compared it to the latest version of Snips NLU (both in dark blue).

.. image:: .img/benchmarks.png

In the figure above, `F1 scores`_ of both intent classification and slot filling were computed for several NLU providers and averaged across the three datasets used in the academic benchmark mentioned above. All the underlying results can be found `here <https://github.com/snipsco/nlu-benchmark/tree/master/2018-01-Braum-et-al-extension>`_.


Documentation
-------------

To find out how to use Snips NLU please refer to our `documentation <https://snips-nlu.readthedocs.io>`_, it will provide you with a step-by-step guide on how to use and setup our library.
To find out how to use Snips NLU, please refer to the `package documentation <https://snips-nlu.readthedocs.io>`_, which provides a step-by-step guide on how to set up and use this library.

FAQ
---
Please join our `Discord channel`_ to ask your questions and get feedback from the community.
Please join the Snips `Discord channel`_ to ask your questions and get feedback from the community.

Links
-----
Related content
---------------
* `What is Snips about ? <https://snips.ai/>`_
* Snips NLU Open sourcing `blog post`_
* `Snips Voice Platform paper (arxiv) <https://arxiv.org/abs/1805.10190>`_
* `Snips NLU Language Resources <https://github.com/snipsco/snips-nlu-language-resources>`_
* `Bug tracker <https://github.com/snipsco/snips-nlu/issues>`_
* `Snips NLU Rust <https://github.com/snipsco/snips-nlu-rs>`_: Rust inference pipeline implementation and bindings (C, Swift, Kotlin, Python)
@@ -170,7 +250,12 @@ Licence

This library is provided by `Snips <https://www.snips.ai>`_ as Open Source software. See `LICENSE <LICENSE>`_ for more information.

.. _language resources: https://github.com/snipsco/snips-nlu-language-resources
.. _sample dataset: snips_nlu_samples/sample_dataset.json
.. _external language resources: https://github.com/snipsco/snips-nlu-language-resources
.. _Discord channel: https://discordapp.com/invite/3939Kqx
.. _blog post: https://medium.com/snips-ai/an-introduction-to-snips-nlu-the-open-source-library-behind-snips-embedded-voice-platform-b12b1a60a41a
.. _blog post: https://medium.com/snips-ai/an-introduction-to-snips-nlu-the-open-source-library-behind-snips-embedded-voice-platform-b12b1a60a41a
.. _paper on arxiv: https://arxiv.org/abs/1805.10190
.. _academic benchmark: http://workshop.colips.org/wochat/@sigdial2017/documents/SIGDIAL22.pdf
.. _Rasa NLU: https://nlu.rasa.ai/
.. _F1 scores: https://en.wikipedia.org/wiki/F1_score
.. _intent: https://snips-nlu.readthedocs.io/en/latest/data_model.html#intent
.. _slots: https://snips-nlu.readthedocs.io/en/latest/data_model.html#slot
5 changes: 2 additions & 3 deletions debug/debug.py
@@ -16,9 +16,8 @@ def debug_training(dataset_path, config_path=None):

load_resources(dataset["language"])

if config_path is None:
config = NLUEngineConfig()
else:
config = None
if config_path is not None:
with Path(config_path).open("r", encoding="utf8") as f:
config = NLUEngineConfig.from_dict(json.load(f))
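
The new lines in this hunk replace an eagerly built default with ``None`` unless a config path is given. A standalone sketch of that pattern follows, using plain JSON instead of ``NLUEngineConfig`` so that it runs without Snips NLU installed. ``load_config_dict`` and the ``example_key`` entry are hypothetical; what the engine does when it receives ``None`` is not shown in this diff, so falling back to its own defaults is an assumption.

```python
import json
import tempfile
from pathlib import Path


def load_config_dict(config_path=None):
    """Return the parsed JSON config, or None when no path is given."""
    config = None
    if config_path is not None:
        with Path(config_path).open("r", encoding="utf8") as f:
            config = json.load(f)
    return config


# Usage: write an arbitrary config to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "config.json"
    path.write_text(json.dumps({"example_key": 1}), encoding="utf8")
    loaded = load_config_dict(path)

print(load_config_dict(), loaded)
```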

3 changes: 2 additions & 1 deletion docs/source/api.rst
@@ -99,4 +99,5 @@ Configurations
Result and output format
------------------------

.. automodule:: snips_nlu.result
.. automodule:: snips_nlu.result
:members:
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -39,7 +39,7 @@
# Autodoc settings
autoclass_content = 'both'
autodoc_member_order = 'bysource'
autodoc_default_flags = ['members', 'no-show-inheritance']
autodoc_default_options = {'members': None, 'no-show-inheritance': None}

# Napoleon settings
napoleon_google_docstring = True
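
The ``autodoc_default_flags`` to ``autodoc_default_options`` change in this hunk follows Sphinx 1.8, which deprecated the list-of-flags setting in favour of a dictionary whose keys are option names. A minimal sketch of the mapping, assuming every legacy flag is a simple on/off switch (``flags_to_options`` is a hypothetical helper, not a Sphinx function):

```python
def flags_to_options(flags):
    """Map each legacy flag name to None, which autodoc reads as 'enabled'."""
    return {flag: None for flag in flags}


# Reproduces the value used on the new line of conf.py.
autodoc_default_options = flags_to_options(['members', 'no-show-inheritance'])
print(autodoc_default_options)
```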
6 changes: 4 additions & 2 deletions docs/source/data_model.rst
@@ -265,7 +265,8 @@ entity in your dataset as follows:
"value": "blue",
"synonyms": []
}
]
],
"parser_threshold": 1.0
}
}

@@ -361,6 +362,7 @@ not your custom entity is automatically extensible:
"my_custom_entity": {
"automatically_extensible": true,
"use_synonyms": true,
"data": []
"data": [],
"parser_threshold": 1.0
}
}
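
Both hunks above add a ``parser_threshold`` key to the custom entity description. The sketch below builds such an entry programmatically. ``make_custom_entity`` is a hypothetical helper, and the ``[0, 1]`` range check is an assumption: the diff only shows the value ``1.0`` and does not describe the parameter's semantics.

```python
import json


def make_custom_entity(values, automatically_extensible=True,
                       use_synonyms=True, parser_threshold=1.0):
    """Hypothetical helper building one entry of the dataset's "entities" map."""
    # Assumption: the threshold is a ratio, so it must lie in [0, 1].
    if not 0.0 <= parser_threshold <= 1.0:
        raise ValueError("parser_threshold is assumed to lie in [0, 1]")
    return {
        "automatically_extensible": automatically_extensible,
        "use_synonyms": use_synonyms,
        "data": [{"value": value, "synonyms": []} for value in values],
        "parser_threshold": parser_threshold,
    }


entities = {"my_custom_entity": make_custom_entity(["blue"])}
print(json.dumps(entities, indent=2, sort_keys=True))
```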
9 changes: 3 additions & 6 deletions docs/source/installation.rst
@@ -6,15 +6,12 @@ Installation
System requirements
-------------------
- 64-bit Linux, MacOS >= 10.11, 64-bit Windows
- Python 2.7 or Python >= 3.4
- Python 2.7 or Python >= 3.4, <3.7
- RAM: Snips NLU will typically use between 100MB and 200MB of RAM, depending on the language and the size of the dataset.

.. warning::

Python Version
--------------

We recommend using the latest version of Python 3. Snips NLU supports Python
3.4 and newer as well as Python 2.7.
Snips NLU does not support Python 3.7 yet.


Install Snips NLU
2 changes: 2 additions & 0 deletions docs/source/languages.rst
@@ -18,6 +18,8 @@ their isocode:
+------------+------------+
| French | fr |
+------------+------------+
| Italian | it |
+------------+------------+
| Japanese | ja |
+------------+------------+
| Korean | ko |
2 changes: 2 additions & 0 deletions docs/source/tutorial.rst
@@ -150,6 +150,7 @@ Now, the ``"entities"`` part of the generated json looks like that:
"value": "garden"
}
],
"parser_threshold": 1.0,
"use_synonyms": true
},
"snips/temperature": {}
@@ -199,6 +200,7 @@ and that we want our assistant to cover. Additionally, we add some
"value": "garden"
}
],
"parser_threshold": 1.0,
"use_synonyms": true
},
"snips/temperature": {}