PolyAI Models - model.tar.gz - No Longer Available? #6806

Closed
RedTint opened this issue Sep 28, 2020 · 26 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@RedTint

RedTint commented Sep 28, 2020

Rasa version: 2.0.0rc2
Python version: 3.6.12

Operating system (windows, osx, ...): OSX

Issue: rasa init fails. The model source it downloads from is no longer available; it was taken down about 2 hours before I posted this.

Edited to include the message from https://github.com/PolyAI-LDN/polyai-models:

After much consideration, the PolyAI team has decided to take down the ConveRT models from the public domain.
Over the course of last year, we have been very excited to see ConveRT gaining a huge amount of traction in various communities - that was something we didn't expect when we first released it. However, with the amount of business growing and the shift of our team's priorities, we no longer have the resources to responsibly maintain or provide support for these models.
PolyAI is working to create end-to-end voice assistants. If you're interested in helping us, check out our careers page at polyai.com/careers. On the other hand, if you are interested in knowing how our solutions can help you transform your contact center, please get in touch at [email protected].

Error (including full traceback):

Warning: Output is not to a terminal (fd=1).
Warning: Input is not to a terminal (fd=0).
2020-09-28 04:00:48 INFO     absl  - Using /tmp/tfhub_modules to cache modules.
2020-09-28 04:00:48 INFO     absl  - Downloading TF-Hub Module 'https://github.com/PolyAI-LDN/polyai-models/releases/download/v1.0/model.tar.gz'.
Welcome to Rasa! 🤖

To get started quickly, an initial project will be created.
If you need some help, check out the documentation at https://rasa.com/docs/rasa.

Created project directory at '/usr/src/app'.
Finished creating project structure.
Training an initial model...
The configuration for policies and pipeline was chosen automatically. It was written into the config file at './config.yml'.
Training NLU model...
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/rasa/utils/train_utils.py", line 142, in load_tf_hub_model
    return tfhub.load(model_url)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/module_v2.py", line 97, in load
    module_path = resolve(handle)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/module_v2.py", line 53, in resolve
    return registry.resolver(handle)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/registry.py", line 42, in __call__
    return impl(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/compressed_module_resolver.py", line 88, in __call__
    self._lock_file_timeout_sec())
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/resolver.py", line 415, in atomic_download
    download_fn(handle, tmp_dir)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/compressed_module_resolver.py", line 83, in download
    response = self._call_urlopen(request)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/compressed_module_resolver.py", line 96, in _call_urlopen
    return url.urlopen(request)
  File "/usr/local/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/rasa/__main__.py", line 113, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/usr/local/lib/python3.6/site-packages/rasa/cli/scaffold.py", line 218, in run
    init_project(args, path)
  File "/usr/local/lib/python3.6/site-packages/rasa/cli/scaffold.py", line 128, in init_project
    print_train_or_instructions(args, path)
  File "/usr/local/lib/python3.6/site-packages/rasa/cli/scaffold.py", line 68, in print_train_or_instructions
    args.model = rasa.train(domain, config, training_files, output)
  File "/usr/local/lib/python3.6/site-packages/rasa/train.py", line 55, in train
    loop,
  File "/usr/local/lib/python3.6/site-packages/rasa/utils/common.py", line 300, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.6/site-packages/rasa/train.py", line 110, in train_async
    nlu_additional_arguments=nlu_additional_arguments,
  File "/usr/local/lib/python3.6/site-packages/rasa/train.py", line 207, in _train_async_internal
    old_model_zip_path=old_model,
  File "/usr/local/lib/python3.6/site-packages/rasa/train.py", line 246, in _do_training
    additional_arguments=nlu_additional_arguments,
  File "/usr/local/lib/python3.6/site-packages/rasa/train.py", line 543, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/train.py", line 97, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/model.py", line 159, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/model.py", line 171, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/components.py", line 760, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/registry.py", line 163, in create_component_by_config
    return component_class.create(component_config, config)
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/components.py", line 464, in create
    return cls(component_config)
  File "/usr/local/lib/python3.6/site-packages/rasa/nlu/tokenizers/convert_tokenizer.py", line 44, in __init__
    self.module = train_utils.load_tf_hub_model(self.model_url)
  File "/usr/local/lib/python3.6/site-packages/rasa/utils/train_utils.py", line 146, in load_tf_hub_model
    return tfhub.load(model_url)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/module_v2.py", line 97, in load
    module_path = resolve(handle)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/module_v2.py", line 53, in resolve
    return registry.resolver(handle)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/registry.py", line 42, in __call__
    return impl(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/compressed_module_resolver.py", line 88, in __call__
    self._lock_file_timeout_sec())
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/resolver.py", line 415, in atomic_download
    download_fn(handle, tmp_dir)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/compressed_module_resolver.py", line 83, in download
    response = self._call_urlopen(request)
  File "/usr/local/lib/python3.6/site-packages/tensorflow_hub/compressed_module_resolver.py", line 96, in _call_urlopen
    return url.urlopen(request)
  File "/usr/local/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
ERROR: Service 'rasa' failed to build: The command '/bin/sh -c rasa init --no-prompt' returned a non-zero code: 1

Command or request that led to error:

rasa init
@RedTint RedTint changed the title from "PolyAI Models - model.tar.gz - No Longer Available" to "PolyAI Models - model.tar.gz - No Longer Available?" Sep 28, 2020
@domsammut
Contributor

We're also getting this error.

Rasa version: 1.10.14
Python version: 3.6

@kitlun

kitlun commented Sep 28, 2020

https://github.com/PolyAI-LDN/polyai-models

After much consideration, the PolyAI team has decided to take down the ConveRT models from the public domain.

Over the course of last year, we have been very excited to see ConveRT gaining a huge amount of traction in various communities - that was something we didn't expect when we first released it. However, with the amount of business growing and the shift of our team's priorities, we no longer have the resources to responsibly maintain or provide support for these models.

PolyAI is working to create end-to-end voice assistants. If you're interested in helping us, check out our careers page at polyai.com/careers. On the other hand, if you are interested in knowing how our solutions can help you transform your contact center, please get in touch at [email protected].

@sara-tagger
Collaborator

Thanks for the issue, @tabergma will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@tabergma
Contributor

@dakshvar22

@gioppoluca

Same error here using the Docker version; the URL no longer exists.

@tabergma
Contributor

Unfortunately, the ConveRT model was taken offline. We are working on a long-term solution and will keep you updated. In the meantime, we recommend removing ConveRT from your pipeline and using only supervised embeddings, such as CountVectorsFeaturizer.
For example, you could change the default config.yml created by rasa init to the following to train your model:

language: en

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100

policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#   - name: RulePolicy

We are sorry for any inconvenience!
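Before retraining, it can help to confirm that no ConveRT entry is left in the config. The following is a minimal sketch, assuming the pipeline entries are written as plain `- name: ...` lines like the ones above; `find_convert_components` is a hypothetical helper, not part of Rasa.

```python
import re
from pathlib import Path

# Matches uncommented pipeline entries such as "- name: ConveRTTokenizer".
CONVERT_ENTRY = re.compile(r"^\s*-\s*name:\s*(ConveRT\w+)\s*$")


def find_convert_components(config_text):
    """Return (line_number, component_name) pairs for uncommented ConveRT entries."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = CONVERT_ENTRY.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    path = Path("config.yml")
    if path.exists():
        for number, name in find_convert_components(path.read_text(encoding="utf-8")):
            print(f"config.yml:{number}: {name} still depends on the ConveRT download")
```

An empty result only says the text of the config is clean; a model trained earlier with ConveRT would still need to be retrained.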

@RedTint
Author

RedTint commented Sep 28, 2020

No worries, and thank you! rasa init worked afterwards, though I still get the warning Could not load model due to HTTP Error 404: Not Found when running rasa run:

rasa-ros | 2020-09-28 13:25:00 INFO     root  - Starting Rasa server on http://localhost:5005
rasa-ros | 2020-09-28 13:25:03 INFO     absl  - Using /tmp/tfhub_modules to cache modules.
rasa-ros | 2020-09-28 13:25:03 INFO     absl  - Downloading TF-Hub Module 'https://github.com/PolyAI-LDN/polyai-models/releases/download/v1.0/model.tar.gz'.
rasa-ros | 2020-09-28 13:25:04 ERROR    rasa.core.agent  - Could not load model due to HTTP Error 404: Not Found.
rasa-ros | /usr/local/lib/python3.6/site-packages/rasa/shared/utils/io.py:87: UserWarning: The model at 'models' could not be loaded. Error: HTTP Error 404: Not Found
rasa-ros | /usr/local/lib/python3.6/site-packages/rasa/shared/utils/io.py:87: UserWarning: Agent could not be loaded with the provided configuration. Load default agent without any model.
rasa-ros | 2020-09-28 13:25:04 INFO     root  - Rasa server is up and running.

@connorbrinton

The previously publicly available ConveRT models appear to have been licensed under the Apache 2.0 license, making redistribution permissible. If anyone has the official files for the ConveRT models, it would be great to have them redistributed under the same license here.

I've repackaged the loaded model I have running in production, and released it here (under the Apache 2.0 license):
https://github.com/connorbrinton/polyai-models/releases/tag/v1.0

To use this model, you'll need to either:

  • Retrain your models, with the new model_url configuration parameter for ConveRTTokenizer set to the URL of the new model location, or
  • Monkey-patch the loader to redirect requests for the original model locations to the new model location
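The second option could look roughly like the sketch below. It assumes the loader is `rasa.utils.train_utils.load_tf_hub_model`, the function that appears in the Rasa 2.0.0rc2 traceback at the top of this issue; other versions may load the model elsewhere. `with_redirect`, `OLD_URLS`, and `NEW_URL` are hypothetical names used only for illustration, and the old URLs are the two quoted in this thread.

```python
import functools

# Retired download locations quoted in this thread.
OLD_URLS = {
    "https://github.com/PolyAI-LDN/polyai-models/releases/download/v1.0/model.tar.gz",
    "http://models.poly-ai.com/convert/v1/model.tar.gz",
}
# Community-hosted copy linked above.
NEW_URL = "https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz"


def with_redirect(loader, old_urls=OLD_URLS, new_url=NEW_URL):
    """Wrap a model loader so that retired URLs are swapped for new_url."""

    @functools.wraps(loader)
    def wrapper(model_url, *args, **kwargs):
        if model_url in old_urls:
            model_url = new_url
        return loader(model_url, *args, **kwargs)

    return wrapper


# Assumed usage, run before any Rasa model is trained or loaded:
#
#   from rasa.utils import train_utils
#   train_utils.load_tf_hub_model = with_redirect(train_utils.load_tf_hub_model)
```

Wrapping the loader instead of editing the installed files keeps the change in one place, but it only takes effect in processes where the patch has actually been applied.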

@aizest

aizest commented Sep 28, 2020

Thanks a lot.
But where should the model_url configuration parameter be set?
I added a "model_url" property under ConveRTTokenizer in config.yml, but it still seems to try downloading from "http://models.poly-ai.com/convert/v1/model.tar.gz".

@connorbrinton

@aizest Have you upgraded to Rasa 1.10.14? The changes making the model URL configurable are very recent, so you may need to upgrade to a version of Rasa that includes those changes in order for the model_url property to work 🙂

@aizest

aizest commented Sep 28, 2020

@connorbrinton Got it. We'll have to make some changes in the code, since we're using an older version.
Really appreciate the quick feedback!

@nmoss

nmoss commented Oct 1, 2020

With PolyAI removing the ConveRT models, does Rasa have another recommended configuration? I was a bit worried about continuing to use ConveRT under the current licensing. For now I've gone back to using the default configuration for Rasa. Any clarification on this issue would be appreciated. Thanks!

@magdalini-anastasiadou

I have Rasa 1.10.14 and started the config file with:

pipeline:
  - name: ConveRTTokenizer
    model_url: https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz

and it still tries to download the previous URL, ending with an HTTP Error 404: Not Found. Any help on how to solve this?
I tried deleting everything from /tmp/tfhub_modules and also altering the code in convert_tokenizer.py, but the result is the same:

Training NLU model...
2020-10-07 15:38:15 INFO     absl  - Using /tmp/tfhub_modules to cache modules.
2020-10-07 15:38:15 INFO     absl  - Downloading TF-Hub Module 'https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz'.
2020-10-07 15:39:20 INFO     absl  - Downloading https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz: 52.35MB
2020-10-07 15:39:37 INFO     absl  - Downloading https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz: 148.59MB
2020-10-07 15:39:38 INFO     absl  - Downloaded https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz, Total size: 152.02MB
2020-10-07 15:39:38 INFO     absl  - Downloaded TF-Hub Module 'https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz'.
2020-10-07 15:39:43 INFO     absl  - Downloading TF-Hub Module 'https://github.com/PolyAI-LDN/polyai-models/releases/download/v1.0/model.tar.gz'.

@free-soellingeraj

free-soellingeraj commented Oct 8, 2020

@magdalini-anastasiadou I had the same problem.
I ended up hacking rasa/nlu/featurizers/dense_featurizer/convert_featurizer.py:27 and rasa/nlu/tokenizers/convert_tokenizer.py:10 just to get up and running: I commented out those lines and replaced them with the model URL https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz, and that worked.
FWIW, I also removed the model_url param from config.yml to avoid downloading that model twice, since it is fairly large.

@rfalba

rfalba commented Oct 12, 2020

@magdalini-anastasiadou I also had this issue. I fixed it by setting the model_url property under ConveRTFeaturizer as well.
pipeline:
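A sketch of such a pipeline, written as Python data rather than YAML and assuming the community-hosted URL from earlier in this thread for both components. `missing_model_url` is a hypothetical helper that flags any ConveRT component left without a `model_url`.

```python
MODEL_URL = "https://github.com/connorbrinton/polyai-models/releases/download/v1.0/model.tar.gz"

# Python equivalent of the `pipeline:` section of config.yml (only the ConveRT part is shown).
pipeline = [
    {"name": "ConveRTTokenizer", "model_url": MODEL_URL},
    {"name": "ConveRTFeaturizer", "model_url": MODEL_URL},
]


def missing_model_url(pipeline):
    """Return the names of ConveRT components that do not set model_url."""
    return [
        component["name"]
        for component in pipeline
        if component.get("name", "").startswith("ConveRT")
        and not component.get("model_url")
    ]
```

A pipeline that sets the URL on the tokenizer only would be reported as `["ConveRTFeaturizer"]`, which matches the second download attempt seen in the log above.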

@AdityaSingh06

Hi Team,

I was also facing the ConveRTTokenizer unavailability issue, so I upgraded Rasa to 1.10.14 (using the command "pip3 install rasa[convert]==1.10.14") and made the following changes to the config.yml file:

pipeline:

But ever since I made the two changes mentioned above, my model has become extremely erratic (unable to identify entities properly, misclassifying intents, etc.).
My data has not changed, and those are the only changes I made. Is there anything I am missing or need to do here?
Any suggestion or pointer would be of great help. Thanks, community.

@sbkv

sbkv commented Oct 23, 2020

@tabergma @akelad Any updates? Note that ConveRT is still listed as a component:
https://rasa.com/docs/rasa/components/#converttokenizer
What URL should be used?

@dakshvar22
Contributor

@sbkv A fix for this will be released soon in a patch. The PR is under review. The fix makes it mandatory to set model_url inside ConveRTTokenizer to either a community/self-hosted URL of the model or path to a local directory containing the model files. You cannot use the component if this parameter is not set.

@mamun-131

@dakshvar22
Hi, I am using Rasa 1.9.6 and want to use ConveRTTokenizer and ConveRTFeaturizer. Where can I get a model file that works with Rasa 1.9.6? Any help would be appreciated.

@dakshvar22
Contributor

@mamun-131 You will have to upgrade to at least 1.10.14 to set a configurable URL.
If that is not an option, you may try this method to manually override the URL set inside the code. You will need to reinstall the package from source in that case.

@mamun-131

@free-soellingeraj It would be appreciated if you could share a little more about the process. What did you change, and how did you reinstall Rasa afterwards?

@free-soellingeraj

free-soellingeraj commented Oct 28, 2020

I didn't re-install anything.
For a stable solution, I forked the repo and updated the forked repo. In the fork, if you go to rasa/nlu/featurizers/dense_featurizer/convert_featurizer.py line 27 and rasa/nlu/tokenizers/convert_tokenizer.py line 10 you will find a URL that needs to be edited. You wouldn't have to fork the repo though. You could just go to the rasa library in its install directory and edit those files directly. To find the install directory, you can look at your stack trace. Or you should be able to do:

>>> import rasa
>>> print(rasa.__file__)
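If importing rasa is slow or fails in your environment, the install directory can also be located without importing the package. This is a minimal sketch using the standard library; `package_dir` is a hypothetical helper name, and the two relative paths are the ones given in this thread, which may differ between Rasa versions.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory a package is installed in, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    root = package_dir("rasa")
    if root is None:
        print("rasa is not installed in this environment")
    else:
        # Relative paths as given in this thread.
        print(root / "nlu" / "tokenizers" / "convert_tokenizer.py")
        print(root / "nlu" / "featurizers" / "dense_featurizer" / "convert_featurizer.py")
```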

@mamun-131

mamun-131 commented Oct 28, 2020

@free-soellingeraj Actually, I did the same. It's working. Thanks.
On Ubuntu, the files are located here:
/usr/local/lib/python3.7/site-packages/rasa/nlu/featurizers/dense_featurizer
/usr/local/lib/python3.7/site-packages/rasa/nlu/tokenizers

@dakshvar22
Contributor

dakshvar22 commented Oct 29, 2020

Closing this as the fix now makes the URL configurable in rasa==2.0.3. You can either use a self/community-hosted URL of the model or, if you have a local copy of the model, pass the path to the directory containing the model files to the model_url parameter of ConveRTTokenizer.
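For the local-copy route, one way to obtain the directory is to download a hosted archive once and unpack it. This is a rough sketch using only the standard library; `fetch_model` is a hypothetical helper, and it assumes the archive unpacks directly into the model files that `model_url` expects, which this thread does not confirm.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_model(url, target_dir):
    """Download a .tar.gz archive from url and unpack it into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "model.tar.gz"
        urllib.request.urlretrieve(url, str(archive))
        with tarfile.open(str(archive), "r:gz") as tar:
            # Only unpack archives from a source you trust.
            tar.extractall(str(target))
    return target
```

The returned path could then be used as the `model_url` value, so later training runs do not depend on the remote host staying online.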

@LohithArcot

LohithArcot commented Feb 15, 2021

For those wondering why there is a drop in performance: the model uploaded by Connor is a version retrained on custom data, i.e. a model built on top of the standard PolyAI model.
@connorbrinton, please correct me if I am wrong.
The original author of the ConveRT Featurizer has also released the model on GitHub.

@connorbrinton

@lohithpro It's been a while, so I don't remember exactly how I pulled the model off of my production servers, but I think I directly copied the downloaded model .tar.gz from the TFHUB_CACHE_DIR directory. I believe that would give the original PolyAI model pre-fine-tuning, but I could be wrong 😅
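To see what is currently stored in that cache, a small sketch like the following may help. It assumes the cache location is taken from the `TFHUB_CACHE_DIR` environment variable and otherwise falls back to a `tfhub_modules` folder in the system temp directory, which is consistent with the `/tmp/tfhub_modules` path in the logs above. Both function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def tfhub_cache_dir():
    """Return TFHUB_CACHE_DIR if set, else <system temp dir>/tfhub_modules."""
    default = Path(tempfile.gettempdir()) / "tfhub_modules"
    return Path(os.environ.get("TFHUB_CACHE_DIR") or default)


def cached_entries(cache_dir=None):
    """List the names stored in the cache directory (empty if it does not exist)."""
    root = Path(cache_dir) if cache_dir else tfhub_cache_dir()
    return sorted(p.name for p in root.iterdir()) if root.is_dir() else []


if __name__ == "__main__":
    print(tfhub_cache_dir())
    for name in cached_entries():
        print(" ", name)
```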

I would definitely recommend that people only rely on my uploaded ConveRT models as a stop-gap solution. NLP moves fast, so there are probably much better models out there already. I now use HuggingFace (LanguageModelFeaturizer) models instead of ConveRT models, and we're investigating Google's new conditional masked language model (which hasn't gotten a lot of press yet). It's an exciting time to be working with dialog systems! 😄
