Train finetune flag throws tensorflow error #7720

Closed
stefanvantchev opened this issue Jan 13, 2021 · 10 comments · Fixed by #7769
Labels
area:rasa-oss/ml 👁 All issues related to machine learning type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@stefanvantchev

stefanvantchev commented Jan 13, 2021

Rasa version:
2.2.3

Rasa SDK version (if used & relevant):
2.2.0

Rasa X version (if used & relevant):
none

Python version:
3.7.3

Operating system (windows, osx, ...):
Linux-4.19.0-13-cloud-amd64-x86_64-with-debian-10.7

Issue:

ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: <tf.Tensor 'batch_in_3:0' shape=(None,) dtype=float32>

Error (including full traceback):

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/__main__.py", line 134, in <module>
    main()
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/cli/train.py", line 205, in train_nlu
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 711, in train_nlu
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 757, in _train_nlu_async
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/train.py", line 116, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/model.py", line 209, in train
    updates = component.train(working_data, self.config, **context)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 818, in train
    self.component_config[BATCH_STRATEGY],
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 220, in fit
    ) = self._get_tf_train_functions(eager, model_data, batch_strategy)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 482, in _get_tf_train_functions
    train_dataset_function, self.train_on_batch, eager, "train"
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 464, in _get_tf_call_model_function
    tf_call_model_function(next(iter(init_dataset)))
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
    *args, **kwds))
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py:293 train_on_batch  *
        prediction_loss = self.batch_loss(batch_in)
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py:1409 batch_loss  *
        tf_batch_data = self.batch_to_model_data_format(batch_in, self.data_signature)
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py:592 batch_to_model_data_format  *
        batch_data[key][sub_key].append(
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/sparse_tensor.py:130 __init__  **
        indices, name="indices", dtype=dtypes.int64)
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1475 convert_to_tensor
        (dtype.name, value.dtype.name, value))

    ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: <tf.Tensor 'batch_in_3:0' shape=(None,) dtype=float32>

Command or request that led to error:
python3 -m rasa train nlu --out small/spacy2 --nlu small/small_nlu.md --config small/small_spacy_config.yml --finetune

Content of configuration file (config.yml) (if relevant):

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

Tensorflow related items:

tensorflow              2.3.2
tensorflow-addons       0.12.0
tensorflow-estimator    2.3.0
tensorflow-hub          0.9.0
tensorflow-probability  0.11.1
tensorflow-text         2.3.0

If I remove the --finetune flag, the training works just fine.
The model has only 3 intents with about 20 phrases each.
There are no stories - our work is strictly focused only on intent classification.
I've attempted several pipelines to no avail.

@stefanvantchev stefanvantchev added area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. labels Jan 13, 2021
@stefanvantchev stefanvantchev changed the title Finetune issue throws tensorflow error Finetune flag throws tensorflow error Jan 13, 2021
@stefanvantchev stefanvantchev changed the title Finetune flag throws tensorflow error Train finetune flag throws tensorflow error Jan 13, 2021
@sara-tagger
Collaborator

Thanks for raising this issue, @melindaloubser1 will get back to you about it soon✨

Please also check out the docs and the forum in case your issue was raised there too 🤗

@dakshvar22
Contributor

Hi @stefanvantchev, I am unable to reproduce the error. I think there might be some problem with the data. To help diagnose it, could you tell me a bit more about your setup:

  1. How many intents did you have when training the base model?
  2. Did you add/remove any intents after training the base model and before running the finetuning command?
  3. Did you change the configuration after training the base model and before running the finetuning command?
  4. What version of Rasa Open Source was the base model trained with?

@stefanvantchev
Author

I have 3 intents in both training sets, each with about 15-20 phrases.
No changes are made between training and finetuning command.
Running the test command works just fine.

I'd be happy to share the model.

I am sure it is some sort of prerequisite version mismatch - but what strikes me is that every other command and flag works.

@dakshvar22
Contributor

Finetuning a model has certain restrictions. Does your setup follow those restrictions? We do have checks that should detect when they are not followed, so I'd be curious to see what's happening. If you are okay with it, you could send the training data (a small reproducible set is also fine) and the trained model to my email and I can take a brief look.

@stefanvantchev
Author

stefanvantchev commented Jan 19, 2021

I decided to test it on a brand new Debian VM on GCP with a clean installation of Rasa.
I followed the steps in the Installing Rasa documentation.
I initialized the example setup from rasa.
I trained the model.
I then added one phrase, "yo yo yo", to the first intent of your example data.
I trained the model again with the finetune flag turned on.
I received the following output:


stefan.vantchev@rasa-debian-code:~$ python3 -m rasa train --finetune
2021-01-19 19:54:39.664159: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-01-19 19:54:39.664212: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-01-19 19:54:41.187718: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-01-19 19:54:41.187772: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-01-19 19:54:41.187801: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (rasa-debian-code): /proc/driver/nvidia/version does not exist
The configuration for pipeline and policies was chosen automatically. It was written into the config file at 'config.yml'.
2021-01-19 19:54:41 INFO     rasa.model  - Data (messages) for NLU model section changed.
Training NLU model...
2021-01-19 19:54:42 WARNING  rasa.shared.utils.common  - The Incremental Training feature is currently experimental and might change or be removed in the future 🔬 Please share your feedback on it in the forum (https://forum.rasa.com) to help us make this feature ready for production.
Loading NLU model from models/20210119-195232.tar.gz for finetuning...
2021-01-19 19:54:42.861763: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-19 19:54:42.873367: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299995000 Hz
2021-01-19 19:54:42.878629: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x57e1d60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-19 19:54:42.878681: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-19 19:54:52 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2021-01-19 19:54:52 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 70 (7 distinct intents)

2021-01-19 19:54:52 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'affirm', 'bot_challenge', 'deny', 'greet', 'goodbye', 'mood_unhappy', 'mood_great'
2021-01-19 19:54:52 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2021-01-19 19:54:52 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 0 (0 distinct entities)
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Starting to train component WhitespaceTokenizer
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Finished training component.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Finished training component.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Starting to train component LexicalSyntacticFeaturizer
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Finished training component.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2021-01-19 19:54:52 INFO     rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - 81 vocabulary slots consumed out of 1080 slots configured for text attribute.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Finished training component.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2021-01-19 19:54:52 INFO     rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - 699 vocabulary slots consumed out of 1697 slots configured for text attribute.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Finished training component.
2021-01-19 19:54:52 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/__main__.py", line 134, in <module>
    main()
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/cli/train.py", line 58, in <lambda>
    train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/cli/train.py", line 102, in train
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 109, in train
    loop,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 174, in train_async
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 353, in _train_async_internal
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 396, in _do_training
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/train.py", line 116, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/model.py", line 209, in train
    updates = component.train(working_data, self.config, **context)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 818, in train
    self.component_config[BATCH_STRATEGY],
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 220, in fit
    ) = self._get_tf_train_functions(eager, model_data, batch_strategy)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 482, in _get_tf_train_functions
    train_dataset_function, self.train_on_batch, eager, "train"
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 464, in _get_tf_call_model_function
    tf_call_model_function(next(iter(init_dataset)))
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
    *args, **kwds))
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py:293 train_on_batch  *
        prediction_loss = self.batch_loss(batch_in)
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py:1409 batch_loss  *
        tf_batch_data = self.batch_to_model_data_format(batch_in, self.data_signature)
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py:592 batch_to_model_data_format  *
        batch_data[key][sub_key].append(
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/sparse_tensor.py:130 __init__  **
        indices, name="indices", dtype=dtypes.int64)
    /home/stefan.vantchev/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1475 convert_to_tensor
        (dtype.name, value.dtype.name, value))

    ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: <tf.Tensor 'batch_in_3:0' shape=(None,) dtype=float32>

I tested this step by step several times with exactly the same results.
You should be able to reproduce this error: just follow the instructions in your Installing Rasa doc as-is and then do the training and finetuning.
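
The reproduction steps described in this comment can be written down as a small script. This is only a sketch under assumptions: the `STEPS` list and the `run_steps` helper are hypothetical names, `rasa init --no-prompt` is assumed to be how the example project was created, and the location of the first intent in the NLU data is not given in the thread, so that step stays manual. By default the script only prints the plan and runs nothing.

```python
import shlex
import subprocess

# (description, command) pairs; a command of None marks a manual step.
# The commands are assumptions based on the steps described in this thread.
STEPS = [
    ("create the example project", "python3 -m rasa init --no-prompt"),
    ("train the base model", "python3 -m rasa train"),
    ('add the phrase "yo yo yo" to the first intent in the NLU data', None),
    ("train again with finetuning", "python3 -m rasa train --finetune"),
]


def run_steps(steps, dry_run=True):
    """Print every step; execute the commands only when dry_run is False.

    Returns the list of commands in the order they were (or would be) run.
    """
    commands = []
    for description, command in steps:
        if command is None:
            print(f"[manual] {description}")
            if not dry_run:
                input("Press Enter once the manual step is done... ")
            continue
        print(f"[run] {description}: {command}")
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(shlex.split(command), check=True)
        commands.append(command)
    return commands


run_steps(STEPS)  # dry run: prints the plan without executing anything
```

Calling `run_steps(STEPS, dry_run=False)` in an empty directory would execute the commands for real; on an affected installation the last command is expected to fail with the traceback shown above.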

@dakshvar22
Contributor

Can you post what version of tensorflow you have installed?

@stefanvantchev
Author

tensorflow 2.3.2
tensorflow-addons 0.12.0
tensorflow-estimator 2.3.0
tensorflow-hub 0.9.0
tensorflow-probability 0.11.1
tensorflow-text 2.3.0

These are whatever the Rasa install required; I did not run any additional installation commands.
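
For anyone who needs to report the same information, a version list like the one above can also be collected from Python. This is a minimal sketch: the `installed_versions` helper is a hypothetical name, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library (on the Python 3.7 used in this issue, the `importlib_metadata` backport would be needed instead).

```python
from importlib import metadata

# Distribution names taken from the list posted in this issue.
PACKAGES = [
    "tensorflow",
    "tensorflow-addons",
    "tensorflow-estimator",
    "tensorflow-hub",
    "tensorflow-probability",
    "tensorflow-text",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:<24}{version or 'not installed'}")
```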

@dakshvar22
Contributor

dakshvar22 commented Jan 20, 2021

Thanks! I was able to reproduce the error and it seems like the problem is when entity_recognition is set to True inside DIETClassifier. If it's set to False then it works. This happens even if the training data does not contain any data for entity recognition.

On further investigation, it looks like the sequence features of the label attribute have indices of type float32 inside the batch, which is unusual: as the traceback shows, the sparse tensor constructor expects its indices to be int64.
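
Since this comment reports that training works with entity_recognition set to False, a possible stopgap for intent-only projects is to switch that option off for DIETClassifier. The sketch below does this on a pipeline represented as a list of dicts, the way it would look after loading config.yml. The `disable_entity_recognition` helper is a hypothetical name, not part of Rasa, and the sample pipeline is illustrative. Because finetuning has restrictions on configuration changes, the base model would presumably have to be retrained with the same setting first.

```python
import copy


def disable_entity_recognition(pipeline):
    """Return a copy of the pipeline with entity_recognition off for DIETClassifier."""
    patched = copy.deepcopy(pipeline)
    for component in patched:
        if component.get("name") == "DIETClassifier":
            component["entity_recognition"] = False
    return patched


# Illustrative pipeline, loosely based on the components named in this thread.
pipeline = [
    {"name": "WhitespaceTokenizer"},
    {"name": "CountVectorsFeaturizer"},
    {"name": "DIETClassifier", "epochs": 100},
]

patched = disable_entity_recognition(pipeline)
print(patched[-1])
# {'name': 'DIETClassifier', 'epochs': 100, 'entity_recognition': False}
```

The original list is left untouched, so the patched pipeline can be compared against it before being written back to the configuration file.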

@dakshvar22 dakshvar22 added area:rasa-oss/ml 👁 All issues related to machine learning and removed area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Jan 20, 2021
@dakshvar22 dakshvar22 self-assigned this Jan 20, 2021
@stefanvantchev
Author

Great, I am not losing my mind.

@wochinge
Contributor

fixed by #7769


4 participants