Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log training metrics to visualize them via tensorboard #5422

Merged
merged 23 commits into from
Mar 18, 2020
Merged

Conversation

tabergma
Copy link
Contributor

@tabergma tabergma commented Mar 13, 2020

Proposed changes:
Related to https://github.com/RasaHQ/research/issues/56

Add option tensorboard_log_directory to DIETClasifier, ResponseSelector and
TEDPolicy.

By default tensorboard_log_directory is None. If a valid directory is provided,
metrics are written during training. After the model is trained you can take a look
at the training metrics in tensorboard. Execute tensorboard --logdir <path-to-given-directory>.

We also write down a model summary (layers, input size, type) in the provided directory.

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Copy link
Contributor

@dakshvar22 dakshvar22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. One meta comment: Right now we are logging at an epoch level. I think the default behaviour should be that. Can we add an optional parameter which lets you switch the behaviour to logging at a minibatch level. In my experience, logging at minibatch level can help sometimes, so if it is an easy addition then we should implement it.

changelog/5422.feature.rst Show resolved Hide resolved
changelog/5422.feature.rst Outdated Show resolved Hide resolved
rasa/utils/tensorflow/models.py Outdated Show resolved Hide resolved
rasa/utils/tensorflow/models.py Outdated Show resolved Hide resolved
@tabergma
Copy link
Contributor Author

Just to be sure: How would you log on minibatch level? Would you have a counter that increases over time so that you see all steps in one plot? For example, if the first minibatch has 64 steps, the second minibatch would start with step 65 in the plot.

@tabergma tabergma self-assigned this Mar 16, 2020
@dakshvar22
Copy link
Contributor

@tabergma Just tried on the scaffold project dataset - The steps for test logger do not correspond to the steps for the train logger. For example, if evaluate_after_every_epochs is set to 20, then the steps for test logger should be 20,40,60 rather than 1,2,3. Sharing the step size would directly give an indication of which epoch does the validation loss correspond to. Otherwise the user has to calculate that on their own. Attaching a screenshot -
Screenshot 2020-03-17 at 17 33 31
If this is easy enough to change, I would suggest we change it now.

@tabergma
Copy link
Contributor Author

This only happens if you set tensorboard_log_level to minibatch correct? So in that case, training metrics should be plotted after every step, but evaluation metrics just after every epoch?

@dakshvar22
Copy link
Contributor

Yes, setting it to epochs works fine. For your question regarding minibatch - IMO they should be plotted after every evaluate_after_every_epochs steps. So if evaluate_after_every_epochs is set to 2, and number of minibatches in training set is 20 and number of minibatches in validation set is 2 then steps for test logger would be 40,41,80,81,120,121

@tabergma
Copy link
Contributor Author

Update:
Do you mean something like this?
Screenshot 2020-03-17 at 13 31 42

@dakshvar22
Copy link
Contributor

Yes, there should be some inherent corresponding steps.

Copy link
Contributor

@dakshvar22 dakshvar22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added two minor comments. I think we should add a simple test as well which atleast checks if the folders are created.

rasa/utils/tensorflow/models.py Outdated Show resolved Hide resolved
rasa/utils/tensorflow/models.py Show resolved Hide resolved
Copy link
Contributor

@dakshvar22 dakshvar22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 💯 Thanks for adding support for this! 🚀

@tabergma tabergma merged commit da25fef into master Mar 18, 2020
@tabergma tabergma deleted the tensorboard branch March 18, 2020 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants