Uses torchmetrics for metric computation #284
Conversation
Closes CUNY-CL#158.

* Loss is computed as before, but streamlined somewhat.
* `torchmetrics`' implementation of exact match accuracy is lightly adapted. This does everything in tensor-land and should keep things on device. My tests confirm that accuracy is exactly what it was before.
* A `torchmetrics`-compatible implementation of symbol error rate (here defined as edit distance divided by the sum of the gold target lengths) is added; a sketch follows below. It is heavily documented and compatible with our existing implementation. The hot inner loop is still on CPU, but as noted in the documentation, this is probably the best option, and I don't observe any obvious performance penalty when enabling it. Note that the old computation was not as normally defined: it gave the number of edits per word, not the number of edits per gold target symbol.
* We do away with the `evaluation` module altogether. Instead we treat the metric objects as nullables living in the base class, a design adapted from UDTube. The CLI interface is unaffected, and my side-by-side shows the metrics are exactly the same as before this change.
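A minimal sketch of what such a `torchmetrics`-compatible symbol error rate can look like. The class name, the `_edit_distance` helper, and the padding handling are illustrative assumptions, not the exact code in this PR:

```python
import torch
import torchmetrics


def _edit_distance(x: list, y: list) -> int:
    # Classic Wagner-Fischer dynamic programming: this is the hot inner
    # loop that stays on CPU.
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        curr = [i] + [0] * len(y)
        for j, yj in enumerate(y, 1):
            curr[j] = min(
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
                prev[j - 1] + (xi != yj),  # substitution or match
            )
        prev = curr
    return prev[-1]


class SymbolErrorRate(torchmetrics.Metric):
    """Summed edit distance divided by the summed gold target length."""

    def __init__(self, pad_idx: int, **kwargs):
        super().__init__(**kwargs)
        self.pad_idx = pad_idx
        # Tensor-valued states, so the accumulators live wherever the
        # model does and reduce correctly under distributed evaluation.
        self.add_state("edits", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, hypo: torch.Tensor, gold: torch.Tensor) -> None:
        # hypo and gold are (batch_size, sequence_length) index tensors.
        for h, g in zip(hypo, gold):
            h = h[h != self.pad_idx].tolist()
            g = g[g != self.pad_idx].tolist()
            self.edits += _edit_distance(h, g)
            self.total += len(g)

    def compute(self) -> torch.Tensor:
        return self.edits / self.total
```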
Nice!
Cool. I think when we added our implementation, torchmetrics required strings, right? Glad to see this.
So we go from a denominator of 1 (I am interpreting "word" as sequence, since we do not do pretokenization) to a denominator of the length of the gold symbol sequence? Makes sense if so, but I cannot really remember what is standard.
Generally awesome!! Looking at the implementation, I am wondering why we replace the generic set of evals with specific metric attributes and metric booleans. What are the benefits of this? (I see the downsides as adding some bloat to the code and requiring a lot of steps to include new metrics.)
Does RNN library code mean for the …

On this diff hunk:

```python
        )
        return loss

    def _reset_metrics(self) -> None:
```
Is there a link to the torchmetrics docs explaining this that we could put here?
I added documentation about what one needs to do to add a metric (there are a lot of steps, but all of them are simple). There's nothing to link to in torchmetrics since it really does what it says on the tin: it resets the counters internal to the calculator.
There is an alternative design where the metric objects are stored in a list or the like, but given that we have all of two metrics, it seems premature. If we got > 4, I'd reconsider.
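For the record, a minimal sketch of the reset hook under the nullable-attribute design. The attribute names `accuracy` and `ser` are assumptions for illustration, not necessarily the ones in this PR:

```python
import torchmetrics


class BaseModel:
    # Hypothetical skeleton; each metric is None unless requested.
    accuracy: torchmetrics.Metric | None = None
    ser: torchmetrics.Metric | None = None

    def _reset_metrics(self) -> None:
        # Metric.reset() just zeroes the states registered via
        # add_state(); there is nothing framework-specific to link to.
        if self.accuracy is not None:
            self.accuracy.reset()
        if self.ser is not None:
            self.ser.reset()
```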
Just to say: this isn't really ready for review yet. It depends on a lot of other small changes which I'll make first. I thought I could do it all in one go: I was wrong.
Yep. It wasn't too hard to make our own.
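For reference, `torchmetrics`' built-in text metrics still operate on strings, which is the mismatch being worked around here; for example:

```python
from torchmetrics.text import CharErrorRate

# CharErrorRate takes hypothesis and reference strings, not tensors:
# here, one edit over three reference characters.
cer = CharErrorRate()
cer.update(["abd"], ["abc"])
print(cer.compute())  # tensor(0.3333)
```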
Actually, I misspoke here somewhat because I misread the code: the denominator was the length of the tensor; now it's the length of the string the tensor denotes.
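A hypothetical worked example of the change (the numbers are invented for illustration):

```python
# Gold string "abc" (3 symbols) stored in a padded tensor of length 8,
# with a hypothesis one edit away:
edits = 1
tensor_length = 8  # old denominator: the tensor, padding and all
string_length = 3  # new denominator: the string the tensor denotes
old_ser = edits / tensor_length  # 0.125
new_ser = edits / string_length  # ~0.333
```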
Benefits: data lives on the accelerator, like loss data does. I actually don't think the steps to add metrics are meaningfully more difficult than what we had previously, so I don't see any big downsides either. It can also be documented without much trouble.
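To illustrate the accelerator point, a sketch assuming the metric is registered as a module attribute (the class and argument names here are made up):

```python
import torch
import torchmetrics


class Model(torch.nn.Module):
    def __init__(self, vocab_size: int, pad_idx: int, eval_accuracy: bool):
        super().__init__()
        # Nullable metric attribute: because Metric subclasses
        # torch.nn.Module, its states move with the parent module.
        self.accuracy = (
            torchmetrics.classification.MulticlassExactMatch(
                num_classes=vocab_size, ignore_index=pad_idx
            )
            if eval_accuracy
            else None
        )


model = Model(vocab_size=100, pad_idx=0, eval_accuracy=True)
model.to("cuda")  # the metric's internal counters move to the GPU too
```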
I originally piloted with that. Most things in the Torch universe (including loss functions, but also everything in …
Sorry for my delay. This mostly all makes sense to me. "90% of my debugging here involves transpositions and making shape conform" -- this has typically been my experience in general when writing torch code :). I am mostly trying to suggest that having a yoyodyne default assumption of what shape tensors are in would be nice -- and I think it would make it conceptually easier to visualize tensors as you code. If I follow correctly, though, the decisions you made in reshaping sound very reasonable.
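A tiny sketch of what such a house shape convention might look like (batch-major is an assumption here, not necessarily yoyodyne's actual choice):

```python
import torch

# Convention: targets are (batch_size, seq_len); logits are
# (batch_size, seq_len, vocab_size).
B, T, V = 4, 10, 32
logits = torch.randn(B, T, V)
gold = torch.randint(0, V, (B, T))
# CrossEntropyLoss wants the class dimension second, so the one
# required transposition is localized to the loss call site.
loss = torch.nn.CrossEntropyLoss()(logits.transpose(1, 2), gold)
```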
I'm going to restart this PR because merging is a pain. But comments have been addressed.