Update trainer.py #20473

TheMGGdev · 2024-11-08T17:18:51Z

Cleaning each time you compile the model the previous trainer metrics also erased the compilation of other models with are a cominbation of that model. Given problems in archiquectures shuch as GAN. As explain in issue:

google-cla · 2024-11-08T17:18:55Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

TheMGGdev · 2024-11-08T17:23:52Z

The issue is: #20474

codecov-commenter · 2024-11-08T17:25:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.94%. Comparing base (8409e18) to head (848fdde).

❗ There is a different number of reports uploaded between BASE (8409e18) and HEAD (848fdde). Click for more details.

HEAD has 6 uploads less than BASE

Flag BASE (8409e18) HEAD (848fdde)

keras 4 1

keras-torch 1 0

keras-tensorflow 1 0

keras-jax 1 0

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #20473       +/-   ##
===========================================
- Coverage   82.07%   59.94%   -22.14%     
===========================================
  Files         515      515               
  Lines       47416    47415        -1     
  Branches     7439     7439               
===========================================
- Hits        38919    28421    -10498     
- Misses       6691    17246    +10555     
+ Partials     1806     1748       -58

Flag	Coverage Δ
keras	`59.94% <ø> (-22.00%)`	⬇️
keras-jax	`?`
keras-numpy	`59.94% <ø> (-0.03%)`	⬇️
keras-tensorflow	`?`
keras-torch	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fchollet · 2024-11-09T20:05:55Z

Thanks for the PR!

@hertschuh, @jeffcarp, do you remember why this line was inserted? I believe it was intended to fix a bug, however removing it does not seem to break any test.

If we do remove the line, then we need a test for the use case it was breaking.

hertschuh · 2024-11-10T02:59:36Z

Thanks for the PR!

@hertschuh, @jeffcarp, do you remember why this line was inserted? I believe it was intended to fix a bug, however removing it does not seem to break any test.

If we do remove the line, then we need a test for the use case it was breaking.

I don't have any context on this.

This is the PR that added this line to fix some bug: #20197

cc @james77777778

james77777778 · 2024-11-10T08:16:44Z

@fchollet @hertschuh
I'm looking into it and will report back soon.

james77777778 · 2024-11-10T15:04:26Z

@fchollet
I believe the change of this PR should be fine. The drawback is that there might be some unused metrics in the sub-trainers.

Previously, that line was necessary to prevent an error caused by the nested Trainer arch.
Here’s a reproducible snippet:

import keras

train_images = keras.random.uniform(shape=(32, 784))
train_labels = keras.random.randint(shape=(32, 1), minval=0, maxval=9)

trainer1 = keras.Sequential(
    [keras.layers.Input(shape=(784,)), keras.layers.Dense(1, activation="relu")]
)
trainer1.compile(loss="mse", metrics=["mse"])  # Not calling `build` for `trainer1`'s `CompileLoss`

# Create another model.
inputs = keras.Input(shape=(784,))
x = trainer1(inputs)
outputs = keras.layers.Dense(10)(x)
trainer2 = keras.Model(inputs=inputs, outputs=outputs)
trainer2.compile(loss="binary_crossentropy", metrics=["accuracy"])
trainer2.fit(
    train_images, keras.utils.to_categorical(train_labels, 10), epochs=2
)
# `fit` might fail because `trainer1`'s `CompileLoss` is not built

However, there is logic that skips the metrics for the sub-trainer:

keras/keras/src/trainers/trainer.py

Lines 264 to 267 in 8deee17

    
           if isinstance(layer, Trainer): 
        
               # All Trainer-related metrics in sublayers should be ignored 
        
               # because a new Trainer has been instantiated. 
        
               continue

This allows us to keep the metrics in the sub-trainer without encountering the unbuilt issue.

We might want to consider adding a similar test as shown above to prevent future breakages.

fchollet · 2024-11-10T18:23:39Z

Thank you, @james77777778 and @hertschuh !

@TheMGGdev are you able to add a basic, minimal unit test for your use case, so we avoid breaking it in the future?

We should also include a test based on @james77777778's snippet above.

TheMGGdev · 2024-11-12T09:31:18Z

I don't know if this is exactly what you are asking for. Here is a more simplified example of the issue 20474. In Keras 3.6 it gives error but in Keras 3 it works. In the issue I explain better the error and where I think the error is. This is the code with the prints for debugging:

import numpy as np

from keras import __version__
from keras.models import Sequential, Model
from keras.layers import Input, Dense
from keras.optimizers import Adam

print(__version__)

model_1 = Sequential([
            Input(shape = (100, )),
            Dense(100, activation = "sigmoid"),
        ],)
model_2 = Sequential([
            Input(shape = (100, )),
            Dense(80, activation = "sigmoid"),
        ],)

model_1.compile(loss = 'binary_crossentropy', optimizer = Adam(), metrics = ['accuracy'])

###Print for debugging/show the error 
print('---Debugging/show the error after compiled model_1 and before compiled combined---')
print('model_1.compiled ->', model_1.compiled)
print('model_1.metrics ->', model_1.metrics)
print('model_1._loss_tracker ->', model_1._loss_tracker)
###

combined = Model(Input(shape=(100,)), model_2(model_1(Input(shape=(100,)))))
combined.compile(loss = 'binary_crossentropy', optimizer = Adam())

###Print for debugging/show the error
print('---Debugging/show the error after compiled model_1 and combined---')
print('model_1.compiled ->', model_1.compiled)
print('model_1.metrics ->', model_1.metrics)
print('model_1._loss_tracker ->', model_1._loss_tracker)
###

model_1.train_on_batch(np.random.normal(0, 1, (64, 100)), np.random.normal(0, 1, (64, 100)))
combined.train_on_batch(np.random.normal(0, 1, (64, 100)), np.random.normal(0, 1, (64, 80)))

And this is the code without the prints for the test:

import numpy as np

from keras import __version__
from keras.models import Sequential, Model
from keras.layers import Input, Dense
from keras.optimizers import Adam

print(__version__)

model_1 = Sequential([
            Input(shape = (100, )),
            Dense(100, activation = "sigmoid"),
        ],)
model_2 = Sequential([
            Input(shape = (100, )),
            Dense(80, activation = "sigmoid"),
        ],)

model_1.compile(loss = 'binary_crossentropy', optimizer = Adam(), metrics = ['accuracy'])

combined = Model(Input(shape=(100,)), model_2(model_1(Input(shape=(100,)))))
combined.compile(loss = 'binary_crossentropy', optimizer = Adam())

model_1.train_on_batch(np.random.normal(0, 1, (64, 100)), np.random.normal(0, 1, (64, 100)))
combined.train_on_batch(np.random.normal(0, 1, (64, 100)), np.random.normal(0, 1, (64, 80)))

This is the output in Keras 3.6:

TheMGGdev · 2024-11-12T09:38:22Z

I think that the solution for both PR is to clean only the layers of the models that don´t have compiled == True, so the one that have not been compiled already. This way it should work for both cases

Update trainer.py

848fdde

Cleaning each time you compile the model the previous trainer metrics also erased the compilation of other models with are a cominbation of that model. Given problems in archiquectures shuch as GAN. As explain in issue:

google-ml-butler bot added the size:XS label Nov 8, 2024

google-ml-butler bot assigned gbaned Nov 8, 2024

TheMGGdev mentioned this pull request Nov 8, 2024

NO _loss_tracker on train_on_batch because compile model multiple times. Possible Bug. #20474

Open

gbaned requested a review from fchollet November 19, 2024 06:32

google-ml-butler bot added the awaiting review label Nov 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update trainer.py #20473

Update trainer.py #20473

TheMGGdev commented Nov 8, 2024

google-cla bot commented Nov 8, 2024

TheMGGdev commented Nov 8, 2024

codecov-commenter commented Nov 8, 2024

fchollet commented Nov 9, 2024

hertschuh commented Nov 10, 2024

james77777778 commented Nov 10, 2024

james77777778 commented Nov 10, 2024 •

edited

Loading

fchollet commented Nov 10, 2024

TheMGGdev commented Nov 12, 2024 •

edited

Loading

TheMGGdev commented Nov 12, 2024

Update trainer.py #20473

Are you sure you want to change the base?

Update trainer.py #20473

Conversation

TheMGGdev commented Nov 8, 2024

google-cla bot commented Nov 8, 2024

TheMGGdev commented Nov 8, 2024

codecov-commenter commented Nov 8, 2024

Codecov Report

fchollet commented Nov 9, 2024

hertschuh commented Nov 10, 2024

james77777778 commented Nov 10, 2024

james77777778 commented Nov 10, 2024 • edited Loading

fchollet commented Nov 10, 2024

TheMGGdev commented Nov 12, 2024 • edited Loading

TheMGGdev commented Nov 12, 2024

james77777778 commented Nov 10, 2024 •

edited

Loading

TheMGGdev commented Nov 12, 2024 •

edited

Loading