Obscure validation failure due to _use_cached_eval_dataset
#20177
@DLumi I suspect this is not an issue on Keras 3, actually. Keras 2 caches an attribute on `self`, so it totally makes sense that it might mess up there. See keras/keras/src/backend/tensorflow/trainer.py, lines 391 to 397 (commit d4a5116).

If you think we could apply the same approach to tf-keras, you are welcome to open up a PR there. Otherwise we will probably stick to this being fixed on Keras 3. I will close this for now, but if you can recreate the bug on Keras 3, please re-open! (Also, as to why this caching exists: yes, it's to avoid some of the overhead of creating the dataset iterator. But Keras 3 handles this much more elegantly than Keras 2.)
Uh, I'm pretty sure it's functionally exactly the same as in Keras 2, as I see little to no change in the actual code. Here's the Keras 2 code for comparison: Maybe I am missing something here, though? Anyway, it would greatly help if I knew how to recreate a first-working-then-failing `tf.data.Dataset` at toy scale.
Ah, my bad, I misread the code. Still, there is a key difference between Keras 2 and Keras 3 here. See this line: keras/keras/src/backend/tensorflow/trainer.py, line 262 (commit 4c71314).

In Keras 3, we always clear the cached dataset at the beginning of `fit`, which is not true in Keras 2. So I see how a crashing `fit` could cause an issue in Keras 2, but not in Keras 3.

As for a crashing dataset, maybe something like this:

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(tf.range(100))

@tf.py_function(Tout=tf.int32)
def crasher(x):
    if x > 50:
        raise ValueError
    return x

ds = ds.map(crasher)
for x in ds:
    print(x)  # iterates fine until x exceeds 50, then raises
```
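To make the contrast between the two caching behaviors concrete, here is a toy, framework-free sketch. `Keras2LikeTrainer` and `Keras3LikeTrainer` are made-up names for illustration; only the `_eval_data_handler` attribute name mirrors the real trainers, and the real logic is of course far more involved:

```python
class Keras2LikeTrainer:
    """Toy: caches the validation data handler and never invalidates it."""

    def __init__(self):
        self._eval_data_handler = None

    def fit(self, val_data):
        # Keras-2 style: reuse whatever was cached, even if the caller
        # passed a different validation dataset this time.
        if self._eval_data_handler is None:
            self._eval_data_handler = val_data
        return self.evaluate()

    def evaluate(self):
        return self._eval_data_handler


class Keras3LikeTrainer(Keras2LikeTrainer):
    def fit(self, val_data):
        # Keras-3 style: drop the cache at the start of every fit,
        # so the freshly supplied dataset is always picked up.
        self._eval_data_handler = None
        return super().fit(val_data)


old_ds, new_ds = ["old"], ["new"]

k2 = Keras2LikeTrainer()
k2.fit(old_ds)
print(k2.fit(new_ds))  # ['old'] -- the stale cached dataset wins

k3 = Keras3LikeTrainer()
k3.fit(old_ds)
print(k3.fit(new_ds))  # ['new'] -- the cache was cleared on entry
```

In this sketch a crash inside the Keras-2-style `fit` would leave the stale handler in place for every later call, which is the failure mode discussed above.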
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
I'll preface by saying that I encountered this issue with tf_keras == 2.15, but since the current source code around evaluation is hardly different from v2.15, I feel it's still applicable here.
The issue is that, no matter what, `fit` forces `evaluate` to use the stored dataset object for the validation step instead of whatever object you supply to `fit`. This is super obscure, but it's probably done for performance reasons, so whatever.

Why is this an issue? If you change something about your dataset mid-training (say, you initially forgot to turn on `.ignore_errors()`) and then pass the new dataset instance to `fit`, it completely ignores this fact. And in this particular case, it would fail if any errors arise in the dataset's preprocessing steps.

Yes, you can cure it with `model._eval_data_handler = None`, which in turn forces `evaluate` to cache the new object, but to figure this out you have to spend some time diving into the source code.

So what I propose is:
at the very least, a note about this behavior in `fit`'s documentation.

P.S. I'd provide a Colab link, but it turns out that making a `tf.data.Dataset` that randomly fails when I want it to is actually way harder than it seems.
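The `model._eval_data_handler = None` workaround discussed in this thread can be seen in miniature with a plain-Python stand-in. The `Trainer` class and the dataset placeholders below are hypothetical; only the `_eval_data_handler` attribute name is taken from tf-keras:

```python
class Trainer:
    """Toy stand-in for a Keras-2-style model that caches its val data."""

    def __init__(self):
        self._eval_data_handler = None

    def fit(self, val_data):
        # The cache is populated once and never invalidated, so a new
        # dataset passed to a later fit() call is silently ignored.
        if self._eval_data_handler is None:
            self._eval_data_handler = val_data
        return self._eval_data_handler


model = Trainer()
model.fit(["broken_ds"])        # first call caches the broken dataset
model.fit(["fixed_ds"])         # ignored: the cache still holds broken_ds

# Workaround: clear the private cache so the next fit() re-caches.
model._eval_data_handler = None
print(model.fit(["fixed_ds"]))  # ['fixed_ds'] -- the new dataset is used
```

The point of the sketch is only the caching shape: without the manual reset, the second `fit` call keeps validating against the stale object.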