Update packages for macOS #1203

Open
wants to merge 1 commit into
base: main

Conversation

DocGarbanzo
Contributor

Unfreeze matplotlib and add tensorflow-metal to the macOS install for Apple silicon.

@cfox570
Contributor

cfox570 commented Dec 16, 2024

I tried the install (main branch with your updated files). The install works, but training behaves strangely. Here is a fragment from the last epoch; note that the loss is astronomical. This is on a MacBook Air M1. Are you getting valid results?

178/178 [==============================] - ETA: 0s - loss: 119965049405247305285632.0000 - n_outputs0_loss: 80119361630094294515712.0000 - n_outputs1_loss: 39845651746355991805952.0000 
Epoch 6: val_loss did not improve from 107208061654925312.00000
178/178 [==============================] - 21s 117ms/step - loss: 119965049405247305285632.0000 - n_outputs0_loss: 80119361630094294515712.0000 - n_outputs1_loss: 39845651746355991805952.0000 - val_loss: 134251115941612859949056.0000 - val_n_outputs0_loss: 15944611674736914595840.0000 - val_n_outputs1_loss: 118306526784874082205696.0000
INFO:donkeycar.parts.keras:////////// Finished training in: 0:02:25.759302 //////////

I then uninstalled tensorflow-metal and, although training was slow, I got the expected results:

Epoch 76: val_loss did not improve from 0.04335
178/178 [==============================] - 50s 282ms/step - loss: 0.0500 - n_outputs0_loss: 0.0365 - n_outputs1_loss: 0.0135 - val_loss: 0.0438 - val_n_outputs0_loss: 0.0316 - val_n_outputs1_loss: 0.0122
Epoch 77/100
178/178 [==============================] - ETA: 0s - loss: 0.0483 - n_outputs0_loss: 0.0350 - n_outputs1_loss: 0.0133  
Epoch 77: val_loss did not improve from 0.04335
178/178 [==============================] - 75s 420ms/step - loss: 0.0483 - n_outputs0_loss: 0.0350 - n_outputs1_loss: 0.0133 - val_loss: 0.0446 - val_n_outputs0_loss: 0.0323 - val_n_outputs1_loss: 0.0123
INFO:donkeycar.parts.keras:////////// Finished training in: 1:24:04.925587 //////////

I have been trying to get this to work since version 5.0. The setup with TensorFlow 2.9, the matching Python 3.9 and tensorflow-metal 0.5.0 works. I have read that others have problems with the newer versions, and there have been no updates from Apple since September of last year.
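
For anyone comparing runs with and without tensorflow-metal, here is a minimal sketch that pulls the loss values out of a Keras progress line like the ones above and flags a run that has blown up. The helper names `parse_losses` and `looks_diverged` and the `1e6` threshold are assumptions for illustration only; they are not part of donkeycar or Keras.

```python
import math
import re

# Matches "<name>loss: <value>" pairs as printed in the Keras progress lines,
# e.g. "loss: 0.0483" or "val_n_outputs0_loss: 0.0323".
METRIC_RE = re.compile(
    r"(\w*loss): (nan|inf|[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)"
)


def parse_losses(line):
    """Return a dict of every '*loss' metric found in one log line."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def looks_diverged(losses, threshold=1e6):
    """True if any loss is NaN, infinite, or above the (assumed) threshold."""
    return any(not math.isfinite(v) or v > threshold for v in losses.values())


if __name__ == "__main__":
    # Shortened copies of the two kinds of log lines reported in this thread.
    bad = "loss: 119965049405247305285632.0000 - n_outputs0_loss: 80119361630094294515712.0000"
    good = "loss: 0.0483 - n_outputs0_loss: 0.0350 - val_loss: 0.0446"
    print(looks_diverged(parse_losses(bad)))
    print(looks_diverged(parse_losses(good)))
```

The threshold is deliberately crude: the healthy run reports losses around 0.05, so any cut-off far above that separates the two cases.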

@DocGarbanzo
Contributor Author

Thanks, @cfox570, for helping to test the code. Training works without any problems for me on my MacBook Pro M3. Can you please check which versions of tensorflow, tensorflow-metal and python are installed in your environment? I am seeing the following:

tensorflow                   2.15.1
tensorflow-metal             1.1.0
python                       3.11.9 

Note that when you check out the PR branch, you should see version v5.2.dev3 when you load donkeycar. Also, are you perhaps training with a transfer model? I could imagine that the saved model formats differ when using tensorflow-metal, in which case you might need to start from scratch.
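
One way to collect these versions from inside the active environment is sketched below. It uses only the standard library (`importlib.metadata`), so it runs even if TensorFlow fails to import. The function name `installed_versions` and the package list are assumptions for illustration; `tensorflow-macos` is included because older Apple silicon setups install TensorFlow under that distribution name.

```python
import platform
from importlib import metadata

# Distribution names as published on PyPI; adjust the list to your setup.
PACKAGES = ("tensorflow", "tensorflow-macos", "tensorflow-metal", "donkeycar")


def installed_versions(packages=PACKAGES):
    """Return {name: version} for Python and each distribution.

    Distributions that are not installed are reported as 'not installed'
    instead of raising, so the report is always complete.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<28} {version}")
```

Run it with the same interpreter that is used for training, so that the report reflects that environment rather than the system Python.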

@cfox570
Contributor

cfox570 commented Dec 29, 2024 via email

@DocGarbanzo
Contributor Author

I am using Ed's circuit launch data from the donkeycar_dataset repo.

@cfox570
Contributor

cfox570 commented Jan 15, 2025 via email
