Training codes #1
Thank you. The code in model.py was used for training, but I still need to update it for the current version of the models. Once it is updated, I will update this issue. I do not plan to release the full training code at this point. |
Are all the models (face detection, landmarking and gaze detection) based on mobilenet-v3? |
Yes. They are all (except for the optional, pretrained retinaface model) basically heatmap regression with a mobilenet-v3 backbone. The gaze tracking model works essentially like the landmark model, just for a single landmark. Face detection is a bit special: that model outputs a heatmap, a radius map and a max-pooled version of the heatmap, which is used for decoding the output. Because the landmarking is quite robust with respect to face size and orientation, the face detection model can get away with outputting only very rough bounding boxes. |
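The face detection decoding described above can be sketched in NumPy as peak extraction through max pooling. A cell counts as a detection when max pooling leaves it unchanged and its score exceeds `threshold`. `max_pool_3x3` stands in for the max-pooled output the model already provides.

This is a minimal sketch under assumptions the thread does not confirm:
- The heatmap and radius map share one grid.
- The radius is given in heatmap cells.
- One cell covers a hypothetical `stride` of 4 input pixels.

```python
import numpy as np


def max_pool_3x3(heatmap: np.ndarray) -> np.ndarray:
    """3x3 max pooling with stride 1; output has the same shape as the input."""
    h, w = heatmap.shape
    padded = np.pad(np.asarray(heatmap, dtype=np.float64), 1, constant_values=-np.inf)
    # Element-wise maximum over the nine shifted views of the padded map.
    return np.max([padded[dy:dy + h, dx:dx + w]
                   for dy in range(3) for dx in range(3)], axis=0)


def decode_faces(heatmap, radius, pooled, stride=4, threshold=0.5):
    """Return rough square boxes (x0, y0, x1, y1, score) in input-image pixels.

    Assumed layout (hypothetical): all maps share one HxW grid, `radius` is in
    heatmap cells, and one cell covers `stride` x `stride` input pixels.
    """
    # A peak is a cell that max pooling leaves unchanged and that is confident enough.
    peaks = (heatmap == pooled) & (heatmap > threshold)
    boxes = []
    for y, x in zip(*np.nonzero(peaks)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # cell centre in pixels
        r = float(radius[y, x]) * stride
        boxes.append((cx - r, cy - r, cx + r, cy + r, float(heatmap[y, x])))
    return boxes


if __name__ == "__main__":
    hm = np.zeros((8, 8), dtype=np.float32)
    hm[2, 3] = 0.9
    hm[2, 4] = 0.6  # neighbour of the peak; rejected because pooling raises it to 0.9
    rad = np.full((8, 8), 2.0, dtype=np.float32)
    print(decode_faces(hm, rad, max_pool_3x3(hm)))  # one box centred at (14, 10)
```

Since the landmark model tolerates loose boxes, a square box derived from a single radius is enough for this purpose.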
Great, thanks. I was wondering if it's possible to share the PyTorch pretrained weights as well. I'm trying to run the code in OpenCV's dnn module instead of onnxruntime, but the current ONNX model does not seem to be compatible with the dnn module. |
I have now updated the model definitions in model.py to match the currently used models. I have also uploaded the PyTorch weights here. My previous attempts at getting the models to work using OpenCV's dnn module weren't successful, but if you manage to get them to run, I would be very interested in hearing about it! |
Thanks for sharing the files. I was able to convert the models and run them in OpenCV's dnn module. The FPS is quite similar with onnxruntime and the dnn module. I will update you when the inference code is complete. |
Thank you for the update! I'm curious to know which format you converted the models to for use with the dnn module. |
I converted the PyTorch weights to ONNX and, using cv2.dnn.readNetFromONNX() with OpenCV 4.3, I could run inference (with no other changes to your original code). However, the outputs of the dnn module and onnxruntime differ slightly for the same preprocessed input. I have uploaded the converted weights here. |
Thank you, I will give it a try using your converted models. My first guess about the difference is that it might have something to do with the Upsample layers. The way I use them is apparently only fully supported with ONNX opset 11, which many inference engines do not seem to support yet. |
Yeah, the problem was caused by align_corners=True in the nn.Upsample layers. The dnn module does not seem to support it yet, so it has to be set to False for inference in the dnn module. I will try to find a fix. |
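The mismatch comes down to how bilinear upsampling maps output pixels back to input coordinates. The NumPy sketch below reproduces the two conventions PyTorch uses for linear interpolation:
- `align_corners=True` aligns the corner pixels exactly.
- `align_corners=False` aligns pixel centres and clamps at the edges.

It only illustrates why a backend that treats one setting as the other gives slightly different outputs. It is not OpenCV's or PyTorch's actual code. Bilinear upsampling applies the same 1-D mapping separately along each axis.

```python
import numpy as np


def source_coords(out_size: int, in_size: int, align_corners: bool) -> np.ndarray:
    """Fractional input coordinate for every output index (linear interpolation)."""
    i = np.arange(out_size, dtype=np.float64)
    if align_corners:
        # First and last pixels of input and output coincide exactly.
        if out_size == 1:
            return np.zeros(1)
        return i * (in_size - 1) / (out_size - 1)
    # Half-pixel convention: pixel centres are aligned; coordinates are clamped
    # to the valid range, which matches edge replication at the borders.
    return np.clip((i + 0.5) * in_size / out_size - 0.5, 0, in_size - 1)


def upsample_1d(x: np.ndarray, out_size: int, align_corners: bool) -> np.ndarray:
    """Linearly interpolate a 1-D signal to `out_size` samples."""
    src = source_coords(out_size, len(x), align_corners)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, len(x) - 1)
    frac = src - lo
    return x[lo] * (1 - frac) + x[hi] * frac


if __name__ == "__main__":
    x = np.array([0.0, 1.0])
    print(upsample_1d(x, 4, align_corners=True))   # values 0, 1/3, 2/3, 1
    print(upsample_1d(x, 4, align_corners=False))  # values 0, 0.25, 0.75, 1
```

Even on this two-sample input, the interior values differ by up to about 0.08. Small per-layer differences like this accumulate through a network, which is consistent with the slightly different outputs reported above.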
Finally solved, thanks for pointing out the problem. Keeping align_corners=True, exporting from PyTorch with ONNX opset 11 and rebuilding OpenCV master (4.3.0-dev), the dnn module now returns predictions similar to onnxruntime's. I will re-upload the weights. |
Nice, thank you for the updates! I tried using the models you previously posted with the dnn module, but got an error. I assume those already needed a more recent version than 4.2.0. |
Great! |
For pupil detection, the biggest challenge was finding training data with accurate annotations and variance in pose. Most datasets I looked at had a significant number of annotations that were noticeably off. MPIIGaze was the best I could find, but it still had many issues. That's why I ended up training almost exclusively on synthetic data generated with UnityEyes, but that has its own issues.

Another challenge was keeping the gaze model fast, so it could run in addition to the face landmark model without significantly impacting the frame rate for avatar animation. This led me to select a very small model that runs at a low resolution. To compensate for that, I did not train the model to adapt to different poses and instead aligned the eyes in a consistent way. This led to another issue: the eye corner points from the landmark model may not match the corner points (if any are given) in the gaze dataset, and most gaze datasets do not include full face images, so it is not possible to run the face landmark model first to align the eyes consistently. In the end, I calculated eye corner points and pupil centers from the JSON generated by UnityEyes and aligned the eyes with those. The pupil center was then used as a single landmark to be detected by the model. When I was working on this part, I wasn't aware of In the end, this alignment didn't quite match the one produced when aligning according to the eye corner points, so there is some number fudging in the tracker to get better results.

During training, I also augmented the data with rather strong blur, noise and color shifts to make up for its synthetic nature. In addition, I overlaid random bright rectangles to imitate reflections on glasses.

While working on it, I posted some intermediate results on Twitter. The white dot is the model's prediction, the black dot is the target. The big picture is the red channel with the black and white dots overlaid.
On the side, the first column shows the predicted landmark map, the two predicted offset layers and the adaptive wing loss mask. The second (rightmost) column shows the ground truth landmark map, the adaptive wing loss mask (repeated) and the two ground truth offset layers. Overall, considering the speed of the model, I think it works decently well, but any improvement would of course be welcome! You can find the UnityEyes preprocessing script here. |
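The eye alignment described above can be sketched as a similarity transform: map the two eye corners to fixed positions, then use the pupil centre as the single target landmark. This is a minimal NumPy version under assumed parameters. `crop_size` and `margin` are hypothetical, and the author's actual alignment, including the corrections applied in the tracker, may differ.

The resulting 2x3 matrix has the layout OpenCV's `cv2.warpAffine` expects, so it could be used to warp the eye crop. `transform_points` moves the pupil centre into crop coordinates.

```python
import numpy as np


def eye_alignment_matrix(corner_a, corner_b, crop_size=32, margin=0.2):
    """2x3 similarity transform (rotation, uniform scale, translation) mapping
    two eye corners onto a horizontal line across a square crop.

    `crop_size` and `margin` are hypothetical values, not the tracker's.
    """
    src_a = np.asarray(corner_a, dtype=np.float64)
    src_b = np.asarray(corner_b, dtype=np.float64)
    dst_a = np.array([margin * crop_size, crop_size / 2.0])
    dst_b = np.array([(1.0 - margin) * crop_size, crop_size / 2.0])
    src_vec, dst_vec = src_b - src_a, dst_b - dst_a
    # Scale and rotation that turn the corner-to-corner vector into the target one.
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    rot = scale * np.array([[np.cos(angle), -np.sin(angle)],
                            [np.sin(angle), np.cos(angle)]])
    shift = dst_a - rot @ src_a
    return np.hstack([rot, shift[:, None]])


def transform_points(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.atleast_2d(np.asarray(points, dtype=np.float64))
    return pts @ matrix[:, :2].T + matrix[:, 2]


if __name__ == "__main__":
    m = eye_alignment_matrix((10, 20), (30, 40))
    # The pupil centre becomes the training target in crop coordinates.
    print(transform_points(m, (20, 30)))  # midpoint of the corners -> crop centre
```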
No doubt about its decent performance. I only found poor performance in some challenging cases, such as strong reflections on glasses and sunny outdoor environments, which is mainly due to the limited training set and may not be a concern for your project. A first improvement could be a post-processing stabilization scheme for the pupils to reduce their jitter when glasses are worn, without changing the model. After I finish this part, I can let you know whether stabilization makes any improvement. And thanks for the detailed explanations. I set up the training and got comparable results. |
It's good to hear that you could get comparable results. Another idea I had, but haven't tried yet, is to train a bigger, slower model that would hopefully give more reliable results, use it to annotate a more diverse training set, and then train another, smaller model on that. About stabilizing the pupils: I do a lot of filtering and stabilizing in the code I use to actually animate avatars. |
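The thread does not say which filtering the avatar animation code uses, so the following is only one common option, not the author's method: a One Euro filter (Casiez et al., 2012). It smooths heavily when the signal moves slowly and less when it moves fast. That reduces pupil jitter without adding much lag on quick eye movements.

Run one instance per pupil coordinate. `min_cutoff`, `beta` and `d_cutoff` are tuning knobs whose good values depend on frame rate and noise level.

```python
import math


def smoothing_factor(cutoff: float, dt: float) -> float:
    """Exponential smoothing factor of a first-order low-pass filter."""
    tau = 1.0 / (2.0 * math.pi * cutoff)
    return 1.0 / (1.0 + tau / dt)


class OneEuroFilter:
    """One Euro filter for a single scalar signal (e.g. one pupil coordinate)."""

    def __init__(self, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # Hz; lower means smoother when still
        self.beta = beta              # how fast the cutoff rises with speed
        self.d_cutoff = d_cutoff      # Hz; cutoff for the speed estimate
        self.prev_raw = None
        self.prev_out = None
        self.prev_dx = 0.0

    def __call__(self, x: float, dt: float) -> float:
        if self.prev_out is None:
            self.prev_raw = self.prev_out = x
            return x
        # Speed of the raw signal, low-pass filtered to suppress noise.
        dx = (x - self.prev_raw) / dt
        a_d = smoothing_factor(self.d_cutoff, dt)
        self.prev_dx = a_d * dx + (1.0 - a_d) * self.prev_dx
        # Faster motion -> higher cutoff -> less smoothing and less lag.
        cutoff = self.min_cutoff + self.beta * abs(self.prev_dx)
        a = smoothing_factor(cutoff, dt)
        self.prev_out = a * x + (1.0 - a) * self.prev_out
        self.prev_raw = x
        return self.prev_out
```

For example, create `fx, fy = OneEuroFilter(), OneEuroFilter()` for one pupil and call `fx(x, dt)` and `fy(y, dt)` once per frame, with `dt` as the frame time in seconds.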
Actually, I tried training a stacked hourglass (HG) network with 2 stacks as the pupil detector on the UnityEyes set (with lots of augmentation), but it didn't improve much on my test set. I'm now training with 4 stacks. Oh, I wasn't aware of that stabilization part, thanks. |
That's very interesting! I'm curious to hear about your further results. |
Since I already posted the previous PyTorch weights, here are the weights for the new 56x56, 30-point model. |
Thank you for sharing the newly trained weights. Interestingly, in OpenCV DNN the inference time of the new model is higher than that of the previous lightest model (6.5 ms vs. 5.5 ms). In onnxruntime, however, the inference time dropped from 5.5 ms to 1.7 ms. I'm trying to figure out why the DNN module behaves like that! |
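When comparing per-frame inference times like these, warm-up runs and a median over many calls make the numbers more stable. The first calls of either runtime can include one-off setup costs. Using the same number of threads in both runtimes also helps keep the comparison fair.

Below is a small, runtime-agnostic timing helper. It is a hypothetical utility, not part of either library; `run` is any zero-argument callable that performs one inference.

```python
import statistics
import time
from typing import Callable


def benchmark(run: Callable[[], object], warmup: int = 10, iterations: int = 100) -> float:
    """Median wall-clock time of one `run()` call, in milliseconds."""
    for _ in range(warmup):
        run()  # excluded from timing: first calls may include one-off setup work
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical usage (assumes `net` is a cv2.dnn.Net, `session` an
# onnxruntime.InferenceSession and `blob` the same preprocessed NCHW input):
#   net.setInput(blob)
#   print("dnn:", benchmark(net.forward))
#   name = session.get_inputs()[0].name
#   print("ort:", benchmark(lambda: session.run(None, {name: blob})))
```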
That's an interesting difference. The new model is pretty much the full size model going by layer and channel count, but the resolution of the channels is lower. Maybe that has something to do with OpenCV DNN behaving differently. |
One note regarding lines 718 to 747 in e805aa2:
|
Your landmark model is very robust in most cases, such as large poses and exaggerated expressions. I have trained my model on 300WLP, but it often fails. Could you share how you processed the data, e.g. data augmentation or training tricks? |
I merged multiple datasets, partially re-annotating some features with FAN and older versions of the same model, fixed some eye point annotations in various ways, and filtered out samples where different annotations disagreed by more than some threshold. I also used very strong augmentation with noise, blur, downscaling, rectangle overlays, strong rotation and random margins at the sides of faces. You can look at the sample images in the results part of the readme to see what the training data looks like. |
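The photometric part of the augmentation list above can be sketched with NumPy alone. It covers noise, blur, downscaling, bright rectangle overlays and the color shifts mentioned for the gaze model.

Assumptions and omissions:
- All probabilities and ranges are hypothetical.
- A box blur stands in for a proper Gaussian blur.
- Strong rotation and random margins are left out, because they would also need to transform the landmark coordinates.

Nothing here moves pixels geometrically, so the landmark targets stay valid.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int) -> np.ndarray:
    """Separable k x k box blur for an HxWxC image; k must be odd."""
    if k <= 1:
        return img
    kernel = np.ones(k) / k
    p = k // 2
    out = np.pad(img, ((p, p), (p, p), (0, 0)), mode="edge")

    def blur(v):
        return np.convolve(v, kernel, mode="valid")

    out = np.apply_along_axis(blur, 0, out)  # vertical pass
    return np.apply_along_axis(blur, 1, out)  # horizontal pass


def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Photometric augmentation of a float HxWx3 image in [0, 1].

    All ranges are hypothetical; landmarks are unaffected because nothing moves.
    """
    h, w, _ = img.shape
    # Color shift: random gain and offset per channel.
    out = img * rng.uniform(0.7, 1.3, size=3) + rng.uniform(-0.1, 0.1, size=3)
    # Blur with a random odd kernel size (1 means no blur).
    out = box_blur(out, int(rng.choice([1, 3, 5])))
    # Downscale and upscale again (nearest neighbour) to remove fine detail.
    f = int(rng.choice([1, 2, 3]))
    out = np.repeat(np.repeat(out[::f, ::f], f, axis=0), f, axis=1)[:h, :w]
    # Bright, semi-transparent rectangles imitating reflections on glasses.
    for _ in range(int(rng.integers(0, 3))):
        y0, x0 = int(rng.integers(0, h)), int(rng.integers(0, w))
        y1 = y0 + int(rng.integers(1, h // 4 + 2))
        x1 = x0 + int(rng.integers(1, w // 4 + 2))
        alpha = rng.uniform(0.3, 0.8)
        out[y0:y1, x0:x1] = (1.0 - alpha) * out[y0:y1, x0:x1] + alpha
    # Additive Gaussian noise with a random strength.
    out = out + rng.normal(0.0, rng.uniform(0.0, 0.08), size=out.shape)
    return np.clip(out, 0.0, 1.0).astype(np.float32)
```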
What do you think about regression-based versus heatmap-based methods? I use a regression-based method and added strong data augmentation as you mentioned, but when the face box is poor, the results get very bad. When I tried your heatmap-based model, the results were very stable even when the box was odd, for example much larger than the actual face. Does the robustness come from the model structure or something else? |
I can imagine that heatmap-based methods lend themselves more to robustness, but I can't give a theoretical reason why. In this case, I think it is a combination of model structure and augmentation. I don't remember the mouth points causing me issues, as they at least have the same number of points. I deleted the center eye points in WFLW, but that changes the shape of the eye. You can do two-step training: first train on a bigger dataset and, once it has converged, train some more on an adjusted WFLW to take advantage of its higher-quality annotations. |
|
|
Thank you for explaining to me.
I really appreciate it. @emilianavt
|
Please refer to the landmark
You can create the maps from the coordinates, but the dataset I used for training is heavily customized.
No.
There isn't. The numbers are mainly for weighting different landmarks.
Please carefully review the code to understand how everything works. It's not a completely trivial change. |
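Creating the maps from coordinates might look like the sketch below, as might reading a landmark back from a predicted landmark map plus its two offset layers. Three details are assumptions: the Gaussian `sigma`, the map `size`, and the convention that each offset cell stores the sub-cell distance to the true point. The thread notes that the actual training data is heavily customized.

Positions are in heatmap-cell units. Rescale them to image pixels afterwards, e.g. by multiplying by input size / map size.

```python
import numpy as np


def encode_landmark(x: float, y: float, size: int, sigma: float = 1.5):
    """Gaussian landmark map plus x/y offset maps for one point given in
    heatmap-cell coordinates (cell row i, column j sits at (j, i))."""
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float32)
    heatmap = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    # Each cell stores the sub-cell distance from itself to the true point.
    return heatmap, x - xs, y - ys


def decode_landmark(heatmap, offset_x, offset_y):
    """Take the most confident cell and refine it with its predicted offsets."""
    i, j = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(j + offset_x[i, j]), float(i + offset_y[i, j]), float(heatmap[i, j])


if __name__ == "__main__":
    maps = encode_landmark(10.3, 20.7, size=32)
    print(decode_landmark(*maps))  # approximately (10.3, 20.7) plus a confidence
```

The heatmap alone only locates the landmark to the nearest cell. At low map resolutions like these, the offset layers recover the sub-cell position.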
The geffnet commit I'm on is: c450c12ae6ffb1757f62dde3c2765da3c10f6def |
Great work. Is the code in model.py used for training the ONNX inference models? Any chance of releasing the training code?