
Add Video Swin Transformer #2369

Merged 94 commits into keras-team:master on Apr 5, 2024
Conversation

@innat (Contributor) commented Mar 1, 2024

What does this PR do?

Fixes #2262

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you write any new necessary tests?
  • If this adds a new model, can you run a few training steps on TPU in Colab to ensure that no XLA-incompatible ops are used?

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

@tirthasheshpatel (Contributor) left a comment:

Amazing work here 🎉 Thanks @innat! I still need to test presets.

The model looks very good overall, just some nits about exporting some layers and models.
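
("Exporting" here presumably refers to registering symbols in the public API. A minimal sketch of the usual KerasCV pattern, with a hypothetical layer name standing in for the real ones:)

import keras
from keras_cv.api_export import keras_cv_export

@keras_cv_export("keras_cv.layers.VideoSwinPatchMerging")  # hypothetical symbol path
class VideoSwinPatchMerging(keras.layers.Layer):
    ...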

Eight resolved review threads (now outdated) on keras_cv/layers/video_swin_layers.py, plus one on keras_cv/layers/__init__.py.

On this snippet:

return {
    "videoswin_base_kinetics400": copy.deepcopy(
        backbone_presets["videoswin_base_kinetics400"]
    ),

@innat (Contributor, Author) replied:

The base backbone model has more than one checkpoint:

  1. with kinetics-400-base (current)
  2. with kinetics-400-base-imagenet22k
  3. with kinetics-600-base-imagenet22k
  4. with something-something-v2

How should the presets method accommodate all of these?

def presets(cls):
    """Dictionary of preset names and configurations."""
    return {
        "videoswin_base_kinetics400": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400"]
        ),
        "videoswin_base_kinetics400_imagenet22k": copy.deepcopy(
            backbone_presets["videoswin_base_kinetics400_imagenet22k"]
        ),
        ...
    }
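
One way to avoid enumerating every checkpoint by hand (a minimal sketch; it assumes backbone_presets already contains configurations for all of the checkpoints listed above):

import copy

def presets(cls):
    """Dictionary of preset names and configurations."""
    # Deep-copy each registered preset so callers cannot mutate the registry.
    return {
        name: copy.deepcopy(config)
        for name, config in backbone_presets.items()
    }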

@innat (Contributor, Author) commented Mar 31, 2024

Summarizing the weight checks.

  • Backbones (tolerance 1e-4)
  • Classifier (tolerance 1e-5)
  • notebook-1 for kinetics-400 (tiny, small, base, base-imagenet22k)
  • notebook-2 for kinetics-600 (base-imagenet22k), something-something-v2

@tirthasheshpatel @divyashreepathihalli
Could you please verify the weights used in the above notebooks? I will remove these notebooks from the Kaggle workspace afterward.

Note: in notebook-1, the torchvision library is used to load the Video Swin API and the PyTorch weights it provides, whereas in notebook-2, the raw official code and weights are loaded.
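
For reference, a minimal sketch of the kind of parity check run in those notebooks (the torchvision constructor and weights enum are real; keras_model stands in for the ported KerasCV classifier and is an assumption):

import numpy as np
import torch
from torchvision.models.video import swin3d_t, Swin3D_T_Weights

# Reference model: torchvision's Video Swin tiny with Kinetics-400 weights.
torch_model = swin3d_t(weights=Swin3D_T_Weights.KINETICS400_V1).eval()

video = np.random.rand(1, 32, 224, 224, 3).astype("float32")
with torch.no_grad():
    # torchvision expects channels-first (B, C, T, H, W) input.
    torch_logits = torch_model(torch.from_numpy(video).permute(0, 4, 1, 2, 3))

# `keras_model` is the ported classifier loaded with converted weights (assumed in scope).
keras_logits = keras_model.predict(video)
np.testing.assert_allclose(torch_logits.numpy(), keras_logits, atol=1e-5)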

@innat (Contributor, Author) commented Apr 2, 2024

ONNX

I noticed that others have also tried to export this model to ONNX format but failed, and reported it to the official repo (tickets). So I tried with this implementation using the torch backend, and it works as expected.

import torch

# `backbone` and `num_classes` are assumed to be defined earlier.
model = VideoClassifier(
    backbone=backbone,
    num_classes=num_classes,
    activation=None,
    pooling='avg',
)
model.eval()
batch_size = 1

# Input to the model: (batch, frames, height, width, channels)
x = torch.randn(batch_size, 32, 224, 224, 3, requires_grad=True)
torch_out = model(x)

Following the official torch export guideline:

torch.onnx.export(
    model,                     # model being run
    x,                         # model input (or a tuple for multiple inputs)
    "vswin.onnx",              # where to save the exported model
    export_params=True,        # store the trained weights inside the model file
    opset_version=10,          # the ONNX opset version to export to
    do_constant_folding=True,  # execute constant folding for optimization
    input_names=['input'],     # the model's input names
    output_names=['output'],   # the model's output names
    dynamic_axes={
        'input': {0: 'batch_size'},   # variable-length batch axis
        'output': {0: 'batch_size'}
    }
)

import onnx
import onnxruntime

def to_numpy(tensor):
    # Detach from the graph (if needed), move to CPU, and convert to NumPy.
    if tensor.requires_grad:
        tensor = tensor.detach()
    return tensor.cpu().numpy()

onnx_model = onnx.load("vswin.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession(
    "vswin.onnx", providers=["CPUExecutionProvider"]
)

# Compute ONNX Runtime output prediction.
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

Logit check:

import numpy as np

np.testing.assert_allclose(
    to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05
)
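
Since dynamic_axes marks the batch dimension as variable, the exported model should also accept other batch sizes; a quick sanity check, reusing ort_session and to_numpy from above:

x2 = torch.randn(2, 32, 224, 224, 3)
ort_outs2 = ort_session.run(
    None, {ort_session.get_inputs()[0].name: to_numpy(x2)}
)
assert ort_outs2[0].shape[0] == 2  # batch axis follows the input batch size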

@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label on Apr 2, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Apr 2, 2024

@divyashreepathihalli (Collaborator) left a comment:

Let's move the video_swin layers into the model folder itself. Everything else LGTM!

@innat (Contributor, Author) commented Apr 2, 2024

> Let's move the video_swin layers into the model folder itself. Everything else LGTM!

Sorry, could you please elaborate?
Do you want this file relocated to here? If so, wouldn't that be an anti-pattern relative to the current standard? I mean, aren't all of the layers supposed to be in this directory?

@divyashreepathihalli (Collaborator) commented Apr 2, 2024

> Let's move the video_swin layers into the model folder itself. Everything else LGTM!
>
> Sorry, could you please elaborate? Do you want this file relocated to here? If so, wouldn't that be an anti-pattern relative to the current standard? I mean, aren't all of the layers supposed to be in this directory?

Nope! All model-specific layers should be inside the model folder; only generic layers go under the layers folder.
The linked move locations are correct.
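
For illustration, the resulting layout would look roughly like this (hypothetical file and folder names; the actual paths are in the links above):

keras_cv/
    layers/                      # generic, reusable layers only
    models/
        video_swin/              # model-specific code
            video_swin_layers.py
            video_swin_backbone.py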

@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label on Apr 3, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Apr 3, 2024
@innat (Contributor, Author) commented Apr 3, 2024

I think the test is failing for an unrelated issue.

@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label on Apr 4, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on Apr 4, 2024
@divyashreepathihalli merged commit bfeba12 into keras-team:master on Apr 5, 2024
10 checks passed
@divyashreepathihalli (Collaborator) commented:

Thank you for this awesome contribution!!!
