Be faster and better #13

Open
tomcup opened this issue Jul 19, 2023 · 1 comment

tomcup commented Jul 19, 2023

Try not using timm; then it would be simple to port this to C/C++. Initialization is also very slow for me, and I don't know why. In the example it looks like the CPU is used; is there a particular reason not to use the GPU? GPUs are more commonly used for video processing.

I tried the demo on my 1650 GPU: --n=8 took 30 s and --n=32 took 2 min (not counting initialization time). That averages 3.75 s per frame, which is much slower than RIFE (VapourSynth-RIFE-ncnn-Vulkan: rife-v4.6, ensemble=True). With RIFE I interpolated a 2 h 1080p film from 24 to 60 fps; the whole process took about 14 h, an average of about 0.12 s per frame. Why is that?
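For reference, the per-frame figures above can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation: it assumes each demo run is charged for n frames (as the 3.75 s figure implies) and that the RIFE run is counted in output frames at 60 fps. The variable names are illustrative.

```python
# Demo timings reported above (initialization time excluded).
demo_n8 = 30 / 8            # seconds per frame for --n=8
demo_n32 = (2 * 60) / 32    # seconds per frame for --n=32

# RIFE run: 2 h film, output at 60 fps, about 14 h wall-clock time.
rife_output_frames = 2 * 3600 * 60
rife_per_frame = (14 * 3600) / rife_output_frames

print(f"demo --n=8 : {demo_n8:.2f} s/frame")
print(f"demo --n=32: {demo_n32:.2f} s/frame")
print(f"RIFE       : {rife_per_frame:.2f} s/frame")
print(f"slowdown   : about {demo_n32 / rife_per_frame:.0f}x")
```

Under these assumptions the demo is roughly 30 times slower per frame than the RIFE run.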

tomcup commented Jul 20, 2023

I used screenshots from an animated movie. I don't expect this model to work well on animation, but I wanted to see how it handles dark scenes and railings.

These are what I used:
mpv-shot0001
mpv-shot0002

But the GPU's performance surprised me:
image
I only ran python .\demo_Nx.py --n 3, and this is what the GPU does?

Then I tried this:

import cv2
import sys
import torch

import numpy as np
from imageio import mimsave

sys.path.append(".")
import config as cfg
from benchmark.utils.padder import InputPadder

from model import feature_extractor, flow_estimation

I0 = cv2.imread("example/mpv-shot0001.jpg")
I2 = cv2.imread("example/mpv-shot0002.jpg")

I0_ = (torch.tensor(I0.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)
I2_ = (torch.tensor(I2.transpose(2, 0, 1)).cuda() / 255.0).unsqueeze(0)

padder = InputPadder(I0_.shape, divisor=32)
I0_, I2_ = padder.pad(I0_, I2_)

backbonetype, multiscaletype = (feature_extractor, flow_estimation)
# backbonecfg, multiscalecfg = cfg.init_model_config(F=16, depth=[2, 2, 2, 2, 2])
backbonecfg, multiscalecfg = cfg.init_model_config(F=32, depth=[2, 2, 2, 4, 4])
net = flow_estimation(feature_extractor(**backbonecfg), **multiscalecfg)

def convert(param):
    return {
        k.replace("module.", ""): v
        for k, v in param.items()
        if "module." in k and "attn_mask" not in k and "HW" not in k
    }

net.load_state_dict(convert(torch.load("ckpt/ours_t.pkl")))
net.eval()
net.to(torch.device("cuda"))

imgs = torch.cat((I0_, I2_), 1)
pred = net(imgs)

mid = (
    padder.unpad(pred)[0]
    .detach()
    .cpu()
    .numpy()
    .transpose(1, 2, 0)
    * 255.0
).astype(np.uint8)
mimsave("example/out_2x.jpg", [mid[:, :, ::-1]])

The result is even funnier:

torch.cuda.OutOfMemoryError: CUDA out of memory. 
Tried to allocate 44.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. 
Of the allocated memory 9.82 GiB is allocated by PyTorch, and 186.06 MiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How can two ordinary 1080p movie screenshots run out of GPU memory?
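As a rough sanity check on the question above, here is a back-of-the-envelope estimate of tensor sizes at this resolution. It is a sketch under stated assumptions, not a measurement of the model: the screenshots are assumed to be 1920x1080, tensors are float32, and the 32-channel full-resolution feature map is a hypothetical stand-in suggested by F=32, not something confirmed from the model code.

```python
def padded(size, divisor=32):
    """Round size up to the next multiple of divisor (what padding to divisor=32 implies)."""
    return -(-size // divisor) * divisor

def tensor_mib(channels, height, width, bytes_per_elem=4):
    """Size of one float32 tensor of shape (1, channels, height, width) in MiB."""
    return channels * height * width * bytes_per_elem / 2**20

# Assumed screenshot size: 1920x1080.
h, w = padded(1080), padded(1920)

input_mib = tensor_mib(3, h, w)     # one padded RGB frame
feature_mib = tensor_mib(32, h, w)  # hypothetical 32-channel map at full resolution

print(f"padded size        : {h} x {w}")
print(f"one input frame    : {input_mib:.1f} MiB")
print(f"one 32-channel map : {feature_mib:.1f} MiB")
print(f"maps that fit 4 GiB: {int(4 * 1024 // feature_mib)}")
```

The inputs themselves are small, but only about 16 feature maps of that size would fit in 4 GiB before counting weights or anything else. The script above also runs the forward pass with autograd enabled, so PyTorch keeps intermediate activations for a possible backward pass. Wrapping the inference call in torch.no_grad() is the standard way to avoid that; whether that alone is enough on a 4 GiB card is untested here.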
