Thanks for your excellent work. I have a question about extracting the 2D appearance features. When I use ResNet-152 as the backbone, the output of layer4 (before avg_pooling) has shape [frame, 2048, 7, 7], where frame is the number of frames in a clip. After stacking the clips, I get [T, len, 2048, 7, 7], where T is the number of clips and len is the clip length.
Could you share how you process the ResNet-152 output to obtain the appearance feature described in the paper, which has dimension T × d?
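To make the shape question concrete, here is a minimal sketch of one common way to reduce [T, len, 2048, 7, 7] to T × d. It is only an assumption on my side and not confirmed by the authors: it applies global average pooling over the 7×7 grid and then averages the frames within each clip. The sizes `T` and `clip_len` are hypothetical, and a random tensor stands in for the real layer4 output.

```python
import torch

# Hypothetical sizes: T clips, each with clip_len frames.
T, clip_len = 4, 8

# Random stand-in for the stacked ResNet-152 layer4 outputs: [T, clip_len, 2048, 7, 7].
layer4_out = torch.randn(T, clip_len, 2048, 7, 7)

# Global average pooling over the 7x7 spatial grid -> [T, clip_len, 2048].
frame_feats = layer4_out.mean(dim=(-2, -1))

# Assumption: average the frames within each clip -> [T, 2048], i.e. T x d with d = 2048.
clip_feats = frame_feats.mean(dim=1)

print(clip_feats.shape)
```

If the paper instead keeps one frame per clip or pools differently, only the last step would change; the spatial pooling step is the same as what the network's own avg_pooling layer does.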
Thanks very much.