First of all, great job and a big thank you!
I've been considering writing a subsync-like tool for a long time and wrote some prototype code to validate the idea. Although this repo amazed me and I'm thrilled by the FFT-based alignment algorithm, I do want to share the approach I took when I implemented my code:
Most subtitles are out of sync because either the frame rate is wrong (e.g. a 25 fps subtitle for a 24 fps movie), or there is some offset at the beginning, or both. So most of them can be synced by applying a linear transformation to the timestamps. The problem is therefore much like a linear regression problem, and the vital point is to find corresponding points between the subtitles and the audio or a reference sub. So, similarly, I transform subs into long vectors with 1 where a sub is on and 0 where it is off. Inspired by feature detection algorithms in computer vision, I chose the SIFT (scale-invariant feature transform) algorithm and modified it so it can be applied in a lower dimension (computer vision is 2D and this is 1D). SIFT-1D returns a set of interest points (timestamps and their feature vectors) for each sub. After that I use the common methods to compare the distances between the two sets of feature vectors, match them as pairs, and then use RANSAC or another linear regression algorithm to calculate the linear transformation coefficients (scale and offset). The entire process takes several seconds at a resolution of 0.1 s. In most cases it works fine, but there are cases where you have to adjust the parameters for SIFT-1D or RANSAC or the result turns really ugly, and the result is often unstable (there is some randomness in RANSAC). Also, the speed is not optimized. I'm not sure whether the problems lie in the overall idea or in my code.
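A minimal sketch of the two ends of this pipeline, under stated assumptions: turning subtitle intervals into an on/off vector, and recovering scale and offset from already matched timestamp pairs with a simple RANSAC loop. The SIFT-1D detection and matching step is not shown. The names `rasterize` and `ransac_linear`, the tolerance and the iteration count are all hypothetical choices for illustration, not taken from the prototype or from this repo.

```python
import numpy as np

def rasterize(intervals, duration, step=0.1):
    """Turn (start, end) subtitle intervals in seconds into a 0/1 vector."""
    n = int(round(duration / step))
    v = np.zeros(n, dtype=np.uint8)
    for start, end in intervals:
        lo = max(0, int(round(start / step)))
        hi = min(n, int(round(end / step)))
        v[lo:hi] = 1  # 1 while a subtitle is shown, 0 otherwise
    return v

def ransac_linear(src, dst, n_iter=500, tol=0.2, seed=0):
    """Fit dst ~ scale * src + offset from matched timestamps with outliers.

    tol is the inlier threshold in seconds (an assumed value).
    A fixed seed keeps the result repeatable between runs.
    """
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Two points define a candidate line.
        i, j = rng.choice(len(src), size=2, replace=False)
        if src[i] == src[j]:
            continue
        scale = (dst[j] - dst[i]) / (src[j] - src[i])
        offset = dst[i] - scale * src[i]
        inliers = np.abs(scale * src + offset - dst) < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 2:
        raise ValueError("no consensus set found")
    # Refine with ordinary least squares on the consensus set only.
    scale, offset = np.polyfit(src[best], dst[best], 1)
    return scale, offset, best

if __name__ == "__main__":
    src = np.linspace(10.0, 5000.0, 40)
    dst = src * 25 / 24 + 1.5            # 25 fps vs 24 fps plus a 1.5 s offset
    dst[::4] += 20.0 + 7.0 * np.arange(10)  # simulate wrong matches
    print(ransac_linear(src, dst)[:2])
```

Fixing the random seed removes the run-to-run instability, though a bad seed or a too-tight tolerance can still pick a poor consensus set when most matches are wrong.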
When I came across your repo, I noticed that it only supports offsets, not scaling. I was hoping SIFT-1D might be a solution if properly reimplemented. However, I agree that
If you lower the split-penalty it can even correct the framerate difference because it automatically finds that splitting the movies in 3-4 (almost) equal parts with slightly different offsets optimizes the alignment rating.
as @kaegi mentioned in #10, so it may not be that necessary.
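A back-of-the-envelope sketch of that splitting idea, assuming the drift is purely linear and the movie is cut into equal parts that each get one constant offset. The helper names are hypothetical, and this is not how the repo computes its splits; it only shows how the leftover error depends on the scale mismatch and the number of parts.

```python
def piecewise_offsets(duration, scale, parts):
    """Constant offset per equal part: the linear drift at the part's midpoint."""
    length = duration / parts
    return [(scale - 1.0) * (k + 0.5) * length for k in range(parts)]

def max_residual(duration, scale, parts):
    """Largest remaining error (seconds) at the edges of each part."""
    return abs(scale - 1.0) * duration / parts / 2.0

if __name__ == "__main__":
    duration = 7200.0  # a 2-hour movie, in seconds (assumed)
    for scale in (25 / 24, 24 / 23.976):
        for parts in (1, 4, 16):
            print(scale, parts, max_residual(duration, scale, parts))
```

The residual halves each time the number of parts doubles, so a small mismatch needs only a few splits while a large one needs many more.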
Well, any comment is welcome ^~^