Apply best seeking strategy for MP3 #9408
Comments
@christosts Any update on this issue? Timeline? |
There's no mechanism in the player at the moment to inform the app what seeking strategy to use, or an API to inform apps of the seeking capabilities of a file. You could proactively enable both options (index-based and constant bit-rate seeking) in the DefaultExtractorsFactory. However, for MP3 files, if both options are enabled, then index-based seeking will be applied, with the caveats described here. We can try to improve MP3 parsing when both seeking options are enabled to pick the "best" one depending on whether the file is CBR or VBR, but at the moment I'm not sure we can reliably detect that early without actually parsing the entire file. I will mark this issue as an enhancement, but it's probably going to be low priority. |
Thanks, @christosts. I'm glad there is an idea for a better long-term solution, although it would be good to also have a cheap short-term workaround. Looking on StackOverflow, there is an answer suggesting how an app could monitor the bitrate: https://stackoverflow.com/a/32135909/13949389 But to make use of this information, we would need to be able to lazily/dynamically set either index-based or constant bit-rate seeking after the audio has already started loading. Would that be feasible in the short term, or is it already possible? |
If there's no better way to determine whether an MP3 file is CBR or VBR, other than to scan through every frame, then I think that proactively enabling both options as @christosts suggested is the best that can be done here. The only other suggestion I can think of is that perhaps index seeking could heuristically detect "probably constant bitrate" after it's scanned some portion of the file and established that the scanned part is constant bitrate. That would not guarantee correct seeking though, since it's basically guessing that the rest of the file will be the same. |
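A minimal sketch of the "probably constant bitrate" heuristic described above, written in Python for brevity rather than as ExoPlayer code. It assumes MPEG-1 Layer III frames only, that scanning starts on a frame boundary (no ID3 tag handling), and that free-format streams are out of scope. The names `parse_frame_header`, `looks_constant_bitrate` and `min_frames` are hypothetical.

```python
# Bitrate and sample-rate tables for MPEG-1 Layer III (index taken from the header).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_frame_header(data, pos):
    """Return (bitrate_kbps, frame_length_bytes), or None if there is no
    valid MPEG-1 Layer III header at `pos`."""
    if pos + 4 > len(data):
        return None
    b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
    # 11 sync bits, version bits 11 (MPEG-1), layer bits 01 (Layer III);
    # the protection bit may be either value.
    if b0 != 0xFF or (b1 & 0xFE) != 0xFA:
        return None
    bitrate = BITRATES_KBPS[b2 >> 4]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x3]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x1
    frame_length = 144 * bitrate * 1000 // sample_rate + padding
    return bitrate, frame_length


def looks_constant_bitrate(data, min_frames=100, start=0):
    """Guess whether the stream is CBR from its first `min_frames` frames.

    Returns True if all scanned frames share one bitrate, False as soon as
    a different bitrate is seen, and None if fewer than `min_frames` frames
    could be parsed (not enough data, or sync was lost).
    """
    pos = start
    first_bitrate = None
    for _ in range(min_frames):
        parsed = parse_frame_header(data, pos)
        if parsed is None:
            return None
        bitrate, frame_length = parsed
        if first_bitrate is None:
            first_bitrate = bitrate
        elif bitrate != first_bitrate:
            return False
        pos += frame_length
    return True
```

A `True` result says nothing about the unscanned remainder of the file, so it remains a guess, exactly as the comment above points out.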
I want to make sure I understand this: does that mean enabling both for an MP3 file is equivalent to enabling index-based seeking alone, because index-based seeking will be applied regardless of whether the other option is enabled? Or is it instead the case that if you seek beyond the point up to which the index-based seek table has so far been generated, it will fall back to CBR seeking? And would the timestamp of the current playback position be auto-corrected when the more accurate index-based seek table eventually catches up to that timestamp?
As for what ExoPlayer could ideally do to make things easier for the developer, I do like the way seek works in Apple's AV Foundation framework, because the seek accuracy is a parameter of the seek operation itself. Sure, it may still make sense to have an API to start building the accurate seek table in advance using index-based seeking if an app knows it wants that, but I think ExoPlayer could in theory also build up a similar seek table lazily and more efficiently, on demand, making use of all the information available to it. For example, if an MP3 does provide a low-res seek table, that could be used as a lattice to which more accurate fragments of the seek table are attached as they are built on demand. And in the absence of that lattice, I think ExoPlayer could still try to automatically build up an accurate seek table in fragments/islands based on which parts of the audio get loaded, and those islands could connect over time, snapping into alignment and improving accuracy.
I don't know exactly what other audio player libraries do, but there are clearly some clever things that they are doing to make the seek experience more generally accurate for MP3 files.
Regarding the original use case, if we take a look at podcast players, they take arbitrary URLs outside the control of the app developer, and the vast majority of these are MP3 files. 
And one of the newer developments in podcast apps is that podcast episodes now come with chapter timestamps, allowing players to jump to specific points where the host introduces particular topics. For this type of feature, I think it would be especially helpful to have some sort of heuristic that does a decent job for long audio files, so if there is a way to guess whether an audio file is "probably" CBR, that would certainly be a useful thing to detect. However, as I mentioned above, the only way to figure out that an audio file is probably CBR is to wait until some of the audio has started loading, so any ExoPlayer API that would allow me to act on this shouldn't simply be a parameter on the factory that creates the extractor, as that's too early. Having this option as a parameter of the seek method itself would be more useful. |
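Where a file does carry a low-res seek table, one common form is the 100-entry table of contents in a Xing/Info header. Below is a rough Python sketch of how such a table maps a time to a byte position. It assumes the table has already been parsed into a list of 100 integers in the range 0..255, each giving the fraction of the audio data (in 1/256ths) reached at that percentage of the duration. The function and parameter names are hypothetical, and this is not ExoPlayer's code.

```python
def xing_seek_position(toc, target_seconds, duration_seconds, data_size_bytes):
    """Approximate byte offset (relative to the start of the audio data)
    for `target_seconds`, using a 100-entry Xing-style table of contents."""
    # Express the target as a percentage of the duration, clamped to 0..100.
    percent = max(0.0, min(100.0, 100.0 * target_seconds / duration_seconds))
    i = min(int(percent), 99)
    lower = toc[i]
    # The entry after the last one is implicitly the end of the data (256/256).
    upper = toc[i + 1] if i < 99 else 256
    # Linear interpolation between the two neighbouring table entries.
    scaled = lower + (upper - lower) * (percent - i)
    return int(scaled / 256.0 * data_size_bytes)
```

Because the table only has 100 entries and the position between entries is interpolated, the result is approximate, which is why such a table would serve as a lattice rather than as an exact index.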
Yes, if both options are enabled, then for MP3 files index-seeking will be used entirely and the CBR option will be ignored. Thank you for the input on seeking and audio files, that's very useful and we will take it into consideration when prioritizing items. I removed the low-priority label, but this enhancement is not scheduled yet. I'll update this issue if/when we find time to work on this. |
Just to clarify, this is not possible unless the MP3 file provides some kind of low-res seek table / lattice. The whole problem with MP3 files is that frames don't contain absolute timestamps. So if you start loading an MP3 in the middle, you have no way of determining the absolute timestamp you're loading from. Hence you don't know the starting timestamp of the fragment/island. The only two ways you can know the accurate starting timestamp (and hence the accurate timestamp of any position within the fragment/island) are:
In all other cases you have to start making assumptions that may not be true (e.g., that the bitrate is constant in part of the file you haven't loaded yet).
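To make the cost of the constant-bitrate assumption concrete, here is a small Python sketch of the arithmetic. It is illustrative only: `cbr_seek_position` and `actual_time_at_byte` are hypothetical helpers, and the file is modelled as a list of `(duration_seconds, bitrate_bps)` segments rather than real MP3 frames.

```python
def cbr_seek_position(target_seconds, bitrate_bps, first_frame_offset=0):
    """Byte offset for `target_seconds`, assuming the whole file uses `bitrate_bps`."""
    return first_frame_offset + int(target_seconds * bitrate_bps / 8)


def actual_time_at_byte(segments, byte_offset):
    """True playback time at `byte_offset` for a file described as
    consecutive (duration_seconds, bitrate_bps) segments."""
    elapsed = 0.0
    remaining = byte_offset
    for duration_s, bitrate_bps in segments:
        segment_bytes = duration_s * bitrate_bps / 8
        if remaining <= segment_bytes:
            return elapsed + remaining * 8 / bitrate_bps
        remaining -= segment_bytes
        elapsed += duration_s
    return elapsed  # offset is past the end of the described data


# A file whose first minute is 64 kbps and whose second minute is 128 kbps.
vbr_file = [(60, 64_000), (60, 128_000)]
# Seeking to 60 s while assuming 128 kbps throughout lands at byte 960000...
position = cbr_seek_position(60, 128_000)
# ...which is really 90 s into the audio, i.e. 30 s off.
landed_at = actual_time_at_byte(vbr_file, position)
```

For a file that really is 128 kbps throughout, the same calculation lands exactly on 60 s, which is why the assumption is attractive when it holds.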
I'm not that convinced by this statement, although I would happily be proven wrong! If I had to guess, I suspect they're just doing index-based seeking, because that's fundamentally the only way to accurately seek into an MP3 file that doesn't contain something like a Xing header to assist with seeking. I suspect the difference is just that other players may be a lot more aggressive about buffering and indexing the entire file at the start of playback, so that the index is already built by the time the user seeks. It seems a lot more plausible to me that players would do this, rather than use complicated heuristic-based schemes that fundamentally cannot work accurately in all cases. |
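For reference, a minimal Python sketch of what index-based seeking amounts to: walk the frames from the start, record each frame's time and byte offset, then binary-search that index when seeking. It assumes MPEG-1 Layer III (1152 samples per frame) and takes the frame lengths as an already-parsed list. `build_index` and `seek` are hypothetical names, not ExoPlayer's API.

```python
import bisect

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def build_index(frame_lengths, sample_rate_hz, first_frame_offset=0):
    """Return parallel lists (times_seconds, byte_offsets), one entry per frame."""
    times, offsets = [], []
    offset = first_frame_offset
    for i, length in enumerate(frame_lengths):
        times.append(i * SAMPLES_PER_FRAME / sample_rate_hz)
        offsets.append(offset)
        offset += length
    return times, offsets


def seek(times, offsets, target_seconds):
    """Return (time, byte_offset) of the last indexed frame starting at or
    before `target_seconds`. Only frames already indexed can be reached."""
    if not times:
        raise ValueError("index is empty")
    i = max(bisect.bisect_right(times, target_seconds) - 1, 0)
    return times[i], offsets[i]
```

The index is exact regardless of bitrate changes, but it only covers the part of the file that has been read so far, which is why indexing the whole file early makes seeking both accurate and immediate.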
That is why I am suggesting that you could represent these islands with relative timestamps with an assumed/approximate starting timestamp, and at the moment islands become connected, they could snap into alignment with each other, i.e. those initial approximate assumptions could be corrected. So for example:
island-1 will have the correct initial offset (i.e. zero). If the user seeks to island-2, you could build a fragment of the seek table there with a best-guess approximation of its starting timestamp. Once island-1 grows and connects with island-2, then island-2 can snap to its correct offset. |
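The snapping step could be sketched roughly as follows in Python. This is only an illustration of the idea in this comment, under the assumption that two islands connect when one ends exactly on the byte where the next begins (i.e. their frame boundaries line up). `Island` and `snap` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Island:
    start_byte: int
    end_byte: int      # exclusive; grows as more frames are scanned
    start_time: float  # seconds; exact for the island at byte 0, else a guess
    duration: float    # seconds of audio scanned within the island (exact)
    exact: bool        # True if start_time is known rather than estimated


def snap(islands):
    """Merge islands that touch. A merged island keeps the start time of its
    left-hand part, so an estimated right-hand island inherits the timing
    (exact or estimated) of whatever it connects to."""
    if not islands:
        return []
    ordered = sorted(islands, key=lambda island: island.start_byte)
    merged = [ordered[0]]
    for nxt in ordered[1:]:
        prev = merged[-1]
        if prev.end_byte == nxt.start_byte:
            # The true start of `nxt` is prev.start_time + prev.duration,
            # so its guessed start_time is simply discarded.
            merged[-1] = Island(prev.start_byte, nxt.end_byte, prev.start_time,
                                prev.duration + nxt.duration, prev.exact)
        else:
            merged.append(nxt)
    return merged
```

For example, an exact island covering bytes 0..1000 and 10 s of audio, followed by an island starting at byte 1000 with a guessed start of 12 s, merges into one exact island in which the second part now starts at 10 s.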
Right, but the case you're describing can only happen if the player has already given the user an inaccurate seek (corresponding to the point in time when it initially started buffering
|
Unless perhaps two decoders were used, one for the leftmost island to continue expanding, and a secondary decoder invoked whenever the user seeks outside the known seek table. |
Is there an update or even ETA on this? This is the most frequent complaint on my audiobook player. |
I'm afraid not; this is not in our plans at the moment. |
ExoPlayer provides options to enable index-based seeking and constant bitrate seeking, but in order to choose an appropriate strategy for some arbitrary media file (e.g. an arbitrary URL not under the control of the app developer), it would be necessary to first know whether the file has a precise or approximate seek table or, in the absence of any seek table, whether it is encoded with a variable bitrate (VBR). How is an app developer expected to query these things in order to choose an appropriate seek strategy for a given file? Is there any way ExoPlayer could make this easier?