Add get_image method to Stream #61918
Conversation
Hey there @hunterjm, @allenporter, mind taking a look at this pull request as it has been labeled with an integration (
Actually this crashed once for me.
    # decode packet (try up to 3 times)
    for _i in range(3):
        # pylint: disable=maybe-no-member
        if frames := last_packet_or_image.decode():
Do you have details of the crash? I assume it's in turbojpeg, but I was curious about this step. I was thinking maybe this step is the problem, since it's produced by the worker thread but run in the main loop. However, reading https://pyav.org/docs/develop/cookbook/basics.html#threading didn't necessarily give me the impression this decode step could be decoupled, or that it's cheap enough that we can do it all the time. That is, it seems both heavyweight and tied to worker state.
How often are there keyframes?
(I was thinking through an alternative where the Stream asks the worker for a frame and it produces it on demand as it iterates, which is a little more complex from an orchestration perspective, but maybe less complex from a thread safety perspective if it's just passing off the final bytes between threads.)
No, there wasn't much in my docker logs - I think the process exited with a 256 error.
After looking at a discussion about threading in libjpeg-turbo, I don't think it's from that side. It might be from the Python wrapper, but there's a good chance it's from the PyAV side.
I haven't been able to repro. It happened when I had 10+ cameras on the same page pulling images every 10 seconds. I don't have a separate dev instance so I took off those settings and only have it pulling on one camera right now, and I haven't seen any problems.
Keyframes depend on the input stream. Generally they're every one second or every few seconds, but some of the modified codecs like H264+/H265+ use really long intervals, like 20 seconds, to save on bandwidth. I use those myself, and I don't mind the lag. With regular keyframe intervals this is not a concern, but for long keyframe intervals an alternative implementation is to wait for the next keyframe instead of using the last one, resulting in a longer time to serve the first request but a shorter "lag" between the displayed picture and real time.
We could try to move it to the worker, and that might solve a threading issue if we have one, but we'd have to add the overhead of worker-to-core events/signaling. Currently the only inter-thread interaction is when the packet is set, but I think that is atomic.
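The worker-to-core handoff being discussed can be sketched with the standard asyncio pattern for delivering a result from a plain thread to the event loop. This is not the PR's actual code, just a minimal stdlib illustration of the shape of that interaction; the names and the `b"jpeg-bytes"` stand-in are hypothetical.

```python
import asyncio
import threading


def worker(loop: asyncio.AbstractEventLoop, fut: "asyncio.Future[bytes]") -> None:
    """Worker thread: stand-in for the decode + jpeg encode step."""
    image = b"jpeg-bytes"  # placeholder for the encoded keyframe
    # call_soon_threadsafe is the safe way to touch loop state from another thread
    loop.call_soon_threadsafe(fut.set_result, image)


async def get_image() -> bytes:
    """Main loop side: hand the worker a future and await the bytes."""
    loop = asyncio.get_running_loop()
    fut: "asyncio.Future[bytes]" = loop.create_future()
    threading.Thread(target=worker, args=(loop, fut)).start()
    return await fut


print(asyncio.run(get_image()))  # b'jpeg-bytes'
```

Only the final bytes cross the thread boundary here, which is the property that makes this simpler to reason about than sharing PyAV objects between threads.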
One more feature we could add is options for quality/scaling, which could help speed up the encode and implement the width/height parameters used in Camera.async_camera_image.
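For the width/height option, the interesting part is mapping the caller's requested bounds onto the source frame. Here is a hypothetical helper (the name `scaled_size` and its behavior of preserving aspect ratio and never upscaling are my assumptions, not something from the PR):

```python
from typing import Optional, Tuple


def scaled_size(
    src_w: int,
    src_h: int,
    width: Optional[int] = None,
    height: Optional[int] = None,
) -> Tuple[int, int]:
    """Fit (src_w, src_h) into the requested bounds, keeping aspect ratio.

    Missing bounds are unconstrained; the frame is never upscaled.
    """
    ratios = [1.0]  # never exceed the source size
    if width is not None:
        ratios.append(width / src_w)
    if height is not None:
        ratios.append(height / src_h)
    ratio = min(ratios)
    return max(1, round(src_w * ratio)), max(1, round(src_h * ratio))


print(scaled_size(1920, 1080, width=640))  # (640, 360)
```

The resulting dimensions would then be passed to the frame reformat/encode step in the worker.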
Moved the encoding into the worker. At first I played around with waiting for the next keyframe before returning the image, which will result in marginally less "lag" between the returned image and real time, but I ended up reverting to the last keyframe. The reason is twofold:
I think this is a very smart compromise to get the prior keyframe, and seems like the correct answer. Overall, this approach of having the worker thread do the work is a little more complicated given the extra cooperation, but the code structure hides the complexity well. I think it's worthwhile given the robustness.
    async def get_image(self) -> bytes:
        """Fetch an image from the Stream and return it as a jpeg in bytes."""
        return await self._keyframe_converter.get_image()
I feel like this deserves a health check before running, e.g. is the worker thread started? Maybe also pre-checking self.available? Or do you consider this the caller's responsibility?
(Either way, I assume we'll want something faster than a 60-second timeout, which seems more like a failsafe in case things get stuck.)
I initially thought the caller should be responsible for this. It seems wasteful to have to check it every time, and the caller should generally know the status of the stream.
However, it might be easier to take care of all this here in stream than to have platforms try to manage things. The problem then becomes managing the stream when given these image calls. I already saw a previous request to be able to lower the default stream timeout, and I'm worried that introducing this separate use case for stream will bring a separate list of demands, including a customizable stream timeout.
Yes, the 60-second timeout was based on waiting for the next keyframe, which could be some time in the future with certain codecs/settings. I'm not sure exactly what we should drop it to, though. If this were purely image generation, something like 0.5 seconds would probably work. However, this is the time from the request until the worker replies, which requires the worker to have gotten a new packet from its input. For RTSP feeds this extra time waiting for a packet is probably negligible, but for a cloud feed like an HLS feed this might be more like 5-10 seconds. I guess this is one argument for moving the image generation process back out of the worker.
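The failsafe timeout being debated here is essentially an `asyncio.wait_for` around the wait for the worker's signal. A minimal stdlib sketch, assuming an event-based signal (the constant value, function names, and the `asyncio.Event` mechanism are illustrative assumptions, not the PR's actual implementation):

```python
import asyncio
from typing import Callable, Optional

KEYFRAME_TIMEOUT = 9.0  # hypothetical value: generous enough for slow HLS feeds


async def wait_for_image(
    image_ready: asyncio.Event, get_bytes: Callable[[], bytes]
) -> Optional[bytes]:
    """Wait for the worker's signal, bounded by a failsafe timeout."""
    try:
        await asyncio.wait_for(image_ready.wait(), KEYFRAME_TIMEOUT)
    except asyncio.TimeoutError:
        return None  # worker never delivered a frame; caller treats as unavailable
    return get_bytes()


async def demo() -> Optional[bytes]:
    ready = asyncio.Event()
    # simulate the worker signalling shortly after the request arrives
    asyncio.get_running_loop().call_later(0.01, ready.set)
    return await wait_for_image(ready, lambda: b"jpeg")


print(asyncio.run(demo()))  # b'jpeg'
```

The trade-off discussed above lives entirely in `KEYFRAME_TIMEOUT`: too short and slow HLS feeds time out spuriously; too long and a stuck worker blocks the caller.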
    stream.stop()

async def test_get_image(hass, record_worker_sync):
Is it possible to add some test coverage for the case of multiple tasks asking for an image thumbnail to be produced? (I assume that is a case that can happen.)
We can add this after we figure out the other stuff.
As the code stands right now, there is an asyncio lock around the call to generate_image() from the main thread, so there can only be one generate_image() per KeyframeConverter/stream at a time. I think a test for that specific case is not necessary, since the lock is straightforward. Adding such a test may also be quite involved, as we'd have to add another pause/sync mechanism inside the get_image call so that the calls overlap; that would be a complicated test with little benefit.
As for the threading problems from before, I think they have mostly been resolved. The last one was due to trying to use the CodecContext object from the existing outputs. It would work for a while until the container was closed, at which point the CodecContext object was deallocated. I've now created a separate CodecContext just for KeyFrameConverter use.
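The serialization property the lock provides can be shown with a tiny stdlib sketch. This is not the PR's actual class, just an illustration that an `asyncio.Lock` keeps concurrent `get_image()` callers from overlapping (the counter fields are added purely to observe the behavior):

```python
import asyncio


class KeyFrameConverter:
    """Minimal sketch of the locking described above; not the real class."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._in_flight = 0
        self.max_concurrent = 0  # instrumentation to observe overlap

    async def get_image(self) -> bytes:
        async with self._lock:  # one generate_image() per converter at a time
            self._in_flight += 1
            self.max_concurrent = max(self.max_concurrent, self._in_flight)
            await asyncio.sleep(0)  # stand-in for the worker handoff / encode
            self._in_flight -= 1
            return b"jpeg"


async def main() -> int:
    conv = KeyFrameConverter()
    # five concurrent callers; the lock forces them through one at a time
    await asyncio.gather(*(conv.get_image() for _ in range(5)))
    return conv.max_concurrent


print(asyncio.run(main()))  # 1
```

Since the lock alone guarantees this, a dedicated overlap test would mostly be re-testing `asyncio.Lock` itself.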
    # Keep import here so that we can import stream integration without installing reqs
    # pylint: disable=import-outside-toplevel
    from homeassistant.components.camera.img_util import TurboJPEGSingleton
Is it a problem to make camera a dependency of stream? It seems like they would be loading it anyway.
There's also a circular import problem between the two components, because camera also imports things from stream. The easiest way to avoid this is to rejig the imports (including the conditional ones) so the components stop importing individual classes/functions from each other and instead import the whole modules (i.e. avoid `from x import y` and just use `import x`). However, the former style seems to be preferred in the HA codebase, and the existing stream and camera code follow it. If we want to make those changes, I can do that in this PR or another one.
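The reason `import x` breaks the cycle while `from x import y` does not: a module-level `import x` only binds the (possibly still partially initialized) module object, with attribute lookup deferred until call time. A runnable sketch with two throwaway modules (`camera_mod`/`stream_mod` are hypothetical stand-ins for the real components):

```python
import os
import sys
import tempfile
import textwrap

tmp = tempfile.mkdtemp()

# camera_mod imports stream_mod at module load...
with open(os.path.join(tmp, "camera_mod.py"), "w") as f:
    f.write(textwrap.dedent("""
        import stream_mod  # whole-module import: safe in a cycle

        def snapshot():
            return stream_mod.get_image()
    """))

# ...and stream_mod imports camera_mod right back, completing the cycle.
# Because it only binds the module object (not `from camera_mod import ...`),
# the partially initialized module is fine; lookups happen at call time.
with open(os.path.join(tmp, "stream_mod.py"), "w") as f:
    f.write(textwrap.dedent("""
        import camera_mod  # whole-module import: safe in a cycle

        def get_image():
            return b"jpeg-bytes"
    """))

sys.path.insert(0, tmp)
import camera_mod

print(camera_mod.snapshot())  # b'jpeg-bytes'
```

If `stream_mod` instead did `from camera_mod import snapshot`, the name would be looked up while `camera_mod` was still half-initialized and the import would fail, which is the circular import problem described above.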
Force-pushed 3f61b65 to 6e1b320:
Refactor and do keyframe conversion in worker
Refactor KeyFrameConverter
Add KeyFrameConverter comments
Adjust KEYFRAME_TIMEOUT
Force-pushed 6e1b320 to f549f31.
Sorry, I had to rebase several times to get around frenck's CI issue. Should be good now.
Proposed change
This PR adds a get_image method to Stream which generates a still image from the most recent keyframe in the RTSP feed. This can be used by camera platforms that support Stream but have no still snapshot URL, as an alternative to using haffmpeg to establish a new connection to the camera whenever an image is requested. For this to work properly, the stream must be active (i.e. "preload stream" needs to be on). The jpeg creation step requires turbojpeg to be installed (alternatively we could use Pillow).
Type of change
Additional information
Checklist
- The code has been formatted using Black (black --fast homeassistant tests)

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:
- Updated and included derived files by running: python3 -m script.hassfest.
- New or updated dependencies have been added to requirements_all.txt. Updated by running python3 -m script.gen_requirements_all.
- Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

To help with the load of incoming pull requests: