
feat(pose_estimation): support multiprocessing videos_to_poses #130

Merged

2 commits merged into sign-language-processing:master from the multiprocessing branch on Dec 6, 2024

Conversation

@J22Melody (Contributor) commented Dec 5, 2024

@J22Melody marked this pull request as draft December 5, 2024 18:45
src/python/pose_format/bin/directory.py (outdated)
@@ -15,7 +15,7 @@ def load_video_frames(cap: cv2.VideoCapture):
     cap.release()


-def pose_video(input_path: str, output_path: str, format: str, additional_config: dict):
+def pose_video(input_path: str, output_path: str, format: str, additional_config: dict = {'model_complexity': 1}, progress: bool = True):
Collaborator:

I think additional_config should default to None; the function knows how to handle it.

@J22Melody (Contributor Author):

We had this line here, which would raise an error:

https://github.com/sign-language-processing/pose/blob/master/src/python/pose_format/bin/pose_estimation.py#L30

Do we still want {'model_complexity': 1} as the default? I do not know Mediapipe's default behavior or how much difference it makes.

Collaborator:

I see; in that case, keep it.
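
For reference, the None-default pattern suggested above would look roughly like this (a sketch, not the merged code; the PR kept the dict default):

    # Defaulting to None and filling the dict inside the function avoids
    # Python's shared-mutable-default pitfall while keeping the same behavior.
    def pose_video(input_path: str, output_path: str, format: str,
                   additional_config: dict = None, progress: bool = True):
        if additional_config is None:
            additional_config = {'model_complexity': 1}
        ...  # rest of the function unchanged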

src/python/pose_format/bin/directory.py
@@ -79,6 +111,21 @@ def get_corresponding_pose_path(video_path: Path, keep_video_suffixes: bool = False):
     return video_path.with_suffix(".pose")


+def process_video(vid_path: Path, keep_video_suffixes: bool, pose_format: str, additional_config: dict) -> bool:
+    print(f'Estimating {vid_path} on CPU {psutil.Process().cpu_num()}')
Contributor:

We should probably switch to logging at some point rather than print statements, but apparently combining that with multiprocessing can be tricky.

https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes seems easiest.
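
For reference, the QueueHandler/QueueListener pattern from that cookbook section looks roughly like this (a sketch with illustrative names, not code from this PR):

    import logging
    import logging.handlers
    import multiprocessing

    def worker_init(queue):
        # Each worker process forwards its log records to the shared queue.
        root = logging.getLogger()
        root.addHandler(logging.handlers.QueueHandler(queue))
        root.setLevel(logging.DEBUG)

    def main(video_paths):
        queue = multiprocessing.Manager().Queue()
        # A single listener in the parent process writes all records to one file.
        listener = logging.handlers.QueueListener(
            queue, logging.FileHandler('videos_to_poses.log'))
        listener.start()
        with multiprocessing.Pool(initializer=worker_init, initargs=(queue,)) as pool:
            pool.map(process_video_stub, video_paths)  # hypothetical worker function
        listener.stop()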

+    try:
+        pose_path = get_corresponding_pose_path(video_path=vid_path, keep_video_suffixes=keep_video_suffixes)
+        if pose_path.is_file():
+            print(f"Skipping {vid_path}, corresponding .pose file already created.")
Contributor:

Here's an example where logging.debug might be good, as this could potentially output thousands of messages.

@J22Melody changed the title from "[WIP] feat(pose_estimation): support multiprocessing videos_to_poses" to "feat(pose_estimation): support multiprocessing videos_to_poses" Dec 6, 2024
@J22Melody marked this pull request as ready for review December 6, 2024 10:36
@J22Melody (Contributor Author):

Can be merged if it looks good to you.

@AmitMY merged commit 7a95e70 into sign-language-processing:master Dec 6, 2024
4 checks passed
@J22Melody deleted the multiprocessing branch December 6, 2024 14:29
@J22Melody (Contributor Author):

Okay, the current script has a memory leak: memory usage grows to over 100 GB after running 64 processes on thousands of large videos (BOBSL videos, 30 to 45 minutes each) overnight.

Let me know if you have any clue about it.

@cleong110 (Contributor):

@J22Melody last time I was debugging memory usage I used memray: https://pypi.org/project/memray/

@cleong110 (Contributor) commented Dec 17, 2024:

That was to do with the DGS corpus / the datasets library; in that case I discovered that tfds was expanding the entire video to frames and holding them in memory, which got really big really fast. I wonder if something similar is going on here: we've not tested with long videos, and the pose estimation struggles with really long sequences of frames.

@cleong110 (Contributor):

Following that suspicion and making some calculations (see the sketch after this list)...

  • process_video calls pose_video
  • pose_video calls load_video_frames, which only yields frames, so that's not it, unless something takes all the frames and turns them into a list somewhere...
  • those frames get passed to load_holistic,
  • which eventually passes them to process_holistic, which iterates over all frames. It doesn't hold onto the frames, but it does store the body and confidence data in memory.
  • So by the time you get to the return statement, for a 30-minute video you will have processed, let's see... 30 minutes * 60 sec/minute * 25 frames/sec = 45k frames. So you get a numpy array of shape (45k, 1, 576, 3) and another of (45k, 1, 576), both float32.
  • That should be about 400 MB per video. 64x that is about 24 GB of memory, just for the numpy arrays returned, not counting any intermediate variables.
  • So I guess possibly there are some inefficiencies in process_holistic to look at? I would probably dig into that a bit more.
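
A quick back-of-the-envelope check of those numbers (assuming 25 fps, 576 holistic keypoints, float32 = 4 bytes, 64 workers):

    frames = 30 * 60 * 25                   # 45,000 frames in a 30-minute video
    body_bytes = frames * 1 * 576 * 3 * 4   # (45k, 1, 576, 3) float32 body array
    conf_bytes = frames * 1 * 576 * 4       # (45k, 1, 576) float32 confidence array
    per_video_mb = (body_bytes + conf_bytes) / 1024 ** 2
    print(f'{per_video_mb:.0f} MB per video')                   # ~396 MB
    print(f'{64 * per_video_mb / 1024:.1f} GB for 64 workers')  # ~24.7 GB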

@cleong110 (Contributor):

Gave it a quick test on Google Colab, found a bug. #135

@cleong110 (Contributor) commented Dec 17, 2024:

OK, after a bit of messing around on Google Colab with memray (NB: you have to use --follow-fork or it won't work right with multiprocessing), I have some results. For two 5-minute videos from YouTube-ASL, AKzh_rKNqzo.mp4 and AKzh_rKNqzo_copy.mp4 (I just copied the same file), it looks like it uses about 600 MB of memory at peak usage.

As expected, the peak of that is in process_holistic, specifically this line.

memray_flamegraphs_on_AKzh_rKNqzo_with_2_workers.zip
^ here are the flamegraphs.

https://colab.research.google.com/drive/10vMrhoC0zVXLMzhJHB9kUi_0LqBvFhz7?usp=sharing is the quick notebook I threw together.
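
For anyone reproducing this, the memray invocation would be roughly the following (output file name illustrative; with --follow-fork each forked worker writes its own capture file, and each capture file gets its own flamegraph):

    memray run --follow-fork -o output.bin $(which videos_to_poses) <args>
    memray flamegraph output.bin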

@J22Melody (Contributor Author):

Thanks for the investigation.

Yes, every video takes a few hundred MB, and if you have processed a few hundred videos without releasing them, the memory explodes.

I guess the problem is with multiprocessing/process_map; I tried multiprocessing.Pool.imap_unordered and it works the same.

@AmitMY (Collaborator) commented Dec 19, 2024:

I looked over the code; I don't see why it would leak memory. We are duplicating the memory on the line @cleong110 referred to, though.

I think the first thing to investigate is: run pose estimation with a single process (video_to_pose) and trace memory over time. The memory should not increase over time (except for the normal ~10 KB per frame).
A video of 45 minutes should have roughly 100,000 frames, which would be about 1 GB. (That means no leak; it is just a large file.)

So in conclusion: if you run 64 processes, and each is 1 GB, but we also duplicate the memory, it makes sense that you ran out of memory.

Starting solution: Option 2 here
https://chatgpt.com/share/6763fd6f-c3d8-800e-86dc-48b47204d854
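
A minimal way to do that trace, assuming psutil (which directory.py already imports): sample RSS from a background thread while the single-process run executes. A sketch with illustrative names:

    import threading
    import time

    import psutil

    def trace_memory(interval_s: float = 5.0):
        proc = psutil.Process()
        while True:
            # Resident set size of this process, sampled every few seconds.
            print(f'RSS: {proc.memory_info().rss / 1024 ** 2:.0f} MB')
            time.sleep(interval_s)

    threading.Thread(target=trace_memory, daemon=True).start()
    # ...then run the single-process pose estimation (video_to_pose) and check
    # whether RSS keeps growing beyond the expected ~10 KB per frame.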
