anim_nerf.npz #79
Hi Serizawa, Based on the visualization you provided, it seems that the wrong intrinsic parameters may have been loaded. The Anim-NeRF version uses the provided GT intrinsics, which have a larger focal length, whereas the preprocessing pipeline we provide uses ROMP, which assumes a much smaller focal length. It's possible that the camera parameters were accidentally overwritten when you ran the preprocessing. As a result, you likely used the ROMP camera to project the Anim-NeRF poses, leading to an overly small reprojection. Best, Tianjian
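(To make the intrinsics point concrete, here is a small self-contained sketch — not code from either repo, and all numbers are purely illustrative — showing that under a pinhole model the projected size scales linearly with the focal length, which is why projecting the Anim-NeRF poses with a much smaller ROMP-style focal length gives an overly small reprojection.)

```python
import numpy as np

def project(points_cam, f, cx, cy):
    """Project 3D points (camera coordinates) with a simple pinhole model."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=-1)

# A roughly 1.7 m tall person standing 3 m in front of the camera.
person = np.array([[0.0, -0.85, 3.0], [0.0, 0.85, 3.0]])

for f in (500.0, 1500.0):  # "small" vs "large" focal length, illustrative values only
    uv = project(person, f, cx=540.0, cy=540.0)
    print(f"f={f:6.0f} -> projected height ~ {abs(uv[1, 1] - uv[0, 1]):.0f} px")
```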
I guess the document can be a bit confusing. For the benchmark on PeopleSnapshot, we used the GT camera and poses to isolate the impact of inaccuracies stemming from cameras and poses. This was done to illustrate performance under controlled settings (or, let's say, what will happen with a better pose estimator, as better pose estimators come out every year :P). In contrast, the examples on NeuMan data were provided to demonstrate performance in less controlled settings, where we have no prior knowledge of the cameras or poses.
Thank you for replying so quickly!! So I understand that the ROMP preprocessing is not optimal for the PeopleSnapshot dataset. I am working on a synthetic dataset based on PeopleSnapshot, and I would like to use the same camera/pose estimator, but currently ROMP cannot achieve the same quality as the GT. Can you help me if possible? Thank you very much.
Hi Serizawa, Maybe you can give 4DHuman a shot? We did some internal experiments before and it empirically has better alignment. Best, Tianjian
@tijiang13 |
This would be helpful!
Hello Serizawa and Alec, Sorry for the delayed reply -- I have been quite busy during the past 2 weeks and forgot to check GitHub regularly. Here is the code I was using:
After the conversion you will be able to visualize the SMPL meshes using the visualize-SMPL.py script. Best, Tianjian
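(The snippet from that comment is not reproduced above, so the following is a purely hypothetical sketch of what such a conversion could look like. It assumes the out_4dhuman.pkl layout produced by the modified demo.py shared later in this thread, one detected person per frame, and InstantAvatar-style key names betas / body_pose / global_orient / transl — none of which are confirmed by the comment itself.)

```python
# Hypothetical conversion sketch -- NOT the original code from this comment.
# Assumes one person per frame and the pkl written by the modified demo.py below.
import cv2
import joblib
import numpy as np

def rotmat_to_axis_angle(R):
    """Convert (..., 3, 3) rotation matrices to (..., 3) axis-angle vectors."""
    flat = R.reshape(-1, 3, 3).astype(np.float64)
    aa = np.stack([cv2.Rodrigues(r)[0].reshape(3) for r in flat])
    return aa.reshape(*R.shape[:-2], 3)

frames = joblib.load("demo_out/out_4dhuman.pkl")  # one dict per image
global_orient = np.stack([rotmat_to_axis_angle(f["global_orient"][0]) for f in frames]).reshape(-1, 3)
body_pose = np.stack([rotmat_to_axis_angle(f["body_pose"][0]) for f in frames]).reshape(len(frames), -1)
betas = np.stack([f["betas"][0] for f in frames])
transl = np.stack([f["pred_cam_t"][0] for f in frames])  # crop-camera translation; may still need adjustment

# Assumed key names for an InstantAvatar-style npz -- compare against anim_nerf.npz on your side.
np.savez("poses_4dhuman.npz", betas=betas, body_pose=body_pose,
         global_orient=global_orient, transl=transl)
```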
Note: One of the good things about 4DHuman is the ability to set the focal length to match your GT camera when the intrinsics are available. This can be particularly useful when the true focal length differs significantly from the default setting, as in ROMP. This is surprisingly common when it comes to humans (people tend to use very large focal lengths).
That's great!! I appreciate it! I'll get to work on the code right away in my environment.
You are welcome :D Best, Tianjian
Hello, let me ask a question. Here is the 4D-Humans code:
Hi Serizawa, You can just run 4DHuman with the default hyper-parameters. The code above just illustrates how to change the focal length & SMPL parameters accordingly if you want to set the focal length to a different value. Best, Tianjian
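(Since the quoted snippet did not survive in this archive, here is only a rough illustration of the kind of adjustment being described — an assumption on my part, not the original code: if the 2D reprojection should stay fixed while the assumed focal length changes, the camera depth scales by the ratio of the focal lengths.)

```python
import numpy as np

def rescale_cam_translation(pred_cam_t, f_old, f_new):
    """Keep the 2D projection roughly fixed when swapping focal length f_old for f_new:
    only the depth (z) component of the camera translation needs rescaling."""
    t = np.asarray(pred_cam_t, dtype=np.float64).copy()
    t[..., 2] *= f_new / f_old
    return t
```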
I think there's a difference because 4DHuman only renders the object within the bounding box, while we project it onto the entire image. As for the misalignment, I still run the refinement as before, and this is usually easy to fix, I guess. Best, Tianjian
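(For context, the upstream 4D-Humans demo.py handles the crop-to-full-image step with cam_crop_to_full. A rough sketch of that step, based on my reading of the upstream demo and using out / batch / model_cfg as in the script shared later in this thread, looks like this — a sketch, not verbatim code.)

```python
from hmr2.utils.renderer import cam_crop_to_full

def crop_cam_to_full_image(out, batch, model_cfg):
    """Turn the crop-space camera predicted by HMR2 into a translation for the
    full image, roughly mirroring the upstream 4D-Humans demo.py."""
    img_size = batch["img_size"].float()
    scaled_focal_length = (model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE
                           * img_size.max())
    return cam_crop_to_full(out["pred_cam"], batch["box_center"].float(),
                            batch["box_size"].float(), img_size,
                            scaled_focal_length).detach().cpu().numpy()
```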
I'm very sorry.
Hi Serizawa, No worries. Regarding the question, you can simply flatten the last two dimensions using reshape. Best, Tianjian
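(As an illustration of that reshape step — a tiny sketch in which the array name and shape are hypothetical:)

```python
import numpy as np

body_pose = np.zeros((100, 23, 3))                 # e.g. per-frame axis-angle body pose
flat = body_pose.reshape(body_pose.shape[0], -1)   # flatten the last two dims -> (100, 69)
```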
Thank you for your comment. I tried to get the necessary information from the demo.py file of the 4D-Humans code: `for batch in dataloader:`
Hi Serizawa, How did you visualise the SMPL? Did you use aitviewer? Best, Tianjian
Yes, I used SMPL and aitviewer, the same as visualize_SMPL.py.
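(For anyone following along, a minimal aitviewer sketch in that spirit might look like the following. It is not the repo's visualize-SMPL.py; the npz key names betas / body_pose / global_orient / transl and a locally installed SMPL body model are assumptions.)

```python
import numpy as np
from aitviewer.models.smpl import SMPLLayer
from aitviewer.renderables.smpl import SMPLSequence
from aitviewer.viewer import Viewer

data = np.load("poses_optimized.npz")          # assumed key names, check your file
smpl_layer = SMPLLayer(model_type="smpl", gender="neutral")
seq = SMPLSequence(poses_body=data["body_pose"],
                   smpl_layer=smpl_layer,
                   poses_root=data["global_orient"],
                   betas=data["betas"],
                   trans=data["transl"])
viewer = Viewer()
viewer.scene.add(seq)
viewer.run()
```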
Here is my code. Sorry, I could not paste the .py file.
Can you give this gist a try?
Thank you so much!!
Hi, See: https://github.com/shubham-goel/4D-Humans/blob/main/demo.py#L91

Here is the modified script:

```python
"""
Adapted based on https://github.com/shubham-goel/4D-Humans/blob/main/demo.py
"""
from pathlib import Path
import torch
import argparse
import os
import cv2
import numpy as np
from tqdm import tqdm
from collections import defaultdict
import joblib

from hmr2.configs import CACHE_DIR_4DHUMANS
from hmr2.models import HMR2, download_models, load_hmr2, DEFAULT_CHECKPOINT
from hmr2.utils import recursive_to
from hmr2.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD
from hmr2.utils.renderer import Renderer, cam_crop_to_full

LIGHT_BLUE = (0.65098039, 0.74117647, 0.85882353)


def main():
    import time
    start = time.time()

    parser = argparse.ArgumentParser(description='HMR2 demo code')
    parser.add_argument('--checkpoint', type=str, default=DEFAULT_CHECKPOINT, help='Path to pretrained model checkpoint')
    parser.add_argument('--img_folder', type=str, default='example_data/images', help='Folder with input images')
    parser.add_argument('--out_folder', type=str, default='demo_out', help='Output folder to save rendered results')
    parser.add_argument('--detector', type=str, default='vitdet', choices=['vitdet', 'regnety'], help='Using regnety improves runtime')
    args = parser.parse_args()

    # Download and load checkpoints
    download_models(CACHE_DIR_4DHUMANS)
    model, model_cfg = load_hmr2(args.checkpoint)

    # Setup HMR2.0 model
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    model = model.to(device)
    model.eval()

    # Load detector
    from hmr2.utils.utils_detectron2 import DefaultPredictor_Lazy
    if args.detector == 'vitdet':
        from detectron2.config import LazyConfig
        import hmr2
        cfg_path = Path(hmr2.__file__).parent / 'configs' / 'cascade_mask_rcnn_vitdet_h_75ep.py'
        detectron2_cfg = LazyConfig.load(str(cfg_path))
        detectron2_cfg.train.init_checkpoint = "https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl"
        for i in range(3):
            detectron2_cfg.model.roi_heads.box_predictors[i].test_score_thresh = 0.25
        detector = DefaultPredictor_Lazy(detectron2_cfg)
    elif args.detector == 'regnety':
        from detectron2 import model_zoo
        from detectron2.config import get_cfg
        detectron2_cfg = model_zoo.get_config('new_baselines/mask_rcnn_regnety_4gf_dds_FPN_400ep_LSJ.py', trained=True)
        detectron2_cfg.model.roi_heads.box_predictor.test_score_thresh = 0.5
        detectron2_cfg.model.roi_heads.box_predictor.test_nms_thresh = 0.4
        detector = DefaultPredictor_Lazy(detectron2_cfg)

    # Setup the renderer
    # renderer = Renderer(model_cfg, faces=model.smpl.faces)

    # Make output directory if it does not exist
    os.makedirs(args.out_folder, exist_ok=True)

    # Iterate over all images in folder
    outputs = []
    for img_path in tqdm(sorted(Path(args.img_folder).glob('*.png'))):
        img_cv2 = cv2.imread(str(img_path))

        # Detect humans in image
        det_out = detector(img_cv2)
        det_instances = det_out['instances']
        valid_idx = (det_instances.pred_classes == 0) & (det_instances.scores > 0.5)
        boxes = det_instances.pred_boxes.tensor[valid_idx].cpu().numpy()

        # Run HMR2.0 on all detected humans
        dataset = ViTDetDataset(model_cfg, img_cv2, boxes)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=False, num_workers=0)

        temp = defaultdict(list)
        for batch in dataloader:
            batch = recursive_to(batch, device)
            with torch.no_grad():
                out = model(batch)

            # Collect camera and 2D keypoint predictions
            keys = ["pred_cam", "pred_cam_t", "focal_length", "pred_keypoints_2d"]
            for k in keys: temp[k].append(out[k].float().cpu().numpy())

            # Collect crop / bounding-box metadata from the batch
            keys = ["box_center", "box_size", "personid", "img_size"]
            for k in keys: temp[k].append(batch[k].float().cpu().numpy())

            # Collect SMPL parameters (global_orient, body_pose, betas)
            for k in out["pred_smpl_params"].keys():
                temp[k].append(out["pred_smpl_params"][k].float().cpu().numpy())

        for k in temp.keys(): temp[k] = np.concatenate(temp[k], axis=0)
        outputs.append(temp)

    output_path = Path(args.out_folder) / 'out_4dhuman.pkl'
    joblib.dump(outputs, output_path)  # Save results


if __name__ == '__main__':
    main()
```
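(If it helps, one way to run the script above and inspect what it saves — the script name and paths are placeholders:)

```python
# Run the script first, e.g.:
#   python demo_4dhuman.py --img_folder data/my_sequence/images --out_folder demo_out
import joblib

frames = joblib.load("demo_out/out_4dhuman.pkl")   # one dict of arrays per input image
print(len(frames), "frames")
for k, v in frames[0].items():
    print(f"{k:20s} {v.shape}")
```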
Thank you for sharing!! I now understand the difference between the previous visualize_SMPL.py and the version you shared this time. I will try to use the 4D-Humans output for InstantAvatar.
Hello, by using your code I was able to get InstantAvatar training to work!
Hi Serizawa, We use the GT intrinsics when they are available -- when they are unknown, you will need to adjust them accordingly (from the resolution of the raw images, etc.). If the results are not satisfying, I'd suggest checking the estimated poses / keypoints / masks in your case for troubleshooting. Best, Tianjian
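(One common heuristic for the unknown-intrinsics case — my assumption, not necessarily what the repo does — is to put the principal point at the image centre and pick a focal length on the order of the image size:)

```python
import numpy as np

def default_intrinsics(width, height, focal=None):
    """Build a fallback pinhole intrinsic matrix when GT intrinsics are unknown.
    Heuristic only: principal point at the image centre, focal length on the
    order of the larger image dimension (tune it if the reprojection looks off)."""
    f = focal if focal is not None else max(width, height)
    return np.array([[f,   0.0, width / 2.0],
                     [0.0, f,   height / 2.0],
                     [0.0, 0.0, 1.0]])
```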
Hello,
I'd like to confirm the result of your preprocessing code.
When I try to preprocess the PeopleSnapshot dataset, the results do not seem to match anim_nerf.npz.
Here is a result. Since I can get a good result when I use anim_nerf, I would like to ask you about the difference.
I used InstantAvatar/tree/master/scripts/visualize-SMPL.py for visualization.
anim_nerf ↓
https://github.com/user-attachments/assets/4b337b02-171d-4c24-84df-f2b7579ca363
pose_optimized ↓
https://github.com/user-attachments/assets/058a1e3f-9562-4d91-87a8-81d07b767751
Thank you
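(For anyone debugging the same mismatch, a quick sanity check is to print the keys and array shapes of the two files; the file names below follow the issue and may need adjusting to your dataset folder.)

```python
import numpy as np

for name in ("anim_nerf.npz", "poses_optimized.npz"):
    data = np.load(name)
    print(f"--- {name}")
    for key in data.files:
        print(f"{key:20s} shape={data[key].shape} dtype={data[key].dtype}")
```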