anim_nerf.npz #79
Hi Serizawa, Based on the visualization you provided, it seems that the wrong intrinsic parameters may have been loaded. The Anim-NeRF version uses the provided GT intrinsics, which have a larger focal length, whereas the preprocessing pipeline we provide uses ROMP, which assumes a much smaller focal length. It's possible that the camera parameters were accidentally overwritten when you ran the preprocessing. As a result, you likely used the ROMP camera to project the Anim-NeRF poses, leading to an overly small reprojection. Best, Tianjian
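(To make the intrinsics point concrete, here is a small self-contained sketch — not code from either repo, and all numbers are purely illustrative — showing that under a pinhole model the projected size scales linearly with the focal length, which is why projecting the Anim-NeRF poses with a much smaller ROMP-style focal length gives an overly small reprojection.)

```python
import numpy as np

def project(points_cam, f, cx, cy):
    """Project 3D points (camera coordinates) with a simple pinhole model."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=-1)

# A roughly 1.7 m tall person standing 3 m in front of the camera.
person = np.array([[0.0, -0.85, 3.0], [0.0, 0.85, 3.0]])

for f in (500.0, 1500.0):  # "small" vs "large" focal length, illustrative values only
    uv = project(person, f, cx=540.0, cy=540.0)
    print(f"f={f:6.0f} -> projected height ~ {abs(uv[1, 1] - uv[0, 1]):.0f} px")
```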
I guess the document can be a bit confusing. For the benchmark on PeopleSnapshot, we used the GT camera and poses to isolate the impact of inaccuracies stemming from cameras and poses. This was done to illustrate performance under controlled settings (or, let's say, what will happen with a better pose estimator, as better pose estimators come out every year :P). In contrast, the examples on NeuMan data were provided to demonstrate performance in less controlled settings, where we have no prior knowledge of the cameras or poses.
Thank you for replying so quickly!! So I understand that the ROMP preprocessing is not optimal for the PeopleSnapshot dataset. I am working on a synthetic dataset based on PeopleSnapshot, and I would like to use the same camera/pose estimator, but currently ROMP cannot achieve the same quality as the GT. Can you help me if possible? Thank you very much.
Hi Serizawa, Maybe you can give 4DHuman a shot? We did some internal experiments before and it empirically has better alignment. Best, Tianjian
@tijiang13 |
This would be helpful!
Hello Serizawa and Alec, Sorry for the delayed reply -- I have been quite busy during the past 2 weeks and forgot to check GitHub regularly. Here is the code I was using:
After the conversion you will be able to visualize the SMPL meshes using the visualize-SMPL.py script. Best, Tianjian
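(The snippet from that comment is not reproduced above, so the following is a purely hypothetical sketch of what such a conversion could look like. It assumes the out_4dhuman.pkl layout produced by the modified demo.py shared later in this thread, one detected person per frame, and InstantAvatar-style key names betas / body_pose / global_orient / transl — none of which are confirmed by the comment itself.)

```python
# Hypothetical conversion sketch -- NOT the original code from this comment.
# Assumes one person per frame and the pkl written by the modified demo.py below.
import cv2
import joblib
import numpy as np

def rotmat_to_axis_angle(R):
    """Convert (..., 3, 3) rotation matrices to (..., 3) axis-angle vectors."""
    flat = R.reshape(-1, 3, 3).astype(np.float64)
    aa = np.stack([cv2.Rodrigues(r)[0].reshape(3) for r in flat])
    return aa.reshape(*R.shape[:-2], 3)

frames = joblib.load("demo_out/out_4dhuman.pkl")  # one dict per image
global_orient = np.stack([rotmat_to_axis_angle(f["global_orient"][0]) for f in frames]).reshape(-1, 3)
body_pose = np.stack([rotmat_to_axis_angle(f["body_pose"][0]) for f in frames]).reshape(len(frames), -1)
betas = np.stack([f["betas"][0] for f in frames])
transl = np.stack([f["pred_cam_t"][0] for f in frames])  # crop-camera translation; may still need adjustment

# Assumed key names for an InstantAvatar-style npz -- compare against anim_nerf.npz on your side.
np.savez("poses_4dhuman.npz", betas=betas, body_pose=body_pose,
         global_orient=global_orient, transl=transl)
```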
Note: One of the good things about 4DHuman is the ability to set the focal length to match your GT camera when the intrinsics are available. This can be particularly useful when the true focal length differs significantly from the default setting, as in ROMP. This is surprisingly common when it comes to humans (people tend to use very large focal lengths).
That's great!! I appreciate it! I'll get to work on the code right away in my environment.
You are welcome :D Best, Tianjian
Hello, let me ask a question. Here is the 4D-Humans code:
Hi Serizawa, You can just run 4DHuman with the default hyper-parameters. The code above just illustrates how to change the focal length & SMPL parameters accordingly if you want to set the focal length to a different value. Best, Tianjian
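(Since the quoted snippet did not survive in this archive, here is only a rough illustration of the kind of adjustment being described — an assumption on my part, not the original code: if the 2D reprojection should stay fixed while the assumed focal length changes, the camera depth scales by the ratio of the focal lengths.)

```python
import numpy as np

def rescale_cam_translation(pred_cam_t, f_old, f_new):
    """Keep the 2D projection roughly fixed when swapping focal length f_old for f_new:
    only the depth (z) component of the camera translation needs rescaling."""
    t = np.asarray(pred_cam_t, dtype=np.float64).copy()
    t[..., 2] *= f_new / f_old
    return t
```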
I think there's a difference because 4DHuman only renders the object within the bounding box, while we project it onto the entire image. As for the misalignment, I still run the refinement as before, and this is usually easy to fix, I guess. Best, Tianjian
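(For context, the upstream 4D-Humans demo.py handles the crop-to-full-image step with cam_crop_to_full. A rough sketch of that step, based on my reading of the upstream demo and using out / batch / model_cfg as in the script shared later in this thread, looks like this — a sketch, not verbatim code.)

```python
from hmr2.utils.renderer import cam_crop_to_full

def crop_cam_to_full_image(out, batch, model_cfg):
    """Turn the crop-space camera predicted by HMR2 into a translation for the
    full image, roughly mirroring the upstream 4D-Humans demo.py."""
    img_size = batch["img_size"].float()
    scaled_focal_length = (model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE
                           * img_size.max())
    return cam_crop_to_full(out["pred_cam"], batch["box_center"].float(),
                            batch["box_size"].float(), img_size,
                            scaled_focal_length).detach().cpu().numpy()
```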
I'm very sorry.
Hi Serizawa, No worries. Regarding the question, you can simply flatten the last two dimensions using reshape. Best, Tianjian
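(As an illustration of that reshape step — a tiny sketch in which the array name and shape are hypothetical:)

```python
import numpy as np

body_pose = np.zeros((100, 23, 3))                 # e.g. per-frame axis-angle body pose
flat = body_pose.reshape(body_pose.shape[0], -1)   # flatten the last two dims -> (100, 69)
```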
Thank you for your comment. I tried to get the necessary information from the demo.py file of the 4D-Humans code: `for batch in dataloader:`
Hi Serizawa, How did you visualise the SMPL? Did you use aitviewer? Best, Tianjian
Yes, I used SMPL and aitviewer, the same as visualize_SMPL.py.
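(For anyone following along, a minimal aitviewer sketch in that spirit might look like the following. It is not the repo's visualize-SMPL.py; the npz key names betas / body_pose / global_orient / transl and a locally installed SMPL body model are assumptions.)

```python
import numpy as np
from aitviewer.models.smpl import SMPLLayer
from aitviewer.renderables.smpl import SMPLSequence
from aitviewer.viewer import Viewer

data = np.load("poses_optimized.npz")          # assumed key names, check your file
smpl_layer = SMPLLayer(model_type="smpl", gender="neutral")
seq = SMPLSequence(poses_body=data["body_pose"],
                   smpl_layer=smpl_layer,
                   poses_root=data["global_orient"],
                   betas=data["betas"],
                   trans=data["transl"])
viewer = Viewer()
viewer.scene.add(seq)
viewer.run()
```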
Here is my code. Sorry, I could not paste the .py file.
Can you give this gist a try?
Thank you so much!!
Hi, See: https://github.com/shubham-goel/4D-Humans/blob/main/demo.py#L91

Here is the modified script:

```python
"""
Adapted based on https://github.com/shubham-goel/4D-Humans/blob/main/demo.py
"""
from pathlib import Path
import torch
import argparse
import os
import cv2
import numpy as np
from tqdm import tqdm
from collections import defaultdict
import joblib

from hmr2.configs import CACHE_DIR_4DHUMANS
from hmr2.models import HMR2, download_models, load_hmr2, DEFAULT_CHECKPOINT
from hmr2.utils import recursive_to
from hmr2.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD
from hmr2.utils.renderer import Renderer, cam_crop_to_full

LIGHT_BLUE = (0.65098039, 0.74117647, 0.85882353)


def main():
    import time
    start = time.time()

    parser = argparse.ArgumentParser(description='HMR2 demo code')
    parser.add_argument('--checkpoint', type=str, default=DEFAULT_CHECKPOINT, help='Path to pretrained model checkpoint')
    parser.add_argument('--img_folder', type=str, default='example_data/images', help='Folder with input images')
    parser.add_argument('--out_folder', type=str, default='demo_out', help='Output folder to save rendered results')
    parser.add_argument('--detector', type=str, default='vitdet', choices=['vitdet', 'regnety'], help='Using regnety improves runtime')
    args = parser.parse_args()

    # Download and load checkpoints
    download_models(CACHE_DIR_4DHUMANS)
    model, model_cfg = load_hmr2(args.checkpoint)

    # Setup HMR2.0 model
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    model = model.to(device)
    model.eval()

    # Load detector
    from hmr2.utils.utils_detectron2 import DefaultPredictor_Lazy
    if args.detector == 'vitdet':
        from detectron2.config import LazyConfig
        import hmr2
        cfg_path = Path(hmr2.__file__).parent / 'configs' / 'cascade_mask_rcnn_vitdet_h_75ep.py'
        detectron2_cfg = LazyConfig.load(str(cfg_path))
        detectron2_cfg.train.init_checkpoint = "https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl"
        for i in range(3):
            detectron2_cfg.model.roi_heads.box_predictors[i].test_score_thresh = 0.25
        detector = DefaultPredictor_Lazy(detectron2_cfg)
    elif args.detector == 'regnety':
        from detectron2 import model_zoo
        from detectron2.config import get_cfg
        detectron2_cfg = model_zoo.get_config('new_baselines/mask_rcnn_regnety_4gf_dds_FPN_400ep_LSJ.py', trained=True)
        detectron2_cfg.model.roi_heads.box_predictor.test_score_thresh = 0.5
        detectron2_cfg.model.roi_heads.box_predictor.test_nms_thresh = 0.4
        detector = DefaultPredictor_Lazy(detectron2_cfg)

    # Setup the renderer
    # renderer = Renderer(model_cfg, faces=model.smpl.faces)

    # Make output directory if it does not exist
    os.makedirs(args.out_folder, exist_ok=True)

    # Iterate over all images in folder
    outputs = []
    for img_path in tqdm(sorted(Path(args.img_folder).glob('*.png'))):
        img_cv2 = cv2.imread(str(img_path))

        # Detect humans in image
        det_out = detector(img_cv2)
        det_instances = det_out['instances']
        valid_idx = (det_instances.pred_classes == 0) & (det_instances.scores > 0.5)
        boxes = det_instances.pred_boxes.tensor[valid_idx].cpu().numpy()

        # Run HMR2.0 on all detected humans
        dataset = ViTDetDataset(model_cfg, img_cv2, boxes)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=False, num_workers=0)

        temp = defaultdict(list)
        for batch in dataloader:
            batch = recursive_to(batch, device)
            with torch.no_grad():
                out = model(batch)

            # Collect camera and 2D keypoint predictions
            keys = ["pred_cam", "pred_cam_t", "focal_length", "pred_keypoints_2d"]
            for k in keys: temp[k].append(out[k].float().cpu().numpy())

            # Collect crop / bounding-box metadata from the batch
            keys = ["box_center", "box_size", "personid", "img_size"]
            for k in keys: temp[k].append(batch[k].float().cpu().numpy())

            # Collect SMPL parameters (global_orient, body_pose, betas)
            for k in out["pred_smpl_params"].keys():
                temp[k].append(out["pred_smpl_params"][k].float().cpu().numpy())

        for k in temp.keys(): temp[k] = np.concatenate(temp[k], axis=0)
        outputs.append(temp)

    output_path = Path(args.out_folder) / 'out_4dhuman.pkl'
    joblib.dump(outputs, output_path)  # Save results


if __name__ == '__main__':
    main()
```
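(If it helps, one way to run the script above and inspect what it saves — the script name and paths are placeholders:)

```python
# Run the script first, e.g.:
#   python demo_4dhuman.py --img_folder data/my_sequence/images --out_folder demo_out
import joblib

frames = joblib.load("demo_out/out_4dhuman.pkl")   # one dict of arrays per input image
print(len(frames), "frames")
for k, v in frames[0].items():
    print(f"{k:20s} {v.shape}")
```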
Thank you for sharing!! I now understand the difference between the previous visualize_SMPL.py and the version you shared this time. I will try to use the 4D-Humans output for InstantAvatar.
Hello, by using your code I was able to get InstantAvatar training to work!
Hi Serizawa, We use the GT intrinsics when they are available -- when they are unknown, you will need to adjust them accordingly (from the resolution of the raw images, etc.). If the results are not satisfying, I'd suggest checking the estimated poses / keypoints / masks in your case for troubleshooting. Best, Tianjian
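(One common heuristic for the unknown-intrinsics case — my assumption, not necessarily what the repo does — is to put the principal point at the image centre and pick a focal length on the order of the image size:)

```python
import numpy as np

def default_intrinsics(width, height, focal=None):
    """Build a fallback pinhole intrinsic matrix when GT intrinsics are unknown.
    Heuristic only: principal point at the image centre, focal length on the
    order of the larger image dimension (tune it if the reprojection looks off)."""
    f = focal if focal is not None else max(width, height)
    return np.array([[f,   0.0, width / 2.0],
                     [0.0, f,   height / 2.0],
                     [0.0, 0.0, 1.0]])
```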
Hello,
I'd like to confirm the result of your preprocessing code.
When I try to preprocess the PeopleSnapshot dataset, the results do not seem to match anim_nerf.npz.
Here is a result. Since I can get a good result when I use anim_nerf, I would like to ask you about the difference.
I used InstantAvatar/tree/master/scripts/visualize-SMPL.py for visualization.
anim_nerf ↓
https://github.com/user-attachments/assets/4b337b02-171d-4c24-84df-f2b7579ca363
pose_optimized ↓
https://github.com/user-attachments/assets/058a1e3f-9562-4d91-87a8-81d07b767751
Thank you
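(For anyone debugging the same mismatch, a quick sanity check is to print the keys and array shapes of the two files; the file names below follow the issue and may need adjusting to your dataset folder.)

```python
import numpy as np

for name in ("anim_nerf.npz", "poses_optimized.npz"):
    data = np.load(name)
    print(f"--- {name}")
    for key in data.files:
        print(f"{key:20s} shape={data[key].shape} dtype={data[key].dtype}")
```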