Custom data #1
Comments
Thanks for your interest. Our code was developed with guidance from InstantAvatar https://github.com/tijiang13/InstantAvatar, and we do have a data loader for their preprocessed data format. Our data loading routine for UBC data uses the InstantAvatarWildDataset class, which you can modify slightly and use for data produced by the InstantAvatar preprocessing. (It may take some time to compile OpenPose etc. for their preprocessing.) |
Thanks for your reply. In the paper you write that you use ReFit to obtain the human pose. May I ask how to process the ReFit results for training GART? |
We tried both the InstantAvatar preprocessing, which estimates video poses with temporal optimization, and ReFit https://yufu-wang.github.io/refit_humans/, which estimates per-frame poses. It turns out that on the challenging UBC sequences, both our method and InstantAvatar work better with ReFit poses. So we first estimate per-frame poses, and I manually convert them into the same format as the InstantAvatar preprocessing output. The data loader is therefore actually loading the InstantAvatar preprocessing format. |
Thanks. Could you share the script for converting the ReFit poses into InstantAvatar format? |
I have tried to convert the ReFit results to InstantAvatar format, but the converted data does not work. The conversion code is as follows:
# self.smpl_poses is the results of ReFit
# K is computed according to focal = np.sqrt(height**2 + width**2)
pose = []
for x in self.smpl_poses['pred_rotmat'][idx]:
    d, angle = mat2axangle(x)
    pose.append(d * angle)
pose = np.stack(pose).astype(np.float32)
# pose[0] = -pose[0]
ret = {
    "rgb": img.astype(np.float32),
    "mask": msk,
    "K": self.K.copy(),
    "smpl_beta": self.smpl_poses['pred_shape'][idx],  # ! use the first beta ???
    "smpl_pose": pose,
    "smpl_trans": self.smpl_poses["trans_full"][idx, 0],
    "idx": idx,
}
May I ask where the problem is? Many thanks! |
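The rotation-matrix-to-axis-angle step in the snippet above can be checked on its own, without transforms3d or pytorch3d. The following is a minimal NumPy sketch, not code from the GART or ReFit repositories: the function name `rotmat_to_axis_angle` is hypothetical, the input layout (frames, 24 joints, 3, 3) is an assumption based on the `pred_rotmat` key used in this thread, and the formula is not robust for rotation angles very close to pi.

```python
import numpy as np


def rotmat_to_axis_angle(R, eps=1e-8):
    """Convert rotation matrices (..., 3, 3) to axis-angle vectors (..., 3).

    Hypothetical helper for illustration only. It is not robust for
    rotation angles very close to pi, where sin(angle) is near zero.
    """
    R = np.asarray(R, dtype=np.float64)
    trace = np.trace(R, axis1=-2, axis2=-1)
    angle = np.arccos(np.clip((trace - 1.0) / 2.0, -1.0, 1.0))
    # The skew-symmetric part of R equals 2 * sin(angle) * axis.
    vec = np.stack(
        [
            R[..., 2, 1] - R[..., 1, 2],
            R[..., 0, 2] - R[..., 2, 0],
            R[..., 1, 0] - R[..., 0, 1],
        ],
        axis=-1,
    )
    sin_angle = np.sin(angle)
    # angle / (2 * sin(angle)) tends to 0.5 as the angle tends to 0.
    scale = np.where(
        sin_angle > eps, angle / (2.0 * np.maximum(sin_angle, eps)), 0.5
    )
    return (vec * scale[..., None]).astype(np.float32)


# Assumed layout: 2 frames, 24 SMPL joints, identity rotations.
rotmats = np.tile(np.eye(3), (2, 24, 1, 1))
pose = rotmat_to_axis_angle(rotmats)       # (2, 24, 3)
global_orient = pose[:, 0]                 # (2, 3)
body_pose = pose[:, 1:].reshape(-1, 69)    # (2, 69)
```

Splitting the result into `global_orient` and `body_pose` mirrors the keys written by the conversion script posted later in this thread.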
# convert our mono-pose estimation to ubc fashion dataset
import numpy as np
import os, os.path as osp
import imageio
from pytorch3d.transforms import matrix_to_axis_angle
import torch
from tqdm import tqdm
from pycocotools import mask as masktool
import cv2


def process(seq):
    img_src = f"../data/ubcfashion/train_frames/{seq}/"
    msk_fn = f"../data/ubcfashion/train_mask/{seq}.npy"
    pose_fn = f"../data/ubcfashion/train_smpl/{seq}.npz"

    dst = f"../data/insav_wild/ourpose_ubc_{seq}/"
    os.makedirs(dst, exist_ok=True)

    pose_data = np.load(pose_fn)
    smpl_shape = pose_data["pred_shape"].mean(0)  # Use the average shape
    smpl_pose_list, smpl_global_trans = (
        pose_data["pred_rotmat"],
        pose_data["pred_trans"],
    )
    # Rotation matrices -> axis-angle vectors
    smpl_pose_list = matrix_to_axis_angle(torch.from_numpy(smpl_pose_list))
    smpl_pose_list = smpl_pose_list.numpy()

    # Build the intrinsic matrix from the focal length and image center
    focal, center = pose_data["img_focal"], pose_data["img_center"]
    K = np.eye(3)
    K[0, 0], K[1, 1] = focal, focal
    K[0, 2], K[1, 2] = center[0], center[1]

    pose_save_dict = {
        "betas": smpl_shape,
        "global_orient": smpl_pose_list[:, 0],
        "body_pose": smpl_pose_list[:, 1:].reshape(-1, 69),
        "transl": smpl_global_trans.squeeze(1),
    }
    np.savez_compressed(osp.join(dst, "poses_optimized.npz"), **pose_save_dict)

    image_save_dst = osp.join(dst, "images")
    mask_save_dst = osp.join(dst, "masks")
    os.makedirs(image_save_dst, exist_ok=True)
    os.makedirs(mask_save_dst, exist_ok=True)

    mask_data = np.load(msk_fn, allow_pickle=True)
    # np.bool was removed in NumPy 1.24; the builtin bool behaves the same here
    masks = [masktool.decode(m).astype(bool).astype(np.float32) for m in mask_data]
    for img_fn in tqdm(sorted(os.listdir(img_src))):
        image_id = int(img_fn.split(".")[0])
        mask = masks[image_id]
        img = cv2.imread(osp.join(img_src, img_fn))
        cv2.imwrite(osp.join(image_save_dst, img_fn), img)
        # Write the mask as an 8-bit image
        cv2.imwrite(osp.join(mask_save_dst, img_fn), (mask * 255).astype(np.uint8))

    # Height and width are taken from the last image read in the loop
    cam_save_dict = {
        "intrinsic": K,
        "extrinsic": np.eye(4),
        "height": img.shape[0],
        "width": img.shape[1],
    }
    np.savez_compressed(osp.join(dst, "cameras.npz"), **cam_save_dict)


if __name__ == "__main__":
    # seqs = sorted(os.listdir("../data/ubcfashion/train_frames"))
    seqs = ["91+bCFG1jOS"]
    for seq in seqs:
        process(seq)

    # def load_insav_smpl_param(path):
    #     smpl_params = dict(np.load(str(path)))
    #     if "thetas" in smpl_params:
    #         smpl_params["body_pose"] = smpl_params["thetas"][..., 3:]
    #         smpl_params["global_orient"] = smpl_params["thetas"][..., :3]
    #     return {
    #         "betas": smpl_params["betas"].astype(np.float32).reshape(1, 10),
    #         "body_pose": smpl_params["body_pose"].astype(np.float32),
    #         "global_orient": smpl_params["global_orient"].astype(np.float32),
    #         "transl": smpl_params["transl"].astype(np.float32),
    #     }

    # insav_pose_fn = "../data/insav_wild/91+20mY7UJS/poses_optimized.npz"
    # load_insav_smpl_param(insav_pose_fn)
    # insav_cam_fn = "../data/insav_wild/91+20mY7UJS/cameras.npz"
    # insav_cam = dict(np.load(insav_cam_fn, allow_pickle=True))
    # for k, v in insav_cam.items():
    #     print(k, v.shape, v.dtype)
    # print()
This is the script I used to convert the poses into InstantAvatar format. I hope it helps. |
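Before training on a converted sequence, it may be worth checking that the two `.npz` files have the shapes the script above appears to produce. The sketch below is only an illustration and is not part of GART: `check_insav_sequence` and `write_dummy_sequence` are hypothetical helpers, and the expected shapes (10 betas, 3-vector global orientation, 69-vector body pose, 3-vector translation per frame) are assumptions read off the script and its commented-out loader.

```python
import os.path as osp
import tempfile

import numpy as np


def check_insav_sequence(seq_dir):
    """Return the frame count if the folder looks consistent, else raise.

    Hypothetical helper; the expected shapes are assumptions taken from
    the conversion script in this thread.
    """
    poses = dict(np.load(osp.join(seq_dir, "poses_optimized.npz")))
    cams = dict(np.load(osp.join(seq_dir, "cameras.npz")))
    n = poses["global_orient"].shape[0]
    assert poses["betas"].size == 10, "expected 10 shape coefficients"
    assert poses["global_orient"].shape == (n, 3)
    assert poses["body_pose"].shape == (n, 69)
    assert poses["transl"].shape == (n, 3)
    assert cams["intrinsic"].shape == (3, 3)
    assert cams["extrinsic"].shape == (4, 4)
    return n


def write_dummy_sequence(dst, n_frames):
    """Write synthetic files with the assumed layout (for the demo only)."""
    np.savez_compressed(
        osp.join(dst, "poses_optimized.npz"),
        betas=np.zeros(10),
        global_orient=np.zeros((n_frames, 3)),
        body_pose=np.zeros((n_frames, 69)),
        transl=np.zeros((n_frames, 3)),
    )
    np.savez_compressed(
        osp.join(dst, "cameras.npz"),
        intrinsic=np.eye(3),
        extrinsic=np.eye(4),
        height=512,
        width=512,
    )


demo_dir = tempfile.mkdtemp()
write_dummy_sequence(demo_dir, n_frames=4)
n_checked = check_insav_sequence(demo_dir)
```

On real data, `check_insav_sequence` would be pointed at the output folder of the conversion script instead of the temporary demo folder.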
@uniBruce Hi. When the ground-truth focal length is unavailable, we estimate it from the dimensions of the image. I also added a script in the ReFit repo that runs on a folder of images and saves pose results compatible with GART. Please give it a try. |
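As a rough illustration of estimating the focal length from the image dimensions, the sketch below uses the heuristic quoted earlier in this thread, focal = np.sqrt(height**2 + width**2). Placing the principal point at the image center is an additional assumption, and `estimate_intrinsics` is a hypothetical name, not a function from ReFit or GART.

```python
import numpy as np


def estimate_intrinsics(height, width):
    """Build a 3x3 intrinsic matrix when no calibrated focal length exists.

    Assumptions: focal length equals the image diagonal in pixels (the
    heuristic quoted in this thread) and the principal point is the
    image center.
    """
    focal = float(np.sqrt(height**2 + width**2))
    K = np.eye(3)
    K[0, 0] = K[1, 1] = focal
    K[0, 2], K[1, 2] = width / 2.0, height / 2.0
    return K


K_demo = estimate_intrinsics(height=3, width=4)  # focal is the diagonal, 5.0
```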
Hi! When I use data processed by InstantAvatar, the result after running ./scripts/fit.sh is bad. Can you give me some suggestions? |
Hi, when I use a real-person video with the da_pose pose, preprocess the data as in InstantAvatar, and train GART for more than 25000 steps, the result is quite good. But when you zoom in on the image below, you can see RGB noise in the pants and shoes (the pants should be pure black, and the sleeves and shirt should be smooth). How should I edit the config or loss function to make the image sharper, or does the problem lie in my ground-truth images? There is also a problem in the animation stage, as shown in the second image. This is my config:
TOTAL_steps: 50000 #15000 #30000
SEED: 12345
VIZ_INTERVAL: 500
CANO_POSE_TYPE: da_pose #da_pose #t_pose #da_pose
VOXEL_DEFORMER_RES: 64 #128 #64 #64 #128 #64 #128 #64
W_CORRECTION_FLAG: True
W_REST_DIM: 32 #0 #16
W_REST_MODE: pose-mlp #delta-list #pose-mlp
W_MEMORY_TYPE: voxel #voxel #point
F_LOCALCODE_DIM: 0
MAX_SCALE: 1.0
MIN_SCALE: 0.0 #0.0003 #0.003 #3
MAX_SPH_ORDER: 4
INCREASE_SPH_STEP: [3000, 5000, 6000, 7000] #[3000, 5000, 6000, 7000] #[1000, 2000, 3000]
INIT_MODE: on_mesh #near_mesh #near_mesh
OPACITY_INIT_VALUE: 0.99
ONMESH_INIT_SUBDIVIDE_NUM: 1
ONMESH_INIT_SCALE_FACTOR: 1.0
ONMESH_INIT_THICKNESS_FACTOR: 0.5
NEARMESH_INIT_NUM: 10000
NEARMESH_INIT_STD: 0.1
SCALE_INIT_VALUE: 0.01 # only used for random init
###########################
LR_P: 0.00016
LR_P_FINAL: 0.0000016
LR_Q: 0.001
LR_S: 0.005
LR_O: 0.05
LR_SPH: 0.0025
LR_SPH_REST: 0.0005
W_START_STEP: 500 #1000 #500 #2000 #300 #2000
LR_W: 0.0002 # 1 # 0.00001
LR_W_FINAL: 0.00002
LR_W_REST: 0.0002
LR_W_REST_FINAL: 0.00002
LR_W_REST_BONES: 0.0003 # for mlp
LR_F_LOCAL: 0.0
# Pose Optimize
POSE_R_BASE_LR: 0.0001
POSE_R_BASE_LR_FINAL: 0.00001
POSE_R_REST_LR: 0.0003
POSE_R_REST_LR_FINAL: 0.00001
POSE_T_LR: 0.0001
POSE_T_LR_FINAL: 0.00001
POSE_OPTIMIZE_START_STEP: 500 #1000
# Reg Terms
LAMBDA_MASK: 0.0 #0.01
MASK_LOSS_PAUSE_AFTER_RESET: 100
# other optim
N_POSES_PER_STEP: 1 #50 #1 #3 # increasing this does not help
RAND_BG_FLAG: True #True #True #True
# DEFAULT_BG: [0.0, 0.0, 0.0]
NOVEL_VIEW_PITCH: 0.0
IMAGE_ZOOM_RATIO: 1.0
VIEW_BALANCE_FLAG: True #True # True #True #False
BOX_CROP_PAD: 50
# GS Control
# densify
MAX_GRAD: 0.0002 #0.0003 #0.0005 #0.0006 # 0.0002
PERCENT_DENSE: 0.005 #0.01
DENSIFY_START: 500
DENSIFY_INTERVAL: 100 #300 #500 #1000 #300
DENSIFY_END: 9000 #10000 #15000
# prune
PRUNE_START: 500
PRUNE_INTERVAL: 300
OPACIT_PRUNE_TH: 0.01
RESET_OPACITY_STEPS: [3000, 5000] #[3000, 5000] #5000 #3000
OPACIT_RESET_VALUE: 0.01
# regaussian
REGAUSSIAN_STD: 0.015 #0.02 #0.02 #0.01
REGAUSSIAN_STEPS: [7000, 14000]
CANONICAL_SPACE_REG_K: 6
LAMBDA_STD_Q: 0.01
LAMBDA_STD_S: 0.01
LAMBDA_STD_O: 0.01
LAMBDA_STD_CD: 0.03
LAMBDA_STD_CH: 0.03
# LAMBDA_STD_W: 0.3
# LAMBDA_STD_W_REST: 0.3
LAMBDA_STD_W: 0.3
LAMBDA_STD_W_REST: 0.1
LAMBDA_KNN_DIST: 0.00
LAMBDA_W_NORM: 0.01
LAMBDA_W_REST_NORM: 0.1
START_END_SKIP: [0, 400, 1]
The walking animation is shown below. How can I fix the error in the animation stage? |
Congratulations on this excellent work!
I wonder how to run this work on my own data. For example, after capturing a monocular video, how do I run your method, and how should I process the data for training?
Many thanks!