ImageNet Training Dataset Preprocessing #156

Open
StephenYangjz opened this issue Dec 1, 2024 · 6 comments

@StephenYangjz

Hi, I am trying to train the dc-ae using the default setup (ImageNet). I saw that ImageNet here is being loaded as .npy files:


from typing import Optional

import numpy as np
from torch.utils.data import Dataset
from torchvision.datasets import DatasetFolder

# BaseDataProvider and LatentImageNetDataProviderConfig come from the repo itself.
class LatentImageNetDataProvider(BaseDataProvider):
    def __init__(self, cfg: LatentImageNetDataProviderConfig):
        super().__init__(cfg)
        self.cfg: LatentImageNetDataProviderConfig

    def build_datasets(self) -> tuple[Dataset, Optional[Dataset], Optional[Dataset]]:
        # expects data_dir to contain per-class subfolders of .npy latent files
        train_dataset = DatasetFolder(self.cfg.data_dir, np.load, [".npy"])
        return train_dataset, None, None

I am wondering how this file is prepared, and would it be possible to share a minimal working example of the file? Thank you!

@chenjy2003
Collaborator

Hi Stephen,

You can refer to the readme here to extract latent data.

- Generate and save latent:
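
For anyone landing here later, a minimal sketch of what that "generate and save latent" step produces, so the `DatasetFolder(..., np.load, [".npy"])` loader above has something to read. This is an illustration only, assuming the `DCAE_HF` model-zoo interface shown in the repo README; the checkpoint id, crop size, and output layout are my assumptions, and the readme script is the authoritative version:

```python
import os

import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# DCAE_HF follows the usage shown in the repo README; checkpoint id is a placeholder.
from efficientvit.ae_model_zoo import DCAE_HF

device = torch.device("cuda")
dc_ae = DCAE_HF.from_pretrained("mit-han-lab/dc-ae-f32c32-in-1.0").to(device).eval()

transform = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

src_root, dst_root = "imagenet/train", "latent_imagenet/train"
for cls in sorted(os.listdir(src_root)):
    os.makedirs(os.path.join(dst_root, cls), exist_ok=True)
    for name in sorted(os.listdir(os.path.join(src_root, cls))):
        image = Image.open(os.path.join(src_root, cls, name)).convert("RGB")
        x = transform(image)[None].to(device)
        with torch.no_grad():
            latent = dc_ae.encode(x)  # e.g. 1 x 32 x 16 x 16 for f32c32 at 512x512
        # one .npy per image, kept in class subfolders so that
        # DatasetFolder(data_dir, np.load, [".npy"]) can read them back
        out_path = os.path.join(dst_root, cls, os.path.splitext(name)[0] + ".npy")
        np.save(out_path, latent[0].cpu().numpy())
```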

@StephenYangjz
Author

Hi @chenjy2003, thank you so much for the response. Would there be an easy way to fine-tune the autoencoder as well? I am thinking of training a dc-ae on the RUGD dataset, and I'm not sure if the pretrained autoencoder would work out of the box. Do you by any chance have any insights? Any pointers would be greatly appreciated! Thank you.

@chenjy2003
Collaborator

Hi Stephen,

We tried some images from the RUGD dataset and observed that our autoencoders worked well. Here are some examples. The left part is the original image and the right part is the reconstructed image. You can also use this script to test other images.

[Reconstruction examples (original | reconstructed): creek_00001, park-1_00001, trail_00001, village_00003]
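
For reference, checking your own images amounts to a round-trip through the autoencoder. A minimal sketch, again assuming the `DCAE_HF` interface and encode/decode usage from the repo README; the checkpoint id and image path are placeholders, and the script linked above is the authoritative version:

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

# DCAE_HF / encode / decode follow the usage shown in the repo README.
from efficientvit.ae_model_zoo import DCAE_HF

device = torch.device("cuda")
dc_ae = DCAE_HF.from_pretrained("mit-han-lab/dc-ae-f32c32-in-1.0").to(device).eval()

transform = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

x = transform(Image.open("rugd/creek_00001.png").convert("RGB"))[None].to(device)
with torch.no_grad():
    y = dc_ae.decode(dc_ae.encode(x))  # round-trip: encode then decode

# inputs were normalized to [-1, 1], so map back to [0, 1] and save side by side
save_image(torch.cat([x, y], dim=3) * 0.5 + 0.5, "reconstruction_check.png")
```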

@StephenYangjz
Author

StephenYangjz commented Dec 4, 2024

Thank you so much for getting back, @chenjy2003! May I also ask what the command would be for fine-tuning a DiT with the pretrained autoencoder, starting from the ImageNet-pretrained DiT presented in the paper? I think the readme only has the command for the UViT, not the DiT. Thank you!

@han-cai
Contributor

han-cai commented Dec 4, 2024

That's a good point. @chenjy2003, we should add the command to train DiT-XL on ImageNet 512x512.

@chenjy2003
Collaborator

@StephenYangjz Thanks for your suggestion.

The training command for DiT-XL on ImageNet 512x512 is added here and here.

If you want to fine-tune from the ImageNet-pretrained checkpoint, you can add dit.pretrained_path=... to the command.
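
For example (a hypothetical sketch only: the launcher, script, and other arguments are placeholders for the actual DiT-XL 512x512 command in the readme; the only part taken from this thread is the dit.pretrained_path override appended at the end):

```bash
torchrun --nproc_per_node=8 \
    <DiT-XL 512x512 training script and arguments from the readme> \
    dit.pretrained_path=/path/to/imagenet_pretrained_dit_xl.pt
```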
