@wesleytruong wesleytruong commented Aug 1, 2025

This PR implements the validator class for Flux, following the method discussed in the Stable Diffusion 3 paper.

The paper shows that creating 8 equidistant timesteps and averaging the loss over them yields a loss that is highly correlated with external validation metrics such as CLIP or FID scores.

Rather than creating 8 stratified timesteps per sample, this PR's implementation applies one of these equidistant timesteps to each sample in round-robin fashion. Aggregated over many samples in a validation set, this should give a similar validation score to the full-timestep method while processing validation samples more quickly.
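The round-robin scheme described above can be sketched in a few lines; the function names below are illustrative, not taken from the PR:

```python
# Illustrative sketch of the round-robin timestep assignment (names are
# ours, not the PR's).
def stratified_timesteps(num_strata: int = 8) -> list[float]:
    """Midpoints of num_strata equal-width bins over (0, 1)."""
    return [(i + 0.5) / num_strata for i in range(num_strata)]

def round_robin_timestep(sample_idx: int, num_strata: int = 8) -> float:
    """Assign each validation sample one stratum, cycling through strata
    so every timestep is visited equally often across the set."""
    return stratified_timesteps(num_strata)[sample_idx % num_strata]
```

Averaged over a large validation set, the per-sample losses at these cycled timesteps approximate the 8-timestep average loss that the SD3 paper correlates with CLIP/FID.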

Implementations

  • Integrates image generation evaluation into the validation step
  • Refactors and combines the eval job_config with validation
    • Adds an all_timesteps option to the job_config to choose between round-robin timesteps and full timesteps per sample
  • Creates a validator class and validation dataloader for flux; the validator dataloader handles generating timesteps for the round-robin method of validation

Enabling all timesteps

Developers can enable the full timestep method of validation by setting all_timesteps = True in the flux validation job config. Enabling all_timesteps may require tuning hyperparameters such as validation.local_batch_size and validation.steps to avoid memory spikes and to optimize throughput. Using a ratio of around 1/4 for validation.local_batch_size relative to training.local_batch_size keeps memory from spiking above training levels when fsdp = 8.
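For illustration, such a setup might look like the TOML fragment below; the section and key names follow the description above, but the exact config layout is an assumption:

```toml
# Hypothetical fragment -- key names follow the PR description,
# the exact layout of the flux job config may differ.
[validation]
all_timesteps = true
local_batch_size = 8  # ~1/4 of training.local_batch_size to avoid memory spikes
steps = 4
```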

Below we compare round robin and all timesteps. In this comparison the total number of validation samples processed is the same, but in the all_timesteps=True configuration we have to lower the batch size to prevent memory from spiking. All timesteps also achieves higher throughput (tps), but still processes the full validation set more slowly.

| Round Robin (batch_size=32, steps=1, fsdp=8) | All Timesteps (batch_size=8, steps=4, fsdp=8) |
| ---- | ---- |
| <img width="682" height="303" alt="Screenshot 2025-08-01 at 3 46 42 PM" src="https://github.com/user-attachments/assets/30328bfe-4c3c-4912-a329-2b94c834b67b" /> | <img width="719" height="308" alt="Screenshot 2025-08-01 at 3 30 10 PM" src="https://github.com/user-attachments/assets/c7325d21-8a7b-41d9-a0d2-74052e425083" /> |

@wesleytruong wesleytruong changed the title creates an efficient validator for flux using loss method Flux Validation Aug 1, 2025
dp_rank (int): Data parallel rank.
dp_world_size (int): Data parallel world size.
infinite (bool): Whether to loop over the dataset infinitely.
generate_timesteps (booL): Generate stratified timesteps in round-robin style for validation

Suggested change
generate_timesteps (booL): Generate stratified timesteps in round-robin style for validation
generate_timesteps (bool): Generate stratified timesteps in round-robin style for validation


let's try to create a subclass to do this, as discussed offline.
Otherwise it's easy to confuse with the timestep concept used during training / inference.


# skip low quality image or image with color channel = 1
if sample_dict["image"] is None:
# sample_id = sample.get('sample_id')

remove comment

eval_freq: int = 100
"""Frequency of evaluation/sampling during training"""
save_imgs: int = 1
""" How many images to generate and save in validation, -1 means same number as steps"""

explain that the source prompt is coming from validation dataset and taken from "the beginning"

"""How many denoising steps to sample when generating an image"""
eval_freq: int = 100
"""Frequency of evaluation/sampling during training"""
save_imgs: int = 1

call it something like save_imgs_count? save_imgs sounds like a bool


also, can you show examples of generate img? Just to verify capability


> also, can you show examples of generate img? Just to verify capability

Here's a generated image to show that generation works. The image quality isn't good yet since I don't have pre-trained weights; I will revisit after adding HF conversion.
image


@dataclass
class Eval:
class Validation(Validation):

do we have to inherit? Training is not inheriting.

# Patchify: Convert latent into a sequence of patches
latents = pack_latents(latents)

latent_noise_pred = model(

please adapt to #1494 for the amp part

bsz = labels.shape[0]

# To evaluate all 8 timesteps per sample do
if self.all_timesteps:

I think it's OK to keep this.

Comment on lines 238 to 244
validation_end_time = time.time()
validation_duration = validation_end_time - validation_start_time

# Log timing information
from torchtitan.tools.logging import logger

logger.info(f"Validation step {step} completed in {validation_duration:.3f}s ")

is this for debugging? we didn't need this in llama3 validator


save_imgs = self.job_config.validation.save_imgs
if save_imgs == -1 or num_steps < save_imgs:
t5_tokenizer, clip_tokenizer = build_flux_tokenizer(self.job_config)

why build this multiple times?

Comment on lines 129 to 130
if isinstance(prompt, list):
prompt = " ".join(prompt)

in validate.py you are taking prompt[0]. Do you still need to concatenate here?

@wesleytruong wesleytruong Aug 4, 2025


This concatenation is because the COCO dataset gives the prompts as a list of strings instead of a single string per sample. This part just combines the list-style prompt into a single string.

The prompt[0] in validation is because we only generate an image from the first sample in each validation batch.
Edit: I'm changing this so that it will generate the correct number of images regardless of batch size.

@wesleytruong

@tianyu-l addressed comments.

  • added the all_timesteps option to job_config
  • separated the flux dataset and flux validation dataset to isolate the timestep generation logic that is only used for validation
  • changed save_img_count to generate images more logically rather than one per batch


@tianyu-l tianyu-l left a comment


Looks great! Had some final comments.

from .model.args import FluxModelArgs
from .model.autoencoder import AutoEncoderParams
from .model.model import FluxModel
from .model.validate import build_flux_validator

validate doesn't sound like part of model. Let's just put it under the flux/ root folder.

img = _process_cc12m_image(sample["image"], output_size=output_size)
prompt = sample["caption"]
if isinstance(prompt, list):
prompt = " ".join(prompt)

If you look at one example

[
"A desktop computer monitor sitting next to a keyboard.",
"A computer, keyboard, modem, and mouse are sitting on a work desk.",
"a computer monitor, keyboard, and mouse sit on a table.",
"A black desktop computer atop a wooden table.",
"a desktop computer monitor with a keyboard and mouse"
]
It's always a list of alternative captions. I think we should only pick one (maybe the first one for deterministic training / eval).
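A deterministic pick of the first caption could look like this minimal sketch (the helper name is hypothetical, not from the PR):

```python
def pick_caption(caption):
    """Return a single caption string; if the dataset yields a list of
    alternative captions, deterministically take the first one."""
    if isinstance(caption, list):
        return caption[0]
    return caption
```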


# skip low quality image or image with color channel = 1
if sample_dict["image"] is None:
# sample_id = sample.get('sample_id')

let's remove comment

break

prompt = input_dict.pop("prompt")
if not isinstance(prompt, list):

To educate me: if prompt is yielded from the dataset as a str, would the dataloader batchify it into a list? Could you please share source on this behavior?


The torch DataLoader handles this batchifying all the way up the inheritance stack: ParallelAwareDataloader -> StatefulDataLoader -> DataLoader. The FluxDataset gets passed to the ParallelAwareDataLoader in build_dataloader_fn. I'm not sure of the exact implementation details and edge cases, but the API expects an IterableDataset with an overloaded __iter__ for data streaming. https://docs.pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset
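As a quick standalone check of this behavior (a sketch, not code from the PR): the default collate_fn in torch.utils.data.DataLoader batches per-sample strings into a list of strings.

```python
from torch.utils.data import DataLoader, IterableDataset

class PromptStream(IterableDataset):
    """Yields one prompt string per sample, like a streaming text dataset."""
    def __iter__(self):
        yield from ["a cat", "a dog", "a bird", "a fish"]

# batch_size=2 -> the default collate_fn returns each batch as a list of str
loader = DataLoader(PromptStream(), batch_size=2)
batches = list(loader)  # [['a cat', 'a dog'], ['a bird', 'a fish']]
```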

@tianyu-l tianyu-l left a comment


really awesome work!

@tianyu-l tianyu-l merged commit a204e31 into main Aug 5, 2025
10 checks passed
@tianyu-l tianyu-l deleted the flux_validator branch August 5, 2025 22:00
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025