Skip to content

Conversation

@belo-involead
Copy link

TrainerState.save_to_json fails due to np.float32 values not being JSON serializable, causing crashes when saving training state.
Problem:
The TrainerState.save_to_json method in the transformers library fails when trying to save training state because np.float32 values are not natively serializable in JSON. This issue causes the training process to crash when saving checkpoints.

Steps to Reproduce:
Run the model distillation script using SentenceTransformerTrainer.
The training proceeds normally but fails when saving state due to an np.float32 serialization error.
The error message typically looks like:

TypeError: Object of type float32 is not JSON serializable
Expected Behavior:
The training state should save without errors.
The script should complete training and store checkpoints correctly.

Proposed Fix:
Convert np.float32 values to Python float before saving JSON.
Patch TrainerState.save_to_json to handle this conversion.

Fix float32 serialization in TrainerState & improve KD training script
@belo-involead
Copy link
Author

Hi @tomaarsen – just following up on this PR that fixes a JSON serialization issue in TrainerState.save_to_json caused by np.float32 values, which can interrupt checkpoint saving during training.

The PR has workflows awaiting approval. I'd really appreciate your review or any feedback to help move this forward. Let me know if any changes are needed.

I’m happy to revise the PR if there’s a better approach. Thanks for your time!

@belo-involead
Copy link
Author

Hi @tomaarsen – just a quick update on this PR: I’ve fixed the missing imports that caused the earlier test failures. All checks should now pass once the workflow is approved and re-run.

Would appreciate your time to approve the workflow when you get a chance! Let me know if any further changes are needed. Thanks again

@tomaarsen
Copy link
Member

Hello!

Thank you for opening this, and apologies for my radio silence so far. PRs like these are always easiest to test when my hardware is freed up to run some of the related training scripts, but I've been running models for #3222 all day lately.

When that PR is ready, I'll definitely switch focus to the new open PRs before I release v4.0

  • Tom Aarsen

@belo-involead
Copy link
Author

Hi Tom,
No worries at all, thank you for the update! I really appreciate you taking the time, and I’ll stay tuned for any feedback when you’re able to review it. Let me know if there’s anything else needed from my side.

Thanks again!

@tomaarsen
Copy link
Member

I ran the model_distillation.py script on the master branch of Sentence Transformers now, and I don't run into the issue that you mentioned. I think this might have already been indirectly solved with #3096, which updated the output of the evaluators to be regular floats instead of numpy floats.

Could you perhaps check if you can run the script with the master branch, and if not, what version of transformers you're using?

  • Tom Aarsen

@belo-involead
Copy link
Author

Hey Tom, thanks for checking this! I’m currently re-running model_distillation.py on the latest master and will confirm shortly if the issue persists.

Quick question though — the error I saw was from TrainerState.log_history due to np.float32 metrics failing JSON serialization.

Since #3096 focuses on evaluator outputs, just wanted to double-check: could this fix also have impacted TrainerState.log_history?

I’ll also share my transformers version in case it’s relevant. Thanks again!!

@tomaarsen
Copy link
Member

tomaarsen commented Mar 19, 2025

Quick question though — the error I saw was from TrainerState.log_history due to np.float32 metrics failing JSON serialization.

Since #3096 focuses on evaluator outputs, just wanted to double-check: could this fix also have impacted TrainerState.log_history?

Yes, it seems like the evaluator outputs also affect log_history. I looked into the trainer_state.json that was being saved on my side (with master), and I saw this:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.3888888888888888,
  "eval_steps": 100,
  "global_step": 500,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.2777777777777778,
      "eval_runtime": 1.2965,
      "eval_samples_per_second": 0.0,
      "eval_steps_per_second": 0.0,
      "eval_sts-dev_pearson_cosine": 0.7796666906730896,
      "eval_sts-dev_spearman_cosine": 0.7730397171620476,
      "step": 100
    },
    {
      "epoch": 0.5555555555555556,
      "eval_runtime": 1.2716,
      "eval_samples_per_second": 0.0,
      "eval_steps_per_second": 0.0,
      "eval_sts-dev_pearson_cosine": 0.8386008826071591,
      "eval_sts-dev_spearman_cosine": 0.8350261474348337,
      "step": 200
    },
    {
      "epoch": 0.8333333333333334,
      "eval_runtime": 1.2818,
      "eval_samples_per_second": 0.0,
      "eval_steps_per_second": 0.0,
      "eval_sts-dev_pearson_cosine": 0.8555351042472681,
      "eval_sts-dev_spearman_cosine": 0.8517600639199182,
      "step": 300
    },
    {
      "epoch": 1.1111111111111112,
      "eval_runtime": 1.2651,
      "eval_samples_per_second": 0.0,
      "eval_steps_per_second": 0.0,
      "eval_sts-dev_pearson_cosine": 0.8610353558300488,
      "eval_sts-dev_spearman_cosine": 0.8579782016575516,
      "step": 400
    },
    {
      "epoch": 1.3888888888888888,
      "grad_norm": 0.6396923661231995,
      "learning_rate": 1.4489164086687308e-05,
      "loss": 0.0442,
      "step": 500
    },
    {
      "epoch": 1.3888888888888888,
      "eval_runtime": 1.2579,
      "eval_samples_per_second": 0.0,
      "eval_steps_per_second": 0.0,
      "eval_sts-dev_pearson_cosine": 0.862790555751594,
      "eval_sts-dev_spearman_cosine": 0.8611115838346327,
      "step": 500
    }
  ],
  "logging_steps": 500,
  "max_steps": 1440,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 4,
  "save_steps": 100,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 0.0,
  "train_batch_size": 16,
  "trial_name": null,
  "trial_params": null
}

So, it looks like the evaluation results (e.g. "eval_sts-dev_pearson_cosine") are definitely saved in here, I think this is where the issue originated. Most of the other values are produced by transformers (which Sentence Transformers extends for the training), so if it breaks for us it would also have to break for transformers (but I haven't seen any issues about that yet).

  • Tom Aarsen

@belo-involead
Copy link
Author

Hey @tomaarsen ,

Apologies for the delay, Thanks for the detailed insight on the trainer_state.json and log_history. I dug into it a bit more, and it turns out the root of the issue is with how the evaluator results are handled in Sentence Transformers.

Specifically, in sentence_transformers/training/losses/MultipleNegativesRankingLoss.py, the call to trainer.evaluate(eval_dataset=evaluator) returns a dict of results, but unlike the Trainer.evaluate() call in Hugging Face transformers (which internally logs to log_history), Sentence Transformers doesn’t seem to manage logging for these evaluator results in the same way. As a result, when the trainer.log(output) line runs, it appends the full dict—including all evaluator metrics—directly into log_history. This causes the json.dump to eventually crash due to type serialization issues (e.g., if it encounters a tensor or other non-serializable type).
That said, I’m using transformers version 4.49.0

So essentially, this evaluator output is getting appended without being properly sanitized or logged via the standard Trainer mechanisms, leading to the JSON dumping error. That’s why we see extra entries in log_history with all the eval_sts-* metrics from the evaluator.

Let me know what you think

@tomaarsen
Copy link
Member

I'm a bit confused, sentence_transformers/training/losses/MultipleNegativesRankingLoss.py does not call trainer.evaluate, nor does that path exist (the training part doesn't exist). Also, it's not possible to call trainer.evaluate with an evaluator, that's only possible with an evaluation dataset. Also, the internal evaluation_loop should now all be sanitized, although I'm not 100% sure that they all are.

I'm also on transformers v4.49.0 - I haven't had issues yet.

Do you still get errors if you run model_distillation.py?

  • Tom Aarsen

@belo-involead
Copy link
Author

To clarify, I did run model_distillation.py on the latest master branch of Sentence Transformers using transformers v4.49.0, and I still hit the JSON serialization error. Here’s the relevant part of the traceback:

TypeError: Object of type float32 is not JSON serializable

Full traceback (trimmed for brevity):
...
File "trainer_callback.py", line 143, in save_to_json
json_string = json.dumps(dataclasses.asdict(self), indent=2, sort_keys=True) + "\n"
...
File "json/encoder.py", line 180, in default
raise TypeError(f'Object of type {o.class.name} '
TypeError: Object of type float32 is not JSON serializable

From what I can tell, the evaluator output still seems to be getting appended to log_history with non-serializable types (like float32), which triggers this crash during the JSON save.

Let me know if you'd like any more details from my end!

@tomaarsen
Copy link
Member

Hmm, that's tricky. I'm still not able to reproduce it. I just ran the script and it uploaded https://huggingface.co/tomaarsen/TinyBERT_L-4_H-312_v2-distilled-from-stsb-roberta-base-v2-new with 3.5.0.dev0 (i.e. master) and transformers v4.49.0.

  • Tom Aarsen

@belo-involead
Copy link
Author

thanks again for double-checking and running the script, it’s super helpful to know that it worked on your end and that you even uploaded the result, really appreciate that!

I was thinking if this could be an environment-specific edge case. I’m running this on Python 3.12.1 (inside a VSCode Dev Container workspace) and I’ve seen that json.dumps can sometimes be stricter about types in newer Python versions.

Do you think it’s possible that some evaluator outputs might still return float32 under certain conditions (maybe based on dataset or hardware)? I’m happy to try and dig into this further or test with an older Python version if you think it might help narrow it down.

@tomaarsen
Copy link
Member

I was able to reproduce it in Python 3.12, but only once for some reason. I think I was able to find the underlying issue, though. I'll open a PR in a bit.

  • Tom Aarsen

@belo-involead
Copy link
Author

Hey @tomaarsen really appreciate you taking the time to look into this and track it down! Makes sense now, especially with Python 3.12 being a bit stricter on JSON serialization. The fix in the PR looks clean and definitely more robust than my initial idea of patching save_to_json directly.

Thanks again for following up and opening the PR

@tomaarsen
Copy link
Member

Thanks for the update; I've merged the alternative fix that should hopefully work on all example scripts instead of just the distillation one. I'm closing this one under the expectation that your issue is now solved, but please do let me know if the issue persists even with the master version:

pip install git+https://github.com/UKPLab/sentence-transformers
  • Tom Aarsen

@tomaarsen tomaarsen closed this Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants