
Conversation

@sam-hey
Contributor

@sam-hey sam-hey commented Mar 30, 2025

@NohTow
Collaborator

NohTow commented Mar 31, 2025

add cache to ci to reduce errors when loading files from hf
Thank god, I had to rerun a bunch of tests just because of this lately...
fix: cached_contrastive.py #106
I have two questions regarding this fix:

  1. Did you test whether training was sane on MPS devices? The affected line only exists to make the different forward passes on the same elements equivalent, by fixing the randomness state (e.g., for dropout), but I wonder how it behaves when this is ignored. If it does not work properly, I would rather we find a solution or raise an error when using MPS devices.
  2. I am pretty bad at handling MPS right now; is there any setup where torch.backends.mps.is_available() could return true while other kinds of devices are available/are actually being used for training?
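For context, the randomness fix under discussion can be illustrated on CPU: saving the RNG state before the first forward pass and restoring it before the second makes stochastic layers such as dropout draw the same mask both times, so the two passes are numerically identical. This is only a minimal sketch of the general technique (CPU RNG only), not the actual cached_contrastive.py code:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Dropout(p=0.5))
model.train()  # keep dropout active
x = torch.randn(2, 8)

# Save the CPU RNG state before the first (no-grad) pass...
state = torch.get_rng_state()
with torch.no_grad():
    out1 = model(x)

# ...and restore it before the second pass, so dropout samples
# the exact same mask and both outputs match.
torch.set_rng_state(state)
out2 = model(x)

assert torch.allclose(out1, out2)
```

On CUDA or MPS the per-device RNG state would need the same save/restore treatment, which is where device-specific behavior comes in.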

update model_card to use huggingface/sentence-transformers#3253

Thank you for handling the PR on ST and also adding it here! FWIU what you did should work (did you test it?). The only nitpick I have: we should now remove the generate_model_card function overriding. Even though you already removed its usage, so the behavior is already what we want, it is cleaner to remove it entirely so people don't think we override it!

chor: st to 4.0.1

I suppose this is because the PR has been included in this version, which is why you want to do the upgrade in this PR.
That being said, although the major version change is mostly due to the reranker part, and I believe Tom told me there shouldn't be any breaking changes, maybe I should run some sanity checks on the trainings to make sure everything is correct before merging. Do you need this to be merged quickly?

Edit: also, obviously thanks for the contributions, really appreciate it!

@sam-hey
Contributor Author

sam-hey commented Mar 31, 2025

Always glad to help!

I'm not too sure about the MPS implementation: the values looked OK, but I can't guarantee their correctness. Therefore, I switched to raising an error for MPS.

I am pretty bad at handling MPS right now; is there any setup where torch.backends.mps.is_available() could return true while other kinds of devices are available/are actually being used for training?

Devices with MPS never have any other graphics card available.
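A minimal sketch of the kind of guard described above (the function name and message are hypothetical, not the actual PyLate code): fail fast when MPS is the backend rather than train with unverified numerics.

```python
import torch


def assert_supported_device() -> None:
    # Hypothetical guard: cached contrastive training relies on
    # replaying RNG state across forward passes, which is untested
    # on Apple MPS, so raise instead of training silently.
    if torch.backends.mps.is_available():
        raise ValueError(
            "Cached contrastive loss is not supported on MPS devices; "
            "use a CPU or CUDA device instead."
        )
```

Since Apple Silicon machines expose no other GPU backend, checking torch.backends.mps.is_available() alone is enough to know MPS would be picked for training.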

Thank you for handling the PR on ST and also adding it here! FWIU what you did should work (did you test it?)

Yes, I did:

model = models.ColBERT(
    model_name_or_path="colbert-ir/colbertv2.0",
    model_card_data=PylateModelCardData(language="de", model_name="testing"),
)
model.push_to_hub(
    "samheym/pylate-test", private=True, train_datasets=[], exist_ok=True
)

Do you need this to be merged quickly?

No, I have time to wait for this. Just out of curiosity, what are you checking manually? Is there any option to include these tests in the CI?

@NohTow
Collaborator

NohTow commented Mar 31, 2025

No, I have time to wait for this. Just out of curiosity, what are you checking manually? Is there any option to include these tests in the CI?

Not much, just launching some larger-scale training (nothing fancy, just MS MARCO) and checking the results, to make sure there isn't any huge regression.
I think a proper training run added to the tests would slow everything down even more, and there is very little chance the training is degraded without crashing, but I am a bit paranoid!
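For illustration, the cheapest kind of CI training check would be a smoke test along these lines (a hypothetical sketch, unrelated to PyLate's actual test suite); as noted above, it can only catch crashes and non-finite losses, not silent quality regressions:

```python
import torch


def test_training_smoke() -> None:
    # Hypothetical smoke test: one optimizer step on a tiny model
    # with random data. Asserting that the loss decreases would be
    # flaky, so only assert that it stays finite.
    model = torch.nn.Linear(4, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    assert torch.isfinite(loss)
```

Catching an actual quality regression still requires the kind of manual larger-scale run described above.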

@NohTow
Collaborator

NohTow commented Apr 7, 2025

Hello,
I did some tests and it seems to work fine!
Could you just change the ST version directly to 4.0.2? It introduces a fix for multi-GPU training.

@tomaarsen
Collaborator

Hello!

I didn't investigate this PR with much care, but I do want to share that v4.0 was designed not to introduce anything breaking, especially if you don't use CrossEncoder (training). Nice work here @sam-hey!

  • Tom Aarsen

@NohTow
Collaborator

NohTow commented Apr 7, 2025

Yeah, no, it's just me being paranoid, but I did some unrelated training and took the opportunity to test the memory fix/sanity-check the training, and it's fine!
(but thanks for being cautious with releases and pinging us when there might be breaking changes, really appreciated!)

Co-authored-by: Antoine Chaffin <[email protected]>
@sam-hey
Contributor Author

sam-hey commented Apr 8, 2025

Thanks a lot to you @tomaarsen @NohTow — it’s always a pleasure working with you!

Best
Sam

@NohTow
Collaborator

NohTow commented Apr 8, 2025

LGTM thanks!

@NohTow NohTow merged commit 5940bc7 into lightonai:main Apr 8, 2025
9 checks passed
@sam-hey sam-hey deleted the ref/model_card branch April 8, 2025 12:47