Read stanford configs #88

NohTow · 2025-02-04T13:53:25Z

Right now, when reading Stanford models, we use default values for the doc/query lengths, the markers, and whether to attend to expansion tokens.
This causes some issues, as highlighted in #85, and also requires the user to specify a lot of information when using a model that does not use the default values, as can be seen with the loading of Jina-ColBERT.

This PR simply adds reading of the artifact.metadata file of Stanford NLP models to get the markers, lengths, and attend-to-expansion-tokens values.

As usual, we override those if the user feeds values to the init of the model. I also changed the attend_to_expansion_tokens parameter to match the others (None by default, with the override applied at the end).
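To illustrate the read-then-override order described above, here is a minimal sketch, not the actual pylate code. The helper names (`load_metadata`, `resolve_setting`), the metadata keys (`query_maxlen`, `doc_maxlen`, `attend_to_mask_tokens`), and the fallback defaults are assumptions made for this example only. The point it tries to show is why `None` works better than `False` as the default: it lets an explicit `False` from the user be told apart from "not provided".

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_metadata(path: str) -> dict:
    """Read a JSON metadata file, returning an empty dict if it is missing."""
    file = Path(path)
    if not file.is_file():
        return {}
    with file.open(encoding="utf-8") as f:
        return json.load(f)


def resolve_setting(
    user_value: Optional[Any], metadata: dict, key: str, default: Any
) -> Any:
    """Pick the user value first, then the metadata value, then the default."""
    if user_value is not None:
        # An explicit user value always wins, even if it is False or 0.
        return user_value
    if key in metadata:
        return metadata[key]
    return default


# Hypothetical metadata content, as it might be read from the file.
metadata = {"query_maxlen": 64, "attend_to_mask_tokens": True}

# Not provided by the user: the value read from the metadata is used.
query_length = resolve_setting(None, metadata, "query_maxlen", 32)
# Explicit False from the user overrides the True found in the metadata.
attend = resolve_setting(False, metadata, "attend_to_mask_tokens", False)
```

With a `False` default instead of `None`, the second call could not tell "the user asked for False" from "the user did not pass anything", and the metadata value would either always or never be applied.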

NohTow · 2025-02-05T09:23:45Z

I think it should be ok @raphaelsty
Could you please review to make sure the behavior is in line with what's expected? I did some tests, but I might have missed something.
Note that it is based on #87, so we should merge #87 first.

sam-hey · 2025-02-05T09:43:08Z

It seems that my reviews are in Pending state, so they can't be seen at the moment. I have just one small remark:

I wouldn’t categorize this as a warning, as it’s expected behavior. It’s merely informing you that the StanfordNLP model has successfully loaded the weights.

pylate/pylate/models/colbert.py

Line 268 in f0899ab

logger.warning("Loaded the weights from Stanford NLP model.")

Similarly, I believe there should be a clear distinction between informational events and actual warnings. For instance, if there were an issue loading the file and it fell back to a default, that would merit a warning.

pylate/pylate/models/colbert.py

Lines 294 to 303 in f0899ab

        logger.warning("Loaded the configuration from Stanford NLP model.")
    except EnvironmentError:
        if self.query_prefix is None:
            self.query_prefix = "[unused0]"
        if self.document_prefix is None:
            self.document_prefix = "[unused1]"
        # We do not set the query/doc length as they'll be set to the default values afterwards. We do it for prefixes as the default from stanford is different from ours
        logger.warning(
            "Could not load the configuration file from Stanford NLP model, using default values."
        )
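The distinction suggested here (info for the expected path, warning only for the fallback) could look roughly like the sketch below. It is a standalone illustration using the standard `logging` module, not the pylate implementation; the function name `read_config_or_default` and the JSON file format are assumptions for the example.

```python
import json
import logging

logger = logging.getLogger(__name__)


def read_config_or_default(path: str, defaults: dict) -> dict:
    """Load a JSON config, falling back to the defaults if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    except OSError:
        # Something went wrong and we fell back: this merits a warning.
        logger.warning("Could not load %s, using default values.", path)
        return dict(defaults)
    # Expected behavior: purely informational.
    logger.info("Loaded the configuration from %s.", path)
    # Values found in the file take precedence over the defaults.
    return {**defaults, **config}
```

In Python 3, `EnvironmentError` is an alias of `OSError`, so catching either one covers a missing or unreadable file.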

NohTow · 2025-02-05T16:07:30Z

That is very fair. I don't know why I started using logger.warning; I think I used it for a proper warning once and then copied it over the whole loading logic, my bad.
I'll clean that up before merging, thanks for the remark!

NohTow · 2025-02-06T13:04:55Z

@sam-hey I changed some warnings to info
@raphaelsty you can merge after you have double checked the loading/overriding logic

raphaelsty · 2025-02-28T09:35:36Z

LGTM

NohTow mentioned this pull request Feb 4, 2025

ref: ruff python 3.8 #85

Closed

NohTow changed the title ~~[DRAFT] Read stanford configs~~ Read stanford configs Feb 5, 2025

NohTow force-pushed the read_stanford_configs branch from f0899ab to b957372 Compare February 6, 2025 10:00

NohTow added 4 commits February 6, 2025 10:36

Read stanford metadata

23c1b0c

Do not override attend_to_expansion_tokens to False by default if read from stanford

9c969d5

Normalize the overriding for query/doc prefixes

164d0d6

Update the comment about the behavior of reading/overriding

d7e868c

NohTow force-pushed the read_stanford_configs branch from b957372 to d7e868c Compare February 6, 2025 10:43

Change some warnings to info

ad0c70f

raphaelsty merged commit 11d18b8 into main Feb 28, 2025
5 checks passed

NohTow deleted the read_stanford_configs branch March 24, 2025 09:35

4 participants