Indexer / Clustering hyperparameter tuning #238

codemonk2023 · 2023-06-27T05:46:58Z

Hello,

Thank you for the wonderful work. I am training PECOS on a custom dataset and would like to understand the indexer hyperparameters (examples below). Is there a description of the hyperparameters one can explore? It would also help to understand what each parameter does.

train_params.preliminary_indexer_params.max_leaf_size = 380
train_params.preliminary_indexer_params.nr_splits = 2
train_params.refined_indexer_params.nr_splits = 2
train_params.ranker_params.nr_splits = 2
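To get a feel for what max_leaf_size and nr_splits control, here is a back-of-envelope estimate of the label-tree shape they imply. It assumes an idealized balanced tree in which every node splits into nr_splits children until each leaf holds at most max_leaf_size labels. The function name and the 100,000-label dataset are hypothetical, and PECOS's actual clustering may differ in detail.

```python
# Back-of-envelope estimate of the label-tree shape implied by nr_splits and
# max_leaf_size. ASSUMPTION: an idealized balanced tree where every node splits
# into nr_splits children until each leaf holds at most max_leaf_size labels.
# This is an illustration, not PECOS's actual clustering code.

def tree_shape(num_labels: int, nr_splits: int, max_leaf_size: int) -> dict:
    """Return an approximate depth, per-level cluster counts and leaf size."""
    if nr_splits < 2 or max_leaf_size < 1:
        raise ValueError("need nr_splits >= 2 and max_leaf_size >= 1")
    depth = 0
    # Keep adding levels until the leaves are small enough (integer arithmetic
    # avoids floating-point rounding in a log-based formula).
    while nr_splits ** depth * max_leaf_size < num_labels:
        depth += 1
    num_leaves = nr_splits ** depth
    return {
        "depth": depth,
        "clusters_per_level": [nr_splits ** d for d in range(1, depth + 1)],
        "num_leaves": num_leaves,
        # Ceiling division: size of the largest leaf under a perfectly even split.
        "labels_per_leaf": -(-num_labels // num_leaves),
    }


if __name__ == "__main__":
    # Hypothetical dataset with 100,000 labels and max_leaf_size = 380.
    for splits in (2, 16):
        print(splits, tree_shape(100_000, splits, 380))
```

Under these assumptions, nr_splits = 2 with max_leaf_size = 380 gives a 9-level tree with 512 leaves of about 196 labels each, while nr_splits = 16 gives 3 levels with 4,096 leaves of about 25 labels each: a trade-off between many narrow levels and a few wide ones.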

codemonk2023 · 2023-06-27T05:50:37Z

Also, can you point out the difference between preliminary_indexer_params and refined_indexer_params?

jiong-zhang · 2023-06-28T22:54:31Z

> Also, can you point out the difference between preliminary_indexer_params and refined_indexer_params?

XR-Transformer constructs two hierarchical label trees (HLTs):

The preliminary HLT is built from sparse features only and is used to construct the multi-resolution fine-tuning targets, so it is used only when training the encoder. This is controlled by preliminary_indexer_params.
The refined HLT is built from both sparse and dense features and is used to train the rankers. This is controlled by refined_indexer_params.
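A rough illustration of the difference between the two trees is sketched below: both are built by recursively splitting label embeddings, and only the input features change. Several assumptions apply: features are plain Python lists, labels are embedded with PIFA (the normalized sum of the normalized features of each label's positive instances, a common choice in this line of work), the "refined" input is simply the per-instance concatenation of normalized sparse and dense blocks, and only 2-way splits (the nr_splits = 2 case) are shown. All function names are hypothetical; this is not PECOS's implementation.

```python
import math
import random


def normalize(v):
    """Scale v to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concat_features(sparse_X, dense_X):
    """Refined-HLT idea (assumed): normalize each block, then concatenate."""
    return [normalize(s) + normalize(d) for s, d in zip(sparse_X, dense_X)]


def pifa(X, Y, num_labels):
    """Label embedding = normalized sum of the normalized features of its positives."""
    emb = [[0.0] * len(X[0]) for _ in range(num_labels)]
    for x, labels in zip(X, Y):
        xn = normalize(x)
        for label in labels:
            emb[label] = [a + b for a, b in zip(emb[label], xn)]
    return [normalize(e) for e in emb]


def balanced_split(ids, emb, rng, iters=5):
    """Split ids into two equal-size halves with spherical 2-means."""
    # Seed one centroid at a random label and the other at the label least
    # similar to it, so the two seeds start far apart.
    first = rng.choice(ids)
    second = min((i for i in ids if i != first), key=lambda i: dot(emb[i], emb[first]))
    c0, c1 = emb[first], emb[second]
    for _ in range(iters):
        # Order labels from "closest to c0" to "closest to c1" and cut in the
        # middle, which keeps the two children balanced.
        order = sorted(ids, key=lambda i: dot(emb[i], c1) - dot(emb[i], c0))
        half = len(order) // 2
        left, right = order[:half], order[half:]
        c0 = normalize([sum(col) for col in zip(*(emb[i] for i in left))])
        c1 = normalize([sum(col) for col in zip(*(emb[i] for i in right))])
    return left, right


def build_hlt(emb, max_leaf_size, seed=0):
    """Return the leaves (lists of label ids) of a binary label tree."""
    if max_leaf_size < 1:
        raise ValueError("max_leaf_size must be >= 1")
    rng = random.Random(seed)
    stack, leaves = [list(range(len(emb)))], []
    while stack:
        ids = stack.pop()
        if len(ids) <= max_leaf_size:
            leaves.append(ids)
        else:
            stack.extend(balanced_split(ids, emb, rng))
    return leaves


if __name__ == "__main__":
    # Toy data: 4 instances with one label each; TF-IDF-like rows stored densely.
    tfidf = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1], [0, 0.1, 0.9]]
    dense = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1], [0.8, 0.2]]
    Y = [{0}, {1}, {2}, {3}]
    print("preliminary:", build_hlt(pifa(tfidf, Y, 4), max_leaf_size=2))
    print("refined:", build_hlt(pifa(concat_features(tfidf, dense), Y, 4), max_leaf_size=2))
```

In this toy the two trees agree; on real data the dense block (e.g., fine-tuned encoder embeddings) can move labels that look similar only in their TF-IDF vectors into different clusters.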

jiong-zhang · 2023-06-28T22:54:34Z

For documentation on the hyperparameters, see the doc.

codemonk2023 · 2023-06-29T01:34:36Z

Is it correct to understand "sparse" as TF-IDF features and "dense" as something like word embeddings/vectors? My precision@5 is in the 60s and my recall is in the 80s, so my understanding is that the ranker parameters need to be tuned to improve precision. Is that correct?
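For reference, precision@k and recall@k over ranked predictions can be computed as in the sketch below (a hypothetical helper that assumes each instance has a best-first list of predicted label ids and a set of true label ids). It also shows one reason P@5 can sit well below R@5 regardless of tuning: an instance with fewer than 5 true labels cannot reach P@5 = 1; with 3 true labels the maximum is 3/5 = 0.6.

```python
def precision_recall_at_k(ranked, truth, k):
    """Average P@k and R@k over instances.

    ranked: per-instance lists of predicted label ids, best first.
    truth:  per-instance sets of true label ids.
    """
    if not ranked:
        return 0.0, 0.0
    p_sum, r_sum, r_count = 0.0, 0.0, 0
    for preds, gold in zip(ranked, truth):
        hits = len(set(preds[:k]) & set(gold))
        p_sum += hits / k           # precision always divides by k
        if gold:                    # recall is undefined without true labels
            r_sum += hits / len(gold)
            r_count += 1
    return p_sum / len(ranked), (r_sum / r_count if r_count else 0.0)


if __name__ == "__main__":
    # An instance with only 3 true labels, all retrieved in the top 5.
    print(precision_recall_at_k([[1, 2, 3, 4, 5]], [{1, 2, 3}], 5))
```

Comparing your P@5 with the average number of true labels per instance helps tell how much of the gap is attributable to the model rather than to the metric's ceiling.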

> The refined HLT is built from both sparse and dense features and is used to train the rankers. This is controlled by refined_indexer_params.
My understanding is that this clustering is used to select negative samples for training the ranker; please correct me if I'm wrong.
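To make the question concrete, the sketch below shows one common cluster-based scheme, often called teacher-forced negatives: the ranker for a label is trained only on instances routed to that label's leaf cluster (those with at least one true label in it), and routed instances lacking the label serve as negatives. This is an assumption used for illustration, not a confirmation of how PECOS selects negatives; `leaf_of` and `tfn_training_sets` are hypothetical names.

```python
def tfn_training_sets(Y, leaf_of):
    """For every label, return (positive instance ids, negative instance ids).

    Y:       per-instance sets of true label ids.
    leaf_of: dict mapping label id -> leaf cluster id (hypothetical input).
    """
    # Instances "routed" to a cluster are those with at least one true label in it.
    routed = {}
    for idx, labels in enumerate(Y):
        for cluster in {leaf_of[label] for label in labels}:
            routed.setdefault(cluster, set()).add(idx)
    out = {}
    for label, cluster in leaf_of.items():
        candidates = routed.get(cluster, set())
        pos = {i for i in candidates if label in Y[i]}
        # Negatives come only from the same cluster, so they tend to be "hard".
        out[label] = (sorted(pos), sorted(candidates - pos))
    return out


if __name__ == "__main__":
    leaf_of = {0: "A", 1: "A", 2: "B", 3: "B"}
    print(tfn_training_sets([{0}, {1}, {2}, {0, 3}], leaf_of))
```

Because negatives are restricted to the label's own cluster, how labels are grouped (and therefore refined_indexer_params) directly shapes what each ranker learns to discriminate.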

By "encoder", do you mean the machine-learning matcher or the model that encodes the text?

Also, what would you recommend for choosing the number of splits in the clustering for the matcher and the ranker? What would be a good way to debug precision errors, e.g., by printing the clusters? Is it possible to test the ranker alone programmatically, or to test the indexer, matcher, and ranker separately, to understand what to tune?
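One way to separate the stages when debugging precision is to classify every true label missing from the top k: if its leaf cluster was never retrieved by the matcher, it is a matcher miss; if the cluster was retrieved but the label still ranked too low, it is a ranker miss. The sketch assumes you can obtain, per instance, the set of leaf clusters kept by the matcher's beam; how to extract that from PECOS is not shown, and all names here are hypothetical.

```python
from collections import Counter


def attribute_misses(truth, ranked, beam_clusters, leaf_of, k):
    """Count hits, matcher misses and ranker misses over all true labels.

    truth:         per-instance sets of true label ids.
    ranked:        per-instance final ranked label ids, best first.
    beam_clusters: per-instance sets of leaf clusters kept by the matcher.
    leaf_of:       dict mapping label id -> leaf cluster id.
    """
    counts = Counter()
    for gold, preds, beam in zip(truth, ranked, beam_clusters):
        top = set(preds[:k])
        for label in gold:
            if label in top:
                counts["hit"] += 1
            elif leaf_of[label] not in beam:
                counts["matcher_miss"] += 1   # cluster never retrieved
            else:
                counts["ranker_miss"] += 1    # cluster retrieved, label ranked too low
    return counts


def print_clusters(leaf_of, names):
    """Print label names grouped by leaf cluster, to eyeball cluster quality."""
    groups = {}
    for label, cluster in leaf_of.items():
        groups.setdefault(cluster, []).append(names[label])
    for cluster in sorted(groups):
        print(cluster, sorted(groups[cluster]))
```

A large share of matcher misses would point toward the clustering and matcher settings (e.g., the indexer parameters or beam size), while mostly ranker misses would point toward the ranker; printing clusters helps check whether semantically unrelated labels are being grouped together.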
