
Conversation

@Krishna-Rani-t (Collaborator)

No description provided.

@HamedBabaei (Member) left a comment:

Hi @Krishna-Rani-t, the reviews are ready.

Please also check whether you are using pre-commit here, as I see a few linting issues.

from .rag import AutoRAGLearner
from .prompt import StandardizedPrompting
from .label_mapper import LabelMapper
from .taxonomy_discovery.rwthdbis import RWTHDBISSFTLearner as RWTHDBISTaxonomyLearner

Member:

Using "as" aliases here is not recommended, so I would recommend this way of importing:

from .taxonomy_discovery import RWTHDBISSFTLearner
  • don't add .rwthdbis
  • keep only the class name, renamed from RWTHDBISSFTLearner to RWTHDBISTaxonomyLearner at its definition (see the sketch below)
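
A minimal sketch of the suggested rename (file paths are assumptions based on the original import):

# learner/taxonomy_discovery/rwthdbis.py: rename at the definition site
class RWTHDBISTaxonomyLearner:  # formerly RWTHDBISSFTLearner
    ...

# learner/taxonomy_discovery/__init__.py: re-export so the package-level import works
from .rwthdbis import RWTHDBISTaxonomyLearner

# learner/__init__.py: plain import, no alias needed
from .taxonomy_discovery import RWTHDBISTaxonomyLearner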

from .prompt import StandardizedPrompting
from .label_mapper import LabelMapper
from .taxonomy_discovery.rwthdbis import RWTHDBISSFTLearner as RWTHDBISTaxonomyLearner
from .term_typing.rwthdbis import RWTHDBISSFTLearner as RWTHDBISTermTypingLearner

Member:

The previous comment applies to this line as well.

Member:

Remove the changes from here and add them to the text section of the PR.

Member:

Code-level documentation (functions' docstrings) is missing from this script!
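
For example, a minimal docstring sketch in the style I would expect (the argument semantics here are my assumptions):

def _term_typing(self, data: Any, test: bool = False) -> Optional[Any]:
    """Run term typing over the given dataset.

    Args:
        data: Input dataset to extract and type terms from.
        test: If True, predict types for evaluation terms instead of training.

    Returns:
        Structured predictions in test mode, otherwise None.
    """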

self,
min_predictions: int = 1,
model_name: str = "distilroberta-base",
output_dir: str = "./results/{model_name}",

Member:

Here it would be great to do:

        output_dir: str = "./results/taxonomy-discovery",

self.id2label: Dict[int, str] = {}
self.label2id: Dict[str, int] = {}

def _term_typing(self, data: Any, test: bool = False) -> Optional[Any]:

Member:

Please use if and else here for clarity.
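
A minimal sketch of the explicit branching (the non-test branch is an assumption on my part):

def _term_typing(self, data: Any, test: bool = False) -> Optional[Any]:
    if test:
        terms = self._collect_eval_terms(data)
        return self._predict_structured_output(terms)
    else:
        return self._fit(data)  # hypothetical training path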

terms = self._collect_eval_terms(data)
return self._predict_structured_output(terms)

def _load_robust_tokenizer(self, backbone: str) -> AutoTokenizer:

Member:

Why not just use AutoTokenizer?
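
For reference, a minimal sketch of the plain usage:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")  # or whatever backbone is passed in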

" - pip install --upgrade sentencepiece\n"
" - ensure network access for model files\n"
" - clear your HF cache and retry\n"
" - pin versions: transformers==4.43.*, tokenizers<0.20\n"

Member:

This conflicts with the Ontolearner requirements! So please try to remove this exception, as it is not well aligned with the library.

f"Original error: {final_err}"
)

def _expand_multilabel_training_rows(

Member:

These function arguments are not formatted consistently. Are you using pre-commit?

Member:

I see a few class functions are static; is there any specific reason for that? If so, the other class (in term typing) doesn't have that feature!

@HamedBabaei (Member):

Please also add unit tests for the models!
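
For instance, a minimal unittest sketch (the import path and constructor arguments are assumptions based on the snippets above):

import unittest

from ontolearner.learner.term_typing import RWTHDBISSFTLearner  # hypothetical path

class TestRWTHDBISTermTypingLearner(unittest.TestCase):
    def test_default_construction(self):
        # smoke test: the learner should build with its documented defaults
        learner = RWTHDBISSFTLearner(model_name="distilroberta-base")
        self.assertIsNotNone(learner)

if __name__ == "__main__":
    unittest.main()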

Member:

@Krishna-Rani-t I see this is becoming problematic! So here is the new idea:

Let's not import the models here! So imports like

from .taxonomy_discovery.skhnlp import SKHNLPSequentialFTLearner, SKHNLPZSLearner
from .taxonomy_discovery.sbunlp import SBUNLPFewShotLearner
from .term_typing.sbunlp import SBUNLPZSLearner
from .text2onto import SBUNLPFewShotLearner as SBUNLPText2OntoLearner

or similar will be removed from this __init__, and in ontolearner/__init__.py you DO NOT NEED to do the following imports:

(RWTHDBISTaxonomyLearner,
 RWTHDBISTermTypingLearner,
 SKHNLPZSLearner,
 SKHNLPSequentialFTLearner,
 SBUNLPFewShotLearner,
 SBUNLPZSLearner,
 SBUNLPText2OntoLearner)

In your examples, to load, say, SKHNLPZSLearner, you will do this:

from ontolearner.learner.taxonomy_discovery import SKHNLPZSLearner

and if you use the same class name inside learner/term_typing, it will be:

from ontolearner.learner.term_typing import SKHNLPZSLearner

@HamedBabaei (Member) left a comment:

Hi @Krishna-Rani-t, please check out the comments, and once you have made the fixes, please add a justification comment for each one.

Member:

Please add clear docstrings to the functions.

import os
import re
import json
import importlib.util

Member:

Why is the import being done in this format?


self.tokenizer: Optional[AutoTokenizer] = None
self.model: Optional[AutoModelForCausalLM] = None
self.device = "cuda" if torch.cuda.is_available() else "cpu"

Member:

The device should be passed by the user as an argument, and it would be great to have cpu as the default value!
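
A minimal sketch of the suggested signature (other parameters elided):

def __init__(self, device: str = "cpu") -> None:
    # device is user-supplied; default to cpu instead of auto-detecting cuda
    self.device = device

A caller who wants a GPU then opts in explicitly, e.g. LocalAutoLLM(device="cuda").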


self.train_pairs_clean: List[Dict[str, str]] = []

# ----------------------- small helpers ----------------------

Member:

Remove this small-helpers comment line and add notes about it inside the function docstrings.

if maybe_path:
os.makedirs(maybe_path, exist_ok=True)

# ---------------------- model load/gen ----------------------

Member:

Again, remove this comment line and move it inside the load function.

return decoded_outputs

# -----------------------------------------------------------------------------
# Main Learner: SBUNLPFewShotLearner (Task A Text2Onto)

Member:

These comments are not necessary!

# -----------------------------------------------------------------------------
# Concrete AutoLLM: local HF wrapper that follows the AutoLLM interface
# -----------------------------------------------------------------------------
class LocalAutoLLM(AutoLLM):

Member:

AutoLLM is doing the same work! Why not call this QuantizedAutoLLM instead of LocalAutoLLM? And you should probably move this to the place where AutoLLM is defined; here is not the correct place.

if self.device == "cpu":
self.model.to("cpu")

def generate(self, inputs: List[str], max_new_tokens: int = 64, temperature: float = 0.0, top_p: float = 1.0) -> List[str]:

Member:

Follow the AutoLLM generate argument pattern! Don't add args that are not in that function's signature; instead, inside __init__ it would be great to leave temperature and top_p as defaults.
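
A minimal sketch of that refactor, assuming the base signature is generate(inputs, max_new_tokens):

from typing import List

class QuantizedAutoLLM(AutoLLM):  # AutoLLM import omitted; name follows the earlier comment
    def __init__(self, temperature: float = 0.0, top_p: float = 1.0) -> None:
        # sampling settings become constructor defaults rather than generate() arguments
        self.temperature = temperature
        self.top_p = top_p

    def generate(self, inputs: List[str], max_new_tokens: int = 64) -> List[str]:
        # same argument pattern as AutoLLM.generate; self.temperature and self.top_p are used internally
        ...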

flattened_list = [a_match or b_match for a_match, b_match in quoted_matches]
return flattened_list

def _call_model_one(self, prompt: str, max_new_tokens: int = 120) -> str:

Member:

This is not good naming! Don't use names like _call_model_one; it is unclear.

predicted_pairs.add((doc_id, value.strip().lower()))
return predicted_pairs

def evaluate_extraction_f1(self, terms2doc_path: str, predicted_jsonl: str, key: str = "term") -> float:

Member:

Evaluation should not be defined here; there is a dedicated evaluation module in the library. This should be defined there if it is different from the others!
