TypeError: __init__() missing 1 required positional argument: 'text' while loading the model #64

Closed
Siddhijain16 opened this issue Apr 11, 2022 · 5 comments

@Siddhijain16

Hi @jantrienes ,
I am following the steps mentioned in README.md, but I am not able to get any results.
While annotating documents with the loaded model, I get the error below. Please help me solve it.

# Select downloaded model
model = './model_bilstmcrf_ons_large-v0.2.0/final-model.pt'

# Instantiate tokenizer
tokenizer = TokenizerFactory().tokenizer(corpus='ons', disable=("tagger", "ner"))

Output ==> 2022-04-11 10:22:37.046 | INFO | deidentify.tokenizer.base:tokenizer:36 - Tokenizer for corpus: ons

# Load tagger with a downloaded model file and tokenizer
tagger = FlairTagger(model=model, tokenizer=tokenizer, verbose=False)

# Annotate your documents
annotated_docs = tagger.annotate(documents)

Output ==>
2022-04-11 10:30:37.252 | INFO | deidentify.taggers.flair_tagger:__init__:20 - Load flair model from /content/drive/MyDrive/47billion/model_bilstmcrf_ons_large-v0.2.0/final-model.pt

2022-04-11 10:30:37,257 loading file /content/drive/MyDrive/47billion/model_bilstmcrf_ons_large-v0.2.0/final-model.pt
2022-04-11 10:30:59,342 SequenceTagger predicts: Dictionary with 33 tags: <unk>, O, B-Date, B-Name, I-Name, B-Hospital, B-Internal_Location, B-Care_Institute, B-Initials, B-Organization_Company, I-Organization_Company, I-Date, B-ID, I-ID, I-Care_Institute, B-Address, I-Address, B-Age, I-Age, I-Hospital, I-Internal_Location, B-Phone_fax, B-Profession, I-Profession, I-Initials, B-Other, I-Other, B-URL_IP, I-Phone_fax, B-Email, B-SSN, <START>, <STOP>

2022-04-11 10:31:00.242 | INFO | deidentify.taggers.flair_tagger:__init__:22 - Finish loading flair model.


TypeError Traceback (most recent call last)

in ()
2 tagger = FlairTagger(model=model, tokenizer=tokenizer, verbose=False)
3 # Annotate your documents
----> 4 annotated_docs = tagger.annotate(documents)

1 frames

/usr/local/lib/python3.7/dist-packages/deidentify/methods/bilstmcrf/flair_utils.py in standoff_to_flair_sents(docs, tokenizer, verbose)
59 flair_sents = []
60 for sent in sents:
---> 61 flair_sent = Sentence()
62 for token in sent:
63 if token.text.isspace():

TypeError: __init__() missing 1 required positional argument: 'text'
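
For readers unfamiliar with this kind of error, here is a minimal sketch of the mechanism the traceback points to: a constructor that requires `text` is called with no arguments. `SentenceStandIn` is a hypothetical stand-in written for illustration only; it is not flair's `Sentence` class and does not reproduce its real signature.

```python
class SentenceStandIn:
    """Hypothetical stand-in for illustration, NOT flair's real Sentence class."""

    def __init__(self, text):
        # `text` has no default value, so callers must always pass it.
        self.text = text


def try_empty_constructor():
    """Call the constructor with no arguments and return the error message, if any."""
    try:
        SentenceStandIn()
    except TypeError as err:
        return str(err)
    return None


message = try_empty_constructor()
print(message)
```

The printed message ends with `missing 1 required positional argument: 'text'`, matching the error above; the prefix before `__init__()` differs between Python versions.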

@jantrienes
Collaborator

Hi @Siddhijain16, it appears that the flair sentence API changed with the recent release v0.11. Could you please try to downgrade the flair version for now?

pip install -U "flair<0.11"

You can also use the dependencies listed here: https://github.com/nedap/deidentify/blob/master/environment.yml
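
To confirm that the downgrade took effect before loading the tagger, a small check like the following may help. It is only a sketch: the helper names (`parse_version`, `is_below`, `installed_flair_is_supported`) are made up for this example, and it assumes Python 3.8 or newer for `importlib.metadata` (the traceback above shows Python 3.7, where the `importlib_metadata` backport would be needed instead).

```python
import re
from importlib import metadata  # assumption: Python 3.8+


def parse_version(version):
    """Turn '0.10' or '0.11.3' into a tuple of ints, stopping at any non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_below(version, limit):
    """Numeric comparison, so that '0.9' and '0.10' both sort below '0.11'."""
    return parse_version(version) < parse_version(limit)


def installed_flair_is_supported(limit="0.11"):
    """Return True/False for the installed flair, or None if flair is not installed."""
    try:
        installed = metadata.version("flair")
    except metadata.PackageNotFoundError:
        return None
    return is_below(installed, limit)


print("flair below 0.11:", installed_flair_is_supported())
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "0.9" above "0.11".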

@Siddhijain16
Author

Thank you @jantrienes for your quick reply. It's working now.
I have one more question, about English text. When I pass English text, the annotations I get are not good. Any suggestions on how I can get good results on English text?

@jantrienes
Collaborator

@Siddhijain16 at this time, the pre-trained models are only available for Dutch. However, it's not too complicated to train an English deidentify model yourself. You can take a look here for pointers on how to get the English data (i2b2 and nursing notes) and how to train the models.

@Siddhijain16
Author

Thank you @jantrienes for your help. I will try that!

@jantrienes
Collaborator

@Siddhijain16 the original issue should be fixed with deidentify==0.7.3. I will close this for now.
