Hmmsearch callback tqdm update #60

Closed
jpjarnoux opened this issue Mar 4, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@jpjarnoux

Hi,
I have a question about the callback in the hmmsearch function. I would like to update my progress bar after each query, but my code does not work as expected.

    import pyhmmer
    from tqdm import tqdm

    bar = tqdm(range(len(hmm_list)), unit="hmm", desc="Align gene families to HMM", disable=disable_bar)
    options = {"bit_cutoffs": bit_cutoffs, "callback": lambda p: bar.update()}
    for top_hits in pyhmmer.hmmsearch(hmm_list, gf_sequences, cpus=threads, **options):
        ...  # process top_hits

Maybe I do not understand how to use it.

For the time being, I update the bar manually at the end of the for loop, but I would also like to use the callback to write the name of each HMM in a debug message (with the logging package). So defining a callback function seems like a good idea.

Thanks for your help

@althonos
Owner

althonos commented Mar 4, 2024

Hi Jérôme,

The callback needs to take two arguments: the HMM object and the total number of HMMs loaded so far. The total is useful when you're reading the HMMs from a file, since it isn't known in advance and the callback lets you update it (tqdm doesn't support that, but rich does).

In your snippet, that means:

options = {"bit_cutoffs": bit_cutoffs, "callback": lambda hmm, total: bar.update()}

If you use only one argument, as you did, the callback raises an exception and the progress bar is never updated. Worse, since the exception is silenced, the code enters a deadlock: the worker threads die on the exception while the main thread keeps trying to pass them queries to process.
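A minimal sketch of a callback that makes use of both arguments, assuming only what is described above: it is called once per finished query, with the query and the number of queries loaded so far. `SimpleBar` and `make_callback` are hypothetical names, not part of pyhmmer or tqdm, and the loop at the end merely imitates the calls a search would make.

```python
import logging


class SimpleBar:
    """Hypothetical progress bar with a mutable total (stand-in for rich/tqdm)."""

    def __init__(self):
        self.completed = 0
        self.total = None

    def update(self, advance=1, total=None):
        # Accept a new total whenever the caller knows more than before.
        if total is not None:
            self.total = total
        self.completed += advance


def make_callback(bar):
    """Return a two-argument callback with the (query, total) signature."""

    def callback(hmm, total):
        # `hmm` is the query that just finished; `total` is how many queries
        # have been loaded so far (it may grow when streaming from a file).
        logging.debug("Finished HMM %s", hmm)
        bar.update(advance=1, total=total)

    return callback


# Imitate the calls made while queries are still being loaded.
bar = SimpleBar()
callback = make_callback(bar)
for loaded, name in enumerate(["PF00001", "PF00002", "PF00003"], start=1):
    callback(name, loaded)

print(bar.completed, bar.total)  # 3 3
```

Returning the callback from a small factory keeps the progress bar out of global scope, which is convenient when the search runs inside a function.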

@althonos
Owner

althonos commented Mar 4, 2024

I've patched the deadlock, so with your original one-argument callback you now get the actual error and traceback:

  0%|                                                | 0/20795 [00:00<?, ?hmm/s]
Traceback (most recent call last):
  File "/home/althonos/Code/pyhmmer/issue.py", line 18, in <module>
    for top_hits in pyhmmer.hmmsearch(hmms, sequences, cpus=2, callback=callback):
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 520, in _multi_threaded
    yield results[0].get()
          ^^^^^^^^^^^^^^^^
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 122, in get
    raise self.exception
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 215, in run
    hits = self.process(chore.query)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/althonos/Code/pyhmmer/pyhmmer/hmmer.py", line 232, in process
    self.callback(query, self.query_count.value)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: <lambda>() takes 1 positional argument but 2 were given

I'll publish a patch for the deadlock shortly, but you don't need to wait for it: just change the callback signature and your code will work 👍
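Because a wrong signature only surfaces inside the worker threads, it can help to check the callback before starting the search. The sketch below uses the standard library's `inspect.signature(...).bind` to verify that a callable accepts two positional arguments `(query, total)`; `check_callback` is a hypothetical helper, not part of pyhmmer.

```python
import inspect


def check_callback(callback):
    """Raise TypeError early if `callback` can't be called as callback(query, total)."""
    try:
        # bind() only matches the arguments against the signature; nothing is called.
        inspect.signature(callback).bind(object(), 0)
    except TypeError as err:
        raise TypeError(
            f"callback must accept (query, total) positional arguments: {err}"
        ) from None


check_callback(lambda hmm, total: None)  # accepted

try:
    check_callback(lambda p: None)  # the one-argument version from the question
except TypeError as err:
    print(err)
```

Callables taking `*args` also pass the check, since they can bind any number of positional arguments.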

@althonos althonos added the bug Something isn't working label Mar 4, 2024
@althonos althonos closed this as completed Mar 4, 2024
@jpjarnoux
Author

Hi, thank you very much for your quick reply.
In my case, I have only one HMM per file, so I assume I can use the length of my list of pyhmmer.plan7.HMM objects as the total number of HMMs.
Could you tell me how to get the HMM object from a TopHits or Hit object? It's not clear to me.

@althonos
Owner

althonos commented Mar 4, 2024

You basically have two choices:

  • The callback function has signature callback(query, total), and in the case of hmmsearch the query is the HMM object, so you could do the following:

    import logging
    import pyhmmer

    def callback(hmm, total):
        # hmm is the query that just finished; total is the number of HMMs loaded so far
        logging.info("Finished annotation with HMM %s", hmm.name.decode())
        pbar.update()

    for top_hits in pyhmmer.hmmsearch(hmms, sequences, callback=callback):
        ...  # process top_hits
  • The hmmsearch function is guaranteed to return one TopHits object per query, in the same order, so you can simply zip your queries with the returned TopHits:

    for hmm, top_hits in zip(hmms, pyhmmer.hmmsearch(hmms, sequences)):
        logging.info("Finished annotation with HMM %s", hmm.name.decode())
        ...  # process top_hits
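Both approaches can be tried without pyhmmer installed. In the sketch below, `fake_hmmsearch` is a hypothetical stand-in that, like hmmsearch as described above, calls `callback(query, total)` for each query and yields one result per query in input order (here `total` simply counts the queries seen so far). The HMMs are `SimpleNamespace` objects with a bytes `name`, mirroring the `hmm.name.decode()` calls above, and messages are logged at DEBUG level as asked in the question.

```python
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.DEBUG)


def fake_hmmsearch(queries, callback=None):
    """Hypothetical stand-in: one result per query, in the same order."""
    for total, query in enumerate(queries, start=1):
        result = f"hits for {query.name.decode()}"
        if callback is not None:
            callback(query, total)
        yield result


hmms = [SimpleNamespace(name=b"PF00001"), SimpleNamespace(name=b"PF00002")]

# Option 1: a two-argument callback that logs each finished HMM.
seen = []

def callback(hmm, total):
    logging.debug("Finished annotation with HMM %s", hmm.name.decode())
    seen.append(hmm.name.decode())

results = list(fake_hmmsearch(hmms, callback=callback))

# Option 2: zip the queries with the results, relying on the ordering guarantee.
for hmm, top_hits in zip(hmms, fake_hmmsearch(hmms)):
    logging.debug("%s -> %s", hmm.name.decode(), top_hits)
```

The zip approach needs no callback at all, so it is the simpler choice when you only need the query name alongside its hits.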

@jpjarnoux jpjarnoux changed the title from "Hmmsearch callback tadm update" to "Hmmsearch callback tqdm update" Apr 25, 2024