feat: add batch evaluation method for pipelines #2942
Conversation
Co-authored-by: Agnieszka Marzec <[email protected]>
@agnieszka-m Thank you for the detailed feedback. All the change requests are addressed now.
tstadel
left a comment
This already looks quite nice! I left some comments that I think need to be fixed before merging:
- there seems to be a small problem with `add_doc_meta_data_to_answer` within the reader implementation
- adding sas values and sorting the dataframe should not both be done in a method called `_add_sas_to_eval_result` (see the sketch after this exchange)
- there are some duplicate tests that can be deleted (if I didn't overlook something)
I think it's fine not to have them here, as they are only needed by non-standard pipelines so far.
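For illustration, the separation of concerns asked for in the second bullet could look roughly like this. This is a sketch only; the helper names, arguments, and the sort key are assumptions, not the actual implementation in the PR:

```python
from typing import Dict

import pandas as pd


def add_sas_to_eval_result(node_results: Dict[str, pd.DataFrame], sas_scores: Dict[str, pd.Series]) -> None:
    """Attach precomputed semantic answer similarity (SAS) scores and nothing else."""
    for node_name, scores in sas_scores.items():
        node_results[node_name]["sas"] = scores


def sort_eval_result(node_results: Dict[str, pd.DataFrame], sort_by: str = "rank") -> None:
    """Sort each node's dataframe in a separate, clearly named step."""
    for node_name, df in node_results.items():
        node_results[node_name] = df.sort_values(by=sort_by)
```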
vblagoje
left a comment
Left some minor comments, @julian-risch. Not sure if they warrant any changes, but I'll leave the state in "Request changes".
top_k: Optional[int] = None,
batch_size: Optional[int] = None,
labels: Optional[List[MultiLabel]] = None,
add_isolated_node_eval: bool = False,
Do we need the parameter `add_isolated_node_eval`? As a user of this API, it wasn't immediately clear to me what it is about and why we need it.
Yes, we need it. It's the same parameter as in the standard `run()`. If it is set to `True`, the evaluation is executed with labels as node inputs in addition to the integrated evaluation, where the node inputs are the outputs of the previous node in the pipeline.
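For context, a minimal usage sketch of the new method with and without isolated node evaluation. It assumes `reader`, `retriever`, and a list of `MultiLabel` objects named `eval_labels` already exist; the `eval_mode` argument of `calculate_metrics` is used here as in the existing `eval()` workflow:

```python
from haystack.pipelines import ExtractiveQAPipeline

# Assumes reader, retriever, and eval_labels (List[MultiLabel]) are already set up.
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# Integrated evaluation only: every node gets the previous node's output as input.
eval_result = pipeline.eval_batch(
    labels=eval_labels,
    params={"Retriever": {"top_k": 5}},
)

# Additionally run the isolated evaluation: nodes also get the gold labels as
# inputs, e.g. the reader is evaluated on perfect retrieval results.
eval_result_isolated = pipeline.eval_batch(
    labels=eval_labels,
    params={"Retriever": {"top_k": 5}},
    add_isolated_node_eval=True,
)

integrated_metrics = eval_result_isolated.calculate_metrics(eval_mode="integrated")
isolated_metrics = eval_result_isolated.calculate_metrics(eval_mode="isolated")
```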
params: Optional[dict] = None,
sas_model_name_or_path: Optional[str] = None,
sas_batch_size: int = 32,
sas_use_gpu: bool = True,
context_matching_boost_split_overlaps=context_matching_boost_split_overlaps,
context_matching_min_length=context_matching_min_length,
context_matching_threshold=context_matching_threshold,
)
These are the same parameters for both function calls. Not sure if it makes sense to create a dict and then unpack it in the two method calls; maybe that's bad practice, but it would make the code more compact.
Interesting thought. In this case, I would leave it as is because it's part of the user-facing function `pipeline.eval_batch` that users can call directly, and it occurs only twice here. I think listing all the parameters is more intuitive and easier to understand for users than having some dictionary/custom data structure that they first need to understand. If it were a function used only internally, or if it occurred much more often, we could make it more compact with your suggestion, yes, I agree.
I agree, good points.
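For reference, the rejected compact variant would look roughly like this; the function name and the argument values below are placeholders, not the actual code in `base.py`:

```python
# Placeholder standing in for the two real calls that share these arguments.
def build_eval_dataframe(queries, labels, **context_matching_kwargs):
    print(queries, labels, context_matching_kwargs)


# Collect the shared keyword arguments once ...
shared_kwargs = {
    "context_matching_min_length": 100,
    "context_matching_boost_split_overlaps": True,
    "context_matching_threshold": 65.0,
}

# ... and unpack the same dict into both calls instead of repeating every argument.
build_eval_dataframe(queries=["q1"], labels=[], **shared_kwargs)
build_eval_dataframe(queries=["q2"], labels=[], **shared_kwargs)
```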
haystack/pipelines/base.py (outdated)
if params is None:
    params = {}
else:
    params = params.copy()
How about a one-liner: `params = {} if params is None else params.copy()`?
done 👍
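The underlying pattern is a defensive copy: starting from an empty dict or a copy lets the method add its own entries without mutating the caller's `params`. A minimal sketch, where the function name and the added key are illustrative:

```python
from typing import Optional


def eval_batch_stub(params: Optional[dict] = None) -> dict:
    # One-liner replacing the original if/else block.
    params = {} if params is None else params.copy()
    # Safe to add entries now: only the local copy is modified.
    params["add_isolated_node_eval"] = True
    return params


caller_params = {"Retriever": {"top_k": 5}}
eval_batch_stub(caller_params)
assert "add_isolated_node_eval" not in caller_params  # caller's dict is untouched
```

Note that `copy()` is shallow, so nested per-node dicts are still shared with the caller; only top-level keys can be added safely.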
vblagoje
left a comment
Rebase and it LGTM @julian-risch
Change requests are addressed. Thank you for your feedback! @tstadel
Related Issue(s):
closes #2636
Proposed changes:
- new `pipeline.eval_batch` method
- new `_build_eval_dataframe_from_batches` method that calls `_build_eval_dataframe` internally (`_build_eval_dataframe` is very complex already: the method comprises >300 lines of code and would need some refactoring to be simplified)
- `_add_sas_to_eval_result` to avoid code duplication
- eval tests copied to `test/pipelines/test_eval_batch.py` and made to use `pipeline.eval_batch`
- new `use_batch_mode` option for `execute_eval_run`, with the default set to `False` until `pipeline.eval_batch` is always faster than `pipeline.eval`

Limitations:
- `num_processes` or `max_processes` must be set to 1.
- Up for discussion: Should `standard_pipelines.eval` and `standard_pipelines.eval_batch` have a documents parameter that they pass on? We decided no, it's not needed at the moment.
- `run_batch` does not support different filters (or, more generally speaking, any params that differ per query), and thus `eval_batch` cannot support filters that differ per query and its label. Labels must therefore not have filters; for example, the test case `test_extractive_qa_labels_with_filters` won't work with `eval_batch` (see the guard sketch after the list of commented-out tests below).

Currently the following tests are commented out because they are expected to fail due to other issues:
- `test_extractive_qa_eval_translation` because of "TranslationWrapperPipeline returns wrong query format in debug mode" #2964
- `test_qa_multi_retriever_pipeline_eval` because of "Incomplete debug results for pipeline with two retrievers" #2962
- `test_multi_retriever_pipeline_eval` because of "Incomplete debug results for pipeline with two retrievers" #2962
- `test_multi_retriever_pipeline_with_asymmetric_qa_eval` because of "Incomplete debug results for pipeline with two retrievers" #2962
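Because of the filters limitation described above, a caller could guard against labels that carry filters before switching to batch evaluation. A sketch, assuming an aggregated `filters` attribute on `MultiLabel` and an existing `pipeline` and `eval_labels`:

```python
# Hypothetical guard: eval_batch cannot apply per-query filters, so fall back
# to the non-batch eval() when any label carries filters.
labels_with_filters = [label for label in eval_labels if label.filters]

if labels_with_filters:
    eval_result = pipeline.eval(labels=eval_labels, params={"Retriever": {"top_k": 5}})
else:
    eval_result = pipeline.eval_batch(labels=eval_labels, params={"Retriever": {"top_k": 5}})
```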
Pre-flight checklist
I have enabled actions on my fork