[pull] master from deepset-ai:master #1

pull · 2022-07-08T20:30:44Z

See Commits and Changes for more details.

* first draft of openai node for QA * Update Documentation & Code Style * fix mypy. add node to inits * Update Documentation & Code Style * fix linter * Adapt OpenAIGenerator to completions endpoint * Update Documentation & Code Style * Fix pylint * Fix doc strings * Make use of temperature * Make use of api key in tests * Adapt doc strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <[email protected]> Co-authored-by: bogdankostic <[email protected]>

* Changing the name under which a crawled page is saved to avoid long file name errors on some file systems * Custom naming function for saving crawled files * Update Documentation & Code Style * Remove bad characters on file name and prefix * Add test for naming function * Update Documentation & Code Style * Fix expensive regex recalculation and linter warnings * Check for exceptions on file dump * Remove param_naming variable * Fix file paths on Windows, Linux and Mac * Update Documentation & Code Style * Test using one of the docstrings examples * Change default naming function Update docstrings * Applying formatting rules * Update Documentation & Code Style * Fix mypy incompatible assignment error * Remove unused type declaration * Fix typo * Update tests for naming function * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
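
The naming-function commit above can be illustrated with a rough sketch. Everything here is an assumption made for illustration (the function name `crawled_file_name`, the set of replaced characters, the 128-character prefix limit, and the 6-character hash suffix); it is not the code merged in this commit. The pattern is compiled once at module level, in the spirit of the "expensive regex recalculation" fix.

```python
import hashlib
import re

# Compiled once at import time so the pattern is not rebuilt for every page.
# The character set is an illustrative guess at "bad" file name characters.
_BAD_CHARS = re.compile(r"""[<>:"'/\\|?*\x00\s]""")


def crawled_file_name(url: str, max_prefix: int = 128) -> str:
    """Hypothetical helper: build a short, file-system-safe name from a URL."""
    # Truncate first so very long URLs cannot produce over-long file names.
    prefix = _BAD_CHARS.sub("_", url[:max_prefix])
    # A short hash of the full URL keeps names distinct after truncation.
    suffix = hashlib.md5(url.encode("utf-8")).hexdigest()[-6:]
    return f"{prefix}_{suffix}"
```

The hash suffix is one possible way to keep two long URLs that share the same first 128 characters from colliding on disk.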

…r names (#2786) * Change split logic to list * Fix wrong parameter for run * Fix mypy error * Fix layout/raw parameter * Add test for filename with whitespaces on PDFToText * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Passing all the metadata in the summarizer * Disable metadata forwarding if `generate_single_summary` is `True` * Update Documentation & Code Style * simplify tests * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Remove caching and install audio deps * Fix `Tutorials` as well * Run all tutorials even though some fail * Forgot fi * fix failure condition * proper bash string equality * Enable debug logs * remove audio files * Update Documentation & Code Style * Use the setup action in the Tutorial CI as well * Try with a file that exists * Update Documentation & Code Style * Fix the comments in the tutorials * Update Documentation & Code Style * Fix tutorials.sh * Remove debug logging * import pprint and try editable install * Update Documentation & Code Style * extract no run list * Add tutorial18 to no run list nightly * import pprint correctly * Update Documentation & Code Style * try making site-packages editable * Make pythonpath editable every time Tut17 is run on CI * typo * fix imports in tut5 * add git clean * Update Documentation & Code Style * add comments and remove` -e` * accidentally deleted a line * Update .github/utils/tutorials.sh Co-authored-by: Massimiliano Pippi <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Massimiliano Pippi <[email protected]>

) * Add exception for in the validation code * Update Documentation & Code Style * Add tests * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* clean up tests and run earlier * use change detection * better naming, skip ES * more cleanup * fix job name * dummy commit to trigger the CI * mock away the PDF converter * make the test compatible with 3.7 * removed leftover * always run the api tests, use a matrix for the OS * refactor all the tests * remove outdated dependency * pylint * new abstract method * adjust for older python versions * rename pipeline file * address PR comments

* Integrating the mock into conftest.py * re-enable workflow * delete_all * Update Documentation & Code Style * remove ValueError * Add empty response * wrong condition * return response * revert removal of delete_all * change mock * Update Documentation & Code Style * test for rest api, to revert Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* openai response check * Update Documentation & Code Style * Update haystack/nodes/answer_generator/openai.py Co-authored-by: Sara Zan <[email protected]> * Update Documentation & Code Style * correct indentation * add OpenAIError * raise OpenAIError * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <[email protected]>

* 🐛 fix: update deployment status codes * Update Documentation & Code Style * adjust error log * added tests for failed state * added valid initial states * fix * fix tests * add test * updated comments * uncommented code again Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Thomas Stadelmann <[email protected]>

* Add support for model folder into BasePreProcessor * First draft of custom model on PreProcessor * Update Documentation & Code Style * Update tests to support custom models * Update Documentation & Code Style * Test for wrong models in custom folder * Default to ISO names on custom model folder Use long names only when needed * Update Documentation & Code Style * Refactoring language names usage * Update fallback logic * Check unpickling error * Updated tests using parametrize Co-authored-by: Sara Zan <[email protected]> * Refactored common logic * Add format control to NLTK load * Tests improvements Add a sample for specialized model * Update Documentation & Code Style * Minor log text update * Log model format exception details * Change pickle protocol version to 4 for 3.7 compat * Removed unnecessary model folder parameter Changed logic comparisons Co-authored-by: Sara Zan <[email protected]> * Update Documentation & Code Style * Removed unused import * Change errors with warnings * Change to absolute path * Rename sentence tokenizer method Co-authored-by: tstadel * Check document content is a string before process * Change to log errors and not warnings * Update Documentation & Code Style * Improve split sentences method Co-authored-by: Sara Zan <[email protected]> * Update Documentation & Code Style * Empty commit - trigger workflow * Remove superfluous parameters Co-authored-by: tstadel * Explicit None checking Co-authored-by: tstadel Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <[email protected]>

* fix corrupted csv if text contains \r chars; make csv serialization configurable * Update Documentation & Code Style * incorporate feedback * Update Documentation & Code Style * adjust columns to be converted during loading Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
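
To show why a bare `\r` inside a text field can corrupt a CSV file, here is a minimal sketch using only Python's standard `csv` module. It is not the fix applied in this commit; the helper name `roundtrip` is hypothetical. It shows that quoting every field and opening the stream with `newline=""` lets a value containing `\r` survive a write/read cycle.

```python
import csv
import io
from typing import List


def roundtrip(rows: List[List[str]]) -> List[List[str]]:
    """Hypothetical helper: write rows as CSV, then read them back."""
    # newline="" disables newline translation, as the csv docs recommend.
    buf = io.StringIO(newline="")
    # QUOTE_ALL wraps every field in quotes, so embedded \r stays in the field.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))
```

A row such as `["1", "first\rsecond"]` comes back as a single two-field record instead of being split at the carriage return.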

* modify notebook * skip invalid metadata * Update Documentation & Code Style * fix nonetype * fix nonetype * drop nonetype from valid types * drop nonetype from valid types * fix * Update sql.py * sqlalchemy validation * removed newlines * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* restart tutorials in the loop * remove container steps in tutorials.yml * forgotten quotes * unmatched bracket * give names to containers * try to limit the log size * make the containers restart on the scripts as well * feedback * Raise integration tests timeout * raising limit again

* move logging config from haystack lib to application * Update Documentation & Code Style * config logging before importing haystack * Update Documentation & Code Style * add logging config to all tutorials * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
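
A minimal sketch of what "config logging before importing haystack" can look like on the application side, using only the standard `logging` module. The logger name `"haystack"` and the format string are assumptions for illustration, and `configure_logging` is a hypothetical helper, not part of the library.

```python
import logging


def configure_logging() -> None:
    """Hypothetical application-side setup; the library itself configures nothing."""
    # Root logger: show warnings and above from every package.
    logging.basicConfig(
        format="%(levelname)s - %(name)s - %(message)s",
        level=logging.WARNING,
    )
    # Assumed package logger name; raise verbosity for this one package only.
    logging.getLogger("haystack").setLevel(logging.INFO)


configure_logging()
# Import the library only after logging is configured, for example:
# import haystack
```

Keeping this in the application lets each user choose handlers and levels without the library overriding them at import time.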

…rocessor (#2932) * Add page number to Documents coming from PDFConverters and PreProcessor * Fix mypy * Update API Docs * Update API Docs * Remove unused imports * Generate JSON schema * Generate JSON schema * Make test variable shorter * Make regex a separate function * Move counting of page breaks to a function * Generate JSON schema * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <[email protected]> * Update API Documentation * Don't create instance for testing staticmethod * Update haystack/nodes/preprocessor/preprocessor.py Co-authored-by: Agnieszka Marzec <[email protected]> Co-authored-by: Agnieszka Marzec <[email protected]>

* use hashlib.md5() instead of (interpreter dependent) hash() function to generate MultiLabel id * add tests to assess constancy of MultiLabel.id * make test_multilabel_id test ensure that MultiLabel ids are always the same
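
For context, Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed, so an id derived from it can change between interpreter runs. The sketch below shows the general idea of deriving a stable id with `hashlib.md5()`. The helper name `stable_id` and the `"|"` separator are hypothetical and not taken from the `MultiLabel` implementation.

```python
import hashlib
from typing import Sequence


def stable_id(parts: Sequence[str]) -> str:
    """Hypothetical helper: derive an id that is identical across interpreter runs."""
    # Join the parts deterministically, then hash the UTF-8 bytes with MD5.
    joined = "|".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

MD5 is used here only as a stable fingerprint, not for any security purpose.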

* feat: fetch results for DeepsetCloudExperiments * chore: test DC fetch predictions for eval run * chore: switch to dict iteration with .items() * chore: update DC url to fetch predictions from * chore: update doc strings for fetching eval run results * chore: update DeepsetCloudExperiments description, change function names for fetching predictions of an eval run * chore: test for DeepsetCloudExperiments.get_run_results * chore: adjust request mock for test_get_eval_run_results * chore: push first row of dataframe into variable for test checks * chore: adjust mock data to correct data types * chore: make documentation more readable with line breaks * chore: update documentation for eval run result fetching

#3001) * Adding support for additional distance metrics for Weaviate Fixes #3000 * Updating the docs * Fixing error texts * Fixing issues raised by the review * Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity` * [EMPTY] Re-trigger CI * Fixing things based on review * [EMPTY] Re-trigger CI

…ine.run` (#2981) * Refactoring the `Raypipeline.run` method - merging it with the `Pipeline.run` This is to fix #2968 * Bug: variable `i` was already in use * Removing unused imports * Removing unused import * [EMPTY] Re-trigger CI * Addressing concerns raised pre-review - Removing the attempt to make it work without the need for `JoinDocuments` - it is okay to fail without `JoinDocuments` for certain pipelines. * Refactoring based on reviews

…ne` (#2984)

* Enable the `JoinDocuments` node to work with documents with `score=None`. This fixes #2983.

  As of now, the `JoinDocuments` node errors out if any of the documents has `score=None`, which is possible because some retrievers cannot provide a score, such as the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate. The reason for the error is that `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`.

  There was a very similar issue for `JoinAnswers`, which was addressed by PR #2436. This change applies the same solution to `JoinDocuments`, so both `JoinAnswers` and `JoinDocuments` now have the same additional argument to disable sorting when that is required.

  The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which lets the user turn off sorting of documents by score while keeping sorting as the default behavior.
* Fixing test bug
* Addressing PR review comments: extending unit tests, simplifying logic
* Making the sorting work even with no scores by sorting documents without a score as -Inf
* Forgot to commit the change in `join_docs.py`
* [EMPTY] Re-trigger CI
* Added an INFO log if `JoinDocuments` is sorting while some of the docs have `score=None`
* Adjusting the arguments of `any()`
* [EMPTY] Re-trigger CI
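
The behavior described in this commit message can be sketched in a few lines. This is an illustrative stand-in, not the code in `join_docs.py`: the `Doc` class and the `sort_docs` function are hypothetical, and only the `sort_by_score` flag and the "missing score sorts as -Inf" rule come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a document; only the fields needed here.
    content: str
    score: Optional[float] = None


def sort_docs(docs: List[Doc], sort_by_score: bool = True) -> List[Doc]:
    """Order docs by descending score; docs without a score sort last."""
    if not sort_by_score:
        # Sorting disabled: keep the incoming order untouched.
        return list(docs)
    return sorted(
        docs,
        # Treat a missing score as -inf so comparisons never hit None.
        key=lambda d: d.score if d.score is not None else float("-inf"),
        reverse=True,
    )
```

With `sort_by_score=False` the incoming order is preserved, which corresponds to the opt-out described for retrievers that do not return scores.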

* add delete_all_files() * rename `file` to `files` * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <[email protected]> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <[email protected]> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <[email protected]> * streamline "If set to None" and "to the API call" Co-authored-by: Agnieszka Marzec <[email protected]>

* Add embedding_dim to dc store * Remove similarity from query params, it is not used * Remove unused `return_embedding` parameter * Remove unused param * Update the documentation * Update schemas * Revert openapi changes * Revert openapi changes * Fix openapi * Fix json schema * Improve docstrings Co-authored-by: Agnieszka Marzec <[email protected]> * Improve logs Co-authored-by: Agnieszka Marzec <[email protected]> * Update the docs * Fix similarity Co-authored-by: Agnieszka Marzec <[email protected]>

…ntStore` (#2846) * Raise error upon duplicate document key found within meta info * value error msg fix * Update Documentation & Code Style * Raise exception instead of asserting * Update Documentation & Code Style * add test

* first draft for tutorial extension * forgotten markdown * improved tutorial * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <[email protected]> * add markdown * first draft for tutorial extension * forgotten markdown * improved tutorial * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <[email protected]> * add markdown * little corrections * little corrections and add py tutorial * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * update tutorial webpage * fix typo Co-authored-by: Agnieszka Marzec <[email protected]> Co-authored-by: Thomas Stadelmann <[email protected]>

tholor and others added 30 commits July 8, 2022 13:59

API key check in OpenAIAnswerGenerator (#2791)

d8e7aae

* api key check in node and tests * Clarify skip message * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Change "text" to "content" as dict key (#2800)

f599ce9

* change "text" to "content" as dict key * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Tutorial 12: add introduction (#2798)

2a7e333

* Tutorial 12: add introduction * PR review for Tutorial 12: add introduction * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Clean OpenAIAnswerGenerator docstrings (#2797)

0388284

* Clean OpenAIAnswerGenerator docstrings * Incorporate reviewer feedback Co-authored-by: Sara Zan <[email protected]>

Fix gold_contexts_similarity for table retrieval evaluation (#2815)

e6d8bcd

* fix gold_contexts_similarity for table documents * check for type of gold_context

Allow values that are not dictionaries in the request params in the `…

632cd1c

…/search` endpoint (#2720) * let params contain something else than dictionaries * rewrite the test same style as the main branch

Enable CI on tutorials (#2801)

48644b2

* enable ci on tutorials * Disable all path restrictions for safety * actually comment out the paths block * remove comment

fix healthcheck cmds for annotation tool postgres (#2840)

9ad90b2

remove unnecessary if else block #2835 (#2842)

925eedd

changed mock pinecone to use dict rather than list index (#2845)

a4e197c

Remove deprecated method prepare_seq2seq_batch (#2852)

de6b9c3

* Remove deprecated method prepare_seq2seq_batch

Exclude docker from Tutorial 15 (#2861)

e350781

* modify notebook * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Add a custom pydoc renderer for Readme.io (#2825)

8ee2b6b

* add custom pydoc renderer * create an example * revert example code

Simplify language_modeling.py and tokenization.py (#2703)

4e45062

* Simplification of language_model.py and tokenization.py to remove code duplication Co-authored-by: vblagoje <[email protected]>

Raise timeout on integration tests (#2880)

5119acb

bogdankostic and others added 28 commits August 9, 2022 15:55

chore: add topic tags to auto generation of release notes (#3008)

b685409

switch label variables in test_labels (#3011)

524c9b9

convert to set so not relying on correct order (#3015)

5d4e3bd

fix run_batch unbound error (#3016)

5778b6f

revert Remove pipes (#3006)

2cd65e9

Move azure-core pin into the dev dependency list (#3022)

fc8ecbf

Resolving issue 2853: no answer logic in FARMReader (#2856)

44e2b1b

* Update FARMReader.eval_on_file to be consistent with FARMReader.eval * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

ci: Increase Weaviate's disk usage + print docker logs (#3026)

81a5949

test: add meta fields for meta_config to be used during testing (#3021)

26c938a

* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore * Add documentation on metadata filtering in docstring * docs Co-authored-by: Sara Zan <[email protected]>

Fix embeddings_field_supports_similarity of `OpenSearchDocumentStor…

668fd54

…e` when creating index (#3030) * fix embeddings_field_supports_similarity when creating index * fix test

feat: Enable isolated node eval for answer generator nodes (incl. Ope…

1b422ab

…nAI Node) (#3036) * enable isolated node eval for answer generator nodes * adjust comment * remove unused import * fix mypy Co-authored-by: tstadel <[email protected]>

bug: Make TranslationWrapperPipeline work with `QuestionAnswerGener…

3a849d6

…ationPipeline` (#3034) * Overwrite output_translator's run method with run_batch * Fix mypy * Revert change * Overwrite run method only with QuestionAnswerGenerationPipeline

Update Summarizer Docs (#3032)

7312f99

* Change text to content * Change text to content

docs: update File Classifier Docstring (#3018)

ff38a20

* Update docstring * Trigger pre-commit hook * Trigger pre-commit hook * Incorporate reviewer feedback * Incorporate reviewer feedback

Fix broken MultiLabel serialization (#3037)

0aa0c68

* Fix MultiLabel serialization * update docs * better comment * remove unused imports * remove unused imports (2)

chore: fix typo in API docs (#3023)

d617553

* chore: fix typo in API docs * fix openapi Co-authored-by: Thomas Stadelmann <[email protected]>

Upgrade to v1.7.0 and copy docs folder (#3014)

baefd32

* update version to 1.7.0 * copy docs * update openapi * generate schemas * make update_json_schema() idempotent * update docs, schema and openapi

changing Slack to Discord (#3040)

2298155

* changing Slack to Discord * Update README.md * updating contributing

masci deleted the branch MarcelRaschke:master August 16, 2022 10:03

masci deleted the master branch August 16, 2022 10:03
