forked from deepset-ai/haystack
[pull] master from deepset-ai:master #1
Open: pull wants to merge 104 commits into MarcelRaschke:master from deepset-ai:master
* first draft of openai node for QA * Update Documentation & Code Style * fix mypy. add node to inits * Update Documentation & Code Style * fix linter * Adapt OpenAIGenerator to completions endpoint * Update Documentation & Code Style * Fix pylint * Fix doc strings * Make use of temperature * Make use of api key in tests * Adapt doc strings Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ZanSara <[email protected]> Co-authored-by: bogdankostic <[email protected]>
* Changing the name that crawled page is saved to avoid long file names error on some file systems * Custom naming function for saving crawled files * Update Documentation & Code Style * Remove bad characters on file name and prefix * Add test for naming function * Update Documentation & Code Style * Fix expensive regex recalculation and linter warns * Check for exceptions on file dump * Remove param_naming variable * Fix file paths on Windows, Linux and Mac * Update Documentation & Code Style * Test using one of the docstrings examples * Change default naming function Update docstrings * Applying formatting rules * Update Documentation & Code Style * Fix mypy incompatible assignment error * Remove unused type declaration * Fix typo * Update tests for naming function * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…r names (#2786) * Change split logic to list * Fix wrong parameter for run * Fix mypy error * Fix layout/raw parameter * Add test for filename with whitespaces on PDFToText * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Passing all the meta-data in the summarizer * Disable metadata forwarding if `generate_single_summary` is `True` * Update Documentation & Code Style * simplify tests * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Remove caching and install audio deps * Fix `Tutorials` as well * Run all tutorials even though some fail * Forgot fi * fix failure condition * proper bash string equality * Enable debug logs * remove audio files * Update Documentation & Code Style * Use the setup action in the Tutorial CI as well * Try with a file that exists * Update Documentation & Code Style * Fix the comments in the tutorials * Update Documentation & Code Style * Fix tutorials.sh * Remove debug logging * import pprint and try editable install * Update Documentation & Code Style * extract no run list * Add tutorial18 to no run list nightly * import pprint correctly * Update Documentation & Code Style * try making site-packages editable * Make pythonpath editable every time Tut17 is run on CI * typo * fix imports in tut5 * add git clean * Update Documentation & Code Style * add comments and remove `-e` * accidentally deleted a line * Update .github/utils/tutorials.sh Co-authored-by: Massimiliano Pippi <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Massimiliano Pippi <[email protected]>
* api key check in node and tests * Clarify skip message * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* change "text" to "content" as dict key * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Tutorial 12: add introduction * PR review for Tutorial 12: add introduction * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Clean OpenAIAnswerGenerator docstrings * Incorporate reviewer feedback Co-authored-by: Sara Zan <[email protected]>
* clean up tests and run earlier * use change detection * better naming, skip ES * more cleanup * fix job name * dummy commit to trigger the CI * mock away the PDF converter * make the test compatible with 3.7 * removed leftover * always run the api tests, use a matrix for the OS * refactor all the tests * remove outdated dependency * pylint * new abstract method * adjust for older python versions * rename pipeline file * address PR comments
* fix gold_contexts_similarity for table documents * check for type of gold_context
* Integrating the mock into conftest.py * re-enable workflow * delete_all * Update Documentation & Code Style * remove ValueError * Add empty response * wrong condition * return response * revert removal of delete_all * change mock * Update Documentation & Code Style * test for rest api, to revert Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…/search` endpoint (#2720) * let params contain something else than dictionaries * rewrite the test same style as the main branch
* enable ci on tutorials * Disable all path restrictions for safety * actually comment out the paths block * remove comment
* openai response check * Update Documentation & Code Style * Update haystack/nodes/answer_generator/openai.py Co-authored-by: Sara Zan <[email protected]> * Update Documentation & Code Style * correct indentation * add OpenAIError * raise OpenAIError * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <[email protected]>
* Remove deprecated method prepare_seq2seq_batch
* 🐛 fix: update deployment status codes * Update Documentation & Code Style * adjust error log * added tests for failed state * added valid initial states * fix * fix tests * add test * updated comments * uncommented code again Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Thomas Stadelmann <[email protected]>
* Add support for model folder into BasePreProcessor * First draft of custom model on PreProcessor * Update Documentation & Code Style * Update tests to support custom models * Update Documentation & Code Style * Test for wrong models in custom folder * Default to ISO names on custom model folder Use long names only when needed * Update Documentation & Code Style * Refactoring language names usage * Update fallback logic * Check unpickling error * Updated tests using parametrize Co-authored-by: Sara Zan <[email protected]> * Refactored common logic * Add format control to NLTK load * Tests improvements Add a sample for specialized model * Update Documentation & Code Style * Minor log text update * Log model format exception details * Change pickle protocol version to 4 for 3.7 compat * Removed unnecessary model folder parameter Changed logic comparisons Co-authored-by: Sara Zan <[email protected]> * Update Documentation & Code Style * Removed unused import * Change errors with warnings * Change to absolute path * Rename sentence tokenizer method Co-authored-by: tstadel * Check document content is a string before process * Change to log errors and not warnings * Update Documentation & Code Style * Improve split sentences method Co-authored-by: Sara Zan <[email protected]> * Update Documentation & Code Style * Empty commit - trigger workflow * Remove superfluous parameters Co-authored-by: tstadel * Explicit None checking Co-authored-by: tstadel Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sara Zan <[email protected]>
* modify notebook * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix corrupted csv if text contains \r chars; make csv serialization configurable * Update Documentation & Code Style * incorporate feedback * Update Documentation & Code Style * adjust columns to be converted during loading Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add custom pydoc renderer * create an example * revert example code
* Simplification of language_model.py and tokenization.py to remove code duplication Co-authored-by: vblagoje <[email protected]>
* modify notebook * skip invalid metadata * Update Documentation & Code Style * fix nonetype * fix nonetype * drop nonetype from valid types * drop nonetype from valid types * fix * Update sql.py * sqlalchemy validation * removed newlines * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* restart tutorials in the loop * remove container steps in tutorials.yml * forgotten quotes * unmatched bracket * give names to containers * try to limit the log size * make the containers restart on the scripts as well * feedback * Raise integration tests timeout * raising limit again
* move logging config from haystack lib to application * Update Documentation & Code Style * config logging before importing haystack * Update Documentation & Code Style * add logging config to all tutorials * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…rocessor (#2932) * Add page number to Documents coming from PDFConverters and PreProcessor * Fix mypy * Update API Docs * Update API Docs * Remove unused imports * Generate JSON schema * Generate JSON schema * Make test variable shorter * Make regex a separate function * Move counting of page breaks to a function * Generate JSON schema * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <[email protected]> * Update API Documentation * Don't create instance for testing staticmethod * Update haystack/nodes/preprocessor/preprocessor.py Co-authored-by: Agnieszka Marzec <[email protected]> Co-authored-by: Agnieszka Marzec <[email protected]>
* use hashlib.md5() instead of (interpreter dependent) hash() function to generate MultiLabel id * add tests to assess constancy of MultiLabel.id * make test_multilabel_id test ensure that MultiLabel ids are always the same
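The motivation for this change can be illustrated with a minimal sketch. Python's built-in `hash()` is salted per interpreter process for strings, so an id derived from it can differ between runs, while an MD5 digest of the same bytes is always identical. The function name `stable_id` and the way the labels are joined are assumptions made for illustration only; this is not the actual `MultiLabel` implementation.

```python
import hashlib
from typing import List


def stable_id(label_ids: List[str]) -> str:
    """Hypothetical helper: derive a run-independent id from a list of label ids."""
    # Sort so that the id does not depend on the order of the labels.
    joined = "|".join(sorted(label_ids))
    # MD5 is used here only as a stable fingerprint, not for security.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# The same labels give the same id in every interpreter session,
# unlike hash("...") on strings, which is randomized per process by default.
example_id = stable_id(["label-b", "label-a"])
```

A digest-based id like this can be stored and compared across processes, which is what a test for id constancy would rely on.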
* feat: fetch results for DeepsetCloudExperiments * chore: test DC fetch predictions for eval run * chore: switch to dict iteration with .items() * chore: update DC url to fetch predictions from * chore: update doc strings for fetching eval run results * chore: update DeepsetCloudExperiments description, change function names for fetching predictions of an eval run * chore: test for DeepsetCloudExperiments.get_run_results * chore: adjust request mock for test_get_eval_run_results * chore: push first row of dataframe into variable for test checks * chore: adjust mock data to correct data types * chore: make documentation more readable with line breaks * chore: update documentation for eval run result fetching
#3001) * Adding support for additional distance metrics for Weaviate Fixes #3000 * Updating the docs * Fixing error texts * Fixing issues raised by the review * Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity` * [EMPTY] Re-trigger CI * Fixing things based on review * [EMPTY] Re-trigger CI
…ine.run` (#2981) * Refactoring the `RayPipeline.run` method - merging it with the `Pipeline.run` This is to fix #2968 * Bug: variable `i` was already in use * Removing unused imports * Removing unused import * [EMPTY] Re-trigger CI * Addressing concerns raised pre-review - Removing the attempt to try to make it without the need for `JoinDocuments` - it is okay to fail without `JoinDocuments` for certain pipelines. * Refactoring based on reviews
…ne` (#2984) * Enable the `JoinDocuments` node to work with documents with `score=None` This fixes #2983 As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retrievers are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate. The reason for the error is that `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`. There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: #2436 This PR applies the same solution to `JoinDocuments` - so both `JoinAnswers` and `JoinDocuments` will now have the same additional argument to disable sorting when that is required. The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default. * Fixing test bug * Addressing PR review comments - Extending unit tests - Simplifying logic * Making the sorting work even with no scores By treating a missing score as -Inf * Forgot to commit the change in `join_docs.py` * [EMPTY] Re-trigger CI * Added an INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None` * Adjusting the arguments of `any()` * [EMPTY] Re-trigger CI
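The behaviour described in this commit message can be sketched roughly as follows. The `Doc` class and the `join_docs` function are hypothetical stand-ins written for this illustration; only the `sort_by_score` flag and the idea of ranking a missing score as -Inf come from the commit message, and the real `JoinDocuments` node may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical minimal document: some retrievers leave `score` unset."""
    content: str
    score: Optional[float] = None


def join_docs(docs: List[Doc], sort_by_score: bool = True) -> List[Doc]:
    """Return the documents, optionally sorted by descending score."""
    if not sort_by_score:
        # Sorting disabled: keep the incoming order untouched.
        return list(docs)
    # A missing score is treated as -inf so it sorts last instead of raising
    # a TypeError when None is compared with a float.
    return sorted(
        docs,
        key=lambda d: d.score if d.score is not None else float("-inf"),
        reverse=True,
    )


docs = [Doc("a", 0.2), Doc("b"), Doc("c", 0.9)]
ranked = join_docs(docs)                          # order: c, a, b
unsorted = join_docs(docs, sort_by_score=False)   # order: a, b, c
```

Mapping `None` to `float("-inf")` keeps sorting as the default while still accepting unscored documents, and the flag gives callers a way to opt out when the scores are not comparable at all.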
* Update FARMReader.eval_on_file to be consistent with FARMReader.eval * Update Documentation & Code Style Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore * Add documentation on metadata filtering in docstring * docs Co-authored-by: Sara Zan <[email protected]>
…e` when creating index (#3030) * fix embeddings_field_supports_similarity when creating index * fix test
* add delete_all_files() * rename `file` to `files` * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <[email protected]> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <[email protected]> * Update haystack/utils/deepsetcloud.py Co-authored-by: Agnieszka Marzec <[email protected]> * streamline "If set to None" and "to the API call" Co-authored-by: Agnieszka Marzec <[email protected]>
* Add embedding_dim to dc store * Remove similarity from query params, it is not used * Remove unused `return_embedding` parameter * Remove unused param * Update the documentation * Update schemas * Revert openapi changes * Revert openapi changes * Fix openapi * Fix json schema * Improve docstrings Co-authored-by: Agnieszka Marzec <[email protected]> * Improve logs Co-authored-by: Agnieszka Marzec <[email protected]> * Update the docs * Fix similarity Co-authored-by: Agnieszka Marzec <[email protected]>
…ntStore` (#2846) * Raise error upon duplicate document key found within meta info * value error msg fix * Update Documentation & Code Style * Raise exception instead of asserting * Update Documentation & Code Style * add test
* first draft for tutorial extension * forgotten markdown * improved tutorial * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <[email protected]> * add markdown * first draft for tutorial extension * forgotten markdown * improved tutorial * Apply suggestions from code review Co-authored-by: Agnieszka Marzec <[email protected]> * add markdown * little corrections * little corrections and add py tutorial * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * Update tutorials/Tutorial14_Query_Classifier.ipynb Co-authored-by: Agnieszka Marzec <[email protected]> * update tutorial webpage * fix typo Co-authored-by: Agnieszka Marzec <[email protected]> Co-authored-by: Thomas Stadelmann <[email protected]>
…nAI Node) (#3036) * enable isolated node eval for answer generator nodes * adjust comment * remove unused import * fix mypy Co-authored-by: tstadel <[email protected]>
…ationPipeline` (#3034) * Overwrite output_translator's run method with run_batch * Fix mypy * Revert change * Overwrite run method only with QuestionAnswerGenerationPipeline
* Change text to content * Change text to content
* Update docstring * Trigger pre-commit hook * Trigger pre-commit hook * Incorporate reviewer feedback * Incorporate reviewer feedback
* Fix MultiLabel serialization * update docs * better comment * remove unused imports * remove unused imports (2)
* chore: fix typo in API docs * fix openapi Co-authored-by: Thomas Stadelmann <[email protected]>
* update version to 1.7.0 * copy docs * update openapi * generate schemas * make update_json_schema() idempotent * update docs, schema and openapi
* changing Slack to Discord * Update README.md * updating contributing
See Commits and Changes for more details.
Created by pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )