fix searx #1

Closed
wants to merge 170 commits into from
blob42 commented Feb 21, 2023

hwchase17 and others added 30 commits February 2, 2023 19:54
This does not involve a separator, and will naively chunk input text at
the appropriate boundaries in token space.

This is helpful when strict token limits require chunks to match the
specified chunk size exactly, and we can't use aggressive separators
such as spaces to guarantee that no overly long strings remain.

CharacterTextSplitter will let these strings through without splitting
them, which could cause overflow errors downstream.

Splitting at arbitrary token boundaries is not ideal, but this is
hopefully mitigated by a decent overlap. This approach also produces
chunks with exactly the desired number of tokens, instead of sometimes
overcounting when shorter strings are concatenated.

Potentially also helps with langchain-ai#528.
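A minimal sketch of separator-free chunking in token space, with overlap. It assumes a generic `encode`/`decode` pair; a real tokenizer would supply these, and the character-level toy tokenizer below exists only to make the example runnable. `split_on_tokens`, `char_encode`, and `char_decode` are hypothetical names, not the splitter's actual code.

```python
from typing import Callable, List


def split_on_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    chunk_size: int,
    chunk_overlap: int,
) -> List[str]:
    """Chunk text at fixed token boundaries, with overlap and no separator."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    ids = encode(text)
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    start = 0
    while start < len(ids):
        # Every chunk except possibly the last holds exactly chunk_size tokens.
        chunks.append(decode(ids[start : start + chunk_size]))
        if start + chunk_size >= len(ids):
            break  # this window already reached the end of the input
        start += step
    return chunks


# Toy character-level "tokenizer" for illustration only (one token per char).
def char_encode(s: str) -> List[int]:
    return [ord(c) for c in s]


def char_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


chunks = split_on_tokens("abcdefghij", char_encode, char_decode, chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk shares `chunk_overlap` tokens with its predecessor, which is what softens the cost of cutting at an arbitrary token boundary.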
Add the ability to retry when certain exceptions are raised by
`openai.Completion.create`.

Test plan: ran all OpenAI integration tests.
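A generic sketch of "retry when certain exceptions are raised", using exponential backoff with jitter. Which OpenAI exceptions count as retryable (for example rate-limit or timeout errors) is left to the caller as an assumption. `call_with_retry` is a hypothetical helper in plain Python, not the wrapper's actual implementation, which could instead rely on a retry library.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> T:
    """Call fn(); on an exception listed in retry_on, back off and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff (1s, 2s, 4s, ... with the default base_delay),
            # capped at max_delay, with random jitter so concurrent clients
            # don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retry(flaky, retry_on=(ConnectionError,), base_delay=0.0)
print(result, calls["n"])  # ok 3
```

Exceptions that are not listed in `retry_on` propagate immediately, so programming errors are not masked by retries.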
Signed-off-by: Filip Haltmayer <[email protected]>
Signed-off-by: Frank Liu <[email protected]>
Co-authored-by: Filip Haltmayer <[email protected]>
Co-authored-by: Frank Liu <[email protected]>
Just noticed this little typo while reading the docs and thought I'd open a
PR!
The `re.DOTALL` flag in Python's `re` (regular expression) module makes the
`.` (dot) metacharacter match newline characters as well as any other
character.

Without `re.DOTALL`, the `.` metacharacter matches any character except
a newline. With `re.DOTALL`, it matches any character, including newlines.
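A short demonstration of the difference. The sample string is invented for illustration. The source does not say which pattern the flag was added to.

```python
import re

text = "Action: search\nAction Input: weather\nin Paris"

# Without DOTALL, "." stops at "\n", so the capture ends at the first line break.
without_dotall = re.search(r"Action Input: (.*)", text).group(1)

# With DOTALL, "." also matches "\n", so the greedy capture runs to the end.
with_dotall = re.search(r"Action Input: (.*)", text, re.DOTALL).group(1)

print(repr(without_dotall))  # 'weather'
print(repr(with_dotall))     # 'weather\nin Paris'
```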
I was passing the prompt in directly as a string and getting nonsense
outputs. I had to inspect the source code to realize that the first
argument should be a list. An explicit error or warning would be nice,
since this seems like it could be a common mistake.
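One plausible reason for the nonsense outputs is that a Python `str` is itself a sequence of one-character strings, so code that iterates over "the list of prompts" silently treats each character as a separate prompt. The following sketch shows the kind of explicit error suggested above. `ensure_prompt_list` is a hypothetical helper, and the method it would guard (whichever one takes a list of prompts) is left unspecified.

```python
from typing import List, Sequence


def ensure_prompt_list(prompts: Sequence[str]) -> List[str]:
    """Validate that prompts is a list of strings, not a single string."""
    # A str is also a Sequence[str] (of one-character strings), so without
    # this check it would be treated as many one-character prompts.
    if isinstance(prompts, str):
        raise TypeError(
            "Expected a list of prompt strings but got a single str; "
            "wrap it in a list, e.g. [prompt]."
        )
    prompt_list = list(prompts)
    if not all(isinstance(p, str) for p in prompt_list):
        raise TypeError("Every prompt must be a str.")
    return prompt_list


print(ensure_prompt_list(["Tell me a joke"]))  # ['Tell me a joke']
try:
    ensure_prompt_list("Tell me a joke")
except TypeError as exc:
    print(exc)
```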
PR to fix outdated environment details in the docs, see issue langchain-ai#897 

I added code comments pointing users to where they can get API keys and
where they can find the relevant environment variable.
Fix for issue langchain-ai#906 

Switches `[i : i + batch_size]` to `[i : i_end]` in Pinecone
`from_texts` method
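A sketch of the batching pattern behind this fix. The commit only states the slice change, so the rationale here is an assumption. Computing a clamped `i_end` once and using it for every per-batch list keeps them aligned. Slicing past the end is harmless in Python, but `range(i, i + batch_size)` would produce ids for texts that don't exist in the final, shorter batch. `iter_batches` and the index-based string ids are hypothetical.

```python
from typing import Iterator, List, Tuple


def iter_batches(
    texts: List[str], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (ids, texts) pairs holding at most batch_size items each."""
    for i in range(0, len(texts), batch_size):
        # Clamp the end of the window so the final, shorter batch stays in bounds.
        i_end = min(i + batch_size, len(texts))
        # Use the same i_end for every per-batch list so they stay aligned.
        ids_batch = [str(j) for j in range(i, i_end)]
        texts_batch = texts[i:i_end]
        yield ids_batch, texts_batch


for ids, batch in iter_batches(["a", "b", "c", "d", "e"], batch_size=2):
    print(ids, batch)
```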
langchain-ai#899)

This allows the LLM to correct its previous command by looking at the
error message output to the shell.

Additionally, this uses subprocess.run because that is now recommended
over subprocess.check_output:

https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module

Co-authored-by: Amos Ng <[email protected]>
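A minimal sketch of running a shell command so that its error output reaches the LLM. `subprocess.check_output` raises `CalledProcessError` on a non-zero exit status, whereas `subprocess.run` without `check=True` simply returns the result. Merging stderr into stdout keeps the error text. `run_command` is a hypothetical name, and `shell=True` is an assumption about how commands are passed.

```python
import subprocess
import sys


def run_command(command: str) -> str:
    """Run a shell command and return its combined stdout/stderr, even if it fails."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so error text is kept
        text=True,
    )
    # check=False (the default): a non-zero exit status does not raise, so the
    # error message is returned and can be shown to the LLM for correction.
    return result.stdout


ok = run_command(f'"{sys.executable}" -c "print(42)"')
failed = run_command(f'"{sys.executable}" -c "1/0"')
print(ok.strip())  # 42
print(failed)      # a traceback ending in ZeroDivisionError
```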
Basic integration test for Pinecone
nan-wang and others added 28 commits February 19, 2023 21:15
add missing links to toc

---------

Signed-off-by: Nan Wang <[email protected]>
Co-authored-by: Michael Chen <[email protected]>
Co-authored-by: Michael Chen <[email protected]>
- fix notebook formatting, remove empty cells and add scrolling for long
text

---------

Co-authored-by: blob42 <spike@w530>
### Description
This PR adds a wrapper that supports the OpenSearch vector database.
Using the opensearch-py client, it ingests the embeddings of the given
texts into an OpenSearch cluster via the Bulk API. `similarity_search`
can then be performed on the index using the three popular search
methods of the OpenSearch k-NN plugin:

- `Approximate k-NN Search` uses approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute-force, exact k-NN search.
- `Painless Scripting` adds the distance functions as Painless
extensions that can be used in more complex combinations. Like Script
Scoring, it also supports brute-force, exact k-NN search.
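Each of the three methods corresponds to a different search request body. The following sketch follows the query shapes shown in the OpenSearch k-NN plugin documentation. The field name `vector_field`, the default `space_type`, and the helper names are assumptions, and these are not necessarily the exact bodies the wrapper sends.

```python
from typing import Any, Dict, List

Body = Dict[str, Any]


def approximate_knn_body(vector: List[float], k: int, field: str = "vector_field") -> Body:
    """Approximate k-NN: the index's ANN engine (nmslib, faiss or Lucene) finds neighbors."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


def script_scoring_body(
    vector: List[float], k: int, field: str = "vector_field", space_type: str = "l2"
) -> Body:
    """Exact, brute-force k-NN: score every matching doc with the plugin's knn_score script."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {"field": field, "query_value": vector, "space_type": space_type},
                },
            }
        },
    }


def painless_cosine_body(vector: List[float], k: int, field: str = "vector_field") -> Body:
    """Exact k-NN via a Painless distance function; 1.0 + keeps scores non-negative."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
                    "params": {"field": field, "query_value": vector},
                },
            }
        },
    }


body = approximate_knn_body([0.1, 0.2, 0.3], k=3)
```

A body like this could then be passed as `body` to the opensearch-py client's `search` call along with the index name. The two script-based variants trade speed for exact results.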

### Issues Resolved 
langchain-ai#1054

---------

Signed-off-by: Naveen Tatikonda <[email protected]>
Lets a chain prompt the user for more input as a part of its execution.
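A minimal sketch of a step that pauses to ask the user for more input. `HumanInputStep` and its `prompt_func`/`input_func` parameters are hypothetical. Injecting both callables, with `print` and `input` as defaults, keeps the step testable and lets a different UI replace the terminal.

```python
from typing import Callable


class HumanInputStep:
    """Hypothetical chain step that asks the user a question and returns the reply."""

    def __init__(
        self,
        prompt_func: Callable[[str], None] = print,
        input_func: Callable[[], str] = input,
    ) -> None:
        self.prompt_func = prompt_func
        self.input_func = input_func

    def run(self, question: str) -> str:
        self.prompt_func(question)  # show the chain's question to the user
        return self.input_func()  # hand the user's reply back to the chain


# Non-interactive usage: record the question and supply a canned reply.
shown = []
step = HumanInputStep(prompt_func=shown.append, input_func=lambda: "Paris")
answer = step.run("Which city should I look up?")
```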
Added a GitBook document loader. It lets you either (1) fetch the text of
any single GitBook page, or (2) fetch all relative paths and return
their respective content as Documents.

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
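The second mode ("fetch all relative paths") implies first discovering a site's own pages. The following hypothetical sketch uses only the standard library to collect root-relative links from a page's HTML and turn them into absolute URLs. The real loader may use a different HTML parser or a different source of paths, and `relative_page_urls` is an illustrative name.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def relative_page_urls(base_url: str, html: str) -> List[str]:
    """Return absolute URLs for the page's root-relative links, in order, deduplicated."""
    collector = _LinkCollector()
    collector.feed(html)
    urls: List[str] = []
    for href in collector.hrefs:
        # Keep root-relative paths like "/intro"; skip external and
        # protocol-relative ("//host/...") links.
        if href.startswith("/") and not href.startswith("//"):
            url = urljoin(base_url, href)
            if url not in urls:
                urls.append(url)
    return urls


page = '<a href="/intro">Intro</a> <a href="https://example.org">Ext</a> <a href="/api">API</a>'
urls = relative_page_urls("https://docs.example.com", page)
```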
For persistence, it's convenient to have a default collection name which
gets used everywhere.
langchain-ai#1153)

It is useful to be able to specify `verbose` or `memory` while still
keeping the chain's overall structure.

---------

Co-authored-by: Francisco Ingham <>
When I try to import the class HuggingFaceEndpoint, I get an ImportError:
cannot import name 'HuggingFaceEndpoint' from 'langchain'
(langchain version 0.0.88).
These two imports work fine: `from langchain import HuggingFacePipeline`
and `from langchain import HuggingFaceHub`.

So I corrected the import statement in the example. There is probably a
better solution, but this fixes the error for me.
Conceptually, there's no reason a tool should know what an "agent action" is.

Unless there are any objections, this can be changed in all callback handlers.
…ons (langchain-ai#1208)

### Summary

Corrects the install instruction for local inference to `pip install
"unstructured[local-inference]"`
blob42 closed this Feb 21, 2023
blob42 pushed a commit that referenced this pull request May 4, 2023
Without the `--no-sandbox` argument, loading documents from a URL with
Selenium in Chrome fails with the error below:

```
Traceback (most recent call last):
  File "/data//playgroud/try_langchain.py", line 343, in <module>
    langchain_doc_loader()
  File "/data//playgroud/try_langchain.py", line 67, in langchain_doc_loader
    documents = loader.load()
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/langchain/document_loaders/url_selenium.py", line 102, in load
    driver = self._get_driver()
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/langchain/document_loaders/url_selenium.py", line 76, in _get_driver
    return Chrome(options=chrome_options)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriver.py", line 80, in __init__
    super().__init__(
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/chromium/webdriver.py", line 104, in __init__
    super().__init__(
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 286, in __init__
    self.start_session(capabilities, browser_profile)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 378, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "/install/anaconda3-env/envs/python3.10/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55cf8da1bfe3 <unknown>
#1 0x55cf8d75ad36 <unknown>
#2 0x55cf8d783b20 <unknown>
#3 0x55cf8d77fa9b <unknown>
#4 0x55cf8d7c1af7 <unknown>
#5 0x55cf8d7c111f <unknown>
#6 0x55cf8d7b8693 <unknown>
#7 0x55cf8d78b03a <unknown>
#8 0x55cf8d78c17e <unknown>
#9 0x55cf8d9dddbd <unknown>
#10 0x55cf8d9e1c6c <unknown>
#11 0x55cf8d9eb4b0 <unknown>
#12 0x55cf8d9e2d63 <unknown>
#13 0x55cf8d9b5c35 <unknown>
#14 0x55cf8da06138 <unknown>
#15 0x55cf8da062c7 <unknown>
#16 0x55cf8da14093 <unknown>
#17 0x7f3da31a72de start_thread
```

Add the option `chrome_options.add_argument("--no-sandbox")` for Chrome.