Standardize Embeddings Docs #24856

Closed
1 of 2 tasks
efriis opened this issue Jul 31, 2024 · 3 comments
Labels
🤖:docs, Ɑ: embeddings, help wanted, integration-docs

Comments

@efriis
Member

efriis commented Jul 31, 2024

Privileged issue

  • I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here.

Issue Content

Issue

To make our Embeddings integrations as easy to use as possible we need to make sure the docs for them are thorough and standardized. There are two parts to this: updating the embeddings docstrings and updating the actual integration docs.

This needs to be done for each embeddings integration, ideally with one PR per embedding provider.

Related to broader issues #21983 and #22005.

Docstrings

Each Embeddings class docstring should have the sections shown in the Appendix below. The sections should have input and output code blocks when relevant.
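
As a rough illustration of what the usage sections of the docstring exercise, here is a minimal sketch of a stand-in class exposing the four methods the template calls (embed_query, embed_documents, aembed_query, aembed_documents). ToyEmbeddings, its hash-based vectors, and the size parameter are hypothetical and exist only for this example; this is not a LangChain class or any provider's implementation.

```python
import asyncio
import hashlib
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in for a provider class; not a LangChain API."""

    def __init__(self, model: str = "toy-model", size: int = 8) -> None:
        self.model = model
        self.size = size  # number of coordinates per vector (at most 32 here)

    def embed_query(self, text: str) -> List[float]:
        # Derive a deterministic pseudo-vector from a SHA-256 hash of the text.
        digest = hashlib.sha256(f"{self.model}:{text}".encode("utf-8")).digest()
        return [byte / 255.0 - 0.5 for byte in digest[: self.size]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in the same order.
        return [self.embed_query(text) for text in texts]

    async def aembed_query(self, text: str) -> List[float]:
        # No native async in this toy; delegate to the sync method.
        return self.embed_query(text)

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        return self.embed_documents(texts)


embed = ToyEmbeddings(model="toy-model")
vector = embed.embed_query("The meaning of life is 42")
vectors = embed.embed_documents(["Document 1...", "Document 2..."])
async_vector = asyncio.run(embed.aembed_query("The meaning of life is 42"))
print(vector[:3])
print(len(vectors))
```

Each docstring section then pairs one of these calls with a second code block showing its printed output.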

To build a preview of the API docs for the package you're working on run (from root of repo):

make api_docs_clean; make api_docs_quick_preview API_PKG=openai

where API_PKG= should be the parent directory that houses the edited package (e.g. community, openai, anthropic, huggingface, together, mistralai, groq, fireworks, etc.). This should be quite fast for all the partner packages.

Doc pages

Each Embeddings docs page should follow this template.

  • TODO(Erick): populate a complete example

You can use the langchain-cli to quickly get started with a new embeddings integration docs page (run from root of repo):

poetry run pip install -e libs/cli
poetry run langchain-cli integration create-doc --name "foo-bar" --name-class FooBar --component-type Embeddings --destination-dir ./docs/docs/integrations/text_embedding/

where --name is the integration package name without the "langchain-" prefix and --name-class is the class name without the "Embeddings" suffix. This will create a template doc with some autopopulated fields at docs/docs/integrations/text_embedding/foo_bar.ipynb.
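
To check where the generated notebook should land before running the command, the naming in the example above (foo-bar becomes foo_bar.ipynb) can be sketched as follows. The helper expected_doc_path is hypothetical, and the hyphen-to-underscore rule is inferred from that single example rather than taken from the CLI source.

```python
from pathlib import PurePosixPath


def expected_doc_path(name: str, destination_dir: str) -> PurePosixPath:
    """Guess the notebook path for a given --name and --destination-dir.

    Assumption: the CLI swaps hyphens for underscores and appends .ipynb,
    as the foo-bar -> foo_bar.ipynb example suggests.
    """
    filename = name.replace("-", "_") + ".ipynb"
    return PurePosixPath(destination_dir) / filename


path = expected_doc_path("foo-bar", "./docs/docs/integrations/text_embedding/")
print(path)
```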

To build a preview of the docs you can run (from root):

make docs_clean
make docs_build
cd docs/build/output-new
yarn
yarn start

Appendix

Expected sections for the Embedding class docstring.

__package_name__: The full name of the package (e.g., langchain-anthropic)
__module_name__: The snake_case import name of the package (e.g., langchain_anthropic)
__ModuleName__: The CamelCase name of the partner (e.g., Anthropic)
__MODULE_NAME__: The SCREAMING_SNAKE_CASE name of the partner (e.g., ANTHROPIC)

    """__ModuleName__ embedding model integration.

    # TODO: Replace with relevant packages, env vars.
    Setup:
        Install ``__package_name__`` and set environment variable ``__MODULE_NAME___API_KEY``.

        .. code-block:: bash

            pip install -U __package_name__
            export __MODULE_NAME___API_KEY="your-api-key"

    # TODO: Populate with relevant params.
    Key init args — completion params:
        model: str
            Name of __ModuleName__ model to use.

    # TODO: Populate with relevant params.
    Key init args — client params:
        api_key: Optional[SecretStr]

    See full list of supported init args and their descriptions in the params section.

    # TODO: Replace with relevant init params.
    Instantiate:
        .. code-block:: python

            from __module_name__ import __ModuleName__Embeddings

            embed = __ModuleName__Embeddings(
                model="...",
                # api_key="...",
                # other params...
            )

    Embed single text:
        .. code-block:: python

            input_text = "The meaning of life is 42"
            vector = embed.embed_query(input_text)
            print(vector[:3])

        .. code-block:: python

            [-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]

    Embed multiple texts:
        .. code-block:: python

            input_texts = ["Document 1...", "Document 2..."]
            vectors = embed.embed_documents(input_texts)
            print(len(vectors))
            # The first 3 coordinates for the first vector
            print(vectors[0][:3])

        .. code-block:: python

            2
            [-0.024603435769677162, -0.007543657906353474, 0.0039630369283258915]

    # TODO: Delete if native async isn't supported.
    Async:
        .. code-block:: python

            vector = await embed.aembed_query(input_text)
            print(vector[:3])

            # multiple:
            # await embed.aembed_documents(input_texts)

        .. code-block:: python

            [-0.009100092574954033, 0.005071679595857859, -0.0029193938244134188]
    """

Tip: if you copy and paste the template into a template.txt file, you can use the following sed command to fill in the appropriate values for OpenAI (note the g flag on every substitution, since some placeholders appear more than once per line):

sed -e 's/__package_name__/langchain-openai/g' -e 's/__MODULE_NAME__/OPENAI/g' -e 's/__ModuleName__/OpenAI/g' -e 's/__module_name__/langchain_openai/g' template.txt
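
If sed is not available, the same substitution can be sketched in Python. This is only an illustration: fill_template and OPENAI_VALUES are hypothetical names, the sample string is a shortened excerpt of the template, and the value langchain-openai for the package name is an assumption based on the langchain-anthropic example in the placeholder list.

```python
def fill_template(template: str, values: dict) -> str:
    """Replace every occurrence of each placeholder with its value."""
    filled = template
    for placeholder, value in values.items():
        # str.replace substitutes all occurrences, like sed's g flag.
        filled = filled.replace(placeholder, value)
    return filled


# Assumed values for OpenAI; adjust for the provider being documented.
OPENAI_VALUES = {
    "__package_name__": "langchain-openai",
    "__MODULE_NAME__": "OPENAI",
    "__ModuleName__": "OpenAI",
    "__module_name__": "langchain_openai",
}

sample = (
    "Install ``__package_name__`` and set ``__MODULE_NAME___API_KEY``.\n"
    "from __module_name__ import __ModuleName__Embeddings\n"
    'embed = __ModuleName__Embeddings(model="...")\n'
)
print(fill_template(sample, OPENAI_VALUES))
```

Placeholder matching is case-sensitive, so __MODULE_NAME__, __ModuleName__ and __module_name__ do not interfere with one another.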
@efriis efriis added the help wanted and integration-docs labels Jul 31, 2024
@dosubot dosubot bot added the Ɑ: embeddings and 🤖:docs labels Jul 31, 2024
@wulifu2hao
Contributor

Running

poetry run langchain-cli integration create-doc --name "community" --name-class Ollama --component-type Embeddings --destination-dir ./docs/docs/integrations/text_embedding/

results in

ValueError: Unrecognized component_type='Embeddings'. Expected one of 'ChatModel', 'DocumentLoader', 'Tool'.

Am I missing anything?

@efriis
Member Author

efriis commented Jul 31, 2024

Try updating your cli with

poetry run pip install -U langchain-cli
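
To confirm which langchain-cli version is active in the environment before and after upgrading, a small check like the one below may help. It is a sketch using only the standard library; the helper installed_version is hypothetical, and the thread does not say which release added the Embeddings component type. Run it inside the same Poetry environment (for example via poetry run python).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("langchain-cli")
print(version if version else "langchain-cli is not installed in this environment")
```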

ccurme pushed a commit that referenced this issue Aug 1, 2024
- **Description:** Standardize ZhipuAIEmbeddings rich docstrings.
- **Issue:** the issue #24856
ccurme pushed a commit that referenced this issue Aug 3, 2024
- **Description:** Standardize SparkLLMTextEmbeddings docstrings
- **Issue:** the issue #24856
ccurme pushed a commit that referenced this issue Aug 3, 2024
- **Description:** Standardize MiniMaxEmbeddings
  - docs, the issue #24856 
  - model init arg names, the issue #20085
eyurtsev added a commit that referenced this issue Aug 12, 2024
eyurtsev added a commit that referenced this issue Aug 12, 2024
Add API Reference documentation for the FireworksEmbedding model.

Issue: #24856
isahers1 added a commit that referenced this issue Aug 14, 2024
Related issue: #24856

```json
[
   {
      "provider": "mistralai",
      "js": true,
      "local": false,
      "serializable": false,
      "native_async": true
   }
]
```

---------

Co-authored-by: Isaac Francisco
Co-authored-by: isaac hershenson
isahers1 added a commit that referenced this issue Aug 14, 2024
Update AI21 Integration docs

Issue: #24856

---------

Co-authored-by: Isaac Francisco
Co-authored-by: isaac hershenson
isahers1 added a commit that referenced this issue Aug 14, 2024
Issue: #24856

---------

Co-authored-by: Isaac Francisco
Co-authored-by: isaac hershenson
isahers1 added a commit that referenced this issue Aug 14, 2024
#24856

---------

Co-authored-by: Isaac Francisco
Co-authored-by: isaac hershenson
isahers1 added a commit that referenced this issue Aug 14, 2024
This can be finished after the following issue is resolved:

langchain-ai/langchain-cohere#81

Related to: #24856

```json
[
   {
      "provider": "cohere",
      "js": true,
      "local": false,
      "serializable": false
   }
]
```

---------

Co-authored-by: isaac hershenson
Co-authored-by: Isaac Francisco
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
…ain-ai#25292)

Add API Reference documentation for the FireworksEmbedding model.

Issue: langchain-ai#24856
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
…5313)

Issue: langchain-ai#24856

Using the same template for the fake embeddings in langchain_core as
used in the integrations.
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
…i#25247)

Providers:
* fireworks

See related issue:
* langchain-ai#24856

Features:

```json
[
   {
      "provider": "fireworks",
      "js": true,
      "local": false,
      "serializable": false
   }
]
```

---------

Co-authored-by: isaac hershenson
Co-authored-by: Isaac Francisco
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
Related issue: langchain-ai#24856

```json
{
   "provider": "openai",
   "js": true,
   "local": false,
   "serializable": false,
   "async_native": true
}
```

---------

Co-authored-by: Isaac Francisco
Co-authored-by: isaac hershenson
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
olgamurraft pushed a commit to olgamurraft/langchain that referenced this issue Aug 16, 2024
Update together AI embedding integration docs

Related issue: langchain-ai#24856

```json
[
   {
      "provider": "together",
      "js": true,
      "local": false,
      "serializable": false
   }
]
```

---------

Co-authored-by: Isaac Francisco
Co-authored-by: isaac hershenson

dosubot bot commented Nov 11, 2024

Hi, @efriis. I'm Dosu, and I'm helping the LangChain team manage their backlog. I'm marking this issue as stale.

Issue Summary:

Next Steps:

  • Please confirm if this issue is still relevant to the latest version of the LangChain repository by commenting here.
  • If there are no further updates, this issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

@dosubot dosubot bot added the stale label Nov 11, 2024
@dosubot dosubot bot closed this as not planned Nov 18, 2024
@dosubot dosubot bot removed the stale label Nov 18, 2024