Skip to content

[Inference API] Add Completion Inference API for Alibaba Cloud AI Search Model#112512

Merged
davidkyle merged 10 commits intoelastic:mainfrom
Huaixinww:feature/add-alibabacloud-ai-search-completion-model
Sep 12, 2024
Merged

[Inference API] Add Completion Inference API for Alibaba Cloud AI Search Model#112512
davidkyle merged 10 commits intoelastic:mainfrom
Huaixinww:feature/add-alibabacloud-ai-search-completion-model

Conversation

@Huaixinww
Copy link
Copy Markdown
Contributor

@Huaixinww Huaixinww commented Sep 4, 2024

Related to #111181
Add Completion Inference API for Alibaba Cloud AI Search Model.

Prerequisites to Model Creation

An Alibaba Cloud Account with Alibaba Cloud Opensearch access
An api key used to access Alibaba Cloud AI Search Model

Inference Model Creation:

PUT _inference/completion/{inference_model_id}
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "{{api_key}}",
    "service_id": "<<service_id>>",
    "host": "<<host>",
    "workspace": "<<workspace_name>>",
    <<ADDITIONAL SERVICE SETTINGS (see below)>>,
  }
  "task_settings": {
    <<TASK SETTINGS (see below)>>
  }
}

Testing

Creating the inference endpoint for Alibaba Cloud AI Search

PUT _inference/completion/os-completion-test
{
    "service": "alibabacloud-ai-search",
    "service_settings": {
      "host" : "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
      "api_key": "{{API_KEY}}",
      "service_id": "ops-qwen-turbo",
      "workspace" : "default"
    }
}

Performing completion inference:

POST _inference/completion/os-completion-test
{
  "input":["What is Elastic?"]
}

@elasticsearchmachine elasticsearchmachine added v8.16.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Sep 4, 2024
@benwtrent benwtrent added :ml Machine learning and removed needs:triage Requires assignment of a team area label labels Sep 4, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Sep 4, 2024
@weizijun
Copy link
Copy Markdown
Contributor

weizijun commented Sep 4, 2024

Hi, @davidkyle, we have an LLM called qwen in Alibaba Cloud, and we provide inference services for it, here is the Completion Inference API, can you help review it?
Here is the document: https://help.aliyun.com/zh/open-search/search-platform/developer-reference/text-generation-api-details

@davidkyle davidkyle self-assigned this Sep 5, 2024
@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine test this please

@davidkyle
Copy link
Copy Markdown
Member

Thank you @weizijun, I'm excited another contribution from you.

Copy link
Copy Markdown
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. I have a question about why the input must be an odd number of strings, I don't understand the reason for that. Thank you.

@Huaixinww
Copy link
Copy Markdown
Contributor Author

Huaixinww commented Sep 8, 2024

The code looks good. I have a question about why the input must be an odd number of strings, I don't understand the reason for that. Thank you.

hi @davidkyle , Thank you for taking the time to review my code. I really appreciate it!

The reason we check the input is that Alibaba Cloud's completion API supports the functionality for historical conversations.

For example:

curl -XPOST -H"Content-Type: application/json" 
"http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-generation/ops-qwen-turbo" 
-H "Authorization: Bearer ${API-KEY}"   
 -d "{
      \"messages\":[
      {
          \"role\":\"user\",
          \"content\":\"Where is the capital of Henan?\"    # history query
      },
      {
          \"role\":\"assistant\",
          \"content\":\"Zhengzhou\"                          # history answer
      },
      {
          \"role\":\"user\",
          \"content\":\"What fun things are there?\"       # current query
      }
      ],
      \"stream\":false
}"

For details, you can refer to the messages parameter in the body of this document: https://help.aliyun.com/zh/open-search/search-platform/developer-reference/text-generation-api-details

Due to limitations in the inference framework, currently in our implementation, if there are more than one input parameter, we process the first n inputs as history queries and history answers, and use the last parameter as the current query, so we check the number of inputs.

we are also considering placing the historical Q&A in task_settings with a format similar to the one below:

{
	"task_settings":{
		"history":[
			{
				"role":"${role}",
				"content":"${content}"
			},
			{
				"role":"${role}",
				"content":"${content}"
			}
			...
		]
	}
}

This way, the input parameters will only require the current query to be filled in.

@Huaixinww
Copy link
Copy Markdown
Contributor Author

hi~ @davidkyle
If you find the input restrictions confusing, we can limit it to just one input for completion, similar to what Cohere does.

@davidkyle
Copy link
Copy Markdown
Member

Thanks @Huaixinww, your idea for adding history is very inventive given the restrictions of the inference API design, this is a very imaginative solution. In future we want to expose all the options in the Alibaba API so the user can explicitly set the message history and any other options.

Starting with your example, I ran this completion:

POST _inference/completion/ali-chat
{
  "input":["Where is the capital of Henan?", "The capital of Henan is Zhengzhou.", "What fun things are there?" ]
}

And the history is clearly considered in the response.

{
  "completion": [
    {
      "result": "I'm sorry, I do not have enough information to provide a specific list of fun things to do in Zhengzhou, Henan. I can only tell you that Zhengzhou is the capital of Henan province. To find out about fun activities, attractions, or events in Zhengzhou, I would suggest researching local tourism websites, asking locals, or checking out travel guides for the area."
    }
  ]
}

Copy link
Copy Markdown
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine test this please

@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine update branch

@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine test this please

…inference/external/action/alibabacloudsearch/AlibabaCloudSearchCompletionAction.java
@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine test this please

@Huaixinww
Copy link
Copy Markdown
Contributor Author

@elasticmachine test this please

@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine update branch

@davidkyle
Copy link
Copy Markdown
Member

@elasticmachine test this please

@davidkyle
Copy link
Copy Markdown
Member

@szabosteve please can you update the docs with the new completion task

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.x

davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 12, 2024
elasticsearchmachine pushed a commit that referenced this pull request Sep 17, 2024
…rch Model (#112512) (#112814)

Co-authored-by: Huaixinww <141887897+Huaixinww@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
@Huaixinww Huaixinww deleted the feature/add-alibabacloud-ai-search-completion-model branch September 25, 2024 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :ml Machine learning Team:ML Meta label for the ML team v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants