
[typescript] Save output.data with text content instead of response data #603

Merged
1 commit merged into main from pr603 on Dec 27, 2023

Conversation

@rossdanlm (Contributor) commented Dec 23, 2023

[typescript] Save output.data with text content instead of response data

This comes after Sarmad's schema updates in #589. To keep diffs small and easier to review, this simply converts from model-specific outputs --> pure text. I have a diff in #610 which converts from pure text --> OutputData format.

We only needed to update hf.ts and openai.ts, because palm.ts already returns output as a string | null type.
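
A minimal sketch of the resulting output shape (the ExecuteResult type and buildOutput helper below are illustrative stand-ins, not the exact aiconfig types):

```typescript
// Illustrative local type, not the exact aiconfig schema.
type ExecuteResult = {
  output_type: "execute_result";
  data: unknown;
  metadata?: Record<string, unknown>;
};

// Before: output.data held the whole model-specific message/response object.
// After: output.data holds just the text; the raw response can be kept in metadata.
function buildOutput(message: { content: string | null }): ExecuteResult {
  return {
    output_type: "execute_result",
    data: message.content ?? "",
    metadata: { rawResponse: message },
  };
}
```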

Ran yarn automated tests, but there aren't any specifically for openai. I also ran the typescript demos to make sure that they still work. Run these commands from aiconfig top-level dir:

npx ts-node typescript/demo/function-call-stream.ts
npx ts-node typescript/demo/demo.ts
npx ts-node typescript/demo/test-hf.ts

For the extensions, we only have TypeScript for hf.ts (trivial: just changed response to response.generated_text), while llama.ts already outputs text, so no changes were needed.
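
A rough before/after of that hf.ts change, assuming a response object with a generated_text field (the type alias below is a stand-in, not the library's own type):

```typescript
// Stand-in type for the text-generation response (assumption for illustration).
type TextGenerationResponse = { generated_text: string };

// Previously the whole response object was saved as output.data;
// now only the generated text string is.
function toOutputData(response: TextGenerationResponse): string {
  return response.generated_text;
}
```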

TODO

I still need to add function call support directly to OutputData format. See

@saqadri (Contributor) commented Dec 26, 2023

FYI to run demo.ts, simply run

ts-node demo.ts

in that folder. You may need to install ts-node if you don't already have it, from here: https://www.npmjs.com/package/ts-node#installation

(Resolved review threads on extensions/HuggingFace/typescript/hf.ts and typescript/lib/parsers/openai.ts)

@rholinshead (Contributor) commented:

Can you make sure to test the function calling example,

npx ts-node demo/function-call-stream.ts

I think it could be broken by the change to move it to metadata here?

Also, just curious why you chose to do this first and then move to the OutputData type in #610 instead of doing it in one go? I agree that using OutputData with 'string' kind is the correct way to go

@saqadri (Contributor) left a review:

Requesting changes on the breaking changes introduced here.

(Resolved review threads on extensions/HuggingFace/typescript/hf.ts, typescript/lib/parsers/hf.ts, and typescript/lib/parsers/openai.ts)
@saqadri (Contributor) commented Dec 26, 2023

> I agree that using OutputData with 'string' kind is the correct way to go

@rholinshead is that significantly different on the frontend? It would be a much simpler JSON/YAML file for string outputs if the "string" kind were inferred when typeof output.data is string.

@rossdanlm (Contributor, Author) commented:

> Can you make sure to test the function calling example,
>
> npx ts-node demo/function-call-stream.ts
>
> I think it could be broken by the change to move it to metadata here?
>
> Also, just curious why you chose to do this first and then move to the OutputData type in #610 instead of doing it in one go? I agree that using OutputData with 'string' kind is the correct way to go

Thanks @rholinshead for flagging; you are totally correct that this breaks because of the way we implement it in places like this. Moving forward, though, this isn't something we should continue to support; we should use the function call directly from OutputData or metadata instead.

[Screenshot: 2023-12-26 16:00]

@saqadri I know this breaks existing code that calls into this API, but we can't both keep supporting old formats and make our output types easy to parse and understand. The getOutputText() methods are still supported; it's just that anyone relying on the old data structure will see their implementation break.

@saqadri (Contributor) commented Dec 26, 2023

> @saqadri I know this breaks existing code that calls into this API, but we can't both keep supporting old formats and make our output types easy to parse and understand. The getOutputText() methods are still supported; it's just that anyone relying on the old data structure will see their implementation break.

Can you share how the function call script should be updated to support the new format? One other option could be adding an output kind "function" to the schema so it's distinguished from plain string.

We cannot have breaking changes without explicitly mentioning them and having a clear major version bump. The issue is that there are several implicit breaking changes introduced in this stack.

@rossdanlm (Contributor, Author) commented Dec 26, 2023

> Can you share how the function call script should be updated to support the new format? One other option could be adding an output kind "function" to the schema so it's distinguished from plain string.

Saw your comments; I didn't have time to update the diff summary before you replied, but I just did. Basically we need to do this:

[Screenshot: 2023-12-26 16:14]

I updated the change in the function-call-stream.ts file.

> We cannot have breaking changes without explicitly mentioning them and having a clear major version bump. The issue is that there are several implicit breaking changes introduced in this stack.

Alright, I'm in favour of doing a clear major version bump (I haven't posted the changelog yet for this reason). Let's also include in this NOT supporting arbitrary JSON in the OutputData schema.

@rossdanlm (Contributor, Author) commented:

Updated based on feedback; main changes:

  • sending rawResponse as a field in metadata for all non-streaming options
  • no longer using delta instead of accumulatedData; created a task for it: [ts] HuggingFace extension uses delta instead of accumulatedText in output.data #620
  • made a note to support function calling more directly as part of OutputData; I will do this at the end of the diff stack
  • supported all old AIConfig schemas with backwards compatibility inside of the getOutputText() methods

Note that this change will break the SDK implementation. For example, instead of checking for data.content you now just check data (see the sketch below). I will land the changelog for today before landing these changes, so this won't be included or mentioned in that.
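
A hedged sketch of what that caller-side check might look like, with a fallback for the old shape (the extractText helper and its types are illustrative, not part of the SDK):

```typescript
// Illustrative helper, not part of the aiconfig SDK.
function extractText(output: { data: unknown }): string | undefined {
  if (typeof output.data === "string") {
    // New format: data is the text itself.
    return output.data;
  }
  if (output.data && typeof output.data === "object" && "content" in output.data) {
    // Old format: data was the full message object with a `content` field.
    return (output.data as { content?: string | null }).content ?? undefined;
  }
  return undefined;
}
```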

@rholinshead (Contributor) commented Dec 26, 2023

> I agree that using OutputData with 'string' kind is the correct way to go
>
> @rholinshead is that significantly different on the frontend? It would be a much simpler JSON/YAML file for string outputs if the "string" kind were inferred when typeof output.data is string.

It isn't significantly different on the front end; it's just checking whether the type is a string or an object first, instead of just checking the keys of the object (see the sketch below). In general, I was just thinking it's a much cleaner API to get rid of the redundant union types (i.e. being able to represent string output in 2 supported ways) without a clear standard or recommendation.

If it's a lot nicer for the JSON/YAML format, then maybe it's worth supporting a raw string there; we should just consider:

  • what is our recommendation / standard for how we think it should be used? All our example configs should be consistent
  • do we even want to keep {kind: 'string'} then, or just remove it to have a single way (raw text as the data value for text outputs)?
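
For reference, a minimal sketch of the kind-inference idea discussed above, assuming a simple {kind, value} OutputData shape (not necessarily the final schema):

```typescript
// Assumed OutputData shape for illustration; the real schema may differ.
type OutputData = { kind: string; value: unknown };

// Treat a bare string as kind "string"; otherwise pass the structured data through.
function normalizeOutputData(data: string | OutputData): OutputData {
  return typeof data === "string" ? { kind: "string", value: data } : data;
}
```

With something like this, the front end and getOutputText() could normalize first and then deal with a single structured shape.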

@rholinshead (Contributor) commented:

> Let's also include in this NOT supporting arbitrary JSON in the OutputData schema.

Based on the previous PR discussion with @saqadri, I think we should still support arbitrary JSON for now and just provide the new structured types. We can revisit later and make a better-informed decision with more data (e.g. once we see more examples of user-created parsers and know how they're using them).

Review thread on the following lines:

```typescript
const message = output.data as ChatCompletionMessageParam;
const rawResponse = output.metadata?.rawResponse;
const function_call = rawResponse?.function_call;
console.log("function_call=", function_call);
```
Contributor review comment:

nit, can remove

Author reply:

I wanted to keep it because we have other console.log()s earlier and it shows more clearly in the demo how function_call is being used. Ex: https://github.com/lastmile-ai/aiconfig/blob/main/typescript/demo/function-call-stream.ts#L176

Review thread on the following diff:

```diff
 // If the prompt has output saved, add it to the messages array
-if (outputMessage.role === "assistant") {
-  messages.push(outputMessage);
+if (output.metadata?.role === "assistant") {
```
Contributor review comment:

I think we still need to handle the existing outputMessage-as-ChatCompletionMessageParam case here for backwards compatibility on deserialize. We can reserialize in our updated format, but it's possible the config being loaded has the old format.

I think you can do a similar check to the one above:

```typescript
if (typeof output.data === "string") {
  // ... handle how you do here
} else if (output.data.hasOwnProperty("role")) {
  // for backwards compatibility
  const outputMessage = output.data as unknown as Chat.ChatCompletionMessageParam;
  if (outputMessage.role === "assistant") {
    messages.push(outputMessage);
  }
}
```

Author reply:

You are correct, thank you for catching!

Review thread on lines 212 to 213:

```typescript
const rawResponse = output.metadata?.rawResponse;
const function_call = rawResponse?.function_call;
```

Contributor review comment:

I don't think this is a good model. If the only way for someone to support function calling is by delving into the raw response, then we've missed a critical scenario.

Author reply:

OK, after rebasing onto #628 I will update this, but will do it at the end of the diff stack. I just want to keep this PR string-only to make it easier to review; then the rest of the diffs will be easier to follow up on.

@saqadri (Contributor) left a review:

Approving to unblock :). We are aligned on the acceptable breaking changes to the SDK while still supporting previous aiconfig schemas.

Also, as you land this stack, please document the breaking changes explicitly in our changelog doc for next week so we can communicate them clearly.

Contributor review comment:

Delete this

@rossdanlm merged commit a1e1378 into main on Dec 27, 2023. 2 checks passed.
@rossdanlm deleted the pr603 branch on December 27, 2023 at 01:32.