feat (provider/anthropic): pdf support (#3458)
lgrammel authored Nov 2, 2024
1 parent 2cbed46 commit 4d2e53b
Showing 11 changed files with 554 additions and 278 deletions.
5 changes: 5 additions & 0 deletions .changeset/brave-bikes-happen.md
@@ -0,0 +1,5 @@
---
'@ai-sdk/anthropic': patch
---

feat (provider/anthropic): pdf support
5 changes: 3 additions & 2 deletions content/docs/02-foundations/03-prompts.mdx
@@ -199,9 +199,10 @@ const result = await generateText({
 <Note type="warning">
   Only a few providers and models currently support file parts: [Google
   Generative AI](/providers/ai-sdk-providers/google-generative-ai), [Google
-  Vertex AI](/providers/ai-sdk-providers/google-vertex), and
+  Vertex AI](/providers/ai-sdk-providers/google-vertex),
   [OpenAI](/providers/ai-sdk-providers/openai) (for `wav` and `mp3` audio with
-  `gpt-4o-audio-preview)
+  `gpt-4o-audio-preview), [Anthropic](/providers/ai-sdk-providers/anthropic)
+  (for `pdf`).
 </Note>

User messages can include file parts. A file can be one of the following:
32 changes: 32 additions & 0 deletions content/providers/01-ai-sdk-providers/05-anthropic.mdx
@@ -280,6 +280,38 @@ Parameters:

These tools can be used in conjunction with the `sonnet-3-5-sonnet-20240620` model to enable more complex interactions and tasks.

### PDF support

Anthropic Sonnet `claude-3-5-sonnet-20241022` supports reading PDF files.
You can pass PDF files as part of the message content using the `file` type:

```ts
const result = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'What is an embedding model according to this document?',
        },
        {
          type: 'file',
          data: fs.readFileSync('./data/ai.pdf'),
          mimeType: 'application/pdf',
        },
      ],
    },
  ],
});
```

The model will have access to the contents of the PDF file and
respond to questions about it.
The PDF file should be passed using the `data` field,
and the `mimeType` should be set to `'application/pdf'`.

### Model Capabilities

See also [Anthropic Model Comparison](https://docs.anthropic.com/en/docs/about-claude/models#model-comparison).
1 change: 0 additions & 1 deletion examples/ai-core/src/generate-text/anthropic-image.ts
@@ -6,7 +6,6 @@ import fs from 'node:fs';
 async function main() {
   const result = await generateText({
     model: anthropic('claude-3-5-sonnet-20240620'),
-    maxTokens: 512,
     messages: [
       {
         role: 'user',
30 changes: 30 additions & 0 deletions examples/ai-core/src/generate-text/anthropic-pdf.ts
@@ -0,0 +1,30 @@
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';
import 'dotenv/config';
import fs from 'node:fs';

async function main() {
  const result = await generateText({
    model: anthropic('claude-3-5-sonnet-20241022'),
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'What is an embedding model according to this document?',
          },
          {
            type: 'file',
            data: fs.readFileSync('./data/ai.pdf'),
            mimeType: 'application/pdf',
          },
        ],
      },
    ],
  });

  console.log(result.text);
}

main().catch(console.error);
1 change: 0 additions & 1 deletion examples/ai-core/src/stream-text/anthropic-image.ts
@@ -6,7 +6,6 @@ import fs from 'node:fs';
 async function main() {
   const result = await streamText({
     model: anthropic('claude-3-5-sonnet-20240620'),
-    maxTokens: 512,
     messages: [
       {
         role: 'user',
32 changes: 32 additions & 0 deletions examples/ai-core/src/stream-text/anthropic-pdf.ts
@@ -0,0 +1,32 @@
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
import 'dotenv/config';
import fs from 'node:fs';

async function main() {
  const result = await streamText({
    model: anthropic('claude-3-5-sonnet-20241022'),
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'What is an embedding model according to this document?',
          },
          {
            type: 'file',
            data: fs.readFileSync('./data/ai.pdf'),
            mimeType: 'application/pdf',
          },
        ],
      },
    ],
  });

  for await (const textPart of result.textStream) {
    process.stdout.write(textPart);
  }
}

main().catch(console.error);
15 changes: 14 additions & 1 deletion packages/anthropic/src/anthropic-api-types.ts
@@ -12,7 +12,10 @@ export type AnthropicCacheControl = { type: 'ephemeral' };
 export interface AnthropicUserMessage {
   role: 'user';
   content: Array<
-    AnthropicTextContent | AnthropicImageContent | AnthropicToolResultContent
+    | AnthropicTextContent
+    | AnthropicImageContent
+    | AnthropicDocumentContent
+    | AnthropicToolResultContent
   >;
 }

@@ -37,6 +40,16 @@ export interface AnthropicImageContent {
   cache_control: AnthropicCacheControl | undefined;
 }

+export interface AnthropicDocumentContent {
+  type: 'document';
+  source: {
+    type: 'base64';
+    media_type: 'application/pdf';
+    data: string;
+  };
+  cache_control: AnthropicCacheControl | undefined;
+}
+
 export interface AnthropicToolCallContent {
   type: 'tool_use';
   id: string;
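
Note on the new `AnthropicDocumentContent` type: its `source` shape implies that a PDF file part is base64-encoded before it is sent to the API. The sketch below illustrates that mapping under stated assumptions. `toDocumentContent` is a hypothetical helper written for this note, not a function from the commit; the real conversion presumably happens inside `convertToAnthropicMessagesPrompt`, whose diff is not shown on this page. The two type declarations are copied from the diff so the block is self-contained.

```typescript
// Types copied from the diff above so that this sketch stands alone.
type AnthropicCacheControl = { type: 'ephemeral' };

interface AnthropicDocumentContent {
  type: 'document';
  source: {
    type: 'base64';
    media_type: 'application/pdf';
    data: string;
  };
  cache_control: AnthropicCacheControl | undefined;
}

// Hypothetical helper (an assumption, not the SDK's code): turn raw file
// bytes plus a MIME type into a document content part.
function toDocumentContent(
  data: Uint8Array,
  mimeType: string,
): AnthropicDocumentContent {
  // The type in the diff only admits PDFs, so reject everything else.
  if (mimeType !== 'application/pdf') {
    throw new Error(`Unsupported file type: ${mimeType}`);
  }

  return {
    type: 'document',
    source: {
      type: 'base64',
      media_type: 'application/pdf',
      // Buffer.from copies the bytes; toString('base64') encodes them.
      data: Buffer.from(data).toString('base64'),
    },
    cache_control: undefined,
  };
}

// Usage with a few placeholder bytes instead of a real PDF file.
const samplePart = toDocumentContent(Buffer.from('%PDF-1.4'), 'application/pdf');
console.log(samplePart.type, samplePart.source.media_type);
```

Rejecting other MIME types mirrors the literal `'application/pdf'` in the type definition; whether the SDK throws, warns, or ignores such parts is not visible in this diff.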
20 changes: 13 additions & 7 deletions packages/anthropic/src/anthropic-messages-language-model.ts
@@ -102,10 +102,11 @@ export class AnthropicMessagesLanguageModel implements LanguageModelV1 {
       });
     }

-    const messagesPrompt = convertToAnthropicMessagesPrompt({
-      prompt,
-      cacheControl: this.settings.cacheControl ?? false,
-    });
+    const { prompt: messagesPrompt, betas: messagesBetas } =
+      convertToAnthropicMessagesPrompt({
+        prompt,
+        cacheControl: this.settings.cacheControl ?? false,
+      });

     const baseArgs = {
       // model id:
@@ -127,12 +128,17 @@ export class AnthropicMessagesLanguageModel implements LanguageModelV1 {

     switch (type) {
       case 'regular': {
-        const { tools, tool_choice, toolWarnings, betas } = prepareTools(mode);
+        const {
+          tools,
+          tool_choice,
+          toolWarnings,
+          betas: toolsBetas,
+        } = prepareTools(mode);

         return {
           args: { ...baseArgs, tools, tool_choice },
           warnings: [...warnings, ...toolWarnings],
-          betas,
+          betas: new Set([...messagesBetas, ...toolsBetas]),
         };
       }

@@ -152,7 +158,7 @@ export class AnthropicMessagesLanguageModel implements LanguageModelV1 {
             tool_choice: { type: 'tool', name },
           },
           warnings,
-          betas: new Set<string>(),
+          betas: messagesBetas,
         };
       }

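
Note on the `betas` handling: the two hunks above collect beta flags from the converted messages and from the prepared tools, then merge them into one set so that a flag required by both sources appears only once. The sketch below illustrates that merge and one plausible way to send the result. The flag strings, the `unionBetas` and `betaHeaders` helpers, and the use of a comma-separated `anthropic-beta` header are assumptions made for illustration; none of them appear in the diff shown here.

```typescript
// Illustrative flag values only; the diff does not show which strings
// convertToAnthropicMessagesPrompt or prepareTools actually return.
const messagesBetas = new Set<string>(['pdfs-2024-09-25']);
const toolsBetas = new Set<string>(['computer-use-2024-10-22']);

// Union of several flag sets. Same result as the spread form in the diff
// (new Set([...a, ...b])), written with forEach so it does not depend on
// the compiler's iteration settings.
function unionBetas(...sets: Array<Set<string>>): Set<string> {
  const merged = new Set<string>();
  for (const set of sets) {
    set.forEach(flag => merged.add(flag));
  }
  return merged;
}

// Hypothetical helper: render the flags as one comma-separated header value
// and omit the header entirely when no beta feature is in use.
function betaHeaders(betas: Set<string>): Record<string, string> {
  return betas.size > 0
    ? { 'anthropic-beta': Array.from(betas).join(',') }
    : {};
}

const mergedBetas = unionBetas(messagesBetas, toolsBetas);
console.log(betaHeaders(mergedBetas));
```

Using a `Set` rather than an array keeps the merge idempotent: merging a set with itself, or with a set that repeats a flag, does not produce duplicates in the header value.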