Commit

Revert ":sparkles: Add PdfLoader (#74)"
This reverts commit 36b593a.
lowczarc authored Sep 26, 2023
1 parent 89b73ed commit 29bbc72
Showing 11 changed files with 181 additions and 312 deletions.
10 changes: 0 additions & 10 deletions examples/dataloader.ts

This file was deleted.

Binary file added examples/dataloader/AudioLoader.mp3
Binary file not shown.
Binary file added examples/dataloader/PdfLoader.pdf
Binary file not shown.
34 changes: 34 additions & 0 deletions examples/dataloader/StringLoader.ts
@@ -0,0 +1,34 @@
export default `
The Meaty Code of Sir Loin
In the bustling town of Byteville, there was a developer unlike any other.
Sir Loin, as he was fondly called, had a peculiar method of coding.
While others used keyboards and mice, Sir Loin used meats. Yes, you read that right. Meats!
Sir Loin believed that every piece of meat had a unique texture and essence that could be translated into code.
He had a special butcher's table instead of a regular desk.
On it, instead of the usual computer setup, there were slabs of meat of various kinds:
beef, chicken, pork, and even some exotic ones like kangaroo and ostrich.
Each meat type represented a different programming language. Beef was for Java, given its robustness.
Chicken, being versatile, was for Python. Pork, with its layers, was for HTML/CSS, and the exotic meats were for the lesser-known languages.
Sir Loin's coding process was a sight to behold. To start a new project, he'd tenderize a piece of meat, marinating it with special herbs representing libraries and frameworks. For debugging, he'd sniff the meat. Freshness indicated bug-free code, while any off-odors meant there were errors.
His IDE? A massive grill. To compile the code, he'd cook the meat.
The cooking time varied depending on the complexity of the project.
Once done, he'd feed the meat to his pet, Byte, a mini pig with a keen sense for code quality.
If Byte ate the meat without hesitation, the code was perfect.
If Byte hesitated or refused, it was back to the butcher's table for Sir Loin.
People from all over came to witness this bizarre coding method.
Many were skeptical, but when they saw the efficiency and tasted the delicious results, they were believers.
Sir Loin's applications were not only functional but also had this unique 'flavor' to them, making them popular across Byteville.
One day, a challenge arose.
The town's main server, which held crucial data, crashed.
The best developers tried to revive it but to no avail.
The mayor, in desperation, turned to Sir Loin. With a massive slab of wagyu beef (reserved for the most complex of codes), Sir Loin got to work.
He marinated, grilled, debugged, and re-grilled for hours.
Byte, after tasting the wagyu, gave a satisfied oink.
And just like that, the server whirred back to life.
Sir Loin's meaty method became legendary.
He started a school, "The Butcher's Code", teaching others the art of meat coding.
It was a rigorous program, combining culinary skills with coding, but it produced some of the finest developers Byteville had ever seen.
Years later, people would speak of the legend of Sir Loin, the meat-coding maestro.
They'd talk about how he changed the face of development in Byteville, all while serving up some of the most delicious barbecues the town had ever tasted.
And so, in the heart of Byteville, amidst all the tech and modern coding methods, there remained a place where meats and codes blended in a symphony, all thanks to the genius of Sir Loin.
`;
41 changes: 41 additions & 0 deletions examples/dataloader/TextFileLoader.csv
@@ -0,0 +1,41 @@
ID,Name,Email,Age,City,Product
1,John Doe,[email protected],34,New York,TV
2,Jane Smith,[email protected],28,Los Angeles,Laptop
3,Robert Brown,[email protected],45,Chicago,Refrigerator
4,Emily Davis,[email protected],22,Houston,Smartphone
5,Chris Wilson,[email protected],31,Phoenix,Tablet
6,Lucy Johnson,[email protected],40,Philadelphia,Watch
7,Michael White,[email protected],50,San Antonio,Microwave
8,Emma Jones,[email protected],29,San Diego,Camera
9,Brian Anderson,[email protected],37,Dallas,Headphones
10,Sophia Martinez,[email protected],25,San Jose,Bicycle
11,James Taylor,[email protected],32,Austin,Bookshelf
12,Nicole Lee,[email protected],27,Jacksonville,Blender
13,William Harris,[email protected],44,San Francisco,Oven
14,Lily Clark,[email protected],24,Indianapolis,Coffee Maker
15,Kevin Lewis,[email protected],42,Columbus,Vacuum Cleaner
16,Olivia Young,[email protected],35,Charlotte,Radio
17,Richard Hall,[email protected],49,Seattle,Printer
18,Grace Allen,[email protected],23,Denver,Washing Machine
19,Joseph Turner,[email protected],52,Boston,Smartphone
20,Anna Walker,[email protected],26,El Paso,Toaster
21,David King,[email protected],38,Nashville,Recliner
22,Mia Wright,[email protected],30,Portland,Dishwasher
23,Charles Rodriguez,[email protected],41,Memphis,Rice Cooker
24,Chloe Green,[email protected],33,Las Vegas,Gaming Console
25,Stephen Parker,[email protected],47,Louisville,Fan
26,Zoe Thompson,[email protected],21,Milwaukee,Grill
27,Benjamin Adams,[email protected],39,Albuquerque,Air Purifier
28,Ella Nelson,[email protected],36,Tucson,Router
29,Andrew Mitchell,[email protected],53,Fresno,Smart Watch
30,Madison Scott,[email protected],20,Sacramento,Smartphone
31,Daniel Rivera,[email protected],55,Kansas City,Smartphone
32,Isabella Morris,[email protected],42,Long Beach,Juicer
33,Raymond Carter,[email protected],48,Mesa,Sunglasses
34,Abigail Price,[email protected],43,Atlanta,Notebook
35,Edward Perez,[email protected],46,Omaha,Kettle
36,Victoria Hughes,[email protected],29,Cleveland,DVD Player
37,Anthony Collins,[email protected],40,Minneapolis,Smartphone
38,Jessica Rogers,[email protected],51,Tampa,Hat
39,Mark Flores,[email protected],37,Orlando,Watch
40,Laura Torres,[email protected],45,New Orleans,Backpack
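
The example script in examples/dataloader/dataloader.ts asks the model to return only the rows whose Product column is Smartphone. A deterministic filter gives a baseline to compare the model's answer against. The sketch below is not part of the commit: `rowsWithProduct` is a hypothetical helper, the inline CSV is a five-row excerpt with placeholder email values, and the naive comma split assumes the file contains no quoted fields.

```typescript
// Five-row excerpt of TextFileLoader.csv; the email values are placeholders.
const csv = `ID,Name,Email,Age,City,Product
4,Emily Davis,emily@example.com,22,Houston,Smartphone
5,Chris Wilson,chris@example.com,31,Phoenix,Tablet
19,Joseph Turner,joseph@example.com,52,Boston,Smartphone
29,Andrew Mitchell,andrew@example.com,53,Fresno,Smart Watch
30,Madison Scott,madison@example.com,20,Sacramento,Smartphone`;

// Hypothetical helper: keep the data rows whose Product column equals `product`.
// Assumes a header row and no quoted or escaped commas.
function rowsWithProduct(csvText: string, product: string): string[] {
  const [header, ...rows] = csvText.trim().split(/\r?\n/);
  const productIndex = header.split(",").indexOf("Product");
  if (productIndex === -1) {
    throw new Error("CSV has no Product column");
  }
  return rows.filter((row) => row.split(",")[productIndex] === product);
}

// Collect the IDs of the matching rows (first column).
const smartphoneIds = rowsWithProduct(csv, "Smartphone").map((row) => row.split(",")[0]);
console.log(smartphoneIds); // logs the IDs 4, 19 and 30
```

Run against the full 40-row file, the same filter would select IDs 4, 19, 30, 31 and 37. ID 29 (Smart Watch) is a near miss that is worth checking for in the model's answer.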
69 changes: 69 additions & 0 deletions examples/dataloader/dataloader.ts
@@ -0,0 +1,69 @@
import { generate } from "../../lib/index";
import { AudioLoader, StringLoader, TextFileLoader } from "../../lib/dataloader";
import { generate } from "../../lib/generate";
import weirdStory from "./StringLoader";
import fs from "fs";

const clientOptions = {
endpoint: "https://api.polyfact.com",
token: "<YOUR_POLYFACT_TOKEN>", // You can get one at https://app.polyfact.com/
};
(async () => {
// Generate and ask question from an audio file
fs.readFile(`${__dirname}/AudioLoader.mp3`, async (_err, data) => {
const response = await generate(
"The man ask a question. Find and Answer at this one.",
{
data: AudioLoader(data),
model: "gpt-4",
},
clientOptions,
);
console.log(response);
});

// Generate and ask question from a text file with token usage

fs.readFile(`${__dirname}/TextFileLoader.csv`, async (_err, data) => {
const res2 = await generate(
"Give me back the only rows for people who bought a Smartphone (Product column)\n",
{
data: TextFileLoader(data),
model: "gpt-4",
infos: true,
},
clientOptions,
);

console.log(res2);
});

// Generate and ask question from a big string with token usage
const res3 = await generate(
"What are strange or unreal in this story ?",
{
data: StringLoader(weirdStory),
model: "gpt-4",
infos: true,
},
clientOptions,
);

console.log(res3);

// Generate a stream and ask question from an audio file
fs.readFile(`${__dirname}/AudioLoader.mp3`, async (_err, data) => {
const stream = generate(
"What does this audio talk about? ",
{
data: AudioLoader(data),
model: "gpt-3.5-turbo-16k",
stream: true,
},
clientOptions,
);
stream.pipe(process.stdout);
await new Promise((res) => stream.on("end", res));
process.stdout.write("\n");
});
})();
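
The example above reads its input files with callback-style `fs.readFile` and discards the error argument (`_err`), so a missing file would surface later as a confusing failure inside the loader. A minimal sketch of a promise-based alternative follows. `withFile` and `demo` are hypothetical names that do not exist in the repository, and the demo works on a temporary file so it runs without the example assets or an API token.

```typescript
import { promises as fs } from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper (not part of the commit): read a file and pass its
// contents to a callback. A failed read rejects the returned promise
// instead of being silently ignored.
async function withFile<T>(
  filePath: string,
  use: (data: Buffer) => Promise<T> | T,
): Promise<T> {
  const data = await fs.readFile(filePath); // rejects on ENOENT, EACCES, ...
  return use(data);
}

// Demo on a throwaway file. In the real example the callback would call
// generate(...) with TextFileLoader(data) or AudioLoader(data).
async function demo(): Promise<number> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "with-file-demo-"));
  const filePath = path.join(dir, "sample.csv");
  try {
    await fs.writeFile(filePath, "ID,Product\n1,TV\n");
    return await withFile(filePath, (data) => data.length);
  } finally {
    // Clean up the temporary directory whether or not the read succeeded.
    await fs.rm(dir, { recursive: true, force: true });
  }
}

demo()
  .then((byteCount) => console.log(`read ${byteCount} bytes`))
  .catch((err) => {
    console.error("could not read file:", err);
    process.exitCode = 1;
  });
```

As committed, the example also imports `generate` from both "../../lib/index" and "../../lib/generate". TypeScript normally rejects two imports that bind the same name in one module, so keeping a single import is likely necessary for the file to compile.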
67 changes: 22 additions & 45 deletions lib/dataloader/index.ts
@@ -1,4 +1,3 @@
-import { getDocument, GlobalWorkerOptions, version } from "pdfjs-dist";
import { Memory } from "../memory";
import { transcribe } from "../transcribe";
import { splitString } from "../split";
@@ -21,25 +20,34 @@ async function batchify<T extends Array<unknown>>(

export type LoaderFunction = (memory: Memory, clientOptions: InputClientOptions) => Promise<void>;

-export function StringLoader(str: string, maxTokenPerChunk = 100): LoaderFunction {
-return async function loadStringIntoMemory(
+export function TextFileLoader(file: FileInput, maxTokenPerChunk = 100): LoaderFunction {
+return async function loadPdfIntoMemory(
memory: Memory,
_clientOptions: InputClientOptions = {},
) {
-const splittedStr = splitString(str, maxTokenPerChunk);
+const fileBuffer = await fileInputToBuffer(file);
+const splittedFile = splitString(fileBuffer.toString("utf8"), maxTokenPerChunk);

async function addBatchIntoMemory(batches: string[]) {
await Promise.all(batches.map(async (batch) => memory.add(batch)));
}

-await batchify(splittedStr, 10, addBatchIntoMemory);
+await batchify(splittedFile, 10, addBatchIntoMemory);
};
}

-export function TextFileLoader(file: FileInput, maxTokenPerChunk = 100): LoaderFunction {
-return async function loadTextIntoMemory(...args) {
-const fileBuffer = await fileInputToBuffer(file);
-return StringLoader(fileBuffer.toString("utf8"), maxTokenPerChunk)(...args);
+export function StringLoader(str: string, maxTokenPerChunk = 100): LoaderFunction {
+return async function loadPdfIntoMemory(
+memory: Memory,
+_clientOptions: InputClientOptions = {},
+) {
+const splittedStr = splitString(str, maxTokenPerChunk);
+
+async function addBatchIntoMemory(batches: string[]) {
+await Promise.all(batches.map(async (batch) => memory.add(batch)));
+}
+
+await batchify(splittedStr, 10, addBatchIntoMemory);
};
}

@@ -50,44 +58,13 @@ export function AudioLoader(file: FileInput, maxTokenPerChunk = 100): LoaderFunc
) {
const fileBuffer = await fileInputToBuffer(file);
const transcription = await transcribe(fileBuffer, clientOptions);
-return StringLoader(transcription, maxTokenPerChunk)(memory, clientOptions);
-};
-}
-
-export async function pdfToString(pdf: Uint8Array): Promise<string> {
-if (typeof window === "undefined") {
-GlobalWorkerOptions.workerSrc = `pdfjs-dist/build/pdf.worker.js`;
-} else {
-GlobalWorkerOptions.workerSrc = `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/pdf.worker.js`;
-}
-const pdfDocument = await getDocument(pdf).promise;
-const pagesPromises = [];
-
-for (let i = 1; i <= pdfDocument.numPages; i++) {
-pagesPromises.push(pdfDocument.getPage(i));
-}
+const transcriptions = splitString(transcription, maxTokenPerChunk);

-const pages = await Promise.all(pagesPromises);
-
-const textEntries = await Promise.all(
-pages.map(async (page) => {
-const pageObject = await page.getTextContent();
-
-return pageObject.items
-.map((e) => ("str" in e ? e.str : ""))
-.filter((e) => e !== "")
-.join("\n");
-}),
-);
-
-return textEntries.join("\n");
-}
+async function addBatchIntoMemory(batches: string[]) {
+await Promise.all(batches.map(async (batch) => memory.add(batch)));
+}

-export function PdfLoader(file: FileInput, maxTokenPerChunk = 100): LoaderFunction {
-return async function loadPdfIntoMemory(...args) {
-const fileBuffer = await fileInputToBuffer(file);
-const pdfText = await pdfToString(new Uint8Array(fileBuffer));
-return StringLoader(pdfText, maxTokenPerChunk)(...args);
+await batchify(transcriptions, 10, addBatchIntoMemory);
};
}
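
After this revert, StringLoader, TextFileLoader and AudioLoader each repeat the same sequence: split the text into chunks, group the chunks into batches of 10, and add each batch to memory concurrently. The sketch below illustrates that pattern in isolation. It rests on stated assumptions: the body of `batchify` is not shown in this diff, so the version here is a guess at its behavior with a simplified signature, `splitByWords` is a naive stand-in for `splitString` (which counts tokens, not words), and `MemoryLike` models only the `add` method of the library's `Memory` class.

```typescript
// Assumption: only `add` is needed from the library's Memory class.
interface MemoryLike {
  add(chunk: string): Promise<void>;
}

// Hypothetical batchify: process `items` in consecutive groups of `size`,
// waiting for each group to finish before starting the next.
async function batchify<T>(
  items: T[],
  size: number,
  callback: (batch: T[]) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < items.length; i += size) {
    await callback(items.slice(i, i + size));
  }
}

// Naive stand-in for splitString: chunks by word count rather than tokens.
function splitByWords(text: string, maxWordsPerChunk: number): string[] {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWordsPerChunk) {
    chunks.push(words.slice(i, i + maxWordsPerChunk).join(" "));
  }
  return chunks;
}

// Shared core of the three loaders: split, batch by 10, add concurrently.
async function loadString(
  memory: MemoryLike,
  text: string,
  maxWordsPerChunk = 100,
): Promise<void> {
  const chunks = splitByWords(text, maxWordsPerChunk);
  await batchify(chunks, 10, async (batch) => {
    await Promise.all(batch.map((chunk) => memory.add(chunk)));
  });
}

// Usage with an in-memory store: 5 words at 2 words per chunk gives 3 chunks.
const stored: string[] = [];
const inMemory: MemoryLike = {
  add: async (chunk) => {
    stored.push(chunk);
  },
};
loadString(inMemory, "beef chicken pork kangaroo ostrich", 2).then(() => {
  console.log(stored.length);
});
```

Factoring the shared steps into one function, as the reverted commit did by having TextFileLoader and AudioLoader delegate to StringLoader, would keep the batch size and the chunking rule in a single place.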

3 changes: 1 addition & 2 deletions lib/index.ts
@@ -10,7 +10,7 @@ import { usage } from "./user";
import { get as KVGet, set as KVSet } from "./kv";
import PolyfactClientBuilder from "./client";
import { generateImage } from "./image";
-import { TextFileLoader, StringLoader, AudioLoader, PdfLoader } from "./dataloader";
+import { TextFileLoader, StringLoader, AudioLoader } from "./dataloader";

import {
getAllPrompts,
@@ -73,7 +73,6 @@ export {
TextFileLoader,
StringLoader,
AudioLoader,
-PdfLoader,
};

export default PolyfactClientBuilder;
6 changes: 2 additions & 4 deletions lib/prompt.ts
@@ -34,17 +34,15 @@ export type Prompt = {
like?: number;
use?: number;
tags?: string[];
-user_id?: string; // eslint-disable-line camelcase
-public: boolean;
};

-export type PromptInsert = Pick<Prompt, "name" | "description" | "prompt" | "tags" | "public">;
+export type PromptInsert = Pick<Prompt, "name" | "description" | "prompt" | "tags">;
export type PromptUpdate = Partial<PromptInsert>;

async function axiosWrapper<T>(
method: "get" | "post" | "put" | "delete",
url: string,
-data?: Record<string, string | string[] | boolean> | undefined,
+data?: Record<string, string | string[]> | undefined,
clientOptions: InputClientOptions = {},
): Promise<T> {
const { token, endpoint } = await defaultOptions(clientOptions);
5 changes: 2 additions & 3 deletions package.json
@@ -18,7 +18,6 @@
"isomorphic-ws": "^5.0.0",
"js-tiktoken": "^1.0.7",
"pdf-parse": "^1.1.1",
-"pdfjs-dist": "^3.10.111",
"polyfact-io-ts": "^2.2.20",
"process": "^0.11.10",
"react": "^18.2.0",
@@ -68,8 +67,8 @@
"start": "ts-node cmd/index.ts",
"lint": "prettier --check lib/ cmd/ ; eslint lib/ cmd/",
"lint:fix": "prettier --write lib/ cmd/ ; eslint --fix lib/ cmd/",
-"build:cmd": "esbuild cmd/index.ts --bundle --external:canvas --outfile=build/polyfact.tmp --platform=node && echo '#!/usr/bin/env node\n' | cat - build/polyfact.tmp > build/polyfact && rm build/polyfact.tmp",
-"build:vanilla-js": "esbuild target/vanilla-js.ts --bundle --external:canvas --minify --target=chrome67,firefox68,edge79,safari15 --outfile=build/vanilla-js.js",
+"build:cmd": "esbuild cmd/index.ts --bundle --outfile=build/polyfact.tmp --platform=node && echo '#!/usr/bin/env node\n' | cat - build/polyfact.tmp > build/polyfact && rm build/polyfact.tmp",
+"build:vanilla-js": "esbuild target/vanilla-js.ts --bundle --minify --target=chrome67,firefox68,edge79,safari15 --outfile=build/vanilla-js.js",
"build": "tsc --target es2021 --lib es2021,DOM --moduleResolution node --strict --esModuleInterop --declaration --jsx react --skipLibCheck --outDir dist --rootDir lib lib/*.ts lib/**/*.ts && npm run build:cmd && cp build/polyfact package.json README.md dist/",
"npm-publish": "npm run build && cd dist && npm publish && cd .."
}