Kindle AI Export

Export any Kindle book you own as text, PDF, EPUB, or as a custom, AI-narrated audiobook. 🔥

Intro

This project makes it easy to export the contents of any ebook in your Kindle library as text, PDF, EPUB, or as a custom, AI-narrated audiobook. It only requires a valid Amazon Kindle account and an OpenAI API key.

You must own the ebook on Kindle for this project to work.

How does it work?

It works by logging into your Kindle web reader account using Playwright, exporting each page of a book as a PNG image, and then using a vision LLM (vLLM; gpt-4o or gpt-4o-mini) to transcribe each page to text. Once we have the raw book contents and metadata, it's easy to convert them to PDF, EPUB, etc. 🔥

This example uses the first page of the scifi book Revelation Space by Alastair Reynolds:

  1. The automated script starts from the Kindle web reader's library page and selects the book we want to export. (Image: Kindle web reader library)
  2. We use Playwright to navigate to each page of the selected book. (Image: Kindle web reader page)
  3. For each page, we use Playwright to export a scaled-down PNG screenshot of the page's rendered content. (Image: first page of Revelation Space by Alastair Reynolds)
  4. We then convert each page's screenshot into text using one of OpenAI's vLLMs (gpt-4o or gpt-4o-mini).

Mantell Sector, North Nekhebet, Resurgam, Delta Pavonis system, 2551

There was a razorstorm coming in.

Sylveste stood on the edge of the excavation and wondered if any of his labours would survive the night. The archaeological dig was an array of deep square shafts separated by baulks of sheer-sided soil: the classical Wheeler box-grid. The shafts went down tens of metres, walled by transparent cofferdams spun from hyperdiamond. A million years of stratified geological history pressed against the sheets. But it would take only one good dustfall—one good razorstorm—to fill the shafts almost to the surface.

“Confirmation, sir,” said one of his team, emerging from the crouched form of the first crawler. The man’s voice was muffled behind his breather mask. “Cuvier’s just issued a severe weather advisory for the whole North

After doing this for each page, we now have access to the book's full contents and metadata, so we can export it in any format we want. 🎉
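
As a rough sketch of the transcription step, the request for a single page could look something like the following. The function name, the prompt wording, and the temperature setting are assumptions for illustration, not this project's actual code; the message shape follows OpenAI's Chat Completions format for image inputs.

```typescript
// Sketch only: builds the JSON body for a Chat Completions request that asks
// a vision-capable model to transcribe one page screenshot.
// The names and prompt text here are hypothetical, not taken from the repo.

type VisionModel = 'gpt-4o' | 'gpt-4o-mini'

type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

interface TranscriptionRequest {
  model: VisionModel
  temperature: number
  messages: Array<{ role: 'system' | 'user'; content: string | ContentPart[] }>
}

function buildTranscriptionRequest(
  pngBase64: string,
  model: VisionModel = 'gpt-4o-mini'
): TranscriptionRequest {
  return {
    model,
    // Assumption: a low temperature to keep the transcription as literal as possible.
    temperature: 0,
    messages: [
      {
        role: 'system',
        content:
          'Transcribe the text in this book page image exactly. Do not add commentary.'
      },
      {
        role: 'user',
        content: [
          {
            type: 'image_url',
            // The screenshot is embedded inline as a base64 data URL.
            image_url: { url: `data:image/png;base64,${pngBase64}` }
          }
        ]
      }
    ]
  }
}
```

The returned object would be sent as the JSON body of a chat completion request, and the model's reply would be taken as the text of that page.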

Here are some output previews containing only the first page of this book:

Audiobook Examples 🔥

We can even use TTS to generate custom audiobooks.

Here are some auto-generated examples using a few different TTS providers & voices, containing only the first page of this book as a preview:

OpenAI tts-1-hd "alloy" voice
(female; solid quality but more expensive)
openai-alloy-preview.mp4
OpenAI tts-1-hd "onyx" voice
(male; solid quality but more expensive)
openai-onyx-preview.mp4
Unreal Speech "Scarlett" voice
(female; medium quality but cheaper)
unrealspeech-scarlett-preview.mp4

Why is this necessary?

Kindle uses a custom AZW3 format which includes heavy DRM, making it very difficult to access the contents of ebooks that you own. It is possible to strip the DRM using existing tools, but it's a serious pain in the ass, is very difficult to automate, and the "best" solution is expensive and not open source.

This project changes that.

Why? Because I love reading books on Kindle (especially scifi books!!), but none of the content is hackable. The official Kindle apps are also lagging behind in their AI features, so my goal with this project was to make it easy to build AI-powered experiments on top of my own Kindle library. In order to do that, I first needed a reliable way to export the contents of my Kindle books in a reasonable format.

I also created an OSS TypeScript client for the unofficial Kindle API, but I ended up only using some of the types and utils since Playwright + vLLMs allowed me to completely bypass their API and DRM. This approach should also be a lot less error-prone than using their unofficial API.

Usage

Make sure you have node >= 18 and pnpm installed.

  1. Clone this repo
  2. Run pnpm install
  3. Set up environment variables (details)
  4. Run src/extract-kindle-book.ts (details)
  5. Run src/transcribe-book-content.ts (details)
  6. (Optional) Run src/export-book-pdf.ts (details)
  7. (Optional) Export book as EPUB (details)
  8. (Optional) Run src/export-book-markdown.ts (details)
  9. (Optional) Run src/export-book-audio.ts (details)

Setup Env Vars

Set up these required environment variables in a local .env:

AMAZON_EMAIL=
AMAZON_PASSWORD=
ASIN=

OPENAI_API_KEY=

You can find your book's ASIN (Amazon ID) by visiting read.amazon.com and clicking on the book you want to export. The resulting URL will look like https://read.amazon.com/?asin=B0819W19WD&ref_=kwl_kr_iv_rec_2, with B0819W19WD being the ASIN in this case.
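
If you'd rather not pick the ASIN out of the URL by hand, a small helper along these lines could do it. This is a minimal sketch that assumes the ASIN is always carried in the asin query parameter (as in the example URL above) and that ASINs are 10 uppercase alphanumeric characters; both function names are hypothetical.

```typescript
// Sketch: extract the ASIN from a Kindle web reader URL.
// Returns undefined when the URL is invalid or has no plausible ASIN.
function extractAsin(readerUrl: string): string | undefined {
  let asin: string | null
  try {
    asin = new URL(readerUrl).searchParams.get('asin')
  } catch {
    return undefined // not a parseable URL
  }
  // Assumption: ASINs are 10 uppercase alphanumeric characters.
  return asin && /^[A-Z0-9]{10}$/.test(asin) ? asin : undefined
}

// Inverse helper: the web reader URL for a given ASIN.
function kindleReaderUrl(asin: string): string {
  return `https://read.amazon.com/?asin=${encodeURIComponent(asin)}`
}
```

For the example URL above, extractAsin should yield B0819W19WD, which is the value to put in ASIN= in your .env.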

Extract Kindle Book

npx tsx src/extract-kindle-book.ts
  • (This takes a few minutes to run)
  • This logs into your Amazon Kindle web reader using headless Chrome (Playwright). It can be pretty fun to watch it run, so feel free to tweak the script to use headless: false to watch it do its thing.
  • If your account requires 2FA, the terminal will request a code from you before proceeding.
  • It uses a persistent browser session, so you should only have to auth once.
  • Once logged in, it navigates to the web reader page for a specific book (https://read.amazon.com/?asin=${ASIN}).
  • Then it changes the reader settings to use a single column and a sans-serif font.
  • Then it extracts the book's table of contents.
  • Then it goes through each page of the book's main contents and saves a PNG screenshot of the rendered content to out/${asin}/pages/${index}-${page}.png.
  • Example: examples/B0819W19WD/pages
  • Lastly, it resets the reader to the original position so your reading progress isn't affected.
  • It also records some JSON metadata with the TOC, book title, author, product image, etc to out/${asin}/metadata.json.
  • Example: examples/B0819W19WD/metadata.json
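
The page-capture loop described in the steps above can be sketched roughly as follows. The ReaderDriver interface is a hypothetical stand-in for the Playwright calls (the real script drives the browser directly), so treat its method names as assumptions; only the output path layout comes from the description above.

```typescript
// Hypothetical abstraction over the browser automation. In practice each
// method would wrap Playwright calls against the Kindle web reader.
interface ReaderDriver {
  currentPage(): Promise<number>
  screenshotContent(path: string): Promise<void>
  // Resolves to false once the last page has been reached.
  nextPage(): Promise<boolean>
}

interface CapturedPage {
  index: number
  page: number
  screenshot: string
}

// Mirrors the out/${asin}/pages/${index}-${page}.png layout.
function pageScreenshotPath(asin: string, index: number, page: number): string {
  return `out/${asin}/pages/${index}-${page}.png`
}

async function capturePages(
  driver: ReaderDriver,
  asin: string,
  maxPages = 10_000 // safety cap so a stuck reader cannot loop forever
): Promise<CapturedPage[]> {
  const pages: CapturedPage[] = []
  for (let index = 0; index < maxPages; index++) {
    const page = await driver.currentPage()
    const screenshot = pageScreenshotPath(asin, index, page)
    await driver.screenshotContent(screenshot)
    pages.push({ index, page, screenshot })
    if (!(await driver.nextPage())) break
  }
  return pages
}
```

Keeping the browser behind a small interface like this also makes the loop easy to exercise with a fake driver, without logging into Amazon.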

Note

I'm pretty sure Kindle's web reader uses WebGL at least in part to render the page contents, because the content pages failed to generate when running this on a VM (Browserbase). So if you're getting blank or invalid page screenshots, that may be the reason.

Transcribe Book Content

npx tsx src/transcribe-book-content.ts
  • (This takes a few minutes to run)
  • This takes each of the page screenshots and runs them through a vLLM (gpt-4o or gpt-4o-mini) to extract the raw text content from each page of the book.
  • It then stitches these text chunks together, taking into account chapter boundaries.
  • The result is stored as JSON to out/${asin}/content.json.
  • Example: examples/B0819W19WD/content.json
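
To illustrate what stitching by chapter boundaries might involve, here is a simplified sketch. It assumes each table-of-contents entry records the page its chapter starts on and that each transcribed chunk records its page; the type and function names are hypothetical, and the real content.json layout may differ.

```typescript
interface TocEntry {
  title: string
  page: number // first page of the chapter
}

interface PageChunk {
  page: number
  text: string
}

interface Chapter {
  title: string
  text: string
}

// Groups page chunks under the TOC entry whose page range contains them.
// Pages before the first TOC entry (front matter) are dropped in this sketch.
function stitchChapters(toc: TocEntry[], chunks: PageChunk[]): Chapter[] {
  const entries = [...toc].sort((a, b) => a.page - b.page)
  const sorted = [...chunks].sort((a, b) => a.page - b.page)

  return entries.map((entry, i) => {
    // A chapter ends where the next one begins; the last one runs to the end.
    const end = entries[i + 1]?.page ?? Infinity
    const text = sorted
      .filter((chunk) => chunk.page >= entry.page && chunk.page < end)
      .map((chunk) => chunk.text.trim())
      // Simplification: pages often break mid-sentence, so join with a space.
      .join(' ')
    return { title: entry.title, text }
  })
}
```

Joining with a single space is the crudest possible choice; telling a paragraph break from a mid-sentence page break is the kind of whitespace detail where transcription fidelity tends to slip.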

(Optional) Export Book as PDF

npx tsx src/export-book-pdf.ts
  • (This should run instantly)
  • It uses PDFKit under the hood.
  • It includes a valid table of contents for easy navigation.
  • The result is stored to out/${asin}/book.pdf.
  • Example: examples/B0819W19WD/book-preview.pdf

(Optional) Export Book as EPUB

If you want, you can use Calibre to convert your book's PDF to the EPUB ebook format. On a Mac, you can install calibre using Homebrew (brew install --cask calibre).

# replace B0819W19WD with your book's ASIN
ebook-convert out/B0819W19WD/book.pdf out/B0819W19WD/book.epub --enable-heuristics

(ebook-convert docs)

(Optional) Export Book as Markdown

npx tsx src/export-book-markdown.ts

(Optional) Export Book as AI-Narrated Audiobook 🔥

npx tsx src/export-book-audio.ts
  • This takes a few minutes to run.
  • We support two TTS engines: OpenAI TTS and Unreal Speech TTS.
    • To use OpenAI, set TTS_ENGINE=openai (the default)
    • To use Unreal Speech, set TTS_ENGINE=unrealspeech and UNREAL_SPEECH_API_KEY=(your-api-key)
    • OpenAI is higher quality but more expensive; Unreal Speech is medium quality and cheaper
    • To set the OpenAI voice, use OPENAI_TTS_VOICE=onyx (defaults to alloy)
    • To set the Unreal Speech voice, use UNREAL_SPEECH_VOICE='Scarlett' (defaults to Scarlett)
    • OpenAI TTS for a full novel (~1M tokens) is approximately $30 (1.5GB MP3 ~21 hours long)
    • Unreal Speech TTS for a full novel (~1M tokens) is approximately $2 (1.7GB MP3 ~23 hours long)
    • It should be pretty easy to support other TTS providers in the future.
  • The TTS output is broken up into reasonably sized chunks and stored as MP3 files under out/${asin}/audio/<tts-engine-hash>/.
    • The <tts-engine-hash> directory is based on the TTS engine settings and book contents
  • After generating audio for each chunk, we use ffmpeg to concat them together.
  • The resulting audiobook is stored to out/${asin}/audio/<tts-engine-hash>/audiobook.mp3.
  • Examples: examples/B0819W19WD/audio-previews
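
The chunking step could be approximated as below. This is a sketch under assumptions, not the project's implementation: it splits on sentence boundaries and defaults to 4096 characters per chunk, which is the per-request input limit OpenAI documents for its speech endpoint at the time of writing. Other providers will have different limits.

```typescript
// Split text into chunks of at most maxChars, preferring sentence boundaries.
// Sentences longer than maxChars are hard-split as a last resort.
function chunkText(text: string, maxChars = 4096): string[] {
  // Break after ., ! or ? (optionally followed by a closing quote or paren).
  const sentences = text
    .split(/(?<=[.!?]["'”’)]?)\s+/)
    .filter((sentence) => sentence.length > 0)

  const chunks: string[] = []
  let current = ''

  for (const sentence of sentences) {
    for (let i = 0; i < sentence.length; i += maxChars) {
      const piece = sentence.slice(i, i + maxChars)
      const candidate = current ? `${current} ${piece}` : piece
      if (candidate.length <= maxChars) {
        current = candidate
      } else {
        // The piece does not fit: close the current chunk and start a new one.
        chunks.push(current)
        current = piece
      }
    }
  }

  if (current) chunks.push(current)
  return chunks
}
```

Each chunk would then be sent to the TTS engine separately, with the resulting audio files concatenated in order.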

Disclaimer

This project is intended purely for personal and educational use. It is not endorsed or supported by Amazon / Kindle. By using this project, you agree to not hold the author or contributors responsible for any consequences resulting from its usage.

Author's Notes

This project will only work on Kindle books which you have access to in your personal library. Please do not share the resulting exports publicly; we need to make sure that our authors and artists get paid fairly for their work!

With that being said, I also feel strongly that we should individually be able to use content that we own in whatever format best suits our personal needs, especially if that involves building cool, open source experiments for LLM-powered book augmentation, realtime narration, and other unique AI-powered UX ideas.

I expect that Amazon Kindle will eventually get around to supporting some modern LLM-based features, but ain't nobody got time to wait around for that.

Alternative Approaches

If you want to explore other ways of exporting your personal ebooks from Kindle, this article gives a great breakdown of the options available, including Calibre (FOSS) and Epubor Ultimate (paid). Trying to use the most popular free online converter will throw a DRM error.

Compared with these approaches, the approach used by this project is much easier to automate. It also retains metadata about Kindle's original sync positions which is very useful for cases where you'd like to interoperate with Kindle. E.g., be able to jump from reading a Kindle book to listening to an AI-generated narration on a walk and then jumping back to reading the Kindle book and having the sync positions "just work".

The main downside is that some transcription errors can occur during the image ⇒ text step, which uses a multimodal LLM and is not 100% deterministic. In my testing, I've been pleasantly surprised by how accurate the results are, but there are occasional issues, mostly with differentiating whitespace between paragraphs from soft section breaks. Note that both Calibre and Epubor also use heuristics to deal with things like spacing and the dashes introduced by word wrap, so the fidelity of the conversions will not be 100% one-to-one with the original Kindle version in any case.

The other downside is that the LLM costs add up to a few dollars per book using gpt-4o or around 30 cents per book using gpt-4o-mini. With LLM costs steadily decreasing and local vLLMs improving, the cost per book should soon be free or close to it. The screenshots are also high quality with no extra content, so you could quite easily swap in any other OCR solution for the vLLM-based image ⇒ text step.
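
One way to keep that swap cheap is to hide the image ⇒ text step behind a small interface. The sketch below is a hypothetical design, not how this repo is structured: PageTranscriber and transcribePages are illustrative names, and the concurrency default is arbitrary.

```typescript
// Hypothetical interface: any OCR engine or vision model that turns a PNG into text.
interface PageTranscriber {
  transcribe(png: Uint8Array): Promise<string>
}

// Transcribes pages with bounded concurrency while preserving page order.
async function transcribePages(
  pages: Uint8Array[],
  transcriber: PageTranscriber,
  concurrency = 4
): Promise<string[]> {
  const results: string[] = new Array(pages.length)
  let next = 0

  // Each worker repeatedly claims the next unprocessed page index.
  async function worker(): Promise<void> {
    while (next < pages.length) {
      const i = next++
      results[i] = await transcriber.transcribe(pages[i])
    }
  }

  const workerCount = Math.min(concurrency, pages.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

With this shape, a gpt-4o-mini backed transcriber, a local vision model, or a classic OCR engine would each just be another PageTranscriber implementation.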

How is the accuracy?

The accuracy / fidelity has been very close to perfect in my testing, with the only discrepancies being occasional whitespace issues.

I'm sure there will be edge cases and ebook features that are missing (like embedded images), but it shouldn't be too hard to add those if there's enough interest.

License

MIT © Travis Fischer

If you found this project interesting, consider following me on Twitter.