thought-forge-ai

An experiment in generating 30-60 second "deep thought" TikTok-style videos, each including a spoken monologue, moving video scenes, music, and subtitles.

Examples: (make sure to turn on audio, GitHub mutes by default)

Why.Being.Bored.Might.Be.Your.Secret.Superpower.mp4

The.1.Change.That.Can.Transform.Your.Life.mp4

Why.Being.Weak.Is.Actually.Your.Greatest.Strength.sm.mp4

Some more examples are in ./data/examples.

Background / Motivation

I've recently seen a fair amount of "philosophical" and story-telling content on social media. Common traits are a calm voice, soothing music, calm pictures and a discussion of some topic of self-improvement, or love, or the world and our place in it. The algorithms have noticed that I sometimes enjoy this content.

Some of this type of content has clearly AI-generated elements, for example using the same voice or images that are slightly off. So I wondered how hard it would be to create similar content in a 100% automated manner.

Turns out, it's pretty easy. I also think it's pretty hilarious to create thoughtful content about human struggles in modern life fully with AI.

Quality and Future Work

Sometimes the quality is surprisingly good, in topic and structure as well as in the video. Mostly it is somewhat mediocre.

I think the output could be improved a bit with better prompts, and significantly with more cherry-picking or by using human source material, or at least human inspiration. It currently also doesn't generate first-person personal stories, which I might add later.

I've also noticed that the LLM is more deterministic than I expected. Even with temperature 1 it kept generating the same or similar topics, so I had to add the previous topics to the context. The same happens with the monologues: their content is often too similar, so they would probably also need the previous texts as context in order to become more varied.
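As a minimal sketch of the "add the previous topics into the context" workaround: the helper below builds a topic prompt that lists earlier topics and asks the model to avoid them. The function name, signature, and prompt wording are hypothetical and not taken from step-00-find-topic.ts.

```typescript
// Hypothetical helper: name, signature, and wording are illustrative only.
function buildTopicPrompt(previousTopics: string[], count: number): string {
  const lines: string[] = [
    `Suggest ${count} new topics for short "deep thought" videos.`,
  ];
  if (previousTopics.length > 0) {
    // Listing earlier topics gives the model something concrete to steer away from.
    lines.push("Do not repeat or closely paraphrase any of these earlier topics:");
    for (const topic of previousTopics) {
      lines.push(`- ${topic}`);
    }
  }
  return lines.join("\n");
}
```

The same pattern would presumably work for monologues by passing earlier texts (or short summaries of them) instead of topic titles.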

Steps and Tools

This project uses a combination of tools together with custom written code:

Step	Task	Tool	Cost per Video	Free Alternative	Code	Example
0	Choose Topics, Voice and Clickbait Title	LLM (Claude 3.5)	<$0.01¹	Llama 3.1	step-00-find-topic.ts	topic.json
1	Write monologue script	LLM (Claude 3.5)	$0.01²	Llama 3.1	step-01-write-monologue.ts	monologue.txt
2	Read monologue	TTS (Elevenlabs)	$0.20³	coqui-ai	step-02-text-to-speech.ts	speech.mp3
3	Split monologue into scenes, create image prompts, calculate start+end times	LLM (Claude 3.5)	$0.01	Llama 3.1	step-03-text-to-image-prompt.ts	alignments.json
4	Create starting image for each scene	Text to Image (Flux.1 Pro)	$0.35⁴	self-hosted Flux.1 Dev	step-04-text-to-image.ts	example.jpg
5	Create scene video	Image to Video (RunwayML Gen-3 Alpha)	1x$6.00 or $80/month⁵	?	step-05-image-to-video.ts	example.mp4
6	Create music prompt	LLM (Claude 3.5)	$0.01	Llama 3.1	step-06-text-to-music-prompt.ts	music-prompt.txt
7	Create music from prompt	Text to Audio (MusicGen)	$0.17⁶	MusicGen self-hosted	step-07-music.ts	music.mp3
8	Create subtitles for burn in	None	$0	-	step-08-subtitles.ts	subtitles.ass
9	Merge and cut video, normalize loudness and merge audio, burn in subtitles	ffmpeg	$0	-	step-09-ffmpeg.ts	merged.mp4
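Step 8 needs no external tool because a subtitles.ass file is plain text. The sketch below shows one way to turn timed text into ASS dialogue lines. The Cue shape and both function names are hypothetical (the real alignments.json layout may differ), and the field order assumes the usual ASS events format (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text).

```typescript
// Hypothetical input shape; not taken from the project's alignments.json.
interface Cue {
  startSec: number;
  endSec: number;
  text: string;
}

// ASS timestamps have the form H:MM:SS.cc (centisecond precision).
function toAssTime(seconds: number): string {
  const totalCs = Math.round(seconds * 100);
  const cs = totalCs % 100;
  const totalSec = Math.floor(totalCs / 100);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${h}:${pad(m)}:${pad(s)}.${pad(cs)}`;
}

// Builds one "Dialogue:" line; line breaks in the text become the ASS "\N" marker.
function toAssDialogue(cue: Cue, style: string = "Default"): string {
  const text = cue.text.replace(/\r?\n/g, "\\N");
  return `Dialogue: 0,${toAssTime(cue.startSec)},${toAssTime(cue.endSec)},${style},,0,0,0,,${text}`;
}
```

A complete file would also need the [Script Info], [V4+ Styles], and [Events] headers before these lines.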

Apart from the video generation, the whole process is pretty cheap (around $0.75 per video). The video generation is both the most enticing part and the worst, in terms of both API and quality. I assume this will change in the coming months.
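To make the cost estimate reproducible, here is a small sketch that sums the per-video figures from the table. The numbers are copied from the table and footnotes (step 0 is entered as $0.003, i.e. $0.03 spread over 10 videos). The type and function names are made up for illustration.

```typescript
interface StepCost {
  step: number;
  task: string;
  usd: number; // cost per video in USD, as listed in the table
}

const stepCosts: StepCost[] = [
  { step: 0, task: "topic", usd: 0.003 }, // "<$0.01": $0.03 per 10 videos
  { step: 1, task: "monologue", usd: 0.01 },
  { step: 2, task: "text to speech", usd: 0.2 },
  { step: 3, task: "image prompts", usd: 0.01 },
  { step: 4, task: "images", usd: 0.35 },
  { step: 5, task: "scene video", usd: 6.0 }, // pay-per-use price, not the monthly plan
  { step: 6, task: "music prompt", usd: 0.01 },
  { step: 7, task: "music", usd: 0.17 },
  { step: 8, task: "subtitles", usd: 0 },
  { step: 9, task: "ffmpeg merge", usd: 0 },
];

// Sums the per-video cost, optionally leaving out some steps.
function totalUsd(costs: StepCost[], excludeSteps: number[] = []): number {
  return costs
    .filter((c) => !excludeSteps.includes(c.step))
    .reduce((sum, c) => sum + c.usd, 0);
}

const withoutVideo = totalUsd(stepCosts, [5]); // roughly 0.75
const withVideo = totalUsd(stepCosts); // roughly 6.75
```

With these inputs, everything except step 5 adds up to roughly $0.75, while the pay-per-use video step alone costs about eight times as much.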

Setup / Running

Install the dependencies using pnpm install. You can get pnpm by running corepack enable; corepack comes with Node.js.

Copy the .env.example file to .env and add all the needed API keys. This is going to be pretty annoying, since it needs five independent accounts, most of them with payment set up.

Generate a list of 40 topics using pnpm generate.

Choose a topic from that list and generate a full video for it using pnpm generate 12, where 12 is the number of the chosen topic. This will take a few minutes, depending mainly on the speed of the video AI.

Footnotes

1. 1200 input tokens * $3/million + 2000 output tokens * $15/million = $0.03 for 10 videos
2. 3000 input tokens * $3/million + 200 output tokens * $15/million
3. around $20 for 100 minutes worth of tokens
4. $0.05 per image, ~7 images per video
5. 100 credits for 10 seconds of video, $1 for 100 credits. For 60 seconds $6. Unlimited generation for $80/month.
6. on replicate.com, around 2 min of one A100 80GB at $5/hour makes ~$0.17
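The footnote arithmetic can be checked with a few one-line helpers. This is only a sketch: the function names are invented here, and the prices are the ones quoted in the footnotes, which may have changed since.

```typescript
// LLM cost: tokens times the per-million-token price, for input and output separately.
function llmCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputUsdPerMillion: number,
  outputUsdPerMillion: number,
): number {
  return (inputTokens * inputUsdPerMillion + outputTokens * outputUsdPerMillion) / 1e6;
}

// Video cost: 100 credits per 10 seconds of video, $1 per 100 credits.
function videoCostUsd(videoSeconds: number): number {
  const credits = (videoSeconds / 10) * 100;
  return credits / 100;
}

// GPU rental cost: minutes of runtime at an hourly rate.
function gpuCostUsd(minutes: number, usdPerHour: number): number {
  return (minutes / 60) * usdPerHour;
}

const monologueUsd = llmCostUsd(3000, 200, 3, 15); // about 0.012
const sixtySecondVideoUsd = videoCostUsd(60); // 6
const musicUsd = gpuCostUsd(2, 5); // about 0.17
```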
