Dubbie - Open-source AI video dubbing studio

Dubbie is an open-source AI dubbing studio that costs $0.10/min, roughly 20x less than alternatives like ElevenLabs, RaskAI, or Speechify. While still in early development and not at feature parity with those alternatives, Dubbie offers enough features to create dubs for basic videos.

What is Dubbie built with?

  • NextJS 14: Client app (app.dubbie.com)
  • Tailwind: Styling
  • ShadcnUI: Components
  • Prisma: Database interface (Postgres)
  • Clerk: User authentication
  • Stripe: Payments
  • OpenRouter: picking the best-fit LLM for each task
  • Azure/OpenAI: Voice generation
  • Firebase: Storage
  • NodeJS: Longer-running functions (initialization/exporting)

How are the folders structured?

This project is a monorepo with 4 packages:

  1. /next
  2. /node
  3. /shared
  4. /db

next and node are applications deployed to Vercel and Railway, respectively. db contains our Prisma schema and client. shared contains individual functions used by both next and node.
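As an illustration, here is a minimal sketch of how that split plays out in practice. The package names (@dubbie/shared, @dubbie/db), the helper, and the Prisma model are hypothetical, not the repo's actual identifiers.

```ts
// Hypothetical sketch: package names, exports, and the Prisma model
// are illustrative, not the repo's actual identifiers.
import { splitIntoSentences } from "@dubbie/shared"; // plain helper, usable from next and node
import { prisma } from "@dubbie/db";                 // shared Prisma client

// Callable from a Next.js API route and a Node worker alike.
export async function saveSentences(projectId: string, transcript: string) {
  const sentences = splitIntoSentences(transcript);
  await prisma.sentence.createMany({
    data: sentences.map((text, i) => ({ projectId, index: i, text })),
  });
}
```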

How does the dubbing initialization process work?

  1. The user uploads a video and clicks "Create project".
  2. Upload the video to Firebase Storage.
  3. Extract the audio and upload it to Firebase Storage as well.
  4. Transcribe the audio via Whisper.
    • This returns the entire transcription as one large paragraph, plus timestamps for each word.
  5. Use an LLM to break the paragraph into individual sentences.
  6. Match the individual sentences against the word-level timestamps to figure out when each sentence begins and ends.
    • Since the LLM output may not be a perfect match, we use an approximation algorithm (sketched after this list).
  7. Use an LLM to translate each sentence into the language the user selected.
    • We translate chunk by chunk, and use certain techniques to ensure the output lines up with the input.
  8. Use a text-to-speech API (currently just Azure and OpenAI) to generate audio (see the second sketch below).
  9. Upload those audio files to Firebase Storage, and save the URLs to our database via Prisma.
  10. The frontend client updates and renders all of that so users can preview in real time and edit.
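To make step 6 concrete, here is a minimal sketch of one possible approximation, assuming Whisper-style word objects ({ word, start, end } in seconds); the repo's actual algorithm may be more robust (e.g. fuzzy text matching rather than word counts).

```ts
// Greedy sketch: assume the LLM preserved word order, so each sentence
// consumes roughly as many timestamped words as it contains.
interface Word { word: string; start: number; end: number }
interface TimedSentence { text: string; start: number; end: number }

export function alignSentences(sentences: string[], words: Word[]): TimedSentence[] {
  const out: TimedSentence[] = [];
  let cursor = 0;
  for (const text of sentences) {
    const n = text.trim().split(/\s+/).length;
    const slice = words.slice(cursor, Math.min(cursor + n, words.length));
    if (slice.length === 0) break; // ran out of words; stop aligning
    out.push({ text, start: slice[0].start, end: slice[slice.length - 1].end });
    cursor += slice.length;
  }
  return out;
}
```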
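And a sketch of step 8 on the OpenAI side (the Azure path differs), using the OpenAI Node SDK's speech endpoint; the model and voice are placeholders, and writing to disk stands in for the Firebase Storage upload.

```ts
import OpenAI from "openai";
import { writeFile } from "node:fs/promises";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function synthesize(text: string, outPath: string) {
  // Placeholder model/voice; Dubbie presumably picks these per project.
  const res = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: text,
  });
  // The SDK returns a fetch-style Response; buffer it and persist.
  const buf = Buffer.from(await res.arrayBuffer());
  await writeFile(outPath, buf);
}
```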

How does the frontend editor work?

At a high level, there are 3 elements that need to stay in sync:

  1. Video element
  2. Timeline scrubber
  3. Invisible audio player

Tone.js stitches the individual audio URLs together and serves as the master clock. See useAudioTrack.ts for implementation details.
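As a rough illustration of that idea (not the actual useAudioTrack.ts), here's a condensed sketch: each dubbed clip becomes a Tone.Player synced to the Transport, and the Transport clock nudges the video element whenever it drifts.

```ts
import * as Tone from "tone";

interface Clip { url: string; start: number } // start offset in seconds

export async function playTimeline(clips: Clip[], video: HTMLVideoElement) {
  await Tone.start(); // must run inside a user gesture
  for (const clip of clips) {
    const player = new Tone.Player(clip.url).toDestination();
    player.sync().start(clip.start); // schedule on the Transport timeline
  }
  await Tone.loaded(); // wait until every buffer has downloaded

  Tone.getTransport().start();
  void video.play();

  // The Transport is the master clock: re-seat the video if it drifts.
  const tick = () => {
    const t = Tone.getTransport().seconds;
    if (Math.abs(video.currentTime - t) > 0.08) video.currentTime = t;
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```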
