Subtitle localization

Description

A script that creates a web page with a localized summary of an English-language video. The video is embedded on the page, and when a user starts playing it, the page displays synchronized localized subtitles right below the video. What’s cool about the implementation: the page parses the .SRT file directly, so if you already have one for your video, it takes about a minute to make it all work.
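The synchronization logic can be sketched in a few lines. The page itself does this in JavaScript; the Python below is only an illustration of the same idea, and the names Cue, parse_srt, and active_text are hypothetical, not taken from the project. It assumes a well-formed .SRT file with "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing lines and cues that do not overlap.

```python
import re
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    h, m, s, ms = map(int, _TIME.search(stamp).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Split an .SRT document into cues, sorted by start time."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The timing line is the one containing the arrow; the index line
        # (if any) precedes it and the subtitle text follows it.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue  # not a cue, skip it
        start, end = lines[idx].split("-->")
        text = "\n".join(lines[idx + 1:]).strip()
        cues.append(Cue(_to_seconds(start), _to_seconds(end), text))
    return sorted(cues, key=lambda c: c.start)


def active_text(cues: list[Cue], t: float) -> str:
    """Return the subtitle to show at playback time t, or '' if none."""
    starts = [c.start for c in cues]
    i = bisect_right(starts, t) - 1  # last cue that starts at or before t
    if i >= 0 and t < cues[i].end:
        return cues[i].text
    return ""
```

In a browser, the equivalent of active_text would be called from the video element's time-update handler with the current playback time, and the result written into the subtitle area below the player.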

Sample screenshot

Check out the output

Check out this link (Russian localization): Решение судьи против Трампа изменяет политический ландшафт выборов 2024 года [in English: “Judge’s ruling against Trump changes the political landscape of the 2024 election”] (drwuaze.site)

Motivation

I developed this service to share English-language videos with my parents, who don’t speak English. The tool lets them read an overview of the content first and then watch the video with Russian subtitles displayed below it.

Who Could This Be Useful For?

This type of service could benefit various groups:

  1. Language Learners: Students learning English or Russian can watch videos with subtitles in their native language and pick up vocabulary and pronunciation.
  2. Content Creators: Educators and other creators who want to reach a broader, multilingual audience.
  3. Researchers: Those studying language acquisition or conducting cross-cultural media studies.
  4. Accessibility Services: Organizations looking to make content more accessible to non-native speakers.
  5. Educational Institutions: Schools or universities with international programs could use this to make English-language content more accessible.
  6. Anyone who wants to make a video available in another language quickly (just get your .SRT machine-translated).

Development Details

  • Time to create the script: 3-4 days
  • Code composition:
    • Python code and HTML page template: Created by ChatGPT
    • Script to parse and display subtitles from .SRT: Created by Claude
  • Cost of running the script: About 1 cent per run for a 5-minute video.

AI Services Utilized

  • OpenAI Whisper: For audio-to-text conversion
  • ChatGPT: For video content summarization
  • Microsoft Bing Translator: For translation to Russian

Workflow

  1. The script downloads a video from YouTube.
  2. OpenAI Whisper converts audio into .SRT format.
  3. ChatGPT summarizes the video content using the .SRT file.
  4. The .SRT file and the summary are submitted to Microsoft Bing Translator for translation into Russian.
  5. The script creates the target web page from a template.
  6. It publishes all files (HTML, JS, CSS, and MP4) to the website via FTP.
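Steps 4 to 6 could look roughly like the sketch below. It is not the project's actual code: translate_srt, render_page, publish, the page template, and all file names are hypothetical, and the translator is passed in as a plain function because the exact Microsoft Translator call is not shown in this write-up. The point it illustrates is that only the subtitle text lines are sent for translation, so cue numbers and timings stay intact.

```python
import html
from ftplib import FTP
from pathlib import Path
from string import Template
from typing import Callable


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate subtitle text only; keep cue numbers, timings, blank lines.

    Note: a subtitle line consisting solely of digits is treated as a cue
    number and left untranslated, which is harmless for numbers.
    """
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or "-->" in stripped:
            out.append(line)             # structural line, copy as-is
        else:
            out.append(translate(line))  # subtitle text, translate it
    return "\n".join(out) + "\n"


# Hypothetical, heavily simplified page template.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$summary</p>
<video src="$video" controls></video>
<div id="subtitles" data-srt="$srt"></div>
</body>
</html>
""")


def render_page(title: str, summary: str, video_file: str, srt_file: str) -> str:
    """Fill the template, escaping every value placed into the HTML."""
    return PAGE_TEMPLATE.substitute(
        title=html.escape(title),
        summary=html.escape(summary),
        video=html.escape(video_file, quote=True),
        srt=html.escape(srt_file, quote=True),
    )


def publish(files: list[Path], host: str, user: str, password: str,
            remote_dir: str) -> None:
    """Upload the given files to remote_dir over FTP."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for path in files:
            with path.open("rb") as fh:
                ftp.storbinary(f"STOR {path.name}", fh)
```

Plain FTP sends credentials unencrypted; where the host supports it, ftplib.FTP_TLS is a drop-in alternative for the publish step.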

Key Learnings

  1. Avoid over-specifying tasks for LLMs: focus on the goal (what you want to accomplish) rather than on implementation details. A good LLM (like a good software developer) will give you what you want faster, and the implementation could be better than your idea 🙂
  2. ChatGPT 3.5 struggled with producing working JS code for this task, while Claude generated fully functional JS and CSS in one go.
  3. Both ChatGPT and Claude had difficulties making cosmetic changes to HTML+JS without breaking functionality.
  4. LLMs frequently produce CSS with minor syntax issues that are easy to fix manually. So, it is good to know some programming basics.
  5. I spent several days trying to embed subtitles directly over YouTube videos, but every attempt failed due to YouTube API limitations. The LLMs did their best to make it work: apparently, this scenario used to be supported by YouTube but no longer is.
  6. An 8GB GPU can run Whisper locally with decent quality and speed.
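For reference, a minimal sketch of producing an .SRT file with a locally run Whisper model, assuming the open-source openai-whisper Python package is installed. The model size "small" and the helper names are assumptions for illustration; the write-up does not say which model size or invocation was actually used.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into SRT."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{i}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # blank line between consecutive cues


def transcribe_to_srt(media_path: str, model_name: str = "small") -> str:
    """Transcribe a media file locally and return the result as SRT text."""
    import whisper  # openai-whisper; imported lazily so the helpers work without it

    model = whisper.load_model(model_name)  # uses the GPU when one is available
    result = model.transcribe(media_path)
    return segments_to_srt(result["segments"])
```

Larger models generally transcribe more accurately but need more GPU memory, so the model name is the main knob to adjust for a given card.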

Considerations for Future Improvements

  1. Make it a service and test it with a few more target languages.
  2. Migrate to GPT-4o mini and potentially use it to translate the .SRT file in one go.
  3. Further explore options for direct YouTube integration without downloading videos.
  4. Explore using locally-run Llama for video summarization, though the effort may not be justified for most use cases.

Bottom Line

This project offers a decent solution for localizing short videos and producing a user-friendly overview. The majority of the development was accomplished using AI assistance, showcasing the potential for rapid prototyping and development of multilingual content tools.