Alongside cooking for myself and walking laps around the house, Japanese cartoons (or “anime” as the kids are calling it) are something I’ve learned to love during quarantine.

The problem with watching anime, though, is that short of learning Japanese, you become dependent on human translators and voice actors to port the content to your language. Sometimes you get the subtitles (“subs”) but not the voicing (“dubs”). Other times, entire seasons of shows aren’t translated at all, and you’re left on the edge of your seat with only Wikipedia summaries and 90s web forums to ferry you through the darkness.

So what are you supposed to do? The answer is obviously not to ask a computer to transcribe, translate, and voice-act entire episodes of a TV show from Japanese to English. Translation is a careful art that can’t be automated, and requires the loving touch of a human hand. Besides, even if you did use machine learning to translate a video, you couldn’t use a computer to dub… I mean, who would want to listen to machine voices for an entire season? It’d be awful.

So in this post, I’ll show you how to use machine learning to transcribe, translate, and voice-act videos from one language to another, i.e. “AI-Powered Video Dubs.” It might not get you Netflix-quality results, but you can use it to localize online talks and YouTube videos in a pinch. We’ll start by transcribing audio to text using Google Cloud’s Speech-to-Text API. Next, we’ll translate that text with the Translation API. Finally, we’ll “voice act” the translations using the Text-to-Speech API, which produces voices that are, according to the docs, “humanlike.”

(By the way, before you flame-blast me in the comments, I should tell you that YouTube will automatically and for free transcribe and translate your videos for you. So you can treat this project like your new hobby of baking sourdough from scratch: a really inefficient use of 30 hours.)

AI-Dubbed Videos: Do they usually sound good?

Before you embark on this journey, you probably want to know what you have to look forward to. What quality can we realistically expect to achieve from an ML-video-dubbing pipeline?

Here’s one example dubbed automatically from English to Spanish (the subtitles are also automatically generated in English). I haven’t done any tuning or adjusting on it:

As you can see, the transcriptions are decent but not perfect, and the same goes for the translations. (Ignore the fact that the speaker sometimes speaks too fast; more on that later.) Overall, you can easily get the gist of what’s going on from this dubbed video, but it’s not exactly near human-quality.

What makes this project trickier (read: more fun) than most is that there are at least three possible points of failure:

The video can be incorrectly transcribed from audio to text by the Speech-to-Text API.
That text can be incorrectly or awkwardly translated by the Translation API.
Those translations can be mispronounced by the Text-to-Speech API.

In my experience, the most successful dubbed videos were those that featured a single speaker over a clear audio stream and that were dubbed from English to another language. This is largely because the quality of transcription (Speech-to-Text) was much higher in English than in other source languages.

Dubbing from non-English languages proved substantially more challenging. Here’s one particularly unimpressive dub from Japanese to English of one of my favorite shows, Death Note:

If you want to leave translation/dubbing to humans, well, I can’t blame you.

Using the Google Cloud Speech-to-Text API

The first step in translating a video is transcribing its audio to words. To do this, I used Google Cloud’s Speech-to-Text API. This tool can recognize audio spoken in 125 languages, but as I mentioned above, the quality is highest in English. For our use case, we’ll want to enable a couple of special features, like:

Enhanced models. These are Speech-to-Text models that have been trained on specific data types (“video,” “phone_call”) and are usually higher-quality.
Profanity filter. This flag prevents the API from returning any naughty words.
Word time offsets. This flag tells the API that we want transcribed words returned along with the times that the speaker said them. We’ll use these timestamps to help align our subtitles and dubs with the source video.
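To make the Speech-to-Text options discussed in this post concrete (enhanced models, the profanity filter, and word time offsets), here is a minimal Python sketch. It is not the exact code behind the demo videos. It assumes the `google-cloud-speech` client library (version 2 or later), audio already uploaded to a Cloud Storage bucket in a self-describing format such as WAV or FLAC, and an English source. The helper names `build_recognition_config` and `transcribe_with_timestamps` are made up for illustration.

```python
def build_recognition_config(language_code="en-US", model="video"):
    """Return Speech-to-Text settings as a plain dict.

    The keys mirror fields on the API's RecognitionConfig message.
    """
    return {
        "language_code": language_code,
        # Enhanced model trained on a specific kind of audio, e.g. "video".
        "use_enhanced": True,
        "model": model,
        # Ask the API to mask profanity in the transcript.
        "profanity_filter": True,
        # Return start/end times for every transcribed word.
        "enable_word_time_offsets": True,
    }


def transcribe_with_timestamps(gcs_uri, language_code="en-US"):
    """Transcribe audio at gcs_uri and return (word, start_sec, end_sec) tuples."""
    # Imported lazily so the config helper works without the client installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(**build_recognition_config(language_code))
    audio = speech.RecognitionAudio(uri=gcs_uri)

    # Long-running recognition is intended for audio longer than about a minute.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=600)

    words = []
    for result in response.results:
        best = result.alternatives[0]  # highest-confidence transcript
        for w in best.words:
            words.append(
                (w.word, w.start_time.total_seconds(), w.end_time.total_seconds())
            )
    return words
```

Calling `transcribe_with_timestamps("gs://your-bucket/talk.wav")` (a placeholder URI) with valid Google Cloud credentials should give you a list of words with timings, which is the raw material for lining up subtitles and dubs with the video. Enhanced models are not offered for every language, so a non-English source may need a different `model` value or `use_enhanced` turned off.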