YouTube has been using algorithms to automatically caption speech for the last eight years. The feature is immensely helpful for people who are deaf or hard of hearing. It was pretty rough at first, but it has improved significantly over time. Google says its automatic captions are getting “closer and closer to the human transcription rates.”
Since speech is only part of the audio, YouTube has now launched automated sound effect captioning for the first time. At present, the system can show three classes of sound: applause, laughter, and music. These are the most frequently captioned sounds, and they add meaningful context for viewers who have trouble hearing.
The sound effects are merged with the automatic speech recognition output and shown as “part of standard automatic captions,” much like a closed-captioned TV show. YouTube is well aware that the captions are simplistic for now. But adding to them should be easier from here, because the company has built a strong backend.
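To make the idea of merging sound effects into the regular caption track concrete, here is a minimal sketch in Python. It is not YouTube's implementation: the `Cue` type, the `merge_tracks` function, the bracketed upper-case label style, and the sample timings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One caption entry (hypothetical type, times in seconds)."""
    start: float
    end: float
    text: str


def merge_tracks(speech, effects):
    """Interleave speech cues and sound effect cues by start time.

    Sound effect labels are wrapped in brackets and upper-cased so they
    stand apart from spoken words (an assumed display convention).
    """
    tagged = [Cue(c.start, c.end, f"[{c.text.upper()}]") for c in effects]
    return sorted(list(speech) + tagged, key=lambda c: (c.start, c.end))


# Made-up sample data, not real YouTube output.
speech = [Cue(0.0, 2.5, "welcome to the show"), Cue(6.0, 8.0, "thank you all")]
effects = [Cue(2.5, 5.5, "applause"), Cue(8.0, 9.0, "laughter")]

merged = merge_tracks(speech, effects)
for cue in merged:
    print(f"{cue.start:5.1f} -> {cue.end:5.1f}  {cue.text}")
```

The viewer ends up with a single time-ordered track in which spoken lines and sound labels alternate, which is roughly how closed captions on television read.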
In the future, it will also introduce other sounds such as barking, knocking, or ringing. These will pose new challenges, because YouTube's automated sound effects AI will need to figure out whether a ringing sound is coming from an alarm, a doorbell, or a phone.
Google is helping YouTube's automated sound effect captioning AI
We cannot deny that it will be well worth the effort. Google says two-thirds of the participants found that the sound effect captions enhance the video experience. The system is bound to make mistakes no matter how good it gets, and it is about 95% accurate. But users think the odd error will not detract from the benefits.
Google is using machine learning to pick out sounds and then display them as text. It developed a Deep Neural Network (DNN) model for ambient sound and trained it on thousands of hours of video. The company hopes for the best results. The toughest part, according to Google, was separating and displaying events that are very similar, such as laughter and applause.
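As a rough illustration of how per-sound scores from a model could become caption text, the sketch below turns a list of score windows into labelled time spans. Everything in it is assumed for the example: the `detect_events` function, the one-second window, the 0.5 threshold, and the hand-written scores, which do not come from Google's DNN.

```python
LABELS = ("applause", "laughter", "music")


def detect_events(scores, window=1.0, threshold=0.5):
    """Turn per-window class scores into (label, start, end) events.

    scores: one dict per window, mapping label -> score in [0, 1].
    Consecutive windows at or above the threshold for the same label
    are merged into a single event. All parameters are hypothetical.
    """
    events = []
    for label in LABELS:
        start = None
        for i, frame in enumerate(scores):
            active = frame.get(label, 0.0) >= threshold
            if active and start is None:
                start = i  # event begins in this window
            elif not active and start is not None:
                events.append((label, start * window, i * window))
                start = None
        if start is not None:  # event still running at the end
            events.append((label, start * window, len(scores) * window))
    return sorted(events, key=lambda e: (e[1], e[0]))


# Hand-written scores for four one-second windows (illustrative only).
scores = [
    {"applause": 0.1, "laughter": 0.8},
    {"applause": 0.7, "laughter": 0.9},
    {"applause": 0.9, "laughter": 0.2},
    {"music": 0.6},
]

events = detect_events(scores)
for label, start, end in events:
    print(f"[{label.upper()}] {start:.1f}s to {end:.1f}s")
```

In this toy data, laughter and applause are both above the threshold in the second window, so the sketch reports two events for the same stretch of audio. A real system would also have to decide how to show such a case on screen.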