A step toward fully automating the creation of shock content for PSYOP.
Meta* engineers have described Voicebox, a neural network model with a wide range of capabilities for working with speech: generating it, editing it, or styling it after a sample.
Voicebox can read a given text aloud with high quality or process an existing voice recording, for example by removing extraneous sounds such as car horns and barking dogs while preserving the content and style of the speech. If necessary, it can even “re-record” a fragment of the recording, correcting a single mispronounced word, for instance. Six languages are supported: English, French, German, Spanish, Polish and Portuguese. Voicebox can also act as a simultaneous interpreter that conveys the speaker's voice and manner of speaking.
The model was trained on more than 50,000 hours of audiobooks, which was enough for it to master speech fully: it profiles a voice and manner of speaking from a sample only two seconds long, after which it can reproduce that voice reading any text. In practice, these capabilities could be useful in metaverse applications, providing natural-sounding voices for virtual assistants and NPCs, and for visually impaired people: the model can read messages aloud in the voices of their authors.
