Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text to speech bahasa Indonesia
June 29, 2016 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Salita Ulitia Prini, Ary Setijadi Prihatmanto
arXiv ID
1606.09222
Category
cs.SD: Sound
Cross-listed
cs.CL, cs.RO
Citations
2
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Adding emotion to an Indonesian text-to-speech system using a prosody manipulation method. Text To Speech (TTS) is a system that converts text in a given language into speech, in accordance with how the text is read in that language. The focus of this research is natural-sounding speech: "humanizing" the pronunciation of a TTS voice synthesis system. Humans have emotions and intonation that affect the sound they produce. The main components of the TTS system used in this research are eSpeak, the MBROLA id1 database, and a Human Speech Corpus database drawn from a website that summarizes the highest-frequency words (Most Common Words) used in a country. Three types of emotion / intonation are designed: happy, angry, and sad. The emotional filter is developed by manipulating the relevant prosody features (especially pitch and duration values) using a predetermined rate factor, established by analyzing the differences between the standard TTS output and voice recordings with a particular emotional prosody / intonation. In the perception test of the Human Speech Corpus, the results are 95% for the happy emotion, 96.25% for the angry emotion, and 98.75% for the sad emotion. The perception test of the system consists of an intelligibility test and a naturalness test. In the intelligibility test, the accuracy of the sound against the original sentence is 93.3%, and the clarity rate for each sentence is 62.8%. In the naturalness test, the accuracy of emotion selection is 75.6% for the happy emotion, 73.3% for the angry emotion, and 60% for the sad emotion.
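The abstract describes the emotional filter only at a high level: pitch and duration values are scaled by a predetermined rate factor per emotion. A minimal sketch of that idea follows, assuming the filter operates on MBROLA phoneme (`.pho`) input, where each line holds a phoneme, a duration in milliseconds, and optional (position %, pitch Hz) pairs. The function names and the factor values in `EMOTION_FACTORS` are hypothetical placeholders for illustration; the abstract does not report the factors the authors derived.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    symbol: str
    duration_ms: float
    pitch_points: list[tuple[float, float]]  # (position in %, pitch in Hz)


# Hypothetical (pitch_factor, duration_factor) pairs, NOT values from the paper.
EMOTION_FACTORS = {
    "happy": (1.2, 0.9),
    "angry": (1.1, 0.8),
    "sad": (0.9, 1.3),
}


def parse_pho_line(line: str) -> Phone:
    """Parse one MBROLA .pho line: symbol, duration, then pos/pitch pairs."""
    parts = line.split()
    values = [float(v) for v in parts[2:]]
    points = list(zip(values[0::2], values[1::2]))
    return Phone(parts[0], float(parts[1]), points)


def apply_factors(phone: Phone, pitch_factor: float, duration_factor: float) -> Phone:
    """Scale duration and every pitch target; positions stay unchanged."""
    scaled = [(pos, hz * pitch_factor) for pos, hz in phone.pitch_points]
    return Phone(phone.symbol, phone.duration_ms * duration_factor, scaled)


def format_pho_line(phone: Phone) -> str:
    """Serialize back to .pho syntax, rounding to whole ms and Hz."""
    fields = [phone.symbol, f"{phone.duration_ms:.0f}"]
    for pos, hz in phone.pitch_points:
        fields += [f"{pos:.0f}", f"{hz:.0f}"]
    return " ".join(fields)


def emotional_filter(pho_text: str, emotion: str) -> str:
    """Apply the emotion's factors to every phoneme line of a .pho string."""
    pitch_factor, duration_factor = EMOTION_FACTORS[emotion]
    out = []
    for line in pho_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):  # blank line or comment
            out.append(line)
            continue
        phone = apply_factors(parse_pho_line(stripped), pitch_factor, duration_factor)
        out.append(format_pho_line(phone))
    return "\n".join(out)


if __name__ == "__main__":
    neutral = "; neutral output\na 100 0 120 50 130\n_ 50"
    print(emotional_filter(neutral, "sad"))
```

With these placeholder factors, the "sad" setting lowers every pitch target and lengthens every phoneme, while "happy" and "angry" raise pitch and shorten durations. In practice the factors would be estimated from the measured differences between neutral TTS output and emotional recordings, as the abstract describes.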
Similar Papers
In the same crypt: Sound
🔮 The Ethereal
R.I.P. 👻 Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
R.I.P. 👻 Ghosted
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
R.I.P. 👻 Ghosted
TasNet: time-domain audio separation network for real-time, single-channel speech separation
R.I.P. 👻 Ghosted
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
R.I.P. 👻 Ghosted
MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. 👻 Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. 👻 Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P. 👻 Ghosted