Audio manipulation, Convolutional Neural Network (CNN), Mel-frequency cepstral coefficients (MFCCs), Text-to-speech model.