Project Phonè
The Phoné consortium, formed by the University of Naples Federico II, the CNR-ISTI in Pisa and the Free University of Bozen-Bolzano, takes its name from the Greek word ‘φωνή’, meaning ‘linguistic sound’ or ‘voice’. It was established as a voluntary initiative with the following objectives:
- Collection of speech datasets (transcribed and non-transcribed) for training and evaluating Italian speech recognition and synthesis models;
- Definition of relevant use cases for model evaluation;
- System benchmarking using traditional and new metrics;
- Providing an independent evaluation of currently available automatic speech recognition and synthesis models in sensitive contexts such as industry, academia and government;
- Providing scientific and technological support for the third-party development of system architectures and training processes for speech recognition and synthesis.
Phoné products:
- Italian speech dataset: partially transcribed and useful for training speech recognition systems. The dataset can be distributed freely;
- two ASR models: trained from scratch using only Italian speech;
- a comprehensive speech synthesis and voice cloning system accessible via a web platform. This implementation is based on Gérard Bailly’s (CNRS, Grenoble) PyTorch implementation of Tacotron 2. New contributions include (but are not limited to): the packaging of the TTS module into a containerised web service based on FastAPI; the setup of Kubernetes deployment;
- a pipeline of audio file manipulation tools that are useful for preparing additional training material for TTS and ASR.