Project Phoné

The Phoné consortium, formed by the University of Naples Federico II, the CNR-ISTI in Pisa and the Free University of Bozen-Bolzano, takes its name from the Greek word ‘φωνή’, meaning ‘linguistic sound’ or ‘voice’. It was established as a voluntary initiative with the following objectives:

  • Collection of speech datasets (transcribed and non-transcribed) for training and evaluating Italian speech recognition and synthesis models;
  • Definition of relevant use cases for model evaluation;
  • Benchmarking of systems using both traditional and new metrics;
  • Independent evaluation of currently available automatic speech recognition and synthesis models in sensitive contexts such as industry, academia and government;
  • Scientific and technological support for the third-party development of system architectures and training processes for speech recognition and synthesis.
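
As an illustration of what a "traditional" benchmarking metric looks like, the sketch below computes word error rate (WER), the usual baseline metric for speech recognition. It is a minimal example, not the consortium's evaluation code: the function name `word_error_rate` and the normalisation (lowercasing and whitespace splitting) are assumptions made here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalised only by lowercasing and splitting on
    whitespace; a real benchmark would also define punctuation handling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of five reference words is missing from the hypothesis.
print(word_error_rate("il gatto dorme sul divano", "il gatto dorme divano"))  # 0.2
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.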

Phoné products:

  • An Italian speech dataset, partially transcribed and suitable for training speech recognition systems, which can be distributed freely;
  • Two ASR models, trained from scratch using only Italian speech;
  • A complete speech synthesis and voice cloning system, accessible via a web platform. The implementation builds on Gérard Bailly’s (CNRS, Grenoble) PyTorch implementation of Tacotron 2; new contributions include, among others, the packaging of the TTS module as a containerised web service based on FastAPI and the setup of a Kubernetes deployment;
  • A pipeline of audio file manipulation tools for preparing additional training material for TTS and ASR.
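
To give a sense of the kind of step such an audio preparation pipeline might contain, the sketch below splits a long WAV recording into fixed-length segments using only the Python standard library. It is a hypothetical example, not one of the Phoné tools: the function name `split_wav`, the default segment length and the output naming scheme are all assumptions.

```python
import wave
from pathlib import Path


def split_wav(src: Path, out_dir: Path, segment_seconds: float = 10.0) -> list[Path]:
    """Split an uncompressed PCM WAV file into consecutive segments.

    Hypothetical helper: cuts at fixed time offsets, whereas a real
    pipeline would more likely cut at silences or sentence boundaries.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []

    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_segment = int(segment_seconds * reader.getframerate())
        if frames_per_segment <= 0:
            raise ValueError("segment_seconds is too short for this sample rate")

        index = 0
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:
                break  # end of the source recording
            target = out_dir / f"{src.stem}_{index:04d}.wav"
            with wave.open(str(target), "wb") as writer:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(target)
            index += 1

    return written
```

Calling `split_wav(Path("recording.wav"), Path("segments"), segment_seconds=5.0)` would write `recording_0000.wav`, `recording_0001.wav` and so on into `segments/`, with the last file holding whatever audio remains.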