Computer systems that can listen and speak to humans are becoming increasingly common. Such systems answer telephones, provide requested information, type dictated documents, and assist people in many different ways on a daily basis. In the USA alone, several million customers of banks, airlines and search companies are served by speech recognition systems every day. But what if the language spoken is not English, but one of the 3,000 languages spoken in Africa?

This is where the MuST research group focuses its efforts. We create speech technologies for the under-resourced languages of the world, and try to find new ways of doing this quickly and cost-effectively. To build these systems, we have to answer many questions: How can our systems be made to understand the many different accents within a single language? How do people pronounce proper names they have never heard before? How can we capture and understand the essence of a language from a limited set of speech samples?

The development of speech technology for the under-resourced languages is not simply a recapitulation of the steps taken for well-resourced languages (such as English), for a number of reasons:

  • Conventional speech technology is highly resource intensive. Scarcity of existing electronic resources forces us to be much more resource-efficient in the creation of speech technologies for under-resourced languages.
  • The availability of vast amounts of linguistic research in the world languages has supported a staged approach in those languages: technology is developed based on existing scientific knowledge. For the under-resourced languages, in which detailed linguistic knowledge is limited or absent, the interaction between linguistic research and technology development is likely to be much closer.
  • Most existing speech technology systems are monolingual in nature, but the environments in which under-resourced languages occur are often linguistically complex, requiring significant attention to multilingual phenomena such as a dynamic set of loan words, code switching, or multilingual names produced across languages.

Our research approach requires working at very different levels:

  • We continuously work towards a better understanding of the essence of pattern analysis (learning from data). This work touches on many interesting disciplines, from machine learning to linguistics, and has the potential to impact the world far beyond speech technology.
  • We build tools that can be used to collect and analyse samples of a language quickly and effectively. For example, we have recently completed a data collection effort with Internet giant Google for voice building in four South African languages.
  • We build and test real-world systems and applications that use speech recognition in practical ways. Here we are very proud of our multilingual Speech Transcription Platform, which can be used to align or transcribe recorded audio, and is useful in settings that range from transcribing university lectures to transcribing debates in parliament.

This research focus on the data-driven learning of speech and language also provides us with the ideal environment to experiment with some of the fundamental questions related to computational adaptation and human learning, a fascinating area of research in its own right.
