But few people knew the language well enough to manually transcribe the audio. Inspired by voice assistants like Siri, Mahelona started looking into natural language processing. “Teaching the computer to speak Maori has become absolutely necessary,” Jones says.
But Te Hiku faced a chicken-and-egg problem. To build a te reo speech recognition model, it needed an abundance of transcribed audio. To transcribe the audio, it needed the very advanced speakers whose small numbers it was trying to compensate for in the first place. There were, however, plenty of beginner and intermediate speakers who could read te reo words out loud better than they could recognize them on a recording.
So Jones and Mahelona, along with Suzanne Duncan, COO of Te Hiku, came up with a nifty solution: rather than transcribing existing audio, they would ask people to record themselves reading a series of sentences designed to capture the full range of sounds in the language. To an algorithm, the resulting dataset would serve the same function. From those thousands of pairs of spoken and written sentences, it would learn to recognize te reo syllables in audio.
The team announced a competition. Jones, Mahelona, and Duncan contacted every Maori community group they could find, including traditional kapa-haka dance troupes and waka-ama canoe-racing teams, and told them that whoever submitted the most entries would win a $5,000 grand prize.
The whole community got involved, and the competition heated up. Te Mihinga Komene, a Maori community member, educator, and advocate for using digital technologies to revitalize te reo, recorded 4,000 sentences on her own.
Money was not the only motivation. People bought into Te Hiku’s vision and trusted it to protect their data. “Te Hiku Media said, ‘What you give us, we’re here as kaitiaki [guardians]. We look after it, but you still own your audio,’” says Te Mihinga. “That’s important. Those values define who we are as Maori.”
In 10 days, Te Hiku accumulated 310 hours of speech-to-text pairs from some 200,000 recordings made by about 2,500 people, a level of engagement unheard of among researchers in the AI community. “No one could have done it except a Maori organization,” says Caleb Moses, a Maori data scientist who joined the project after hearing about it on social media.
The amount of data was still small compared with the thousands of hours typically used to train English-language models, but it was enough to get started. Using the data to bootstrap an existing open-source model from the Mozilla Foundation, Te Hiku created its very first te reo speech recognition model, with 86% accuracy.