Speech Recognition using the Raspberry Pi


I've finally received my Raspberry Pi, and I've immediately gotten to work transferring the speech recognition system I used for the robotic arm to the Pi. Due to its small size and low power requirements, the Raspberry Pi is an excellent platform for the Julius open-source speech recognition system. This opens up almost limitless possibilities for voice command applications.

EDIT: I am no longer working on Julius/HTK for speech recognition. Please see this post for more information.

Commercial voice command modules do exist, and voice command applications are appearing in recent smartphones (e.g. Siri); however, they are either not as versatile or not as cheap as the Raspberry Pi. Additionally, Julius is an LVCSR (Large-Vocabulary Continuous Speech Recognition) decoder, which means you can develop large vocabularies and complex grammars to make more natural voice interfaces.

In this tutorial, I will be demonstrating how to use the Raspberry Pi for a simple speech recognition system to control the Maplin USB Robotic Arm. Later on, I will demonstrate how to interface this system with other devices using the Raspberry Pi's GPIO pins.

Requirements:

  1. Raspberry Pi set up and running Debian (please follow the setup instructions from www.raspberrypi.org), preferably connected to the internet
  2. USB microphone

Also, you need to have followed the instructions for creating an acoustic model in my earlier tutorial here using a full-sized computer. It's a lot easier to get it working on a full-sized computer and then transfer it to the Pi; however, you can of course follow the entire acoustic model generation tutorial right on the Pi itself.

Since we are not using the GPU very much, we'll allocate less RAM for video memory. Instructions to do that can be found here [andybold.me.uk].

To save on resources the Raspberry Pi should only be running in command line mode, and all my instructions below in boxes should be typed into the Raspberry Pi command line.

Loading Drivers

To begin, we need to load the sound card driver. In the command line, type:

sudo modprobe snd_bcm2835
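
To confirm that the driver actually loaded, a quick check along these lines may help. This is only a minimal sketch and not part of my original setup: module_loaded is a hypothetical helper name, and it assumes the kernel exposes the usual /proc/modules listing.

```shell
# Hypothetical helper: succeeds if the named module appears in a
# /proc/modules-style listing. The optional second argument is the file to
# read; it defaults to /proc/modules and exists only so the check can be
# tried on sample text.
module_loaded() {
    mod="$1"
    listing="${2:-/proc/modules}"
    grep -q "^${mod} " "$listing" 2>/dev/null
}

if module_loaded snd_bcm2835; then
    echo "snd_bcm2835 is loaded"
else
    echo "snd_bcm2835 is not loaded; try: sudo modprobe snd_bcm2835"
fi
```

On Debian-based systems, listing a module name in /etc/modules is the usual way to have it loaded at every boot; treat that as an assumption to verify on your own image.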

Software

There are a few packages that we need to install to get the system working properly. To get them, you need the Raspberry Pi connected to the internet; otherwise, download the packages to the SD card and install them from there along with their dependencies. If you have a working internet connection on the Pi, just type into the command line:

sudo apt-get install alsa-tools alsa-utils alsa-oss flex zlib1g-dev libc-bin libc-dev-bin python-pexpect libasound2 libasound2-dev cvs

I'm not sure if libc-bin and its headers are actually required, but this is what I installed when I was trying to get it to work. Now, to test if the microphone is working, try recording 10 seconds of audio using arecord (from alsa-utils) and play it back using aplay:

arecord -d 10 -D plughw:1,0 test.wav
aplay test.wav

The -D option (plughw:1,0) selects the device you want to record from. Since the Raspberry Pi's internal sound device uses plughw:0,0, an attached USB microphone is typically assigned plughw:1,0. If you're attaching several sound devices to the Raspberry Pi, you should change this to the appropriate value, as well as the ALSADEV environment variable (see the last section below).
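
If you are unsure which card and device numbers your microphone received, arecord -l lists the capture devices. The sketch below turns the first listed device into a plughw:card,device string. It is only an illustration: first_capture_device is a hypothetical helper name, and it assumes the usual "card N: ..., device M: ..." line format that arecord -l prints.

```shell
# Hypothetical helper: reads `arecord -l` output on stdin and prints the
# first capture device as plughw:CARD,DEVICE.
first_capture_device() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p' | head -n 1
}

# Only query ALSA when arecord is actually installed.
if command -v arecord >/dev/null 2>&1; then
    arecord -l | first_capture_device
fi
```

With several capture devices attached, the first one listed is not necessarily the microphone you want, so check the full arecord -l output before relying on the result.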

Installing Julius

The latest stable version of the Julius LVCSR decoder (4.2.1) did not detect the ALSA headers correctly when I tried it. A search in the forums indicated that the CVS version was working, so my advice is to compile Julius from the CVS source.

cvs -z3 -d:pserver:anonymous@cvs.sourceforge.jp:/cvsroot/julius co julius4

If you're using Raspbian, set the compiler flags through the CFLAGS environment variable:

export CFLAGS="-O2 -mcpu=arm1176jzf-s -mfpu=vfp -mfloat-abi=hard -pipe -fomit-frame-pointer"

Afterwards, go into the folder julius4 just created, and run the configure, make and make install commands:

cd julius4
./configure --with-mictype=alsa
make
sudo make install

Finally, Julius needs an environment variable called ALSADEV to tell it which device to use for a microphone:

export ALSADEV="plughw:1,0"

To run the speech recognition, you can simply copy an existing model if you have followed my previous tutorial. From that tutorial, copy the entire 'voxforge' folder with your acoustic models to the Raspberry Pi home directory on the SD card. Insert the SD card into the Raspberry Pi and boot it up. On the Raspberry Pi command line, navigate to voxforge/auto. Hopefully, once the environment variable is set, you can execute Julius and begin speech input:

julius -input mic -C julius.jconf
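
The steps above can be wrapped in a small launcher. This is a hedged sketch rather than the script I actually use: run_julius and extract_sentences are hypothetical helper names, the default folder assumes voxforge was copied to your home directory, and the filter assumes Julius reports each final result on a line beginning with "sentence1:", with the words wrapped in <s> and </s> markers.

```shell
# Hypothetical launcher: checks that julius.jconf exists in the model folder
# before starting Julius, and falls back to plughw:1,0 if ALSADEV is unset.
run_julius() {
    model_dir="${1:-$HOME/voxforge/auto}"
    if [ ! -f "$model_dir/julius.jconf" ]; then
        echo "julius.jconf not found in $model_dir" >&2
        return 1
    fi
    export ALSADEV="${ALSADEV:-plughw:1,0}"
    # Run from inside the model folder, as in the command above.
    (cd "$model_dir" && julius -input mic -C julius.jconf)
}

# Hypothetical filter: keeps only the final recognition lines and strips the
# "sentence1:" prefix and the <s> ... </s> markers.
extract_sentences() {
    sed -n -e '/^sentence1:/{' \
           -e 's/^sentence1: *//' \
           -e 's/<s> *//' \
           -e 's/ *<\/s>//' \
           -e 'p' \
           -e '}'
}
```

After defining both functions, run_julius | extract_sentences should print just the recognized words, one result per line. Output may arrive late if Julius buffers it when writing to a pipe. The existence check also catches the "failed to open jconf file" error early, which usually means Julius was started from the wrong folder.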

Issues

I've found that recognition accuracy drops when I use the Raspberry Pi compared to my laptop, using the same acoustic model. Recompiling the HMMs on the Raspberry Pi improves this a bit; however, it's still not as accurate as on my Linux laptop. If you have any idea what's causing this, please let me know!

In the meantime, to compile HTK I'd recommend:

./configure --without-x --disable-hslab
make all
make install

Comments

I got the above error when executing ./configure

I found the solution to be -
sudo apt-get install libasound2-dev

wr1472

Thanks, I forgot about that! I've updated the article to include it. Please let me know if you find a solution to the recognition accuracy issue!

The standard Debian on the Raspberry Pi does not use the floating point processor, and thus will be a bit slow calculating this in software. There is another distro, http://www.raspbian.org/, where they are currently recompiling the kernel and all the modules to use the floating point unit in the CPU. It is expected to be ready in early June.

I've actually had a look at this - it's still not in a very usable state, but I'm hoping it improves performance when it's released.

hi algorithmic,

thanks for this useful post. What USB mic did you use, would you recommend it?

bests,
Tobie

I used the microphone in the PS3 eye.

Could the problems with accuracy be anything to do with the fact that the voice samples to train the acoustic model were recorded using a different PC/software? The raspberry PI may handle the audio slightly differently. Just a guess.

No...I don't think so. The audio samples are 16-bit unsigned PCM, 48kHz. It should be the same whether it's the Raspberry Pi or another PC.

Great proof of concept. I was just wondering if voice recognition could be done on a Rasp Pi and here's where I ended up. Excellent!

Hey there, excellent work, it's awesome. All things are possible with the Raspberry Pi. Keep going.

One thing: the link to http://www.raspberrypi.org for some reason resolves to http://www.aonsquared.co.uk/www.raspberrypi.org and gives a 404 on this site.

Ok see u man.

First of all, thanks very much for sharing this, it is one of the most interesting uses for the Raspberry Pi that I have seen yet. Following your instructions I was able to get it up and running (although I need to try re-recording my samples, the recognition rate is not very good so far unfortunately)

If possible, can you please share the julian.jconf that you are using? From the docs, I was able to modify the Sample.jconf to get it working, but I am not sure about what options would be best?

Also fyi a couple of minor items I found while going through the tutorial:

* In addition to alsa-tools, I also needed to get alsa-utils in order to test the recording (arecord and aplay)

* Instead of mkdfa sample, I needed to use mkdfa.pl sample (from /robot_arm_tutorial_1)

* When running it on the RPi, I had to modify HTK_Compile_Model.sh to #!/bin/bash instead of #!/bin/sh (from /robot_arm_tutorial_1, possibly only applicable if training on the RPi)

Again thanks, great tutorial!

I am trying to use a USB microphone on the RPi and have both a CMixer CM108 one, which sounds awful, and a Tenx TP6911, which does not do a lot!

Please could you tell me which microphone you are using and the chipset?

Thanks

Andy.

I'm using the PS3 eye camera!

That's great, really looking forward to using your speech recognition, -- we've got text to speech going and the two will really work together well. There's a silly demo driving an animatronic chicken that reads tweets

This is great work, I was wondering if the raspberry pi could be controlled by another raspberry pi and to use them as microcontrollers for a robot arm

Hi

Thanks for this interesting article. I am having difficulty with the following:

cvs -z3 -d:pserver:anonymous@cvs.sourceforge.jp:/cvsroot/julius co julius4

with it not liking cvs? Any thoughts would be greatly welcomed. I am using a R-Pi with debian squeeze and have run the instructions above exactly.

Many Thanks

Ash

have you tried 'apt-get install cvs'? what error message are you getting?

Thanks, that sorted that bit out, but I am now having trouble with "ERROR: m_jconf: failed to open jconf file: julius.jconf".

Any thoughts.

Thanks very much for your help.

If you can use a different audio codec, that might give different performance. You should be able to use the telephone or radio quality codec, as the higher ones shouldn't really increase accuracy, and a smaller one will use less memory, bandwidth, etc.

Hi,

Dobby is a Python project I've been working on that works on my NAS (Synology ARM-based) that I aim to setup on my RPi.
https://github.com/Diaoul/Dobby

This is still a work in progress and requires some manual steps I need to document (julius grammar configuration and speechd configuration)

The goal is to recognize speech and do whatever actions you set it up to do. The Qt client helps with the configuration of Dobby and some basic actions are already in place (weather, date and time, RSS feed)

I have many ideas and I'm really excited about Dobby on my RPi so you can expect this project to evolve in a near future :)

Cheers

sir, you make me very proud to be pinoy. mabuhay po kayo! :-)

Raspberry Pi is undoubtedly a great alternative. As it's cheap, it'd be more helpful for commercial applications. Yes, voice command has become a popular idea nowadays. Its applications and users are increasing day by day. So, your step-by-step review will be really helpful for others to understand its different aspects. I was wondering whether it could be used for controlling the Cree LED headlamp for a bicycle?

Hello, I work for ZDNet.com. We'd like to feature your robotic arm in a gallery around Raspberry Pi devices. If you're interested, drop me an email! Thanks.

Hi Jon,

Yes I'll be glad for ZDnet to feature some of my projects - however, you didn't leave your email! Let me know on aonsquared in gmail if there's anything you need, thanks!

Hi I have the same robot but don't have a USB interface. Can it be Hooked up using the GPIO pins?

Cheers

Hi,
I just got my RPi and I'd like to power it through a powered USB hub like I saw in your video, to get power and more USB slots available. I tried with a commercial one but there are power issues.

Could you post the specs of the one you're using in the video, please?
It seems working well...

thnx.
Luca

PS: You're doing a GREAT JOB with the RPi!!!!

Super cool. Any chance of uploading the RPi image? (ie the image of the setup). I haven't got an RPi yet, but was hoping to voice control my Hexy Hexapod. Woo!

Have you tried the Android port to the RPi? I am curious if the built-in voice control may be faster and slightly more accurate. Of course, this is only germane if your goal is function, performance, and the app being controlled, versus the learning process of getting Linux to do voice. Just an idea.

Hey, this is nice... wonderful... :)

Hello! First of all, thank you very much for sharing this with us. I'm starting my master's in embedded systems and I have to propose a project to be evaluated on. I would like to make something related to speech recognition, which is one of the reasons I liked this page so much. The problem I'm facing is that I have to develop the project with the NGW100, a 32-bit AVR board. Based on your experience, do you think it's feasible with this board?

Hi. How does the voice control work in a noisy environment? I thought about the control of an audioplayer. Do you think this could work?