Ken Munro, partner at Pen Test Partners

Voice-recognition software has some very interesting implications for security, as we might all be using it soon.

The iPhone has had ‘voice control’ for some time. Hold down the home button, say “voicemail” and a locked phone dials the default account. I haven't found many phones where the user has disabled voice dialling from a locked handset.

However, the recent Siri voice recognition application for the iPhone makes voice-based attacks a whole lot more interesting. One of my colleagues bought an iPhone 4S, which we all had a good play around with. He pointed out one of the security issues with the following example, from his PIN-locked phone:
“Text wife.”
“OK, what would you like to say?”
“Just calling to say I love you.”
“Send.”

He then rapidly rang said wife to explain that he was demonstrating a security issue. And yes, the entry for my colleague's spouse in his contacts was indeed filed under ‘wife’!

It's a simple change in the passcode lock settings to disable voice control/Siri from a locked device, but seemingly no one does it, as few are aware of its potential.

There is much more potential for Siri and security, particularly in social engineering, but what about other uses for voice recognition? I had a high-end BMW a few years back, and was quite surprised by its capability to recognise names from my phone contacts, and for me to voice-dial them without training the car to my voice first. It worked brilliantly until I had kids. Try getting them all to keep quiet while talking to the car.

Similarly, one could have great fun in the early days of speech recognition on PCs. Simply walk past your victim and shout “help!” or “delete!”. Extra help windows pop up, or their words start disappearing. Given the hype in the early days of voice recognition, it's amazing how few people use it nowadays.

The MalCon malware conference had an interesting submission recently. It appears to be malware for the Xbox 360 Kinect controller. It reacts to a spoken keyword, then starts taking images of the victims and their surroundings while they play their Xbox, and uploads them to Google Picasa. The potential for invasion of privacy and more is obvious.

Voice recognition starts to get interesting when combined with bugging: listening to hours of recorded audio for juicy information is a pain. It would be much better if the victim's phone could be infected with malware. Even easier if the malware analysed the audio and only recorded it if certain keywords were heard. “Merger” or “acquisition” would be a good start. As would “password”.
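To see why keyword filtering cuts the listening burden so sharply, here is a minimal sketch in Python that works on text that has already been transcribed. It is purely illustrative: the `Segment` type, the `flag_segments` function and the sample transcript are hypothetical names invented for this example, and the keyword list simply reuses the three words mentioned above. It does not capture or process audio.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger words, taken from the examples in the text.
KEYWORDS = ("merger", "acquisition", "password")


@dataclass
class Segment:
    """One chunk of already-transcribed speech (hypothetical structure)."""
    start_seconds: float
    text: str


def flag_segments(segments, keywords=KEYWORDS):
    """Return only the segments whose text contains a keyword as a whole word.

    Matching is case-insensitive. Plurals such as "passwords" are NOT matched,
    because the pattern requires a word boundary after the keyword.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in segments if pattern.search(s.text)]


if __name__ == "__main__":
    transcript = [
        Segment(0.0, "Did you see the match last night?"),
        Segment(12.5, "The merger paperwork goes out on Friday."),
        Segment(30.0, "What shall we have for lunch?"),
        Segment(41.2, "My PASSWORD is on a sticky note."),
    ]
    for seg in flag_segments(transcript):
        print(f"{seg.start_seconds:>6.1f}s  {seg.text}")
```

Even in this toy case half the material is discarded before anyone has to listen to it, which is exactly what makes the approach attractive to an attacker and worrying for everyone else.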

The future is interesting – voice recognition software has evolved to a point where ‘training’ of the application to your voice is either minimal or unnecessary. Voice control is entering everyday use in the home through games consoles. Users will become accustomed to using their voice to control systems, and no doubt will soon expect to authenticate using their voice.

How long before the majority of tech devices offer no-enrolment voice recognition? Users simply speak to authenticate. Apple devices have voice control enabled by default, and your execs expect to be able to connect these to corporate systems. Speak your password, please…

I do wonder if there is a security benefit to voice recognition: malware involving keyloggers will be pretty useless. There are no keystrokes to be logged any more. Maybe the future is lip-reading from the victim's webcam instead?

I think we will hear a lot more about ‘voice logging’ in future.