Playing with Speech Recognition

Can we type in WordPad? Yes we can.

I’m testing out Windows speech recognition. I last used speech recognition about 10 years ago on the Macintosh with a now-defunct product called Power Secretary. I even wrote an entire book using Power Secretary. That was the first edition of Java Network Programming. However, I gave up on it fairly quickly because it was simply too difficult.

The first thing I noted when trying out Windows speech recognition today was that it doesn’t seem to work in Firefox. I have to dictate into WordPad and then copy the results into Firefox to post them on the blog here.

Windows speech recognition on my new 2.2 GHz dual core Dell system with a couple of gigabytes of memory is much more accurate than Power Secretary ever was, even with minimal training. Even when it gets something wrong, it’s much easier to correct than mistakes in Power Secretary were. (The word “was” does seem to confuse Windows speech recognition fairly frequently; I’ve had to correct it several times in this article already.)

I can actually type fairly quickly despite talking like this. Having to stop at the end of every sentence just to insert the punctuation marks, though, is tricky. I can tell I’m going to have to do a lot of editing on this article to make it worthy of publication. Or perhaps I should just leave it in its unedited, uncorrected form. The punctuation will probably get better as I remember and learn to think that, as with my voice was I don’t normally do. One of the things that is not normally considered in working with speech recognition is that speaking is a different way of thinking than talking, sorry, than writing.

I can already see that this article is going to be rather poor compared to the ones that I compose by typing them, and I think that has more to do with the way one thinks when speaking vs. writing than with a failure of the speech recognition system to accurately transcribe my words.

Besides the enhanced accuracy, one thing I’m noticing about Windows speech recognition is that the corrections and just the whole user interface are much more fluid, much more intuitive than the old Power Secretary commands ever were. For example, when I need to spell a word, as I did with “intuitive” in the previous sentence, because speech recognition recognized it as “in to it if”, I simply spell out the letters, I space. In Power Secretary, I had to use the radio who alphabet: instead of saying A, I would say alpha; instead of saying B, I would say bravo; instead of saying C, I would say Charlie; etc.

It’s pretty obvious that speech recognition has come a hell of a long way in the last 12 years. I’m not even giving it an especially fair test. I’m using a fairly crappie microphone that I got for essentially free. And I’m also using what is known not to be the best speech recognition program available. Pretty much all reviews agree that Nuance Dragon NaturallySpeaking is the superior program, and if I really want to start doing this then I should probably buy a copy of that.

Still, I could see getting used to this. I’ll have to learn to think more clearly and to speak articles rather than writing them. I suspect they would still benefit heavily from a full edit cycle with the mouse and a keyboard rather than with my voice. I think you can see that from the rather disconnected stream of consciousness approach you find in this article itself. I might also have to consider getting an office with a door because I’m really not sure my wife wants to listen to me talk through each of my individual articles. It may also be relatively challenging to speak more code-heavy articles like the ones I write for java.net. Nonetheless, as a 20 words per minute typist at best, I can certainly crank out the text much, much faster using voice recognition than I can while typing.

2 Responses to “Playing with Speech Recognition”

  1. John Cowan Says:

    The “radio who alphabet” left me baffled.

    “Crappie” is good (it’s a fish).

  2. Stuart Says:

    “radio who alphabet” probably refers to what is documented at
    http://en.wikipedia.org/wiki/NATO_phonetic_alphabet
