Microsoft has just announced that it is adding a new transcription search service to Word, starting with Word for the Web.
In fact, the transcribing feature is already available in the online version of Word (see below) and this article was written using it. It allows users to either record conversations directly in Word and have them transcribed automatically, or to upload pre-recorded audio files or even video files, in a range of standard formats, and automatically get searchable, editable transcriptions.
Clearly, this will have terrific value in future for people using mobile devices, or at other times when they are not sitting at a desk with a keyboard, though the mobile version for Android and iOS isn’t available until late this year. Right now, there’s already value in being able to just sit and dictate correspondence, articles, blogs or whatever your thing is, much as we used to do in the old days with a Dictaphone or a PA (those were the glory days of my corporate career, before I went all entrepreneurial).
Unlike the existing Dictate function, which shows you what you are saying as you say it, the Transcribe function allows you to deliver a long stream of thought as one continuous dictation. Personally, I find this enables better flow and a more integrated chain of thought. The Dictate function is a bit like having a stutter: you are constantly distracted by what you’ve just said. Transcribe just lets you get on with saying what you need, with the expectation that you will review it later, as I have done with this piece. This makes for more natural dictation and allows thoughts to flow more freely.
The audio recordings you make are added to a new Transcribed Files folder in your OneDrive for Business.
Right now, it officially supports just US English. However, the press release and Microsoft’s blog that alerted me to the new feature (both dated yesterday as I write this) may already be out of date, as it seemed to pick up British English (proper English, as some of us like to refer to it) correctly, based on the existing Dictation settings. Nine dictation languages are supported today, with 12 in preview. Transcription worked very acceptably with my middle English accent and respected British English spelling, which I suspect is a function of my Word document templates having been set up with the correct language (unlike so many other people I could mention).
This article was dictated at full speaking speed from an ordinary desktop headset. No clever audio hardware needed. No need to change the way you speak. No need to speak slowly or differently. Just speak normally.
Don’t forget to review it, since even the very best voice recognition is only about 80 or 90% accurate, and there will be oddities in the transcript of what you actually said until you build up experience. That’s better than my typing accuracy, and better than some transcription services I have seen in the past using real people. Right, I think it is really rather impressive.
“To empower every person and every organization on the planet to achieve more.” Microsoft’s corporate mission
Of course, this is all part of Microsoft’s grand vision: infusing AI into everything that people do and, more importantly, empowering others, ensuring that they enable everybody, regardless of circumstance, ability level, or disabilities and differences, to interact effectively and fully with technology. Personally, I have been very impressed with how big a change Microsoft has made in its approach to these things since its new vision was announced by Satya Nadella a few years ago. They really have taken this to heart; accessibility tools and advanced AI helpers have improved remarkably. I count myself amongst those fortunate enough to be benefiting from these improvements, as I suffer with Repetitive Strain Injury from even mild amounts of keyboard and mouse use. My occasional editor, Mr. Tim Danton, has less to say about my written grammar and phrasing too.
I’ve mentioned in a previous blog how impressed I also am with the Text to Voice capabilities that are now appearing. Machines are able to listen to what you say and make a decent job of turning it into text, or take text and turn that into a really rather human-sounding voice, with intonation, pacing and pauses. There are still gaps and weaknesses, so you shouldn’t worry that someone will pass a machine off as a person in either direction, yet… it’s already better than some people I know!
The above was all dictated using the transcription feature. Once I had finished, I simply paused recording and hit the Save and Transcribe now button. After about a minute, Word finished converting the audio it had captured and displayed the transcription in a pane on the right-hand side of the document. This lets you listen to what you recorded, correct anything it got wrong, and add either specific portions or everything to your document. Note that starting a new transcription clears the one you have already created.
Transcribe has another trick, which is identification of different speakers; it does this in the background, and it is something the AI companies have been working on for a while. It takes a little longer to process than the actual transcription; the Speaker label updates after several minutes with a number for each different voice it recognises. I already have some uses for that, but if all you want is straight dictation, you have to do a search and replace to remove the labels from the document (which took me 20 seconds). I’m guessing Microsoft will sort that out soon.
For those who worry about privacy, Microsoft have this to say:
This service does not store your audio data or transcribed text. Your speech utterances will be sent to Microsoft and used only to provide you with text results.
Having talked with very senior folk at Microsoft, I can say they are serious about data privacy; they are about as good as it gets in the industry.
Overall, I am very impressed. Enough so that I will be uninstalling my copy of Nuance Dragon Dictate later this week, I imagine. You should try it.