
Transcribe: More talking, better words

Microsoft has just announced a new Transcription feature in Word. I tested it and found it was good.

Microsoft has just announced that it is adding a new transcription service to Word, starting with Word for the Web.

In fact, the transcribing feature is already available in the online version of Word (see below), and this article was written using it. It allows users either to record conversations directly in Word and have them transcribed automatically, or to upload pre-recorded audio or even video files, in a range of standard formats, and automatically get searchable, editable transcriptions.

Start Transcribe from Word on the web, from the Dictate microphone icon
Start recording or upload a file to Transcribe

Clearly, this will have terrific value in future for people using mobile devices, or at other times when they are not sitting at a desk with a keyboard, though the mobile version for Android and iOS isn't available until late this year. Right now, there's already value in being able to just sit and dictate correspondence, articles, blogs or whatever your thing is, much as we used to do in the old days with a Dictaphone or a PA (those were the glory days of my corporate career, before I went all entrepreneurial).

Pause or complete recording and start transcription

Unlike the existing Dictate function, which shows you what you are saying as you say it, the Transcribe function allows you to deliver a long stream of thought as one continuous dictation. Personally, I find this enables better flow and a more integrated chain of thought. Using Dictate is a bit like having a stutter: you are constantly distracted by what you've just said. Transcribe just lets you get on with saying what you need, with the expectation that you will review it later, as I have done with this piece. This makes for more natural dictation and allows thoughts to flow more freely.

The audio recordings you make are added to a new Transcribed Files folder in your OneDrive for Business.

Audio recording stored in OneDrive for Business


Transcribe Review pane

Right now, it officially supports only US English. However, the press release and the Microsoft blog post that alerted me to the new feature (both dated yesterday as I write this) may already be out of date, as it seemed to pick up British English (proper English, as some of us like to refer to it) correctly, based on the existing Dictation settings. Nine dictation languages are supported today, with 12 in preview. Transcription worked very acceptably with my middle English accent and respected British English spelling, which I suspect is a function of my Word document templates, or of their having been set up with the correct language (unlike so many other people I could mention).

This article was dictated at full speaking speed from an ordinary desktop headset. No clever audio hardware needed. No need to change the way you speak. No need to speak slowly or differently. Just speak normally.

Don't forget to review the result, since even the very best voice recognition is only about 80 or 90% accurate, and there will be oddities in the transcript until you build up experience. That's still better than my typing accuracy, and better than some transcription services I have seen in the past that used real people. All told, I think it is really rather impressive.

“To empower every person and every organization on the planet to achieve more.”

Microsoft’s corporate mission

Of course, this is all part of Microsoft's grand vision: infusing AI into everything that people do and, more importantly, empowering others, ensuring that everybody, regardless of circumstance, ability level, disabilities or differences, can interact effectively and fully with technology. Personally, I have been very impressed with how big a change Microsoft has made in its approach to these things since Satya Nadella announced the new vision a few years ago. They really have taken this to heart; accessibility tools and advanced AI helpers have improved remarkably. I count myself amongst those fortunate enough to be benefiting from these improvements, as I suffer from Repetitive Strain Injury after even mild amounts of keyboard and mouse use. My occasional editor, Mr. Tim Danton, has less to say about my grammar and phrasing, too.

I've mentioned in a previous blog how impressed I also am with the Text to Voice capabilities that are now appearing. Machines are able to listen to what you say and make a decent job of turning it into text, or take text and turn it into a really rather human-sounding voice, with intonation, pacing and pauses. There are still gaps and weaknesses, so you shouldn't worry that someone will pass a machine off as a person in either direction, yet… it's already better than some people I know!


The above was all dictated using the transcription feature. Once I had finished, I simply paused recording and hit the Save and Transcribe now button. After about a minute, Word finished converting the recorded audio and displayed the transcription in a pane on the right-hand side of the document. This lets you listen to what you recorded, correct anything it got wrong, and add either specific portions or the whole transcript to your document. Note that starting a new transcription clears the one you have already created.

Transcribe has another trick: identifying different speakers. It does this in the background, and it is something the AI companies have been working on for a while. It takes a little longer to process than the transcription itself; the Speaker label updates after several minutes with a number for each different voice it recognises. I already have some uses for that, but if all you want is straight dictation, you have to do a search and replace to remove the labels from the document (which took me 20 seconds). I'm guessing Microsoft will sort that out soon.
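For anyone who has a transcript as plain text and would rather script that clean-up than do it by hand, here is a minimal Python sketch. It assumes, purely for illustration, that each speaker label sits on its own line in a form such as "Speaker 1 00:00:05" or "00:00:05 Speaker 1"; the exact layout Word produces may differ, so treat the pattern as a starting point. The function name `strip_speaker_labels` is hypothetical.

```python
import re

# Assumed (hypothetical) label format: a line containing only
# "Speaker N" plus a timestamp, in either order, e.g.
# "Speaker 1 00:01:23" or "00:01:23 Speaker 2".
LABEL_LINE = re.compile(
    r"^\s*(?:Speaker \d+\s+\d{1,2}:\d{2}(?::\d{2})?"
    r"|\d{1,2}:\d{2}(?::\d{2})?\s+Speaker \d+)\s*$"
)


def strip_speaker_labels(transcript: str) -> str:
    """Drop label-only lines and keep the spoken text."""
    kept = [line for line in transcript.splitlines() if not LABEL_LINE.match(line)]
    # Remove the blank lines left behind once the labels are gone.
    return "\n".join(line for line in kept if line.strip())


if __name__ == "__main__":
    sample = "Speaker 1 00:00:05\nHello there.\n\n00:00:09 Speaker 2\nHi.\n"
    print(strip_speaker_labels(sample))
```

Lines that merely mention a speaker in passing, such as "Speaker 1 said hello", are left alone, because the pattern only matches a line that holds nothing but a label and a timestamp.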

For those that worry about privacy, Microsoft have this to say:

This service does not store your audio data or transcribed text. Your speech utterances will be sent to Microsoft and used only to provide you with text results.

Having talked with very senior folk at Microsoft, I can say they are serious about data privacy; their approach is about as good as it gets in the industry.

Overall, I am very impressed: so impressed that I imagine I will be uninstalling my copy of Nuance Dragon Dictate later this week. You should try it.

By Simon Hudson

Simon Hudson is an entrepreneur, health sector specialist and founder of Cloud2 Ltd. and Kinata Ltd. and, most recently, Novia Works Ltd. He has an abiding, evangelical interest in information and knowledge management, and has a lot to say on best practice use of Microsoft Teams, SharePoint and cloud technologies, the health sector, sustainability and more. He has had articles and editorials published in a variety of knowledge management, clinical benchmarking and health journals. He is a co-facilitator of the M365 North User Group Leeds and is Entrepreneur in Residence at the University of Hull.

Simon is passionate about rather too many things, including science, music (he writes and plays guitar & mandola), skiing, classic cars, technology and, by no means least, his family.


5 replies on “Transcribe: More talking, better words”

Today I played with Transcribe a bit more and remain highly impressed.
Uploading an audio file for it to work with takes a little longer than a live recording (mostly due to the upload time, so it is dependent on bandwidth). A pessimistic estimate is that it takes 1 minute per minute of audio to upload, process, transcribe and store the audio file; in practice, larger files will be closer to half that (to be tested).
I can confirm that it is just as seamless, simple and flexible as live recording.

I also tested recording a video call, using a desktop mic and speakers, straight into Word. It was near-flawless, even with some stronger Yorkshire accents. A 7-minute conversation was transcribed in under 30 seconds (since the audio was captured in real time).

It’s possible to share the Word document with other people and they have access to the Transcription pane and all the features – the link between the document and the audio is clearly preserved behind the scenes. How well this works across different tenants, with guest users etc. still needs research.

Very impressive!

I noted an information bar at the bottom of the New Transcription pane that said “0/300 transcription upload minutes used this month”. Looking into that, Microsoft advises:
“With Transcribe you are completely unlimited in how much you can record and transcribe within Word for the Web. Currently, there is a five hour limit per month for uploaded recordings and each uploaded recording is limited to 200mb.”

The Transcribe help page is here, for reference, which goes into how to use it in more detail:

Another update in the last few days concerns what happens when you click “Add to document”.
Previously it added the entire text, with speaker and time stamp information. That was a bit annoying as you had to do a Search and Replace to remove the extra ‘who said what when’ information in most cases.
This latest enhancement provides 4 options:
– Just text
– With speakers
– With timestamps
– With speakers and timestamps

Choosing the Just text option makes the whole process even quicker and simpler.

We just need to persuade MS to make embedded metadata update in Word Online…

I’m curious to see how it deals with my Australian accent. All those slurred vowels 😀

I reckon it will do fine. I routinely understand all my Australian friends and colleagues, so it can’t be that hard!! 🙂
But let us know how you get on.
