Hello, I’ve recently encountered a very strange and frustrating issue while working with Microsoft Word documents, and I wonder if anyone else has experienced anything similar.
The Problem
When reading Microsoft Word documents using Office 365 (logged in with my university account), I’ve noticed that sequences of random characters such as ccccc, bbbbbbb, or ccccccbbb are being unexpectedly inserted into the text. These insertions occur whether the files are saved locally or synced with OneDrive, and regardless of whether I use JAWS or NVDA, both updated to the latest versions.
Here’s an example of the issue:
The quantum hypothesis explained the observed rate of emission of radiation from hot bodies very well, but its implications for determinism were not realized until 1926, when another German scientist, bbbbWerner Heisenberg, formulated his famous uncertainty principle.
These insertions seem to happen especially:
- At the beginning of a sentence.
- When a language tag changes in the document. (In the example above, “Werner Heisenberg” was tagged as German.)
The issue appears regardless of the speech synthesizer used:
- JAWS with Eloquence
- NVDA with IBMTTS
- NaturalVoiceSAPIAdapter, a tool that exposes Microsoft's neural voices as SAPI-compatible voices, available here: https://github.com/gexgd0419/NaturalVoiceSAPIAdapter
The problem started appearing in December, around the same time I installed NaturalVoiceSAPIAdapter (see link above). Recently I also discovered that ABBYY FineReader's add-in was active in Word, so I turned it off.
Testing and Troubleshooting
Here are the steps I've taken to identify the source of the issue:
- Uninstalled NaturalVoiceSAPIAdapter: Since the issue began shortly after installing it, I removed the software to test. However, the problem persisted.
- Disabled OneDrive completely: I worked entirely with local files, but the issue continued.
- Performed a complete clean reinstall of Windows: I formatted the system and reinstalled everything from scratch, avoiding any installation of NaturalVoiceSAPIAdapter or ABBYY FineReader. The problem still occurred.
- Tested with LibreOffice: When using LibreOffice to open and edit the same Word files, the issue never occurs, even after repeated reads and saves. This seems to isolate the issue to Word or its interaction with the screen readers.
- Character deletions don’t permanently fix it: Even when I manually delete the inserted characters in Word, save the file, and reopen it, new or relocated insertions can still appear later.
Final Thoughts
At this point, I’m at a loss. Even a fresh system with no third-party TTS or OCR tools still shows the same problem—only when using Microsoft Word in combination with a screen reader.
Has anyone else encountered anything like this? Any help, insight, or possible workaround would be greatly appreciated.
Thank you!
Comments
my $.02
It's a stumper, all right, since you obviously know what you're doing. However, given where in the document you say it occurs, it sounds like corrupt tagging in the .docx code. You may or may not be aware that .docx is an application of XML schemas, which are like HTML from a parallel universe.
Technically, a .docx file is a ZIP archive, so you could unzip it, open word/document.xml in something like Notepad++, search for "Werner" etc., and see if there's something obviously funky there.
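For anyone who would rather script that check than read raw XML, here is a minimal Python sketch. It assumes the text lives in the main document part (word/document.xml) and that the language is set directly on each run through a w:lang element; language inherited from styles or document defaults will show up as None. The function name runs_with_lang is just something I made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def runs_with_lang(docx_source):
    """Return (text, language) pairs for every text run in the main document part.

    docx_source is a path or a file-like object. The language is the w:val
    attribute of w:rPr/w:lang, or None when the run does not set one itself.
    """
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    results = []
    for run in root.iter(f"{{{W}}}r"):
        # A run can hold several w:t elements; join them into one string.
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        lang = run.find("w:rPr/w:lang", NS)
        code = lang.get(f"{{{W}}}val") if lang is not None else None
        results.append((text, code))
    return results
```

Calling runs_with_lang on a copy of the affected file (for example a hypothetical "chapter1.docx") and printing the pairs would show whether the stray letters are stored inside the German-tagged run, in the run before it, or not in the file at all.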
First, though, a couple of clarifications might help.
1. I didn't understand whether you're saying that the codes are (a) confirmed to not be there in the file before you read it and (b) inserted into the file after you read it, so that opening it in LibreOffice Writer also shows the garbage previously inserted.
2. If you start with the same uncorrupted version of the file, do the same codes appear in the same places, or is it totally uncorrelated with either the file or how you read it?
3. Also, does it happen during Say All, or do they appear as you cursor over the text?
4. The other thing to ask is, where are these files coming from?
Honestly, I don't have any clue regardless of your answers there, but maybe one of us will be inspired. The only time I've had garbage character strings inserted is when using a Bluetooth keyboard and the connection flakes for an instant. Or if I've spilled water on the keyboard in the past few days. If it truly is the screen reader doing it... I really can't imagine: screen readers don't send characters other than activation, mouse clicks, etc.
Here are some other potential testing measures:
1. Try Narrator.
2. Open a file on a different computer running speech.
3. Convert the file to plain text to get rid of the tagging.
4. Try a different keyboard.
5. Jiggle NVDA's advanced settings regarding UIA for Word controls. Honestly, I've never figured out whether this affects documents or just menus and dialogs.
6. You'd have to do a deep dive to find this, since I haven't used it in a few years and have forgotten what it's called, but Microsoft has an accessibility tool that records each event in the accessibility tree throughout the operating system. If it's the screen reader communicating with the document somehow, it might show up.
7. In Word settings, check the box to display all formatting marks, particularly making sure "hidden text" is displayed.
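If you do save a plain-text copy (measure 3), a few lines of Python can list every suspicious run without relying on speech at all. This is only a sketch: it assumes "suspicious" means four or more consecutive b or c characters, which is my guess based on the examples in the original post, and find_suspicious_runs is a made-up name.

```python
import re

# Assumption: the stray insertions are runs of four or more "b"/"c" letters,
# as in the examples "ccccc", "bbbbbbb" and "ccccccbbb" from the post.
SUSPICIOUS = re.compile(r"[bc]{4,}")

def find_suspicious_runs(text, context=20):
    """Return (position, run, surrounding text) for each suspicious run."""
    hits = []
    for match in SUSPICIOUS.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append((match.start(), match.group(), text[start:end]))
    return hits
```

You would read the exported file into a string first (for instance with open(path, encoding="utf-8").read(), assuming you picked UTF-8 when saving) and pass that string in. An empty result on a file that still speaks the garbage would point away from the document and toward Word or the screen reader.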
I think it's the document doing this.
I am running the latest Windows 11 and Word, though I am still running NVDA 2024.4.2. Regardless, I have yet to experience this on any of my documents, and I went through a number of official docs for the Cisco courses I took last year in Cyber Security and Network Engineering.
Admittedly I do not have JAWS, but I did test this out with Narrator and NVDA, and as I said, I am not experiencing this issue.
I have to agree with Voracious P. Brain, it's gotta be the document(s) you're reading.
Best of luck with this. 🙂
language tagging
Hi Knut,
I've never seen this issue; it sounds absolutely crazy.
You say it seems to happen when a language tag changes in a document. Would it be worth disabling language tags in Word, or messing about with the documents' settings and choosing English as the only language? I imagine you can do this. I always turn off language detection in NVDA or JAWS, and I never tag text as different languages, just because it's too much work and I'm usually the only one reading my documents, which often contain English and whatever other language I'm studying at the time. I imagine language tags are only there for screen readers anyway, and unless your documents contain words in various languages for study purposes, it seems pointless having these tags just for people and places.
I reckon some code from these language tags is being output to the screen reader, and possibly inserted somewhere. Things like 'BBB' sound like line breaks to me, and maybe they're included in the language tagging code. Have you turned off your screen reader, opened one of these documents, and asked a sighted person to have a look and ascertain whether these characters are still there? This would be after deleting all these characters, then reopening the document with your screen reader turned off and getting someone to have a look. Either way, I reckon it's the combination of language tagging and a screen reader in use.
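If no sighted person is handy, a rough alternative is to compare two saved copies of the same document, one made right after cleaning it up and one made after a reading session, and let a script report what was added. The Python sketch below is only that, a sketch: it assumes both copies are .docx files, looks only at word/document.xml, and reports pure insertions (text that replaced something else is not listed). The names docx_text and inserted_text are invented for this example.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, written in ElementTree's {uri}tag form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(docx_source):
    """Extract the visible text of the main document part, one line per paragraph."""
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        paragraphs.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return "\n".join(paragraphs)

def inserted_text(before_docx, after_docx):
    """Return (position in the later copy, inserted string) for each pure insertion."""
    before, after = docx_text(before_docx), docx_text(after_docx)
    # Character-level comparison; this can be slow on very long documents.
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    return [
        (j1, after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "insert"
    ]
```

If the list comes back empty even though the speech still includes the stray letters, the characters are probably not in the file, which would support the idea that it is the interaction between Word and the screen reader rather than the document itself.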