I’m looking for an OCR app that lets me read printed material more like I used to read when I was sighted. It seems like artificial intelligence should make this possible. Here are some examples.
Example 1: Receipts. I want to skip past the name and address, date and time, phone and fax numbers, web page URLs, and advertisements. I only want to read the itemized list of my purchases and their prices.
Example 2: Restaurant menus. I want to skip past entire sections like appetizers and entrees, find the sandwich section, read through the titles of each sandwich, then read the full description and price for the two or three sandwiches I’m interested in.
Example 3: Tax forms. As a sighted person, I can use cues like font sizes, boxes, and tables to get a feel for the layout of the form, skip past unimportant sections or instructions, read a few lines of text, and find the one box with the one number I’m looking for.
Example 4: Mailed package labels. When someone delivers a package, I want to glance at the label and see whether it’s for me or my spouse. I should be able to quickly ignore GUIDs and other information that is meaningful only to the shipper.
All of these examples have one thing in common. The printed material displays information organized visually in a way that is obvious if you’re sighted, making it trivial to disregard extraneous text and quickly find pertinent information. And, in all these examples, existing OCR apps completely fail to detect and preserve that organization.
Artificial intelligence should be able to detect and preserve the visual, often hierarchical, organization of printed information and present it to blind users in a well-organized manner that’s easy for us to navigate and comprehend. We should not have to try to zero in on key information manually, as with Seeing AI’s short text feature. Nor should we be forced to wade through line after line of flat text, as with KNFB Reader or Seeing AI’s document feature. Don’t get me wrong. I’m not complaining. These OCR apps are great. I just expect that artificial intelligence should do a much better job at processing and organizing scanned text before it presents it to blind users.
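To make this concrete, here is a hypothetical sketch, in Swift since most of these apps live on iOS, of the kind of structured output I have in mind instead of flat text. None of this exists in any app I know of; every name here is made up for illustration.

```swift
// A made-up model of structured OCR output: a tree of regions rather
// than a flat run of lines, so a screen reader could skip whole
// sections the way a sighted reader skims past them.
struct ScannedRegion {
    enum Kind {
        case heading(level: Int)        // "Sandwiches" on a menu
        case paragraph                  // a sandwich description
        case tableRow(cells: [String])  // item and price on a receipt
        case insetBox                   // a boxed field on a tax form
    }
    var kind: Kind
    var text: String
    var children: [ScannedRegion]       // nesting preserves the hierarchy
}
```

With output like this, "read me just the sandwiches" becomes a simple walk through a tree instead of a wade through flat text.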
If there’s a smarter OCR app out there that does what I want, please make my day and tell me about it.
Comments
OrCam
This is not an app, but if you could get the OrCam to work well, I believe it does something like this: you can give a voice command, for example, to find the phone number on something you are looking at. I haven't kept up with the OrCam, so maybe it has gotten better at this, or worse; I couldn't say. It's a very pricey device, though. Hopefully something like this could come to an app if it isn't around already. With the OCR in the iOS 15 native camera, and if Apple works on AI, maybe someday you could just ask Siri the questions you posted above, but that's too many ifs.
Workaround
Hi Paul,
The best I've come up with is converting the results to a text document, opening it in TextEdit, Pages, or whatever, and doing a Command-F search for the pertinent information, e.g., a dollar sign if you're looking at a receipt, or your last name if you're looking at a mailing label, etc.
Hope this helps.
Bruce
Thanks
The OrCam has always sounded promising and seemed to perform very well in their marketing videos. Bruce, your workaround is pretty much what I do, if I have a keyboard handy. Still, neither of these suggestions is quite what I'm looking for. I'd like an OCR app that preserves the organizational structure of whatever text it scans and presents it with that structure, so I can skip over text I deem extraneous and focus on text I deem relevant. Isn't this how the human visual system works?
The history behind my post is that I've lost my remaining vision over the past few years, and I've been comparing how I functioned with some eyesight to how I now function with no eyesight. One huge difference I've noticed is that my current OCR tools don't allow me to parse text anything like I used to when I was sighted. My OCR tools present flat text but fail to preserve even obvious organizational elements like sections with headings, tables, or text boxes. This all seems totally possible.
If a tool like this exists, what is it? And if it doesn't, why not?
Oh god, this is the holy…
Oh god, this is the holy grail, isn't it? Skim reading, looking at something in a quick glance and taking in all of it, rather than having to get mired in all the details of stuff you don't care about. I've never been able to see, but I've spent a lifetime wishing I could do that, ☺️.
I think we'll get there one day, but I honestly have no idea when. I mean, heck, we can read handwriting these days with OCR, as long as it's not too insane, and I never thought we'd get to that. I don't know why formatting is so much harder than text. I mean, I can't seem to find an app that can even scan an image of a calendar and get anything useful. Why you can't scan something and get something closer to a web document, with headers and tables and stuff, I don't know.
As for AI, well maybe. I doubt it's as easy as you'd think, and it's the edge cases that I reckon would kill you every time. I mean it has to know *what* the document is as well as all the elements in it before it can give you information like that. Easy to do in a demo where you know all the types of documents you'll get, and harder in the real world where you have to try to scan a document and get a great scan, which for totally blind people is tricky, and also know it's a bill or a receipt or a menu or whatever.
Anyway, with all that said, one thing I've been experimenting with is Seeing AI's explore by touch feature. For example, I was able to read the picture of the calendar I mentioned by sliding my hand around the screen of my iPad. I could see the columns and the layout with some practice. Particularly in the case of a document that fits on one screen, this might be worth playing with, especially on something like an iPad.
Working spatially like this isn't easy, and for me, who's been totally blind for more than 40 years, I think it's taken some kind of brain reorganisation from using an iPad all day, every day, for 5 years. But I'm finding myself thinking more about where stuff is, so, for example, I can touch on or very near the thing I want on my screen.
Agree with Yvonne but I’m confident we’ll get there
This really would be the holy grail, and I'm sure it will happen at some point; if a human brain can understand structure, meaning, and format, then it seems likely machine learning will be able to at some point too. For the time being, I get the best results by far with Voice Dream Scanner. It seems to have at least a rudimentary understanding of layout and formatting; nothing like what we are all looking for, but the most advanced I've found so far.

Along the same line of thinking, how long until a phone can do what a guide dog does? Self-driving cars are leading the charge on machines that are aware of the world around them and use that information to navigate it. It seems like a logical next step that the same awareness could be loaded into a phone. I don't think my phone will be as good at cuddles on the sofa or play fighting, though.
my thoughts
@Andy Lane, one thing you're not thinking of regarding guide dogs: if there were an open manhole cover, would your phone tell you it's there? Would it tell you which way to go around it, or how far to go in one direction? Will your phone let you know a car is coming at you from the side when crossing a street? Will your phone help you walk down the sidewalk in a straight line? My point in asking all these questions is to get people thinking. Your phone can help you, but it won't replace a guide dog.
Interestingly, they're…
Interestingly, they're having similar problems with self-driving cars that we've been talking about: edge cases. At least for now, that's how AI is. It gets like 95 percent of the way there and then runs into things that require real judgement and creativity.
Not really concerned about edge cases
This is not the same as a self-driving car that must make split-second, life-or-death decisions autonomously. It just needs to figure out that I've got a frozen dinner on the counter, identify the cooking directions, the nutrition facts, and the marketing hype, and then give me a choice of which I want to read.
There are a ton of image recognition apps that come up with probabilities for what they see. If my hypothetical dream OCR app thinks there's an equal probability that my receipt is a piece of junk mail, I'll tell it the correct answer, and it can learn over time. I won't fall into an open manhole if it's temporarily confused.
We know OCR apps can differentiate between fonts. We know they can identify columns and text inset boxes. iOS already analyzes text to identify names, addresses, dates and times, phone numbers, links, email addresses, and more. All the capabilities are there. We just need a developer with enough creativity and imagination to provide a better interface than flat text.
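To illustrate that the pieces already exist, here's a rough Swift sketch combining two APIs that ship on Apple platforms today: Vision's text recognizer, which returns each line of text along with its position on the page, and NSDataDetector, which flags dates, phone numbers, links, and addresses. Clustering the lines into sections is the missing piece; the function names here are just mine.

```swift
import Foundation
import Vision

// Recognize text in a scanned page. Each observation comes back with a
// bounding box, so the layout information needed to group lines into
// columns, boxes, and sections is already available.
func readPage(_ image: CGImage) throws {
    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            guard let line = observation.topCandidates(1).first else { continue }
            // boundingBox is in normalized page coordinates.
            print(line.string, observation.boundingBox)
            flagSkippableText(line.string)
        }
    }
    request.recognitionLevel = .accurate
    try VNImageRequestHandler(cgImage: image).perform([request])
}

// NSDataDetector is the same machinery iOS uses to spot dates, phone
// numbers, links, and addresses in text, so "skip the phone and fax
// numbers on a receipt" is already expressible.
func flagSkippableText(_ text: String) {
    let types: NSTextCheckingResult.CheckingType = [.date, .phoneNumber, .link, .address]
    guard let detector = try? NSDataDetector(types: types.rawValue) else { return }
    let range = NSRange(text.startIndex..., in: text)
    for match in detector.matches(in: text, options: [], range: range) {
        switch match.resultType {
        case .phoneNumber: print("  phone number (skippable)")
        case .date:        print("  date/time (skippable)")
        case .link:        print("  URL (skippable)")
        case .address:     print("  mailing address (skippable)")
        default:           break
        }
    }
}
```

The genuinely new work is the middle layer: clustering those bounding boxes into headings, tables, and sections, and exposing the result as something navigable. But the recognition primitives are sitting right there.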
Approach Developers With This
Why not approach Microsoft, whoever develops the Voice Dream apps, etc. with these thoughts and suggestions?
And yes, I've been wanting this sort of reading convenience my whole life as well. Imagine all the time and brain power we'd save if we weren't always doing extra listening, sorting, orienting...
Optacon as a solution
I still have the Optacon I got in 1975, and it can do some of the things you are looking for. The Optacon consists of a unit about the size of a cassette recorder and a little camera attached by a cable. You run the camera across a line of printed text with one hand, and the print letters are produced by an array of vibrating pins that you read with a finger of the other hand. I can easily zero in on the middle of an address label, read a receipt, and read directions on some packages.
Sadly, Optacons haven't been produced for quite a while. There was talk in the 80s of trying to shrink it down so you could slide the camera and read with the same hand. It should be possible these days if only someone would be willing to work on it.
Again, Seeing AI has let me…
Again, Seeing AI has let me do some stuff like that with explore image. Sort of like an electronic Optacon.
the 80% rule
Bearing in mind that this sort of AI is pretty sophisticated, the fact that we probably could have it right now but don't comes down, IMO, to the 80% rule familiar to folks working in web accessibility: the top 20% of pages get 80% of the traffic, so concentrate on those; and develop for the 80% of users, because pissing off the other 20% (like the disabled, or people not using Chrome...) makes more economic sense than investing in them. These sorts of AI applications are low return on investment compared to face recognition and the like. I'm really surprised by how far image recognition in Chrome and iOS has come, but I think those are piggybacking on things like Google Images technology.
That being said, examples 1-3 are happening now, because there's mainstream appeal in web-based restaurant menus and tax forms (just did mine). Most receipts get emailed now, and I can only hope that comes to a grocery store near me/you very soon. Package labels would be awesome; I suspect a bar code used by the postal service/UPS will be how that gets done.
Not saying way better AI for OCR wouldn't be great, or that it won't happen. There just needs to be a business model for it. Disabled-only applications don't get very far or very fast. Hell, look what happened to Nearby Explorer and KNFB Reader, the two most expensive apps on my phone! Sadly, a common way that blind people's needs are addressed in the face of these headwinds, as we all know, is through a plaintiff law firm.
Boy, I seem to be in a cynical mood today. Probably because I owe on my taxes this year.
Re: 80% Rule
I get what you're saying, Voracious P, and you're not wrong. Several companies were going to be sued, and that's why accessibility was built into certain products. Have a good one.
I don't know that it's an…
I don't know that it's an entirely *bad* thing, really. Things specially designed for disabled people are often hideously expensive or don't get created in the first place.
On the other hand, if you can figure out a way mainstream people can use it, it's everywhere, it's cheap, and your aunt who's just lost her sight can probably find it.
Want ebooks? Or audiobooks? When they're just for blind people, the publishers scream murder about having to provide them, and most of what's available is 10 years old or more, or not what you want to read. Sighted people want to use them, and now, if you can afford it, you can read practically whatever you want.
Want decent speech technology? Sighted people suddenly want their computers to talk to them so it's everywhere and decent quality.
Heck, even OCR is in that category. I can still remember when a scanner cost as much as a car, ☺️.