Weblog on the Internet and public policy, journalism, virtual community, and more from David Brake, a Canadian academic, consultant and journalist

Archive for September 20th, 2004

20 September 2004

Good free applications are available for Windows for most tasks (there's a useful “directory of Windows free and open source software”:http://www.jairlie.com/oss/). The bad news is that, for OCR, the Windows options I have found are pretty poor. SimpleOCR is the best of a bad lot: it is free but doesn't work very well, or at least it didn't on the page I tried it on. There's also a GNU option called “GOCR”:http://sourceforge.net/projects/jocr/, but I didn't try it because it appears to be a DOS program with a text-only interface, and I am skeptical that a half-megabyte application could really do much!

Fortunately, there is another option: you can get the “US National Library of Medicine”:http://www.nlm.nih.gov/ to do the work for you. They have put an “experimental application online”:http://docmorph.nlm.nih.gov/docmorph/default.htm that lets you upload files in a variety of formats to their server and get back PDF, TIFF, text, or synthesized speech. This can be slow, since a typical A3 scan of two pages at 300dpi is around 3MB and takes a while to upload, but it may be a good alternative if you have no other way to get OCR done. They also offer a 'MyMorph' application that automates the upload and conversion of multiple files, but it only converts them to Adobe Acrobat (PDF) files and does not OCR the text.

If anyone knows of an OCR program that is available as non-time-limited shareware or freeware and works reasonably well under Windows, please let me know. I found that “ABBYY FineReader 7”:http://buy.abbyy.com/content/frpro/default.aspx works quite well, but it costs £81, and after a 15-day trial period it stops working unless you buy it.

P.S. I'm still at AoIR, but I haven't had time to craft a new blog entry, so here's one I did earlier. Pics etc. will probably have to wait until Thursday.