Week 4 OCR Exercise

(Omeka post can be found here)

For my test, I chose Office Lens because I imagine it’s the most widely accessible option, as well as being a good representative of what I think is the industry standard. (Microsoft tends to enjoy having a hand in every pot imaginable while making the most mediocre contribution to each respective area, but some would call that a personal opinion.)

As for the text I chose to transcribe, I figured I could answer the most interesting questions at once by using an original page written in my own handwriting and specifically designed to stress test the things I predicted it would struggle with most. I wrote in pencil with varying pressure at different points, randomized the capitalization, threw in a grammar error or two, and challenged it with a line of gibberish as well.

For the most part it had no qualms about transcribing a page of botched English that means nothing, but it struggled with a few specifics and, as I’d predicted, it misread a few things as cues that they weren’t. One example is that it separated two words, presumably because it took a mid-sentence capital letter to mark the start of a new word, when in fact it was nothing more than a cruel ruse orchestrated by a higher being. (For specific examples, I also included a screenshot of the exact image it was working with on the Omeka page.)
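
If you’d rather not hunt for these slips by eye, a few lines of Python can list where the OCR output departs from what was actually written. This is only a rough sketch using the standard library’s difflib. The function name `word_mismatches` and the sample sentence are made up for illustration; they are not my real test page or anything Office Lens provides.

```python
from difflib import SequenceMatcher


def word_mismatches(expected, ocr_output):
    """Return (expected_words, ocr_words) pairs wherever the two texts differ."""
    exp_words = expected.split()
    ocr_words = ocr_output.split()
    # autojunk=False keeps difflib from ignoring very common words on long pages
    matcher = SequenceMatcher(a=exp_words, b=ocr_words, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append(
                (" ".join(exp_words[i1:i2]), " ".join(ocr_words[j1:j2]))
            )
    return mismatches


# Hypothetical sample: a stray capital letter leads to an extra word break.
handwritten = "the quick brOwn fox"
scanned = "the quick br Own fox"
for wanted, got in word_mismatches(handwritten, scanned):
    print(f"expected {wanted!r}, OCR gave {got!r}")
```

It compares word by word, so it would flag an extra word break like the one in the sample, but it says nothing about why the software made that choice.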

I’d most likely use this kind of software to transcribe older pieces of media from my childhood into a more readily accessible format, or to scan real-world text, such as notes from a projected slide, into an easy-to-copy/paste form.

In summary, I’d say that the areas where it failed my tests are mostly inconsequential, and that it’s an extremely useful tool for any historian or academic not looking to spend months in a cave like an Old Testament monk transcribing complete passages by hand. Having a tool that can do a large portion of the grunt work, leaving a diligent human to look it over and revise as necessary, is an incredible gift to anyone who’s been tasked with such a thing. Luckily for them, I’d also be willing to bet that most authors from the 19th century and earlier didn’t write with as much anti-Microsoft spite as I may be inclined to, making the findings of my traps far less relevant.

