For Research Nerds: clipping information and making it OCR searchable

When I’m researching and I see a bit of text relevant to my WIP (either online or on Kindle), I will clip it: shift-command-4 and “pull” a rectangle around the text I want.

It’s so easy to do that I often end up with a mess on my computer desktop. Here’s a screenshot to give you an idea of how things can look at the end of a day:

My next step is to gather up all the clips and drag them into an empty “Today’s clips” folder. That, at least, looks manageable.
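For anyone who would rather not drag files by hand, the gathering step could be scripted. This is only a rough sketch, not something FineReader or macOS provides: the function name gather_clips is made up for illustration, and it assumes the clips are PNG files sitting directly in the folder you point it at.

```python
from pathlib import Path
import shutil


def gather_clips(desktop: Path, target_name: str = "Today's clips") -> list[Path]:
    """Move every PNG found directly in `desktop` into a subfolder.

    `gather_clips` and the default folder name are illustrative choices.
    """
    target = desktop / target_name
    target.mkdir(exist_ok=True)
    moved = []
    # Only look at files directly in the folder, not inside subfolders.
    for clip in sorted(desktop.glob("*.png")):
        destination = target / clip.name
        if destination.exists():
            # Never overwrite an earlier clip that has the same name.
            continue
        shutil.move(str(clip), str(destination))
        moved.append(destination)
    return moved
```

Calling gather_clips(Path.home() / "Desktop") would sweep the desktop clips into the folder and return the new locations. Anything that is not a PNG is left alone.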

Why it’s important to a researcher for all files to be searchable

Research clips are not of much use to me unless I can search the text. For example, if I were looking for clips about the Mother of the Maids, I should be able to search for that title and have all the relevant files listed.

Clips are PNG files, a type of image, so how is this done?

For the text in a photo to be searchable it needs to have OCR — optical character recognition. Basically, OCR makes it possible to search for a word in a photo. Magic.
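To see why the OCR step matters: a search is just text matching, so a clip can only turn up once some text has been pulled out of it. Here is a minimal sketch of that idea. It assumes, purely for illustration, that each clip’s recognized text has been saved as a plain .txt file in one folder; find_clips is a made-up helper, not part of any OCR product.

```python
from pathlib import Path


def find_clips(folder: Path, phrase: str) -> list[str]:
    """Return the names of .txt files in `folder` whose text contains `phrase`.

    Hypothetical helper: assumes OCR output was saved as UTF-8 text files.
    """
    # Lower-case and collapse whitespace so the match ignores case and line breaks.
    needle = " ".join(phrase.lower().split())
    hits = []
    for text_file in sorted(folder.glob("*.txt")):
        text = " ".join(text_file.read_text(encoding="utf-8").lower().split())
        if needle in text:
            hits.append(text_file.name)
    return hits
```

A bare PNG has no text for a search like this to look through, which is why an un-OCR’d clip never shows up in the results.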

I’ve messed around with OCR software quite a bit over time. Software such as OneNote claims to convert files automatically, but sometimes so very slowly it’s not practical. Evernote can be touchy and deliver poor results. Also, I need to be able to easily move the OCR’d clip into my writing programmes (Scrivener and AEON Timeline, at this time), which I can’t do — at least not easily — out of Evernote or OneNote.

Why not send a clip directly to Scrivener?

One of the strengths of Scrivener is that I can clip a bit of text from a website directly into my project. However, Scrivener does not offer OCR (at least at this time), so while I will often post a web address URL to Scrivener, I won’t send a clip. I need to be able to search the text of a clip, and without OCR, I can’t do that.

ABBYY FineReader software to the rescue

FineReader is simply awesome OCR software: fast, powerful and reliable, but at around $150 for Mac, it’s not cheap. (There is a 2-week free trial offered, however.)

I’ll show you what FineReader does with a day’s pile of clips:

I select and drag all of them onto the FineReader icon in the dock:

It bounces around a bit as it does the work, and then, snap, it’s done.

FineReader, step-by-step

First I get a “Completed” notice with problem alerts — these I simply ignore and click “Close.”

The next screen shows small images of the clips on the left with enlarged versions on the right. I also ignore all that and click “Export” in the header.

That brings me to this window, which has the PDF option lightly highlighted. I ignore all the options in the middle and click “Next” in the lower right corner.

The final window gives me the opportunity to name the collection. I type in “Today’s clips” and select where I want them to show up on my computer (desktop, for now).

Caution!

The really, really, really important thing about this final window is to save each clip to a separate file. (Unless it’s a book file: You don’t want a separate file for each page!)

Click “Export,” and it’s done.

Done? Almost …

I now have 24 searchable clips, which I rename and file in either AEON Timeline (for events and date-specific clips) or my Scrivener project. I then trash the original unsearchable clips.

I do a computer search for “Mother of the Maids,” and of the more than 20 PDF files listed, my 3 newest clips that mention “Mother of the Maids” are at the top:

Success!

And that’s it, for now. Next up, I hope to write a blog post about Mrs. Stonor (sometimes Stoner), a Mother of the Maids who looked after the young, unmarried Maids of Honour who served most if not all of Henry VIII’s six wives. Four of these maids became the next wife and all four ended up dead.

Now there’s a story. :-)


After I had not used FineReader for a time, the documents would not come out searchable. I messed with this for hours without luck, until I clicked the “set to default” button in the last frame. :-)

The one problematic part of this process is that the file will come out “untitled,” so I select and copy the title before putting it through the process.
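If the copied title is going to become the file name, it helps to tidy it first, since characters such as colons and slashes are awkward in file names. A small sketch of that tidying follows. The helper filename_from_title is invented for illustration and has nothing to do with FineReader itself.

```python
import re


def filename_from_title(title: str, extension: str = ".pdf") -> str:
    """Turn a copied title into a tidy file name (illustrative helper)."""
    # Swap slashes, backslashes and colons for spaces.
    cleaned = re.sub(r"[/\\:]", " ", title)
    # Collapse runs of whitespace and trim both ends.
    cleaned = " ".join(cleaned.split())
    # Fall back to "untitled" if nothing usable is left.
    return (cleaned or "untitled") + extension


print(filename_from_title("Mother of the Maids: Mrs. Stonor"))
```

The example prints “Mother of the Maids Mrs. Stonor.pdf”, which can then be pasted over the “untitled” name.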

The pros & cons and ups & downs of OCR and Scrivener

(Warning: tech talk ahead!)

I’ve been putting research documents into Scrivener, assuming that they were searchable. After all, one oft-stated advantage of using Scrivener is that you have all your documents in one place.

It’s true that I can put everything and anything into Scrivener, but I also need to be able to search within those documents. I mistakenly assumed that one of Scrivener’s many superpowers was the ability to make all documents searchable. In other words, I assumed that Scrivener utilized OCR (Optical Character Recognition). Not so. :-(

Having searchable documents is important for my current WIP because it’s set in the 16th century, and a number of the resources are rare and/or ancient and only available on Google Books or the Internet Archive. I’ve taken to clipping relevant parts of such documents (shift-command-4 on a Mac) or exporting them whole as PDFs before sending them to Scrivener. The clips are a type of image, so they need OCR to be searched, and most PDFs are not searchable either.

And so I began to look at ways to make documents searchable before putting them in Scrivener. In the process, I discovered that anything to do with OCR opened a bottomless pit. I will try to keep this simple.

Dedicated OCR software

One possibility would be to invest in a software programme dedicated to making documents OCR searchable. The highest-rated programme for Mac is ABBYY FineReader Pro, available on trial for 30 days. I tested it out on a clip (below), and in seconds had a searchable Word document that beautifully preserved the formatting of the original.

This is the original clip:

And this is the searchable Word document:

Wow.

Databases that make documents searchable

The other possibility would be to use a database that automatically makes documents OCR searchable. The advantage of using such a database is that it is—duh—a database, a logical place to store research documents. … which brings me to OneNote and EverNote.

Both EverNote and OneNote run OCR on documents, so I decided to test them both using the test clip above.

It took well over an hour for OneNote to convert it to searchable text, and EverNote has yet to do so even a day later!

Once a document is searchable, there is a way to create a copy in EverNote that can then be put in Scrivener, but the copy is weird and basically unreadable, showing every word as a separate object.

In OneNote, once the document has gone through the OCR treatment, it’s possible to easily create a searchable text version. (Control-click the document and select “Copy text from picture.”)

This is what I got from my test clip:

Here comes old Woodcock, the Yeoman of Kent, that’s half Farmer and half Gentleman; his horses go to the plow all week, and are put into the coach o’ Sunday.

Tunbridge Walks or the Yeoman of Kent, act I, sc. 1

Not as pretty as ABBYY FineReader, but not at all bad. (I did clean it up a bit.) This text can now be copied and pasted into Scrivener or wherever I want it.

Note: It would have been nice to be able to send this searchable text directly to Scrivener. I passed on this recommendation to OneNote and discovered 1) that their help menu actually helps (EverNote Help is extremely basic), and 2) that they ask how to improve. What a concept! (But do they listen? That remains to be seen.)

A word about Web Clippers

One beautiful thing about EverNote is its Web Clipper. With it, I can send the contents of any webpage to EverNote and, at the same time, indicate which notebook it should be filed in and how it should be tagged.

OneNote’s Web Clipper is not functional on Safari right now due to recent OS changes at Apple. I trust that this will be solved. In any case, it is available on Chrome or Firefox.

It’s a good clipper, but it’s not as useful as EverNote’s. Although you can choose which OneNote notebook to file a clip in, you can’t specify beyond that with tags, and you can’t file it in more than one place.

Which brings me to Tags

Being able to add tags to a document in EverNote is great. For example, I’d be able to tag an 18th-century French recipe for roasted swans as 18th century, France, food, recipes and swans. This would allow me to narrow a search for a perfect detail regarding a roasted swan snack.

OneNote doesn’t have a tag function, alas—at least not that I can see.

What about cost?

I use EverNote heavily, so I need their Premium plan, which costs $5.83 US a month when paying annually. For that I get 10GB of uploads per month and am able to search PDFs.

OneNote is included in an Office 365 subscription package. (Some claim it’s also now available as a free stand-alone, but I’ve not been able to confirm that.) Since I’m already subscribed to the Office 365 world, I can start using OneNote at no additional cost. With OneNote, I get unlimited uploads, so win-win.
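To put the monthly figure in yearly terms, here is a quick back-of-the-envelope calculation. It uses only the $5.83 a month quoted above and the “around $150” quoted earlier for FineReader on Mac. Treating that $150 as a one-time purchase is an assumption, and the function names are made up for the sketch.

```python
import math

EVERNOTE_MONTHLY_CENTS = 583     # $5.83 a month, billed annually (from the text)
FINEREADER_PRICE_CENTS = 15_000  # "around $150"; assumed to be a one-time price


def yearly_cost_cents(monthly_cents: int) -> int:
    """Twelve months of a monthly fee, in cents."""
    return monthly_cents * 12


def months_to_match(one_time_cents: int, monthly_cents: int) -> int:
    """Whole months of a monthly fee needed to reach a one-time price."""
    return math.ceil(one_time_cents / monthly_cents)


yearly = yearly_cost_cents(EVERNOTE_MONTHLY_CENTS)
print(f"Evernote Premium per year: ${yearly / 100:.2f}")
print("Months of Evernote fees to equal the FineReader price:",
      months_to_match(FINEREADER_PRICE_CENTS, EVERNOTE_MONTHLY_CENTS))
```

Working in whole cents avoids rounding surprises. On these figures the Premium plan comes to $69.96 a year, and about 26 months of it would add up to the FineReader price.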

Say what? A scanner app?

Scanning pages from books is too slow to be practical. I’m delighted with the Microsoft app Office Lens, which will send an image directly to OneNote. This will save me lots of time.

For example, I took the image below with Office Lens and sent it to OneNote at 10:30 am. In under 30 minutes, it was searchable and even the all-text extract was surprisingly good.

EverNote or OneNote or … ? My conclusion

I do need a database, but given the pros and cons of OneNote and Evernote, where do I stand?

Because of the expense and inconsistent, slow and inadequate OCR function of EverNote, I have decided to migrate my extensive EverNote database to OneNote.

I should mention, as well, that there are indications that EverNote might be heading into hard times, and I don’t want to be left in the lurch.

It’s possible to import EverNote documents into OneNote using their OneNote Importer app, but judging from this note—

The importer software described on this page is still available for you to download and use, but we’re no longer actively developing or supporting this tool.

—that may not always be possible, so migrating now is perhaps wise.

I’ve never been a Microsoft fan—Mac users aren’t their priority—but OneNote for Mac looks worthy, so I’m going to make the move.  I’ve also purchased ABBYY FineReader Pro, and given that I will be unsubscribing from EverNote, I’ll be coming out ahead in more ways than one.  :-)


The links below might be of interest.

Be aware that there are differences between OneNote for Mac and the mothership OneNote for PC users. Also, OneNote for Mac has been recently “updated”—but the changes have caused quite an uproar because it’s no longer possible to arrange tabs along the top, as in this example:

I would love to have such tabs back and I’m hoping the OneNote engineers succumb. Some long-time users are even advocating reverting to the 2016 version and vowing never again to upgrade.

Evernote vs OneNote: The Best Note-taking App in 2019

Top 10 things you didn’t know about OneNote

Using Onenote for your Novel. I was excited about trying out this template, but it’s for an old version of OneNote and possibly not applicable to Mac.

Why OneNote is One-Derful for Writers. Inspiring!