For Research Nerds: clipping information and making it OCR searchable

When I’m researching and I see a bit of text relevant to my WIP (either online or on Kindle), I clip it: shift-command-4 and “pull” a rectangle around the text I want.

It’s so easy to do that I often end up with a mess on my computer desktop. Here’s a screenshot to give you an idea of how things can look at the end of a day:


My next step is to gather up all the clips and drag them into an empty “Today’s clips” folder. That, at least, looks manageable.


Why it’s important for a researcher’s files to be searchable

Research clips are not of much use to me unless I can search the text. For example, if I were looking for clips about the Mother of the Maids, I should be able to search for that title and have all the relevant files listed.

Clips are PNG files, a type of image, so how is this done?

For the text in an image to be searchable, it needs to go through OCR (optical character recognition). Basically, OCR makes it possible to search for a word in a photo. Magic.

I’ve messed around with OCR software quite a bit over time. Software such as OneNote claims to convert files automatically, but it is sometimes so slow that it’s not practical. Evernote can be touchy and deliver poor results. Also, I need to be able to move the OCR’d clip easily into my writing programmes (Scrivener and AEON Timeline, at this time), which I can’t do easily, if at all, from Evernote or OneNote.

Why not send a clip directly to Scrivener?

One of the strengths of Scrivener is that I can clip a bit of text from a website directly into my project. However, Scrivener does not offer OCR (at least at this time), so while I will often post a website’s URL to Scrivener, I won’t send a clip. I need to be able to search the text of a clip, and without OCR, I can’t do that.

ABBYY FineReader software to the rescue

FineReader is simply awesome OCR software: fast, powerful and reliable. At around $150 for Mac, however, it’s not cheap. (There is a 2-week free trial.)

I’ll show you what FineReader does with a day’s pile of clips:


I select all of them and drag them onto the FineReader icon in the dock.

It bounces around a bit as it does the work, and then, snap, it’s done.

FineReader, step-by-step

First I get a “Completed” notice with problem alerts. I simply ignore these and click “Close.”

The next screen shows small images of the clips on the left with enlarged versions on the right. I also ignore all that and click “Export” in the header.

That brings me to this window, which has the PDF option lightly highlighted. I ignore all the options in the middle and click “Next” in the lower-right corner.


The final window gives me the opportunity to name the collection. I type in “Today’s clips” and select where I want them to show up on my computer (desktop, for now).


Caution!

The really, really, really important thing about this final window is to save each clip to a separate file. (Unless it’s a book file: You don’t want a separate file for each page!)

Click “Export,” and it’s done.

Done? Almost …

I now have 24 searchable clips, which I rename and file in either AEON Timeline (for events and date-specific clips) or my Scrivener project. I then trash the original, unsearchable clips.

When I search my computer for “Mother of the Maids,” over 20 PDF files are listed, and my three newest clips that mention the title are at the top.

Success!

And that’s it, for now. Next up, I hope to write a blog post about Mrs. Stonor (sometimes Stoner), a Mother of the Maids who looked after the young, unmarried Maids of Honour who served most, if not all, of Henry VIII’s six wives. Four of these maids became the next wife, and all four ended up dead.

Now there’s a story. :-)


After I hadn’t used FineReader for a while, my documents stopped coming out searchable. I messed with this for hours without luck, until I clicked the “set to default” button in the last window. :-)

The one problematic part of this process is that the file comes out “untitled,” so I select and copy the title before putting the clip through the process.

The pros & cons and ups & downs of OCR and Scrivener

(Warning: tech talk ahead!)

I’ve been putting research documents into Scrivener, assuming that they were searchable. After all, one oft-stated advantage of using Scrivener is that you have all your documents in one place.


It’s true that I can put everything and anything into Scrivener, but I also need to be able to search within those documents. I mistakenly assumed that one of Scrivener’s many superpowers was the ability to make all documents searchable. In other words, I assumed that Scrivener utilized OCR (Optical Character Recognition). Not so. :-(

Having searchable documents is important for my current WIP because it’s set in the 16th century, and a number of the resources are rare and/or ancient and only available on Google Books or the Internet Archive. I’ve taken to clipping relevant parts of such documents (shift-command-4 on a Mac) or exporting them whole as PDFs before sending them to Scrivener. The clips are a type of image, so they need OCR to be searchable, and most of the PDFs aren’t searchable either.

And so I began to look at ways to make documents searchable before putting them in Scrivener. In the process, I discovered that anything to do with OCR opened a bottomless pit. I will try to keep this simple.

Dedicated OCR software

One possibility would be to invest in a software programme dedicated to making documents OCR searchable. The highest-rated programme for Mac is ABBYY FineReader Pro, available on trial for 30 days. I tested it out on a clip (below), and in seconds had a searchable Word document that beautifully preserved the formatting of the original.

This is the original clip:

And this is the searchable Word document:

Wow.

Databases that make documents searchable

The other possibility would be to use a database that automatically makes documents OCR searchable. The advantage of using such a database is that it is—duh—a database, a logical place to store research documents. … which brings me to OneNote and EverNote.


Both EverNote and OneNote can make documents OCR searchable, so I decided to test them both on the clip above.

It took well over an hour for OneNote to convert it to searchable text, and EverNote has yet to do so even a day later!

Once a document has been made searchable, there is a way to create a copy of its text in EverNote that can then be put into Scrivener, but it’s weird and basically unreadable, showing every word as a separate object.

In OneNote, once the document has gone through the OCR treatment, it’s possible to easily create a searchable text version. (Control-click the document and select “Copy text from picture.”)

This is what I got from my test clip:

Here comes old Woodcock, the Yeoman of Kent, that’s half Farmer and half Gentleman; his horses go to the plow all week, and are put into the coach o’ Sunday.

Tunbridge Walks or the Yeoman of Kent, act I, sc. 1

Not as pretty as ABBYY FineReader, but not at all bad. (I did clean it up a bit.) This text can now be copied and pasted into Scrivener or wherever I want it.

Note: It would have been nice to be able to send this searchable text directly to Scrivener. I passed this suggestion on to OneNote and discovered 1) that their help menu actually helps (EverNote Help is extremely basic), and 2) that they ask how they can improve. What a concept! (But do they listen? That remains to be seen.)

A word about Web Clippers

One beautiful thing about EverNote is its Web Clipper. With it, I can send the contents of any webpage to EverNote and, at the same time, indicate which notebook it should be filed in and how it should be tagged.


OneNote’s Web Clipper doesn’t work in Safari right now because of recent OS changes at Apple. I trust that this will be solved. In any case, it is available for Chrome and Firefox.

It’s a good clipper, but it’s not as useful as EverNote’s. Although you can choose which OneNote notebook to file it in, you can’t refine that with tags, and you can’t file it in more than one place.


Which brings me to Tags

Being able to add tags to a document in EverNote is great. For example, I’d be able to tag an 18th-century French recipe for roasted swans as 18th century, France, food, recipes and swans. This would allow me to narrow a search for a perfect detail regarding a roasted swan snack.

OneNote doesn’t have a tag function, alas—at least not that I can see.

What about cost?

I use EverNote heavily, so I need their Premium plan, which costs $5.83 US a month when paying annually. For that I get 10 GB of uploads per month and can search PDFs.

OneNote is included in an Office 365 subscription package. (Some claim it’s also now available as a free stand-alone, but I’ve not been able to confirm that.) Since I’m already subscribed to the Office 365 world, I can start using OneNote at no additional cost. With OneNote, I get unlimited uploads, so win-win.

Say what? A scanner app?

Scanning pages from books is too slow to be practical. I’m delighted with the Microsoft app Office Lens, which sends an image directly to OneNote. This will save me lots of time.

For example, I took the image below with Office Lens and sent it to OneNote at 10:30 am. In under 30 minutes, it was searchable and even the all-text extract was surprisingly good.

EverNote or OneNote or … ? My conclusion

I do need a database, but given the pros and cons of OneNote and Evernote, where do I stand?

Because of EverNote’s expense and its inconsistent, slow and inadequate OCR, I have decided to migrate my extensive EverNote database to OneNote.

I should mention, as well, that there are indications that EverNote might be heading into hard times, and I don’t want to be left in the lurch.

It’s possible to import EverNote documents into OneNote using their OneNote Importer app, but judging from this note—

The importer software described on this page is still available for you to download and use, but we’re no longer actively developing or supporting this tool.

—that may not always be possible, so migrating now is perhaps wise.

I’ve never been a Microsoft fan (Mac users aren’t their priority), but OneNote for Mac looks worthy, so I’m going to make the move. I’ve also purchased ABBYY FineReader Pro, and since I will be unsubscribing from EverNote, I’ll be coming out ahead in more ways than one. :-)


The links below might be of interest.

Be aware that there are differences between OneNote for Mac and the mothership OneNote for PC users. Also, OneNote for Mac has been recently “updated”—but the changes have caused quite an uproar because it’s no longer possible to arrange tabs along the top, as in this example:

I would love to have such tabs back and I’m hoping the OneNote engineers succumb. Some long-time users are even advocating reverting to the 2016 version and vowing never again to upgrade.

Evernote vs OneNote: The Best Note-taking App in 2019

Top 10 things you didn’t know about OneNote

Using OneNote for your Novel: I was excited about trying out this template, but it’s for an old version of OneNote and possibly not applicable to Mac.

Why OneNote is One-Derful for Writers. Inspiring!

Using Scrivener: the good, the bad, and the hopeful

I’m using Scrivener right now to write my next novel and almost everything else I need to write: a speech, a workshop, etc.

Notice I said “right now.” It’s a bit of a love/hate relationship so far. For the short pieces, I jump in frustration to Word fairly quickly … only to recall why Word frustrates me. That said, the newest Word for Mac has an amazing feature for inserting online pictures, which makes crafting an illustrated blog post a breeze. I’ll be using it for blog posts, for sure.

Ergonomic necessities

I love trying new systems (a new to-do list method, new exercises, etc.), but I’m in systems overload right now. Back problems have forced me to change even how I go about writing. No more getting cozy in bed for hours with my latte and laptop. No more sitting with a notebook on my lap to write. Now I have to do what I’ve been told for years I should do: get up off the &%*# couch.

I’m learning to adjust to a sit-down/stand-up desk, and to put a 30-minute timer out of reach so that I have to move to turn it off. In short, there will be no more losing myself for hours in a cramped position while writing, but moving, moving, always moving.

There are often benefits in making changes. For example, I’m learning to dictate while moving. Yeah!

So end of the world? Hardly.

Plotting on Scrivener

Which brings me around to the initial subject of this post: an intriguing YouTube video on plotting with Scrivener. Every day I look for an article on writing to post to my Flipboard magazine. I always read the article to see if I feel it’s worthy, and this one absorbed me for quite some time. I’ve downloaded the template (the download link is toward the bottom of the page), loaded it into Scrivener and am going to give it a try. I’ll let you know what I think — once I stop moving, that is.

Organizing Scrivener to Plot Your Novel with Allan L. Mann

Bed-bound promo, website craziness, and Scrivener awe

I’ve been bed-bound for over a week since a minor knee operation to repair a meniscus issue. I’m not going to whine about it! In fact, I’ve discovered that I’m the perfect candidate for this type of life.

On the bed beside me are:

  • my Mac Air;
  • my Levenger notebook (calendar etc.);
  • a Circa notebook fat with my To Do lists;
  • another notebook (a Semikolon Mucho Spiral Notebook for the stationery curious), where I’m thinking through The Next Novel;
  • a three-ring binder for scene sheets (à la Story Genius), which I’m actually using to support my mouse pad and mouse;
  • my Kindle;
  • an iPad;
  • and a stack of magazines (The New Yorker, Real Simple, and Bookmarks).

Beside the bed is my walker (required for just a little longer!), a water bottle, clock, and iPhone. Moisturizer, lipstick, post-its, pencil, pen, pills. Snacks, tissue. Basic clutter.

Everything I need, in short, right where I can reach it. The only problem with this rat’s-nest life is that I can’t climb stairs (yet), so I can’t get up to my office.

But for now, I’m making great use of this time.

Website renovation

With every publication, a writer needs to update his/her website with information about the new book, a new media kit, author events, and a new author portrait throughout.

I didn’t have time to get an author portrait taken this year (I tried a selfie, with poor results), so I’ve used one James Brylowski took of me five years ago.

“Problem is, books are written slowly, and aging happens all of a sudden.” (From a wonderful article: The Agony and the Ecstasy of Taking Author Photos.)

Having neglected my website for years, I discovered a number of problems. Fortunately, I was able to find a great website person through Fiverr.com who is helping me. We have quite a bit to do yet.

(Frankly, I don’t know how authors who publish a book a year manage.)

An important part of getting my website more reader-worthy was setting up my Media page. Following the directions of Tim Grahl (see below), I learned to code my Media page so that high-definition images would be automatically downloaded with just a click. I’m fairly stoked that I was able to do this.

Also, on Fiverr.com, I found someone to turn the book cover of The Game of Hope into a 3D image (see above). For $5!

Easy Outreach with Tim Grahl

When it comes to marketing, I’m a fan of Tim Grahl. He’s experienced, down-to-earth and realistic. I’ve taken a few of his online courses, and they’ve always been worthwhile. Right now I’m following a new one he’s testing out, “Easy Outreach.” Basically, it’s about how to get interviewed on podcasts, but the detailed system he outlines would apply to any outreach: to blogs, vlogs, podcasts and more.

An important part of the process is to identify suitable podcasts and to study them before making a pitch. (I’ve discovered a number of wonderful podcasts in the process.) I’m kind of excited about putting this into practice. I ordered a USB Yeti mike, and already have one podcast interview scheduled for the fall.

I’m ready! Who knows where this might lead?

Finally learning Scrivener

I’ve promised myself that I would write The Next Novel in Scrivener. I’ve taken stabs at learning it before, but I’ve always ended up confused and frustrated. It’s a complex programme! I was on the verge of giving up when I came upon a Udemy Scrivener 3 course for Mac. It had excellent reviews, so I went for it. It’s been fantastic. I have questions almost every day, and the teacher responds to every one. I take it bit by bit and immediately apply what I’ve learned, so hopefully it will stick. I’m finally understanding why so many writers love it.

Additionally, I’ve been developing my next novel following the guidelines in Story Genius by Lisa Cron. Puzzling over how to get Cron’s scene card templates into my Scrivener project, I Googled “Story Genius Scrivener” and found a wonderful article by Gwen Hernandez on Writer Unboxed: Using Scrivener with Story Genius. Bingo! She even included a downloadable Scrivener template with scene card templates (and much more).

Watching movies, reading and listening to books and reading magazines …

And then, of course, there have been wonderful movies to watch: Three Billboards Outside Ebbing, Missouri; Call Me by Your Name; and, last night, Lady Bird. All were simply great. Of the three, I found Call Me by Your Name the most enchanting, swooningly European.

And then, of course, books, books, books! In addition to books on writing, I’m reading The Burning Girl by Claire Messud and listening, on Audible, to an amazing performance of The Hate U Give by Angie Thomas.

A hard life, eh?
