(Archived) FEATURE REQUEST: Copy and Paste OCR data


I may have missed it, but here's a cool feature that should be fairly easy for the Evernote team to implement:

Allow us to select OCR text in images, then copy and paste the text in other places.

They're already doing the hard part: Evernote analyzes every photo, scan or PDF, identifies any text elements, and converts them into searchable data.

But at present, that's all you can do: search for a word or phrase, which takes you to the image containing that text string.

Why not go a step further? Let us go to an image and copy that OCR'd text directly to the clipboard, so we can use it as editable text.

Imagine: I'm looking through a textbook and find an important passage. I snap a photo of the page and upload it to Evernote. I can then open the new note, drag my mouse across the section I want to quote, copy the text to the clipboard, then paste that quote into my term paper.

Am I crazy thinking this feature would be handy, and should be relatively simple for the Evernote team to implement? Is there already a way to select and copy/paste text in Evernote that I've missed so far?

Link to comment
  • Level 5*

Sounds to me like you are proposing plagiarism!

I believe that the OCR process is carried out by a third party tool - in which case this isn't something that Evernote can do themselves.

Link to comment

There are a thousand legitimate, legal reasons why it would be useful to copy text from an image and paste it into a live text document. It could be my own handwritten notes that I want to paste into an email... or a letter from my children that I'd like to typeset and put in a beautiful memory book... Shall I go on?

The legality of any copy-and-paste action is generally left to the user to decide, not the technology provider. (Unless you're a Blu-Ray manufacturer.) :)

And even if the OCR process is provided by a third party, wouldn't it be simple to ask that third party to make the converted text available for extraction?

Link to comment
  • Level 5*

Who knows how simple or expensive that would be? I've no idea....

And I wouldn't make the mistake of assuming that it was either simple or cheap - after all it would also require code changes to at least the desktop clients.

This may be on their roadmap (which they don't share), but I haven't seen anything to indicate that it is on its way short term.

Link to comment
  • Level 5*

OCR text isn't necessarily in any form that would be useful; for example, it often contains guesses as to what the text might be. You can check out image OCR data by exporting a note to Evernote format and examining the OCR content, down towards the bottom, I think.

Link to comment
  • Level 5*

In the Windows client, you can right-click on a note, choose Export Note... and choose ENEX format. This will result in a text file that expresses the contents of your note in Evernote's ENML.

Remember, the text rendered by the OCR process is not in the form of a complete transcription that represents the text in the image, it's a series of ENML elements that describe guesses of individual words (there may be multiple per text segment), along with information that points to its location in the image (which supports highlighting). I would examine several samples before you attempt to gauge the utility for your purposes, no?

Link to comment
  • 1 year later...

+1 on @ajbezark's suggestion.

 

Microsoft OneNote, which I've migrated from, has a wonderfully simple copy-text-from-image command you can invoke simply by right-clicking an image - the OCRed (plain) text is copied to the clipboard; it's also worth noting that it works offline, without server support.

Link to comment

And if _____ (insert OneNote feature that Evernote lacks) were so easy to implement across a variety of platforms, I'm pretty sure OneNote would be available for all the platforms EN lives on.

Link to comment
  • Level 5*

I'd suggest an experiment: take a photo of a pageful of text, and import it into Evernote. Once it's OCR'd, export the note to Evernote format, and open up the resulting .ENEX file in a text editor. Start by looking at the <item> elements -- these are the areas in the image (pixel locations) that the OCR process has identified as having text, and each of these <item> elements has one or more recognition text candidates embedded as <t> elements. I'd start by isolating each full <item>...</item> element on a line, and see whether all of the OCR information would actually be all that useful. My own experiment suggests that it would not be, but your experience might suggest otherwise.

Might be an interesting third-party application, if Evernote doesn't do it first.
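
For anyone who wants to try that experiment without reading raw XML by eye, here is a minimal Python sketch. It assumes the recognition data looks roughly like the <item>/<t> structure described above; the sample fragment, the attribute names on <item>, and the function name list_candidates are illustrative assumptions, not taken from a real export.

```python
import xml.etree.ElementTree as ET

# Hypothetical recognition fragment, modeled on the <item>/<t> structure
# described in this thread; real exports may carry more attributes.
SAMPLE = """
<recoIndex objWidth="800" objHeight="600">
  <item x="10" y="12" w="90" h="20">
    <t w="87">Evernote</t>
    <t w="41">Evemote</t>
  </item>
  <item x="110" y="12" w="60" h="20">
    <t w="93">notes</t>
  </item>
</recoIndex>
"""


def list_candidates(reco_xml):
    """Return one (x, y, [(weight, text), ...]) tuple per <item> element."""
    root = ET.fromstring(reco_xml.strip())
    result = []
    for item in root.iter("item"):
        # Each <t> child is one recognition guess for this image region.
        guesses = [(int(t.get("w", "0")), t.text or "") for t in item.findall("t")]
        result.append((int(item.get("x", "0")), int(item.get("y", "0")), guesses))
    return result


if __name__ == "__main__":
    # Print every region with all of its guesses, one region per line.
    for x, y, guesses in list_candidates(SAMPLE):
        print(f"({x}, {y}): {guesses}")
```

Printing every guess per region, rather than only the top one, makes it easier to judge how noisy the data is for a given image.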

Link to comment

Great idea, @jefito - I'll look into that.

 

As @ajbezark has already stated, the OCR doesn't have to be 100% correct - just extracting the bulk of the text correctly would be a great help, even if minor corrections are required afterward.

Link to comment
  • Level 5*

It's not that the OCR isn't correct, it's that there can be several incorrect guesses for an area, and they're all included. And image-relative positional information is included, but I don't know that there are any guarantees about ordering.

Link to comment

Understood. The guesses are ordered by the value of a numerical attribute named "w" in descending order.

 

The "w" attribute is in the range of 1 to 100 expressing confidence in the correctness of the recognition, with 100 expressing certainty (see the DTD)

 

In other words, the highest-confidence guess is listed first inside the relevant element.

 

I did a quick test and found that *more often than not* the highest-confidence guess was indeed the best.

 

Thus, it shouldn't be too hard to at least partially automate the following:

  • export a note as *.enex
  • run a script that extracts the highest-confidence guesses and strings them together to form the plain-text extraction of the text contained in the image.

Obviously, all you would get is unstructured plain text, but often that alone is a great help.
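
The second step could look something like the following Python sketch. It is not the script discussed in this thread; it assumes the OCR data sits as embedded XML inside a <recognition> element of each resource in the .enex file, and the sample content and the function names best_guess and ocr_text_per_image are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed .enex content: one note with one image
# resource whose OCR data is assumed to sit in a CDATA section.
SAMPLE_ENEX = """<en-export>
  <note>
    <title>Scanned page</title>
    <resource>
      <recognition><![CDATA[<recoIndex>
        <item x="10" y="12" w="90" h="20"><t w="87">Hello</t><t w="41">Hallo</t></item>
        <item x="110" y="12" w="60" h="20"><t w="35">wor1d</t><t w="93">world</t></item>
      </recoIndex>]]></recognition>
    </resource>
  </note>
</en-export>"""


def best_guess(item):
    """Pick the candidate with the highest "w" (confidence) value."""
    candidates = item.findall("t")
    if not candidates:
        return ""
    best = max(candidates, key=lambda t: int(t.get("w", "0")))
    return best.text or ""


def ocr_text_per_image(enex_text):
    """Return one plain-text string per <recognition> block in the export."""
    export = ET.fromstring(enex_text.strip())
    texts = []
    for reco in export.iter("recognition"):
        if not reco.text or not reco.text.strip():
            continue  # resource without OCR data
        # The recognition payload is itself an XML document; parse it separately.
        index = ET.fromstring(reco.text.strip())
        words = [best_guess(item) for item in index.iter("item")]
        texts.append(" ".join(w for w in words if w))
    return texts


if __name__ == "__main__":
    for text in ocr_text_per_image(SAMPLE_ENEX):
        print(text)
```

Using max() over the "w" values instead of simply taking the first <t> means the sketch does not depend on the guesses being stored in descending order.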

Link to comment

This is why I - even though I'm a premium account holder - don't use the EN OCR but instead use the OCR which came with my scanner.

My workflow is this:

1) scan with Doxie Go (love this scanner)

2) import to computer when I have time

3) "Save as OCR PDF"  - this does OCR locally on my computer and puts the text behind the scanned image. It's selectable, copiable, but doesn't obscure the original scan

4) save it to a folder on my Mac which has a folder action to create Evernote Notes. (like a watched folder on the Windows EN client)

 

Yes, the local OCR takes a little longer, but it's instantly searchable and I can copy/paste text.

Link to comment
  • Level 5*

@scooper4711: Like you, if I'm scanning something, I'll use the scanner's OCR, but for images that you clip off the web or obtain by methods other than scanning, you're gonna get Evernote's version.

Link to comment

@scooper4711: thanks for sharing your approach; the Doxie Go sounds neat.

As is implied by @jefito's response, it doesn't cover all OCR scenarios, though, and having Evernote provide such services centrally would be great. 

Link to comment

Gosh, on Windows you could probably do it all with PowerShell, then send the results to the clipboard.

Anyways, all I can say is "Have fun" then. :)

 

Thanks. I naively thought I would, but AppleScript taught me otherwise.

 

Anyway, I have an OS X-only solution now, based on an OS X Service created with Automator, containing an AppleScript action:

 

 

 

Here's the read-me from the shared note that contains the download
 

-------

 

 
Installation:

   * Prerequisites:
      * You must be running OS X
      * You must be an Evernote Premium user
   * Download the attached *.zip file, extract the embedded *.workflow bundle, and open it - OS X will offer installation as an OS X Service - opt to do so.

Use:

   * With a note of interest (with at least 1 embedded image) selected in Evernote, select Evernote > Services > Copy OCR Text from Note (Evernote) in the menu bar.
   * Unstructured plain text composed of all text tokens recognized in the image is copied to the clipboard. If you have Growl installed, a notification will be shown upon completion.

Notes and limitations:

   * OCR text can only be copied once server-side OCR processing has been completed and synced to the client. This OS X Service will initiate synchronization on demand and will keep waiting until the data is received - unless you cancel first. Note, however, that it can take quite a while, possibly minutes, for the data to arrive.
   * If a note contains multiple images, the text copied is grouped into blocks separated by "##########"
   * The returned text only uses the highest confidence recognition guess for each piece of text detected in the source image, whereas Evernote often stores multiple guesses per piece of text.
   * The usefulness of the text returned varies greatly, depending on image quality and distribution of text in the image:
      * Works best with images of lines of text.
      * By contrast, text scattered over an image will likely not be returned in the same sequence you may expect.
      * In short: YMMV.
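
On the sequencing limitation: one possible heuristic, which is not what the Service above does, is to re-sort the recognized pieces by their position in the image before joining them. The Python sketch below assumes each piece is available as an (x, y, text) tuple in pixel coordinates; the function name reading_order and the row_tolerance parameter are hypothetical.

```python
def reading_order(items, row_tolerance=10):
    """Group (x, y, text) tuples into lines, top to bottom, left to right.

    Items whose y value is within row_tolerance pixels of the first item
    in the current row are treated as belonging to the same line.
    """
    rows = []  # list of (row_y, [(x, text), ...])
    for x, y, text in sorted(items, key=lambda it: (it[1], it[0])):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    # Within each row, order the pieces by their x position.
    return [" ".join(text for _, text in sorted(row)) for _, row in rows]


if __name__ == "__main__":
    pieces = [(120, 14, "world"), (10, 50, "second"), (10, 12, "Hello"), (90, 52, "line")]
    for line in reading_order(pieces):
        print(line)
```

This only helps with reasonably straight lines of text; for text scattered across an image or set at an angle, a fixed pixel tolerance is unlikely to be enough.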
Link to comment


I looked into a PowerShell-based solution, but there's a crucial piece missing: ENScript.exe has no support for acting on the note currently selected in the Evernote client (see http://discussion.evernote.com/topic/20897-enscript-exporting-selected-notes-to-enex/; by contrast, Evernote's AppleScript support on OS X does allow you to do that).

 

If that capability were available, it would be fairly easy to create a solution analogous to the one I created for OS X.

(You would then still have to solve the additional problem of how to invoke the external functionality from the Evernote client, but there are solutions for that.)

 

Of course, it's conceivable to create a PowerShell script that targets the notes of interest via queries from the command line, but to me that would be too cumbersome to use - I want to locate a note of interest in the GUI client, then invoke functionality on it.

Link to comment
  • Level 5*

Yes, I know that ENScript.exe doesn't know the current state of the Evernote.exe application (though it can communicate to Evernote via COM, evidently). Not of much interest to me; I'd rather have a plug-in framework for Evernote.exe, and use ENScript for batch operations.

Link to comment

I agree that plug-in support of sorts is the right solution to this problem; ENScript.exe was clearly designed for batch operations. By contrast, on the OS X side Evernote already covers *both* use cases with the existing AppleScript support. Therefore, when it comes to integrating external functionality into the client app (conceptually speaking), OS X is your only bet at the moment (via OS X Services).

 

 

Afterthoughts:
 
I've since noticed that ENScript.exe has already dipped a toe into GUI-client-integration territory with its `showNotes` command.
Another toe could be dipped in the form of pseudo queries that target the set of currently displayed / the single currently selected note in the GUI client.
 
Longer-term, creating a PowerShell provider and having the GUI act as a PowerShell host could provide excellent extensibility.
 
Link to comment


I've created a PowerShell cmdlet that can extract OCR data from the command line, using a command-line query (search expression) to target the desired note(s).
For now it must be run as a script from the PowerShell console and sends output to the console, allowing for easy capture to a file; the ability to send to the clipboard wouldn't be hard to add.
However, integration with the GUI client - as in my OS X Service - is out of reach for now.
 
Example:
 
Get-EvernoteOcrText -Query "tag:electricity-bill" -ShowNotes >TextFromElectricityBills.txt
   
    Extracts the OCR text from all notes tagged with 'electricity-bill' and saves it to file 'TextFromElectricityBills.txt';
    furthermore, shows the matching notes in the Evernote GUI application.
 
Details and download here.
Link to comment
  • 3 months later...

 

I downloaded and installed the AppleScript, but got the following error message when running it:

 

The action “Run AppleScript” encountered an error.

 

Not that helpful, but any ideas where I might start to get this working? It's the killer feature missing for me...

Link to comment

 

 

Sorry to hear it. So were you able to install it as a service and you get the error when you invoke the service, or did opening the *.workflow file after downloading fail? What is your Evernote client application version?

Link to comment

 

 

 


It installed perfectly and I got the error when selecting the service while in Evernote. Version is 5.2.

 

And thanks for the quick response!

Link to comment

 

 

 

 


In Finder, locate ~/Library/Services/Copy OCR Text from Note (Evernote).workflow and open it - it should open in Automator. With Evernote open and a note with an image selected, run the workflow in Automator and see what error message you get.

Link to comment

 

 

 

 

 


Can’t get application id "com.Growl.GrowlHelperApp".

 

Is Growl required to be running for the script to work?

Link to comment

 

 

 

 

 

 


Installed the latest version of Growl and removed an old one, and all is working now, thanks! Great script!

Link to comment

 

 


Glad to hear you got it to work, and thanks for alerting me to the Growl issue. I've added Growl 2.x as a prerequisite to the download page and also added instructions for manually modifying the workflow to work without Growl.

(The embedded AppleScript actually tries to ignore errors relating to Growl, but the AppleScript runtime is unforgiving when it comes to referencing apps that aren't installed - it simply shows the generic "encountered an error" dialog before the script code even gets to run.)

Link to comment

I just got my shiny new Moleskine Evernote for my birthday and am trying to use it to take meeting notes. My previous note-taking method was typing in Evernote on iPad. Moleskine worked as advertised; the note image is clearly visible and readable, but now what? Manually retyping all action items into my To-Do manager does not seem compelling. So I'm looking to get the OCRed text out. Thanks a bunch, mklement, for putting together a Mac solution. It worked as advertised. Yes, I bought Growl 2 on the spot (used to use Growl 1), but it was time. Also, I was not an Evernote Premium user, but 3 months come with Moleskine, so I can at least try.

 

Now, after all of that, I got the OCRed text out, and now I have a theory why Evernote does not have this feature. The quality of the result is absolutely atrocious. It's abysmal to the point of being totally useless. And, consider:

 

1. I wrote in #*(& BLOCK letters.

2. The image was in black ink, pretty clear, well lit, etc.

3. It was written on #*)(#)* Moleskine paper with all its high-tech dots, etc.

 

I kind of had a suspicion that Moleskine is a gimmick, but at least it looks very nice. :)

 

Lower your expectations, people, that's where I want to conclude.

Link to comment
  • 4 months later...

I LOVE Evernote! Except for the OCR text not being available. I have to resort to using OneNote to grab the text, then paste it into my Evernote.

I just, like many others, wish this seemingly simple (well, at least Microsoft makes it look simple) capability were built into Evernote. The only reason I have to open OneNote is to easily grab OCR text out of images. And OneNote does this quite well.

Link to comment

Archived

This topic is now archived and is closed to further replies.
