(Archived) Cannot search inside uploaded documents


Wait wait wait, are you telling me that Evernote can index handwritten text in scanned documents, but that it doesn't index text encoded in bog-standard office files? Really?

I am managing a community project in which I receive reports from volunteers in a variety of formats - on paper, in the body of emails, and as attached files of various kinds, but mostly MS Word documents. I had hoped to dump everything into an Evernote notebook in order to be able to search across the entire archive of reports. A classic Evernote use case, no? Well, apparently not.

I've read comments from Dave on this issue to the effect that the variety of formats makes parsing them complex. But we've all been used to 'searching inside files' on our desktop PCs for decades, so this just seems weird.

Have I missed or misunderstood something? Or should I redesign my workflow to include writing out the content of Word reports by hand and then scanning the handwritten documents into Evernote as searchable PDFs?
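
Desktop "search inside files" generally works by extracting the text from each file once and storing it in an index, so a query doesn't have to re-read every file. Below is a minimal sketch of such an inverted index. It assumes the text has already been pulled out of each report, and that extraction step is exactly what Evernote says it lacks for Office formats. The function names and sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() avoids inserting empty entries into the defaultdict.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

For example, indexing `{"report1.docx": "Litter pick on Main Street", "email-03": "Main Street lights broken"}` and searching for `"main street"` returns both names, while `"litter main"` returns only `report1.docx`.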

Or should I redesign my workflow to include writing out the content of Word reports by hand and then scanning the handwritten documents into Evernote as searchable PDFs?

You don't have to write reports by hand.

In Windows, just print them to PDF and save the PDFs into a folder that Evernote watches. You set that folder up under:

Tools > Import Folders
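
If there are many reports, the print-to-PDF step can be batched. The sketch below swaps in LibreOffice's command-line converter (`soffice --headless --convert-to pdf --outdir ...`) for printing by hand, assuming LibreOffice is installed and `soffice` is on the PATH. It writes the PDFs into the folder configured as an Evernote Import Folder. The folder names and helper functions are hypothetical. The original Word files are left where they are, so the editable copy survives.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_conversion_command(source: Path, out_dir: Path) -> list[str]:
    """Build a LibreOffice headless command that writes `source` as a PDF into `out_dir`."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_folder(src_dir: Path, import_dir: Path) -> list[Path]:
    """Convert every .doc/.docx in src_dir to PDF inside import_dir.

    Assumes import_dir is the folder Evernote watches (hypothetical setup).
    Originals in src_dir are not modified or moved.
    """
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice 'soffice' not found on PATH")
    import_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in sorted(src_dir.iterdir()):
        if doc.suffix.lower() in {".doc", ".docx"}:
            subprocess.run(pdf_conversion_command(doc, import_dir), check=True)
            written.append(import_dir / (doc.stem + ".pdf"))
    return written
```

Usage, with made-up folder names: `convert_folder(Path("reports"), Path("EvernoteImport"))`.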

But if you do print your Word files to PDF, remember that PDF is a document exchange format: an electronic printed page. Some layouts are hard to convert back into a word-processor-friendly format. If there's a chance you will need to edit the document again, or reuse boilerplate from it in future, make sure you keep the original document (which Evernote won't display or index) alongside the visible, indexed PDF!

Thanks. It was supposed to be a joke.

Has he stated why?

Quite a few times. Here are just a few of his previous replies on the topic:

2010

Our software doesn't know how to extract text from all of the different versions of various Office file formats, so you can't currently search for these documents. (If you converted the document to PDF, you could, but then it's read-only.)

2009

No, we do not include code to parse all of the MS Office formats in each of our clients in order to search those document types. If you print to PDF and then put the PDF into Evernote, we will process and search that document, but not the native Office document itself.

Indexing each file type requires special code in multiple places (at least Mac, Windows, service) that can "read" that file format to find the text. This is relatively challenging even for standard formats like PDF, and it's a gigantic project for proprietary formats like MS Office that have undergone dozens of changes through the decades. In addition to reading the text for indexing, this would ideally require a way to display that document type with highlighting to show why each match was found.

I.e. this is something we'd love to see, but it's a fairly large task, so we haven't been able to do it for these proprietary formats.

MS Office documents are relatively complicated, and each version of Office has a different file format. We don't currently process all of these file formats in Evernote, so you can't search for text within an .xls file.

If you were to print/export the file to PDF, we would index that PDF document, however.

Any plans to include code to parse DOCs and DOCXs?

Not currently, but thanks for the feedback.
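
The point about per-format parsing code is easier to see with a concrete format. A .docx file (Word 2007 and later) is a ZIP archive whose body text lives in `word/document.xml`, inside `<w:t>` elements grouped into `<w:p>` paragraphs. Here is a minimal sketch using only Python's standard library, assuming a well-formed .docx. It ignores headers, footers, footnotes, tabs and line breaks, and it does not handle the older binary .doc format at all, which would need an entirely separate parser. The function name `docx_text` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx (Office Open XML) files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path) -> str:
    """Return the body text of a .docx, one line per paragraph.

    Only the OOXML .docx format is handled; legacy binary .doc is not.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs, each holding <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Even this simplified reader covers one format family. Doing the same for .doc, .xls, .ppt and their OOXML successors, on every client, is the workload described above.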

I remember being frustrated by Excel interop when writing a .NET program a while ago. I ended up using Aspose.Cells for .NET, which let me handle any Excel file, regardless of version, in a consistent way. I know similar libraries exist for other languages. I wonder why Evernote doesn't license one of these to support Office features like thumbnail creation or text indexing. With a licensed library, the job of maintaining it and keeping it current would fall on the library's developer rather than on Evernote's programmers.
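
To show what such a library hides, here is a sketch that covers only the modern .xlsx format, using Python's standard library. In an .xlsx workbook, text typed into cells is normally stored once in `xl/sharedStrings.xml` and referenced by index from each sheet, so that table holds most of the searchable text. This is an assumption-laden simplification. It skips numbers, formulas and inline strings, and the legacy binary .xls format mentioned earlier in the thread is not handled at all. The function name `xlsx_strings` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by .xlsx (Office Open XML) workbooks.
S_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_strings(path) -> list[str]:
    """Return the shared-string table of an .xlsx workbook.

    Legacy binary .xls files are a different format and are not supported.
    """
    with zipfile.ZipFile(path) as zf:
        if "xl/sharedStrings.xml" not in zf.namelist():
            return []  # a workbook with no text cells may lack this part
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    strings = []
    for si in root.iter(S_NS + "si"):
        # Rich-text entries split one string across several <t> runs.
        strings.append("".join(t.text or "" for t in si.iter(S_NS + "t")))
    return strings
```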

Archived

This topic is now archived and is closed to further replies.