Simon Blackley

(Archived) Cannot search inside uploaded documents

Wait wait wait, are you telling me that Evernote can index handwritten text in scanned documents, but that it doesn't index text encoded in bog-standard office files? Really?

I am managing a community project in which I receive reports from volunteers in a variety of formats - on paper, in the body of emails, and as attached files of various kinds, but mostly MS Word documents. I had hoped to dump everything into an Evernote notebook in order to be able to search across the entire archive of reports. A classic Evernote use case, no? Well, apparently not.

I've read comments from Dave on this issue to the effect that the variety of formats makes this complex... parsing... But we've all been used to 'searching inside files' on our desktop PCs for decades, so this just seems weird.

Have I missed or misunderstood something? Or should I redesign my workflow to include writing out the content of Word reports by hand and then scanning the handwritten documents into Evernote as searchable PDFs?

Or should I redesign my workflow to include writing out the content of Word reports by hand and then scanning the handwritten documents into Evernote as searchable PDFs?

You don't have to write reports by hand.

In Windows, just print them to a PDF format and set up a save location using Evernote's Import Folder.

>Tools >Import Folders
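If there are many reports, the print-to-PDF step could in principle be scripted. The following is only a rough sketch under stated assumptions: it assumes LibreOffice is installed and that its `soffice` command is on the PATH (nobody in this thread mentions LibreOffice; it stands in for "print to PDF"), and the function names and folder arguments are hypothetical. The output folder would be whatever folder you registered under Import Folders.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, import_folder, soffice="soffice"):
    """Build a LibreOffice headless command that writes a PDF into import_folder."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(import_folder),
        str(doc_path),
    ]


def convert_reports(source_dir, import_folder, run=subprocess.run):
    """Convert every .doc/.docx file in source_dir to PDF inside import_folder.

    The original Word files are left untouched, so they can still be edited.
    `run` is injectable so the function can be exercised without LibreOffice.
    """
    import_folder = Path(import_folder)
    import_folder.mkdir(parents=True, exist_ok=True)
    converted = []
    for doc in sorted(Path(source_dir).iterdir()):
        if doc.suffix.lower() in {".doc", ".docx"}:
            run(build_convert_command(doc, import_folder), check=True)
            # LibreOffice names the output after the input file's stem.
            converted.append(import_folder / (doc.stem + ".pdf"))
    return converted
```

Calling `convert_reports("reports", "evernote-import")` (both folder names are placeholders) would leave the PDFs where Evernote's Import Folder can pick them up.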

But if you do print your Word files to PDF, remember that PDF is a document exchange format: an electronic printed page. Some layouts are not easy to convert back to a word-processor-friendly format. If there's a chance you will need to edit the document again, or reuse boilerplate from it in future, make sure you save the original Word file (which Evernote neither displays nor indexes) alongside the visible, indexed PDF!

Do you mean that you upload, for example, a Word file into an EN note and then want to search the contents of that Word file?

Or should I redesign my workflow to include writing out the content of Word reports by hand and then scanning the handwritten documents into Evernote as searchable PDFs?

You don't have to write reports by hand.

In Windows, just print them to a PDF format and set up a save location using Evernote's Import Folder.

>Tools >Import Folders

Thanks. It was supposed to be a joke.

Has he stated why?

Quite a few times. Here are just a few of his previous replies on the topic:

2010

Our software doesn't know how to extract text from all of the different versions of various Office file formats, so you can't currently search for these documents. (If you converted the document to PDF, you could, but then it's read-only.)

2009

No, we do not include code to parse all of the MS Office formats in each of our clients in order to search those document types. If you print to PDF and then put the PDF into Evernote, we will process and search that document, but not the native Office document itself.

Indexing each file type requires special code in multiple places (at least Mac, Windows, service) that can "read" that file format to find the text. This is relatively challenging even for standard formats like PDF, and it's a gigantic project for proprietary formats like MS Office that have undergone dozens of changes through the decades. In addition to reading the text for indexing, this would ideally require a way to display that document type with highlighting to show why each match was found.

I.e. this is something we'd love to see, but it's a fairly large task, so we haven't been able to do it for these proprietary formats.
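To make the "special code that can read that file format" point concrete, here is a minimal Python sketch of text extraction for one format only. It is not Evernote's code, and the function names are made up for illustration. It relies on the fact that a .docx file is a ZIP archive whose main body lives in `word/document.xml`; it ignores headers, footnotes and comments, and it does nothing for the older binary .doc, .xls or .ppt formats, each of which would need a separate parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(path):
    """Return the body paragraphs of a .docx file as a list of strings."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def docx_contains(path, query):
    """Case-insensitive check for query in any body paragraph."""
    needle = query.lower()
    return any(needle in para.lower() for para in extract_docx_text(path))
```

Even this simple case only finds the text. Showing the document with the matches highlighted, as the quoted reply describes, would be a separate and larger job.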

MS Office documents are relatively complicated, and each version of Office has a different file format. We don't currently process all of these file formats in Evernote, so you can't search for text within an .xls file.

If you were to print/export the file to PDF, we would index that PDF document, however.

Any plans to include code to parse DOCs and DOCXs?

Not currently, but thanks for the feedback.

I always convert the Word file into PDF format BUT keep the original so I can edit it again. Whenever I go out, I upload the new PDF file into EN so that I can access and search it from my phone.

I remember being frustrated with Excel interop when writing a .NET program a while ago. I ended up using Aspose.Cells for .NET, which allowed me to handle any Excel file, regardless of version, in a consistent way. I know that similar libraries exist for other languages. I wonder why Evernote doesn't purchase one of these to allow for Office interop like thumbnail creation or OCR. By subscribing to one of these libraries, the responsibility for maintaining the library and keeping it current would fall on the library's developer rather than on Evernote's programmers.

This topic is now closed to further replies.
