(Archived) Indexing of powerpoint, xls, word MS Office


GeorgeP

Hi

Does Evernote index PDF, PPT, XLS, DOC and similar attachments so that the contents of the files are searchable?

This article suggests that MS Office files are searchable, yet in my Evernote client, when I go to the information for a note, it says "not indexed" or "no attachments to index" (even though there is an attachment in the note):

http://blog.evernote.com/blog/2013/03/25/search-better-with-evernote-premium-document-search/

In addition to this, NONE of my PDFs are being indexed, so they are not searchable.

I've placed numerous calls with Evernote. The latest response is this:

"The 'X' number of PDFs haven't been indexed section is a known issue and we are working on it. PDFs do not get indexed, we OCR them." (My comment: fair enough, there is a bug, but they are not telling me when it will be fixed.)

This is my concern, though: "As for the PowerPoint attachment not getting indexed, that is by design. Since the file is not open we have no way to index what is inside it."

Really? The reason I went Premium was to search my attachments, and here is an extract from the link above:

"Today, we’re supercharging search for Evernote Premium and Evernote Business users with the addition of Document Search. Now any attached document, presentation and spreadsheet created using Microsoft Office and iWork will show up in your search results across almost every version of Evernote that you use."

If anyone can assist me with this, please do.

  • Level 5*

Hi - I've been under the impression that all these documents should be searchable, though I haven't tested all of them by any means. I do know PDFs and DOC / XLS files are indexed, because I can find their content. I OCR my own files precisely to avoid the uncertainty of whether and when the files would be processed. Indexing via Evernote may take some time if you're adding a number of files.

As to fixing the bug with PDFs - if Evernote are working on it, they'll fix it as soon as possible. Forecasts of how long that sort of thing will take aren't normally available - most of the effort goes into finding the problem and testing a correct fix. Those are both very much "how long is a piece of string" operations.

The answer anyway is as I said - do your own OCRs and you know they're done.

  • Level 5*

Ahh - a great light dawns...

PDFs are most often produced from formatted text files with some illustrations, but in many cases (scanned documents especially) the content is actually a picture of the finished page, not the individual characters of the text content.

"OCRing" a document means running it through software that converts the picture of a page back into those individual characters so the words can be searched. Adobe are the 'Microsoft' of that market, but there are other PDF readers that can OCR a document and save it as a searchable file. That type of file is smaller than the picture version too - a few bits of code define the letter rather than having many more to describe all the pixels that make up the image.

From memory, PowerPoint does PDFs, but these are even more 'pictures' of a whole slide rather than of the text and images they contain. It may be that they're not recognisable enough as text to allow Adobe (or anyone else) to OCR them.

I'd suggest you get one of the PDF editors that can turn documents into searchable files, and stop converting the PowerPoint files into PDF; just save them as PPT, which Evernote should be set up to make into searchable files.
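
If you want to check for yourself whether a PowerPoint file holds real text that a search index could pick up, here is a rough Python sketch. It assumes the newer .pptx format (a zip archive with one XML file per slide, where the text sits in `<a:t>` elements), not the old binary .ppt, and it says nothing about how Evernote itself does its indexing. The function name `pptx_slide_text` is made up for illustration.

```python
import re
import zipfile
from html import unescape

# Text runs in a slide's XML sit inside <a:t>...</a:t> elements.
_TEXT_RUN = re.compile(r"<a:t(?:\s[^>]*)?>(.*?)</a:t>", re.DOTALL)
# Slide parts are stored as ppt/slides/slide1.xml, slide2.xml, ...
_SLIDE_NAME = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def pptx_slide_text(path_or_file):
    """Return {slide_number: [text runs]} for a .pptx file (hypothetical helper)."""
    slides = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            match = _SLIDE_NAME.match(name)
            if not match:
                continue  # skip relationships, media, layouts, etc.
            xml = archive.read(name).decode("utf-8")
            # Decode XML entities such as &amp; and drop whitespace-only runs.
            runs = [unescape(t) for t in _TEXT_RUN.findall(xml) if t.strip()]
            slides[int(match.group(1))] = runs
    return slides
```

Calling `pptx_slide_text("my_deck.pptx")` (any file name of your own) gives you the text per slide. A slide that comes back with an empty list has no text runs at all, which usually means its content is an image, and that is the case where only OCR would make it searchable.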

 

Hope that helps..


Archived

This topic is now archived and is closed to further replies.
