VERY basic search questions


I'm just a few days into Evernote (using the Mac client). I'm discovering search function oddities that surprise me.

 

For example, if I'm searching for a currency amount, typing "3842.42" (without the quotes) won't find an entry of $3,842.42.

 

Another: If text wraps mid-word in a pdf (added to Evernote as a formatted pdf file, not a scan), searching for the whole word won't find that entry if it's been hyphenated. I'd never encountered this before, so I tested searching a pdf in Adobe Reader, and text search within the application can find both "therefore" and "there-fore" (when the hyphen occurs at a line break in the pdf).

 

Actually, Adobe Reader's search seems dramatically more intelligent than Evernote's. For example, if I search for "there-fore" in Adobe Reader, it will not find words that are hyphenated solely because of line break formatting, but it will find all instances of "non-optional" hyphens and display them with the leading word fragment. For example, a search for "audit-based" will find all instances of the hyphenated word, and a search for "-" will display "audit-" but not the hyphens that occur as a consequence of line breaks.

 

Obviously, Adobe's entire history is intertwined with text formatting, so one might expect their tools to be more sophisticated. I've found examples where nuanced searching isn't as heuristic; e.g., my bank's website won't find a transaction for a specific amount if the formatting isn't exactly as entered in the search box; e.g., it fails to find the amount listed above unless I include the comma in the search field, and it won't find dollar amounts unless I enter the ".00" in the search field.

Link to comment
  • Level 5*

Evernote Search looks only for WORDS that start with the text you type into the Search Box.

Special characters, like the dash ("-"), are treated as word terminators.

So, "therefore" is treated differently from "there-fore".
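As a rough illustration of that behaviour (a sketch under assumptions, not Evernote's actual implementation): suppose the indexer ends a word at any character that isn't a letter or digit, and a query word matches any indexed word that starts with it. The helper names `tokenize` and `matches` are made up for this example. Under those assumptions, both examples from the first post fail for the same reason.

```python
import re

def tokenize(text):
    """Split text into lowercase words; any non-alphanumeric character ends a word (assumed behaviour)."""
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", text) if w]

def matches(query, text):
    """True if every query word is a prefix of at least one word in the text."""
    words = tokenize(text)
    return all(any(w.startswith(q) for w in words) for q in tokenize(query))

print(tokenize("there-fore"))   # the hyphen splits the word in two
print(tokenize("$3,842.42"))    # the comma and period split the amount in three
print(matches("therefore", "and there-fore we"))
print(matches("3842.42", "an entry of $3,842.42"))
```

With these rules, "therefore" is never indexed as a single word when the source contains "there-fore", and "3842" is never indexed when the source contains "3,842".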

Link to comment

Apologies in advance if I didn't make myself clear.

 

If I'm searching for the word "therefore" and I know that a file (a word doc or pdf incorporated into Evernote as searchable data rather than scanned in) contains that word (but don't remember that text formatting has hyphenated it so that it will look pretty in left- and right-justified columns), Evernote won't find the word. That is brain-dead (or braindead, take your pick) in my view. I can be expected to remember in general where I put something in Evernote, but if the software can't find the sole instance of a long word because it's been hyphenated in the source document, that strikes me as a major limitation, particularly since the search engine in Adobe Reader presents freely accessible proof that a search can be intelligent enough to distinguish between formatting hyphens and intentionally inserted hyphens. (And, no, I don't think "therefore" is a long word; it was an easy choice for testing Adobe's search engine.)

 

I am enormously impressed by Evernote, but my first quest for help on this in the Evernote community forums produced a long list of posts alleging that text search was broken in Evernote. None of them touched expressly on my issue, so I posted my question, hoping to find that there perhaps was a way to format my searches so that formatting hyphens in the source document wouldn't produce false null results.

 

I find elegant software containing senseless limitations far more maddening than clearly bad software; it's easy to toss the latter into the trash with no remorse, much more painful to tolerate it in a tool that offers so much promise.

Link to comment
  • Level 5*

No problem.  I understood you the first time.

 

That's just how Evernote Search works, and it's been that way for a very long time.  I don't foresee any changes.

 

As a workaround, try:

any: therefore "there-fore"

Link to comment

At first blush, my response was "you must be kidding."

 

Let's say I needed to search for "antediluvian". Your workaround theoretically would work if I composed my search as 

 

any: antediluvian "an-tediluvian" "ante-diluvian" "antedi-luvian" "antedilu-vian" "antediluvi-an"

 

and, by extension, all possible text-formatting-inserted hyphen combinations in any rich text document for every multisyllable word or phrase being searched for.
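To make the size of that workaround concrete, here is a minimal sketch that builds such a query automatically. The function names are hypothetical, and because the script knows nothing about syllables it simply tries a hyphen at every interior position (leaving at least two letters on each side), so it produces more variants than the hand-written query above.

```python
def hyphen_variants(word, min_piece=2):
    """Return the word with one hyphen inserted at each interior position."""
    return [f"{word[:i]}-{word[i:]}" for i in range(min_piece, len(word) - min_piece + 1)]

def any_query(word):
    """Build an 'any:' query from the plain word plus every quoted hyphenated variant."""
    quoted = " ".join(f'"{v}"' for v in hyphen_variants(word))
    return f"any: {word} {quoted}"

print(any_query("antediluvian"))
```

For "antediluvian" this yields nine quoted variants in addition to the plain word, which shows how quickly the approach grows for longer words or multi-word phrases.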

 

Then, despite having no knowledge of the workings of the text formatting routines used by Adobe, Microsoft, Apple, and other software vendors who each publish software that formats text flow in rich-text documents, I realized that what I'd expected might not be so easily accomplished.

 

For example, in a formatted Word or Pages or pdf document, the software that created the document knows that the line break hyphens are not fixed elements in the text flow, so if other text is added or removed, the optional hyphens will disappear, or move, or appear in other places. On the other hand, when Evernote imports a pdf, those formatting hyphens are fixed elements in the document. Unless the formatting engines these vendors use for their own documents can be incorporated into Evernote's text search algorithms, there may not be an easy way for Evernote to discern that a hyphen splitting the word "therefore" at the end of a line is there only because of the line break.
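One heuristic that doesn't need the original formatting engine, sketched here purely as an assumption (nothing in this thread says Evernote does or could do this), is to treat a hyphen immediately followed by a line break as a possible formatting hyphen and index the joined word in addition to the two fragments. The `index_words` function is a made-up name for illustration.

```python
import re

def index_words(extracted_text):
    """Collect searchable words, plus the joined form of any word split by a hyphen at a line end."""
    text = extracted_text.lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    # A hyphen followed by a newline may be a formatting hyphen, so also index the joined word.
    for left, right in re.findall(r"([a-z0-9]+)-\s*\n\s*([a-z0-9]+)", text):
        words.add(left + right)
    return words

page = "the audit-based review was there-\nfore postponed"
words = index_words(page)
print("therefore" in words)
print("auditbased" in words)
```

The mid-line hyphen in "audit-based" is left alone, while "there-" at the line end also gets indexed as "therefore". The cost is an occasional bogus joined word when an intentional hyphen happens to fall at a line break.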

 

Is that the problem?

 

Any place I can read about this stuff?

 

Thanks so much,

Jim Robertson

Link to comment

 

While you might get some false positives, you could try a wildcard to shorten the search query:

any: ante* "an-tediluvian" 

This assumes that "an-tediluvian" is the only permutation in which there is a hyphen before "ante". The wildcard should take care of any permutation in which a hyphen occurs after "ante". The false positives will occur for anything that begins with "ante" but isn't "antediluvian" or its various permutations.
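A small sketch of why that works, again assuming (not confirmed anywhere official) that punctuation ends a word and that `ante*` matches any word beginning with "ante". The helper names and the sample notes are hypothetical.

```python
import re

def words_of(text):
    """Lowercase words, split at any non-alphanumeric character (assumed behaviour)."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]

def wildcard_hit(prefix, text):
    """True if any word in the text starts with the given prefix."""
    return any(w.startswith(prefix) for w in words_of(text))

notes = ["antediluvian", "ante-diluvian", "antedilu-vian", "an-tediluvian", "antelope"]
for note in notes:
    print(note, wildcard_hit("ante", note))
```

Every form with the hyphen at or after "ante" is caught by the wildcard, "an-tediluvian" is missed (hence the extra quoted term in the query), and "antelope" is the kind of false positive to expect.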

Link to comment

 

I'm hoping someone else weighs in. You've certainly told me that Evernote can't correctly parse formatting hyphens in rich-text documents, but I haven't sensed any opinion on your part as to whether that's an unfortunate limitation, or whether it's simply a problem the developers have chosen to ignore. I had found the article you've suggested from the KB before I made my original post, but it didn't seem to address my issue.

 

Practically, at least an attempt at what I would expect from the text search engine seems not too much to ask. Just now, I created a paragraph in MS Word and turned on automatic hyphenation, producing a paragraph where each line was terminated by a formatting hyphen, with a cascade of hyphens occurring at different syllable breaks in each line. I copied the entire text into Apple's Pages (without hyphenation activated in Pages). Pages correctly identified the hyphens as superimposed on the words because of the page margins and removed them all. When I turned on hyphenation, the text hyphenated appropriately according to the (different) margins I'd created in Pages. And, when I copied the formatted hyphenated text back into MS Word, it had no difficulty discarding the optional hyphens that were no longer appropriate for its margins, then inserting hyphens that were appropriate for its page margins.

 

Evernote's inability to recognize the optional formatting hyphens produced by the word processors of the three biggest companies in the consumer text generation space, which significantly diminishes the accuracy and utility of its search engine, is at the very least unfortunate. If your statement that it's not likely to change is based on inside knowledge of what Evernote's developers think is important, I would escalate that to "unfortunate and misguided." The entire purpose of a search engine is to enable the searcher to find what he's looking for. If software gets in the way of that, that's not a little issue.

 

Sorry to get testy on my first day here, because I know people rave about this product, but I can't believe that someone who's obviously among the cognoscenti here seems to think this is silly and unimportant.

 

[Editorial comment]

The most valuable company on our planet got that way because of its dogged pursuit of "the complete experience." Dell, MS, and others all seemed on the verge of consuming Apple. Steve Jobs is notorious for discarding the iPhone that was weeks away from release because it just wasn't quite right. The crappy keyboards attached to the Dell, HP, and other vendors' computers in every hospital where I work are a source of never-ending frustration for the users, and (I believe) one of the reasons that only one company is still growing its sales of laptop and desktop computers in what Steve has accurately called the "post-PC world." That speaks to the wisdom of never being satisfied with "good enough" when good enough really isn't.

 

I'll shut up now.

Link to comment
  • Level 5

Well, searching for a phrase in possibly thousands of notes, including pictures, PDFs, and Word documents, is a bit more than formatting a text and adding or removing hyphens. In order to do a quick search (and Evernote's search is really quick) you need to create a clever index from all the content. This is not a trivial task. I agree that in some respects the Evernote search is limited, but it is unique, and at the moment it is as it is. Requesting more is OK, but without knowing the search algorithm it is impossible to know how much effort this might mean, and it could even be impossible without re-implementing the search completely.

Link to comment

My concern is that I wouldn't expect the search engine to be unable to find instances of single words, which I've discovered can indeed be the case if the word appears in the source document hyphenated. I'll try Scott Lougheed's suggestion this evening and see if that works.

Link to comment

I use wildcards extensively, not the least because my memory is pretty terrible so wildcards help with those times where I don't remember exactly how I phrased it. Especially using tags... I can never remember the date code I use on a given travel related tag, so I always end up searching tag:italy*   (which returns all of my "italy1014" tags, but I never remember the 1014 part despite its self-evidence). 

In general I am much happier with a slew of false positives than false omissions, and wildcards have served me well in this regard.

Link to comment
  • Level 5*

 

The number of false positives depends on the number of notes, or, more importantly, the size of your notes.  If you have a lot of large notes, with large PDFs, then the number of false positives can go up exponentially.

 

By default, Evernote will assume a wildcard search if you just enter text into the Search box.

So, if you enter "italy" it will search for text that starts with "italy" in just about everything, including the Note body, Title, Tags, attachments.

 

This is why I make extensive  use of Tags, which I assign, not Evernote.

Tags eliminate a lot of false positives.

 

For more info, see  The Benefit of Using Tags

Link to comment

That's precisely why I've talked about the specific search syntax:

tag:italy*

(and made a similar case to you in that very "Benefit of using tags" post!)

 

You are right, false positives increase with the number and size of notes, and a general search for "italy*" would be terrible indeed. In general, though, I'd assume that if you have a huge number of notes you might have some other means of constraining your search slightly to reduce falsies.

Link to comment
  • Level 5*

 

Wildcards are back? Yahoo! Hope it's true for free accounts as well.

did they ever disappear?

 

I think that there was a bug in the Android client that caused search to not honor wildcards. They appear to be working now.

Link to comment
  • 2 weeks later...

For what it's worth, jsrnephdoc, I agree with you completely about the poor functionality searching Evernote, and about maddening implementation in otherwise good software.

 

Unfortunately, like many other features in Evernote that I would consider "unfinished", there is a strong community that will staunchly defend the developers' choice not to fix most of these, and no real discussion (or change) happens.

 

Edit: The second tactic used is also to list workarounds that shouldn't be needed if the software functioned in a more mature fashion.

Link to comment
  • Level 5

I don't see anybody in this forum defending the "developers' choice not to fix most of these" except the developers themselves, which is their right. I see just the contrary: a lot of criticism, very often delivered in an aggressive way that goes against the forum's code of conduct. If you categorize explaining why things might be as they are, or presenting workarounds in order to help other users, as a tactic, then we had better stop doing this and close this forum altogether.

Link to comment

To echo Stuher, as this is a Users Helping Other Users forum, none of us have inside knowledge of what Evernote or its Dev teams are working on, so offering each other workarounds is about as good as it gets. A lot of software companies now have user forums like this one, though this is by far the best one I've personally encountered.

No software is ever 100% bug free, ever. That is just the nature of the beast. And like every other user active on this forum, I have my own personal wishlist of bugs - and design decisions - that I'd like to see addressed. But, as Gazumped and other long time users have pointed out, their product, their choice.

Don't get me wrong, I don't think we as users, free and paying, should stop complaining about bugs we experience, asking for features we like, or even politely voicing our discontent about the things we don't like. But keep in mind, the Devs are working on things as they are instructed to. It's not them who make the decisions about what the company itself does or doesn't do. That falls on their executive team, with Phil Libin at the top as the CEO.

Link to comment

Archived

This topic is now archived and is closed to further replies.