
Search does not find partial string matches or non-alphanumeric characters within text



I searched online but could not find any discussion or resource addressing this: if I enter a hash mark before a word in the Evernote search box, no results show up, even though I have #dog, for example, in one of my notes. Hash marks seem to be ignored completely in searches.

Link to comment

The only non-alphanumeric character that EN will find is the underscore. And we had to fight to get that one in the new version. So, you'll have to switch from #tag to _tag.

Link to comment
  • 1 month later...

Hi. I'm not yet an Evernote user, but I have tested it somewhat with a small collection of notes that I entered. I have found what I consider to be a major flaw in the search functionality: Search does not find partial string matches within text, only word matches.

For example, if I have a note containing the word "iphone", a search on the string "phone" does not find it.

What are the chances that this can be fixed, and have partial string matches succeed when searched? At least, have it as optional behavior?

There is much to like about Evernote, including the Web syncing function and iPhone app. But this search limitation is keeping me from switching from Info Select, which can't do any of these modern things, but has wonderful search capability.

Link to comment
  • Level 5

Sure it would be nice, but...with all the things you can do with Evernote, I would not classify this as a major flaw. It's more of a minor hiccup.

If I want to find an article about Obama's trillion-dollar deficit, I search for trillion, not lion.

Link to comment

It's not going to be fixed. Well, unless the Evernoters have a complete change of heart. If you search the board, there have been numerous requests over the years for a more fine-grained search capability, maybe even going so far as regex. But every response comes back the same. The way that EN works, it builds indexes of words - this makes search pretty fast, but it means that you can only search for full words or the start of words. So, you can use *, but only at the end of the word you're searching for.
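To illustrate why a word index allows a trailing * but not a match in the middle of a word, here is a minimal Python sketch. It is not Evernote's code: the function names and the sample words are made up for illustration, and it assumes only that the indexed words are kept in sorted order.

```python
import bisect

def build_vocabulary(words):
    """Return the distinct words, lowercased and sorted (hypothetical helper)."""
    return sorted({w.lower() for w in words})

def prefix_matches(vocabulary, prefix):
    """Return every indexed word that starts with `prefix`.

    In a sorted list, all words sharing a prefix sit next to each other,
    so one binary search finds where they start.
    """
    prefix = prefix.lower()
    start = bisect.bisect_left(vocabulary, prefix)
    matches = []
    for word in vocabulary[start:]:
        if not word.startswith(prefix):
            break  # past the contiguous run of words with this prefix
        matches.append(word)
    return matches

vocab = build_vocabulary(["iPhone", "phone", "phonetic", "trillion", "lion"])
print(prefix_matches(vocab, "phone"))  # ['phone', 'phonetic']
print(prefix_matches(vocab, "lion"))   # ['lion']
```

"iphone" and "trillion" are never reached, because words containing "phone" or "lion" in the middle are scattered through the sorted list. Finding them would mean checking every word, which is the slow path the index exists to avoid.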

Link to comment
  • 1 year later...
hmm, so how is a search for punctuation characters achieved? like looking for markers such as "$$$"?

I think the answer is no can do. You may want to employ other tactics. For example, if $$$ ranks the price of restaurants you want to try, you could simply create tags for $, $$, and $$$.

Link to comment
  • Level 5*

Short of writing a tool to pull out all of your note text and searching that, you can't. And according to my experiments, you can't do it in Google either. Or Bing.
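For anyone who does want to try the "pull out all of your note text" route, a rough Python sketch of such a tool follows. It assumes the notes have already been exported somehow as plain-text files into one folder. The folder layout, the `*.txt` pattern, and the function name are assumptions for illustration, not an Evernote feature.

```python
from pathlib import Path

def find_literal(folder, needle, pattern="*.txt"):
    """Return the names of files in `folder` whose text contains `needle`.

    This is a plain character-by-character scan with no index, so
    punctuation such as "$$$" or "30%" is matched literally. It reads
    every file on every search, which is why it gets slow on big collections.
    """
    hits = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(path.name)
    return hits

# Example call (hypothetical folder name):
# find_literal("exported_notes", "$$$")
```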

~Jeff

Link to comment

just curious why such a limitation was put in place? there are a lot of other characters besides letters and numbers that make up a database. anyone know the reason they chose to make it impossible to search for such a large percentage of them?

so i can't search for 30% or AT&T or brad@me.com or A++ those all will fail because they involve punctuation?

Link to comment
Well, like Jeff said, you can't do that in Bing or Google either. So there must be a valid reason.

Link to comment
  • Level 5*
i searched for at&t in google now and it worked just fine, it does not work in EN.

I'm guessing that a search for "at&t" didn't turn up much different than a search for "att". Did you try "$$$"?

As Dave said, punctuation is ignored in Evernote. That doesn't mean that it works identically in Google or Bing, but neither of those search services handles punctuation straightforwardly either. You're just going to have to accept that, no, searching for punctuation doesn't work in Evernote.

~Jeff

Link to comment

If you search for "@Home" in Google, you'll get a ton of pages about "home", not limited by that punctuation. For the vast majority of users, this is a "feature" not a "bug", because it means that they can search for things without needing to exactly match the original punctuation, spacing, capitalization, diacritics (accent marks), etc.

The issue is that fast search over large amounts of data requires that the raw data be processed and "indexed" in some way. The only way to search for arbitrary sequences of Unicode characters would be to crawl through every byte of every file for every search. This is fine for a tiny database, but completely impractical when you have gigabytes of data (or petabytes, in the case of Google). So if you want a responsive and interactive search capability, you index the text in some manner. That's why people with 5GB of notes can type a word into our web site and see matching notes in a second or two instead of 2 minutes later.

This process typically breaks text down into "words" of some sort, and then builds an inverted index for the whole collection of data that allows you to quickly look up which documents contain each word. This process is common in any fast text search system that you see, and our particular details are described in Appendix C of our developer overview:

http://www.evernote.com/about/developer ... c277181469
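As a rough picture of the inverted index described above, here is a small Python sketch. It is a toy, not the structure from the developer overview: the note texts, the function names, and the choice of `\w+` as the definition of a "word" are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits, or underscores.
WORD = re.compile(r"\w+")

def build_index(notes):
    """Map each lowercased word to the set of note ids that contain it."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for word in WORD.findall(text.lower()):
            index[word].add(note_id)
    return index

def search(index, query):
    """Return ids of notes containing every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()  # the query was nothing but punctuation
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

notes = {
    1: "Call AT&T about the bill",
    2: "Dinner at 8, tip 30%",
    3: "Meet at @Home",
}
index = build_index(notes)
print(search(index, "@Home"))  # {3}
print(search(index, "30%"))    # {2}
print(search(index, "$$$"))    # set()
```

Each lookup is a dictionary access rather than a scan of every note, which is where the speed comes from. The cost is visible in the last line: punctuation never makes it into the index, so a query made only of punctuation has nothing to look up.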

Link to comment

dave -

thanks for the details and i agree you need an effective indexing system but you also need a backup system when your indexing system fails, which is the case here.

if i put something in quotes like "30%" you should do a full on search of everything in the database, the choice to go slow was one that i made based on the specific requirement of the quotes.

i agree you need a method to make things fast but i suggest you add an alternate method to make things "accurate".

if you don't, we can't "remember everything"... :lol:

brad

Link to comment
We don't support search for arbitrary sequences of punctuation. I'd recommend using letters, numbers, and underscores to mark sequences that you want to search for.

dave,

when i search for _sb on iphone per your instructions, search returns 56 items when there is only 1 match in the windows application - and the underscore is stripped from the search term. is this proper behavior or not?

Link to comment
  • 1 month later...
No, the underscore character is an "honorary" letter that should be preserved. That's a bug, thanks for the report.

Is there any movement on this? I deliberately use something like "log_" in some notes to allow me to filter them out later on. I'm finding that now (4.2.0.x), it no longer works for me.

I went back to a saved search that I use rarely, whose syntax is "any: tag:@waiting _waiting" and I find that it's broken too. The _ isn't being noticed/used/preserved in the search.

Link to comment
  • 2 weeks later...

I just ran the upgrade to v4.2 and now search is not working.

I notice that the note search details now specify that search terms will only be found if they are at the beginning of words.

Yesterday (before the upgrade) it would find a note that contained a search term anywhere not just at the beginning.

Even the 'Begins with' is not working correctly

If I search for '_PP1'

it won't find any matches, not even the following note (snippet) -

SYSTESTA _Q31

SYSTESTB _Q41

DEVLOCO _D3

QAFAT1 _T1

QAFAT2 _T2

PREPROD _PP1

DEVINT1 _D1

TRAINING _TR1

This is a horrible change in behavior. The main purpose of Evernote is to allow you to find your notes. Forcing only 'begins with' is a bad change.

I would like to see the previous search functionality restored.

Best regards,

PCH

Link to comment

First, Evernote has never found matches within words. (Or at least since the introduction of 3.1 ~mid 2008.) AFAIK, they don't plan on implementing that type of search.

viewtopic.php?f=30&t=18993&p=78350&hilit=underscore#p78332

Second, I've been using various builds of 4.x & have never had a problem with the search. If I search on webca, I will find all notes with the word "webcam" in them, so the "begins with" does indeed work. I've just confirmed this with build 121492, 121256 and 121810.

Link to comment
  • Level 5*

Further: Your example list has spaces between the entries (e.g. PREPROD _PP1); I made a note and copied the list in. I was able to successfully search for both '_PP1' and 'PP1'. So OK, I then removed the spaces (e.g. PREPROD_PP1), and while I wasn't able to find '_PP1', I was able to find 'PP1'.

~Jeff

Link to comment
So OK, I then removed the spaces (e.g. PREPROD_PP1), and while I wasn't able to find '_PP1', I was able to find 'PP1'

Really? I created two notes - one had the space & one didn't. If I search on PP1, neither note shows up...??? (Which seems to be the expected behaviour as indicated by Dave's post I linked to above.)

Link to comment

Ok, on the computer I entered the two notes on (build 121810), searching on PP1 does not bring up either of the notes. BUT...on the other computer (build 121256), when the two notes were sync'd over there, PP1 DOES bring up both notes. So...in case the indexing wasn't completely done/sync'd down on the first computer, I re-sync'd & it continues to not bring up either note when searching on PP1... :?

Based upon Dave's post, I would think neither note should be a hit when searching on PP1, since a contiguous "word" is letters, numbers & underscore...

Link to comment

Furthermore, when searching for _pp1 in build 121810, it finds only the note with the space. (Expected behaviour.) Same search in build 121256 shows both notes. It seems like build 121256 treats the underscore as a delimiter AND a part of a contiguous word. (It's two mints in one!) And it was corrected somewhere along the line.

Link to comment

Ok, I think I need to step away from my computer for a while & go out & take a walk. I just updated EN on the 2nd computer to 121810 & the behaviour has not changed. IOW, on computer 1 (where I entered the two notes & have recently sync'd again), when searching on pp1, neither of the test notes shows up. Yet, on computer 2, BOTH do. This doesn't make any sense at all...

(Flips sign on door over from "open" to "closed for the day" & heads for the nearest box of wine.)

Link to comment

Alrighty... hopefully cleared head & eyes. (Couldn't find a box of wine.)

Both computers & the web version are behaving the same:

Searching on pp1 finds neither of the notes, which seems like the correct behaviour.

Searching on _pp1 finds only the note with the space, which also seems like the correct behaviour.

Which then causes me to ask Jeff...

So OK, I then removed the spaces (e.g. PREPROD_PP1), and while I wasn't able to find '_PP1', I was able to find 'PP1'

Really? Doesn't seem like that should be happening...

(She says after posting a multitude of messages to herself in this thread.)

Link to comment

Yes, I think that our handling of underscores may have broken in 4.2 (along with handling of Chinese and Japanese search indexing). We're working on straightening that out, thanks.

Link to comment

Jeff - I'm glad that everything is working fine for you.

Burgers - thanks for taking the time to test this and confirm that something isn't right.

Engberg - You seem to hint that the problem is caused by the use of the underscore as the first character in the word.

I can confirm that search will not find words that begin with an underscore.

Hopefully this will be fixed soon.

Best regards,

PCH

Link to comment
Yes, I think that our handling of underscores may have broken in 4.2 (along with handling of Chinese and Japanese search indexing). We're working on straightening that out, thanks.

Ahhh...now I feel better. :lol: Thanks!

Please go back & re-read the thread. Ok, admittedly it's a bit boogered b/c of my many posts. So instead, please read (or re-read) the post I linked to in my initial reply - that's the way the EN searches are supposed to perform. IOW, Jeff's version is not working correctly. This appears to be fixed in a later upgrade. Also please note the only problem I found was when using the underscore. If you upgrade EN to the latest version, it should work correctly. Searching on "bcd" will still not find a note with "abcde". It never has and probably never will. Beginning-of-word searches worked fine in all my tests (except when using the underscore, which was incorrect behaviour).

Link to comment
Hey,

Same problem, even lost all information :lol: from the start of 2011!

Please re-read the thread. I don't know what you mean about losing all info. This bug applies to search results & does not destroy data.

Big *****!

I'm sure you must be feeling a lot better. Thanks for sharing.

Link to comment

Burgers -

I guess this was the first time I ever saw the 'Begins with' restriction. Search wasn't working so I hunted around and saw the restriction. It seems strange to restrict searching this way, but as you say - I've been living with it for a while. My complaint is that I used to be able to search for 'PP1' or '_PP1' and it would find that note. After the upgrade to v4.2.0.3639 (121185) it will no longer find the note. No matter how I feel it should or should not work - it currently is not working according to any rule. It's just unfortunate that I stumbled onto this underscore bug. Sorry if I ruffled any feathers by suggesting that maybe it should work differently.

Thanks,

PCH

Link to comment
  • Level 5*

Search works this way as it makes for faster indexing. It's also more or less how search engines like Google work.

That you turned up a bug is unfortunate, but is actually a good thing.

~Jeff

Link to comment
Burgers -

I guess this was the first time I ever saw the 'Begins with' restriction.

Ok. Trying once more. There is no "begins with" restriction. Never has been, AFAIK. As noted earlier in this thread, if I search on "webca", any notes with the word "webcam" will show up in the results pane. Searching on "webca" (without quotes) is the same as using a wild card a la "webca*" (without quotes). If I want to find notes ONLY containing "webca" (and EXCLUDING notes with the word webcaM), I enclose it in quotes.

My complaint is that I used to be able to search for 'PP1' or '_PP1' and it would find that note. After the upgrade to v4.2.0.3639 (121185) it will no longer find the note. No matter how I feel it should or should not work - it currently is not working according to any rule.

I don't know the build numbers when indexing the underscore went amuck & where indexing underscores was corrected. But...based upon Dave's post (that I linked to above) which describes what constitutes a "word" and based upon your example note you posted & Jeff noted (that you have _PP1 in your note), then...when you get to the build where the indexing is corrected, you will be able to search on _pp1 and find your note. Have you checked for an update??? (Help/check for updates)

I'm still on an internal/beta release, so I think I get updates before people who are using the stable releases. But I know by build 121810, the indexing appears to be corrected & you should be good to go.

Link to comment

We updated the database engine (SQLite) in the latest version of the client (4.2). This updated the "full text search" part of the database in a way that caused some regressions in how we indexed text. This included incorrect parsing and indexing of Chinese and Japanese, and incorrect handling of the underscore character. Both of these problems have been fixed in the version that you can download from our web site, but we're working to provide an automatic update next week that will fix both this and the recent PDF searching problems. That version will automatically reindex all of your notes to make sure that broken notes are retroactively fixed.

We hope to push this automatic update (version 4.2.1) by mid-week.

Link to comment
. . . incorrect handling of the underscore character. Both of these problems have been fixed in the version that you can download from our web site . . .

Dave,

I've been grappling with a related underscore search bug. I just downloaded the 4.2.1 client and rebooted but the bug's still there.

I'm doing a search with this command:

tag:@*

The search results include notes tagged with things like @hello and @world. But they also falsely include notes with unrelated tags that start with an underscore, like _scooby or _doo.

On the web version it works fine.

Please check this out. Thanks!

Jim

Link to comment
That's interesting. Just checked it out on my computer. Same issue, tag:@* finds things also tagged with _*. I did a search with tag:@* -tag:_* and noticed that it was finding things tagged with [*. It's like every non-alpha character is being searched with @.

Link to comment
We updated the database engine (SQLite) in the latest version of the client (4.2). This updated the "full text search" part of the database in a way that caused some regressions in how we indexed text. This included incorrect parsing and indexing of Chinese and Japanese, and incorrect handling of the underscore character.

Have I got this straight? You changed database engine version - one of the fundamental parts of the client - and didn't find regressions in text searching - the key function of Evernote - before you shipped a new version of the client?

This sounds as if you're flying by the seat of your pants and have no proper testing or release strategy at all!

Link to comment
  • 6 months later...

I normally use ’ as an apostrophe in any text I've typed myself, but in text that I've copied, it will often be '.

The two characters are treated differently in searching. I saw in today’s Tech Blog that the filters Evernote uses “remove apostrophes and other intra-word punctuation.”

A lot of my notes are in French, so a word like hiver (or any word that starts with h or a vowel) could appear as l'hiver or l’hiver depending on the source. To find the first, I'd have to search for lhiver, whereas the second is indexed as l and hiver separately. Of course, if I can only remember that the word appeared in the document, but can't remember whether it said un hiver, l'hiver, or d'hiver, then it can take a few attempts to find what I'm looking for.

Does anybody have any clever way to deal with these quirks in the way different forms of the same punctuation marks are treated?

Also, the Tech Blog said that it removed “English “stop words” like “the” and “and””. Is there any way to get the same sort of effect for other languages? So, for instance, in my example, I could just type hiver, and not worry about whether it had l', l’, d' or d’ in front of it.

Link to comment
  • 2 months later...

Our searches support "words" and "phrases", and ignore punctuation, capitalization, etc.

So the '$' character is ignored and removed from the search.

This was going to be my question, whether or not capitalization of words and tags etc. affected later searches. You just answered it. :(

Link to comment
  • 4 months later...

It's not just punctuation. I tried other symbols like @ and % which I do not consider punctuation. So searches are based on text only? Is there a way to search just NA alone instead of having it find na in words?

If anyone in power is reading this, I would like to request special characters be added to the search capabilities. Thank you.

Link to comment
  • Level 5*

You should be able to find 'NA' alone; this worked in my tests. However, you will also match 'NAG', 'NATTY', etc.

Stuff like this has been requested before; Evernote staff is pretty well aware of it. You might be interested in the search grammar specification: http://dev.evernote.com/documentation/cloud/chapters/search_grammar.php; the section on Matching Literal Terms describes the punctuation thingie.

Link to comment

Consecutive letters, numbers and the underscore are considered "words". Everything else is a delimiter. There has been no indication this will change any time soon.

Consequently, if you type "_NA" and search for it, you will get back only _NA, without NAME, NAG, etc. The same goes for NA_.
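The word rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not Evernote's tokenizer: the ASCII-only character class and the function names are simplifying assumptions, and a real indexer also has to handle accented letters, Chinese, Japanese, and so on.

```python
import re

# Assumption: a "word" is a run of ASCII letters, digits, or underscores.
WORD = re.compile(r"[A-Za-z0-9_]+")

def tokenize(text):
    """Split text into lowercased words; everything else is a delimiter."""
    return [w.lower() for w in WORD.findall(text)]

def note_matches(text, term):
    """True if any word in the note starts with the search term."""
    term = term.lower()
    return any(word.startswith(term) for word in tokenize(text))

print(tokenize("PREPROD _PP1"))              # ['preprod', '_pp1']
print(tokenize("PREPROD_PP1"))               # ['preprod_pp1']
print(note_matches("PREPROD _PP1", "_pp1"))  # True
print(note_matches("PREPROD_PP1", "_pp1"))   # False
print(note_matches("PREPROD _PP1", "pp1"))   # False
print(note_matches("_NA NAME NAG", "_na"))   # True
```

Because the underscore is part of the word, "_pp1" and "pp1" are different words, and "PREPROD_PP1" without the space is a single word that starts with neither. That matches the behaviour reported earlier in the thread once the 4.2 regression was fixed.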

Link to comment

I call the underscore "The Evernoter's best friend" :) . This is because on one hand it is searchable, and on the other hand it is rarely used in daily writing. Thus it is great for instantly tagging any common word as a keyword for future searches, without the results being diluted by similar text. For me the humble underscore successfully replaces the whole EN tag system.

Good luck!

Link to comment
  • 1 month later...

Hi guys, I'm in the process of teaching myself some programming in C++, and over the past few months I accumulated a number of webclips on the topic, that I would now like to organize. Is there any way I can search for the string:

C++

I'm having some trouble with this... :wacko:

Thanks in advance for any help.

Link to comment

Welcome to the forums.

Unfortunately, no.

Evernote removes most punctuation, including specifically +, from the search string and the search index.

I ran into the same problem when I tried to search for "Google+".

Link to comment
  • Level 5*

Evernote ignores characters it regards as "punctuation" which unfortunately includes "+". If there are no other unique terms on which you can search, you may wind up looking through all clips since a certain date. Once you find your clips, you can certainly assign a "C++" tag and add it to your taskbar to make life easier in future.

Link to comment
  • 3 months later...
  • Level 5*
:Separate search syntax for subject & body (subject:x | body:x)

There already is a subject (intitle:) search and a general search

Although in some clients (Windows, at least), a straight search for 'x' will also search note titles and tags.

recognize hash in search - #tag and tag should NOT yield same results.

Punctuation is usually ignored in searches; '#' counts as punctuation. See the search grammar: http://dev.evernote.com/documentation/cloud/chapters/search_grammar.php. Relevant quote is:

Punctuation is used to split the input query and document into words, but it is ignored for text matching. The behavior of a quoted search should behave as if the following operations were performed on both the search query and the target note:

1. All XML markup is removed from the document, leaving only the visible text as a string

2. The string is converted to a list of words which are separated by one or more whitespace and/or punctuation characters.

3. The case of each word in the list is normalized

4. The list of words in the query must match with the same sequence of words in the converted Note

It would be nice if we could override that behavior, say by quoting the string containing punctuation, but it doesn't seem to work that way.
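The four quoted steps can be sketched in Python roughly as follows. This is an approximation for illustration, not the search grammar's reference implementation: the regular expression used to strip markup is crude, replacing tags with a space is my own choice, and the helper names are made up.

```python
import re

TAG = re.compile(r"<[^>]+>")   # crude markup stripper, for illustration only
WORD = re.compile(r"\w+")      # letters, digits, underscore

def to_words(text):
    """Steps 1-3: drop markup, split on whitespace/punctuation, lowercase."""
    visible = TAG.sub(" ", text)
    return [w.lower() for w in WORD.findall(visible)]

def phrase_matches(query, note):
    """Step 4: the query's words must appear in the note in the same order, adjacent."""
    q = to_words(query)
    n = to_words(note)
    if not q:
        return False  # nothing left after punctuation is removed
    return any(n[i:i + len(q)] == q for i in range(len(n) - len(q) + 1))

print(phrase_matches("#dog", "<div>my dog barks</div>"))  # True
print(phrase_matches("C++", "<p>Learning C++ today</p>"))  # True
print(phrase_matches("C++", "<p>Vitamin C tablets</p>"))   # True
print(phrase_matches("$$$", "<p>Sushi place $$$</p>"))     # False
```

Since the same steps are applied to both the query and the note, "#dog" and "dog" become the same query, "C++" becomes a search for the word "c", and "$$$" becomes an empty query. Quoting does not help because the punctuation is already gone before the sequences are compared.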

Link to comment
  • 1 month later...
  • Level 5*

I don't think so, at least I couldn't figure out how to do it in the Evernote for Windows client. Best guess is that " is regarded as punctuation, which is not considered for search purposes. In the Evernote developer documentation on the search grammar (http://dev.evernote.com/documentation/cloud/chapters/search_grammar.php), the relevant quote, I think, is "Punctuation is used to split the input query and document into words, but it is ignored for text matching".

Link to comment
  • 3 weeks later...

Hi,

i want to search over my complete evernote with a special syntax and want to find only entries that match EXACTLY my search word.

Example:

I made entries that begin with "K! -" and want to find them ALL and nothing else!

now when i search for K! - there are lots of entries displayed that do not match my search

is there a way to search for an EXACT string?

how do i have to set up my search?

any ideas?

Link to comment

It sounds like you could do what you want with a tag

I agree. If you have a particular string that you are using out of context, meaning that it doesn't occur naturally in the note but is put there simply to 'tag' the note, then use a tag!

The only time I would use a piece of text rather than a tag is for transient tag topics. For example, at work I do a variety of research projects, some that last only a few days while others carry on for years. Rather than creating a tag for every new project, I use random six-character strings from a password generator (thanks GrumpyMonkey for the idea, I believe!) that I place at the end of relevant notes. I use a saved search to find them. Once the project is complete, I copy the saved search text into a note of its own before deleting it. That way I can find stuff later and I'm not left with dozens of no longer relevant tags.
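If you don't have a password generator handy, a marker like that can be produced with a few lines of Python. This is just a sketch of the idea. The alphabet, the default length, and the function name are my own choices, and letters and digits are used because, per the word rule discussed earlier in the thread, they form a single searchable word.

```python
import secrets
import string

# Lowercase letters and digits only, so the marker is indexed as one word.
ALPHABET = string.ascii_lowercase + string.digits

def make_marker(length=6):
    """Return a random alphanumeric string to paste at the end of related notes."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

marker = make_marker()
print(marker)  # different on every run
```

Before adopting a marker it is worth searching your notes for it once, in case the random string happens to match the start of a real word.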

Link to comment
  • Level 5*
Thanks for this hint, it is the "-" because this negates the search string!

that's a big problem for me because now i have to look for another "special syntax" to find my entries ;-(

No, the '-' does not negate the search string in this case, at least on the Windows client. It appears to be ignored. That is, when presented with "K! -", the Windows client turns that into a search for "K".

Not sure what your overall scheme is, but dlu's suggested approach is where I would start looking...

Link to comment

Thanks all of you for your tips!

1.) yes the "K!" is stripped to "K" when searching, so it was a bad idea to take this syntax for something i want to search ;-)

2.) i don't want to use this string for searching. its only function is to let me see easily in the list of notes which of the entries is a "Korrespondenz" (German word ;-)

3.) as a solution i am looking for a simple way to change the syntax "K! - " to "K_". Is there any way of doing that automatically, or must i do this by hand?

Link to comment
  • Level 5*

Sorry, no easy way to do this that I know of. Obviously, you already have a problem locating these notes because of the search problem. But once you do that, you can use the Find and Replace operation (Ctrl+H) to do the replacement. Unfortunately, that only works in a single note at a time. On the other hand, if you do a replace operation of "K_" for "K!", that will be remembered the next time you do Find and Replace.

Another way, maybe simpler ultimately, to tackle the overall categorization problem would be to use a tag named "Korrespondenz".

Link to comment
  • 3 weeks later...

I used ">>" to flag items I want to follow up on later. I want to search for notes that contain the >> marker. But search seems to ignore the ">" character. In fact, it ignores every non-alpha character I tried.

The documentation doesn't discuss this.

Using Evernote on Mac (OS X Lion).

Link to comment
  • Level 5

I have a few notes that I want quick reference to, but I don't want to create a unique tag, so I added a memorable code to the end of each note. Each code begins with 7777 (no reason, just because I could remember it easily). I can search for 7777 and find them all or search for the specific codes below.

7777E = Evernote note with example of my most commonly used title formats

7777C = Cut the cord (cable) information

7777R = Stuff to read later

7777S = Note including specialized search engine links

7777W = World of Warcraft links to leveling

Link to comment
  • 3 months later...

If punctuation is included in a user's search string why not default to a character level search? A warning could be displayed along with a progress bar. Or, the search type could be left up to the user as a choice of "indexed or unindexed".

At the very minimum, a message could pop up to tell the user that the punctuation will be ignored.

I have about 7,800 notes (19MB of data). As lame and incomplete as Apple's Notes app may be I can still do a 1 second search for punctuation, such as ??? (three question marks in a row, indicating to me a question that requires an answer). The speed is the same on my iMac and my iPhone and iPad.

Evernote just sits there. Why?

Link to comment
  • 1 year later...

I have a system of note taking that lets me track issues of a particular type:

 

[x] = No Longer In Use

[?] = Open Issue

[+] = Add/Include

[-]  = Remove

[...] = Clarify

 

 and so on.

 

Sadly, I thought Evernote was the product I've been looking for, however, using it makes all of my existing notes almost entirely useless.

Is there some way to get some traction on adding this feature back into Evernote?

Link to comment

Archived

This topic is now archived and is closed to further replies.
