Text importing: "beta" character vs. "b"


I have no idea whose bug this is (Chrome browser, Mac OS, Evernote, Evernote Web Clipper, etc.).

 

I'm trying to develop a workflow that will allow me to bring pdf articles from the medical journals to which I subscribe into Evernote, to make searching them easier.

 

One of them, the Journal of the American Society of Nephrology, by default displays article pdfs (selected by clicking the article's link in the table of contents on the web) in a pane of the website window. It's possible to click again to view the article as a pdf that fills the entire webpage.

 

In the Yosemite Safari browser, activating the EN Web Clipper when the article is displayed in a pane of the browser window brings in the article's title as the note title, but none of the article itself is transferred to Evernote. Here's the article title:

 

β-Blockers in Dialysis Patients: A Nephrocardiology Perspective

 

Activating the EN Web Clipper in Safari with the article displayed as a full-page pdf (instructing the clipper to import the pdf) brings in a note containing an iconic placeholder for the pdf:

 

[Image: placeholder icon for the pdf in the Evernote note]

 

Unfortunately, the pdf file itself is damaged and cannot be opened, either within the Evernote app on the Mac or from the Finder if the file is saved from Evernote to the Finder.

 

Yesterday, people suggested that the way to resolve this was to use a different browser (Google Chrome), noting that the EN Web Clipper for Chrome (a later release) worked better than the one for Safari. So I tried that. The results were identical when I tried to import the pdf from the pane in Chrome (a note in Evernote titled exactly as the article was, but missing the pdf).

 

Finally, I tried using the EN Web Clipper on the article pdf displayed as a non-paned full page.

 

This time, the resulting "Untitled note" in Evernote contained the pdf. "Eureka!" I thought. Just use the Chrome browser.

 

Except…

 

Not wanting to fill up my medical literature notebook with hundreds of "Untitled note" notes, I'd decided I would simply replace the default note title with the pdf's title, and here, finally, is my bug report. If, in Evernote, I copy the article title and use it for the note title, it displays not with the "beta" character, but as

 

b-Blockers in Dialysis Patients: A Nephrocardiology Perspective

 

Is this an Adobe bug (probably not)? A Chrome bug? A Mac OS bug? An Evernote bug? I have no idea, but I am concerned about the implications it will have for searching with the keyboard instead of my eyes.

Link to comment

(answering my own question once again)

 

It's not an Evernote bug. If I download the pdf directly to my computer, copy the title, and paste it into any of more than a half-dozen Mac OS programs that deal with rich text, in every one of them the "beta" character morphs into the lower case "b" character.

 

My guess now is that it's a problem with the embedded fonts in the original pdf. I've sent a query to the Journal of the American Society of Nephrology webmaster. If what they respond is of general interest, I'll post a summary here.

 

Jim Robertson

Link to comment

I've spent the morning further investigating this. I thought I might have a solution to titling my notes with the "beta" character: just choose the beta character using the Apple Keyboard Viewer. The only problem was that what looks like the Greek "beta" character in the Mac Keyboard Viewer is in fact the German "sharp s" (ß), as in Straße (street), which can also be spelled "Strasse". So when I enter that character in the search field of a pdf, I find all instances of "ss". Not very useful.

 

And, if I use a Greek keyboard layout on my Mac to enter the Greek lower case b (or beta) character into the "find" field of a pdf or Evernote containing a pdf, I'm told there are no occurrences of the character in the document.
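For anyone who wants to check which character they are actually typing, here is a minimal Python sketch. It only reports what the Unicode database says about the three look-alike characters; the idea that a pdf viewer's search folds "ß" to "ss" the same way `casefold()` does is my assumption, not something I have confirmed for any particular app.

```python
import unicodedata

# The three characters that get confused in this thread.
CANDIDATES = ["b", "ß", "β"]

def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in CANDIDATES:
    # casefold() is Python's caseless-matching form; "ß" becomes "ss".
    print(ch, describe(ch), "casefold ->", ch.casefold())
```

The sharp s and the Greek beta are separate characters with separate code points, which would explain why a search for one never finds the other.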

 

So, there seems to be a fundamental difference between how the Mac and Win OS deal with at least some foreign language characters. My knowledge of fonts is exhausted by that speculation. Anyone have any explanation? Thanks so much,

Jim Robertson

Link to comment
  • Level 5*

I suspect that the Beta symbol is not being indexed by Evernote as you would expect.

Your simplest solution is most likely to just rename the PDF title (and Note Title) to use the normal text "Beta".
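If there are many titles to rename, the substitution could be scripted. This is only a sketch of the renaming idea above; the function name `searchable_title` is made up for illustration, and it works on plain title strings rather than on Evernote itself.

```python
def searchable_title(title):
    """Spell out the Greek beta so the title can be found with a plain keyboard search."""
    return title.replace("β", "Beta")

print(searchable_title("β-Blockers in Dialysis Patients: A Nephrocardiology Perspective"))
```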

Link to comment

 

I've made quite a bit of progress in understanding what's going on here. The most likely explanation is that the creator of the pdf used a special glyph of the lower case "b" for all instances of "β". Trying to sort this out, I was impeded but then assisted by my misunderstanding of what the <option-s> keystroke character from the Mac's US keyboard, "ß", really represents.

 

On the ASN Website, it's possible for journal subscribers to view full-text articles as html as well as in pdf format. Remarkably, in the html version at the website the "β" character is not a "styled" lower case "b", but a different character, and either there or in a note brought into Evernote by the EN Web Clipper, it retains its identity as a separate and searchable character.

 

So in the end, I guess this is actually no-one's bug, but a pdf design decision. Now the major question is why the pdf creators chose to do what they did. I don't have much hope that I'll be able to discover that.

Link to comment

 


 

 

I think I've been helped (on a few Mac listservs) to an explanation that makes sense and explains seemingly irreconcilable behavior of the "β" character in various circumstances.

 

It turns out that the current Mac OS deprecates the "Symbol" font in favor of encoding characters in Unicode, usually stored as what's called "UTF-8." There's an explanation of this on Wikipedia that makes my head spin. It appears that Unicode gives the "β" character its own code identifier rather than making it a styled "b". The Symbol font on the Mac is available only to what are called "Carbon" apps (those built on the older programming interface Apple provided to ease the move from the classic Mac OS to Mac OS X). There are still some "Carbon" apps available, e.g., MS Office Mac 2008, and Symbol and other .ttf fonts are available to those apps, but "Cocoa" apps require Unicode character sets.
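To make the difference concrete, here is a small Python sketch of the two approaches. It assumes, without any confirmation from the journal, that the pdf stores the title's first character as a plain "b" set in the Symbol font. The mapping table is only a four-letter excerpt of the old Symbol layout, and the `recover_text` helper and the "Times" font name are made up for illustration.

```python
# Excerpt of the legacy Symbol font layout: the stored character is a
# Latin letter, but the glyph drawn on screen is Greek.
SYMBOL_TO_UNICODE = {"a": "α", "b": "β", "g": "γ", "d": "δ"}

def recover_text(runs):
    """Join (text, font_name) runs, translating Symbol-font runs to Unicode."""
    parts = []
    for text, font_name in runs:
        if font_name == "Symbol":
            text = "".join(SYMBOL_TO_UNICODE.get(ch, ch) for ch in text)
        parts.append(text)
    return "".join(parts)

# Hypothetical runs for the article title: one letter in Symbol, the rest in a text font.
title_runs = [("b", "Symbol"), ("-Blockers in Dialysis Patients", "Times")]

print(recover_text(title_runs))   # the title with a real Greek beta
print("β".encode("utf-8"))        # the bytes UTF-8 uses for that beta
```

Copying text out of the pdf drops the font information, so only the stored "b" survives, which matches what I saw when pasting the title.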

 

Jim Robertson

Link to comment
