note = noteStore.getNote(DEVELOPER_TOKEN, n_guid,
    withContent=True,
    withResourcesData=False,
    withResourcesRecognition=False,
    withResourcesAlternateData=False
)
print type(note.content) # => str
unicode( note.content, "ascii" ) # => UnicodeDecodeError
unicode( note.content, "utf-8" ) # works

This is true of other fields on Notes as well, such as the title field. According to the source, these fields are all of type thrift.Thrift.TType.STRING. Looking at the Thrift source, it appears that this type is meant to represent ASCII-encoded strings; there are separate types for Unicode strings (UTF7, UTF8, UTF16).
This is bad for a few reasons. First of all, some of the data returned by the API is inherently Unicode. For example, note content is written in ENML, which explicitly declares its encoding as UTF-8:
<?xml version="1.0" encoding="UTF-8"?>
Returning everything as ASCII leads to numerous bugs when using most libraries (for example, Jinja2). You get errors like these:
In [45]: unicode( note.content, "ascii" )
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/Users/grobinso/Dropbox/entervals/<ipython-input-45-fd7ebf585aec> in <module>()
----> 1 unicode( note.content, "ascii" )

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1589: ordinal not in range(128)
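The same failure can be reproduced without the Evernote API. This is a minimal sketch, assuming the offending byte belongs to a UTF-8 encoded non-breaking space (U+00A0), which is one character whose encoding starts with 0xc2. The sample string is made up, and it uses bytes.decode rather than unicode() so that it runs on both Python 2 and Python 3:

```python
# Hypothetical stand-in for note.content: UTF-8 bytes as returned by the client.
raw = u"<en-note>5\u00a0km</en-note>".encode("utf-8")

# U+00A0 is encoded as the two bytes 0xc2 0xa0, neither of which is valid ASCII,
# so decoding as ASCII fails on the first of them.
try:
    raw.decode("ascii")
    ascii_ok = True
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_ok = False
    ascii_error = str(exc)

# Decoding with the encoding that ENML declares succeeds.
text = raw.decode("utf-8")
```

Any non-ASCII character in a note produces the same kind of error; only the byte value and position in the message change.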
Most annoying is the solution - I have to explicitly convert all of the string fields returned from Evernote API calls to Unicode in order to work with them. Imagine this in every piece of code that makes an API call:
# debug unicode
try:
    unicode( note.content, "ascii" )
except UnicodeError:
    note.content = unicode( note.content, "utf-8" )
else:
    # value was valid ASCII data
    pass
# or, just blanket convert everything to UTF-8
note.content = unicode( note.content, "utf-8" )
# ... likewise, for every field I work with, depending on what I'm going to do with it

Is there a reason why the API does not use Unicode strings? Would changing this be as easy as substituting thrift.Thrift.TType.UTF8 for thrift.Thrift.TType.STRING?
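In the meantime, the blanket conversion above could be wrapped in a small helper so it does not have to be repeated at every call site. This is only a sketch: decode_fields and FakeNote are made-up names, not part of the Evernote SDK, and it assumes the struct exposes its fields as plain attributes. It uses bytes (an alias of str on Python 2) so the same code also runs on Python 3:

```python
def decode_fields(obj, names, encoding="utf-8"):
    """Decode the named byte-string attributes of obj in place and return obj.

    Hypothetical helper, not part of the Evernote SDK. Attributes that are
    missing, None, or already decoded are left untouched.
    """
    for name in names:
        value = getattr(obj, name, None)
        # bytes is str on Python 2, i.e. the type the API hands back.
        if isinstance(value, bytes):
            setattr(obj, name, value.decode(encoding))
    return obj


class FakeNote(object):
    """Stand-in for a Note struct, used only to exercise the helper."""
    def __init__(self):
        self.title = u"Caf\u00e9 notes".encode("utf-8")
        self.content = u"<en-note>5\u00a0km</en-note>".encode("utf-8")
        self.updateSequenceNum = 7


note = decode_fields(FakeNote(), ["title", "content"])
```

The field names are passed explicitly rather than decoding every string attribute, because fields that hold raw binary data (a content hash, for example) are not valid UTF-8 text and should stay as bytes.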












