MobileRead Forums

davidfor · 12-28-2020, 10:41 PM

Quote:

Originally Posted by jhowell

I don't know if this is relevant, but I happened to notice that the My Clippings.txt file in the post above has a Unicode BOM character at the beginning of each book title. Those might not be visible depending on how you view those files.

Of course. I looked for blanks and things, but I didn't think of that.

It looks like the Kindle is adding the BOM character at the beginning of each annotation block. They are probably building the complete string for each block and then dumping it into the file without worrying about this. It might be deliberate as a way of separating the annotations, but I don't think so. And my brain hurts whenever I play with this stuff.
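For anyone who wants to check their own file, here is a rough sketch of how the BOMs can be made visible. The sample text is made up, and the "==========" separator line between blocks is an assumption for illustration. None of it is taken from the attached file or from the plugin code.

```python
BOM = "\ufeff"            # U+FEFF, the Unicode byte order mark
SEPARATOR = "=========="  # assumed separator line between annotation blocks

# Hypothetical sample in the general shape of a My Clippings.txt file,
# with a BOM in front of each book title.
sample = (
    "\ufeffFirst Book (Author One)\n"
    "- Your Highlight on page 3\n"
    "\n"
    "Some highlighted text\n"
    "==========\n"
    "\ufeffSecond Book (Author Two)\n"
    "- Your Note on page 7\n"
    "\n"
    "A note\n"
    "==========\n"
)


def split_blocks(text):
    """Split the text into annotation blocks, dropping empty chunks."""
    chunks = text.split(SEPARATOR)
    # Trim only leading newlines so that a leading BOM stays in place.
    return [chunk.lstrip("\r\n") for chunk in chunks if chunk.strip()]


blocks = split_blocks(sample)
bom_flags = [block.startswith(BOM) for block in blocks]

for block, has_bom in zip(blocks, bom_flags):
    # repr() shows the BOM as \ufeff instead of printing it invisibly.
    print(has_bom, repr(block.splitlines()[0]))
```

Printing with repr() is the important part, since most editors and terminals will not show the character at all.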

In any case, the attached beta does "record.encode().decode('utf-8-sig')", where record is the annotation block. Encoding to UTF-8 and then decoding as 'utf-8-sig' drops a BOM at the start of the string. I am sure there is a better way to handle this, but it seems to work in the test harness against @UMNiK's file. That harness only parses the file and doesn't do the matching to the actual book, but it produces a title without the BOM character, so the matching should work.

The attached beta has this change, so we will see.
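As a minimal sketch of what that round trip does to a single block: the record text and the function name below are made up for illustration, and this is not the plugin's actual code.

```python
BOM = "\ufeff"  # U+FEFF, the Unicode byte order mark


def strip_leading_bom(record):
    """Encode to UTF-8 bytes, then decode with 'utf-8-sig'.

    The 'utf-8-sig' codec skips a BOM at the very start of the data,
    so a leading U+FEFF in the string is dropped.
    """
    return record.encode().decode("utf-8-sig")


# Hypothetical annotation block with a BOM in front of the title.
record = "\ufeffSome Book Title (Some Author)\n- Your Highlight on page 1\n"

cleaned = strip_leading_bom(record)
title = cleaned.splitlines()[0]
print(repr(title))

# Only a BOM at the very start is removed; one in the middle is kept.
middle = strip_leading_bom("abc\ufeffdef")

# A possible alternative that avoids the encode/decode round trip.
# Unlike the round trip, lstrip removes every leading BOM, not just one.
alternative = record.lstrip(BOM)
```

Either form only cleans the start of the block, which matches where the Kindle appears to be putting the character.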