#1 |
Guru
Posts: 955
Karma: 149907
Join Date: Jul 2013
Location: Rotterdam
Device: HiSenseA5ProCC, Cracked OnyxNotePro, Note5, Kobo Glo, Aura
|
Character set used for exported notes
Does anyone know or know how to determine the character set used by Onyx Boox for the exported notes?
I've written a Perl script to analyse the exported notes and convert them to my layout in LaTeX. However, I'm having trouble identifying the original character set. If I open the file in an editor such as Sublime Text, it is shown as a hexadecimal dump.
Code:
424f 4f58 2052 6561 6469 6e67 204e 6f74 6573 c2a0 7cc2 a03c 3c47 4120 3236 2e20 2d20 4d65 7461 7068 7973 6973 6368 6520 416e 6661 6e67 7367 72c3 bc6e 6465 2064 6572 204c 6f67 696b 2069 6d20 4175 7367 616e 6720 766f 6e20 4c65 6962 6e69 7a20 2853 756d 6d65 7220 7365 6d65 7374 6572 2031 3932 3829 2c20 6564 2e20 4b2e 4865 6c64 2c20 3139 3738 2c20 326e 6420 6564 6e20 3139 3930 2c20 5649 2c20 3239 3270 3e3e 0a4e 6f74 6550 726f 0a0a 5469 6d65 efbc 9a32 3032 302d 3038 2d32 3720 3233 3a32 340a e380 904f 7269 6769 6e61 6c20 5465 7874 e380 9167 6c69 6564 6572 6e64 6520 4175 66ef bfbe 6465 636b 756e 6700 0ae3 8090 416e 6e6f 7461 7469 6f6e 73e3
Code:
BOOX Reading Notes*|*<<GA 26. - Metaphysische Anfangsgründe der Logik im Ausgang von Leibniz (Summer semester 1928), ed. K.Held, 1978, 2nd edn 1990, VI, 292p>>
NotePro
Time:2020-08-27 23:24
【Original Text】gliedernde Aufdeckung
Last edited by Markismus; 10-21-2021 at 11:38 AM.
#2 | |
Onyx-maniac
Posts: 3,918
Karma: 17236157
Join Date: Feb 2012
Device: Nook NST, Glow2, 3, 4, '21, Kobo Aura2, Poke3, Poke5, Go6
|
Quote:
U+00A0 No-Break Space
U+3010 Left Black Lenticular Bracket
U+3011 Right Black Lenticular Bracket
Edit: Don't forget U+FF1A Fullwidth Colon
Last edited by Renate; 10-21-2021 at 03:48 PM.
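One way to check which non-ASCII characters an export contains is to decode the bytes as UTF-8 and print the code point and Unicode name of everything outside ASCII. Below is a minimal sketch in Python (not the Perl script mentioned in the thread), fed with short excerpts of the hex dump from the first post. The names `SAMPLE` and `non_ascii_code_points` are made up for this illustration.

```python
import unicodedata

# Excerpts of the bytes from the hex dump in the first post.
SAMPLE = bytes.fromhex(
    "4e6f746573c2a07cc2a03c3c"                # Notes<NBSP>|<NBSP><<
    "54696d65efbc9a32303230"                  # Time<fullwidth colon>2020
    "e380904f726967696e616c2054657874e38091"  # <bracket>Original Text<bracket>
)

def non_ascii_code_points(data: bytes) -> list:
    """Decode data as UTF-8 and return (U+XXXX, name) pairs for each
    distinct non-ASCII character, in order of first appearance."""
    text = data.decode("utf-8")
    seen = []
    for ch in text:
        if ord(ch) > 0x7F:
            entry = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            if entry not in seen:
                seen.append(entry)
    return seen

for code, name in non_ascii_code_points(SAMPLE):
    print(code, name)
```

If the bytes decode without an exception and the names look plausible for the text, that is a fair indication (not proof) that the file is UTF-8.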
#3 |
Guru
|
At least some of the characters aren't:
Code:
UTF-8 "\xEF\xBF\xBE" does not map to Unicode at ....
It appears out of nowhere in the middle of a few words, which is rather odd. E.g. in the dump in the first post it sits between Auf and deckung.
EDIT: Just checked, and it appears in two different books, within the citations. One book is scanned and OCR'ed, the other is a PDF containing only text. So the source seems to be Onyx Boox's export code.
Last edited by Markismus; 10-21-2021 at 11:42 AM.
#4 |
Onyx-maniac
|
|
#5 |
Guru
|
It's indeed really close to the UTF-8 byte order mark. Wikipedia's BOM article says that one is EF BB BF.
Still, all other errors are gone now that I open and close all files with explicitly specified encodings in both Perl and LaTeX. So unless I run into other problems, I'll just have to filter these code points out.
Last edited by Markismus; 10-21-2021 at 11:43 AM.
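The filtering mentioned above could look roughly like this. It is a Python sketch rather than the poster's Perl, and it assumes the stray bytes EF BF BE (which decode to U+FFFE) can simply be deleted so that the two word halves join again. The helper name `strip_fffe` is hypothetical.

```python
def strip_fffe(data: bytes) -> str:
    """Decode UTF-8 bytes and drop every U+FFFE (bytes EF BF BE)."""
    text = data.decode("utf-8")
    return text.replace("\ufffe", "")

# "Auf" + EF BF BE + "deckung", taken from the hex dump in the first post.
raw = bytes.fromhex("417566efbfbe6465636b756e67")
print(strip_fffe(raw))
```

Python's UTF-8 decoder accepts U+FFFE, so the removal can be done on the decoded string. A stricter decoder, such as the one that produced the Perl error in this thread, may need the three bytes removed before decoding.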
#6 |
Onyx-maniac
|
Now you're getting me confused.
Normal UTF-8 BOM: 0xEF, 0xBB, 0xBF -> U+FEFF (a valid Unicode character)
0xEF, 0xBF, 0xBE -> 0xFFFE (not a Unicode anything, a byte-reversed UTF-16 BOM).
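The two byte sequences can be checked directly. This is a small Python sketch, with `code_point` as an illustrative helper name, that decodes each three-byte sequence as UTF-8 and then shows how the UTF-16 BOM bytes come out when read in the wrong byte order.

```python
def code_point(seq: bytes) -> str:
    """Decode a UTF-8 sequence holding one character; return it as U+XXXX."""
    return f"U+{ord(seq.decode('utf-8')):04X}"

print(code_point(bytes([0xEF, 0xBB, 0xBF])))  # the UTF-8 BOM
print(code_point(bytes([0xEF, 0xBF, 0xBE])))  # the sequence found in the export

# U+FEFF written as little-endian UTF-16 is the byte pair FF FE.
le_bytes = "\ufeff".encode("utf-16-le")
print(le_bytes.hex())

# Reading that same pair as big-endian yields the code point U+FFFE.
swapped = le_bytes.decode("utf-16-be")
print(f"U+{ord(swapped):04X}")
```

This fits the reading that the export contains a byte-swapped BOM value re-encoded as UTF-8, although the thread does not establish how it got there.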
#7 |
Guru
|
I'd rather have clear insight into things. However, given the current topic, it's also nice to be confused together.
Anyway, this is what Perl generates while writing to a UTF-8 encoded text file:
Code:
... Wir versuchen eine philosophische Logik und damit eine Ein\xEF\xBF\xBEführung in das Philosophieren ...