MobileRead Forums - View Single Post

Markismus · 10-21-2021, 06:36 AM

Does anyone know, or know how to determine, the character set that Onyx Boox uses for its exported notes?

I've written a Perl script to analyse the exported notes and convert them to my layout in LaTeX. However, I'm having trouble identifying the original character set.
If I open the export in an editor such as Sublime Text, it is displayed as a hexadecimal file.

Code:

424f 4f58 2052 6561 6469 6e67 204e 6f74
6573 c2a0 7cc2 a03c 3c47 4120 3236 2e20
2d20 4d65 7461 7068 7973 6973 6368 6520
416e 6661 6e67 7367 72c3 bc6e 6465 2064
6572 204c 6f67 696b 2069 6d20 4175 7367
616e 6720 766f 6e20 4c65 6962 6e69 7a20
2853 756d 6d65 7220 7365 6d65 7374 6572
2031 3932 3829 2c20 6564 2e20 4b2e 4865
6c64 2c20 3139 3738 2c20 326e 6420 6564
6e20 3139 3930 2c20 5649 2c20 3239 3270
3e3e 0a4e 6f74 6550 726f 0a0a 5469 6d65
efbc 9a32 3032 302d 3038 2d32 3720 3233
3a32 340a e380 904f 7269 6769 6e61 6c20
5465 7874 e380 9167 6c69 6564 6572 6e64
6520 4175 66ef bfbe 6465 636b 756e 6700
0ae3 8090 416e 6e6f 7461 7469 6f6e 73e3

If I reopen it with UTF-8 encoding, it is almost correct, but not entirely. Some encoding problems remain: NUL bytes (hexadecimal 0x00), whitespace characters that are not ordinary spaces, and odd character choices for brackets:

Code:

BOOX Reading Notes | <<GA 26. - Metaphysische Anfangsgründe der Logik im Ausgang von Leibniz (Summer semester 1928), ed. K.Held, 1978, 2nd edn 1990, VI, 292p>>
NotePro

Time：2020-08-27 23:24
【Original Text】gliedernde Aufdeckung
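
One way to pin down exactly which characters cause the trouble is to decode the bytes as UTF-8 and list every non-ASCII or control character with its Unicode name. The following is a minimal sketch in Python rather than Perl. The helper names `decode_dump` and `suspicious_chars` are made up for illustration, and the two hex strings are copied from the dump above (the first two rows, and the part from "Time" up to the 0x00 byte).

```python
import unicodedata

# First two rows of the hex dump in the post.
HEADER_HEX = (
    "424f 4f58 2052 6561 6469 6e67 204e 6f74 "
    "6573 c2a0 7cc2 a03c 3c47 4120 3236 2e20"
)

# From "Time" up to and including the 0x00 byte.
NOTE_HEX = (
    "5469 6d65 efbc 9a32 3032 302d 3038 2d32 3720 3233 "
    "3a32 340a e380 904f 7269 6769 6e61 6c20 "
    "5465 7874 e380 9167 6c69 6564 6572 6e64 "
    "6520 4175 66ef bfbe 6465 636b 756e 6700"
)


def decode_dump(hex_text):
    """Turn an editor-style hex dump into text, assuming the bytes are UTF-8."""
    return bytes.fromhex(hex_text).decode("utf-8")


def suspicious_chars(text):
    """List (offset, code point, name) for control and non-ASCII characters.

    Newlines are skipped because they are expected in the export.
    """
    found = []
    for offset, ch in enumerate(text):
        if ch == "\n":
            continue
        if ord(ch) < 0x20 or ord(ch) > 0x7E:
            # Control characters and noncharacters have no Unicode name.
            name = unicodedata.name(ch, "<no name>")
            found.append((offset, f"U+{ord(ch):04X}", name))
    return found


if __name__ == "__main__":
    for chunk in (HEADER_HEX, NOTE_HEX):
        for offset, code_point, name in suspicious_chars(decode_dump(chunk)):
            print(offset, code_point, name)
```

On these bytes the sketch reports U+00A0 NO-BREAK SPACE on both sides of the '|', U+FF1A FULLWIDTH COLON after 'Time', U+3010 and U+3011 (LEFT and RIGHT BLACK LENTICULAR BRACKET) around 'Original Text', U+FFFE (a noncharacter) inside 'Aufdeckung', and U+0000 at the end. Both chunks decode as UTF-8 without errors, so the oddities look like unusual characters inside a UTF-8 file rather than a different character set.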

For example, the spaces within 'BOOX Reading Notes' are normal spaces, but the ones around the '|' between 'Notes' and '<<' are different. In Perl you can simply write '\s' to match all whitespace and get on with it, but other characters cause more trouble down the line.
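
A possible way to deal with the remaining characters in one pass is to normalise the decoded text before parsing it. The sketch below is in Python and rests on assumptions: the `REPLACEMENTS` and `DROP` tables and the function name `clean_note` are illustrative choices, not anything Onyx documents. It uses Unicode NFKC normalisation, which folds the no-break space into a plain space and the fullwidth colon into ':', then maps the lenticular brackets to square brackets and drops NUL and U+FFFE.

```python
import unicodedata

# Assumed substitutions, chosen for illustration; adjust for the LaTeX layout.
REPLACEMENTS = {
    "\u3010": "[",  # LEFT BLACK LENTICULAR BRACKET
    "\u3011": "]",  # RIGHT BLACK LENTICULAR BRACKET
}

# Characters that carry no visible content in the sample export.
DROP = {"\u0000", "\ufffe"}  # NUL and the U+FFFE noncharacter


def clean_note(text):
    """Normalise an already-decoded note to plainer characters."""
    # NFKC replaces compatibility characters with their plain equivalents,
    # e.g. U+00A0 -> U+0020 and U+FF1A -> ':'. Letters such as 'ü' are kept.
    text = unicodedata.normalize("NFKC", text)
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text if ch not in DROP)


if __name__ == "__main__":
    raw = "Time\uff1a2020-08-27 23:24\n\u3010Original Text\u3011gliedernde Auf\ufffedeckung\u0000"
    print(clean_note(raw))
```

NFKC is a blunt tool: it also rewrites other compatibility characters such as ligatures and superscript digits, so if the notes quote text where those matter, an explicit replacement table for just the characters found may be the safer choice.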

Character set used for exported notes · #1
Last edited by Markismus; 10-21-2021 at 12:38 PM.