MobileRead Forums - View Single Post

bizzybody · 01-30-2011, 09:26 PM

All these characters are in Windows-1252, which is often loosely called "extended ASCII". Strictly speaking, the extended ASCII set with line-drawing characters is IBM's creation (code page 437), and it puts different characters in the 128-255 range than Windows-1252 does.
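Python's built-in `cp437` (IBM PC) and `cp1252` codecs make the difference easy to see. This small sketch decodes the same byte values under each; the helper name `compare` is just for illustration.

```python
def compare(byte: int) -> tuple:
    """Decode one byte value as IBM code page 437 and as Windows-1252."""
    raw = bytes([byte])
    return raw.decode("cp437"), raw.decode("cp1252")


# A few byte values from the 128-255 range where the two code pages disagree.
for value in (0x93, 0x96, 0xC4, 0xE9):
    ibm, windows = compare(value)
    print(f"0x{value:02X}  cp437: {ibm}  cp1252: {windows}")
```

Byte 0x96, for instance, is û in code page 437 but an en dash (–) in Windows-1252, and 0xC4 is a box-drawing line (─) in one and Ä in the other, so a file has to be read with the encoding it was written in.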

I had to leave the semicolons off the numeric character references (the &#NNN codes) because the forum software is not set up to leave *everything* between the code tags exactly as entered. With a semicolon after the number, the bleeping forum "helpfully" converts each code to its character.

Any e-book conversion software that can produce formats with readers on non-Unicode platforms should have an option to use Windows-1252 (extended ASCII) encoding, including converting all these numeric character references (with the semicolon, of course) to their Windows-1252 characters instead of to their Unicode equivalents.

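The result looks exactly the same, but the file can be significantly smaller: each of these characters takes a single byte in Windows-1252, versus two or three bytes in UTF-8, or six or seven characters written as a numeric reference.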
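To check the size claim on real text, you can simply measure it. This Python sketch does that for one made-up sample string; the sample and the helper name `as_charrefs` are only for illustration.

```python
# Hypothetical sample: curly quotes, an em dash and two accented letters.
SAMPLE = "\u201cCaf\u00e9\u201d \u2014 na\u00efve"


def as_charrefs(text: str) -> str:
    """Write every non-ASCII character as a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


sizes = {
    "charrefs": len(as_charrefs(SAMPLE).encode("ascii")),
    "utf-8": len(SAMPLE.encode("utf-8")),
    "cp1252": len(SAMPLE.encode("cp1252")),
}
print(sizes)  # {'charrefs': 42, 'utf-8': 22, 'cp1252': 14}
```

How much this saves on a whole book depends on how much of it is typographic punctuation and accented letters; plain ASCII text is the same size in all three forms.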

Code:

&#033  !
&#034  "
&#035  #
&#036  $
&#037  %
&#038  &
&#039  '
&#040  (
&#041  )
&#042  *
&#043  +
&#044  ,
&#045  -
&#046  .
&#047  /
&#048  0
&#049  1
&#050  2
&#051  3
&#052  4
&#053  5
&#054  6
&#055  7
&#056  8
&#057  9
&#058  :
&#059  ;
&#060  <
&#061  =
&#062  >
&#063  ?
&#064  @
&#065  A
&#066  B
&#067  C
&#068  D
&#069  E
&#070  F
&#071  G
&#072  H
&#073  I
&#074  J
&#075  K
&#076  L
&#077  M
&#078  N
&#079  O
&#080  P
&#081  Q
&#082  R
&#083  S
&#084  T
&#085  U
&#086  V
&#087  W
&#088  X
&#089  Y
&#090  Z
&#091  [
&#092  \
&#093  ]
&#094  ^
&#095  _
&#096  `
&#097  a
&#098  b
&#099  c
&#100  d
&#101  e
&#102  f
&#103  g
&#104  h
&#105  i
&#106  j
&#107  k
&#108  l
&#109  m
&#110  n
&#111  o
&#112  p
&#113  q
&#114  r
&#115  s
&#116  t
&#117  u
&#118  v
&#119  w
&#120  x
&#121  y
&#122  z
&#123  {
&#124  |
&#125  }
&#126  ~
&#128  €
&#130  ‚
&#131  ƒ
&#132  „
&#133  …
&#134  †
&#135  ‡
&#136  ˆ
&#137  ‰
&#138  Š
&#139  ‹
&#140  Œ
&#142  Ž
&#145  ‘
&#146  ’
&#147  “
&#148  ”
&#149  •
&#150  –
&#151  —
&#152  ˜
&#153  ™
&#154  š
&#155  ›
&#156  œ
&#158  ž
&#159  Ÿ
&#160  &nbsp
&#161  ¡
&#162  ¢
&#163  £
&#164  ¤
&#165  ¥
&#166  ¦
&#167  §
&#168  ¨
&#169  ©
&#170  ª
&#171  «
&#172  ¬
&#173  &shy
&#174  ®
&#175  ¯
&#176  °
&#177  ±
&#178  ²
&#179  ³
&#180  ´
&#181  µ
&#182  ¶
&#183  ·
&#184  ¸
&#185  ¹
&#186  º
&#187  »
&#188  ¼
&#189  ½
&#190  ¾
&#191  ¿
&#192  À
&#193  Á
&#194  Â
&#195  Ã
&#196  Ä
&#197  Å
&#198  Æ
&#199  Ç
&#200  È
&#201  É
&#202  Ê
&#203  Ë
&#204  Ì
&#205  Í
&#206  Î
&#207  Ï
&#208  Ð
&#209  Ñ
&#210  Ò
&#211  Ó
&#212  Ô
&#213  Õ
&#214  Ö
&#215  ×
&#216  Ø
&#217  Ù
&#218  Ú
&#219  Û
&#220  Ü
&#221  Ý
&#222  Þ
&#223  ß
&#224  à
&#225  á
&#226  â
&#227  ã
&#228  ä
&#229  å
&#230  æ
&#231  ç
&#232  è
&#233  é
&#234  ê
&#235  ë
&#236  ì
&#237  í
&#238  î
&#239  ï
&#240  ð
&#241  ñ
&#242  ò
&#243  ó
&#244  ô
&#245  õ
&#246  ö
&#247  ÷
&#248  ø
&#249  ù
&#250  ú
&#251  û
&#252  ü
&#253  ý
&#254  þ
&#255  ÿ
&#338  Œ
&#339  œ
&#352  Š
&#353  š
&#376  Ÿ
&#402  ƒ
&#8211 –
&#8212 —
&#8216 ‘
&#8217 ’
&#8218 ‚
&#8220 “
&#8221 ”
&#8222 „
&#8224 †
&#8225 ‡
&#8226 •
&#8230 …
&#8240 ‰
&#8364 €
&#8482 ™
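Here's a rough Python sketch of the conversion described above, asking the codec for each character instead of relying on a hand-typed table. It's my own illustration, not code from any e-book converter, and the names `de_unicode` and `charref_to_char` are made up. It assumes the references in the real source keep their semicolons, reads 128-159 the way web browsers do (as Windows-1252 byte values, so &#150 and &#8211 both end up as byte 0x96, the en dash), leaves &, < and > escaped so HTML markup isn't broken, and writes anything Windows-1252 can't hold as a numeric reference.

```python
import re
from typing import Optional

# Decimal numeric character references such as "&#8220;" (semicolon included).
CHARREF = re.compile(r"&#([0-9]+);")

# Left escaped so that converting an HTML source cannot break its markup.
KEEP_ESCAPED = {"&", "<", ">"}


def charref_to_char(code: int) -> Optional[str]:
    """Return the character a decimal reference stands for, or None."""
    if 128 <= code <= 159:
        # Browsers treat these as Windows-1252 byte values (150 -> en dash).
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:  # 129, 141, 143, 144, 157 are unassigned
            return None
    if code > 0x10FFFF:  # beyond the Unicode range
        return None
    return chr(code)


def de_unicode(text: str) -> bytes:
    """Replace references with real characters and encode as Windows-1252."""

    def replace(match: re.Match) -> str:
        ch = charref_to_char(int(match.group(1)))
        if ch is None or ch in KEEP_ESCAPED:
            return match.group(0)
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            return match.group(0)  # no Windows-1252 equivalent: keep reference
        return ch

    # Literal characters outside Windows-1252 become "&#NNNN;" references.
    return CHARREF.sub(replace, text).encode("cp1252", errors="xmlcharrefreplace")
```

For example, `de_unicode("&#8220;Hi&#8221;")` returns the four bytes `b'\x93Hi\x94'`. Because it asks the codec, it also converts Windows-1252 characters the list above leaves out, such as &#8249 (‹). The output still has to be labeled as Windows-1252 (for HTML, with `<meta charset="windows-1252">`), or the reader may misinterpret it.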

Like I said earlier, the *best* thing would be for e-book reader software to include its own Unicode support on platforms without native support, but unless someone else does it, that will never ever happen for Mobipocket, since Amazon bought it for use on the Kindle. Failing that, the only option when converting to formats for any Palm reader app or other non-Unicode platform is to pre-convert the source to remove the Unicode, unless you like common punctuation replaced by spaces, blank boxes or 'weird' characters, or simply dropped altogether.

If anyone here knows the C# programming language, I posted a program with source code on the forum. It's a text string replacer that works as long as the list of strings to swap is kept short enough. It still has bugs: it can't handle the full list of character codes above, and it corrupts the list by replacing much of it with the unknown-character box. Once it's fixed to handle long enough lists, it would be very useful for quickly replacing any strings of text with any other strings of text.
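One way a replacer like that can cope with long lists is to make a single pass over the text: build one regular expression matching any of the search strings, longest first, and look each match up in a dictionary. This is a Python sketch of that general technique, not a fix to the C# program itself, and `multi_replace` is a made-up name.

```python
import re
from typing import Dict


def multi_replace(text: str, replacements: Dict[str, str]) -> str:
    """Replace every key of `replacements` in `text` in one left-to-right pass."""
    if not replacements:
        return text
    # Python's regex alternation takes the first alternative that matches,
    # so sort longest first: "&#8220;" then wins over a shorter prefix key.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


table = {"&#8220;": "\u201c", "&#8221;": "\u201d", "&#8230;": "\u2026"}
print(multi_replace("&#8220;Wait&#8230;&#8221;", table))  # “Wait…”
```

Because the text is scanned only once, replaced output is never rescanned, so a list like {"a": "b", "b": "c"} turns "ab" into "bc" rather than "cc", and the cost stays roughly proportional to the text length however long the list gets.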
