Quote:
Originally Posted by JSWolf
Do you know how much larger the eBook would actually be with &shy; added to every place possible that a hyphen could be?
Yes, of course I know.
The HTML part is roughly twice the original size. That estimate assumes a minimum prefix/suffix of 2 characters (which gives the highest amount of hyphenation), an average of 2.5 characters per syllable, and 5 characters to store each "&shy;" entity (which actually compresses easily).
Spoiler:
Just put the HTML files in a directory and run this script there to get an estimate:
Code:
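#!/bin/bash
# Estimate how many bytes soft hyphens (&shy;) would add to the HTML files
# in the current directory.
shopt -s nullglob              # drop a pattern that matches no files
FILES=(*.html *.xhtml)
[ ${#FILES[@]} -gt 0 ] || { echo "No HTML files found" >&2; exit 1; }

for F in "${FILES[@]}"
do
    # Render the file as plain text, turn every letter into "x",
    # then put each run of x (one word) on its own line.
    lynx -assume_charset=UTF-8 -dump "$F" |
        tr '[:alpha:]' '[x*]' |
        tr -cs 'x' '[\n*]'
done |
    sort |
    uniq -c |                  # count the words of each length
    sort -n |
    grep xxxx |                # keep words of 4 letters or more
    awk 'BEGIN {
            SL = 2.5           # average characters per syllable
            TOT = 0            # estimated number of soft hyphens
            SHY = 5            # bytes needed to store "&shy;"
         }
         {
            print $1 " instances of " length($2) " character long words"
            TOT += $1 * (length($2) / SL)
         }
         END {
            print "Estimated text increase: " SHY * TOT
         }'

echo -n "Total text: "
du -c -b "${FILES[@]}" | tail -1 | awk '{print $1}'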
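The per-word arithmetic in the awk step can also be tried on its own. This is only a minimal sketch under the same assumptions (2.5 letters per syllable, 5 bytes per "&shy;"); shy_overhead is a made-up helper name, not part of the script above.

```shell
#!/bin/bash
# Estimated extra bytes for one word of the given length.
# shy_overhead is a hypothetical helper, used here only for illustration.
shy_overhead() {
    awk -v len="$1" 'BEGIN {
        SL = 2.5               # average characters per syllable (assumed)
        SHY = 5                # bytes needed to store "&shy;"
        print SHY * len / SL
    }'
}

shy_overhead 10    # prints 20
shy_overhead 4     # prints 8
```

So a 10-letter word grows by about 20 bytes, i.e. roughly 2 extra bytes per letter in every word of four letters or more.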
There are no memory concerns: embedding fonts (even with subsetting) or adding oversized images for the best view on the KPW, which has a higher resolution, uses much more memory, and images cannot be compressed like text. And, at the moment, this is the ONLY way to have hyphenation in AZW3 files on e-ink Kindles.
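Anyone who wants to check how well the added "&shy;" entities compress can try a rough experiment like the one below. It is a sketch, not a measurement of a real eBook: gzip only stands in for whatever compression the eBook container applies, and crude_shy is a hypothetical stand-in for a real hyphenator (it just drops one "&shy;" into every group of four letters).

```shell
#!/bin/bash
# Assumptions: gzip approximates the container compression, and crude_shy
# is a made-up placeholder for a real hyphenation tool.

# Insert one "&shy;" into each group of four consecutive letters,
# after the second letter. Meant for plain text, not for markup.
crude_shy() {
    sed -E 's/([[:alpha:]]{2})([[:alpha:]]{2})/\1\&shy;\2/g'
}

# Print the raw and gzip-compressed byte counts of the text on stdin.
sizes() {
    local tmp raw packed
    tmp=$(mktemp)
    cat > "$tmp"
    raw=$(wc -c < "$tmp" | tr -d ' ')
    packed=$(gzip -c "$tmp" | wc -c | tr -d ' ')
    echo "raw: $raw gzip: $packed"
    rm -f "$tmp"
}

# Usage, assuming book.txt is a plain-text dump of the book (hypothetical name):
#   sizes < book.txt
#   crude_shy < book.txt | sizes
```

Comparing the two "gzip:" figures with the two "raw:" figures shows how much of the added size survives compression for that particular text.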