MobileRead Forums - View Single Post

j.p.s · 07-26-2014, 06:10 PM

Quote:

Originally Posted by Ripplinger

I made a dent in it last night and got the number down by half, to just over 5000 now, in a few minutes using regex to remove the <span> and </span> there for no reason. I might tackle the rest later this weekend.

That still didn't change the size of the book though, not even by 1kb, which I find odd. And I still don't see anything that would give the inflated page numbering.

That is why the (false) claim is made that bloat in XML doesn't matter.

The <span></span> probably frequently followed and/or preceded another frequently occurring string. Make two files, one with 100 lines and one with 1000, all of which are:
<prebloat></prebloat><span></span><postbloat></postbloat>

Make another couple of files with the same number of lines, all of which are:
<prebloat></prebloat><postbloat></postbloat>

Zip all four files into separate zip files.

Code:

ls -l span-*
-rw-r--r-- 1 me me 4500 Jul 26 14:38 span-no.txt
-rw-r--r-- 1 me me  231 Jul 26 14:39 span-no.zip
-rw-r--r-- 1 me me 5800 Jul 26 14:35 span-yes.txt
-rw-r--r-- 1 me me  245 Jul 26 14:39 span-yes.zip

ls -l span-*
-rw-r--r-- 1 me me 45000 Jul 26 14:41 span-no.txt
-rw-r--r-- 1 me me   349 Jul 26 14:42 span-no.zip
-rw-r--r-- 1 me me 58000 Jul 26 14:41 span-yes.txt
-rw-r--r-- 1 me me   397 Jul 26 14:42 span-yes.zip
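
For anyone who wants to repeat this without creating files by hand, here is a rough Python sketch of the same experiment. It builds the archives in memory with the standard zipfile module, so the zipped byte counts will probably not match the listing above exactly (a different zip tool writes different headers). The helper names zipped_size and compare are made up for this example.

```python
import io
import zipfile

# The two test lines from the post: one without and one with the empty span.
LINE_NO = "<prebloat></prebloat><postbloat></postbloat>\n"
LINE_YES = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"


def zipped_size(text: str, name: str) -> int:
    """Return the size in bytes of an in-memory zip holding `text` as one deflated member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, text)
    return len(buf.getvalue())


def compare(n_lines: int) -> dict:
    """Map each variant to (raw size, zipped size) for a file of n_lines identical lines."""
    sizes = {}
    for label, line in (("span-no", LINE_NO), ("span-yes", LINE_YES)):
        text = line * n_lines
        sizes[label] = (len(text.encode("utf-8")), zipped_size(text, label + ".txt"))
    return sizes


if __name__ == "__main__":
    for n in (100, 1000):
        for label, (raw, packed) in compare(n).items():
            print(f"{n:5d} lines  {label:8s}  raw={raw:6d}  zip={packed}")
```

The raw sizes should come out at 45 and 58 bytes per line, matching the .txt sizes above, while the zipped sizes stay within a few dozen bytes of each other.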

Actually, every other line should be some unique string, but this should give you an idea of what is going on when the characters before and after are often the same.
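
A rough sketch of that more realistic variant, again in Python: every other line is a unique string, and the lines in between are the repeated markup. The unique filler here is just a SHA-256 hex digest of the line index, which is purely a stand-in for real book text, and zlib's deflate is used directly instead of a full zip container (deflate is what zip normally uses). The names build and sizes are hypothetical.

```python
import hashlib
import zlib

PLAIN = "<prebloat></prebloat><postbloat></postbloat>\n"
BLOATED = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"


def build(markup_line: str, n_pairs: int) -> bytes:
    """Alternate a unique filler line with the repeated markup line, n_pairs times."""
    parts = []
    for i in range(n_pairs):
        # Assumed stand-in for real text: a hex digest that is unique for each i.
        parts.append(hashlib.sha256(str(i).encode()).hexdigest() + "\n")
        parts.append(markup_line)
    return "".join(parts).encode("utf-8")


def sizes(n_pairs: int) -> tuple:
    """Return (raw plain, deflated plain, raw bloated, deflated bloated) sizes in bytes."""
    plain = build(PLAIN, n_pairs)
    bloated = build(BLOATED, n_pairs)
    return (len(plain), len(zlib.compress(plain)),
            len(bloated), len(zlib.compress(bloated)))


if __name__ == "__main__":
    raw_no, z_no, raw_yes, z_yes = sizes(500)
    print(f"raw:      {raw_no} vs {raw_yes} (difference {raw_yes - raw_no})")
    print(f"deflated: {z_no} vs {z_yes} (difference {z_yes - z_no})")
```

The compressed files are much larger here because the unique lines do not compress well, but the extra cost of the span pair should still be small next to the 13 raw bytes it adds per line.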