Quote:
Originally Posted by Ripplinger
I made a dent in it last night and got the number down by half, to just over 5000 now, in a few minutes using regex to remove the <span></span> and <span> </span> that were there for no reason. I might tackle the rest later this weekend.
That still didn't change the size of the book though, not even by 1kb, which I find odd. And I still don't see anything that would explain the inflated page numbering.
|
That is why the (false) claim is made that bloat in XML doesn't matter.
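The <span></span> was probably almost always preceded and/or followed by the same frequently occurring strings, so the zip compression had already squeezed each repeat down to next to nothing. Make two files, one with 100 lines and one with 1000 lines, all of which are: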
<prebloat></prebloat><postbloat></postbloat>
Make another couple of files with the same number of lines, all of which are:
<prebloat></prebloat><span></span><postbloat></postbloat>
Zip all four files into separate zip files.
Code:
# 100 lines per file
ls -l span-*
-rw-r--r-- 1 me me 4500 Jul 26 14:38 span-no.txt
-rw-r--r-- 1 me me 231 Jul 26 14:39 span-no.zip
-rw-r--r-- 1 me me 5800 Jul 26 14:35 span-yes.txt
-rw-r--r-- 1 me me 245 Jul 26 14:39 span-yes.zip
# 1000 lines per file
ls -l span-*
-rw-r--r-- 1 me me 45000 Jul 26 14:41 span-no.txt
-rw-r--r-- 1 me me 349 Jul 26 14:42 span-no.zip
-rw-r--r-- 1 me me 58000 Jul 26 14:41 span-yes.txt
-rw-r--r-- 1 me me 397 Jul 26 14:42 span-yes.zip
Actually, every other line should be some unique string, but this should give you an idea of what is going on when the characters before and after <span></span> are often the same.
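If you don't want to make the files by hand, here is a rough Python sketch of the same experiment. It builds the text in memory and zips it with the standard zipfile module using DEFLATE. The helper names (zipped_size, compare) and the member name span.txt are made up for this illustration, and the exact zip sizes will probably differ a little from the listing above depending on the zip tool and compression level, so treat the numbers it prints as indicative only.

```python
import io
import zipfile

# The two test lines from the post, each followed by a newline.
PLAIN = "<prebloat></prebloat><postbloat></postbloat>\n"
BLOATED = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"


def zipped_size(text, name="span.txt"):
    """Return the size in bytes of a one-member DEFLATE zip holding `text`.

    `name` is a hypothetical member name, chosen only for this sketch.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, text)
    return len(buf.getvalue())


def compare(lines):
    """Build both variants with `lines` lines and report raw and zipped sizes."""
    plain = PLAIN * lines
    bloated = BLOATED * lines
    return {
        "lines": lines,
        "raw_plain": len(plain.encode("utf-8")),
        "raw_bloated": len(bloated.encode("utf-8")),
        "zip_plain": zipped_size(plain),
        "zip_bloated": zipped_size(bloated),
    }


if __name__ == "__main__":
    for n in (100, 1000):
        r = compare(n)
        print(
            f"{n} lines: raw {r['raw_plain']} vs {r['raw_bloated']} bytes, "
            f"zipped {r['zip_plain']} vs {r['zip_bloated']} bytes"
        )
```

The raw sizes should match the listing (4500 vs 5800 and 45000 vs 58000 bytes), while the zipped sizes should end up only a few dozen bytes apart, which is the whole point: removing the spans shrinks the text a lot and the archive hardly at all.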