07-26-2014, 06:10 PM   #19
j.p.s
Grand Sorcerer
Posts: 5,813
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by Ripplinger View Post
I made a dent in it last night and, in a few minutes, got the number down by half to just over 5000, using regex to remove the <span></span> and <span>&nbsp;</span> that were there for no reason. I might tackle the rest later this weekend.

That still didn't change the size of the book though, not even by 1 KB, which I find odd. And I still don't see anything that would explain the inflated page numbering.
That is why people make the (false) claim that bloat in XML doesn't matter: the compressed file size barely changes.

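The <span></span> probably often followed and/or preceded another frequently occurring string, so the compressor could fold it into a repeat it was already encoding. To see the effect, make two files, one with 100 lines and one with 1000 lines, where every line is: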
<prebloat></prebloat><postbloat></postbloat>

Make two more files with the same line counts, where every line is:
<prebloat></prebloat><span></span><postbloat></postbloat>

Zip each of the four files into its own zip file.
Code:
# 100 lines per file
ls -l span-*
-rw-r--r-- 1 me me 4500 Jul 26 14:38 span-no.txt
-rw-r--r-- 1 me me  231 Jul 26 14:39 span-no.zip
-rw-r--r-- 1 me me 5800 Jul 26 14:35 span-yes.txt
-rw-r--r-- 1 me me  245 Jul 26 14:39 span-yes.zip

# 1000 lines per file
ls -l span-*
-rw-r--r-- 1 me me 45000 Jul 26 14:41 span-no.txt
-rw-r--r-- 1 me me   349 Jul 26 14:42 span-no.zip
-rw-r--r-- 1 me me 58000 Jul 26 14:41 span-yes.txt
-rw-r--r-- 1 me me   397 Jul 26 14:42 span-yes.zip
For a more realistic test, every other line should be some unique string, but this should give you an idea of what is going on when the characters before and after <span></span> are often the same.
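If you want to reproduce this without creating the files by hand, here is a rough Python sketch of the same experiment. It assumes the standard zipfile module with deflate compression. The names zipped_size and compare and the member name span.txt are made up for illustration. The zipped sizes it prints will not match the listing above byte for byte, since zip tools and settings differ, but the gap between the two archives should be similarly small next to the gap between the raw files.

```python
import io
import zipfile

# The two test lines from the post, with a trailing newline each.
PLAIN = "<prebloat></prebloat><postbloat></postbloat>\n"
BLOATED = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"


def zipped_size(text, name="span.txt"):
    """Return the size in bytes of an in-memory zip holding `text` as one deflated member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, text)
    return len(buf.getvalue())


def compare(lines):
    """Build both test texts with `lines` repeated lines and report raw and zipped sizes."""
    plain = PLAIN * lines
    bloated = BLOATED * lines
    return {
        "raw_plain": len(plain.encode("utf-8")),
        "raw_bloated": len(bloated.encode("utf-8")),
        "zip_plain": zipped_size(plain),
        "zip_bloated": zipped_size(bloated),
    }


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, compare(n))
```

The raw sizes grow by 13 bytes per line when the <span></span> is added, while the zipped sizes should move by only a small fraction of that, which is the effect described above.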