07-25-2014, 11:29 PM   #16
Ripplinger
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
Some of those spans call up italics or bold, so to do it right, each one really should be checked rather than blindly running a regex script that removes them all. Quite a few of them are useless though, with many empty <span></span> pairs there for no reason, often several times within a sentence.

And with over 10,000 instances of span in the book, I'm not sure I want to do the publisher's job on it since I've read it already. If it was a book I scanned, even for my own use, I'd have gladly cleaned it all up. I did make a lot of corrections for spelling and grammar, plus words omitted or run together and punctuation mid-sentence, the kind of thing you know was picked up by OCR and then not proofed very well.

I may still give in though and do it when I'm bored, just to see if the page numbers still change (although I'm not sure they will).
07-26-2014, 11:30 AM   #17
JSWolf
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Well, Modify ePub will get rid of those empty <span></span> combinations. That's a start. Then you can see what else needs to go.
07-26-2014, 04:17 PM   #18
Ripplinger
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
I made a dent in it last night and got the number down by half, to just over 5,000, in a few minutes using regex to remove the <span></span> and <span>&nbsp;</span> that were there for no reason. I might tackle the rest later this weekend.

That still didn't change the size of the book though, not even by 1 KB, which I find odd. And I still don't see anything that would cause the inflated page numbering.
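
For anyone wanting to try the same cleanup, here is a minimal sketch in Python. It is not the regex that was actually used in this thread (that was not posted); the pattern, the `strip_empty_spans` helper, and the sample string are all assumptions for illustration. It only touches attribute-less spans that are empty or hold a single &nbsp;, so spans that carry italics or bold through a class are left alone.

```python
import re

# Assumed pattern: an attribute-less <span> that is empty or holds one &nbsp;.
# Spans with attributes (e.g. class="italic") do not match and are kept.
EMPTY_SPAN = re.compile(r"<span>(?:&nbsp;)?</span>")

def strip_empty_spans(xhtml):
    """Return (cleaned_text, number_of_spans_removed)."""
    return EMPTY_SPAN.subn("", xhtml)

# Hypothetical sample line, not taken from the book discussed here.
sample = '<p>One<span></span> two<span>&nbsp;</span> <span class="italic">three</span>.</p>'
cleaned, removed = strip_empty_spans(sample)
print(removed)
print(cleaned)
```

Running it on a copy of one chapter first and comparing the removal count against a plain text search for the same strings is a cheap way to check that nothing else was caught.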
07-26-2014, 06:10 PM   #19
j.p.s
Grand Sorcerer
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by Ripplinger
I made a dent in it last night and got the number down by half, to just over 5,000, in a few minutes using regex to remove the <span></span> and <span>&nbsp;</span> that were there for no reason. I might tackle the rest later this weekend.

That still didn't change the size of the book though, not even by 1 KB, which I find odd. And I still don't see anything that would cause the inflated page numbering.
That is why the (false) claim is made that bloat in XML doesn't matter.

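The <span></span> probably frequently followed and/or preceded another frequently occurring string, and zip compression stores a repeated string as a short back-reference, so removing it saves almost nothing in the compressed file. Make two files, one with 100 lines and one with 1000 lines, all of which are: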
<prebloat></prebloat><postbloat></postbloat>

Make another couple of files with the same number of lines, all of which are:
<prebloat></prebloat><span></span><postbloat></postbloat>

Zip all four files into separate zip files.
Code:
# 100 lines per file
ls -l span-*
-rw-r--r-- 1 me me 4500 Jul 26 14:38 span-no.txt
-rw-r--r-- 1 me me  231 Jul 26 14:39 span-no.zip
-rw-r--r-- 1 me me 5800 Jul 26 14:35 span-yes.txt
-rw-r--r-- 1 me me  245 Jul 26 14:39 span-yes.zip

# 1000 lines per file
ls -l span-*
-rw-r--r-- 1 me me 45000 Jul 26 14:41 span-no.txt
-rw-r--r-- 1 me me   349 Jul 26 14:42 span-no.zip
-rw-r--r-- 1 me me 58000 Jul 26 14:41 span-yes.txt
-rw-r--r-- 1 me me   397 Jul 26 14:42 span-yes.zip
Actually, every other line should be some unique string, but this should give you an idea of what is going on when the characters before and after <span></span> are often the same.
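
The same experiment can be repeated without creating files by hand. The sketch below is an assumption-laden stand-in for the shell session above: it uses Python's standard `zipfile` module with DEFLATE, builds the archives in memory, and uses a made-up member name (`span.txt`), so the zipped byte counts will not match the listing exactly. The uncompressed sizes should match, since each line is 45 or 58 bytes including the newline.

```python
import io
import zipfile

# The two test lines from the post, each with a trailing newline.
BASE = "<prebloat></prebloat><postbloat></postbloat>\n"
BLOATED = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"

def zipped_size(text):
    """Size in bytes of an in-memory zip archive holding `text` as one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("span.txt", text)  # member name is arbitrary
    return len(buf.getvalue())

results = {}
for n in (100, 1000):
    results[n] = {
        "raw_no": len(BASE * n),       # uncompressed, without <span></span>
        "raw_yes": len(BLOATED * n),   # uncompressed, with <span></span>
        "zip_no": zipped_size(BASE * n),
        "zip_yes": zipped_size(BLOATED * n),
    }
    print(n, results[n])
```

The point to look for is the gap between the two zipped sizes compared with the gap between the two raw sizes. As noted above, a real book has unique text between the repeated tags, so this only illustrates the effect rather than measuring it.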