07-25-2014, 11:29 PM | #16 |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
Some of those spans call up italics or bold, so to do it right, each one really should be checked rather than blindly running a regex script that removes them all. Quite a few of them are useless though, with many <span></span> pairs appearing for no reason, sometimes several times within a sentence.
And with over 10,000 instances of span in the book, I'm not sure I want to do the publisher's job on it, since I've read it already. If it were a book I scanned, even for my own use, I'd have gladly cleaned it all up. I did make a lot of corrections for spelling and grammar, words omitted or run together, and punctuation mid-sentence, where you know it was picked up by OCR and then not proofed very well. I may still give in and do it when I'm bored, just to see whether the page numbers change (although I'm not sure they will). |
07-26-2014, 11:30 AM | #17 |
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Well, Modify ePub will get rid of those empty <span></span> combinations. That's a start. Then you can see what else needs to go.
|
07-26-2014, 04:17 PM | #18 |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
I made a dent in it last night and got the number down by half, to just over 5,000, in a few minutes by using regex to remove the <span></span> and <span> </span> pairs that were there for no reason. I might tackle the rest later this weekend.
That still didn't change the size of the book though, not even by 1 KB, which I find odd. And I still don't see anything that would cause the inflated page numbering. |
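For anyone wanting to repeat that cleanup outside an editor, here is a minimal sketch in Python, not the exact regex used above. It assumes the junk spans have no attributes at all, so anything like <span class="italic"> is left alone and still has to be checked by hand. The function name strip_empty_spans is made up for illustration. Unlike a plain delete, it keeps the whitespace from <span> </span> so two words don't get run together.

```python
import re
from typing import Tuple

# An attribute-less span whose body is empty or only whitespace.
# Spans with attributes (class, style, ...) never match this pattern.
EMPTY_SPAN = re.compile(r"<span>(\s*)</span>")


def strip_empty_spans(html: str) -> Tuple[str, int]:
    """Remove empty attribute-less spans, keeping any whitespace inside them.

    Returns the cleaned text and the number of spans removed.
    Repeats until nothing matches so nested empty spans are caught too.
    """
    total = 0
    while True:
        html, count = EMPTY_SPAN.subn(r"\1", html)
        if count == 0:
            return html, total
        total += count


if __name__ == "__main__":
    sample = '<p>One<span></span> two<span> </span>three <span class="italic">four</span></p>'
    cleaned, removed = strip_empty_spans(sample)
    print(removed, cleaned)
```

Run it on a copy of the book's HTML files, not the originals, and compare the span count before and after.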
07-26-2014, 06:10 PM | #19 | |
Grand Sorcerer
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
|
The <span></span> probably often followed and/or preceded another frequently occurring string, and a string repeated like that compresses to almost nothing inside the zip container that an ePub is. You can test this yourself.
Make two files, one with 100 lines and one with 1000 lines, where every line is:
<prebloat></prebloat><postbloat></postbloat>
Make another two files with the same numbers of lines, where every line is:
<prebloat></prebloat><span></span><postbloat></postbloat>
Then zip each of the four files into its own zip file.
Code:
ls -l span-*
-rw-r--r-- 1 me me 4500 Jul 26 14:38 span-no.txt
-rw-r--r-- 1 me me 231 Jul 26 14:39 span-no.zip
-rw-r--r-- 1 me me 5800 Jul 26 14:35 span-yes.txt
-rw-r--r-- 1 me me 245 Jul 26 14:39 span-yes.zip

ls -l span-*
-rw-r--r-- 1 me me 45000 Jul 26 14:41 span-no.txt
-rw-r--r-- 1 me me 349 Jul 26 14:42 span-no.zip
-rw-r--r-- 1 me me 58000 Jul 26 14:41 span-yes.txt
-rw-r--r-- 1 me me 397 Jul 26 14:42 span-yes.zip
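In the listing, 1000 empty spans add 13,000 bytes to the text file but only 48 bytes to the zip (397 versus 349). The same experiment can be sketched in Python without creating any files on disk. This is only an approximation of the test above: it assumes Python's zipfile module with deflate compression, and the entry name span.txt is arbitrary, so the zipped sizes will not match the numbers in the listing exactly. The raw sizes should match.

```python
import io
import zipfile

# The two test lines from the post; the trailing newline makes them 45 and 58 bytes.
BASE = "<prebloat></prebloat><postbloat></postbloat>\n"
WITH_SPAN = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"


def sizes(line: str, count: int):
    """Return (raw_bytes, zipped_bytes) for a file made of `count` copies of `line`."""
    data = (line * count).encode("utf-8")
    buf = io.BytesIO()
    # Build the zip archive in memory with deflate, the usual ePub compression.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("span.txt", data)
    return len(data), len(buf.getvalue())


if __name__ == "__main__":
    for count in (100, 1000):
        raw_no, zip_no = sizes(BASE, count)
        raw_yes, zip_yes = sizes(WITH_SPAN, count)
        print(f"{count} lines: raw {raw_no} -> {raw_yes}, zipped {zip_no} -> {zip_yes}")
```

If the zipped sizes barely differ while the raw sizes differ by thousands of bytes, that would explain why stripping thousands of spans did not move the size of the book.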
|
|