#16 |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
Some of those spans apply italics or bold, so to do it right, each one really should be checked rather than blindly running a regex script that removes them all. Quite a few of them are useless though, with many <span></span> pairs appearing for no reason, sometimes several times within a single sentence.
And with over 10,000 instances of span in the book, I'm not sure I want to do the publisher's job on it since I've read it already. If it were a book I had scanned, even for my own use, I'd have gladly cleaned it all up. I did make a lot of corrections for spelling and grammar, words omitted or run together, and punctuation mid-sentence, where you know it was picked up by OCR and then not proofed very well. I may still give in and do it when I'm bored, just to see if the page numbers change (although I'm not sure they will). |
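Before removing anything, it may help to see which spans actually carry formatting. The following is only a rough sketch, assuming the chapter's XHTML has already been read into a string; the function name `count_span_variants` and the sample markup (including the `italic` and `bold` class names) are made up for illustration and are not taken from the book in question.

```python
import re
from collections import Counter

# Matches opening <span ...> tags only; "</span>" never matches because
# the pattern requires "<" to be followed directly by "span".
SPAN_OPEN = re.compile(r"<span\b([^>]*)>", re.IGNORECASE)


def count_span_variants(xhtml: str) -> Counter:
    """Count opening <span> tags, grouped by their attribute text."""
    counts = Counter()
    for match in SPAN_OPEN.finditer(xhtml):
        # Drop a trailing "/" so a self-closing <span/> counts as attribute-less.
        attrs = match.group(1).strip().rstrip("/").strip()
        counts[attrs or "(no attributes)"] += 1
    return counts


# Hypothetical sample markup, for illustration only.
sample = (
    '<p><span class="italic">Title</span> and <span></span>'
    '<span>plain</span> text <span class="bold">here</span>'
    '<span class="italic">again</span></p>'
)

for attrs, n in count_span_variants(sample).most_common():
    print(f"{n:5d}  {attrs}")
```

Spans listed under "(no attributes)" are the likely candidates for removal; anything with a class or style would still need checking against the stylesheet.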
#17 |
Resident Curmudgeon
Posts: 79,760
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Well, Modify ePub will get rid of those empty <span></span> combinations. That's a start. Then you can see what else needs to go.
|
#18 |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
I made a dent in it last night and got the number down by half, to just over 5,000, in a few minutes by using regex to remove the <span></span> and <span> </span> pairs that were there for no reason. I might tackle the rest later this weekend.
That still didn't change the size of the book though, not even by 1 KB, which I find odd. And I still don't see anything that would explain the inflated page numbering. |
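For anyone wanting to try the same cleanup outside an editor, here is a minimal sketch of that kind of regex pass. It is not the exact expression used above, which was not posted. It assumes the XHTML is already in a string, only touches spans whose opening tag has no attributes (so spans that apply italics or bold are left alone), and keeps any whitespace that was inside the span so words do not run together. The name `strip_empty_spans` is hypothetical.

```python
import re
from typing import Tuple

# Only attribute-less spans containing nothing but optional whitespace.
# A span such as <span class="italic"></span> is deliberately not matched.
EMPTY_SPAN = re.compile(r"<span>(\s*)</span>")


def strip_empty_spans(xhtml: str) -> Tuple[str, int]:
    """Remove empty <span></span> pairs, keeping any whitespace they held.

    Repeats until nothing more matches, so nested empty spans such as
    <span><span></span></span> are also cleared. Returns the cleaned
    text and the number of pairs removed.
    """
    total = 0
    while True:
        xhtml, n = EMPTY_SPAN.subn(r"\1", xhtml)
        if n == 0:
            return xhtml, total
        total += n


# Hypothetical sample markup, for illustration only.
sample = (
    '<p>One<span></span> two<span> </span>three '
    '<span class="italic">four</span><span><span></span></span></p>'
)
cleaned, removed = strip_empty_spans(sample)
print(removed)
print(cleaned)
```

Running it on a copy of the book first and diffing the result is the safer route, since a regex has no idea whether a span matters to the layout.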
#19 | |
Grand Sorcerer
Posts: 5,788
Karma: 103362673
Join Date: Apr 2011
Device: pb360
|
Quote:
The <span></span> probably frequently followed and/or preceded another frequently occurring string.

Make two files with 100 and 1000 lines, all of which are:
<prebloat></prebloat><postbloat></postbloat>

Make another couple of files with the same number of lines, all of which are:
<prebloat></prebloat><span></span><postbloat></postbloat>

Zip all four files into separate zip files.

Code:
ls -l span-*
-rw-r--r-- 1 me me 4500 Jul 26 14:38 span-no.txt
-rw-r--r-- 1 me me 231 Jul 26 14:39 span-no.zip
-rw-r--r-- 1 me me 5800 Jul 26 14:35 span-yes.txt
-rw-r--r-- 1 me me 245 Jul 26 14:39 span-yes.zip

ls -l span-*
-rw-r--r-- 1 me me 45000 Jul 26 14:41 span-no.txt
-rw-r--r-- 1 me me 349 Jul 26 14:42 span-no.zip
-rw-r--r-- 1 me me 58000 Jul 26 14:41 span-yes.txt
-rw-r--r-- 1 me me 397 Jul 26 14:42 span-yes.zip
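The experiment above can be approximated without creating any files. This is a sketch only: it uses Python's `zlib` module, which applies the same DEFLATE compression a zip file normally uses but without the zip container overhead, so the compressed byte counts will not match the listing exactly. The raw sizes do match (4500/5800 and 45000/58000 bytes). The helper name `sizes` is made up for this example.

```python
import zlib

# The two line variants from the experiment, each ending in a newline.
BASE = "<prebloat></prebloat><postbloat></postbloat>\n"
WITH_SPAN = "<prebloat></prebloat><span></span><postbloat></postbloat>\n"


def sizes(line: str, count: int):
    """Return (raw_bytes, deflate_compressed_bytes) for `count` copies of `line`."""
    raw = (line * count).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


for count in (100, 1000):
    raw_no, packed_no = sizes(BASE, count)
    raw_yes, packed_yes = sizes(WITH_SPAN, count)
    print(f"{count} lines: raw {raw_no} vs {raw_yes} bytes, "
          f"compressed {packed_no} vs {packed_yes} bytes")
```

The point is the same as in the listing: the extra spans add thousands of bytes to the raw text, but because they repeat in the same context, the compressed size barely moves. An EPUB is a zip container, which would explain why stripping the spans did not shrink the book noticeably.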
|