06-30-2017, 12:01 AM | #1 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
Help editing a very bad Kindle edition - a style per word
I bought what must rate as one of the worst-coded books ever: an expensive Kindle edition of an official game guide, where each word has its own style, making whole paragraphs non-reflowable.
Can someone please suggest how to regex out some of this complexity with the calibre editor? A snippet follows: a single paragraph. There are hundreds of these. Unsurprisingly, all attempts to convert to another format fail; they hang for hours at 47%. Code:
<p class="para"> <span class="line fs1"> <span class="word si fs1" style="left: 167px; top: 127px; width: 122px; ">TALOS</span> <span class="word si fs1" style="left: 301px; top: 127px; width: 10px; ">I</span> <span class="word si fs1" style="left: 322px; top: 127px; width: 127px; ">LOBBY</span> </span> <span class="line fs1"> <span class="word si fs1" style="left: 813px; top: 127px; width: 235px; ">HARDWARE</span> <span class="word si fs1" style="left: 1059px; top: 127px; width: 98px; ">LABS</span> </span> <span class="line fs17"> <span class="word si fs17" style="left: 120px; top: 198px; width: 37px; ">KEY</span> <span class="word si fs17" style="left: 163px; top: 198px; width: 104px; ">FACILITIES:</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 120px; top: 238px; width: 97px; ">TRANSTAR</span> <span class="word si fs7" style="left: 223px; top: 238px; width: 71px; ">EXHIBIT</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 120px; top: 278px; width: 105px; ">EXECUTIVE</span> <span class="word si fs7" style="left: 231px; top: 278px; width: 82px; ">OFFICES</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 120px; top: 317px; width: 56px; ">SALES</span> <span class="word si fs7" style="left: 182px; top: 317px; width: 90px; ">DIVISION</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 120px; top: 357px; width: 77px; ">HUMAN</span> <span class="word si fs7" style="left: 203px; top: 357px; width: 116px; ">RESOURCES</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 442px; top: 238px; width: 14px; ">IT</span> <span class="word si fs7" style="left: 463px; top: 238px; width: 91px; ">SECURITY</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 442px; top: 278px; width: 82px; ">TRAUMA</span> <span class="word si fs7" style="left: 531px; top: 278px; width: 75px; ">CENTER</span> </span> <span class="line fs7"> 
<span class="word si fs7" style="left: 442px; top: 317px; width: 53px; ">STAFF</span> <span class="word si fs7" style="left: 502px; top: 317px; width: 85px; ">LOUNGE</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 405px; width: 61px; ">When</span> <span class="word si fs0" style="left: 188px; top: 405px; width: 84px; ">TranStar</span> <span class="word si fs0" style="left: 279px; top: 405px; width: 98px; ">acquired</span> <span class="word si fs0" style="left: 383px; top: 405px; width: 35px; ">the</span> <span class="word si fs0" style="left: 425px; top: 405px; width: 67px; ">space</span> <span class="word si fs0" style="left: 499px; top: 405px; width: 71px; ">station</span> <span class="word si fs0" style="left: 576px; top: 405px; width: 18px; ">in</span> <span class="word si fs0" style="left: 601px; top: 405px; width: 55px; ">2030,</span> <span class="word si fs0" style="left: 663px; top: 405px; width: 47px; ">they</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 436px; width: 74px; ">spared</span> <span class="word si fs0" style="left: 201px; top: 436px; width: 27px; ">no</span> <span class="word si fs0" style="left: 235px; top: 436px; width: 90px; ">expense</span> <span class="word si fs0" style="left: 332px; top: 436px; width: 18px; ">in</span> <span class="word si fs0" style="left: 356px; top: 436px; width: 123px; ">refurbishing</span> <span class="word si fs0" style="left: 486px; top: 436px; width: 35px; ">the</span> <span class="word si fs0" style="left: 528px; top: 436px; width: 67px; ">lobby,</span> <span class="word si fs0" style="left: 602px; top: 436px; width: 43px; ">with</span> <span class="word si fs0" style="left: 652px; top: 436px; width: 35px; ">the</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 468px; width: 48px; ">goal</span> <span class="word si fs0" style="left: 175px; top: 468px; width: 21px; ">of</span> <span 
class="word si fs0" style="left: 203px; top: 468px; width: 109px; ">projecting</span> <span class="word si fs0" style="left: 319px; top: 468px; width: 14px; ">a</span> <span class="word si fs0" style="left: 340px; top: 468px; width: 60px; ">warm</span> <span class="word si fs0" style="left: 407px; top: 468px; width: 43px; ">and</span> <span class="word si fs0" style="left: 457px; top: 468px; width: 75px; ">inviting</span> <span class="word si fs0" style="left: 539px; top: 468px; width: 129px; ">atmosphere</span> <span class="word si fs0" style="left: 675px; top: 468px; width: 22px; ">to</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 500px; width: 67px; ">guests</span> <span class="word si fs0" style="left: 193px; top: 500px; width: 43px; ">and</span> <span class="word si fs0" style="left: 243px; top: 500px; width: 124px; ">employees.</span> <span class="word si fs0" style="left: 374px; top: 500px; width: 124px; ">Connected</span> <span class="word si fs0" style="left: 504px; top: 500px; width: 22px; ">to</span> <span class="word si fs0" style="left: 533px; top: 500px; width: 35px; ">the</span> <span class="word si fs0" style="left: 574px; top: 500px; width: 72px; ">Shuttle</span> <span class="word si fs0" style="left: 653px; top: 500px; width: 45px; ">Bay,</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 532px; width: 35px; ">the</span> <span class="word si fs0" style="left: 161px; top: 532px; width: 60px; ">lobby</span> <span class="word si fs0" style="left: 229px; top: 532px; width: 65px; ">serves</span> <span class="word si fs0" style="left: 300px; top: 532px; width: 23px; ">as</span> <span class="word si fs0" style="left: 330px; top: 532px; width: 14px; ">a</span> <span class="word si fs0" style="left: 351px; top: 532px; width: 75px; ">central</span> <span class="word si fs0" style="left: 434px; top: 532px; width: 48px; ">hub,</span> <span class="word si fs0" style="left: 488px; 
top: 532px; width: 100px; ">providing</span> <span class="word si fs0" style="left: 595px; top: 532px; width: 49px; ">easy</span> <span class="word si fs0" style="left: 652px; top: 532px; width: 75px; ">access</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 563px; width: 22px; ">to</span> <span class="word si fs0" style="left: 148px; top: 563px; width: 35px; ">the</span> <span class="word si fs0" style="left: 190px; top: 563px; width: 114px; ">Neuromod</span> <span class="word si fs0" style="left: 311px; top: 563px; width: 85px; ">Division,</span> <span class="word si fs0" style="left: 403px; top: 563px; width: 151px; ">Psychotronics,</span> <span class="word si fs0" style="left: 561px; top: 563px; width: 43px; ">and</span> <span class="word si fs0" style="left: 611px; top: 563px; width: 105px; ">Hardware</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; top: 595px; width: 55px; ">Labs.</span> <span class="word si fs0" style="left: 181px; top: 595px; width: 37px; ">The</span> <span class="word si fs0" style="left: 225px; top: 595px; width: 53px; ">main</span> <span class="word si fs0" style="left: 285px; top: 595px; width: 30px; ">lift,</span> <span class="word si fs0" style="left: 322px; top: 595px; width: 85px; ">located</span> <span class="word si fs0" style="left: 414px; top: 595px; width: 18px; ">in</span> <span class="word si fs0" style="left: 438px; top: 595px; width: 35px; ">the</span> <span class="word si fs0" style="left: 480px; top: 595px; width: 70px; ">center</span> <span class="word si fs0" style="left: 557px; top: 595px; width: 21px; ">of</span> <span class="word si fs0" style="left: 585px; top: 595px; width: 35px; ">the</span> <span class="word si fs0" style="left: 627px; top: 595px; width: 67px; ">lobby,</span> <span class="word si fs0" style="left: 700px; top: 595px; width: 13px; ">is</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 120px; 
top: 626px; width: 120px; ">connected</span> <span class="word si fs0" style="left: 247px; top: 626px; width: 22px; ">to</span> <span class="word si fs0" style="left: 275px; top: 626px; width: 35px; ">the</span> <span class="word si fs0" style="left: 317px; top: 626px; width: 115px; ">Arboretum</span> <span class="word si fs0" style="left: 439px; top: 626px; width: 43px; ">and</span> <span class="word si fs0" style="left: 488px; top: 626px; width: 36px; ">Life</span> <span class="word si fs0" style="left: 531px; top: 626px; width: 88px; ">Support.</span> </span> <span class="line fs17"> <span class="word si fs17" style="left: 765px; top: 198px; width: 37px; ">KEY</span> <span class="word si fs17" style="left: 808px; top: 198px; width: 104px; ">FACILITIES:</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 765px; top: 238px; width: 173px; ">DEMONSTRATION</span> <span class="word si fs7" style="left: 943px; top: 238px; width: 80px; ">THEATER</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 765px; top: 278px; width: 138px; ">COMBUSTION</span> <span class="word si fs7" style="left: 909px; top: 278px; width: 37px; ">LAB</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 765px; top: 317px; width: 106px; ">CHEMICAL</span> <span class="word si fs7" style="left: 878px; top: 317px; width: 37px; ">LAB</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 765px; top: 357px; width: 102px; ">BALLISTICS</span> <span class="word si fs7" style="left: 874px; top: 357px; width: 37px; ">LAB</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 1088px; top: 238px; width: 68px; ">BEAMS</span> <span class="word si fs7" style="left: 1162px; top: 238px; width: 45px; ">AND</span> <span class="word si fs7" style="left: 1213px; top: 238px; width: 66px; ">WAVES</span> <span class="word si fs7" style="left: 1285px; top: 238px; width: 37px; ">LAB</span> </span> <span 
class="line fs7"> <span class="word si fs7" style="left: 1088px; top: 278px; width: 96px; ">MACHINE</span> <span class="word si fs7" style="left: 1190px; top: 278px; width: 54px; ">SHOP</span> </span> <span class="line fs7"> <span class="word si fs7" style="left: 1088px; top: 317px; width: 89px; ">AIRLOCK</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 405px; width: 103px; ">Hardware</span> <span class="word si fs0" style="left: 874px; top: 405px; width: 47px; ">Labs</span> <span class="word si fs0" style="left: 928px; top: 405px; width: 13px; ">is</span> <span class="word si fs0" style="left: 947px; top: 405px; width: 14px; ">a</span> <span class="word si fs0" style="left: 968px; top: 405px; width: 69px; ">secure</span> <span class="word si fs0" style="left: 1043px; top: 405px; width: 91px; ">research</span> <span class="word si fs0" style="left: 1140px; top: 405px; width: 42px; ">and</span> <span class="word si fs0" style="left: 1189px; top: 405px; width: 142px; ">development</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 436px; width: 72px; ">facility.</span> <span class="word si fs0" style="left: 843px; top: 436px; width: 70px; ">Guests</span> <span class="word si fs0" style="left: 919px; top: 436px; width: 42px; ">and</span> <span class="word si fs0" style="left: 967px; top: 436px; width: 137px; ">unauthorized</span> <span class="word si fs0" style="left: 1110px; top: 436px; width: 102px; ">personnel</span> <span class="word si fs0" style="left: 1219px; top: 436px; width: 35px; ">are</span> <span class="word si fs0" style="left: 1260px; top: 436px; width: 69px; ">limited</span> <span class="word si fs0" style="left: 1335px; top: 436px; width: 21px; ">to</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 468px; width: 68px; ">visiting</span> <span class="word si fs0" style="left: 840px; top: 468px; width: 34px; ">the</span> <span 
class="word si fs0" style="left: 880px; top: 468px; width: 53px; ">foyer</span> <span class="word si fs0" style="left: 939px; top: 468px; width: 42px; ">and</span> <span class="word si fs0" style="left: 988px; top: 468px; width: 153px; ">Demonstration</span> <span class="word si fs0" style="left: 1146px; top: 468px; width: 83px; ">Theater.</span> <span class="word si fs0" style="left: 1236px; top: 468px; width: 16px; ">A</span> <span class="word si fs0" style="left: 1258px; top: 468px; width: 81px; ">number</span> <span class="word si fs0" style="left: 1345px; top: 468px; width: 21px; ">of</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 500px; width: 42px; ">labs</span> <span class="word si fs0" style="left: 813px; top: 500px; width: 42px; ">and</span> <span class="word si fs0" style="left: 862px; top: 500px; width: 34px; ">the</span> <span class="word si fs0" style="left: 902px; top: 500px; width: 92px; ">Machine</span> <span class="word si fs0" style="left: 1001px; top: 500px; width: 52px; ">Shop</span> <span class="word si fs0" style="left: 1059px; top: 500px; width: 35px; ">are</span> <span class="word si fs0" style="left: 1100px; top: 500px; width: 83px; ">located</span> <span class="word si fs0" style="left: 1189px; top: 500px; width: 82px; ">beyond</span> <span class="word si fs0" style="left: 1277px; top: 500px; width: 34px; ">the</span> <span class="word si fs0" style="left: 1318px; top: 500px; width: 79px; ">security</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 532px; width: 123px; ">checkpoint.</span> <span class="word si fs0" style="left: 895px; top: 532px; width: 49px; ">Here</span> <span class="word si fs0" style="left: 950px; top: 532px; width: 119px; ">researchers</span> <span class="word si fs0" style="left: 1076px; top: 532px; width: 42px; ">and</span> <span class="word si fs0" style="left: 1124px; top: 532px; width: 102px; ">engineers</span> <span class="word 
si fs0" style="left: 1232px; top: 532px; width: 118px; ">experiment</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 563px; width: 42px; ">with</span> <span class="word si fs0" style="left: 814px; top: 563px; width: 14px; ">a</span> <span class="word si fs0" style="left: 834px; top: 563px; width: 70px; ">variety</span> <span class="word si fs0" style="left: 911px; top: 563px; width: 21px; ">of</span> <span class="word si fs0" style="left: 938px; top: 563px; width: 100px; ">emerging</span> <span class="word si fs0" style="left: 1045px; top: 563px; width: 135px; ">technologies</span> <span class="word si fs0" style="left: 1186px; top: 563px; width: 21px; ">to</span> <span class="word si fs0" style="left: 1213px; top: 563px; width: 87px; ">develop</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 595px; width: 101px; ">hardware</span> <span class="word si fs0" style="left: 873px; top: 595px; width: 118px; ">prototypes.</span> <span class="word si fs0" style="left: 997px; top: 595px; width: 100px; ">Currently,</span> <span class="word si fs0" style="left: 1104px; top: 595px; width: 34px; ">the</span> <span class="word si fs0" style="left: 1144px; top: 595px; width: 42px; ">labs</span> <span class="word si fs0" style="left: 1192px; top: 595px; width: 35px; ">are</span> <span class="word si fs0" style="left: 1233px; top: 595px; width: 87px; ">pursuing</span> </span> <span class="line fs0"> <span class="word si fs0" style="left: 765px; top: 626px; width: 82px; ">multiple</span> <span class="word si fs0" style="left: 853px; top: 626px; width: 83px; ">projects</span> <span class="word si fs0" style="left: 943px; top: 626px; width: 42px; ">with</span> <span class="word si fs0" style="left: 991px; top: 626px; width: 14px; ">a</span> <span class="word si fs0" style="left: 1012px; top: 626px; width: 64px; ">broad</span> <span class="word si fs0" style="left: 1083px; top: 626px; width: 62px; 
">range</span> <span class="word si fs0" style="left: 1151px; top: 626px; width: 21px; ">of</span> <span class="word si fs0" style="left: 1178px; top: 626px; width: 134px; ">applications.</span> </span> </p> Last edited by stumped; 06-30-2017 at 12:07 AM. |
06-30-2017, 02:11 AM | #2 | |
null operator (he/him)
Posts: 20,459
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@stumped - I doubt the conversions are really 'hanging'; they are just taking a long time. Conversion is a stepwise process and 47% is an often-seen rest point along the way - from Aix to Ghent.
Conversions have been known to take more than 24 hours. However, to your main question: have a look at Diap's Editing Toolbag. One of its features is: Quote:
BR |
06-30-2017, 02:19 AM | #3 |
Resident Curmudgeon
Posts: 73,660
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I would return it and forget it ever existed.
|
06-30-2017, 02:52 AM | #4 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
I am leaning that way.
I can do most things with regex, but this has me beat. It seems to be styling each individual word within each paragraph in order to achieve some absolute positioning, PDF style, or it's a really bad PDF conversion. I am not so good with regex in the calibre editor, though, and I can't get it into Sigil because it will not convert, no matter how many conversion options I disable. I'd like to try to strip all the line and word spans from one paragraph, to see how it looks, and as a technical challenge, before giving up and sending it back. |
06-30-2017, 03:55 AM | #5 | |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
1. Replace all <p class="para"> with <div class="para"> and all </p> with </div>.
2. In Regex mode, replace: <span class="[^"]+" style="[^"]+">(.*?)</span> with \1
3. Replace all <span class with <p class and all remaining </span> with </p>.
Last edited by Doitsu; 06-30-2017 at 03:57 AM. |
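For anyone who wants to try the three steps above outside the editor first, here is a minimal sketch in plain Python. It is only a restatement of the same search-and-replace logic, not anything calibre-specific. The names SAMPLE, WORD_SPAN and flatten are made up for illustration, and the sample is a shortened fragment in the same shape as the snippet in post #1. It assumes each word span sits on a single line, and it leaves any positioning rules in the book's stylesheet untouched.

```python
import re

# Hypothetical, shortened fragment shaped like the snippet in post #1.
SAMPLE = (
    '<p class="para"> <span class="line fs0"> '
    '<span class="word si fs0" style="left: 120px; top: 405px; width: 61px; ">When</span> '
    '<span class="word si fs0" style="left: 188px; top: 405px; width: 84px; ">TranStar</span> '
    '</span> </p>'
)

# Same pattern as step 2: a span that carries both a class and an inline style.
# The line spans have no style attribute, so they are not matched here.
WORD_SPAN = re.compile(r'<span class="[^"]+" style="[^"]+">(.*?)</span>')


def flatten(html: str) -> str:
    """Apply the three replacements in the order given in the post."""
    # Step 1: the outer paragraph becomes a div so it can hold <p> children.
    html = html.replace('<p class="para">', '<div class="para">')
    html = html.replace('</p>', '</div>')
    # Step 2: unwrap every positioned word span, keeping only its text.
    html = WORD_SPAN.sub(r'\1', html)
    # Step 3: the spans left over are the line wrappers; turn each into a <p>.
    html = html.replace('<span class', '<p class')
    html = html.replace('</span>', '</p>')
    return html


if __name__ == "__main__":
    print(flatten(SAMPLE))
```

The order matters: step 1 has to run before step 3, otherwise the new </p> tags created from the line spans would be turned into </div> as well.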
06-30-2017, 05:01 AM | #6 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
OK, I did manage to strip some stuff, but then I got a blotch of text overwriting other text. So it's deleted and going back to Amazon; life's too short...
But I cannot believe that a professional book (not a cheap Kindle Unlimited freebie) was so badly put together. This "Kindle edition" was also over 100 MB and full of big colour pictures, which is kind of pointless for an e-ink Kindle! For once, an actual paper version is looking more appealing, and it's almost the same price. |
07-01-2017, 12:20 PM | #7 |
Resident Curmudgeon
Posts: 73,660
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I would tell Amazon and the publisher how bad it is and maybe Amazon will pull it so the publisher has to fix it.
|
07-01-2017, 12:51 PM | #8 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
I returned it for "quality issues" in the reasons menu, left a one-star review,
and deleted my copy. I don't plan on having a discussion unless they contact me. |
07-02-2017, 01:51 AM | #9 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
FYI: poking around the Amazon help pages, I came across a definition of the Kindle Print Replica format. It's not something I'd heard of before, but I suspect this is what I encountered...
About Kindle Print Replica
Kindle Print Replica textbooks maintain the formatting and layout of their print editions while also offering many of the advantages of Kindle books. Each page in a Print Replica textbook displays words and images in the same position as the corresponding print edition, while adding features such as .....
Seems like a PDF clone to me? |
07-02-2017, 01:58 AM | #10 | |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
I've bought a number of Print Replica books precisely because PDF is the format which works best for some types of material, where you want to match the exact formatting of the printed book. I read them (as PDFs) on my iPad. |
|
07-02-2017, 02:05 AM | #11 |
Wizard
Posts: 3,305
Karma: 10259306
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9,
|
I deleted it, so I cannot check now, but I don't recall seeing a 4. I'm pretty sure calibre saw it as AZW3, and T (Edit book) displayed it as that.
I do not have the KindleUnpack plugin, unless that is there by default. That Print Replica definition was on a general help page, not a link from the book details. If you have an AZW4 to test with, does it open in the calibre book editor and display HTML, CSS, etc. components like any other AZW? |
07-02-2017, 03:39 AM | #12 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
If it's a Print Replica book it'll tell you on the Amazon description page. E.g. this is one that I bought:
https://www.amazon.co.uk/New-Oxford-.../dp/B06WP74DPV
As you see, below the description of the book it says "Format: Print Replica". No, you can't open a Print Replica book in the editor. As I say, it's a PDF in a Kindle "wrapper". There's no HTML, CSS, etc. Sounds as if you just had a really badly formatted AZW3 book!
Last edited by HarryT; 07-02-2017 at 03:47 AM. |