04-14-2022, 05:00 PM | #16 | |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
Quote:
https://drive.google.com/file/d/1stC...ew?usp=sharing
I started from the azw3 from my new download, turned on most of the heuristic processing in Calibre's conversion routine (which didn't really help), converted it to an epub and have been working on that. I think I've got most everything cleared up but the <div id=...> stuff at the start of every paragraph.
Last edited by enuddleyarbl; 04-14-2022 at 05:04 PM. |
|
04-14-2022, 07:16 PM | #17 |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
I think I've figured out the regex to search/replace those <div id=...> things out:
Search for: <(div id="\S+?") (class="para-normal">.+?)(<\/div>)
Replace with: <p \2</p>
"Wrap" and "Dot All" are checked in the Search dialog.
That looks for the first occurrence of:
<div id=blahblahblah class="para-normal">blahblahblah_until_it_reaches</div>
it replaces that with:
<p class="para-normal">blahblahblah</p>
I'd reduced all the paragraph styles down to that para-normal and paracenter (one of mine). I just had to do another run with paracenter in the place of para-normal. That appears to have worked.
Next, I've got to get rid of the <br ....> things that are at the end of most paragraphs. |
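For anyone who wants to try that search/replace outside the Editor, here is a minimal Python sketch of the same substitution. The pattern and replacement are the ones from the post; `re.DOTALL` stands in for the "Dot All" checkbox, and the sample markup and the function name `divs_to_paragraphs` are made up for illustration. It assumes, as in this book, that the paragraph divs are not nested.

```python
import re

# Pattern and replacement copied from the post; DOTALL mirrors "Dot All".
DIV_PARA = re.compile(
    r'<(div id="\S+?") (class="para-normal">.+?)(<\/div>)',
    re.DOTALL,
)


def divs_to_paragraphs(html: str) -> str:
    """Turn <div id="..." class="para-normal">...</div> into <p class="para-normal">...</p>."""
    # Group 2 holds 'class="para-normal">' plus the paragraph text,
    # so '<p ' + group 2 + '</p>' rebuilds the element without the id.
    return DIV_PARA.sub(r'<p \2</p>', html)


# Hypothetical sample markup, not taken from the actual book.
sample = (
    '<div id="box1_2" class="para-normal">First line\nsecond line</div>\n'
    '<div id="box1_3" class="para-normal">Next paragraph</div>'
)

print(divs_to_paragraphs(sample))
```

Swapping `para-normal` for `paracenter` in the pattern gives the second run described above.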
04-15-2022, 05:09 AM | #18 | |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
04-15-2022, 11:18 AM | #19 |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
My replacement regex for the <div... stuff added a </p> to the end of all those paragraphs. So, all I did was delete all the <br...> tags and left the </p> tags to handle the paragraph spacing. I had to manually adjust some of the paragraphs on a couple of pages (on, for instance, the copyright page). But, in general, that part was easy.
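A rough sketch of that <br...> cleanup in Python. The exact search used in the Editor isn't given above, so the pattern here is an assumption: it matches any `<br>` tag regardless of attributes or a trailing slash. The sample line and the name `strip_br` are hypothetical.

```python
import re

# Assumed pattern: "<br", a word boundary (so <break> is not matched),
# any attributes, then the closing ">".
BR_TAG = re.compile(r'<br\b[^>]*>')


def strip_br(html: str) -> str:
    """Delete every <br ...> tag and leave the surrounding markup alone."""
    return BR_TAG.sub('', html)


# Hypothetical sample, not taken from the actual book.
line = '<p class="para-normal">Some text<br class="calibre1"/></p>'
print(strip_br(line))
```

Pages that rely on line breaks for layout (the copyright page, for instance) would still need the manual touch-up mentioned above.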
I've got a reasonable-looking ebook now and am reading it for issues and to look for places I can put my own heading/chapter marks and scene breaks. For some reason, Zelazny and/or the publishers didn't bother with things like that.
I've been scratching my head over why the publisher would have put that ridiculous html and css stuff in there. My guess is that they started with either a scanned copy of the paper book (or a PDF of one) and stuck those styles in there to match how the scan came out instead of how the page should have looked. Why else would they have lines/words in a single paragraph changing their height? I'd have thought that someone might have actually looked at the finished product and realized they were trying to reproduce scanning issues in CSS.
And, BTW, the original issue I started this thread with (liga 0) is now OBE: I deleted almost all of the stuff in those areas. Sorry I so quickly caused this thread to stray from an Editor issue to a Conversion one. |
04-15-2022, 11:21 AM | #20 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I've had a quick look at the code and yes, it is a mess. Most of the CSS files are duplicates which you can delete. Then you can move all the CSS code from the remaining CSS to the first CSS and delete the rest. Just remember to fix all the HTML to link to that CSS.
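One way to spot the duplicate CSS files before deleting anything is to group them by a hash of their contents. This is only a sketch: the file names and contents below are invented, and `group_duplicate_css` is a hypothetical helper, not something Calibre provides. It only catches byte-identical files; stylesheets that differ in whitespace would need normalising first.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def group_duplicate_css(stylesheets: dict[str, str]) -> list[list[str]]:
    """Return groups of file names whose contents are identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, text in stylesheets.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        by_digest[digest].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical file names and contents.
files = {
    "style0.css": ".para-normal { margin: 0; }",
    "style1.css": ".para-normal { margin: 0; }",
    "style2.css": ".paracenter { text-align: center; }",
}
print(group_duplicate_css(files))
```

Within each group, keep one file and repoint the HTML links at it before deleting the others.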
|
04-15-2022, 08:05 PM | #21 |
Bibliophagist
Posts: 35,513
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Going by my paper copy, the original was one rather long chapter. I had a few other science fiction books that were the same. One of those fashion decisions that turned out bad?
|
04-15-2022, 10:59 PM | #22 |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
I've noticed that a lot of Zelazny and Pratchett books don't bother with chapters.
|
04-15-2022, 11:14 PM | #23 | |
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
|
I'm running EPUBCheck from within the Editor and it's pointing out duplicate id= lines. Something like this at the start of several of the files:
Code:
<body class="normbody">
<div class="page" id="section2_1">
<div id="box1_1" class="normbody">
...
https://www.w3docs.com/learn-css/css-id-and-class.html
Quote:
|
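An id has to be unique within a single HTML file, which is what EPUBCheck is complaining about. Below is a quick Python sketch for listing the ids that appear more than once in one file. It is a plain regex scan rather than a real HTML parser, the sample markup is invented (the snippet above is elided, so the repeated id here is only an illustration), and `duplicate_ids` is a hypothetical helper.

```python
import re
from collections import Counter

# Rough scan: whitespace, then id="...". Assumes double-quoted attributes.
ID_ATTR = re.compile(r'\sid="([^"]+)"')


def duplicate_ids(html: str) -> list[str]:
    """Return the id values that occur more than once in one file."""
    counts = Counter(ID_ATTR.findall(html))
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample with one repeated id.
page = (
    '<body class="normbody">\n'
    '<div class="page" id="section2_1">\n'
    '<div id="box1_1" class="normbody">one</div>\n'
    '<div id="box1_1" class="normbody">two</div>\n'
    '</div>\n'
    '</body>'
)
print(duplicate_ids(page))
```

If nothing in the book links to those ids, deleting the attribute is the simplest fix; otherwise the repeats need renaming.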
|
04-16-2022, 12:06 PM | #24 |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
04-16-2022, 03:43 PM | #25 | |
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
|