MobileRead Forums > E-Book Software > Calibre > Editor

04-14-2022, 05:00 PM   #16
enuddleyarbl
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
Quote:
Originally Posted by JSWolf
@DaveLessnau Would you mind shifting from KF8 to ePub using KindleUnpack and then scrambling it and posting so we can see the original code? Thanks.
Here's the epub (Scrambled) that KindleUnpack produced using the KF8 to epub option:

https://drive.google.com/file/d/1stC...ew?usp=sharing

I started from the azw3 in my new download, turned on most of the heuristic processing in Calibre's conversion routine (which didn't really help), converted it to an epub, and have been working on that. I think I've got almost everything cleaned up except the <div id=...> stuff at the start of every paragraph.

Last edited by enuddleyarbl; 04-14-2022 at 05:04 PM.
04-14-2022, 07:16 PM   #17
enuddleyarbl
I think I've figured out a regex to search for and replace those <div id=...> tags:

Search for: <(div id="\S+?") (class="para-normal">.+?)(<\/div>)
Replace with: <p \2</p>

"Wrap" and "Dot All" are checked in the Search dialog. That looks for the first occurrence of:

<div id="blahblahblah" class="para-normal">blahblahblah_until_it_reaches</div>

It replaces that with:

<p class="para-normal">blahblahblah</p>

I'd already reduced all the paragraph styles down to just para-normal and paracenter (one of mine), so I only had to do a second run with paracenter in place of para-normal.

That appears to have worked. Next, I need to get rid of the <br ...> tags at the end of most paragraphs.
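For anyone who wants to try the same substitution outside the Editor, here is a minimal Python sketch of it. Calibre's regex flavour is Python-based, and `re.DOTALL` does the job of the "Dot All" checkbox. As an assumption, the sketch folds the two runs into one by matching either `para-normal` or `paracenter` with an alternation; the sample markup is made up for illustration.

```python
import re

# Mirrors the Editor search: <div id="..." class="para-normal">...</div>
# The class alternation (an assumption) lets one pass handle both
# para-normal and paracenter instead of two separate runs.
DIV_PARA = re.compile(
    r'<div id="\S+?" (class="(?:para-normal|paracenter)">.+?)</div>',
    re.DOTALL,  # like "Dot All": '.' also matches newlines inside a paragraph
)


def divs_to_paragraphs(html: str) -> str:
    """Turn each matching <div id=... class=...> into a plain <p class=...>."""
    # Group 1 holds 'class="...">' plus the paragraph text, so "<p " + \1 + "</p>"
    # rebuilds the element without the id attribute.
    return DIV_PARA.sub(r"<p \1</p>", html)


sample = (
    '<div id="p12" class="para-normal">First line\nsecond line</div>\n'
    '<div id="p13" class="paracenter">* * *</div>'
)
print(divs_to_paragraphs(sample))
```

Because `.+?` stops at the first `</div>` it reaches, this assumes the paragraph divs contain no nested `<div>` elements; divs with any other class are left untouched.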
04-15-2022, 05:09 AM   #18
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by DaveLessnau
That appears to have worked. Next, I've got to get rid of the <br ....> things that are at the end of most paragraphs.
Before you get rid of the <br> tags, you'll need to add CSS to replace the spacing they were providing.
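One way to follow that advice is to give the paragraph class an explicit margin in the stylesheet before the `<br>` tags go. The helper below is hypothetical: it appends a rule only if the stylesheet has no rule for that selector yet, and the `0 0 1em 0` margin is an assumed value, not something taken from the book's CSS.

```python
import re


def ensure_paragraph_spacing(css: str, selector: str = "p.para-normal",
                             margin: str = "0 0 1em 0") -> str:
    """Append a margin rule for `selector` unless the stylesheet already has a rule for it.

    Hypothetical helper: the default selector and margin are assumptions
    standing in for whatever spacing the <br> tags used to provide.
    """
    # Look for the selector at the start of a rule, e.g. "p.para-normal {".
    pattern = re.compile(r"(^|[}\s,])" + re.escape(selector) + r"\s*[{,]")
    if pattern.search(css):
        return css  # a rule already exists; adjust that one by hand instead
    rule = f"{selector} {{\n  margin: {margin};\n}}\n"
    return css.rstrip("\n") + "\n\n" + rule if css.strip() else rule


print(ensure_paragraph_spacing(".calibre { display: block; }"))
```

Checking for an existing rule first keeps the helper safe to run more than once; it does not inspect what that existing rule contains.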
04-15-2022, 11:18 AM   #19
enuddleyarbl
My replacement regex for the <div... stuff added a </p> to the end of all those paragraphs. So all I did was delete all the <br...> tags and let the </p> tags handle the paragraph spacing. I had to manually adjust some of the paragraphs on a couple of pages (the copyright page, for instance). But, in general, that part was easy.
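A slightly more cautious variant of that cleanup is sketched below in Python. It assumes the unwanted tags sit immediately before `</p>` (possibly with whitespace in between) and removes only those, so a `<br>` in the middle of a paragraph, such as an intentional line break in verse, is left alone. The sample markup and the `calibre2` class are made up.

```python
import re

# One or more <br>, <br/> or <br class="..." /> tags, plus whitespace, right
# before a closing </p>. The lookahead leaves the </p> itself in place.
TRAILING_BR = re.compile(r"(?:\s*<br\b[^>]*>)+\s*(?=</p>)", re.IGNORECASE)


def strip_trailing_breaks(html: str) -> str:
    """Delete <br> tags that end a paragraph; keep ones inside the text."""
    return TRAILING_BR.sub("", html)


print(strip_trailing_breaks(
    '<p class="para-normal">One.<br class="calibre2"/></p>\n'
    '<p class="para-normal">Roses<br/>are red<br /> <br/>\n</p>'
))
```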

I've got a reasonable looking ebook now and am reading it for issues and to look for places I can put my own heading/chapter marks and scene breaks. For some reason, Zelazny and/or the publishers didn't bother with things like that.

I've been scratching my head over why the publisher would have put that ridiculous HTML and CSS stuff in there. My guess is that they started from a scan of the paper book (or a PDF of one) and added those styles to reproduce how the scan came out rather than how the book should have looked. Why else would lines or words within a single paragraph change height? I'd have thought someone would actually look at the finished product and realize they were reproducing scanning artifacts in CSS.

And, BTW, the original issue I started this thread with (liga 0) is now OBE (overtaken by events): I deleted almost all of the stuff in those areas. Sorry for so quickly steering this thread from an Editor issue to a Conversion one.
04-15-2022, 11:21 AM   #20
JSWolf
I've had a quick look at the code and yes, it is a mess. Most of the CSS files are duplicates, which you can delete. Then move all the CSS code from the remaining files into the first CSS file and delete the rest. Just remember to fix all the HTML files so they link to that one CSS file.
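Part of that consolidation can be scripted if the stylesheets and HTML are available as text (for example from an unzipped copy of the book). This is an illustrative sketch, not a Calibre feature: it treats two stylesheets as duplicates only when their contents are byte-for-byte identical, keeps the first one in sorted order, and assumes the dictionary keys are written exactly as the `href` values appear in the HTML. Merging stylesheets that merely overlap still needs a human eye, since rule order and conflicting selectors matter.

```python
import hashlib
import re


def find_duplicate_css(stylesheets: dict[str, str]) -> dict[str, str]:
    """Map each duplicate stylesheet name to the first file with identical content."""
    first_seen: dict[str, str] = {}  # content hash -> canonical file name
    duplicates: dict[str, str] = {}  # duplicate file name -> canonical file name
    for name in sorted(stylesheets):
        digest = hashlib.sha256(stylesheets[name].encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates[name] = first_seen[digest]
        else:
            first_seen[digest] = name
    return duplicates


def relink_html(html: str, duplicates: dict[str, str]) -> str:
    """Point href="..." values at the canonical stylesheet instead of a duplicate."""
    def swap(match: re.Match) -> str:
        target = match.group(2)
        return match.group(1) + duplicates.get(target, target) + match.group(3)

    return re.sub(r'(href=")([^"]+)(")', swap, html)


sheets = {"a.css": "p{margin:0}", "b.css": "p{margin:0}", "c.css": "h1{}"}
print(find_duplicate_css(sheets))
```

Once every HTML file has been relinked, the duplicate stylesheets (and their manifest entries) can be removed.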
04-15-2022, 08:05 PM   #21
DNSB
Bibliophagist
Posts: 35,513
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by DaveLessnau
I've got a reasonable looking ebook now and am reading it for issues and to look for places I can put my own heading/chapter marks and scene breaks. For some reason, Zelazny and/or the publishers didn't bother with things like that.
Going by my paper copy, the original was one rather long chapter. I had a few other science fiction books that were the same. One of those fashion decisions that turned out badly?
04-15-2022, 10:59 PM   #22
enuddleyarbl
I've noticed that a lot of Zelazny and Pratchett books don't bother with chapters.
04-15-2022, 11:14 PM   #23
enuddleyarbl
I'm running EPUBCheck from within the Editor and it's flagging duplicate id= attributes. Something like this appears at the start of several of the files:

Code:
<body class="normbody">
<div class="page" id="section2_1">
<div id="box1_1" class="normbody">
...
These are leftovers from the original; I kept them because I don't really know what they're for. According to:

https://www.w3docs.com/learn-css/css-id-and-class.html

Quote:
An ID selector is a unique identifier of the HTML element to which a particular style must be applied. It is used only when a single HTML element on the web page must have a specific style.

Both in Internal and External Style Sheets we use hash (#) for an id selector
So, even ignoring the duplicates, there ought to be a style somewhere that applies some attribute to #section2_1 and #box1_1. But I can't find any reference to them. I suppose it's possible I've already deleted it without realizing it. Can I just delete both of those <div... lines?
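To see exactly which `id` values repeat, a small script can count them per file. This is a minimal sketch using Python's standard `html.parser`, under the assumption that the relevant rule is that an `id` must be unique within a single XHTML document (the same `id` in two different files is not a conflict). The sample document below is invented.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id="..." attribute value in the order it appears."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: list[str] = []

    # Self-closing tags like <a id="n1"/> also reach this method, because
    # HTMLParser's default handle_startendtag calls handle_starttag.
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(xhtml: str) -> dict[str, int]:
    """Return {id: count} for ids that occur more than once in one document."""
    parser = IdCollector()
    parser.feed(xhtml)
    parser.close()
    return {ident: n for ident, n in Counter(parser.ids).items() if n > 1}


doc = (
    '<body class="normbody">\n<div class="page" id="section2_1">\n'
    '<div id="box1_1" class="normbody">\n<p id="box1_1">x</p><a id="n1"/>'
    '</div></div></body>'
)
print(duplicate_ids(doc))
```

Running it on each XHTML file separately mirrors the per-document nature of the check.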
04-16-2022, 12:06 PM   #24
JSWolf
Quote:
Originally Posted by DaveLessnau
I've noticed that a lot of Zelazny and Pratchett books don't bother with chapters.
Discworld has no chapters.
04-16-2022, 03:43 PM   #25
JSWolf
Quote:
Originally Posted by DaveLessnau
So, even ignoring the duplicates, there ought to be a style somewhere where #section2_1 and #box1_1 have some attribute applied. But, I can find no reference to them. I suppose it's possible I've deleted it already without realizing it. Can I just delete both of those <div... lines?
A lot of the IDs are for footnotes, as Discworld books can have a number of footnotes.
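That is why "no matching CSS rule" doesn't by itself mean an `id` is safe to delete: it may be a link target for a footnote, a table-of-contents entry, or a cross-reference. The sketch below checks both kinds of use with plain regexes, assuming all the book's HTML and NCX/nav text and all its CSS are available as strings. It only does textual matching, so treat its output as a list of candidates to inspect, not a verdict. The sample file contents are invented.

```python
import re


def unreferenced_ids(ids: list[str], html_files: list[str], css_files: list[str]) -> list[str]:
    """Return the ids that no link fragment or CSS #id selector mentions.

    Link targets: href="chapter.xhtml#note3", href="#note3", or an NCX
    <content src="text.xhtml#note3"/>.
    Style hooks: #note3 used in any stylesheet.
    """
    # Collect fragment identifiers from href="...#frag" and src="...#frag".
    fragments = set()
    for text in html_files:
        fragments.update(re.findall(r'(?:href|src)="[^"#]*#([^"]+)"', text))

    unused = []
    for ident in ids:
        if ident in fragments:
            continue  # something links here, e.g. a footnote or TOC entry
        # "#ident" not followed by another name character counts as a selector use,
        # so "#box1_1" is not mistaken for a use of "#box1_10".
        selector = re.compile("#" + re.escape(ident) + r"(?![\w-])")
        if any(selector.search(css) for css in css_files):
            continue
        unused.append(ident)
    return unused


html = ['<p>See<a href="notes.xhtml#fn1">1</a></p>',
        '<navPoint><content src="text.xhtml#section2_1"/></navPoint>']
css = ['#box1_10 { margin: 0 }', 'p.para-normal { margin: 0 }']
print(unreferenced_ids(["fn1", "section2_1", "box1_1", "box1_10"], html, css))
```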

All times are GMT -4.