04-20-2011, 04:44 PM | #1 | |
Colonel Mustard
Posts: 90
Karma: 1426
Join Date: Feb 2010
Location: Montreal
Device: iPhone 6, Kindle Paperwhite 2, iPad 2
|
Easiest way to clean an ePub file?
Hello everyone,
I've been the happy owner of a Kindle 3 for about 6 months now. I love it, and it's great that there are so many books available in the public domain for free. The only thing is that, as you probably all know, the formatting isn't always perfect in the books you can find around the web. So after spending some time reading them as-is, I quickly started spending some time trying to fix simple things in the ebooks I found: mainly adding a TOC and correct chapter breaks, small tweaks in the CSS to change the indents or alignment, etc. (with Sigil, that is, and doing the ePub/mobi conversion with Calibre). I guess you can see where I'm going: the more I learned about ebooks, the more unsatisfied I was with the ones I had, and the more things I wanted to change in them.
A couple of days ago, I finally found a text I had been searching for for some time, in ePub format. The problem: when I opened it, I noticed it was all in bold. So okay, I figured, I will just change the CSS. Then I noticed something strange (you will notice how little I know about ePubs): the "font-weight: bolder" declaration was not in the "calibre2" class, which was used for every paragraph of the book, but in a "calibre3" class (which consisted only of this "font-weight: bolder" declaration). When looking at the code view in Sigil, I noticed that every single paragraph of the book had those 2 classes applied at the beginning: <p class="calibre2"><b class="calibre3">.
My first reaction to get rid of the bold problem was simply to change the "font-weight" value to "normal" instead of "bolder". Doing this kind of fixes the problem, but not in a very satisfying way, I must admit: now I have a "calibre3" class that is completely useless and is still applied at the beginning of every paragraph of the book... My question is simple: what do you do in these circumstances? Remove all the <b class="calibre3">? Start a new clean file with the plain text? Actually, I wouldn't really be asking if the problem was only this.
Thing is, I realized that every paragraph was full of multiple and repetitive class calls (I don't know how to name this)... For example, here is one paragraph from the code view in Sigil: Quote:
So at this point, what do you suggest is the simplest way to get a clean book? Just copy and paste from the "book view" of Sigil and start a new ePub file with the plain text? Or is there a way to remove all the unnecessary formatting? Thanks in advance for your help (and for all the great information available on the forums), and sorry if I made some grammar/syntax mistakes; as you can maybe guess from the quoted paragraph, English isn't my first language. Michael |
|
04-20-2011, 05:55 PM | #2 |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
Honestly, it looks like a serious auto-conversion error. I have no idea why your book is bolding all over like that, but I'm pretty sure it's not supposed to be. Likely someone started out with a shoddy source file and fed it into Calibre, which can only try its best with what it's given.
Since the bolding doesn't seem to do anything useful, I'd just get rid of it with find/replace: first replace </b> <b class="calibre3"> with a single space, then replace <p class="calibre2"><b class="calibre3"> and </b></p> with the plain, <p class="whatever">-only versions of the markup. That's how I clean the cruft from my own e-books when I decide to redo the formatting to my liking, and it's usually a lot faster and easier than trying to build a new book from scratch using cut-and-pasted text. Since you're converting from ePub to Mobi, I'd say when in doubt, discard or at least comment out the parts of the CSS file that you don't understand and that don't seem to do anything useful. Calibre conversions often put in a lot of redundant stuff which you just don't need and can't use, given Mobi's limited display capabilities. I hope this helps, and welcome to MobileRead! |
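For anyone who would rather script the three find/replace steps than click through them, here is a minimal Python sketch. It assumes the markup looks exactly like the snippets quoted in this thread (same class names, same spacing); the function name `strip_calibre_bold` and the sample string are made up for illustration, and this is not something Sigil or Calibre provides.

```python
def strip_calibre_bold(html: str) -> str:
    """Apply the three literal find/replace steps from the post above."""
    # Step 1: join adjacent bold runs by replacing the close/open pair with a space.
    html = html.replace('</b> <b class="calibre3">', ' ')
    # Step 2: drop the bold opener that directly follows each paragraph opener.
    html = html.replace('<p class="calibre2"><b class="calibre3">', '<p class="calibre2">')
    # Step 3: drop the bold closer that directly precedes each paragraph closer.
    html = html.replace('</b></p>', '</p>')
    return html


# Hypothetical sample paragraph, modelled on the markup described in the thread.
sample = '<p class="calibre2"><b class="calibre3">Hello</b> <b class="calibre3">world.</b></p>'
print(strip_calibre_bold(sample))
```

The order matters: step 1 has to run first, otherwise the leftover `</b> <b class="calibre3">` pairs in the middle of a paragraph would survive steps 2 and 3. Note that step 3 removes every `</b></p>`, so it would also hit any legitimate bold run that ends a paragraph.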
|
04-21-2011, 08:54 AM | #3 |
ePub Maker
Posts: 120
Karma: 16
Join Date: Dec 2009
Location: Mordor
Device: iPad,Kindle 3, Nook 2
|
Mass replacement
You can use mass replacement to clear them.
For advanced replacement, you can use a regex (regular expression) tool. But it's neither easy nor pleasant work. If you have ever seen the source code of HTML exported from Word, you would know that your example code is rather clean and optimized. |
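To give an idea of what the regex route looks like, here is a small Python sketch that unwraps every `<b class="calibre3">...</b>` pair in one pass. The function name `unwrap_bold` and its `class_name` parameter are illustrative only, and the pattern assumes the bold tags are never nested inside each other; with nested `<b>` tags a real HTML parser would be the safer choice.

```python
import re


def unwrap_bold(html: str, class_name: str = "calibre3") -> str:
    """Remove <b class="..."> wrappers with the given class, keeping their text."""
    # Non-greedy (.*?) stops at the first </b>; DOTALL lets a run span line breaks.
    pattern = re.compile(
        r'<b class="' + re.escape(class_name) + r'">(.*?)</b>',
        re.DOTALL,
    )
    # \1 puts back only the text that was between the opening and closing tag.
    return pattern.sub(r"\1", html)


# Hypothetical sample paragraph, modelled on the markup described in the thread.
sample = '<p class="calibre2"><b class="calibre3">Hello</b> <b class="calibre3">world.</b></p>'
print(unwrap_bold(sample))
```

Compared with plain find/replace, one pattern handles the opening and closing tags together, so no orphaned `</b>` is left behind.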
04-21-2011, 11:27 AM | #4 |
Colonel Mustard
Posts: 90
Karma: 1426
Join Date: Feb 2010
Location: Montreal
Device: iPhone 6, Kindle Paperwhite 2, iPad 2
|
Thanks for your help. I guess I'll stick with my current file and clean it as I can. I don't know much about regex tools, so I'll just do as ATDrake suggested:
I'll also keep in mind the tip about commenting out the parts of the CSS file I don't understand and which don't seem to do anything useful. Thanks! I appreciate the fast and friendly help. Great forum! Michael Update: Just spent some time cleaning the book as you suggested, and it looks scarier than it is. It's actually very easy to do, and in a couple of minutes the file looks MUCH better. I guess I should have tried that "find & replace" thingy way earlier... Thanks again! Last edited by mtrahan; 04-21-2011 at 12:15 PM. Reason: Update |
04-21-2011, 03:28 PM | #5 | |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Quote:
<b class="calibre3"> and replace it with nothing at all. Sigil will remove all the bold tags, and once you save the file or switch to book view and back, or something like that, it will automatically delete all the corresponding bold end tags as well. If you change the search/replace scope to all HTML files, it should take you about five to twenty seconds, depending on the size of the book. Remember to always make a working copy and save your original to fall back on if you really mess things up and need to start fresh. Wa la, no more bold text. |
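Outside Sigil, the same "all HTML files, but keep a copy of the original" idea could be sketched in Python roughly as follows. This assumes the ePub has already been unzipped into a folder; the function name `clean_unpacked_book`, the `_backup` folder suffix, and the list of file extensions are all assumptions made for this example. Unlike Sigil, plain Python will not tidy up orphaned end tags, so the sketch removes each opening tag together with its matching `</b>`.

```python
import re
import shutil
from pathlib import Path

# Matches one <b class="calibre3">...</b> pair (assumes no nested <b> tags).
BOLD_WRAPPER = re.compile(r'<b class="calibre3">(.*?)</b>', re.DOTALL)
HTML_SUFFIXES = {".html", ".xhtml", ".htm"}


def clean_unpacked_book(book_dir: Path) -> int:
    """Back up book_dir, then unwrap calibre3 bold tags in every HTML file.

    Returns the number of files that were changed.
    """
    # Keep an untouched copy next to the working folder to fall back on.
    backup = book_dir.with_name(book_dir.name + "_backup")
    if not backup.exists():
        shutil.copytree(book_dir, backup)

    changed = 0
    for path in sorted(book_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in HTML_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8")
        cleaned = BOLD_WRAPPER.sub(r"\1", text)
        if cleaned != text:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Calling `clean_unpacked_book(Path("my_book"))` on a hypothetical folder named `my_book` would leave the original files in `my_book_backup` and rewrite only the HTML files that actually contained the bold wrapper.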
|
|
04-22-2011, 06:16 AM | #6 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I would recommend spending some time learning RegEx. It will help you!
|
04-22-2011, 08:45 AM | #7 | |
Colonel Mustard
Posts: 90
Karma: 1426
Join Date: Feb 2010
Location: Montreal
Device: iPhone 6, Kindle Paperwhite 2, iPad 2
|
Quote:
Thanks again for the help. |
|
04-22-2011, 03:57 PM | #8 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
|
04-27-2011, 08:06 AM | #9 |
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
|
Yes, I just didn't feel like going looking for the a with the little accent symbol over it.
Obviously, you knew what I meant though, correct? I'm really glad you felt the need to spend more time "correcting" my post rather than trying to help the OP though. Way to go. |
|