04-20-2011, 04:44 PM   #1
mtrahan
Colonel Mustard
Posts: 87
Karma: 1426
Join Date: Feb 2010
Location: Montreal
Device: iPod Touch, Kindle 3, iPad 2
Easiest way to clean an ePub file?

Hello everyone,

I've been the happy owner of a Kindle 3 for about 6 months now. I love it, and it's great that there are so many books available in the public domain for free. Only thing, as you probably all know, the formatting isn't always perfect in the books you can find around the web. So after spending some time reading them as-is, I quickly started spending some time trying to fix simple things in the ebooks I found—mainly adding TOC/correct chapter breaks, small tweaks in the CSS to change the indents or alignment, etc. (with Sigil that is, and doing the ePub/mobi conversion with Calibre).

I guess you can see where I'm going: the more I learned about ebooks, the more unsatisfied I was with the ones I had, and the more things I wanted to change in them.

A couple of days ago, I finally found a text I had been searching for for some time, in ePub format. Problem is: when I opened it, I noticed it was all in bold. So okay, I figured, I will just change the CSS. Then I noticed something strange (you will notice how little I know about ePubs): the "font-weight: bolder" declaration was not in the "calibre2" class, which was used for every paragraph of the book, but in a "calibre3" class (which consisted only of this "font-weight: bolder" declaration).

When looking at the code view in Sigil, I noticed that every single paragraph of the book had those 2 classes called at the beginning: <p class="calibre2"><b class="calibre3">. My first reaction to get rid of the bold problem was simply to change the "font-weight" value from "bolder" to "normal". Doing this kinda fixes the problem, but not in a very satisfying way, I must admit: now I have a "calibre3" class that is completely useless and is still called at the beginning of every paragraph of the book...

My question is simple: what do you do in these circumstances? Remove all the <b class="calibre3">? Start a new clean file with the plain text?

Actually, I wouldn't really be asking if the problem was only this. Thing is, I realized that every paragraph was full of multiple and repetitive class calls (I don't know how to name this)... For example, here is one paragraph from the code view in Sigil:

Quote:
<p class="calibre2"><b class="calibre3">— Fatime, dit-il à ma compagne, je suppose que cette jeune et jolie personne est</b> <b class="calibre3">au fait ; il ne me reste donc plus qu'à vous prévenir que nous avons pour convives</b> <b class="calibre3">deux vieux Allemands, à Paris depuis un mois, et qui brûlent du désir de connaître</b> <b class="calibre3">quelques jolies filles. L'un d'eux a pour vingt mille écus de diamants sur lui :</b> <b class="calibre3">Fatime, je te le recommande. L'autre, qui désire acheter une maison dans ce village,</b> <b class="calibre3">et à qui j'ai persuadé que je lui en trouverais une à très bon marché s'il apportait de</b> <b class="calibre3">quoi la payer comptant, aura sûrement plus de quarante mille francs dans sa poche,</b> <b class="calibre3">soit en or, soit en lettres à vue : Juliette, ce sera votre lot ; acquittez-vous bien de la</b> <b class="calibre3">mission et je vous ferai souvent faire de semblables parties.</b></p>
I have no idea how an ePub can end up with such repetitive formatting... Since the book needs no special formatting (it's all just regular text), I want every paragraph to have only a single class called at the beginning. Is that the correct way to do this?

So at this point, what do you suggest is the simplest way to get a clean book? Just copy and paste from the "book view" of Sigil and start a new ePub file with the plain text? Or is there a way to remove all the unnecessary formatting?

Thanks in advance for your help (and for all the great information available on the forums), and sorry if I made some grammar/syntax mistakes. As you can maybe guess from the quoted paragraph, English isn't my first language.

Michael
04-20-2011, 05:55 PM   #2
ATDrake
Wizzard
Posts: 6,087
Karma: 14841004
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International & Sony PRS-T1
Honestly, it looks like a serious auto-conversion error. I have no idea why your book is bolding all over like that, but I'm pretty sure it's not supposed to be. Likely someone started out with a shoddy source file and fed it into Calibre, which can only try its best with what it's given.

Since the bolding doesn't seem to do anything useful, I'd just get rid of it by doing a find/replace for

</b> <b class="calibre3">

and replace it with a single space, then replace

<p class="calibre2"><b class="calibre3">

and

</b></p>

with the plain, <p class="whatever">-only versions of the markup.

It's how I clean the cruft from my own e-books when I decide to redo the formatting to my liking, and usually a lot faster and easier than trying to build a new book from scratch using cut-and-pasted text.
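
For anyone who would rather script it than click through a find/replace dialog, the three steps above can be sketched in Python roughly like this. It is only an illustration: the name `strip_bold_wrappers` is made up, and keeping `calibre2` as the paragraph class is an assumption (substitute whichever class your stylesheet actually defines).

```python
# Literal find/replace pairs, applied in the order described above.
# Keeping "calibre2" on the <p> is an assumption; any existing
# paragraph class from the stylesheet would do.
REPLACEMENTS = [
    ('</b> <b class="calibre3">', ' '),
    ('<p class="calibre2"><b class="calibre3">', '<p class="calibre2">'),
    ('</b></p>', '</p>'),
]


def strip_bold_wrappers(xhtml: str) -> str:
    """Apply each literal replacement in turn and return the cleaned markup."""
    for old, new in REPLACEMENTS:
        xhtml = xhtml.replace(old, new)
    return xhtml


sample = (
    '<p class="calibre2"><b class="calibre3">Fatime, je te le recommande.</b> '
    '<b class="calibre3">Juliette, ce sera votre lot.</b></p>'
)
print(strip_bold_wrappers(sample))
# -> <p class="calibre2">Fatime, je te le recommande. Juliette, ce sera votre lot.</p>
```

The order matters: the pair in the middle of the paragraph has to go first, otherwise the last step would not find `</b></p>` cleanly paired with a single opening tag.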

Since you're converting from ePub to Mobi, I'd say when in doubt, discard or at least comment out the parts of the CSS file you don't understand, which don't seem to do anything useful.

Often Calibre conversions put in a lot of redundant stuff which you just don't need and can't use, given Mobi's limited display capabilities.
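
One rough way to decide which parts of the stylesheet are safe to comment out is to list the classes the CSS defines but no page actually uses. The sketch below assumes simple `.classname { ... }` selectors and double-quoted `class` attributes; `unused_classes` is a hypothetical helper, not something Calibre or Sigil provides.

```python
import re
from typing import Iterable, Set


def unused_classes(css: str, xhtml_docs: Iterable[str]) -> Set[str]:
    """Return class names defined in `css` that no document references."""
    # Simple class selectors only, e.g. ".calibre3 {" or ".a, .b {".
    defined = set(re.findall(r'\.([A-Za-z_][\w-]*)\s*[{,]', css))
    used = set()
    for doc in xhtml_docs:
        # A class attribute may hold several space-separated names.
        for value in re.findall(r'class="([^"]*)"', doc):
            used.update(value.split())
    return defined - used


css = """
.calibre2 { text-indent: 1em }
.calibre3 { font-weight: bolder }
"""
pages = ['<p class="calibre2">Some cleaned paragraph.</p>']
print(unused_classes(css, pages))
```

Anything it reports is only a candidate for removal; selectors more complex than a bare class name are not analysed, so check by eye before deleting.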

I hope this helps, and welcome to MobileRead!
04-21-2011, 08:54 AM   #3
eping
ePub Maker
Posts: 120
Karma: 16
Join Date: Dec 2009
Location: Mordor
Device: iPad,Kindle 3, Nook 2
Mass replacement

You can use mass replacement to clear them.
For more advanced replacements, you can use a regex (regular expression) tool.
But it's neither easy nor pleasant work.

If you have ever seen the source code of HTML exported from Word, you will know that your example code is rather clean and optimized.
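
As an illustration of the regex route, a single pattern can unwrap every `<b class="calibre3">...</b>` pair while keeping the text inside. This is a minimal sketch using Python's `re` module; `unwrap_bold` is a made-up name, and the pattern assumes these `<b>` elements are never nested inside one another.

```python
import re

# Non-greedy match so each wrapper is handled separately; DOTALL lets
# the wrapped text span line breaks.
BOLD_WRAPPER = re.compile(r'<b class="calibre3">(.*?)</b>', re.DOTALL)


def unwrap_bold(xhtml: str) -> str:
    """Replace each <b class="calibre3">text</b> with just its text."""
    return BOLD_WRAPPER.sub(r'\1', xhtml)


line = (
    '<p class="calibre2"><b class="calibre3">quelques jolies filles.</b> '
    '<b class="calibre3">Fatime, je te le recommande.</b></p>'
)
print(unwrap_bold(line))
# -> <p class="calibre2">quelques jolies filles. Fatime, je te le recommande.</p>
```

Because the space between two wrappers sits outside the tags, it survives on its own and no separate "replace with a space" step is needed.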
04-21-2011, 11:27 AM   #4
mtrahan
Colonel Mustard
Posts: 87
Karma: 1426
Join Date: Feb 2010
Location: Montreal
Device: iPod Touch, Kindle 3, iPad 2
Thanks for your help. I guess I'll stick with my current file and clean it as best I can. I don't know much about regex tools, so I'll just do as ATDrake suggested:
  • replace </b> <b class="calibre3"> with a single space
  • replace <p class="calibre2"><b class="calibre3"> with <p class="calibre"> (which is the "basic" class in the stylesheet)
  • replace </b></p> with only </p>
  • etc.

I'll also keep in mind the tip about commenting out the parts of the CSS file that I don't understand and that don't seem to do anything useful. Thanks!

I appreciate the fast and friendly help—great forum!

Michael

Update: Just spent some time cleaning the book as you suggested, and it looks scarier than it is. It's actually very easy to do and in a couple of minutes the file looks MUCH better. I guess I should have tried that "find & replace" thingy way earlier... Thanks again!

Last edited by mtrahan; 04-21-2011 at 12:15 PM. Reason: Update
04-21-2011, 03:28 PM   #5
bfollowell
Fanatic
Posts: 510
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Amazon Kindle 3 Wi-Fi & B&N Nook Tablet & B&N Nook HD+
Quote:
Originally Posted by ATDrake
Since the bolding doesn't seem to do anything useful, I'd just get rid of it by doing a find/replace for

</b> <b class="calibre3">

and replace it with a single space, then replace

<p class="calibre2"><b class="calibre3">

and

</b></p>
Actually, I wouldn't replace with a space. Just perform a search for:

<b class="calibre3">

and replace it with nothing at all. Sigil will remove all the bold tags, and once you save the file or switch to book view and back or something like that, it will automatically delete all the corresponding bold end tags as well. If you set the search/replace scope to all HTML files, it should take you all of about five to twenty seconds, depending on the size of the book.

Remember to always make a working copy and save your original to fall back on if you really mess things up and need to start fresh.
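
The same "all HTML files, on a copy" idea can also be done outside Sigil, since an ePub is a zip archive. The following is a minimal sketch under a few assumptions: the content files are UTF-8 and end in `.xhtml`, `.html` or `.htm`, and both function names are hypothetical. It writes a new file and never touches the original.

```python
import zipfile

HTML_SUFFIXES = ('.xhtml', '.html', '.htm')


def drop_calibre3_bold(xhtml: str) -> str:
    # Outside Sigil nothing tidies up orphaned end tags, so remove both
    # halves. Only safe when every <b> in the book is one of these wrappers.
    return xhtml.replace('<b class="calibre3">', '').replace('</b>', '')


def clean_epub(original: str, cleaned: str, transform) -> None:
    """Write a cleaned copy of `original` to `cleaned`; the original is only read."""
    with zipfile.ZipFile(original) as src, zipfile.ZipFile(cleaned, 'w') as dst:
        # Entries are copied in their original order, so "mimetype" stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.lower().endswith(HTML_SUFFIXES):
                data = transform(data.decode('utf-8')).encode('utf-8')
            # The ePub container expects "mimetype" to be stored uncompressed.
            if info.filename == 'mimetype':
                compress = zipfile.ZIP_STORED
            else:
                compress = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=compress)
```

A call would look like `clean_epub('book.epub', 'book-clean.epub', drop_calibre3_bold)`, with both file names being placeholders. If the result looks wrong, the untouched original is still there to start over from.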

Wa la, no more bold text.
04-22-2011, 06:16 AM   #6
Toxaris
Wizard
Posts: 2,899
Karma: 2909045
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
I would recommend spending some time in learning RegEx. It will help you!
04-22-2011, 08:45 AM   #7
mtrahan
Colonel Mustard
Posts: 87
Karma: 1426
Join Date: Feb 2010
Location: Montreal
Device: iPod Touch, Kindle 3, iPad 2
Quote:
Originally Posted by Toxaris
I would recommend spending some time in learning RegEx. It will help you!
I will. It's just that I'm learning one thing at a time, and for now this was already enough to experiment with. It's actually the first book I've reformatted this much (cover, endnotes, etc.). Now that I'm learning to get more out of it, I find Sigil really awesome.

Thanks again for the help.
04-22-2011, 03:57 PM   #8
susan_cassidy
Wizard
Posts: 1,832
Karma: 1692648
Join Date: Jan 2009
Device: Kindle, iPad (not used much for reading)
Quote:
Originally Posted by bfollowell
Wa la, no more bold text.
I hope you were joking. It's voilà, not 'wa la'.
04-27-2011, 08:06 AM   #9
bfollowell
Fanatic
Posts: 510
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Amazon Kindle 3 Wi-Fi & B&N Nook Tablet & B&N Nook HD+
Quote:
Originally Posted by susan_cassidy
I hope you were joking. It's voilà, not 'wa la'.
Yes, I just didn't feel like going looking for the a with the little accent symbol over it.

Obviously, you knew what I meant though, correct?

I'm really glad you felt the need to spend more time "correcting" my post rather than trying to help the OP though. Way to go.
All times are GMT -4.