06-05-2011, 12:41 PM   #1
burbleburble
GUI Plugin for Cleaning Ebooks, Fast

Ebook Cleaner

About:
Many ebooks have messy and inconsistent formatting.
  • Big breaks between every paragraph
  • Funky header formatting
  • Randomly capitalized first words of a chapter
  • Strange formatting of letters, verses, etc.
  • Bad TOCs
  • Indented first scene
  • Strange chapter titles ('Chap', roman numerals, ...)
  • Broken paragraphs/sentences, missing punctuation...
  • ... the list goes on.
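To illustrate the "strange chapter titles" item, here is a minimal Python sketch (calibre plugins are written in Python) that rewrites headings such as "Chap. XII" or "CHAPTER 3" as "Chapter 12" / "Chapter 3". This is not the plugin's code: the regex and the names `roman_to_int` and `normalize_chapter_title` are hypothetical, and it only handles a bare number after the word.

```python
import re

# Value of each Roman numeral symbol.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Hypothetical pattern: "Chap", "Chap." or "Chapter" followed by a Roman or Arabic number.
CHAPTER_RE = re.compile(r"^\s*chap(?:ter)?\.?\s+([IVXLCDM]+|\d+)\s*$", re.IGNORECASE)


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'XIV' to an int, honouring subtractive pairs."""
    total = 0
    largest_seen = 0
    # Walk right to left: a symbol smaller than the largest one seen to its right is subtracted.
    for symbol in reversed(numeral.upper()):
        value = ROMAN_VALUES[symbol]
        if value < largest_seen:
            total -= value
        else:
            total += value
            largest_seen = value
    return total


def normalize_chapter_title(title: str) -> str:
    """Rewrite 'Chap. XII' / 'CHAPTER 12' style headings as 'Chapter 12'; leave others alone."""
    match = CHAPTER_RE.match(title)
    if not match:
        return title
    number = match.group(1)
    value = int(number) if number.isdigit() else roman_to_int(number)
    return f"Chapter {value}"
```

Titles like "Prologue" or "Chapter One" pass through unchanged. Because of the case-insensitive match, a word made only of Roman-numeral letters (e.g. "Did") would be misread, so treat this as a first pass you review.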

The original HTML/CSS structure may have been messy. Add to that the fact that many ebooks go through conversions, and you are left with an impossible tangle of classes and elements:
  • Example: <div class='calibre143'><div 'blah'><p class='calibre38'><span 'blah'... you get the point
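A minimal sketch of untangling that kind of markup with only Python's standard `html.parser`: it re-serializes the HTML, drops every `class` attribute, and unwraps `div`/`span` wrappers while keeping their text. `UNWRAP_TAGS`, `ClassStripper` and `strip_classes` are hypothetical names, not the plugin's. Unwrapping also throws away any inline `style` on those wrappers, and HTML comments are dropped, so this only suits books where that markup is noise.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical choice: wrappers that carry no meaning once the CSS has been inlined.
UNWRAP_TAGS = {"div", "span"}
# Elements whose content is raw text (CSS/JS) and must not be re-escaped.
RAW_TEXT_TAGS = {"style", "script"}


class ClassStripper(HTMLParser):
    """Re-serialize HTML without class attributes and without div/span wrappers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._in_raw_text = False

    @staticmethod
    def _attrs(attrs):
        # Keep every attribute except "class"; valueless attributes stay bare.
        kept = []
        for name, value in attrs:
            if name == "class":
                continue
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        return "".join(" " + item for item in kept)

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")  # e.g. <?xml version="1.0"?>

    def handle_starttag(self, tag, attrs):
        self._in_raw_text = tag in RAW_TEXT_TAGS
        if tag not in UNWRAP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in RAW_TEXT_TAGS:
            self._in_raw_text = False
        if tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Normal text arrives with entities decoded, so escape it again;
        # style/script content is passed through untouched.
        self.out.append(data if self._in_raw_text else escape(data, quote=False))


def strip_classes(html_text: str) -> str:
    """Return html_text with class attributes removed and div/span wrappers unwrapped."""
    parser = ClassStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```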

Now, you can fix it up using Sigil or Word, but it will take you many long hours. Also, Word's grammar check will flag a lot of grammar the author intended, so you'll spend a lot of time skipping past "errors" that aren't errors.
The goal of this plugin is to provide tools and methods to significantly shorten the time needed to restore and clean up an ebook.

Version 0.0.6:
Some major improvements to the interface and code; also reverted to WebKit.
Anyway, I think it is now 'stable', just lacking in features.

Note: For the time being the plugin only supports HTMLZ, as I don't have time to deal with idiosyncrasies in the EPUB format. HOWEVER, it can save to EPUB format (in the next update, at least). I realize that calibre's HTMLZ doesn't support all tags/CSS. But for the most part (at least in my opinion; feel free to express/explain yours), an ebook that is aesthetically pleasing to the reader avoids overkill in the formatting.
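Since an HTMLZ file is just a ZIP archive, a cleanup pass over it can be sketched with Python's `zipfile`: copy every entry, rewriting only the main HTML file. I'm assuming the book text is a single UTF-8 file named `index.html` at the archive root (which is how calibre's HTMLZ output lays it out as far as I know; check your own files). `clean_htmlz` and its `clean` callback are hypothetical names.

```python
import zipfile
from typing import Callable


def clean_htmlz(src, dst, clean: Callable[[str], str], main_name: str = "index.html") -> None:
    """Copy the HTMLZ archive at src to dst, passing its main HTML file through clean().

    src and dst may be paths or binary file objects (anything zipfile.ZipFile accepts).
    Assumes the book text is a single UTF-8 file named main_name at the archive root.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for name in zin.namelist():
            data = zin.read(name)
            if name == main_name:
                data = clean(data.decode("utf-8")).encode("utf-8")
            # Everything else (images, style.css, metadata.opf, ...) is copied unchanged.
            zout.writestr(name, data)
```

Writing to a separate `dst` rather than in place means a bad cleanup function never destroys the original book.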

Usage:
  1. Convert the ebook to HTMLZ using the following settings for the HTMLZ Output: How to handle CSS = inline; How to handle class based CSS = inline.
  2. The rest of the tools should be pretty self-explanatory (at least to me, so ask if you feel some of them need clarification).
  3. To edit by hand, or to change which styles/patterns/classes the navigation list is built from, see the Settings tab.
  4. Save to HTMLZ, then use calibre to convert back to whatever format you want.
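Steps 1 and 4 can also be scripted with calibre's `ebook-convert` command-line tool. A minimal Python sketch that builds those calls; I'm assuming `--htmlz-css-type` and `--htmlz-class-style` are the command-line names for the two HTMLZ settings in step 1, so confirm them with `ebook-convert book.epub book.htmlz --help` before relying on this. The helper names are hypothetical.

```python
import shlex
import subprocess


def to_htmlz_args(src: str, dst: str) -> list[str]:
    """ebook-convert call for step 1: inline CSS and inline class-based CSS.

    Assumed flag names; verify with `ebook-convert book.epub book.htmlz --help`.
    """
    return [
        "ebook-convert", src, dst,
        "--htmlz-css-type", "inline",
        "--htmlz-class-style", "inline",
    ]


def from_htmlz_args(src: str, dst: str) -> list[str]:
    """ebook-convert call for step 4: the output format is picked from dst's extension."""
    return ["ebook-convert", src, dst]


def run(args: list[str]) -> None:
    """Echo the command, then run it; requires calibre's command-line tools on PATH."""
    print(shlex.join(args))
    subprocess.run(args, check=True)
```

A round trip would be `run(to_htmlz_args("book.epub", "book.htmlz"))`, the cleanup, then `run(from_htmlz_args("book.htmlz", "book.epub"))`.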

Plans:
  • save to EPUB (soon)
  • a spell checker that uses heuristics to avoid wasting time on names and places invented for that book
  • a punctuation checker that finds broken paragraphs/sentences/punctuation (only the ones guaranteed to need your attention, not every possible grammar issue)
  • TOC creator
  • miscellaneous tools, as the need for them pops up in my own cleaning. If you need tools for your cleaning preferences, feel free to suggest them; I will try fairly hard to incorporate them.
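To make the punctuation-checker idea concrete, here is a minimal sketch of one heuristic it could use (my assumption, not a finished design): flag a paragraph that ends without terminal punctuation when the next paragraph starts with a lowercase letter, since that pair was most likely split during a conversion. `find_broken_paragraphs` is a hypothetical name, and the punctuation set is a guess you would tune per book.

```python
import re

# Heuristic, hypothetical choice: a finished paragraph ends in . ! ? or an ellipsis,
# optionally followed by closing quotes/brackets and trailing spaces.
ENDS_CLEANLY = re.compile(r"""[.!?\u2026]["'\u201d\u2019)\]]*\s*$""")


def find_broken_paragraphs(paragraphs: list[str]) -> list[int]:
    """Return each index i where paragraph i was probably split off from paragraph i + 1."""
    suspects = []
    for i in range(len(paragraphs) - 1):
        current = paragraphs[i].strip()
        following = paragraphs[i + 1].lstrip()
        if not current or not following:
            continue  # blank paragraphs are a separate cleanup job
        # Only flag the strong case: no closing punctuation here AND the next
        # paragraph starts mid-sentence with a lowercase letter.
        if not ENDS_CLEANLY.search(current) and following[0].islower():
            suspects.append(i)
    return suspects
```

Requiring both conditions keeps headings like "Chapter 1" (followed by a capitalized sentence) out of the results, which matches the goal of only showing the cases that really need attention.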

Issues:
I'm sure there are a million of them... please post them so I can deal with them.
Attached Files
File Type: zip plugin 0.0.6.zip (119.7 KB, 805 views)

Last edited by burbleburble; 07-05-2011 at 12:46 PM. Reason: Updated Plugin to version 0.0.6