09-14-2011, 07:15 AM   #89
burbleburble
Sorry about the delay.

Ortep: Sounds okay. What with school and the SATs, I don't have time just now to learn how to integrate it with the 'Open With' plugin, and even then it still wouldn't open ebooks directly from calibre without some work. (I really would like to port it to Python 2.7 to reintegrate it with calibre, but it primarily manipulates unicode text, and Python 2.7 and Python 3 differ greatly in this area, so just testing all the unicode conversions will take a lot of time...)

Meanwhile [I hope this is okay to post here, as I do look forward to re-integrating it, as it was before, and you asked to test its current state]: Because it's an independent package running off Python 3 + PyQt4 + lxml, it's a 13 MB RAR package. I have uploaded it to megaupload:

Note: This program is currently optimized/designed for a large screen! It will appear cluttered and probably be awkward to use on a small screen!

ECleaner v1.0.6 Program


ECleaner v1.0.6 Instructions
Before you start:
  • I am aware that it will be confusing at first, and the help is rather limited. I welcome questions and suggestions for making it more intuitive; however, due to a busy schedule, I may not respond right away (give me a week+).
  • There is a known 'memory leak'. This sounds scary but it's not. It just means that every few ebooks you will have to restart the program; otherwise it will slow down.
  • Use at your own risk. I have never experienced any issues though...
Installation:
  • Unpack the RAR
  • Inside the unpacked folder is a shortcut 'ECleaner'. (It has no logo; it is just a white box. If for some reason it can't find the program, it is the file 'main.exe' in the subfolder ECleaner).
First steps ['Raw EBooks']:
  • I assume you are familiar with basic HTML, namely the tags p, span, i, b, a and the attributes class, id.
  • Create an HTMLZ file with calibre. You MUST first set these conversion options: In 'HTMLZ Output'-- set to 'inline' for both selections. In calibre 'Look & Feel'-- check 'Smarten punctuation'.
  • Copy the 'raw' (uncleaned) htmlz file into the ECleaner subfolder named 'RawFiles'. (This is for your convenience, not required. ECleaner looks in this folder first, so it saves you from browsing to the file.)
  • Run ECleaner.
  • Since this is a 'raw' ebook, press the button named 'Open and Clean Htmlz'. This will tidy it, compressing nested divs, p's, spans, and creating classes based on patterns in the ebook. This allows one to easily and quickly restructure the ebook.
    • For example, all chapter headers will generally fall under the same class, let us say 5. One can then rename class="5" to class="Chapter", either to a 'single' element or 'forward' to all following elements with class="5".
  • Update the basic metadata. A new cover can be dragged & dropped over the old one.
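To make the 'single' versus 'forward' rename concrete, here is a minimal sketch in Python. It is not ECleaner's actual code: the function name `rename_class`, its parameters, and the sample markup are all made up for illustration, and it uses the standard library's ElementTree rather than lxml so it runs anywhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of tidied output: class "5" happens to mark chapter headers.
SAMPLE = """<body>
<p class="5">Chapter 1</p>
<p class="2">It was a dark night.</p>
<p class="5">Chapter 2</p>
<p class="2">Morning came.</p>
</body>"""


def rename_class(root, old, new, start=0, forward=False):
    """Rename class `old` to `new`, starting at paragraph index `start`.

    forward=False: rename only the first matching paragraph at or after `start`.
    forward=True:  rename every matching paragraph at or after `start`.
    Returns the number of paragraphs changed.
    """
    paragraphs = list(root.iter("p"))
    changed = 0
    for p in paragraphs[start:]:
        if p.get("class") == old:
            p.set("class", new)
            changed += 1
            if not forward:
                break  # 'single' mode stops after one element
    return changed


root = ET.fromstring(SAMPLE)
rename_class(root, "5", "Chapter", forward=True)
print([p.get("class") for p in root.iter("p")])
# → ['Chapter', '2', 'Chapter', '2']
```

Calling it without `forward=True` would change only the first header and leave the second one as class "5".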
Cleaning 'em up:
  • Go to tab #2 'Content', and explore your options! It usually takes me no more than 15 minutes per ebook, but of course I already know my way around.
  • There are options for ()Checking for punctuation issues, ()Renaming classes, ()Creating id's, ()Auto generating css formatting, ()Auto generating a titlepage and toc, ()Changing to titlecase, ()Search and replace, and so on.
  • On the left is a 'navigator' which can help navigate: ()By punctuation, ()By class, ()By image. It also provides many useful pieces of information.
    • For example, one might want to check that all chapter titles are followed by a paragraph with class='FirstScene'. Well, this info is listed.
    • For example, one might want to check that all chapter headers were found. All one needs to do is check the numbering: if it lists 1-35, and the last chapter header is 'Chapter 36', you know you're missing one!
  • The center pane is the main editor. On the right is a previewer. For purposes of speed, it usually displays only a 'range' of the ebook, relative to the cursor. Sometimes it updates itself, sometimes you have to click it.
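The chapter-numbering check above is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not what the navigator does internally; `missing_chapters` is a hypothetical helper, and it assumes headers of the form 'Chapter N' with plain decimal numbers.

```python
import re


def missing_chapters(headers):
    """Return the chapter numbers that never appear in headers like 'Chapter 12'."""
    numbers = set()
    for text in headers:
        match = re.search(r"Chapter\s+(\d+)", text, re.IGNORECASE)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    # Every number from 1 up to the highest one seen should be present.
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]


# 35 headers were found, but the last one says 'Chapter 36': one is missing.
headers = [f"Chapter {n}" for n in range(1, 37) if n != 17]
print(len(headers), missing_chapters(headers))
# → 35 [17]
```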
Tips, Tricks, and Warnings:
  • Of course you'll figure them out with time, but some basics:
  • This program is optimized for the big screen! But, all tools are resizeable and hideable!
  • The search and replace via regex has no undo option! Save your htmlz first!
  • The auto formatting 'Use class templates' button:
    • It is based on my personal taste in formatting!
    • All supported classes are in the 'choose class' drop down list.
    • Some classes span only a single paragraph: For example, Chapter, Part, Title.
    • Some classes may span multiple paragraphs: For example, a 'Letter', a 'Verse', a 'Quotation'. These should be used as follows, for example: Define the first paragraph as class="Letter". Leave the following paragraphs blank. Define the paragraph FOLLOWING the last paragraph in the letter as class="Regular". This way, the program knows where the letter ends. It also works to define the following paragraph as any other class, such as 'Letter' (a new letter), or Scene....etc, just don't leave it class=""!
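The rule for multi-paragraph classes can be pictured with a short Python sketch. It is an assumption about the logic based on the description above, not the program's real code: `SPANNING` and `resolve_classes` are hypothetical names, and each paragraph is reduced to just its class string ("" meaning blank). A blank paragraph that does not follow an open span is simply left blank here, since the post does not say what happens to it.

```python
# Classes assumed to span several paragraphs (taken from the examples in the post).
SPANNING = {"Letter", "Verse", "Quotation"}


def resolve_classes(classes):
    """Fill in blank classes that belong to a spanning class opened earlier."""
    resolved = []
    current = ""  # the spanning class currently open, if any
    for cls in classes:
        if cls:
            # An explicit class closes any open span, and opens a new one
            # if it is itself a spanning class.
            current = cls if cls in SPANNING else ""
            resolved.append(cls)
        else:
            # A blank paragraph inherits the open span (or stays blank).
            resolved.append(current)
    return resolved


print(resolve_classes(["Chapter", "Letter", "", "", "Regular", ""]))
# → ['Chapter', 'Letter', 'Letter', 'Letter', 'Regular', '']
```

This also shows why the paragraph after the letter must not be left blank: without the explicit "Regular", it would be swallowed into the letter.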
  • The create 'Titlepage and Toc' button:
    • The auto generated titlepage is based on my personal taste in formatting! You can always adjust it after creating it.
    • To create a TOC (Table of Contents), you must first add the 'id' attribute to the relevant points in the ebook.
    • You must format the id's as follows (there are buttons for helping to add them...): 'Epilogue', 'Prologue', 'Quotation#' (for an Epigraph), 'Part#', or 'Chapter#'. Obviously, no 'id' may be used more than once.
    • In the special case that an ebook contains sections in both the form 'Part' and 'Chapter', a special multi-level toc is created. For this, you must create id's for chapters in the form 'Part#Chapter#'
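If you want to double-check your id's before generating the TOC, the rules above can be sketched as a small Python check. This is a hedged illustration, not part of ECleaner: `ID_PATTERN` and `check_ids` are hypothetical names, and it assumes '#' stands for a plain decimal number.

```python
import re
from collections import Counter

# Allowed id forms from the post; '#' is assumed to mean one or more digits.
ID_PATTERN = re.compile(
    r"Prologue|Epilogue|Quotation\d+|Part\d+|Chapter\d+|Part\d+Chapter\d+"
)


def check_ids(ids):
    """Return (malformed, duplicated): ids that break the format, ids used twice."""
    malformed = [i for i in ids if not ID_PATTERN.fullmatch(i)]
    duplicated = sorted(i for i, count in Counter(ids).items() if count > 1)
    return malformed, duplicated


ids = ["Prologue", "Part1", "Part1Chapter1", "Part1Chapter2", "chapter3", "Part1"]
print(check_ids(ids))
# → (['chapter3'], ['Part1'])
```

Note that the match is case-sensitive in this sketch, so 'chapter3' is flagged; whether the program itself is that strict is not stated.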
Saving and Reopening Ebooks:
  • You may save in HTMLZ, EPUB, and MOBI formats.
  • You may reopen any htmlz ebook that was already run through the tidy button, by simply clicking 'Open Htmlz'.
  • You cannot reopen epub or mobi. So save an htmlz backup.
  • NOTE: Epubs are saved with the following settings: Justification is not forced; this is left to the ebook reader/user's discretion. Indents are automatically added where not otherwise specified to be 0. The html file is not split into multiple parts. (This may cause some ebook readers to open the ebook slower...)
  • NOTE: All saves to MOBI or EPUB are first run through Epubcheck. It always pays to check the details box to make sure nothing went wrong when tidying, saving, or opening a file.
Final notes:
  • OK. I'm more than aware these instructions probably won't suffice. I really am not good at this sort of thing. I welcome anyone who wishes to better document this (he/she will get the credits of course). Either way, I welcome questions... see the first paragraph. Some of the buttons have popup tooltips too, I hope to add more when I find the time.
  • So, play around, mess with it. It does work!!! (See the books I've cleaned up on my thread of cleaned up ebooks.)