Old 06-05-2011, 12:41 PM   #1
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
Gui Plugin for Cleaning Ebooks, Fast

Ebook Cleaner

About:
Many ebooks have messy and inconsistent formatting.
  • Big breaks between every paragraph
  • Funky header formatting
  • Randomly capitalized first words of a chapter
  • Strange letter, verse, etc. formatting
  • Bad TOCs
  • Indented first scene
  • Strange chapter titles ('Chap', roman numerals, ...)
  • Broken paragraphs/sentences, missing punctuation...
  • ... the list goes on.

The original HTML/CSS structure may have been messy. Pile on to that the fact that many ebooks undergo conversions, and you are left with an impossible tangle of classes and elements.
  • Example: <div class='calibre143'><div 'blah'><p class='calibre38'><span 'blah'... you get the point
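
As a rough illustration of untangling that kind of markup (a sketch only, not the plugin's actual code): the standard-library parser below drops every class attribute and unwraps any <div> or <span> that has no other attributes left. The names ClassStripper and simplify are made up for this example, and treating attribute-less wrappers as meaningless is an assumption.

```python
from html import escape
from html.parser import HTMLParser


class ClassStripper(HTMLParser):
    """Rebuild HTML without class attributes and without bare div/span wrappers."""

    UNWRAP = ("div", "span")  # assumption: meaningless once they carry no attributes

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # Per-tag stacks remembering whether each open wrapper was emitted.
        self.open = {tag: [] for tag in self.UNWRAP}

    @staticmethod
    def _format(attrs):
        parts = []
        for key, value in attrs:
            if value is None:
                parts.append(" %s" % key)
            else:
                parts.append(' %s="%s"' % (key, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        attrs = [(k, v) for k, v in attrs if k != "class"]
        if tag in self.UNWRAP:
            self.open[tag].append(bool(attrs))
            if not attrs:
                return  # drop the wrapper, keep its children
        self.out.append("<%s%s>" % (tag, self._format(attrs)))

    def handle_startendtag(self, tag, attrs):
        attrs = [(k, v) for k, v in attrs if k != "class"]
        self.out.append("<%s%s/>" % (tag, self._format(attrs)))

    def handle_endtag(self, tag):
        if tag in self.UNWRAP:
            stack = self.open[tag]
            emitted = stack.pop() if stack else False  # stray end tags are dropped
            if not emitted:
                return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def simplify(html_text):
    parser = ClassStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Spans that still carry an inline style survive, so italics and similar formatting set through style attributes are kept.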

Now, you can fix it up using Sigil or Word... but it will take you many long hours. Also, Word's grammar check will flag a lot of grammar the author intended, so you'll spend a lot of time skipping through grammar errors.
The goal of this plugin is to provide tools and methods to significantly shorten the time needed to restore and clean up an ebook.

Version 0.0.6:
Some major improvements in the interface and coding... also reverted back to webkit.
Anyways, I think it is now 'stable', just lacking in features...

Note: For the time being, the plugin only supports HTMLZ, as I don't have time to deal with the idiosyncrasies of the epub format. HOWEVER, it can save to epub format (in the next update, at least). I realize that calibre's HTMLZ doesn't support all tags/CSS. But for the most part (at least in my opinion; feel free to express/explain yours), an ebook that is aesthetically pleasing to the reader tries to avoid overkill in the formatting.

Usage:
  1. Convert the ebook to HTMLZ using the following settings for the HTMLZ Output: How to handle CSS = inline; How to handle class based CSS = inline.
  2. The rest of the tools should be pretty self-explanatory (at least to me, so ask if you feel some of them need clarification).
  3. To edit by hand, or to change the navigation list from styles/patterns/classes, see the settings tab.
  4. Save to HTMLZ. Then convert back to whatever format you want, using calibre.

Plans:
  • save to epub (soon)
  • a spell checker using heuristics to avoid wasting time on names and places created for that book
  • a punctuation checker that finds broken paragraphs/sentences/punctuation (the ones guaranteed to need your attention, not every possible grammar issue)
  • TOC creator
  • Miscellaneous tools, as the need for them comes up in my own cleaning. If you want tools for your cleaning preferences, feel free to contribute a suggestion; I will try fairly hard to incorporate it.
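
To make the punctuation-checker idea concrete, here is a minimal sketch of one possible heuristic (my own guess at an approach, not a planned implementation): a paragraph is suspicious when it does not end in terminal punctuation and the next paragraph starts with a lowercase letter. The function name and the exact punctuation set are assumptions.

```python
import re

# Terminal punctuation, optionally followed by closing quotes or brackets.
TERMINAL = re.compile(r"""[.!?:;…]["'”’)\]]*\s*$""")


def find_broken_paragraphs(paragraphs):
    """Return indexes i where paragraph i looks cut off and paragraph i+1 continues it."""
    suspects = []
    for i in range(len(paragraphs) - 1):
        current = paragraphs[i].strip()
        following = paragraphs[i + 1].strip()
        if not current or not following:
            continue
        ends_open = TERMINAL.search(current) is None
        continues = following[0].islower()
        if ends_open and continues:
            suspects.append(i)
    return suspects
```

Requiring both conditions keeps headings such as chapter titles (no final period, but followed by a capitalized sentence) out of the results.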

Issues:
I'm sure there are a million others... please post them so I can deal with them.
Attached Files
File Type: zip plugin 0.0.6.zip (119.7 KB, 379 views)

Last edited by burbleburble; 07-05-2011 at 12:46 PM. Reason: Updated Plugin to version 0.0.6
Old 06-05-2011, 12:49 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,097
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I lack the time to read your code (given that I've never used Tk), but if you have any specific questions on PyQt, I'll try to help.
Old 06-05-2011, 02:06 PM   #3
paulfiera
Addict
paulfiera could sell banana peel slippers to a Deveel.
Posts: 339
Karma: 3102
Join Date: Dec 2010
Location: EU
Device: iPad, Kobo Glo
Thumbs up

I applaud your efforts and wish someone could come in and help.

Something to clean up bad markup and CSS and to remove embedded fonts would be a terrific feature. Check those epubs generated by Word's export to HTML and see how many font definitions they have on every single page.
Old 06-06-2011, 04:02 AM   #4
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
Well, I took a crash course in Qt (read large portions of its manual), found some great tutorials, and have some idea of what's going on.

I do however have some specific questions to start with:
  1. If you look at the attached picture, you will see how there are gray, ridged areas around words like 'PATTERN07'. Is there a way to create a similar effect (using labels, formatting, or whatever) inside a QTextEdit?
  2. Is there a way to bind them to a click event (so the user can select that pattern occurrence or type)? In tkinter I just tagged areas and bound the tags.
  3. Is there a way to make certain parts of the QTextEdit uneditable by the user (the 'PATTERN07' or 'CHAPTER-TITLE' tags, for example)?
  4. Is there some concept of tags in QTextEdit?
  5. And last, is there maybe a better approach than QTextEdit widget and labeled paragraphs/spans; either easier to program or more intuitive for a user?

Thank you for your help!
Attached Thumbnails: program.JPG (166.6 KB)

Last edited by burbleburble; 06-06-2011 at 04:09 AM.
Old 06-06-2011, 08:18 AM   #5
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,097
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I would suggest using QWebView instead, which renders HTML5 using the WebKit engine. You can set the contentEditable property in the HTML to make some fields editable. And QWebView will emit signals when a user clicks on HTML links. So you can essentially make a web app inside Qt.
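
A rough sketch of how the link idea could carry the pattern labels (an illustration under assumptions, not code from the plugin): each label is rendered as an HTML link, and whatever handler receives the clicked URL decodes it back into a label. The "label:" scheme and both function names are invented for this example; the Qt wiring itself is left out, and the handler would pass the string form of the clicked URL to parse_label_url.

```python
from html import escape
from urllib.parse import urlsplit

LABEL_SCHEME = "label"  # hypothetical URL scheme, chosen only for this sketch


def label_link(kind, number):
    """Render a label such as PATTERN07 as a clickable link."""
    text = "%s%02d" % (kind, number)
    href = "%s:%s/%d" % (LABEL_SCHEME, kind, number)
    return '<a class="label" href="%s">%s</a>' % (escape(href, quote=True), escape(text))


def parse_label_url(url):
    """Return (kind, number) for a label link, or None for any other URL."""
    parts = urlsplit(url)
    if parts.scheme != LABEL_SCHEME:
        return None
    kind, _, number = parts.path.partition("/")
    if not kind or not number.isdigit():
        return None
    return kind, int(number)
```

Keeping the encoding and decoding in plain functions means the click handling can be tried out without a running GUI.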
Old 06-06-2011, 08:42 AM   #6
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
OK, Thank You!!!

I looked at the QWebView and PyQt4 examples, and it seems to fit my needs perfectly. So, within a week or two, I hope to have a test version of the plugin ready for development and feedback.
Old 06-06-2011, 09:27 AM   #7
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
I started working with the QWebView, and I can't figure out how to make only portions of the HTML editable.

If this isn't possible, is there a way to guarantee that an entire word is deleted (treated as one entity?), not just one letter; for example: given a label 'PATTERN01', the user shouldn't be able to delete part of it, rather the whole thing or none of it?
Old 06-06-2011, 10:28 AM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,097
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
http://blog.whatwg.org/the-road-to-h...ontenteditable

Simply set it on the element you want to be editable.

Demo: http://html5demos.com/contenteditable
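
For the 'PATTERN01' case asked about earlier in the thread, a minimal sketch of the markup this implies: the paragraph is editable, while the label sits in a nested element with contenteditable="false". The helper name and the "label" class are assumptions, and whether a given WebKit build then deletes the label as a single unit is something to verify rather than take for granted.

```python
from html import escape


def editable_paragraph(chunks):
    """Build one editable paragraph from (text, is_label) pairs.

    Labels are wrapped in a span marked contenteditable="false".
    """
    parts = []
    for text, is_label in chunks:
        if is_label:
            parts.append(
                '<span class="label" contenteditable="false">%s</span>' % escape(text)
            )
        else:
            parts.append(escape(text))
    return '<p contenteditable="true">%s</p>' % "".join(parts)
```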
Old 06-14-2011, 11:12 AM   #9
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
Question 1) I tried building the plugin files as per the instructions in the calibre manual.

Yet, every time I try adding it (even trying to add the demo plugin in the manual) I get an error:

Code:
'module' object has no attribute 'InterfacePlugin'
I am using the latest version of calibre portable.

Question 2) How do I (a) let the user run the plugin on a book, (b) check that the book is an epub, and (c) retrieve the HTML from the epub?

Thank you for any help!

Last edited by burbleburble; 06-14-2011 at 11:17 AM.
Old 06-14-2011, 11:24 AM   #10
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,097
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
1) Post the whole error message you are getting. What calibre version are you on?

2) Look at the tweak epub plugin for how to do these things (actions/tweak_epub.py in the calibre source code)
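
Independently of how calibre's own code does it (this sketch does not use calibre's APIs and is not taken from tweak_epub.py), parts (b) and (c) can be approximated with the standard library, since an epub is a zip archive whose "mimetype" entry reads application/epub+zip. The function names are made up here, UTF-8 content is assumed, and a real tool would take the reading order from the OPF file rather than from the archive listing.

```python
import zipfile

HTML_EXTENSIONS = (".html", ".htm", ".xhtml")


def looks_like_epub(source):
    """Return True if source (a path or file-like object) is a zip with an epub mimetype entry."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as archive:
        try:
            mimetype = archive.read("mimetype")
        except KeyError:
            return False
    return mimetype.strip() == b"application/epub+zip"


def read_html_files(source):
    """Return {archive name: text} for every HTML/XHTML file in the archive."""
    with zipfile.ZipFile(source) as archive:
        return {
            name: archive.read(name).decode("utf-8")  # assumption: UTF-8 content
            for name in archive.namelist()
            if name.lower().endswith(HTML_EXTENSIONS)
        }
```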
Old 06-14-2011, 11:29 AM   #11
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
Thank you for responding so quickly.

I am using calibre version 0.8.4, portable build.

The error is exactly:

FrameTitle:
ERROR: Unhandled Exception

Message:
AttributeError:'module' object has no attribute 'InterfacePlugin'

Details:
Traceback (most recent call last):
File "site-packages\calibre\gui2\preferences\plugins.py", line 294, in add_plugin
File "site-packages\calibre\gui2\preferences\plugins.py", line 369, in check_for_add_to_toolbars
File "site-packages\calibre\customize\__init__.py", line 539, in load_actual_plugin
AttributeError: 'module' object has no attribute 'InterfacePlugin'
Old 06-14-2011, 11:51 AM   #12
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 26,097
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
There was a bug in calibre portable 0.8.4 with regard to plugins; upgrade to 0.8.5.
Old 06-14-2011, 12:08 PM   #13
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
Thank you Kovid.


To people in general:
(a) Does anyone have any ideas of how to make an intuitive, easy-to-use CSS property selector (not just a clutter of 100 comboboxes for every CSS property)?

(b) Is it worth supporting every CSS property value? For example, 'font-weight' has value=bolder, value=900... (does anyone formatting their ebook care that much?!). Should I just stick to the common values of 'bold' and 'normal'?

(c) I attached two pictures of the plugin's current appearance below. I explained the basic idea of it in the first post; sorry, I am not a clear explainer... but still, does anyone see any changes I should make before going too much further: changes to make it more intuitive, or to make it easier to implement some function you see as being important to a good, fast ebook cleaner?

Thank you for any feedback
Attached Thumbnails: view01.JPG (97.8 KB), view02.JPG (91.6 KB)
Old 06-14-2011, 02:02 PM   #14
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
Posts: 14,831
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by burbleburble
Thank you Kovid.


To people in general:
(a) Does anyone have any ideas of how to make an intuitive, easy-to-use CSS property selector (not just a clutter of 100 comboboxes for every CSS property)?

(b) Is it worth supporting every CSS property value? For example, 'font-weight' has value=bolder, value=900... (does anyone formatting their ebook care that much?!). Should I just stick to the common values of 'bold' and 'normal'?

(c) I attached two pictures of the plugin's current appearance below. I explained the basic idea of it in the first post; sorry, I am not a clear explainer... but still, does anyone see any changes I should make before going too much further: changes to make it more intuitive, or to make it easier to implement some function you see as being important to a good, fast ebook cleaner?

Thank you for any feedback
I would use the numeric values because of the finer granularity.
Is there any reason not to include the 'friendly name' next to the number (in the display only) for the old-timers?
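
One possible reading of this suggestion, sketched under assumptions: the selector stores the numeric font-weight and only the displayed text carries a friendly name. CSS defines 'normal' as 400 and 'bold' as 700, so only those two names are used here; the function name is invented for the example.

```python
# Keyword equivalents defined by CSS: normal = 400, bold = 700.
FRIENDLY_NAMES = {400: "normal", 700: "bold"}


def weight_choices():
    """Return (stored value, display text) pairs for a font-weight selector."""
    choices = []
    for weight in range(100, 1000, 100):
        name = FRIENDLY_NAMES.get(weight)
        display = "%d (%s)" % (weight, name) if name else str(weight)
        choices.append((weight, display))
    return choices
```

A combobox built from these pairs would show "700 (bold)" while writing font-weight: 700 to the book.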
Old 06-15-2011, 03:01 AM   #15
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
Thanks for your response theducks.

Please forgive my ignorance of your terminology, but what exactly are you referring to when you say
Quote:
'friendly name' next to the number (in the display only)
?

Last edited by burbleburble; 06-16-2011 at 02:55 AM.