
 MobileRead Forums Gui Plugin for Cleaning Ebooks, Fast

 06-22-2011, 04:02 PM #31 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 Thank you Kovid. Yet another question (you're quite good about answering, thank you): 1) Where do I look for an example of adding an ebook (epub) to the library programmatically? 2) Where do I look for an example of updating the database/library regarding an ebook that I changed (its internal html/manifest/etc.)? Does anything need an update?
 06-22-2011, 04:14 PM #32 kovidgoyal creator of calibre     Posts: 31,553 Karma: 8685410 Join Date: Oct 2006 Location: Mumbai, India Device: Various See tweak epub. Note that there have been some recent changes to that, so use the latest source.
 06-23-2011, 12:54 PM #33 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 Updated to v0.0.2 -uses tempDir -hidesDebugger -General usage improvements -many general code improvements/bug fixes
 06-24-2011, 02:42 AM #34 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 Calibre (because of its pyqt/python version?) doesn't automatically convert javascript-returned QString/QVariant objects to python objects, while my development environment (pyscripter, latest python and pyqt) does. This leads to some problems... 1) How do I determine programmatically whether the plugin is being run by calibre or being run externally?
 06-24-2011, 11:59 AM #35 kovidgoyal creator of calibre     Posts: 31,553 Karma: 8685410 Join Date: Oct 2006 Location: Mumbai, India Device: Various
Code:
try:
    import calibre
except ImportError:
    pass
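One way to use this check is to wrap it in a small helper that returns a flag the rest of the plugin can branch on. This is only a sketch: the function name `running_inside_calibre` is made up for illustration and is not part of calibre. It assumes the `calibre` package cannot be imported when the script runs outside calibre.

```python
def running_inside_calibre():
    """Return True if the calibre package can be imported, else False.

    Hypothetical helper name; it only wraps the try/except import check.
    """
    try:
        import calibre  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if running_inside_calibre():
        print("running under calibre")
    else:
        print("running externally")
```

The flag could then decide, for example, whether QString/QVariant values need to be converted by hand.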
 06-25-2011, 11:22 AM #36 drMerry Addict     Posts: 293 Karma: 21022 Join Date: Mar 2011 Location: NL Device: Sony PRS-650 @burbleburble This seems to be becoming a nice plugin. Looking at the current state of the plugin, shouldn't it be in the dev forum at this moment? It doesn't seem to be 'safe to use' yet.
 06-26-2011, 12:20 PM #37 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 Updated to v0.0.3 -major cleanup of gui interface -many bug fixes -large recoding -support for lists -tool to run javascript/jquery directly on the epub @kovid Thanks. The idea worked fine. @drMerry I checked. The 'Development' section is defined as 'All about hacking on calibre. Discuss calibre's internal structures and API', which seems to refer to calibre itself (?), while plugins is for developing plugins. Anyways, while limited in scope, I don't see why it isn't 'safe to use' yet, especially with the most recent update. (You can't delete or overwrite your old epub by accident. There ain't no viruses. It utilizes the temp directory and cleans up after. If you save to html, it opens the folder, so you know exactly where the file is being left. Why isn't it safe?) Last edited by burbleburble; 06-26-2011 at 12:28 PM.
 06-27-2011, 10:19 AM #38 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 Below is a piece of javascript. It is designed to wrap groups of elements, each group starting where an element with class="Pattern31" occurs. This part works fine. But I am tearing my hair out trying to figure out why it rearranges the elements it is wrapping; specifically, it moves spans (including their text) from inside a paragraph to somewhere else outside it.
Code:
$patterns = $(".Pattern31")
for (i = 0; i < $patterns.size(); i++) {
    $all = $('*', document.body)
    index = $all.index($patterns[i])
    if (i < $patterns.size() + 1) {
        $all.slice(index, $all.index($patterns[i+1])).wrapAll(' ')
    }
}
true
I'd really appreciate it if someone could point out my coding mistake, or suggest a different approach.
Edit: (I don't know how to delete a solved question...) I found a workaround at stackoverflow. The workaround, for some reason, works great, even though it uses the same wrapAll... (it makes for cleaner, more concise code, too)
Code:
$(".Pattern33").each(function(){
    $(this).add($(this).nextUntil(".Pattern31")).wrapAll(" ")
})
Credits/Thanks: Scharrels at stackoverflow Last edited by burbleburble; 06-27-2011 at 10:46 AM.
 06-27-2011, 02:20 PM #39 Ortep Fanatic   Posts: 522 Karma: 470 Join Date: Sep 2007 Location: The Netherlands Device: Kindle PW, Kindle 3 I feel like an idiot, but how do you use it? I select an ebook/epub and then I click the button 'ebook' cleaner. Then the gui opens. But empty... when I try to open an ePub I get an error. (Different ones.) An example:
Code:
calibre, version 0.8.7
ERROR: Unhandled exception: KeyError:u'MsoNormal'

Traceback (most recent call last):
  File "calibre_plugins.ebook_cleaner.main", line 1296, in slotOpenEpub
  File "calibre_plugins.ebook_cleaner.main", line 662, in feed
  File "lxml.etree.pyx", line 2942, in lxml.etree.parse (src/lxml/lxml.etree.c:54187)
  File "parser.pxi", line 1528, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:79485)
  File "parser.pxi", line 1557, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:79768)
  File "parser.pxi", line 1457, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:78843)
  File "parser.pxi", line 997, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:75698)
  File "parsertarget.pxi", line 147, in lxml.etree._TargetParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:86163)
  File "lxml.etree.pyx", line 282, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:7972)
  File "saxparser.pxi", line 170, in lxml.etree._handleSaxStart (src/lxml/lxml.etree.c:81243)
  File "parsertarget.pxi", line 75, in lxml.etree._PythonSaxParserTarget._handleSaxStart (src/lxml/lxml.etree.c:85326)
  File "calibre_plugins.ebook_cleaner.main", line 736, in start
KeyError: u'MsoNormal'
Last edited by Ortep; 06-27-2011 at 02:23 PM.
06-27-2011, 03:34 PM   #40
burbleburble
Connoisseur

Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
@Ortep

I appreciate your interest in the plugin.

As for your question: from the traceback you supplied, I think the cause of the problem has to do with reading css from the epub.

The epubreader\parser\cleaner for the gui currently looks for css values in the stylesheet from the link element and in the style element. Even these I haven't thoroughly tested for relative path resolution. In particular, there is still an issue that it does not resolve all css selectors (only .* and not *.*).

It would appear from the error (the line in the source coupled with the error type):
Quote:
 File "calibre_plugins.ebook_cleaner.main", line 736, in start KeyError: u'MsoNormal'
that the epub was not produced by, or run through, calibre's conversion process. (I don't think so at least.)

Anyways, without seeing how the MsoNormal css is referenced, it is hard for me to know how to fix it right now. (Is it something produced by MS Word html? If so, I can easily look into it.)

So, currently, in this test version, the best solution would be to use an epub produced by calibre. The 'different ones' errors, since I don't know them, may have been fixed in v0.0.3 if you were using an earlier version.

In about a week or two, I hope to release the first 'stable' version with a much improved epub reader/writer (I've actually written most of it already) that should deal with such issues. The delay is because I've been working on a faster system/intuitive approach for reformatting the patterns; just worked out the basics so far.

Last edited by burbleburble; 06-27-2011 at 03:39 PM.
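The KeyError: u'MsoNormal' discussed above suggests a direct dictionary lookup on a class name that the stylesheet reader never registered. As a rough illustration only (this is not the plugin's actual parser, and the names `index_css_by_class` and `styles_for` are hypothetical), a lookup could index rules by class name whether the selector is written `.MsoNormal` or `p.MsoNormal`, and fall back to an empty style for unknown classes instead of raising:

```python
import re

# Very simplified CSS handling: no comments, @media blocks, or nested braces.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
CLASS_RE = re.compile(r"\.([A-Za-z_][\w-]*)")


def index_css_by_class(css_text):
    """Map each class name found in a selector to its declarations."""
    index = {}
    for selectors, body in RULE_RE.findall(css_text):
        declarations = {}
        for decl in body.split(";"):
            if ":" in decl:
                name, value = decl.split(":", 1)
                declarations[name.strip()] = value.strip()
        # Both ".cls" and "tag.cls" selectors yield the class name "cls".
        for class_name in CLASS_RE.findall(selectors):
            index.setdefault(class_name, {}).update(declarations)
    return index


def styles_for(index, class_name):
    """Return the declarations for a class, or an empty dict if unknown."""
    return index.get(class_name, {})


css = "p.MsoNormal { margin: 0; font-size: 12pt } .calibre1 { color: red }"
index = index_css_by_class(css)
print(styles_for(index, "MsoNormal"))
print(styles_for(index, "NoSuchClass"))
```

Using `dict.get` with a default means an epub that references an undeclared class degrades to "no known style" rather than aborting the whole open operation.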

06-27-2011, 04:28 PM   #41
Ortep
Fanatic

Posts: 522
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle PW, Kindle 3
Quote:
 Originally Posted by burbleburble @Ortep I appreciate your interest in the plugin.
My pleasure. Anything that can improve ePubs is great.

Quote:
 that the epub was not produced/run through calibre conversion process. (I don't think so at least.)
That did the trick. I tested it at first on a couple of ePubs I found 'somewhere'. All of them failed, but as soon as I reconverted them with Calibre ePub->ePub I was able to use the plugin.
Thanks for your work, and don't bother to write an ePub writer for me, I have a Kindle.

 06-28-2011, 04:26 AM #42 Calibrefan Enthusiast   Posts: 49 Karma: 12 Join Date: Feb 2011 Device: Kobo Aura, Sony PRS-350 and PRS-T1 @burbleburble I've tried to use your plugin to clean a Calibre-converted epub, which has a lot of page lines that I would like to eliminate, e.g. ----------------------- Page 5----------------------- I can't seem to manage it though. Could you please point me in the right direction on how to do this? Thanks very much!
 06-28-2011, 04:43 AM #43 Dopedangel Wizard     Posts: 1,606 Karma: 28506885 Join Date: Dec 2006 Location: Singapore Device: Coolreader(Nexus 5)\Coolreader(Nook Touch) I was wondering: aren't your plugin and this plugin doing almost the same thing? https://www.mobileread.com/forums/sho...d.php?t=134249 In my opinion, combining them into one plugin would be better than using two different plugins to clean epubs.
 06-28-2011, 05:48 AM #44 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 @Calibrefan The short: Basically, the answer to your question is that this is still a test version for feedback (thank you for it!), and it can't handle such actions yet. The long: Amongst the many plans, I do plan on implementing (early on) several methods of search/replace, whether by class / between____and_____ / regex / lineThatStartsWith (I'll figure out the exact breakdown later). For your issue, I would probably create a search for 'lineThatStartsWith' = '------------', provide a list of results (so you can exclude any matches that are not page breaks for some reason), then replace/remove. However, currently, the plugin only does a basic reformatting/restructuring of the epub based on pattern matching heuristics. @Dopedangel Thanks for the pointer. I took a look at the thread you mentioned, and yes, I do plan on including, with kiwidude's permission, a set of epub fixes/checks based on his plugins. However, these two plugins aren't quite the same, in that mine currently focuses more on the viewable content of the epub and kiwidude's on the epub's ocf and opf (i.e. 'unviewable' content used by the epub reading platform to manipulate/read an epub, which when messed up can cause problems).
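The planned 'lineThatStartsWith' search could look roughly like the sketch below. It is only an illustration of the find, review, then remove workflow described above, not the plugin's code: the function names and the exact pattern are assumptions, and the pattern is tuned to lines shaped like the "Page 5" example quoted earlier.

```python
import re

# Assumed shape of a page line: a run of dashes, "Page <number>", more dashes.
PAGE_LINE_RE = re.compile(r"^\s*-{5,}\s*Page\s+\d+\s*-{5,}\s*$")


def find_page_lines(lines):
    """Return (index, text) pairs for lines that look like page markers."""
    return [(i, line) for i, line in enumerate(lines) if PAGE_LINE_RE.match(line)]


def remove_lines(lines, indexes):
    """Return a copy of lines without the entries at the given indexes."""
    drop = set(indexes)
    return [line for i, line in enumerate(lines) if i not in drop]


text_lines = [
    "Chapter 1",
    "----------------------- Page 5-----------------------",
    "It was a dark and stormy night.",
    "-----",
]
candidates = find_page_lines(text_lines)
# A real tool would show `candidates` to the user so false matches can be unticked.
approved = [index for index, _ in candidates]
cleaned = remove_lines(text_lines, approved)
print(cleaned)
```

Keeping the search and the removal as separate steps leaves room for the review list in between.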
 06-28-2011, 11:24 AM #45 burbleburble Connoisseur   Posts: 52 Karma: 38 Join Date: Jun 2011 Device: Kindle 3 @Kovid, anyone... I keep running into the following error when trying to save a cleaned version of the epub's html:
Code:
Traceback (most recent call last):
  File "D:\ECleaner\Plugin\main.py", line 1339, in slotCleanAndOpenEpub
    cleaner.clean(epub)
  File "D:\ECleaner\Plugin\main.py", line 472, in clean
    with codecs.open(path, 'w', 'utf-8') as xhtml_xml_file:
  File "D:\Software\Portable\Python 3.2.0.1\App\lib\codecs.py", line 884, in open
    file = builtins.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'c:\\docume~1\\ref21\\locals~1\\temp\\tmpkt51bwEbookCleaner\\CLEANED\\text.html'
I tried both codecs.open() and open(), and I am using 'w' mode. So why does it refuse to create the file? (By the way, 'tmpkt51bwEbookCleaner' does exist, only 'CLEANED\\text.html' doesn't)
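A likely explanation, given that only 'CLEANED\\text.html' is missing: opening a file in 'w' mode creates the file itself but not any missing parent directories, so the open fails if the CLEANED folder has not been created yet. A minimal sketch of one way around this, assuming Python 3.2 or later (for `exist_ok`) and using a hypothetical helper name:

```python
import os
import tempfile


def write_text_creating_dirs(path, text, encoding="utf-8"):
    """Write text to path, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        # 'w' mode will not create the CLEANED folder, so make it explicitly.
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def demo():
    """Write into a not-yet-existing subfolder of a temp dir and read it back."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "CLEANED", "text.html")
        write_text_creating_dirs(target, "<html></html>")
        with open(target, "r", encoding="utf-8") as handle:
            return handle.read()


if __name__ == "__main__":
    print(demo())
```

On older Python versions without `exist_ok`, checking `os.path.isdir(parent)` before calling `os.makedirs(parent)` gives the same effect.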