06-28-2011, 12:21 PM | #46 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
open will not create directories for you; you have to use os.makedirs first.
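A minimal sketch of that advice, assuming the plugin writes to a path whose parent folders may not exist yet. The helper name open_for_writing is made up for illustration and is not part of calibre or the plugin:

```python
import os


def open_for_writing(path, mode="w"):
    """Open `path` for writing, creating missing parent directories first.

    Hypothetical helper for illustration only.
    """
    parent = os.path.dirname(path)
    # os.makedirs raises if the directory already exists, so check first.
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    return open(path, mode)
```

Called as open_for_writing(os.path.join("output", "cleaned", "index.html")), it would create output/cleaned on the first run and simply reuse it afterwards.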
|
06-29-2011, 06:28 AM | #47 |
Connoisseur
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
|
@Kovid
Thanks. I solved the problem by checking for and creating the missing directory.
Another question: for some reason, a lot of things that work with my own installation of Python/PyQt/lxml don't work in calibre (v0.8.6). I keep coming across the following when running the plugin in calibre:
Code:
Traceback (most recent call last):
  File "calibre_plugins.ebook_cleaner.main", line 1479, in slotCleanAndOpenEpub
  File "calibre_plugins.ebook_cleaner.main", line 513, in clean
  File "lxml.etree.pyx", line 2762, in lxml.etree.fromstringlist (src/lxml/lxml.etree.c:52933)
  File "parser.pxi", line 1134, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:76722)
  File "parser.pxi", line 556, in lxml.etree._ParserContext._handleParseResult (src/lxml/lxml.etree.c:71680)
  File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
  File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: Char 0x0 out of allowed range, line 2, column 1
How can I solve this problem? |
06-29-2011, 10:20 AM | #48 |
creator of calibre
|
You've got null bytes in your strings. Strip them before parsing: stringvar.replace('\0', '')
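A rough sketch of that fix applied to a list of chunks, since the traceback comes from fromstringlist. The function name and the sample data are invented for illustration; only str.replace is taken from the advice above:

```python
def strip_null_bytes(chunks):
    """Return a copy of `chunks` with every NUL character removed."""
    return [chunk.replace("\0", "") for chunk in chunks]


# Made-up sample input with a NUL at line 2, column 1, as in the error message.
chunks = ['<?xml version="1.0"?>\n', "\0<html><body>", "<p>text\0</p>", "</body></html>"]
cleaned = strip_null_bytes(chunks)

# The cleaned chunks can be handed to the parser as a list or joined first.
markup = "".join(cleaned)
```

This only removes the offending characters. If they keep turning up, it may be worth checking how the files are being read and decoded in the first place.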
|
06-29-2011, 11:54 AM | #49 |
Connoisseur
|
@Kovid
That solved one issue, but then it found another. So I just did ''.join(list) first, then parsed from a string instead of a list. For some strange reason it no longer has a problem, even without replacing null bytes, but it is rather time-consuming to perform this operation first. Oh well. Still, is calibre's version of lxml not up to date? Mine works fine parsing from a list!
Another question: I'm having trouble saving a page from webkit. I tried both mainFrame().toHtml() and documentElement().toOuterXml(), and either way it won't save valid xhtml. It always leaves out the '/' on single tag elements (like 'img', 'br', 'meta'). (Is it even valid in an epub?) This generates serious problems when trying to parse it again with lxml. So, do you know of a way around this issue?
Thanks for all the help. |
06-29-2011, 12:01 PM | #50 |
creator of calibre
|
calibre-debug -c "from lxml import etree; print etree.LXML_VERSION"
I've never tried saving from webkit, so I don't have any advice for you on that. |
06-29-2011, 12:04 PM | #51 |
creator of calibre
|
|
06-30-2011, 11:01 AM | #52 |
Connoisseur
|
@Kovid: Thanks. I guess I'll just have to parse webkit's output with lxml.html, and resave as xhtml.
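For anyone following along, a minimal sketch of that workaround, assuming lxml is importable. The function name and the sample markup are invented, and the snippet has not been run inside calibre's bundled Python:

```python
from lxml import etree, html


def html_to_xml_string(markup):
    """Parse lenient HTML and serialize it with XML rules (e.g. <br/>)."""
    root = html.fromstring(markup)
    # method="xml" makes empty elements self-closing.
    return etree.tostring(root, method="xml", encoding="unicode")


# Made-up stand-in for what webkit hands back: void tags without the '/'.
webkit_output = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><p>one<br>two<img src="a.png"></p></body></html>'
)
xhtml = html_to_xml_string(webkit_output)

# The result should now be acceptable to the strict XML parser.
reparsed = etree.fromstring(xhtml)
```

Note that this only fixes the tag syntax. It does not add the XHTML namespace or a doctype, so those would still need to be handled separately if the epub requires them.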
@Anyone: I've been working on a better structure view, and a better way of editing it. I've come up with a method, and I think it's fairly simple; however, the javascript for implementing all its facets is rather frustrating to write. So I have written rudimentary code for it and attached a test version below. I really need some suggestions about the following issues in this test version:
|
06-30-2011, 11:29 AM | #53 |
creator of calibre
|
As per the bug report, if you use setContent with an xhtml mimetype, it should generate valid xhtml on output. IIRC, the calibre viewer uses setContent, not setHtml, when viewing EPUB.
|
06-30-2011, 11:33 AM | #54 |
Connoisseur
|
@Kovid
Once again, thank you. I didn't understand that that's what it meant when I read it. |
07-04-2011, 12:46 PM | #55 |
Connoisseur
|
Updated to version 0.0.5
Due to a lack of user feedback (interest?) when I posted requests for suggestions in certain areas, I have reverted to designing this plugin as per my own needs; I have no interest in brainstorming how to develop a fully featured plugin, as per everyone's needs, if the user community won't participate. However, I am more than happy to incorporate ideas and modify this plugin, if given some concrete, well-defined proposal. I don't mind helping out and making it intuitive, accessible, and useful for others - YOU JUST HAVE TO EXPRESS WHAT YOU WOULD FIND INTUITIVE, ACCESSIBLE, AND USEFUL!
The new plugin, therefore, utilizes simple html with syntax highlighting, though it comes with a host of tools to help automate as much of the cleaning process as possible. |
07-04-2011, 05:21 PM | #56 |
Night Reader
Posts: 127
Karma: 4314
Join Date: Oct 2010
Location: Rocky Mountains (US)
Device: Sony PRS-650
|
I'm very interested but am not even close to being a programmer, so have been following this thread to see if your end-product might help me.
I'd like to see what would seem to me to be an almost magical ability to take the calibre conversions I've made from pdf to epub and easily eliminate the page headers and footers that end up mixed in with the text. I managed to do it once, but it took me so long to study the various "expressions" and figure out how to input the various search parameters (which I've since forgotten) that I haven't done it again.
This may be well outside what you are trying to accomplish, but as one who has trouble remembering all about "expressions" for the search/replace function (which is really cool), I have been following this thread hoping your project might include some sort of easy interface for input of the appropriate expressions. I'd guess many other non-technical people would like additional interface assistance with expressions.
In any case, I'd encourage you to keep on truckin' -- even if this is NOT where you were headed with this -- because many of us who have NO programming expertise are looking for various interfaces that accomplish what is apparently so easy for programmers but so befuddling for the rest of us.
Last edited by Under the Covers; 07-04-2011 at 05:30 PM. Reason: clarification |
07-05-2011, 02:43 AM | #57 |
Connoisseur
|
@Under the Covers
Your feature sounds like a sensible addition. To save me the need to brainstorm the various ways such footers and headers might appear and be identified, please can you list several examples of the following:
To address your need for it to be intuitive for non-programmers (and even for programmers, to avoid the need to write complex expressions), I think I will make it attempt to automatically match general cases, then provide a list of matches where the user can choose which to replace/remove. Sound good? I can't know for certain when I will implement it, but I will try within the next several weeks.
@Kovid, anyone.
Last edited by burbleburble; 07-05-2011 at 06:24 AM. |
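To make the idea in post #57 concrete, here is a rough sketch of "match general cases automatically, then let the user choose". It is not the plugin's code. The function names, the thresholds, and the assumption that headers and footers show up as short lines repeating throughout the text (apart from page numbers) are all guesses made for illustration:

```python
import re
from collections import Counter


def normalize(line):
    """Collapse whitespace and replace digit runs so 'Page 12' matches 'Page 13'."""
    return re.sub(r"\d+", "#", " ".join(line.split()))


def find_candidates(lines, min_repeats=3, max_length=80):
    """Return (pattern, count) pairs for short lines that repeat often."""
    counts = Counter(
        normalize(line)
        for line in lines
        if line.strip() and len(line.strip()) <= max_length
    )
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_repeats]


def remove_matches(lines, chosen_patterns):
    """Drop every line whose normalized form the user chose to remove."""
    chosen = set(chosen_patterns)
    return [line for line in lines if normalize(line) not in chosen]
```

The output of find_candidates would be shown to the user as a checklist, and only the patterns they tick would be passed to remove_matches, so no expressions need to be typed by hand. Body text that genuinely repeats would also show up as a candidate, which is why the user confirmation step matters.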
07-05-2011, 11:31 AM | #58 |
creator of calibre
|
google javascript regex and use QUrl.fromLocalFile
|
07-05-2011, 12:48 PM | #59 |
Connoisseur
|
Updated to 0.0.6:
- reverted back to webkit
- major improvements in interface and coding
- stable, if lacking in features
@Kovid Thanks. QUrl.fromLocalFile worked great. |
07-05-2011, 07:47 PM | #60 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Hi burble,
I tried to use your utility but I have to admit to being unclear how to achieve the clean-up. I was able to create the initial htmlz file, load it in the plugin and produce the patterns, but I couldn't figure out how to proceed.
I have attached 2 htmlz files. The input one is a tiny extract (to avoid copyright problems) from a mobi-to-htmlz conversion. The output one is what I ideally would have liked as the cleaned-up simplified version. I would then be able to add my own standard external css file to match the tags (<h2>, <h3>, <p>, <i>) and classes ("ctr", "noind", "txt") in the cleaned-up index.html file.
Please could you tell me whether this is achievable with the current plugin, or even whether I could get somewhere close if I knew what I was doing?
Last edited by jackie_w; 07-05-2011 at 07:55 PM. |