Old 06-28-2011, 12:21 PM   #46
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,767
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
open() will not create directories for you; you have to call os.makedirs() first.
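A minimal sketch of that advice, assuming the plugin writes into a nested folder that may not exist yet. The helper name `open_for_writing` and the temporary target path are made up for illustration; they are not from the plugin.

```python
import os
import tempfile

def open_for_writing(path):
    """Create the parent directory of path if it is missing, then open path for writing."""
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)  # creates every missing intermediate directory
    return open(path, 'w')

# Hypothetical target: two directory levels that do not exist yet.
target = os.path.join(tempfile.mkdtemp(), 'cleaned', 'chapter1', 'index.html')
with open_for_writing(target) as f:
    f.write('<html></html>')
```

Calling open() on `target` directly would fail here, because the `cleaned/chapter1` folders are missing until os.makedirs() creates them.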
Old 06-29-2011, 06:28 AM   #47
burbleburble
Connoisseur
burbleburble began at the beginning.
 
Posts: 52
Karma: 38
Join Date: Jun 2011
Device: Kindle 3
@Kovid
Thanks. Solved the problem by checking/creating missing directory.


Another question: For some reason, a lot of things that work with my installation of Python/PyQt/lxml don't work in calibre (v0.8.6). I keep coming across the following when running the plugin in calibre:
Code:
Traceback (most recent call last):
  File "calibre_plugins.ebook_cleaner.main", line 1479, in slotCleanAndOpenEpub
  File "calibre_plugins.ebook_cleaner.main", line 513, in clean
  File "lxml.etree.pyx", line 2762, in lxml.etree.fromstringlist (src/lxml/lxml.etree.c:52933)
  File "parser.pxi", line 1134, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:76722)
  File "parser.pxi", line 556, in lxml.etree._ParserContext._handleParseResult (src/lxml/lxml.etree.c:71680)
  File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
  File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: Char 0x0 out of allowed range, line 2, column 1
Here the list passed to etree.fromstringlist() is a perfectly normal list of strings. The first three are '<html>', '<head>', and '<meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/>'; these first few strings are written in the plugin itself, not read from somewhere else. I'm guessing 'line 2' refers to the third one?

How can I solve this problem?
Old 06-29-2011, 10:20 AM   #48
kovidgoyal
creator of calibre
You've got null bytes in your strings. Strip them with stringvar.replace('\0', '').
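A small sketch of applying that replacement to every string in the list before parsing. To keep it runnable without third-party packages it uses the standard library's xml.etree.ElementTree.fromstringlist as a stand-in for lxml's etree.fromstringlist; the helper name `strip_nulls` and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET

def strip_nulls(chunks):
    """Return a copy of chunks with every NUL character removed."""
    return [chunk.replace('\0', '') for chunk in chunks]

# Hypothetical input: one chunk carries a stray NUL character.
chunks = ['<html>', '<head>', '<title>dem\0o</title>', '</head>', '</html>']

root = ET.fromstringlist(strip_nulls(chunks))
title = root.find('head/title').text
```

The cleaning step is independent of the parser, so the same `strip_nulls(chunks)` result could be handed to lxml instead.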
Old 06-29-2011, 11:54 AM   #49
burbleburble
Connoisseur
@Kovid
That solved one issue. But then it found another. So I just did ''.join(list) first, then parsed from a string instead of a list. For some strange reason it no longer has a problem, even without replacing null bytes.

But it is rather time-consuming to perform this operation first. Oh well. Still, is calibre's version of lxml not up to date? Mine works fine parsing from a list!

Another question: I'm having trouble saving a page from webkit. I tried both mainFrame().toHtml() and documentElement.toOuterXml(), and either way it won't save valid XHTML. It always leaves out the '/' on single-tag elements (like 'img', 'br', 'meta'). (Is that even valid in an EPUB?) This causes serious problems when trying to parse it again with lxml. So, do you know of a way around this issue?

Thanks for all the help
Old 06-29-2011, 12:01 PM   #50
kovidgoyal
creator of calibre
calibre-debug -c "from lxml import etree; print etree.LXML_VERSION"

I've never tried saving from webkit, so I don't have any advice for you on that.
Old 06-29-2011, 12:04 PM   #51
kovidgoyal
creator of calibre
http://bugreports.qt.nokia.com/browse/QTBUG-2787
Old 06-30-2011, 11:01 AM   #52
burbleburble
Connoisseur
@Kovid: Thanks. I guess I'll just have to parse webkit's output with lxml.html, and resave as xhtml.
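To illustrate the idea of re-reading lenient HTML and re-saving it with self-closed single-tag elements, here is a rough sketch. It is not the lxml.html route mentioned above: it swaps in the standard library's html.parser so it runs without extra packages (written for Python 3). The class `XhtmlSerializer`, the function `to_xhtml`, and the `VOID_TAGS` set are names invented for the example, and the sketch drops comments and doctypes and would wrongly escape script/style content.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content and must be self-closed in XHTML.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}

class XhtmlSerializer(HTMLParser):
    """Re-emit parsed HTML with void elements written as <tag ... />."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _attrs(self, attrs):
        # Valueless attributes (e.g. "disabled") become name="name".
        return ''.join(
            ' %s="%s"' % (name, escape(value if value is not None else name, quote=True))
            for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        closer = ' />' if tag in VOID_TAGS else '>'
        self.parts.append('<%s%s%s' % (tag, self._attrs(attrs), closer))

    def handle_startendtag(self, tag, attrs):
        self.parts.append('<%s%s />' % (tag, self._attrs(attrs)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:  # a void element was already closed
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))

def to_xhtml(html):
    serializer = XhtmlSerializer()
    serializer.feed(html)
    serializer.close()
    return ''.join(serializer.parts)

fixed = to_xhtml('<p>a<br>b<img src="x.png"></p>')
```

The result in `fixed` can then be parsed by a strict XML parser, which is the property the webkit output was missing.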


@Anyone: I've been working on a better structure view, and a better way of editing it. I've come up with a method, and I think it's fairly simple; however, the JavaScript for implementing all its facets is rather frustrating to write. So I have written rudimentary code for it, and attached a test version below.

I really need some suggestions about the following issues in this test version:
  1. Currently the method for editing the class structure is the following:
    • Every class entered in the replacements text edit is defined in relation to the root (i.e. the book). For example: 'title of chapter' or 'line of verse of chapter'; the 'of root' need not be specified. In this test version you must write ' of ' between any two classes in order to create a hierarchy.
    • Every class can be defined as 'new'. For example 'title of newchapter'. This is because you may replace a 'class13' with 'scene of chapter', and you really don't want to start a new chapter there. So the plugin takes that into account and merges it into an existing chapter, or into one created later when you define a ' ... newchapter' before it.
    So please, anyone: do you have a simpler/clearer approach or a clearer explanation (I'm awful at explaining), and especially a suggestion for how to present a GUI for creating/using such syntaxes? The test version below can be used to see how it works right now. Below is also a picture of it being put to use in this way.
  2. Included in this test version is an EPUB writer. I just don't have an approach for where to save the file or how to rename it. Currently it just saves to the temp folder and opens that folder so you can copy the file out.

    So, please, anyone: Should it overwrite the original? Should it be added to the calibre library with a suffix, e.g. 'Harry Potter 1' + 'CLEANED'? Should you save it somewhere on the computer? Ideas please!
(I enjoy the programming, I can communicate with the computer; but how to clearly and intuitively communicate with the user is quite often beyond me)
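As a way to pin down the ' of ' syntax from point 1, here is a hypothetical sketch of how such a spec could be split into hierarchy levels. It is not taken from the plugin; the function name `parse_class_spec` and the rule that any class starting with 'new' is a "new" marker are assumptions (a class literally named 'news' would be misread).

```python
def parse_class_spec(spec):
    """Split e.g. 'title of newchapter' into (name, is_new) pairs, outermost level first."""
    levels = []
    for part in spec.split(' of '):
        name = part.strip()
        is_new = name.startswith('new')  # assumption: a 'new' prefix marks a new container
        if is_new:
            name = name[len('new'):]
        levels.append((name, is_new))
    levels.reverse()  # the spec is written innermost first; the root side comes last
    return levels

example = parse_class_spec('title of newchapter')
nested = parse_class_spec('line of verse of chapter')
```

With this shape, 'scene of chapter' yields a chapter level that is not flagged as new, which matches the described behaviour of merging into an existing chapter.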
Attached Thumbnails
view01.JPG (168.6 KB)
Attached Files
File Type: zip test plugin.zip (165.3 KB, 259 views)
Old 06-30-2011, 11:29 AM   #53
kovidgoyal
creator of calibre
As per the bug report, if you use setContent with an XHTML MIME type, it should generate valid XHTML on output. IIRC, the calibre viewer uses setContent, not setHtml, when viewing EPUB.
Old 06-30-2011, 11:33 AM   #54
burbleburble
Connoisseur
@Kovid
Once again, thank you. I didn't understand that's what it meant when I read it.
Old 07-04-2011, 12:46 PM   #55
burbleburble
Connoisseur
Updated to version 0.0.5

Due to a lack of user feedback (interest?) when I posted requests for suggestions in certain areas, I have reverted to designing this plugin as per my own needs; I have no interest in brainstorming how to develop a fully featured plugin, as per everyone's needs, if the user community won't participate.
However, I am more than happy to incorporate ideas and modify this plugin, if given some concrete, well-defined proposal. I don't mind helping out and making it intuitive, accessible, and useful for others - YOU JUST HAVE TO EXPRESS WHAT YOU WOULD FIND INTUITIVE, ACCESSIBLE, AND USEFUL!

The new plugin, therefore, utilizes simple html with syntax highlighting; though it comes with a host of tools to help automate as much of the cleaning process as possible.
Old 07-04-2011, 05:21 PM   #56
Under the Covers
Night Reader
Under the Covers has a spectacular aura about
Posts: 127
Karma: 4314
Join Date: Oct 2010
Location: Rocky Mountains (US)
Device: Sony PRS-650
I'm very interested but am not even close to being a programmer, so have been following this thread to see if your end-product might help me.

I'd like to see what would seem to me to be an almost magical ability to take the calibre conversions I've made from PDF to EPUB and easily eliminate the page headers and footers that end up mixed in with the text. I managed to do it once, but it took me so long to study the various "expressions" and figure out how to input the various search parameters (which I've since forgotten) that I haven't done it again.

This may be well outside what you are trying to accomplish, but as one who has trouble remembering all about "expressions" for the search/replace function (which is really cool), I have been following this thread hoping your project might include some sort of easy interface for input of the appropriate expressions. I'd guess many other non-technical people would like additional interface assistance with expressions.

In any case, I'd encourage you to keep on truckin' -- even if this is NOT where you were headed with this -- because many of us who have NO programming expertise are looking for various interfaces that accomplish what is apparently so easy for programmers but so befuddling for the rest of us.

Last edited by Under the Covers; 07-04-2011 at 05:30 PM. Reason: clarification
Old 07-05-2011, 02:43 AM   #57
burbleburble
Connoisseur
@Under the Covers

Your feature sounds like a sensible addition. To save me the need to brainstorm the various ways such footers and headers might appear and be identified, please can you list several examples of the following:
  1. The headers as they appear, with some context (surrounding text/lines). Please list several so I can see how page# or odd and even pages might change the header. It of course would also help to provide such examples from different books, as they may change appearance from book to book.
  2. The same for footers.

To address your need for it to be intuitive for non-programmers (and even for programmers, to avoid the need to write complex expressions), I think I will make it attempt to automatically match general cases, then provide a list of matches where the user can choose which to replace/remove. Sound good?
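One possible way to "match general cases" automatically, sketched under the assumption that page headers and footers repeat through the book and differ only in the page number. The function `candidate_headers`, its `min_repeats` threshold, and the sample lines are all hypothetical; this is not how the plugin works.

```python
import re
from collections import Counter

def candidate_headers(lines, min_repeats=3):
    """Group lines that differ only in their digits and return the frequent groups."""
    groups = Counter()
    for line in lines:
        text = line.strip()
        if text:
            # 'My Book 12' and 'My Book 13' both collapse to 'My Book #'.
            groups[re.sub(r'\d+', '#', text)] += 1
    return [(pattern, count) for pattern, count in groups.most_common()
            if count >= min_repeats]

# Invented sample: a running header mixed in with body text.
sample = ['My Book 1', 'It was a dark night.', 'My Book 2',
          'The rain kept falling.', 'My Book 3']
matches = candidate_headers(sample)
```

The returned patterns could be shown as the list from which the user picks what to remove, so nobody has to write the expression by hand.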

I can't know for certain when I will implement it, but I will try within the next several weeks.

@Kovid, anyone.
  1. How do you do regex searches in webkit? (javascript? is there a way?)
  2. How do you do general search and replace in webkit, especially considering the fact that the text may be spread across several elements (e.g. italic, bold and p)?
  3. For some reason, I can't get images to show in webkit. I am using, for example, webkit.setContent(data, baseUrl=QtCore.QUrl('D:\\TestA')) where the baseUrl is the folder containing the original html (of course, it was converted to data, but the image src attrib remains the same). I also tried using baseUrl=QtCore.QUrl('D:\\TestA\\index.html'), where I used the name of the html file as part of the baseUrl. What am I doing wrong?

Last edited by burbleburble; 07-05-2011 at 06:24 AM.
Old 07-05-2011, 11:31 AM   #58
kovidgoyal
creator of calibre
Google "javascript regex", and use QUrl.fromLocalFile.
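A likely reason the plain QUrl('D:\\TestA') did not work is that a bare Windows path is not a URL: read as one, the drive letter looks like a scheme. The sketch below shows that effect with Python 3's urllib.parse rather than Qt (whether Qt's parser behaves identically is an assumption), and `local_file_url` is a hand-rolled stand-in that only approximates what QUrl.fromLocalFile returns for an absolute Windows path.

```python
from urllib.parse import urlparse

windows_path = 'D:\\TestA\\index.html'

# Read as a URL, the text before the first ':' is taken as the scheme.
scheme = urlparse(windows_path).scheme

def local_file_url(path):
    """Approximate a file:// URL for an absolute Windows path (illustration only)."""
    return 'file:///' + path.replace('\\', '/')

url = local_file_url(windows_path)
```

In the plugin itself the real call would simply be QUrl.fromLocalFile with the same path passed as baseUrl.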
Old 07-05-2011, 12:48 PM   #59
burbleburble
Connoisseur
Updated to 0.0.6:
-reverted back to webkit
-major improvements in interface and coding
-stable, if lacking in features

@Kovid
Thanks. QUrl.fromLocalFile worked great.
Old 07-05-2011, 07:47 PM   #60
jackie_w
Grand Sorcerer
jackie_w ought to be getting tired of karma fortunes by now.
Posts: 6,171
Karma: 16228536
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Hi burble,

I tried to use your utility but I have to admit to being unclear how to achieve the clean-up. I was able to create the initial htmlz file and to load it in the plugin and produce the patterns but I couldn't figure out how to proceed.

I have attached 2 htmlz files. The input one is a tiny extract (to avoid copyright problems) from a mobi-to-htmlz conversion. The output one is what I ideally would have liked as the cleaned-up simplified version. I would then be able to add my own standard external css file to match the tags (<h2>, <h3>, <p>, <i>) and classes ("ctr", "noind", "txt") in the cleaned-up index.html file.

Please could you tell me whether this is achievable with the current plugin, or even whether I could get somewhere close if I knew what I was doing?
Attached Files
File Type: zip input.htmlz.zip (89.0 KB, 254 views)
File Type: zip output.htmlz.zip (88.7 KB, 243 views)

Last edited by jackie_w; 07-05-2011 at 07:55 PM.
All times are GMT -4.