Old 12-04-2009, 12:54 PM   #61
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,480
Karma: 305784726
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
I'm not Jim, but yes please do post your xpml2xhtml code. Thank you.

Quote:
Originally Posted by KevinH View Post
If you want it I will happily post it for you (since it has no DRM removal code in it at all).
Old 12-04-2009, 01:01 PM   #62
KevinH
Sigil Developer
Posts: 7,602
Karma: 5433388
Join Date: Nov 2009
Device: many
posting xpml2xhtml.py

Hi,

Sure thing. When I get off work tonight I will post it on pastebin (I don't have access to a webserver of my own to post it directly) and then post the link to it here.

I should probably figure out how to post things to the webspace my ISP provides but I have never bothered.

Take care,

Kevin
Old 12-04-2009, 03:21 PM   #63
KevinH
new version of xpml2xhtml.py

Hi,

I finished my grading early (final exams are as big a pain for profs as they are for students), so I went ahead and posted the new version of xpml2xhtml.py to pastebin.de. This code contains no anti-DRM code at all, so it is okay to post, e-mail, and share. It requires the HTML Tidy command line executable to be installed on the machine. Tidy is already installed under Mac OS X (at least on my machine), builds out of the box on Mac OS X and Linux, and pre-built binaries for Windows are available from: http://int64.org/projects/tidy-binaries

Just make sure tidy is somewhere in the path (I have never tried tidy on Windows, so feedback is welcome).
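
For anyone unsure whether tidy is actually reachable, here is a minimal sketch (modern Python 3, not part of xpml2xhtml.py) that looks tidy up on the path and falls back to the untouched HTML when it is missing. The function names are made up for illustration; the tidy options used (-q, -asxhtml, -utf8) are standard HTML Tidy switches.

```python
import shutil
import subprocess


def find_tidy(name="tidy"):
    """Return the full path of the tidy executable, or None if it is not on the path."""
    return shutil.which(name)


def run_tidy(html_text, tidy_path):
    """Pipe html_text through tidy and return the cleaned-up XHTML."""
    # tidy exits with status 1 when it only has warnings, so do not treat
    # a non-zero exit code as a failure here.
    result = subprocess.run(
        [tidy_path, "-q", "-asxhtml", "-utf8"],
        input=html_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    return result.stdout.decode("utf-8")


def tidy_if_available(html_text, name="tidy"):
    """Run tidy when it can be found; otherwise hand the HTML back unchanged."""
    tidy_path = find_tidy(name)
    if tidy_path is None:
        return html_text
    return run_tidy(html_text, tidy_path)
```

Calling find_tidy() with no arguments returns None when tidy is not installed or not in the path, which is the case to check first on Windows.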

The link to xpml2xhtml.py is:

http://pastebin.de/3445

It includes an optional command line switch, --sigil-breaks, that automatically inserts Sigil chapter breaks, which makes it much quicker to go from the output to a finished epub in Sigil (if Sigil would only read in the meta info in the header I would be so happy!).

I use it as follows (on my Mac or under Linux)

python xpml2xhtml.py --sigil-breaks input.pml output.html

And just to make things clear, the format for footnotes in the input pml file is the xml one, not the one from the original ereader2html. The snippets of code which create this format in the pml file are at:

http://pastebin.de/3444

for those who are interested.

Hope this helps,

Please let me know if you run into problems or troublesome files that won't convert. I am always looking for test documents that hit corner cases.

Take care,

KevinH
Old 12-05-2009, 01:51 AM   #64
macr0t0r
Connoisseur
Posts: 91
Karma: 108
Join Date: Jan 2008
Device: Palm Treo 680, Sony Reader
I have a final to prep for, so it'll be a week before I can play with it. I originally tried ereader2html, but I left screaming in horror at the html it produced. I'll give yours a try. There may be some valuable tidbits that could be pushed into Calibre. Thanks!
- Jim
Old 12-05-2009, 07:09 AM   #65
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by macr0t0r View Post
There may be some valuable tidbits that could be pushed into Calibre.
Kevin and I spoke to each other about his parser and the new calibre one while both were being developed last week. Other than general considerations like how to handle certain cases (some from calibre's went into xpml2xhtml and some from xpml2xhtml went into calibre's), the two designs are very different, which makes it difficult to import actual code.
Old 12-05-2009, 12:21 PM   #66
KevinH
more on xpml2xhtml.py

Hi,

Yes, xpml2xhtml.py is in no way only my work. I have literally exchanged ideas and code with "user_none", borrowed ideas from "WayneD's" perl pml2html.pl conversion program, taken ideas and code posted on the Dark Blog by others, and of course started with the original code posted on the Dark Blog.

I just now borrowed the idea of cleaning up chars. I hated to touch the pml file produced since that is the original. But I have now added the following to my latest version of xpml2xhtml.py, which cleans up the last issue that forced me to use tidy (handling those special win1252 chars).

Based on Jim and user_none comments above, I have added:

def cleanupHighChars(src):
    # convert special win1252 chars 0x80 - 0xa0 to \aNNN (decimal) so they are properly handled later
    src = re.sub('[\x80-\xa0]', lambda x: '\\a%03d' % ord(x.group()), src)
    # convert anything above 0xff to \UXXXX (hex)
    src = re.sub('[^\x00-\xff]', lambda x: '\\U%04x' % ord(x.group()), src)
    return src

When it finds these special win1252 chars, it recodes them into more proper pml using the \a and \U tags. I have then expanded the pml_chars array as follows, based on the following win1252 page:

http://www.microsoft.com/globaldev/r.../sbcs/1252.htm

which gives me

pml_chars = {
    128: '€', 129: '', 130: '—', 131: 'ƒ', 132: '„',
    133: '…', 134: '†', 135: '‡', 136: 'ˆ', 137: '‰',
    138: 'Š', 139: '‹', 140: 'Œ', 141: '', 142: 'Ž',
    143: '', 144: '', 145: '‘', 146: '’', 147: '“',
    148: '”', 149: '•', 150: '–', 151: '—', 152: '',
    153: '™', 154: 'š', 155: '›', 156: 'œ', 157: '',
    158: 'ž', 159: 'Ÿ', 160: ' ',
}


Then I handle all of the \a tags values by translating them

elif cmd == 'a':
    final += self.pml_chars.get(attr, '&#%d;' % attr)


So I can now properly handle all of those special win1252 chars that are not allowed to be encoded in unicode just by value and that need to be remapped to special html codes.
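
To see the round trip in isolation, here is a small self-contained sketch in modern Python 3. It is not the code from xpml2xhtml.py: instead of a hand-written table it looks each byte up in Python's built-in cp1252 codec, and the names cp1252_char and expand_a_tags are made up for this example. It assumes the pml text was read as latin-1, so that every byte keeps its numeric value as a code point.

```python
import re


def cleanup_high_chars(src):
    """Recode win1252 specials as \\aNNN (decimal) and wider chars as \\UXXXX (hex)."""
    src = re.sub(r'[\x80-\xa0]', lambda m: '\\a%03d' % ord(m.group()), src)
    src = re.sub(r'[^\x00-\xff]', lambda m: '\\U%04x' % ord(m.group()), src)
    return src


def cp1252_char(code):
    """Map a byte value to its win1252 character, or to a numeric entity if undefined."""
    try:
        return bytes([code]).decode('cp1252')
    except UnicodeDecodeError:
        # 129, 141, 143, 144 and 157 have no character assigned in cp1252
        return '&#%d;' % code


def expand_a_tags(src):
    """Replace every \\aNNN tag with the character it stands for."""
    return re.sub(r'\\a(\d{3})', lambda m: cp1252_char(int(m.group(1))), src)


raw = b'caf\xe9 \x93hi\x94 \x85'           # win1252 bytes as found in a pml file
pml = cleanup_high_chars(raw.decode('latin-1'))
html = expand_a_tags(pml)
```

Here pml ends up containing \a147, \a148 and \a133 in place of the three special bytes, and html has the curly quotes and the ellipsis as real Unicode characters. The codec is also a handy cross-check for a hand-typed table: for example it maps 130 to the single low quote (U+201A).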

So now I can modify the program to use an optional --use-tidy flag that defaults to off, so that the code is usable even by people without tidy.
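
As a rough illustration of how two optional switches plus the input and output file names fit together, here is a sketch using Python's argparse. This is only an assumption about the shape of the command line (the two flag names and the input.pml / output.html order come from this thread); the real script may well parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a command line parser with both switches off by default."""
    parser = argparse.ArgumentParser(
        prog="xpml2xhtml.py",
        description="Convert a pml file to xhtml (illustrative sketch only).",
    )
    parser.add_argument("--sigil-breaks", action="store_true",
                        help="insert Sigil chapter breaks into the output")
    parser.add_argument("--use-tidy", action="store_true",
                        help="run the result through HTML Tidy (off by default)")
    parser.add_argument("infile", help="input pml file")
    parser.add_argument("outfile", help="output html file")
    return parser


# Parse an explicit argument list rather than sys.argv so the example is repeatable.
args = build_parser().parse_args(["--sigil-breaks", "input.pml", "output.html"])
```

With store_true, leaving a switch off the command line simply gives False, so people without tidy never have to think about it.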

That said, I like to see the structure when I look at an html file and tidy's nice indentation and wrapping makes for easily understood code (i.e. makes it easy to see html breakpoints).

I will test my new code further and post a final version over the weekend.

Thanks for all of the code tips and ideas.

KevinH
Old 12-05-2009, 08:00 PM   #67
KevinH
final version of xpml2xhtml.py

Hi,

I added the cleanup code, made the use of tidy optional with a command line switch (--use-tidy), fixed some corner cases and made a few other improvements.

So if you are going to try xpml2xhtml.py, please try this version:

http://pastebin.de/3639

Hope this helps,

KevinH
Old 12-08-2009, 03:41 PM   #68
macr0t0r
Finished my final! Alright, let me take a look at all of this. I think the best way I can deal with this within Calibre is if I create a PDB "on import" plugin that automatically converted when I added the PDB. Then I could just add the resulting HTML in the "Edit MetaData" window in the GUI.

Truth be told, I'd rather just use Calibre's built-in features. I'll see which features of xpml2xhtml.py really matter and look at how feasible it is to add them to user_none's code. With the latest changes, his stuff already does most of what I want. I'd just like to have the footnotes better handled with pagebreaks and return links.

- Jim
Old 12-08-2009, 05:35 PM   #69
user_none
Quote:
Originally Posted by macr0t0r View Post
Finished my final! Alright, let me take a look at all of this. I think the best way I can deal with this within Calibre is if I create a PDB "on import" plugin that automatically converted when I added the PDB. Then I could just add the resulting HTML in the "Edit MetaData" window in the GUI.
Why do you need a PDB on import plugin? eReader PDB's are fully supported.

Quote:
Originally Posted by macr0t0r View Post
Truth be told, I'd rather just use Calibre's built-in features. I'll see which features of xpml2xhtml.py really matter and look at how feasible it is to add them to user_none's code. With the latest changes, his stuff already does most of what I want. I'd just like to have the footnotes better handled with pagebreaks and return links.
Once again I'm one step ahead of you, I added this a few days ago.
Old 12-08-2009, 06:14 PM   #70
macr0t0r
Quote:
Originally Posted by user_none View Post
Why do you need a PDB on import plugin? eReader PDB's are fully supported.

Once again I'm one step ahead of you, I added this a few days ago.
I know that eReader PDBs are supported. This is a little trick I do if I want to use an external converter for PDB within the Calibre Python environment (very useful on Windows machines). By importing the PDB, it calls the conversion routine on the file and generates a zipped HTML file in the same directory. I can then add that within the MetaData GUI. Now I have both the original eReader file and the converted HTML zip file to work with. I can't do this as a conversion plugin since that plugin expects OEB output. Perhaps with some work, I could figure out how to call the HTML to OEB functions within the plugin after converting to HTML. This whole plugin thing is still a bit of effort to work with.

However....this may be unnecessary. I'll do another bzr update and see how your work looks. If it's good enough, then....it's good enough!

- Jim
Old 12-09-2009, 02:09 AM   #71
macr0t0r
Hmmm....I'm on revision 3999 on my Bazaar project, but I don't see your changes that add link-backs to footnotes. I'm looking at pmlconvertor.py:
Code:
    (re.compile(r'\\Fn="(?P<target>.+?)"(?P<text>.*?)\\Fn'), lambda match: '<a href="#fns-%s">%s</a>' % (match.group('target'), match.group('text')) if match.group('text') else ''),
    (re.compile(r'\\Sd="(?P<target>.+?)"(?P<text>.*?)\\Sd'), lambda match: '<a href="#fns-%s">%s</a>' % (match.group('target'), match.group('text')) if match.group('text') else ''),
<snip>
    # Sidebar and Footnotes
    (re.compile(r'&lt;sidebar\s+id="(?P<target>.+?)"&gt;\s*(?P<text>.*?)\s*&lt;/sidebar&gt;', re.DOTALL), lambda match: '<div id="fns-%s">%s</div>' % (match.group('target'), match.group('text')) if match.group('text') else ''),
    (re.compile(r'&lt;footnote\s+id="(?P<target>.+?)"&gt;\s*(?P<text>.*?)\s*&lt;/footnote&gt;', re.DOTALL), lambda match: '<div id="fns-%s">%s</div>' % (match.group('target'), match.group('text')) if match.group('text') else ''),
I was expecting something like this (footnotes only):
Code:
    (re.compile(r'\\Fn="(?P<target>.+?)"(?P<text>.*?)\\Fn'), lambda match: '<a id="Xfns-%s" href="#fns-%s">%s</a>' % (match.group('target'), match.group('target'), match.group('text')) if match.group('text') else ''),
<snip>
    # Sidebar and Footnotes
    (re.compile(r'&lt;sidebar\s+id="(?P<target>.+?)"&gt;\s*(?P<text>.*?)\s*&lt;/sidebar&gt;', re.DOTALL), lambda match: '<div title="Footnote" id="fns-%s" style="page-break-before : always;">%s<br /><a href=#Xfns-%s>-Back-</a></div>' % (match.group('target'), match.group('text'), match.group('target')) if match.group('text') else ''),
Is your code similar to this?
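
For anyone who wants to try the link-back idea on its own, here is a stripped-down, self-contained sketch of the same two substitutions. It is not Calibre's code: it assumes the footnote block has not been HTML-escaped yet (so it matches a literal <footnote> tag rather than the &lt;footnote form above), and the helper names are made up for the example.

```python
import re

# \Fn="id"text\Fn marks the place in the body text that refers to a footnote
FOOTNOTE_REF = re.compile(r'\\Fn="(?P<target>.+?)"(?P<text>.*?)\\Fn')
# <footnote id="id">...</footnote> holds the footnote text itself
FOOTNOTE_BODY = re.compile(
    r'<footnote\s+id="(?P<target>.+?)">\s*(?P<text>.*?)\s*</footnote>', re.DOTALL)


def link_ref(match):
    """Turn a reference into a link that also carries an id to jump back to."""
    if not match.group('text'):
        return ''
    target = match.group('target')
    return '<a id="Xfns-%s" href="#fns-%s">%s</a>' % (target, target, match.group('text'))


def link_body(match):
    """Put the footnote on its own page and append a link back to the reference."""
    if not match.group('text'):
        return ''
    target = match.group('target')
    return ('<div id="fns-%s" style="page-break-before: always;">%s<br />'
            '<a href="#Xfns-%s">-Back-</a></div>' % (target, match.group('text'), target))


def convert_footnotes(pml):
    """Apply both substitutions: references first, then footnote bodies."""
    pml = FOOTNOTE_REF.sub(link_ref, pml)
    return FOOTNOTE_BODY.sub(link_body, pml)


sample = 'See note\\Fn="n1"[1]\\Fn.\n<footnote id="n1">\nThe note text.\n</footnote>'
out = convert_footnotes(sample)
```

The reference gets id Xfns-n1 and points at fns-n1, while the footnote div gets id fns-n1 and its -Back- link points at Xfns-n1, so each end can reach the other.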

- Jim
Old 12-09-2009, 06:11 AM   #72
user_none
Quote:
Originally Posted by macr0t0r View Post
Is your code similar to this?
Not in the slightest. I wrote a new parser and replaced the regex one with it. It doesn't look like your branch has properly merged with trunk. Are you doing bzr merge lp:calibre? You can see the new parser here in the meantime.
Old 12-09-2009, 01:43 PM   #73
macr0t0r
I had done a bzr revert (to blow out my unnecessary changes) followed by a bzr merge. Hmmmm...maybe there were errors that I didn't notice.

I looked at your code, and that definitely does what I need to do. Man, the thing is practically a re-write! How long did it take you to re-organize all of that? I like how you first convert the pseudo-XML footnote references into your own PML codes. It's odd that the eReader format doesn't do it that way as it does for standard links.

I'll beat bzr into submission and try it out. Thanks!

- Jim
Old 12-18-2012, 07:41 AM   #74
Apprentice Alf
Member
Posts: 14
Karma: 1236266
Join Date: Dec 2010
Device: None
I doubt that anyone is still interested in this code, but I am preparing a new tools release and I am removing a lot of obsolete files.

xpml2xhtml.py is one of the ones to go. As this file contains no de-drm code at all, I am attaching the latest version that I have to this post.

— Alf.
Attached Files
File Type: zip xpml2xhtml.py.zip (8.8 KB, 99 views)

Last edited by Apprentice Alf; 12-18-2012 at 08:05 AM.