Old 03-24-2008, 06:01 PM   #27
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by Moonraker
Thank you very much for the link and for the instructions.

This is all very interesting to me because I have never before seen the html code behind a prc file.
This is a recent ability, perfected by tompe with his Mobiperl code. Years ago I had "hacked" makedoc9 (a popular .pdb to .txt converter) to strip out the images and fix the <img...> tags, substituting the 'filenames' for the 'recindex' attribute. That allowed me to see the .html code behind the .prc for the first time. Sadly, I had no idea what a 'filepos' was, and the href links were all broken. That's why I was so taken by tompe's efforts and wanted to combine the two worlds (.prc to .imp)!

Quote:
For the test I used the same prc file in two different folders, giving the prc files different names.

The result, as far as I can see, is that the two files are identical using either:

mobi2html "Your.prc" TempDir

or

mobi2html --rawhtml "Your.prc" Temp >My.html

Both files are the same size, have the same number of lines, and both end with �</body></html>

Both files have all the closing </p>'s stripped and replaced by <div height="0em"></div>
<div height="0em"></div>.

All my curly quotes i.e. &8220; and &8221; have been changed to &quot; (straight quotes).
My em-dash codes &8212; have all been changed to &mdash; etc.
Note: I had to omit the # sign from the above numerics in order to get this posted.

And my HTML(XML) header is completely changed although the charset=UTF-8 has been kept.

It appears to be Mobipocket Creator that changes the code, don't you think?
BTW, those endings may be easy to strip out, as they don't mean anything and aren't needed. I haven't come across these issues with .prc files built by BookDesigner or by HarryT. Any other quirks to watch out for?
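
If the endings in question are the empty <div height="0em"></div> pairs, something along these lines might do as a clean-up pass on the extracted .html. This is only a rough sketch in Python, not part of Mobiperl or mobi2html. The function name strip_empty_divs is made up for illustration, and it assumes the attribute is always written exactly as height="0em".

```python
import re

# Matches the empty spacer div reported in the quoted post, plus any
# whitespace that follows it.
# Assumption: the attribute is always written exactly as height="0em".
EMPTY_DIV = re.compile(r'<div\s+height="0em"\s*>\s*</div>\s*', re.IGNORECASE)


def strip_empty_divs(html: str) -> str:
    """Remove every empty <div height="0em"></div> spacer from html.

    Hypothetical helper for illustration only; divs with any other
    height value, or with content inside them, are left untouched.
    """
    return EMPTY_DIV.sub("", html)


if __name__ == "__main__":
    sample = (
        '<p>One<div height="0em"></div>\n'
        '<div height="0em"></div><p>Two</body></html>'
    )
    print(strip_empty_divs(sample))
```

It only removes the spacers. It does not put the original closing </p> tags back, so the paragraphs stay unclosed.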