Quote:
Originally Posted by Moonraker
Thank you very much for the link and for the instructions.
This is all very interesting to me because I have never before seen the html code behind a prc file.
|
This is a recent ability perfected by tompe with his Mobiperl code. Years ago I had "hacked" makedoc9 (a popular .pdb-to-.txt converter) to strip out the images and fix the <img...> tags, substituting the image filenames for the 'recindex' attribute. It allowed me to see the .html code behind a .prc for the first time. Sadly, I had no idea what a 'filepos' was, and the href links were all broken. That's why I was so taken by tompe's efforts and wanted to combine the two worlds (.prc to .imp)!
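For anyone curious what those two fix-ups involve, here's a rough Python sketch. It is not taken from tompe's Mobiperl code. It assumes the images were extracted as imageNNNNN.jpg (a made-up naming scheme), and that each filepos value is a byte offset into the raw text exactly as stored in the .prc, landing on a tag boundary. The function and file names are hypothetical.

```python
import re

# Hypothetical naming scheme: assumes image record N was extracted as "imageNNNNN.jpg".
def image_name(recindex: int) -> str:
    return f"image{recindex:05d}.jpg"

# recindex="00001" (image record reference) and filepos=0000049 (byte-offset link
# target); the quotes around either value are optional.
RECINDEX = re.compile(rb'recindex\s*=\s*"?(\d+)"?', re.IGNORECASE)
FILEPOS = re.compile(rb'filepos\s*=\s*"?(\d+)"?', re.IGNORECASE)


def fix_raw_html(raw: bytes) -> bytes:
    """Turn filepos links into #anchors and recindex images into src= filenames.

    Assumes `raw` is the undecoded text as stored in the .prc, so every filepos
    value is a byte offset into it that lands on a tag boundary.
    """
    # 1. Every distinct offset that some link points at.
    targets = sorted({int(m.group(1)) for m in FILEPOS.finditer(raw)})

    # 2. Insert an anchor at each target, highest offset first, so that
    #    inserting text never shifts an offset we still have to visit.
    out = raw
    for pos in reversed(targets):
        if pos <= len(raw):
            out = out[:pos] + b'<a id="filepos%d"></a>' % pos + out[pos:]

    # 3. Point the links at those anchors.
    out = FILEPOS.sub(lambda m: b'href="#filepos%d"' % int(m.group(1)), out)

    # 4. Replace each image record index with a (hypothetical) filename.
    out = RECINDEX.sub(
        lambda m: b'src="%s"' % image_name(int(m.group(1))).encode("ascii"), out
    )
    return out


raw = b'<a filepos=0000049>Ch 1</a><img recindex="00001"><p>Chapter</p>'
print(fix_raw_html(raw))
# b'<a href="#filepos49">Ch 1</a><img src="image00001.jpg"><a id="filepos49"></a><p>Chapter</p>'
```

Working on bytes rather than decoded text matters here: if the book uses multi-byte UTF-8 characters, character positions and filepos byte offsets no longer line up.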
Quote:
For the test I used the same prc file in two different folders, giving the prc files different names.
The result, as far as I can see, is that the two files are identical using either:
mobi2html "Your.prc" TempDir
or
mobi2html --rawhtml "Your.prc" Temp >My.html
Both files are the same size, have the same number of lines, and both end with </body></html>
Both files have all the closing </p>'s stripped and replaced by <div height="0em"></div>
<div height="0em"></div>.
All my curly quotes, i.e. &8220; and &8221;, have been changed to " (straight quotes).
My em-dash codes &8212; have all been changed to — etc.
Note: I had to omit the # sign from the above numerics in order to get this posted.
And my HTML (XML) header has been completely changed, although the charset=UTF-8 has been kept.
It appears to be Mobipocket Creator that changes the code, don't you think?
|
BTW, those endings should be easy to strip out, as they neither mean anything nor are needed. I haven't come across these issues with .prc files built by BookDesigner or HarryT. Any other quirks to watch out for?
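Stripping them is basically a one-regex job. A minimal Python sketch, assuming the endings look exactly like the <div height="0em"></div> runs quoted above (attribute spelling and spacing may differ in other books, so check yours first). The function name is just for illustration; pass "</p>" as the replacement if you'd rather put the closing tags back than drop them.

```python
import re

# One or more consecutive empty <div height="0em"></div> elements, including
# any whitespace or line breaks in front of each one.
EMPTY_DIV_RUN = re.compile(r'(?:\s*<div\s+height="0em"\s*>\s*</div>)+', re.IGNORECASE)


def strip_empty_divs(html: str, replacement: str = "") -> str:
    """Collapse each run of empty height="0em" divs into `replacement`."""
    return EMPTY_DIV_RUN.sub(replacement, html)


sample = '<p>One<div height="0em"></div>\n<div height="0em"></div><p>Two'
print(strip_empty_divs(sample))          # <p>One<p>Two
print(strip_empty_divs(sample, "</p>"))  # <p>One</p><p>Two
```

Collapsing the whole run at once means the doubled divs turn into a single </p> rather than two.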