MobileRead Forums - View Single Post - Using perl scripts to produce .IMP ebooks and more...

nrapallo · 03-24-2008, 09:01 PM

Quote:

Originally Posted by Moonraker

Sorry for my previous post. I had missed the my.html file, thinking it would be in another folder. I have retested the files, and the following are my findings.

Thank you for the link and for the instructions.

This is all very interesting to me because I have never before seen the html code behind a prc file.

For the test I used the same prc file but gave it two different names.

First file test (mobi2html "Your.prc" TempDir):

Size: 1152 KB

� appeared 250 times throughout the document. These would have to be removed.
e.g. "adhering" changed to "adherin�g"

</p> stripped and replaced by <div height="0em"></div> <div height="0em"></div>

&8220; changed to “
&8221; changed to ”
&8217; changed to ’
&8212; changed to —

Headings, e.g. <h4>Chapter 10</h4>, changed to <h4 align="center"><font size="+1"><b>Chapter 10</b></font></h4>

<b></b> added to headings but where <strong></strong> were in the original file they have been left unchanged.

<br style="page-break-after:always" /> inserted at end of file.

Second file test (mobi2html --rawhtml "Your.prc" Temp >My.html)

Size: 1155 KB

All numeric code unchanged.

<b></b> Added to Headings but <strong></strong> in original file left unchanged.

<font size="+1"> added to Headings

</p> left unchanged but <div height="0em"></div> <div height="0em"></div> added between paragraphs. This seems superfluous to me.

<mbpagebreak/> added to end of file.

When I put the file through Tidy.exe, I got 8833 warnings that the <div> attribute "height" has the invalid value "0em".

NOTE: # omitted from numeric codes to get this posted.

That � entry is weird. I wonder what the rationale behind it was. I know this will sound like you are chasing your tail, but if you make an .imp with this .html using eBook Publisher, does it bomb? It does if the HTML character reference &# 20; exists in the ebook, e.g.:

Code:

<p>Html documents with this entity &# 20; bomb!  No output produced by eBook Publisher v2.2.5</p>

Note: for display purposes, I put a space between '#' and '2' that shouldn't be there!
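
For anyone who wants to check a file before feeding it to eBook Publisher, here is a minimal Python sketch that lists numeric character references pointing at control characters. It assumes the failure is tied to the reference naming a code point below 32 (decimal 20 is one), which is only a guess from the single case above. The name find_control_refs is made up for illustration.

```python
import re

# Matches decimal (e.g. &#8220;) and hexadecimal (e.g. &#x201C;) references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_control_refs(html_text):
    """Return (offset, reference, code point) for each reference to a control character.

    Assumption: "control character" here means a code point below 32,
    excluding tab (9), line feed (10) and carriage return (13).
    """
    hits = []
    for m in NUMERIC_REF.finditer(html_text):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if code < 32 and code not in (9, 10, 13):
            hits.append((m.start(), m.group(0), code))
    return hits


if __name__ == "__main__":
    sample = "<p>Fine: &#8220;quoted&#8221;. Suspect: &#20; here.</p>"
    for offset, ref, code in find_control_refs(sample):
        print(f"offset {offset}: {ref} (code point {code})")
```

Any hit could then be removed or replaced by hand before building the .imp.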

I think we can conclude that the Mobiperl code strips the </p> and just leaves behind the empty Mobipocket <div>s. I have seen this behaviour with the .pdb to .imp routine in Mobi2IMP. BTW, you can take a PalmDOC .pdb (TEXt/REAd) document and have Mobi2IMP create a .imp version.
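
If the empty <div>s are unwanted, something like the following Python sketch could strip them before further processing. It assumes the markup appears exactly as quoted above (<div height="0em"></div>) and that the </p> tags are still present, as in the --rawhtml output. In the first test's output these <div>s are the only paragraph separators left, so removing them there would run paragraphs together. strip_empty_divs is a hypothetical helper, not part of Mobiperl or Mobi2IMP.

```python
import re

# An empty <div height="0em"></div>, plus any whitespace that follows it.
EMPTY_DIV = re.compile(r'<div\s+height="0em"\s*>\s*</div>\s*', re.IGNORECASE)


def strip_empty_divs(html_text):
    """Remove the empty spacer <div>s; return (cleaned text, number removed)."""
    return EMPTY_DIV.subn("", html_text)


if __name__ == "__main__":
    sample = '<p>One</p> <div height="0em"></div> <div height="0em"></div> <p>Two</p>'
    cleaned, removed = strip_empty_divs(sample)
    print(removed, "removed:", cleaned)
```

Running the cleaned file through Tidy afterwards would show whether the "height" warnings are gone.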