11-20-2007, 02:09 PM   #79
shoggot
Device: sony prs-505
Mmm, the file itself contains copyrighted material... but as I've now reproduced the results with three separate files, it should be easy enough to replicate.

The files were each over 3MB (as .htm files) and contained only the bare essential tags, so an equivalent test file can be generated like so, under bash:

Quote:
echo "<html><head></head><body></body>" > foo.htm
num=1
while [[ $num -lt 50000 ]];do
echo "<p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 </p>" >> foo.htm
num=$(($num+1))
done
echo "</body></html>" >> foo.htm
And this is the commandline used against the result:
./html2lrf.exe --title="barfoo" --author="foobar" --font-delta=-2 --left-margin=5 --right-margin=5 --top-margin=5 foo.htm

Followed by a
Quote:
sed -e 's/<\/p>//g' foo.htm > foo2.htm
And run the same commandline against that. The output seems to be identical, or within a few bytes either way. On-reader appearance is identical, anyway - only the processing times differ.
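
For anyone repeating this, here is a rough sketch (not part of my original run) that rebuilds a smaller test file, strips the </p> tags, checks that nothing else changed, and then times both conversions. The N variable and the file-size check are my own additions for illustration; the html2lrf step only runs if ./html2lrf.exe is present, and it uses exactly the options from the commandline above.

```shell
#!/usr/bin/env bash
# N is a hypothetical knob for a quick run; the loop in the post writes 49999 paragraphs.
N=${N:-1000}

# Rebuild the test input, keeping the same (malformed) header as the original script.
{
    echo "<html><head></head><body></body>"
    i=0
    while [ "$i" -lt "$N" ]; do
        echo "<p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 </p>"
        i=$((i + 1))
    done
    echo "</body></html>"
} > foo.htm

# Strip every closing paragraph tag, as in the sed step above.
sed -e 's/<\/p>//g' foo.htm > foo2.htm

# Sanity checks: N paragraphs in, no </p> left, and exactly 4 bytes removed per paragraph.
opened=$(grep -c '<p>' foo.htm)
closed_after=$(grep -c '</p>' foo2.htm || true)   # grep exits non-zero when the count is 0
size1=$(wc -c < foo.htm)
size2=$(wc -c < foo2.htm)
removed=$((size1 - size2))
echo "paragraphs: $opened, </p> left in foo2.htm: $closed_after, bytes removed: $removed"

# Time both conversions, but only if the converter is actually here.
if [ -x ./html2lrf.exe ]; then
    for f in foo.htm foo2.htm; do
        echo "== $f"
        time ./html2lrf.exe --title="barfoo" --author="foobar" --font-delta=-2 \
            --left-margin=5 --right-margin=5 --top-margin=5 "$f"
    done
fi
```

Run it as `N=49999 bash check.sh` (the script name is arbitrary) to match the size of the original test file. If the byte count removed is anything other than 4 times N, the sed step touched more than the closing tags.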

I'm running a truncated version of that test file with verbose options, to see the results:

Well, the result's interesting (these are from the temp files). Snipping for space, but you'll see the diff (and I have no idea why it should have any effect).
With </p> tags:
Quote:
<html><head></head><body></body>
<p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 </p>
<p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 </p>
...
<p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 </p>
</html>
Without </p> tags:
Quote:
<html><head></head><body></body>
<p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
</p><p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
</p><p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
</p><p>abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
</p></html>
The prematurely closed <body> tag is due to my malformed input HTML (the first echo writes <body></body>), but there you go.
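
One way to pin down what actually differs between the two dumps is to delete the newlines and compare what is left. The sketch below does that on two-paragraph stand-ins for the temp files; the file names are made up, and it assumes the stripped input keeps the trailing space before each newline, which the snipped listings can't confirm.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the two temp files, cut down to two paragraphs each.
# Assumption: the paragraph text ends in a space in both dumps.
line='abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789 '

# Dump from the input that still had its </p> tags: each </p> sits before the newline.
printf '<html><head></head><body></body>\n<p>%s</p>\n<p>%s</p>\n</html>\n' \
    "$line" "$line" > with_p.dump

# Dump from the input with </p> stripped: each </p> now sits after the newline.
printf '<html><head></head><body></body>\n<p>%s\n</p><p>%s\n</p></html>\n' \
    "$line" "$line" > without_p.dump

# Delete every newline, then compare the remainder byte for byte.
tr -d '\n' < with_p.dump > with_p.flat
tr -d '\n' < without_p.dump > without_p.flat

if cmp -s with_p.flat without_p.flat; then
    verdict="same markup, different newline placement"
else
    verdict="markup differs"
fi
echo "$verdict"
```

If the real temp files also come out identical this way, then the only change is whether each newline ends up inside a paragraph or between two of them. That narrows down where to look, though it doesn't by itself explain the processing times.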

If you really have to have the original file, I guess we'll make it happen in the name of progress, but I'd prefer to bypass that step.