10-03-2011, 06:07 AM   #8
roffLOL
Member
Posts: 10
Karma: 1538
Join Date: Sep 2011
Location: Sweden
Device: Sony PRS-350
On the subject of speed, I'm pretty sure I can cut about 25% off the execution time with a simple optimization, and most of the memory consumption as well. I will not aim for a faster conversion than that. For a task that is not run repeatedly, that is good enough for me.

The advertised goals were met yesterday. I have created a 98% replica of the PDF document that serves as my test case. The conversion removes the '-' at the end of lines and joins those lines. It appends paragraphs that span pages when the first part does not end with a full stop, it keeps an indentation ratio for paragraph beginnings equal to that of the original document (if it has one), it retains font information (even for a single word in the middle of a sentence), it retains line and paragraph spacing, and it translates all special characters to their HTML equivalents. For some weird reason, the line numbering also seems to automagically vanish, even though I can't remember implementing anything to leave it out.
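
To make three of those steps concrete, here is a minimal Python sketch of end-of-line hyphen removal, merging paragraphs across page breaks, and translating special characters to HTML. It is not the converter's actual code: the function names (`join_lines`, `merge_pages`, `to_html`) and the input shapes are hypothetical, and it assumes every trailing '-' marks a split word, which a real document will not always honour.

```python
import html


def join_lines(lines):
    """Join extracted lines into one paragraph string.

    Assumption: a trailing '-' always means a word was split across
    lines, so the hyphen is dropped and no space is inserted.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith("-"):
            parts.append(line[:-1])      # drop hyphen, glue to next line
        else:
            parts.append(line + " ")     # normal line break becomes a space
    return "".join(parts).rstrip()


def merge_pages(pages):
    """Flatten pages (lists of paragraph strings) into one paragraph list.

    If the last paragraph collected so far does not end with a full stop,
    the first paragraph of the next page is appended to it.
    """
    result = []
    for page in pages:
        for i, para in enumerate(page):
            if i == 0 and result and not result[-1].endswith("."):
                result[-1] = result[-1] + " " + para
            else:
                result.append(para)
    return result


def to_html(text):
    """Escape &, <, > and turn non-ASCII characters into numeric entities."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(join_lines(["The conver-", "sion is good", "enough."]))
    print(merge_pages([["First.", "It spans"], ["two pages.", "Next."]]))
    print(to_html("a < b & café"))
```

A dictionary lookup or a check on the joined word would be needed to keep genuine hyphens (for example a compound word that happens to break after its hyphen), and the full-stop test would need to allow for other sentence endings such as '?' or a closing quote.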

However, the whole implementation will likely blow up in my face when tried with another document =). Much work is left to be done.