09-30-2011, 04:57 AM   #1
roffLOL
Member
roffLOL once ate a cherry pie in a record 7 seconds.
Posts: 10
Karma: 1538
Join Date: Sep 2011
Location: Sweden
Device: Sony PRS-350
PDF -> HTML conversion

Would anyone be interested in a PDF -> HTML converter? I'm currently developing one. For single-page PDF documents (one page per page, not those documents with double columns) with justified text, it will be able to:

retain:
Fonts
Paragraphs and indentations
Alignment
PDF's general logical structure with TOC
[graphics]


remove:
Page numbering
[possibly header and footer]
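
To give an idea of what the "remove page numbering" step could look like, here is a minimal sketch. It is not the library's actual code: the function name `strip_page_numbers`, the pattern, and the input format (a list of pages, each a list of text lines) are all assumptions made up for illustration.

```python
import re

# Hypothetical pattern: a line holding only a number, optionally
# preceded by the word "page". Real documents may need more cases.
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def strip_page_numbers(pages):
    """Drop a first or last line that consists only of a page number.

    `pages` is assumed to be a list of pages, each a list of text lines.
    """
    cleaned = []
    for lines in pages:
        lines = list(lines)  # copy, so the caller's data is untouched
        # Only the top and bottom lines are candidates, so numbers
        # that appear inside the body text are left alone.
        if lines and PAGE_NUMBER.match(lines[-1]):
            lines.pop()
        if lines and PAGE_NUMBER.match(lines[0]):
            lines.pop(0)
        cleaned.append(lines)
    return cleaned
```

Checking only the first and last line of each page is a deliberate simplification. A real converter would probably also look at the position of the text on the page.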

However, I have developed this library out of need, and as such I will stop developing it as soon as I get it working for the case described (single page, justified PDF document).

Current status is 90% finished. Only 90% of the development time left, in other words. Say, a month.

The library is written in pure Python (2.6?).

//Humble greetings,
roffLOL

Last edited by roffLOL; 09-30-2011 at 08:17 AM. Reason: Clarification. Written before coffee o'clock.