Quote:
Originally Posted by kovidgoyal
For example, suppose I write a cool little web app that wants to "mangle" poems, i.e. extract stanzas from different poems and combine them into a new poem. Now if all my poems were in PDF files I'd have no way of knowing what a stanza is, beyond using some sort of hackish heuristic based on blank lines. This is because the poem is "rendered" in the PDF file and really the only way to identify semantic elements like a "stanza" is to have an entity as intelligent as a human look at the rendered poem.
|
Why would you want to mangle a poem? That's just disgusting!

As for not recognizing stanzas, I don't know why a reflowable PDF should be different from another ebook format (again, not a programmer). How do stanzas stand out in other formats? Surely spacing can be recognized in a PDF just as it would be in another format...?
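
For what it's worth, the "heuristic based on blank lines" from the quote can be sketched in a few lines of Python. This is only an illustration, and it assumes the poem has already been pulled out of the PDF as plain text with its blank lines intact, which is exactly the part that isn't guaranteed. The function name `split_stanzas` is made up for this example.

```python
import re


def split_stanzas(text):
    """Guess stanzas by splitting plain text on runs of blank lines.

    Assumption: `text` is already-extracted plain text. Nothing here
    reads a PDF, and a line containing only spaces counts as blank.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank (or whitespace-only) lines separate two blocks.
    blocks = re.split(r"\n\s*\n", text.strip())
    stanzas = []
    for block in blocks:
        # Keep the non-empty lines of each block, trimmed of stray spaces.
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            stanzas.append(lines)
    return stanzas


sample = "Roses are red\nViolets are blue\n\n  \nSugar is sweet\nAnd so are you\n"
for number, stanza in enumerate(split_stanzas(sample), start=1):
    print(number, stanza)
```

The guess is only as good as the spacing that survives extraction: if a page break falls in the middle of a stanza, or the extracted text has lost its blank lines, the split will be wrong.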
Quote:
Originally Posted by kovidgoyal
And as for your claim of being able to process and extract content from a PDF: can you extract the content from a letter-sized PDF and create a non-image-based 6" PDF from it easily? If you can, I'd really like to know how. And by easily I mean automatically, in no more than 5 minutes.
|
I've never timed myself, but if the document contains no images (which kind of takes away the point of my using a PDF file in the first place) then I can easily do it in under 5 minutes. What I do is save the PDF as a DOC file (which Acrobat Pro allows users to do) and open it in Microsoft Word. I then set the page size to the preset format of 3.47" x 4.54" with 0.15" margins. This reflows the text, obviously, and fits it into Reader-sized pages. Then I convert the document back to PDF. Images need to be resized in Word, which is usually what I wind up doing because I like to include images from articles I take from the Internet. It kind of livens up the document.
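
To put numbers on that page setup, here is a rough sketch of the arithmetic in Python. The only inputs are the 3.47" x 4.54" page and 0.15" margins mentioned above; the 72 points per inch figure is the standard PDF unit, and the helper names (`inches_to_points`, `text_area`) are invented for the example.

```python
POINTS_PER_INCH = 72  # PDF page sizes are normally expressed in points


def inches_to_points(inches):
    """Convert a length in inches to PDF points."""
    return inches * POINTS_PER_INCH


def text_area(page_width, page_height, margin):
    """Return (width, height) of the area left after equal margins on all sides."""
    return page_width - 2 * margin, page_height - 2 * margin


page_w, page_h, margin = 3.47, 4.54, 0.15  # inches, from the post above
area_w, area_h = text_area(page_w, page_h, margin)

print(f"page: {inches_to_points(page_w):.2f} x {inches_to_points(page_h):.2f} pt")
print(f"text area: {area_w:.2f} x {area_h:.2f} in")
```

The same numbers could be fed to any tool that takes a custom page size, so the Word step is not the only way to get there.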

I generally don't spend very long on this, however. As I said above, I'm going to try out BookDesigner to see how it handles images. That's really my big thing with PDFs. I can format a page, resize and place an image anywhere in Word, and know that it will look exactly the same on my Reader. I love it.