07-19-2011, 02:13 PM   #1
arslonga
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2011
Device: none
Is this true?

Apparently, any PDF document page begins as a new paragraph, independently of whether its first line is part of the sentence that ends the previous page. Assuming this is true, is anybody aware of a program that exports a PDF document to an editable format (html, doc) with the ability to override this PDF limitation?

Another question in my first post: when I save a DOC document with images as RTF, its size increases dramatically. Does anybody know why?

Thank you for your time and attention.
07-19-2011, 02:57 PM   #2
WillAdams
Guru
Posts: 980
Karma: 1915000
Join Date: Feb 2008
Device: Sony PRS-600, Fujitsu Stylistic ST-4121
Unfortunately, PDF was designed before re-flowable electronic documents were prevalent, so yes, each page starts over, and most text-oriented software treats the text on it as a new paragraph. Marcel Weiher's TextLightning.app for Mac OS X does attempt to recognize paragraphs based on text formatting, but it's better to go back to the original source document.

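As for the DOC-to-RTF size increase: RTF is a plain-text format, so embedded picture data is written out as hexadecimal digits, which roughly doubles it compared with the binary storage in a .doc. Word may also save an extra WMF copy of each picture alongside the original. Hence the file-size change.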
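To put a rough number on the RTF side of this: RTF is a text format, so binary picture data embedded in it is normally written as hexadecimal digits, two characters per byte. Here is a minimal sketch of that arithmetic; the function name and the stand-in "image" bytes are made up for illustration, and it ignores anything else a word processor may add on export.

```python
def rtf_hex_size(binary: bytes) -> int:
    """Size in bytes of the data once written as hexadecimal text,
    which is how RTF picture groups usually carry image data."""
    return len(binary.hex())


# Hypothetical stand-in for an embedded image (real data would be JPEG/PNG bytes).
image = bytes(range(256)) * 40

print(len(image), rtf_hex_size(image))  # prints "10240 20480"
```

So even before any extra copies of the picture are written, the image portion of the file about doubles.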
07-19-2011, 09:23 PM   #3
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
There's PDF reflow, but don't expect perfection.

The original purpose of PDF was to make sure a document looked exactly the same on every medium and printer; it's meant to emulate paper. Really it's just a map of the exact location of each character. Not only does it not know that a sentence continues from one page to the next, it doesn't even have a concept of sentence or paragraph. Those exist only in the source document.
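Since the format itself carries no paragraph information, a converter has to guess. One common guess, sketched here in Python under stated assumptions, is to treat the first paragraph of a page as a continuation when the last paragraph of the previous page does not end in sentence-closing punctuation. The function name, the input shape (one list of paragraph strings per page) and the punctuation rule are all assumptions for illustration, not how any particular tool works.

```python
import re

# Assumed rule: a paragraph is "finished" if it ends in . ! ? or :
# optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r"[.!?:][\"')\]]*$")


def merge_pages(pages):
    """Flatten per-page paragraph lists, joining a paragraph that was
    split by a page break back onto its first half."""
    merged = []
    for page in pages:
        first_on_page = True
        for para in page:
            para = para.strip()
            if not para:
                continue
            if first_on_page and merged and not SENTENCE_END.search(merged[-1]):
                # Previous page ended mid-sentence: treat this as a continuation.
                merged[-1] = merged[-1] + " " + para
            else:
                merged.append(para)
            first_on_page = False
    return merged


pages = [
    ["Intro paragraph.", "This sentence starts on page one and"],
    ["continues on page two.", "A new paragraph."],
]
for paragraph in merge_pages(pages):
    print(paragraph)
```

It will guess wrong on headings, captions, and paragraphs that happen to end exactly at the page break, which is part of why perfection isn't on offer.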
07-20-2011, 07:50 AM   #4
DSpider
Evangelist
Posts: 428
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
Yes, from my experience I think it's true. For instance, if you start a paragraph at the end of a page, and it ends on the next, the close tag will be on the second page.

But you see, most PDF files are saved as regular (untagged) PDF, with objects that float around (including text, parts of text, or sometimes even individual letters). You can usually spot them right away: select the text and there are all sorts of spaces between characters or entire groups of characters. These are very difficult to convert because each object (or group of objects) has its own coordinates on the page.
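To give an idea of what a converter is up against with positioned objects, here is a rough Python sketch that rebuilds lines of text from individually placed characters. The input format (a list of `(x, y, char)` tuples), the fixed `char_width`, and the `gap` threshold are all assumptions for illustration; real extractors work from actual font metrics and are far more involved.

```python
from collections import defaultdict


def glyphs_to_lines(glyphs, gap=2.0, char_width=6.0):
    """Rebuild text lines from (x, y, char) tuples.

    Glyphs sharing a y coordinate are treated as one line. A jump in x
    larger than char_width + gap is assumed to be a word space.
    """
    rows = defaultdict(list)
    for x, y, ch in glyphs:
        rows[y].append((x, ch))

    lines = []
    # PDF's default y axis points up, so larger y values are higher on the page.
    for y in sorted(rows, reverse=True):
        text, prev_x = "", None
        for x, ch in sorted(rows[y]):
            if prev_x is not None and x - prev_x > char_width + gap:
                text += " "
            text += ch
            prev_x = x
        lines.append(text)
    return lines


# Characters arrive in no particular order, each with its own position.
glyphs = [
    (20, 700, "y"), (0, 688, "o"), (0, 700, "H"),
    (26, 700, "o"), (6, 688, "k"), (6, 700, "i"),
]
print(glyphs_to_lines(glyphs))  # prints ['Hi yo', 'ok']
```

When the threshold is off, you get exactly the stray or missing spaces you see when selecting text in such a file.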

On the other hand, there are tagged PDF files, which have open and close tags for objects, especially for text segments, making them (relatively) easier to convert, though still not perfectly. Keep in mind that PDF is considered an output format and, as Will suggested, it's always better to go back to the source of the document.


Oh, and when you say size, do you mean file size or display size? If the display size is different, it's probably because the margins weren't converted properly. Find out the .doc page size and margins and apply them to the .rtf.