07-19-2011, 01:13 PM   #1
arslonga
Connoisseur
Posts: 59
Karma: 10
Join Date: Jul 2011
Device: none
Is this true?

Apparently, every page of a PDF document begins a new paragraph, regardless of whether its first line continues the sentence that ends the previous page. Assuming this is true, is anybody aware of a program that exports a PDF document to an editable format (HTML, DOC) and can override this PDF limitation?

Another question in my first post: when I save a DOC document with images as RTF, its size increases dramatically. Does anybody know why?

Thank you for your time and attention.
07-19-2011, 01:57 PM   #2
WillAdams
Wizard
Posts: 1,234
Karma: 3350652
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
Unfortunately, PDF was designed before the idea of re-flowable electronic documents was prevalent, so yes, each page starts over, and most text-oriented software treats the text on it as a new paragraph. Marcel Weiher's TextLightning.app for Mac OS X does attempt to recognize paragraphs based on text formatting, but it's better to go back to the original source document.
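
For anyone stuck with only the PDF, the page-break problem can be patched up after extraction with a simple heuristic. The following is a rough sketch, not how TextLightning or any other converter actually works: it assumes you already have the text of each page as a string, with paragraphs separated by blank lines, and it joins the last paragraph of a page to the first paragraph of the next page whenever the former does not end in sentence-final punctuation. The function name `join_pages` and the input shape are made up for illustration.

```python
import re

# Sentence-final punctuation, optionally followed by closing quotes or brackets.
SENTENCE_END = re.compile(r"""[.!?]["')\]]*$""")


def join_pages(pages):
    """Return a list of paragraphs, merging those split by a page break.

    `pages` is a list of strings, one per page, with paragraphs separated
    by blank lines (an assumed input shape; PDF itself guarantees nothing).
    """
    paragraphs = []
    for page in pages:
        page_paras = [p.strip() for p in page.split("\n\n") if p.strip()]
        if not page_paras:
            continue  # empty page: nothing to add
        # If the previous page stopped mid-sentence, glue the first
        # paragraph of this page onto it instead of starting a new one.
        if paragraphs and not SENTENCE_END.search(paragraphs[-1]):
            paragraphs[-1] += " " + page_paras.pop(0)
        paragraphs.extend(page_paras)
    return paragraphs


pages = [
    "Intro paragraph.\n\nThis sentence runs over",
    "the page break.\n\nNew paragraph.",
]
for para in join_pages(pages):
    print(para)
```

The heuristic will guess wrong for headings, captions, and lists that legitimately end without punctuation, so the output still needs proofreading.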

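DOC stores embedded images as compact binary data, while RTF is a plain-text format that typically writes them out as hexadecimal text (two characters per byte), hence the file-size change.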
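
One likely contributor to the DOC-to-RTF size increase is easy to demonstrate. RTF is a text format, and embedded pictures are commonly written as hexadecimal digits, so every byte of image data takes two characters in the file. The sketch below only illustrates that arithmetic; the helper name `hex_encoded_size` is invented here, and the random bytes merely stand in for an already-compressed image such as a JPEG. Some word processors are also reported to store an additional uncompressed copy of each picture, which would inflate the file further, but that is not modelled.

```python
import binascii
import os


def hex_encoded_size(data: bytes) -> int:
    """Number of characters needed to write `data` as hexadecimal text."""
    return len(binascii.hexlify(data))


# Stand-in for 10,000 bytes of compressed image data (hypothetical example).
image_bytes = os.urandom(10_000)
print("binary size:", len(image_bytes))
print("hex-text size:", hex_encoded_size(image_bytes))
```

The hex form is always exactly twice the binary size, before any other RTF markup is counted.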
07-19-2011, 08:23 PM   #3
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
There's PDF reflow, here, but don't expect perfection.

The original purpose of PDF was to make sure a document looked exactly the same on every medium and printer; it's meant to emulate paper. Really it's just a map of the exact location of each character. Not only does it not understand that a sentence continues from one page to the next, it doesn't even have the concept of a sentence or paragraph. Those exist only in the source document.
07-20-2011, 06:50 AM   #4
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
Yes, in my experience I think it's true. For instance, if you start a paragraph at the end of a page, and it ends on the next, the close tag will be on the second page.

But you see, most PDF files are saved as regular PDF, with objects that float around (including text, parts of text, or sometimes even individual letters). You can usually spot them right away if you select the text and there are all sorts of spaces between characters or entire groups of characters. These are very difficult to convert because each object (or group of objects) has its own coordinates on the page.
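
To give a feel for why this is hard, here is a rough sketch of what a converter has to do with such floating objects. It assumes each glyph arrives as an `(x, y, width, char)` tuple in page coordinates with y increasing upward; that tuple layout, the function name `chars_to_lines`, and the two tolerance values are all invented for illustration and do not come from any particular PDF library. Glyphs with nearly the same y are treated as one line, and a space is guessed wherever the horizontal gap between neighbours is larger than a threshold.

```python
def chars_to_lines(glyphs, line_tol=2.0, space_gap=1.0):
    """Rebuild text lines from positioned glyphs.

    glyphs    -- iterable of (x, y, width, char) tuples (assumed layout)
    line_tol  -- max difference in y for glyphs to share a line
    space_gap -- horizontal gap above which a space is inserted
    """
    # Group glyphs into lines, top of the page first (largest y first).
    lines = []  # each entry: (y of the line's first glyph, [glyphs])
    for g in sorted(glyphs, key=lambda g: -g[1]):
        if lines and abs(lines[-1][0] - g[1]) <= line_tol:
            lines[-1][1].append(g)
        else:
            lines.append((g[1], [g]))

    result = []
    for _, row in lines:
        row.sort(key=lambda g: g[0])  # left to right
        text = row[0][3]
        for prev, cur in zip(row, row[1:]):
            # Gap between the right edge of prev and the left edge of cur.
            if cur[0] - (prev[0] + prev[2]) > space_gap:
                text += " "
            text += cur[3]
        result.append(text)
    return result


# Glyphs deliberately given out of order, as they may be stored in a file.
glyphs = [
    (5, 88, 5, "k"), (13, 100, 5, "y"), (0, 100, 5, "H"),
    (18, 100, 5, "o"), (0, 88, 5, "O"), (5, 100, 5, "i"),
]
print(chars_to_lines(glyphs))
```

Even this toy version depends on hand-picked thresholds, and it knows nothing about columns, fonts, or paragraphs, which is where real conversions tend to go wrong.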

On the other hand, there are tagged PDF files, which have open and close tags for objects, especially for text segments, making them (relatively) easier to convert, though the results still aren't perfect. Keep in mind that PDF is considered an output format and, as Will suggested, it's always better to go back to the source of the document.


Oh, and when you say size, do you mean file size or display size? Display size is different probably because the margins weren't converted properly. Find out the .doc page size and margins and add them to the .rtf.
All times are GMT -4.