#1 |
Banned
Posts: 397
Karma: 85500
Join Date: Feb 2011
Location: Sydney
Device: Sony PRS350, Onyx M92, Onyx T68 (defective!)
|
PDF is flowable?
I always thought PDFs were displayed as is, like a photo. I didn't know they could be flowable like EPUB.
Are all PDFs flowable? Some PDFs are obviously just photocopies of textbooks; are they flowable? A really good surprise. |
#2 |
Grand Sorcerer
Posts: 11,732
Karma: 128354696
Join Date: May 2009
Location: 26 kly from Sgr A*
Device: T100TA,PW2,PRS-T1,KT,FireHD 8.9,K2, PB360,BeBook One,Axim51v,TC1000
|
As a rule PDFs are best assumed *not* flowable.
*Some* PDFs can be *created* to reflow but it is hit or miss and usually not particularly effective. PDF reflow is not something to count on. |
#3 |
Wizard
Posts: 2,230
Karma: 7145404
Join Date: Nov 2007
Location: Southern California
Device: Kindle Voyage & iPhone 7+
|
PDF is sort of a wrapper for different types of data. You already noticed that some data is just a graphic image (a photocopy). Those pages cannot be intelligently reflowed because they are just a bunch of dots. If the data is text, then it becomes possible to reflow it and, for example, to print it in a different font.
|
#4 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
Quote:
|
|
#5 |
Wizard
Posts: 3,042
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
I'm not an expert, but I believe there are tags that can be added to text to make it reflowable. As far as I know, there is no such stuff for tables, equations, code, line drawings,... So even if the text tags have been added, reflow doesn't work with technical documents with the above elements. It really only works for paragraphs of text with the odd image embedded between them (and as dwig implies, only if the tags have been added).
Last edited by rkomar; 03-14-2012 at 04:11 PM. |
#6 | |
Grand Sorcerer
Posts: 11,732
Karma: 128354696
Join Date: May 2009
Location: 26 kly from Sgr A*
Device: T100TA,PW2,PRS-T1,KT,FireHD 8.9,K2, PB360,BeBook One,Axim51v,TC1000
|
If you pay for the full Acrobat application you can open PDFs (as long as they're not locked), embed the reflow tags, and hope to see some reflow. But the results tend to be mostly underwhelming, even for all-text documents. |
|
#7 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
In the common basic PDF, text exists as a block only so long as there is no deviation from the default linear flow. Any font change (regular to italic, ...) breaks the block and starts a new separate block. Any alteration in the letterspacing/kerning also ends a block and begins another. As a result, what appears to be a paragraph when the document is displayed is actually, at the least, one separate text block for each line and often several blocks per line. The presence of tagged text blocks allows readers that are aware of them to skip the fixed-layout version of the text, with all its separate pieces, and replace it with the flowable block. With these viewers and with tagged PDFs you have the option to turn on reflow at the sacrifice of the carefully designed layout. |
|
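To make the "several blocks per line" point concrete, here is a minimal sketch in Python. It does not parse a real PDF: the `Run` class, its fields, and the sample coordinates are all hypothetical stand-ins for the positioned fragments a viewer would find on a page. It only shows how a program has to regroup loose fragments by baseline before it can even recover one visual line.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One positioned text fragment (hypothetical model, not a PDF library type)."""
    x: float     # left edge of the fragment
    y: float     # baseline; larger y is higher on the page
    font: str
    text: str


def runs_to_lines(runs, y_tolerance=1.0):
    """Group fragments that share a baseline and join them left to right."""
    ordered = sorted(runs, key=lambda r: (-r.y, r.x))
    lines, current = [], []
    for run in ordered:
        # A baseline jump larger than the tolerance starts a new visual line.
        if current and abs(current[0].y - run.y) > y_tolerance:
            lines.append(current)
            current = []
        current.append(run)
    if current:
        lines.append(current)
    return ["".join(r.text for r in sorted(line, key=lambda r: r.x))
            for line in lines]


# One sentence, stored as four fragments because of a single italic word.
runs = [
    Run(160, 700, "Regular", " breaks the"),
    Run(72, 688, "Regular", "block in three."),
    Run(72, 700, "Regular", "A word in "),
    Run(130, 700, "Italic", "italic"),
]
lines = runs_to_lines(runs)
```

Even after this regrouping, nothing in the fragments says whether the two lines belong to the same paragraph. That is the information tagged PDFs are meant to supply.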
#8 |
Wizard
Posts: 2,230
Karma: 7145404
Join Date: Nov 2007
Location: Southern California
Device: Kindle Voyage & iPhone 7+
|
Well, using 3rd party software (or full Acrobat) any time there is text in a PDF I can extract it. If I can extract it, as text, then it has to be reflowable in certain applications.
|
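As a rough illustration of that claim, once text has been extracted as one clean string, rewrapping it to a new width is the easy part. The sketch below uses Python's standard `textwrap` module; the `reflow` helper and the chosen widths are only illustrative, and it assumes the extraction step already produced the text in the right order.

```python
import textwrap

extracted = ("If I can extract it, as text, then it has to be "
             "reflowable in certain applications.")


def reflow(text, width):
    """Collapse the original line breaks, then wrap to the requested width."""
    return textwrap.wrap(" ".join(text.split()), width=width)


narrow = reflow(extracted, 24)   # e.g. a small e-reader screen
wide = reflow(extracted, 60)     # e.g. a tablet in landscape
```

The hard part is everything this sketch assumes away: getting the words out in reading order, with paragraph boundaries intact.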
#9 |
Wizard
Posts: 3,042
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
PDF files are programs that execute inside of a state engine. They combine data with instructions, and what is done with either depends on the state of the engine at that time. For example, the exact position at which one character is rendered may depend on where the preceding character was located. Modifying what happens at some stage of rendering could have drastic effects on what comes after, since the engine state could be different than what was expected when the PDF file/program was written. I think these reflow tags work to alleviate some of that problem, breaking the text into smaller independent objects that can be relocated as a group as far as the engine is concerned. The main point is that you can't think of a PDF file as content and metadata in arbitrary arrangements; it is really a set of precise instructions that have to be followed consecutively according to strict rules. Applying reflow to, say, a mathematical paper will show you what happens when you start messing with the engine in arbitrary ways while it's working (i.e. you get gibberish).
|
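The state-engine idea can be sketched with a toy interpreter in Python. The operator names `Tf`, `Td`, and `Tj` are borrowed from PDF's text operators, but everything else is an assumption made for illustration: the fixed `char_width` factor, the way `Td` simply shifts the pen, and the `render` function itself are far simpler than a real PDF renderer.

```python
def render(ops, char_width=0.5):
    """Run a toy instruction list and return (char, x, y) for each glyph."""
    x, y, size = 0.0, 0.0, 10.0   # engine state carried between instructions
    placed = []
    for op, *args in ops:
        if op == "Tf":            # set font size (toy: size only)
            size = args[0]
        elif op == "Td":          # move the pen by (dx, dy)
            x += args[0]
            y += args[1]
        elif op == "Tj":          # show text; every glyph advances the pen
            for ch in args[0]:
                placed.append((ch, round(x, 2), y))
                x += char_width * size
    return placed


original = [("Tf", 10), ("Td", 100, 700), ("Tj", "ab"), ("Tf", 20), ("Tj", "c")]
# Change only the first font size, as a crude stand-in for "reflowing".
modified = [("Tf", 12), ("Td", 100, 700), ("Tj", "ab"), ("Tf", 20), ("Tj", "c")]

c_before = render(original)[2]
c_after = render(modified)[2]
```

Nothing about the instruction that draws `c` changed, yet it lands somewhere else because the state left behind by the earlier instructions is different.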
#10 | |
Banned
Posts: 200
Karma: 289206
Join Date: Dec 2011
Device: Onyx M92
|
That is my experience too. With programs like ABBYY you can transform any PDF into PDF/A, so it is readable like text: theoretically, that is. If there are no pictures, just text, you get pretty good results. The resulting file can be reflowed, often very well. If there are many pictures, unusual fonts etc. you will not be very happy ... |
|
#11 |
Liseur de Bonne Aventure
Posts: 374
Karma: 2176666
Join Date: Sep 2008
Location: Paris, France
Device: PRS T1
|
I've played for absolute ages with PDF documents to get them on my reader, and have very seldom succeeded. Even simple .txt documents embedded in PDFs lose their end-of-line properties, and it is an absolute mess to recreate a proper flow of text. Sure, you can add the tags, even try to automate them, but it takes a lot of work and too much time. My advice to FinancialWar: just don't think of these documents as flowable; technical issues aside, they're really not.
|
#12 |
Wizard
Posts: 1,609
Karma: 9211856
Join Date: Jan 2010
Device: kindle Oasis 2018, kindle 4 NT, kindle PW2, iPhone, iPad mini
|
I have a nook that reflows pdfs sold as ebooks. I have read a lot of them, too, because initially many of our Overdrive library books were only available as pdf.
It's clunky. You can get the text larger (generally you get to choose between microscopic, small, and HUGE), but there are always hyphenation* and page break problems, and sometimes the header/page number gets folded into the text. Images are an issue. If you're reading a pdf intended to be an ebook without much reliance on images, you can get reflow. However, if your pdf has no underlying text or formatting, fuhgeddaboutit. * the choice in hyphenation is between dropped hyphens and retained hyphens. Dropped hyphens are IMO infinitely preferable. Retained end of line hy-phens can make the text near-ly unreadable. |
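The dropped-versus-retained choice can be sketched in a few lines of Python. This is not how the Nook or any particular reader does it; `join_lines` is a hypothetical helper, and its rule (a trailing hyphen followed by a lowercase letter is treated as a line-break hyphen) is a heuristic that will also wrongly merge genuine compounds such as "well-" / "known".

```python
def join_lines(lines, drop_hyphens=True):
    """Join fixed-layout lines into one flowing string."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if out.endswith("-") and line[0].islower():
            # Word was split across lines: glue the halves together,
            # with or without the end-of-line hyphen.
            out = (out[:-1] if drop_hyphens else out) + line
        elif out:
            out += " " + line
        else:
            out = line
    return out


page_lines = [
    "Retained end of line hy-",
    "phens can make the text near-",
    "ly unreadable.",
]
dropped = join_lines(page_lines, drop_hyphens=True)
retained = join_lines(page_lines, drop_hyphens=False)
```

With `drop_hyphens=True` the sentence reads normally; with `drop_hyphens=False` it reproduces the "hy-phens" effect described above.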
#13 | |
Interested Bystander
Posts: 3,726
Karma: 19728152
Join Date: Jun 2008
Device: Note 4, Kobo One
|
You cannot 100% reliably extract text in the correct order. A two column PDF might be laid out as all of column one, then all of column two, or as the first line of both columns, then the second line of both columns... You could have a perfectly valid PDF, which displayed fine on the screen, which printed all the letter 'a's, then all the letter 'b's, and so on. You cannot reliably extract sentence and paragraph endings. You cannot reliably tell whether a new page should or should not start a new paragraph. In short, PDF is excellent at being a final display format, and poor at being a transitional format. |
|
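The two-column case is easy to reproduce with a small Python sketch. The fragments below are made-up `(x, y, text)` tuples listed in the order they might appear in the file, and `split_x` is an assumed, hand-supplied column boundary; a real converter would have to guess it, which is exactly where things go wrong.

```python
def naive_order(fragments):
    """Return the text in file order, as a simple extractor might."""
    return [text for _, _, text in fragments]


def column_order(fragments, split_x):
    """Read the left column top to bottom, then the right column."""
    left = sorted((f for f in fragments if f[0] < split_x), key=lambda f: -f[1])
    right = sorted((f for f in fragments if f[0] >= split_x), key=lambda f: -f[1])
    return [text for _, _, text in left + right]


# A two-column page stored row by row: first line of both columns, then the second.
fragments = [
    (72, 700, "L1"), (320, 700, "R1"),
    (72, 688, "L2"), (320, 688, "R2"),
]
file_order = naive_order(fragments)
reading_order = column_order(fragments, split_x=300)
```

Both orders come from the same perfectly valid page. Only the second matches how a person reads it, and it depended on knowing where the column gap is.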
#14 |
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
FWIW, extracting text *mostly* works. I'd say 85% or more of text-based PDFs (not scans) convert fairly well to Word or HTML formats... and then need cleanup. Remove the headers & page #'s, which extract as just text. Get rid of the forced paragraph breaks at the ends of pages. Find the chapter headers and fix them. (They might be fine. They might be converted to plain text, depending on various font issues.) Look for sets of short lines of text--dialogue especially--that were all crammed into one paragraph.
The text itself tends to extract fine (if there weren't columns or magazine layouts to deal with), but the formatting needs a thorough touchup to be useful. |
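Part of that cleanup can be automated. The following Python sketch covers three of the steps: dropping running headers, dropping bare page numbers, and undoing the forced paragraph break at the end of a page. The `clean_pages` function, its "appears on most pages" header rule, and the sentence-ending test are all assumptions for illustration; chapter headings and crammed dialogue would still need manual fixing.

```python
import re
from collections import Counter

PAGE_NUMBER = re.compile(r"\d+")
SENTENCE_END = (".", "!", "?", '"')


def clean_pages(pages):
    """pages: list of pages, each a list of already-unwrapped paragraph strings."""
    seen = Counter()
    for page in pages:
        seen.update({line.strip() for line in page})
    # Treat text repeated on most pages as a running header or footer.
    headers = {t for t, n in seen.items() if n > 1 and n > len(pages) / 2}

    paragraphs = []
    for page in pages:
        first_on_page = True
        for line in page:
            text = line.strip()
            if not text or text in headers or PAGE_NUMBER.fullmatch(text):
                continue
            # If the previous page ended mid-sentence, continue that paragraph.
            if first_on_page and paragraphs and not paragraphs[-1].endswith(SENTENCE_END):
                paragraphs[-1] += " " + text
            else:
                paragraphs.append(text)
            first_on_page = False
    return paragraphs


pages = [
    ["My Book", "It was a dark night.", "The rain fell on", "1"],
    ["My Book", "the roof all evening.", "Morning came.", "2"],
]
cleaned = clean_pages(pages)
```

The header rule will also delete any legitimate line that happens to repeat on most pages, so the output still deserves a read-through.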
#15 |
Wizard
Posts: 2,230
Karma: 7145404
Join Date: Nov 2007
Location: Southern California
Device: Kindle Voyage & iPhone 7+
|
Murray, good point about the order (or possible lack thereof). I have seen that myself.
|