Quote:
Originally Posted by MarjaE
... It figures. If it's useful, it will be phased out. Like touchscreen-free e-readers. If it would be useful, it won't be developed.
It's not that at all. It is the nature of PDF.
PDFs do not necessarily contain any sentence, paragraph, or other structural information. They contain merely strings of characters and/or other graphic elements along with their position on the page. Two characters that are adjacent on the page can be stored in totally different places in the digital file and have absolutely no internal relationship.
It takes an AI to assemble the different pieces in a logical order. At present, such AIs are limited to using the position information in order to reassemble the document flow. We'll never have truly good tools until an AI is developed that can comprehend sentence structure and use that in order to do the reassembly. The tools we have, as poor as they are, actually do a rather decent job given the Herculean task put before them.
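To make the position-only approach concrete, here is a rough sketch in Python. It is not how any particular conversion tool works; the Fragment type, the reassemble function, the coordinates, and the line_tolerance value are all made up for illustration, and y is assumed to be measured from the top of the page. It just shows text fragments stored in an arbitrary file order being put back into reading order using nothing but their positions.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical text fragment: a string plus its position on the page."""
    text: str
    x: float  # left edge of the fragment
    y: float  # baseline, measured from the top of the page (assumption)


def reassemble(fragments: List[Fragment], line_tolerance: float = 2.0) -> List[str]:
    """Rebuild lines of text using only position information."""
    # Order top to bottom first, then left to right.
    ordered = sorted(fragments, key=lambda f: (f.y, f.x))

    lines: List[List[Fragment]] = []
    current: List[Fragment] = []
    for frag in ordered:
        # A jump in y larger than the tolerance starts a new line.
        if current and abs(frag.y - current[0].y) > line_tolerance:
            lines.append(current)
            current = []
        current.append(frag)
    if current:
        lines.append(current)

    # Within a line, baselines can wobble slightly, so sort by x alone.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x))
            for line in lines]


# Fragments in "file order", which has nothing to do with reading order.
scrambled = [
    Fragment("wall", 125, 10.0), Fragment("great", 115, 29.7),
    Fragment("Humpty", 0, 10.0), Fragment("fall", 150, 30.0),
    Fragment("sat", 80, 10.4), Fragment("Dumpty", 40, 30.5),
    Fragment("Humpty", 0, 30.0), Fragment("a", 115, 10.0),
    Fragment("Dumpty", 40, 9.8), Fragment("had", 80, 30.0),
    Fragment("on", 100, 10.0), Fragment("a", 105, 30.0),
]

print(reassemble(scrambled))
# → ['Humpty Dumpty sat on a wall', 'Humpty Dumpty had a great fall']
```

This works on a single column of plain text. Give it a two-column page and it will happily read straight across both columns, because nothing in the positions says where one sentence ends and another begins. That is exactly the limitation described above.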
Humpty Dumpty sat on a wall <=the book before conversion to PDF
Humpty Dumpty had a great fall <=the book being converted to PDF
All the King's Horses and all the King's Men <=the available conversion tools
Couldn't put Humpty together again <= FOL with PDFs