MobileRead Forums - View Single Post

Daithi · 10-28-2009, 10:45 AM

If B&N hasn't done so, it seems to me that they should give each of the Admins that are answering questions a Nook, so that they can actually confirm how the device works with their own eyes. Consulting the "product team" seems like it would be rife with misinterpretations that lead to providing erroneous answers.

Wallcraft,
It appears to me that you are very well-informed about PDFs, so if you don't mind, I'm hoping you can confirm my understanding of how PDFs work or set my misconceptions straight.

My understanding of PDFs is that each page in a PDF document is an image, usually acquired with a digital camera or scanner. Typically, OCR software is then applied to the PDF images and the resulting text is stored as data within the PDF file. When you view a PDF on your computer you will normally see the images and not the OCRed text. However, when you perform functions, such as searching for text, the OCRed text is what is actually searched.

When viewing a PDF on a device with a small screen, such as the Nook, the PDF can be shown as reflowable text, and this is done by replacing the text within an image with the OCRed text. Pictures and diagrams within the original source document remain in place, and only the text is replaced with the OCRed text.

One drawback to this approach is that not all PDFs have had OCR software applied to them. Such PDFs are nothing more than a series of images of the original document. The implication is that the Nook would have a difficult time displaying one of these documents, as it could only do so by displaying an image of each page (these PDFs are not reflowable). The page images would be resized to fit within the Nook's 6" screen and as a result would likely not be readable.
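To put a rough number on that resizing, here is a back-of-the-envelope sketch in Python. The 3.6 x 4.8 inch screen (a 6" diagonal at a 3:4 aspect ratio), the US Letter page, and the 12 pt body text are all assumptions for illustration, not measured Nook specs.

```python
def fit_scale(page_w_in, page_h_in, screen_w_in, screen_h_in):
    """Scale factor that fits the whole page on screen, keeping its aspect ratio."""
    return min(screen_w_in / page_w_in, screen_h_in / page_h_in)

# Assumed values (not from the post): 6" diagonal screen at 3:4, US Letter page.
SCREEN_W_IN, SCREEN_H_IN = 3.6, 4.8
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0
BODY_TEXT_PT = 12  # assumed size of the text on the original page

scale = fit_scale(PAGE_W_IN, PAGE_H_IN, SCREEN_W_IN, SCREEN_H_IN)
effective_pt = BODY_TEXT_PT * scale

print(f"scale: {scale:.2f}, 12 pt text appears as about {effective_pt:.1f} pt")
# prints: scale: 0.42, 12 pt text appears as about 5.1 pt
```

Under those assumptions the page shrinks to well under half size, which is why a full-page image would probably be too small to read.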

Another drawback is that even when a PDF contains OCRed text, that text often contains lots of errors. At worst, these OCR errors leave whole sections of text completely unreadable; at best, the text contains at least one error every page or two, which makes reading it highly annoying.

Another option for small screen sizes is to provide a zoom feature. This allows the user to zoom in on a portion of a page's image, and thus make it readable without having to rely on OCRed text. The drawback to this approach is that you can only view one section of a page at a time. This could be really annoying if you had to zoom in to the top left quarter of a page to read the first two-thirds of a line, then move to the top right quarter to read the last third of the line, then back to the top left quarter to read the first two-thirds of the next line, etc.
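Here is a rough sketch of how quickly that back-and-forth adds up. The function name, the 40-line page, and the simple model (pan right until the line is finished, then one pan back to the left edge for the next line) are my own assumptions, not how any particular reader actually behaves.

```python
import math
from fractions import Fraction

def pans_per_page(lines_on_page, visible_fraction):
    """Count horizontal pans needed to read a page line by line.

    visible_fraction is the share of a line's width visible at once.
    """
    # Number of viewport positions needed to cover one full line.
    views = math.ceil(1 / Fraction(visible_fraction))
    if views == 1:
        return 0  # the whole line fits, so no horizontal panning
    pans_right = lines_on_page * (views - 1)  # moves within each line
    pans_back = lines_on_page - 1             # one jump back per new line
    return pans_right + pans_back

# Assumed: 40 lines per page, two-thirds of each line visible when zoomed.
print(pans_per_page(40, Fraction(2, 3)))  # prints 79
```

So in the two-thirds case, a single 40-line page would take something like 79 sideways moves under this model.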

The best solution to viewing PDFs is probably to use a device that has a screen large enough to display a readable version of the original document's images. This is the approach used by the Kindle DX, the iRex 1000s, and the upcoming Plastic Logic Que proReader. No need to display OCRed text with errors and no need for zooming (but reflow and zoom aren't forbidden on these devices -- although the Kindle DX doesn't support them). There are occasions when zooming, even on a large screen device, would be beneficial -- such as zooming in to view the detail of an image. Likewise, even large screens would benefit from reflowable text, because you can resize reflowable text to make it easier to read. The only way to resize the text in an image is to make the entire image larger, and the enlarged image may not fit within an ereader's screen.

So is that about right, or am I going off the rails somewhere?

Last edited by Daithi; 10-28-2009 at 11:00 AM.