Eink fonts?


CheyenneDonna
06-18-2011, 11:22 AM
I have a curious problem with some scanned-in, accessible-format books. The accessibility center at school scanned several books for me in a format I could reflow, rather than the usual PDF image. However, some of them just show blank pages when I open them on the reader side. I have no trouble if I open them on the LCD with a PDF viewer. Is the reader not able to display all fonts?

tomsem
06-18-2011, 12:53 PM
It sounds like they forgot to OCR the scanned book, so it is just a series of page images. Those won't show anything if the reader is in PDF Reflow mode. (Is the reader in PDF Reflow mode? Do you see anything when you turn reflow off?) You could check this by opening the file in Adobe Reader and seeing what happens when you turn on reflow (a View option, I think). The Adobe RMSDK that the reader uses has a different reflow algorithm from Adobe Reader's (RMSDK doesn't pay attention to reading-order tags), but the two should behave similarly in this respect.
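tomsem's first guess (no OCR, only page pictures) can also be checked from a script instead of with the I-beam tool. The sketch below is a rough heuristic, not a PDF parser: it assumes content streams are either uncompressed or Flate-compressed (the common case) and simply looks for a text object (`BT`) that uses a text-showing operator (`Tj` or `TJ`). Streams encoded with other filters are not decoded, and binary image data could in principle produce a false match. The function names are hypothetical.

```python
import re
import zlib

# Body of each "stream ... endstream" section. The leading \b keeps the
# "stream" inside "endstream" from being treated as a new stream.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\s*endstream", re.S)
# BT opens a text object; Tj and TJ are the common text-showing operators.
BT_RE = re.compile(rb"\bBT\b")
SHOW_RE = re.compile(rb"\bT[jJ]\b")


def iter_streams(data: bytes):
    """Yield each stream body, inflated when it is Flate (zlib) data."""
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj returns what it can even if the regex trimmed
            # a trailing whitespace byte from the compressed data.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data: uncompressed, or a filter this sketch ignores.
            yield raw


def has_text_layer(data: bytes) -> bool:
    """Heuristic: True if any stream draws text with BT ... Tj/TJ."""
    return any(BT_RE.search(s) and SHOW_RE.search(s) for s in iter_streams(data))


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        with open(name, "rb") as fh:
            verdict = "has a text layer" if has_text_layer(fh.read()) else "images only?"
        print(f"{name}: {verdict}")
```

Run it as `python check_text.py chapter01.pdf chapter02.pdf` (script name hypothetical). A file that reports "images only?" was most likely never OCRed.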

In general, whatever fonts are needed to render the text should be embedded in the PDF. But it is possible to create PDFs without embedded fonts, in which case the viewer tries to find them on the operating system. Even then, I think it would eventually fall back to font substitution and still display the text.
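To check the embedded-font question without Acrobat, here is a rough Python heuristic. It lists the `/BaseFont` names it finds and counts embedded font programs, i.e. `/FontFile` (Type 1), `/FontFile2` (TrueType) and `/FontFile3` (CFF/OpenType) entries. By PDF convention, a name with a six-capital-letter prefix such as `ABCDEF+` is an embedded subset; a name without it may be fully embedded or not embedded at all, which is where the count helps. It assumes uncompressed or Flate-compressed streams and does no real parsing; the function names are hypothetical. Standard fonts such as Helvetica are often left unembedded on purpose because viewers supply them.

```python
import re
import zlib

STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\s*endstream", re.S)
# "/BaseFont /Name": a PDF name ends at whitespace or a delimiter character.
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/\[\]<>(){}%]+)")
# /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (CFF/OpenType).
FONTFILE_RE = re.compile(rb"/FontFile[23]?\b")
# Embedded subsets carry a six-capital-letter tag, e.g. "ABCDEF+Arial".
SUBSET_RE = re.compile(r"^[A-Z]{6}\+")


def inflate(raw: bytes) -> bytes:
    """Inflate Flate data; return anything else unchanged."""
    try:
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw


def font_summary(data: bytes):
    """Return ({font name: looks like an embedded subset}, number of font programs)."""
    # Font dictionaries can sit in the file body or, from PDF 1.5 on, inside
    # compressed object streams, so search both without double counting.
    outside_streams = STREAM_RE.sub(b"", data)
    chunks = [outside_streams] + [inflate(m.group(1)) for m in STREAM_RE.finditer(data)]
    fonts = {}
    programs = 0
    for chunk in chunks:
        for m in BASEFONT_RE.finditer(chunk):
            name = m.group(1).decode("latin-1")
            fonts[name] = bool(SUBSET_RE.match(name))
        programs += len(FONTFILE_RE.findall(chunk))
    return fonts, programs
```

If the file lists several fonts but reports zero font programs, the fonts are probably not embedded and the viewer has to substitute.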

On the other hand, if you turn off PDF reflow and still don't see anything, the authoring system is at fault. For example, I sometimes find that PDFs I create with Apple's PDF writer cannot be read by Acrobat (no text shows up). The Adobe RMSDK that the e-ink reader uses is a little more fault-tolerant, but it probably shares some of the same behavior, if not some of the same code. Adobe has to maintain a very high standard for security, given the ubiquity of Reader, so if an authoring system does anything deviant or strange, it's not surprising that Adobe's PDF rendering system ignores it rather than risk running malicious code. Again, a good way to check is to open the problem PDFs in Adobe Reader and see how it renders them.

Last_of_the_PEs
06-18-2011, 01:25 PM
Open the book in Acrobat Reader (on a PC), then try to capture text using the I-beam tool. If you just get blocks and cannot scrub the text in lines, you're dealing with images, not text.
The eInk viewer will use SOME substituted font where the embedded one isn't included.

CheyenneDonna
06-18-2011, 02:57 PM
I can open the files on a PC with Adobe Reader. OCR was performed; they are searchable. It looks like they scanned with OmniPage CSDK 16 and used OmniPage 17 to create the PDF. (I own OmniPage 18.) No security was set. The fonts are embedded. At random, a line or word will be visible on the reader, but not searchable. I can search the PDF on my PC.

ivanjt
06-18-2011, 06:37 PM
I had something like that happen to some PDF files. A new girl in my office scanned some manuals and did a good job with the OCR, but forgot to edit the diagrams back in, so I had many blank pages. I don't know if this is the problem, but it could be if there are pages of diagrams and tables.

Another thought: have you tried converting them with Calibre to, say, EPUB to see what you get? If that makes the text show up, then there is a problem with the PDF file itself, which may be fixed by converting the EPUB back to PDF.
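With around 100 chapter files, this Calibre test is easier to run from a script. Calibre ships a command-line converter, `ebook-convert input_file output_file`, which picks the output format from the output file's extension. The sketch below assumes only that this tool is installed and on PATH; `build_commands` and `convert_all` are hypothetical names, and it defaults to a dry run that just prints the commands. Calibre's PDF input is known to struggle with two-column layouts, so expect misplaced lines on such books.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(folder: str, out_ext: str = ".epub"):
    """Return one ebook-convert command (as an argument list) per PDF in folder."""
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        commands.append(["ebook-convert", str(pdf), str(pdf.with_suffix(out_ext))])
    return commands


def convert_all(folder: str, dry_run: bool = True):
    """Print (and, unless dry_run, execute) the conversion commands."""
    commands = build_commands(folder)
    if not dry_run and shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert (installed with Calibre) is not on PATH")
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first chapter that fails to convert.
            subprocess.run(cmd, check=True)
    return commands
```

Try `convert_all("chapters")` first to review the list, then `convert_all("chapters", dry_run=False)` (folder name hypothetical).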

At random a line or word will be visible on the reader, but not searchable.
This tends to indicate problems with the embedding of the fonts and/or with the way the reader's display program reacts to the PDF version of the file itself. You can check this by opening the file in a text editor and looking at the first line: it should show something like %PDF-1.3, where the number is the PDF version the file was created with. All of the PDFs I use are 1.3 to 1.5; the Edge user's guide is PDF 1.3. Again, this may or may not be the cause of the problem, but it is worth checking.
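Checking the header in Notepad works for one file; for a whole folder of chapters, a small Python sketch can read the same `%PDF-x.y` header from each file. The 1.5 cut-off used for flagging is only the threshold mentioned in this thread, not a documented limit of the reader, and the function names are hypothetical.

```python
import re
from pathlib import Path

# "%PDF-" followed by major.minor, e.g. "%PDF-1.6".
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF- header, or None if there is none."""
    # Some readers tolerate a few stray bytes before the header, so look
    # through the first 1024 bytes instead of only at offset 0.
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def report_versions(folder: str, newest_ok=(1, 5)):
    """Print each PDF in folder with its header version, flagging newer ones."""
    results = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        with path.open("rb") as fh:
            version = pdf_header_version(fh.read(1024))
        results[path.name] = version
        note = ""
        if version is not None and version > newest_ok:
            note = "  (newer than %d.%d)" % newest_ok
        print(f"{path.name}: {version}{note}")
    return results
```

The header is not the whole story: from PDF 1.4 on, the document catalog may carry a /Version entry that overrides the header, and this sketch does not read it.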

CheyenneDonna
06-19-2011, 12:13 AM
When I use notepad to open one it says %PDF-1.6.
When I view it on the reader, only the images show up, and not all of them.
Converting with Calibre results in many misplaced lines. The PDF is two-column; does Calibre have difficulty with two-column PDF files?

ivanjt
06-19-2011, 06:58 AM
I'm not sure, but that could be the problem. Here, anything we convert to PDF is set to %PDF-1.5 because we had problems displaying anything above that. We can now display PDFs above that because my programmer has recompiled the reading application using the latest libraries available.

I don't have anything higher than 1.5 here to test but I'll get someone in the office to convert something with two columns and diagrams to 1.6 and then test it on my EE - will be late tomorrow at the earliest as it can get a little hectic first thing Monday morning.

Again, I can't be sure about Calibre and later-version PDFs because we've never tried to convert anything above 1.5, and no one has complained about what we have converted. I will also try this tomorrow if possible (we have Windows and Linux versions running in VMs on the server).

Gunnerp245
06-19-2011, 08:49 AM
Calibre does not handle two-column input.

obsessed2
06-19-2011, 09:35 AM
I had a similar problem when I printed Elsevier Pageburst books for my son and moved them to the Edge. Try reprinting one of the scanned documents as a PDF (I know they are already PDFs) and see what happens. I do a lot of work with scanned PDFs, and there is a free virtual PDF printer (doPDF) that re-prints PDFs into files the Edge can use. Once printed with doPDF, the documents become searchable, and you can highlight, annotate, etc.

http://www.dopdf.com/

CheyenneDonna
06-19-2011, 10:33 AM
Thanks for all the suggestions and help offers. I was very puzzled to get blank pages on the Edge reader but not on the PC; it never happened before. I do own doPDF and I guess it's worth a try. The trouble is there are 5 textbooks, split into chapters by the school's accessibility center, so that's about 100 print jobs. Ouch.

I did some experimenting with OmniPage 18. I scanned a page, ran OCR on it, and saved it two different ways: as a searchable PDF image and as a PDF. The searchable image displays blank on the reader and the LCD. The PDF opens fine and reflows. Totally weird.

obsessed2
06-19-2011, 01:03 PM
I feel your pain. I printed out four different medical books from Elsevier Pageburst for my son, some with 40 to 50 chapters. I printed the books chapter by chapter as that is the only thing Elsevier allows you to do. I also printed cover pages and appendices. I then reassembled all the individual PDFs using PDF Converter Professional 6 to make a complete book. I even created a Table of Contents using PDF Converter Pro so my son could easily move back and forth between chapters and appendices. I used doPDF to print each chapter and after assembling the books with PDF Converter Pro they were 100 percent searchable and my son can highlight and annotate the books. I also added the individual chapters to the Edge library for convenience. Took me about a day but was totally worth the effort. I suggest you start with a couple of chapters and see what works. Once I figured out what I was going to do the printing and reassembly was actually quite easy. Wish you luck.

Last_of_the_PEs
06-19-2011, 01:21 PM
I've never had a good result from anything, including Acrobat Pro itself, using 2-column text source. At a former employer, I used an IBML scanner that did it well, exporting to RTF, but that's a $725,000 unit, so it doesn't really "count" as anything WE could possibly use.

ivanjt
06-20-2011, 05:53 PM
Hi Donna, the office has buckled down to work and produced for me a version 1.6, OCRed, two-column text with pictures, diagrams and tables included. It shows up on the e-ink side just like any of the manuals we produce, and the only difference on the LCD side is that the colour in some of the charts shows up.

Looking for differences between what you have and what we have: I assume yours is produced on Windows, while ours uses Linux. Our procedure is to scan each page, take out the pictures and diagrams, then OCR what is left. The OCR file is then opened in LibreOffice, spell-checked and formatted, and the pictures and diagrams are inserted in the correct places. The resulting file is then saved and printed to a virtual PDF printer, which gives us the PDF we want. It is also possible to create PDF files directly from LibreOffice, with good results on simple documents.

Another difference between what we do and what you have is that we get paid to do what we do and I assume you get yours 'free'.

Sorry I can't be of any more help, but being unable to reproduce the problem you have leaves me without a starting point.

CheyenneDonna
06-20-2011, 06:25 PM
That was very cool of you. Yes, I'm on Windows, as I assume my school is, and yes, my services are provided free by the school or government. They are using OmniPage 17; I own OmniPage 18. If I use OmniPage to open one of the files they sent me, reprocess it, and save it as a PDF (where they had saved it as PDF Edited), then I can see it on my reader. Very strange.

I played around a bit and this is what I found. I used OmniPage 18 to scan a page, do OCR, and save it in two different formats. The page I saved as a PDF Searchable Image shows blank on my reader, whether reflow is on or not. The page saved as a PDF works fine. (I have four PDF save choices in OmniPage 18: PDF, PDF Edited, PDF Searchable Image, and PDF with image substitutes.)

ivanjt
06-21-2011, 04:51 AM
That sounds as if you need to check the OmniPage manual to find out what they mean by 'Searchable Image' and what they do to make one. To me it sounds like they place tags of some sort onto the image that locate words, etc. Most probably one or more of these tags are upsetting the display software, which your test supports: saving a plain PDF removes them, and the page then displays.

I don't think it would be any good asking my programmer if she has any ideas, because I know her answer: give me an SDK and then I'll tell you. This is very much like a problem we encountered with a client. They had a lot of old WordStar document files that they couldn't open on their new computers; we found we could open them in a VM, which solved the client's problem. We later found out it was a control code embedded in the document that upset the video card. I think your problem might be something similar.
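One way to probe what a "Searchable Image" PDF adds on top of the scan: such files conventionally draw the scanned picture on the page and put the OCR text over it in text rendering mode 3, which is invisible but still selectable and searchable. Assuming OmniPage's Searchable Image option follows that convention (not confirmed in this thread), the Python sketch below counts the rendering modes set with the `Tr` operator. A result of only mode 3 would mean the file's text is hidden text, which might be what the reader trips over; that link is a guess to test, not an established cause. It assumes uncompressed or Flate-compressed streams, and the function names are hypothetical.

```python
import re
import zlib
from collections import Counter

STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\s*endstream", re.S)
# "<n> Tr" sets the text rendering mode; 3 means neither fill nor stroke (invisible).
TR_RE = re.compile(rb"\b([0-7])\s+Tr\b")


def inflate(raw: bytes) -> bytes:
    """Inflate Flate data; return anything else unchanged."""
    try:
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw


def render_modes(data: bytes) -> Counter:
    """Count every text rendering mode explicitly set in the file's streams."""
    modes = Counter()
    for match in STREAM_RE.finditer(data):
        stream = inflate(match.group(1))
        modes.update(int(m.group(1)) for m in TR_RE.finditer(stream))
    return modes


def text_is_invisible(data: bytes) -> bool:
    """Heuristic: True if every explicitly set mode is 3 (hidden OCR layer).

    Text drawn without any Tr uses the default mode 0 (visible); this
    sketch does not look for that case.
    """
    modes = render_modes(data)
    return bool(modes) and set(modes) == {3}
```

Comparing a "PDF Searchable Image" save of a page against a plain "PDF" save of the same page would show whether this is the difference between them.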