11-26-2011, 08:29 AM   #14
Analogus
Fanatic
Posts: 568
Karma: 2170348
Join Date: Apr 2011
Device: 2x Sony PRS-350; PRS-300 (†), Paperwhite (†), Voyage
Quote:
Originally Posted by Prestidigitweeze
Rizla:

It's very possible that EPUB 3 and Kindle Format 8 will eliminate the necessity of UpSpin's task.

I cannot see why the necessity would vanish. There will still be plenty of PDFs out there, and people will still want to read them.

Regarding handling PDFs:

For my part, I read PDFs very often on my 5" reader, so I developed ;-) the following 'decision tree':

1) Try to read the unaltered file on the reader in reflow mode
2) If the experience is OK --> read on. If the experience is bad, try
a) cropping headers and footers as described above
or
b) converting the PDF into EPUB --> go to (3)
3) Converting the PDF:

Fact: I do not want to experiment with every PDF to see what happens.
Fact: I sometimes want to have pictures and other times just plain text.
Fact: PDFs come in varying quality.

My solution for ALL PDFs:

I use the (sadly not free) software ABBYY PDF Transformer.
It takes EVERY technical kind of PDF and runs a complete OCR process on it. Sometimes the picture frames need a manual correction, which takes about 30 minutes.

Details:
  1. Crop headers and footers (page numbers, ...) with whatever software you want (e.g. Adobe Acrobat)
  2. Load the file into ABBYY PDF Transformer and run only the recognition of the different areas (pictures, text, tables)
  3. Manually correct the areas if necessary, especially picture areas. This step is needed for perhaps 50% of PDFs
  4. Run the OCR and produce an HTML file without the original layout
  5. Open it in MS Word and save it as RTF. Close Word and reopen the RTF. Save it a second time as HTML.
  6. Load the HTML file into Calibre
  7. Convert it into EPUB
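
Steps 6 and 7 can also be done without the Calibre GUI, using the ebook-convert command-line tool that ships with Calibre. A minimal sketch in Python, assuming ebook-convert is on the PATH; the function names and the file name book.html are hypothetical, and no conversion options beyond input and output file are set:

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_convert_command(html_file: str, epub_file: Optional[str] = None) -> List[str]:
    """Build the argument list for Calibre's ebook-convert tool (hypothetical helper)."""
    source = Path(html_file)
    # Default output: same name as the HTML file, but with an .epub extension.
    target = Path(epub_file) if epub_file else source.with_suffix(".epub")
    return ["ebook-convert", str(source), str(target)]


def convert(html_file: str) -> None:
    """Run the conversion, failing early if Calibre's CLI is not installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(html_file), check=True)
```

Calling convert("book.html") would write book.epub next to the HTML file. The result should be the same kind of EPUB the GUI produces with default settings, though that is an assumption worth checking on a sample file.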

That procedure sounds ridiculous, but there is only one part that may take longer: correcting the areas in the ABBYY software.

I have converted a huge number of PDFs in various ways and ended up with the procedure described.
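
The decision tree near the top of this post could be sketched roughly as follows. This is only an illustration: the names are hypothetical, and trying cropping before converting is an assumption, since the post lists the two as alternatives without fixing an order.

```python
from enum import Enum


class Action(Enum):
    READ_AS_IS = "read the unaltered PDF in reflow mode"
    CROP = "crop headers and footers, then try again"
    CONVERT = "convert the PDF to EPUB"


def next_action(reflow_ok: bool, cropping_tried: bool) -> Action:
    """Pick the next step for a PDF (assumed order: crop first, convert last)."""
    if reflow_ok:
        # Step 2: the reading experience is fine, nothing else to do.
        return Action.READ_AS_IS
    if not cropping_tried:
        # Step 2a: the cheaper fix, tried first in this sketch.
        return Action.CROP
    # Step 2b / 3: fall back to the full conversion.
    return Action.CONVERT
```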

A.

Last edited by Analogus; 11-26-2011 at 08:32 AM.