I read the thread 'Converting a PDF to mobi and having it come out right?', which was about converting the very book I have on hand to mobi. BTW, the book is 'The Girl on the Dock' and is only available as a PDF. I just spent a day and a half trying various methods of converting it to epub. I think whoever decided to publish this book as a two-facing-page PDF should be fired, if not molested with Petra's magic wand. ;-)
The solution I finally settled on is as follows, step by step. It produces a decent epub with all the illustrations, but some sentences are inexplicably split, and the paragraphs have no spacing between them no matter what I tried. I could hand-edit these, but it is not worth my time. I suspect the lousy PDF is not formatted correctly. There is also no TOC, but that is because the stupidly authored PDF has none. I could add one, but for 5 short chapters it doesn't seem worthwhile. Note that I saved the cropped file to HTML because Calibre does an infinitely better conversion of HTML than of PDF.
Procedure:
1. Open PDF in Acrobat X:
Tools->Pages->Header & Footer->Remove…:
Removes only the text above the upper hairline.
Tools->Pages->Crop:
Select the entire region between the hairlines at full width. Double-click the selection. Select Page Range->All and click OK. This crops all pages and can be undone. Do not check 'Remove White Margins' or the crop will include the hairlines.
Tools->Protection->Remove Hidden Information:
When Status is 'Finding Hidden Information…Done', click Remove.
File->Save As->More Options->HTML Web Page->Settings…:
Check 'Include Images'. Uncheck 'Run OCR if needed' or it produces unwanted artifacts.
See 'Cropping Pages Permanently with Acrobat Pro' for more information.
2. The HTML file needs touching up: the 'illuminated' first character of each chapter is missing. Open the file in a plain text editor and find ">one<", then type in the missing 'P' in 'Petra'. Find ">two<" and type in the missing 'T' in 'The'. Repeat for ">three<" through ">five<", typing in each missing uppercase character.
3. Calibre CLI:
ebook-convert "The Girl on the Dock.html" "The Girl on the Dock.epub" --no-default-epub-cover --pretty-print --preserve-cover-aspect-ratio --enable-heuristics --insert-blank-line --cover "cover_image.png" --title "The Girl on the Dock" --authors "G. Norman Lippert"
This is specific to my book, but it could easily be adapted to any book.
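Steps 2 and 3 could also be scripted. The Python sketch below is an untested alternative, not the procedure used above. `restore_drop_caps`, `build_convert_args`, `main` and `DROP_CAPS` are hypothetical names. The sketch assumes that the missing letter belongs at the start of the first text node following each chapter marker (">one<" and so on). Only the 'P' and 'T' fixes are known from step 2, so the letters for ">three<" to ">five<" must be added by hand. The UTF-8 encoding of Acrobat's HTML output is also an assumption. The ebook-convert flags are exactly the ones from step 3.

```python
import re
import subprocess
from pathlib import Path

# Chapter marker -> missing first letter. Only 'P' (Petra) and 'T' (The) are
# known; add ">three<" to ">five<" yourself (hypothetical mapping).
DROP_CAPS = {">one<": "P", ">two<": "T"}

# A '>' followed by optional whitespace, then a character that is neither
# '<' nor whitespace, i.e. the start of the next text node.
_NEXT_TEXT = re.compile(r">\s*([^<\s])")


def restore_drop_caps(html: str, fixes: dict[str, str]) -> str:
    """Insert each missing letter at the start of the first text node
    after its chapter marker (assumed HTML layout)."""
    for marker, letter in fixes.items():
        start = html.find(marker)
        if start == -1:
            raise ValueError(f"marker {marker!r} not found")
        # Start at the marker's trailing '<' so the '>' ending that tag is seen.
        match = _NEXT_TEXT.search(html, start + len(marker) - 1)
        if match is None:
            raise ValueError(f"no text found after {marker!r}")
        pos = match.start(1)
        html = html[:pos] + letter + html[pos:]
    return html


def build_convert_args(src: str, title: str, authors: str,
                       cover: str = "cover_image.png") -> list[str]:
    """Build the step 3 ebook-convert command for any HTML file."""
    dst = Path(src).with_suffix(".epub")
    return [
        "ebook-convert", src, str(dst),
        "--no-default-epub-cover", "--pretty-print",
        "--preserve-cover-aspect-ratio", "--enable-heuristics",
        "--insert-blank-line",
        "--cover", cover, "--title", title, "--authors", authors,
    ]


def main() -> None:
    src = "The Girl on the Dock.html"
    path = Path(src)
    # UTF-8 is an assumption; check what Acrobat actually wrote.
    html = path.read_text(encoding="utf-8")
    path.write_text(restore_drop_caps(html, DROP_CAPS), encoding="utf-8")
    # Requires Calibre's command-line tools on the PATH.
    subprocess.run(build_convert_args(src, "The Girl on the Dock",
                                      "G. Norman Lippert"), check=True)
```

Calling `main()` from the folder that holds the HTML file and the cover image would patch the file in place and then run the same conversion as step 3. Keeping the argument list in a function means only the title, authors and cover need changing for another book.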