Old 11-26-2013, 12:11 AM   #7
eschwartz
Ex-Helpdesk Junkie
Unfortunately, your PDF is made of pictures, probably with a text layer behind them, which is why you can highlight the text. Many PDFs are like that, and the highlighting means nothing, because...

calibre's conversion pipeline can only read the main form of each page, which here is the .png image; this is one of the reasons why PDF is the worst format to convert from.

The margins are likely built into the images, especially if they alternate; that is for the left-hand and right-hand pages of the printed book. The cover image is the first "page", and calibre then adds the cover again, this time as the cover image. If images are used heavily, there is less content in the HTML, which is probably why the page numbers are wonky; I get that in comic books all the time. It's treating each image as one line, which, to be fair, it is.

You will have to use OCR to get the text from the pictures. OCR is software that attempts to guess the text from pictures; calibre doesn't include such software and can only use the actual content of the PDF.

Or you can copy and paste the text into a text file and use calibre's TXT conversion, which recognizes paragraphs by the empty lines between them. Use Markdown to mark bold, italics, and headers (for the chapter titles), reuse the extracted cover image that calibre has already saved in the book listing, etc. I did this for a few short stories posted online as free PDFs, and it is not something I would want to do a lot of.
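
If you go the copy-and-paste route, the tedious part is cleaning up the pasted text. Here is a rough Python sketch of that cleanup, not anything calibre ships: it rejoins hard-wrapped lines into paragraphs separated by empty lines and turns chapter titles into Markdown headers. The function name and the "Chapter ..." title pattern are my own assumptions; adjust the pattern to whatever your book uses, since a paragraph that happens to start with the word "Chapter" would be mistaken for a title.

```python
import re

# Assumption: chapter titles start with the word "Chapter" (e.g. "Chapter 1").
# This pattern is hypothetical; change it to match your book.
CHAPTER_RE = re.compile(r"^chapter\s+\w+", re.IGNORECASE)


def pasted_text_to_markdown(raw: str) -> str:
    """Rejoin hard-wrapped lines into paragraphs separated by empty lines."""
    blocks = []   # finished paragraphs and headers, in order
    current = []  # lines of the paragraph currently being collected

    def flush():
        # Close the paragraph in progress, if there is one.
        if current:
            blocks.append(" ".join(current))
            current.clear()

    for line in raw.splitlines():
        text = line.strip()
        if not text:
            flush()                      # an empty line ends a paragraph
        elif CHAPTER_RE.match(text):
            flush()
            blocks.append("# " + text)   # Markdown header for the title
        else:
            current.append(text)
    flush()

    # One empty line between blocks, so paragraphs can be told apart.
    return "\n\n".join(blocks) + "\n"
```

Save the result as a .txt file and convert that, with Markdown selected as the formatting type in the TXT input options. Bold and italics still have to be marked by hand.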

Also, in future, you can attach documents to your MobileRead post via Go Advanced ==> Additional Options instead of using external hosting sites. It's only against site policy to post them if the book is copyrighted and you don't have permission to share it.

Last edited by eschwartz; 11-26-2013 at 12:16 AM.