Unfortunately, your PDF is made of pictures, probably with text layered behind them, which is why you can highlight it. Many PDFs are like that, and being able to highlight the text means nothing for conversion, because...
The conversion pipeline in calibre can only read the main form of the pages, which here is the .png images; this is one of the reasons why PDFs are the worst format to convert from.
The margins are likely built into the images, especially if they alternate. That is for the left-side/right-side pages in a paper book, once printed. The cover image is the first "page", and calibre then adds the cover again, this time as a cover image. If images are used heavily, there is less text content in the HTML, which is probably why the page numbers are wonky; I get that in comic books all the time. It's treating each image as one line, which, to be fair, it is.
You will have to use OCR to get the text from the pictures. OCR is software that attempts to guess the text from pictures. calibre doesn't include such software; it can only use the actual content of the PDF.
Or you can copy and paste into a text file, then use calibre's TXT conversion, which recognizes paragraphs by the empty lines in between. Use Markdown to indicate the bold/headers (for the chapter titles)/italics, use the extracted cover image that calibre has already saved in the book listing, etc. I did this for a few short stories that were online as free PDFs, and it is not something I would want to do a lot of.
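If you go the copy-and-paste route, a small script can take some of the tedium out of it. This is only a rough sketch, not anything calibre provides: the function name and the "Chapter ..." heading pattern are my own assumptions, and it assumes the pasted text already has blank lines between paragraphs but hard line breaks inside them. It rejoins the wrapped lines and turns standalone chapter titles into Markdown headers.

```python
import re

# Assumption: chapter titles look like "Chapter 1" or "CHAPTER TWO" and sit
# alone in their own block. Adjust the pattern to suit your book.
CHAPTER_RE = re.compile(r"^chapter\s+\w+", re.IGNORECASE)


def pasted_text_to_markdown(raw: str) -> str:
    """Rejoin hard-wrapped lines into paragraphs separated by empty lines."""
    # Normalize Windows/old-Mac line endings so the splitting is predictable.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    # One or more empty (or whitespace-only) lines separate the blocks.
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        if len(lines) == 1 and CHAPTER_RE.match(lines[0]):
            # A lone chapter title becomes a Markdown level-1 header.
            paragraphs.append("# " + lines[0])
        else:
            # Everything else is one paragraph: undo the hard wrapping.
            paragraphs.append(" ".join(lines))

    # Exactly one empty line between paragraphs, newline at end of file.
    return "\n\n".join(paragraphs) + "\n"


if __name__ == "__main__":
    sample = "Chapter 1\n\nIt was a dark\nand stormy night.\n\nThe end.\n"
    print(pasted_text_to_markdown(sample))
```

Save the result as a .txt file, add it to calibre, and pick Markdown as the formatting style in the TXT input options when converting. Bold and italics still have to be marked by hand, and words hyphenated across line breaks will need fixing too.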
Also, in future, you can attach documents to Mobileread, by posting using Go Advanced ==> Additional Options, instead of using external hosting sites. And it's only against site policy to post these if it is a copyrighted book you don't have permission to share.
Last edited by eschwartz; 11-26-2013 at 12:16 AM.