MobileRead Forums - View Single Post - Converting pdfs into Nook Simple Touch Format

eschwartz · 11-26-2013, 01:11 AM

Unfortunately, your PDF is made of page images, probably with a text layer behind them, which is why you can highlight the text. Many PDFs are built that way, and the highlightable text doesn't help here, because...

The conversion pipeline in calibre can only read the main form of the pages, which in this PDF is the .png images; this is one of the reasons why PDF is the worst format to convert from.

The margins are likely built into the images, especially if they alternate. Alternating margins are meant for the left-hand and right-hand pages of a printed book. The cover is the first "page" of the PDF, and calibre then adds it again, this time as the cover image. If images are used heavily, there is less actual content in the html, which is probably why the page numbers are wonky; I get that in comic books all the time. It's treating each image as one line, which to be fair, it is.

You will have to use OCR to get the text from the pictures. OCR is software that attempts to guess the text from pictures; calibre doesn't include such software, so it can only use the actual content of the PDF.
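For what it's worth, here is a rough sketch of what an OCR pass could look like. It assumes you have already exported the pages as .png files and that the Tesseract command-line tool is installed separately; Tesseract is my own choice for illustration and is not part of calibre. The function names and folder layout are made up for the example.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Return the argv for: tesseract <image> <out_base> -l <lang>.

    Tesseract appends ".txt" to out_base on its own.
    """
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_pages(image_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """OCR every PNG in image_dir, writing one .txt file per page into out_dir.

    Assumption: the page images are named so that sorting them
    alphabetically gives the reading order (page-001.png, page-002.png, ...).
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.glob("*.png")):
        out_base = out_dir / image.stem
        subprocess.run(build_ocr_command(image, out_base, lang), check=True)
        written.append(Path(str(out_base) + ".txt"))
    return written
```

Expect to proofread the result; OCR output is a guess, and it usually needs cleanup before it is worth converting.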

Or you can copy and paste the text into a text file and use calibre's txt conversion, which recognizes paragraphs by the empty lines between them. Use markdown to mark bold, italics, and headers (for the chapter titles), and reuse the extracted cover image that calibre has already saved in the book listing. I did this for a few short stories that were available online as free PDFs, and it is not something I would want to do a lot of.
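If you go the copy-and-paste route, a small script can take some of the tedium out of it. This is only a sketch under my own assumptions: that the pasted text has hard line breaks inside paragraphs, that paragraphs are already separated by at least one empty line, and that chapter titles sit on their own line and start with "Chapter". The `CHAPTER_RE` pattern and the `to_markdown` name are hypothetical; adjust them to your book.

```python
import re

# Hypothetical pattern: a line such as "Chapter One" or "CHAPTER 12: Home".
CHAPTER_RE = re.compile(r"^chapter\s+\w+.*$", re.IGNORECASE)


def to_markdown(raw: str) -> str:
    """Turn pasted PDF text into blank-line-separated markdown paragraphs.

    - Blocks separated by one or more empty lines become paragraphs.
    - Hard line breaks inside a block are joined with single spaces.
    - A one-line block matching CHAPTER_RE becomes a "## " header.
    """
    blocks = re.split(r"\n\s*\n", raw.strip())
    out = []
    for block in blocks:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if len(lines) == 1 and CHAPTER_RE.match(lines[0]):
            out.append("## " + lines[0])
        else:
            out.append(" ".join(lines))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "Chapter One\n\nIt was a dark\nand stormy night.\n\n\nThe end.\n"
    print(to_markdown(sample))
```

Bold and italics still have to be marked by hand, since that formatting is lost when you copy out of the PDF. Save the result as a .txt file and add it to calibre; the txt conversion settings there control how paragraphs and markdown are detected.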

Also, in the future, you can attach documents to MobileRead by posting with Go Advanced ==> Additional Options instead of using external hosting sites. It is only against site policy to post a document if it is a copyrighted book you don't have permission to share.

#7 · Last edited by eschwartz; 11-26-2013 at 01:16 AM.