Old 11-14-2013, 01:23 AM   #1
plaidrhino
Junior Member
Posts: 4
Karma: 10
Join Date: Nov 2013
Device: Nook Simple Touch
Converting PDFs into Nook Simple Touch Format

Hi,

I have some books in PDF format that I have been trying to convert for my Nook Simple Touch. I've tried online tools that turn PDFs into EPUBs, but the results don't display correctly once I move them over to the Nook.

The page numbering is off, several screens show the same page number, the page is too large for the screen, and so on.

I learned that an EPUB is a zip file containing HTML and other files. I renamed one to .zip, opened it, and saw that one conversion had turned real, highlightable text into .png files!
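
The same check can be done without renaming anything. The following is a minimal sketch, assuming only Python's standard library; the function name `summarize_epub` is made up for illustration. It opens the EPUB as a ZIP archive and counts the files inside by extension, so a book made of page images shows up as many .png entries and very few HTML files.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(path):
    """Count the files inside an EPUB (a ZIP archive) by extension."""
    with zipfile.ZipFile(path) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    # Files without an extension (such as "mimetype") are grouped as "(none)".
    return Counter(PurePosixPath(n).suffix.lower() or "(none)" for n in names)
```

Calling `summarize_epub("book.epub")` returns a Counter, and its `most_common()` method lists the extensions from most to least frequent.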

Is there a tool, online or downloadable, that will convert my PDFs into files optimized for the Nook Simple Touch? If so, is there one that also allows editing, changing the font and font size, and previewing?

Last edited by plaidrhino; 11-14-2013 at 01:26 AM.
Old 11-14-2013, 05:42 AM   #2
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Had this same discussion last month in the PDF section:

https://www.mobileread.com/forums/sho...d.php?t=223817

Just be warned, PDF -> anything is the WORST conversion. There will be lots of errors from the OCR output, and it takes many hours of fixing to get it up to par.
Old 11-14-2013, 11:15 AM   #3
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
If the PDF contains a text layer, i.e. if you can search it, you could try converting it with Calibre with all Heuristic Processing options enabled.
Enabling them often leads to somewhat better results. However, as Tex2002ans has already pointed out, PDF is the worst input format for ePub conversion.
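
For anyone who prefers the command line to the dialog, the suggestion above might be sketched like this. It assumes calibre's `ebook-convert` tool is installed and on the PATH. The `--enable-heuristics` switch turns on Heuristic Processing, and choosing the `nook` output profile for a Simple Touch is an assumption, not something tested in this thread. The helper names are made up for illustration.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, epub_path):
    """Build an ebook-convert command line with heuristic processing on."""
    return [
        "ebook-convert",
        pdf_path,
        epub_path,
        "--enable-heuristics",       # turn on calibre's Heuristic Processing
        "--output-profile", "nook",  # assumption: this profile suits the Simple Touch
    ]


def convert(pdf_path, epub_path):
    """Run the conversion if calibre's command-line tools can be found."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    subprocess.run(build_convert_command(pdf_path, epub_path), check=True)
```

This only helps when the PDF really has usable text in it; it does not perform OCR.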

Here's a screen shot of the dialog from SoftPedia:

Old 11-15-2013, 12:55 AM   #4
plaidrhino
Well, I tried using Calibre with the heuristics turned on.

It converted better, with a few anomalies:

1. Triple copies of the cover.
2. Numbering is off. A few pages show 1 of 4, then many show 2 of 4, even though there are over 200 pages.
3. Margins are too large: too much white space around each page.
4. It converted all the text pages, which I could highlight in Adobe Reader, into PNG images. Only 2 pages, the front and back, are graphical.


To try to correct the above, I used the "tweak book" tool and opened content.opf. I don't know much HTML, which I think would help here. Am I on the right track?
Old 11-15-2013, 07:55 AM   #5
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
You can try adapting the stylesheet. The best way is still to do OCR though.
Old 11-25-2013, 11:13 PM   #6
plaidrhino
Toxaris,
Thanks for the idea. I think calibre used OCR, as it converted the text into images.

Still hitting the same problems:

- Duplicate copies of the cover.
- Numbering. It only shows x of 4 pages (x being 1, 2, 3 or 4), even though there are over 200 pages.
- Margins. Too large, with extra space alternating between left and right, and overall too much white space around each page.

It is still readable, but I would like it to be a better conversion.

If it's not against the site policy, I'll post the original PDF and the EPUB conversion that resulted from Calibre. Maybe someone can suggest a better technique.

Original book:
http://www.mediafire.com/view/6rb06n...0of%20Mind.pdf

epub output:
http://www.mediafire.com/download/o0...e_of_Mind.epub

Thanks.
Old 11-26-2013, 12:11 AM   #7
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Unfortunately, your PDF is made of pictures, probably with a text layer behind them, which is why you can highlight the text. Many PDFs are like that, and it doesn't help, because...

The conversion pipeline in calibre can only read the main form of the pages, which here is the .png images; this is one of the reasons why PDFs are the worst format to convert from.

The margins are likely built into the images, especially if they alternate: that is for the left-hand and right-hand pages of a printed paper book. The cover image is the first "page", and calibre then adds it again, this time as the cover image. If images are used heavily, there is less content in the HTML, which is probably why the page numbers are wonky; I get that in comic books all the time. It's treating each image as one line, which, to be fair, it is.
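
One rough way to see how little text ended up in the HTML is to count visible characters and image tags per file inside the EPUB. This is only a sketch using Python's standard library; the names `text_vs_images` and `_TextAndImages` are made up for illustration, and how the Nook actually computes its page count is not something this code knows.

```python
import zipfile
from html.parser import HTMLParser


class _TextAndImages(HTMLParser):
    """Tally text characters and image tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.images = 0

    def handle_starttag(self, tag, attrs):
        # <img> is plain HTML; <image> appears inside SVG cover wrappers.
        if tag in ("img", "image"):
            self.images += 1

    def handle_data(self, data):
        # Ignore whitespace around the text; this is a rough count only.
        self.text_chars += len(data.strip())


def text_vs_images(epub_path):
    """Map each HTML file in the EPUB to (text character count, image count)."""
    report = {}
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parser = _TextAndImages()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                report[name] = (parser.text_chars, parser.images)
    return report
```

A book converted from page images would be expected to show near-zero text counts and one or more images per file.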

You will have to use OCR to get the text from the pictures. OCR is software that attempts to guess the text from pictures. calibre doesn't include such software; it can only use the actual content of the PDF.

Or you can copy and paste the text into a text file and use calibre's txt conversion, which recognizes paragraphs by the empty lines between them. Use markdown to mark the bold, italics, and headers (for the chapter titles), and use the extracted cover image that calibre has already saved in the book listing. I did this for a few short stories that were online as free PDFs, and it is not something I would want to do a lot of.
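
Part of the copy-and-paste cleanup can be scripted. The sketch below is only an illustration under stated assumptions: pasted text has blank lines between paragraphs and hard line breaks inside them, and chapter titles are known in advance and sit on their own lines. The function names are hypothetical, and words hyphenated across line ends are not repaired.

```python
import re


def paragraphs_from_pasted_text(text):
    """Split pasted text into paragraphs at blank lines, unwrapping hard line breaks."""
    text = text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs


def to_markdown(paragraphs, chapter_titles=()):
    """Join paragraphs with blank lines, marking known chapter titles as headers."""
    titles = set(chapter_titles)
    out = [f"# {p}" if p in titles else p for p in paragraphs]
    return "\n\n".join(out) + "\n"
```

The resulting text file keeps one blank line between paragraphs, which is the layout the txt conversion described above relies on.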

Also, in future, you can attach documents to Mobileread, by posting using Go Advanced ==> Additional Options, instead of using external hosting sites. And it's only against site policy to post these if it is a copyrighted book you don't have permission to share.

Last edited by eschwartz; 11-26-2013 at 12:16 AM.
Old 11-26-2013, 07:07 AM   #8
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Does original text show when you select text view in a program like Foxit Reader? You might be able to get at the text that way. What you will get is very hard to say, since PDFs are constructed for display and printing, not deconstruction.
Old 11-26-2013, 03:06 PM   #9
eschwartz
Quote:
Originally Posted by mrmikel
Does original text show when you select text view in a program like Foxit Reader? You might be able to get at the text that way. What you will get is very hard to say, since PDFs are constructed for display and printing, not deconstruction.

It does (the OP said so), but how would you recommend turning that into paragraphs/chapters? I've done that by hand for short stories (copy-pasting, with Paragraph Style: Block and markdown formatting for the bold/italics), but WHEW is that time-consuming.
Old 11-26-2013, 03:08 PM   #10
mrmikel
It is a lot of work. But it might be less work than OCRing and correcting those errors.

Oh for the bad old days when everyone didn't think they were graphic artists and the text had to do the talking.
Old 11-26-2013, 03:15 PM   #11
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
The text inside the file was likely OCR'd in the first place and is likely full of errors.

Dale
Old 11-27-2013, 04:54 AM   #12
Toxaris
My Word add-in might help you clean up the OCR text more quickly while retaining the formatting.
Old 12-01-2013, 08:00 PM   #13
plaidrhino
Thank you all for the replies. I didn't get an email informing me of replies. I just changed the notification setting from weekly (default?) to instant.

Here are my replies:

eschwartz - thanks for the tips and the attachment option info.
mrmikel - I haven't used Foxit, just Adobe PDF reader. Maybe I'll try it.
DaleDe - the text is fine. No errors that I've seen.
Toxaris - I might try your add-in if I really want to do the conversion.

For now, I'm reading it OK on my Nook. It's not as clean or neat as I'd like, but it works. Occasionally it loses the page I'm on, or I have to reboot, and then I have to press the up or down key a lot, like 50 times, since it thinks there are only 4 pages and the Nook's 'scroller' doesn't work.

One more question: I used to use Adobe's PDF Distiller to go from PDF to Word or another text output. Is anyone familiar with that? I might use that again.
Old 12-02-2013, 11:11 AM   #14
DaleDe
Quote:
Originally Posted by plaidrhino
DaleDe - the text is fine. No errors that I've seen

How are you checking the text for errors? If you look at the page image in the PDF, you will not see them. If you use "save as text" in the PDF reader and then look at the resulting text file, you will see the errors if there are any. Some PDF files have two layers: one is an image of the text, and the second is the OCR'd text, which is used for searches and is what the save-as-text option outputs.
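
Once the text has been saved to a file, a quick scan can point at places worth a closer look. The sketch below flags words with a digit wedged between letters (for example "c1ear"), which is one common kind of OCR slip. The heuristic and the function name `suspicious_tokens` are assumptions made for illustration, not something from the posts above, and a clean result does not prove the text is error-free.

```python
import re

# Letters with one or more digits sandwiched between them, e.g. "c1ear" or "t0day".
_DIGIT_IN_WORD = re.compile(r"\b[A-Za-z]+[0-9]+[A-Za-z]+\b")


def suspicious_tokens(text):
    """Return (line_number, token) pairs that look like possible OCR slips."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _DIGIT_IN_WORD.finditer(line):
            hits.append((number, match.group()))
    return hits
```

Reading a few full pages of the saved text by eye remains the more reliable check.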

Dale