Old 12-24-2016, 10:00 PM   #1
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Good way to convert pdfs to epubs on the mac?

Most of these are scanned pdfs.

Some pdfs can freeze Preview, Skim, or either e-reader. And generally pdfs are much harder on the Kindle Dx than mobis are.

Some of these have text layers, some don't. Extracting the text layers, removing the line-breaking hyphens, and resolving the misspellings could help.
Old 12-24-2016, 11:04 PM   #2
Tenzin_la
Enthusiast
Posts: 29
Karma: 26718
Join Date: Nov 2013
Location: Long Island, NY - USA
Device: Oasis
Off the top of my head, I'd use https://smallpdf.com/ to convert the PDF to Word and then let Calibre do the conversion to whatever eBook format you like. The only consistent problem I've encountered is that the conversion adds about one extra space between words every two pages or so (which is easily fixable automagically in a number of ways).
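
One possible way to fix the extra spaces automatically, as a minimal sketch: it assumes the converted text has been saved as plain text, and the function name `collapse_spaces` is made up for illustration.

```python
import re


def collapse_spaces(text: str) -> str:
    """Collapse runs of spaces or tabs inside each line to a single space."""
    cleaned = []
    for line in text.splitlines():
        # Replace any run of spaces/tabs with one space and trim the line ends.
        cleaned.append(re.sub(r"[ \t]+", " ", line).strip())
    return "\n".join(cleaned)
```

Line breaks are left alone, so paragraph structure survives; only the spacing inside each line changes.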
Old 12-25-2016, 11:19 AM   #3
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
I tried https://smallpdf.com/ with 3 pdfs. The first 2 times it answered "This file does not seem to be a PDF." The last time it answered "Sorry your upload failed. Please try again," twice.

P.S. I tried the help page. It flashed, and now I have a migraine. And no, I can't use the web without strobe-blocking and animation-blocking extensions.

Last edited by MarjaE; 12-25-2016 at 11:28 AM.
Old 12-25-2016, 03:03 PM   #4
pluma
Enthusiast
Posts: 48
Karma: 854254
Join Date: Nov 2016
Device: none
Quote:
Originally Posted by Tenzin_la
Off the top of my head, I'd use https://smallpdf.com/ to convert the PDF to Word and then let Calibre do the conversion to whatever eBook format you like. The only consistent problem I've encountered is the conversion adds about one extra space between words every two pages or so (which is easily fixable automagically a number of ways).
This has got to be the most unhelpful answer!

I don't think you'll find a magic wand that'll guess your intended goal. What you should be looking for is a workflow, a combination of tools. Since you know your material and where you want to end up, after trying several things and converting and adjusting a few of the pdfs you'll realize which tools are good for your needs.

There's no shortage of tools to edit and manipulate pdfs, so I'll give you a few off the top of my head:

LibreOffice
imagemagick
gimp
different pdf readers and their converting options.

What you have to do is google 'pdf' + "program" + "approximate goal" and you'll get a lot of good results.

At the beginning it might sound like an inconvenient thing to do, but after you see the results you'll get the hang of it.

Lastly, Safari's PDF viewer has pretty darn good OCR when highlighting pdfs.

peace
Old 12-25-2016, 05:12 PM   #5
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Okay.

Some of these have their own imperfect text layers. Splitting or compressing the documents often results in losing the text layers. (I use pdf toolkit+)

Some of these come from the Internet Archive and have ocr'd text versions. The big problems are that the ocr can screw up tables, can misread figures, and of course, can misread ordinary words. So I've needed either pdf or djvu for comparison. Some don't have text versions.

If I can extract the text layer, then spell-checkers could help with the minor errors, the substitution of punctuation for letters, etc., in English-language docs. Not so useful with the major errors. (I would prefer NeoOffice to LibreOffice for this, but neither can find and replace hyphen-breaks or extra line breaks, so I'd probably need Calibre's editing tools too.)
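
For the hyphen-breaks and extra line breaks, a small script may be enough once the text is in a plain .txt file. This is only a sketch under assumptions: the function names are invented here, blank lines are taken to mark paragraph breaks, and a word is rejoined only when the next line starts with a lowercase letter.

```python
import re


def dehyphenate(text: str) -> str:
    """Rejoin words that were split by a hyphen at the end of a line."""
    # "conver-\nsion" becomes "conversion"; only fires when the next
    # line begins with a lowercase letter.
    return re.sub(r"(\w)-\n[ \t]*([a-z])", r"\1\2", text)


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines; blank lines still separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

Run `dehyphenate` first, then `unwrap_lines`. Genuine compounds that happen to break at a line end (for example "well-" followed by "known") will be joined too, so the result still needs a spell-check pass.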

If I can find, excerpt, and re-compress the relevant tables, I could perhaps use two versions, one a pdf with the tables, and the other an epub or mobi with the text. (I would keep using pdf toolkit+)
Old 12-25-2016, 06:46 PM   #6
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Some other resources on the topic:

1. My PDF conversion web page.

2. Mobileread's wiki
Old 12-26-2016, 12:36 PM   #7
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Okay, thanks.

I really need some way to extract existing text layers for entire books. I know "it isn't hard" but I don't know how to do it.

I would also like some way to speed up proofreading, and extract the pages with tables and insert them at the appropriate points in the text.
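
One way proofreading might be sped up is to list only the words that look like OCR damage instead of rereading everything. The sketch below is a rough heuristic, not a tested tool: it flags tokens that mix letters with digits or have punctuation inside a word, and the names are hypothetical. It cannot catch a misread digit inside a number, and it will raise false alarms on things like abbreviations.

```python
import re
import string

# Ordinals such as "19th" legitimately mix digits and letters.
ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$", re.IGNORECASE)
# Punctuation (other than hyphen or apostrophe) between two letters.
INNER_PUNCT = re.compile(r"[A-Za-z][^\w\s'’-]+[A-Za-z]")


def suspicious_tokens(text):
    """Yield (line_number, token) pairs that look like OCR damage."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for raw in line.split():
            # Drop quotes, commas, full stops etc. around the word.
            token = raw.strip(string.punctuation + "“”‘’")
            if not token or ORDINAL.match(token):
                continue
            has_alpha = any(c.isalpha() for c in token)
            has_digit = any(c.isdigit() for c in token)
            if (has_alpha and has_digit) or INNER_PUNCT.search(token):
                yield lineno, token
```

Printing the pairs gives a short list of line numbers to check against the page images.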

I don't have Word, but I imported into NeoOffice and got a long ocr'd drawing of one text. I think the ocr was the Internet Archive's text layer, but I don't know for sure. I can see that some figures are off - 4 for 1, 0 for 6, etc. I don't know how to strip off the source images or convert all the small text boxes into proper tables...

And with my disabilities, I haven't found an accessible tablet, and I never expect to.
Old 12-27-2016, 12:38 PM   #8
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
To extract the text use Adobe reader and save as text from the file menu.
Old 12-28-2016, 11:23 AM   #9
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by DaleDe
To extract the text use Adobe reader and save as text from the file menu.
k2pdfopt will also extract the text layer:
Code:
k2pdfopt -ocrout outfile.txt -mode copy myfile.pdf
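
For entire shelves of books, the same command could be generated once per file. A minimal sketch, assuming k2pdfopt is installed and on the PATH; the function name and the folder names in the usage note are made up, and only the flags shown in the command above are used.

```python
from pathlib import Path


def k2pdfopt_text_commands(pdf_paths, out_dir):
    """Build one k2pdfopt text-extraction command per PDF path."""
    commands = []
    for pdf in pdf_paths:
        pdf = Path(pdf)
        # Name the text file after the PDF: book.pdf -> out_dir/book.txt
        out_txt = Path(out_dir) / (pdf.stem + ".txt")
        commands.append(
            ["k2pdfopt", "-ocrout", str(out_txt), "-mode", "copy", str(pdf)]
        )
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`, for example over `sorted(Path("books").glob("*.pdf"))`. Whether k2pdfopt pauses for keyboard input when started this way is something I have not checked, so it is worth trying a single file by hand first.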
Old 12-28-2016, 06:13 PM   #10
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
I don't have Adobe Reader.

I have a nasty strobe sensitivity. Adobe's site has hit me with strobes. I use a number of Firefox accessibility fixes, but they didn't block these strobes. I have Adobe Digital Editions from years ago, but I avoid that site now.

Neither Skim nor Preview allow me to save pdfs as text.

I never figured out how to install k2pdfopt. Or exactly what I could do with it. I have Elucidate to create a text layer.
Old 12-30-2016, 12:11 AM   #11
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by MarjaE
I have Elucidate to create a text layer.
FYI, Elucidate uses the Tesseract OCR engine.
Old 01-07-2017, 04:35 AM   #12
helixpteron
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2016
Device: Kobo Aura One
I only have a history of two weeks trying to convert:

I tried calibre, Acrobat Pro with Calibre, Wondershare PDF converter and Aiseesoft Converter.

The only software I found to do this reliably was the Aiseesoft PDF Converter and I stopped looking for another solution, as this simply did the trick for what I needed. There is a pdf to epub only version, which is cheaper.

In theory you should be able to set the autoimport folder of calibre as the output folder of Aiseesoft, but unfortunately it doesn't set author and title right - so I export into the source folder and then add the format to calibre manually. It's a bit inconvenient, but the result is worth it. Or you can auto-add it, mark the old and the new book, and then use command-shift-M to merge the two books into the first selected one - this requires the least input.

Therefore this isn't a solution for bulk conversion, but I only use it when calibre doesn't yield a decent result and then it is brilliant. I tried to achieve the same with Acrobat Pro, but it was inferior - by far. I reconverted books that came out unusable with everything else I tried and it worked fine.

https://www.aiseesoft.de/pdf-to-epub-converter/


PS: For the cover images to show up in iBooks I had to do an epub to epub conversion with the output profile "Tablet" in calibre, but that might not be a concern for you.
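
The epub to epub step, and the author/title fix, could probably also be done with calibre's command-line tool `ebook-convert`. A rough sketch only: it assumes the output profile meant here is calibre's `tablet`, and the function name and file names are invented for illustration.

```python
def ebook_convert_command(src, dst, title=None, authors=None, profile=None):
    """Build an argument list for calibre's ebook-convert command."""
    cmd = ["ebook-convert", src, dst]
    if title:
        cmd += ["--title", title]
    if authors:
        # calibre expects several authors separated by ampersands.
        cmd += ["--authors", authors]
    if profile:
        cmd += ["--output-profile", profile]
    return cmd
```

For example, `ebook_convert_command("in.epub", "out.epub", title="My Book", authors="A. Author", profile="tablet")` gives a list that can be handed to `subprocess.run`. On a Mac the tool usually sits inside the calibre application bundle, so the full path may be needed in place of the bare name.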

Hope this helps, Rob.

Last edited by helixpteron; 01-07-2017 at 04:47 AM.
Old 01-08-2017, 11:42 AM   #13
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by helixpteron
I tried calibre, Acrobat Pro with Calibre, Wondershare PDF converter and Aiseesoft Converter.

The only software I found to do this reliably was the Aiseesoft PDF Converter and I stopped looking for another solution, as this simply did the trick for what I needed...
This is certainly an inexpensive solution. I quickly tried Aiseesoft with a couple of my favorite benchmark PDFs and it did the scanned book well (just a couple mistakes with the OCR) but not very well with the two-column PDF--it lost the table and a lot of the text was missing or not formatted correctly. This is admittedly a hard PDF, but MS Word (Office 365 version) does a very good job with it--faithfully retaining its layout. MS Word does not save directly to epub, though. You still have to get it to epub format. See attachments (epubs created by Aiseesoft).
Attached Files
File Type: pdf milne.pdf (1.47 MB, 431 views)
File Type: epub milne.epub (70.5 KB, 404 views)
File Type: pdf ieee.pdf (154.0 KB, 394 views)
File Type: epub ieee.epub (43.3 KB, 390 views)
Old 01-13-2017, 11:52 AM   #14
pluma
Enthusiast
Posts: 48
Karma: 854254
Join Date: Nov 2016
Device: none
hallo,

If the pdf is text based, "pdftohtml" gets it right, but with a css/html monstrosity that can be taken care of afterwards.

There's a more sophisticated tool which I haven't tried, 'pdf2htmlEX'. This one is supposed to render complex scientific pdfs into proper mathml and some other nifty html/css/whatever formats.

If it's just text, 'pdftohtml' gets font size, bold, italics, quotes, etc. correct.
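
The css/html clean-up afterwards can be partly scripted. Here is a minimal sketch using only the Python standard library; it assumes you want to keep tags like b and i but throw away style blocks, style/class attributes and span/font wrappers. The names are made up, and comments and the doctype are dropped as a side effect.

```python
from html import escape
from html.parser import HTMLParser

DROP_ATTRS = {"style", "class"}    # presentational attributes to discard
UNWRAP_TAGS = {"span", "font"}     # tags removed while keeping their text
SKIP_TAGS = {"style", "script"}    # elements dropped together with content


class Simplifier(HTMLParser):
    """Re-emit HTML without inline styling or wrapper tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def _attrs(self, attrs):
        # Keep every attribute except the presentational ones.
        return "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth and tag not in UNWRAP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are written back as they came.
        if not self.skip_depth and tag not in UNWRAP_TAGS | SKIP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def simplify_html(markup):
    """Return markup with styling stripped; see the class above."""
    parser = Simplifier()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

If the converter expresses bold or italics through css classes instead of tags, this would lose them, so compare a page or two before and after.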

As I said earlier, it's about trying things to see what fits someone's particular case.

Here calibre fared poorly with a pdf to epub directly.

Sigil, as a tool, is quite a bit faster for post-pdf editing/clean up.
Old 01-19-2017, 07:21 AM   #15
helixpteron
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2016
Device: Kobo Aura One
I must admit that I only tried it with PDFs that were also in part complex, but never with a multi-column one. Thanks for the hint with Word - I will now use Aiseesoft for books and Word for papers.

Cheers, Rob.