12-24-2016, 10:00 PM | #1 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
Good way to convert pdfs to epubs on the mac?
Most of these are scanned pdfs.
Some pdfs can freeze Preview, Skim, or either e-reader. And generally pdfs are much harder on the Kindle DX than mobis are. Some of these have text layers, some don't. Extracting the text layers, removing the line-breaking hyphens, and resolving the misspellings could help. |
12-24-2016, 11:04 PM | #2 |
Enthusiast
Posts: 29
Karma: 26718
Join Date: Nov 2013
Location: Long Island, NY - USA
Device: Oasis
|
Off the top of my head, I'd use https://smallpdf.com/ to convert the PDF to Word and then let Calibre do the conversion to whatever eBook format you like. The only consistent problem I've encountered is that the conversion adds about one extra space between words every two pages or so (which is easily fixable automagically in a number of ways).
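One way to fix the extra spaces automatically is a short regular-expression pass. This is only a minimal sketch in Python, assuming the converted book has been exported as plain text first; the function name `collapse_spaces` is made up for illustration and is not part of Calibre or any other tool mentioned here.

```python
import re

def collapse_spaces(text: str) -> str:
    """Replace each run of two or more spaces/tabs with a single space.

    Line breaks are left untouched, so paragraph structure survives.
    """
    return re.sub(r"[ \t]{2,}", " ", text)

sample = "It was  a dark and   stormy night.\nThe rain  fell."
print(collapse_spaces(sample))
```

Calibre's editor also has a regex search-and-replace, so the same pattern could probably be applied there instead of in a script.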
|
12-25-2016, 11:19 AM | #3 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
I tried https://smallpdf.com/ with 3 pdfs. The first 2 times it answered "This file does not seem to be a PDF." The last time it answered "Sorry your upload failed. Please try again," twice.
P.S. I tried the help page. It flashed, and now I have a migraine. And no, I can't use the web without strobe-blocking and animation-blocking extensions. Last edited by MarjaE; 12-25-2016 at 11:28 AM. |
12-25-2016, 03:03 PM | #4 | |
Enthusiast
Posts: 48
Karma: 854254
Join Date: Nov 2016
Device: none
|
I don't think you'll find a magic wand that'll guess your intended goal. What you should be looking for is a workflow, a combination of tools. Since you know your material and where you want to end up, after trying several things and converting and adjusting a few of the pdfs you'll realize which tools are good for your needs. There's no shortage of tools to edit and manipulate pdfs, so I'll give you a few off the top of my head: LibreOffice, ImageMagick, GIMP, and different pdf readers and their converting options. What you have to do is google 'pdf' + "program" + "approximate goal" and you'll get a lot of good results. At the beginning it might sound like an inconvenient thing to do, but after you see the results you'll get the hang of it. Lastly, Safari's PDF viewer has pretty darn good OCR when highlighting pdfs. peace |
|
12-25-2016, 05:12 PM | #5 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
Okay.
Some of these have their own imperfect text layers. Splitting or compressing the documents often results in losing the text layers. (I use pdf toolkit+) Some of these come from the Internet Archive and have ocr'd text versions. The big problems are that the ocr can screw up tables, can misread figures, and of course, can misread ordinary words. So I've needed either pdf or djvu for comparison. Some don't have text versions. If I can extract the text layer, then spell-checkers could help with the minor errors, the substitution of punctuation for letters, etc., in English-language docs. Not so useful with the major errors. (I would prefer NeoOffice to LibreOffice for this, but neither can find and replace hyphen-breaks or extra line breaks, so I'd probably need Calibre's editing tools too.) If I can find, excerpt, and re-compress the relevant tables, I could perhaps use two versions, one a pdf with the tables, and the other an epub or mobi with the text. (I would keep using pdf toolkit+) |
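The hyphen-breaks and extra line breaks mentioned above can be handled with two regular expressions once the text is in a plain-text file. What follows is a rough sketch in Python, not a finished cleaner: `dehyphenate` and `unwrap_lines` are hypothetical names, and it assumes paragraphs are separated by truly empty lines. It will also wrongly join a genuine compound that happened to break at the end of a line (for example "well-" / "known"), so the result still needs proofreading.

```python
import re

def dehyphenate(text: str) -> str:
    """Join a word split by a hyphen at the end of a line.

    Only fires when the next line starts with a lowercase letter,
    so 'exam-' + 'ple' becomes 'example' but 'Anglo-' + 'Saxon' is left alone.
    """
    return re.sub(r"([A-Za-z])-[ \t]*\n[ \t]*([a-z])", r"\1\2", text)

def unwrap_lines(text: str) -> str:
    """Turn single line breaks into spaces; keep empty lines as paragraph breaks."""
    return re.sub(r"(?<!\n)[ \t]*\n[ \t]*(?!\n)", " ", text)

raw = "This is an exam-\nple of scanned\ntext.\n\nNew paragraph."
print(unwrap_lines(dehyphenate(raw)))
```

Running `dehyphenate` before `unwrap_lines` matters, because the hyphen rule looks for the line break that the second step removes.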
12-25-2016, 06:46 PM | #6 |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
|
12-26-2016, 12:36 PM | #7 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
Okay, thanks.
I really need some way to extract existing text layers for entire books. I know "it isn't hard" but I don't know how to do it. I would also like some way to speed up proofreading, and extract the pages with tables and insert them at the appropriate points in the text. I don't have Word, but I imported into NeoOffice and got a long ocr'd drawing of one text. I think the ocr was the Internet Archive's text layer, but I don't know for sure. I can see that some figures are off - 4 for 1, 0 for 6, etc. I don't know how to strip off the source images or convert all the small text boxes into proper tables... And with my disabilities, I haven't found an accessible tablet, and I never expect to. |
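For extracting an existing text layer from a whole book, one possible route is the `pdftotext` command-line tool. The sketch below wraps it in Python and rests on several assumptions: that the Poppler utilities are installed (on a Mac, for instance, through Homebrew), that `pdftotext` is on the PATH, and that the pdf really has a text layer (a scan without one gives an empty file). The function names are invented for this example.

```python
import shutil
import subprocess
from pathlib import Path

def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for the pdftotext command-line tool."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout tries to keep the physical page layout, which can help with tables
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd

def extract_text_layer(pdf_path, txt_path=None, keep_layout=True):
    """Write the pdf's text layer to a .txt file and return that file's path."""
    pdf_path = Path(pdf_path)
    # Default output: same name as the pdf, with a .txt extension
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; this sketch assumes Poppler is installed")
    subprocess.run(pdftotext_command(pdf_path, txt_path, keep_layout), check=True)
    return txt_path

# Example use (the file name is hypothetical):
# extract_text_layer("scanned_book.pdf")
```

The extracted text keeps whatever OCR errors the layer already had, such as a 4 read for a 1, so it still has to be compared against the page images.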
12-27-2016, 12:38 PM | #8 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
To extract the text, use Adobe Reader and choose Save as Text from the File menu.
|
12-28-2016, 11:23 AM | #9 |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
|
12-28-2016, 06:13 PM | #10 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
I don't have Adobe Reader.
I have a nasty strobe sensitivity. Adobe's site has hit me with strobes. I use a number of Firefox accessibility fixes, but they didn't block these strobes. I have Adobe Digital Editions from years ago, but I avoid that site now. Neither Skim nor Preview allows me to save pdfs as text. I never figured out how to install k2pdf, or exactly what I could do with it. I have Elucidate to create a text layer. |
12-30-2016, 12:11 AM | #11 |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
|
01-07-2017, 04:35 AM | #12 |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2016
Device: Kobo Aura One
|
I only have a history of two weeks trying to convert:
I tried calibre, Acrobat Pro with Calibre, Wondershare PDF converter and Aiseesoft Converter. The only software I found to do this reliably was the Aiseesoft PDF Converter, and I stopped looking for another solution, as this simply did the trick for what I needed. There is a pdf-to-epub-only version, which is cheaper. In theory you should be able to set the auto-import folder of calibre as the output folder of Aiseesoft, but unfortunately it doesn't set author and title right, so I export into the source folder and then add the format to calibre manually. It's a bit inconvenient, but the result is worth it. Or you can auto-add it, mark the old and the new book, and then use command-shift-M to merge the two books into the first selected one; this requires the least input. Therefore this isn't a solution for bulk conversion, but I only use it when calibre doesn't yield a decent result, and then it is brilliant. I tried to achieve the same with Acrobat Pro, but it was inferior by far. I reconverted books that came out unusable with everything else I tried and it worked fine. https://www.aiseesoft.de/pdf-to-epub-converter/ PS: For the cover images to show up in iBooks I had to do an epub-to-epub conversion with the Tablet output profile in calibre, but that might not be a concern for you. Hope this helps, Rob. Last edited by helixpteron; 01-07-2017 at 04:47 AM. |
01-08-2017, 11:42 AM | #13 | |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
|
|
01-13-2017, 11:52 AM | #14 |
Enthusiast
Posts: 48
Karma: 854254
Join Date: Nov 2016
Device: none
|
Hello,
If the pdf is text-based, "pdftohtml" gets it right, but with a css/html monstrosity which can be taken care of afterwards. There's a more sophisticated tool which I haven't tried, 'pdf2htmlEX'. This one is supposed to render complex scientific pdfs into proper mathml and some other nifty proper html/css/whatever format. If it's just text, 'pdftohtml' gets font size, bold, italics, quotes, etc. correct. As I said earlier, it's about trying things to see what fits someone's particular case. Here calibre fared poorly with a direct pdf-to-epub conversion. Sigil, as a tool, is quite a bit faster during post-pdf editing/clean-up. |
01-19-2017, 07:21 AM | #15 |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2016
Device: Kobo Aura One
|
I must admit that I only tried it with PDFs that were also in part complex, but never with a multi-column one. Thanks for the hint with Word. I will now use Aiseesoft for books and Word for papers.
Cheers, Rob. |