02-11-2017, 12:14 AM | #16 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
Thanks.
I had been struggling with a Google Books pdf - I couldn't find a readable non-Google Books pdf of the same work - but Aiseesoft couldn't handle all the tables. I guess I'm still going to need to use Elucidate whenever I have to use Google Books. |
02-22-2017, 07:08 AM | #17 |
Junior Member
Posts: 9
Karma: 10
Join Date: Dec 2016
Device: Kobo Aura One
|
|
02-23-2017, 08:53 AM | #18 | |
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
https://itunes.apple.com/us/app/elucidate/id1066088407
Last edited by willus; 02-23-2017 at 08:56 AM. |
|
04-02-2017, 02:00 AM | #19 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
... Another few tries.
* If you have a good text layer and want to extract it, most macOS Sierra applications won't handle ligatures: they substitute blank spaces for the ff and fi ligatures, and probably others. I understand they may not handle superscripts either.
* If you don't have a good text layer, you will need OCR to create one before you can extract it. Tesseract (e.g. Elucidate) is good for short passages, but do you want to correct errors across an entire OCRed book? Abbyy FineReader might work better.
* Sometimes OCR merges columns in 2-column or 3-column layouts. Sometimes OCR splits up the columns of a table. The more it avoids one error, the more likely it is to run into the other. Processing before OCR makes text recognition errors more likely, but processing with k2pdfopt might make column recognition errors less likely. I haven't tested this fix.
* If the original format isn't important, and if the ligature bug gets fixed, then extracting the text and manually re-inserting pictures and tables may be a workable fix. I haven't gotten this working yet though.
* In the case of Internet Archive texts, there are usually epub and/or txt versions as well as the pdf version. |
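On the ligature point, here is a minimal Python sketch for the case where an extraction tool does preserve the ligature characters (the Unicode presentation forms U+FB00 to U+FB06) instead of replacing them with blanks. It cannot recover text where the ligature has already been turned into a space. The function names `expand_ligatures` and `count_ligatures` are made up for illustration, and I am assuming the extracted text is already available as a Python string.

```python
# Map the Unicode "Alphabetic Presentation Forms" Latin ligatures
# to their plain-letter spellings. Only these seven code points are
# touched, so superscripts and other characters are left alone.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace ligature code points with ordinary letters."""
    return text.translate(_TABLE)


def count_ligatures(text: str) -> int:
    """Count ligature code points, as a quick check on an extracted text."""
    return sum(1 for ch in text if ch in LIGATURES)


if __name__ == "__main__":
    sample = "a di\ufb03cult \ufb01le"
    print(count_ligatures(sample))
    print(expand_ligatures(sample))
```

A targeted table is used here rather than `unicodedata.normalize("NFKC", ...)`, because NFKC would also flatten superscript digits into ordinary digits, which is probably not wanted for footnote markers.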
04-02-2017, 03:04 PM | #20 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
* Fix for the ligature issue: I tried to slice off a couple pages, with a lot of ligatures, as a sample file. ... It fixed the ligatures there. I wouldn't want to overwrite the originals of a lot of my pdfs, but I can keep an extra copy which has been split. Or I can extract the text from that copy.
* Sometimes source pdfs have out-of-order text layers. I have no idea how to fix these.
Last edited by MarjaE; 04-02-2017 at 04:08 PM. |
04-11-2017, 12:16 PM | #21 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
The most reliable way to convert PDF to reflowable formats is (in my experience, at least) to use a decent OCR program like Abbyy FineReader. You can pick up older versions of this for a moderate price.
|
05-19-2017, 05:51 AM | #22 |
Enthusiast
Posts: 41
Karma: 2621116
Join Date: Jul 2011
Device: iPad
|
Yes, Abbyy FineReader is a decent OCR program, but it's a little expensive.
I have also tried Prizmo and Cisdem PDFConverterOCR. If you only want to convert a scanned PDF into EPUB (or another editable document), PDFConverterOCR is a good choice, and it is available as a StackSocial deal. Prizmo does a good job of extracting text from a PDF file and editing it. |
05-20-2017, 06:42 AM | #23 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
|
Tags |
pdf to epub |