#1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
|
What pdf format can be converted to epub?
I scanned a paperback book (after getting permission from the author) into a searchable pdf. I've attached one page of the pdf here. It looks like an image but the text is searchable.
I assumed this was all that is needed to convert the pdf into a reflowable epub. I did the conversion but the epub output looked exactly the same as the pdf! I've attached the epub here as well. How do I convert such a searchable pdf into an editable epub with xhtml files and images?
#2 | |
null operator (he/him)
Posts: 21,717
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Word has been my default tool for converting 'simple' PDFs for a while. If there are a lot of tables and images with wrap-around text etc. (e.g. coffee table cook books) it's not so good. If it barfs because the pdf is too big, grab one of the free PDF split utility tools and chop it into two or more chunks on chapter boundaries - I use one called PDFSam. If you don't have access to a recent version of Word, try LibreOffice (LO) Writer to do the conversion to DOCX. BR
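For anyone who wants to work out the split points before opening a splitter such as PDFSam, here is a rough Python sketch. It only computes page ranges; it does not touch the PDF. The function name `chunk_on_chapters`, the chapter start pages and the page limit are all made up for illustration, and pages are assumed to be 1-based.

```python
def chunk_on_chapters(chapter_starts, total_pages, max_pages):
    """Group whole chapters into (first_page, last_page) chunks.

    Pages are 1-based and inclusive. Chunks are cut only where a chapter
    starts; a single chapter longer than max_pages stays in one oversized
    chunk rather than being cut in the middle.
    """
    starts = sorted(set(chapter_starts))
    if not starts or starts[0] != 1:
        starts = [1] + starts  # treat front matter as its own "chapter"
    # Each chapter ends on the page before the next one starts.
    ends = [s - 1 for s in starts[1:]] + [total_pages]

    chunks = []
    chunk_start = starts[0]
    chunk_end = None
    for start, end in zip(starts, ends):
        # Close the current chunk if adding this chapter would exceed the limit.
        if chunk_end is not None and end - chunk_start + 1 > max_pages:
            chunks.append((chunk_start, chunk_end))
            chunk_start = start
        chunk_end = end
    chunks.append((chunk_start, chunk_end))
    return chunks


# Hypothetical book: 200 pages, chapters starting on pages 1, 40, 90 and 150.
print(chunk_on_chapters([1, 40, 90, 150], total_pages=200, max_pages=100))
```

The resulting ranges can then be typed into whichever split tool you use.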
#3 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
|
Thanks! I read here that people seem to have been able to convert directly from pdf, so I thought I had missed some setting in Calibre. I do have MS Word and was able to convert the pdf to docx, although a lot of editing was needed to get it right. Now I know how it’s done.
#4 |
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
PDF is a LAYOUT/paste-up format, originally intended to let every user print the same page as every other user.
HOW it was made affects the quality of conversion. PDF is not a linear file like an EPUB (every item/entry in order of use, start to finish). What it contains (pictures, charts...) also affects the quality of conversion. So, back to your question: an EPUB created FROM HTML has a better chance of converting back because the source was linear (and probably has no ligatures).
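On the ligature point: text pulled out of a PDF often contains single-character ligatures such as "ﬁ" (U+FB01) where the source had "fi", which breaks searching and spell-checking. A minimal Python sketch for expanding the common Latin ones is below. The name `expand_ligatures` is just illustrative, and the table covers only five ligatures; it is not a complete clean-up.

```python
# Common Latin ligature code points and their plain-letter equivalents.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace single-character ligatures with the letters they stand for."""
    return text.translate(_TABLE)


print(expand_ligatures("\ufb01nd the o\ufb03ce"))
```

`unicodedata.normalize("NFKC", text)` also expands these ligatures, but it rewrites other compatibility characters too (non-breaking spaces, superscript digits and so on), so an explicit table is the more conservative choice.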
#5 |
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
|
I have run across a number of PDFs that Calibre would not convert, even though they were searchable, that is, contained some sort of text. Calibre uses pdftohtml to extract the text. For the ones I've found, running pdftohtml from the command line failed, but pdftotext worked. I guess Word can find some text Calibre can't.
A PDF can contain just about anything. As theducks said, it depends on how it was made.
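A rough Python sketch of the "try pdftohtml, fall back to pdftotext" approach, assuming the Poppler command-line tools are installed and on the PATH. The function names and the output file naming are assumptions for illustration, not how Calibre does it internally. "Failed" here only means a non-zero exit code or an empty output file; an HTML file that exists but contains no real text would need a closer check.

```python
import subprocess
from pathlib import Path


def pdftohtml_cmd(pdf: Path, out_html: Path) -> list[str]:
    # -noframes asks for a single HTML document instead of a frameset.
    return ["pdftohtml", "-noframes", str(pdf), str(out_html)]


def pdftotext_cmd(pdf: Path, out_txt: Path) -> list[str]:
    # -layout tries to keep the physical layout of the text on the page.
    return ["pdftotext", "-layout", str(pdf), str(out_txt)]


def extract(pdf: Path, out_dir: Path, run=subprocess.run) -> Path:
    """Try pdftohtml first; fall back to pdftotext if it fails or writes nothing.

    `run` is injectable so the logic can be exercised without the tools installed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    html = out_dir / (pdf.stem + ".html")
    txt = out_dir / (pdf.stem + ".txt")

    result = run(pdftohtml_cmd(pdf, html))
    if result.returncode == 0 and html.exists() and html.stat().st_size > 0:
        return html

    result = run(pdftotext_cmd(pdf, txt))
    if result.returncode != 0:
        raise RuntimeError(f"both pdftohtml and pdftotext failed on {pdf}")
    return txt
```

Calling `extract(Path("book.pdf"), Path("out"))` returns the path of whichever output was produced. If neither tool is installed, `subprocess.run` raises `FileNotFoundError`.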
#6 |
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
|
I’ll need to read up and learn more about these.
I paid someone on Fiverr to convert this pdf to epub for me. When he sent the epub over, I was able to see the xhtml files, the css style sheet, and all the JPG images when I loaded it into Calibre’s ebook editor. I learned HTML two decades ago but can still remember some of it, so I was able to fine-tune the epub. I then asked him how he managed to convert the pdf to epub, and he told me that he first converted the pdf to Word, then extracted the images and converted the Word document to xhtml in Calibre. So he used BR’s method.
#7 |
null operator (he/him)
Posts: 21,717
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@KMalsi - there are a couple of add-ins for Word that can help tidy up PDF artefacts:
MobileRead: Toxaris's eBook Tools MS Word add-in. I also use the Translator Tools add-in; it has features that are not translator-specific, but it's not free. BR
#8 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
|