MobileRead Forums > E-Book Software > Calibre > Conversion

Old 10-06-2021, 01:55 AM   #1
KMalsi
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
What pdf format can be converted to epub?

I scanned a paperback book (after getting permission from the author) into a searchable pdf. I've attached one page of the pdf here. It looks like an image but the text is searchable.

I assumed this was all that was needed to convert the pdf into a reflowable epub. I did the conversion, but the epub output looked exactly the same as the pdf! I've attached the epub here as well.

How do I convert such a searchable pdf into an editable epub with xhtml files and images?
Attached Files
File Type: pdf test.pdf (268.1 KB, 120 views)
File Type: epub test - Unknown.epub (224.2 KB, 105 views)
Old 10-06-2021, 06:29 AM   #2
BetterRed
null operator (he/him)
Posts: 21,717
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
I was able to open the PDF with a current version of Word, save it as DOCX, and convert that to EPUB with calibre. The attached ZIP has the DOCX and the EPUB.

Word has been my default tool for converting 'simple' PDFs for a while. If there are a lot of tables and images with wrap-around text etc. (e.g. coffee table cook books), it's not so good. If it barfs because the PDF is too big, grab one of the free PDF split utilities and chop the file into two or more chunks on chapter boundaries. I use one called PDFSam.
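If you need to work out where to cut before opening a split tool, the page ranges can be planned first. This is only a sketch: the function name, the chapter start pages and the page limit are made-up values for illustration, and it does not touch the PDF itself.

```python
def plan_chunks(chapter_starts, total_pages, max_pages):
    """Group whole chapters into chunks of at most max_pages pages.

    chapter_starts: sorted 1-based page numbers where each chapter begins.
    The first entry should be 1 so that no front matter is left out.
    Returns a list of (first_page, last_page) tuples, both inclusive.
    A single chapter longer than max_pages is kept whole in its own chunk.
    """
    # Work out the page range of each individual chapter.
    bounds = list(chapter_starts) + [total_pages + 1]
    chapters = [(bounds[i], bounds[i + 1] - 1) for i in range(len(chapter_starts))]

    chunks = []
    for first, last in chapters:
        if chunks and last - chunks[-1][0] + 1 <= max_pages:
            # The chapter still fits, so extend the current chunk.
            chunks[-1] = (chunks[-1][0], last)
        else:
            # Otherwise start a new chunk at this chapter boundary.
            chunks.append((first, last))
    return chunks


# Hypothetical 200-page book with chapters starting on pages 1, 40, 95 and 150.
print(plan_chunks([1, 40, 95, 150], total_pages=200, max_pages=100))
# -> [(1, 94), (95, 149), (150, 200)]
```

The resulting ranges can then be typed into whatever split utility you use.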

If you don't have access to a recent version of Word, try LO Writer (LibreOffice) to do the conversion to DOCX.
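For anyone who prefers to script the last step, the DOCX to EPUB conversion can also be run with ebook-convert, the command-line converter that ships with calibre. A minimal sketch, assuming ebook-convert is on your PATH; the function names and file names are hypothetical, and no extra conversion options are passed.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, epub_path=None):
    """Return the ebook-convert argument list for a DOCX -> EPUB conversion."""
    src = Path(docx_path)
    # Default to the same file name with an .epub extension.
    dst = Path(epub_path) if epub_path else src.with_suffix(".epub")
    return ["ebook-convert", str(src), str(dst)]


def convert_docx_to_epub(docx_path, epub_path=None):
    """Run ebook-convert; raises if the tool is missing or the conversion fails."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found on PATH (is calibre installed?)")
    cmd = build_convert_command(docx_path, epub_path)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


# Example (hypothetical file name):
# convert_docx_to_epub("test.docx")
```

Clean up the DOCX first; whatever mess is left in it ends up in the EPUB.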

BR
Attached Files
File Type: zip test (781).zip (49.0 KB, 122 views)
Old 10-06-2021, 06:40 AM   #3
KMalsi
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
Thanks! I read here that people seemed to have been able to convert directly from pdf. So I thought it was some settings in Calibre that I had missed. I do have MS Word and am able to convert the pdf to docx although there was a lot of editing needed to get it right. Now I know how it’s done.
Old 10-06-2021, 09:57 AM   #4
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
PDF is a LAYOUT/paste-up format, originally meant to let every user print exactly the same page as every other user.

HOW it was made affects the quality of conversion. PDF is not a linear file like an EPUB, where every item/entry is stored in order of use, start to finish.
What it contains (pictures, charts...) also affects the quality of conversion.
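To make the "not linear" point concrete: a converter often only sees pieces of text with page coordinates and has to guess the reading order. The sketch below is a toy illustration of that guess, not how calibre or any real extractor does it; the fragment tuples, the function name and the tolerance value are all invented for the example.

```python
from collections import defaultdict


def linearize(fragments, line_tolerance=2.0):
    """Rebuild reading order from positioned text fragments.

    fragments: iterable of (x, y, text), with y measured from the top of the page.
    Fragments with roughly the same y are treated as one line (crude bucketing),
    and lines are read top to bottom, left to right.
    """
    lines = defaultdict(list)
    for x, y, text in fragments:
        # Snap y to a bucket so slightly misaligned fragments share a line.
        key = round(y / line_tolerance)
        lines[key].append((x, text))

    ordered = []
    for key in sorted(lines):
        # Within a line, sort fragments by their x position.
        ordered.append(" ".join(text for _, text in sorted(lines[key])))
    return ordered


# Fragments stored in arbitrary order, as they might be inside a PDF.
page = [(200, 100.4, "world"), (50, 120, "Second"), (50, 100, "Hello"), (120, 120.3, "line")]
print(linearize(page))
# -> ['Hello world', 'Second line']
```

Multi-column pages, sidebars and captions break this simple top-to-bottom rule, which is one reason complex layouts convert badly.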

So, back to your question: a PDF made from an EPUB or other HTML source has a better chance of converting back, because the source was linear (and probably has no ligatures).
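On the ligature point: text pulled out of a typeset PDF can contain single characters such as U+FB01 (the "fi" ligature) where you expect two letters, which breaks searching and spell checking. A minimal sketch of expanding the common Latin ligatures after extraction; the function name is hypothetical and the table covers only U+FB00 to U+FB04.

```python
# Map the common Latin ligature code points to their plain-letter spellings.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})


def expand_ligatures(text):
    """Replace ligature characters with the letters they stand for."""
    return text.translate(LIGATURES)


print(expand_ligatures("\ufb01nal o\ufb03ce"))
# -> final office
```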
Old 10-06-2021, 10:14 AM   #5
retiredbiker
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
I have run across a number of PDFs that calibre would not convert even though they were searchable, that is, they contained some sort of text. Calibre uses pdftohtml to extract the text. For the ones I've found, running pdftohtml from the command line failed, but pdftotext worked. I guess Word can find some text calibre can't.
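A quick way to check a PDF before attempting a conversion is to ask pdftotext for the first page and see whether anything comes back. This is a rough sketch, assuming the pdftotext command-line tool is installed and on your PATH; the function names and the 20-character threshold are arbitrary choices for illustration.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, first_page=1, last_page=1):
    """Build a pdftotext call that writes the given page range to stdout ("-")."""
    return ["pdftotext", "-f", str(first_page), "-l", str(last_page), str(pdf_path), "-"]


def has_extractable_text(pdf_path, min_chars=20):
    """Return True if pdftotext finds at least min_chars non-space characters."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate output that is not valid UTF-8
        check=True,
    )
    # Ignore whitespace so a page of blank lines does not count as text.
    return len("".join(result.stdout.split())) >= min_chars


# Example (hypothetical file name):
# print(has_extractable_text("test.pdf"))
```

A True result only says that some text layer exists. It says nothing about how well that text will reflow.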

A pdf can contain just about anything. As theducks said, it depends on how it was made.
Old 10-06-2021, 10:41 AM   #6
KMalsi
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
I’ll need to read up and learn more about these.

I paid someone on Fiverr to convert this pdf to epub for me. When he sent the epub over, I could see the xhtml files, the css style sheet, and all the jpg images when I loaded it into Calibre’s ebook editor. I learned html two decades ago but can still remember some of it, so I was able to fine-tune the epub.
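An EPUB is a ZIP archive, so the same check can be done outside the editor. The sketch below counts the files in an EPUB by extension, which quickly shows whether a conversion produced real xhtml text or just page images. The function name and file name are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(epub_file):
    """Count the files inside an EPUB by extension (e.g. .xhtml, .css, .jpg)."""
    with zipfile.ZipFile(epub_file) as zf:
        # Skip directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix.lower() or "(none)" for n in names)


# Example (hypothetical file name):
# print(summarize_epub("test.epub"))
```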

I then asked him how he managed to convert the pdf to epub. He told me that he first converted the pdf to Word, then extracted the images and converted the Word document to xhtml in Calibre. So he used BR’s method.
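The image extraction step can be scripted too, since a DOCX is also a ZIP archive and Word normally stores embedded pictures under word/media/. A minimal sketch under that assumption; the function name, file name and output folder are made up for illustration.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_docx_images(docx_file, out_dir):
    """Copy every file under word/media/ in a DOCX into out_dir; return their names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_file) as zf:
        for name in zf.namelist():
            if name.startswith("word/media/") and not name.endswith("/"):
                # Keep only the file name, dropping the folder path inside the archive.
                target = out / PurePosixPath(name).name
                target.write_bytes(zf.read(name))
                saved.append(target.name)
    return sorted(saved)


# Example (hypothetical names):
# print(extract_docx_images("test.docx", "images"))
```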
Old 10-06-2021, 03:36 PM   #7
BetterRed
null operator (he/him)
Posts: 21,717
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@KMalsi - there are a couple of add-ins for Word that can help tidy up PDF artefacts:

MobileRead: Toxaris's eBook Tools MS Word add-in.

I also use the Translator Tools add-in. It has features that are not translator-specific, but it's not free.

BR
Old 10-06-2021, 07:14 PM   #8
KMalsi
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2021
Device: iPad Pro
Thanks BR!

All times are GMT -4.