View Full Version : converting pdf to epub


Gagan
11-18-2008, 08:01 AM
I want to convert PDF files to EPUB format. I tried calibre, but the EPUB it generated is not good.
If anybody is using another tool for the same conversion, please let me know.

JSWolf
11-18-2008, 08:47 AM
First convert from the PDF into HTML and make sure you've fixed all of the errors in the HTML due to converting from PDF. I've never yet seen a PDF converter that does it 100% error free. So it could be the ePub is reflecting the errors in the conversion process from PDF.

Hadrien
11-18-2008, 09:26 AM
PDF is the worst source format that you can use. Don't expect any good result with automatic conversions from PDF to ePub: you'll always have to fix a lot of things manually.

wallcraft
11-18-2008, 10:57 AM
If anybody is using any other tool for the same conversion
The only other ebook-centric converter I am aware of (which does not convert to images) is Windows MobiPocket Reader or Creator, which will convert from PDF to MOBI. Calibre can then convert the MOBI to ePub. This may be no better than Calibre's native converter, but it might be worth trying. Note that the source code for the underlying pdf2xml is available, see Mobipocket convert in mass? (http://www.mobileread.com/forums/showpost.php?p=260477&postcount=14).

Gagan
11-19-2008, 01:25 AM
I have tried both conversions: pdf > html > epub and also pdf > mobi > epub.
But both have some problems, and the HTML files have to be modified manually to get the required EPUB.
Where can I find the source code for pdf2xml?
Thanks for your time.

wallcraft
11-19-2008, 02:23 AM
Where can I find the source code for pdf2xml?
It is at the pdf2xml homepage (http://www.mobipocket.com/dev/pdf2xml/).

alhaqpk
05-03-2010, 12:30 PM
Try this website: http://epub2go.com, it's free ;-)

alecE
05-04-2010, 04:42 PM
FWIW I've found the only way to get a good conversion from PDF is the labour-intensive one: export the PDF as text, load the .txt into your favourite text editor, use extensive search & replace to eliminate hard line endings etc (or perhaps less search & replace if you have any competency with regex, which I don't), convert quotes to “ etc., and finally turn it into a decent epub with Sigil. Loading into Calibre is then of course a trivial exercise.
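For anyone who does have a bit of regex, the search & replace pass described above could look roughly like this in Python. It is only a sketch under some assumptions: the exported text marks paragraph breaks with a blank line, the function names are made up for illustration, and words hyphenated across line breaks are not handled. Expect to proofread the result by hand afterwards.

```python
import re


def unwrap_hard_line_endings(text: str) -> str:
    """Join the lines inside each paragraph; blank lines stay as paragraph breaks."""
    # Normalise Windows/old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A lone newline (no newline before or after it) is a hard line ending
    # inside a paragraph, so turn it into a space.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse any runs of spaces or tabs left over from the joins.
    return re.sub(r"[ \t]{2,}", " ", text)


def curl_quotes(text: str) -> str:
    """Replace straight quotes with curly ones using a simple position heuristic."""
    # A double quote at the start of the text, or after whitespace or an
    # opening bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
    # Every remaining double quote is treated as a closing quote.
    text = text.replace('"', "\u201d")
    # Same idea for single quotes; anything left over (including
    # apostrophes inside words) becomes a right single quote.
    text = re.sub(r"(^|[\s(\[“])'", "\\1\u2018", text)
    return text.replace("'", "\u2019")
```

Run `unwrap_hard_line_endings` first and `curl_quotes` second, then load the cleaned text into Sigil as described above.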

mr ploppy
05-04-2010, 05:13 PM
I've tried most of these methods, but the best so far is to open the PDF in an OCR program and generate a new text file from that. Then manually delete any headers and footers, and fix any broken paragraphs. I haven't seen any common OCR problems yet, presumably because the text in a PDF will be perfectly straight and without any scanner or paper noise.
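The header/footer deletion can be partly automated before the manual pass. A rough Python sketch, assuming the text file can be split into pages (many extractors put a form feed character between pages, but check yours) and assuming the headers and footers sit on the first or last line of each page. The function name and the 60% threshold are arbitrary choices for illustration.

```python
import re
from collections import Counter


def strip_running_heads(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Drop first/last lines that repeat on most pages (running headers/footers)."""

    def key(line: str) -> str:
        # Page numbers change from page to page, so mask digits before comparing.
        return re.sub(r"\d+", "#", line.strip())

    split_pages = [page.strip("\n").split("\n") for page in pages]

    # Count how many pages carry each (masked) line at their top or bottom edge.
    counts = Counter()
    for lines in split_pages:
        counts.update({key(lines[0]), key(lines[-1])})

    threshold = min_share * len(pages)
    repeated = {k for k, n in counts.items() if k and n >= threshold}

    cleaned = []
    for lines in split_pages:
        if lines and key(lines[0]) in repeated:
            lines = lines[1:]
        if lines and key(lines[-1]) in repeated:
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned
```

Typical use would be `strip_running_heads(text.split("\f"))`. Broken paragraphs still need fixing by hand afterwards.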

Toxaris
05-05-2010, 05:03 PM
I agree, I usually use ABBYY FineReader to OCR the PDF. There is still a lot of manual afterwork, but the results are quite good.

Ankh
05-05-2010, 07:44 PM
I haven't seen any common OCR problems yet, presumably because the text in a PDF will be perfectly straight and without any scanner or paper noise.

No OCR artifacts? None?

That sounds like your OCR program is "cheating" and using PDF tags, when and if they are available.

eBookLuke
05-06-2010, 11:35 AM
Try this website: http://epub2go.com, it's free ;-)

This website, like many others, uses Calibre to make the conversions. If you don't like Calibre's output, you won't like epub2go's either.

Luke

Bluesteel
05-30-2010, 07:46 PM
http://ipadhelp.com/ipad-help/covert-ebooks-to-the-ipad/

Simple as 123.

frabjous
05-30-2010, 08:15 PM
Correct me if I'm wrong, but that site uses calibre for its backend as well, no?

eBookLuke
05-31-2010, 02:13 AM
http://ipadhelp.com/ipad-help/covert-ebooks-to-the-ipad/

Simple as 123.

It uses Calibre's engine. Try a conversion, open the ePub and check it.

I have released a new version of writer2epub; it corrects many issues under Windows.
Look in my signature to download it.

Luke

superstitious
06-10-2010, 02:55 AM
I use Calibre to do my conversions. It is not a perfect EPUB rendition, but it is pretty close for me. I basically didn't change anything at all except one thing: I changed the "Line un-wrapping factor" to 0.45 instead of the default 0.5. I found that advice here somewhere, but I can't locate the thread now; it's been so long. The suggestion said to try 0.35, 0.45, or 0.55. My PDFs looked best with 0.45, IMO. Hope this helps.
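If you convert from the command line instead of the GUI, the same tweak can be scripted. A minimal sketch, assuming calibre's `ebook-convert` tool is installed and on your PATH, and assuming the GUI setting corresponds to its `--unwrap-factor` option for PDF input. The function names are hypothetical.

```python
import subprocess


def build_convert_command(pdf_path: str, epub_path: str,
                          unwrap_factor: float = 0.45) -> list[str]:
    """Build the ebook-convert argument list without running anything."""
    if not 0.0 <= unwrap_factor <= 1.0:
        raise ValueError("unwrap_factor should be a decimal between 0 and 1")
    return [
        "ebook-convert", pdf_path, epub_path,
        "--unwrap-factor", str(unwrap_factor),
    ]


def convert(pdf_path: str, epub_path: str, unwrap_factor: float = 0.45) -> None:
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(pdf_path, epub_path, unwrap_factor),
                   check=True)
```

Calling `convert("book.pdf", "book.epub", 0.35)` and then again with 0.45 and 0.55 makes it easy to compare the three values suggested above on the same PDF.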

Canorka
11-20-2010, 02:47 AM
Awesome tip, this works wonders. Some nagging paragraph breaks to deal with but totally manageable stuff. Thanks!

freddystewert
10-25-2011, 07:10 PM
While Calibre and other tools like it are great for free conversions, they just will not have the best quality. So if you are a perfectionist, you are either going to have to learn a lot of crappy, boring programming stuff, or you could just use a conversion company. I like to use pdf to ebook converter (http://www.convertabook.com/).

Toxaris
10-26-2011, 02:12 AM
Nobody can do perfect automatic conversion of PDF to ePUB. There are some tools which are good, but they all require afterwork.
A lot of those 'conversion companies' use the same tools as we do. The quality those companies deliver depends heavily on the afterwork. Since the afterwork takes time, this will be reflected in the price.

Wiggle72
10-26-2011, 03:49 PM
Hello there, this is my first post. I have just purchased a Cybook Opus, and I use calibre to convert from PDF to EPUB. I was wondering: should I do this, or should I just put the PDF files on my Cybook as they are? This is for stories, Star Wars etc. I know I could just try, but I was wondering if anyone else had any views. Also, when I convert from PDF to EPUB, I get an OPF file. Should I just transfer the EPUB file and forget about transferring the OPF file? If I transfer the EPUB file without the OPF, what would happen? Thanks.
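On the OPF question: an EPUB is a zip container that carries its own package (.opf) file inside it, so the loose .opf sitting next to the book is most likely just calibre's separate metadata copy. That last part is an assumption about how your library folder was produced, so it is worth checking. A small Python sketch (the function name is made up) that lists the .opf entries inside an EPUB:

```python
import zipfile


def find_opf_entries(epub_path: str) -> list[str]:
    """Return the names of all .opf package files stored inside the EPUB."""
    with zipfile.ZipFile(epub_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".opf")]
```

If this returns at least one entry for your converted book, the EPUB is self-contained and can be copied to the reader on its own.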

JSWolf
10-27-2011, 09:58 PM
You should treat the PDFs as if they don't exist. Ignore them or delete them. Getting them to look good on a reader with a 6" screen isn't worth your time and hassle.

frabjous
10-27-2011, 10:14 PM
Nonsense. There are plenty of ways to try to make them look good, but I wouldn't convert to ePub. Try tools like BRISS and soPDF to divide the pages into manageable chunks. I've taken scans of triple-column textbooks and made them readable on a 6" screen.

cakefordinner
10-29-2011, 03:12 PM
Personally, I've given up on converting PDF to epub. "It's like a box a choclits. You nevah know whatcher gonna git." (Forrest Gump)

It's a total crapshoot if your epub conversion is going to actually have content on the pages or be completely blank.

I did one (pdf > delete all 30+ blank pages and duplicate covers/title pages/split image pages > text > spent hours fixing all the problems (mainly spaces between letters that don't belong) > add to calibre > add book cover and all the metadata > convert).

Once was enough. I have an iPad and iPhone so I just let all the PDFs (all 2 of them, so far) go to their separate shelf. Haven't gotten up the energy to add any of my PDF cookbooks. One thing's for sure; I'm not going to bother trying to convert any of them. Even asking St. Jude to help (patron saint of hopeless cases) didn't work. LOL

gsp
11-03-2011, 09:17 AM
Hi,

When converting from PDF to EPUB using calibre, the columns are not converted correctly.

My PDF has a 3-column format, but after conversion it shows as a single column (the 3 columns one after another).


Please advise.

Thanks

DiapDealer
11-03-2011, 09:44 AM
If keeping the 3 column format is important, then leave it as a PDF. You're not going to be able to convert to any format and retain a multi-column style.

gsp
11-04-2011, 01:01 AM
We need to convert it into EPUB format to read on a Kindle device.

gsp
11-04-2011, 01:04 AM
We need to convert it into EPUB format to read on a Kindle device. This is our client's requirement.

Is there any way to edit the EPUB file after conversion?

We need to convert the PDF into EPUB and MOBI.

Please advise on any other way to achieve this task.

DiapDealer
11-04-2011, 09:51 AM
We need to convert it into EPUB format to read on a Kindle device. This is our client's requirement.
You can convert to epub/mobi but you're going to have to ditch the 3-column format. It's that simple. And there's still going to be a lot of manual cleanup involved with whatever conversion tool you use.

Is there any way to edit the EPUB file after conversion?
Sigil (http://code.google.com/p/sigil/)

SusanM
04-01-2013, 11:32 PM
Thanks,
I was wondering about ABBYY FineReader. What would you say its accuracy is? I have Acrobat XI, which does an okay job on OCR.

S

Turtle91
04-02-2013, 01:53 AM
ABBYY has improved a bit ...

As mentioned here, I would use it to create HTML, then do the afterwork to clean it up. Then use Sigil to make an ePub.

Another way is to use ABBYY to make an RTF, then use Toxaris' Word cleanup macro to make clean HTML. Then use Sigil to make the ePub.

SusanM
04-06-2013, 12:46 PM
Thanks! I contacted ABBYY about the difference between ABBYY FineReader Professional and ABBYY PDF Transformer (both export to HTML, but with a big price difference), asking if there is a difference in the quality of the OCR or only a difference in the options. No reply. Are you using FineReader or PDF Transformer? I am testing the trial version of PDF Transformer, and the HTML is much cleaner than with Acrobat XI.

S

Turtle91
04-06-2013, 01:38 PM
I use FineReader Pro. For me it was worth the money, but I have lots of things to scan/convert.

SusanM
04-12-2013, 10:38 AM
Thanks, Turtle.

As I mentioned, I emailed the company a few weeks ago and asked about the difference in the apps, but have not received a response. User experience reports are always much more reliable anyway :).

mandavkarswap
04-16-2013, 04:23 AM
Try ImTOO PDF to EPUB Converter. It's the best I've come across: you don't need to adjust anything, just drag and drop the file and convert. However, it does not convert images, so you will get a cross mark (x) where each image was. If you can manage without images, you can use it.
I'm still trying to find a proper converter that carries PDF images over into the EPUB. I'll let you know as soon as I find one. ;)
NOTE: I'm using Albite Reader for reading e-books on my Nokia X2-00.
It is FREEWARE!
You can get it here:
www.albite.org/reader
and google for ImTOO. I downloaded it from a torrent around 1.5 years ago. ;)
HF. :D xD

pdurrant
04-16-2013, 07:24 AM
ImToo free demo available here: http://www.imtoo.com/pdf-to-epub-converter.html

orange!
06-18-2013, 09:29 PM
This is a timely topic for me because I have volunteered to help one of my favorite authors convert some of his back list to ePub (and MOBI). He had a company scan the books and has given me the resulting PDF and Word files. It's my job to figure out how to get ePub files from them.

My question is--has anyone actually used this imtoo gizmo? And if so, does it do a good job?

Thanks in advance for any information or suggestions.

DaleDe
06-18-2013, 09:47 PM
Do not use the PDF if you have a Word file. The Word file will produce better results, and faster. If you are a beginner I would suggest downloading Atlantis Word Processor. They have a free trial, and you can have an ePub within 5 minutes, and a Mobi as well, using your Word file as the source.

If you are HTML savvy you can turn the Word file into HTML and then use Sigil to convert it to ePub. You are talking hours of effort. But if Atlantis, for some reason, doesn't produce exactly the ePub you want, you can edit the ePub using Sigil. (Sigil has its own forum here on MobileRead.) Atlantis has its own thread here.

Dale

Toxaris
06-19-2013, 03:28 AM
If the source of the Word document is the PDF, then be prepared to face many issues with the Word document. This is independent of the method used to convert the PDF to Word (I prefer OCR, but that is me).
These errors should be corrected and that can be a lot of work. My Word add-in can help in that, although it will not catch 100% of the errors.

orange!
06-19-2013, 10:47 AM
Do not use the PDF if you have a Word file. ...Dale

...My Word add-in can help in that, although it will not catch 100% of the errors.

Thank you for your suggestions! I'll try the Atlantis route first. As a bonus, I see that they have a tutorial (http://www.atlantiswordprocessor.com/en/videos/) on their website. :)

JSWolf
06-19-2013, 11:05 AM
One solution is to run the word document through Book Designer. That seems to clean up a lot of the mess. Then you save to HTML and run that through Sigil.

Toxaris
06-19-2013, 12:40 PM
Thank you for your suggestions! I'll try the Atlantis route first. As a bonus, I see that they have a tutorial (http://www.atlantiswordprocessor.com/en/videos/) on their website. :)

You're welcome. Be aware that Atlantis will not find and correct OCR errors for you.

orange!
06-19-2013, 01:15 PM
One solution is to run the word document through Book Designer. That seems to clean up a lot of the mess. Then you save to HTML and run that through Sigil.

Thanks for the suggestion. May try this route if other suggestions don't work out. I'm familiar with HTML in general, but not in terms of specifically what works for ePUB. I'm not familiar with Book Designer or Sigil (yet).

orange!
06-20-2013, 11:44 PM
You're welcome. Be aware that Atlantis will not find and correct OCR errors for you.

Toxaris,

I installed your Word macro and used it as the 1st step in my workflow. It seemed to do a great job!

I hand-corrected additional errors, most of which were paragraph errors--usually there wasn't a paragraph where there should have been one and occasionally it was the other way around. Paragraphs that end with a quoted sentence seemed to be especially problematic.

Now I'm not quite clear what to do next. I see that your Macro can generate HTML and/or generate ePUB.

I need to eventually get both ePUB and MOBI files. That was why I was considering Atlantis.

Any thoughts you have about what I should do next would be welcome.

Thanks for making your awesome Word tools!

Toxaris
06-21-2013, 02:42 AM
If you generate the ePUB, you can make the final touch up in Sigil. After all, the ePUB is not ready for publication (as stated in the manual). You can then finalize it. The ePUB can be used for generating the mobi file.

Also, don't forget you can add S&R rules to the corresponding document. If you do, the next time the S&R action will be done and you don't have to do it manually anymore.

ittiandro
10-06-2014, 11:05 PM
Hi!
Just joining the conversation on this subject, with a problem.
After processing what appears to be a scanned PDF document with k2pdfopt (the result works very well on my tablet with ezPDF Reader), I tried to convert it to EPUB with Wondershare PDF Editor using the OCR feature. But when I open it in Calibre the document is almost illegible, because the text sits on a spotty gray/black background. Also, some of the diagrams in the original text are distorted into meaningless symbols. Unlike a native EPUB, this EPUB conversion does not allow any modification of the font or other settings. Each page is fixed and looks pretty much like an image.
Can anybody tell me what is wrong and how to achieve full setting control?


Thanks

Ittiandro

pdurrant
10-07-2014, 05:18 AM
You are unlikely to get much success with an automatic conversion of a scanned PDF. You will have to do lots of manual clean up/conversion yourself.

ittiandro
10-07-2014, 07:00 PM
Yes, but could you tell me, if you know, what the "lot of manual clean up/conversion" I would have to do consists of, compared with the automatic conversion? Would it be better to convert to Word .doc (or .docx) format and then reconvert to EPUB? This is what I often hear, but I am not too clear on it.
Or perhaps you can refer me to more precise instructions somewhere else on the Web? I couldn't find any.
Bottom line, I want to have a fully controllable page layout after converting to EPUB.

Thanks

Ittiandro

mrmikel
10-07-2014, 09:30 PM
You are looking for what is pretty much impossible: fixed layout in a reflowable format. Some machines allow attempts at it.

When the reader presses the increase size button, the formatting will go out the window.

PDF is a format that is fixed and will work if laid out for the appropriate size of your device's screen. But it is not a format which is accepted by the major publishing houses for ebooks.

Any conversion from PDF by Optical Character Recognition will have at least one error per page, often many more. Converting PDF to Word doesn't change that. Working from an original Word document which has not been OCR'ed will be much better, and is supported by Toxaris's add-in as well as Atlantis Word Processor. Even so, if you want fixed layout in EPUBs you are paddling upstream with a spoon.

JSWolf
10-08-2014, 03:12 AM
Print the PDF, with the pages by your computer keyboard, start typing. You will have to copy the graphics though.

Good luck!

pdurrant
10-08-2014, 04:53 AM
Yes, but could you tell me, if you know, what the "lot of manual clean up/conversion" I would have to do consists of, compared with the automatic conversion? Would it be better to convert to Word .doc (or .docx) format and then reconvert to EPUB? This is what I often hear, but I am not too clear on it.
Or perhaps you can refer me to more precise instructions somewhere else on the Web? I couldn't find any.
Bottom line, I want to have a fully controllable page layout after converting to EPUB.

You describe a poor quality scan of a book wrapped as a PDF. To get a reflowable ePub you'll need to do OCR on the scanned pages, and also extract the images from the scans, and then format the extracted text, fixing all the OCR errors and inserting the (cleaned up) images in the text at the right points.

I don't know the best software for doing the OCR. Extract the pages with graphics from the PDF with (say) Adobe Reader as actual image files (e.g. .PNG or .TIFF) and clean them up in your favorite image editing programme (e.g. Photoshop). I'd recommend Sigil for creating/editing the ePub.

mrmikel
10-08-2014, 08:46 AM
Right on, pdurrant. It is all tricks and no treat, except days or weeks later when done.

ittiandro
10-10-2014, 01:58 PM
First convert from the PDF into HTML and make sure you've fixed all of the errors in the HTML due to converting from PDF. I've never yet seen a PDF converter that does it 100% error free. So it could be the ePub is reflecting the errors in the conversion process from PDF.

I have a PDF physics book which I want to convert to EPUB for reading on my Android tablet. I am trying to convert it with ABBYY FineReader, but I am having great problems rendering non-text areas such as diagrams, tables and sketches in the EPUB conversion. Actually, they are not rendered at all!
I am almost ready to give up! Before that, I might take a last shot with HTML conversion, but I don't know what to do after it, because my aim is to get an EPUB file. How does an HTML conversion facilitate the EPUB conversion?

Thanks

Ittiandro
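On how HTML helps: an EPUB is essentially a zip file of (X)HTML pages plus a few small index files, so once the HTML is clean, most of the work is done, and a tool like Sigil handles the packaging. The Python sketch below shows roughly what that packaging involves for a single chapter. It is illustrative only: the file layout (OEBPS/chapter1.xhtml and so on), the function name and the hard-coded language are arbitrary choices, the body must already be well-formed XHTML, and the result has not been run through a validator such as epubcheck.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

NCX_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{book_id}"/></head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>{title}</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body>
{body}
  </body>
</html>
"""


def build_epub(body_html: str, epub_path: str, title: str, book_id: str) -> None:
    """Package one cleaned-up XHTML body fragment as a single-chapter EPUB."""
    safe_title = escape(title)
    safe_id = escape(book_id, {'"': "&quot;"})
    with zipfile.ZipFile(epub_path, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry has to come first and be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip",
                   compress_type=zipfile.ZIP_STORED)
        # container.xml tells the reader where the package (.opf) file lives.
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        z.writestr("OEBPS/content.opf",
                   OPF_TEMPLATE.format(title=safe_title, book_id=safe_id))
        z.writestr("OEBPS/toc.ncx",
                   NCX_TEMPLATE.format(title=safe_title, book_id=safe_id))
        z.writestr("OEBPS/chapter1.xhtml",
                   XHTML_TEMPLATE.format(title=safe_title, body=body_html))
```

Images and tables that survive in the HTML export would be added the same way: one more file in the zip and one more `<item>` line in the manifest. That is why getting good HTML out of the OCR step matters more than the final packaging.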

JSWolf
10-10-2014, 02:06 PM
Since you do have a tablet, it would be much easier to just use the PDF as a PDF on your tablet. Converting would be much more hassle than it is worth. Also, if you end up with errors in anything important, like formulas, you could be screwed. So just keep the PDF as a PDF, find a good program to view it with, and you are all set.

ittiandro
10-10-2014, 02:29 PM
Yes, I might have no choice but to keep my current PDF format for reading on the tablet with ezPDF Reader, because even software as sophisticated as ABBYY FineReader does not display non-text areas, such as diagrams, sketches or other images, in the EPUB conversion. I am sure there must be a way, but it is very time-consuming to do it.
The reason I wanted to use EPUB instead of my current PDF format is that PDF has a fixed bright white page background which strains my eyes, whereas EPUB readers such as Cool Reader (my favorite) and FBReader allow you to change the page background and have a wider array of settings. Reading in night mode is an option I do not particularly like.

Thanks

Ittiandro

Hitch
10-11-2014, 07:55 PM
Ittiandro:

Did you try outputting Abbyy to WORD, instead of ePUB? That will retain images and graphics.

Hitch

mrmikel
10-11-2014, 09:26 PM
Save To HTML does preserve images and tables also.

BUT you need to go through each page to see how it has analyzed in order to make sure you get complete pictures. It goes a little overboard if it finds text in a graphic. It also doesn't do so well if you have an image that seems to fade out too fast for its liking. But you can tell it where the image boundaries are in these cases and it will pick up the whole thing.

You can also do the same for tables, when it has mistaken them.

Then Read (recognize) and the output is much better.

Hitch
10-12-2014, 04:22 PM
Absolutely. I think the issue/problem here is an expectation of going direct to ePUB, and skipping that step. Direct to ePUB shan't keep the images/figures, etc., as we all know (painfully too well). I think that Tex (Texanns002) has a fairly great lengthy post somewhere around here (hell, is it in this very thread?) about how to competently scan. Wherever it is, it's worth reading, both for newbs and even those of us with a few under our belts.

Hitch