Old 04-16-2010, 11:54 PM   #16
Sabardeyn
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
Quote:
Originally Posted by Bluesman7 View Post
I think my point is that none of the methods I've tried have yielded consistent results.
Something a computer class taught me...

GIGO: An acronym standing for "Garbage In, Garbage Out". It is used as a snide comment implying that the starting data the computer was given to work with was not appropriate, just plain wrong, or entered by a moron. Regardless, there was no way in Blazes it was going to produce anything like a correct answer.

Here, it applies to poorly constructed, but completely valid, ebooks. We've all seen its cousin, the poorly constructed web page that displays perfectly in one browser but is a complete mess in any other.
Old 05-17-2010, 10:26 PM   #17
vietchovui
Zealot
Posts: 109
Karma: 556
Join Date: Nov 2009
Location: SaiGon VietNam
Device: PRS T1, Kobo Forma 8G, Kobo Libra H2O
Quote:
Originally Posted by greenapple View Post
I've tested many converters (freeware as well as trial versions). Acrobat does the best job in my opinion in converting pdf to html/rtf. But the price is way too high. Someone please tell Adobe to make a cheaper version for home users.
Try Solid Converter. It's cheaper than Adobe but still does really nice work!
Old 05-19-2010, 08:56 PM   #18
Pranananda
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
easy rider,

If you are handy with the command line, you might try my pdfreflow utility, which is described in the PDF forum. It takes the XML output of pdftohtml -xml and reflows it to HTML, which you can then use in Calibre as input for an epub conversion.
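For anyone wondering what a reflow step like this involves, here is a minimal Python sketch. It is not pdfreflow's actual code, just an illustration under some assumptions: the input is the XML that pdftohtml -xml writes (page elements containing text elements with top and height attributes), and a vertical gap larger than gap_factor times the line height is treated as a paragraph break. The function name reflow and the gap_factor parameter are made up for this example.

```python
import html
import xml.etree.ElementTree as ET


def reflow(xml_text, gap_factor=1.5):
    """Group <text> lines from pdftohtml -xml output into HTML paragraphs.

    Assumption for this sketch: a vertical jump bigger than
    gap_factor * line height marks a new paragraph. Each page also
    ends the current paragraph, which is a simplification.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    for page in root.iter("page"):
        current = []
        prev_top = prev_height = None
        for node in page.iter("text"):
            # itertext() also picks up text inside inline tags such as <b>
            line = "".join(node.itertext()).strip()
            if not line:
                continue
            top = int(node.get("top"))
            height = int(node.get("height"))
            if current and top - prev_top > gap_factor * prev_height:
                paragraphs.append(" ".join(current))
                current = []
            current.append(line)
            prev_top, prev_height = top, height
        if current:
            paragraphs.append(" ".join(current))
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```

Real PDFs need more than this (multiple columns, headers and footers, hyphenated line ends), so treat it as a starting point only.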
Old 08-04-2010, 11:08 PM   #19
kiwikobo
Enthusiast
Posts: 47
Karma: 120
Join Date: Jun 2010
Device: Kobo
Quote:
Originally Posted by easyrider View Post
I found today a good way to convert PDF to ePUB
It's in 3 steps:

1. mobipocket creator: PDF -> html
2. mobipocket creator: html -> PRC
3. Calibre PRC -> ePub
You, sir, are a genius. This works beautifully and avoids all the mucking about in Sigil fixing stuff that Calibre puts in. Not only that, but it creates dramatically smaller epubs. For newbie Mobipocket users, just click on "import PDF", choose the file, click Import, then click on Build. Add the book with calibre, and then do a mass convert to epub. With most of my scummy PDFs all I needed to do then was tidy up the title and everything was lovely! You've saved me hours...
Old 08-06-2010, 04:14 AM   #20
vastav
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
I would like to gently point out another option that I have been involved with for converting PDFs to ePub. The primary solution comes as an Acrobat plugin which uses a conversion algorithm similar to the one behind Acrobat's PDF to RTF/HTML export options.

The solution also has a free web-based option available at http://www.pdf2epub.com/trial. While the software is still a work in progress, it is simple to use and most of the font-level formatting is retained in the converted ePub. I would love to hear feedback from the community.
Old 08-08-2010, 12:00 PM   #21
anthony_barker
Member
Posts: 15
Karma: 10
Join Date: Jun 2010
Device: PB360, nokia n900
This is a very useful series of posts.

So it looks like the _best_ options for PDFs are:

1) Mobipocket/Calibre
PDF->html->PRC->epub
https://wiki.mobileread.com/wiki/MobiPocket_Creator
https://wiki.mobileread.com/wiki/Calibre

Does this work better than simply opening the HTML directly in Sigil, and/or cleaning the HTML before opening it in Sigil?

2) Acrobat Pro/Calibre
PDF -> RTF -> epub

3) If OCR required
ABBYY -> html -> Sigil?
Or is ABBYY -> RTF-> Calibre better?

Is epub the best end format? I've been reading HTML on my PocketBook 360, and on my cellphone (Nokia N900) I find the browser better than FBReader.

Also, it seems the best bet for textbook-style books is just to leave them in PDF and crop them, or OCR them (following #3)?

Also, do any of these methods handle math symbols?
Old 08-08-2010, 09:49 PM   #22
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by anthony_barker View Post
This is a very useful series of posts

So it looks like the _best_ options are for PDFs

1) Mobipocket/Calibre
PDF->html->PRC->epub
https://wiki.mobileread.com/wiki/MobiPocket_Creator
https://wiki.mobileread.com/wiki/Calibre

Does this work better than simply opening the HTML directly in Sigil?
and/Or cleaning the html prior to Sigil?
Personally when I have to go outside Calibre, I usually try Mobipocket Creator for this part:
PDF->html

And then back to Calibre for this part:
html->epub

Then I use Sigil for any fine tuning.

I don't think the PRC step helps since the first thing calibre does during conversion is change the file back to html.

My main goal is avoiding PDF as a source document whenever possible.
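If you prefer to script the html->epub step instead of clicking through the GUI, calibre ships a command-line tool called ebook-convert that takes an input file and an output file. Below is a small Python sketch that only builds the command line; the helper name build_convert_command and the file names are made up for illustration, and actually running it assumes calibre is installed and on your PATH.

```python
import shlex


def build_convert_command(source, target, extra_options=()):
    """Return the argument list for calibre's ebook-convert tool.

    extra_options is passed through untouched, so any conversion
    options you add are your responsibility to check against the
    calibre documentation.
    """
    return ["ebook-convert", str(source), str(target), *extra_options]


# Hypothetical file names, for illustration only.
cmd = build_convert_command("book.html", "book.epub")
print(shlex.join(cmd))

# To actually run the conversion (requires calibre to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than one string avoids quoting problems with file names that contain spaces.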

Last edited by DoctorOhh; 08-09-2010 at 05:03 AM.
Old 08-09-2010, 03:43 AM   #23
vastav
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
Quote:
Originally Posted by anthony_barker View Post
This is a very useful series of posts

2) Acrobat Pro/Calibre
PDF -> RTF -> epub
If you are looking to do option #2, the pdf2epub.com solution is a single-step version of that (and better, since you avoid the intermediate lossy conversion to RTF). The solution uses the tags in the PDF to drive the conversion process, using flows similar to those used by the RTF and HTML converters built into Acrobat.
Old 08-09-2010, 11:34 AM   #24
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
Quote:
Originally Posted by vastav View Post
If you are looking to do option #2, pdf2epub.com solution is a single step version of that (and better since you avoid the intermediate lossy conversion to RTF). The solution uses tags in PDF to drive the conversion process using similar flows as those used by RTF and HTML converters built into Acrobat.
I am getting the best results by far using Acrobat as an intermediate step.

Over the last week I converted a couple pdf books to epub format and the biggest problem was getting the paragraph breaks to end up right. I initially tried a straight Calibre conversion but paragraph breaks were all over the place and incorrect -- even after fiddling for quite some time with the line un-wrapping value.

Then I read this thread and the suggestions by chaley and greenapple to use Acrobat were right on the money. I tried other suggestions such as Mobipocket Creator and the pdf2epub.com converter but both resulted in body text where paragraphs all ran together in one long block!

With Acrobat, converting either to RTF or HTML gave me an almost perfect result with the body text. I convert a pdf both ways in Acrobat, then import both rtf and html into Calibre and see which conversion to epub gives the best result in the body text. In one instance it was RTF and in the other it was HTML.

After deciding which gave the best base conversion (RTF or HTML) I then imported the file into MS Word to designate chapter headings and generate a TOC. (I find it easier to do in Word than in Sigil.) Then I import into Calibre, convert to ePub, and do last minute tidying up in Sigil. Sounds like a long process, and it is, but it's much less labor intensive and problematic than trying to clean up the bad paragraph breaks left by other conversion methods.

I realize not all have or can afford Acrobat, but if you look on eBay you can sometimes find older versions on sale for a good price. There may also be some free or cheaper pdf applications that can do as clean a job as Acrobat on pdf-to-rtf/html conversions. I already had Acrobat but never realized it could be so helpful in ebook conversions.

--Pat
Old 08-09-2010, 05:33 PM   #25
vastav
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
Quote:
Originally Posted by PatNY View Post
I am getting the best results by far using Acrobat as an intermediate step.

Over the last week I converted a couple pdf books to epub format and the biggest problem was getting the paragraph breaks to end up right. I initially tried a straight Calibre conversion but paragraph breaks were all over the place and incorrect -- even after fiddling for quite some time with the line un-wrapping value.

Then I read this thread and the suggestions by chaley and greenapple to use Acrobat were right on the money. I tried other suggestions such as Mobipocket Creator and the pdf2epub.com converter but both resulted in body text where paragraphs all ran together in one long block!

With Acrobat, converting either to RTF or HTML gave me an almost perfect result with the body text. I convert a pdf both ways in Acrobat, then import both rtf and html into Calibre and see which conversion to epub gives the best result in the body text. In one instance it was RTF and in the other it was HTML.

After deciding which gave the best base conversion (RTF or HTML) I then imported the file into MS Word to designate chapter headings and generate a TOC. (I find it easier to do in Word than in Sigil.) Then I import into Calibre, convert to ePub, and do last minute tidying up in Sigil. Sounds like a long process, and it is, but it's much less labor intensive and problematic than trying to clean up the bad paragraph breaks left by other conversion methods.

I realize not all have or can afford Acrobat, but if you look on eBay you can sometimes find older versions on sale for a good price. There may also be some free or cheaper pdf applications that can do as clean a job as Acrobat on pdf-to-rtf/html conversions. I already had Acrobat but never realized it could be so helpful in ebook conversions.

--Pat
I hope you tried the solution at http://www.pdf2epub.com and not the similar-sounding offering by dnaml. The ePub output will have the same paragraph breaks as those you find in the RTF or HTML export from Acrobat. Here's why: the conversion plugins (for formats such as RTF, HTML, XML, Plain Text) that come packaged with Acrobat use the tags in the PDF to drive the conversion process. If the PDF is not tagged, the first step in the conversion process is to generate tags using a tag recognition technology that comes with Acrobat. Once the tags are generated, a piece of content marked as a paragraph will be exported as a paragraph by all conversion filters, including the ePub plugin that I supply.

At its origin Tagged PDF was primarily influenced by HTML 4.01 and CSS1.0 specifications. The Tagged PDF spec has some omissions as well as additions compared with the other two standards. I am not sure about the current state of RTF but the RTF 1.6 specification (which is exported by Acrobat 7) had some differences with Tagged PDF's styling attributes. That is why I mentioned that when you go from PDF > RTF > ePub, you will likely encounter some loss, depending on how your PDF is constructed.

For the TOC, if you use the plugin I supply, all bookmarks in PDF automatically get converted to TOC in ePub. If you have a PDF which is tagged by the authoring application, you can simply create the bookmarks in Acrobat by choosing "New bookmarks from Structure" from the top drop-down available in the bookmarks tab in Acrobat. If you have a PDF which is not tagged (you can check by opening View > Navigation Panels > Tags), you should create the bookmarks manually in Acrobat before running the conversion filter for HTML/ RTF/ ePub to ensure that bookmarks get exported in a valid manner in the exported file.

If you have Acrobat on your system, I would suggest using the ePub plugin available on my site rather than the web-based solution. The help documentation provides details on using the plugin. If you like the RTF/HTML export from Acrobat, there is a good chance that you will also like the ePub export. I will be happy to help resolve any issues that you may find.
Old 08-09-2010, 10:54 PM   #26
Pushka
Wizard
Posts: 1,119
Karma: 1019140
Join Date: Oct 2009
Location: Australia
Device: kindle, Ipad, Iphone, Nexus and PPW
Quote:
Originally Posted by vastav View Post
If you are looking to do option #2, pdf2epub.com solution is a single step version of that (and better since you avoid the intermediate lossy conversion to RTF). The solution uses tags in PDF to drive the conversion process using similar flows as those used by RTF and HTML converters built into Acrobat.
So, I downloaded a 30-day trial of Adobe Acrobat (my, that is a huge file) and am using a trial version of pdf2epub.com

Will let you know the results in a bit.....

Wow - this is the best. Used the trial version of Acrobat, saved as epub using vastav's file, loaded on to calibre for a final conversion to mobi. And it looks amazing. Kudos to the two of you!

Last edited by Pushka; 08-09-2010 at 11:41 PM.
Old 08-10-2010, 08:29 AM   #27
Pushka
Wizard
Posts: 1,119
Karma: 1019140
Join Date: Oct 2009
Location: Australia
Device: kindle, Ipad, Iphone, Nexus and PPW
Ok, an update. I am really pleased with how the PDF looks, but not being good at reading instructions, I would like to remove page numbers that were in the PDF. I tagged them and used the tools to tag them as background, but some still appear. Any thoughts? I can live with them, but now having had the experience of great PDF to mobi conversion, now I am looking for perfection
Old 08-10-2010, 09:00 AM   #28
anthony_barker
Member
Posts: 15
Karma: 10
Join Date: Jun 2010
Device: PB360, nokia n900
Quote:
Originally Posted by Pushka View Post
... I would like to remove page numbers that were in the PDF. I tagged them and used the tools to tag them as background, but some still appear.
PDF editors such as NitroPDF, Acrobat and Foxit allow you to crop the pages and remove the page numbers.

Freeware alternatives include pdfcropper, sopdf, briss, etc.

Question - which generates smaller epubs? Acrobat -> RTF or Acrobat -> HTML...

Last edited by anthony_barker; 08-10-2010 at 09:42 AM.
Old 08-10-2010, 12:19 PM   #29
vastav
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
Quote:
Originally Posted by Pushka View Post
Ok, an update. I am really pleased with how the PDF looks, but not being good at reading instructions, I would like to remove page numbers that were in the PDF. I tagged them and used the tools to tag them as background, but some still appear. Any thoughts? I can live with them, but now having had the experience of great PDF to mobi conversion, now I am looking for perfection
Thanks for your kind words. If you were using the ePub tool, one thing I can think of is that the bounding rect wasn't big enough to fully encompass the page number, so the plugin did not exclude it from conversion. I'd be happy to look at your PDF and give you an exact suggestion. You can email it to support at pdf2epub dot com, or upload it to this thread if it is public domain.
Old 08-10-2010, 01:35 PM   #30
PatNY
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
Quote:
Originally Posted by vastav View Post
I hope you tried the solution at http://www.pdf2epub.com and not the similar sounding offering by dnaml. The ePub output will have same paragraph breaks as those you find in the RTF or HTML export from Acrobat.
vastav, I originally did use that website. I tried it again last evening and it still resulted in a file with the paragraphs all run together. OK, so then I downloaded and installed the plugin into Acrobat 9 and the issue with the run-in paragraphs is still there.

The paragraphs in this pdf may not be formatted in a standard way, but when I do an intermediate conversion to RTF or HTML in Acrobat, they are all picked up correctly!

You can see for yourself as I am going to send you the ebook file by email so you can investigate what is going on with your methods.


Quote:
For the TOC, if you use the plugin I supply, all bookmarks in PDF automatically get converted to TOC in ePub. If you have a PDF which is tagged by the authoring application, you can simply create the bookmarks in Acrobat by choosing "New bookmarks from Structure" from the top drop-down available in the bookmarks tab in Acrobat. If you have a PDF which is not tagged (you can check by opening View > Navigation Panels > Tags), you should create the bookmarks manually in Acrobat before running the conversion filter for HTML/ RTF/ ePub to ensure that bookmarks get exported in a valid manner in the exported file.
Thanks for this very useful information. I did not know about these Acrobat features and I used them on another book. It made a difference in being able to generate a metadata TOC. I don't think Calibre can read bookmarks in a pdf this way and generate the metadata TOC.

pdf2epub seems to do a very nice job indeed on most conversions where other tools fail, but it's still not there yet if it can't correctly break the paragraphs on all files. I will still keep on using/testing it for other pdf ebooks, however. In the meantime, let me know when you get the ebook and find out what was the hitch.

Also, does anyone know of a good automated way to insert a blank line between paragraphs in the body text of a pdf ebook? I couldn't figure out how to do it in Acrobat, except manually of course. There is no global search/replace feature in it.

I was figuring if I could first insert a space between paragraphs in the problematic document, then when converting via pdf2epub it wouldn't run the paragraphs all together.

--Pat
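One possible way to approach the blank-line idea outside Acrobat is to work on text extracted from the PDF rather than on the PDF itself. The Python sketch below is only a heuristic under stated assumptions: it takes a list of text lines, treats a line as the end of a paragraph when it is clearly shorter than the typical line and ends with sentence punctuation, and inserts an empty line after it. The function name mark_paragraphs and the short_ratio parameter are invented for this example, and it will misjudge headings, dialogue, and short final lines that happen to be long.

```python
from statistics import median


def mark_paragraphs(lines, short_ratio=0.8):
    """Insert an empty string after each line that looks like a paragraph end.

    Heuristic (an assumption, not a rule): a paragraph's last line is
    shorter than short_ratio * the median line length and ends with
    sentence punctuation.
    """
    lengths = [len(line.strip()) for line in lines if line.strip()]
    if not lengths:
        return []
    typical = median(lengths)
    out = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # drop existing blank lines; breaks are re-added below
        out.append(text)
        if len(text) < short_ratio * typical and text.endswith((".", "!", "?", '"')):
            out.append("")
    if out and out[-1] == "":
        out.pop()  # no trailing blank line
    return out
```

The marked-up text would then need to go through a normal text or HTML conversion, since this does not write anything back into the PDF.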

Last edited by PatNY; 08-10-2010 at 01:42 PM.
All times are GMT -4.