04-16-2010, 11:54 PM | #16 | |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
|
Quote:
GIGO: An acronym standing for "Garbage In, Garbage Out". It is used as a snide comment implying that the starting data the computer was given to work with was not appropriate, just plain wrong, or entered by a moron. Regardless, there was no way in Blazes it was going to produce anything like a correct answer. Here, it applies to poorly constructed, but completely valid, ebooks. We've all seen its cousin, the poorly constructed web page that can be viewed perfectly in one browser but is a complete mess in any other. |
|
05-17-2010, 10:26 PM | #17 |
Zealot
Posts: 109
Karma: 556
Join Date: Nov 2009
Location: SaiGon VietNam
Device: PRS T1, Kobo Forma 8G, Kobo Libra H2O
|
Try Solid Converter. It's cheaper than Adobe but still does really nice work!
|
05-19-2010, 08:56 PM | #18 |
Connoisseur
Posts: 98
Karma: 122982
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
|
easy rider,
If you are handy with the command line, you might try my pdfreflow utility that is described in the PDF forum. It takes the XML output of pdftohtml -xml and reflows it to HTML, which you can then use in Calibre as input for epub conversion. |
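For anyone who wants to script the steps around pdfreflow, here is a minimal sketch that only builds the two surrounding command lines and runs nothing. It assumes poppler's pdftohtml and Calibre's ebook-convert are installed and on the PATH. The function name and file names are made up for illustration, and the pdfreflow call itself is left out because its command-line syntax is not given in this thread.

```python
import shlex


def build_commands(pdf_path, xml_base, html_path, epub_path):
    """Return the extraction and conversion commands as argument lists.

    Nothing is executed here; pass each list to subprocess.run() if wanted.
    """
    # Step 1: pdftohtml with -xml writes an XML description of the PDF text.
    # The output argument is a base name; the tool is expected to add ".xml".
    extract = ["pdftohtml", "-xml", pdf_path, xml_base]
    # Step 2 (pdfreflow, XML -> HTML) is intentionally omitted: see the note above.
    # Step 3: ebook-convert infers input and output formats from the extensions.
    convert = ["ebook-convert", html_path, epub_path]
    return extract, convert


if __name__ == "__main__":
    for cmd in build_commands("book.pdf", "book", "book.html", "book.epub"):
        print(shlex.join(cmd))
```

Keeping the commands as lists rather than one shell string avoids quoting problems with file names that contain spaces.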
08-04-2010, 11:08 PM | #19 |
Enthusiast
Posts: 47
Karma: 120
Join Date: Jun 2010
Device: Kobo
|
You, sir, are a genius. This works beautifully and avoids all the mucking about in Sigil fixing stuff that Calibre puts in. Not only that, but it creates dramatically smaller epubs. For newbie Mobipocket users, just click on "import PDF", choose the file, click Import, then click on Build. Add the book with calibre, and then do a mass convert to epub. With most of my scummy PDFs all I needed to do then was tidy up the title and everything was lovely! You've saved me hours...
|
08-06-2010, 04:14 AM | #20 |
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
|
I would like to gently point out another option that I have been involved with for converting PDFs to ePub. The primary solution is an Acrobat plugin that uses a conversion algorithm similar to the one behind Acrobat's PDF to RTF/HTML export options.
The solution also has a free web-based option available at http://www.pdf2epub.com/trial . While the software is still a work in progress, it is simple to use, and most of the font-level formatting is retained in the converted ePub. I would love to hear feedback from the community. |
08-08-2010, 12:00 PM | #21 |
Member
Posts: 15
Karma: 10
Join Date: Jun 2010
Device: PB360, nokia n900
|
This is a very useful series of posts.
So it looks like the _best_ options for PDFs are:
1) Mobipocket/Calibre: PDF -> html -> PRC -> epub
https://wiki.mobileread.com/wiki/MobiPocket_Creator
https://wiki.mobileread.com/wiki/Calibre
Does this work better than simply opening the HTML directly in Sigil, and/or cleaning the html prior to Sigil?
2) Acrobat Pro/Calibre: PDF -> RTF -> epub
3) If OCR is required: ABBYY -> html -> Sigil? Or is ABBYY -> RTF -> Calibre better?
Is epub the best end format? I've been reading stuff in html on my pocketbook 360, as I find using the browser better on my cellphone (nokia n900) than fbreader.
Also, it seems the best bet for textbook-style books is just to leave them in pdf, crop them, or OCR them (follow #3)?
Also, do any of these methods handle math symbols? |
08-08-2010, 09:49 PM | #22 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
PDF->html
And then back to Calibre for this part: html->epub
Then I use Sigil for any fine tuning. I don't think the PRC step helps since the first thing calibre does during conversion is change the file back to html. My main goal is avoiding PDF as a source document whenever possible.
Last edited by DoctorOhh; 08-09-2010 at 05:03 AM. |
|
08-09-2010, 03:43 AM | #23 |
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
|
If you are looking to do option #2, the pdf2epub.com solution is a single-step version of that (and better, since you avoid the intermediate lossy conversion to RTF). The solution uses the tags in the PDF to drive the conversion process, using flows similar to those of the RTF and HTML converters built into Acrobat.
|
08-09-2010, 11:34 AM | #24 | |
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
Quote:
Over the last week I converted a couple pdf books to epub format and the biggest problem was getting the paragraph breaks to end up right. I initially tried a straight Calibre conversion but paragraph breaks were all over the place and incorrect -- even after fiddling for quite some time with the line un-wrapping value. Then I read this thread and the suggestions by chaley and greenapple to use Acrobat were right on the money. I tried other suggestions such as Mobipocket Creator and the pdf2epub.com converter but both resulted in body text where paragraphs all ran together in one long block!
With Acrobat, converting either to RTF or HTML gave me an almost perfect result with the body text. I convert a pdf both ways in Acrobat, then import both rtf and html into Calibre and see which conversion to epub gives the best result in the body text. In one instance it was RTF and in the other it was HTML.
After deciding which gave the best base conversion (RTF or HTML) I then imported the file into MS Word to designate chapter headings and generate a TOC. (I find it easier to do in Word than in Sigil.) Then I import into Calibre, convert to ePub, and do last minute tidying up in Sigil.
Sounds like a long process, and it is, but it's much less labor intensive and problematic than trying to clean up the bad paragraph breaks left by other conversion methods.
I realize not all have or can afford Acrobat, but if you look on eBay you can sometimes find older versions on sale for a good price. There may also be some free or cheaper pdf applications that can do as clean a job as Acrobat on pdf-to-rtf/html conversions. I already had Acrobat but never realized it could be so helpful in ebook conversions.
--Pat |
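To illustrate why a line un-wrapping value is so fiddly, here is a simplified sketch of the general idea: treat any line much shorter than the typical line as the end of a paragraph, and join everything else. This is an assumption-laden illustration, not Calibre's actual implementation; the function name, the use of the median, and the default factor are all hypothetical.

```python
from statistics import median


def unwrap_lines(lines, factor=0.5):
    """Join hard-wrapped lines into paragraphs.

    A non-blank line shorter than factor * (median line length) is assumed
    to be the last line of a paragraph. Blank lines always end a paragraph.
    """
    stripped = [ln.strip() for ln in lines]
    lengths = [len(ln) for ln in stripped if ln]
    if not lengths:
        return []
    threshold = factor * median(lengths)

    paragraphs, current = [], []
    for ln in stripped:
        if not ln:
            # Blank line: close whatever paragraph is open.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        if len(ln) < threshold:
            # Short line: assume the paragraph ends here.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

With a heuristic like this, a factor that is too low merges paragraphs whose last line happens to be long, while a factor that is too high splits paragraphs in the middle, which matches the "all over the place" breaks described above.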
|
08-09-2010, 05:33 PM | #25 | |
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
|
Quote:
At its origin Tagged PDF was primarily influenced by the HTML 4.01 and CSS 1.0 specifications. The Tagged PDF spec has some omissions as well as additions compared with the other two standards. I am not sure about the current state of RTF, but the RTF 1.6 specification (which is exported by Acrobat 7) had some differences from Tagged PDF's styling attributes. That is why I mentioned that when you go from PDF > RTF > ePub, you will likely encounter some loss, depending on how your PDF is constructed.
For the TOC, if you use the plugin I supply, all bookmarks in the PDF automatically get converted to a TOC in the ePub. If you have a PDF which is tagged by the authoring application, you can simply create the bookmarks in Acrobat by choosing "New bookmarks from Structure" from the top drop-down available in the bookmarks tab in Acrobat. If you have a PDF which is not tagged (you can check by opening View > Navigation Panels > Tags), you should create the bookmarks manually in Acrobat before running the conversion filter for HTML/RTF/ePub to ensure that bookmarks get exported in a valid manner in the exported file.
If you have Acrobat on your system, I would suggest using the ePub plugin available on my site rather than the web-based solution. The help documentation provides details on using the plugin. If you like the RTF/HTML export from Acrobat, there is a good chance that you will also like the ePub export. I will be happy to help resolve any issues that you may find. |
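As a rough picture of what a bookmarks-to-TOC step produces, here is a minimal sketch that turns a flat list of bookmark titles and targets into an EPUB 2 style NCX navMap. It is not the plugin's code: the function name, the sample titles, and the file names are hypothetical, and real PDF bookmarks can be nested, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace used by NCX table-of-contents files in EPUB 2.
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def bookmarks_to_navmap(bookmarks):
    """Build an NCX <navMap> element from (title, href) pairs in reading order."""
    nav_map = ET.Element(f"{{{NCX_NS}}}navMap")
    for order, (title, href) in enumerate(bookmarks, start=1):
        # Each bookmark becomes one navPoint with a sequential playOrder.
        point = ET.SubElement(
            nav_map,
            f"{{{NCX_NS}}}navPoint",
            {"id": f"nav{order}", "playOrder": str(order)},
        )
        label = ET.SubElement(point, f"{{{NCX_NS}}}navLabel")
        ET.SubElement(label, f"{{{NCX_NS}}}text").text = title
        ET.SubElement(point, f"{{{NCX_NS}}}content", {"src": href})
    return nav_map


if __name__ == "__main__":
    nav = bookmarks_to_navmap([("Chapter 1", "ch1.html"), ("Chapter 2", "ch2.html")])
    print(ET.tostring(nav, encoding="unicode"))
```

The point of the sketch is only that each bookmark needs a title and a valid target, which is why bookmarks should exist in the PDF before the conversion is run.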
|
08-09-2010, 10:54 PM | #26 | |
Wizard
Posts: 1,119
Karma: 1019140
Join Date: Oct 2009
Location: Australia
Device: kindle, Ipad, Iphone, Nexus and PPW
|
Quote:
Will let you know the results in a bit..... Wow - this is the best. Used the trial version of Acrobat, saved as epub using vastav's file, loaded on to calibre for a final conversion to mobi. And it looks amazing. Kudos to the two of you! Last edited by Pushka; 08-09-2010 at 11:41 PM. |
|
08-10-2010, 08:29 AM | #27 |
Wizard
Posts: 1,119
Karma: 1019140
Join Date: Oct 2009
Location: Australia
Device: kindle, Ipad, Iphone, Nexus and PPW
|
Ok, an update. I am really pleased with how the PDF looks, but not being good at reading instructions, I would like to remove page numbers that were in the PDF. I selected them and used the tools to tag them as background, but some still appear. Any thoughts? I can live with them, but having now experienced a great PDF to mobi conversion, I am looking for perfection.
|
08-10-2010, 09:00 AM | #28 | |
Member
Posts: 15
Karma: 10
Join Date: Jun 2010
Device: PB360, nokia n900
|
Quote:
Other freeware includes pdfcropper, sopdf, briss, etc.
Question - which generates smaller epubs? Acrobat -> RTF or Acrobat -> HTML...
Last edited by anthony_barker; 08-10-2010 at 09:42 AM. |
|
08-10-2010, 12:19 PM | #29 | |
Member
Posts: 18
Karma: 38
Join Date: Sep 2009
Location: San Francisco Bay Area
Device: none
|
Quote:
|
|
08-10-2010, 01:35 PM | #30 | ||
Zennist
Posts: 1,022
Karma: 47809468
Join Date: Jul 2010
Device: iPod Touch, Sony PRS-350, Nook HD+ & HD
|
Quote:
The paragraphs in this pdf may not be formatted in a standard way, but when I do an intermediate conversion to RTF or HTML in Acrobat, they are all picked up correctly! You can see for yourself as I am going to send you the ebook file by email so you can investigate what is going on with your methods.
Quote:
pdf2epub seems to do a very nice job indeed on most conversions where other tools fail, but it's still not there yet if it can't correctly break the paragraphs on all files. I will still keep on using/testing it for other pdf ebooks, however. In the meantime, let me know when you get the ebook and find out what was the hitch.
Also, does anyone know of a good automated way to insert a blank line between paragraphs in the body text of a pdf ebook? I couldn't figure out how to do it in Acrobat, except manually of course. There is no global search/replace feature in it. I was figuring if I could first insert a space between paragraphs in the problematic document, then when converting via pdf2epub it wouldn't run the paragraphs all together.
--Pat
Last edited by PatNY; 08-10-2010 at 01:42 PM. |
||
|