MobileRead Forums > E-Book Software > Calibre > Conversion
07-07-2012, 03:28 PM   #16
Perkin
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
Try saving the file from Word in an earlier Word format (Word 95?), then retry various conversions using that as the original:
- save as filtered html & convert
- save as rtf & convert
- load into OpenOffice/Atlantis save as ePub.
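The first two routes can be batched with calibre's command-line converter, `ebook-convert`, which takes an input file and an output file and infers the formats from their extensions. The Python sketch below only builds and prints the commands; the file names `book.htm` and `book.rtf` and the `out` folder are hypothetical, and `run_all` assumes calibre's tools are on your PATH.

```python
import shlex
import subprocess
from pathlib import Path


def build_conversion_commands(sources, output_dir="out"):
    """Build one `ebook-convert <input> <output>` command per exported file.

    Each EPUB is named after the format it came from, so book.htm and book.rtf
    give book-htm.epub and book-rtf.epub for side-by-side comparison.
    """
    commands = []
    for src in sources:
        src_path = Path(src)
        fmt = src_path.suffix.lstrip(".").lower()
        out_path = Path(output_dir) / f"{src_path.stem}-{fmt}.epub"
        commands.append(["ebook-convert", str(src_path), str(out_path)])
    return commands


def run_all(commands):
    """Run each conversion (requires calibre's ebook-convert on the PATH)."""
    for cmd in commands:
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_conversion_commands(["book.htm", "book.rtf"]):
        print(shlex.join(cmd))
```

Converting every export with the same settings makes it easier to see which intermediate format carries the least Word junk through to the EPUB.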

RTF does handle graphics; it may just be that some image formats aren't fully supported.

Saving the Word file in an earlier Word format may help reduce the extraneous junk that Word puts in, which is what's causing the glitches when converting.
(I don't use Word anymore, but I'm pretty sure you can specify which Word version the .doc file is saved as, so that people with older versions can load and edit it.)

A lot depends on what the graphics are and what format they are in.

Last edited by Perkin; 07-07-2012 at 03:31 PM.

07-07-2012, 08:54 PM   #17
JSWolf
Resident Curmudgeon
Posts: 79,758
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by arturox
A couple of problems with that.
1) I can manage plain old HTML, but I know 'nuffink' about CSS and at the moment don't have the wherewithal to learn it.

2) As a comprehensive editing tool, I find Sigil a problem.

Good thought nonetheless.

Ax
Unless you make the effort to learn XML/CSS, you may as well not bother with these DOC files, as they won't convert well without some cleaning up.

07-08-2012, 04:53 AM   #18
Sunlite
Addict
Posts: 206
Karma: 547516
Join Date: Mar 2008
Location: Berlin, Germany
Device: Kobo Clara, Kobo Aura, PRS-T1, PB602, CyBook Gen3
@arturox: The more conversion methods fail to produce a result you like, the more it sounds like your source just isn't that good. Most conversion programs work on the principle of 'garbage in, garbage out'. The cleaner the source, the higher the chance that the conversion result looks good.

Learning HTML/CSS is one way to make sure your source is clean. For ePub creation it is probably even the best way.

If you don't have the time at the moment to do it or need a fast solution, cleaning up the Word file might work. Get yourself the Smashwords Style Guide and rework your Word file according to their rules. The guide is even available in other languages if that helps you in any way.
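If you go the "save as filtered HTML" route instead, part of the garbage is Word-specific markup that can be stripped mechanically before you restyle the text with your own CSS. A rough Python sketch follows; the patterns (`MsoNormal`-style class names, `mso-*` style declarations, `<o:p>` tags) are typical of Word's HTML output but are an assumption, not a complete list, and regexes are no substitute for a proper HTML parser on messy files.

```python
import re

# Assumed patterns for common Word debris; not an exhaustive list.
MSO_CLASS = re.compile(r"""\s+class=(["']?)Mso\w*\1""")
MSO_DECL = re.compile(r"""mso-[\w-]+:[^;'"]*;?""")
EMPTY_STYLE = re.compile(r"""\s+style=(["'])\s*\1""")
OFFICE_TAGS = re.compile(r"</?o:p>")


def clean_word_html(html):
    """Strip common Word-only markup, leaving plain HTML to restyle with CSS."""
    html = OFFICE_TAGS.sub("", html)   # Office-namespace <o:p> tags
    html = MSO_DECL.sub("", html)      # mso-* declarations, e.g. inside style attributes
    html = EMPTY_STYLE.sub("", html)   # style attributes emptied by the previous step
    html = MSO_CLASS.sub("", html)     # MsoNormal, MsoTitle, ... class names
    return html


if __name__ == "__main__":
    sample = '<p class=MsoNormal style="mso-margin-top-alt:auto">Hello<o:p></o:p></p>'
    print(clean_word_html(sample))
```

Ordinary declarations such as `font-size` are left alone, so a stylesheet can still override them later.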

07-08-2012, 05:24 PM   #19
arturox
Enthusiast
Posts: 28
Karma: 107028
Join Date: Jul 2012
Device: Kobo
Thanks for that thought, I'll keep it in mind.
I'm a bit busy with my own stuff ATM, so her stuff will have to wait a bit.
Ax

07-09-2012, 12:34 PM   #20
jackie_w
Grand Sorcerer
Posts: 6,251
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Quote:
Originally Posted by JSWolf
Unless you make the effort to learn XML/CSS, you may as well not bother with these DOC files, as they won't convert well without some cleaning up.
I couldn't agree more. The two objectives 'I want to create beautiful epubs with complex layouts' and 'I don't want to learn HTML/CSS' are mutually exclusive. You either need to learn a bit or lower your epub expectations. You only need a small fraction of CSS to manipulate epubs, not ALL of it.

On the subject of Word and images, I also went through the pain of 'how to create usable HTML from Word' about 3 years ago. As far as images are concerned, in the end I decided not to add them directly to my Word doc. Instead I added a text placeholder (a text string that included the image name). Once I had saved as 'Web Page, Filtered', I could easily do a mass find/replace on the HTML file with a good text editor.
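The placeholder find/replace can also be scripted. A minimal Python sketch, assuming a hypothetical placeholder convention of `[[IMG:filename]]` typed into the Word doc and an `images/` folder in the finished book; both are illustrative, so adapt the pattern to whatever text string you actually use. If Word changes formatting partway through a placeholder it may break it up with extra markup, so keep each one as a single run of plain text.

```python
import re
from html import escape

# Hypothetical convention: [[IMG:name.ext]] marks where each image goes.
PLACEHOLDER = re.compile(r"\[\[IMG:([^\]\s]+)\]\]")


def replace_image_placeholders(html, image_dir="images"):
    """Replace every [[IMG:...]] placeholder with an <img> tag."""
    def to_img(match):
        src = escape(f"{image_dir}/{match.group(1)}", quote=True)
        return f'<img src="{src}" alt=""/>'
    return PLACEHOLDER.sub(to_img, html)


if __name__ == "__main__":
    sample = "<p class=MsoNormal>[[IMG:map.png]]</p>"
    print(replace_image_placeholders(sample))
```

Running it over the saved HTML swaps each placeholder in place and leaves the surrounding paragraph markup untouched.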

P.S. Another tip: never try to apply a drop cap using Word's own flavour of drop cap. The resulting HTML was horrible. Now, it may be my ancient version of Word that's at fault, but I suspect not.

07-10-2012, 02:55 PM   #21
arturox
Enthusiast
Posts: 28
Karma: 107028
Join Date: Jul 2012
Device: Kobo
The reason I don't want to waste my time learning CSS is not cussedness. I'm doing this stuff for the wife, but personally I have no interest in such matters, or even in e-readers; my interests are in other areas.

AX

07-10-2012, 04:20 PM   #22
JSWolf
Resident Curmudgeon
Posts: 79,758
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by arturox
The reason I don't want to waste my time learning CSS is not cussedness. I'm doing this stuff for the wife, but personally I have no interest in such matters, or even in e-readers; my interests are in other areas.

AX
Shall we tell your wife that you don't care to make her eBooks look nice?

But as I said, either you do it right, or don't bother, as your wife will not like the results.

07-12-2012, 03:57 AM   #23
arturox
Enthusiast
Posts: 28
Karma: 107028
Join Date: Jul 2012
Device: Kobo
Mmnnn!
In some respects you are correct; however, that's all done with now.

I was testing all this stuff out so that she could do it herself, as she's very computerate in some areas but not so good in others.
However, writing simple HTML, let alone CSS, is not in her computing experience at all, so she'd have to start from scratch. Totally from scratch.

Anyway, I've now worked out a system that's workable for her, particularly with the documents she has as her source. It uses standard apps, needs no manual HTML, and gives an end result that's 99 percent there, and she's happy with it.

So I guess I'm done...
Ax

07-18-2012, 05:40 AM   #24
rocketdocs
rocketdocs developer
Posts: 5
Karma: 10
Join Date: Jul 2012
Location: Ottawa, Canada
Device: iPad
I stumbled upon your thread today and, believe me, I feel your pain! I've been converting all kinds of documents for over 4 years now, and getting a Word doc to EPUB is just a painful process.

What we found works best is to first convert it to PDF, then convert the PDF to HTML and then to EPUB. Converting to PDF first gets rid of all that junk Word outputs and gives you a better baseline to work with.

We've actually developed a web-based application (www.rocketdocs.com) that's based on this very process and it works flawlessly. No HTML or CSS coding involved.

Hope that helps!

07-18-2012, 06:01 AM   #25
kiwidude
Calibre Plugins Developer
Posts: 4,729
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@rocketdocs - I am almost speechless. I'm struggling to respond to such an insanely flawed suggestion without dropping into expletives. The stickies, and the dozens of posts every week from users battling with the limitations of the PDF format and how it is the *worst* possible format to convert from, are all there for very good reasons. To recommend that users intentionally put PDF into their workflow on the way to something else is complete and utter madness. Is this April 1st or something?

Perhaps JSWolf can offer a more tactful response...

07-18-2012, 06:10 AM   #26
rocketdocs
rocketdocs developer
Posts: 5
Karma: 10
Join Date: Jul 2012
Location: Ottawa, Canada
Device: iPad
@kiwidude - I've read a few of the stickies and posts about the "limitations" of the PDF format, but they are only limitations as they pertain to calibre, not to PDF itself.

Don't get me wrong, PDF is a tricky beast to tame, that's for sure, but like I said, we've spent 4 years converting thousands of PDFs to strict HTML standards for the Government of Canada, so I'm definitely qualified to make that statement.

I saw posts on here saying that column layouts can't be converted, or to forget tables because they're impossible to extract. We've been doing these for years now with our software.

07-18-2012, 06:19 AM   #27
kovidgoyal
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@rocketdocs: Make your tool available and I'll provide 10 pdfs that it will make an absolute hash of. I'm amazed that you claim to have developed tools to convert PDF and yet appear to have zero understanding of the problems with PDF.

07-18-2012, 07:12 AM   #28
rocketdocs
rocketdocs developer
Posts: 5
Karma: 10
Join Date: Jul 2012
Location: Ottawa, Canada
Device: iPad
@kovidgoyal: sign up for our beta trial and we'll give you access to it when it becomes available.

We have an immense understanding of PDF. What exactly are the problems you're having so I can better comment on your pain points?

07-18-2012, 07:27 AM   #29
kiwidude
Calibre Plugins Developer
Posts: 4,729
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Quote:
Originally Posted by rocketdocs
@kiwidude - I've read a few of the stickies and posts about the "limitations" of the PDF format, but they are only limitations as they pertain to calibre, not to PDF itself.
I'm afraid your credibility is plummeting like a stone with an opening statement like that. If converting PDFs was an issue limited to calibre's implementation the sticky would simply read "use XYZ tool for converting PDFs instead".

Why does it not say that? Because no one who has ever come across these forums has seen a tool on the market that can convert them consistently and reliably. Not even commercial tools from Adobe, the people who surely know more about PDF than anyone else. This is not a calibre issue, it is a "PDF is a print format" issue.

Now, to be fair, I haven't used your particular tool, nor seen the before/after examples it produces. And perhaps there is something specific about going from Word to PDF that lets you bypass some of the traditional PDF conversion issues to do with fonts, Unicode, ligatures, layout, headers and footers, OCR, scene breaks, tables, lack of paragraph structure and whatever else I am forgetting. But until I see such a thing for myself, I stand by my comment that your typical user should not see PDF as a saviour for a general conversion workflow.

07-18-2012, 07:31 AM   #30
kovidgoyal
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It doesn't matter how good your "understanding" of PDF is. The difficulty in converting PDF does not come from the obscurity of the format. It comes from the nature of the format. I have discoursed at length on that elsewhere, so I am not going to repeat myself in detail here. But suffice it to say that PDF is not a semantic format. A PDF file (typically) contains instructions that look like "draw character #1234 from font xyz at position (x, y) on the page". The PDF file (unless it is tagged) has no semantic info at all. It has no concept of semantic units like words, sentences, paragraphs, tables, lists, etc. That means that an attempt to convert it to HTML can follow one of two paths:

1) Use non-semantic HTML (i.e. just replicate the PDF drawing instructions with some form of absolutely positioned HTML)

2) Use statistical analysis to re-organize the text from the PDF into semantic units.
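To make the "draw character at position (x, y)" point concrete, here is a hand-written, simplified fragment of a PDF page content stream and a toy Python reader for it. The fragment is illustrative rather than taken from a real file, and the reader only understands the `Td` (move text position) and `Tj` (show string) operators; real PDFs also use `TJ`, `Tm`, font encodings and much more. Note that one sentence is split across two draw operations and nothing marks where a paragraph begins or ends.

```python
import re

# Illustrative, hand-written content-stream fragment (not from a real file).
# BT/ET bracket a text object, Tf selects font /F1 at 12 pt, Td moves the text
# position relative to the start of the current line, Tj paints a string.
CONTENT_STREAM = """
BT
/F1 12 Tf
72 720 Td
(It comes from the nature of) Tj
0 -14 Td
(the format.) Tj
ET
"""

# Toy tokenizer: only understands "(string) Tj" and "dx dy Td".
TOKEN = re.compile(
    r"\((?P<text>[^)]*)\)\s*Tj"
    r"|(?P<dx>-?\d+(?:\.\d+)?)\s+(?P<dy>-?\d+(?:\.\d+)?)\s+Td"
)


def draw_instructions(stream):
    """Return (x, y, text) for every Tj, tracking the position through Td moves."""
    x = y = 0.0
    ops = []
    for m in TOKEN.finditer(stream):
        if m.group("text") is not None:
            ops.append((x, y, m.group("text")))
        else:
            x += float(m.group("dx"))
            y += float(m.group("dy"))
    return ops


if __name__ == "__main__":
    for x, y, text in draw_instructions(CONTENT_STREAM):
        print(f"draw {text!r} at ({x}, {y})")
```

All a converter gets back is a list of strings with coordinates; deciding that the two pieces form one sentence in one paragraph is left entirely to the converter.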

As far as (1) is concerned, there are already dozens of perfectly good tools that do this. However, the resulting HTML is not reflowable and is useless on small-screened devices.
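Path (1) can be sketched in a few lines of Python. The input runs are hypothetical (x, y, text) tuples in PDF points, with y measured up from the bottom of a US Letter page (792 pt tall); the sketch also ignores the difference between a PDF baseline and a CSS box top. The output reproduces the page layout and, for exactly that reason, cannot reflow on a small screen.

```python
from html import escape

# Hypothetical text runs: (x, y, text) in PDF points, y up from the page bottom.
RUNS = [
    (72, 720, "The difficulty in converting PDF does not come"),
    (72, 706, "from the obscurity of the format."),
]


def to_positioned_html(runs, page_height=792):
    """Approach (1): replicate each draw instruction as an absolutely positioned div."""
    divs = []
    for x, y, text in runs:
        top = page_height - y  # HTML measures top-down, PDF bottom-up
        divs.append(
            f'<div style="position:absolute; left:{x}pt; top:{top}pt">{escape(text)}</div>'
        )
    return '<div style="position:relative">\n' + "\n".join(divs) + "\n</div>"


if __name__ == "__main__":
    print(to_positioned_html(RUNS))
```

Every line break is frozen where the PDF put it, so shrinking the viewport only crops or scrolls the text instead of rewrapping it.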

(2) suffers from the problems of statistical analysis. It can never be absolutely accurate, so it will misidentify text sections, sentences, words, headers, footers and so on in some percentage of cases. Some tools that follow this approach and try to work with arbitrary PDFs are: pdftoxml (used by Amazon), pdftohtml from poppler (used by calibre), PDFMiner, and a couple of others. None of them work well on any significant subset of PDFs.
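Path (2) in its crudest form is a threshold on vertical gaps: lines at normal line spacing are joined, and a bigger gap starts a new paragraph. This Python sketch uses hypothetical (x, y, text) tuples listed top to bottom; the `line_gap` and `tolerance` values are guesses, and real tools use far richer statistics. The usage example shows the kind of misidentification described above: a running header drawn at normal line spacing is silently glued onto the first paragraph.

```python
def group_into_paragraphs(runs, line_gap=14.0, tolerance=2.0):
    """Join lines into paragraphs using a fixed vertical-gap threshold.

    runs: (x, y, text) tuples, top of page first, y in PDF points.
    A gap close to line_gap continues the paragraph; anything else starts a
    new one. Both thresholds are illustrative guesses.
    """
    paragraphs = []
    current = []
    prev_y = None
    for _x, y, text in runs:
        if prev_y is not None and abs((prev_y - y) - line_gap) > tolerance:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    page = [
        (72, 734, "Chapter 3"),  # running header at the same spacing as body text
        (72, 720, "It comes from the"),
        (72, 706, "nature of the format."),
        (72, 680, "A PDF file has no"),
        (72, 666, "semantic info at all."),
    ]
    for para in group_into_paragraphs(page):
        print(para)
```

Any fixed threshold that fixes this page will misfire on another one with different leading, headers or footnotes.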

If you claim that your tool can convert arbitrary PDF into HTML losslessly, then it is, I suspect, an implementation of (1) and as such not very interesting.

All times are GMT -4.