03-02-2011, 11:58 AM   #1
skallal
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2011
PDF to MOBI command line issues

This is my first post on this forum. I am using Calibre on a client's web site to convert a PDF document to MOBI format on the fly. I have been able to do the conversion manually through the library with reasonable results, but I have NOT been able to do so with ebook-convert.exe without running into some interesting formatting issues.

The only conversion defaults I've changed in the library are under the Conversion Input Options for PDF Input: Line Un-Wrapping Factor is set to 0.10 and No Images is checked.

The command line equivalents are:
--no-images
and
--html-unwrap-factor=0.10

Are these command line options correct?
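For reference, here is a rough sketch of how such a call could be assembled and run from a script. Python is used only for illustration; the function names and file paths are hypothetical, and the two flags are simply the ones quoted above, so whether they really match the library settings is still the open question. The optional `--debug-pipeline` argument is the debug option that writes each conversion stage to a folder.

```python
import subprocess


def build_convert_command(pdf_path, mobi_path, debug_dir=None,
                          exe="ebook-convert"):
    """Assemble an ebook-convert argument list (hypothetical helper)."""
    cmd = [
        exe,
        pdf_path,
        mobi_path,
        "--no-images",                # flag quoted in the post ("No Images" checked)
        "--html-unwrap-factor=0.10",  # flag quoted in the post; correctness unconfirmed
    ]
    if debug_dir is not None:
        # Dump the intermediate stages of the conversion into debug_dir.
        cmd += ["--debug-pipeline", debug_dir]
    return cmd


def convert(pdf_path, mobi_path, debug_dir=None):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    cmd = build_convert_command(pdf_path, mobi_path, debug_dir)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one concatenated string, avoids quoting problems when a file path contains spaces.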

The formatting issue is that the MOBI output has random paragraph breaks in the middle of some sentences. Looking at the output with the web inspector in the viewer shows HTML paragraph breaks (<p></p> tags) in the middle of these sentences. I've run ebook-convert.exe with the debug option: the HTML in the debug "input" folder is OK, but the HTML in the debug "parsed" folder is where the bad paragraph breaks first appear.
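One way to narrow down where the extra breaks appear is to count the `<p>` tags in each debug stage folder and compare the totals. The sketch below is only an illustration of that idea: the helper names are hypothetical, and it assumes the stage folders ("input", "parsed") contain ordinary HTML files.

```python
from html.parser import HTMLParser
from pathlib import Path


class ParagraphCounter(HTMLParser):
    """Counts opening <p> tags in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == "p":
            self.count += 1


def count_paragraphs(stage_dir):
    """Total number of <p> tags in all HTML files under stage_dir."""
    total = 0
    for path in sorted(Path(stage_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm", ".xhtml"}:
            counter = ParagraphCounter()
            counter.feed(path.read_text(encoding="utf-8", errors="replace"))
            total += counter.count
    return total
```

A jump in the count between the "input" and "parsed" folders would be consistent with the breaks being introduced at that stage.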

I am new to Calibre, but it seems like the problem is somewhere in the input plug-in. Yet the problem does NOT occur when converting within the library, so presumably the problem is in how I am using ebook-convert.exe.

I'd also like to get the Calibre source code. I've downloaded Bazaar, but I am unable to retrieve the source with it; I have never used Bazaar before. I mainly code in C# ASP.NET, but I hope to make sense of the Python code.

I also tried calling pdftohtml.exe directly. It works, except that the encoding is off. I don't know how to use the -enc command line option to specify, for example, UTF-8 encoding.

I am using version 0.7.45, which is almost the newest. And I am running Windows XP Pro SP3.

Can someone please help me?