PDF to MOBI command line issues
This is my first post on this forum. I am using Calibre on a client's web site to convert a PDF document to MOBI format on the fly. I have been able to do the conversion manually through the library with reasonable results. But I have NOT been able to do so with ebook-convert.exe without running into some formatting issues.
The only conversion defaults I've changed in the library are on the Conversion Input Options for PDF Input. Line Un-Wrapping factor is changed to 0.10 and No Images is checked.
The command line equivalents are:
--no-images
and
--html-unwrap-factor=0.10
Are these command line options correct?
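To make the question concrete, here is a minimal sketch of how such a call could be assembled, written in Python rather than the C# the site actually uses. The helper name build_convert_command and the file names are hypothetical placeholders. The two conversion options are simply the ones quoted above, so whether they are the right names is exactly what I am asking. The --debug-pipeline option is the debug option of ebook-convert.

```python
def build_convert_command(pdf_path, mobi_path, unwrap_factor=0.10,
                          no_images=True, debug_dir=None,
                          executable="ebook-convert.exe"):
    """Return the argument list for one ebook-convert run (hypothetical helper)."""
    cmd = [executable, pdf_path, mobi_path]
    if no_images:
        cmd.append("--no-images")
    # Option name copied from the post; it may not be the correct one
    # for the PDF input plug-in.
    cmd.append("--html-unwrap-factor=%.2f" % unwrap_factor)
    if debug_dir:
        # Writes the intermediate stages (input, parsed, ...) to debug_dir.
        cmd.extend(["--debug-pipeline", debug_dir])
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(build_convert_command("input.pdf", "output.mobi")))
```

The list returned here could be handed to subprocess.call to actually run the conversion. Passing the arguments as a list avoids quoting problems with paths that contain spaces.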
The formatting issue is that the MOBI output has random paragraph breaks in the middle of some sentences. The web inspector in the viewer shows HTML paragraph breaks (<p></p> tags) in the middle of these sentences. I've run ebook-convert.exe with the debug option. The HTML in the debug "input" folder is OK, but the HTML in the debug "parsed" folder already contains the bad paragraph breaks.
I am new to Calibre, but it seems like the problem is somewhere in the input plug-in. Yet the problem does NOT occur when I convert within the library, so there must be a problem in my usage of ebook-convert.exe.
Also, I'd like to get the Calibre source code. I've downloaded Bazaar, but I am unable to retrieve the source code, as I have never used Bazaar before. I mainly code in C# ASP.NET, but hope to make sense of the Python code.
I also tried calling pdftohtml.exe directly. It works, except the encoding is off. I don't know how to use the -enc command line option to specify UTF-8 encoding, for example.
I am using Calibre version 0.7.45, which is almost the newest, on Windows XP Pro SP3.
Can someone please help me?