Quote:
Originally Posted by zoran
I spent some time trying to convert pdf to epub. Calibre is a
fine utility for this, but it is not a panacea. People have started
making documents in pdf or saving them as pdf by default. I cannot
say why that is. The reason I mention this is that mostly they do
not press <enter> in their work. So there is no hard new line, unless
I'm totally wrong. If I'm right, this kind of data is converted
without a hiccup. I have some books that read just like any other.
There is also another kind of pdf document. They seem to
have hard new lines, and the output looks like this:
This is a new line that should show
you how I not
intent to format epub book.
Whatever I try, nothing works. Converting this way or another way,
it stays the same. Other users have the same problem. If someone
could point to something better, please do.
|
Calibre does have a Line Un-Wrapping Factor that might help you. From the Calibre manual:
Quote:
PDF documents are one of the worst formats to convert from. They are a fixed page size and text placement format. Meaning, it is very difficult to determine where one paragraph ends and another begins. calibre will try to unwrap paragraphs using a configurable, Line Un-Wrapping Factor. This is a scale used to determine the length at which a line should be unwrapped. Valid values are a decimal between 0 and 1. The default is 0.5, this is the median line length. Lower this value to include more text in the unwrapping. Increase to include less. You can adjust this value in the conversion settings under PDF Input.
Also, they often have headers and footers as part of the document that will become included with the text. Use the options to remove headers and footers to mitigate this issue. If the headers and footers are not removed from the text it can throw off the paragraph unwrapping.
Some limitations of PDF input: complex, multi-column, and image-based documents are not supported. Extraction of vector images and tables from within the document is also not supported.
|
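To get a feel for what a length-based unwrapping factor does, here is a rough Python sketch. It is not calibre's actual code, just my own illustration of the idea described in the manual: pick a threshold length from the sorted line lengths (0.5 picks the median), join every line at least that long with the line after it, and treat shorter lines as paragraph ends. The function name `unwrap_lines` and the exact threshold rule are assumptions made for the example.

```python
def unwrap_lines(text: str, factor: float = 0.5) -> str:
    """Join lines that look soft-wrapped. Illustrative only, not calibre's algorithm."""
    if not 0 <= factor <= 1:
        raise ValueError("factor must be between 0 and 1")

    lines = [ln.strip() for ln in text.splitlines()]
    lengths = sorted(len(ln) for ln in lines if ln)
    if not lengths:
        return text

    # Threshold = line length found at position `factor` in the sorted
    # lengths (0.5 is the median). This rule is an assumption of the sketch.
    index = min(int(factor * len(lengths)), len(lengths) - 1)
    threshold = lengths[index]

    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # A blank line always ends the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        if len(ln) < threshold:
            # A short line is taken as the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "This is a new line that should show\n"
        "you how I not\n"
        "intent to format epub book."
    )
    # A lower factor lowers the threshold, so more lines get joined.
    print(unwrap_lines(sample, factor=0.5))
    print(unwrap_lines(sample, factor=0.0))
```

On the three-line sample from the quoted post, a factor of 0.5 still leaves a break after the short middle line, while a factor of 0 joins everything into one paragraph. That matches the manual's advice to lower the value when too many line breaks survive. In calibre itself you would change the value in the conversion settings under PDF Input, or pass `--unwrap-factor` to `ebook-convert` on the command line.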