Old 10-13-2009, 10:34 PM   #1
wamblej
Member
Posts: 10
Karma: 10
Join Date: Sep 2009
Device: Kindle 2
PDF Conversion

For some reason I cannot seem to get a successful PDF -> MOBI conversion (or any other format, for that matter). I either get a clump of text (with no spacing or formatting at all) or, more commonly, a bunch of line breaks in the middle of paragraphs.
Ex: This is a sentence and then halfway through the sentence or the para

graph it would have a space that is weird. And it seems like the majority of the formatting is wrong.

If you have any directions or methods that have worked for you, even if they require multiple conversions, that would be very helpful. Thanks!
Old 10-14-2009, 05:59 AM   #2
mrmikel
Book Twiddler
Posts: 1,909
Karma: 1405001
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Thems the breaks

When I do a conversion from HTML to LRF, these kinds of weird breaks are signs of non-breaking spaces. A non-breaking space binds the text before it to the text after it, so the line doesn't break normally.

I don't know if this is what is happening to you...just a thought.

You might try converting to EPUB; then you could look at the text with Sigil and see what is there.
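If non-breaking spaces do turn out to be the culprit, here is a minimal Python sketch for checking and cleaning the extracted markup. It assumes you have the HTML/XHTML as a string; the function names are made up for illustration and are not part of calibre or Sigil.

```python
import re

# Matches a literal U+00A0 character or the common entity spellings of it.
NBSP_PATTERN = re.compile("\u00a0|&nbsp;|&#160;|&#[xX][aA]0;")


def count_nbsp(markup: str) -> int:
    """Count non-breaking spaces (literal or as entities) in the markup."""
    return len(NBSP_PATTERN.findall(markup))


def replace_nbsp(markup: str) -> str:
    """Replace every non-breaking space with an ordinary space."""
    return NBSP_PATTERN.sub(" ", markup)
```

Running count_nbsp first tells you whether this is worth pursuing at all before you change anything.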
Old 10-14-2009, 06:01 AM   #3
user_none
Sigil & calibre developer
Posts: 2,428
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
You need to adjust the unwrap factor. If the text comes out as a clump, increase it; if it has broken paragraphs, decrease it.
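To give a feel for why the factor behaves that way, here is a rough Python sketch of a length-based unwrap heuristic. This is not calibre's actual code, just an illustration under simple assumptions: the threshold is the line length found at the given fraction of the sorted non-empty line lengths, a line at least that long is assumed to continue on the next line, and a factor of 0 disables unwrapping. The function name is hypothetical.

```python
def unwrap_lines(text: str, factor: float = 0.5) -> str:
    """Join hard-wrapped lines into paragraphs using a length threshold."""
    lines = text.splitlines()
    lengths = sorted(len(line.strip()) for line in lines if line.strip())
    if not lengths or factor <= 0:
        return text  # nothing to do, or unwrapping disabled

    # Line length found at the `factor` position of the sorted lengths.
    index = min(int(len(lengths) * factor), len(lengths) - 1)
    threshold = lengths[index]

    paragraphs = []
    current = ""  # paragraph being assembled
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored; length decides the breaks
        current = f"{current} {stripped}" if current else stripped
        if len(stripped) < threshold:
            # A short line is taken as the real end of a paragraph.
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

In this sketch a low factor gives a low threshold, so almost every line gets joined to the next (a clump), while a high factor leaves more breaks in place.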
Old 10-14-2009, 10:56 AM   #4
wamblej
Member
Posts: 10
Karma: 10
Join Date: Sep 2009
Device: Kindle 2
The unwrap factor under PDF input is at 0.00 already. I tried changing it to 0.5, which actually corrected the problem. Thanks. I'll try it again with some other files and see if it helps. I don't know why it was defaulting to 0 instead of 0.5. I'll keep you posted on what I find. Thanks.
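For anyone who prefers the command line, the same setting can be passed to calibre's ebook-convert tool. A small Python sketch, assuming ebook-convert is on your PATH and that its PDF input option is spelled --unwrap-factor (check `ebook-convert --help` for your version); the helper names and file names are hypothetical.

```python
import subprocess


def build_convert_command(src: str, dst: str, unwrap_factor: float = 0.5) -> list[str]:
    """Build the ebook-convert argument list for a PDF conversion."""
    if not 0.0 <= unwrap_factor <= 1.0:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return ["ebook-convert", src, dst, "--unwrap-factor", str(unwrap_factor)]


def convert(src: str, dst: str, unwrap_factor: float = 0.5) -> None:
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(src, dst, unwrap_factor), check=True)
```

Calling convert("book.pdf", "book.mobi", 0.5) would then run one conversion with the factor set explicitly, which makes it easy to try a few values on the same file.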
Old 10-15-2009, 01:17 PM   #5
mysweety
Connoisseur
Posts: 61
Karma: 510
Join Date: Jul 2009
Device: Hanlin V3, PB360
I have just been through this process.
Here is a procedure (Linux):
a) Open the PDF in a viewer and do a "select all".
b) Paste the text into OpenOffice and produce an .odt file.
c) Adjust sentence length to 50% so as to join short bits with newline characters.
d) Run replace for "" to "\n" and find for [a-z]; this gets rid of paragraphs that begin with a lowercase letter and dialogue that has been run together.
e) Split the .odt file into a separate file for each chapter or section you want in the TOC.
f) Clean up the .odt files against the original, checking sentences and paragraphs, and put markers such as (i) and (ii) around italic text.
g) Convert these files to UTF-8 encoded text.
h) Start eCub and add these files for immediate conversion to HTML.
i) Clean up these HTML files, substituting <i> for (i) and </i> for (ii) etc., and putting in images etc.
j) Compile to an EPUB file and check in Azardi that all the changes are OK.
k) Copy the resulting build folder to e.g. finalbuild.
l) Correct the cover page and place a reference to the TOC in content.opf.
m) Place a reference to the TOC in the title page.
n) Run "zip -Xr9D $1.epub mimetype * -x .DS_Store" in finalbuild to produce a new EPUB. Check in Azardi.
o) Run mobigen against content.opf with Wine.
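The zip step can also be done from Python if the zip tool is not handy. This is a sketch rather than an exact equivalent of the command above: it writes the mimetype file first and uncompressed, which is what the EPUB container format expects, then deflates everything else and skips .DS_Store. The function name and the folder layout are assumptions.

```python
import zipfile
from pathlib import Path


def pack_epub(build_dir: str, epub_path: str) -> None:
    """Zip an unpacked EPUB build folder into a single .epub file."""
    build = Path(build_dir)
    target = Path(epub_path).resolve()
    with zipfile.ZipFile(target, "w") as zf:
        # mimetype goes in first and must be stored without compression.
        zf.write(build / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(build.rglob("*")):
            rel = path.relative_to(build).as_posix()
            if path.is_dir() or rel == "mimetype" or path.name == ".DS_Store":
                continue
            if path.resolve() == target:
                continue  # do not add the archive to itself
            zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

For example, pack_epub("finalbuild", "book.epub") would produce book.epub from the finalbuild folder; it is still worth checking the result in a reader afterwards.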
Old 10-15-2009, 06:57 PM   #6
user_none
Sigil & calibre developer
Posts: 2,428
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by mysweety View Post
I have just been through this process.
Here is a procedure (linux):
...
The addition of the unwrap factor should make this unnecessary. However, I do realize it is not a perfect solution. Kovid is doing some work on PDF input right now that will make it even better.
Old 10-15-2009, 10:26 PM   #7
wamblej
Member
Posts: 10
Karma: 10
Join Date: Sep 2009
Device: Kindle 2
So far Calibre is doing a great job. I just needed to change that unwrap to 0.5. I went to preferences and changed my default to be 0.5 and it has been great.
Old 10-16-2009, 08:13 AM   #8
mysweety
Connoisseur
Posts: 61
Karma: 510
Join Date: Jul 2009
Device: Hanlin V3, PB360
I think the problem is the style of the original and how faithfully one wishes to follow it.
I found the unwrap factor could handle about 50% of the problem, and recourse to regular expressions probably got it up to 80-90%. However, if you need to be really faithful to the original, then unfortunately it needs to be checked by hand, and this is where it takes the most time.
The advantage of eCub is that it is very flexible and produces good, simple XHTML files which can be edited with ease. Also, the problems surrounding the TOC disappear, and MOBI, EPUB and voice can all be produced at the same time.
I think that calibre does a very good job but is limited in the degree of accuracy of the output.
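As an example of the kind of regular expression I mean, here is a small Python sketch that merges any "paragraph" starting with a lowercase letter into the one before it. It is only an illustration, not what calibre does internally; the function name is made up, and it cannot repair words that were split in the middle.

```python
import re


def join_lowercase_paragraphs(text: str) -> str:
    """Treat a line break followed by a lowercase letter as a false break."""
    # Any run of whitespace containing a newline, followed by a-z, becomes one space.
    return re.sub(r"\s*\n\s*(?=[a-z])", " ", text)
```

Breaks followed by a capital letter, a quote mark or a digit are left alone, so genuine paragraph starts survive, but the output still needs a read-through against the original.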
Tags
conversion, pdf