02-28-2015, 01:56 PM   #1
fetito666
Device: Kindle Paperwhite 3rd
Avoiding delimiter symbols

Hello!

I want to convert some DOCX files to the MOBI format. Is there a clean way to do this?

Some line breaks are converted strangely, especially the hyphen that Word automatically adds to split a word that does not fit on one line.

In the converted MOBI file some words appear like this:

“The great philo- sopher Hegel spoke to all people. He wrote a lot of inter- esting books.”

instead of:

“The great philosopher Hegel spoke to all people. He wrote a lot of interesting books.”

Regards,
Patrick
02-28-2015, 04:51 PM   #2
dickloraine
Device: PRS 350, Kobo Aura
Use Toxaris's Word add-in. It produces an EPUB from the DOCX. You could then use KindleGen to make a MOBI from it. Another possibility would be calibre.
Hyphenation added by Word shouldn't end up in the HTML anyway. Are you sure the hyphens were not inserted manually? Otherwise, just disable automatic hyphenation in Word.
02-28-2015, 07:14 PM   #3
fetito666

@dickloraine: Thank you for the answer.

I think those hyphens exist because the file was originally a PDF, which I was able to convert perfectly into DOCX. In Word it looks exactly like the PDF, without typos or errors.

It's only when I convert it to MOBI with calibre that the layout gets "funky": the hyphens appear.
02-28-2015, 09:28 PM   #4
eschwartz
Device: Kindle Touch fw5.3.7 (Wifi only)
Sticky: Read this before Posting PDF Questions

No, there is no such thing as "converting perfectly into DOCX", and doing so will not and cannot fix the problems inherent to using PDF.

The best any PDF-to-Word converter can do is use advanced and expensive parsing engines to make a best guess at where the paragraphs connect, and I do not know of any such software. Your converter probably does exactly what calibre does and uses a generic line-unwrap factor, so you might want to check for the odd wrongly split or joined paragraph.

Either way, this is another common problem with PDF conversions (IIRC calibre does the same): the converter correctly unwrapped those two lines, but it took the PDF at its word about the hyphen and line break and assumed the two halves were separate words.
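
To make the mechanism concrete, here is a minimal Python sketch of a length-based unwrap. It is only an illustration under my own assumptions, not calibre's or any converter's actual code: the `unwrap` function, its `factor` parameter and the 0.8 default are hypothetical. Lines that are nearly as long as the longest line are treated as hard-wrapped and joined to the next line with a space, which is exactly how the hyphen survives.

```python
def unwrap(lines, factor=0.8):
    """Join hard-wrapped lines into paragraphs (illustrative heuristic only).

    A line at least `factor` times as long as the longest line is assumed
    to be wrapped; a shorter line is assumed to end its paragraph.
    """
    if not lines:
        return []
    longest = max(len(line) for line in lines)
    paragraphs, current = [], []
    for line in lines:
        current.append(line.strip())
        if len(line) < factor * longest:
            # Short line: treat it as the end of the paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Text as it might come out of a PDF, one entry per physical line.
pdf_lines = [
    "The great philo-",
    "sopher Hegel spoke",
    "to all people.",
]

print(unwrap(pdf_lines))
# ['The great philo- sopher Hegel spoke to all people.']
```

The lines are joined correctly into one paragraph, but nothing in this kind of heuristic knows that the trailing hyphen marked a split word, so "philo-" and "sopher" end up as two tokens.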
02-28-2015, 10:05 PM   #5
dickloraine
As eschwartz said, the hyphens come from the PDF, so they should be in the DOCX too. Depending on the book, you could mass-replace them with a regex in the calibre editor. Depending on how exactly they look, though, you would also nuke any correct uses of the hyphen. Still, few books contain many such words, and there may even be a blank space between the hyphen and the next word, which would make the split words easier to target. If you encounter the problem on every page, I would do it.
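
A rough sketch of such a regex, written in Python so it can be tried outside the editor first. The pattern and the `dehyphenate` helper are my own assumptions about what the artifacts look like (a lowercase letter, a hyphen, whitespace, then another lowercase letter), so check it against your own book before running a replace-all.

```python
import re

# Lowercase letter, hyphen, whitespace, lowercase letter:
# most likely a word that was split at the end of a PDF line.
SPLIT_WORD = re.compile(r"(?<=[a-z])-\s+(?=[a-z])")


def dehyphenate(text):
    """Remove the hyphen and the following whitespace from split words."""
    return SPLIT_WORD.sub("", text)


sample = "The great philo- sopher Hegel wrote a lot of inter- esting books."
print(dehyphenate(sample))
# The great philosopher Hegel wrote a lot of interesting books.
```

The risk mentioned above is real: a legitimate suspended hyphen such as "pre- and post-war" would become "preand post-war" with this pattern, while "post-war" itself is left alone because no whitespace follows its hyphen. If the same pattern is used in the calibre editor's search in regex mode with an empty replacement, stepping through the matches instead of replacing all at once is the safer route.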
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.