#1
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2015
Device: Kindle Paperwhite 3rd
Avoiding delimiter symbols
Hello!
I want to convert some DOCX files into MOBI format. Is there a clean way to do this? Some line breaks have been converted strangely, especially the hyphen that Word adds automatically to split a word that does not fit on one line. In the converted MOBI file some words appear like this: "The great philo- soph Hegel spoke to all people. He wrote a lot of inter- esting books." instead of "The great philosoph Hegel spoke to all people. He wrote a lot of interesting books."
Regards,
Patrick
#2
Guru
Posts: 631
Karma: 7544528
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
Use Toxaris's Word add-in. It produces an EPUB from the DOCX, and you could then use KindleGen to turn that into a MOBI. Calibre is another possibility.
Hyphenation that Word adds automatically shouldn't end up in the HTML anyway. Are you sure the hyphens weren't inserted manually? If they weren't, just disable automatic hyphenation in Word.
#3
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2015
Device: Kindle Paperwhite 3rd
Avoiding delimiter symbols
@dickloraine: Thank you for the answer.
I think those hyphens exist because the file used to be a PDF, which I was able to convert perfectly into DOCX. In Word it looks exactly like the PDF, without typos or errors. It's only when I export it with calibre to MOBI that the layout gets "funky" and the hyphenation appears.
#4
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Sticky: Read this before Posting PDF Questions
No, "converting perfectly into DOCX" does not exist, and it cannot fix the problems inherent to PDF. The best any PDF-to-Word converter can do is use an advanced (and expensive) parsing engine to make a best guess at where paragraphs connect, and I do not know of any software that does that. Yours probably does exactly what calibre does and applies a generic line-unwrap factor, so you might want to check for oddly split or joined paragraphs.
Either way, this is another common problem with PDF conversions (IIRC calibre does the same): the converter correctly unwrapped those two lines, but took the PDF at its word about the hyphen and line break, and assumed they were two words.
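To illustrate the problem described above, here is a minimal sketch of a dictionary-aware line unwrap. It is not calibre's algorithm. When lines are joined, a line ending in "-" could be a word split across the break ("inter-" / "esting") or a genuine compound ("well-" / "known"). This sketch joins the two halves only when the combined form appears in a word set. `KNOWN_WORDS` and `unwrap_lines` are hypothetical names; a real tool would need a full dictionary for the book's language.

```python
import re

# Hypothetical word list for illustration only; a real converter would
# need a complete dictionary for the book's language.
KNOWN_WORDS = {"interesting", "books", "people"}


def unwrap_lines(lines, known_words=KNOWN_WORDS):
    """Join hard-wrapped PDF lines into one paragraph string.

    A trailing hyphen is dropped and the halves glued together only when
    the joined form is a known word; otherwise the hyphen is kept.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not text:
            text = line
            continue
        tail = re.search(r"(\w+)-$", text)   # word fragment before the break
        head = re.match(r"\w+", line)        # word fragment after the break
        if tail and head and (tail.group(1) + head.group(0)).lower() in known_words:
            # Soft break inside one word: remove the hyphen, no space.
            text = text[:-1] + line
        elif text.endswith("-"):
            # Real hyphen (e.g. a compound): keep it, no space.
            text += line
        else:
            # Ordinary line wrap: rejoin with a single space.
            text += " " + line
    return text
```

A converter that skips the dictionary check and simply inserts a space at every line break produces exactly the "inter- esting" output from the first post.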
#5
Guru
Posts: 631
Karma: 7544528
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
As eschwartz said, the hyphens come from the PDF, so they should be in the DOCX too. Depending on the book, you could mass-replace them with a regex in the calibre editor. Depending on exactly how they look, though, you would also nuke every legitimate use of a hyphen. But few books contain many such words, and there may even be a space between the hyphen and the next word (as in "inter- esting"), which makes the broken ones easier to target. If you run into them on every page, I would do it.
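Here is a standalone Python sketch of the regex approach described above. It is not the calibre editor's own syntax, although a similar pattern could be adapted there. It assumes the broken words look like the samples in the first post: a letter, a hyphen, one or more spaces, then a lowercase continuation. To limit the collateral damage mentioned above, it leaves capitalised continuations alone ("Anglo- Saxon"). It also skips a small, hypothetical `KEEP_BEFORE` set of words that often follow a legitimate hyphen and space, as in "pre- and post-war". Every name here is illustrative.

```python
import re

# A letter, a hyphen, one or more spaces, then the next word,
# e.g. "inter- esting" or "philo- soph". [^\W\d_] means "any letter".
BROKEN_HYPHEN = re.compile(r"([^\W\d_])- +([^\W\d_]\w*)")

# Hypothetical exceptions: words that commonly follow a *legitimate*
# hyphen-space (suspended hyphens like "pre- and post-war").
KEEP_BEFORE = {"and", "or", "to"}


def _join(match):
    nxt = match.group(2)
    if nxt[0].islower() and nxt.lower() not in KEEP_BEFORE:
        # Looks like a word split by the PDF line break: glue it back.
        return match.group(1) + nxt
    # Otherwise leave the text untouched for manual review.
    return match.group(0)


def join_broken_words(text):
    """Rejoin words split as 'inter- esting' while sparing likely real hyphens."""
    return BROKEN_HYPHEN.sub(_join, text)
```

Even with these exceptions, it is safer to review each match than to replace them all blindly.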
Thread | Thread Starter | Forum | Replies | Last Post |
Screen flashing--any way of avoiding it?! | robinson | Marvin | 12 | 10-05-2015 05:27 AM |
Avoiding update to 5.6.1 on PW2 | nissaa7 | Kindle Developer's Corner | 3 | 11-23-2014 09:46 AM |
Touch Avoiding previews being loaded? | MacEachaidh | Kobo Reader | 8 | 04-14-2013 12:05 AM |
Avoiding the dreaded update | jenniren | Nook Developer's Corner | 3 | 01-26-2011 11:51 PM |
Avoiding auto updates? | Polvo | Kindle Developer's Corner | 5 | 10-20-2010 01:19 PM |