Old 09-07-2021, 02:30 AM   #1
quinta@ebf.cz
soft hyphens in docx conversion output

Soft hyphen marks (the character U+00AD, or the entities &#173; or &shy;) that exist in the source HTML are exported to DOCX as the same soft hyphen characters (code 00AD).

This is not quite the desired behaviour, because MS Word implements optional word breaks differently, and the 00AD characters themselves are simply displayed (visually similar to standard hyphens).

An exported DOCX document containing soft hyphen characters can be repaired by searching for them (using the symbol ^0173) and replacing them: either with the Word "optional word break" (^-), or (mostly in my case) just deleting them by replacing them with nothing...
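The "just delete them" repair can also be sketched outside Word, on plain text. This is only an illustration under assumptions: the helper name `strip_soft_hyphens` is hypothetical, and the sketch works on a string, not on a DOCX file. It also shows that both entity spellings decode to the same U+00AD character.

```python
import html

SOFT_HYPHEN = "\u00ad"  # U+00AD, decimal 173


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen character removed (hypothetical helper)."""
    return text.replace(SOFT_HYPHEN, "")


# Both entity forms decode to the same single character.
decoded = html.unescape("hy&shy;phen&#173;ation")
print(decoded.count(SOFT_HYPHEN))   # number of soft hyphens found
print(strip_soft_hyphens(decoded))  # text with the soft hyphens dropped
```

Deleting loses the hyphenation hints, so it only suits documents where Word's own hyphenation is acceptable.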

Anyway: is this export behaviour intentional? Or is it maybe unavoidable for some reason? Is there any way to have soft hyphen characters replaced with MS Word "optional word breaks" as part of the conversion?

Last edited by quinta@ebf.cz; 09-07-2021 at 02:40 AM.
Old 09-07-2021, 05:13 AM   #2
kovidgoyal
creator of calibre
Well, just use the search and replace feature in the conversion dialog to replace the soft hyphen character with whatever you like (IIRC the zero width non-joiner is what Word uses for optional spaces).

Last edited by kovidgoyal; 09-07-2021 at 05:16 AM.
Old 09-07-2021, 05:52 AM   #3
kovidgoyal
And note that in DOCX, soft hyphens are represented as a special tag, not as a Unicode character. From the next release, calibre will convert soft hyphens to that tag automatically. https://github.com/kovidgoyal/calibr...c9948658be0db8
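To illustrate what "a special tag" means, here is a minimal sketch of turning text with U+00AD characters into a WordprocessingML run that uses `<w:softHyphen/>` elements. It is not calibre's implementation (the linked commit is the authority on that); the function name `build_run` is hypothetical and the run carries no formatting properties.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
SOFT_HYPHEN = "\u00ad"

ET.register_namespace("w", W_NS)


def build_run(text: str) -> ET.Element:
    """Build a <w:r> whose soft hyphens are elements, not characters (hypothetical helper)."""
    run = ET.Element(f"{{{W_NS}}}r")
    for i, part in enumerate(text.split(SOFT_HYPHEN)):
        if i > 0:
            # One empty element for each U+00AD found in the input.
            ET.SubElement(run, f"{{{W_NS}}}softHyphen")
        if part:
            t = ET.SubElement(run, f"{{{W_NS}}}t")
            t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
            t.text = part
    return run


run = build_run("hy\u00adphen\u00adation")
print(ET.tostring(run, encoding="unicode"))
```

The text is split at each soft hyphen, so the character itself never reaches a `<w:t>` element.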
Old 09-07-2021, 10:04 AM   #4
quinta@ebf.cz
Quote:
Originally Posted by kovidgoyal View Post
And note that in DOCX softhyphens are represented as a special tag not as a unicode character.
Yes, DOCX soft hyphens (optional word breaks) seem to look more like objects than characters. They do not react to the Unicode-revealing shortcut Alt+X, their "ASCII value" is 31 (?), and their XML representation is the element <w:softHyphen/> (not a character)...

My attempts to create optional word breaks using the suggested calibre "search and replace" conversion feature have not been successful (yet). Well, all I tried was replacing with the expression \u200C (the Unicode value of the suggested "zero width non-joiner") and with the expression \u001F (the hex value of 31)... Excuse my naive approach. : )
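One possible reason a character-level replacement cannot produce an optional word break, sketched under assumptions: U+200C is an ordinary character, so it stays inside the text, while U+001F is not a legal character in XML 1.0 at all. The helper `is_well_formed` is hypothetical, and the sketch says nothing about what calibre itself does with these replacements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the XML parser accepts xml_text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# U+200C parses, but only as part of the text: no child element appears.
zwnj = ET.fromstring("<t>hy\u200cphen</t>")
print(len(zwnj), repr(zwnj.text))

# U+001F is outside the XML 1.0 character range, so the parser rejects it.
print(is_well_formed("<t>hy\u001fphen</t>"))
```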

A possibly good reason for converting soft hyphens to OWB as the default calibre export behaviour: MS Word itself behaves that way. OWBs are converted to SHY when exporting to HTML, and vice versa (just tested in Word 2010).

Quote:
Originally Posted by kovidgoyal View Post
From the next release calibre will convert soft hyphens to that tag automatically. https://github.com/kovidgoyal/calibr...c9948658be0db8
Ooh, I think that's better news than expected. Thank you.