01-09-2014, 02:02 AM | #16 |
null operator (he/him)
Posts: 20,572
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Hmmmm, very bizarre; see attachment. I'll need to sleep on it. Maybe someone will solve it while I'm doing that.
I don't think it's an Editor issue; I'm pretty sure it's a Conversion issue. BR |
01-09-2014, 04:34 AM | #17 |
Enthusiast
Posts: 42
Karma: 10
Join Date: Dec 2010
Location: Arizona USA
Device: iPod Touch 6G
|
And here is what I got for that specific phrase.
|
01-09-2014, 01:39 PM | #18 |
null operator (he/him)
Posts: 20,572
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I think what you get in the ePUB XHTML depends on what you do in Word with cut & paste and editing. In the screenshot I posted, I changed the ". Not" to "… not" in Word and then ran a conversion.
The fragmentation of the "not" that you see in the ePUB XHTML reflects what's in the Word DOCX XML.
ePUB XHTML Code:
<i class="calibre1">n</i><span class="text1">ot</span>
Word DOCX XML Code:
<w:r w:rsidR="00AB4F90" w:rsidRPr="00160E46">
  <w:t>n</w:t>
</w:r>
<w:r w:rsidRPr="00160E46">
  <w:t>ot.</w:t>
</w:r>
One way of ensuring better consistency might be to paste plain (ASCII) text into the DOCX; you can set this in Word Options > Advanced > Cut, copy, and paste ("Keep Text Only"). You'd then have to do all the font styling manually. If the examples you posted originate from LIT, it might be interesting to see the XHTML that a LIT to EPUB conversion creates. BR
Last edited by BetterRed; 01-09-2014 at 04:08 PM. Reason: did a Tidy on the XML fragment |
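To see why Word produced two runs here, it may help to compare them programmatically. In the fragment as posted, neither <w:r> carries a <w:rPr> (run formatting) child; the two runs differ only in their rsid attributes, which Word uses to record editing sessions rather than formatting. The minimal Python sketch below assumes a simplified paragraph whose runs hold plain text only, and merges adjacent runs whose formatting is identical. The function names are hypothetical, and this is meant for inspecting the XML, not as a safe way to rewrite a real .docx (ElementTree may drop namespace declarations Word relies on).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)  # print "w:" instead of "ns0:" if serialized


def props_key(run):
    """Formatting of a run, read from its <w:rPr> children only.
    The rsid attributes sit on <w:r> itself, so they are ignored here."""
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        return ()
    return tuple((c.tag, tuple(sorted(c.attrib.items()))) for c in rpr)


def is_simple_run(el):
    """True for a <w:r> holding exactly one <w:t> and at most a <w:rPr>."""
    if el.tag != f"{{{W}}}r":
        return False
    tags = [c.tag for c in el]
    allowed = (f"{{{W}}}t", f"{{{W}}}rPr")
    return tags.count(f"{{{W}}}t") == 1 and all(t in allowed for t in tags)


def merge_runs(paragraph):
    """Coalesce adjacent simple runs in a <w:p> that share identical formatting."""
    prev = None
    for el in list(paragraph):  # iterate over a copy; we remove children below
        simple = is_simple_run(el)
        if simple and prev is not None and props_key(prev) == props_key(el):
            prev_t = prev.find("w:t", NS)
            prev_t.text = (prev_t.text or "") + (el.find("w:t", NS).text or "")
            prev_t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
            paragraph.remove(el)
        else:
            prev = el if simple else None
    return paragraph


# The fragment from the post, wrapped in a paragraph with its namespace.
para = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    '<w:r w:rsidR="00AB4F90" w:rsidRPr="00160E46"><w:t>n</w:t></w:r>'
    '<w:r w:rsidRPr="00160E46"><w:t>ot.</w:t></w:r>'
    '</w:p>'
)
merge_runs(para)
print([t.text for t in para.iter(f"{{{W}}}t")])  # ['not.']
```

If the two runs had different <w:rPr> contents (say one with <w:i/>), they would be left separate, which matches the idea that a slightly different style on part of a word produces this kind of split.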
01-09-2014, 02:06 PM | #19 |
Grand Sorcerer
Posts: 12,168
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
You can also "strip" formatting by selecting the text in question and pressing both Ctrl+Q and Ctrl+Space (at least on Windows). In Word, Ctrl+Q resets paragraph formatting and Ctrl+Space resets character formatting back to the underlying style.
|
01-09-2014, 02:44 PM | #20 |
Enthusiast
Posts: 42
Karma: 10
Join Date: Dec 2010
Location: Arizona USA
Device: iPod Touch 6G
|
Easiest alternative
Well, it might be easiest to change my Word defaults to strip out the formatting as I paste text in. Since I already verify all italics as I go, I might as well redo them and see how that affects the conversion results.
I hesitate at the ctrl-q and ctrl-space idea, since I don't know what those functions do, but I'll take a look. As for a direct LIT conversion, I don't know for sure what type of file these docx files came from; I had already cut and pasted from the original to make the docx file I converted. I get my eBooks in all kinds of formats, including PDFs, which I always remake into docx before I try them in Calibre.
As a test case, I think I will do a couple of chapters of a fresh book with the formatting as is, to see which way it goes, then redo the same text with Word stripping the formatting and me redoing the italics myself. I'll let you all know my results.
Oh, and another reason I want this to be consistent: I have ~absolutely no~ idea what the code line you got in the doc XML means, and I don't know anything about regex either. Besides, simple is better, and removing the problem will make redoing things faster. |
01-09-2014, 04:11 PM | #21 | ||
null operator (he/him)
Posts: 20,572
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I updated the XML fragment I posted earlier after running Tidy on it. The 'n' and 'ot' are at the beginning and end of both the XHTML and the XML. I don't expect anyone to actually comprehend the DOCX XML, except maybe Kovid. BR |
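For anyone who would rather find these split words than read the DOCX XML, a rough Python sketch (not a calibre feature) can scan the converted XHTML for a word broken across a closing inline tag and an immediately following opening one, as in <i class="calibre1">n</i><span class="text1">ot</span>. The list of inline tags (i, em, b, strong, span) and the function name are assumptions. Regular expressions are a blunt tool for HTML, so treat each match as a place to look rather than a guaranteed error.

```python
import re

# Inline tags a converter typically uses for character formatting (assumed list).
INLINE = r"(?:i|em|b|strong|span)"

# A word fragment, a closing inline tag, an opening inline tag straight after it
# (no space or punctuation in between), then the rest of the word.
SPLIT_WORD = re.compile(
    rf"(\w+)</{INLINE}>"          # first fragment and its closing tag
    rf"<{INLINE}\b[^>]*>(\w+)"    # next opening tag and second fragment
)


def find_split_words(xhtml):
    """Return (head, tail) pairs for words broken across adjacent inline tags."""
    return SPLIT_WORD.findall(xhtml)


sample = '<p>It was <i class="calibre1">n</i><span class="text1">ot</span> him.</p>'
print(find_split_words(sample))  # [('n', 'ot')]
```

Because the pattern requires letters on both sides of the tag boundary, ordinary cases such as "<em>Wait</em>." or "not</i> him" are not reported.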
||
01-14-2014, 04:46 PM | #22 |
Watching the Sky
Posts: 234
Karma: 634112
Join Date: Sep 2012
Location: Winnipeg, MB
Device: Kobo Aura
|
I only briefly scanned this, but thought it worth mentioning: using an <i> tag is no longer a valid way to present italics in an HTML file. Italics now use <em>, and any converter that converts an <i> tag to a span is doing what it should (since it's inline within a paragraph). Try using <em> instead and see if the problem persists.
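If you want to try the <em> suggestion on an already converted file, a minimal Python sketch could rename the tags with regular expressions. It assumes simple converter output (no <i> inside comments or CDATA) and keeps attributes such as class="calibre1" unchanged; the function name is hypothetical.

```python
import re

# Opening <i> with or without attributes. The lookahead requires whitespace or ">"
# right after "i", so <img>, <iframe>, <ins> and similar tags are left alone.
OPEN_I = re.compile(r"<i(?=[\s>])([^>]*)>")
CLOSE_I = re.compile(r"</i\s*>")


def i_to_em(xhtml):
    """Rename every <i> element to <em>, keeping its attributes (e.g. class)."""
    return CLOSE_I.sub("</em>", OPEN_I.sub(r"<em\1>", xhtml))


before = '<p><i class="calibre1">n</i><span class="text1">ot</span></p>'
print(i_to_em(before))
# <p><em class="calibre1">n</em><span class="text1">ot</span></p>
```

This is a blunt rename; it does not decide whether each italic is really emphasis.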
|
01-15-2014, 02:26 AM | #23 | |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
If you really want to do this right, identify the semantic use of each <i> tag, create a class in the stylesheet for that use, and define its style there. It might be emphasis, it might be a thought, it might be a letter, etc. If you really want to use semantic markup, follow it through. The <i> tag will be supported for a long time, even if it is deprecated; I would not even be surprised if it is restored.
WordML/OpenXML is not that difficult to understand. It is just very big, with a lot of options and functions, and it is all documented quite well. The reasons for these split-ups can vary. Most likely the word was edited with a slightly different style, which would cause this behavior. It would be correct, only not very useful for further processing. That is the main reason I use a different way of creating the HTML export from Word... |
|
|