#1
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2019
Device: iPad with Kindle app
Questions about 9.16
Hi there!
I've been using Sigil (most recently 9.14) for a while to convert Word doc manuscript novels into epubs. I love the program and how easy it is (was). Copy and paste the Word doc into Book View as "plain" text, find and replace all the extraneous <p><br/></p> and such, and it was really good at getting rid of the other crap that Word tries to insert. Then go through and split the chapters, then go back through and add back in all the italics and bolding and whatnot. That is VERY easy to do when you can quickly switch back and forth between Code View and Book View. It's a workflow that (other than losing all the formatting, italics and such, which is a pain in the ass to go back and mark) is fairly quick and easy.
I upgraded to 9.16 and everything just fell apart. There's no way now to dump a generally clean document into Sigil. And having to switch between THREE screens (the Word doc to find the formatting, the PageEdit app, and Sigil itself) has completely broken any sort of flow I might have had.
So, my first question is: how do you now import a Word document without all the extra nonsense? I tried "save as a filtered HTML" from the Sigil user guide, and it would take me days to remove all the extra stuff from Word that it had in there when I loaded it into Sigil.
Second: is there a way to import the document WITH the formatting intact? I went to the "MS Word Macro" suggested in the User Guide, but that seems to be for Windows only and I'm on a Mac (please correct me if I'm wrong). It would be super helpful not to have to go back through the entire document again putting in all the italics and whatnot. If that were possible, I could probably figure out a way to change over to 9.16, PageEdit, and a different workflow without needing the Word doc open at the same time (the Preview window would probably be enough for me at that point).
For now, I'm going to reinstall 9.14 so I can at least finish this current project and then reassess.
Any advice or suggestions would be GREATLY GREATLY appreciated! Thank you so very much!
#2
Sigil Developer
Posts: 8,450
Karma: 5703586
Join Date: Nov 2009
Device: many
Two ways ...
1. There are many Sigil plugins to speed up importing and cleaning Word docs. See the Sigil plugin index on this site.
See this plugin for one: https://www.mobileread.com/forums/sh...d.php?t=274536
And here is another one: https://www.mobileread.com/forums/sh...d.php?t=273966
Plus there is an ODT html import plugin, and a CustomCleaner plugin.
The plugin index is here: https://www.mobileread.com/forums/sh...d.php?t=247431
2. Do what you used to do in Book View, but do it with PageEdit. You can fire up Sigil and get an empty epub. Double-click to get Section0001.xhtml in the Code View window, hit the button to launch it in PageEdit, and paste into PageEdit just as you did in Book View. Make any simple editing changes there and save the file, and it will appear in the Code View window, which you can use to edit the xhtml or proof it in Preview. This is why we created PageEdit and added a one-push icon to launch it from within Sigil.
Last edited by KevinH; 07-25-2019 at 08:59 PM.
#3
Sigil Developer
And of course 3. Simply continue to use Sigil-0.9.14. It was built with quite up to date Qt and Webkit, and should continue to run well for years.
#4
Junior Member
Okay, I did get the macro to work after some research (I'd never used macros in Word before, so I had to figure out how to get it in there and then run it). It seems to take a really long time, but it does preserve the formatting. But then the result has a LOT of errors that even I, as a Debug Queen, can't seem to find and fix. "Fix Automatically" makes me paranoid that I'm losing something I won't be able to catch in a 400-page book. Probably just paranoia, but still.
Any other "clean HTML" suggestions (like what Sigil 9.14 does to an imported Word doc) that keep the italics, bolds, etc. intact? Thank you!
#5
Sigil Developer
What macro? What are you trying to do? Have you tried setting PageEdit as your preferred external xhtml editor in Sigil's Preferences? Then, by simply hitting the pencil icon in Sigil, you can launch the current tab in PageEdit. Pasting in PageEdit is exactly the same as pasting text into Book View.
#6
Grand Sorcerer
Posts: 28,352
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Why is pasting a Word Document into PageEdit so much more traumatic than pasting a Word document into Book View again? I'm lost.
#7
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Along with making your life so much easier, it will also help any of these methods output cleaner HTML from your Word documents.
Method #1 (Highly Recommended)
Best way from Microsoft Word->EPUB is Toxaris's EPUB Tools: https://www.mobileread.com/forums/sh...d.php?t=213372
Whenever I work from DOCX, this is the method that I use. Toxaris's conversion mentality is to strip out as much garbage as possible, and give you super clean, minimal HTML.
Note: Sadly, this is a Windows-only add-on. Word on Mac =/= Word on Windows.
Method #2
There are also a few Sigil plugins that help with DOCX import + cleanup:
DOCXImport
CustomCleanerPlus
Method #3
You can also feed that DOCX->EPUB using Calibre. If you don't use Styles, you'll get a mess of calibre## styles. And if you did use Styles (which you should), you get pretty clean code out of it. Calibre's conversion mentality is more: Garbage In, Garbage Out. This is definitely a few steps above Word Filtered HTML though.
Method #4 (Not really recommended)
If you want another, more minimal conversion, save your DOCX as RTF, and convert RTF->EPUB via Calibre. That would carry over the bold/italics, while throwing out a lot of the other extraneous formatting. This is a method I used to use, but you'll still have to go adding in headings and other more complicated formatting. It'll probably be a million times better than your current method though, where you're manually re-adding the bold/italics. But this would be better handled by using Word Styles in the first place, and cleaning up your source documents.
Quote:
You keep saying "the macro". What macro?
Last edited by Tex2002ans; 07-26-2019 at 04:40 AM.
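For Methods #3 and #4, Calibre also ships a command-line converter, ebook-convert, which takes an input file and an output file. The following is only a minimal Python sketch of calling it on a manuscript. The function names and the file name novel.docx are hypothetical, and it assumes ebook-convert is on your PATH (on a Mac it typically sits inside the Calibre application bundle, so it may need to be added).

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target=None):
    """Return the argument list for `ebook-convert SOURCE TARGET`.

    If no target is given, reuse the source name with an .epub extension.
    """
    source = Path(source)
    target = Path(target) if target else source.with_suffix(".epub")
    return ["ebook-convert", str(source), str(target)]


def convert(source, target=None):
    """Run the conversion, failing early if Calibre's CLI cannot be found."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    subprocess.run(build_convert_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical manuscript name; the same call works for an .rtf source.
    print(build_convert_command("novel.docx"))
```

The resulting EPUB can then be opened in Sigil for chapter splitting and final cleanup.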
#8
null operator (he/him)
Posts: 21,612
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
↑ ↑ ↑ ✔️ ↑ ↑ ↑ ✔️ ↑ ↑ ↑ ✔️ ↑ ↑ ↑ ✔️
I assume the OP means the macro referenced in the Sigil manual, which is this one ==>> Word macro for clean HTML code - MobileRead Forums
BR
Last edited by BetterRed; 07-25-2019 at 10:51 PM.
#9
Sigil Developer
I did not know this macro even existed. Have you ever used it?
#10
null operator (he/him)
No, I use Toxaris' Word Add-in - which only works on Windows. The OP uses a Mac.
#11
Wizard
Quote:
That macro dates way back to 2011-2013, and there are better ways of dealing with conversion from Word documents now.
Side Note: Yet another DOCX->EPUB method is LibreOffice's Export As EPUB (although last I checked, the HTML output is still horrifying).
Last edited by Tex2002ans; 07-26-2019 at 06:08 AM.
#12
Junior Member
Book View vs PageEdit
Okay, I wrote a very long reply and the system logged me out while I was writing it.
The gist being that pasting into Book View and pasting into PageEdit are NOT the same.
Book View copy/paste with CMD-V, as "plain text":
Code:
<head>
<title></title>
</head>
<body>
<p><br/></p>
<p>Two Little Problems</p>
<p><br/></p>
<p>A J.B. DANNAN NOVEL</p>
<p><br/></p>
<p>BOOK SEVEN</p>
<p><br/></p>
<p><br/></p>
<p><br/></p>
<p>STEVE COOPER</p>
<p><br/></p>
<p><br/></p>
<p>“This is a work of fiction. Names, characters, places, incidents, and dialogue are products of the author’s imagination or are used fictitiously and are not to be construed as real. Any resemblance to actual events, locales, organizations, or persons, living or dead, is entirely coincidental.”</p>
<p><br/></p>
<p><br/></p>
<p>Copyright © 2018 Steve Cooper</p>
PageEdit (which is linked to the icon in Sigil) with CMD-V does not give the option to choose "plain text":
Code:
<head>
<title></title>
</head>
<body>
<p><!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:TrackMoves>false</w:TrackMoves>
<w:TrackFormatting/>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>EN-US</w:LidThemeOther>
<w:LidThemeAsian>JA</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
<w:DontGrowAutofit/>
<w:SplitPgBreakAndParaMark/>
If I use the Paste/Clipboard icon in PageEdit, I do get the paste as plain text, and then it looks like this:
<head> <title></title> </head> <body> <p>Two Little Problems A J.B. DANNAN NOVEL BOOK SEVEN STEVE COOPER “This is a work of fiction. Names, characters, places, incidents, and dialogue are products of the author’s imagination or are used fictitiously and are not to be construed as real. Any resemblance to actual events, locales, organizations, or persons, living or dead, is entirely coincidental.” Copyright © 2018 Steve Cooper All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means electronic or mechanical, including photocopy, re-coding, or any information storage and retrieval system, or any method not yet devised, without permission in writing from the author of this publication.....
And 363 pages are crammed into one giant block closed out with a paragraph tag (and a styles list code at the end).
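The cleanup discussed in this thread (dropping Word's conditional comments and the <p><br/></p> spacer paragraphs, and giving each line of a plain-text paste its own paragraph) can be sketched in Python. This is a minimal sketch, not what Sigil, Book View, or PageEdit do internally. The function names are made up for illustration, and the regular expressions only cover the patterns visible in the pasted samples above.

```python
import html
import re

# Word's conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)
# Spacer paragraphs of the form <p><br/></p>, plus any whitespace after them
EMPTY_PARAGRAPH = re.compile(r"<p>\s*<br\s*/?>\s*</p>\s*")


def strip_word_residue(markup: str) -> str:
    """Remove conditional comments and empty spacer paragraphs from pasted markup."""
    markup = CONDITIONAL_COMMENT.sub("", markup)
    markup = EMPTY_PARAGRAPH.sub("", markup)
    return markup.strip()


def plain_text_to_paragraphs(text: str) -> str:
    """Wrap each non-empty line of plain text in its own <p> element."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(
        f"<p>{html.escape(line, quote=False)}</p>" for line in lines if line
    )


if __name__ == "__main__":
    pasted = "<p>Two Little Problems</p> <p><br/></p> <p>BOOK SEVEN</p>"
    print(strip_word_residue(pasted))
    print(plain_text_to_paragraphs("Two Little Problems\n\nBOOK SEVEN"))
```

Regular expressions are fragile on arbitrary HTML, so anything beyond these two patterns is probably better handled by one of the import plugins mentioned earlier.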
As to Styles in Word: the author writes in Pages and sends it to me, I put it into Word and format it for the print version, then after proofing, I dump it into Sigil. He doesn't know Styles (because he uses Pages) and I hate Word, so I don't know Styles either. I'll have to look into it after this book, if it would help.
I will try some of the suggested plugins, because even if I were to stick with 9.14, I still need a good way to avoid re-adding the formatting, because that sucks a lot.
Thanks to everyone for your suggestions and responses!
Last edited by theducks; 07-26-2019 at 05:37 PM. Reason: disabled smilies in text
#13
Grand Sorcerer
Use the Paste button or the Paste menu item so you get the choice of pasting plain text vs rich text. Or, you know, fight it forever because it's slightly different. Your choice.
Quote:
Why would anyone paste 363 pages into Sigil (Book View or PageEdit) in one go!? That's crazy. Use one of the other import methods mentioned to get clean html from Word into Sigil. Or just go back to Sigil 0.9.14, Book View, and what obviously works for you. Sounds like you'd be much happier with it anyway.
Last edited by DiapDealer; 07-26-2019 at 05:16 PM.
#14
Junior Member
Quote:
Quote:
Quote:
I did install the DOCXImport plugin and YES! It keeps the formatting intact and is almost as clean as the Book View cleanup (maybe Book View could live on as a cleanup plugin, because thus far it's been the best at making very workable HTML from Word, again apart from not keeping the formatting intact).
Quote:
Thanks again for all the suggestions. The new plugin seems to work very well and the new project is proceeding much faster due to not having to hunt down all the italics! That alone is worth it.
#15
Grand Sorcerer
Note that with the DOCXImport plugin, unless you create a conversion map of the styles used in the Word document, the only formatting that will be preserved is bold and italic text that was created with the bold or italic buttons in Word. Text that is bold or italic only as part of an applied Word style will not be bold or italic after importing.
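To see which kind of formatting a manuscript uses before importing, the DOCX can be inspected directly: it is a zip archive whose word/document.xml marks button-applied bold and italic as <w:b/> and <w:i/> inside a run's <w:rPr>, and applied styles as <w:pStyle> or <w:rStyle>. The following is a rough Python sketch of that check. It is not part of DOCXImport, the function names are hypothetical, and it only lists which style ids are in use, not whether those styles are themselves bold or italic.

```python
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _is_on(run_props, tag):
    """True if a toggle such as <w:b/> is present and not switched off."""
    elem = run_props.find(f"w:{tag}", NS)
    if elem is None:
        return False
    return elem.get(f"{{{W}}}val", "true") not in ("0", "false")


def summarize_formatting(document_xml):
    """Count directly formatted runs and list the style ids the document applies."""
    root = ET.fromstring(document_xml)
    direct_bold = direct_italic = 0
    for props in root.iterfind(".//w:r/w:rPr", NS):
        direct_bold += int(_is_on(props, "b"))
        direct_italic += int(_is_on(props, "i"))
    style_ids = set()
    for tag in ("pStyle", "rStyle"):
        for elem in root.iter(f"{{{W}}}{tag}"):
            value = elem.get(f"{{{W}}}val")
            if value:
                style_ids.add(value)
    return {
        "direct_bold": direct_bold,
        "direct_italic": direct_italic,
        "style_ids": sorted(style_ids),
    }


def summarize_docx(path):
    """Read word/document.xml out of a .docx file and summarize it."""
    with zipfile.ZipFile(path) as archive:
        return summarize_formatting(archive.read("word/document.xml"))
```

If the summary shows few directly formatted runs but several style ids, the styles are the likely carriers of the emphasis, which is the case where a conversion map would be needed.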