05-09-2016, 09:36 AM | #1 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
[Plugin] DOCXImport
DOCXImport: Import DOCX documents into Sigil as epubs.
(based on the Python Mammoth module)
** NOTE: this plugin periodically checks for updated versions by connecting to github (where the source is maintained). **
(this update check can be disabled via the GUI)
Minimum Sigil requirement: v0.9.0 or higher
Python Requirements: Python 3.4+ (Bundled or external)
OS Requirements: Windows/Linux/OS X
*** Linux users will have to make sure that the PyQt5 graphical python module (or PySide6 starting with Sigil 2.0) is present if it's not already. On Debian-based flavors this can be done with "sudo apt-get install python3-pyqt5" (or pip install PySide6). On Arch distributions it can be done with pacman -S python-pyqt5 and/or pacman -S pyside6 ***
* Note: Do not rename any Sigil plugin zip files before attempting to install them *
Select a pre-existing DOCX file using the file dialog and it will be imported as a single-file epub.
The following features are currently supported (provided by Mammoth):
An example of a style map (as well as a sample docx and css file it will work with) is in the samples.zip attached to this post. More info on writing custom style maps can be found in the "Writing Style maps" section of Mammoth's README.
DOCXImport's code is hosted/maintained on Github. The very latest version (and all previous versions) of DOCXImport can always be found on its Github Releases Page.
Last edited by DiapDealer; 08-03-2023 at 02:29 PM. |
05-09-2016, 10:51 AM | #2 |
Sigil Developer
Posts: 8,158
Karma: 5450818
Join Date: Nov 2009
Device: many
|
Wow!
Nicely done! KevinH |
05-09-2016, 12:22 PM | #3 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Thanks! Mammoth is truly doing all the heavy-lifting at the moment, though. I just made the necessary modules portable and namespaced them (so they could never potentially conflict with the PyPI versions) and slapped a Sigil plugin wrapper around them.
I was pleasantly surprised at how well mammoth performed out of the box. Now I need to familiarize myself with it more so I can start tweaking. |
05-11-2016, 06:39 PM | #4 |
Resident Curmudgeon
Posts: 76,402
Karma: 136466962
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Which produces a better ePub, this plugin or Calibre?
|
05-11-2016, 07:06 PM | #5 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Can we please not turn this into a competition? For the moment, this Sigil plugin does little but create a barebones epub. There's no css being generated (currently) so that will have to be supplied by the user after-the-fact. It does make some pretty-clean html, though (provided the docx was styled relatively competently). But it should certainly be considered a work in progress right now.
Last edited by DiapDealer; 05-11-2016 at 11:30 PM. |
05-12-2016, 08:46 PM | #6 |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
Well, FWIW it works on my Kubuntu 14.04, 32-bit system (sigil 0.9.4).
I didn't have a genuine .docx file handy, so I loaded an .odt into LibreOffice and saved it as .docx -- which leads to my question... The .odt document had been styled with several custom paragraph and character styles, but these were not preserved (i.e. not even the class names) in the epub. It did identify headers (all coded as h1) and all paragraphs as plain p, regardless of whatever style was used in the original document. Is this to be expected at this stage, or is it because the .docx via LibreOffice isn't quite legit? Anyway, quite an interesting plugin! Albert |
05-12-2016, 09:39 PM | #7 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Custom style mappings are an inherent feature of the underlying Mammoth Python Module. I just haven't knocked together a way for users to make/use/save their own custom style maps with the plugin yet. I hope to soon. It shouldn't really matter whether the docx was made with Word or LibreOffice in that regard.
|
05-13-2016, 03:07 PM | #8 | |
Resident Curmudgeon
Posts: 76,402
Karma: 136466962
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
05-13-2016, 06:06 PM | #9 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It's also sort of an impossible question to answer. What constitutes a "better epub" is entirely subjective: what one person loves--another hates. It was also an unnecessary question since you could easily try both and answer your own question.
|
05-15-2016, 02:49 PM | #10 | |
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
|
Quote:
As you are no doubt aware, the Writer2Latex add-on package for LibreOffice/OpenOffice contains, besides the add-ons, a stand-alone Java utility that can be used to produce an epub from an open-document (.odt) file. Since, for my sins, I'm the guy who gets to clean up and normalize the Word .doc files from the authors for conversion to epub or placement in InDesign, I use writer2latex a lot. Seems like Mammoth is a work-alike program. How nice it would be to directly import the .odt or .docx file into Sigil without the manual conversion step!
ETA: Of course the add-in will export an epub directly from LibreOffice, but the stand-alone is much more flexible (IMHO) and it's easy to modify the configuration as needed.
Last edited by st_albert; 05-15-2016 at 03:01 PM. Reason: afterthought |
|
05-21-2016, 05:18 AM | #11 | |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
I did a test with two genuine - and plain - docx files.
- structure (chapter titles h1) was kept. I only had to recreate a toc.ncx to get a brand new one.
- paragraphs are all transformed to plain p
- italics are kept
- footnotes with return links were all kept correctly.
This plugin can already save a lot of time.
Edit: I did a test with a long book with h1 and h2 headings and there was no problem.
* I also use writer2latex (however not the standalone Java tool but the writer2xhtml extension for LibreOffice). It's very precise and highly commendable.
Last edited by roger64; 05-23-2016 at 03:51 AM. Reason: Edit: about headings |
|
05-21-2016, 06:26 AM | #12 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
There is a good chance that if the document uses a style that contains italic (or bold, etc.), the italic will be gone once the paragraph is transformed to a plain p.
Last edited by Toxaris; 05-21-2016 at 10:15 AM. |
05-21-2016, 07:13 AM | #13 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
You are right. The italics that were kept were plain words or expressions between em tags. Sorry for this.
|
05-21-2016, 10:22 AM | #14 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I know the problem/issue that causes this and it is difficult to avoid. The only way to 'solve' this is to examine the Word style and then apply the italics directly to the words/paragraphs that have that style, before converting them to a standard paragraph. This is not as easy as it sounds though... It is also nothing that Diap can easily solve; this should be part of the Mammoth library.
I have a lot of experience with these kinds of issues due to my work on the add-in. That is part of the reason I ended up using another method of generating html. |
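To make the "examine the Word style" step a bit more concrete, here is a rough sketch using only the Python standard library. It assumes the usual docx layout (a zip archive with a word/styles.xml part) and only looks at italics set directly on a style; it does not follow style inheritance (w:basedOn), which is part of why the real thing is harder than it sounds. The function names and the demo style names are made up for illustration, and this is not how Mammoth or the plugin does it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/styles.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def italic_style_names(docx_file):
    """Return the names of styles whose own run properties switch italics on."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    names = []
    for style in root.findall("w:style", NS):
        italic = style.find("w:rPr/w:i", NS)
        if italic is None:
            continue
        # A bare <w:i/> means "on"; an explicit false/0/off value means "off".
        if italic.get("{%s}val" % W, "true") in ("false", "0", "off"):
            continue
        name = style.find("w:name", NS)
        if name is not None:
            names.append(name.get("{%s}val" % W))
    return names


def demo_docx():
    """Build a tiny in-memory stand-in for a docx (hypothetical style names)."""
    styles = (
        '<w:styles xmlns:w="%s">'
        '<w:style w:styleId="Epigraph"><w:name w:val="Epigraph"/>'
        '<w:rPr><w:i/></w:rPr></w:style>'
        '<w:style w:styleId="Upright"><w:name w:val="Upright"/>'
        '<w:rPr><w:i w:val="false"/></w:rPr></w:style>'
        '<w:style w:styleId="Normal"><w:name w:val="Normal"/></w:style>'
        '</w:styles>' % W
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/styles.xml", styles)
    buffer.seek(0)
    return buffer


italic_styles = italic_style_names(demo_docx())
```

The resulting list of style names could then be used to decide which paragraphs need italics applied (or which styles deserve their own class) before everything is flattened to plain p.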
05-21-2016, 10:58 AM | #15 |
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
My plan is to leave it entirely up to the user through Mammoth's style mappings and the user's own css templates.
I'm not really envisioning this plugin being used by user A to convert user B, C, D, and E's docx files automagically. I envision it being used by a writer/user who's adopted a standard for styling all their docx documents. That way, they create a custom Mammoth style-map (or a few) and an associated stylesheet. Once they have that in place, they can focus on creating their Word/LibreOffice documents. The style-map will take care of mapping all standard and custom docx styles/headings to specific html/class-names (with associated css).
In other words ... documents will be created that conform with a pre-existing style-map/css, rather than creating a style-map/css to accommodate each particular document (though the latter is still doable provided the user doesn't mind the extra work).
Last edited by DiapDealer; 05-21-2016 at 11:01 AM. |
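For anyone curious what such a style-map might look like in practice, here is a minimal sketch that feeds one to Mammoth directly from Python. The docx style names ('Chapter Title', 'Scene Break', and so on), the class names, and the two helper functions are all made up for illustration; only mammoth.convert_to_html and its style_map argument come from Mammoth itself. This is not a description of how the plugin wires things up internally.

```python
def build_style_map(paragraph_styles, character_styles=None):
    """Turn {docx style name: html target} dicts into Mammoth style-map text.

    Assumes style names contain no apostrophes (no escaping is attempted).
    """
    lines = []
    for style_name, target in paragraph_styles.items():
        # ":fresh" asks Mammoth to start a new element for each paragraph.
        lines.append("p[style-name='{}'] => {}:fresh".format(style_name, target))
    for style_name, target in (character_styles or {}).items():
        lines.append("r[style-name='{}'] => {}".format(style_name, target))
    return "\n".join(lines)


def docx_to_html(docx_path, style_map):
    """Convert a docx to an HTML fragment using a custom style map."""
    import mammoth  # third-party: pip install mammoth

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=style_map)
    # result.messages holds warnings, e.g. about docx styles with no mapping.
    return result.value, result.messages


# Hypothetical house styles mapped to html/class-names.
STYLE_MAP = build_style_map(
    {
        "Chapter Title": "h1.chapter",
        "Scene Break": "p.scene-break",
        "Block Quote": "blockquote > p",
    },
    {"Book Title": "cite.book-title"},
)
```

The matching stylesheet would then define rules for .chapter, .scene-break, .book-title and so on, and every document written with those styles could reuse the same pair of files.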
|