Old 05-09-2016, 09:36 AM   #1
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
[Plugin] DOCXImport

DOCXImport: Import DOCX documents into Sigil as epubs.
(based on the Python Mammoth module)

** NOTE: this plugin periodically checks for updated versions by connecting to github (where the source is maintained). **
(this update check can be disabled via the GUI)

Minimum Sigil requirement: v0.9.0 or higher
Python Requirements: Python 3.4+ (Bundled or external)
OS Requirements: Windows/Linux/OS X
*** Linux users must make sure that the PyQt5 graphical Python module (or PySide6, starting with Sigil 2.0) is installed. On Debian-based distributions this can be done with "sudo apt-get install python3-pyqt5" (or "pip install PySide6"). On Arch-based distributions it can be done with "pacman -S python-pyqt5" and/or "pacman -S pyside6". ***
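
For anyone unsure whether a suitable Qt binding is already present, a quick check along these lines can be run with the same Python interpreter Sigil is using. This is only a sketch: the helper name available_qt_bindings is made up for illustration and is not part of the plugin.

```python
import importlib.util


def available_qt_bindings(candidates=("PySide6", "PyQt5")):
    """Return the names of the candidate Qt bindings that can be imported."""
    found = []
    for name in candidates:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    bindings = available_qt_bindings()
    if bindings:
        print("Found:", ", ".join(bindings))
    else:
        print("No Qt binding found; install python3-pyqt5 or PySide6")
```

If the list comes back empty, the package manager commands above are the place to start.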

*Note: Do not rename any Sigil plugin zip files before attempting to install them *

Select a pre-existing DOCX file using the file dialog and it will be imported as a single-file epub.

The following features are currently supported (provided by Mammoth):
  • Headings.
  • Lists.
  • Customisable mapping from your own docx styles to HTML. For instance, you could convert WarningHeading to h1.warning by providing an appropriate style mapping.
  • Tables. The formatting of the table itself, such as borders, is currently ignored, but the formatting of the text is treated the same as in the rest of the document.
  • Footnotes and endnotes.
  • Images **NOTE: WMF/EMF images are unsupported and will be ignored.**
  • Bold, italics, underlines, strikethrough, superscript and subscript.
  • Links.
  • Line breaks.
  • Text boxes. The contents of the text box are treated as a separate paragraph that appears after the paragraph containing the text box.

An example of a style map (as well as a sample docx and css file it will work with) is in the samples.zip attached to this post. More info on writing custom style maps can be found in the "Writing style maps" section of Mammoth's README.
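
To give a rough idea of what a style map looks like, here is a minimal sketch that builds one for the WarningHeading example from the feature list and shows how it could be handed to the standalone Mammoth module. The helper names build_style_map and convert, and the "Code" run style, are assumptions for illustration; the plugin bundles its own namespaced copy of Mammoth and takes the style map through its GUI instead.

```python
def build_style_map(paragraph_styles, run_styles=None):
    """Build Mammoth style-map text from {docx style name: html path} dicts."""
    lines = []
    for style_name, html_path in paragraph_styles.items():
        # ":fresh" asks Mammoth to start a new element for each match
        lines.append("p[style-name='{}'] => {}:fresh".format(style_name, html_path))
    for style_name, html_path in (run_styles or {}).items():
        # "r" matches runs, i.e. character styles
        lines.append("r[style-name='{}'] => {}".format(style_name, html_path))
    return "\n".join(lines)


def convert(docx_path, style_map):
    """Convert a docx to an HTML string with the standalone Mammoth module."""
    import mammoth  # imported lazily; needs "pip install mammoth"
    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=style_map)
    return result.value, result.messages


style_map = build_style_map(
    {"WarningHeading": "h1.warning"},
    run_styles={"Code": "code"},  # hypothetical character style
)
print(style_map)
```

The printed text is what would go into a style map file, one mapping per line.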

DOCXImport's code is hosted/maintained on Github.

The very latest version (and all previous versions) of DOCXImport can always be found on its Github Releases Page.

Changes

v0.1.0
- Initial release
v0.2.0
- added gui
- added ability to employ custom style maps and custom css files
- dropped Python 2.7.x support
v0.2.1
- fixed some widget clipping situations
- changed icon
v0.2.2
- corrected some non-compliant opf issues when importing as EPUB3
v0.2.3
- use PyQt5 GUI if Sigil is new enough.
- integrate upstream changes to mammoth module
v0.2.4
- Update mammoth/cobble modules to latest upstream
- Tweak mammoth to create a "title" attribute for images if the title property is defined in the docx
- Remove extraneous parsimonious module
v0.2.5
- Add empty alt attribute to images when no alt_text/descr is provided in the DOCX
v0.2.6
- Fix empty paragraph regex bug (thanks @BeckyEbook)
v0.2.7
- Update upstream Mammoth library
- Add support to match Sigil's light/dark theme in Sigil 1.1
- Re-enable translation support (translators wanted)
v0.2.8
- Import to a flat archive structure; Sigil 1.0+ users can restructure as they wish
v0.3.0
- Dropped Support for tkinter
- Fix to work with Qt6.5.2 and Python 3.11.3 for Sigil 2.0
Attached Files
File Type: zip samples.zip (7.4 KB, 5527 views)
File Type: zip DOCXImport_v0.3.0.zip (78.8 KB, 2918 views)

Last edited by DiapDealer; 08-03-2023 at 02:29 PM.
Old 05-09-2016, 10:51 AM   #2
KevinH
Sigil Developer
Posts: 8,158
Karma: 5450818
Join Date: Nov 2009
Device: many
Wow!

Nicely done!

KevinH
Old 05-09-2016, 12:22 PM   #3
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Thanks! Mammoth is truly doing all the heavy lifting at the moment, though. I just made the necessary modules portable and namespaced them (so they could never conflict with the PyPI versions) and slapped a Sigil plugin wrapper around them.

I was pleasantly surprised at how well mammoth performed out of the box. Now I need to familiarize myself with it more so I can start tweaking.
Old 05-11-2016, 06:39 PM   #4
JSWolf
Resident Curmudgeon
Posts: 76,402
Karma: 136466962
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Which produces a better ePub, this plugin or Calibre?
Old 05-11-2016, 07:06 PM   #5
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by JSWolf
Which produces a better ePub, this plugin or Calibre?
Can we please not turn this into a competition? For the moment, this Sigil plugin does little but create a barebones epub. There's no css being generated (currently) so that will have to be supplied by the user after-the-fact. It does make some pretty-clean html, though (provided the docx was styled relatively competently). But it should certainly be considered a work in progress right now.

Last edited by DiapDealer; 05-11-2016 at 11:30 PM.
Old 05-12-2016, 08:46 PM   #6
st_albert
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
Well, FWIW it works on my Kubuntu 14.04, 32-bit system (sigil 0.9.4).

I didn't have a genuine .docx file handy, so I loaded an .odt into LibreOffice and saved it as .docx -- which leads to my question...

The .odt document had been styled with several custom paragraph and character styles, but these were not preserved (i.e. not even the class names) in the epub. It did identify headers (all coded as h1) and all paragraphs as plain p, regardless of whatever style was used in the original document.

Is this to be expected at this stage, or is it because the .docx via LibreOffice isn't quite legit?

Anyway, quite an interesting plugin!

Albert
Old 05-12-2016, 09:39 PM   #7
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by st_albert
Is this to be expected at this stage, or is it because the .docx via LibreOffice isn't quite legit?
Custom style mappings are an inherent feature of the underlying Mammoth Python Module. I just haven't knocked together a way for users to make/use/save their own custom style maps with the plugin yet. I hope to soon. It shouldn't really matter whether the docx was made with Word or LibreOffice in that regard.
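
Since a style map refers to styles by name, it can help to see which style names a given docx actually defines, whether it came from Word or LibreOffice. The following is a sketch using only the Python standard library; list_docx_styles is a hypothetical helper, not something Mammoth or the plugin provides.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/styles.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_docx_styles(docx):
    """Return (type, name) pairs for the styles defined in word/styles.xml.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    styles = []
    for style in root.iter(W + "style"):
        name = style.find(W + "name")
        if name is not None:
            # type is "paragraph", "character", "table" or "numbering"
            styles.append((style.get(W + "type"), name.get(W + "val")))
    return styles
```

Calling list_docx_styles("mybook.docx") (the file name is a placeholder) gives the names to use in p[style-name='...'] and r[style-name='...'] mappings.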
Old 05-13-2016, 03:07 PM   #8
JSWolf
Resident Curmudgeon
Posts: 76,402
Karma: 136466962
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by DiapDealer
Can we please not turn this into a competition? For the moment, this Sigil plugin does little but create a barebones epub. There's no css being generated (currently) so that will have to be supplied by the user after-the-fact. It does make some pretty-clean html, though (provided the docx was styled relatively competently). But it should certainly be considered a work in progress right now.
It's not a competition. It was a valid question to know whether to use the plugin or use Calibre.
Old 05-13-2016, 06:06 PM   #9
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by JSWolf
It's not a competition. It was a valid question to know whether to use the plugin or use Calibre.
It's also sort of an impossible question to answer. What constitutes a "better epub" is entirely subjective: what one person loves--another hates. It was also an unnecessary question since you could easily try both and answer your own question.
Old 05-15-2016, 02:49 PM   #10
st_albert
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
Quote:
Originally Posted by DiapDealer
Custom style mappings are an inherent feature of the underlying Mammoth Python Module. I just haven't knocked together a way for users to make/use/save their own custom style maps with the plugin yet. I hope to soon. It shouldn't really matter whether the docx was made with Word or LibreOffice in that regard.
Yes, I see that now that I've read up a little on Mammoth. Perhaps all it would take would be a preference item that passes a pointer to the style_map file; the default of which could contain a simple demonstration of the syntax for style mapping. Looks pretty flexible, btw.

As you are no doubt aware, the Writer2Latex add-on package for LibreOffice/OpenOffice contains, besides the add-ons, a stand-alone Java utility that can be used to produce an epub from an OpenDocument (.odt) file. Since, for my sins, I'm the guy that gets to clean up and normalize the Word .doc files from the authors for conversion to epub or placement in InDesign, I use writer2latex a lot.

Seems like Mammoth is a work-alike program. How nice it would be to directly import the .odt or .docx file into sigil without the manual conversion step!

ETA: Of course the add-in will export an epub directly from LibreOffice, but the stand-alone is much more flexible (IMHO) and it's easy to modify the configuration as needed.

Last edited by st_albert; 05-15-2016 at 03:01 PM. Reason: afterthought
Old 05-21-2016, 05:18 AM   #11
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by st_albert
.../... It did identify headers (all coded as h1) and all paragraphs as plain p, regardless of whatever style was used in the original document..../...
Anyway, quite an interesting plugin!

Albert
This is to confirm these findings.
I did a test with two genuine - and plain - docx files.

- structure (chapter titles as h1) was kept. I only had to regenerate the toc.ncx.
- paragraphs are all transformed to plain p
- italics are kept
- footnotes with return links were all kept correctly.

This plugin can already save a lot of time.

Edit: I did a test with a long book with h1 and h2 headings and there was no problem.

* I also use writer2latex (however not the standalone Java tool but the writer2xhtml extension for LibreOffice). It's very precise and highly commendable.

Last edited by roger64; 05-23-2016 at 03:51 AM. Reason: Edit: about headings
Old 05-21-2016, 06:26 AM   #12
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
There is a good chance that if a style in use includes italic (or bold, etc.), the italic will be lost when the paragraph is transformed to a plain p.

Last edited by Toxaris; 05-21-2016 at 10:15 AM.
Old 05-21-2016, 07:13 AM   #13
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
You are right. The italics that were kept were plain words or expressions between em tags. Sorry for this.
Old 05-21-2016, 10:22 AM   #14
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
I know the issue that causes this, and it is difficult to avoid. The only way to 'solve' it is to examine the Word style and apply the italics directly to the words/paragraphs that have that style before converting them to a standard paragraph. This is not as easy as it sounds, though... It is also not something Diap can easily solve; it should be part of the mammoth library.
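
As a rough illustration of the "examine the Word style" step only, the sketch below reads a styles.xml string and reports which styles switch on italic or bold in their own run properties. It is not what Mammoth does, the function names are made up, and inheritance through w:basedOn is ignored, which is a large part of why the real problem is harder than it looks.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(run_props, tag):
    """True if a toggle such as <w:i/> is present and not switched off."""
    element = run_props.find(W + tag)
    if element is None:
        return False
    return element.get(W + "val", "true") not in ("0", "false", "off")


def styles_with_emphasis(styles_xml):
    """Map style name -> CSS declarations implied by its own run properties.

    Assumption: only properties set directly on the style are considered.
    """
    root = ET.fromstring(styles_xml)
    found = {}
    for style in root.iter(W + "style"):
        name = style.find(W + "name")
        run_props = style.find(W + "rPr")
        if name is None or run_props is None:
            continue
        css = []
        if _is_on(run_props, "i"):
            css.append("font-style: italic")
        if _is_on(run_props, "b"):
            css.append("font-weight: bold")
        if css:
            found[name.get(W + "val")] = css
    return found
```

The result tells you which styles would lose their emphasis if they were flattened to a plain p without a class.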

I have a lot of experience with this kind of issue due to my work on the add-in. That is part of the reason I ended up using another method of generating html.
Old 05-21-2016, 10:58 AM   #15
DiapDealer
Grand Sorcerer
Posts: 28,040
Karma: 199464182
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
My plan is to leave it entirely up to the user through Mammoth's style mappings and the user's own css templates.

I'm not really envisioning this plugin being used by user A to convert user B, C, D, and E's docx files automagically. I envision it being used by a writer/user who's adopted a standard for styling all their docx documents. That way, they create a custom Mammoth style-map (or a few) and an associated stylesheet. Once they have that in place, they can focus on creating their Word/LibreOffice documents. The style-map will take care of mapping all standard and custom docx styles/headings to specific html/class-names (with associated css).

In other words ... documents will be created that conform with a pre-existing style-map/css, rather than creating a style-map/css to accommodate each particular document (though the latter is still doable provided the user doesn't mind the extra work).
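
One way to keep a style-map and its stylesheet in step is a small consistency check like the sketch below, which lists class names used on the html side of a style map that have no selector in the css. The function names are hypothetical and the css parsing is deliberately naive (it just drops everything between braces), so treat it as a starting point rather than a validator.

```python
import re

# a class name following a dot, e.g. the "warning" in "h1.warning"
CLASS_RE = re.compile(r"\.([A-Za-z_][\w-]*)")


def classes_in_style_map(style_map):
    """Collect class names from the html path (right of "=>") of each mapping."""
    classes = set()
    for line in style_map.splitlines():
        line = line.strip()
        # skip blank lines, comment lines and anything that is not a mapping
        if not line or line.startswith("#") or "=>" not in line:
            continue
        html_path = line.split("=>", 1)[1]
        classes.update(CLASS_RE.findall(html_path))
    return classes


def classes_in_css(css):
    """Collect class names that appear in selectors (declaration blocks removed)."""
    selectors_only = re.sub(r"\{[^}]*\}", " ", css)
    return set(CLASS_RE.findall(selectors_only))


def unstyled_classes(style_map, css):
    """Return the sorted class names the style map emits but the css never styles."""
    return sorted(classes_in_style_map(style_map) - classes_in_css(css))
```

An empty result means every class the style map produces has at least one rule in the stylesheet.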

Last edited by DiapDealer; 05-21-2016 at 11:01 AM.