Cleans and converts html docs derived from MS Word, ODT (Writer) or Google Doc to epub format.
Requirements
Plugin Type: Input
MIT Licence (OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows/OSX/Linux
Tested on Windows 7, 8 & 10 only
Current Version: "0.1.8"
Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).
* Click Add Plugin and select HTML2Epub_vxxx.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Input > HTML2Epub.
Description
This input plugin will import and convert various html doctypes to full epub format in Sigil. The main purpose of this plugin is to help users more easily and rapidly convert their html documents directly to standard epub format.
This plugin converts your html doc (with the styling you gave it in html) into a reflowable epub without any frills.
Users should only use html docs derived from the following doctypes with this plugin:
Word doc, Word docx (both saved as Web Page, Filtered), ODF Writer (LO or OO only) and Google Doc (saved as html, zipped). The plugin no longer supports html derived from AbiWord because AbiWord is no longer distributed or supported for Windows (changed in v0.1.7).
Features
This plugin does the following tasks:
* Thoroughly cleans the html file and ensures epub 2 html compliance.
* Creates a stylesheet that preserves all layout and styling after conversion to epub.
* Trims the stylesheet and removes any unneeded or redundant style properties.
* Transforms and ports all in-line styling to the stylesheet.
* Preserves all internal links, external links and valid bookmarks.
* Removes all unused bookmarks.
* Splits the html file into xhtml files at heading boundaries according to the heading style selected by the user (see User Options).
* Adds an ebook cover image to the epub.
* Imports html ebook images with all height/width values as a % of current screen width.
* Adds the necessary basic metadata to the epub.
* Formats all epub text as default serif throughout.
* Converts all absolute values to relative "em" values in the css.
* Adds globals and presets to the css to help guard against common Look Inside issues for KDP uploads.
* Tables and embedded fonts are not supported.
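The plugin's own conversion code isn't shown here. As a rough illustration of the "absolute values to relative em values" step above, here is a minimal Python sketch that assumes 1em equals a 12pt (16px) base font size. The `to_em` helper and its unit table are illustrative assumptions, not the plugin's actual implementation.

```python
import re

# Assumption: 1em corresponds to a 12pt base font size (12pt == 16px at 96dpi).
# The plugin's real base size and rounding rules are not documented.
BASE_PT = 12.0

# Conversion factors from each absolute CSS unit to points.
TO_PT = {
    "pt": 1.0,
    "px": 72.0 / 96.0,   # CSS reference pixel: 96px per inch, 72pt per inch
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}

# A number immediately followed by an absolute unit, e.g. "18pt", "0.5in", "24px".
ABS_UNIT = re.compile(r"(-?\d+(?:\.\d+)?)(pt|px|in|cm|mm)\b")


def to_em(css_text):
    """Rewrite absolute lengths in css_text as em values (hypothetical helper)."""
    def repl(match):
        value, unit = float(match.group(1)), match.group(2)
        em = value * TO_PT[unit] / BASE_PT
        # Round to 3 decimals and drop trailing zeros, e.g. "1.500" -> "1.5".
        return "{:.3f}".format(em).rstrip("0").rstrip(".") + "em"
    return ABS_UNIT.sub(repl, css_text)


print(to_em("p { font-size: 12pt; margin-left: 0.5in; text-indent: 24px; }"))
# p { font-size: 1em; margin-left: 3em; text-indent: 1.5em; }
```

A fixed base size keeps the sketch simple; a real converter might instead derive the base from the document's body font size. Values already in em (or %) pass through unchanged.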
Edit eBook Metadata(via dialog)
This dialog collects the basic ebook metadata that is required for an ebook.
User Options(via dialog)
This options dialog sets the main heading style used -- either h1 or h2 -- for all chapters or main headings in your ebook. The selected heading style will be used to split the html file into separate epub xhtml files and will also be used to automatically create the epub TOC page and the NCX TOC.
An extra option lets the user automatically generate a single-level TOC section in the epub (added in v0.1.8).
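Splitting at heading boundaries and building a single-level TOC can be pictured with the following Python sketch. It is not the plugin's code: it uses simple regular expressions (adequate for clean, already-converted html, but not a general html parser), and the `split_at_headings` and `toc_entries` helpers and the `SectionNNNN.xhtml` file names are hypothetical.

```python
import re


def split_at_headings(body_html, level="h1"):
    """Split body_html into chunks that each begin at an opening <level> tag.

    Any content before the first heading (e.g. a title page) becomes its own chunk.
    """
    # Match opening tags only ("<h1>" or "<h1 class=...>"), never "</h1>".
    opener = re.compile(r"<{}[\s>]".format(level), re.IGNORECASE)
    starts = [m.start() for m in opener.finditer(body_html)]
    if not starts:
        return [body_html]
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(body_html)]
    return [body_html[a:b] for a, b in zip(bounds, bounds[1:])]


def toc_entries(chunks, level="h1"):
    """Return (file name, heading text) pairs for a single-level TOC.

    File names follow a hypothetical SectionNNNN.xhtml numbering scheme.
    """
    heading = re.compile(r"<{0}[^>]*>(.*?)</{0}>".format(level),
                         re.IGNORECASE | re.DOTALL)
    entries = []
    for number, chunk in enumerate(chunks, start=1):
        match = heading.search(chunk)
        if match:
            # Strip inline markup such as <i> or <span> from the heading text.
            title = re.sub(r"<[^>]+>", "", match.group(1)).strip()
            entries.append(("Section{:04d}.xhtml".format(number), title))
    return entries


html = '<p>Title page</p><h1>One</h1><p>a</p><h1 class="c">Two <i>b</i></h1><p>b</p>'
chunks = split_at_headings(html)
print(toc_entries(chunks))
# [('Section0002.xhtml', 'One'), ('Section0003.xhtml', 'Two b')]
```

Because each chunk starts at its heading, the heading text doubles as the TOC label, which is why a consistent h1 or h2 style matters for both splitting and the TOC.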
Plugin Run
For LO and OO html doctypes, ensure that both the html doc and all associated images are put into a separate dedicated folder. For all other html doctypes, just ensure that the html doc + images folder are both in the same directory on your computer. Then just run the plugin.
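As an optional pre-flight check before running the plugin, a short Python script can confirm that every image the html doc references actually exists relative to the html file. The `missing_images` helper is hypothetical and not part of the plugin; it assumes relative `src` paths (such as the images folder Word creates next to the html file) and reads the file as UTF-8, replacing any undecodable bytes rather than failing.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> tags are routed here by HTMLParser as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def missing_images(html_path):
    """Return the img src values that do not resolve to an existing local file."""
    with open(html_path, encoding="utf-8", errors="replace") as f:
        parser = ImgSrcCollector()
        parser.feed(f.read())
    base = os.path.dirname(os.path.abspath(html_path))
    missing = []
    for src in parser.srcs:
        # Skip remote and inline images; only local files matter here.
        if "://" in src or src.startswith("data:"):
            continue
        if not os.path.isfile(os.path.join(base, unquote(src))):
            missing.append(src)
    return missing
```

An empty list means every referenced image was found where the html doc expects it.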
After running this plugin, it is also advisable to run Tools > Delete Unused Stylesheet Classes or the cssRemoveUnusedSelectors plugin to remove any empty or unused styles from the CSS.
Caveat
For best results, you should style all your headings, reading text and spacing using paragraph styles in your word-processor doc before converting it to html. At a minimum, users should use either the h1 or h2 heading style for all chapter headings and main headings in their html docs.
There's no need to create a TOC in your html doc because a single-level TOC page will automatically be created by the plugin.
Tables, endnotes and embedded fonts are not supported by this plugin.
Ensure that all images in your html doc have filenames that contain no spaces, otherwise the plugin will fail.
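Because a space in an image filename makes the plugin fail, it can help to scan the images folder first. The following is a minimal sketch with a hypothetical `find_spaced_image_names` helper; the set of image extensions it checks is an assumption.

```python
import os

# Assumed set of image extensions worth checking; extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".bmp"}


def find_spaced_image_names(folder):
    """Return sorted paths of image files under folder whose names contain spaces."""
    problems = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in IMAGE_EXTS and " " in name:
                problems.append(os.path.join(root, name))
    return sorted(problems)
```

Pass it the folder that holds your html doc and images; if you rename any files it reports, also update the matching `src` references in the html (or rename the images in your word-processor doc and re-export).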
Try to avoid using fake smallcaps in your doc, because nested font styles may cause errors.
Don't put decorative images above your ebook title or chapter headings, as this will cause errors. Instead, add your decorative images in Sigil using Insert > File after you have converted your html doc to epub format.
This plugin converter does have its limitations and isn't meant to compete with other well-known converters like Calibre. But it should still be quite useful for some because it's so easy to use, and the epub it produces usually passes Epubcheck with minimal issues. After conversion, users should have a good starting point -- a clean, basic epub to which they can manually add any final touches in Sigil before ebook upload.
Changes:
v0.1.8
-- A new option has been added to the User Options dialog which allows the user to automatically add a new single-level TOC section to the epub. Any existing HTML TOC section will always be preserved in the epub.
-- Fixed a bug concerning the transformation of html internal links to epub format.
-- Fixed a naked span bug.
-- Improved dialog buttons for Linux users.
v0.1.7
-- Fixed a problem with ODT html images access.
-- The plugin no longer supports html derived from AbiWord because AbiWord is no longer distributed or supported for Windows.
-- Other minor fixes and stability improvements.
v0.1.6
-- Fixed a bug in html cleanup.
v0.1.5
-- Fixed a uuid problem affecting the uuids generated in the content.opf and toc.ncx metadata, which was causing uuid errors during Epubcheck.
v0.1.4
-- Fixed problems with image folder name generation for both Word and AbiWord html docs.
v0.1.3
-- Added a delay between dialog calls due to sensitivity issues causing the second dialog to inconveniently disappear.
v0.1.2
-- Fixed a minor problem with anchor tags that contain no attributes. These are now removed by the plugin.
v0.1.1
-- Changed text in the User Options dialog to be less confusing
v0.1.0
-- Initial release