Old 03-23-2018, 10:23 PM   #1
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
[Plugin] HTML2Epub - converts various html doctypes to full epub format

Cleans and converts html docs derived from MS Word, ODT(Writer) or Google Doc to epub format.


Requirements
Plugin Type: Input
MIT Licence(OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows/OSX/Linux
Tested on Windows 7, 8 & 10 only
Current Version: "0.1.8"

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).
* Click Add Plugin and select HTML2Epub_vxxx.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Input > HTML2Epub.

Description
This input plugin will import and convert various html doctypes to full epub format in Sigil. The main purpose of this plugin is to help users more easily and rapidly convert their html documents directly to standard epub format.

This plugin converts your html doc (as you have styled it in html) into a reflowable epub without any frills.

Users should only use html docs derived from the following doctypes with this plugin: Word doc, Word docx (both saved as Web Page HTML Filtered), ODF Writer (LO or OO only), Google Doc (saved as html, zipped). The plugin no longer supports html derived from AbiWord because AbiWord is no longer distributed or supported for Windows (changed in v0.1.7).

Features
This plugin does the following tasks:

* Thoroughly cleans out the html file and ensures epub 2 html compliance as well.
* Creates a stylesheet that preserves all layout and styling after conversion to epub.
* Trims the stylesheet and removes any unneeded or redundant style properties.
* Transforms and ports all in-line styling to the stylesheet.
* Preserves all internal links, external links and valid bookmarks.
* Removes all unused bookmarks.
* Splits the html file into xhtml files at heading boundaries according to the heading style selected by the user (see User Options).
* Adds an ebook cover image to the epub.
* Imports html ebook images with all height/width values as a % of current screen width.
* Adds the necessary basic metadata to the epub.
* Formats all epub text as default serif throughout.
* Converts all absolute values to relative "em" values in the css.
* Adds globals and presets to the css to help guard against common Look Inside issues for KDP uploads.
* Tables and embedded fonts are not supported.
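
As an illustration of the pt-to-em conversion listed above, here is a minimal sketch. It is not the plugin's actual code: the 12pt base size, the one-decimal rounding and the names `BASE_PT` and `pt_to_em` are all assumptions, chosen only because they agree with the margin and font-size values in the Word sample posted later in this thread.

```python
import re

# Assumed base font size in points; the plugin's real base is not stated.
BASE_PT = 12.0

# Matches lengths such as "6.0pt", "18pt" or "-.2pt".
PT_LENGTH = re.compile(r"(-?\d*\.?\d+)pt\b")


def pt_to_em(css_text, base_pt=BASE_PT):
    """Replace every point length in css_text with a rounded em value."""
    def convert(match):
        em = float(match.group(1)) / base_pt
        # Round to one decimal (an assumption), then drop a trailing ".0".
        text = "{:.1f}".format(em).rstrip("0").rstrip(".")
        if text == "-0":
            text = "0"  # tiny negative values collapse to plain zero
        return text + "em"

    return PT_LENGTH.sub(convert, css_text)
```

With these assumptions, `pt_to_em("margin-top:6.0pt")` gives `margin-top:0.5em`, and a very small value such as `-.2pt` collapses to `0em`.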

Edit eBook Metadata (via dialog)
This dialog collects the basic ebook metadata that is required for an ebook.

User Options (via dialog)
This options dialog sets the main heading style used -- either h1 or h2 -- for all chapters or main headings in your ebook. The selected heading style will be used to split the html file into separate epub xhtml files and will also be used to automatically create the epub TOC page and NCX TOC.
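
To picture what splitting at heading boundaries means, here is a rough sketch. The plugin's real splitting code is not shown in this thread, so the regex approach and the helper names `split_at_headings` and `heading_title` are hypothetical. A production version would more likely use a proper HTML parser.

```python
import re


def split_at_headings(body_html, tag="h1"):
    """Split body markup into chunks, each starting at an opening <tag>."""
    pattern = r"<%s\b" % re.escape(tag)
    starts = [m.start() for m in re.finditer(pattern, body_html, re.IGNORECASE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any front matter before the first heading
    starts.append(len(body_html))
    chunks = [body_html[a:b] for a, b in zip(starts, starts[1:])]
    return [c for c in chunks if c.strip()]


def heading_title(chunk, tag="h1"):
    """Return the plain text of the first <tag> heading in chunk, or None."""
    pattern = r"<%s\b[^>]*>(.*?)</%s>" % (re.escape(tag), re.escape(tag))
    match = re.search(pattern, chunk, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    # Strip any inline tags (spans, anchors) from the heading text.
    return re.sub(r"<[^>]+>", "", match.group(1)).strip()
```

Each chunk would become one xhtml file, and the collected heading titles are what a single-level TOC could be built from.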

An extra option allows the user to automatically generate a single-level TOC section in the epub (added in v0.1.8).

Plugin Run
For LO and OO html doctypes, ensure that both the html doc and all associated images are put into a separate dedicated folder. For all other html doctypes, just ensure that the html doc + images folder are both in the same directory on your computer. Then just run the plugin.

After running this plugin it would also be advisable to run Tools > Delete Unused Stylesheet Classes or the cssRemoveUnusedSelectors plugin to remove any empty or unused styles in the CSS.

Caveat
For best results you should ensure that you style all your headings, reading text and spacing using paragraph styles in your word-processor doc before conversion to html. Users should also ensure that they use either the h1 or h2 heading style for all chapter headings and main headings in their html docs.

There's no need to create a doc toc in your html doc because a single-level TOC page will automatically be created by the plugin.

Tables, endnotes and embedded fonts are not supported by this plugin.

Ensure that all images in your html doc have filenames that contain no spaces, otherwise the plugin will fail.
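
A quick way to check for this before running the plugin is a small script like the one below. It is only a sketch: the extension list and the function name `images_with_spaces` are assumptions, and it only reports the offending files. If you rename them, the image references inside the html doc have to be updated to match.

```python
from pathlib import Path

# Assumed set of image extensions to check.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def images_with_spaces(folder):
    """Return image files under folder whose filenames contain a space."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS and " " in p.name
    )
```

Calling `images_with_spaces("path/to/your/html/folder")` returns a list of paths; an empty list means there are no image filenames with spaces.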

Try to avoid using fake smallcaps in your doc, because nested font styles may cause errors.

Don't put decorative images above your ebook title or chapter headings as this will cause errors. You can, instead, just add in your decorative images in Sigil using Insert > File after you have converted your html doc to epub format.

This plugin converter does have its limitations and isn't meant to compete with other well-known converters like Calibre. But this plugin should still be quite useful for some because it's so easy to use and it should give you an epub that usually passes Epubcheck with minimal issues. After conversion, users should have a good starting point -- a clean, basic epub where they can manually add any final touches in Sigil before ebook upload.

Changes:

Spoiler:

v0.1.8
-- A new option has been added to the User Options dialog which allows the user to automatically add a new single-level TOC section to the epub. Any existing HTML TOC section will always be preserved in the epub.
-- Fixed a bug concerning the transformation of html internal links to epub format.
-- Fixed a naked span bug.
-- Improved dialog buttons for Linux users.
v0.1.7
-- Fixed a problem with ODT html images access.
-- The plugin no longer supports html derived from AbiWord because AbiWord is no longer distributed or supported for Windows.
-- Other minor fixes and stability improvements.
v0.1.6
-- Fixed a bug in html cleanup.
v0.1.5
-- Fixed a uuid problem affecting the uuids generated in the content.opf and toc.ncx metadata, which was causing uuid errors during Epubcheck.
v0.1.4
-- Fixed problems with image folder name generation for both Word and AbiWord html docs.
v0.1.3
-- Added a delay between dialog calls due to sensitivity issues causing the second dialog to inconveniently disappear.
v0.1.2
-- Fixed a minor problem with anchor tags that contain no attributes. These are now removed by the plugin.
v0.1.1
-- Changed text in the User Options dialog to be less confusing.
v0.1.0
-- Initial release
Attached Thumbnails: Metadata_Dialog.JPG, User_Options_dialog.JPG
Attached Files
File Type: zip HTML2Epub_v018.zip (1.12 MB, 1993 views)

Last edited by slowsmile; 01-17-2021 at 02:15 AM.
Old 03-24-2018, 12:04 AM   #2
slowsmile
Quick Update: I've changed the wording in the User Options dialog. I've made it less techie and more plain English, so that there's no confusion or difficulty in understanding what's needed (v0.1.1).

Last edited by slowsmile; 03-24-2018 at 03:03 AM.
Old 03-24-2018, 05:10 AM   #3
slowsmile
Update: Fixed a minor problem caused by anchor tags that contain no attributes. These are now automatically removed by the plugin (v0.1.2).

Last edited by slowsmile; 03-24-2018 at 08:36 AM.
Old 03-24-2018, 02:22 PM   #4
patrik
Guru
Posts: 647
Karma: 4566069
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
Would this plugin be a good choice for html->epub with the html coming from kindleunpacking a mobi/kf7? (If not, what would be the best option?)
Old 03-24-2018, 04:39 PM   #5
slowsmile
@patrik: I'm afraid that this plugin would probably give errors with html from KindleUnpack, because my plugin always converts html --> epub. Here are some of the other reasons why it would fail:

* The plugin looks for the html doctype in the metadata part of the html (in the meta tags) to identify valid doctypes that can be used with this plugin.

* The html styling layout -- in between the html <style></style> tags -- will be unique depending on what particular html doctype you are using.
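
To make the first point concrete, here is a minimal sketch of that kind of check. It is not the plugin's real code: it assumes the source application announces itself in a `<meta name="generator">` tag, and the marker strings in `KNOWN_GENERATORS` are illustrative guesses. Google Docs is left out because its marker is not known here. Html from KindleUnpack would not match any of these markers.

```python
from html.parser import HTMLParser

# Assumed marker substrings; the plugin's actual checks are not shown here.
KNOWN_GENERATORS = {
    "microsoft word": "Word (filtered HTML)",
    "libreoffice": "LibreOffice Writer",
    "openoffice": "OpenOffice Writer",
}


class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "generator":
            self.generator = attr.get("content") or ""


def detect_source(html_text):
    """Return a label for a recognised source application, else None."""
    finder = GeneratorFinder()
    finder.feed(html_text)
    generator = (finder.generator or "").lower()
    for marker, label in KNOWN_GENERATORS.items():
        if marker in generator:
            return label
    return None
```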

I'm not really sure what you are trying to do because whenever you use DiapDealer's KindleImport plugin -- which also uses KindleUnpack -- then this will always give you an epub from a mobi. So why the need to convert from html --> epub?

Last edited by slowsmile; 03-24-2018 at 04:49 PM.
Old 03-25-2018, 04:18 AM   #6
patrik
Ok, thanks. (And yes, KindleImport does what I wanted to do in this case. I had it downloaded but not installed in Sigil for whatever reason, so it was out of mind. Thanks!)
Old 03-26-2018, 03:52 AM   #7
slowsmile
Update: Added a delay between dialog calls due to sensitivity issues causing the second dialog to inconveniently disappear (v0.1.3).
Old 03-26-2018, 05:00 AM   #8
slowsmile
Update: Fixed problems with image folder name generation for both Word and AbiWord html docs (v0.1.4).

Last edited by slowsmile; 03-26-2018 at 05:13 AM.
Old 12-09-2018, 06:25 PM   #9
slowsmile
Update: The following bug has been fixed in v0.1.5:

* Fixed a uuid problem affecting the uuids generated in the content.opf and toc.ncx metadata, which was causing uuid errors during Epubcheck.
Old 06-08-2020, 10:43 PM   #10
AlanHK
Guru
Posts: 667
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
I used the plugin to import an HTML file exported from Word as "web page, filtered".


Sample HTML para:


<p class=MsoNormal style='margin-top:6.0pt;margin-right:0cm;margin-bottom:6.0pt;
margin-left:0cm;text-indent:36.0pt'><span style='font-size:18.0pt;letter-spacing:
-.2pt;font-style:normal'>And yet, despite his distinguished ancestry, despite
his celebrated historical novels, and despite his glorious Boer </span><span
style='font-size:18.0pt;letter-spacing:-.1pt;font-style:normal'>War record,
Conan Doyle is best known to the world for </span><span style='font-size:18.0pt;
letter-spacing:-.25pt;font-style:normal'>having created Sherlock Holmes.</span></p>



Output in epub:

<p class="Normal sgc-4"><span class="sgc-1">And yet, despite his distinguished ancestry, despite his celebrated historical novels, and despite his glorious Boer</span> <span class="sgc-2">War record, Conan Doyle is best known to the world for</span> <span class="sgc-3">having created Sherlock Holmes.</span></p>

p.sgc-4 {
margin-top: 0.5em;
margin-right: 0;
margin-bottom: 0.5em;
margin-left: 0;
text-indent: 36.0pt
}
span.sgc-3 {
font-size: 1.5em;
letter-spacing: 0em;
font-style: normal
}
span.sgc-2 {
font-size: 1.5em;
letter-spacing: 0em;
font-style: normal
}
span.sgc-1 {
font-size: 1.5em;
letter-spacing: 0em;
font-style: normal
}



For an entire book, there were about 300 styles created, many identical as above or differing only by "letter-spacing". I assume the 3 span styles here were rounded down from the small letter-spacing values (-.2pt, -.1pt, -.25pt) in the source. It would have been nice if they had then been combined into a single style.

After wasting a few hours trying to clean that up in Sigil, I went back to the source HTML and deleted all the letter-spacing styling with a text editor and reimported.
Now there was a manageable number of styles; though again, several were identically defined.

I suggest that the importer just ignore all letter-spacing formatting. Or at least, have that as a default option, though I've never seen any ebook where letter spacing was appropriate in body text. And ideally, merge styles with identical definitions.
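
As a rough idea of what merging identical definitions could look like, here is a sketch that groups selectors sharing the same declaration block. It is not something the plugin does: the function name `merge_identical_rules` is made up, it only handles flat rules (no @media blocks or comments), and moving a rule up to join its first twin can change the cascade in edge cases. Because selectors are grouped rather than renamed, the class names in the xhtml files would not need to change.

```python
import re

# Matches one flat "selector { declarations }" rule.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def normalise(declaration):
    """Normalise case and whitespace around 'name: value'."""
    name, _, value = declaration.partition(":")
    return "%s: %s" % (name.strip().lower(), " ".join(value.split()))


def merge_identical_rules(css_text):
    """Group selectors whose declaration blocks are identical, in order."""
    groups = {}  # relies on dict insertion order (Python 3.7+)
    for selector, body in RULE.findall(css_text):
        decls = tuple(normalise(d) for d in body.split(";") if d.strip())
        groups.setdefault(decls, []).append(selector.strip())
    rules = []
    for decls, selectors in groups.items():
        if not decls:
            continue  # drop rules with no declarations at all
        lines = "\n".join("  %s;" % d for d in decls)
        rules.append("%s {\n%s\n}" % (", ".join(selectors), lines))
    return "\n".join(rules)
```

Run on the three span styles above, this would emit a single rule with the selector list `span.sgc-3, span.sgc-2, span.sgc-1`.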

Last edited by AlanHK; 06-09-2020 at 02:11 AM.
Old 06-09-2020, 11:38 PM   #11
slowsmile
My plugin doesn't create or reinterpret styles in your HTML doc - it just ports them as they are to the Styles directory and creates a CSS. However, it also transforms and ports all span tag styling to the CSS as is. I also don't really think it's wise to remove all letter-spacing attributes from the CSS stylesheet. After all, when everyone is considered, there might be plugin users who have deliberately used letter-spacing for a good reason in their HTML doc -- such as for adjusting character spacing in the title or in their headings. And if all you want to do is remove the letter-spacing: 0em; attribute from the CSS, you should be able to do that easily and quickly in one hit by using Search and Replace in Sigil.

Last edited by slowsmile; 06-10-2020 at 04:10 AM.
Old 06-10-2020, 05:38 AM   #12
AlanHK
Quote:
Originally Posted by slowsmile
And if all you want to do is remove the letter-spacing: 0em; attribute from the CSS, you should be able to do that easily and quickly in one hit by using Search and Replace in Sigil.
Yes I can. I did.
Also, in the full file there were many other letter-spacing declarations in the styles, also not hard to delete: "letter-spacing:.+;"
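
For anyone doing the same clean-up outside Sigil, here is a hedged sketch of that removal in Python. The pattern is a stricter variant of the one above, not the exact search that was used: `.+` is greedy and would run on to the last semicolon on a line, and it does not cross the line break Word puts after `letter-spacing:` in the source html. The name `strip_letter_spacing` is made up for illustration.

```python
import re

# Match one letter-spacing declaration: the property name, its value up to
# the next ';', quote or '}', and the trailing semicolon and whitespace.
LETTER_SPACING = re.compile(r"letter-spacing\s*:\s*[^;'\"}]*;?\s*", re.IGNORECASE)


def strip_letter_spacing(text):
    """Remove every letter-spacing declaration from CSS or style attributes."""
    return LETTER_SPACING.sub("", text)
```

This works on both the stylesheet and the inline `style='...'` attributes in the Word export, but it would also hit the literal text "letter-spacing:" if that ever appeared in the book's body text.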

The problem is that this left 300 styles, scores of them identical and many empty. And a file with thousands of spans that were 99% redundant. Every paragraph was split by spans, sometimes between letters in a word. That would have taken all day to simplify. Since the "letter-spacing" styles were interspersed with actual necessary formatting, like italics, I could not just delete or unify styles in groups by regex; the only way was one by one.

Letter-spacing in Word exports in body text is always garbage.
And almost always garbage even in headings.
If you think someone might want this, make it optional.

These tags are not from design choices made by anyone, they are Word trying to mimic the layout in the original page, and that makes no sense with a different font in a different size page.
Maybe in Word changing all fully justified text to left-aligned or monospace before exporting would stop it doing that.

Anyway, just my feedback. Up to you.

Last edited by AlanHK; 06-10-2020 at 05:55 AM.
Old 07-26-2020, 11:39 PM   #13
slowsmile
Plugin Update (v0.1.7):
  • Fixed a problem with ODT html images access.
  • The plugin no longer supports html derived from AbiWord because AbiWord is no longer distributed or supported for MS Windows.
  • Other minor fixes and stability improvements.

Last edited by slowsmile; 07-27-2020 at 02:07 AM.