#1 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
[Plugin] HTML2Epub - converts various html doctypes to full epub format
Cleans and converts html docs derived from MS Word, ODT (Writer) or Google Doc to epub format.

Requirements
* Plugin Type: Input
* MIT Licence (OSI)
* Minimum Sigil requirement: v0.9.3 or higher
* Python Requirements: Python 3.4+ (Bundled or External)
* OS Requirements: Windows/OSX/Linux (tested on Windows 7, 8 & 10 only)
* Current Version: "0.1.8"

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ should be installed on your computer to run this plugin externally).
* Click Add Plugin and select HTML2Epub_vxxx.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Input > HTML2Epub.

Description
This input plugin will import and convert various html doctypes to full epub format in Sigil. The main purpose of this plugin is to help users more easily and rapidly convert their html documents directly to standard epub format. This plugin effectively converts and transforms your html doc (as you have styled it in html) into a reflowable epub without any frills.

Users should only use html docs derived from the following doctypes with this plugin: Word doc, Word docx (both saved as Web Page HTML Filtered), ODF Writer (LO or OO only), Google Doc (saved as html, zipped). The plugin no longer supports html derived from AbiWord because AbiWord is no longer distributed or supported for Windows (changed in v0.1.7).

Features
This plugin does the following tasks:
* Thoroughly cleans out the html file and ensures epub 2 html compliance as well.
* Creates a stylesheet that preserves all layout and styling after conversion to epub.
* Trims the stylesheet and removes any unneeded or redundant style properties.
* Transforms and ports all in-line styling to the stylesheet.
* Preserves all internal links, external links and valid bookmarks.
* Removes all unused bookmarks.
* Splits the html file into xhtml files at heading boundaries according to the heading style selected by the user (see User Options).
* Adds an ebook cover image to the epub.
* Imports html ebook images with all height/width values as a % of current screen width.
* Adds the necessary basic metadata to the epub.
* Formats all epub text as default serif throughout.
* Converts all absolute values to relative "em" values in the css.
* Adds globals and presets to the css to help guard against common Look Inside issues for KDP uploads.
* Tables and embedded fonts are not supported.

Edit eBook Metadata (via dialog)
This dialog collects the basic ebook metadata that is required for an ebook.

User Options (via dialog)
This options dialog sets the main heading style used -- either h1 or h2 -- for all chapters or main headings in your ebook. The selected heading style will be used to split the html file into separate epub xhtml files and will also be used to automatically create the epub TOC file page and NCX TOC. An extra option allows the user to automatically generate a single-level TOC section in the epub (added in v0.1.8).

Plugin Run
For LO and OO html doctypes, ensure that both the html doc and all associated images are put into a separate dedicated folder. For all other html doctypes, just ensure that the html doc and the images folder are both in the same directory on your computer. Then just run the plugin. After running this plugin it is also advisable to run Tools > Delete Unused Stylesheet Classes or the cssRemoveUnusedSelectors plugin to remove any empty or unused styles in the CSS.

Caveat
* For best results you should ensure that you style all your headings, reading text and spacing using paragraph styles in your word-processor doc before conversion to html.
* Users should also ensure that they use either the h1 or h2 heading style for all chapter headings and main headings in their html docs.
* There's no need to create a doc toc in your html doc because a single-level TOC page will automatically be created by the plugin.
* Tables, endnotes and embedded fonts are not supported by this plugin.
* Ensure that all images in your html doc have filenames that contain no spaces, otherwise the plugin will fail.
* Try to avoid using fake smallcaps in your doc - using nested font styles may cause errors.
* Don't put decorative images above your ebook title or chapter headings as this will cause errors. You can, instead, just add in your decorative images in Sigil using Insert > File after you have converted your html doc to epub format.

This plugin converter does have its limitations and isn't meant to compete with other well-known converters like Calibre. But this plugin should still be quite useful for some because it's so easy to use and it should give you an epub that usually passes Epubcheck with minimal issues. After conversion, users should have a good starting point -- a clean, basic epub where they can manually add any final touches in Sigil before ebook upload.

Changes: Spoiler:
Last edited by slowsmile; 01-17-2021 at 02:15 AM. |
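One caveat in the post above is mechanical enough to check before running the plugin: image filenames must not contain spaces. The following is a minimal pre-flight sketch and is not part of HTML2Epub; the function name `images_with_spaces` and the set of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your html doc references.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def images_with_spaces(folder):
    """Return the sorted relative paths of image files whose name contains a space."""
    root = Path(folder)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in IMAGE_SUFFIXES
        and " " in path.name
    )
```

Calling `images_with_spaces("path/to/your/html/folder")` lists the files to rename. Remember that the `src` attributes in the html doc must be updated to match any renamed file.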
#2 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Quick Update: I've changed the wording in the User Options dialog. I've made it less techie and more plain English, so that there's no confusion or difficulty in understanding what's needed. (v0.1.1)
Last edited by slowsmile; 03-24-2018 at 03:03 AM. |
#3 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Update: Fixed a minor problem caused by anchor tags that contain no attributes. These are now automatically removed by the plugin.(v0.1.2)
Last edited by slowsmile; 03-24-2018 at 08:36 AM. |
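The update above does not say how attribute-less anchors are detected or removed. As a rough sketch only (the pattern and the helper name `unwrap_bare_anchors` are hypothetical and not taken from the plugin), a regular expression can strip `<a>` tags that carry no attributes while keeping their text and leaving real links alone:

```python
import re

# Matches an opening <a> with no attributes, the shortest possible content,
# and the closing </a>. Anchors with href/id/name attributes do not match.
BARE_ANCHOR = re.compile(r"<a\s*>(.*?)</a\s*>", re.IGNORECASE | re.DOTALL)


def unwrap_bare_anchors(html):
    """Drop attribute-less <a>...</a> wrappers but keep the enclosed content."""
    return BARE_ANCHOR.sub(r"\1", html)
```

This assumes anchors are not nested, which holds for valid html. A parser-based approach would be more robust for messy input.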
#4 |
Guru
Posts: 623
Karma: 4566069
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Would this plugin be a good choice for html->epub with the html coming from kindleunpacking a mobi/kf7? (If not, what would be the best option?)
|
#5 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
@patrik: I'm afraid that this plugin would probably give errors from html via KindleUnpack because my plugin always converts html --> epub. Here are some of the other reasons why it would fail:
* The plugin looks for the html doctype in the metadata part of the html (in the meta tags) to identify valid doctypes that can be used with this plugin.
* The html styling layout -- in between the html <style></style> tags -- will be unique depending on what particular html doctype you are using.

I'm not really sure what you are trying to do because whenever you use DiapDealer's KindleImport plugin -- which also uses KindleUnpack -- then this will always give you an epub from a mobi. So why the need to convert from html --> epub?
Last edited by slowsmile; 03-24-2018 at 04:49 PM. |
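The reply says the plugin identifies the source word processor from the meta tags but does not name the tag. The sketch below assumes it is the `generator` meta tag, which word processors commonly write into exported html; the class and function names are hypothetical and the code is not taken from the plugin.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Records the content of the first <meta name="generator"> tag seen."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        # HTMLParser already lowercases tag and attribute names.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "generator":
            self.generator = attr.get("content", "")


def find_generator(html_text):
    """Return the generator string, or None when the html declares none."""
    finder = GeneratorFinder()
    finder.feed(html_text)
    return finder.generator
```

An importer built this way would refuse any file whose generator string is missing or unrecognised, which is one plausible reason html from other tools would be rejected.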
#6 |
Guru
Posts: 623
Karma: 4566069
Join Date: Jan 2010
Location: Sweden
Device: Kobo Forma
|
Ok, thanks. (And yes, KindleImport does what I wanted to do in this case. I had it downloaded but not installed in Sigil for whatever reason thus it was out of mind. Thanks!)
|
#7 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Update: Added a delay between dialog calls due to sensitivity issues causing the second dialog to inconveniently disappear.(v0.1.3)
|
#8 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Update: Fixed problems with image folder name generation for both Word and AbiWord html docs. (v0.1.4)
Last edited by slowsmile; 03-26-2018 at 05:13 AM. |
#9 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Update: The following bug has been fixed in v0.1.5:
* Fixed a uuid problem affecting the uuids generated in the content.opf and toc.ncx metadata, which was causing uuid errors during Epubcheck. |
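The update does not describe the actual bug. One common cause of uuid errors in Epubcheck for epub 2 is that the identifier in content.opf and the `dtb:uid` value in toc.ncx differ. The sketch below is an illustration under that assumption, not the plugin's code: it generates the uuid once and reuses it in both places. The function name and the `BookId` id value are illustrative.

```python
import uuid


def make_identifier_snippets():
    """Build matching identifier lines for content.opf and toc.ncx."""
    book_id = "urn:uuid:" + str(uuid.uuid4())  # generated once, used twice
    opf_line = '<dc:identifier id="BookId" opf:scheme="UUID">{}</dc:identifier>'.format(book_id)
    ncx_line = '<meta name="dtb:uid" content="{}"/>'.format(book_id)
    return book_id, opf_line, ncx_line
```

In the OPF, the `unique-identifier` attribute on the `package` element would then need to point at the same id (`BookId` here).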
#10 |
Guru
Posts: 656
Karma: 567890
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
|
I used the plugin to import an HTML file exported from Word as "web page, filtered".
Sample HTML para:
<p class=MsoNormal style='margin-top:6.0pt;margin-right:0cm;margin-bottom:6.0pt; margin-left:0cm;text-indent:36.0pt'><span style='font-size:18.0pt;letter-spacing: -.2pt;font-style:normal'>And yet, despite his distinguished ancestry, despite his celebrated historical novels, and despite his glorious Boer </span><span style='font-size:18.0pt;letter-spacing:-.1pt;font-style:normal'>War record, Conan Doyle is best known to the world for </span><span style='font-size:18.0pt; letter-spacing:-.25pt;font-style:normal'>having created Sherlock Holmes.</span></p>

Output in epub:
<p class="Normal sgc-4"><span class="sgc-1">And yet, despite his distinguished ancestry, despite his celebrated historical novels, and despite his glorious Boer</span> <span class="sgc-2">War record, Conan Doyle is best known to the world for</span> <span class="sgc-3">having created Sherlock Holmes.</span></p>

p.sgc-4 { margin-top: 0.5em; margin-right: 0; margin-bottom: 0.5em; margin-left: 0; text-indent: 36.0pt }
span.sgc-3 { font-size: 1.5em; letter-spacing: 0em; font-style: normal }
span.sgc-2 { font-size: 1.5em; letter-spacing: 0em; font-style: normal }
span.sgc-1 { font-size: 1.5em; letter-spacing: 0em; font-style: normal }

For an entire book, there were about 300 styles created, many identical as above or only differing by "letter-spacing". The 3 styles here I assume were rounded down from the small letter spacing (-.2pt, -.1pt, -.25pt) in the source. Would have been nice if then they were combined into a single style.

After wasting a few hours trying to clean that up in Sigil, I went back to the source HTML and deleted all the letter-spacing styling with a text editor and reimported. Now there was a manageable number of styles; though again, several were identically defined.

I suggest that the importer just ignore all letter-spacing formatting. Or at least, have that as a default option, though I've never seen any ebook where letter spacing was appropriate in body text. And ideally, merge styles with identical definitions.
Last edited by AlanHK; 06-09-2020 at 02:11 AM. |
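The numbers in the sample are consistent with a 12pt base (18.0pt becomes 1.5em, 6.0pt becomes 0.5em). With one decimal of precision, -.2pt, -.1pt and -.25pt would all land on 0em, which would explain the three identical rules. The sketch below illustrates that arithmetic and one way identical rules could be merged by grouping selectors. It is not the plugin's code; the 12pt base, the one-decimal rounding and both function names are assumptions.

```python
import re

BASE_PT = 12.0  # assumption inferred from the sample: 18.0pt -> 1.5em, 6.0pt -> 0.5em


def pt_to_em(value_pt, places=1):
    """Convert points to an em string, rounded to an assumed precision."""
    em = round(value_pt / BASE_PT, places)
    return "0em" if em == 0 else "{:g}em".format(em)


# Simple flat rules only: selector { declarations }. No @media blocks or comments.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def merge_identical_rules(css):
    """Group selectors whose declaration lists are identical, in the same order."""
    groups = {}
    for selector, body in RULE.findall(css):
        decls = tuple(
            re.sub(r"\s+", " ", decl.strip())
            for decl in body.split(";")
            if decl.strip()
        )
        groups.setdefault(decls, []).append(selector.strip())
    return "\n".join(
        "{} {{ {} }}".format(", ".join(selectors), "; ".join(decls))
        for decls, selectors in groups.items()
    )
```

Grouping selectors keeps the existing class names valid, so the xhtml does not need to change. Declaration order is left untouched because reordering can change which value wins. Moving a rule up to sit with its twin can still affect the cascade in stylesheets where later rules override earlier ones, so the result should be checked.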
#11 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
My plugin doesn't create or reinterpret styles in your HTML doc - it just ports them as they are to the Styles directory and creates a CSS. However, it also transforms and ports all span tag styling to the CSS as is. I also don't really think it's wise to remove all letter-spacing attributes from the CSS stylesheet. After all, when everyone is considered, there might be plugin users who have deliberately used letter-spacing for a good reason in their HTML doc -- such as for adjusting character spacing in the title or in their headings. And if all you want to do is remove the letter-spacing: 0em; attribute from the CSS, you should be able to do that easily and quickly in one hit by using Search and Replace in Sigil.
Last edited by slowsmile; 06-10-2020 at 04:10 AM. |
#12 | |
Guru
Posts: 656
Karma: 567890
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
|
Quote:
Also, in the full file there were many other letter-spacing values in styles, also not hard to delete: "letter-spacing:.+;"

The problem is that this left 300 styles, scores of them identical and many empty. And a file with thousands of spans that were 99% redundant. Every paragraph split by spans, sometimes between letters in a word. That would have taken all day to simplify. Since the "letter-spacing" styles were interspersed with actual necessary formatting, like italics, I could not just delete or unify styles in groups by regex; the only way was one by one.

Letter-spacing in Word exports in body text is always garbage. And almost always garbage even in headings. If you think someone might want this, make it optional. These tags are not from design choices made by anyone; they are Word trying to mimic the layout in the original page, and that makes no sense with a different font in a different size page.

Maybe in Word changing all fully justified text to left-aligned or monospace before exporting would stop it doing that.

Anyway, just my feedback. Up to you.
Last edited by AlanHK; 06-10-2020 at 05:55 AM. |
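A note on the pattern quoted above: in engines where `.+` is greedy by default (Python's `re` is one), `letter-spacing:.+;` runs to the last semicolon on the line, so it can swallow text and markup when two spans share a line. The sketch below shows a narrower pattern that stops at the end of the declaration. The helper name `strip_letter_spacing` is made up for illustration, and the sample line is shortened from the html quoted earlier in the thread.

```python
import re

GREEDY = re.compile(r"letter-spacing:.+;")
# Stop at the first semicolon, quote or closing brace instead of the last semicolon.
SAFE = re.compile(r"letter-spacing\s*:[^;'\"}]*;?\s*", re.IGNORECASE)


def strip_letter_spacing(text):
    """Remove letter-spacing declarations from inline styles or CSS rules."""
    return SAFE.sub("", text)


line = (
    "<span style='font-size:18.0pt;letter-spacing:-.2pt;font-style:normal'>Boer </span>"
    "<span style='font-size:18.0pt;letter-spacing:-.1pt;font-style:normal'>War</span>"
)

# The greedy pattern deletes everything between the first "letter-spacing:"
# and the last ";" on the line, including the first span's text.
print(GREEDY.sub("", line))
# The narrower pattern removes only the two declarations.
print(strip_letter_spacing(line))
```

This only removes the declarations. Spans left with identical or empty styles still need merging or unwrapping afterwards, which is the larger cleanup described in the post.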
|
#13 |
Witchman
Posts: 603
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
Plugin Update(v0.1.7):
Last edited by slowsmile; 07-27-2020 at 02:07 AM. |