01-09-2017, 03:37 AM | #1 |
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
|
[Plugin] OpenDocHTMLImport - Full ODF HTML(Writer) conversion to epub
Import ODF HTML documents into Sigil as epubs.

Input: ODF HTML file (derived from LibreOffice or OpenOffice only)
MIT Licence (OSI)
Output: Epub 2
Minimum Sigil requirement: v0.9.0 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows/OSX/Linux
** Tested on Windows 7, 8 & 10 only **
** Tested on OSX, Linux32 & Linux64 **
Current Version: "0.4.7"

**Acknowledgements**
A huge thank you to both KevinH and DiapDealer for all their helpful advice and testing. Without their expert guidance and invaluable help there would be no OSX or Linux versions for this plugin.

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ should be installed on your computer to run this plugin externally).
* Click Add Plugin and select OpenDocHTMLImport_vXXX.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Input > OpenDocHTMLImport.

Description
The purpose of the plugin is to help users of LibreOffice (LO) and OpenOffice (OO) more easily convert their ODF html documents directly to epub. This plugin should give a full conversion. It also gets rid of the drudge jobs like cleaning the html, re-styling your epub from scratch, creating a toc, adding images, creating a stylesheet and adding metadata, and quickly sets up an ideal start point for important Sigil finishing-off tasks like final re-styling, toc changes and adding embedded fonts. This plugin converter should also be useful for non-techies, since it should produce an uploadable basic epub, with no frills, after conversion. This plugin will convert your document to epub 2 format.

Features
As well as converting an html doc to epub, this plugin will also do the following additional tasks:
* Thoroughly cleans out and reformats the html file.
* Fixes common mixed encoding problems.
* Now preserves all internal links and bookmarks after conversion (added in v0.3.8).
* Creates a stylesheet that preserves all layout and formatting after conversion to epub.
* Preserves all original style names in the CSS (does not use indexing).
* Ports and transforms in-tag text styling to the stylesheet as named classes (no indexing).
* Adds an ebook cover image to the epub.
* Imports all html ebook images as inline images.
* Uses special formatting to help preserve smaller image sizes across all reading devices.
* Creates a Level 1 doc TOC (in Git Markdown style) and a Nav TOC (device TOC).
* Adds the necessary metadata to the epub.
* Preserves all internet links.
* Automatically fixes incorrectly formatted id values in the html (added in v0.4.5).
* Trims the stylesheet by removing all unneeded style properties.
* Formats all epub text and headings as default serif throughout.
* Adds the Go To guides for toc, cover and begin read (set to 'Chapter 1' or default).
* Converts all "in", "cm", "mm", "pc" and "pt" values to relative "em" values in the CSS.
* Adds globals and presets to the CSS to help guard against KDP Look Inside issues.
* Cannot render tables or lists.

This plugin effectively converts and prepares your html doc (as you have styled it in OO or LO) for upload as a basic epub with no frills.

Plugin Run
Create a named directory on your desktop and save your ODT Document as 'HTML Document(Writer)', together with all html images (if applicable), to this directory. Now run the plugin in Sigil to convert your html doc to epub.

Metadata (via dialog)
The Edit eBook Details dialog window collects all necessary epub metadata.

Re-Styling Options (via dialog)
These options are defined below:
* Convert chapter text only to fiction style format. Transforms only ebook chapter text or story text to fiction style format.
Fiction style is where the first paragraph in the chapter always has no indent while all succeeding paragraphs have an indent.
* Convert chapter text only to block text format. Transforms only ebook chapter text or story text to block text format.
* Convert all ebook text to block text format (the title and TOC pages are not converted).

Caveats: If you use the above re-styling options, please ensure that all your chapter headings are formatted in one of the following three ways: Chapter 1, Chapter 2 etc, or Chapter One, Chapter Two etc, or 1, 2, 3 etc (AllCaps is also allowed). And be sure to properly use heading styles (h1, h2, h3 etc) for all main headings in the front matter, story and back matter of your ebook. If you use <p> tags to style your main headings then the above options will not work well. Also, when converting to fiction style format, ensure that there is no text with <p> tag styling between your chapter headings and first paragraph. For instance, if you have a date and location or timeline (using <p> tags) above the first paragraph in the chapter then this styling option will not work well.

Styling Info
The plugin interface is simple to use and there are only 2 style rules:

First rule: Make sure that you only use 'Heading 1' (h1) paragraph style for all the main headings and chapter headings that you want to see in the auto-generated epub TOC. In the plugin, h1 style is used as a marker for selecting and generating the TOC links and is also used for XML structure creation within the epub.

Second rule (optional): If you can, try to use named paragraph styles for formatting all text, headings and spacing in your doc. This is best practice, and it also reduces the number of indexed inline styles ported to the CSS, which helps to make the stylesheet and html more easily readable. This plugin will nevertheless port and preserve most default styles and will preserve all heading styles and named paragraph styles from your doc to your new epub stylesheet.
Don't put decorative images above your ebook title or chapter headings. After conversion to epub, any images above your book title or chapter headings will not show. You can add these decorative images using Sigil after you have converted to epub.

User Styles - Important!
If you want all your own text style names to show in the generated epub, ensure that you do the following for all your text styles: In OO or LO, go to Styles and Formatting > Organizer > Linked With and make sure that your text style is linked with "Text body" (OO) or "Text Body" (LO). If your text style is linked with "Default" or "Default Style" then it will become an inline style on conversion from a doc to HTML, which will become an indexed style on conversion to epub. But if your named text style is linked with or inherits "Text Body" then your style names will show in the HTML doc as a proper class, and so they will also show in the generated epub html.

**Important**: Please ensure that you are using the most recent versions of OO and LO, and always insert your ebook images as a File (do not tick Link).

The auto-generated epub TOC links will be formatted in the following way: AllCaps, 11pt, bold font, blue with no underline. On mouse over the formatting changes to: dark orange with underline. Internet links will also be displayed in the same way without bold or AllCaps. This styling will work for epub vendors like iBooks and Nook. For Kindle, the toc formatting will display, as it is, in the following way: AllCaps, 11pt, bold font, blue with underline. Internet links will not have bold or AllCaps. Kindle does not support link hover capability.
I would also be the first to admit that this plugin is far from perfect, but at least it should provide OpenDoc epubbers with a more useful start-point, in quick time, for manually finishing off their epubs as they see fit in Sigil before vendor upload. Using this plugin should hopefully save you a significant amount of time and effort in your conversion workflow. I don't really think of this plugin as a converter. I think of it more as a useful time saver.

Updates:
* All internal html links will now be converted to epub style pagelinks after conversion. This means that all internal links and bookmarks will now be preserved after conversion to epub.
* Now both the long and shorthand forms of 'padding' and 'margin' will also be converted from their absolute to relative 'em' values in the css.
Last edited by slowsmile; 12-09-2018 at 06:38 PM. |
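The unit conversion listed under Features and Updates (absolute "in", "cm", "mm", "pc" and "pt" values to relative "em" values) could look roughly like the sketch below. It is not taken from the plugin: the name `to_em`, the regular expression and, above all, the base of 12pt per em are assumptions made for illustration. Only the ratios between the absolute units come from the CSS definitions.

```python
import re

# Points per CSS absolute unit (1in = 72pt = 6pc = 2.54cm = 25.4mm).
POINTS_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}

# ASSUMPTION: 1em is treated as 12pt. The plugin's real base size is not stated.
BASE_FONT_PT = 12.0

# A number followed by an absolute unit, not preceded by a word char, '.', '#' or '-'.
_LENGTH = re.compile(r"(?<![\w.#-])(-?\d*\.?\d+)(pt|pc|in|cm|mm)\b")


def to_em(css_text, base_pt=BASE_FONT_PT):
    """Rewrite absolute CSS lengths in css_text as em values (hypothetical helper)."""
    def repl(match):
        points = float(match.group(1)) * POINTS_PER_UNIT[match.group(2)]
        return f"{round(points / base_pt, 3):g}em"
    return _LENGTH.sub(repl, css_text)


print(to_em("margin: 0.5in 0cm 12pt 1pc; padding-left: 6pt"))
# margin: 3em 0em 1em 1em; padding-left: 0.5em
```

Because the pattern works on each length separately, shorthand values such as a four-part `margin` are handled the same way as single longhand values.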
01-09-2017, 09:41 AM | #2 |
Grand Sorcerer
Posts: 27,558
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Thanks for your contribution to the Sigil community! Your plugin has been added to the plugin index thread.
|
01-09-2017, 11:27 AM | #3 |
Connoisseur
Posts: 99
Karma: 10
Join Date: Jun 2014
Location: Poland, Żory
Device: Prestigio PER3464B, Onyx Lynx, Lenovo S5000 i Tab4-8"
|
Very good and useful plugin, but unfortunately not for me, because it turns Polish characters such as ą, ć, ę, ł, ń, ó, ś, ź, ż into different signs, for example: ł - ³, ś - �, ń - ñ, ć - æ, etc.
Can I ask you to adapt the plugin to the Polish language? Thank you in advance; I sincerely appreciate the work you have already put in. Sorry for my very poor knowledge of English. Regards bravosx |
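The substitutions reported in this post are what you would see if text stored as Windows-1250 (Central European) bytes were decoded as Latin-1. That is only a guess at the cause based on the symptoms, not a confirmed diagnosis of the plugin, and the helper name `misread` is made up for this sketch.

```python
def misread(text, actual="cp1250", assumed="latin-1"):
    """Encode text with one codec and decode it with another, as a mislabelled file would be."""
    return text.encode(actual).decode(assumed)


# l-stroke, n-acute and c-acute come back as the signs reported above.
print(misread("łńć"))  # ³ñæ

# s-acute becomes byte 0x9C, a control code in Latin-1 with no visible glyph,
# which many programs draw as a replacement mark.
print(repr(misread("ś")))  # '\x9c'
```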
01-09-2017, 12:05 PM | #4 |
Sigil Developer
Posts: 7,667
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Is the encoding information (meta tag or encoding or codeset) properly detectable in the input html? In other words, how does a properly formatted ODF html file indicate the character set encoding it uses?
Once converted to utf-8, are these codeset or meta tags *removed* to prevent Sigil from being confused by loading a file that is actually in utf-8 but is tagged to be in some other codeset? Is the epub metadata properly setting the encoding to be utf-8 inside the epub it is handing to Sigil? Thanks, KevinH |
01-09-2017, 12:43 PM | #5 |
Connoisseur
|
@Kevin
I don't know much about this, but I think the input file is valid. When I open it in Firefox, the Polish letters are there. When I open the same text in LibreOffice, save it as .docx, and then import it into Sigil with the DOCXImport plugin to convert it to epub, the Polish letters ą, ć, ę, ł, ń, ó, ś, ź, ż are displayed correctly. Once again, sorry for my poor English. Regards bravosx |
01-09-2017, 08:46 PM | #6 |
Witchman
|
@bravosx...Try the following before you export your LibreOffice doc to HTML:
In LibreOffice click Tools tab > Options > Load/Save > HTML Compatibility. In the Character set dropdown select UNICODE (UTF-8) and save. Now export your html and run it in the plugin. Doing this might help to cure your Polish character set problem. Last edited by slowsmile; 01-10-2017 at 12:19 AM. |
01-09-2017, 08:50 PM | #7 |
Witchman
|
@DiapDealer...Thanks for doing that !!
|
01-10-2017, 09:19 AM | #8 |
Connoisseur
|
@slowsmile...Thank you for your help.
I set the character set that you suggested, Unicode (UTF-8), and also tried out different character sets for Central Europe. All of them display the Polish letters incorrectly. The Polish letters only began to display correctly when I chose the LibreOffice character set Western European (Windows-1252/WinLatin 1). A little strange but, most importantly, it works. Once again, sorry for my poor English. Regards bravosx |
01-10-2017, 09:21 PM | #9 |
Witchman
|
@bravosx...You could also try opening your epub in Sigil and going to Edit > Preferences > Language > Default Language for Metadata and set this and User Interface Language to Polish. This should ensure that the html text in the epub can cope with Polish characters.
Last edited by slowsmile; 01-10-2017 at 09:37 PM. |
01-10-2017, 09:40 PM | #10 |
Sigil Developer
|
Sigil only uses utf-8 encoding. Any epub not using that encoding is converted to utf-8 on import. It sounds as if either the html file output by LibreOffice is encoded in latin-1 and marked as utf-8, or the user does not have a proper utf-8 font supporting the Polish characters.
Please try directly importing the html file into Sigil while *not* using the plugin. Once loaded, do you see the proper Polish chars or not in CodeView? If not, then the problem is with your system and Sigil and not his plugin. You can also try using Sigil Preferences to set a font that has the proper utf-8 glyphs for Polish. KevinH |
01-11-2017, 02:07 PM | #11 |
Connoisseur
|
I'm running Windows 10 and Sigil 64 0.9.7 and LibreOffice 5.2
@slowsmile... I have set Default Language for Metadata and User Interface Language to Polish. When I set LO to Unicode (UTF-8), as you suggested in post #6, and import the saved text with the OpenDocHTMLImport plugin, there are no Polish characters. In contrast, when the same text is saved as .docx and .odt and imported into Sigil with the corresponding DOCXImport and ODTImport plugins, it is displayed properly, with Polish characters, in both the working window and the preview window. As I said earlier, the Polish characters display properly with OpenDocHTMLImport only when the LO character set is Western European (Windows-1252/WinLatin 1). Weird, but it works. I think the problem lies in the plugin itself, but I may be wrong.
@Kevin... In Sigil preferences I set the font to Georgia, which has the Polish characters. My question is how to import a file saved as .html directly into Sigil; I have not found such an option. I did make one attempt: I again set LO under Tools tab > Options > Load/Save > HTML Compatibility to the UNICODE (UTF-8) character set, saved the text as .html, opened it in Firefox, and then used Ctrl + A, Ctrl + C and Ctrl + V to paste it into the Sigil working window. I got plain text (no formatting) with properly displayed Polish characters. Once again, sorry for my English. Greetings to all, and thank you for your help. bravosx |
01-11-2017, 04:00 PM | #12 |
Sigil Developer
|
@bravosx
In Sigil, use the File->Open menu and change the filter at the bottom of the File Dialog from .epub to .html and then navigate to and open the html file that was created using LibreOffice. Does the resulting text show the correct Polish characters? |
01-12-2017, 04:01 AM | #13 |
Connoisseur
|
@Kevin
The file created in LO with the HTML Compatibility character set at UNICODE (UTF-8), and opened in Sigil the way you described, displays the Polish characters and formatting properly. Regards bravosx |
01-12-2017, 10:28 AM | #14 |
Sigil Developer
|
Then the issue must be in the plugin someplace. Sigil autodetects the encoding and converts it to utf-8. The plugin should read the input file as binary (bytes), attempt to autodetect the encoding using charmap or a byte search for an encoding string, and then decode the binary (bytes) into a python str type (unicode). Once it is a python3 string, replace any metadata encoding info from the old encoding to utf-8 before using encode to create a utf-8 set of bytes for working with lxml etc.
How does this plugin handle that process? KevinH |
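The steps described in this post (read bytes, find the declared encoding by byte search, decode to str, relabel the declaration as utf-8, encode back to bytes) can be sketched as below. This is only an illustration of that description and not the plugin's actual code: the function name `html_bytes_to_utf8`, the 4096-byte search window and the utf-8 fallback are all assumptions, and it does not handle byte order marks or files with no usable declaration beyond the fallback.

```python
import re

# Charset label in a <meta> tag, e.g. charset=windows-1250 or charset="utf-8".
_CHARSET_BYTES = re.compile(rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE)
_CHARSET_TEXT = re.compile(r"""(charset\s*=\s*["']?\s*)[A-Za-z0-9_.:-]+""", re.IGNORECASE)


def html_bytes_to_utf8(raw, fallback="utf-8"):
    """Return the HTML in raw re-encoded as UTF-8 bytes (hypothetical helper)."""
    # 1. Byte search for a declared encoding near the top of the file.
    found = _CHARSET_BYTES.search(raw[:4096])
    declared = found.group(1).decode("ascii") if found else fallback
    # 2. Decode bytes to str; fall back if the label is unknown or wrong.
    try:
        text = raw.decode(declared)
    except (LookupError, UnicodeDecodeError):
        text = raw.decode(fallback, errors="replace")
    # 3. Relabel the first charset declaration so it matches the new encoding.
    text = _CHARSET_TEXT.sub(r"\g<1>utf-8", text, count=1)
    # 4. Encode to UTF-8 bytes, ready for lxml or similar tools.
    return text.encode("utf-8")


sample = (
    '<html><head><meta http-equiv="content-type" '
    'content="text/html; charset=windows-1250"></head>'
    "<body><p>ąćęłńóśźż</p></body></html>"
).encode("cp1250")
converted = html_bytes_to_utf8(sample)
print(converted.decode("utf-8"))
```

Relabelling the declaration in step 3 is what keeps a later reader, such as Sigil, from trusting a stale charset label on bytes that are now utf-8.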
01-12-2017, 11:33 AM | #15 | |
Connoisseur
|
Quote:
Regards bravosx |
|
Tags |
conversion, epub, html, odf, opendoc |