Old 01-09-2017, 04:37 AM   #1
slowsmile
Witchman
Posts: 438
Karma: 787388
Join Date: May 2013
Location: Philippines
Device: Android S5
[Plugin] OpenDocHTMLImport - Full ODF HTML(Writer) conversion to epub

Import ODF HTML documents into Sigil as epubs.

Input: ODF HTML file(derived from LibreOffice or OpenOffice only)
MIT Licence(OSI)
Output: Epub 2
Minimum Sigil requirement: v0.9.0 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows/OSX/Linux
** Tested on Windows 7, 8 & 10 only **
** Tested on OSX, Linux32 & Linux64 **

Current Version: "0.4.7"

**Acknowledgements** A huge thank you to both KevinH and DiapDealer for all their helpful advice and testing. Without their expert guidance and invaluable help there would be no OSX or Linux versions for this plugin.

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python(Python 3.4+ should be installed on your computer to run this plugin externally).
* Click Add Plugin and select OpenDocHTMLImport_vXXX.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Input > OpenDocHTMLImport.

Description
The purpose of the plugin is to help users of LibreOffice(LO) and OpenOffice(OO) convert their ODF html documents directly to epub more easily. The plugin should give a full conversion and take care of the drudge jobs: cleaning the html, re-styling your epub from scratch, creating a toc, adding images, creating a stylesheet, adding metadata etc. It quickly sets up an ideal start point for important Sigil finishing-off tasks like final re-styling, toc changes, adding embedded fonts etc.

This plugin converter should also be useful for non-techies, since it should produce an uploadable basic epub, with no frills, after conversion. This plugin will convert your document to epub 2 format.

Features
As well as converting an html doc to epub, this plugin will also do the following additional tasks:

* Thoroughly cleans out and reformats the html file.
* Fixes common mixed encoding problems.
* Now preserves all internal links and bookmarks after conversion (added in v0.3.8)
* Creates a stylesheet that preserves all layout and formatting after conversion to epub.
* Preserves all original style names in the CSS(does not use indexing).
* Ports and transforms in-tag text styling to the stylesheet as named classes(no indexing).
* Adds an ebook cover image to the epub.
* Imports all html ebook images as inline images.
* Uses special formatting to help preserve smaller image sizes across all reading devices.
* Creates a Level 1 doc TOC(in Git Markdown style) and a Nav TOC(device TOC).
* Adds the necessary metadata to the epub.
* Preserves all internet links.
* Automatically fixes incorrectly formatted id values in the html(added in v0.4.5).
* Trims the stylesheet - removes all unneeded style properties.
* Formats all epub text and headings as default serif throughout.
* Adds the Go To guides for toc, cover and begin read(set to 'Chapter 1' or default).
* Converts all "in", "cm", "mm", "pc" and "pt" values to relative "em" values in the CSS.
* Adds globals and presets to the CSS to help guard against KDP Look Inside issues.
* Limitation: the plugin cannot render tables or lists.

This plugin effectively converts and prepares your html doc(as you have styled it in OO or LO) for upload as a basic epub with no frills.
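
As a rough illustration of the id fix listed above (the v0.4.6 change log entry says an 'x' is prepended to ids that start with a digit), here is a minimal Python sketch. The function names are hypothetical and the regex approach is an assumption; the plugin's real code is not shown in this thread and may handle other kinds of invalid id as well. Updating the matching href="#..." links is also an assumption about what a complete fix would need.

```python
import re

DIGITS = "0123456789"


def fix_id(value: str) -> str:
    """Prepend 'x' when the id starts with a digit; otherwise return it unchanged."""
    if value and value[0] in DIGITS:
        return "x" + value
    return value


def fix_ids_in_html(html: str) -> str:
    """Apply fix_id to id="..." attributes and to href="#..." fragment links.

    Assumption: attributes use double quotes. A real implementation would
    more likely work on a parsed tree than on raw text.
    """
    def repl(match):
        return match.group(1) + fix_id(match.group(2)) + match.group(3)

    html = re.sub(r'(\sid=")([^"]*)(")', repl, html)
    html = re.sub(r'(\shref="#)([^"]*)(")', repl, html)
    return html
```

Ids that already start with a letter are left alone, so running the function twice gives the same result as running it once.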

Plugin Run
Create a named directory on your desktop and save your ODT Document as 'HTML Document(Writer)' + all html images(if applicable) to this directory. Now run the plugin in Sigil to convert your html doc to epub.

Metadata(via dialog)
The Edit eBook Details dialog window collects all necessary epub metadata.

Re-Styling Options(via dialog)
These options are defined below:

* Convert chapter text only to fiction style format.
Transforms only ebook chapter text or story text to fiction style format. Fiction style is where the first paragraph in the chapter always has no indent while all succeeding paragraphs have an indent.

* Convert chapter text only to block text format.
Transforms only ebook chapter text or story text to block text format.

* Convert all ebook text to block text format(the title and TOC pages are not converted).

Caveats: If you use the above re-styling options please ensure that all your chapter headings are formatted in any of the following three ways: Chapter 1, Chapter 2 etc or Chapter One, Chapter Two etc or 1, 2, 3 etc(AllCaps is also allowed). And be sure to properly use heading styles(h1, h2, h3 etc) for all main headings in the front matter, story and back matter of your ebook. If you use <p> tags to style your main headings then the above options will not work well. Also, when converting to fiction style format, ensure that there is no text with <p> tag styling between your chapter headings and first paragraph. For instance, if you have a date and location or timeline(using <p> tags) above the first paragraph in the chapter then this styling option will not work well.
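
To make the three accepted chapter heading forms concrete, here is a small Python sketch of a check that accepts "Chapter 1", "Chapter One" and "1", ignoring letter case. It is only an illustration: the regex, the function name and the list of number words (one to twenty) are assumptions, not taken from the plugin.

```python
import re

# Assumption: spelled-out numbers are limited to one..twenty for this sketch.
NUMBER_WORDS = (
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen", "twenty",
)

CHAPTER_RE = re.compile(
    r"^\s*(?:chapter\s+(?:\d+|" + "|".join(NUMBER_WORDS) + r")|\d+)\s*$",
    re.IGNORECASE,
)


def is_chapter_heading(text: str) -> bool:
    """Return True for 'Chapter 1', 'Chapter One' or '1' (any letter case)."""
    return CHAPTER_RE.match(text) is not None
```

A heading such as "Prologue" or "Chapter Summary" does not match, which is why front and back matter headings are treated differently from chapter headings by a check of this kind.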

Styling Info
The plugin interface is simple to use and there are only 2 style rules:

First rule: Make sure that you only use 'Heading 1'(h1) paragraph style for all the main headings and chapter headings that you want to see in the auto-generated epub TOC. In the plugin, h1 style is used as a marker for selecting and generating the TOC links and is also used for XML structure creation within the epub.

Second rule(optional): If you can, try and use named paragraph styles for formatting all text, headings and spacing in your doc. This is really best practice and this also reduces the number of indexed inline styles ported to the CSS, which helps to make the stylesheet and html more easily readable. This plugin will nevertheless port and preserve most default styles and will preserve all heading styles and named paragraph styles from your doc to your new epub stylesheet.
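
The first rule above can be pictured with a short Python sketch that collects every h1 heading from an html string, which is the kind of list a TOC could be generated from. This is an assumption-laden illustration using only the standard library; the class and function names are hypothetical, and the plugin itself may do this differently.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect (id, text) pairs for every h1 element in the document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False
        self._id = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._id = dict(attrs).get("id")
            self._parts = []

    def handle_data(self, data):
        if self._in_h1:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._id, text))
            self._in_h1 = False


def collect_toc_entries(html: str):
    """Return a list of (id, heading text) tuples, one per h1."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

Headings styled as h2, or as plain paragraphs, are ignored by this sketch, which mirrors why only 'Heading 1' entries show up in the generated TOC.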

Don't put decorative images above your ebook title or chapter headings. After conversion to epub, any images above your book title or chapter headings will not show. You can add in these decorative images using Sigil after you have converted to epub.

User Styles - Important!
If you want all your own text style names to show in the generated epub ensure that you do the following for all your text styles:

In OO or LO, go to Styles and Formatting > Organizer > Linked With and make sure that your text style is linked with "Text body"(OO) or "Text Body"(LO). If your text style is linked with "Default" or "Default Style" then it will become an inline style on conversion from a doc to HTML which will become an indexed style on conversion to epub. But if your named text style is linked with or inherits "Text Body" then your style names will show in the HTML doc as a proper class. And if they show in the HTML then your style names will also show in the generated epub html. So just make sure that all your named text styles are linked with "Text body" in OO or "Text Body" in LO for them to show in the generated epub.

**Important**: Please ensure that you are using the most recent versions of OO and LO and always Insert your ebook images as a File(do not tick Link).

The auto-generated epub TOC links will be formatted in the following way: AllCaps, 11pt, bold font, blue with no underline. On mouse over the formatting changes to: dark orange with underline. Internet links will also be displayed in the same way without bold or AllCaps. This styling will work for epub vendors like iBooks and Nook. For Kindle, the toc formatting will display, as it is, in the following way: AllCaps, 11pt, bold font, blue with underline. Internet links will not have bold or AllCaps. Kindle does not support link hover capability.

I would also be the first to admit that this plugin is far from perfect, but at least it should provide OpenDoc epubbers with a more useful start-point, in quick time, for manually finishing off their epubs as they see fit in Sigil before vendor upload. Using this plugin should hopefully save you a significant amount of time and effort in your conversion workflow. I don't really think of this plugin as a converter. I think of it more as a useful time saver.

Updates:
* All internal html links will now be converted to epub style pagelinks after conversion. This means that all internal links and bookmarks will now be preserved after conversion to epub.
* Now both the long and shorthand forms of 'padding' and 'margin' will also be converted from their absolute to relative 'em' values in the css.
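
The absolute to relative conversion can be sketched in Python as below. The unit ratios (72pt per inch, 12pt per pica, 2.54cm per inch) are standard, but the base font size of 12pt per em, the rounding to three decimals and all names are assumptions made for the example; the plugin's own conversion functions are not shown here. Because the regex works on every length in a declaration, it covers both longhand properties and the 'margin'/'padding' shorthand.

```python
import re

# Points per unit. These ratios are standard CSS/print definitions.
PT_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
}

# Assumption: 1em corresponds to a 12pt base font.
BASE_FONT_PT = 12.0

# A number directly followed by an absolute unit, not glued to a name or hex colour.
LENGTH_RE = re.compile(r"(?<![\w.#-])(-?\d*\.?\d+)(pt|pc|in|cm|mm)\b")


def _to_em(match):
    points = float(match.group(1)) * PT_PER_UNIT[match.group(2)]
    em = round(points / BASE_FONT_PT, 3)
    return f"{em:g}em"


def absolute_to_em(css: str) -> str:
    """Replace every absolute length in a CSS string with an em value."""
    return LENGTH_RE.sub(_to_em, css)
```

For example, absolute_to_em("margin: 0.5in 12pt 0 1pc;") gives "margin: 3em 1em 0 1em;" under the 12pt assumption. Unitless zeros are left untouched.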

Change Log:

v0.4.7
-- Fixed a uuid problem affecting the uuids generated in the content.opf and toc.ncx metadata, which was causing uuid errors during Epubcheck.
v0.4.6
-- For first char digit id errors, an 'x' char is now prepended to the id as a fix(substitution is no longer used).
v0.4.5
-- The plugin now automatically fixes incorrectly formatted id values in the html.
-- Fixed another problem with toc removal.
v0.4.4
-- Fixed a problem with the body tag
v0.4.3
-- The plugin now automatically removes the html doc TOC if present.
v0.4.2
-- Fixed a bug in reformat inline styles.
v0.4.1
-- Added MIT SW Licence
v0.4.0
-- Fixed a bug with heading ids.
v0.3.9
-- Fixed a minor problem with the TOC header
v0.3.8
-- All internal links and bookmarks will now be preserved after conversion to epub.
-- Now both the long and shorthand 'padding' and 'margin' absolute values will also be converted to relative 'em' values in the css
-- Redesigned the absolute to relative conversion functions to give better output precision.
-- Changed epub file names to lower case
-- Other minor plugin changes to improve checks, protection and cleanup on exit.
-- Updated the release notes.
v0.3.7
-- Fixed "Heading 1" check problem
v0.3.6:
-- Fixed a problem with checkboxes in the Restyling Options dialog.
v0.3.5:
-- Adjusted dimensions for image reformatting
v0.3.4:
-- New functionality. Added Re-Styling Options dialog with 3 new options (see Release Notes for details)
v0.3.3:
-- Fixed icon problems. Now avoids icon assignment for Mac.
v0.3.2:
-- Fixed sys.path error causing problems with Tidy when running the plugin in the external python 3.4+ environment.
v0.3.1:
-- Added fix for <br> tags causing problems with book image re-styling.
-- Added improved and more efficient temp file cleanup on plugin exit or plugin error.
v0.3.0:
-- Changed the Book Browser title.xhtml heading to title case to make it fit in with the rest of the new formatting.
v0.2.9:
-- Fix for epub file name incompatibility issues on Linux 64bit. The opf file names and Book Browser file names have been changed and are now identical and will be in the same letter case as the original headings in the ebook text.
v0.2.8:
-- Fixed and amended begin read in the opf guides to additionally accept the following chapter 1 forms: "Chapter 1" or "Chapter One" or "1".
-- Fixed Book Browser and content.xhtml not properly displaying Polish text with the correct charset in Sigil.
v0.2.7:
-- Added new file encoding checks. These checks are stricter, more accurate and wider in scope and should help to reduce encoding problems in the generated epub.
v0.2.6:
-- Added language to opf metadata based on user locale.
v0.2.5:
-- Added conversion from 'mm' to 'em' values in the CSS
v0.2.4:
-- Added functionality. Now transforms html in-tag text styling to classes with descriptive names. The following core text styles will be used and added to the generated epub stylesheet to accommodate this change:
ebk-centered-text
ebk-blocktext
ebk-text-with-indent
ebk-text-no-indent
v0.2.3:
-- Fixed Heading 1 check problem
-- Fixed header spacing problem
-- Added locale language to XMLNS
v0.2.1:
-- Fixed underscore problem in file names
v0.2.0:
-- Fixed and removed bs4 lxml parser warnings
v0.1.8:
-- Fixed file name compatibility issues for Windows/Linux/OSX
-- Fixed icon compatibility issues on Linux/OSX
-- Removed unnecessary date timestamp from opf metadata
-- Special thanks to Doitsu for his tenacious testing and advice.
v0.1.4:
-- Initial Release
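
The v0.4.7 entry above does not say what the uuid problem was. One common source of Epubcheck uuid errors in epub 2 is that the unique identifier in content.opf and the dtb:uid in toc.ncx must be the same value, so the following Python sketch shows one id being generated once and reused in both places. The function names and the id="BookId" attribute value are assumptions for illustration only.

```python
import uuid


def make_book_id() -> str:
    """Generate a single uuid-based identifier for the whole epub."""
    return "urn:uuid:" + str(uuid.uuid4())


def opf_identifier(book_id: str) -> str:
    """dc:identifier element for the content.opf metadata."""
    return (
        '<dc:identifier id="BookId" opf:scheme="UUID">'
        + book_id
        + "</dc:identifier>"
    )


def ncx_uid(book_id: str) -> str:
    """dtb:uid meta element for the toc.ncx head; must match the OPF value."""
    return '<meta name="dtb:uid" content="' + book_id + '"/>'
```

Calling make_book_id() once and passing the result to both helpers keeps the two files in step; calling it separately for each file would produce two different ids.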
Attached Files
File Type: zip OpenDocHTMLImport_v047.zip (990.9 KB, 33 views)

Last edited by slowsmile; 12-09-2018 at 07:38 PM.
Old 01-09-2017, 10:41 AM   #2
DiapDealer
Grand Sorcerer
Posts: 19,494
Karma: 99495506
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Thanks for your contribution to the Sigil community! Your plugin has been added to the plugin index thread.
Old 01-09-2017, 12:27 PM   #3
bravosx
Connoisseur
Posts: 56
Karma: 10
Join Date: Jun 2014
Location: Poland, Żory
Device: Prestigio PER3464B, Onyx Lynx, Lenovo S5000 i Tab4-8"
Very good and useful plugin, but unfortunately not for me, because it turns Polish characters such as ą, ć, ę, ł, ń, ó, ś, ź, ż into different characters, for example: ł - ³, ś - �, ń - ñ, ć - æ, etc.
Can I ask you to adapt the plugin to the Polish language? Thank you in advance; I sincerely appreciate the work you have already put in.
Sorry for my very poor knowledge of English.
Regards bravosx
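
The substitutions reported in this post (ł - ³, ń - ñ, ć - æ) are what you get when text encoded in a Central European codepage is decoded as a Western European one. The Python sketch below reproduces them, assuming the bytes were Windows-1250 (ISO-8859-2 gives the same result for these three letters) and were read as Latin-1. Whether this is what actually happened inside the plugin is not established in the thread; the function names are hypothetical.

```python
def misdecode(text: str, actual: str = "cp1250", assumed: str = "latin-1") -> str:
    """Encode text in its real codepage, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, actual: str = "cp1250", assumed: str = "latin-1") -> str:
    """Reverse misdecode(): only works when the wrong decode lost no bytes."""
    return garbled.encode(assumed).decode(actual)


if __name__ == "__main__":
    for ch in "łńć":
        print(ch, "->", misdecode(ch))
```

Under the same assumption, ó comes through unchanged because it has the same byte value in both codepages, and ś becomes a non-printing control character, which may be why it was shown as �.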
Old 01-09-2017, 01:05 PM   #4
KevinH
Wizard
Posts: 3,140
Karma: 1931746
Join Date: Nov 2009
Device: many
Is the encoding information (meta tag or encoding or codeset) properly detectable in the input html? In other words, how does a properly formatted ODF html file indicate the character set encoding it uses?

Once converted to utf-8, are these codeset or meta tags *removed* to prevent Sigil from being confused by loading a file that is actually in utf-8 but is tagged to be in some other codeset? Is the epub metadata properly setting the encoding to be utf-8 inside the epub it is handing to Sigil?

Thanks,

KevinH
Old 01-09-2017, 01:43 PM   #5
bravosx
Connoisseur
@Kevin
I don't know much about this, but I think the input file is valid. When I open it in Firefox, the Polish letters are there. When I open the same text in LibreOffice, save it as .docx, import it into Sigil using the DOCXImport plugin and convert it to epub, the Polish letters ą, ć, ę, ł, ń, ó, ś, ź, ż are displayed correctly.
Once again, sorry for my poor English.
Regards bravosx
Old 01-09-2017, 09:46 PM   #6
slowsmile
Witchman
@bravosx...Try the following before you export your LibreOffice doc to HTML:

In LibreOffice click Tools tab > Options > Load/Save > HTML Compatibility. In the Character set dropdown select UNICODE (UTF-8) and save. Now export your html and run it in the plugin. Doing this might help to cure your Polish character set problem.

Last edited by slowsmile; 01-10-2017 at 01:19 AM.
Old 01-09-2017, 09:50 PM   #7
slowsmile
Witchman
@DiapDealer...Thanks for doing that !!
Old 01-10-2017, 10:19 AM   #8
bravosx
Connoisseur
@slowsmile...Thank you for your help.
I've set the character set that you suggested, that is Unicode UTF-8, and also tried different character sets related to Central Europe. All of them display Polish letters incorrectly.
Polish letters began to display correctly only when I chose the LibreOffice character set Western European (Windows-1252/WinLatin 1). A little strange but most importantly, it works.

Once again, sorry for my poor English.
Regards bravosx
Old 01-10-2017, 10:21 PM   #9
slowsmile
Witchman
@bravosx...You could also try opening your epub in Sigil and going to Edit > Preferences > Language > Default Language for Metadata and set this and User Interface Language to Polish. This should ensure that the html text in the epub can cope with Polish characters.

Last edited by slowsmile; 01-10-2017 at 10:37 PM.
Old 01-10-2017, 10:40 PM   #10
KevinH
Wizard
Sigil only uses utf-8 encoding. Any epub not using that encoding is converted to utf-8 on import. It sounds as if either the html file output by LibreOffice is encoded in latin-1 and marked as utf-8, or the user does not have a proper utf-8 font supporting the Polish characters.

Please try directly importing the html file into Sigil while *not* using the plugin. Once loaded, do you see the proper Polish chars or not in CodeView? If not, then the problem is with your system and Sigil and not his plugin.

You can also try using Sigil Preferences to set a font that has the proper utf-8 glyphs for Polish.

KevinH
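
The first possibility raised in this post, a file marked as utf-8 whose bytes are really in another encoding, can be checked outside Sigil with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, the charset is found with a simple byte search of the first 2048 bytes, and a strict utf-8 decode is used as the validity test.

```python
import re

# Matches charset=... in a meta tag; the optional quote covers <meta charset="...">.
META_CHARSET_RE = re.compile(
    rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def declared_charset(data: bytes):
    """Return the charset named near the top of the file, lowercased, or None."""
    match = META_CHARSET_RE.search(data[:2048])
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(data: bytes) -> bool:
    """True when the bytes decode cleanly as utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def mislabelled_as_utf8(data: bytes) -> bool:
    """True when the file claims utf-8 but its bytes are not valid utf-8."""
    return declared_charset(data) in ("utf-8", "utf8") and not is_valid_utf8(data)
```

To try it on an exported file, read it with open(path, "rb").read() and pass the bytes to mislabelled_as_utf8(). A False result does not prove the label is right for other encodings; it only rules out this particular mismatch.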
Old 01-11-2017, 03:07 PM   #11
bravosx
Connoisseur
I'm running Windows 10, Sigil 64 0.9.7 and LibreOffice 5.2.
@slowsmile... I have set Default Language for Metadata and User Interface Language to Polish.
When I set LO to Unicode (UTF-8), as you suggested in post #6, and import the saved text using the OpenDocHTMLImport plugin, the Polish characters are missing.
In contrast, the same text saved as .docx and .odt and then imported into Sigil using the corresponding DOCXImport and ODTImport plugins is displayed properly, with Polish characters, in both the working window and the preview window.

As I mentioned earlier, Polish characters display properly only when I use the LO character set Western European (Windows-1252/WinLatin 1) and import with the OpenDocHTMLImport plugin. Weird, but it works.

I think the problem, however, lies in the plugin itself, but I may be wrong.

@Kevin... In Sigil preferences I set the font to Georgia, which has Polish characters.
My question is how to import a file saved as .html directly into Sigil; I have not found such an option.
I did make one attempt: I again set LO under Tools tab > Options > Load/Save > HTML Compatibility to UNICODE (UTF-8) in the Character set dropdown. I saved the text as .html, opened it in Firefox, then used Ctrl + A, Ctrl + C and Ctrl + V to paste it into the Sigil working window. I got plain text (no formatting) with properly displayed Polish characters.

Once again, sorry for my English, I greet all and thank you for your help.
bravosx
Old 01-11-2017, 05:00 PM   #12
KevinH
Wizard
@bravosx
In Sigil, use the File->Open menu and change the filter at the bottom of the File Dialog from .epub to .html and then navigate to and open the html file that was created using LibreOffice.

Does the resulting text show the correct Polish characters?
Old 01-12-2017, 05:01 AM   #13
bravosx
Connoisseur
@Kevin
The file created in LO with the HTML compatibility character set set to UNICODE (UTF-8), and opened in Sigil the way you indicated, displays Polish characters and formatting properly.

Regards bravosx
Old 01-12-2017, 11:28 AM   #14
KevinH
Wizard
Then the issue must be in the plugin someplace. Sigil autodetects the encoding and converts it to utf-8. The plugin should read the input file as binary (bytes), attempt to autodetect the encoding using charmap or a byte search for an encoding string, and then decode the binary (bytes) into a python str type (unicode). Once it is a python3 string, replace any metadata encoding info from the old encoding to utf-8 before using encode to create a utf-8 set of bytes for working with lxml etc.

How does this plugin handle that process?

KevinH
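
The process described in this post (read bytes, detect the encoding, decode to str, rewrite the declared charset, encode as utf-8) might look roughly like the Python sketch below. It is a minimal illustration using only the standard library, not the plugin's code: the function names, the detection order (BOM, then declared charset, then a utf-8 trial) and the cp1252 last resort are all assumptions, and an XML declaration with an encoding attribute is not handled.

```python
import codecs
import re

CHARSET_RE = re.compile(
    r'(charset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def detect_encoding(data: bytes) -> str:
    """Guess the encoding: BOM first, then the declared charset, then utf-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    head = data[:2048].decode("ascii", errors="replace")
    match = CHARSET_RE.search(head)
    if match:
        declared = match.group(2)
        try:
            codecs.lookup(declared)   # is this a codec Python knows?
            data.decode(declared)     # do the bytes really decode with it?
            return declared
        except (LookupError, UnicodeDecodeError):
            pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumption: last-resort fallback


def to_utf8_bytes(data: bytes) -> bytes:
    """Decode with the detected encoding, relabel as utf-8, re-encode."""
    text = data.decode(detect_encoding(data), errors="replace")
    # Rewrite only the first charset declaration so the label matches the bytes.
    text = CHARSET_RE.sub(lambda m: m.group(1) + "utf-8", text, count=1)
    return text.encode("utf-8")
```

The result of to_utf8_bytes() is a bytes object that is both encoded and labelled as utf-8, which is the state the post says is needed before handing the data to lxml.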
Old 01-12-2017, 12:33 PM   #15
bravosx
Connoisseur
Quote:
Originally Posted by KevinH View Post
Then the issue must be in the plugin someplace. Sigil autodetects the encoding and converts it to utf-8. The plugin should read the input file as binary (bytes), attempt to autodetect the encoding using charmap or a byte search for an encoding string, and then decode the binary (bytes) into a python str type (unicode). Once it is a python3 string, replace any metadata encoding info from the old encoding to utf-8 before using encode to create a utf-8 set of bytes for working with lxml etc.

How does this plugin handle that process?

KevinH
Unfortunately, this I do not know. I am a retired mechanical engineer, not a computer programmer.

Regards bravosx

Tags
conversion, epub, html, odf, opendoc
