04-02-2011, 09:18 PM | #1 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
HTMLZ - Single HTML File Output
I want to share an upcoming feature in the 0.7.54 release. One complaint I often hear is about the inability to edit ebooks. Many people seem to think EPUB is not a good format for editing. Sigil is often the solution given around these parts, but some people insist on the need for a book to be contained in a single HTML file. Simply unzipping an EPUB doesn't accomplish this, because the content inside an EPUB usually has to be split across multiple files.
To remedy this situation I've added a new output format: HTMLZ. Just like TXTZ, it is just a zip file with a different extension to differentiate it. Inside is a metadata.opf file (calibre can read and write metadata to it). Images are preserved, renamed and placed in an images folder. Also inside is a single HTML file. Even if you're converting from an EPUB that has been split into multiple parts, a conversion to HTMLZ will result in a single HTML file.
To go along with this there are a number of ways to configure CSS handling. The default is to place the CSS in a separate style.css file. It can also place class-based CSS inside the head element of the HTML itself. Or you can have it write the CSS inline within each element. Finally, the last option is to remove the CSS and convert as much of it as possible (a very limited set right now) to HTML tags.
As with all of my output format attempts, I expect this will have quite a few bugs. Let me know about any issues so I can fix them. I hope people find this useful for their hand editing needs. |
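Since an HTMLZ is described as a plain zip archive, its layout can be inspected with Python's standard zipfile module. The following is only a rough sketch: the function names, the role labels, and the sample file names (index.html, cover.jpg) are assumptions for illustration. The post itself only names metadata.opf, style.css and an images folder.

```python
import io
import zipfile


def summarize_htmlz(source):
    """Group the members of an HTMLZ archive by role.

    `source` is a path or a binary file-like object. The role labels are
    illustrative and not part of any calibre API.
    """
    summary = {"html": [], "css": [], "opf": [], "images": [], "other": []}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            lower = name.lower()
            if lower.endswith((".html", ".htm", ".xhtml")):
                summary["html"].append(name)
            elif lower.endswith(".css"):
                summary["css"].append(name)
            elif lower.endswith(".opf"):
                summary["opf"].append(name)
            elif lower.startswith("images/"):
                summary["images"].append(name)
            else:
                summary["other"].append(name)
    return summary


def build_sample_htmlz():
    """Build a tiny in-memory archive; the file names here are made up."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("index.html", "<html><body><p>Hello</p></body></html>")
        archive.writestr("style.css", "p { margin: 0 }")
        archive.writestr("metadata.opf", "<package/>")
        archive.writestr("images/cover.jpg", b"\xff\xd8")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(summarize_htmlz(build_sample_htmlz()))
```

For a real book, pass the path of the .htmlz file to summarize_htmlz; a conversion as described above should show exactly one entry under "html".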
04-02-2011, 10:30 PM | #2 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Thanks for doing this! I have subconsciously wanted this ever since I saw that Sony's first ADE implementation forced a maximum chunk size (300k) and required splitting your source .html to stay under it.
I've always WANTED .epub to be the master source for ebooks I create, and now the next best thing will be .HTMLZ (a single .html with images and its .opf!!!!). By the way, when I do want to preserve my single source .html in conversions to .epub, I usually feed calibre's "ebook-convert" these options:
Code:
--dont-split-on-page-breaks --flow-size=40000
Kudos for adding this functionality! Last edited by nrapallo; 04-02-2011 at 10:42 PM. |
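For anyone scripting this, the two options quoted above could be passed to ebook-convert from Python roughly as follows. This is a sketch only: the helper names and the file names are hypothetical, and it assumes calibre's ebook-convert command is on the PATH. Only the two flags come from the post.

```python
import subprocess


def build_epub_command(source_html, output_epub, flow_size=40000):
    """Return the ebook-convert argument list using the two options from the post."""
    return [
        "ebook-convert",
        source_html,
        output_epub,
        "--dont-split-on-page-breaks",
        f"--flow-size={flow_size}",
    ]


def convert_to_epub(source_html, output_epub):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_epub_command(source_html, output_epub), check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(build_epub_command("book.html", "book.epub")))
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is converted.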
04-09-2011, 05:44 PM | #3 | |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Quote:
|
|
04-10-2011, 05:21 AM | #4 |
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
|
THANK YOU!!!
|
04-10-2011, 07:04 PM | #5 | |
Addict
Posts: 254
Karma: 59872
Join Date: Dec 2009
Location: New York, USA
Device: Kindle 3 (wifi) + nokia n900 tablet phone
|
Quote:
As someone who has trouble with epub and has been depending on RTF, a single-page HTML file will make things so much easier! |
|
04-12-2011, 01:55 PM | #6 | |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Quote:
Would it be better for that type of tool/file type plugin to create an ".htmlz" file instead of a ".zip" file? |
|
04-12-2011, 02:15 PM | #7 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
HTMLZ input is supported. As for that plugin, I have no idea. Change the extension and see how it works.
|
04-12-2011, 04:52 PM | #8 | |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quote:
So I imported the book as a .zip (which works just fine, metadata is recognized, cover is recognized, etc) and then tried to convert it to a .htmlz which I was going to save and then compare to see how the .htmlz metadata.opf info was stored so that I could compare it to the .zip version. Unfortunately, converting the .zip to .htmlz failed with the following error message: calibre, version 0.7.54 ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Tank Driver: With the 11th Armored from the Battle of the Bulge to VE Day) Convert book 1 of 1 (Tank Driver: With the 11th Armored from the Battle of the Bulge to VE Day) Processing archive... Resolved conversion options calibre version: 0.7.54 {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0.0, 'book_producer': None, 'breadth_first': False, 'change_justification': u'original', 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']", 'chapter_mark': u'pagebreak', 'comments': None, 'cover': '/var/folders/0J/0JxyqG5bFGuePNZPaZd3-E+++TI/-Tmp-/calibre_0.7.54_tmp_nE26fS/calibre_0.7.54_vEs25X.jpeg', 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_package': False, 'enable_heuristics': False, 'extra_css': None, 'fix_indents': True, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'htmlz_class_style': u'external', 'htmlz_css_type': u'class', 'input_encoding': u'iso-8859-1', 'input_profile': <calibre.customize.profiles.InputProfile object at 0x1088b9150>, 'insert_blank_line': False, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0.0, 'linearize_tables': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 
'max_levels': 5, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_inline_navbars': False, 'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x1088b99d0>, 'page_breaks_before': u"//*[name()='h1' or name()='h2']", 'prefer_metadata_cover': False, 'pretty_print': False, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': '/var/folders/0J/0JxyqG5bFGuePNZPaZd3-E+++TI/-Tmp-/calibre_0.7.54_tmp_nE26fS/calibre_0.7.54_Cz_47v.opf', 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': u'', 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': None, 'sr1_search': None, 'sr2_replace': None, 'sr2_search': None, 'sr3_replace': None, 'sr3_search': None, 'tags': None, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2} InputFormatPlugin: HTML Input running on /var/folders/0J/0JxyqG5bFGuePNZPaZd3-E+++TI/-Tmp-/calibre_0.7.54_tmp_nE26fS/calibre_0.7.54_nQEnCB_plumber_archive/book.opf Parsing all content... Parsing book.html ... Generating default TOC from spine... Merging user specified metadata... Detecting structure... Auto generated TOC with 33 entries. Flattening CSS and remapping font sizes... style.css contains data in TXT format converting to HTML Converting style.css ... Parsing style.css ... Forcing style.css into XHTML namespace Stylesheet 'style.css' referenced by file 'book.html' is not CSS Source base font size is 12.00000pt Removing fake margins... Parsing stylesheet.css ... 
Found 270 items of level: div_1 Found 22 items of level: p_2 Found 718 items of level: p_1 Ignoring level p_2 div_1 left margin stats: Counter() div_1 right margin stats: Counter() p_1 left margin stats: Counter({u'0': 718}) p_1 right margin stats: Counter({u'0': 718}) Cleaning up manifest... Trimming unused files from manifest... Trimming 'style.css' from manifest Python function terminated unexpectedly: must be convertible to a buffer, not lxml.etree._Element Creating HTMLZ Output... Converting OEB book to HTML... Converting book.html to HTML... Traceback (most recent call last): File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 147, in main return run_entry_point() File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.7/site.py", line 116, in run_entry_point return getattr(pmod, func)() File "site-packages/calibre/utils/ipc/worker.py", line 119, in main File "site-packages/calibre/gui2/convert/gui_conversion.py", line 31, in gui_convert_override File "site-packages/calibre/gui2/convert/gui_conversion.py", line 25, in gui_convert File "site-packages/calibre/ebooks/conversion/plumber.py", line 1035, in run File "site-packages/calibre/ebooks/htmlz/output.py", line 76, in convert TypeError: must be convertible to a buffer, not lxml.etree._Element Interestingly, this same .zip converts just fine to an .epub (as far as I can tell) It also appears to have trouble with the style.css sheet as it declares that the "style.css is not CSS". Stylesheet 'style.css' referenced by file 'book.html' is not CSS This same file unzipped, imports quite nicely into Sigil and it does not seem to have any problem with the xhtml or css. |
|
04-12-2011, 05:00 PM | #9 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
The file extension not being recognized is an easy fix; I forgot to add it to the list. The metadata reader just reads the OPF from the archive. It's a standard OPF, just like the one you get when you save to disk. Now that you mention the cover, Kovid pointed out that it isn't supported when he committed the changes. Looks like I forgot to go back and add cover support.
Email me the file and I'll see what's going on. |
04-12-2011, 05:13 PM | #10 |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi user_none,
Will do. I looked at the debug output of the conversion and noticed that the style.css I am passing in seems to get converted into an .xhtml file (wrapped with its own <html><head><body> etc.) even though it is a normal linked CSS file (without any namespace declaration). So something is messed up with the style.css recognition too, as far as I can tell. I will send you a test .zip file. Thanks, KevinH |
04-12-2011, 05:21 PM | #11 |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
Sent you a .zip archive of the book. It imports fine (though maybe with the styles broken for some reason?), but converting it from .zip to .htmlz generated the error I reported. Thanks, KevinH |
04-12-2011, 07:24 PM | #12 | ||||
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
Quote:
Quote:
Code:
Flattening CSS and remapping font sizes...
style.css contains data in TXT format converting to HTML
Converting style.css ...
Parsing style.css ...
Forcing style.css into XHTML namespace
Stylesheet 'style.css' referenced by file 'book.html' is not CSS
Since this is in the ZIP input stage it's going to happen no matter what output is used. Kovid will need to weigh in on this and say if it's an issue or not.
Quote:
|
||||
04-12-2011, 07:33 PM | #13 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That error indicates that either the mimetype information for style.css is incorrect or missing in the opf file's manifest.
|
04-12-2011, 08:18 PM | #14 | |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Quote:
Thanks! It was staring me in the face but I just did not see it. The media-type for the stylesheet was set to "text.css" and not "text/css" in the manifest of the opf. I will pass that error along so that it gets fixed. KevinH |
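A typo like this is easy to miss by eye, so a small check over the OPF manifest can help. The sketch below is illustrative only: the function name and the sample OPF are made up to mirror the case described here, and it assumes the manifest uses the standard OPF namespace.

```python
import xml.etree.ElementTree as ET

# Standard OPF 2.0 package namespace.
OPF_NS = "{http://www.idpf.org/2007/opf}"


def find_bad_css_media_types(opf_text):
    """Return (href, media-type) pairs for .css manifest items not declared as text/css."""
    root = ET.fromstring(opf_text)
    problems = []
    for item in root.iter(OPF_NS + "item"):
        href = item.get("href", "")
        media_type = item.get("media-type")
        if href.lower().endswith(".css") and media_type != "text/css":
            problems.append((href, media_type))
    return problems


# Made-up manifest reproducing the "text.css" typo from this thread.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <manifest>
    <item id="book" href="book.html" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text.css"/>
  </manifest>
</package>"""

if __name__ == "__main__":
    for href, media_type in find_bad_css_media_types(SAMPLE_OPF):
        print(f"{href}: media-type is {media_type!r}, expected 'text/css'")
```

An empty result means every stylesheet listed in the manifest is declared as text/css; a missing media-type attribute is reported with None.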
|
04-12-2011, 08:19 PM | #15 | |
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi user_none,
Thanks for taking the time to look at and fix this! KevinH Quote:
|
|
|