05-19-2010, 06:31 PM | #1 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
BookDesigner HTML0 to clean HTML conversion utility
Many of us still use BookDesigner to edit our books, or have some old backups of books created in the past in "html0" format.
Uncompressed "html0" files are in fact HTML files, but they do not import well into other programs such as Sigil and require a lot of manual work to fix. I wrote this quick and dirty utility to convert my "html0" files to simpler, cleaner HTML files that can be easily imported with Sigil, and would like to share it with everybody here.

Installation
Just uncompress the attached file and put it anywhere on your disk.

Usage
You will need an uncompressed html0 file as input. If your file is compressed, just open it with BookDesigner. An uncompressed version of the file will be written to the Lastfile directory of BD's installation folder.
Run the file HTML02HTML.exe, press the "File" button and search for your html0 file. After that, click on the "Convert" button.
The source file remains unchanged. A new file with the same name as the source file (but with an "html" extension) will appear in the same directory as the source file. Copy the "style.css" file from the HTML02HTML directory to this directory. Open the new file with a browser or HTML editor.

Conversion operations
- Book Title and Book Author are converted to H1 tags.
- Titles are converted to H2 tags.
- Subtitles are converted to H3 tags.
- <DIV> tags are converted to <p> tags.
- Paragraph indentation with &nbsp; is removed.
- A link to "style.css" is included.
- <HR> tags are deleted.
- Blank lines are replaced with <br />
- Notes and Links are fixed to work both ways (link to endnotes and "go back" links)

Notes
The program works in Win XP; I don't know if it works on Vista or W7.
This is a preliminary version, barely tested with 3 files. Probably buggy. I would like some feedback to make it more useful.

EDIT: I uploaded a modified version of the utility. Changes:
- Encoding information copied to the output html.
- Author and Book Title metadata added to the output html. This metadata is recognised as such by Sigil when importing the file.
- Processing of <HR> tags is now user selectable: they can be deleted, kept or replaced by Sigil Chapter Breaks. The latter simplifies file splitting in Sigil 0.2.0.

EDIT: Uploaded new version
EDIT: Uploaded new version
EDIT: Uploaded new version with optional paragraph splitting as requested by JSWolf
EDIT (29 May): Uploaded new version with 2 new options
EDIT (1 June): Uploaded new version with support for all BookDesigner styles and minor bug fixes.
EDIT (18 August): Uploaded new version. Changes:
- Minor bugfixes in css
- New option "Different style for first paragraph after h2/h3"
- H1 tags (Title and Author) automatically excluded from TOC in Sigil
- <br /> tags no longer used for blank lines. When there is a blank line before a paragraph, the style of the paragraph is changed to include a margin at the top of it. This produces better results when imported with Sigil (thanks charlesky for your help with this).

I am also including an html0 version of Stevenson's "The Master of Ballantrae" (unzip before using) that can be used to test the utility.
Last edited by Pablo; 08-18-2010 at 09:10 PM. Reason: New version uploaded |
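For readers who want to see what a few of the conversion operations listed in the first post amount to, here is a minimal Python sketch. It is not the utility's actual code: the sample markup, the `class="x"` attribute and the function name `convert_html0_fragment` are all made up for illustration, and real html0 files may look quite different.

```python
import re

def convert_html0_fragment(html: str) -> str:
    """Apply three of the listed conversions to an html0-like fragment.

    Hypothetical helper: the real html0 markup may differ.
    """
    # <DIV ...> ... </DIV> becomes <p> ... </p>
    html = re.sub(r"<div\b[^>]*>", "<p>", html, flags=re.IGNORECASE)
    html = re.sub(r"</div\s*>", "</p>", html, flags=re.IGNORECASE)
    # Remove runs of &nbsp; used as indentation right after an opening <p>
    html = re.sub(r"<p>(?:\s*&nbsp;)+\s*", "<p>", html)
    # Delete <HR> tags together with the whitespace that follows them
    html = re.sub(r"<hr\b[^>]*>\s*", "", html, flags=re.IGNORECASE)
    return html

# Invented sample input, not taken from a real BookDesigner file
sample = (
    '<DIV class="x">&nbsp;&nbsp;&nbsp;It was a dark night.</DIV>\n'
    "<HR>\n"
    "<DIV>&nbsp;Next paragraph.</DIV>"
)
print(convert_html0_fragment(sample))
```

Regular expressions are good enough for a sketch like this, but a proper HTML parser would be a safer choice for messy input.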
05-20-2010, 05:10 PM | #2 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
I would like to add a "Smart quotes" option, to change straight quotes to curly quotes.
What would be the best approach? Just write the actual characters, or use decimal numeric character references such as &#8220; for the left double quotation mark? I think this also depends on the text encoding. Any ideas? Last edited by Pablo; 05-20-2010 at 05:14 PM. |
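To make the two options in the question above concrete, here is a small Python sketch (my own illustration, not something taken from the utility). It shows that a decimal numeric character reference is plain ASCII and so survives any encoding, while the literal curly quote only works if the output encoding can represent it. The function name `to_numeric_refs` is hypothetical.

```python
LEFT_DQ = "\u201c"   # left double quotation mark
RIGHT_DQ = "\u201d"  # right double quotation mark

def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

quoted = LEFT_DQ + "Hello" + RIGHT_DQ

# Option 1: numeric references, independent of the file encoding
print(to_numeric_refs(quoted))  # &#8220;Hello&#8221;

# Option 2: literal characters, fine in UTF-8 or Windows-1252 ...
print(quoted.encode("utf-8"))
print(quoted.encode("cp1252"))

# ... but not representable in ISO-8859-1 (Latin-1)
try:
    quoted.encode("latin-1")
except UnicodeEncodeError:
    print("Latin-1 cannot store curly quotes")
```

So numeric references look like the safer default when the encoding of the output file is not known in advance.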
05-21-2010, 05:15 PM | #3 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
I just uploaded a new version of the utility (see first post for the file).
Changes:
- css file generated automatically in the destination directory - no need to copy it manually.
- Added option to change straight quotes to curly quotes ("smart quotes")
- Added results window with save to file option

Notes:
- To use the utility you need to generate your ebook in BookDesigner format with Make ebooks --> BookDesigner (html0), and after that open it from BookDesigner with File --> Open book. If you make any changes to the book, you have to repeat the process. The reason for this is that when you save the file, BD makes some format changes in the html0 file in Lastfile that are not handled well by this utility. Generating the book and opening it again with File --> Open book restores the formatting.
- The utility supports BookTitle, BookAuthor, Paragraph, Title, Subtitle and Notes and Links.
- Verse, text author, annotation and epigraph are not supported by the utility.
- The smart quotes option will not always produce good results. Particularly problematic are possessives of words ending in 's' and spellings of slang words. When the program finds something problematic, it indicates the line in the results window. In this case you will need an editor that displays line numbers, such as Notepad++, to locate the line and correct it manually.
Last edited by Pablo; 05-21-2010 at 05:20 PM. |
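The smart quotes caveat above (possessives of words ending in 's', slang spellings) can be illustrated with a rough Python sketch. This is only my guess at the general idea, not how the utility really does it: the rules, the name `smart_quotes` and the sample lines are assumptions. Unambiguous cases are converted, and any line that still contains a straight single quote is reported by line number for manual correction.

```python
import re

def smart_quotes(lines):
    """Return (converted_lines, flagged_line_numbers). Hypothetical rules."""
    converted, flagged = [], []
    for number, line in enumerate(lines, start=1):
        # An apostrophe between two word characters (don't, it's) is unambiguous
        line = re.sub(r"(?<=\w)'(?=\w)", "\u2019", line)
        # A double quote at line start or after whitespace/bracket opens a quotation
        line = re.sub(r'(?:^|(?<=[\s(\[]))"', "\u201c", line)
        # Every remaining double quote is treated as a closing one
        line = line.replace('"', "\u201d")
        # A leftover single quote may be a possessive (boys'), slang ('em)
        # or a real quotation mark, so leave it alone and report the line
        if "'" in line:
            flagged.append(number)
        converted.append(line)
    return converted, flagged

sample_lines = ['"It\'s the boys\' dog," she said.', "He didn't answer."]
result, problems = smart_quotes(sample_lines)
for text in result:
    print(text)
print("Check manually, lines:", problems)
```

In this example only line 1 is flagged, because of the possessive "boys'".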
05-22-2010, 08:50 PM | #4 |
Resident Curmudgeon
Posts: 75,890
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Thanks. This sounds a lot better than saving as HTML from Book Designer. I will give it a go and let you know how it works.
|
05-23-2010, 07:16 AM | #5 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
|
05-23-2010, 01:55 PM | #6 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
Just uploaded a new version:
Changes:
- Fixed bug that caused left or right-justified text to convert incorrectly.
- Paragraphs are now split into several lines no longer than 70 chars.
See first post for the file |
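The 70-character paragraph splitting mentioned above is easy to picture with Python's standard `textwrap` module. This is just a sketch of the idea, assuming the split happens at spaces; the function name `split_paragraph` and the sample text are made up.

```python
import textwrap

def split_paragraph(paragraph: str, width: int = 70) -> str:
    """Break one long paragraph into lines of at most `width` characters."""
    lines = textwrap.wrap(
        paragraph,
        width=width,
        break_long_words=False,   # never cut inside a word or a tag
        break_on_hyphens=False,   # keep hyphenated words together
    )
    return "\n".join(lines)

paragraph = "<p>" + "word " * 40 + "end.</p>"
wrapped = split_paragraph(paragraph)
print(wrapped)
```

Since a browser treats a line break inside a paragraph as ordinary whitespace, the rendered text does not change; only the source becomes easier to read in an editor. A single word longer than the width would still end up on an overlong line with these settings.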
05-23-2010, 01:56 PM | #7 |
Resident Curmudgeon
Posts: 75,890
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
05-23-2010, 02:03 PM | #8 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
|
05-23-2010, 06:01 PM | #9 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
Uploaded a new version. The only change is that long paragraph splitting is now optional.
|
05-23-2010, 11:05 PM | #10 |
Resident Curmudgeon
Posts: 75,890
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Thanks. I'll give it a go later this week. Maybe Tuesday.
|
05-29-2010, 05:55 PM | #11 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
Uploaded a new version of the utility.
Changes:
- Added option to suppress inner nbsp (non-breaking spaces)
- Added option to add <h> tags around IMG tags, with text Figure 1, 2, ... so that images can be indexed.
See first post for the utility. |
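A rough Python sketch of the two options above, for anyone curious. The exact markup the utility writes is not described in the thread, so the heading level (h3), the placement of the "Figure N" text and the choice to turn each &nbsp; into a normal space are all assumptions, as are the function names.

```python
import re
from itertools import count

def number_figures(html: str, level: int = 3) -> str:
    """Wrap each <img> in a heading with the text 'Figure N' (assumed layout)."""
    counter = count(1)

    def wrap(match):
        n = next(counter)
        return f"<h{level}>Figure {n} {match.group(0)}</h{level}>"

    return re.sub(r"<img\b[^>]*>", wrap, html, flags=re.IGNORECASE)

def suppress_inner_nbsp(html: str) -> str:
    """Replace non-breaking space entities with ordinary spaces (assumed behaviour)."""
    return html.replace("&nbsp;", " ")

# Invented sample input
page = '<p>a&nbsp;b</p><IMG src="1.jpg"><p>c</p><img src="2.jpg">'
print(number_figures(suppress_inner_nbsp(page)))
```

Headings of this kind are what lets a tool that builds its table of contents from heading tags pick up the images.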
06-01-2010, 05:53 PM | #12 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
Uploaded a new version (see first post for the file)
Changes:
- Fixed a bug in the closing tag </h...> in image references.
- Added support for the remaining BD styles (epigraph, text author, verse, annotation, epigraph+text author) |
06-14-2010, 03:33 PM | #13 |
Resident Curmudgeon
Posts: 75,890
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I will have to give this a go later on tonight or tomorrow. I keep forgetting. Sorry.
|
06-15-2010, 07:03 PM | #14 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
|
08-18-2010, 09:13 PM | #15 |
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
|
I just uploaded a new version (see first post).
|
|