Old 07-04-2017, 06:36 AM   #1
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
[Plugin] CustomCleanerPlus

An epub-specific custom cleaning utility.


Requirements
Plugin Type: Edit
MIT Licence(OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows, OSX or Linux
*** Tested on Windows 7, 8 & 10, OSX and Linux ***
Current Version: "0.6.1"

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).

* Click Add Plugin and select CustomCleanerPlus_vXXX.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Edit > CustomCleanerPlus.

Description
This epub-specific cleaner is an edit plugin that can be used to clean up epubs. It also transforms the html code in the epub to help ensure proper xhtml compliance with the epub format. This plugin no longer supports HTML clean-up but can now be used for cleaning up both Epub 2 and Epub 3 files (added in v0.5.0).

A characteristic feature of this plugin cleaner is that it is epub-specific and will not destroy, change or remove user styling.

This plugin cleaner is best used after using an epub converter in order to remove any dross or non-compliant proprietary data still remaining in the epub html or stylesheet.

Note: This plugin no longer supports any kind of HTML file input or HTML file clean-up. Only epub clean-up is now supported.
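For readers unfamiliar with how a Sigil edit plugin touches the book, here is a minimal sketch of the general pattern. It is not this plugin's actual code. It assumes the standard Sigil plugin entry point run(bk) and the book container calls text_iter(), readfile() and writefile(); the tidy_markup() helper is a hypothetical stand-in for a real cleanup pass.

```python
def tidy_markup(xhtml: str) -> str:
    """Hypothetical stand-in for a real cleanup pass: strips tab characters."""
    return xhtml.replace("\t", "")


def run(bk):
    """Entry point that Sigil calls; bk is the book container object."""
    # text_iter() yields (manifest id, href) for each xhtml file in the epub.
    for manifest_id, href in bk.text_iter():
        original = bk.readfile(manifest_id)
        cleaned = tidy_markup(original)
        # Only write back the files that actually changed.
        if cleaned != original:
            bk.writefile(manifest_id, cleaned)
    return 0  # zero tells Sigil the plugin finished without error
```

Because the edits go through the book container, nothing is changed on disk until Sigil accepts the plugin's result and you save the epub.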

Features

Automatic Cleanup Tasks
-- Thoroughly cleans out and reformats all html file sections.
-- Removes or changes all unneeded or non-compliant proprietary data in the html
-- Trims the epub stylesheet(s) - removes any unneeded or redundant class properties from the css
-- Repairs illegal id values that start with a digit or contain spaces, in html files only (added in v0.2.8)
-- Ensures that all ebook image formatting is at least epub 2 compliant
-- Removes all unused bookmarks from the epub (added in v0.3.2)
-- Removes all empty spans (i.e. spans that contain no styling or classes)
-- Removes all tabs
-- Repairs the image names by replacing all illegal spaces with underscores. Added in v0.5.0.
-- Adds the relevant chapter/section heading string to the <title></title> tags. Added in v0.5.0.
-- Adds a valid 'alt' attribute value to all <img> tags, which is derived from the image filename. Added in v0.5.0.
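
To make three of the automatic tasks above concrete (id repair, image name repair and alt text derived from the image filename), here is a rough sketch. It is only an illustration and not the plugin's own code: the function names, the "id_" prefix and the choice of underscores for spaces inside ids are all my assumptions.

```python
import re
from pathlib import PurePosixPath


def repair_id(value: str, prefix: str = "id_") -> str:
    """Make an id legal: no whitespace, and no leading digit.

    The "id_" prefix and the underscore are assumed choices for this sketch.
    """
    fixed = re.sub(r"\s+", "_", value.strip())
    if fixed and fixed[0].isdigit():
        fixed = prefix + fixed
    return fixed


def repair_image_name(href: str) -> str:
    """Replace spaces in an image path with underscores."""
    return href.replace(" ", "_")


def alt_from_filename(href: str) -> str:
    """Derive readable alt text from the image filename (extension dropped)."""
    stem = PurePosixPath(href).stem
    return re.sub(r"[_\-]+", " ", stem).strip()
```

Note that any link pointing at a repaired id or a renamed image has to be updated with the same mapping, otherwise the repair just breaks the link.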

User Options 1: -- Added in v0.5.3
-- Convert all <i>, <b>, <em>, <u>, <s> and <strong> tags to span tag styling
-- Remove all unnecessary ad hoc black text color declarations from the css and from the html. Added in v0.5.0.
-- Remove all unnecessary ad hoc white background color declarations from the css and from the html. Added in v0.5.0.
-- Reformat ebook images to use percentage screen width values, which helps to normalize smaller image sizes across all ereaders.
-- Remove any unused images stored in Sigil's 'Images' directory. Added in v0.5.0.
-- Remove all hard breaks (<br> tags) caused by the enter key (added in v0.3.5)
-- Remove all non-breaking spaces from the html (i.e. &nbsp;, *). Added in v0.5.0.
-- Remove all hyphen class properties from the CSS (e.g. hyphens, adobe-hyphenate, -moz-hyphens, -webkit-hyphens etc). Added in v0.5.0.
-- Remove all font family declarations in the CSS. Added in v0.5.0.
-- Remove empty paragraphs that contain no text. Added in v0.5.0.
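
The CSS options above (hyphen properties, font family declarations) all come down to deleting named declarations from the stylesheet. A minimal regex-based sketch of that idea follows. The function name and the HYPHEN_PROPS tuple are my own illustration, not taken from the plugin, and a simple regex like this does not understand CSS comments or strings.

```python
import re

# Property names taken from the option description above.
HYPHEN_PROPS = ("hyphens", "adobe-hyphenate", "-moz-hyphens", "-webkit-hyphens")


def remove_css_properties(css: str, names) -> str:
    """Delete every 'name: value;' declaration whose name is in names."""
    # Longest names first so a short name never shadows a longer one.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    pattern = re.compile(
        # The lookbehind stops "hyphens" matching inside "-moz-hyphens".
        r"[ \t]*(?<![\w-])(?:%s)\s*:[^;{}]*;?[ \t]*\n?" % alternation,
        re.IGNORECASE,
    )
    return pattern.sub("", css)
```

Usage would be something like remove_css_properties(css, HYPHEN_PROPS) or remove_css_properties(css, ("font-family",)). Be aware that the second call would also strip font-family from any @font-face rule, so embedded fonts need separate handling.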

User Options 2: -- Added in v0.5.3
-- Remove all internal ids and associated link formatting while preserving the text. Added in v0.5.0. Now also removes id fragments from both internal links and from the OPF links in the guide section as well. Added in v0.5.3
-- Replaced all <div> tags
-- Remove all internet link formatting
-- Remove all internal link formatting
-- Remove the line-height property from CSS. Added in v0.5.3
-- Remove all horizontal rule html tags from the html. Added in v0.5.0
-- Remove all ids/bookmarks
-- Replace <div> tags with <p> tags. Added in v0.5.0.
-- Remove page links only. Added in v0.5.0.
-- Remove empty spans i.e. remove spans that contain no formatting. Added in v0.5.3
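
Removing empty spans is trickier than it looks, because a span with no attributes can wrap a styled span, and the wrong closing tag must not be removed. The sketch below shows one way to do it with a small stack. It is my own illustration under the assumption of well-formed xhtml, not the plugin's implementation, and strip_bare_spans is a hypothetical name.

```python
import re

# Matches opening, closing and self-closing span tags.
SPAN_TAG = re.compile(r"<(/?)span\b([^>]*)>", re.IGNORECASE)


def strip_bare_spans(xhtml: str) -> str:
    """Remove spans that carry no attributes, keeping the text inside them."""
    out = []
    dropped_stack = []  # one flag per open span: True if its tag was dropped
    pos = 0
    for match in SPAN_TAG.finditer(xhtml):
        out.append(xhtml[pos:match.start()])
        pos = match.end()
        is_closing = bool(match.group(1))
        attrs = match.group(2).strip()
        if is_closing:
            # Drop the close tag only if its matching open tag was dropped.
            dropped = dropped_stack.pop() if dropped_stack else False
            if not dropped:
                out.append(match.group(0))
        elif attrs in ("", "/"):
            # Bare <span> or <span/>: drop it; only <span> expects a close tag.
            if attrs == "":
                dropped_stack.append(True)
        else:
            # Styled span: keep it, and remember that its close tag stays too.
            if not attrs.endswith("/"):
                dropped_stack.append(False)
            out.append(match.group(0))
    out.append(xhtml[pos:])
    return "".join(out)
```

For example, a bare span wrapping a span with a class loses only the outer pair of tags, while the inner styled span and all of the text are preserved.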

Plugin Run
Before running this plugin for the first time -- please make absolutely sure that you have Visual Studio 2010 (VC++ 10.0) SP1 already installed on your computer system. If it isn't installed then you should install it. Doing this will ensure that there are no tidy.dll access errors when you run this plugin.

First load your epub into Sigil and then run this plugin. Before you run it, also ensure that your epub is fully formed and contains the appropriate ebook cover, html files, xml files, stylesheet(s), images etc. After running the plugin, it is advisable to run several passes of Sigil's Tools > Delete Unused Stylesheet Classes, or to use the cssRemoveUnusedSelectors plugin, to mop up any empty or unused classes left in your epub's stylesheet(s) after the clean-up.

Caveats
Avoid using fake smallcaps in your doc headings, as this can cause nested <font> tag problems. It is best to add the fake smallcaps to your epub styling in Sigil after running this plugin. You will normally get the best results by using only paragraph style formatting for all text, headers and spacing in your doc. Do not use tables or captions in your html doc with this plugin.

Changes
Spoiler:

v0.6.1
-- Fixed code indent problem in svgAttributes2CamelCase().
-- Fixed spacing issues in removeTextColorBlack() and in removeBGColorWhite().
-- Removed svgAttributes2CamelCase_New()[unused code].
-- My thanks to @BeckyeBook for reporting these problems.
v0.6.0
-- Fixed 4 problems in the save prefs code section of the second dialog window.
-- Fixed an indent problem in the removeUnusedBookmarks() module in cutils2.py.
-- Fixed an indent problem in the svgAttributes2CamelCase() module in cutils.py.
-- Widened the width of both dialog windows to accommodate width differences for Linux users.
-- My thanks to both @DNSB and @BeckyeBook for spotting these problems.
v0.5.5
-- Fixed a problem where the pre-selected option for Remove empty spans was not working.
v0.5.4
-- Fixed several minor problems with the "Remove all ids and associated links" option.
-- Fixed a minor problem with the "Remove all page links" option.
-- Fixed a minor problem with the "Remove all ids/bookmarks" option.
v0.5.3
-- Added a new dialog option to remove line-height properties from the CSS.
-- Added a new dialog option to remove all empty spans in the epub html.
-- The "Remove all ids/bookmarks" option now also removes id fragments both from internal links and from the guide links in the OPF section as well.
-- The original dialog window was becoming too large, so I've split it into 2 separate, simplified dialog windows.
-- Increased the width of both new dialog windows to accommodate the width difference between Windows and Linux HD displays.
v0.5.2
-- Fixed a meta "name" id attribute problem in the xml namespace of the cover file (thanks to @Leonatus).
v0.5.1
-- Fixed copy SVG image file problem (thanks to @DNSB).
-- Fixed the remove hyphen class properties problem (thanks to @DNSB).
-- Added a reminder in the release notes to ensure the Microsoft Visual C++ SP1 is installed to avoid Tidy access problems.
v0.5.0
-- Major update that includes multiple fixes and added features. See the Release Notes or go to the Major Update(v0.5.0) post on this thread for details.
v0.4.9
-- Fixed a problem with the remove internet links option.
-- Fixed a problem with the remove internal links option.
v0.4.8
-- Now adds a CSS link to xhtml files whenever "Move the html CSS to new stylesheet" is run(imported html only).
v0.4.7
-- New functionality. Added an option to allow the user to move the html CSS to a new stylesheet(for imported html only).
-- Fixed a problem with removing unused bookmarks. The plugin now automatically removes unused bookmarks from both imported html and epub.
v0.4.6
-- Fixed a bug in html cleanup.
v0.4.5
-- Fixed several problems with html <styles> section rendering for imported html only.
-- Fixed a name/uuid problem in the OPF meta tags for imported html only.
-- Now automatically adds language(as "en") and timestamp to epub metadata for imported html only.
v0.4.4
-- Modified and improved the main window layout.
v0.4.3
-- Fixed a bug when "Convert all <i>, <b>, <em>, <u>, <s> and <strong> tags to span tag styling" is selected. Now converts ALL instances of the relevant tag found in each paragraph to span tag styling.
v0.4.2
-- Fixed a bug whereby the plugin would crash if the input epub had a stylesheet named "styles.css".
v0.4.1
-- Fixed bug in removeUnusedBookmarks()
v0.4.0
-- Removed success dialog window. Requested by Gregg Bell.
v0.3.9
-- Fixed a bug where the plugin always converted <b>, <u>, <i> etc tags to their span tag equivalents. The plugin will now only convert these tags to their span tag equivalents if the "Convert all <i>, <b>, <em>, <u>, <s> and <strong> tags to span styling" dialog option is selected. Thanks to Gregg Bell.
v0.3.8
-- The plugin no longer removes html space entities -- specifically "&nbsp;" and "&#160;" -- from the epub or html during cleanup. Thanks to Gregg Bell.
-- Added a final success dialog to the plugin.
v0.3.7
-- The plugin will no longer automatically convert the epub's default font to serif. The default font will now remain unchanged unless the "Convert all ebook text and headings to default serif throughout" option is chosen. Thanks to Gregg Bell.
v0.3.6
-- Allowed resizing of the Cleanup dialog window. Also increased dialog width, increased spacing between buttons and reduced font size. Thanks to Thasaidon.
v0.3.5
-- The automatic removal of soft/hard breaks by the plugin has been removed. This capability has been moved and added to the Cleanup Options dialog as a new user option.
-- Fixed a problem with missing 'xmlns:xlink' attributes in svg code after plugin rendering.
-- Fixed a bug in the options dialog.
v0.3.4
-- Fixed an svg problem where any epub files with svg images would give Epubcheck errors. This plugin can now process epubs that contain svg images without svg errors.
v0.3.3
-- The plugin will now remove any anchor tag that contains no attributes.
v0.3.2
-- The plugin now removes all unused bookmarks from the epub or imported html doc.
v0.3.1
-- For all anchor tags that just contain ids(with no href link), the plugin now ensures that the anchor's end tag will always be positioned at the start of the parent tag string.
v0.3.0
-- Fixed a problem with missing styles after Google cleanup
v0.2.9
-- Fixed a problem with the version updater.
-- Fixed a formatting problem with the epub TOC file.
v0.2.8
-- Now automatically repairs illegal digit-start-char and spacing id values in html files only
v0.2.7
-- Removed default CSS formatting(unnecessary)
v0.2.6
-- Fixed a CSS link bug occurring in Word derived epubs only.
v0.2.5
-- Added MIT SW Licence
v0.2.4
-- fixed div/anchor bug
v0.2.3
-- fixed css display: none bug
-- fixed css semi-colon bug
-- fixed css curly brace spacing bug
-- fixed toc item spacing bug(Scrivener epubs only)
v0.2.2
-- Improved cleanup on exit
-- Now automatically fixes all standalone <img> tag displays in the html
v0.2.1
-- Adjusted base pixel screen width for % width calculation in the reformat image function.
v0.2.0
-- Now ensures proper removal of all page break declarations from the epub css.
v0.1.9
-- Fixed a problem with multi-line comments in the css. Thanks to wrCisco.
v0.1.8
-- Fixed an image reformatting problem. Thanks to roger64.
v0.1.7
-- Increased user dialog font size for better readability on OSX. Thanks to KevinH.
v0.1.6
-- Fixed an html css formatting issue for Word doc html input.
v0.1.5
-- Fixed an issue with the 'Save selection' chkbutton not properly saving current chkbutton selections in the user dialog window.
v0.1.4
-- Fixed cleanup issues on exit.
v0.1.3
-- Added a "Save selection" chkbox to allow saving current user dialog selections for future sessions. Requested by Thasaidon.
v0.1.2
-- Changed from using default font size to fixed font size(9pt) to try and help alleviate Arch Linux dialog window issues.
v0.1.1
-- Added checks to ensure epub 2.0 only documents are used with this plugin. My thanks to DiapDealer & Doitsu.
v0.1.0
-- Initial release
Attached Thumbnails
Dialog1.JPG (58.6 KB)
Dialog2.JPG (40.1 KB)
Attached Files
File Type: zip CustomCleanerPlus_v061.zip (1.14 MB, 353 views)

Last edited by slowsmile; 12-18-2023 at 12:13 AM.