[Plugin] IDErrorCheck
2 Attachment(s)
Checks, repairs and reports all id errors in the epub.

Requirements
Plugin Type: Edit
MIT Licence(OSI)
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows, Linux or OSX
*** Tested on Windows 7, 8 & 10 only ***
Current Version: "0.2.1"

Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).
* Click Add Plugin and select IDErrorCheck_vXXX.zip. This will load and install the plugin into Sigil, which you can then run by selecting Plugins > Edit > IDErrorCheck.

Description
This plugin was originally written with the sole intention of properly reporting and, if possible, fixing Epubcheck's infamous "colon" id error problems. This plugin now also does the following:
* Converts all "name" attributes to "id" attributes in the html files.
* Checks and repairs all invalid id attribute values in the epub's html files: illegal spaces, illegal first-digit-start errors and other illegal non-alphanumerics that commonly occur within id attribute values. (v0.1.5)
* Checks and repairs all internal links that contain bad bookmarks associated with the above html id problems. (v0.1.5)
* Checks and repairs all book uuid values in the toc.ncx and content.opf. If an illegal book uuid value is found then another unique uuid will be automatically generated to replace it. (v0.1.5)
* Checks and repairs all navPoint id values in the toc.ncx. (v0.1.5)
* Checks and logs all id errors occurring in the content.opf manifest or spine without fixing them.
* Properly checks, flags and identifies Epubcheck's "colon" id errors and fixes these errors.
* At the end of the plugin run, an error dialog will display a simple error list showing all relevant information about each id error, including the associated file, line number, reason and bad id.

Caveat
Don't use the "Mend and prettify..." Sigil feature directly after using this plugin. Doing so will change and increase the number of lines in the html files, so any error line numbers reported by the plugin become inaccurate.

Plugin Run
First load your epub into Sigil and then just run the plugin. If you only want to know which errors have not been fixed, run the plugin twice. The first run's display log shows the errors that have been fixed and those that have not. The second run shows only what has not been fixed.

Update: This plugin can now process epubs that contain svg images without giving svg errors in Epubcheck.
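For readers curious how the book uuid repair described above could work, here is a minimal Python sketch. It is not the plugin's actual code: the function name `ensure_valid_book_uuid` is hypothetical, and it assumes the identifier is stored in the common `urn:uuid:` form.

```python
import uuid

URN_PREFIX = "urn:uuid:"

def ensure_valid_book_uuid(value):
    """Return `value` unchanged if it parses as a UUID, else a fresh urn:uuid.

    Hypothetical helper for illustration only; the plugin's real logic may differ.
    """
    candidate = value.strip()
    if candidate.lower().startswith(URN_PREFIX):
        candidate = candidate[len(URN_PREFIX):]
    try:
        uuid.UUID(candidate)  # raises ValueError on a malformed string
        return value
    except ValueError:
        # Generate a new random (version 4) UUID to replace the bad one.
        return URN_PREFIX + str(uuid.uuid4())
```

Whatever replacement is generated, the same value would have to be written to both the toc.ncx and the content.opf so the two stay in sync.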
|
Is this plugin's functionality now all included in your CustomCleanerPlus plugin?
A note: you seem to change IDs beginning with a digit by replacing that digit with an x. That will probably be fine, but it could create duplicate IDs, e.g. id="1" and id="2" both become id="x". I manually corrected IDs by prepending X. There must be a limit to the length of an ID string, so I guess you should check whether adding a character would push it over that limit if you were really being careful. Or just forget the original IDs and regen them all. |
@AlanHK...
Quote:
The just-released IDErrorCheck does swap in an 'x' char for first char digit errors only. It also substitutes an underscore in all id values that have illegal spaces. It also regens both book ids in the toc.ncx and content.opf files if they are bad. That's all it fixes. All other illegal id values -- such as those containing illegal non-alphanumeric chars -- are just reported. ID attribute errors in the content.opf are also not fixed -- just reported -- because of the complex rules and myriad dependencies between ids and hrefs within the content.opf and toc.ncx. |
I think what he's saying is that replacing any first-digits in an id with an 'x' could possibly result in identical ids in the same html file. Prepending the 'x' (instead of swapping) would at least guarantee that already unique ids would stay that way.
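To make the difference concrete, here is a small Python sketch comparing the two strategies. The function names `swap_first_digit` and `prepend_x` are made up for this illustration and are not taken from the plugin.

```python
import re

def swap_first_digit(id_value):
    """Replace a leading digit with 'x' (the original behaviour discussed above)."""
    if re.match(r"[0-9]", id_value):
        return "x" + id_value[1:]
    return id_value

def prepend_x(id_value):
    """Keep the whole id and put an 'x' in front of a leading digit."""
    if re.match(r"[0-9]", id_value):
        return "x" + id_value
    return id_value

ids = ["1", "2", "1a", "2a"]
swapped = [swap_first_digit(i) for i in ids]    # ['x', 'x', 'xa', 'xa']
prepended = [prepend_x(i) for i in ids]         # ['x1', 'x2', 'x1a', 'x2a']

print(len(set(swapped)), len(set(prepended)))   # prints "2 4"
```

Prepending keeps digit-leading ids distinct from each other, but it could still collide with an id that already exists in the file (for example "1" next to an existing "x1"), so a careful fix would also check the result against the ids already in use.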
|
@DiapDealer...I'll try and put in the suggested change. This change will only apply to fixing the first char digit errors in the epub.
|
Plugin Update: The plugin has been updated(v0.1.2):
*Changed handling of illegal first char digit id errors. These errors are now fixed by prepending(not substituting) an 'x' char into the id value string. Thanks to AlanHK & DiapDealer. |
Could someone please add this new plugin to the Sigil Plugin Index? Thanks in advance.
|
Just added it.
|
1 Attachment(s)
For illegal first-digit-start errors, the plugin replaces the id after the hash in links, but the incorrect IDs themselves are not fixed.
Sample illegal ID:
Code:
<h1 id="123abc">Chapter 1</h1>
Code:
<a href="../Text/start.xhtml#123abc">Chapter 1</a>
The second is corrected to:
Code:
<a href="../Text/start.xhtml#x123abc">Chapter 1</a>
|
@Becky...It's certainly true what you say. But here's what it says in the release notes:
Quote:
If you want to see the problem that Epubcheck has with describing bad ids then you could try running your test epub(with bad ids) through Epubcheck. Then you will see the problem with Epubcheck's strange error messaging, which always seems to involve phantom colons that aren't there. |
Thanks for the clarification.
I also understand the "phantom colon", because in most cases it is an id that starts with a number. However... where do I get the "proper reasons for any id failure"? The IDErrorCheck log contains only records of the changes made (in the example epub file, those are in the toc.xhtml file). Why does the log have no records about the start.xhtml file and its incorrect IDs? Information about the changes made is valuable, but the file is still left with incorrect identifiers. EpubCheck gives even more results, because it reports not only:
Code:
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.
but also:
Code:
Fragment identifier is not defined.
|
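As an aside on what "must be an XML name without colons" means in practice, the following Python sketch names the specific reason an id fails. It is a simplification for illustration: it only accepts ASCII characters (real XML names also allow many non-ASCII letters), and `explain_id_problem` is a hypothetical helper, not part of IDErrorCheck or EpubCheck.

```python
import re

# ASCII-only approximation of an XML name without colons:
# starts with a letter or underscore, then letters, digits, '_', '.', '-'.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

def explain_id_problem(id_value):
    """Return a human-readable reason why the id is invalid, or None if it passes."""
    if not id_value:
        return "empty id"
    if ":" in id_value:
        return "contains a colon"
    if re.match(r"[0-9]", id_value):
        return "starts with a digit"
    if any(ch.isspace() for ch in id_value):
        return "contains whitespace"
    if not NCNAME_ASCII.match(id_value):
        return "starts with or contains a character not allowed in an XML name (ASCII check)"
    return None
```

With this check, the sample id "123abc" is reported as starting with a digit, with no mention of colons.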
@BeckyEbook: You can avoid this whole issue, if you create epub3 books, because the HTML5 standard allows ids that don't start with a letter.
If that is not an option for you, you can easily identify broken links using the built-in Sigil reports tool (Tools > Reports > Links). |
@Doitsu: This is good information about epub3, but most of the files that go through my hands are still epub2.
The report is not perfect in this situation, because I see the same after validation in epubcheck. It's just a simple replacement, which I can add to Saved Searches:
Find:
Code:
id="(\d)
Replace:
Code:
id="x\1
|
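For anyone who prefers to script it, the same find/replace pair can be written with Python's `re` module. This is only a sketch of the saved search above; the function name `prefix_digit_ids` is made up for illustration.

```python
import re

# Same pattern as the saved search: an id value whose first character is a digit.
ID_STARTS_WITH_DIGIT = re.compile(r'id="(\d)')

def prefix_digit_ids(html):
    """Insert an 'x' before the leading digit of every matching id value."""
    return ID_STARTS_WITH_DIGIT.sub(r'id="x\1', html)

print(prefix_digit_ids('<h1 id="123abc">Chapter 1</h1>'))
# prints: <h1 id="x123abc">Chapter 1</h1>
```

Like the saved search, this pattern is deliberately simple: it would also touch any other attribute whose name ends in id (for example a data-id attribute), and it does not handle single-quoted values, so it is worth reviewing the matches before replacing everything.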
1 Attachment(s)
I'm not quite sure what you mean by "start.xhtml". Can you clarify what that file is - i.e. is it the cover file, toc file or a text file?
At the end of its run, the IDErrorCheck plugin should display all the results from the id error check in a final dialog. You also have the option of saving these results to a file if you want. Are you getting this dialog at the end of the plugin run? (See thumbnail below.) |
Quote:
https://i.imgur.com/8YBsmJom.png The log contains only the replacements made in the toc.xhtml file (after the hashes). |
MobileRead.com is a privately owned, operated and funded community.