Quote:
Originally Posted by roger64
To my shame, I never used the "report" feature of Sigil and it's indeed a nice one. 
|
Yeah, the Reports are great! I use the Character Report at least once per EPUB just to make sure I catch any anomalies (I do a lot of PDF -> OCR -> EPUB). Most of the time my source already has the actual characters, and I rarely use entity names, so I never ran into this problem.
Over the past week and a half I have been importing 15 years' worth of articles (~6500) into Sigil and cleaning them all up to prepare a few gigantic yearly EPUB releases (~300 articles per EPUB). In this case, the original HTML used entity names. I wanted to do some cleanup in Sigil, then compare the code against the originals (this is why I want the entities there), and then easily swap back to characters before proofreading and releasing the EPUB (actual characters make the code much easier to read, so I catch more mistakes).
I rarely use the Link Report, but in this case, there are THOUSANDS of links pointing everywhere on the internet. The Link Report allows me to easily spot links which do not belong in the EPUB and footnotes I have not normalized (over 15 years... you can imagine all the different tools/programs that were used to generate these things).
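For anyone who wants the same kind of check outside Sigil, here is a rough sketch of what a character report boils down to. This is not Sigil's code; the function name `character_report` and the "non-ASCII only, rarest first" ordering are my own assumptions about what is useful for spotting OCR oddities.

```python
import unicodedata
from collections import Counter


def character_report(text):
    """Return (char, codepoint, name, count) rows for non-ASCII characters.

    Rows are sorted rarest first, since one-off characters are the
    likeliest OCR anomalies. (Hypothetical helper, not part of Sigil.)
    """
    counts = Counter(ch for ch in text if ord(ch) > 127)
    rows = []
    for ch, n in sorted(counts.items(), key=lambda item: (item[1], item[0])):
        name = unicodedata.name(ch, "UNKNOWN")
        rows.append((ch, f"U+{ord(ch):04X}", name, n))
    return rows


if __name__ == "__main__":
    sample = "caf\u00e9 caf\u00e9 na\u00efve \ufb01le"
    for row in character_report(sample):
        print(row)
```

A stray ligature or an odd dash that shows up only once in a whole book floats to the top of the list, which is exactly the kind of thing I go hunting for.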
The Class Reports allow me to catch outliers in the code itself (a weird class name that was only used once in all the articles, etc.).
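To give an idea of what the Link and Class reports are doing for me, here is a minimal sketch using only Python's standard library. It is not how Sigil implements its reports; `ReportParser`, `build_report`, `single_use_classes` and `external_links` are hypothetical names, and treating anything starting with `http://` or `https://` as "external" is a simplifying assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class ReportParser(HTMLParser):
    """Tally class names and link targets into shared counters."""

    def __init__(self, classes, links):
        super().__init__()
        self.classes = classes
        self.links = links

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A class attribute may hold several space-separated names.
        for name in (attrs.get("class") or "").split():
            self.classes[name] += 1
        if tag == "a" and attrs.get("href"):
            self.links[attrs["href"]] += 1


def build_report(documents):
    """Count classes and hrefs across an iterable of HTML strings."""
    classes, links = Counter(), Counter()
    for doc in documents:
        parser = ReportParser(classes, links)  # fresh parser per file
        parser.feed(doc)
        parser.close()
    return classes, links


def single_use_classes(classes):
    """Class names that appear exactly once: likely outliers."""
    return sorted(name for name, n in classes.items() if n == 1)


def external_links(links):
    """Links that point outside the book (assumes http/https only)."""
    return sorted(h for h in links if h.startswith(("http://", "https://")))
```

Feeding it the text of every HTML file in the book and printing `single_use_classes(...)` gives the same "weird class used only once" list described above.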
I will definitely be using them more in the future; they are really helping me consolidate code. HUGE time savers.
Quote:
Originally Posted by roger64
You have trouble displaying French (mostly) characters - à, é, è, ô, etc.
[...]
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr-FR">
|
Looks to me like the xml:lang="fr-FR" attribute on the <html> element is just setting the language of the file to French (the XML declaration itself only gives the XML version and the UTF-8 encoding). I believe Sigil uses the XML declaration at the top of the file to do some auto cleanup of entities. I know I read about the logic behind Sigil's auto-clean of entities somewhere on these forums (maybe from meme or user_none in one of the older Sigil release topics?).
And I just thought of another slight tweak on the Entity -> Character, Character -> Entity request. Perhaps it can be added to the Right Click Menu -> Reformat HTML. So you will get 4 extra options there:
Characters to Entities
Characters to Entities - All HTML files
Entities to Characters
Entities to Characters - All HTML files
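To make the request concrete, here is a minimal sketch of the two conversions in Python. This is only an illustration of the idea, not Sigil's code: the function names are hypothetical, it handles named entities only (not numeric references like `&#233;`), and it deliberately leaves `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;` alone so the markup stays valid.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Entities that must stay escaped, or the markup itself would change.
KEEP_ESCAPED = {"amp", "lt", "gt", "quot", "apos"}

NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_characters(text):
    """Replace named entities such as &eacute; with the actual character."""

    def replace(match):
        name = match.group(1)
        if name in KEEP_ESCAPED or name not in name2codepoint:
            return match.group(0)  # leave markup-significant/unknown ones
        return chr(name2codepoint[name])

    return NAMED_ENTITY.sub(replace, text)


def characters_to_entities(text):
    """Replace non-ASCII characters with a named entity when one exists."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};"
        if ord(ch) > 127 and ord(ch) in codepoint2name
        else ch
        for ch in text
    )


if __name__ == "__main__":
    source = "<p>caf&eacute; &amp; cr&egrave;me &ndash; d&eacute;j&agrave; vu</p>"
    as_chars = entities_to_characters(source)
    print(as_chars)
    print(characters_to_entities(as_chars))
```

The "All HTML files" variants would just be the same two functions applied to every HTML file in the book instead of only the current one.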