#1 |
A Hairy Wizard
Posts: 3,349
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Formalize current functionality?
So... I'm cleaning up a book with 145000 chapters/files. I make a change (a very simple one that shouldn't be any problem), and when I try to save, Sigil pukes and says it can't save because there is some malformed code... somewhere.

I could let Sigil fix it automatically, but I would rather not. Who knows what would happen then? So I go file by file and open them one at a time, looking for the red error box in the preview pane. Did I mention there are 145000 chapters? It takes FOREVER. I can't even run a report to see if that would help, because of the malformed HTML.

I've lived with this for a couple of years. I took it as my just deserts for making the mistake in the first place, but JUST THIS WEEK I found a way for Sigil to tell me which file actually has the error in it. "How?" you ask. I select all the files in the Text folder, then right-click and attempt to link a stylesheet. It doesn't let me do this either, but the error message includes the name of the first file with an error! Hallelujah!

Of course, it only lists one errant file at a time, so my request is: can we formalize that functionality into an error listing? Something along the lines of the "Report" page, but with a list of offending files that could be clicked on to open. That would make my blundering so much more enjoyable.

Thanks,
#2 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
I agree, a few of those error dialogs could use some tweaking.
You could always click on the little checkmark icon as well to "Validate EPUB with Flight Crew". That would also tell you which files are malformed, and should be able to point you to the general vicinity of the line number of the errors.
Last edited by Tex2002ans; 03-21-2015 at 07:18 AM.
#3 |
Guru
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
|
|
#4 |
Well trained by Cats
Posts: 31,054
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I will admit
#5 |
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
|
Hi,

I think I could throw together a Python plugin that would walk the complete set of xhtml files and build up a report of any files that are not well-formed, with a description of at least the first error in each such file. Would this do the trick?

BTW: We have already removed Tidy and will use Google's gumbo-parser to automatically clean up any files that are not properly formed in the future. Gumbo implements the true HTML5 parsing spec and will handle the HTML exactly like browsers will. Gumbo is basically like Beautiful Soup, but written in C and really fast.

Last edited by KevinH; 03-21-2015 at 05:34 PM.
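For readers wondering what such a report could look like, here is a minimal sketch in plain Python. It is not the plugin's code and does not use Sigil's plugin interface; the function names `first_wellformedness_error` and `report_malformed` are made up for illustration, and it assumes the xhtml files have been unpacked to a folder on disk. It relies only on the standard-library expat parser, which stops at the first well-formedness error in a file.

```python
import xml.parsers.expat
from pathlib import Path


def first_wellformedness_error(data: bytes):
    """Return (line, column, message) for the first XML error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


def report_malformed(folder):
    """Map each malformed (x)html file under `folder` to its first error."""
    report = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.suffix.lower() in {".xhtml", ".html", ".htm"}:
            error = first_wellformedness_error(path.read_bytes())
            if error is not None:
                report[path.name] = error
    return report
```

Calling `report_malformed("Text")` on an unpacked book would give a dictionary of file names and first errors, which is the "list of offending files" asked for above. A strict XML parser may also complain about things a browser tolerates, so treat the output as a list of files to inspect rather than a verdict.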
#6 |
A Hairy Wizard
Posts: 3,349
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
|
#7 | |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#8 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
|
#9 |
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
|
Which? The plugin I offered to write, or the version of Sigil without Tidy and with Gumbo?

If the former, I will try to drum something up later this week or early next if I get a few free moments.

If the latter, Sigil master already has Tidy gone and Gumbo in place, but it is in very rough shape as we have been tearing Xerces and FlightCrew out of it (FlightCrew will become a plugin), and it will come with Python 3.4 embedded in it when it stabilizes. It will need lots of testing, but I would guess within a month or so you may see an alpha or a beta. We have already begun changing the OPF parser to allow and keep epub3 features.

Take care,
KevinH
#10 |
Resident Curmudgeon
Posts: 79,750
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Thanks for the information. It is indeed the latter (Sigil without Tidy).
So with the next Sigil, will it be possible to not have the structure changed?
#11 |
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
|
Hi,

No Tidy, so no playing with your classes and no need to remove duplicate classes, just a really robust parser that tries its best to come up with something useful out of any HTML soup. Plus we get the inherent benefit of recognizing all HTML5 tags.

There is still lots and lots to do before any release, as we also replaced the strict XML processor Xerces with Gumbo node trees and parsing for all xhtml files. We are replacing the remaining uses of Xerces, which handle the pure XML files (OPF and NCX), with Python and lxml. So lots of code had to be rewritten and redesigned, and it still needs some work. I'm sure lots of new bugs will have to be tracked down and fixed.

But if we want to support both epub2 and epub3, there was no other way than to overhaul the tools Sigil used. This is the first step. Once this is complete we can start adding in support for epub3.

KevinH
#12 |
Zealot
Posts: 119
Karma: 64428
Join Date: Aug 2011
Device: none
|
Once Xerces is gone, will Sigil still require SSE2 hardware?
Last edited by signum; 03-23-2015 at 02:17 AM. Reason: forgot which group I was in
#13 |
Grand Sorcerer
Posts: 28,569
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
|
#14 | |
Sigil Developer
Posts: 8,761
Karma: 5706256
Join Date: Nov 2009
Device: many
|
Sigil Plugin: SanityChecker_v0.1.0.zip
Hi,
This is now updated to version 0.1.0, which also provides the line and column of the last open start tag when tag-nesting mismatches occur.

Quote:
Attached is a quick and dirty Sigil validation plugin that does a rough (and I mean rough!) sanity check of all xhtml files in an ebook and reports back the first nesting error, mismatched attribute quotes, and similar problems that would prevent a file from being parsed by an XML parser. It also detects basic structure errors.

It is NOT in any way meant to replace FlightCrew or EpubCheck, but it will detect gross errors that would prevent a pure XML parser (without a DTD) from loading the file. Nicely, the Gumbo HTML5 parser would happily eat any of this for lunch and fix it automatically on the fly.

Give it a try and see if it cuts your list of files to worry about down to a manageable level. We can improve it if anyone feels it is useful enough to want something that does a bit more. But for real validation, you really need EpubCheck.

I have attached it. Hope this helps.

KevinH
Last edited by KevinH; 05-07-2015 at 03:34 PM. Reason: update to newer release
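To illustrate the kind of nesting check described here, the following is a rough sketch only, not the SanityChecker source. The class `NestingChecker` and the function `first_nesting_error` are hypothetical names. It uses Python's standard `html.parser`, which lowercases tag names and is far more forgiving than an XML parser, so it only catches the tag-nesting part of the problem. On the first mismatch it records where the offending end tag is and where the last open start tag began.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Record the first tag-nesting mismatch seen while scanning markup."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line, column) for open start tags
        self.error = None    # first mismatch found, or None

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()  # line is 1-based, column is 0-based
        self.open_tags.append((tag, line, col))

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> open and close at once: nothing to track.
        pass

    def handle_endtag(self, tag):
        if self.error is not None:
            return  # only the first mismatch is reported
        line, col = self.getpos()
        if self.open_tags:
            open_tag, open_line, open_col = self.open_tags.pop()
        else:
            open_tag, open_line, open_col = None, None, None
        if open_tag != tag:
            # (end tag, its line, its column, last open tag, its line, its column)
            self.error = (tag, line, col, open_tag, open_line, open_col)


def first_nesting_error(text):
    """Return details of the first nesting mismatch in `text`, or None."""
    checker = NestingChecker()
    checker.feed(text)
    checker.close()
    return checker.error
```

For example, `first_nesting_error("<p><b>text</p></b>")` reports that `</p>` was found while `<b>` was still open, along with both positions. This sketch does not flag start tags that are still open at the end of the file, and it does not check attribute quoting.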
#15 |
A Hairy Wizard
Posts: 3,349
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Wow, that was fast!!
Thanks again Kevin, I'll give it a whirl and report back!