01-02-2021, 07:10 PM | #31 | |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Took a look at the code in CNode.cpp and it is so thin a wrapper around GumboNode* that it does not handle the case of a NULL node. It has this problem all throughout the code in that file.
It will have to be rewritten and hardened. Thanks for the heads-up!
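For readers following along, here is a minimal sketch of what "hardening" a thin pointer wrapper against NULL nodes could look like. It is not the actual CNode code: FakeNode is a hypothetical stand-in for GumboNode so the sketch builds without gumbo, and SafeNode and its method names are invented for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for GumboNode, so the sketch builds without gumbo.
struct FakeNode {
    std::string tag;
    FakeNode* parent = nullptr;
    std::vector<FakeNode*> children;
};

// Hypothetical hardened wrapper: every accessor checks for a null node
// and returns a harmless default instead of dereferencing it.
class SafeNode {
public:
    explicit SafeNode(FakeNode* node = nullptr) : m_node(node) {}

    bool valid() const { return m_node != nullptr; }

    std::string tag() const { return m_node ? m_node->tag : std::string(); }

    SafeNode parent() const {
        return SafeNode(m_node ? m_node->parent : nullptr);
    }

    std::size_t childCount() const {
        return m_node ? m_node->children.size() : 0;
    }

    // Out-of-range or null access yields an invalid (empty) SafeNode.
    SafeNode childAt(std::size_t i) const {
        if (!m_node || i >= m_node->children.size()) return SafeNode();
        return SafeNode(m_node->children[i]);
    }

private:
    FakeNode* m_node;
};
```

With this shape, chained calls such as node.parent().parent().tag() stay safe even when an intermediate node is missing; the caller checks valid() only where it matters.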
|
|
01-02-2021, 08:45 PM | #32 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Okay, I hardened CNode up a lot, removed the unneeded CDocument.h/.cpp, and integrated it into a new find method in Sigil/src/Misc/GumboInterface.cpp.
Also adjusted the test cases in main.cpp to use the new GumboInterface approach. This should be enough to play around with for a while. Meanwhile, we either use the embedded python interface and css_parser to get a good list of the selectors that need to be queried for, or we end up writing our own quick and dirty css parser along the lines of the QuickParser.cpp approach used for xhtml files. Last edited by KevinH; 01-03-2021 at 09:48 AM. |
01-03-2021, 09:47 AM | #33 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
I found a very basic LGPL-licensed css parser in C++ that we may be able to extract from in order to create our own simple css parser.
See https://github.com/csstidy-c/csstidy...master/csstidy It is all interlaced with code to compress/optimize the css, which we do not need and which will have to be stripped away. It all needs to be greatly simplified, but it could form the basis for what we need. It is the only state-based css parser I found that does not depend on regular expressions to do the parsing. I will take a stab at extracting and modifying it into something we can use. |
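To illustrate what "state based, no regular expressions" means here, the following is a deliberately tiny sketch of a character-by-character state machine that pulls selectors out of a stylesheet. It is not csstidy's code; extractSelectors and trimCopy are hypothetical names, and the sketch ignores strings, escapes, and commas nested inside parentheses, and it treats an at-rule prelude such as "@media screen" as if it were a selector.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: trims leading and trailing ASCII whitespace.
static std::string trimCopy(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return std::string();
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Collects the selectors that precede each top-level "{ ... }" block.
std::vector<std::string> extractSelectors(const std::string& css) {
    enum class State { Selector, Block, Comment };
    State state = State::Selector;
    State resume = State::Selector;  // state to return to after a comment
    std::vector<std::string> selectors;
    std::string current;
    int depth = 0;

    // Stores the selector text gathered so far, if any.
    auto flush = [&]() {
        std::string sel = trimCopy(current);
        if (!sel.empty()) selectors.push_back(sel);
        current.clear();
    };

    for (std::string::size_type i = 0; i < css.size(); ++i) {
        const char c = css[i];
        const char next = (i + 1 < css.size()) ? css[i + 1] : '\0';
        switch (state) {
        case State::Comment:
            if (c == '*' && next == '/') { state = resume; ++i; }
            break;
        case State::Selector:
            if (c == '/' && next == '*') { resume = state; state = State::Comment; ++i; }
            else if (c == ',') flush();
            else if (c == '{') { flush(); depth = 1; state = State::Block; }
            else current += c;
            break;
        case State::Block:
            if (c == '/' && next == '*') { resume = state; state = State::Comment; ++i; }
            else if (c == '{') ++depth;
            else if (c == '}' && --depth == 0) state = State::Selector;
            break;
        }
    }
    return selectors;
}
```

Because every decision depends only on the current state and the next character or two, this style of parser is easy to extend with more states (strings, at-rules, property and value tokens) without any regex dependency.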
01-03-2021, 04:25 PM | #34 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
For those interested, I have been playing around with csstidy's css parser and have removed all of the "optimization code" and its overhead to hopefully leave a decent css parser/tokenizer in C++.
I will work this week on restructuring it from a command line tool into a working class that we could incorporate into Sigil. Once we add this and the gumbo-query code, we should be able to detect unused selectors more safely. |
01-03-2021, 11:09 PM | #35 |
Enthusiast
Posts: 34
Karma: 467802
Join Date: Apr 2016
Device: none
|
I played a little with the Sigil/gumbo-query integration. Seems nice, but I got it to crash with a number of selectors:
Code:
p:nth-of-type(2n-1)
p:lang(it)
p.flush::first-letter
p:focus
Code:
nth-of-type(2n-1)
Code:
:not(.someclass)
(and since ::first-letter and :focus will never match anything, it will be mostly a problem for the css parser to filter them out).
At the same time, just for the sake of it, I started experimenting with integrating the cssRemoveUnusedSelectors plugin in the Sigil reports tool. I succeeded in reporting back to the "CSS Classes" widget in the Reports dialog all the usages of the selectors (with the complete list of the number of matches per html file).
I'm not sure how best to proceed with the deletion functionality. I thought that keeping the parsed css rules around as a PyObjectPtr when the selector usages are reported, and then passing it back to python to perform the deletion, might be a good approach (but how could I pass a PyObjectPtr to a python function through the embeddedpython::runInPython interface?). Or maybe it would be better to parse the stylesheets twice, once for the reporting and once for the deletion, and to pass back and forth only the indices of the enumerated rules and selectors? |
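One possible shape for the filtering mentioned above is to strip pseudo-elements and state-dependent pseudo-classes from a selector before handing it to the query engine, so that "p.flush::first-letter" is tested as "p.flush". This is only a sketch under stated assumptions: stripUnmatchablePseudos is a hypothetical name, and the list of dropped names is an assumption rather than anything taken from Sigil or gumbo-query.

```cpp
#include <cctype>
#include <set>
#include <string>

// Removes pseudo-elements ("::name") and a fixed list of pseudo-classes
// that can never match in a static document. Everything else, including
// structural pseudo-classes such as :nth-of-type(...), is kept unchanged.
std::string stripUnmatchablePseudos(const std::string& selector) {
    // Assumed list: dynamic states plus legacy single-colon pseudo-elements.
    static const std::set<std::string> dropped = {
        "hover", "focus", "active", "visited",
        "first-letter", "first-line", "before", "after"
    };
    std::string out;
    std::string::size_type i = 0;
    while (i < selector.size()) {
        if (selector[i] != ':') { out += selector[i++]; continue; }
        const std::string::size_type start = i;
        const bool isElement = (i + 1 < selector.size() && selector[i + 1] == ':');
        i += isElement ? 2 : 1;
        // Read the pseudo name: letters, digits, and hyphens, lowercased.
        std::string name;
        while (i < selector.size() &&
               (std::isalnum(static_cast<unsigned char>(selector[i])) || selector[i] == '-')) {
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(selector[i])));
            ++i;
        }
        if (isElement || dropped.count(name)) continue;  // drop this pseudo
        out.append(selector, start, i - start);          // keep it as written
    }
    return out;
}
```

Two caveats: a selector that consists only of a dropped pseudo (such as ":focus") comes back empty and the caller has to decide how to treat it, and a dropped pseudo nested inside :not(...) would leave empty parentheses behind, so a real filter would need to handle that case.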
01-04-2021, 08:58 AM | #36 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Thanks for the feedback on the query functionality. I will track down the segfaults and add a filter to remove things like hover and focus to prevent issues.
As you can see, the embedded interface for python objects is quite limited. It is not a true bridge. The only similar example is to call in to get a python object as a QVariant, create a PyObjectPtr from it (see Misc/PyObjectPtr), and then use that to invoke its methods. In PythonRoutines, see the routines related to metadata, which are the only example I have of that. The python object itself can keep state across multiple calls and is reference counted, so it should work to use it to call back in with additional information. But none of that has been tested because it was never needed before. So your idea of parsing twice might be the best one. Last edited by KevinH; 01-04-2021 at 09:00 AM. |
01-04-2021, 11:17 AM | #37 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Okay, I took a peek at the gumbo-query code and it silently threw strings (not exceptions) all throughout CParser, which causes a SIGABRT, not an actual segfault.
So all of those unknown pseudo classes like ":lang(it)" (which BTW should/could have been converted to a simple attribute and value query) now result in thrown exceptions, which are caught in the CSelection find routine. A qDebug() message is printed and the result of the find is a CSelection with no valid GumboNode*. With these changes, our now more heavily modified Query should be a bit more robust. I have pushed these changes to Sigil on my own github site. |
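For context on why thrown strings abort the program: a thrown string literal is only caught by a handler for const char* (or by catch (...)), and when no handler matches, std::terminate is called, which by default calls abort. The sketch below shows the general pattern of the fix, not the actual gumbo-query changes; compileSelector and safeFind are hypothetical stand-ins, and std::cerr stands in for qDebug().

```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the selector compiler in the query code.
std::vector<std::string> compileSelector(const std::string& selector) {
    if (selector.find(":lang(") != std::string::npos) {
        // Old style would be: throw "unsupported pseudo-class";
        // which a catch (const std::exception&) handler does NOT catch.
        throw std::runtime_error("unsupported pseudo-class in: " + selector);
    }
    return { selector };
}

// Hypothetical stand-in for the find routine: parse errors are logged and
// turned into an empty result instead of propagating to the caller.
std::vector<std::string> safeFind(const std::string& selector) {
    try {
        return compileSelector(selector);
    } catch (const std::exception& e) {
        std::cerr << "selector error: " << e.what() << '\n';
        return {};
    }
}
```

The caller then only needs to check for an empty result, which matches the behaviour described above of returning a selection with no valid node.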
01-04-2021, 08:11 PM | #38 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
By greatly simplifying the csstidy C++ code, extracting just the pieces that do the actual parsing of the css, and merging them, I have created CSSParser.cpp, CSSProperties.cpp, and CSSUtils.cpp, which seem to do a good job parsing CSS stylesheets, so we can use them to extract a complete list of selectors. They can also do some validation of css stylesheets as well as pretty print the parsed css.
When I get them cleaned up and reduced down even more, I will add them to my Sigil tree in case people want to play around with them. |
01-05-2021, 12:43 PM | #39 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
New version of CSSParser with an actual interface attached!
Okay, I have pushed the following files to Sigil in my github tree: Misc/CSSParser.cpp and .h, Misc/CSSProperties.cpp and .h, and Misc/CSSUtils.cpp and .h. These use only std:: C++ string manipulation. They have been integrated into the Sigil build but are not being used yet.
For those who want to play around with the command line version of the parser to see how it works, I have attached a cssparser.zip file here with a very simple main.cpp and Makefile so that others can try building it.
unzip cssparser.zip
cd cssparser
make
cd release/cssparser
./cssparser PATH_TO_STANDALONE_EXISTING_CSS_FILE
It will parse the whole thing and then spit it back all cleaned up. It stores the entire parsed css as a vector of tokens, each with a type indicator and an associated piece of string data. Check out the print_css routine in CSSParser.cpp to see how easy it is to add routines to get exactly what you want (for this problem we will want the list of selectors). Check out main.cpp for an example of the CSSParser interface. It should build easily on macOS and Linux, and a Windows build should be doable with a bit more work. Any testing, feedback, or Windows build tweaks welcomed. Last edited by KevinH; 01-07-2021 at 08:46 AM. Reason: Added new expanded version of cssparser.zip |
01-06-2021, 10:10 AM | #40 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
BTW
The code in cssparser.zip is not final yet. I still might revert to using the original or an augmented structure for each token instead of the simpler std::pair. I may try to add file position information. I will also be adding an interface to get logged errors and warnings for use in diagnostics and validation. |
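As a rough picture of the trade-off between a std::pair token and an augmented structure, here is a sketch. The names TokenType, Token, PositionedToken, and collectSelectors are all hypothetical and are not taken from CSSParser.cpp; the real token types and members may differ.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds; the real parser's set is likely larger.
enum class TokenType { Selector, Property, Value, RuleEnd };

// The simple form: a type indicator paired with its string data.
using Token = std::pair<TokenType, std::string>;

// A possible augmented form that also records where the token was found,
// which is what file position information would require.
struct PositionedToken {
    TokenType type;
    std::string data;
    std::size_t offset;  // byte offset into the stylesheet text
    int line;            // 1-based line number
};

// Post-processing is a plain walk over the token vector: here, collecting
// just the selectors and ignoring everything else.
std::vector<std::string> collectSelectors(const std::vector<Token>& tokens) {
    std::vector<std::string> selectors;
    for (const Token& tok : tokens) {
        if (tok.first == TokenType::Selector) selectors.push_back(tok.second);
    }
    return selectors;
}
```

The pair keeps the walk trivial, while the struct costs a little more per token but would let validation messages and deletions point at an exact location in the file.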
01-06-2021, 11:38 AM | #41 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
cssparser_v1.1.zip
Okay, for those playing along with the new cssparser code here is version 1.0 with a reasonable interface.
unzip cssparser_v1.1.zip
cd cssparser
make (on Windows use nmake from a Visual Studio command prompt)
cd release/cssparser
./cssparser PATH_TO_STANDALONE_EXISTING_CSS_FILE
It will parse the whole thing and then spit it back all cleaned up. It stores the entire parsed css as a vector of tokens, each with a type indicator and an associated piece of string data. Check out main.cpp to see the CSSParser interface example. It can now:
- set the css level to use
- parse a passed-in std::string which is the contents of a css file
- check for errors, warnings, and information messages so it can do validation
- walk through the parsed tokens to enable post processing
- pretty print back the results of the parse
It should build easily on macOS, Linux, and Windows (thanks to DiapDealer for Makefile.win32). Any testing and feedback welcomed.
ps. The 1.1 versions of the main files have also been pushed to my personal github Sigil repo. If you just want to see the source you can look at:
Sigil/src/Misc/CSSParser.cpp and .h
Sigil/src/Misc/CSSProperties.cpp and .h
Sigil/src/Misc/CSSUtils.cpp and .h
at https://github.com/kevinhendricks/Sigil Last edited by KevinH; 01-11-2021 at 12:01 PM. Reason: Removed zip, see a later post for an improved version |
01-06-2021, 12:51 PM | #42 |
Guru
Posts: 692
Karma: 2180740
Join Date: Jan 2017
Location: Poland
Device: Misc
|
Built under Windows without a problem.
I have installed MSYS64; there is a very detailed description e.g. here: https://www.devdungeon.com/content/i...ndows-msys2-cc I typed "make" and a few seconds later I had cssparser.exe running. If the program is to run quite independently, it needs three DLL files:
libgcc_s_seh-1.dll
libstdc++-6.dll
libwinpthread-1.dll
By default these dll files are in the C:\msys64\mingw64\bin\ folder. |
01-06-2021, 01:02 PM | #43 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Thanks for that info. I did not know about the MSYS64 toolchain.
If we decide to keep and use CSSParser officially, then those 3 classes should build out of the box on Windows with cmake as part of the Sigil build. It is just easier to evaluate CSSParser as a standalone command line app. The code itself does not use threads or throw exceptions, so I think those extra dlls must be part of MSYS64's requirements. If you get a chance to test it on more advanced stylesheets and get failures, I would love to hear about it. Thanks! |
01-06-2021, 01:42 PM | #44 |
Grand Sorcerer
Posts: 27,553
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Here's a Makefile that will work with Visual Studio's nmake command-line tool.
Just put Makefile.win32 in the same folder as the 'Makefile' in Kevin's already unzipped cssparser, then open a Visual Studio command prompt and basically follow the same instructions Kevin gave. Make sure you've cd'ed into cssparser, then type:
nmake /f Makefile.win32
Feel free to include it in your testing zip file if you like, Kevin. |
01-06-2021, 01:49 PM | #45 |
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Will do!
Thanks! (Added it to cssparser_v1.0 to create cssparser_v1.1.zip, with no other changes.) Last edited by KevinH; 01-06-2021 at 05:11 PM. |
|