Old 01-02-2021, 07:10 PM   #31
KevinH
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
Took a look at the code in CNode.cpp and it is so thin a wrapper around GumboNode* that it does not handle the case of a NULL node. It has this problem throughout that file.

It will have to be rewritten and hardened.

Thanks for the heads-up!
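
A minimal sketch of what "hardening" a thin pointer wrapper could look like. `RawNode` and `SafeNode` are made-up stand-ins for illustration, not the real GumboNode or CNode types; the only point is that every accessor checks for a null pointer before dereferencing it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a parser node; NOT the real GumboNode.
struct RawNode {
    std::string tag;
    std::size_t start = 0;
    std::size_t end = 0;
    RawNode* parent = nullptr;
};

// Hypothetical hardened wrapper: every accessor tolerates a null pointer.
class SafeNode {
public:
    explicit SafeNode(RawNode* node = nullptr) : m_node(node) {}

    bool valid() const { return m_node != nullptr; }

    // Return neutral defaults instead of dereferencing a null pointer.
    std::string tag() const { return m_node ? m_node->tag : std::string(); }
    std::size_t startPos() const { return m_node ? m_node->start : 0; }
    std::size_t endPos() const { return m_node ? m_node->end : 0; }

    // A missing parent yields another invalid (but safe) wrapper.
    SafeNode parent() const { return SafeNode(m_node ? m_node->parent : nullptr); }

private:
    RawNode* m_node;
};
```

With something like this, a caller can test `valid()` before using `startPos()` and `endPos()` to slice the page, and a failed query degrades to an empty result instead of a crash.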

Quote:
Originally Posted by Frenzie View Post
Since it's about selecting elements rather than actually applying styles, it doesn't look like the likes of ::first-line and ::first-letter are supported. You'd have to remove those before passing it on.

https://github.com/lazytiger/gumbo-q....cpp#L382-L523

Btw, if you try something like this (p:first-child:first-of-type) it gives you a segmentation fault:
Code:
void test_html() {
	std::string page = "<html><div class=\"chapter\"><p class=\"flush\">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</p><p>second child</p></div></html>";
	CDocument doc;
	doc.parse(page.c_str());
	CNode pNode = doc.find(".chapter > p:first-child:first-of-type").nodeAt(0);
	std::string content = page.substr(pNode.startPos(), pNode.endPos() - pNode.startPos());
	printf("Node: #%s#\n", content.c_str());
}
(Silly example? Absolutely. But what else does one test for. ^_^)
Old 01-02-2021, 08:45 PM   #32
KevinH
Sigil Developer
Okay, I hardened CNode up a lot, removed the unneeded CDocument.h/.cpp and integrated it into a new find method in Sigil/src/Misc/GumboInterface.cpp

Also adjusted the test cases in main.cpp to use the new GumboInterface approach.

This should be enough to play around with for a while. In the meantime, we either use the embedded python interface and css_parser to get a good list of selectors that need to be queried for, or we end up writing our own quick and dirty css parser along the lines of the QuickParser.cpp approach used for xhtml files.

Last edited by KevinH; 01-03-2021 at 09:48 AM.
Old 01-03-2021, 09:47 AM   #33
KevinH
Sigil Developer
I found a very basic css parser in C++ that uses the LGPL that we may be able to extract in order to create our own simple css parser.

See https://github.com/csstidy-c/csstidy...master/csstidy

It is all interlaced with code to compress/optimize the css, which we do not need and which will have to be stripped away. It all needs to be greatly simplified, but it could form the basis for what we need.

It is the only state-based css parser I have found that does not depend on regular expressions to do the parsing.

I will take a stab at extracting and modifying it into something we can use.
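
To illustrate what "state based, no regular expressions" means here, a rough sketch of a tiny state machine that pulls the selectors out of a flat stylesheet. This is not csstidy code; `extractSelectors` and its three states are assumptions made for the example. It deliberately ignores at-rules, nested blocks, quoted strings, and commas inside things like `:not(a, b)`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Trim ASCII whitespace from both ends of a string.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return std::string();
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: collect selectors with a character-driven state machine.
std::vector<std::string> extractSelectors(const std::string& css) {
    enum class State { Selector, Block, Comment };
    State state = State::Selector;
    State resume = State::Selector;  // state to return to when a comment ends
    std::vector<std::string> selectors;
    std::string current;

    // Store the selector text gathered so far, if any.
    auto flush = [&]() {
        std::string sel = trim(current);
        if (!sel.empty()) selectors.push_back(sel);
        current.clear();
    };

    for (std::size_t i = 0; i < css.size(); ++i) {
        const char c = css[i];
        const bool hasNext = i + 1 < css.size();
        if (state == State::Comment) {
            if (c == '*' && hasNext && css[i + 1] == '/') {
                ++i;             // skip the '/'
                state = resume;
            }
            continue;
        }
        if (c == '/' && hasNext && css[i + 1] == '*') {
            resume = state;
            state = State::Comment;
            ++i;                 // skip the '*'
            continue;
        }
        if (state == State::Selector) {
            if (c == ',') {
                flush();                       // end of one selector in a group
            } else if (c == '{') {
                flush();
                state = State::Block;          // declarations follow
            } else {
                current += c;
            }
        } else if (c == '}') {                 // State::Block
            state = State::Selector;
        }
    }
    return selectors;
}
```

Each input character is looked at once and only the current state decides what it means, which is the property that makes this style of parser easy to extend with more states later.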
Old 01-03-2021, 04:25 PM   #34
KevinH
Sigil Developer
For those interested, I have been playing around with csstidy's css parser and have removed all of the "optimization code" and its overhead to hopefully leave a decent css parser/tokenizer in C++.

I will work this week on restructuring it from a command line tool into a working class that we could incorporate into Sigil.

Once we add this in and the gumbo-query code we should be better able to use it to detect unused selectors more safely.
Old 01-03-2021, 11:09 PM   #35
wrCisco
Enthusiast
Posts: 34
Karma: 467802
Join Date: Apr 2016
Device: none
I played a little with the Sigil/gumbo-query integration. Seems nice, but I got it to crash with a number of selectors:
Code:
p:nth-of-type(2n-1)
p:lang(it)
p.flush::first-letter
p:focus
I thought it could have a problem with pseudo classes with function-like syntax such as
Code:
nth-of-type(2n-1)
but then I tried
Code:
:not(.someclass)
and not only did it not cause a crash, it also correctly found a match.
(and since ::first-letter and :focus will never match anything, it will be mostly a problem for the css parser to filter them out).
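
A possible pre-filter for that, as a sketch only. The function name `hasUnmatchablePseudo` and the list of pseudo names are assumptions for illustration; a real filter would live in whatever css parser ends up being used and would also need to cope with escapes and attribute values that happen to contain colons.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if c can continue a CSS identifier (simplified: ASCII only).
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '_';
}

// Hypothetical pre-filter: flag selectors that a static element query
// cannot meaningfully match, so they can be skipped before querying.
bool hasUnmatchablePseudo(const std::string& selector) {
    // Work on a lowercase copy; pseudo names are case-insensitive.
    std::string s(selector);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Any "::name" is a pseudo-element such as ::first-letter.
    if (s.find("::") != std::string::npos) return true;

    // Assumed list: user-action pseudo-classes plus the legacy
    // one-colon spellings of pseudo-elements.
    static const std::vector<std::string> names = {
        "hover", "focus", "active",
        "first-line", "first-letter", "before", "after"
    };
    for (const std::string& name : names) {
        const std::string needle = ":" + name;
        std::size_t pos = s.find(needle);
        while (pos != std::string::npos) {
            const std::size_t next = pos + needle.size();
            // Require a name boundary so a longer identifier is not a hit.
            if (next >= s.size() || !isIdentChar(s[next])) return true;
            pos = s.find(needle, pos + 1);
        }
    }
    return false;
}
```

Selectors such as `:not(.someclass)` or `p:nth-of-type(2n-1)` pass through untouched, so whether they crash or match is still up to the query code.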

At the same time, just for the sake of it, I started experimenting with integrating the cssRemoveUnusedSelectors plugin in the Sigil reports tool. I succeeded in reporting back to the "CSS Classes" widget in the Reports dialog all the usages of the selectors (with the complete list of the number of matches per html file).
I'm not sure how best to proceed with the deletion functionality. I thought that keeping the parsed css rules around as a PyObjectPtr when the selector usages are reported, and then passing it back to python to perform the deletion, might be a good approach (but how could I pass a PyObjectPtr to a python function through the embeddedpython::runInPython interface?). Or maybe it would be better to parse the stylesheets twice, once for the reporting and once for the deletion, and to pass back and forth only the indices of the enumerated rules and selectors?
Old 01-04-2021, 08:58 AM   #36
KevinH
Sigil Developer
Thanks for the feedback on the query functionality. I will track down the segfaults and add a filter to remove things like hover and focus to prevent issues.

As you can see, the embedded interface for python objects is quite limited. It is not a true bridge. The only similar example is to call in to get a python object as a QVariant, create a PyObjectPtr from it (see Misc/PyObjectPtr), and then use that to invoke its methods. The routines related to metadata in PythonRoutines are the only example I have of that. The python object itself can keep state across multiple calls and is reference counted, so it should work to use it to call back in with additional information. But none of that has been tested because it was never needed before.

So your idea of parsing twice might be the best one.
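
A rough sketch of the index-based variant, written in C++ only to show the data that would cross the interface. `Rule`, `SelectorId` and `removeUnused` are hypothetical names, and the sketch assumes both parses enumerate rules and selectors in the same order, so a (rule index, selector index) pair from the reporting pass still points at the same selector in the deletion pass.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed rule: its selectors plus the untouched declaration text.
struct Rule {
    std::vector<std::string> selectors;
    std::string declarations;
};

// (rule index, selector index), as enumerated during the reporting pass.
using SelectorId = std::pair<std::size_t, std::size_t>;

// Deletion pass: drop the flagged selectors, and drop a rule entirely
// once none of its selectors are left.
std::vector<Rule> removeUnused(const std::vector<Rule>& rules,
                               const std::set<SelectorId>& unused) {
    std::vector<Rule> kept;
    for (std::size_t r = 0; r < rules.size(); ++r) {
        Rule out;
        out.declarations = rules[r].declarations;
        for (std::size_t s = 0; s < rules[r].selectors.size(); ++s) {
            if (unused.count(SelectorId(r, s)) == 0) {
                out.selectors.push_back(rules[r].selectors[s]);
            }
        }
        if (!out.selectors.empty()) kept.push_back(out);
    }
    return kept;
}
```

Only plain integers travel between the two passes, which sidesteps the question of keeping a parsed object alive across calls.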

Last edited by KevinH; 01-04-2021 at 09:00 AM.
Old 01-04-2021, 11:17 AM   #37
KevinH
Sigil Developer
Okay, I took a peek at the gumbo-query code and it silently threw strings (not exceptions) all throughout CParser, which causes a SIGABRT rather than an actual SIGSEGV.

So all of those unknown pseudo classes like ":lang(it)" (which BTW should/could have been converted to a simple attribute and value query) now result in thrown exceptions. These are caught in the CSelection find routine, a qDebug() message is printed, and the result of the find is a CSelection with no valid GumboNode*.

With these changes, our now more heavily modified Query should be a bit more robust.
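
For anyone wondering why a thrown string ends in an abort: in C++ a `throw` of a string is only caught by a handler for that exact type (or by `catch (...)`), so a handler for `std::exception` never sees it and the runtime calls `std::terminate`. Below is a minimal sketch of the pattern described above, with made-up names (`parseSelector`, `findMatches`) rather than the actual gumbo-query or Sigil code.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the selector parser: rejects a pseudo-class
// it does not know by throwing a real exception type, not a bare string.
static void parseSelector(const std::string& selector) {
    if (selector.find(":lang(") != std::string::npos) {
        throw std::runtime_error("unsupported pseudo-class in: " + selector);
    }
}

// Hypothetical find(): a parse error becomes an empty result instead of
// an abort. The message is handed back through `error` in this sketch;
// the real code is described as printing it with qDebug().
std::vector<std::string> findMatches(const std::string& selector,
                                     std::string* error = nullptr) {
    std::vector<std::string> matches;
    try {
        parseSelector(selector);
        // Placeholder for the actual tree walk.
        matches.push_back("<node matched by " + selector + ">");
    } catch (const std::exception& e) {
        if (error) *error = e.what();
        matches.clear();
    }
    return matches;
}
```

The caller then only has to check for an empty result, whether the selector was invalid or simply matched nothing.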

I have pushed these changes to Sigil on my own github site.
Old 01-04-2021, 08:11 PM   #38
KevinH
Sigil Developer
By greatly simplifying the csstidy C++ code, extracting just the pieces that do the actual parsing of the css, and merging them, I have created CSSParser.cpp, CSSProperties.cpp, and CSSUtils.cpp. They seem to do a good job parsing CSS stylesheets, so we can use them to extract a complete list of selectors. They can also do some validation of css stylesheets as well as pretty print the parsed css.

When I get them cleaned up and reduced down even more, I will add them to my Sigil tree in case people want to play around with them.
Old 01-05-2021, 12:43 PM   #39
KevinH
Sigil Developer
New version of CSSParser with an actual interface attached!

Okay I have pushed the following files to Sigil in my github tree: Misc/CSSParser.cpp and .h, Misc/CSSProperties.cpp and .h and Misc/CSSUtils.cpp and .h.

These use only std:: cpp string manipulation. They have only been integrated into the Sigil build but they are not being used yet.

For those who want to play around with the command line version of the parser to see how it works, I have attached a cssparser.zip file here with a very simple main.cpp and Makefile so that others can try building it.

unzip cssparser.zip
cd cssparser
make
cd release/cssparser
./cssparser PATH_TO_STANDALONE_EXISTING_CSS_FILE

It will parse the whole thing and then spit it back all cleaned up. It stores the entire parsed css as a vector of tokens that have a type indicator and string data piece associated with each.

Check out the print_css routine in CSSParser.cpp to see how easy it is to add routines to get exactly what you want (for this problem we will want the list of the selectors).
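
As a sketch of that kind of routine: given a token vector of (type, data) pairs, collecting the selectors is a single pass. The `TokenType` names and the `Token` alias below are assumptions made for this example and may not match what CSSParser.h actually declares.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds; the real parser's enum may differ.
enum TokenType { AT_START, AT_END, SEL_START, SEL_END, PROPERTY, VALUE, COMMENT };

// A token as a (type indicator, string data) pair.
using Token = std::pair<TokenType, std::string>;

// Walk the parsed tokens and keep only the selector text.
std::vector<std::string> collectSelectors(const std::vector<Token>& tokens) {
    std::vector<std::string> selectors;
    for (const Token& tok : tokens) {
        if (tok.first == SEL_START) selectors.push_back(tok.second);
    }
    return selectors;
}
```

A grouped selector such as `h1, h2` stays as one entry here; splitting it on commas would be a separate step before querying.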

Check out the main.cpp to get the CSSParser interface example.

It should build easily on macOS and Linux and a Windows build should be doable with a bit more work.

Any testing, feedback, or Windows build tweaks welcomed.

Last edited by KevinH; 01-07-2021 at 08:46 AM. Reason: Added new expanded version of cssparser.zip
Old 01-06-2021, 10:10 AM   #40
KevinH
Sigil Developer
BTW
The code in cssparser.zip is not final yet. I still might revert to using the original or an augmented structure for each token instead of the simpler std::pair. I may try to add file position information. I will also be adding an interface to get logged errors and warnings for use in diagnostics and validation.
Old 01-06-2021, 11:38 AM   #41
KevinH
Sigil Developer
cssparser_v1.1.zip

Okay, for those playing along with the new cssparser code here is version 1.0 with a reasonable interface.

unzip cssparser_v1.1.zip
cd cssparser
make (on Windows use nmake from a Visual Studio command prompt)
cd release/cssparser
./cssparser PATH_TO_STANDALONE_EXISTING_CSS_FILE

It will parse the whole thing and then spit it back all cleaned up. It stores the entire parsed css as a vector of tokens that have a type indicator and string data piece associated with each.

Check out main.cpp to see the CSSParser interface example. It can now:

- set css level to use
- parse a passed in std::string which is the contents of a css file
- check for errors, warnings, and information messages so it can do validation
- walk through the parsed tokens to enable post processing
- and pretty print back the results of the parse

It should build easily on macOS, Linux, and Windows (thanks to DiapDealer for Makefile.win32).

Any testing and feedback welcomed.

ps. The 1.1 versions of the main files have also been pushed to my personal github Sigil repo.
If you just want to see the source you can look at:

Sigil/src/Misc/CSSParser.cpp and .h
Sigil/src/Misc/CSSProperties.cpp and .h
Sigil/src/Misc/CSSUtils.cpp and .h

at https://github.com/kevinhendricks/Sigil

Last edited by KevinH; 01-11-2021 at 12:01 PM. Reason: Removed zip, see a later post for an improved version
Old 01-06-2021, 12:51 PM   #42
BeckyEbook
Guru
Posts: 692
Karma: 2180740
Join Date: Jan 2017
Location: Poland
Device: Misc
Builds under Windows without a problem.
I have installed MSYS64; a very detailed description is e.g. here: https://www.devdungeon.com/content/i...ndows-msys2-cc
I typed "make" and a few seconds later I had cssparser.exe running.

If the program is to run quite independently, it needs three DLL files:
libgcc_s_seh-1.dll
libstdc++-6.dll
libwinpthread-1.dll

By default dll files are in the C:\msys64\mingw64\bin\ folder
Old 01-06-2021, 01:02 PM   #43
KevinH
Sigil Developer
Thanks for that info. I did not know about the MSYS64 toolchain.

If we decide to keep and use CSSParser officially, then those 3 classes should build out of the box on Windows with cmake as part of the Sigil build.

It is just easier to evaluate CSSParser as a standalone command line app.

The code itself does not use threads or throw exceptions so I think those extra dll's must be part of MSYS64 requirements.

If you get a chance to test it on more advanced stylesheets, and get failures, I would love to hear about it.

Thanks!
Old 01-06-2021, 01:42 PM   #44
DiapDealer
Grand Sorcerer
Posts: 27,553
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Here's a Makefile that will work with Visual Studio's nmake command-line tool.

Just put Makefile.win32 in the same folder as the 'Makefile' in Kevin's already unzipped cssparser, then open a Visual Studio command prompt and basically follow the same instructions Kevin gave. Make sure you've cd'ed into cssparser,

then type:
nmake /f Makefile.win32

Feel free to include it in your testing zip file if you like, Kevin.
Attached Files
File Type: zip Makefile.win32.zip (439 Bytes, 156 views)
Old 01-06-2021, 01:49 PM   #45
KevinH
Sigil Developer
Will do!

Thanks!

(Added it to cssparser_v1.0 to create cssparser_v1.1.zip with no other changes.)

Last edited by KevinH; 01-06-2021 at 05:11 PM.
All times are GMT -4.