Old 12-31-2020, 04:14 PM   #16
Frenzie
Wizard
Posts: 1,619
Karma: 724945
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
I skimmed the code a bit and I share @wrCisco's superficial impression of what it seems to be doing.

Quote:
Originally Posted by hobnail
Agreed. My "fix" was using the assumption that Sigil doesn't handle bare classes in combinators. Possibly the css overgeneralized and the pc-rw was only used on divs, but otherwise it's not a good solution.
Cool!
Old 12-31-2020, 09:47 PM   #17
KevinH
Sigil Developer
Posts: 7,655
Karma: 5433388
Join Date: Nov 2009
Device: many
Okay, it seems that Sigil's CSSInfo parser does not handle combinators at all, nor pseudo-classes, nor @media rules.

Properly testing a CSS selector that uses adjacent, child, or descendant combinators requires a CSS-selector-based query or XPath-like interface for gumbo, Sigil's HTML5 repair parser. As far as I know, no such interface exists in C++ or C. I will continue to search for one. The closest I can find is a jQuery-like interface for gumbo here:

https://github.com/lazytiger/gumbo-query

but it appears to be 5 years old with no real updates.


If I cannot find anything useful, we must then turn to Python and its css-parser, cssselect, and lxml to do this properly. But that means we would pretty much be duplicating wrCisco's plugin, only internal to Sigil and using PyQt5 in place of Tk.

That seems to be wasteful duplication. Perhaps we should delete the unused class removal feature from Sigil and instead point people to wrCisco's plugin for that functionality completely.

Ideas? Thoughts?
Old 01-01-2021, 03:10 AM   #18
Frenzie
Wizard
I'm guessing you already evaluated Qt's CSS parser and determined it was unsuitable for the purpose?

Quote:
Perhaps we should delete the unused class removal feature from Sigil and instead point people to wrCisco's plugin for that functionality completely.
Fwiw, sounds good to me.
Old 01-01-2021, 06:41 AM   #19
DiapDealer
Grand Sorcerer
Posts: 27,553
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Couldn't we incorporate (with his permission, of course) wrCisco's Python code into Sigil's python3lib and use the C++ embedded Python interface to access it? Thus skipping the need to use PyQt at all for the GUI? I'm not certain what else the existing plugin might provide, but even if we don't bring it entirely "in house" (eliminating the need for the third-party plugin altogether), surely we can come up with an interface to the portions we DO need to access via the embedded Python interpreter while still exposing those same absorbed parts to plugins via the plugin framework? Thus avoiding duplication.
Old 01-01-2021, 09:59 AM   #20
Turtle91
A Hairy Wizard
Posts: 3,101
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Quote:
Originally Posted by DiapDealer
Couldn't we incorporate (with his permission, of course) wrCisco's Python code into Sigil's python3lib and use the C++ embedded Python interface to access it? Thus skipping the need to use PyQt at all for the GUI? I'm not certain what else the existing plugin might provide, but even if we don't bring it entirely "in house" (eliminating the need for the third-party plugin altogether), surely we can come up with an interface to the portions we DO need to access via the embedded Python interpreter while still exposing those same absorbed parts to plugins via the plugin framework? Thus avoiding duplication.
There are two of wrCisco's plugins that I use regularly, two sides of the same coin: cssRemoveUnusedSelectors and cssUndefinedClasses. The first, as you know, removes CSS selectors that aren't used in the HTML; the second removes class references in the HTML that don't have a corresponding style in the CSS.
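The two directions can be pictured as two set differences. This is only a rough sketch, assuming the class names have already been collected from the CSS and from the HTML; the function names are hypothetical and not taken from either plugin, and real selectors (combinators, pseudo-classes) need far more than a name comparison:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using NameSet = std::set<std::string>;

// Hypothetical helper: every name that is in `a` but not in `b`.
NameSet difference(const NameSet& a, const NameSet& b) {
    NameSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.begin()));
    return out;
}

// Classes styled in the CSS but never referenced in the HTML
// (the direction cssRemoveUnusedSelectors cares about).
NameSet unused_in_css(const NameSet& defined, const NameSet& used) {
    return difference(defined, used);
}

// Classes referenced in the HTML that have no style in the CSS
// (the direction cssUndefinedClasses cares about).
NameSet undefined_in_html(const NameSet& defined, const NameSet& used) {
    return difference(used, defined);
}
```

With `defined = {chapter, flush, old}` and `used = {chapter, flush, pc-rw}`, the first call yields `{old}` and the second `{pc-rw}`.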

If wrCisco doesn't object, it seems like incorporating BOTH of those plugins into the same Sigil function (with all the appropriate user selections) would make sense.

As a very minor nit - the Remove Unused Selectors does not combine leftover CSS.

Spoiler:
eg
Code:
sup, sub {font-size:0.675em}
sup      {vertical-align: 35%}
sub      {vertical-align: -20%}

<p>Today is the 1<sup>st</sup> day of 2021!!!!</p>
becomes:
Code:
sup {font-size:0.675em}
sup {vertical-align: 35%}

<p>Today is the 1<sup>st</sup> day of 2021!!!!</p>
when, ideally, it should be:
Code:
sup {font-size:0.675em; vertical-align: 35%}

<p>Today is the 1<sup>st</sup> day of 2021!!!!</p>

Last edited by Turtle91; 01-01-2021 at 10:01 AM.
Old 01-01-2021, 10:54 AM   #21
KevinH
Sigil Developer
Really, all of this depends on what wrCisco wants. But yes, we could use the python3lib direct interface and do the gui parts in qt.

As for Qt, there is no CSS parser for public use in Qt at all. They have their own variant for qcss, which we could extract and use, but it is for their own version of CSS, which is not compliant with CSS3. So there is no real point.

The only other css parser in Qt is inside QWebEngine but it is very closely integrated with their internal DOM so extracting it for reuse would be a major major pain.

I will take a look at the gumbo query code which would be nice to have anyway. If we can get that to work, we just need to better handle parsing css selectors to get what we need.
Old 01-01-2021, 10:58 AM   #22
KevinH
Sigil Developer
FWIW, combining CSS is not easy to do especially without the specificity rule calculations for css. You can not even sort the css selectors as order is important. That is really the domain of a css optimizer.
Old 01-01-2021, 11:20 AM   #23
DiapDealer
Grand Sorcerer
For the record: I'm also not entirely opposed to just ditching Sigil's "Delete Unused Style Classes" and letting plugins handle it. Seeing as how the existing plugin already does it better, that approach has a simplistic/minimalist aspect to it that appeals greatly to me, if I'm being totally honest. I'm just spitballing, here.
Old 01-01-2021, 01:53 PM   #24
odamizu
just an egg
Posts: 1,587
Karma: 4300000
Join Date: Mar 2015
Device: Kindle, iOS
Quote:
Originally Posted by KevinH
... Perhaps we should delete the unused class removal feature from Sigil and instead point people to wrCisco's plugin for that functionality completely.

Ideas? Thoughts?
Quote:
Originally Posted by DiapDealer
... Seeing as how the existing plugin already does it better, that approach has a simplistic/minimalist aspect to it that appeals greatly to me, if I'm being totally honest. I'm just spitballing, here.
I love wrCisco's cssRemoveUnusedSelectors plugin (as well as cssUndefinedClasses) and use it exclusively. Makes sense to me to either remove "Delete unused stylesheet classes" from Sigil and point people to the plugin or, if wrCisco is willing, incorporate the plugin into official Sigil.
Old 01-01-2021, 05:36 PM   #25
wrCisco
Enthusiast
Posts: 34
Karma: 467802
Join Date: Apr 2016
Device: none
Well, the idea of integrating the plugin's code into Sigil intrigues me, I'm almost tempted to try to write the glue code myself, if I may...

(As to the practical problem of how to integrate the code: one would need to put the appropriate Python module in the python3lib directory and then write one or more C++ methods for the PythonRoutines class, which afterward could be called from elsewhere in Sigil, right? QVariants as arguments seem sufficiently harmless if one restricts oneself to numbers and strings - or lists of numbers and strings - I guess...)

But the integration wouldn't be a completely straightforward change:

- The code of the builtin Delete Unused Classes is a sort of specialization of Sigil's Reports tool (which has the same flaws as the "Delete..." functionality in "Classes Used" and "CSS Classes"), so the code of the plugin would require some refactoring if we want to serve that as well.
- The Python package cssselect, which is currently required only to run plugins, would be required to run Sigil itself.
- There are two things in which the builtin Delete... is more thorough than the plugin: it looks for matches only in the xhtml files where a stylesheet is linked, and it collects selectors from <style> tags too. The plugin is more conservative: it always looks for matches in every xhtml and xml file, and never considers <style> tags. The philosophy has always been to err on the safe side: it's better to leave some useless cruft than to remove some useful code. This, however, would probably not be good enough for a complete report on usages (so, more refactoring, or accepting a suboptimal system).

As for also integrating cssUndefinedClasses, I'm not opposed in principle, and I understand that from a user's point of view the two plugins do something very similar, but from the developer's point of view they are two different beasts (apart from a few GUI functions, the only thing they have in common is the CSS parsing).

Combining rules, as Kevin pointed out, is in general trickier than it seems: it would be safe and not overly complicated only if the selectors were an exact match and the rules were one right after the other (as in the example of Turtle91, but that is the only case).
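To make the "exact match, one right after the other" case concrete, here is a minimal sketch. It assumes the stylesheet has already been parsed into a list of (selector, declarations) pairs and that selector text has been normalized; the `Rule` struct and `merge_adjacent` name are hypothetical and not part of the plugin or of Sigil:

```cpp
#include <string>
#include <vector>

// Hypothetical, already-parsed style rule.
struct Rule {
    std::string selector;      // e.g. "sup"
    std::string declarations;  // e.g. "font-size:0.675em"
};

// Merge a rule into the previous one only when the selector text is
// identical and the two rules are directly adjacent. Declarations keep
// their original order, so the cascade result is unchanged.
std::vector<Rule> merge_adjacent(const std::vector<Rule>& rules) {
    std::vector<Rule> out;
    for (const Rule& r : rules) {
        if (!out.empty() && out.back().selector == r.selector) {
            Rule& prev = out.back();
            if (!prev.declarations.empty() && !r.declarations.empty())
                prev.declarations += "; ";
            prev.declarations += r.declarations;
        } else {
            out.push_back(r);  // different selector, or not adjacent
        }
    }
    return out;
}
```

Applied to Turtle91's leftover rules, the two consecutive `sup` rules collapse into one, while a later `sup` rule separated by some other rule is deliberately left alone because moving it could change which declaration wins.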

So, I'm not 100% sure of what would be the best course of action now: maybe wait to see what Kevin can squeeze out of the gumbo query library?
Old 01-01-2021, 06:48 PM   #26
KevinH
Sigil Developer
Please do take a shot at the glue code if you have any interest. Yes, QVariants are used to pass things. And PythonRoutines is an example of how we do the interface with the embedded Python, at least for a few functions. There are a few places in the code where we skip PythonRoutines, but not many.

The embedded Python interface/bridge does have a few restrictions: it will not pass maps, but it will pass lists of strings, lists of lists, or a pointer to a Python object, etc. To pass maps, I pass a keys list and a values list and build the map on the other side of the bridge. Just let me know if you have any questions.
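The keys-list/values-list workaround can be sketched as follows. This is only an illustration using plain standard-library types as stand-ins; in Sigil the lists would travel as QVariants and one side of the bridge would be Python, and the names `split_map` and `join_map` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using StringList = std::vector<std::string>;
using StringMap = std::map<std::string, std::string>;

// Sending side: flatten the map into two parallel lists, which a
// bridge limited to lists can carry.
std::pair<StringList, StringList> split_map(const StringMap& m) {
    StringList keys, values;
    for (const auto& kv : m) {
        keys.push_back(kv.first);
        values.push_back(kv.second);
    }
    return std::make_pair(keys, values);
}

// Receiving side: rebuild the map by pairing entries at the same index.
StringMap join_map(const StringList& keys, const StringList& values) {
    assert(keys.size() == values.size());  // the lists must stay parallel
    StringMap m;
    for (std::size_t i = 0; i < keys.size(); ++i)
        m[keys[i]] = values[i];
    return m;
}
```

The only invariant to protect is that both lists keep the same length and order while crossing the bridge.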

Having cssselect be required for Sigil to function is not really an issue as all of lxml is already required for parsing and fixing pure xml files like the opf and the ncx during Sigil start-up.

That said, if we can get a gumbo based query working we can instead focus on a better selector parser in C++ or even do the parsing in python in css_parser and pass back what to query for.

There are lots of ways we could take this.

I will try to get gumbo-query building and running with our sigil specific gumbo and at least try a query and see if its jQuery-like selector interface works.

Thanks,

KevinH
Old 01-02-2021, 01:33 PM   #27
KevinH
Sigil Developer
Okay, just to see if basic query works on gumbo, I have integrated gumbo-query into Sigil (only locally on my tree), fixed it to work with our version of sigilgumbo and ran the following testcases:

Code:
#if TEST_GUMBO_QUERY
            if (1) {
                std::string page("<h1><a>wrong link</a><a class=\"special\"\\>some link</a></h1>");
                CDocument doc;
                doc.parse(page.c_str());

                CSelection c = doc.find("h1 a.special");
                CNode node = c.nodeAt(0);
                printf("Node: %s\n", node.text().c_str());
                std::string content = page.substr(node.startPos(), node.endPos()-node.startPos());
                printf("Node: %s\n", content.c_str());
            };
            if (1) {
                std::string page = "<html><div><span>1\n</span>2\n</div></html>";
                CDocument doc;
                doc.parse(page.c_str());
                CNode pNode = doc.find("div").nodeAt(0);
                std::string content = page.substr(pNode.startPos(), pNode.endPos() - pNode.startPos());
                printf("Node: #%s#\n", content.c_str());
            };
            if (1) {
                std::string page = "<html><div><span id=\"that's\">1\n</span>2\n</div></html>";
                CDocument doc;
                doc.parse(page.c_str());
                CNode pNode = doc.find("span[id=\"that's\"]").nodeAt(0);
                std::string content = page.substr(pNode.startPos(), pNode.endPos() - pNode.startPos());
                printf("Node: #%s#\n", content.c_str());
            };
            if (1) {
                std::string page("<h1><a>some link</a></h1>");
                CDocument doc;
                doc.parse(page.c_str());

                CSelection c = doc.find("h1 a");
                std::cout << c.nodeAt(0).text() << std::endl; // some link
            }
#endif
And it all seemed to pass with flying colours. So it would appear we could easily add gumbo-query to our Sigil project (it is available under an MIT License) and use it to test CSS selectors to see if they return a value.

So all we need now is the ability to extract and better parse the selector rules themselves.

We could do that in css-parser via a python interface or we could try and write a better simpler css parser ourselves (along the lines of what we did for xhtml parsing with QuickParser.cpp).

It would not be difficult to parse CSS in C++ with a simple state machine that looks for the first occurrence of non-whitespace and then checks for special chars like "@", ";", "{" and "}" to determine the state, with state-specific parsing from there. It need not validate; it really just needs to properly generate the selector rules.
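A rough sketch of what such a state machine might look like is below. It is not Sigil code and makes several simplifying assumptions: it does not validate, it ignores quoted strings and escapes, it only descends into `@media` and `@supports` blocks, and it returns each selector group (such as `sup, sub`) as one string. The `extract_selectors` name is hypothetical:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Trim leading and trailing whitespace.
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Collect the selector text of every style rule found in `css`.
std::vector<std::string> extract_selectors(const std::string& css) {
    enum class State { Prelude, Declarations };
    State state = State::Prelude;
    std::vector<std::string> selectors;
    std::string prelude;  // text gathered since the last ';', '{' or '}'

    for (std::size_t i = 0; i < css.size(); ++i) {
        const char c = css[i];
        // Comments are skipped in every state.
        if (c == '/' && i + 1 < css.size() && css[i + 1] == '*') {
            const std::size_t end = css.find("*/", i + 2);
            if (end == std::string::npos) break;  // unterminated comment
            i = end + 1;
            continue;
        }
        if (state == State::Declarations) {
            if (c == '}') state = State::Prelude;  // end of declaration block
            continue;
        }
        if (c == '{') {
            const std::string text = trim(prelude);
            prelude.clear();
            const bool container = text.compare(0, 6, "@media") == 0 ||
                                   text.compare(0, 9, "@supports") == 0;
            if (container) continue;  // nested style rules follow
            if (!text.empty() && text[0] != '@') selectors.push_back(text);
            state = State::Declarations;  // skip up to the closing '}'
        } else if (c == ';' || c == '}') {
            prelude.clear();  // end of an @import-style statement or of a container
        } else {
            prelude += c;
        }
    }
    return selectors;
}
```

Each returned string could then be split on commas and handed to a selector query to see whether it matches anything.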

What we need to see is if gumbo-query can handle all of the css selectors available in css3.
Since there are no docs of any sort, we really just have to study the source code.

To make this easier, I will push everything to my own github tree: https://github.com/kevinhendricks/Sigil for anyone who wants to play around with it at all.

The gumbo-query code, modified to work with Sigil, will live in Sigil/src/Query/, and the test code is for the time being run out of Sigil/src/main.cpp and can be played with there.

It is being built directly into Sigil for the time being, but we could easily change it to a standalone c++ shared or static library.

I will push what I have now in case anyone wants to play around with it.

If anyone is interested in gumbo, you also might want to check out Sigil/src/Misc/GumboInterface.cpp / .h which is our C++/Qt based interface to the gumbo parsing c library.

If we like gumbo-query, I will integrate it with QString/Qt to make it very easy to use.


I just pushed all of these test changes to https://github.com/kevinhendricks/Sigil

KevinH
Old 01-02-2021, 03:34 PM   #28
KevinH
Sigil Developer
From eyeballing the CSelector.cpp and CParse.cpp code in Sigil/src/Query/, it appears that gumbo-query handles all of the combinators and many pseudo-classes and pseudo-elements, so gumbo-query can be a valuable addition for Sigil even on its own. So I will integrate a find_by_selector() method directly into our GumboInterface class so that CSS selectors can be used to find gumbo nodes.
Old 01-02-2021, 03:56 PM   #29
Frenzie
Wizard
Quote:
What we need to see is if gumbo-query can handle all of the css selectors available in css3.
Since it's about selecting elements rather than actually applying styles, it doesn't look like the likes of ::first-line and ::first-letter are supported. You'd have to remove those before passing it on.

https://github.com/lazytiger/gumbo-q....cpp#L382-L523
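Removing those pseudo-elements before querying could look roughly like this. It is a sketch under stated assumptions, not part of gumbo-query or Sigil: it only handles the double-colon form followed by a plain identifier, it does not look inside quoted attribute values, and the `strip_pseudo_elements` name is hypothetical:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Drop every "::name" pseudo-element from a selector, leaving
// single-colon pseudo-classes such as ":first-child" untouched.
std::string strip_pseudo_elements(const std::string& selector) {
    std::string out;
    std::size_t i = 0;
    while (i < selector.size()) {
        const bool double_colon = selector[i] == ':' &&
                                  i + 1 < selector.size() &&
                                  selector[i + 1] == ':';
        if (double_colon) {
            i += 2;  // skip the "::"
            // Skip the identifier that names the pseudo-element.
            while (i < selector.size() &&
                   (std::isalnum(static_cast<unsigned char>(selector[i])) ||
                    selector[i] == '-' || selector[i] == '_'))
                ++i;
        } else {
            out += selector[i++];
        }
    }
    return out;
}
```

A selector that consists of nothing but a pseudo-element comes back empty, so a caller would have to decide whether to treat that as the universal selector or to skip it.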

Btw, if you try something like this (p:first-child:first-of-type) it gives you a segmentation fault:
Code:
void test_html() {
	std::string page = "<html><div class=\"chapter\"><p class=\"flush\">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</p><p>second child</p></div></html>";
	CDocument doc;
	doc.parse(page.c_str());
	CNode pNode = doc.find(".chapter > p:first-child:first-of-type").nodeAt(0);
	std::string content = page.substr(pNode.startPos(), pNode.endPos() - pNode.startPos());
	printf("Node: #%s#\n", content.c_str());
}
(Silly example? Absolutely. But what else does one test for. ^_^)
Old 01-02-2021, 04:17 PM   #30
KevinH
Sigil Developer
We just have to check whether the vector of Selections or GumboNode * that was found is empty. The code above will not be the interface we employ; instead we will add a find_by_selector routine to our existing GumboInterface class and remove the use of CDocument and other query wrapper code completely.

Of course if the segfault happens in CParser, we would need to harden it.

Last edited by KevinH; 01-02-2021 at 04:49 PM.