#16 |
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
|
In The Source: something wicked this way comes...
My real professional life caught up with me, so this week and the next one will be even more barren than usual: priority one, deadlines, stress, etc.
Been there, done that. Fake-implemented an unknown language for SpellcheckEditor. Decided to face the dragons. To get its words, SpellcheckEditor uses the function Book::GetUniqueWordsInHTMLFiles(), which uses time travel and parallel universes (QFuture, QtConcurrent - scary!) to invoke Book::GetWordsInHTMLFileMapped(HTMLResource *html_resource), which in turn calls HTMLSpellCheck::GetAllWords(html_resource->GetText()). Thus we land in HTMLSpellCheck. tbc...? |
#17 |
actually it is /var/log
|
In The Source: The Player of the Games
... yes, your suspicion is right... at the moment I'm reading Iain.
HTMLSpellCheck: Not so scary after all. Just a collection of some static functions. The heart of it is:
static QList<MisspelledWord> GetMisspelledWords(const QString &text, int start_offset, int end_offset, const QString &search_regex, bool first_only = false, bool include_all_words = false);
It has a fine word-seeking loop... which is tag aware... ...must update my regex, apparently... ...wish I had more time... it got so interesting... ...added "language" to struct MisspelledWord... let's play this one... tbc...?
Last edited by varlog; 06-16-2016 at 05:59 PM. |
#18 |
actually it is /var/log
|
In The Source: Use of Weapons
... you've surely seen it coming...
no time, no time... but: I've spent a little time in HTMLSpellCheck. The choice of weapons is:
- full quick HTML parser (as suggested by Kevin) versus shortcut (I need only the "lang" attribute for this!).
- recursion versus some logic loop (to get the language right when leaving tags that set a language).
Minimalistic as I am, I go for the shortcut (for now). My modest experience with parsing HTML bodies tells me recursion is the answer. But... recursion is something you do not do at home... will try logic for now. But I have to go back to SpellCheck and SpellCheckEditor.... tbc...?
Last edited by varlog; 07-15-2016 at 08:24 PM. Reason: wrong word |
#19 |
Sigil Developer
Posts: 8,494
Karma: 5703586
Join Date: Nov 2009
Device: many
|
You realize that lang attributes can be on tags with nested contents and even nested themselves. So a tag name stack must be built to properly handle all of these cases and to properly unwind nested tags with language attributes. So the shortcut approach is a non-starter.
KevinH |
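The stack approach described above can be sketched roughly as follows. This is a minimal illustration in plain standard C++, not Sigil's code and not the quickparser API: the names `OpenTag`, `find_attr` and `words_with_lang` are hypothetical, the tag scanner is deliberately naive (quoted attribute values only, no handling of `>` inside comments or attribute values), and it assumes well-formed XHTML where empty elements are self-closed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One entry per currently open element: its name and the language in effect inside it.
struct OpenTag {
    std::string name;
    std::string lang;
};

// Look up attr="value" inside the raw text of a start tag (simplified: quoted values only).
static bool find_attr(const std::string& tag, const std::string& attr, std::string& value) {
    assert(!attr.empty());
    const std::string key = attr + "=";
    std::size_t pos = 0;
    while ((pos = tag.find(key, pos)) != std::string::npos) {
        // Require leading whitespace so "lang" does not match inside "xml:lang".
        const bool starts_attr =
            pos > 0 && std::isspace(static_cast<unsigned char>(tag[pos - 1]));
        const std::size_t q = pos + key.size();
        if (starts_attr && q < tag.size() && (tag[q] == '"' || tag[q] == '\'')) {
            const std::size_t end = tag.find(tag[q], q + 1);
            if (end != std::string::npos) {
                value = tag.substr(q + 1, end - q - 1);
                return true;
            }
        }
        pos += key.size();
    }
    return false;
}

// Split the text content into words, pairing each word with the language in effect.
std::vector<std::pair<std::string, std::string>> words_with_lang(const std::string& html) {
    std::vector<std::pair<std::string, std::string>> out;
    std::vector<OpenTag> stack;
    std::string word;
    auto current_lang = [&stack]() {
        return stack.empty() ? std::string() : stack.back().lang;
    };
    auto flush = [&]() {
        if (!word.empty()) {
            out.emplace_back(word, current_lang());
            word.clear();
        }
    };
    std::size_t i = 0;
    while (i < html.size()) {
        const unsigned char c = static_cast<unsigned char>(html[i]);
        if (c == '<') {
            flush();
            const std::size_t close = html.find('>', i);
            if (close == std::string::npos) break;  // malformed tail: stop scanning
            const std::string tag = html.substr(i + 1, close - i - 1);
            i = close + 1;
            if (tag.empty() || tag[0] == '!' || tag[0] == '?') continue;  // comment, doctype, PI
            const std::size_t start = (tag[0] == '/') ? 1 : 0;
            std::size_t n = start;
            while (n < tag.size() && tag[n] != '/' &&
                   !std::isspace(static_cast<unsigned char>(tag[n]))) ++n;
            const std::string name = tag.substr(start, n - start);
            if (tag[0] == '/') {
                // End tag: unwind the stack to the nearest matching open element.
                for (std::size_t k = stack.size(); k-- > 0;) {
                    if (stack[k].name == name) {
                        stack.erase(stack.begin() + static_cast<std::ptrdiff_t>(k), stack.end());
                        break;
                    }
                }
                continue;
            }
            std::string lang = current_lang();  // inherit from the parent by default
            std::string v;
            if (find_attr(tag, "lang", v)) lang = v;
            if (find_attr(tag, "xml:lang", v)) lang = v;  // xml:lang wins when both are present
            if (tag.back() != '/') stack.push_back({name, lang});  // self-closed tags open no scope
        } else if (c >= 0x80 || std::isalpha(c)) {
            word += html[i++];  // bytes >= 0x80 keep UTF-8 letters inside the word
        } else {
            flush();
            ++i;
        }
    }
    flush();
    return out;
}
```

Because every open element records the language in effect inside it, leaving a nested element restores the parent's language simply by popping the stack; no recursion is needed.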
#20 |
actually it is /var/log
|
It would be nice to have a sample ebook or, better, just a (long) snippet with some non-trivial usage of tags with language attributes. Can you spare a little time for it, Kevin? Anybody?
|
#21 |
Sigil Developer
|
The xml:lang attribute is allowed virtually anywhere. So I strongly recommend the approach of using a tag parser based on quickparser and keeping a stack (LIFO) of tag and lang.
|
#22 |
actually it is /var/log
|
I'm using, among others, something like this:
Code:
<html xmlns="http://www.w3.org/1999/xhtml" lang="zombie">
<head>
<title>Spell Checking Languages</title>
</head>
<body>
<p lang="fr">
<img alt="sigil" src="../Images/sigil.png" xml:lang="en"/>This is <span xml:lang="">Sigil </span>icone
</p>
</body>
</html>
Code:
<p lang="fr">
<img alt="sigil" src="../Images/sigil.png" xml:lang="en">This is <span xml:lang="">Sigil </span>icone</image>
Merci!
</p>
Something more complicated, anybody? |
#23 | |
Grand Sorcerer
Posts: 5,687
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Your code is invalid. You'll need to use:
Code:
<p lang="fr">
<img alt="sigil" src="../Images/sigil.png" xml:lang="en" />This is <span xml:lang="en">Sigil </span>icone
Merci!
</p>
If you're looking for a mini test case, how about this old multilingual joke:
Code:
<p>Four linguists were sharing a compartment on a train on their way to an international conference on sound symbolism. One was English, one Spanish, one French and the fourth German. They got into a discussion on whose language was the most eloquent and euphonious.</p>
<p>The English linguist said: "Why, English is the most eloquent language. Take for instance the word "butterfly". Butterfly, butterfly... doesn't that word so beautifully express the way this delicate insect flies. It's like flutter-by, flutter-by."</p>
<p>"Oh, no!" said the Spanish linguist, "the word for "butterfly" in Spanish is "<span lang="es" xml:lang="es">mariposa</span>". Now, this word expresses so beautifully the vibrant <span xml:lang="en-GB" lang="en-GB">colours</span> on the butterfly's wings. What could be a more apt name for such a brilliant creature? Spanish is the most eloquent language!"</p>
<p>"<span xml:lang="fr" lang="fr">Papillon</span>!" says the French linguist, "<span xml:lang="fr" lang="fr">papillon</span>! This word expresses the fragility of the butterfly's wings and body. This is the most fitting name for such a delicate and ethereal insect. French is the most eloquent language!"</p>
<p>At this the German linguist stands up, and demands: "<span xml:lang="de" lang="de">Und</span> <span xml:lang="und" lang="und">vot is rongk</span> <span xml:lang="de" lang="de">mit</span> '<span xml:lang="de" lang="de">Schmetterling</span>'?"</p>
If you ever get your code to work, everything tagged as "und" or "zxx" (= no linguistic content) shouldn't be spell-checked. |
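The rule about skipping untagged or non-linguistic content could be expressed as a small filter applied before a word is handed to the spell checker. This is only a sketch: `primary_subtag` and `should_spellcheck` are hypothetical names, and only the two codes mentioned in the post ("und" and "zxx") are treated as exempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Primary language subtag, lowercased: "en-GB" -> "en", "und_Latn" -> "und".
std::string primary_subtag(const std::string& lang_tag) {
    const std::size_t cut = lang_tag.find_first_of("-_");
    std::string primary = lang_tag.substr(0, cut);  // npos means "take everything"
    std::transform(primary.begin(), primary.end(), primary.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return primary;
}

// False for text tagged "und" (undetermined) or "zxx" (no linguistic content).
bool should_spellcheck(const std::string& lang_tag) {
    const std::string primary = primary_subtag(lang_tag);
    return primary != "und" && primary != "zxx";
}
```

With the joke above, "vot is rongk" would be skipped while "Schmetterling" would still go to the German dictionary.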
|
#24 | |
actually it is /var/log
|
Quote:
I didn't know that. Thanks. |
|
#25 |
actually it is /var/log
|
Just a note for quick reference:
Void elements, HTML4: area, base, basefont(d), br, col, frame, hr, img, input, isindex(d), link, meta, param (source)
Void elements, HTML5: area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr(d?) (source, mostly)
Last edited by varlog; 06-23-2016 at 04:56 PM. |
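For a tag-aware word loop, the lists above matter because a void element never gets a closing tag and so must not be pushed onto a tag stack. A lookup could be as simple as the sketch below; `is_void_element` is a hypothetical helper, and the set is just the union of the two lists in this note, not an authoritative copy of either specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// True if `name` is in the union of the HTML4 and HTML5 void-element lists noted above.
bool is_void_element(std::string name) {
    static const std::set<std::string> kVoid = {
        // HTML4 list
        "area", "base", "basefont", "br", "col", "frame", "hr", "img", "input",
        "isindex", "link", "meta", "param",
        // additional names from the HTML5 list
        "command", "embed", "keygen", "source", "track", "wbr"};
    // Compare case-insensitively; XHTML names are lowercase anyway.
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return kVoid.count(name) > 0;
}
```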
#26 |
actually it is /var/log
|
In The Source: ...
It is not "The State of the Art" phase for sure... and "Excession" is not happening anytime soon, either.
Was out for a few days. I'm back in SpellCheck and SpellCheckEditor now. It seems that the SpellCheck instance is first created by MainWindow initializing SpellCheckEditor which, in its initialization, calls, well, SpellCheck. I will use this, I think. But what I'm doing now is making SpellCheck language aware. That means it has to be able, singleton as it is (for now), to hold more than one Hunspell object. What happens? At the moment I have two Hunspell objects loaded and my system (SSD, 4 cores, 8 GB) doesn't seem to notice... will have to have more... tbc?
Last edited by varlog; 06-30-2016 at 06:27 PM. |
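The idea of one spell-check object holding several dictionaries at once might look roughly like this. It is a sketch under loud assumptions: `Dictionary` is a hypothetical stand-in (a plain word set) for a loaded Hunspell object, `DictionaryRegistry` is an invented name, and nothing here reflects how Sigil's SpellCheck class is actually organised.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a loaded Hunspell object: just a set of known words.
struct Dictionary {
    std::set<std::string> words;
    bool spell(const std::string& word) const { return words.count(word) > 0; }
};

// Holds any number of dictionaries, keyed by language code.
class DictionaryRegistry {
public:
    // Load (or replace) the dictionary for a language code.
    void load(const std::string& lang, std::set<std::string> words) {
        assert(!lang.empty());
        auto dict = std::make_unique<Dictionary>();
        dict->words = std::move(words);
        loaded_[lang] = std::move(dict);
    }

    // Drop a dictionary; returns false if that language was not loaded.
    bool unload(const std::string& lang) { return loaded_.erase(lang) > 0; }

    std::size_t count() const { return loaded_.size(); }

    // True only if the language is loaded and its dictionary knows the word.
    bool spell(const std::string& lang, const std::string& word) const {
        const auto it = loaded_.find(lang);
        return it != loaded_.end() && it->second->spell(word);
    }

private:
    std::map<std::string, std::unique_ptr<Dictionary>> loaded_;
};
```

Keying by language code lets the word loop ask for "the checker for this word's language" without caring how many dictionaries are resident.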
#27 |
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
I just want to say...this is WAAAAAAAAAAAAAAY better than those telenovelas...
|
#28 |
actually it is /var/log
|
Why, thank you!
Also, please note: I try to keep it suitable for all ages, no explicit crime or sex scenes. |
#29 | |
Bookmaker & Cat Slave
|
Quote:
I'm sure that my fellow addicts in watching this are appreciative of the G-rated effort. Don't know how you manage, myself. :-) Hitch |
|
#30 |
actually it is /var/log
|
In The Source: The Labyrinth
[REALITY] Now we are a week after deadline and we are still not done... There will be blood...[/REALITY]
oops... blood...? well, still not explicit... is it?
Still in SpellCheck/SpellcheckEditor. Can load as many Hunspell objects as I wish... and unload them, too. Funny thing is that, even though the size of an average Hunspell dictionary file is about 0.5 to 2.1 MB (the ones I have), the real memory footprint is something like 5 MB to 7 MB (so my ad hoc htop tells me; there are better tools, I'm sure). It's irrelevant, I was just curious. Could be the debugger or anything. The load times (SSD) are not noticeable. Will have to check on my laptop eventually (still an HDD, that is why I'm not using it anymore).
The maze I'm actually in is the "User Experience" (G-rated) thing. For instance, I have on my system something like twenty Spanish dictionaries to choose from. And the Language class doesn't know them all... and prefers "-" to "_". Some "default language dictionary" is due. What a mess... My instance of Sigil, due to my meddling, has lost its ability (temporarily?) to spell check. What a mess... tbc? |
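One way out of the "-" versus "_" mismatch and the many-regional-dictionaries problem is to normalise codes before comparing and to fall back in steps: exact match, then any dictionary of the same language, then a default. The sketch below is an assumption-laden illustration, not Sigil's behaviour: `normalize_code` and `pick_dictionary` are hypothetical names, and the choice of "_" as the canonical separator is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// "es-mx", "ES_MX" and "es_MX" all become "es_MX"; "ES" becomes "es".
std::string normalize_code(const std::string& code) {
    const std::size_t cut = code.find_first_of("-_");
    std::string lang = code.substr(0, cut);
    std::transform(lang.begin(), lang.end(), lang.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    if (cut == std::string::npos) return lang;
    std::string region = code.substr(cut + 1);
    std::transform(region.begin(), region.end(), region.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
    return lang + "_" + region;
}

// Exact match first, then the first dictionary of the same language, then the fallback.
std::string pick_dictionary(const std::string& lang,
                            const std::vector<std::string>& available,
                            const std::string& fallback) {
    assert(!fallback.empty());
    const std::string wanted = normalize_code(lang);
    const std::string wanted_lang = wanted.substr(0, wanted.find('_'));
    std::string same_language;
    for (const std::string& name : available) {
        const std::string candidate = normalize_code(name);
        if (candidate == wanted) return name;  // language and region both match
        if (same_language.empty() &&
            candidate.substr(0, candidate.find('_')) == wanted_lang) {
            same_language = name;  // remember the first regional variant
        }
    }
    return same_language.empty() ? fallback : same_language;
}
```

The function returns the dictionary name exactly as it appears in `available`, so the caller can still open the file under its original spelling.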