Old 07-02-2016, 07:52 PM   #31
varlog
actually it is /var/log
Posts: 341
Karma: 2994236
Join Date: Sep 2012
Location: usually Europa
Device: prs t1
In The Source: Obvious Is Obvious When It's Obvious

[REALITY] Had a conversation with my colleague after a nasty showdown with a client: it turned out that what was obvious to him was not at all obvious to me, and vice versa [/REALITY]

The old question is answered.

When you invoke the SpellcheckEditor (by clicking its icon or via Tools->Spellcheck->Spellcheck), it calls MainWindow::SpellcheckEditorDialog(), which calls m_SpellcheckEditor->show(). The SpellcheckEditor has a showEvent(QShowEvent *event) function, which apparently catches this show event and calls Refresh(), which in turn calls CreateModel(sort_column, sort_order), which uses
Code:
 QHash<QString, int> unique_words = m_Book->GetUniqueWordsInHTMLFiles()
to get all unique words in your text, of course.

The SpellcheckEditor's m_Book variable, which was initialized to "0", has by this time already been set by MainWindow, which calls LoadInitialFile(openfilepath, is_internal) or CreateNewBook() during its initialization. Both of them eventually call SetNewBook(QSharedPointer<Book> new_book), which uses
Code:
 m_SpellcheckEditor->SetBook(m_Book)
which, finally, sets the SpellcheckEditor's internal m_Book variable to the actual book.

Thus the SpellcheckEditor gets its first book. No time travel, just some Qt event magic. Obvious.
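
For readers who prefer code to prose, the call order described in this post can be mimicked in a few lines of plain Python. This is only a sketch under loud assumptions: no Qt is involved, the class and method names merely echo the Sigil ones, and the word counting in `Book` is a made-up stand-in for GetUniqueWordsInHTMLFiles().

```python
class Book:
    """Hypothetical stand-in for Sigil's Book; it only holds some text."""

    def __init__(self, text):
        self.text = text

    def get_unique_words(self):
        # Count how often each whitespace-separated word occurs.
        counts = {}
        for word in self.text.split():
            counts[word] = counts.get(word, 0) + 1
        return counts


class SpellcheckEditor:
    def __init__(self):
        self.book = None   # plays the role of m_Book starting out as "0"
        self.model = None

    def set_book(self, book):
        self.book = book

    def show(self):
        # In Qt, show() leads to a show event; here we call the handler directly.
        self.show_event()

    def show_event(self):
        self.refresh()

    def refresh(self):
        self.create_model()

    def create_model(self):
        # Safe only because set_book() ran long before show().
        self.model = self.book.get_unique_words()


class MainWindow:
    def __init__(self, text):
        self.spellcheck_editor = SpellcheckEditor()
        self.set_new_book(Book(text))   # happens during initialization

    def set_new_book(self, book):
        self.book = book
        self.spellcheck_editor.set_book(book)

    def spellcheck_editor_dialog(self):
        self.spellcheck_editor.show()


window = MainWindow("the cat and the hat")
window.spellcheck_editor_dialog()
print(window.spellcheck_editor.model)
```

The point of the sketch is the ordering: the book is handed over while the main window is being set up, so by the time the dialog is shown the editor already has something to read.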

tbc...?
Old 07-03-2016, 05:11 PM   #32
varlog
In The Source: Misery

yeah, yeah... the Stephen...

wrote a book today... not a great read... plagiarism and Google mostly... and the orthography is a disaster... attached, you can see for yourself. Not quite finished yet - have to add some void tags.
Got the dictionaries as I wanted them... but not as "The User" would probably want them... Attached is a picture of the latest cockpit... spell check is knocked out, of course.

Decided how to carry the language with the word... won't tell how because Kevin could protest and I want to see for myself...
Nothing final, more difficult things ahead... still have to check it on a slower system... misery...

But I have my dictionaries now... time to use them. So let's tackle the pair HTMLSpellCheck - SpellCheck now.

tbc...?
Attached thumbnail: SCE_next02.jpg (64.4 KB)
Attached file: Multilanguage.epub (6.3 KB)
Old 07-04-2016, 04:24 PM   #33
varlog
In The Source: The Colour of Magic

Terry... will have more of him...

Moments like this keep me coding.

I can actually spell check again... multi spell check!

Well, sometimes... when I get the word, the language and the dictionary right at the same time.

I get the word as right as Sigil has ever done.
I don't get the language right because, as Kevin has nicely put it, my parser is a "non-starter" at the moment.
I don't get the dictionary right because of all those multiple dialect dictionaries.

But nevertheless... Soul Music.

tbc...?
Attached thumbnail: SCE_next03.jpg (71.1 KB)
Old 07-04-2016, 05:34 PM   #34
BetterRed
null operator (he/him)
Posts: 20,583
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
In Sigil you need to put some Structure into the Magic

BR
Old 07-04-2016, 06:32 PM   #35
varlog
NLP...? no, not in this context, surely... ehm... ok, it's only a title... nevertheless...

I'm well aware of it.
This blog is supposed to document the process (one of the many possible) of developing the Structure.
The magic sustains me.

Old 07-06-2016, 06:53 PM   #36
varlog
In The Source: Witches Abroad

[REALITY]projects... the deadlines of the [PAST_IN_THEORY]current one[/PAST_IN_THEORY] are being (no time travel, alas) dealt with... they gave me a new one... starting now!... good for me...? not so for spellchecker...?[/REALITY]

Did some research on parsing html bodies.
This apparently quite well-known item made me smile... and sad...

Short version: everybody wants it, nobody has it, really, as I see it...

The private Qt qtexthtmlparser parsing class has 2054 lines of C++ code.
Kevin's quickparser.py has 202 lines of Python code.

Added (empty!) class QuickHtmlParser to Sigil.
This will be a ride...


tbc...?
Old 07-06-2016, 07:47 PM   #37
KevinH
Sigil Developer
Posts: 7,651
Karma: 5433388
Join Date: Nov 2009
Device: many
Note, both gumbo and Qt have xhtml parsers but these typically build DOM trees, which we don't need. For speed reasons, all we need is a serial parser that gives you the sequence of text and tags (including tag type) by repeated calls, so you can extract the text while keeping track of all open tags and the current language. This is why quickparser.py is so much simpler.

Again, for speed reasons, we need to parse the QString representing the file contents using QChars and pointers. Please don't convert it to utf-8, instead work with QChars and pointers into QChar vectors/arrays to process everything on a const readonly basis.

You should be able to map the needed pieces of the quickparser.py to Qt QString and Qt QChar functions on almost a line by line (one to one) basis. If you run into trouble, just ask. Happy to help.
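
For illustration only, the idea of a serial parser can be sketched in a few lines of Python. This is not quickparser.py: `serial_parse`, `words_with_language` and the two regular expressions are made-up names, and comments, CDATA and malformed tags are simply ignored. It only shows how text and tags can be handed out in order while a small stack tracks the open tags and the language in effect.

```python
import re

# One tag: optional "/", tag name, attribute text, optional trailing "/".
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*?)(/?)>")
# A lang or xml:lang attribute (but not hreflang).
LANG_RE = re.compile(r"""\b(?:xml:)?lang\s*=\s*["']([^"']*)["']""")


def serial_parse(html, default_lang="en"):
    """Yield (kind, value, lang) items in document order, building no tree."""
    open_tags = []   # stack of (tag name, language in effect inside that tag)
    pos = 0
    for m in TAG_RE.finditer(html):
        lang = open_tags[-1][1] if open_tags else default_lang
        if m.start() > pos:
            yield ("text", html[pos:m.start()], lang)
        closing, name, attrs, selfclosing = m.groups()
        if closing:
            # Pop back to the matching open tag, tolerating sloppy nesting.
            for i in range(len(open_tags) - 1, -1, -1):
                if open_tags[i][0] == name:
                    del open_tags[i:]
                    break
            yield ("end", name, lang)
        else:
            found = LANG_RE.search(attrs)
            new_lang = found.group(1) if found else lang
            if selfclosing:
                yield ("single", name, new_lang)
            else:
                yield ("begin", name, new_lang)
                open_tags.append((name, new_lang))
        pos = m.end()
    if pos < len(html):
        lang = open_tags[-1][1] if open_tags else default_lang
        yield ("text", html[pos:], lang)


def words_with_language(html, default_lang="en"):
    """Pair every whitespace-separated word with the language around it."""
    for kind, value, lang in serial_parse(html, default_lang):
        if kind == "text":
            for word in value.split():
                yield (word, lang)


sample = '<p>Hello <span lang="de">Welt</span> again<br/></p>'
for word, lang in words_with_language(sample):
    print(word, lang)
```

A C++ version following the advice above would walk the QString in place instead of using regular expressions; the stack of open tags and languages would stay the same.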

KevinH
Old 07-07-2016, 04:31 PM   #38
varlog
Quote:
Originally Posted by KevinH View Post
Note, both gumbo and Qt have xhtml parsers but these typically build DOM trees, which we don't need. For speed reasons, all we need is a serial parser that gives you the sequence of text and tags (including tag type) by repeated calls, so you can extract the text while keeping track of all open tags and the current language.
I'm with you on that.
Was considering QXmlStreamReader. Too slow? Too strict?

Quote:
Again, for speed reasons, we need to parse the QString representing the file contents using QChars and pointers. Please don't convert it to utf-8, instead work with QChars and pointers into QChar vectors/arrays to process everything on a const readonly basis.
Noted, I'll just have to get the meaning of it.

Quote:
... If you run into trouble, just ask. Happy to help.
Counting on this, eventually.
Old 07-07-2016, 05:09 PM   #39
varlog
Out of the Source: The Structure of Magic

...actually I've read this one, when I was a teenager... now I think you can only learn magic if you are a magician... met too many corporate guys who were not...

Put some structure into my book because I was getting lost... not quite finished yet.. And the spelling seems to be out of control.
My multiple dictionaries problem brought me to this.
Shortcut, for now...

But now [REALITY] calls ...

tbc...?
Attached file: Multilanguage.epub (6.8 KB)
Old 07-07-2016, 08:45 PM   #40
KevinH
Quote:
Originally Posted by varlog View Post
I'm with you on that.
Was considering QXmlStreamReader. Too slow? Too strict?
Both too slow (high overhead) and too strict as everything must be perfectly formed otherwise it craps out.

KevinH
Old 07-11-2016, 06:11 PM   #41
varlog
...

...[/REALITY]
umm, where was I...? Ah, QuickHtmlParser... Hmmm... Will take some time...

for now... [REALITY]...

tbc...?
Old 07-15-2016, 07:11 PM   #42
varlog
In The Source: Interesting Times

The first one for me was "Going Postal", as an audio book. I was immediately hooked.
But I agree with JSWolf: the proper reading order is this.


I'm on the QuickHtmlParser... should be QuickSerialHtmlParser... but I did not have much time and, when I had, I got distracted. Did some code refactoring because I wanted the dictionary part ready for the main event. And had some new ideas about it, of course.
Reading Python code is a no-event for me (yes, I start with quickparser.py), I have to ask Google WTF it means... exaggeration, of course...
And speaking of distractions: I think I've managed to shoot myself in the foot with my book.


tbc...?
Attached file: Multilanguage.epub (7.6 KB)
Old 07-17-2016, 06:58 PM   #43
varlog
In The Source: Thud!

I'm having my fun with Kevin's Python quickparser.py and C++. He is using a slice [i:i+1] where every other normal programming language would use [i]. Python? ...or am I missing something obvious, as usual?

But it's not the real problem.

I was avoiding this topic till now.

One of the main clients of HTMLSpellCheck class is XHTMLHighlighter class. It renders the content you see in code view.
It uses the function void highlightBlock(const QString &text) which
Code:
    // Overrides the function from QSyntaxHighlighter;
    // gets called by QTextEditor whenever
    // a block (line of text) needs to be repainted
It means that some QtGod((c)varlog) entity named QTextEditor decides what should be highlighted at the moment. It means you get chunks of text whose start and end you don't know, courtesy of QTextEditor.
The god-like status of QTextEditor is of course a matter for investigation... which is not the aim of this exercise...

But what really happens is: it could be that your html tag is not finished yet but the chunk of text provided is!
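
One common way to live with this is to carry a small state from one block to the next, which is what QSyntaxHighlighter's previousBlockState() and setCurrentBlockState() are meant for. The following plain Python sketch shows only the idea; `text_spans`, `IN_TEXT` and `IN_TAG` are invented names, and I am not claiming Sigil does it this way.

```python
IN_TEXT, IN_TAG = 0, 1


def text_spans(block, previous_state=IN_TEXT):
    """Return (spans, state): spans are (start, length) pairs of text that
    lies outside tags, state says whether the block ended inside a tag."""
    spans = []
    state = previous_state
    start = 0 if state == IN_TEXT else None   # start of the current text run
    for i, ch in enumerate(block):
        if state == IN_TEXT and ch == "<":
            if i > start:
                spans.append((start, i - start))
            state = IN_TAG
        elif state == IN_TAG and ch == ">":
            state = IN_TEXT
            start = i + 1
    if state == IN_TEXT and len(block) > start:
        spans.append((start, len(block) - start))
    return spans, state


# A tag that is split across two blocks, as a text editor might deliver it.
blocks = ['<p class="intro"', '   lang="de">Hallo Welt</p>']
state = IN_TEXT
for block in blocks:
    spans, state = text_spans(block, state)
    print([block[s:s + n] for s, n in spans], state)
```

Without the carried state, the second block would look like text from its first character, and the attribute ` lang="de"` would be sent to the spell checker as if it were prose.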

The HTMLSpellCheck is called twice when Sigil starts, my debugger tells me. This will need some investigation...


tbc...?
Old 07-17-2016, 07:08 PM   #44
KevinH
Python 3 uses a single character "slice" to render that char as a character and not an integer representing the unicode code value. Python 2 does not need this but can live with it.

This is just something to make python code work on both python 2.7 and python 3
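
A small sketch of the difference, assuming the data being walked is a bytes object (that is where Python 3 indexing returns an integer; whether quickparser.py is handed bytes or str in every case is not something this snippet checks):

```python
data = b"<p>"

first_by_index = data[0]     # Python 3: the integer 60; Python 2: the string "<"
first_by_slice = data[0:1]   # both versions: a length-1 bytes object, b"<"

print(first_by_index, first_by_slice)

# Comparing against a one-character literal only works reliably with the slice.
assert first_by_slice == b"<"
```

So `data[i:i+1] == b"<"` behaves the same on 2.7 and 3, while `data[i] == b"<"` would quietly be False on Python 3.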
Old 07-19-2016, 06:18 PM   #45
varlog
In The Source: The Texas Chain Saw Massacre

actually, I've not seen the film, the title was enough...

well, it's not so bad, really... I tend to exaggerate...

Still in QuickSerialHtmlParser... had my fun with git, which was obnoxiously protesting... trying to get it used to wild chunks of input... but the parsing of tags still craps out on Multilanguage.epub, so more work is due, before I start to do some real work...

Changed its status from initially static to singleton.

The copyright note on quickparser.py is:
Code:
# Copyright (c) 2014 Kevin B. Hendricks, John Schember, and Doug Massay
There were three of them on this! So I can take my time!

tbc...?