Old 11-15-2021, 08:57 AM   #121
Markismus
Guru
Posts: 760
Karma: 143987
Join Date: Jul 2013
Location: Netherlands
Device: HiSenseA5ProCC, OnyxNotePro, lots of cracked Kobo's
It took me a while to dredge up what was happening with the Nouveau Littré. However, the recto entry now looks as it should again:
Code:
<ar>
<head><k>recto</k></head><def><b>RECTO,</b> N.*m. [<f>ʀɛkto</f>] (mot lat. <i>recto</i>, s.-e. <i>folio recto</i>, sur le feuillet qui est à l'endroit) <f>•</f>*La première page d'un feuillet*; il est opposé à<i>verso </i>. <f>•</f>* Au pl. <i>Des rectos. </i>*<f>♦</f>*<i>Recto verso, </i>des deux côtés. <i>Faire une impression recto verso. </i></def>
</ar>
Apparently, I had rerun the script without filtering entities based on the DOCTYPE, and the characters substituted for the hexadecimal codepoints (e.g. '&#AACD') had never been tested before.
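For readers who want to see what substituting characters for hexadecimal codepoints amounts to, here is a minimal sketch in Python (not the Perl of the actual script). It assumes the references use the standard `&#x...;` form; the function name `decode_hex_refs` is made up for illustration.

```python
import re

# Matches hexadecimal numeric character references such as &#x2022;
HEX_REF = re.compile(r'&#x([0-9A-Fa-f]+);')

def decode_hex_refs(text: str) -> str:
    """Replace each &#x...; reference with the character it names."""
    # int(..., 16) parses the hex digits, chr() turns the codepoint into a character.
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)

if __name__ == '__main__':
    # The phonetic transcription of 'recto', written with references.
    print(decode_hex_refs('[&#x280;&#x25B;kto]'))
```

Text without such references passes through unchanged, so the function can be run over a whole entry.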

@Getkey Could you test whether the Nouveau Littré of Nov15th2021 looks right?
Old 11-15-2021, 03:39 PM   #122
Getkey
Junior Member
Posts: 8
Karma: 848
Join Date: Aug 2014
Location: Netherlands
Device: PB631
Quote:
Originally Posted by Markismus View Post
Try using another font.
It's not possible to change the font in the Dictionary app. I guess I would have to modify the font in the system files, because there is no global setting in the interface. That could be the topic of another thread, but I don't care too much about this issue myself.

Testing your Nov15th2021 versions:
  • Babylon_English_Greek dictionary: works perfectly
  • Duden: works perfectly
  • Longman: works perfectly
  • Nouveau Littré: works perfectly

Very nice!

Quote:
Originally Posted by Markismus View Post
I've created a subroutine to remove the html-escape characters:
Code:
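sub unEscapeHTMLString{
    my $String = shift;
    # Replace the five predefined XML entities with their literal characters.
    $String =~ s~\&lt;~<~sg;
    $String =~ s~\&gt;~>~sg;
    $String =~ s~\&apos;~'~sg;
    $String =~ s~\&quot;~"~sg;
    # '&amp;' goes last, so that e.g. '&amp;quot;' becomes '&quot;' and not '"'.
    $String =~ s~\&amp;~&~sg;
    return $String;
}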
Why did you need to? Wasn't it done on line 1582?
Old 11-15-2021, 05:55 PM   #123
Markismus
In some cases cleanseAr introduces '&amp;' and '&lt;'. It doesn't remove them.
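To illustrate why a separate unescaping step is needed after entities have been introduced, here is a minimal Python sketch of the same idea (the script itself is Perl; `unescape_xml` is a hypothetical name). It handles only the five predefined XML entities and replaces `&amp;` last, so that a doubly escaped string is unescaped exactly one level.

```python
# The five predefined XML entities; '&amp;' must be handled last.
ENTITIES = (
    ('&lt;', '<'),
    ('&gt;', '>'),
    ('&apos;', "'"),
    ('&quot;', '"'),
    ('&amp;', '&'),
)

def unescape_xml(text: str) -> str:
    """Undo one level of XML escaping."""
    for entity, char in ENTITIES:
        text = text.replace(entity, char)
    return text

if __name__ == '__main__':
    print(unescape_xml('&lt;b&gt;RECTO,&lt;/b&gt;'))
```

If `&amp;` were replaced first, an input such as `&amp;lt;` would end up as `<` instead of `&lt;`.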

It is a shortcoming of Pocketbook Dictionary that it can't handle basic HTML. What we're doing now is converting a perfectly good-looking dictionary so that it looks good on Pocketbook, too.

Anyway, glad that everything works. I'll rerun the script tomorrow for all those Pocketbook dictionaries. We got rid of quite a few bugs! Also nice for my niece, who just got a Pocketbook Lux. Thanks for testing.
Old 11-16-2021, 03:23 PM   #124
Markismus
@Getkey I've updated all the pocketbook dictionaries in the pCloud. If you would be willing to test the new ones, I would appreciate it.
Old 11-17-2021, 03:46 PM   #125
Getkey
Quote:
Originally Posted by Markismus View Post
In some cases cleanseAr introduces '&amp;' and '&lt;'. It doesn't remove them.
Aaah. Makes sense.

Quote:
Originally Posted by Markismus View Post
@Getkey I've updated all the pocketbook dictionaries in the pCloud. If you would be willing to test the new ones, I would appreciate it.
Just went through all of them. They look mostly great, your niece is gonna have a good time! Some little things I noticed though:
  • κοινής νεοελληνικής: sub-lists are not indented correctly; the level 1 lists start at the same horizontal position as the level 2 lists
  • all Van Dale: line breaks are missing
Attached thumbnails: scr0003.png, scr0004.png, scr0014.png
Old 11-18-2021, 06:25 PM   #126
Markismus
Posts #13 and #15 suggest that the only tag that affects whitespace is the br-tag. All others, such as span, quote and blockquote, are stripped by the converter to dic-format.

VanDale is styled with blockquotes:
Code:
<ar><head><k>leerling</k></head><def>

<blockquote><span> <span><b>l<span><u>ee</u></span>r·ling</b></span> <span> <span>de<sup><i>m</i></sup></span> </span> <span> <span> <span><i>meerv</i></span>: leerlingen</span>; <span><span><i>verkleinw</i></span>: leerlingetje</span></span> </span> <span> <span> <span> <span><i>zelfst nw</i></span> </span> <span> <span><span><b>1. </b></span> <span> <span>iem. die onderwijs krijgt</span> </span> <span> 

<blockquote align="left"><span><b>• </b></span> <span><i>een <span><b>externe</b></span> leerling</i></span> </blockquote> 

<blockquote align="left"><span><b>• </b></span> <span><i>een <span><b>ijverige</b></span> leerling</i></span> </blockquote> 

<blockquote align="left"><span><b>• </b></span> <span><i>een <span><b>interne</b></span> leerling</i></span> </blockquote> 

<blockquote align="left"><span><b>• </b></span> <span><i>een <span><b>knappe</b></span> leerling</i></span> </blockquote> 

<blockquote align="left"><span><b>• </b></span> <span><i>een leerling van <span><b>school</b></span> /sturen/verwijderen/</i></span> </blockquote> 

<blockquote align="left"><span><b>• </b></span> <span><i>een <span><b>zwakke</b></span> leerling</i></span> </blockquote> 

<blockquote align="left"><span><b>• </b></span> <span><i>de <span><b>zwakste</b></span> leerling van de klas</i></span> | <span>de slechtst presterende leerling</span>; <span>land, bedrijf enz. dat het slechtst presteert op een bepaald terrein</span> </blockquote> 

</span> </span> <span><span><b>2. </b></span> <span> <span> <span><b>volgeling</b></span> van iemands leer of stelregels</span> </span> <span> 

<blockquote align="left"><span><b>• </b></span> <span><i>de leerlingen van <span><b>Jezus</b></span></i></span> </blockquote> 

</span> </span> </span> </span> </span></blockquote>
</def></ar>
(Empty lines are inserted for readability.)

The new Greek dictionary is styled with div-blocks:
Code:
<ar><head><k>ελληνο-</k></head><def>
<div style="margin-left:0em"><span style="color:darkslategray"></span> [<!-- T --><span style="font-family:'Helvetica'">elıno</span><!-- T -->]</div>

<div style="margin-left:0em"><span style="color:darkslategray"><b>ελληνό-</b></span> [<!-- T --><span style="font-family:'Helvetica'">elınó</span><!-- T -->], όταν κατά τη σύνθεση ο τόνος ανεβαίνει στο α&#x27; συνθετικό</div>

<div style="margin-left:0em"><span style="color:darkslategray"><b>ελλην-</b></span> [<!-- T --><span style="font-family:'Helvetica'">elın</span><!-- T -->], σπάνια όταν το β&#x27; συνθετικό αρχίζει από [<!-- T --><span style="font-family:'Helvetica'">o</span><!-- T -->]</div>

<div style="margin-left:1em">α&#x27; συνθετικό σε σύνθετες λέξεις με αναφορά κατά περίπτωση στα ονόματα<i><span style="color:slategray">Έλληνας, ελληνικός, ελληνισμός.</span></i></div>

<div style="margin-left:2em"><span style="color:green"><b>1.</b></span> </div>

<div style="margin-left:3em"><span style="color:mediumseagreen"><b>α.</b></span> (σε σύνθετα επίθετα) για σχέση (φιλία, συνθήκη <i class="p" style="color:green">κτλ.</i>) μεταξύ των Ελλήνων και του λαού που υποδηλώνει το β&#x27; συνθετικό· (<i class="p" style="color:green">πρβ.</i> <i><span style="color:slategray">-ελληνικός</span></i><sub>1</sub>): <i><span style="color:slategray">~αγγλικός, ~αμερικανικός, ~γερμανικός, ~ολλανδικός, ~τουρκικός</span></i>.</div>

<div style="margin-left:3em"><span style="color:mediumseagreen"><b>β.</b></span> σε εθνικά ουσιαστικά: <i><span style="color:slategray">Ελληνοκύπριος,</span></i> ο Έλληνας της Κύπρου. <span style="color:lime"><b>||</b></span> <i><span style="color:slategray">Ελληνοαμερικανός.</span></i></div>

<div style="margin-left:2em"><span style="color:green"><b>2.</b></span></div>

<div style="margin-left:3em"><span style="color:mediumseagreen"><b>α.</b></span> με αναφορά στη νεοελληνική γλώσσα. <i class="p" style="color:green">ΑΝΤ</i> ξενο-: <i><span style="color:slategray">ελληνόγλωσσος, ~μαθής, ελληνόφωνος.</span></i> <span style="color:lime"><b>||</b></span> (<i class="p" style="color:green">παρωχ.</i>) με αναφορά στην αρχαία ή τη λόγια μορφή της ελληνικής γλώσσας: <i><span style="color:slategray">~διδάσκαλος</span></i>.</div>

<div style="margin-left:3em"><span style="color:mediumseagreen"><b>β.</b></span> για λεξικό, λεξιλόγιο <i class="p" style="color:green">κτλ.</i> στο οποίο οι λέξεις της ελληνικής γλώσσας ερμηνεύονται (αποδίδονται) στη γλώσσα που υποδηλώνει το β&#x27; συνθετικό: <i><span style="color:slategray">~αγγλικός,</span></i> <i class="p" style="color:green">ΑΝΤ</i> αγγλοελληνικός· <i><span style="color:slategray">~γερμανικός, ~ελληνικός, ~τουρκικός.</span></i></div>

<div style="margin-left:2em"><span style="color:green"><b>3.</b></span></div>

<div style="margin-left:3em"><span style="color:mediumseagreen"><b>α.</b></span> με αναφορά <i class="p" style="color:green">συνήθ.</i> στον αρχαίο ελληνικό πολιτισμό: <i><span style="color:slategray">~λάτρης</span></i>.</div>
...
</def></ar>
(I've truncated the entry at your screenshot ending and inserted empty lines for readability.)

So what is new compared to the info in posts #13 and #15 is that the div-tag is apparently retained: every div-block is rendered as its own block. Replacing the blockquotes with divs could therefore be an approach for VanDale.

The margin-left:..em styling doesn't work as expected. Both 1.α. and 2.α. have a 3em left-margin, but they are not aligned. α' and 1. have 0em and 1em respectively, yet they are aligned.
Obviously, we are not in control! Maybe the left-margins are displayed correctly, but the div-blocks have different widths and are centered. Maybe we can force them with style="width: 100%;"?
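One way to try the width idea on a test file is to append the declaration to every div that already carries a style attribute. The following is a minimal Python sketch, assuming the XDXF is held in a string; `add_full_width` is a hypothetical helper and not part of the conversion script, and whether the converter honours `width` at all is untested.

```python
import re

# A div opening tag with a style attribute, split into three groups:
# everything up to the opening quote, the style value, and the rest of the tag.
STYLED_DIV = re.compile(r'(<div\b[^>]*?\bstyle=")([^"]*)("[^>]*>)')

def add_full_width(xdxf: str) -> str:
    """Append 'width: 100%;' to the style attribute of every styled div."""
    def _patch(match):
        style = match.group(2).strip()
        if style and not style.endswith(';'):
            style += ';'
        new_style = f'{style} width: 100%;'.strip()
        return f'{match.group(1)}{new_style}{match.group(3)}'
    return STYLED_DIV.sub(_patch, xdxf)

if __name__ == '__main__':
    print(add_full_width('<div style="margin-left:3em">x</div>'))
```

Divs without a style attribute are left untouched, which keeps the experiment limited to the blocks whose margins are in question.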

@Getkey You could just get the xdxf-file, remove all entries except those you are testing, change the HTML around, and convert it to see what works. If you could provide me with an example of the working code and an accompanying screenshot, I am willing to implement it and rerun the script on all offending dictionaries.
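Trimming an xdxf-file down to a few test entries can be automated. The sketch below is a rough Python illustration, assuming every entry is a plain `<ar>...</ar>` block with its headword in `<k>...</k>`, as in the examples in this thread; `keep_entries` is a hypothetical name.

```python
import re

# One dictionary entry, including the whitespace that follows it.
AR_BLOCK = re.compile(r'<ar>.*?</ar>\s*', re.DOTALL)
# The headword(s) of an entry.
KEY = re.compile(r'<k>(.*?)</k>', re.DOTALL)

def keep_entries(xdxf: str, headwords: set) -> str:
    """Drop every <ar> block whose headword is not in `headwords`."""
    def _filter(match):
        block = match.group(0)
        keys = KEY.findall(block)
        return block if any(k in headwords for k in keys) else ''
    return AR_BLOCK.sub(_filter, xdxf)

if __name__ == '__main__':
    sample = ('<xdxf><ar><head><k>leerling</k></head><def>a</def></ar>\n'
              '<ar><head><k>recto</k></head><def>b</def></ar>\n</xdxf>')
    print(keep_entries(sample, {'recto'}))
```

Everything outside the `<ar>` blocks, such as the file header, is kept as it is, so the result should still be accepted as a complete file.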

Last edited by Markismus; 11-19-2021 at 03:22 AM.
Old 11-20-2021, 08:35 AM   #127
Markismus
@Getkey I replaced the <blockquote>-tags with <div style="margin: 0 0 0 1em;">-tags. For example, the article 'leerling' becomes:
Code:
<ar>
<head><k>leerling</k></head><def><div style="margin: 0 0 0 1em;"><span> <span><b>l<span><u>ee</u></span>r·ling</b></span> <span> <span>de<sup><i>m</i></sup></span> </span> <span> <span> <span><i>meerv</i></span>: leerlingen</span>; <span><span><i>verkleinw</i></span>: leerlingetje</span></span> </span> <span> <span> <span> <span><i>zelfst nw</i></span> </span> <span> <span><span><b>1. </b></span> <span> <span>iem. die onderwijs krijgt</span> </span> <span> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>een <span><b>externe</b></span> leerling</i></span> </div> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>een <span><b>ijverige</b></span> leerling</i></span> </div> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>een <span><b>interne</b></span> leerling</i></span> </div> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>een <span><b>knappe</b></span> leerling</i></span> </div> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>een leerling van <span><b>school</b></span> /sturen/verwijderen/</i></span> </div> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>een <span><b>zwakke</b></span> leerling</i></span> </div> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>de <span><b>zwakste</b></span> leerling van de klas</i></span> | <span>de slechtst presterende leerling</span>; <span>land, bedrijf enz. dat het slechtst presteert op een bepaald terrein</span> </div> </span> </span> <span><span><b>2. </b></span> <span> <span> <span><b>volgeling</b></span> van iemands leer of stelregels</span> </span> <span> <div align="left" style="margin: 0 0 0 1em;"><span><b>• </b></span> <span><i>de leerlingen van <span><b>Jezus</b></span></i></span> </div> </span> </span> </span> </span> </span></div></def>
</ar>
This has a serious effect on converter.exe. It crashes with the message 'Unclosed xml-tag at line xxxxx' when a tag that replaced a formerly stripped blockquote-tag is at the end of a line. Removing all EOL-characters following a <div>-tag gets rid of this message.
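The two steps described here, swapping the tags and removing the line ends after them, can be sketched as follows. This is a Python illustration under assumptions, not the code of the actual (Perl) script: `blockquotes_to_divs` is a made-up name, existing attributes such as align="left" are simply carried over, and line ends are removed after both opening and closing div-tags.

```python
import re

DIV_STYLE = 'style="margin: 0 0 0 1em;"'

def blockquotes_to_divs(xdxf: str) -> str:
    """Turn blockquote tags into styled divs and drop line ends after div tags."""
    # Opening tags: keep existing attributes and append the margin style.
    xdxf = re.sub(
        r'<blockquote(\s[^>]*)?>',
        lambda m: f'<div{m.group(1) or ""} {DIV_STYLE}>',
        xdxf,
    )
    # Closing tags.
    xdxf = xdxf.replace('</blockquote>', '</div>')
    # Remove trailing blanks and line ends directly after any div tag.
    return re.sub(r'(</?div\b[^>]*>)(?:[ \t]*\r?\n)+', r'\1', xdxf)

if __name__ == '__main__':
    src = '<blockquote align="left"><b>x</b> </blockquote> \n\n<blockquote>y</blockquote>\n'
    print(blockquotes_to_divs(src))
```

Line ends that do not directly follow a div-tag are kept, so the rest of the file layout stays the same.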

Could you test the VanDale GWHD dictionary dated November 21st?

Last edited by Markismus; 11-21-2021 at 10:12 AM.
Old 11-21-2021, 02:16 PM   #128
Getkey
@Markismus I tested your VanDale GWHD dictionary from November 21st; I see no changes compared to the one from the 16th.

I did a couple of tests myself:

Code:
<ar><head><k>leerling</k></head><def>
		<b>l<u>ee</u>r·ling</b>  de<sup><i>m</i></sup>    <i>meerv</i>: leerlingen; <i>verkleinw</i>: leerlingetje     <i>zelfst nw</i>
		<b>1. </b>  iem. die onderwijs krijgt   <b>• </b> <i>een <b>externe</b> leerling</i> <br/> <b>• </b> <i>een <b>ijverige</b> leerling</i> <br/> <b>• </b> <i>een <b>interne</b> leerling</i> <br/> <b>• </b> <i>een <b>knappe</b> leerling</i> <br/> <b>• </b> <i>een leerling van <b>school</b> /sturen/verwijderen/</i> <br/> <b>• </b> <i>een <b>zwakke</b> leerling</i> <br/> <b>• </b> <i>de <b>zwakste</b> leerling van de klas</i> | de slechtst presterende leerling; land, bedrijf enz. dat het slechtst presteert op een bepaald terrein <br/>
		<b>2. </b>   <b>volgeling</b> van iemands leer of stelregels   <b>• </b> <i>de leerlingen van <b>Jezus</b></i> <br/>     <br/>
</def></ar>

<ar><head><k>leerling2</k></head><def><span> <span><b>l<span><u>ee</u></span>r·ling</b></span> <span> <span>de<sup><i>m</i></sup></span> </span> <span> <span> <span><i>meerv</i></span>: leerlingen</span>; <span><span><i>verkleinw</i></span>: leerlingetje</span></span> </span> <span> <span> <span> <span><i>zelfst nw</i></span> </span> <span> <span><span><b>1. </b></span> <span> <span>iem. die onderwijs krijgt</span> </span> <span> <span><b>• </b></span> <span><i>een <span><b>externe</b></span> leerling</i></span> <br/> <span><b>• </b></span> <span><i>een <span><b>ijverige</b></span> leerling</i></span> <br/> <span><b>• </b></span> <span><i>een <span><b>interne</b></span> leerling</i></span> <br/> <span><b>• </b></span> <span><i>een <span><b>knappe</b></span> leerling</i></span> <br/> <span><b>• </b></span> <span><i>een leerling van <span><b>school</b></span> /sturen/verwijderen/</i></span> <br/> <span><b>• </b></span> <span><i>een <span><b>zwakke</b></span> leerling</i></span> <br/> <span><b>• </b></span> <span><i>de <span><b>zwakste</b></span> leerling van de klas</i></span> | <span>de slechtst presterende leerling</span>; <span>land, bedrijf enz. dat het slechtst presteert op een bepaald terrein</span> <br/> </span> </span> <span><span><b>2. </b></span> <span> <span> <span><b>volgeling</b></span> van iemands leer of stelregels</span> </span> <span> <span><b>• </b></span> <span><i>de leerlingen van <span><b>Jezus</b></span></i></span> <br/> </span> </span> </span> </span> </span><br/></def></ar>
See the attached picture. When you remove the <blockquote>s, nothing happens. It's only after you also remove the <span>s that the <br/>s start working normally again. Maybe there is too much nesting for convert.exe.
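The span removal that made the <br/>s work in the first test entry can be reproduced mechanically. A minimal Python sketch, assuming the spans carry no information worth keeping (true for the attribute-less spans in the VanDale entry above); `strip_spans` is a hypothetical name.

```python
import re

# Any opening or closing span tag, with or without attributes.
SPAN_TAG = re.compile(r'</?span\b[^>]*>')

def strip_spans(html: str) -> str:
    """Remove all span tags but keep their contents."""
    return SPAN_TAG.sub('', html)

if __name__ == '__main__':
    print(strip_spans('<span><b>l<span><u>ee</u></span>r·ling</b></span> <br/>'))
```

For dictionaries where the spans do carry styling, such as the colours in the Greek dictionary, this would throw that styling away.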
Attached thumbnail: scr0006.png
Old 11-21-2021, 04:41 PM   #129
Markismus
And what if you only moved the <br>-tags outside the span-tags?
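One possible way to run that experiment is to close every open span just before a <br/> and reopen the same spans right after it, so that the <br/> is no longer nested. The Python sketch below is only an illustration of that idea; `hoist_br` is a hypothetical name and whether the result renders better on the device has not been tested.

```python
import re

# Split the markup into span tags, br tags and everything else.
TOKEN = re.compile(r'(<span\b[^>]*>|</span>|<br\s*/?>)')

def hoist_br(html: str) -> str:
    """Close all open spans before each br-tag and reopen them after it."""
    open_spans = []  # opening span tags that are currently in effect
    out = []
    for part in TOKEN.split(html):
        if re.fullmatch(r'<span\b[^>]*>', part):
            open_spans.append(part)
            out.append(part)
        elif part == '</span>':
            if open_spans:
                open_spans.pop()
            out.append(part)
        elif re.fullmatch(r'<br\s*/?>', part):
            out.append('</span>' * len(open_spans))
            out.append(part)
            out.extend(open_spans)  # reopen with the original attributes
        else:
            out.append(part)
    return ''.join(out)

if __name__ == '__main__':
    print(hoist_br('<span>a <span>b</span> <br/> c</span>'))
```

The spans stay balanced after the rewrite, so the output should be as well-formed as the input was.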