02-09-2021, 12:02 PM   #76
InMyPocket
Hacked Penelope, Convert Tools Bundle, Wiktionary (fr) and Nouveau Littré

Hi,


I "hacked" version 3.1.3 of the Penelope tool (https://github.com/pettarin/penelope) so that it outputs XDXF files instead of the suggested XML format: the generated file has the .xdxf extension and its tags are replaced by their XDXF equivalents.
It also fixes some encoding and escaping problems of the original Penelope tool.
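The hacked Penelope source is not shown in this post. As a rough illustration of the target format only, here is a minimal Python sketch of an XDXF writer. The function to_xdxf and its parameters are hypothetical; the tags (xdxf with format="visual", full_name, description, lexicon, ar, k) are the ones produced by the XML2XDXF.txt sed script further down, and definitions are assumed to be ready-made markup that is inserted unescaped.

```python
# Hypothetical sketch of an XDXF writer; not the hacked Penelope's actual code.
from xml.sax.saxutils import escape, quoteattr


def to_xdxf(entries, lang_from, lang_to, full_name, description):
    """Build an XDXF ("visual" format) document from (headword, definition) pairs.

    Headwords are XML-escaped. Definitions are assumed to be well-formed
    markup already (e.g. "<def>...</def>") and are inserted unchanged.
    """
    lines = [
        f"<xdxf lang_from={quoteattr(lang_from)} lang_to={quoteattr(lang_to)}"
        ' format="visual">',
        f"<full_name>{escape(full_name)}</full_name>",
        f"<description>{escape(description)}</description>",
        "<lexicon>",
    ]
    for headword, definition in entries:
        # One <ar> article per line, with the headword in <k>.
        lines.append(f"<ar><k>{escape(headword)}</k>{definition}</ar>")
    lines += ["</lexicon>", "</xdxf>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_xdxf([("chat", "<def>Petit félin domestique.</def>")],
                  "fr", "fr", "Demo", "Demo dictionary"), end="")
```

Escaping only the headword is a deliberate choice here: the definitions in these dictionaries carry HTML-like markup (ol, li, i) that the converter is expected to see as tags, not as escaped text.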

The file is available here:
https://gofile.io/d/mTBOqX

penelope-3.1.3-XDXF-InMyPocket.zip


I also created a bundle with the tools required to convert files to the PocketBook .dic format.
It includes:
* the hacked Penelope tool above;
* the PocketBook converter tool with several locales;
* a "sed" tool for Windows, to do some quick cleanup when necessary.

The bundle also includes, as a demo, 2 StarDict files and the 2 corresponding converted files:
* dict-fr.zip: a StarDict version of the Wiktionnaire (dated 1 Feb 2021) from this project: https://github.com/BoboTiG/ebook-reader-dict
* Nouveau Littre 2011 (from Bookeen by Penelope)_reconstructed.Stardict.zip, generated by Markismus (see https://www.mobileread.com/forums/sh...&postcount=276)

Two ".bat" scripts are provided to automate the conversion process described in the examples below.
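The .bat files themselves are not shown in this post. As a hypothetical equivalent, the three steps of the Wiktionnaire example below could be chained from Python with subprocess. The function names are made up, and the sketch assumes it runs from the bundle folder, where penelope, XML2XDXF.txt, sed and PocketBook's convert.exe are available.

```python
# Hypothetical driver for the Wiktionnaire example; not the bundle's .bat files.
# Assumes it runs from the bundle folder, where penelope, XML2XDXF.txt, sed
# and PocketBook's convert.exe are available.
import subprocess
import sys


def wiktionnaire_steps(source="dict-fr.zip", temp="TEMPDIC",
                       out="Wiktionnaire.xdxf", converter="convert"):
    """Return (argv, stdout_path) pairs for steps 1-3 (stdout_path may be None)."""
    return [
        # 1) StarDict -> XDXF with the hacked Penelope.
        ([sys.executable, "penelope", "-i", source, "-j", "stardict",
          "-o", temp, "-p", "xml", "-f", "fr", "-t", "fr", "-d"], None),
        # 2) Cleanup with sed; stdout is redirected to the output file.
        (["sed", "-f", "XML2XDXF.txt", temp + ".xdxf"], out),
        # 3) XDXF -> PocketBook .dic. Windows also has an unrelated system
        #    convert.exe, so pass the bundle's full path if in doubt.
        ([converter, out, "fr"], None),
    ]


def run(steps):
    """Run each step in order; check=True raises on the first failing step."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as dst:
                subprocess.run(argv, stdout=dst, check=True)


if __name__ == "__main__" and "--run" in sys.argv:
    run(wiktionnaire_steps())
```

Nothing is executed unless "--run" is passed on the command line, so the command lists can be inspected first. Using sys.executable instead of a bare "python" simply reuses the interpreter that runs the script.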


The bundle is available here:
https://gofile.io/d/R5Oin6


Examples:

################
Wiktionnaire
################

First, get the Wiktionnaire (in StarDict format) here:
https://github.com/BoboTiG/ebook-reader-dict
https://github.com/BoboTiG/ebook-rea...eleases/tag/fr

1) Convert the French Wiktionary (StarDict format) to XDXF
python penelope -i dict-fr.zip -j stardict -o TEMPDIC -p xml -f fr -t fr -d

2) Do some cleanup and split long lines to avoid crashing the PocketBook "convert.exe" tool.
sed -f XML2XDXF.txt TEMPDIC.xdxf > Wiktionnaire.xdxf


XML2XDXF.txt content:

s/&lt;/</g
s/&gt;/>/g
s/<dictionary>/<xdxf lang_from="fr" lang_to="fr" format="visual">\n<full_name>Wiktionnaire<\/full_name>\n<description>Wiktionary FR<\/description>\n<lexicon>/
s/<\/dictionary>/<\/lexicon>\n<\/xdxf>/
s/<entry>/<ar>/g
s/<\/entry>/<\/ar>/g
s/<key>/<k>/g
s/<\/key>/<\/k>/g
s/<full_name>.*<\/full_name>/<full_name>Wiktionnaire<\/full_name>/
s/\(<\/li>\)/\1\n/g
s/\(<\/*\)\(ol\|def\)\(>\)/\1\2\3\n/g
s/#\([0-9a-fA-F]\)\{6\}//g
s/<li>/<li># /g
s/\(<i>\)*(Siècle à préciser)\(<\/i>\)*//g
s/\(<i>\)*(Date à préciser)\(<\/i>\)*//g

3) Convert to PB format
convert Wiktionnaire.xdxf fr
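For readers without a sed binary, step 2 above could also be done with this rough Python port of XML2XDXF.txt. It is a sketch, not part of the bundle: the file and function names are hypothetical, it assumes UTF-8 input, and it applies the same substitutions in the same order to each line, as sed does (small differences from GNU sed's regex handling are possible).

```python
# Rough Python port of XML2XDXF.txt (hypothetical helper, not part of the bundle).
# Applies the sed substitutions in the same order, line by line, like sed does.
import re
import sys

HEADER = ('<xdxf lang_from="fr" lang_to="fr" format="visual">\n'
          "<full_name>Wiktionnaire</full_name>\n"
          "<description>Wiktionary FR</description>\n"
          "<lexicon>")


def convert_line(line):
    # Unescape the markup that Penelope escaped inside definitions.
    line = line.replace("&lt;", "<").replace("&gt;", ">")
    # Replace Penelope's XML wrapper and tags with their XDXF equivalents
    # (sed's s/// without g only touches the first match on a line).
    line = line.replace("<dictionary>", HEADER, 1)
    line = line.replace("</dictionary>", "</lexicon>\n</xdxf>", 1)
    line = line.replace("<entry>", "<ar>").replace("</entry>", "</ar>")
    line = line.replace("<key>", "<k>").replace("</key>", "</k>")
    line = re.sub(r"<full_name>.*</full_name>",
                  "<full_name>Wiktionnaire</full_name>", line, count=1)
    # Insert line breaks after </li> and after <ol>, </ol>, <def>, </def>.
    line = line.replace("</li>", "</li>\n")
    line = re.sub(r"</*(?:ol|def)>", lambda m: m.group(0) + "\n", line)
    # Drop hex color codes such as #ff0000.
    line = re.sub(r"#[0-9a-fA-F]{6}", "", line)
    # Prefix each list item with "# ".
    line = line.replace("<li>", "<li># ")
    # Remove the "(Siècle à préciser)" and "(Date à préciser)" placeholders.
    for word in ("Siècle", "Date"):
        line = re.sub(r"(?:<i>)*\(" + word + r" à préciser\)(?:</i>)*", "", line)
    return line


def convert_text(text):
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start a new line
    # sed terminates every output line with a newline.
    return "".join(convert_line(line) + "\n" for line in lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python xml2xdxf.py TEMPDIC.xdxf Wiktionnaire.xdxf
    with open(sys.argv[1], encoding="utf-8") as src:
        text = src.read()
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(convert_text(text))
```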



######################
Nouveau Littré
######################

First, get the Nouveau Littré (in StarDict format) from this post:
https://www.mobileread.com/forums/sh...&postcount=276
Download link:
https://markismus.stackstorage.com/s/ueuYYszltpoI9yn

1) Convert the Nouveau Littré (StarDict format) to XDXF

python penelope -i "Nouveau Littre 2011 (from Bookeen by Penelope)_reconstructed.Stardict.zip" -j stardict -o TEMPDIC -p xml -f fr -t fr

2) Do some cleanup and split long lines to avoid crashing the PocketBook "convert.exe" tool.

sed -e "s/<f>\xe2\x99\xa6<\/f>/\n<br\/>#/g" -e "s/<f>\xe2\x80\xa2<\/f>/\n<br\/>#/g" -e "s/<\/ar>/<\/ar>\n/g" -e "s/<full_name>Dictionary Name<\/full_name>/<full_name>Nouveau Littr\xc3\xa9 2011-PM<\/full_name>/g" TEMPDIC.xdxf > NouveauLittre.xdxf

3) Convert to PB format
convert NouveauLittre.xdxf fr
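Step 2 of the Nouveau Littré example can also be sketched in Python, for readers without sed. This is a hypothetical helper (names made up, UTF-8 input assumed). The \xe2\x99\xa6, \xe2\x80\xa2 and \xc3\xa9 byte sequences in the sed expressions are the UTF-8 encodings of ♦, • and é, so the port matches those characters directly. Since none of the patterns span a newline, applying them to the whole text gives the same result as sed's line-by-line processing.

```python
# Rough Python port of the Nouveau Littré sed command (hypothetical helper).
import sys

# Each pair mirrors one -e expression of the sed command, in the same order.
SUBSTITUTIONS = [
    ("<f>\u2666</f>", "\n<br/>#"),  # <f>♦</f>, UTF-8 bytes E2 99 A6
    ("<f>\u2022</f>", "\n<br/>#"),  # <f>•</f>, UTF-8 bytes E2 80 A2
    ("</ar>", "</ar>\n"),           # line break after each article
    ("<full_name>Dictionary Name</full_name>",
     "<full_name>Nouveau Littré 2011-PM</full_name>"),
]


def clean(text):
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)  # like sed's s///g: every occurrence
    return text


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python clean_littre.py TEMPDIC.xdxf NouveauLittre.xdxf
    with open(sys.argv[1], encoding="utf-8") as src:
        text = src.read()
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(clean(text))
```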