Old 12-17-2010, 12:43 PM   #1
bld
Device: Pocketbook 360
HOW-TO: clean up that dictionary (for your pocketbook)

Hello from a long time lurker, first time poster!

Having recently bought a Pocketbook 360, I discovered there were no decent English dictionaries available for it. I tried downloading the available .dic dictionaries but they seemed all to be badly converted. All had weird square characters or formatting codes rendered in the text.

Example from a Russian forum (screenshot not preserved):



Uh-oh! Well, here's how to get a clean dictionary:

1. Start with a .dsl format dictionary. The one I used: "Longman Dictionary of Contemporary English, 5th edition, 2009". Don't ask me where to get one. Just acquire a dictionary that you like in an acceptable way.

2. Unpack the dictionary if the download is compressed. You get a big (>100 MB) .dsl file.

3. Open the .dsl file with a nice text editor. Programming editors are preferred because of their regular expression support. I used EditPad Pro; the trial is free and fully functional.

4. Start searching and replacing! Some corrections are a simple search & replace and some use regular expressions (though my command of regex is very primitive). The phrases are also in a separate text file (replace.txt, attached) to conserve characters.

In the dictionary I chose there were ugly square characters. The solution was a simple replace:

Code:
Replace
‧ with • (bullet in Windows Character Map)
⇨ with → (Rightwards Arrow in Windows Character Map)
After this the dictionary is already quite a bit more aesthetic!
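
If you would rather script this than do it by hand in an editor, here is a minimal Python sketch of the same two replacements. The function names are my own, and the `utf-16` default is an assumption (.dsl files are commonly saved as UTF-16, but check yours and pass a different encoding if needed).

```python
from pathlib import Path

# The two substitutions from the post, written as Unicode escapes
# so they survive copy and paste.
REPLACEMENTS = {
    "\u2027": "\u2022",  # hyphenation point -> bullet
    "\u21e8": "\u2192",  # rightwards white arrow -> rightwards arrow
}


def replace_chars(text: str) -> str:
    """Apply every plain (non-regex) replacement to the text."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


def clean_file(src: str, dst: str, encoding: str = "utf-16") -> None:
    """Read src, replace the characters, write the result to dst.

    The encoding default is an assumption; adjust it to match your file.
    """
    raw = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(replace_chars(raw), encoding=encoding)
```

Writing to a separate output file keeps the original dictionary untouched in case a replacement goes wrong.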

Some optional tweaking:

Code:
Regular expression (explanation in parentheses):

{{.*?}} (remove {{Roman}} tags)
\[sup\][0-9]\[/sup\] (use?)
\([0-9]\) (?)

Remove unused speech samples:
\[p\].*?E\[/p\] \[s\].*?wav\[/s\] \[p\].*?E\[/p\] \[s\].*?wav\[/s\] 
\[p\].*?E\[/p\] \[s\].*?wav\[/s\] (as with above)
\[s\].*?wav\[/s\]

\[b\]\[c .*?\] [SW][0-9]\[/c\]\[/b\] + normal replace  AC (S1,W1 etc. are word frequency markings, "AC" marks academic words)

\[c .*?\] (and normal replace [/c] & [c]) (remove color tags, obviously not used in a B&W device)
There is still more useless stuff (not used or displayed on a Pocketbook) to remove, but the gains are small. A few irregularities may still be present (double spaces etc.), but this is a tiny annoyance in my opinion.
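
The regex clean-up above can also be scripted. This is only a sketch in Python covering a subset of the patterns (Roman tags, speech samples, color tags); the function name is hypothetical, and Python's regex flavor may differ in small ways from EditPad Pro's, so test it on a copy of your dictionary first.

```python
import re

# Patterns copied from the list above, applied in the same order.
# The curly braces are escaped here to be safe in Python's regex flavor.
REGEX_REMOVALS = [
    r"\{\{.*?\}\}",                        # {{Roman}} style tags
    r"\[p\].*?E\[/p\] \[s\].*?wav\[/s\]",  # label plus speech sample
    r"\[s\].*?wav\[/s\]",                  # remaining speech samples
    r"\[c .*?\]",                          # opening color tags with a color name
]

# Plain (non-regex) removals for the leftover color tags.
LITERAL_REMOVALS = ["[/c]", "[c]"]


def clean_markup(text: str) -> str:
    """Strip the markup that a B&W Pocketbook does not use."""
    for pattern in REGEX_REMOVALS:
        # '.' does not match newlines by default, so each match
        # stays within a single line of the dictionary.
        text = re.sub(pattern, "", text)
    for literal in LITERAL_REMOVALS:
        text = text.replace(literal, "")
    return text
```

The removals can leave extra spaces behind, which matches the leftover double spaces mentioned above.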

5. Save your edited dictionary.

6. Convert the dictionary. I have attached the converter (version 4.1, from the-ebook.ru forum) to this post. Should be virus free (scanned with F-Secure).

A) Unpack the converter in a convenient directory, for example: c:\dic

B) Transfer the edited .dsl dictionary to this directory

C) Open a DOS prompt (Command Prompt) and navigate to this directory.

D) Use the command "converter.exe dictionary_name.dsl eng" to convert your dictionary. Use de/rus/etc. in place of eng if your device keyboard is in another language.

7. Transfer the created .dic dictionary to your device and enjoy!
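
If you convert several dictionaries, the conversion command from step 6 could be wrapped in a small Python helper. This is a sketch only: the function names are hypothetical, it assumes converter.exe sits in the same directory as the .dsl file (as in step 6), and it makes no claim about what the converter prints or which exit code it returns.

```python
import subprocess
from pathlib import Path
from typing import List


def build_converter_command(workdir: str, dsl_name: str, lang: str = "eng") -> List[str]:
    """Build the argument list for: converter.exe dictionary_name.dsl eng"""
    exe = Path(workdir) / "converter.exe"
    return [str(exe), dsl_name, lang]


def run_converter(workdir: str, dsl_name: str, lang: str = "eng") -> int:
    """Run the converter inside workdir and return its exit code.

    What the exit code means is an assumption; check that the .dic
    file was actually created afterwards.
    """
    cmd = build_converter_command(workdir, dsl_name, lang)
    completed = subprocess.run(cmd, cwd=workdir)
    return completed.returncode
```

For example, `run_converter(r"c:\dic", "dictionary_name.dsl", "eng")` would be the scripted equivalent of steps C and D.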
Attached Files
File Type: txt replace.txt (1.3 KB, 262 views)
File Type: zip Converter 4.1.zip (1.01 MB, 249 views)