Old 03-28-2016, 01:44 PM   #1
RedShadow
D00d
Posts: 15
Karma: 594
Join Date: Mar 2015
Device: PocketBook Lux 3 (626)
Need specs to build Lingvo dictionaries

Hello there

So I made a big XDXF from dictionary.com.
Then I wanted to convert it to the PocketBook format, which I believe is the ABBYY Lingvo format, using the tool that is right there.

But... my 540 MB XDXF is too big and the tool seems to crash.
You can check how I did it over there.

So I'm thinking about coding a new converter myself.

What I need are the Lingvo format specs.
Does anybody have technical knowledge about that?

Cheers, guys

Last edited by RedShadow; 03-28-2016 at 02:11 PM.
Old 03-28-2016, 02:40 PM   #2
rkomar
Wizard
Posts: 2,977
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
Have you tried the latest converter from the PocketBook site? It might work better than the one in the DictionaryConverter-neu\ 171109.zip file.
Old 03-28-2016, 03:26 PM   #3
RedShadow
Quote:
Originally Posted by rkomar
Have you tried the latest converter from the PocketBook site? It might work better than the one in the DictionaryConverter-neu\ 171109.zip file.
Yep, that's the one.
Here is a screenshot of the error.
http://prntscr.com/al8jns

But I doubt that a tag is left open. I'll have to make sure, but my guess is that the program tries to load the whole XML into memory and 'cuts' the file at some point.
Old 03-28-2016, 03:47 PM   #4
rkomar
I would check that line for errors. I created a dictionary with 33283 lines, so I'd be surprised if you ran out of memory at only 1813 lines.
Old 03-28-2016, 04:18 PM   #5
RedShadow
Quote:
Originally Posted by rkomar
I would check that line for errors.
Not sure I can do that easily, given how big the file is.
I'm gonna try some third-party 'industrial grade' XML editors instead.
...

So I opened the XML in firstobject XML editor and it doesn't complain. However, it can't validate the file because it runs out of memory.

Next I used XML ValidatorBuddy 5, and this one seems to handle big files better.

Here is a screenshot of line 1813.
http://prntscr.com/al99d8
As expected, at first glance nothing seems out of place.

Then I tried validating the XML and no error was given.
It actually says 'The file is well-formed.'
http://prntscr.com/al9cvx
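
For anyone who wants to repeat this check without a GUI tool, a streaming parser can test well-formedness without holding the whole file in memory. Below is a minimal Python sketch, assuming the XDXF is ordinary XML that the standard library can read. The function name `check_well_formed` is made up for illustration, and the check covers well-formedness only, not validity against the XDXF DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Stream through an XML file and report the first syntax error, if any.

    Returns None when the parser reaches the end without complaint,
    otherwise the (line, column) position reported by the parser.
    """
    try:
        # iterparse reads the file incrementally; clearing each finished
        # element drops its text and children so memory use stays modest.
        for _event, elem in ET.iterparse(path, events=("end",)):
            elem.clear()
    except ET.ParseError as err:
        return err.position
    return None
```

Calling `check_well_formed("dictionary.xdxf")` (a hypothetical file name) returns `None` for a well-formed file. Note that the emptied elements stay attached to the root, so memory still grows slowly on very large files.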

Quote:
Originally Posted by rkomar
I created a dictionary with 33283 lines, so I'd be surprised if you ran out of memory at only 1813 lines.
Hmmm, you don't seem to have caught what I said in my first post lol
Mine is 7,978,798 lines and is a 500+ MB file
... no way in hell this tool can open it, since I'm pretty sure it's trying to load the whole XML.

Line count doesn't matter anyway, because I could have just one huge line.

I just need the ABBYY Lingvo specs so I can create the .dic myself.
But I can't find any tech doc. :/

Last edited by RedShadow; 03-28-2016 at 04:20 PM.
Old 03-28-2016, 04:59 PM   #6
rkomar
I downloaded and tried it out myself and got the same error as you did. By trial and error, I've figured out that the maximum line length is something like 4096 bytes. You have to split lines longer than that into shorter lengths. It sucks, in that it will be a fair amount of work if you do it by hand, but it should get you further along.
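
Rather than hunting for the offending lines by hand, a short script can list them. This is only a sketch: the 4096-byte figure is the trial-and-error estimate above, not a documented limit, and the function name `find_long_lines` is made up for illustration. It flags lines at or above the limit to stay on the safe side, and it does not count the line ending, which is also an assumption.

```python
def find_long_lines(path, limit=4096):
    """Yield (line_number, byte_length) for each line of `limit` bytes or more.

    The file is opened in binary mode so lengths are measured in bytes
    rather than characters, and it is read one line at a time.
    """
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            # Measure the line without its trailing CR/LF characters.
            length = len(raw.rstrip(b"\r\n"))
            if length >= limit:
                yield number, length
```

For example, `for number, length in find_long_lines("dictionary.xdxf"): print(number, length)` would print each suspect line number next to its size (the file name is hypothetical).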
Old 03-28-2016, 05:05 PM   #7
RedShadow
Quote:
Originally Posted by rkomar
I downloaded and tried it out myself and got the same error as you did. By trial and error, I've figured out that the maximum line length is something like 4096 bytes. You have to split lines longer than that into shorter lengths. It sucks, in that it will be a fair amount of work if you do it by hand, but it should get you further along.
Nice! I never thought about the individual line lengths.
It's fairly easy to do on my side... however, I will have to regenerate the whole file.

If lines get truncated like that, it's probably because the converter doesn't actually load the whole file and instead parses it line by line. Maybe.
It's worth a shot anyway.
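
If regenerating the file is not an option, long lines could in principle be re-wrapped after the fact. The Python sketch below breaks each over-long line at a space so that no piece exceeds a byte limit. It rests on several assumptions: the file is UTF-8 or another ASCII-compatible encoding, swapping a space for a newline is harmless for this XDXF (inside CDATA or comments it would change the content), and the converter accepts the result, which the thread does not confirm. The names `wrap_bytes` and `rewrap_file` and the default of 4000 bytes (a little headroom under 4096) are made up for illustration.

```python
def wrap_bytes(line, limit=4000):
    """Split one line (without its line ending) into pieces of at most
    `limit` bytes, breaking only at ASCII spaces."""
    pieces = []
    while len(line) > limit:
        # Last space that still leaves the piece within the limit.
        cut = line.rfind(b" ", 0, limit + 1)
        if cut <= 0:
            raise ValueError("no space to break at within the limit")
        pieces.append(line[:cut])
        line = line[cut + 1:]  # drop the space that became a line break
    pieces.append(line)
    return pieces


def rewrap_file(src, dst, limit=4000):
    """Copy `src` to `dst`, re-wrapping every line longer than `limit` bytes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for raw in fin:
            body = raw.rstrip(b"\r\n")
            ending = raw[len(body):]
            # Reuse the line's own ending between pieces when it has one.
            separator = ending or b"\n"
            fout.write(separator.join(wrap_bytes(body, limit)) + ending)
```

Working on bytes rather than decoded text keeps the limit in the same unit as the estimate above, and a space byte never falls inside a multi-byte UTF-8 character, so the cut cannot split one.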
Old 03-30-2016, 01:32 PM   #8
RedShadow
Well, you were right: the converter can read an XDXF of any size, as long as every line is shorter than 4096 chars.

So I made a new XDXF, this time splitting the definitions up even more so that it would add more line breaks. Which it did: the XDXF is now 832,681 KB.

So I fed it to the converter...


...which worked for a while...

...and then it gave up:


So close.
However, there might be a way: right now I duplicate the definition for every word that shares it, instead of 'linking' one definition to multiple words. Linking would drastically reduce the size of that thing.

It's the little line 'Searching for equal words' that made me think of that. However, I'm not sure I can do that in the XDXF format...

--> I'd like to know what the converter looks for in an XDXF to decide that several words share a definition.

If I know this, then I'll build my XDXF accordingly.

EDIT: actually... I can write multiple 'keywords' for a definition.
Time to get back to work.

EDIT2: Doesn't work.
When I search for 'unfanned', for instance (which is linked to 'fan'), it doesn't find it.
Looks like the converter doesn't use the multiple 'keywords' to link words to the same definition. It's too bad, since it's much faster to create the XDXF now, and it only weighs 157,821 KB...

Do you happen to have a working XDXF that gives you the proper definition for different spellings, expressions and so on? I'd like to know the XDXF structure that the converter understands.
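
For reference, the multiple-'keywords' layout tried in the EDIT can be sketched as follows: words that share a definition are grouped into one `<ar>` article with several `<k>` keys. This is only an illustration of that layout in Python, under the assumptions that definitions are plain text (any markup inside them would be escaped) and that entries arrive as (headword, definition) pairs. The function name `build_articles` is made up, and as EDIT2 reports, the PocketBook converter did not appear to honor the extra keys.

```python
from collections import defaultdict
from xml.sax.saxutils import escape


def build_articles(entries):
    """Group (headword, definition) pairs so each distinct definition
    appears once, with every headword that shares it listed as a <k> key."""
    by_definition = defaultdict(list)
    for headword, definition in entries:
        # Skip a headword already recorded for this definition.
        if headword not in by_definition[definition]:
            by_definition[definition].append(headword)

    articles = []
    for definition, headwords in by_definition.items():
        keys = "".join(f"<k>{escape(word)}</k>" for word in headwords)
        articles.append(f"<ar>{keys}\n{escape(definition)}</ar>")
    return articles
```

Grouping by the definition text is what shrinks the file: each definition is written once no matter how many spellings or inflected forms point at it.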

Last edited by RedShadow; 03-30-2016 at 03:32 PM.
Old 03-30-2016, 03:34 PM   #9
rkomar
I don't think that the PocketBook dictionary interface can follow links. So, I wouldn't put a lot of effort into adding links to the definitions until it can be verified that it would work.
Old 03-30-2016, 03:38 PM   #10
RedShadow
I think you are thinking of 'clickable links', but I'm not talking about those.
I'm talking about 'internal links' between words and definitions, so that the reader knows that 'fanned' and 'fans', or 'disc' and 'disk', have the same definition.

Before, I was creating multiple entries for all the different spellings and so on.
Now, because of the 'block count' constraint, I'm trying to do things the 'right' way: linking multiple 'words' to the same 'definition'.
But the converter seems to ignore these links, which is why I duplicated definitions in the first place: I suspected this would happen.
Old 03-30-2016, 03:58 PM   #11
rkomar
Some of the different words can be linked to the correct root word definition via the morphems.txt file. That is handled automatically by the dictionary interface. The default morphems.txt file that comes with the converter is pretty small, and I think you could add to it pretty easily. That would take care of the various words like 'fanned' and 'fans'. It would not handle different spellings, though, like 'disk' and 'disc'. So, you only have to worry about the latter in your dictionary.
Old 03-31-2016, 03:43 AM   #12
RedShadow
Quote:
Originally Posted by rkomar
Some of the different words can be linked to the correct root word definition via the morphems.txt file. That is handled automatically by the dictionary interface. The default morphems.txt file that comes with the converter is pretty small, and I think you could add to it pretty easily. That would take care of the various words like 'fanned' and 'fans'. It would not handle different spellings, though, like 'disk' and 'disc'. So, you only have to worry about the latter in your dictionary.
Excellent, I'll check this out.
Old 05-27-2016, 01:45 PM   #13
RedShadow
Change of plans: I ended up building a StarDict version of the dictionary:
dictionarycom-as-stardict-dictionary

I am now using KOReader on my PocketBook, so I can use StarDict dictionaries.
Old 05-27-2016, 04:25 PM   #14
rkomar
I think that's a good idea. When a hobby stops being fun, it's time to do something else.

All times are GMT -4.