Old 06-21-2020, 05:50 AM   #16
annoporci
Enthusiast
annoporci began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Oct 2012
Location: Los Angeles
Device: Kindle Paperwhite 2019, Firmware 5.12.4
Hi jhowell!

So it's taken me a while, but I have finally edited my Catalan dictionary. I'm still in the middle of testing, and there are a few things I need to clean up and/or automate, but I was able to dump my `content.opf` file into KindlePreviewer and a dictionary was produced. It looks alright and I'd like to test to see if it does act like a look-up dictionary in my Kindle Paperwhite. But I have a few questions/problems first.

I used KindlePreviewer because from what I read it was going to be the quickest way: just open the `content.opf` and wait 30 minutes. But here's my first issue: KindlePreviewer will not let me "export" the ebook it produced. Any workaround? I'm on MacOS, is the ebook somewhere in a hidden directory? This is what KindlePreviewer says about exporting:
Quote:
Note: If you are unable to export or the export option is disabled, it might be due to the following reasons: If you used Kindle Previewer to generate the KPF file, you can use it to open your book without reconverting it the next time you want to preview, and you can send your original source file (e.g., EPUB, DOCX) or the KPF file to Amazon for publishing. If you're using ePub or other HTML-based formats, make sure that you have defined the right language, and remove any unnecessary language definitions. Some third-party tools add additional language definitions that are not necessary, which might be preventing Kindle Previewer from exporting your book.
Now KindlePreviewer lists the language as English in `View > Language`, but gives "ca" under `View > Book Information`. Could this be the problem? My `content.opf` file has `<dc:language>ca</dc:language>`, while every dictionary entry is tagged with `<idx:entry name="Catalan" scriptable="yes" spell="yes">`.

Note that every Catalan ebook I throw at KindlePreviewer lists the language as English, so it seems that KindlePreviewer has limited support for languages. Perhaps I need to use KindleGen instead?

I'm curious to see if I can get my Kindle to pop-up some definition. If only I could find the book that KindlePreviewer made and is hiding from me!

On another note, I'm pretty sure my effort will need to be refined, as I haven't done anything about `<idx:infl>` for instance (each entry is empty right now). My code is on GitHub, but I'm a little embarrassed to show it right now. I have mostly used Python/BeautifulSoup. I have some undesired whitespace caused by my editing and the `xhtml` code is not indented properly, but I checked with an online tool that did not detect any problems in the `xml` structure.
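For anyone following along, here is a minimal sketch of how an entry with inflections could be generated from Python. It is not the code from the repository: the function name `build_entry` and the exact layout (visible headword in bold, definition in a `<p>`) are assumptions for illustration, and the nesting of `<idx:infl>`/`<idx:iform>` inside `<idx:orth>` follows my reading of Amazon's dictionary markup, so check it against the Create a Dictionary page before relying on it.

```python
from html import escape


def build_entry(headword, definition, inflections=(), index_name="Catalan"):
    """Return one <idx:entry> block as a string (hypothetical helper)."""
    # One <idx:iform> per inflected form that should resolve to this headword.
    iforms = "".join(
        f'<idx:iform value="{escape(form, quote=True)}"/>' for form in inflections
    )
    # Leave <idx:infl> out entirely when there are no inflections.
    infl = f"<idx:infl>{iforms}</idx:infl>" if iforms else ""
    return (
        f'<idx:entry name="{escape(index_name, quote=True)}" scriptable="yes" spell="yes">'
        f'<idx:orth value="{escape(headword, quote=True)}">'
        f"<b>{escape(headword)}</b>{infl}</idx:orth>"
        f"<p>{escape(definition)}</p>"
        "</idx:entry>"
    )


print(build_entry("cantar", "to sing", ["canto", "cantes", "canta"]))
```

Building the markup as escaped strings keeps characters such as `&` or `"` in headwords and definitions from breaking the XML structure.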
Old 06-21-2020, 07:46 AM   #17
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by annoporci View Post
Any workaround? I'm on MacOS, is the ebook somewhere in a hidden directory?
You might want to use KindleGen instead. On macOS the latest version can usually be found in:

Code:
/Applications/Kindle Previewer 3.app/Contents/MacOS/lib/kindlegen/fc/bin/kindlegen
Quote:
Originally Posted by annoporci View Post
Note that every Catalan ebook I throw at KindlePreviewer lists the language as English, so it seems that KindlePreviewer has limited support for languages. Perhaps I need to use KindleGen instead?
AFAIK, Kindle Previewer processes all files with KindleGen. I.e., choosing one or the other won't make a difference.

Quote:
Originally Posted by annoporci View Post
Now KindlePreviewer lists the language as English in `View > Language`, but gives "ca" under `View > Book Information`. Could this be the problem?
Did you add an <x-metadata>...</x-metadata> section to the dictionary .opf file?

Spoiler:
Code:
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:Title>Catalan-English Dictionary</dc:Title>
    <dc:Language>ca</dc:Language>
    <dc:Subject>Dictionaries</dc:Subject>
    <dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:b35c18a7-b81f-47fb-b2f8-aa22fd27da78</dc:identifier>
  </metadata>
  <x-metadata>
    <output encoding="utf-8"/>
    <DictionaryInLanguage>ca</DictionaryInLanguage>
    <DictionaryOutLanguage>en</DictionaryOutLanguage>
    <!-- ... more custom fields -->
  </x-metadata>


and Catalan language codes to the books that you tested them with?

Spoiler:
Code:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:creator opf:role="aut">[Creator name here]</dc:creator>
  <dc:title>[Title here]</dc:title>
  <dc:language>ca</dc:language>
  <!-- more entries -->
</metadata>
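
A quick way to check whether a dictionary .opf already carries these fields is sketched below. It is only an illustration using Python's standard `xml.etree.ElementTree`; the helper names (`dictionary_languages`, `missing_fields`) are made up for this example, and it only looks for the two language fields shown in the snippet above, not for any other custom fields.

```python
import xml.etree.ElementTree as ET

# The two x-metadata fields shown in the snippet above.
REQUIRED = ("DictionaryInLanguage", "DictionaryOutLanguage")


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def dictionary_languages(opf_text):
    """Return {field: value} for the language fields found inside <x-metadata>."""
    root = ET.fromstring(opf_text)
    found = {}
    for elem in root.iter():
        if local(elem.tag) == "x-metadata":
            for child in elem:
                if local(child.tag) in REQUIRED:
                    found[local(child.tag)] = (child.text or "").strip()
    return found


def missing_fields(opf_text):
    """List the required fields that are absent or empty."""
    found = dictionary_languages(opf_text)
    return [name for name in REQUIRED if not found.get(name)]


sample = (
    "<package><metadata/>"
    "<x-metadata><DictionaryInLanguage>ca</DictionaryInLanguage></x-metadata>"
    "</package>"
)
print(missing_fields(sample))
```

The search walks the whole tree, so it finds `<x-metadata>` whether it sits next to `<metadata>` or nested inside it.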

Last edited by Doitsu; 06-21-2020 at 07:50 AM.
Old 06-21-2020, 09:27 AM   #18
annoporci
Wow thanks for replying so fast! Looks like I have omitted a thing or two. Will fix it and attempt to recompile again in a couple of days.

One issue I had noticed before but forgotten about is that calibre encodes Catalan as "cat" rather than "ca". So the metadata of all my Catalan books has

<dc:language>cat</dc:language>

instead of:

<dc:language>ca</dc:language>

As far as I remember the official code is ca. Will try to change that and see if it helps.

EDIT: The language metadata is 'ca' in the epub, but 'cat' in the azw converted by calibre. Looks like a bug... no?

How can I edit the metadata and make it stick? Right now when I click on "edit book" and edit the metadata, the edits do not get saved. I tried to "save a copy" of the book from within the "edit book" window, but again the edit was not saved...

Thanks again!

Last edited by annoporci; 06-21-2020 at 09:43 AM.
Old 07-12-2020, 06:16 AM   #19
annoporci
I eventually managed to produce a dictionary that works as a look-up!

The file is quite large at about 70MB. I wonder if there's anything I could do to reduce its size. Any suggestions?

Is there an open source mono-lingual look-up dictionary in html/xhtml format that I could look at? My only source so far is Amazon's Create a Dictionary page. Thanks!

It turns out that "ca" and "cat" are both valid codes for "Catalan": "ca" is the ISO 639-1 code and "cat" is the ISO 639-2/639-3 code. Not sure what exactly was going wrong in my earlier attempts. I still need to properly code "inflections" and clean a few things up, but that may have to wait for the upcoming second covid lockdown.

Last edited by annoporci; 07-12-2020 at 06:20 AM.
Old 07-12-2020, 07:55 AM   #20
Doitsu
Quote:
Originally Posted by annoporci View Post
The file is quite large at about 70MB. I wonder if there's anything I could do to reduce its size. Any suggestions?
By default, KindleGen will attach the source files. Use the -dont_append_source parameter to change this behavior.
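
If the build is driven from a Python script, the call could look roughly like the sketch below. The KindleGen path is the macOS location quoted earlier in this thread and may differ on other installs; `kindlegen_command` and `compile_dictionary` are hypothetical helper names, and `content.opf` is just the file name used above.

```python
import subprocess
from pathlib import Path

# Location quoted earlier in this thread for macOS; adjust for your install.
KINDLEGEN = Path(
    "/Applications/Kindle Previewer 3.app/Contents/MacOS/lib/kindlegen/fc/bin/kindlegen"
)


def kindlegen_command(opf_path, keep_source=False, kindlegen=KINDLEGEN):
    """Build the argument list for a KindleGen run on a dictionary .opf."""
    cmd = [str(kindlegen), str(opf_path)]
    if not keep_source:
        # Skip embedding the source files in the output.
        cmd.append("-dont_append_source")
    return cmd


def compile_dictionary(opf_path, **kwargs):
    """Run KindleGen and return its exit code (only runs when called)."""
    return subprocess.run(kindlegen_command(opf_path, **kwargs)).returncode


print(kindlegen_command("content.opf"))
```

Passing the arguments as a list avoids shell quoting problems with the space in "Kindle Previewer 3.app".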

Quote:
Originally Posted by annoporci View Post
Is there an open source mono-lingual look-up dictionary in html/xhtml format that I could look at?
AFAIK, very few Open Source dictionaries contain inflections. If you manage to DeDRM the free Merriam Webster dictionary (B00OLDL0BA) that eInk Kindle owners can download, you could use the KindleUnpack Calibre plugin to unpack it.
Also, many of the older Mobipocket .prc dictionaries contain inflections. (The dictionary format hasn't changed that much.)

Quote:
Originally Posted by annoporci View Post
It turns out that "ca" and "cat" are both valid codes for "Catalan".
AFAIK, KindleGen will only use the first two letters of the language code.
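
To see what that would mean for the codes mentioned here, a tiny sketch follows. It only mimics the "first two letters" behaviour described above, which is an assumption rather than documented KindleGen behaviour, and `two_letter_code` is a made-up name.

```python
def two_letter_code(code):
    """Mimic the 'first two letters' behaviour described above (an assumption)."""
    return code.strip().lower()[:2]


for code in ("ca", "cat", "CA-ES"):
    print(code, "->", two_letter_code(code))
```

For Catalan both "ca" and "cat" end up as "ca". Truncation is not a general conversion between three-letter and two-letter codes, though: Spanish "spa" would become "sp" rather than "es".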

Quote:
Originally Posted by annoporci View Post
I still need to properly code "inflections" and clean a few things up, but that may have to wait the upcoming second covid lockdown.
Google Open Source Catalan POS (part-of-speech) taggers. There might be one whose data files you could reformat and use to add inflections.
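
As a rough idea of the reformatting step, the sketch below groups inflected forms under their lemma so they can later be written out as inflections. It assumes each line of the data file is "form lemma tag" separated by whitespace, which is a hypothetical layout; the sample tags are purely illustrative and real tagger data will need its own parsing.

```python
from collections import defaultdict


def forms_by_lemma(lines):
    """Group inflected forms under their lemma.

    Assumes each line is 'form lemma tag' separated by whitespace
    (a hypothetical layout; adapt to the real data file).
    """
    grouped = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        form, lemma = parts[0], parts[1]
        # The lemma itself is the headword, so only keep the other forms.
        if form != lemma and form not in grouped[lemma]:
            grouped[lemma].append(form)
    return dict(grouped)


sample = [
    "canto cantar VMIP1S0",
    "cantes cantar VMIP2S0",
    "cantar cantar VMN0000",
    "gats gat NCMP000",
]
print(forms_by_lemma(sample))
# {'cantar': ['canto', 'cantes'], 'gat': ['gats']}
```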
Old 07-12-2020, 05:43 PM   #21
annoporci
Thank you so much, Doitsu! I'll get back to this thread once I've managed to fix the size. I'll also try to set styles that make the definitions more compact, because right now, on my Kindle Paperwhite, the pop-up definition covers a little less than half the screen, which forces me to scroll to read the definition. I'll have the code on GitHub.
Old 09-10-2020, 01:49 AM   #22
annoporci
This is embarrassing: I haven't been able to go back and polish my code since last July. And it looks like I may not be able to for quite a while. So never mind, here is my unfinished code, together with a sample of the dictionary. I only tested a few words with it. I'll get back to it if I get a lull in my real life. In the meantime, anyone is free to borrow/read the code and/or sample dictionary (it is inside the 'output' directory):

https://github.com/ptoche/GDLC