#1
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
Problem with ocr
Hi,
I have installed ell.traineddata and grc.traineddata into koreader/data/tessdata, but KOReader doesn't recognize the text in a scanned PDF I have in Ancient Greek, even though I have switched on "Forced OCR". I would also like to ask why there are only two options for "Document Language": English and Chinese.
Thank you for your help!
P.S.: Let me wish you all a blessed new year. May the light of the newborn Christ illuminate your heart in a dark hopeless world! (Sorry if it is not politically correct.)
#2
Wizard
Posts: 1,758
Karma: 731681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
|
I suspect it was written by a Chinese contributor many years ago.
Incidentally, is there a document available on Archive.org or some such to test with?
#3
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
Quote:
I am uploading a page of a scanned book. I noticed that the book I was reading was in DjVu format, so I converted the page to PDF for you. I believe the problem exists for both PDF and DjVu.
|
#4
Wizard
Posts: 1,758
Karma: 731681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
|
The display text is meaningless really; it's the three-letter code hidden behind it that counts. In your case, grc and ell.
https://github.com/koreader/koreader....lua#L115-L118
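To illustrate the point about labels versus codes, here is a minimal Lua sketch. It assumes that the display names and the language codes are paired by position, and that the training data file is simply the code followed by `.traineddata` (as with the `grc.traineddata` and `ell.traineddata` files mentioned in the first post). The table and function names are hypothetical and are not KOReader's own code.

```lua
-- Hypothetical tables for illustration: labels and codes paired by position.
local langs_text = {"English", "Ancient Greek"}
local langs_code = {"eng", "grc"}

-- Look up the language code that sits behind a menu label.
local function code_for(label)
    for i, text in ipairs(langs_text) do
        if text == label then
            return langs_code[i]
        end
    end
    return nil -- unknown label
end

-- Assumption: the training data file is named "<code>.traineddata".
local function traineddata_name(lang_code)
    return lang_code .. ".traineddata"
end

print(traineddata_name(code_for("Ancient Greek"))) -- prints "grc.traineddata"
```

The label could say anything; only the code decides which file in the tessdata folder is looked for.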
#5
Wizard
Posts: 1,758
Karma: 731681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
|
It works for me — more or less. The OCR isn't great at spaces in italic.
Last edited by Frenzie; 12-29-2020 at 04:31 PM. Reason: typo |
#6
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
So, do I have to make this correction?
-- document languages for OCR
DKOPTREADER_CONFIG_DOC_LANGS_TEXT = {"English", "Ancient Greek"}
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "grc"} -- language code, make sure you have corresponding training data
DKOPTREADER_CONFIG_DOC_DEFAULT_LANG_CODE = "eng" -- that have filenames starting with the language codes
From the screenshot you sent, I conclude that the breathings (᾿ ῾), the circumflex (῀), the grave accent (`) and some letters are not recognized.
Can this problem be solved?
#7
Wizard
Posts: 1,758
Karma: 731681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
|
Quote:
Quote:
|
||
#8
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
Quote:
Quote:
Thanks for your replies! |
||
#9
Wizard
Posts: 1,758
Karma: 731681
Join Date: Oct 2014
Location: Antwerp
Device: Kobo Aura H2O
|
Quote:
Quote:
|
||
#10
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
Frenzie, I have made the correction in defaults.lua and individual words are recognized correctly. (I tried to take a screenshot to show you, but I couldn't. I've just started a thread with this question...) But when I select more than one word and then choose the dictionary in the popup menu, nothing happens.
Also, I notice that when I highlight one or more words, the text isn't shown in the bookmark as usual, only the page and the time.
Quote:
One more question: why are there only two options for the text language? What should the second option be instead of "Chinese"? Does each user have to make the change manually in defaults.lua? Thanks again!
|
#11
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
Quote:
It's a pity I can't take a screenshot of this... |
|
#12
Wizard
Posts: 3,055
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
If you're doing this on your Pocketbook device, you _can_ take a screenshot. You can set some button (e.g. Power Double Press) to capture a screenshot by configuring it in Settings>Personalize>Key Mapping. The screenshots end up as bitmap images in the /screens folder, so you can get them via USB from there.
|
#13
cosiñeiro
Posts: 1,406
Karma: 2451781
Join Date: Apr 2014
Device: BQ Cervantes 4
|
Quote:
|
|
#14
Groupie
Posts: 176
Karma: 1686
Join Date: Jul 2020
Location: Greece
Device: Pocketbook Touch Lux 5
|
Quote:
However, thank you for your help. |
|
#15
Wizard
Posts: 3,055
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
Ah, I assumed that the system was taking it from the framebuffer, since it works with the home screen. Sorry for the misdirection.
|
Tags: ocr