02-17-2009, 12:35 PM   #1
mtravellerh
book creator
Posts: 9,635
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: PB360°
OCR software for old German script

I'd like to put out a call here; maybe I'll get lucky.

So: I have all the adventures of Detektiv Nobody in old German script (PDF). I know that Abbyy makes OCR software that can read this script, but unfortunately I can't afford it. So I'd like to know whether anyone has this software and could run the PDFs through it (to HTML or TXT). I would take care of the proofreading. Please get in touch by PM or post here.

If I can't find anyone, I'll have to type out the whole text by hand, like it or not, and that would be a lot of work.

Thanks in advance

MTH
02-18-2009, 04:33 AM   #2
Pulp
Palm Addict
Posts: 477
Karma: 1001951
Join Date: Aug 2008
Device: Cybook Gen3 [512mb, FW: 1.5]
There is a demo version of Finereader 9.

As far as I know, it can be used for 15 days and processes up to 50 pages at a time.

If you then export the result to HTML (or another format) and, if necessary, stitch the parts together, it should save you a lot of time.
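
To make the "50 pages at a time, then stitch together" idea concrete, here is a minimal Python sketch. It is not part of Finereader; the function names `page_batches` and `join_exports` are made up for illustration, the batch size of 50 is simply the figure recalled above, and it assumes the parts were exported as plain text (HTML parts would need more than simple concatenation).

```python
from pathlib import Path


def page_batches(total_pages: int, batch_size: int = 50) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def join_exports(parts: list[Path], target: Path) -> None:
    """Concatenate exported plain-text parts in the given order.

    Each part is followed by one blank line so that page batches
    do not run into each other.
    """
    with target.open("w", encoding="utf-8") as out:
        for part in parts:
            out.write(part.read_text(encoding="utf-8").rstrip("\n") + "\n\n")
```

For a 120-page PDF, `page_batches(120)` gives three ranges to feed to the demo one after another; the exported files can then be passed to `join_exports` in the same order.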
02-18-2009, 03:23 PM   #3
mtravellerh
book creator
Thanks. I'll give it a try.
02-18-2009, 05:10 PM   #4
netseeker
sleepless reader
Posts: 4,763
Karma: 615547
Join Date: Jan 2008
Location: Germany, near Stuttgart
Device: Sony PRS-505, PB 360° & 302, nook wi-fi, Kindle 3
Tesseract is open source and has support and training data for both modern German script and Fraktur:
I haven't tested it yet, but I'll do so now, since I also need OCR for Fraktur. The results will probably be worse than with Finereader and the like, though... and it is certainly more cumbersome.
02-18-2009, 06:42 PM   #5
netseeker
sleepless reader
I tested it with two different books that use different Fraktur typefaces and was quite pleasantly surprised. Well, as pleasantly as you can be with a free OCR tool, and with Fraktur at that.

First you have to get the PDF contents as TIFF images; then you can feed them to Tesseract with:
Quote:
tesseract test\nobody05_pic0005.tif testout\05 -l deu-f
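
For more than a handful of pages, the same call can be repeated over a whole folder. The following is only a sketch under a few assumptions: the TIFF images already exist, `tesseract` is on the PATH, and the helper names `build_command` and `ocr_directory` are invented here. The language code `deu-f` is taken from the command above; the name of the Fraktur data set differs between Tesseract releases, so it is kept as a parameter.

```python
import subprocess
from pathlib import Path


def build_command(image: Path, out_dir: Path, lang: str = "deu-f") -> list[str]:
    """Return the argument list for one Tesseract run.

    Tesseract appends the ".txt" extension itself, so the second
    argument is an output base name without extension.
    """
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(in_dir: Path, out_dir: Path, lang: str = "deu-f") -> list[Path]:
    """Run Tesseract on every .tif file in in_dir, in sorted order.

    Returns the paths of the text files Tesseract is expected to write.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for image in sorted(in_dir.glob("*.tif")):
        # check=True stops the batch at the first page that fails
        subprocess.run(build_command(image, out_dir, lang), check=True)
        results.append(out_dir / (image.stem + ".txt"))
    return results
```

Calling `ocr_directory(Path("test"), Path("testout"))` would then process every page image in `test` and leave one text file per page in `testout`.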

Attached are the results for the first two pages of Detektiv Nobody 5.
The result for the first page is inevitably not as good because of the drop cap in the first paragraph. The second page looks better.

No idea how Finereader fares here; maybe someone can post a comparison...
Attached Thumbnails: nobody05_pic0004.png (149.8 KB), nobody05_pic0005.png (178.4 KB)
Attached Files: 04.txt (1.4 KB), 05.txt (1.8 KB)
02-18-2009, 07:21 PM   #6
Pulp
Palm Addict
In that case you should try this: http://www.frakturschrift.de/

The regular Finereader would also need a pattern file to deliver usable results; those should already be included here.
02-19-2009, 03:43 AM   #7
mtravellerh
book creator
Quote:
Originally Posted by netseeker
I tested it with two different books that use different Fraktur typefaces and was quite pleasantly surprised. Well, as pleasantly as you can be with a free OCR tool, and with Fraktur at that.

First you have to get the PDF contents as TIFF images; then you can feed them to Tesseract.

Attached are the results for the first two pages of Detektiv Nobody 5.
The result for the first page is inevitably not as good because of the drop cap in the first paragraph. The second page looks better.

No idea how Finereader fares here; maybe someone can post a comparison...
I think the result is really good. With a bit of training this should be doable! Thanks, netseeker. Karma for you!
02-19-2009, 09:13 AM   #8
netseeker
sleepless reader
For training Tesseract on Windows, JTesseract, a surprisingly comfortable GUI, helps enormously...
Attached Thumbnails: jtesseract.jpg (214.8 KB)
02-19-2009, 12:36 PM   #9
mtravellerh
book creator
Thanks again. I'm already busy OCRing (or whatever it's called). It works surprisingly well!
02-19-2009, 02:29 PM   #10
Pulp
Palm Addict
OCR = optical character recognition (in German: optische Zeichenerkennung)