#1
Junior Member
Posts: 2
Karma: 10
Join Date: Jun 2020
Device: Google tablet
a project to scan and convert a printed book to EPUB format
Hello,
I was checking the calibre forum to find an answer to some questions.
Background: I have a printed book (in German) with over 1000 pages. This book is not available as EPUB. I started a small 'test' project and scanned 10 pages of the book. Then I processed the images (JPGs) with tesseract to get txt files. Each scanned page = one jpg = one txt file.
My question is now: how can I convert these txt files, in an 'automated' fashion, into one EPUB ebook? Remember, there will be over 1000 pages/files. Can I use a script and a reference file (listing the 1000 txt files) to produce that ebook with Calibre?
Any help is much appreciated.
drobble
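A minimal sketch of the merge step asked about here, assuming the per-page tesseract output sits in one folder with the page number somewhere in each file name. The names `merge_pages`, `natural_key`, `pages` and `book.txt` are hypothetical, chosen only for illustration. The pages are sorted by the numbers in their names, so page_2.txt comes before page_10.txt, and written into a single text file.

```python
import re
from pathlib import Path


def natural_key(path: Path):
    """Sort key that orders page_2.txt before page_10.txt."""
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def merge_pages(src_dir: str, out_file: str) -> int:
    """Concatenate every .txt file in src_dir into out_file, in page order.

    Returns the number of pages merged.
    """
    out_path = Path(out_file).resolve()
    # Skip the output file itself in case it lives in the same folder.
    pages = sorted(
        (p for p in Path(src_dir).glob("*.txt") if p.resolve() != out_path),
        key=natural_key,
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:
            text = page.read_text(encoding="utf-8")
            # OCR output may end a page with a form feed; drop it and
            # separate pages with a blank line.
            out.write(text.replace("\f", "").rstrip() + "\n\n")
    return len(pages)
```

Calling `merge_pages("pages", "book.txt")` would produce one file that Calibre's command-line converter can take as input, for example `ebook-convert book.txt book.epub`. The result is still bare, unproofed text, so treat this as a starting point rather than a finished workflow.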
#2
Well trained by Cats
Posts: 31,055
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
#4
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
Rather than making 1000 files, you might consider doing it page by page, and putting the text of each page into Writer or Word. I use a GUI front-end to tesseract, OCRFeeder, that makes this easy. It also does a very good job of unwrapping lines, which is a nice leg up. You can load one or many images at a time, and recognise them one by one or all together, then just copy the text over to your book document. I tend to do about 20 pages per session.
I know it sounds dreary, but you have to proof it all anyway. I find doing most of the proofing page by page, while I have the scanned image right in front of me, much less daunting than attacking the whole book later. Then you can do some styling in the word processor as you build the book, like heading styles for chapters and basic styles to format the text. The result, as an .odt or .docx file, will convert to something a lot prettier than all that bare text, and most of the proofing and styling will be done. And unless you have some other tools, how else will you get all those bare text lines unwrapped and enclosed in HTML tags?
Last edited by retiredbiker; 06-05-2020 at 09:11 PM.
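For anyone who does want to script the unwrapping rather than use a GUI tool, here is a rough sketch of what "unwrapped and enclosed in HTML tags" could look like. It is not what OCRFeeder does internally. It assumes that a blank line marks a paragraph break and that a line ending in a hyphen, followed by a line starting with a lowercase letter, is a word split across lines. That heuristic will sometimes be wrong, so proofing is still needed. The function names `unwrap_paragraphs` and `to_html` are made up for this example.

```python
import html
import re


def unwrap_paragraphs(text: str) -> list[str]:
    """Split OCR text into paragraphs and join the hard-wrapped lines of each."""
    paragraphs = []
    # Assumption: one or more blank lines separate paragraphs.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-") and line[:1].islower():
                # Heuristic: re-join a word hyphenated across a line break.
                joined = joined[:-1] + line
            else:
                joined += " " + line
        paragraphs.append(joined)
    return paragraphs


def to_html(text: str, title: str = "Untitled") -> str:
    """Wrap each unwrapped paragraph in <p> tags inside a bare HTML page."""
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in unwrap_paragraphs(text))
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )
```

The lowercase check is there because German capitalises nouns: a line break after "Haus-" followed by "Tür" is left alone, while "Bei-" followed by "spiel" is joined. Chapter headings and other styling would still have to be added by hand or in a word processor, as described above.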
#5
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
Quote:
#6
Fanatic
Posts: 518
Karma: 2268308
Join Date: Nov 2015
Device: none
Don't use tesseract for making books. Use FineReader and correct each unsure character manually.