Old 06-05-2020, 02:44 PM   #1
drobble
Junior Member
 
Posts: 2
Karma: 10
Join Date: Jun 2020
Device: Google tablet
a project to scan and convert a printed book to EPUB format

Hello,

I was checking the calibre forum to find answers to some questions.
Background:
I have a printed book (in German) with over 1000 pages. This book is not available as EPUB.
I started a small 'test' project and scanned 10 pages of the book.
Then I processed the images (jpg's) with tesseract to get txt files.
Each scanned page = one jpg = one txt file.

My question is now:
How can I convert these txt files to one EPUB ebook in an 'automated' fashion? Remember, there will be over 1000 pages/files.

Can I use a script and reference file (for the 1000 txt files) to produce that ebook with Calibre?

Any help is much appreciated
drobble
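
One way to script the question above, as a minimal sketch rather than a tested recipe: merge the page files in numeric order into a single text file, then hand that file to calibre's command-line converter, ebook-convert. The folder and file names (scans_txt, page_0001.txt, book.txt) and the helper names are hypothetical, and the sketch assumes each file name contains its page number. The --language de option is only a guess at what suits a German book.

```python
import re
from pathlib import Path


def page_number(path):
    """Return the first run of digits in the file name, or -1 if there is none."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else -1


def merge_pages(txt_dir, merged):
    """Concatenate the page files in numeric order and return the page count."""
    pages = sorted(
        (p for p in Path(txt_dir).glob("*.txt") if p.resolve() != Path(merged).resolve()),
        key=page_number,
    )
    with open(merged, "w", encoding="utf-8") as out:
        for page in pages:
            # A blank line between pages keeps them from running together.
            out.write(page.read_text(encoding="utf-8").rstrip() + "\n\n")
    return len(pages)


def convert_command(merged, epub):
    """Build the ebook-convert call (assumption: calibre's CLI tools are on PATH)."""
    return ["ebook-convert", str(merged), str(epub), "--language", "de"]


# Hypothetical usage:
#   import subprocess
#   merge_pages("scans_txt", "book.txt")
#   subprocess.run(convert_command("book.txt", "book.epub"), check=True)
```

Sorting on the number rather than the raw name matters because a plain alphabetical sort puts page_10 before page_2.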
Old 06-05-2020, 03:49 PM   #2
theducks
Well trained by Cats
Posts: 31,055
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Make an HTML Index file to control the order

https://manual.calibre-ebook.com/faq...specific-order
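
The index-file idea can be sketched roughly as follows, assuming the pages already exist as HTML files sitting next to the index. The function name build_index, the default title, and the page file names are hypothetical. The resulting file is simply a list of links in reading order, which is then added to calibre in place of the individual pages.

```python
import html
from pathlib import Path


def build_index(page_files, title="Scanned book"):
    """Return the text of an HTML file that links to each page in the given order."""
    links = "\n".join(
        f'<p><a href="{html.escape(name, quote=True)}">{html.escape(Path(name).stem)}</a></p>'
        for name in page_files
    )
    return (
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        f"{links}\n"
        "</body>\n</html>\n"
    )


# Hypothetical usage:
#   names = ["page_0001.html", "page_0002.html"]
#   Path("index.html").write_text(build_index(names), encoding="utf-8")
```

The order of the list passed in is the order of the links, so any sorting has to happen before the call.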
Old 06-05-2020, 05:10 PM   #3
BetterRed
null operator (he/him)
 
Posts: 21,725
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@drobble - you will also find tips regarding scanning books and their conversion in the Workshop forum.

BR
Old 06-05-2020, 08:51 PM   #4
retiredbiker
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
Rather than making 1000 files, you might consider doing it page by page, and putting the text of each page into Writer or Word. I use a GUI front-end to tesseract, OCRFeeder, that makes this easy. It also does a very good job of unwrapping lines, which is a nice leg up. You can load one or many images at a time, and recognise them one by one or all together, then just copy the text over to your book document. I tend to do about 20 pages per session.

I know it sounds dreary, but you have to proof it all anyway. I find doing most of the proofing page by page, while I have the scanned image right in front of me, much less daunting than attacking the whole book later.

Then you can do some styling in the word processor as you build the book, like heading styles for chapters and basic styles to format the text. The result, as an .odt or .docx file, will convert to something a lot prettier than all that bare text, and most of the proofing and styling will be done.

And unless you have some other tools, how else will you get all those bare text lines unwrapped and enclosed in html tags?

Last edited by retiredbiker; 06-05-2020 at 09:11 PM.
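
For anyone who does want to script the unwrapping and tagging mentioned above, here is a minimal sketch. It assumes the OCR output separates paragraphs with blank lines, which may not hold for every scan. The function names are hypothetical, and the hyphen rule is a rough heuristic, not what OCRFeeder does.

```python
import html
import re


def unwrap_to_paragraphs(text):
    """Split on blank lines, then join the wrapped lines of each paragraph."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                # Likely a word hyphenated across a line break: rejoin it.
                joined = joined[:-1] + line
            else:
                joined = f"{joined} {line}".strip()
        if joined:
            paragraphs.append(joined)
    return paragraphs


def to_html(paragraphs):
    """Wrap each paragraph in <p> tags, escaping &, < and >."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


# Hypothetical usage:
#   page = "Das ist ein Bei-\nspiel.\n\nZweiter Absatz."
#   print(to_html(unwrap_to_paragraphs(page)))
```

Two limits worth knowing: a paragraph that runs across a page break stays split in two, and a hyphen followed by a capitalised word on the next line is kept with a space after it, so those spots still need proofing by hand.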
Old 06-05-2020, 09:08 PM   #5
retiredbiker
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
Quote:
Originally Posted by theducks View Post
Make an HTML Index file to control the order

https://manual.calibre-ebook.com/faq...specific-order
But OP will have text files...bare, wrapped text.
Old 06-05-2020, 11:01 PM   #6
Sarmat89
Fanatic
 
Posts: 518
Karma: 2268308
Join Date: Nov 2015
Device: none
Don't use tesseract for making books. Use FineReader and correct each unsure character manually.