Old 04-08-2021, 05:06 PM   #1
anonlivros
Junior Member
anonlivros began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2021
Device: none
Tutorial-from Paper Book to Ebook PDF - 400 pages in 4 hours

Hey guys
There are many techniques for scanning a book, but many of them either require investment in the purchase of expensive equipment or cost a lot of time to finish the job.

Thinking about it, I gathered years of experience to develop a tutorial that allows you to build a PDF Ebook from:
- Paper book
- Goose neck cell phone holder
- Computer

Productivity is about 400 pages in 4 hours, with high visual quality.

The tutorial is very detailed, at more than 30 pages, and is still being expanded.

If at least 1 person feels satisfied after producing their first PDF Ebook from this tutorial, all the hours of effort developing it will have been worth it.

Unfortunately, the material is currently in Portuguese, but will soon be translated appropriately into English.

But you can check it out with the auto-translated version of the link below:
https://translate.google.com/transla...ros/anonlivros

I wish you all the best

Keep books free
t.me/anonlivros
Old 04-14-2021, 03:05 AM   #2
graatch
Enthusiast
graatch began at the beginning.
 
graatch's Avatar
 
Posts: 31
Karma: 10
Join Date: Apr 2018
Device: Samsung Galaxy Tab S2, iPad 2 (Bluefire Reader); fire hd 10, Windows
Thanks friend, this looks quite helpful. I'm thinking of setting up a little scan-workshop soon and this will be my guide.
Old 04-14-2021, 01:43 PM   #3
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.
 
DaleDe's Avatar
 
Posts: 11,469
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You may also want to look at HowTo: Create an eBook in our wiki. It now includes a link to the above site but has a lot of additional information.
Old 04-14-2021, 05:00 PM   #4
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 1,952
Karma: 8877603
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by anonlivros View Post
Thinking about it, I gathered years of experience to develop a tutorial that allows you to build a PDF Ebook from:
I skimmed through the tutorial.

Seems like there are good ideas in there, especially using things you may already have (cellphone camera + lamp).

I didn't think of voice activation. That's a great idea. One of the problems is the camera moving because you're pressing buttons or touching the screen.

Quote:
Originally Posted by anonlivros View Post
If at least 1 person feels satisfied after producing their first PDF Ebook from this tutorial, all the hours of effort developing it will have been worth it.
On cleaning up the images + OCR, you may also want to check out a few topics:

2020: "OCRing + EPUBing my first book: Tips?" (Especially my Posts #12+#15)
2020: "Optimize PDFs from archive.org for E-Ink devices"

I find Scan Tailor Advanced better at cropping/dewarping/deskewing compared to Finereader's built-in tools.

Quote:
Originally Posted by anonlivros View Post
Keep books free


And I like the little link at the end of the tutorial (I do lots of ebook conversions for Mises!)... although does it really belong?

(Kinsella's v2.0 of the book is coming out soon!)

Complete Side Note:

Spoiler:
You may be interested in my posts from 2014 + 2020:

2020: "Copyright Reform Poll" (Posts #182+)
2014: "New Paper Argues for Reasonable Reduction in Copyright" (Posts #70+)

And you'll definitely enjoy:
"Against Intellectual Monopoly" by Michele Boldrin and David K. Levine.

And see one of the reviews for it on Mises.org:

Mises Review: "Against Intellectual Monopoly, by Michele Boldrin"

Last edited by Tex2002ans; 04-14-2021 at 05:23 PM.
Old 04-16-2021, 08:16 AM   #5
anonlivros
Junior Member
anonlivros began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2021
Device: none
Thanks to all

Quote:
Originally Posted by graatch View Post
Thanks friend, this looks quite helpful. I'm thinking of setting up a little scan-workshop soon and this will be my guide.
I'm flattered that you use the tutorial in your workshop.

Quote:
Originally Posted by DaleDe View Post
It now includes a link to the above site but has a lot of additional information.
thanks for the recommendation and recognition

Quote:
Originally Posted by Tex2002ans View Post
Seems like there are good ideas in there, especially using things you may already have (cellphone camera + lamp).
Good to know... I wrote it with a specific audience in mind: "poor students in universities with few books". In Brazil, a huge number of people are in this situation.

Quote:
Originally Posted by Tex2002ans View Post
I find Scan Tailor Advanced better at cropping/dewarping/deskewing compared to Finereader's built-in tools.
I didn't know this software ... I'll check it out, thanks!
Do you recommend making these edits with it, to then generate PNG pages that would be further processed by FineReader?

And thanks for reading recommendations. I really enjoyed them.
Old 04-16-2021, 03:01 PM   #6
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 1,952
Karma: 8877603
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by anonlivros View Post
I didn't know this software ... I'll check it out, thanks!

Do you recommend making these edits with it, to then generate PNG pages that would be further processed by FineReader?
Yes. Use Scan Tailor Advanced as an in-between step.

Instead of:

1. Take your pictures.
2. Use Finereader to crop, dewarp, change to B&W, [...].

You:

1. Take your pictures.
2. Use Scan Tailor Advanced to crop, dewarp, turn images B&W, [...].
3. Feed those into Finereader.
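The "turn images B&W" part of step 2 can be illustrated with a small sketch. It uses Otsu's method, a common way to pick a black/white threshold automatically; this is only an illustration and is not claimed to be what Scan Tailor Advanced or Finereader do internally. The function names and the tiny sample page are made up for the example.

```python
def otsu_threshold(pixels):
    """Return the 0-255 gray level that best separates dark ink from light paper."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # no light pixels left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_rows):
    """Map a grayscale page (rows of 0-255 values) to 0 = ink, 1 = paper."""
    flat = [p for row in gray_rows for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 1 for p in row] for row in gray_rows]


page = [[200, 200, 30],
        [200, 25, 200]]
print(binarize(page))
```

A single global threshold like this tends to struggle with uneven lighting, which is one reason a dedicated tool with per-page controls helps.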

* * *

You can see some example images I posted in Post #15 in the "OCRing + EPUBing my first book: Tips?" thread:

So pictures of pages like this:

or this:

could turn into this:

* * *

Instead, when you only use Finereader's built-in stuff, you get pages like this:

vs.

Of course, those images are easy examples.

But when you have pages that are:
  • crooked or very curved because of the spine
  • uneven lighting
  • very speckled

you'll see how much better Scan Tailor is at those steps.

Plus, with Scan Tailor, you can adjust all the sliders along every step, or even different settings on a per-page basis.

So let's say one page had lots of speckles (tiny dots):

https://www.mobileread.com/forums/at...3&d=1567734681

You could set the despeckling strength to very high, so you might get something like this:

https://www.mobileread.com/forums/at...4&d=1567734681

You can adjust the strength so it catches the speckles, while still leaving the actual "." (periods).
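The trade-off between catching speckles and keeping periods can be sketched in a few lines. The sketch below assumes a simple rule: delete every connected blob of ink smaller than a chosen pixel count. That rule and the `min_size` parameter are assumptions for illustration, not Scan Tailor's actual despeckling algorithm or its slider scale.

```python
from collections import deque


def despeckle(ink, min_size):
    """Erase connected blobs of ink smaller than min_size pixels.

    ink is a list of rows; 1 marks an ink pixel, 0 marks paper.
    Returns a new grid and leaves the input untouched.
    """
    height = len(ink)
    width = len(ink[0]) if height else 0
    out = [row[:] for row in ink]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if not ink[y][x] or seen[y][x]:
                continue
            # Flood-fill one blob (8-connected) and remember its pixels.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and ink[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small blobs are treated as dirt; larger ones are kept as text.
            if len(blob) < min_size:
                for by, bx in blob:
                    out[by][bx] = 0
    return out


page = [
    [1, 0, 0, 0, 0],  # a one-pixel speck
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],  # a 2x2 "period"
    [0, 0, 0, 1, 1],
]
print(despeckle(page, min_size=3))  # speck removed, period kept
print(despeckle(page, min_size=5))  # too strong: the period is erased too
```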

Quote:
Originally Posted by anonlivros View Post
Good to know... I wrote it with a specific audience in mind: "poor students in universities with few books". In Brazil, a huge number of people are in this situation.


Quote:
Originally Posted by anonlivros View Post
And thanks for reading recommendations. I really enjoyed them.
Side Note: You may also like his podcast episode from a few weeks ago:

"Kinsella on Liberty 328 | Heterodorx Ep. 10 with Nina Paley: I.P. Everywhere!"

Last edited by Tex2002ans; 04-16-2021 at 03:09 PM.
Old 04-17-2021, 10:36 AM   #7
anonlivros
Junior Member
anonlivros began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2021
Device: none
Thank you for putting so much effort into such a complete and example-filled response.

I watched some tutorial videos about Scan Tailor Advanced and was amazed by its potential for customization.

Especially because of the ultra-customizable dewarping as a 3D grid with dozens of adjustment points, unlike the simple 'trapezoidal correction' in FineReader.

I am delighted to have found this mobileread forum.
Thank you for sharing your knowledge here
Old 04-19-2021, 12:53 AM   #8
roger64
Wizard
roger64 ought to be getting tired of karma fortunes by now.
 
Posts: 2,562
Karma: 2999999
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Some weeks ago, Scan Tailor Advanced replaced Scan Tailor for Arch Linux users.

However, I've found its use to be much more complex and I reverted to the old scan tailor package. I certainly do not wish to indulge in page-by-page settings, with many manual tweaks, though I willingly admit there are some botched/damaged pages that may require some special treatment.

For example, speckles may be found on all pages (more or less), and the same goes for a yellow background.

My two centimes.
Old 04-19-2021, 02:56 AM   #9
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 1,952
Karma: 8877603
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by roger64 View Post
I certainly do not wish to indulge in page-by-page settings, with many manual tweaks, though I willingly admit there are some botched/damaged pages that may require some special treatment.
Then you'd set the setting once, then "apply to all pages" (or only even or only odd):

[Image: ScanTailor.Advanced.-.ApplyTo.png]

It would act the same as the old Scan Tailor.

Below almost all settings, there's a little "Apply To..." button, and you get those options.

Quote:
Originally Posted by roger64 View Post
However, I've found its use to be much more complex and I reverted to the old scan tailor package.
What's more complex?

All the original steps are there, with a few more optional bells and whistles added on top. For example:
  • Old Despeckling buttons: "Cautious, Normal, Aggressive".
    • Now, you actually get a slider to specify the exact strength.
    • Cautious = 1.0, Normal = 2.0, Aggressive = 3.0
  • Accepting PNG and other image formats as input.
    • The original was stuck with TIFF only.

From what I recall, Scan Tailor Advanced (and the other forks) carried over everything from the original Scan Tailor, plus added better features on top.

I haven't used the original in such a long time though, so I don't remember... but I do remember upgrading to Enhanced when I saw how much better it was, then was sold on Advanced the instant it was mentioned to me in 2018.

Quote:
Originally Posted by roger64 View Post
Some weeks ago, Scan Tailor Advanced replaced Scan Tailor for Arch Linux users.
The original Scan Tailor was already inactive/abandoned many years ago.

Various forks were created (Featured + Enhanced + Universal), but even those slowly became abandoned.

Scan Tailor Advanced merged all those enhancements together into one ultimate package + a ton more. There's a big list of features listed on the main Github page:

https://github.com/4lex4/scantailor-advanced

but the largest for me being:
  • Multi-threading.
    • This makes every stage in the entire process run much faster.
  • Better Project Saving
    • Allows you to return to a project and remember all your settings + Output.
    • In the original Scan Tailor, you had to redo all the steps all over, and generate Output again.
  • Much better Page Splitting

So many other little QoL improvements too, like being able to sort images in the Margins step:

[Image: ScanTailor.Advanced.-.Margins.Sorting.png]

Again, the original had no choice. But Advanced includes the original functionality AND more:
  • Natural Sort
    • Default. (And this was original Scan Tailor.)
  • Order by increasing Width
  • Order by increasing Height
  • Order by decreasing deviation

This becomes invaluable for finding/correcting some of the bad pages.

The vast bulk of pages take up the entire page, but some might only include chapter titles, a single small blockquote, or a few sentences and the chapter would end.

The original Scan Tailor would completely botch those pages during the Margins step, and it would be difficult to even spot which pages were the problem when trying to "Match size with other pages".
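To show why sorting makes the problem pages easy to spot, here is a rough sketch of two of the orderings mentioned above. How Scan Tailor Advanced actually computes "deviation" is not described in this thread, so the sketch assumes it means distance from the median content size; the page names and sizes are invented.

```python
import re
from statistics import median


def natural_key(name):
    """Split 'p10.png' into ['p', 10, '.png'] so that p2 sorts before p10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def by_decreasing_deviation(pages):
    """pages: list of (name, width, height). The oddest sizes come first."""
    med_w = median(w for _, w, _ in pages)
    med_h = median(h for _, _, h in pages)

    def deviation(page):
        _, w, h = page
        return abs(w - med_w) + abs(h - med_h)

    return sorted(pages, key=deviation, reverse=True)


pages = [
    ("p1.png", 1000, 1600),
    ("p2.png", 1010, 1590),
    ("p10.png", 990, 400),   # a short page: chapter ends after a few lines
    ("p3.png", 1005, 1610),
]
print([name for name, _, _ in sorted(pages, key=lambda p: natural_key(p[0]))])
print([name for name, _, _ in by_decreasing_deviation(pages)])
```

With the deviation ordering, the short chapter-end page jumps to the front of the list instead of hiding among hundreds of normal pages.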

Quote:
Originally Posted by anonlivros View Post
Especially because of the ultra-customizable dewarping as a 3D grid with dozens of adjustment points, unlike the simple 'trapezoidal correction' in FineReader.
Yes, that dewarping by grid is incredible.

When GrannyGrump was working on a digitization of the original Sweeney Todd book:

Rymer, J. M. "The String of Pearls; or, Sweeney Todd, The Demon Barber of Fleet Street"

Here's the original Archive.org PDF:

https://archive.org/details/stringof...e/n13/mode/2up

[Image: Sweeney.Todd.-.Archive.org.PDF[22-23].jpg]
[Image: Sweeney.Todd.-.ScanTailor.Advanced.PDF[22-23].png]

It had 2 large issues solved by Advanced:
  • Left/Right pages were accidentally photographed at different distances.
    • So the Left text was slightly smaller (probably 90% of the size) than the text on the Right pages.
    • I was able to easily find/correct all bad pages by using "sort by width/height" mentioned above.
  • Many curved pages
    • This became even more noticeable because every page's text actually had a rectangle box drawn around it.
    • Luckily, I was able to use Advanced's automatic mode to get me most of the way there, then manual grids to "straighten" based on the rectangle instead.
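The left/right size mismatch is simple arithmetic once the text-block widths are known. The sketch below assumes the pages alternate left, right, left, right and that a width per page has already been measured somehow; the numbers are invented, and this is not a feature of Scan Tailor itself.

```python
from statistics import median


def left_page_scale(widths):
    """widths: text-block widths in page order, starting with a left page.

    Returns the factor to enlarge left pages by so they match right pages.
    """
    left = widths[0::2]   # pages 1, 3, 5, ...
    right = widths[1::2]  # pages 2, 4, 6, ...
    return median(right) / median(left)


widths = [900, 1000, 905, 1000, 895, 1002]
print(round(left_page_scale(widths), 3))
```

Using the median rather than the mean keeps a few odd pages (chapter titles, short last pages) from skewing the factor.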

Finereader only handles that basic "Trapezoidal" problem, if the camera was angled too close to the ground. And the rectangle completely ruined their algorithm to detect wavy text.

Original Scan Tailor didn't have automatic dewarping.

Sadly, I recently cleared up some space and deleted the original source files for that project... or I'd show you more detailed before/afters.

Last edited by Tex2002ans; 04-19-2021 at 04:02 AM.
Old 04-20-2021, 01:14 AM   #10
roger64
Wizard
roger64 ought to be getting tired of karma fortunes by now.
 
Posts: 2,562
Karma: 2999999
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Thanks for your comments. I probably missed something in the new Advanced version. Mine is the "experimental" Scan Tailor version packaged in November 2018.

I also got satisfactory results for OCR purposes with it on the GrannyGrump images in .tif format (3.3 MB each...). The result is OK for Tesseract, so for the time being I'll stay with the old "experimental" version.
Attached Files
File Type: zip mr.zip (6.56 MB, 163 views)

Last edited by roger64; 04-20-2021 at 03:19 AM.
Old 04-20-2021, 04:42 AM   #11
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 1,952
Karma: 8877603
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by roger64 View Post
Thanks for your comments. I probably missed something in the new Advanced version. Mine is the "experimental" Scan Tailor version packaged in November 2018.
Scan Tailor Experimental:

https://github.com/Tulon/scantailor

is a continuation made by one of Scan Tailor's original creators (Tulon).

(He moved on from Scan Tailor for many years, but made a return back in ~2015.)

That version was last updated in 2017.

To see a nice summarized list of all the alternate versions, see the DIY Book Scanner thread: "Feature Comparison for the Various Flavors of Scan Tailor".

From all my testing, Advanced is still the best.

Quote:
Originally Posted by roger64 View Post
I also got satisfactory results for OCR purposes with it on the GrannyGrump images in .tif format (3.3 MB each...). The result is OK for Tesseract, so for the time being I'll stay with the old "experimental" version.
Towards the middle of that book is where things started really getting hairy. You had pages that didn't follow the norms set in the rest of the book, plus large images (hence despeckling sliders were perfect).

Since the Left pages were slightly smaller, Finereader also thought most of the text was "bold", so the text was a big ol' mess!

Anyway, you're severely missing out. Multi-core alone is a reason to use Advanced.

So much of my time was spent staring at the screen while working on large books, or waiting for Output to happen after tweaking an earlier stage slightly.

Upgrade, you won't regret it.

Last edited by Tex2002ans; 04-20-2021 at 04:45 AM.
Old 04-22-2021, 11:13 PM   #12
anonlivros
Junior Member
anonlivros began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Apr 2021
Device: none
Quote:
Originally Posted by Tex2002ans View Post
Luckily, I was able to use Advanced's automatic mode to get me most of the way there, then manual grids to "straighten" based on the rectangle instead.

I would love to watch a video recording of your entire editorial process, even if it was a 4-hour straight video
Old 04-23-2021, 03:32 PM   #13
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 1,952
Karma: 8877603
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by anonlivros View Post
I would love to watch a video recording of your entire editorial process, even if it was a 4-hour straight video
Heh, maybe one of these days!

All my knowledge + lots of tutorials will eventually be compiled on my site (see my signature).

For now, you'll just have to dig through my years of MobileRead posts. If you do a search for this in your favorite search engine:

Code:
problem you're having Tex2002ans site:mobileread.com
you'll probably run across a post I've done.

Another fantastic way is searching Hitch's name too:

Code:
problem you're having Hitch site:mobileread.com
Hitch runs the conversion company BookNook.biz, and has answered a bajillion questions over the years.

Side Note: Actually, if you want, I'd be available for webcam.

I'd be interested in explaining/showing some of my methods to somebody. And we'll both then help each other:

I'll answer any questions you'll have, and you'd be showing me what kinds of questions a new person might have. This would help me reprioritize some of my own thoughts.

I'm going to send you a PM, so we could figure out a time + which program to use.

Edit: Oh, shoot. Looks like I can't send you a PM. Maybe it's because you have too few posts?

Please email me at:

***PUT MY MobileRead USERNAME HERE***+anon@gmail.com

Last edited by Tex2002ans; 04-23-2021 at 03:36 PM.
All times are GMT -4.