#1
Junior Member
Posts: 5
Karma: 40142
Join Date: Apr 2021
Device: none
Tutorial: From Paper Book to Ebook PDF - 400 pages in 4 hours

Hey guys,

There are many techniques for scanning a book, but most of them either require buying expensive equipment or take a long time to finish the job. With that in mind, I drew on years of experience to write a tutorial that lets you build a PDF ebook from:

- Paper book
- Gooseneck cell phone holder
- Computer

Productivity: 400 pages in 4 hours, with high visual quality.

The tutorial is very detailed, at more than 30 pages, and is still being expanded. If at least one person is satisfied after producing their first PDF ebook from this tutorial, all the hours spent developing it will have been worth it.

Unfortunately, the material is currently in Portuguese, but it will soon be properly translated into English. In the meantime, you can check out the auto-translated version at the link below:

https://translate.google.com/transla...ros/anonlivros

I wish you all the best.

Keep books free

t.me/anonlivros
#2
Enthusiast
Posts: 37
Karma: 10
Join Date: Apr 2018
Device: Samsung Galaxy Tab S2, iPad 2 (Bluefire Reader); fire hd 10, Windows
Thanks friend, this looks quite helpful. I'm thinking of setting up a little scan-workshop soon and this will be my guide.
#3
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You may also want to look at HowTo: Create an eBook in our wiki. It now includes a link to the above site and also has a lot of additional information.
#4
Wizard
Posts: 2,244
Karma: 11708297
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:

Seems like there are good ideas in there, especially using things you may already have (cellphone camera + lamp).

I didn't think of voice activation. That's a great idea. One of the problems is the camera moving because you're pressing buttons or touching the screen.

Quote:

2020: "OCRing + EPUBing my first book: Tips?" (Especially my Posts #12+#15)
2020: "Optimize PDFs from archive.org for E-Ink devices"

I find Scan Tailor Advanced better at cropping/dewarping/deskewing compared to Finereader's built-in tools.

And I like the little link at the end of the tutorial (I do lots of ebook conversions for Mises!)... although does it really belong?

(Kinsella's v2.0 of the book is coming out soon!)

Complete Side Note:

Spoiler:

Last edited by Tex2002ans; 04-14-2021 at 04:23 PM.
#5
Junior Member
Posts: 5
Karma: 40142
Join Date: Apr 2021
Device: none
Thanks to all.

Quote:

Do you recommend making these edits with it, to then generate PNG pages that would be further processed by FineReader?

And thanks for the reading recommendations. I really enjoyed them.
#6
Wizard
Posts: 2,244
Karma: 11708297
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:

Instead of:

1. Take your pictures.
2. Use Finereader to crop, dewarp, change to B&W, [...].

You would:

1. Take your pictures.
2. Use Scan Tailor Advanced to crop, dewarp, turn images B&W, [...].
3. Feed those into Finereader.

* * *

You can see some example images I posted in Post #15 in the "OCRing + EPUBing my first book: Tips?" thread:

So pictures of pages like this: or this: could turn into this:

* * *

Instead, when you only use Finereader's built-in stuff, you get pages like this: vs.

Of course, those images are easy examples. But when you have pages that are:
you'll see how much better Scan Tailor is at those steps.

Plus, with Scan Tailor, you can adjust all the sliders at every step, or even use different settings on a per-page basis.

So let's say one page had lots of speckles (tiny dots):

https://www.mobileread.com/forums/at...3&d=1567734681

You could set the despeckling strength to very high, so you might get something like this:

https://www.mobileread.com/forums/at...4&d=1567734681

You can adjust the strength so it catches the speckles, while still leaving the actual "." (periods).
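To make the speckle-versus-period trade-off concrete, here is a minimal sketch of size-based despeckling in plain Python. It is only an illustration of the general idea, not Scan Tailor's actual algorithm, and the names `despeckle` and `max_speckle_area` are hypothetical. It assumes the page is already black and white, stored as rows of 0 (white) and 1 (black).

```python
from collections import deque


def despeckle(image, max_speckle_area):
    """Return a copy of a binary image with small black blobs erased.

    image: list of rows, each a list of 0 (white) or 1 (black).
    max_speckle_area: blobs with this many pixels or fewer are removed
    (a stand-in for a "despeckling strength" setting, by assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill (8-connected) to collect one blob of black pixels.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the blob only if it is small enough to count as a speckle.
            if len(blob) <= max_speckle_area:
                for by, bx in blob:
                    result[by][bx] = 0
    return result


# A 1-pixel speckle (top left) and a 2x2 "period" (right).
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = despeckle(page, max_speckle_area=1)
```

With a threshold of 1 the lone pixel is erased and the 2x2 period survives; raising the threshold to 4 would erase the period as well, which is why the strength has to be tuned per book or per page.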
Quote:

"Kinsella on Liberty 328 | Heterodorx Ep. 10 with Nina Paley: I.P. Everywhere!"

Last edited by Tex2002ans; 04-16-2021 at 02:09 PM.
#7
Junior Member
Posts: 5
Karma: 40142
Join Date: Apr 2021
Device: none
Thank you for putting so much effort into such a complete and example-filled response.
I watched some tutorial videos about Scan Tailor Advanced and was amazed by its potential for customization, especially the highly customizable dewarping, which uses a 3D grid with dozens of adjustment points, unlike FineReader's simple 'trapezoidal correction'.

I am delighted to have found this MobileRead forum. Thank you for sharing your knowledge here.
#8
Wizard
Posts: 2,583
Karma: 2999999
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Some weeks ago, Scan Tailor Advanced replaced Scan Tailor for Arch Linux users.

However, I found it much more complex to use, and I reverted to the old Scan Tailor package. I certainly do not wish to indulge in page-by-page settings with many manual tweaks, though I willingly admit that some botched/damaged pages may require special treatment. For example, speckles may be found on all pages (more or less), and the same goes for yellow backgrounds.

My two centimes.
#9
Wizard
Posts: 2,244
Karma: 11708297
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:

It would act the same as the old Scan Tailor. Below almost all settings, there's a little "Apply To..." button, and you get those options.

Quote:

All those original steps are there, but with a few more optional bells and whistles added on top. For example:

From what I recall, Scan Tailor Advanced (and the other forks) carried over everything from the original Scan Tailor, plus added better features on top. I haven't used the original in such a long time though, so I don't remember... but I do remember upgrading to Enhanced when I saw how much better it was, then was sold on Advanced the instant it was mentioned to me in 2018.

Quote:

Various forks were created (Featured + Enhanced + Universal), but even those slowly became abandoned. Scan Tailor Advanced merged all those enhancements together into one ultimate package + a ton more.

There's a big list of features on the main GitHub page:

https://github.com/4lex4/scantailor-advanced

but the largest for me being:

So many other little QoL improvements too, like being able to sort images in the Margins step:

Again, the original had no choice. But Advanced includes the original functionality AND more:

This becomes invaluable for finding/correcting some of the bad pages. On the vast bulk of pages, the text takes up the entire page, but some might only include chapter titles, a single small blockquote, or a few sentences before the chapter ends. The original Scan Tailor would completely botch those pages during the Margins step, and it would be difficult to even spot which pages were the problem when trying to "Match size with other pages".

Quote:

When GrannyGrump was working on a digitization of the original Sweeney Todd book:

Rymer, J. M. "The String of Pearls; or, Sweeney Todd, The Demon Barber of Fleet Street"

Here's the original Archive.org PDF:

https://archive.org/details/stringof...e/n13/mode/2up

It had 2 large issues solved by Advanced:

Finereader only handles that basic "Trapezoidal" problem, if the camera was angled too close to the ground. And the rectangle completely ruined their algorithm to detect wavy text. Original Scan Tailor didn't have automatic dewarping.

Sadly, I recently cleared up some space and deleted the original source files for that project... or I'd show you more detailed before/afters.

Last edited by Tex2002ans; 04-19-2021 at 03:02 AM.
#10
Wizard
Posts: 2,583
Karma: 2999999
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Thanks for your comments. I probably missed something with the new Advanced version. Mine is the "experimental" Scan Tailor version packaged in November 2018.

I also got satisfactory results for OCR purposes with it on the GrannyGrump images in .tif format (3.3 MB each...). The result is OK for Tesseract.

So, for the time being, I'll stay with the old "experimental" version.

Last edited by roger64; 04-20-2021 at 02:19 AM.
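For anyone wanting to try the same Scan Tailor to Tesseract hand-off, here is a minimal sketch of batch OCR over a folder of .tif pages. It assumes the `tesseract` command-line program is installed and on PATH, and it uses the standard `tesseract IMAGE OUTPUTBASE -l LANG` invocation. The helper names and the folder name `out` are hypothetical, and this is not necessarily how the poster ran it.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, lang="eng"):
    """Build the command line for one page.

    Tesseract writes its text to OUTPUTBASE.txt, so the output base is
    simply the image path without its extension.
    """
    image = Path(image_path)
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_folder(folder, lang="eng"):
    """Run Tesseract on every .tif in a folder, in filename order."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    for image in sorted(Path(folder).glob("*.tif")):
        # check=True stops the batch if Tesseract fails on a page.
        subprocess.run(tesseract_command(image, lang), check=True)


# Example (hypothetical folder name):
# ocr_folder("out", lang="eng")
```

Each page ends up with a matching .txt file next to the image, which makes it easy to spot pages where the cleanup step left too much noise for the OCR.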
#11
Wizard
Posts: 2,244
Karma: 11708297
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:

https://github.com/Tulon/scantailor is a continuation made by one of Scan Tailor's original creators (Tulon). (He moved on from Scan Tailor for many years, but made a return back in ~2015.) That version was last updated in 2017.

To see a nice summarized list of all the alternate versions, see the DIY Book Scanner thread: "Feature Comparison for the Various Flavors of Scan Tailor".

From all my testing, Advanced is still the best.

Quote:

Since the left pages were slightly smaller, Finereader also thought most of the text was "bold", so the text was a big ol' mess!

Anyway, you're severely missing out. Multi-core alone is a reason to use Advanced.

So much of my time was spent staring at the screen while working on large books, or waiting for Output to happen after tweaking an earlier stage slightly.

Upgrade, you won't regret it.

Last edited by Tex2002ans; 04-20-2021 at 03:45 AM.
#12
Junior Member
Posts: 5
Karma: 40142
Join Date: Apr 2021
Device: none
Quote:
I would love to watch a video recording of your entire editorial process, even if it were a straight 4-hour video.
#13
Wizard
Posts: 2,244
Karma: 11708297
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:

All my knowledge + lots of tutorials will eventually be compiled on my site (see my signature).

For now, you'll just have to dig through my years of MobileRead posts. Try searching for this in your favorite search engine:

Code:
problem you're having Tex2002ans site:mobileread.com

Another fantastic way is searching Hitch's name too:

Code:
problem you're having Hitch site:mobileread.com

Side Note: Actually, if you want, I'd be available for webcam. I'd be interested in explaining/showing some of my methods to somebody. And we'd both help each other: I'll answer any questions you have, and you'd be showing me what kinds of questions a new person might have. This would help me reprioritize some of my own thoughts.

I'm going to send you a PM, so we can figure out a time + which program to use.

Edit: Oh, shoot. Looks like I can't send you a PM. Maybe it's because you have too few posts? Please email me at:

***PUT MY MobileRead USERNAME HERE***+anon@gmail.com

Last edited by Tex2002ans; 04-23-2021 at 02:36 PM.
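The two searches above follow one pattern: your problem, a username, and a `site:` filter. As a small sketch, here is one way to build such a search link in Python. The function name is hypothetical, and DuckDuckGo's `?q=` URL is only an assumed example of a "favorite search engine"; swap in whichever engine you use.

```python
from urllib.parse import urlencode


def forum_search_url(problem, username, site="mobileread.com",
                     engine="https://duckduckgo.com/"):
    """Build a search URL for '<problem> <username> site:<site>'."""
    query = f"{problem} {username} site:{site}"
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return engine + "?" + urlencode({"q": query})


print(forum_search_url("scan tailor dewarping", "Tex2002ans"))
# https://duckduckgo.com/?q=scan+tailor+dewarping+Tex2002ans+site%3Amobileread.com
```

Replacing the username with "Hitch" gives the second search from the post.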