Old 07-19-2023, 06:00 PM   #16
Quoth
the rook, bossing Never.
Posts: 11,171
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Quote:
Originally Posted by Karellen View Post
That's great. Thank you @Tex2002ans
I've installed it and did a quick trial run on an image I was previously getting poor results with, and it OCR'd almost perfectly. In the few minutes I fiddled around with it, it seemed pretty easy to use. But I'll spend some time understanding it better.
I just learnt that images OCR better when using a non-compressed / lossless format.
Yes, I always use PNG or TIFF with a flatbed scanner, never JPEG. Some phones and cameras can "save" as PNG or TIFF, but many actually use JPEG as an intermediate format, so you may need to set the quality to 95. This is why an elderly scanner with apparently lower resolution can sometimes give better results, quite apart from the issues of skew and lighting. It is also why, if the book is not valuable, the spine may be cut off to allow flatter pages and possibly a duplex sheet feeder. Only do that with a cheap in-print title.
More modern dedicated scanners based on cameras have built-in lighting, lasers, etc. to ensure de-skewing and even contrast. They are better value for A3 and are needed for books you can't cut up.

A PNG is typically one image per page. The TIFF format, and a motion PNG format equivalent to animated GIF, can hold an entire book in one file. Both do lossless compression and will compress white space or solid black almost completely, so good illumination is important.
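
A minimal sketch of why even illumination matters for lossless compression. It uses Python's zlib (DEFLATE, the compression method PNG is built on) on a made-up 200x200 grayscale "page". The page size, gray levels, and the `deflate_ratio` helper are assumptions for illustration only, and real PNG/TIFF encoders add filtering and other steps that are skipped here.

```python
import random
import zlib


def deflate_ratio(pixels: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(pixels, 9)) / len(pixels)


WIDTH, HEIGHT = 200, 200  # hypothetical tiny 8-bit grayscale "page"

# Evenly lit page: every background pixel is the same pure white.
clean_page = bytes([255]) * (WIDTH * HEIGHT)

# Unevenly lit page: background pixels wander between light grays.
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy_page = bytes(rng.randint(200, 255) for _ in range(WIDTH * HEIGHT))

clean_ratio = deflate_ratio(clean_page)
noisy_ratio = deflate_ratio(noisy_page)
print(f"clean: {clean_ratio:.4f}  noisy: {noisy_ratio:.4f}")
```

The uniform page shrinks to a tiny fraction of its size, while the unevenly lit one barely compresses at all, even though both are "blank" to the eye.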

Last edited by Quoth; 07-19-2023 at 06:03 PM.
Old 07-19-2023, 09:58 PM   #17
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by WV-Mike View Post
Whew! This is all a bit overwhelming.
I looked at 4lex4 / scantailor-advanced but I cannot see how to use it.
Yep, like others said...

To Download Scan Tailor Advanced

On the right side, you should see "Releases". That leads you to this page with a list of EXEs:

You might need to expand the spot where it says "Assets".
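
For anyone who would rather script the "Releases > Assets" hunt than click through it, here is a rough sketch. It assumes the JSON shape returned by the GitHub REST API endpoint `GET /repos/{owner}/{repo}/releases/latest`, which lists each file under `assets` with a `name` and a `browser_download_url`. The `pick_assets` helper and the sample payload (tag, file names, URLs) are made up for illustration, and no network request is made.

```python
import json


def pick_assets(release: dict, suffix: str = ".exe") -> list[str]:
    """Return download URLs of release assets whose file name ends with suffix."""
    return [
        asset["browser_download_url"]
        for asset in release.get("assets", [])
        if asset["name"].lower().endswith(suffix)
    ]


# Hypothetical, trimmed-down payload in the shape of a "latest release" response.
# The tag, names, and URLs below are invented placeholders.
sample = json.loads("""
{
  "tag_name": "v0.0.0",
  "assets": [
    {"name": "Example-Tool-x64.exe",
     "browser_download_url": "https://example.invalid/Example-Tool-x64.exe"},
    {"name": "Example-Tool.tar.gz",
     "browser_download_url": "https://example.invalid/Example-Tool.tar.gz"}
  ]
}
""")

installers = pick_assets(sample)
print(installers)
```

To use it for real, you would fetch the JSON for the repository you care about and pass the parsed result to `pick_assets`; picking the right suffix for your platform is still up to you.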

- - -

If by "how to use it", you meant Tutorials... then yeah... there aren't many good step-by-step explanations of Scan Tailor out there.

Back when I responded to anonlivros's fantastic:

I contacted him + showed him how to use it over screenshare—answering any questions he had live—but I never formalized the instructions/tips anywhere.

Since then, I explained a little more detail on the overall process:

but again, nothing specific on tricks I've learned inside of Scan Tailor Advanced. I'd just suggest poking around.

- - -

Side Note: Perhaps one of these days, I'll finally write down and formalize this Scan Tailor stuff.

Recently, my time has been focusing less on MR... and more on helping LibreOffice.

Within the past 2 years, I've written nearly 1000 posts about all sorts of random LibreOffice questions!!!

Like the ultimate:

I've been refining all my Documentation/Technical Writing skills—just haven't turned them back towards MobileRead/Sigil/Calibre and all my favorite ebook tools... yet!

- - -

Quote:
Originally Posted by WV-Mike View Post
FineReader is now by subscription only. It seems they all are now.
Oh gods, what the hell have they done... I haven't visited their site in a few years.

I guess Finereader 16 jumped ship to that horrible yearly subscription fee nonsense.

Finereader 15 is then the last version that is standalone, so I'd recommend seeing if you can get a copy of that.

- - -

Personally, I still use Finereader 12, which is the version I purchased at the time.

Finereader 13->14 introduced a few minor features that I didn't feel were huge enhancements.

Finereader 15 introduced a lot more PDF + PDF comparison stuff, so I was tempted to upgrade, just never got around to purchasing it.

Quote:
Originally Posted by WV-Mike View Post
To be clear, this software preps the images prior to running OCR software.

Is that correct?
Yes. Scan Tailor Advanced is just a COMPLETELY OPTIONAL program or step.

If your original scan/PDF is fine, then you can just feed that right into your OCR.

But if you did things like:
  • Take photos of your pages using a camera.
  • Have wobbly/crooked/tilted pages.
  • Have 2 pages in 1 picture, with spine showing down the middle.
  • Have 1.5 pages showing, with page's edges inside the photo.
  • Dark speckles/spots/dust all around your pages.
  • [...]

Scan Tailor can help clean that type of stuff up, so when you DO feed it into OCR, the OCR has a much easier time and can be more accurate.

Last edited by Tex2002ans; 07-19-2023 at 10:19 PM.
Old 07-20-2023, 06:45 AM   #18
WV-Mike
Connoisseur
Posts: 66
Karma: 10
Join Date: Jul 2023
Device: None
From print to ePub - how I did it

Quote:
Originally Posted by Karellen View Post
Try this OCR package... https://github.com/manisandro/gImageReader

As with all things Github, along the right side of the page you will see Releases. Click on that, look for the latest version which is usually at the top or second one down, expand the Assets button and download the appropriate installer.
Thanks, Karellen.
WV-Mike
Old 07-20-2023, 06:53 AM   #19
WV-Mike
Connoisseur
Posts: 66
Karma: 10
Join Date: Jul 2023
Device: None
Quote:
Originally Posted by Tex2002ans View Post
Yep, like others said...

Oh gods, what the hell have they done... I haven't visited their site in a few years.
I guess Finereader 16 jumped ship to that horrible yearly subscription fee nonsense.
I am hoping to find a used copy of the CD with KEY.
Anyone out there know of one which is available?

Thanks,
WV-Mike
Old 07-20-2023, 08:13 AM   #20
Turtle91
A Hairy Wizard
Posts: 3,101
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
The only advice you'll get here on MR about getting a KEY is to legally purchase the software. No discussion of pirating is allowed.

If you are trying to follow tex's advice on obtaining Finereader 15 then I would just avail yourself of Mr Google or one of his cousins. They do fine work. (see what I did there? )
Old 07-20-2023, 09:31 AM   #21
WV-Mike
Connoisseur
Posts: 66
Karma: 10
Join Date: Jul 2023
Device: None
Quote:
Originally Posted by Turtle91 View Post
The only advice you'll get here on MR about getting a KEY is to legally purchase the software. No discussion of pirating is allowed.

If you are trying to follow tex's advice on obtaining Finereader 15 then I would just avail yourself of Mr Google or one of his cousins. They do fine work. (see what I did there? )
I have done quite a few searches and have found nothing but very old software.

Thanks,
WV-Mike
Old 07-21-2023, 02:11 PM   #22
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
I randomly stumbled upon this video today.

I thought it did a pretty great job showing off the basics of Scan Tailor Advanced's steps:

He showed off how to split pages, reorient, add a box around the content, etc.

Even I learned a little something: I had no idea "Fill Zones" even existed—so I'll be using that in the future.

Towards the end, he even quickly showed "the equivalent steps" using Finereader, and you can see how much better/easier Scan Tailor is for cleaning up scans:
  • Finereader is "once you edit/change the image, that's it, you can't go back"

where:
  • Scan Tailor lets you readjust and fix any image at any stage, then just reoutput.

- - -

Note: I wouldn't follow a lot of his advice on "low DPI"... or how he exports images out of PDF (using low quality JPGs is going to introduce a lot more errors).

But overall, I thought the Scan Tailor parts were a great beginner intro.

The rest of his video, you can take with a huge grain of salt.

Last edited by Tex2002ans; 07-21-2023 at 03:42 PM.
Old 07-22-2023, 02:28 AM   #23
Karellen
Wizard
Posts: 1,107
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
That is a great video @Tex2002ans, really helpful.

Today I decided to create a better workflow with this new information.

Firstly, I had to do something about my prehistoric scanner. The interface is non-existent. Yep, when Adobe Flash Player was removed from Windows, I lost access to the scanner's GUI. Up until now I've been using the WIA function in Photoshop. Not ideal.

1. So, hunting around for new WIA-compliant scanner software, I found this...
https://www.naps2.com/
Simple and easy to use. The BEST feature is that it can batch scan. You enter how many scans to make, how many seconds between scans (6 sec in my case) and press Start. All you need to worry about is turning pages in that 6 seconds. In a matter of a few minutes 15 scans have been completed (30 book pages).

2. Once those scans are created, it's time for Scan Tailor Advanced.
It is very quick and simple with all the batch processes. In a few minutes 30 pages are turned into OCR-ready TIFF images.

3. Then on to gImageReader...
https://github.com/manisandro/gImageReader/
Batch OCR the 30 pages.

4. Next comes LibreOffice, and the OCR text is copied across.
This is where it becomes quite time-consuming: fixing all those little OCR errors, then marking the chapter headings. Once done, export to EPUB.
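
The batch OCR in step 3 can also be sketched on the command line. gImageReader is a front-end to Tesseract, so the snippet below builds one `tesseract IMAGE OUTPUTBASE -l LANG` command per page image. The file names, the `text` output folder, and the `tesseract_commands` helper are hypothetical. The code only prints the commands, and actually running them assumes Tesseract is installed and on your PATH.

```python
from pathlib import Path


def tesseract_commands(pages: list[str], out_dir: str, lang: str = "eng") -> list[list[str]]:
    """Build one `tesseract IMAGE OUTPUTBASE -l LANG` command per page image."""
    commands = []
    for page in sorted(pages):  # sort so pages are processed in file-name order
        out_base = Path(out_dir) / Path(page).stem  # tesseract appends ".txt" itself
        commands.append(["tesseract", page, out_base.as_posix(), "-l", lang])
    return commands


# Hypothetical file names standing in for Scan Tailor Advanced output.
pages = ["out/page_002.tif", "out/page_001.tif"]
commands = tesseract_commands(pages, "text")
for cmd in commands:
    print(" ".join(cmd))
    # To really run it: import subprocess, then subprocess.run(cmd, check=True)
```

This gives one plain-text file per page, which still needs the same manual cleanup as text copied out of gImageReader.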

Of course, the ebook needs a bit of work for a good quality final product, but my main concern was the OCR side.

I have previously tried to scan pages from books, but it was a very frustrating experience, and I spent close to three hours to scan 20 pages and add them to an ebook. I realise now I attempted this without the right knowledge and tools. So thanks for all the great pointers!!

If there is anything in my workflow that could be improved, please let me know.
Old 07-22-2023, 03:38 AM   #24
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Karellen View Post
1. So, hunting around for new WIA-compliant scanner software, I found this...
https://www.naps2.com/
Yes, NAPS2 is awesome. I've been using that for the past few years as well.

It:
  • helps gather/reorder images
  • can create a rough PDF from it
  • can even produce a quick OCR for you (using Tesseract).

I use it when I need to create a rough PDF from an actual scanner, or, to quickly crop/edit photos taken from a camera.

Like if my family gives me a small/short document to scan, I just use NAPS2 instead of busting out the full-blown editing + OCR tools!

Quote:
Originally Posted by Karellen View Post
I realise now I attempted this without the right knowledge and tools. So thanks for all the great pointers!!

If there is anything in my workflow that could be improved, please let me know.
Read the linked threads. There's years and years of knowledge I buried in there about every step of the workflow.

Edit in Word/LibreOffice (DOCX) or Sigil/Calibre (EPUB)?

In the DOCX stage, if that's where you prefer to do your edits...

LibreOffice has Regular Expressions, so if you know how to master those, you can do lots of mass corrections in there.

LibreOffice's Regex is SO MUCH better than Word's Wildcards... but it still has limitations. So...

Personally, I do all edits in Sigil/Calibre, because you have full access to:

And since you're working directly in HTML, nothing can hide from you.
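
To give a flavour of the regex "mass corrections" mentioned above, here is a small sketch in Python. The correction list is a made-up example of typical OCR slips, not a list taken from this thread, and the `apply_corrections` helper is hypothetical. The patterns are ordinary regex syntax, but whether they carry over unchanged to LibreOffice or Sigil is an assumption you should test on a copy first.

```python
import re

# Hypothetical examples of typical OCR slips; a real list comes from your own book.
CORRECTIONS = [
    (re.compile(r"\btbe\b"), "the"),        # "h" misread as "b"
    (re.compile(r"\bTbe\b"), "The"),
    (re.compile(r"[ \t]{2,}"), " "),        # runs of spaces/tabs become one space
    (re.compile(r"\s+([,.;:!?])"), r"\1"),  # stray whitespace before punctuation
]


def apply_corrections(text: str) -> str:
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in CORRECTIONS:
        text = pattern.sub(replacement, text)
    return text


sample = "Tbe cat sat on  tbe mat , and slept ."
cleaned = apply_corrections(sample)
print(cleaned)
```

The word boundaries (`\b`) are what keep a fix like "tbe" from touching longer words that merely contain those letters.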

For more on Regex + Spellcheck Lists, and even how to take advantage of some of this stuff in LibreOffice... see my post in:

If you follow the pyramid of links, it'll:
  • Summarize how/why they're helpful.
  • + link to many other MobileRead topics where I've written about it.

My Current PDF->EPUB Workflow

I settled on:
  • PDF -> Finereader to OCR
  • -> DOCX
  • -> Word / Toxaris's EPUB Tools
  • -> EPUB.

where:
  • PDF -> Finereader gives me fantastic OCR.
  • Finereader -> DOCX carries over most of the text/formatting.
    • Note: Finereader -> EPUB, at least in 12, was a little buggy, so you had potential to lose chunks of text/footnotes. Maybe things got better in 15+.
  • Toxaris's EPUB Tools are specifically built to fix lots of OCR/Finereader's quirks.
    • Merging split pages, fixing lists, fixing hard/soft hyphens, normalizing fonts/font sizes, removing font colors, [...].
    • (This saves TONS OF TIME from manual cleanup of simple OCR/formatting errors.)
  • Toxaris -> EPUB = incredibly clean HTML, carrying over the Styles + leaving you with barebones formatting (<h1>s + <i> + <b>).

This gives me extremely clean HTML code—with almost all the trash removed—so when I begin editing EPUB, I can focus purely on:
  • fixing the text
  • + reintroducing actual formatting I wanted to maintain, like blockquotes.

Cutting down on all the wasted in-between cleanup/repairing time drastically.
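
As a rough idea of what "fixing hard/soft hyphens" involves, here is a minimal Python sketch. It is not how Toxaris's EPUB Tools does it (the source is not shown in this thread). It is just one simple approach: strip soft hyphens (U+00AD) and rejoin words that were split by a hyphen at a line end. The `fix_hyphens` helper and the sample sentence are made up.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible "optional break" character that OCR output often keeps


def fix_hyphens(text: str) -> str:
    """Drop soft hyphens and rejoin words split by a hyphen at a line end."""
    text = text.replace(SOFT_HYPHEN, "")
    # lowercase letter + "-" + newline + lowercase letter becomes one joined word
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


sample = "The scan\u00adner was recom-\nmended by a well-known forum."
fixed = fix_hyphens(sample)
print(fixed)
```

Note the blind spot: a genuine compound that happens to break at a line end (say "well-" then "known") would also be joined, so results like these still need a spellcheck pass.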

- - -

Side Note: Sadly, Toxaris's EPUB Tools is now abandoned + will not be getting support (or the much-anticipated version 2 release).

I did recover and share one of the final versions of EPUBTools (v1.27.1) in:

You could also still read Toxaris's original "EPUBTools" MobileRead thread or visit his (now-dead) website via Archive.org:

The instant I finally gave in and began using this, it fully converted me. It was just SO MUCH BETTER than the manual cleanup I was doing before.

And the "Dialogue Checker" alone is the best dang thing since sliced bread:

To even APPROXIMATE that same type of "find the mismatching quotation marks" functionality... this is the kind of steps + Regexes you'd need to use:

and that still doesn't even get close to what Toxaris solved with his amazing cleanup tool.
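
For a very crude approximation of the "find the mismatching quotation marks" idea, the sketch below flags paragraphs whose double quotes do not pair up. This is only a simple count, not Toxaris's Dialogue Checker, and the `unbalanced_quotes` helper and sample paragraphs are invented for illustration.

```python
def unbalanced_quotes(paragraphs: list[str]) -> list[int]:
    """Return indexes of paragraphs whose double quotes do not pair up."""
    flagged = []
    for index, para in enumerate(paragraphs):
        # Curly quotes: opening and closing counts should match.
        curly_mismatch = para.count("\u201c") != para.count("\u201d")
        # Straight quotes: an odd count means one is missing.
        straight_odd = para.count('"') % 2 == 1
        if curly_mismatch or straight_odd:
            flagged.append(index)
    return flagged


sample = [
    "\u201cHello,\u201d she said.",
    "\u201cWhere are you going? he asked.",  # closing quote missing
    'He said "fine" and left.',
    'She replied, "not yet.',                # odd number of straight quotes
]
flagged = unbalanced_quotes(sample)
print(flagged)
```

Expect false positives: dialogue that runs across several paragraphs legitimately opens each paragraph with a quote and only closes the last one.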

- - -

Side Note #2: If you want more random EPUB productivity tips, also see my posts in:

Last edited by Tex2002ans; 07-23-2023 at 01:48 AM.
Old 08-22-2023, 12:56 PM   #25
jwes
Enthusiast
Posts: 39
Karma: 10
Join Date: Jul 2023
Device: none
Quote:
Originally Posted by Tex2002ans View Post
Fantastic! Congrats.
Boy, oh boy... Well, you've come to the right place.

I've been writing about this stuff extensively since 2012.
Have you thought about assembling your posts into an epub or a wiki? I clicked on a link which led me to a post with several more interesting links, which . . . until I felt I was in a maze of twisty (not so) little forum posts, all different.
Old 08-22-2023, 01:36 PM   #26
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by jwes View Post
Have you thought about assembling your posts into an epub or a wiki? I clicked on a link which led me to a post with several more interesting links, which . . . [...]
Yes! See link in my signature.

My blog, Digital Slug—where I'll be collecting + reorganizing all my ebook knowledge—will be coming.

- - -

Like I said in Post #17, the past two years, I've mostly been focusing on LibreOffice tutorials/info + really boosting my Technical Writing skills.

When the blog eventually comes, I'll also be rewriting the ebook information in a much more easy-to-digest form.

So instead of having the knowledge spread across 2000+ MobileRead posts and 1100+ Reddit posts, it'll be gathered in one location—the blog!

Quote:
Originally Posted by jwes View Post
[...] until I felt I was in a maze of twisty (not so) little forum posts, all different.
Yes, a similar thing happens to me when I'm searching for old info.

The frustrating thing is the titles of many of the MobileRead/LibreOffice topics have nothing to do with the underlying answers.

So while I know a given answer is buried in the replies, most others might not.

(For example, there might be some godly answer about italics/emphasis, but it was a side-discussion happening while answering Questions A, B, and C.)

Part of what I'm aiming to do with the blog is gathering/pulling out all that info I've written about over the years, making it much easier to read and search through. Then, I'd bring it all up to the latest standards / best practices too!

So if you came across some older MR post from 2016, I probably came up with much better ways/explanations since then!

Last edited by Tex2002ans; 08-22-2023 at 01:38 PM.
Old 08-25-2023, 01:54 AM   #27
AlanHK
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Quote:
Originally Posted by jwes View Post
Have you thought about assembling your posts into an epub or a wiki?

Such as adding to
https://wiki.mobileread.com/wiki/Dig...ooks_to_Ebooks