
Go Back   MobileRead Forums > E-Book General > General Discussions

Old 11-05-2021, 07:52 PM   #46
ownedbycats
Custom User Title
Posts: 8,419
Karma: 59666665
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
Quote:
Originally Posted by retiredbiker:
The OCR Internet Archive uses is so bad that it is easier to do it myself than to try to correct any of their text formats, especially magazines. Even text scraped off a PDF is full of errors, so I start with one of the cbr or cbz files. If only a PDF is available, I use pdfimages to get the images out and work from those.

For the OCR, I use tesseract with a GUI front end called OCRFeeder (on Linux). On each page I select the text I want and then recognise it. This lets me do multi-column magazines, avoid advertisements, deal with "continued on page 161", and so on. I copy each column of text into LO Writer. It's pretty fast: for a two-column magazine page I average about 50 seconds for the select-recognise-copy-paste part. At the end I convert the odt file from Writer to epub using Calibre, and touch it up in the editor.

OCRFeeder does a great job of finding correct paragraphs, dealing with end-of-line hyphens, and so on, so there is very little detail formatting needed. I've a handful of saved styles in Writer for chapter headings, notes or letters or signs in the text, poetry and so on.

Of course there are scannos; proofreading the result is by far the most time-consuming part of the work. The clarity of the original print job and of the image determines how many errors you get. A really, really clear image of excellent printing might give 1 or 2 errors per page, but if the print is very blurry and there are lots of dirty marks, it could be 100 per page. So there are some source files I'll look at and say "no thanks".

Labour intensive, yes, but it's a hobby. I might spend 4 or 5 days on an issue of something like Dime Detective, with maybe 8 stories, 75,000 words and 12 illustrations.
Also, what is up with Internet Archive's PDF compression? It rarely renders correctly on my Kobo, and instead I just get an image of text smudges. Even on my PC it's slow to render.

It doesn't matter much now because they semi-recently removed the option to ADE-download (and thus sideload) most of their Open Library books, but it still affects the public-domain stuff.

Last edited by ownedbycats; 11-06-2021 at 01:27 AM.
Old 11-05-2021, 09:43 PM   #47
retiredbiker
Addict
Posts: 378
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Ubuntu, Jutoh,Kobo Forma
Quote:
Originally Posted by ownedbycats:
Also, what is up with Internet Archive's PDF compression? It rarely renders correctly on my Kobo, and instead I just get an image of text smudges. Even on my PC it's slow to render.
If you run pdfimages on one of these, you get out all sorts of crap. Black images, images of blurry smudges, images of real text, often inverted to white on black. Formats are mostly .ppm and .pbm. Without going into detail, I use ImageMagick, mostly, to end up with just the images I want. I read somewhere that the non-text images are masks, but I have no idea how a pdf uses them. I haven't seen this elsewhere, only from IA.

I've never tried one on my Kobo. They display well in my PC's Document Viewer. Sounds like the Kobo is displaying those blurry smudgy ones.
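
For anyone who wants to see which embedded images are page scans and which are masks before extracting everything, here is a minimal Python sketch. It assumes poppler's pdfimages is on the PATH and that its -list option prints a whitespace-separated table whose first columns are page, num, type, width, height, color, comp, bpc and enc. The function names and the sample listing are made up for illustration; the sample is not output from a real Internet Archive file.

```python
import subprocess
from collections import Counter

# Made-up listing in the layout `pdfimages -list` is assumed to print.
SAMPLE_LISTING = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     850  1100  rgb     3   8  jpx    no        10  0   100   100 20.1K 0.7%
   1     1 image    2550  3300  rgb     3   8  jpx    no        11  0   300   300 35.4K 0.1%
   1     2 mask     2550  3300  -       1   1  jbig2  no        12  0   300   300 18.2K 1.8%
"""


def parse_pdfimages_list(text):
    """Turn the table printed by `pdfimages -list` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a page number; this skips the header and the dashed rule.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        rows.append({
            "page": int(fields[0]),
            "num": int(fields[1]),
            "type": fields[2],      # e.g. image, mask, smask, stencil
            "width": int(fields[3]),
            "height": int(fields[4]),
            "color": fields[5],
            "bpc": int(fields[7]),  # bits per component
            "enc": fields[8],       # encoding of the embedded stream
        })
    return rows


def count_types(rows):
    """Count how many embedded images there are of each type."""
    return dict(Counter(row["type"] for row in rows))


def list_pdf_images(pdf_path):
    """Run `pdfimages -list` on a file and parse what it prints."""
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        check=True, capture_output=True, text=True,
    )
    return parse_pdfimages_list(result.stdout)


if __name__ == "__main__":
    print(count_types(parse_pdfimages_list(SAMPLE_LISTING)))
```

Rows whose type is mask, or whose bpc is 1, would be the 1-bit text layers; the 8-bit images are the candidates for the blurry smudges.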
Old 11-06-2021, 01:18 AM   #48
ownedbycats
Custom User Title
Posts: 8,419
Karma: 59666665
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
I seem to vaguely recall something about Internet Archive using some sort of proprietary compression tech, but I'm having a hard time trying to pull enough details from my memory to be able to actually find anything on it - it's possible I might've posted before about it so I'll look.

Last edited by ownedbycats; 11-06-2021 at 01:20 AM.
Old 11-06-2021, 01:02 PM   #49
j.p.s
Grand Sorcerer
Posts: 5,262
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by DyckBook:
Well, that answers my question of when this started.
My two questions are:

1. Why would any publisher/author think this enhances a book in any way?

2. Does any book reader actually find all capital letters in the first several words (or more) of a chapter pleasing?
Old 11-06-2021, 01:17 PM   #50
j.p.s
Grand Sorcerer
Posts: 5,262
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by ownedbycats:
Also, what is up with Internet Archive's PDF compression? It rarely renders correctly on my Kobo, and instead I just get an image of text smudges. Even on my PC it's slow to render.
Quote:
Originally Posted by retiredbiker:
If you run pdfimages on one of these, you get out all sorts of crap. Black images, images of blurry smudges, images of real text, often inverted to white on black. Formats are mostly .ppm and .pbm. Without going into detail, I use ImageMagick, mostly, to end up with just the images I want. I read somewhere that the non-text images are masks, but I have no idea how a pdf uses them. I haven't seen this elsewhere, only from IA.
I suspect it has to do with the "Archive" part of the name. They are trying to preserve the books as well as they can. Volunteers supply scans. IA stores the data. Others, possibly in the far future, produce restored versions. The various files are meant as input to data processing rather than for viewing.

For example, I think the white text on black background images are intended to be used as masks to apply to full page color or grayscale scans to aid in making pure white backgrounds. They also supply them for pages with line drawings, which makes cleaning up such images much easier.
Old 11-06-2021, 04:04 PM   #51
Pajamaman
Wizard
Posts: 2,825
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by DiapDealer:
Not even metaphorically?

My point is why sweat this crap at all? The one thing I hate about ebooks is the new crop of armchair typography-experts/police they seem to have spawned.
This. It seems to be an obligatory comment in Amazon book reviews. Very occasionally the reviewer will give examples, but it is rare. I have frequently read said books and noticed no typos. And yet the typo nazi can barely stand to read the book because of the horrible typos. What is going on in their brains?
Old 11-06-2021, 04:08 PM   #52
Pajamaman
Wizard
Posts: 2,825
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by DNSB:
But at least everyone who bought the same edition of the print book got the same typesetting. Compare to the same edition on multiple ereaders where the page display is, at best, rather variable.
This is why I use a reading app that allows me to override all publisher settings. Kills unwanted formatting dead!
Old 11-06-2021, 04:15 PM   #53
Pajamaman
Wizard
Posts: 2,825
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by DyckBook:
Well, here's another editing faux pas that drives me crazy: adding 's to the end of a word that already ends in s. For example, Collins's book should be Collins' book. At what point am I no longer just correcting minor editing missteps for legibility and peace of mind, and am now trying to hold back a tidal wave of global ignorance?
I agree, but I think it's become the standard, at least in fiction, to the point that I do it, even though I don't like it.
Old 11-06-2021, 04:19 PM   #54
Pajamaman
Wizard
Posts: 2,825
Karma: 10700629
Join Date: May 2016
Location: Canada
Device: Onyx Nova
Quote:
Originally Posted by DyckBook:
Well, that answers my question of when this started.
For starters, I can state that the Sumerians did not use it.
Old 11-06-2021, 04:32 PM   #55
j.p.s
Grand Sorcerer
Posts: 5,262
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by Pajamaman:
For starters, I can state that the Sumerians did not use it.
Weren't alphabets all capitals all the time back then?
Old 11-06-2021, 05:14 PM   #56
ZodWallop
Gentleman and scholar
Posts: 10,910
Karma: 106650939
Join Date: Jun 2015
Location: Space City, Texas
Device: Clara HD; Nook ST w/Glowlight, (2015) Glowlight Plus, Paperwhite 3
Quote:
Originally Posted by ownedbycats:
Also, what is up with Internet Archive's PDF compression? It rarely renders correctly on my Kobo, and instead I just get an image of text smudges. Even on my PC it's slow to render.
I tried using a freeware OCR program to convert some IA PDFs and saw exactly those smudges, though I don't see them when opening the PDFs on Edge.
Old 11-06-2021, 05:15 PM   #57
ZodWallop
Gentleman and scholar
Posts: 10,910
Karma: 106650939
Join Date: Jun 2015
Location: Space City, Texas
Device: Clara HD; Nook ST w/Glowlight, (2015) Glowlight Plus, Paperwhite 3
Quote:
Originally Posted by Pajamaman:
This. It seems to be an obligatory comment in Amazon book reviews. Very occasionally the reviewer will give examples, but it is rare. I have frequently read said books and noticed no typos. And yet the typo nazi can barely stand to read the book because of the horrible typos. What is going on in their brains?
Of course, it is always possible that the reason you saw no typos is because of the complaints in older reviews.
Old 11-06-2021, 05:31 PM   #58
ownedbycats
Custom User Title
Posts: 8,419
Karma: 59666665
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
Quote:
Originally Posted by ownedbycats:
I seem to vaguely recall something about Internet Archive using some sort of proprietary compression tech, but I'm having a hard time trying to pull enough details from my memory to be able to actually find anything on it - it's possible I might've posted before about it so I'll look.
Found it.

Quote:
Originally Posted by ownedbycats:
Yes, I've seen the invisible text layer on some PDFs from other sources. From what I recall, the Internet Archive uses LuraTech's brand of mixed-raster content compression. Basically there are several different images and the text, all layered on each page. It's pretty efficient in file size, but slow to render and can look pretty terrible if done poorly.
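
To make the layering concrete, here is a toy Python sketch of the general mixed-raster idea: a background image, a foreground (ink) image, and a 1-bit mask that picks between them pixel by pixel. It is not LuraTech's format or algorithm. The function name and pixel values are hypothetical, and it assumes all three layers have the same dimensions, whereas real files usually store the background and foreground at a lower resolution than the mask.

```python
def composite_mrc(background, foreground, mask):
    """Rebuild a page: take the foreground pixel where the mask is 1, else the background."""
    return [
        [fg if m else bg for bg, fg, m in zip(bg_row, fg_row, m_row)]
        for bg_row, fg_row, m_row in zip(background, foreground, mask)
    ]


# Tiny 2x4 grayscale example (0 = black, 255 = white); all values are invented.
background = [[200, 210, 205, 200],
              [198, 202, 207, 201]]   # paper tone, stored blurry and heavily compressed
foreground = [[20, 20, 20, 20],
              [20, 20, 20, 20]]       # ink colour
mask = [[0, 1, 1, 0],
        [0, 0, 1, 0]]                 # 1 marks the pixels that belong to text

page = composite_mrc(background, foreground, mask)
print(page)
```

A viewer that draws only the background layer and ignores the mask would show just the blurry paper tone, which may be what the smudges upthread are.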
Old 11-06-2021, 07:39 PM   #59
geek1011
Wizard
Posts: 2,700
Karma: 6254413
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by Mel:
The thing that drives me crazy is when they bundle books together and then create a faux boxed set cover. Which defeats the whole purpose of a cover because they've angled the graphic so you can't read it.
The silver lining is that they often are lazy about assembling it, so it's just the individual EPUBs (even with separate duplicated stylesheets, filenames, and front/back matter) merged together in the OPF plus some additional front/back matter, so it's easy enough to split them into the original individual books again.
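
A rough Python sketch of the first step of such a split: read the OPF, list the spine in reading order, and group consecutive entries by their top-level folder. The assumption that each bundled book sits in its own folder is hypothetical, as are the function names and the sample OPF; real bundles may instead use filename prefixes or nothing at all.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Invented OPF for a two-book bundle, trimmed to the parts used here.
SAMPLE_OPF = """\
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="front" href="bundle-title.xhtml" media-type="application/xhtml+xml"/>
    <item id="b1c1" href="book1/chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="b1c2" href="book1/chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="b2c1" href="book2/chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="front"/>
    <itemref idref="b1c1"/>
    <itemref idref="b1c2"/>
    <itemref idref="b2c1"/>
  </spine>
</package>
"""


def spine_hrefs(opf_xml):
    """Return the hrefs of the spine items in reading order."""
    root = ET.fromstring(opf_xml)
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def group_by_folder(hrefs):
    """Group consecutive spine entries that share a top-level folder."""
    def folder(href):
        return href.split("/", 1)[0] if "/" in href else ""
    return [(name, list(items)) for name, items in groupby(hrefs, key=folder)]


groups = group_by_folder(spine_hrefs(SAMPLE_OPF))
for name, items in groups:
    print(name or "(bundle front/back matter)", items)
```

Each group would then be copied into its own EPUB along with the stylesheets and images it references; that part is left out here.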
Old 11-06-2021, 07:43 PM   #60
geek1011
Wizard
Posts: 2,700
Karma: 6254413
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by JSWolf:
One thing that's come from eBooks is paragraph spaces. Novels in print don't have them. Why do some eBooks?
I personally like a tiny amount of space (~0.75 of a single-spaced line) between paragraphs if I'm reading on a computer screen or a larger phone.

Quote:
Trying to duplicate the pBook version is also silly sometimes. It doesn't always work. So why do it if it doesn't work?
The main thing which bothers me is when they render headers as high-resolution JPEGs (not even SVG), wasting lots of space. One particularly bad case of this is the North American EPUBs of Brandon Sanderson's books.

Other than that, the styling doesn't bother me too much unless they just use a whole lot of spans with IDs rather than proper classes and semantic tags since I often apply my own common stylesheet to all non-FXL fiction books I read. If I didn't do this, the worst would be when they have fractional em sizes on body text, overridden line spacing, NBSP-based alignment, or fake margins.
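
A small Python sketch of how two of those annoyances could be spotted in an unpacked EPUB: non-integer em font sizes in the CSS, and runs of non-breaking spaces used for alignment in the XHTML. The regular expressions and function names are illustrative assumptions, not a complete checker; they work on raw text and would miss, for example, sizes given in other units.

```python
import re

# font-size values such as 0.95em or .9em (whole numbers like 2em are not matched)
FRACTIONAL_EM = re.compile(r"font-size\s*:\s*(\d*\.\d+)em")
# two or more non-breaking spaces in a row, as entities or as the raw character
NBSP_RUN = re.compile(r"(?:&nbsp;|&#160;|\u00a0){2,}")


def find_fractional_em_sizes(css):
    """Return the non-integer em font sizes declared in a stylesheet."""
    values = [float(v) for v in FRACTIONAL_EM.findall(css)]
    return [v for v in values if v != int(v)]


def count_nbsp_runs(xhtml):
    """Count runs of consecutive non-breaking spaces in a content file."""
    return len(NBSP_RUN.findall(xhtml))


sample_css = "body { font-size: 0.95em; line-height: 1.2; }\nh1 { font-size: 2em; }"
sample_xhtml = "<p>&nbsp;&nbsp;&nbsp;&nbsp;Indented line</p><p>One&nbsp;space</p>"

print(find_fractional_em_sizes(sample_css))
print(count_nbsp_runs(sample_xhtml))
```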

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.