MobileRead Forums > E-Book Formats > Workshop
12-11-2013, 02:41 AM #16
Tex2002ans
Evangelist
Posts: 424
Karma: 360129
Join Date: Jul 2012
Device: Nook

Originally posted by xristy:
> O'Reilly is a publisher which does charge a one-time fee for the set of mobi/ePub/PDF/Daisy with no DRM! Certainly it is a pricing choice and nothing inherent in the use of multiple sized PDFs.

I really haven't paid much attention to the technical book market since I finished college.

I now just get outraged when books are over $30! I can't imagine myself paying hundreds of dollars for any books any more.

Especially in the technical fields I am interested in (programming, math, economics, physics), you can find perfectly good material FOR FREE. If I ever did buy a physical book on one of those topics, there is sure as hell no way I would go for the latest/"greatest" edition.

Originally posted by xristy:
> As I have mentioned, I get very good results with Acrobat X and good quality scans.

And again, the key here is "good quality scans." In practice, those are the exception, not the rule.

In many cases, you cannot get a good quality scan!

Maybe they paid a crappy scanning company to scan the book (as you can see, crappy/cheap solutions bring headaches later), or the book is so old that it has degraded (water stains), or the book is rare (so this is the only copy you have), or someone wrote in the book (this one makes me want to pull my hair out! NEVER WRITE IN YOUR BOOKS OR YOU WILL SUFFER MY WRATH!).

For example, here is one of the most egregious examples (~50 out of 576 pages were marked up BADLY)... a few were only lightly marked (I was able to fix those before OCR):

[Attached images: pg126.png and pg126EPUB.png (page 126, scan vs. EPUB); pg130.png and pg130EPUB.png (page 130, scan vs. EPUB)]

It doesn't matter what amazing PDF reader you are using on your tablet; there is no way you can make that scan look as good as that EPUB.

But yes, having a great scan goes a long way toward speeding up the OCR process and making it more accurate. It can cut a process that would take me a few hours down to less than an hour (and that includes me double-checking the areas the OCR marked as "unsure").

Originally posted by xristy:
> I don't know what Archive.org is doing but their results are not very uplifting as far as OCR'd PDFs and searching.

Well, most of their stuff is in the "not great scan" category (mostly because the books are so old). They run it through OCR with no human intervention (I believe they use the ABBYY FineReader engine), and while it is "99.8%" accurate (or something like that), there are still a bunch of errors (which is why you pay a human to look through it and fix it).
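To put a "99.8%" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the figure is per-character accuracy and that a page holds about 2,000 characters; both are illustrative assumptions, not Archive.org data. The 576-page count is borrowed from the example book earlier in this post.

```python
def expected_ocr_errors(accuracy, chars_per_page, pages):
    """Expected character errors per page and per book at a given accuracy.

    Assumes `accuracy` is per-character (an assumption; some vendors quote
    word-level figures) and that errors are spread evenly across pages.
    """
    errors_per_page = (1 - accuracy) * chars_per_page
    return errors_per_page, errors_per_page * pages


# Hypothetical inputs: ~2,000 characters per page, 576 pages.
per_page, per_book = expected_ocr_errors(0.998, 2000, 576)
print(round(per_page), round(per_book))  # 4 2304
```

Under those assumptions, even a very high accuracy figure leaves a couple of thousand characters to find and fix across one book, which is the proofreading a human gets paid for.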

Last edited by Tex2002ans; 12-11-2013 at 02:55 AM.
12-11-2013, 03:43 AM #17
Hitch
Bookmaker & Cat Slave
Posts: 2,227
Karma: 11710165
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Hi, all:

There's a lot I'd love to reply to in this thread, but we're at "that time of year," and I'm so slammed I don't have time to breathe, much less write long posts. We had another 40 books walk in the door today that have to be done by the 20th... and on top of what's already here, with all of them arriving even later than usual this year, that's really pushing our buttons.

@Tex:

I would never accept someone else's INDD file, and I don't know anyone, you excepted, it seems, who will. If I'm getting an INDD file from another designer, it means that s/he doesn't know how to make an ePUB from it, which means that 5 hours in, we're still going to be trying to figure out what "char-override 66" means, in the CSS. Moreover, we almost never do get all the INDD files when we do get a submission; the images are missing, the fonts are missing, you name it. It's always a mess, and it's always a file that's laid out in ways that aren't supported in ebooks. I simply gave up and won't take them any longer. Believe it or not, it's FASTER for us to OCR it with Abbyy and export it using our custom clips, than it is to slog through 60, 70, 100 "character override styles" and figure out what the designer MEANT to say. Not to mention, regexing everything into submission. Pah on that.

WordPerfect? Sure. Fine. But still, the point is, like the old joke, you can't get there from here. Even with MathML, you can't output the content (the equations) in any textual way that can be supported. Back to images, and thence we are no forrader, as they say.

Nobody in India is getting $0.50-$5.00 per page for a scanned book. Not for the scanning. They'll get closer to $0.50 for the completed book, per page. That's in ePUB and MOBI formats, both. That pricing includes the scanning (if needed), OCR, A/B compare, HTML output, ePUB creation, MOBI creation. It can get up to $1/page, but generally, that's where it tops out, and the Indians are now being underpriced by the Chinese, FWIW.

With regard to "PDFs" and how great they are: sure, on a massive tablet like the iPad, they're great, although I find trying to page through them really annoying no matter what reader I'm using. However, they are anything but great on smaller tablets, even the larger Kindle tablets or the Mini-Pad. Then, they suck, because you are constantly pinch-zooming them and trying to read them and scrolling around, etc. So, it's different horses for different courses. Believe me, we do a LOT of technical work (we did an 1800-page medical textbook that I often discuss with a lot of curse words), and I'd be the first to agree that some things should stay in a print layout, to facilitate perfect vertical and horizontal alignment. Unfortunately, or fortunately, take your pick, many people, like you, want their books portable. The only sellers for PDF are basically small bookstores online, Smashwords (and you can't even sell your original PDF there, mind you--it's a Calibre-conversion-created PDF), and your own websites. As many people who've sold from their own website will tell you, unless you're O'Reilly, that dog doesn't hunt.

And speaking thereof: yeah, he offers multi-format ebook packages, and I don't think I know a soul who's bought one. Not a single person. They cost the earth.

I'm not opposed to using PDFs for technical books; I'm really not. And I turn away business all the time when someone walks in the door with a big, technical book that I don't think will convert well. Ditto some cookbooks, kids' books, etc. But that market's appetite is whetted for portable books that can be sold by larger retailers, like Amazon. As long as Amazon, B&N and iBooks won't sell PDFs, I just don't see that working, from a commercial standpoint.

And, lastly, making a print-layout PDF isn't a finger-snap. Even for plain fiction, it takes time to do correctly. Doing a full-bore print layout for a highly technical book will cost a LOT of money, and the publisher has to feel that the result of that expenditure will be worth it. The average print layout house that will take that type of work (we don't, not for print) starts pricing at ~$5/page (250 words) and then goes UP from there, adding for each element (each formula, each equation), every blockquote, each pullquote, etc. When you start talking about 300-page texts, it can really add up. Hell, even CreateSpace, which is subsidized by Amazon and which can run at a loss, charges $679 to start a book with a "custom complex interior" and then adds $25/pop for each "table or chart." Start doing that math, add the cost of creating QUALITY ebooks on top of that, versus sales price and royalty...and there you go. You're talking thousands in print layout costs--without even starting on the ebook versions. Publishing is a business, and the numbers have to make sense to the publishers.
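The "start doing that math" step can be made concrete with the figures quoted in this post (2013 prices): ~$5/page as a starting point at a print layout house, CreateSpace's $679 base plus $25 per table or chart, and the $0.50-$1/page all-in ebook conversion rate mentioned earlier. The 300-page length comes from the post; the count of 40 tables/charts is a hypothetical. The layout houses' per-element surcharges aren't given, so the first figure is only a floor.

```python
PAGES = 300              # from the post: "300 page texts"
TABLES_AND_CHARTS = 40   # hypothetical count, for illustration only


def layout_house_floor(pages, per_page=5.00):
    """Starting quote at ~$5/page (250 words), before per-element surcharges."""
    return pages * per_page


def createspace_complex(tables_and_charts, base=679.00, per_item=25.00):
    """CreateSpace 'custom complex interior' base fee plus $25 per table/chart."""
    return base + tables_and_charts * per_item


def ebook_conversion(pages, per_page):
    """Offshore all-in conversion (scanning, OCR, A/B compare, ePUB, MOBI)."""
    return pages * per_page


print(layout_house_floor(PAGES))                # 1500.0
print(createspace_complex(TABLES_AND_CHARTS))   # 1679.0
print(ebook_conversion(PAGES, 0.50), ebook_conversion(PAGES, 1.00))  # 150.0 300.0
```

Even before any surcharges, both print routes land in the "thousands" range described above, while the ebook conversion itself is a few hundred dollars.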

That's all I have time for... I know I had a bunch of other things, but... like the Rabbit in Alice in Wonderland, I gotta go.

Hitch
12-11-2013, 05:21 AM #18
xristy
Connoisseur
Posts: 54
Karma: 210
Join Date: Sep 2007
Device: iPad

Originally posted by Tex2002ans:
> For example, here is one of the most egregious examples (~50 out of 576 pages were marked up BADLY)... a few were only lightly marked (I was able to fix those before OCR)
>
> ...
>
> It doesn't matter what amazing PDF reader you are using on your tablet; there is no way you can make that scan look as good as that EPUB.

Well, if I were populating a serious economics website and considered Fabian Freeway an important book representing the position of the website, then I would get a better copy to work from. They're out there.

And I would much rather look at a good quality PDF than the ePub; at least you make both available on the website.

Again, I am saying: make the PDFs available as well as the ePub/mobi. You do, and that's great, but most publishers/distributors don't.
12-11-2013, 05:51 AM #19
Tex2002ans
Evangelist
Posts: 424
Karma: 360129
Join Date: Jul 2012
Device: Nook

Originally posted by Hitch:
> I would never accept someone else's INDD file, and I don't know anyone, you excepted, it seems, who will.

Well, actually, I don't (I despise Adobe products... I tend to stick with free/open source alternatives). BUT, if someone DID hand one over, I wouldn't mind installing InDesign and exporting it myself. At work, the typographer actually just exports the InDesign EPUB for me. (I just completed a new book last month: a simultaneous PDF/EPUB/MOBI/physical release.)

Originally posted by Hitch:
> If I'm getting an INDD file from another designer, it means that s/he doesn't know how to make an ePUB from it, which means that 5 hours in, we're still going to be trying to figure out what "char-override 66" means, in the CSS.

Yeah, that dreaded CSS that comes out of there is HORRIBLE (especially because a lot of it is not relevant in EPUB: kerning, letter-spacing, etc.). I tend to request the EPUB (from InDesign) AND the finalized PDF.

I do an A/B comparison (PDF open on the left half of the screen, EPUB on the right), and I strip out all the classes that I see are not relevant (I actually strip everything down to pretty much headings + blockquotes + bold/italics).

Then I plop in my "in-house CSS", and from there, I just go through and introduce spacing, no indentation, a few margins here and there... it doesn't take long at all. (The last InDesign EPUB took me 5 hours, and most of that was checking the book for actual typos/inconsistencies.)
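A minimal sketch of the "strip the classes, then add house CSS" step, using only Python's standard library. It assumes each EPUB content file is well-formed XHTML; it removes every `class` and `style` attribute, then unwraps `<span>` elements left with no attributes (where the "char-override" formatting typically ends up). This is not the poster's actual tool, the class names in the sample are made up, and dropping every class also drops any that carried real meaning, so the manual A/B check described above still applies.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SPAN = f"{{{XHTML}}}span"
ET.register_namespace("", XHTML)  # serialize XHTML without "ns0:" prefixes


def unwrap(parent, child):
    """Replace `child` with its own text and children, preserving document order."""
    idx = list(parent).index(child)
    kids = list(child)
    lead, trail = child.text or "", child.tail or ""
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)

    def append_before(i, s):
        # Text preceding position i lives in parent.text or the previous sibling's tail.
        if i == 0:
            parent.text = (parent.text or "") + s
        else:
            parent[i - 1].tail = (parent[i - 1].tail or "") + s

    append_before(idx, lead)
    if kids:
        kids[-1].tail = (kids[-1].tail or "") + trail
    else:
        append_before(idx, trail)


def strip_overrides(xhtml):
    """Drop class/style attributes, then unwrap spans that carry no attributes."""
    root = ET.fromstring(xhtml)
    for el in root.iter():
        el.attrib.pop("class", None)
        el.attrib.pop("style", None)
    changed = True
    while changed:  # repeat so spans nested inside spans are unwrapped too
        changed = False
        for parent in list(root.iter()):  # snapshot, since the tree is mutated below
            for child in list(parent):
                if child.tag == SPAN and not child.attrib:
                    unwrap(parent, child)
                    changed = True
    return ET.tostring(root, encoding="unicode")


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<p class="Body ParaOverride-3">Hello <span class="CharOverride-66">big</span> '
        '<em class="italic">world</em></p></body></html>')
print(strip_overrides(page))
```

Spans that still carry an attribute such as `id` are left alone, so link targets survive; the house stylesheet then styles the bare `p`, `h1`-`h6`, `blockquote`, `em` and `strong` tags.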

Although you probably get a LOT more horrible documents than I do. (I must admit, I've only done maybe six or seven new books directly from InDesign output; some were super clean, others were pretty bad, but still better than PDF.)

Also, your "in-house CSS" is probably a lot more complex than what I use. My mentality is bare minimum, for maximum portability, and minimal chance of breaking on the multitude of present/future devices.

Side Note: Which reminds me, another thing the cheap places do is just Input -> Output. Someone who cares about quality will spend a little time pointing out ACTUAL typos/inconsistent usage. (For example, I point out hyphenation problems, forgotten italics on newspaper/journal names, and missing accents in words; indexes are usually rife with little errors. Check my site for an in-depth changelog of the hundreds/thousands of typos I have caught when making the EPUBs, ...)
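Part of that consistency check can be automated. Below is a rough sketch with a hypothetical helper (not the poster's tool) that flags words appearing in more than one form once hyphens, accents and case are ignored, e.g. "e-book"/"ebook" or "café"/"cafe". It only surfaces candidates: some pairs (like "re-cover"/"recover") are legitimately different, so a human still decides.

```python
import re
import unicodedata
from collections import Counter, defaultdict

# Runs of letters (any script), optionally joined by hyphens: "e-book", "café".
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def fold(word):
    """Comparison key: lowercase, no hyphens, no accents ('Café' -> 'cafe')."""
    decomposed = unicodedata.normalize("NFKD", word.lower().replace("-", ""))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def inconsistent_spellings(text):
    """Map each folded key to {form: count} when more than one form occurs."""
    counts = Counter(w.lower() for w in WORD.findall(text))
    groups = defaultdict(dict)
    for form, n in counts.items():
        groups[fold(form)][form] = n
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


text = "An e-book about the café. This ebook says cafe and e-book."
for key, forms in inconsistent_spellings(text).items():
    print(key, forms)
```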

Originally posted by Hitch:
> Moreover, we almost never do get all the INDD files when we do get a submission; the images are missing, the fonts are missing, you name it.

This is what I was talking about with people not understanding the tools they use! We like to imagine that everyone is a master of XYZ, but in reality, there are A TON of people who don't know what they are doing when designing documents.

As I mentioned a few posts back, you will have someone who does something as simple as editing metadata, and thinks that the output PDF is exactly the same (it sure "looks" the same).

Or you have people who use Word/InDesign/Quark and make their document LOOK good, but have zero clue about using Styles. So the "backend" of the file is HIDEOUS (not noticeable until you try to change formats/move things around).

And Hitch can probably tell some Word document horror stories (after Christmas time, it seems). You always get the dreaded person who PRESSES ENTER TWO TIMES to get a "double-spaced" document.
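In Word-exported HTML, those double Enter presses typically show up as empty paragraphs. A small regex sketch that deletes them so spacing can be handled in CSS instead. It assumes the spacer paragraphs contain only whitespace (including non-breaking spaces), `&nbsp;`/`&#160;` entities or `<br>` tags; a regex is only safe for this narrow pattern, not for parsing HTML in general.

```python
import re

# <p ...> followed only by whitespace (\s also covers U+00A0), &nbsp;/&#160;
# entities or <br> tags, then </p>, plus any whitespace after it.
EMPTY_P = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|&#160;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)


def drop_empty_paragraphs(html):
    """Remove 'spacer' paragraphs created by pressing Enter twice."""
    return EMPTY_P.sub("", html)


sample = "<p>One</p>\n<p>&nbsp;</p>\n<p>Two</p>"
print(drop_empty_paragraphs(sample))
```

The `\b` after `p` keeps tags like `<pre>` and `<param>` from matching.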

Originally posted by Hitch:
> Believe it or not, it's FASTER for us to OCR it with Abbyy and export it using our custom clips, than it is to slog through 60, 70, 100 "character override styles" and figure out what the designer MEANT to say.

Hmmm... and what about the error rate that introduces? Do you do something like: strip all the InDesign poo out of the EPUB (so it is basically plain text), and then compare that to your OCRed output?

That is how I handle cases where I pull HTML from a different source. I run the original PDF through a very rough OCR, and then code-compare what I generated with the HTML from the site. They usually catch mistakes that I missed, and I usually catch mistakes that they missed... so combined, I get a better EPUB in the end!
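A minimal sketch of that cross-check, assuming both sources can be reduced to plain text: pull the text out of the HTML with the standard library's `html.parser`, diff the two word sequences with `difflib.SequenceMatcher`, and list only the places where they disagree. The sample strings are invented. Each disagreement is either an OCR error or a mistake in the other source, so a human still makes the call.

```python
import difflib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent block elements doesn't run together.
    return " ".join(parser.parts)


def word_differences(ocr_text, other_text):
    """Return (kind, ocr_words, other_words) for every span where the texts differ."""
    a = re.findall(r"\S+", ocr_text)
    b = re.findall(r"\S+", other_text)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


ocr = "It was the best of tirnes, it was the worst of times."
site = "<p>It was the best of times, it was the <i>worst</i> of times.</p>"
for kind, mine, theirs in word_differences(ocr, html_to_text(site)):
    print(f"{kind}: {mine!r} vs {theirs!r}")
```

Comparing word by word rather than line by line keeps the report readable even when the two sources wrap lines differently.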

I am just ecstatic every time I run into the book in anything OTHER than PDF; ANYTHING is better than working backwards from PDFs. (Although I do like to have both versions available so I can pull higher resolution images.)

Originally posted by Hitch:
> Nobody in India is getting $0.50-$5.00 per page for a scanned book. Not for the scanning. They'll get closer to $0.50 for the completed book, per page. That's in ePUB and MOBI formats, both. That pricing includes the scanning (if needed), OCR, A/B compare, HTML output, ePUB creation, MOBI creation. It can get up to $1/page, but generally, that's where it tops out, and the Indians are now being underpriced by the Chinese, FWIW.

Yeesh... didn't know it was that cheap. Maybe I was mixing up my American/Indian companies. I scoured the web looking for PDF conversion pricing a while back; most of them just keep it hidden and say "send us the PDF and we will quote you!"

Originally posted by Hitch:
> Even for plain fiction, it takes time to do correctly. Doing a full-bore print layout for a highly technical book will cost a LOT of money, and the publisher has to feel that the result of that expenditure will be worth it.
>
> [...]
>
> You're talking thousands in print layout costs--without even starting on the ebook versions.

Thanks for the information. I definitely don't have much knowledge about the print side of things.

Side Note: My tome about the conversion process is complete!
12-11-2013, 06:05 AM #20
xristy
Connoisseur
Posts: 54
Karma: 210
Join Date: Sep 2007
Device: iPad
@Hitch, thanks for the informative response.

Originally posted by Hitch:
> With regard to "PDFs" and how great they are: sure, on a massive tablet like the iPad, they're great, although I find trying to page through them really annoying no matter what reader I'm using. However, they are anything but great on smaller tablets, even the larger Kindle tablets or the Mini-Pad. Then, they suck, because you are constantly pinch-zooming them and trying to read them and scrolling around, etc. So, it's different horses for different courses. Believe me, we do a LOT of technical work (we did an 1800-page medical textbook that I often discuss with a lot of curse words), and I'd be the first to agree that some things should stay in a print layout, to facilitate perfect vertical and horizontal alignment. Unfortunately, or fortunately, take your pick, many people, like you, want their books portable. The only sellers for PDF are basically small bookstores online, Smashwords (and you can't even sell your original PDF there, mind you--it's a Calibre-conversion-created PDF), and your own websites. As many people who've sold from their own website will tell you, unless you're O'Reilly, that dog doesn't hunt.

Exactly. PDFs are sensible on larger format devices and not well suited to smaller devices. It seems to me that trying to wade through a serious technical text (such as a medical text with copious illustrations, photos and the like) on a smaller format device is something one has to be pretty desperate to do; it just doesn't seem like a workaday solution to me.

Originally posted by Hitch:
> And speaking thereof: yeah, he offers multi-format ebook packages, and I don't think I know a soul who's bought one. Not a single person. They cost the earth.

I have actually purchased several of the O'Reilly eBook packages.
12-11-2013, 06:16 AM #21
Tex2002ans
Evangelist
Posts: 424
Karma: 360129
Join Date: Jul 2012
Device: Nook

Originally posted by xristy:
> Well, if I were populating a serious economics website and considered Fabian Freeway an important book representing the position of the website, then I would get a better copy to work from. They're out there.

Hey, I don't make the choices; I only convert books. If it were up to me, I would also want high quality scans (so if someone like me came along in the future, I could save that guy headaches... or save you the bad-PDF headaches on your tablet).

Luckily, the EPUB satisfies nearly everyone, and those who dislike the quality of the (PERFECTLY FREE) PDF can suffer and buy a physical copy (although a used copy might or might not be worse).

Originally posted by xristy:
> Again, I am saying: make the PDFs available as well as the ePub/mobi. You do, and that's great, but most publishers/distributors don't.

Luckily, more are seeing the advantages of offering books in a multitude of formats (Physical/PDF/EPUB/MOBI)... the publishing industry is full of slow, lumbering beasts! I think that over the past few years, though, the tide has really been turning. Kindle/tablet/ereader sales are really tough to ignore.

And the better the InDesign/Quark export tools get, the (slightly) better the ebooks we will see (although I sense you will still get a lot of this "designed for iBooks" type nonsense). Better export will let typesetters who are not very familiar with HTML/coding auto-export cleaner code more easily.

Originally posted by xristy:
> I have actually purchased several of the O'Reilly eBook packages.

Just like I am one of the few who works from badly designed InDesign files. There are dozens of us... DOZENS!!!

Last edited by Tex2002ans; 12-11-2013 at 06:36 AM.

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.