11-12-2023, 02:55 PM   #20
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by NovelFan
So the goal is conversion pdf->docx->epub.
That is one way, yes.

If you are more familiar with Word/LibreOffice and fine with editing in DOCX, you can do that.

The key thing is using an OCR program that can figure out the PDF's text/layout/formatting and give you a clean document you can work from.

Having a GUI where you can quickly compare original vs. converted is also a HUGE TIMESAVER. (Like I showed in those "magnified" FineReader examples... Left = Original, Right = Converted, Bottom = Zoomed-in Version of PDF.)

The better your OCR is, the more time you'll save on all those later steps. Think of it like a pyramid. If you have a garbage foundation, you're going to spend so much more time on all those later steps, trying to correct the errors you introduced in the beginning. The more problems you can squash EARLY, the better off you'll be.

Quote:
Originally Posted by Karellen
How does that happen?
How does an author write a novel, spend considerable time and mental energy on it, then not have a copy? Do the publishers demand all copies to be handed in? Does the author have to buy their own book to read it?
Or is it a case of bad luck and copies were destroyed in some disaster?
Yep, Quoth is exactly correct.

You only handed in the "first draft" document. Publishers took it from there, then added all of their bells/whistles to it: editing, layout, indexing, etc.

In the olden days, you'd only get the physical print proofs + a final copy.

In the newer days, authors might get handed the digital PDF.

But almost never would they get the actual, original, completed source files. (InDesign, Quark, etc.)

- - -

Publishers then go out of business, change buildings, fire/hire new people, etc., losing the originals.

Authors also do a horrible job of backing up important files, so while the physical book might have survived and still be sitting on their shelves... the old PDF copy might have been completely lost (on an old laptop that broke, hard drive died, old computer got tossed away, etc. etc.).

- - -

Side Note: If you're interested in decades of publishing, also see this fantastic documentary:

Back then, you'd only print X copies, then poof... the original pages would just disappear. They wouldn't store those things indefinitely.

- - -

Quote:
Originally Posted by Karellen
How does that happen?
How does an author write a novel, spend considerable time and mental energy on it, then not have a copy? Do the publishers demand all copies to be handed in? Does the author have to buy their own book to read it?
Or is it a case of bad luck and copies were destroyed in some disaster?
Heh, in many cases, the author/publisher might send me a book I worked on.

But there are plenty I've worked on (with my name in the Acknowledgements) that I don't have.

Same with journal articles, etc. etc. These things just get lost in time. They take up too much space, or you "have a digital copy of it" so you threw away the original, etc.

Look at all the reasons why people get rid of their physical book collections, even though they might LOVE books.

- - -

Side Note #2: Same exact thing with film/TV. Just today, an article came out about 2+ lost old "Doctor Who" episodes being found:

These things get lost and buried in someone's collection for over 60 years.

Side Note #2.1: If you're interested in that, you might also be interested in this great video:

Side Note #3: And if you're interested in other old magazines being lost in time... see the fantastic article:

and his podcast episodes about it:

Computer Shopper was this monthly magazine from 1979–2009. An absolute treasure trove of information + articles over decades... completely lost in time.

Jason Scott is one of the top archivists at the Internet Archive (Archive.org), so he was describing this enormous undertaking of digitizing these. After many years, he finally got his hands on nearly every single copy of the magazines.

And, in the Hacker News comments, you can see all sorts of authors and people coming out of the woodwork who thought their old articles and things were completely lost. They then discuss some of their influences too, and it's awesome that these things can now be rediscovered.

Last edited by Tex2002ans; 11-12-2023 at 03:22 PM.