11-12-2013, 05:41 PM #6
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by kksdragons
It has about 80 html files and a lot of graphics generated during the scanning process that look like smeared pages.
May I ask what the process was to go from PDF -> EPUB? (This is the area where I do most of my work; I have converted ~180 books so far.)

I posted a rough outline of most of my OCR methods (I use FineReader + Sigil) in this topic: https://www.mobileread.com/forums/sho...d.php?t=223817

Quote:
Originally Posted by kksdragons
I honestly can't remember if it validated originally, but I have Sigil set to validate w/ FlightCrew every time I save (in hopes this somewhat idiot-proofs what I'm trying to do :P).
Ugh... Validating the entire book definitely makes Sigil chug when you're working on such a large file.

I would recommend turning off validate-on-save and keeping the Preview window open instead (View > Preview, or F10):

http://web.sigil.googlecode.com/git/...t/preview.html

With the Preview window open while you code, you see every change in real time. If you make a mistake in the code, the Preview window will go bonkers and tell you right away.

Quote:
Originally Posted by kksdragons
After my original post, I DID go back in and delete all those crummy graphics and that seems to have helped. It hasn't crashed since.
It could have been a corrupt image. I usually take care of all of that at the PDF cleaning + OCR stage, so that crap never even makes it into the EPUB.
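If a broken image does slip into the EPUB anyway, a quick check outside Sigil can help you find it. Below is a minimal sketch using only the Python standard library; the function name `find_suspect_images` is hypothetical and not part of Sigil or FineReader. It relies on an EPUB being a ZIP archive, and flags image entries whose ZIP CRC/decompression check fails or whose first bytes don't match their file extension. That is all it checks, so an image that is technically valid but just looks like a smeared page will not be caught.

```python
import posixpath
import sys
import zipfile
import zlib

# Header ("magic") bytes for the image types checked here. Assumption: other
# formats (SVG, WebP, ...) are simply skipped, not validated.
SIGNATURES = {
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
}


def find_suspect_images(epub_path):
    """Return the names of image entries in the EPUB that look damaged.

    An entry is flagged if reading it fails the ZIP CRC/decompression check,
    or if its first bytes don't match the signature for its extension.
    """
    suspects = []
    with zipfile.ZipFile(epub_path) as epub:
        for info in epub.infolist():
            ext = posixpath.splitext(info.filename)[1].lower()
            if ext not in SIGNATURES:
                continue  # not an image type this sketch knows how to check
            try:
                data = epub.read(info)  # raises on CRC mismatch or bad compressed data
            except (zipfile.BadZipFile, zlib.error, EOFError):
                suspects.append(info.filename)
                continue
            if not data.startswith(SIGNATURES[ext]):
                suspects.append(info.filename)
    return suspects


if __name__ == "__main__":
    # Usage: python find_suspect_images.py book.epub [more.epub ...]
    for path in sys.argv[1:]:
        for name in find_suspect_images(path):
            print(f"{path}: suspect image {name}")
```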

What you want is to eliminate as many errors as possible at the OCR stage, so that in Sigil you can focus almost entirely on tweaking/combining/moving things around... instead of worrying about massive amounts of typos.

Quote:
Originally Posted by theducks
I have not run into a book too big.
Heh... heh... There are a few out there that bring Sigil to its knees. I haven't had one make Sigil crash very often, though... they just take forever to open.

Quote:
Originally Posted by theducks
Frequent saves are in order.
Yes, it's ALWAYS a good idea to save multiple versions of the EPUB as you work on it.

I tend to save a new version after every "round" of changes (add all the blockquotes, SAVE; add headers, SAVE; add alignment, SAVE; fix footnotes, SAVE). In a 1000+ page book... I would probably also save new versions after every few chapters.
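If you'd rather not rename copies by hand after each round, a small script can do it. This is a minimal sketch, not a Sigil feature: `save_snapshot` is a hypothetical helper, and the `book_v001_label.epub` naming scheme is just one possible convention. Run it after saving in Sigil, so the copy includes your latest changes.

```python
import shutil
from pathlib import Path


def save_snapshot(epub_path, label):
    """Copy an EPUB to the next numbered version, e.g. book_v003_footnotes.epub.

    Hypothetical helper; the naming scheme is an assumption for illustration.
    """
    src = Path(epub_path)
    prefix = f"{src.stem}_v"
    highest = 0
    # Scan the folder for earlier snapshots of this book to find the last number used.
    for existing in src.parent.iterdir():
        if existing.suffix == ".epub" and existing.stem.startswith(prefix):
            number = existing.stem[len(prefix):].split("_", 1)[0]
            if number.isdigit():
                highest = max(highest, int(number))
    dest = src.with_name(f"{prefix}{highest + 1:03d}_{label}.epub")
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    return dest


if __name__ == "__main__":
    import sys

    # Usage: python save_snapshot.py book.epub headers
    if len(sys.argv) == 3:
        print("Saved", save_snapshot(sys.argv[1], sys.argv[2]))
```

Calling it once per round (blockquotes, headers, alignment, footnotes) leaves a trail like `book_v001_blockquotes.epub`, `book_v002_headers.epub`, and so on, so you can roll back to the last good round if something breaks.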

Quote:
Originally Posted by kksdragons
I'm now editing an epub created from a PDF that's over 1,100 pages.
May I ask what the book is? That is a mighty large book... and let me warn you now: converting a book this large is QUITE the daunting task. It feels like you have barely made any progress even after working on it for HOURS, and that really sucks the morale right out of you.

I tend to take a break and work on other smaller/easier conversion projects in between, just to feel like I am making some sort of progress.