MobileRead Forums - View Single Post

Hitch · 06-12-2010, 05:53 PM

Hi, gang:

I have a similar "issue," for which I would appreciate input, even if it's just a "best practices," or "this worked for me..." kind of response, rather than a feature request:

I get a LOT of backlist, rights-reverted titles from authors. They don't have their own digital files, so the books are physically sent out to a scanning operation. Now, the scanner is a great guy, and he can output the files in a variety of ways, BUT (ain't there always a butt?), to benefit my authors the most, he outputs each book as a Word file that is essentially ready for POD, complete with page headers (author, book title, page #), etc. So, when I get the fracking files, I have to remove all that crap for epubs and mobis. Fine, so far, so good. I do NOT do this in Sigil, because I'm running 32-bit XP, and Sigil, as much as I love it, is waaaaaaaaaaaaaay slow until you break the book into chapters (make a change, go brew a cup of coffee, pour it, sip it, schlep back to the 'puter).

HOWEVER, here's where I run into significant brain-damage: I read the thread on the span-space issue, but I'm having the opposite problem. The Scan-to-Word process creates a zillion (literally...I had an 1100-line CSS file out of this POS) spans that occur at "random," to the human eye. When you look at the HTML in an HTML editor, it's crapola. Multiple spans inside of paras, sometimes halfway through a word for no apparent reason, and so forth; almost all for para and font styling. The PROBLEM is that, if I import this into Sigil (even after pulling the CSS out and sticking it in a separate stylesheet for subsequent importation), the spaces between words get dropped wherever these spans fall. This happens a lot--many times in each paragraph--so I cannot output the final file like this.
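One pre-processing idea for the span mess described above, offered as a rough sketch rather than a tested fix: unwrap every span in the exported HTML before it goes anywhere near Sigil, keeping all the text (including lone spaces that sit inside their own spans) exactly where it was. The names `SpanUnwrapper` and `unwrap_spans` are made up for illustration, it uses only Python's standard `html.parser`, and whether it actually cures the missing-space import problem is an assumption. Note that any italic or bold carried only by a span class is lost, while real `<i>` and `<b>` tags survive.

```python
from html.parser import HTMLParser


class SpanUnwrapper(HTMLParser):
    """Re-emit HTML as parsed, except that <span> open/close tags are dropped."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &nbsp; as written
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # text is copied verbatim, including whitespace-only runs
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def unwrap_spans(html):
    """Return html with every <span> unwrapped and its contents kept in place."""
    parser = SpanUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = '<p><span class="a">Hel</span><span class="b">lo</span><span> </span><i>world</i></p>'
print(unwrap_spans(sample))  # prints <p>Hello <i>world</i></p>
```

Processing instructions and Word's non-comment conditional blocks are simply dropped by this sketch, so it is best run on a copy of the file and checked against the original.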

My current "solution" (and I admit it sucks) is to strip ALL the formatting from the Word file, in Word 2003, and then manually reformat the whole bloody thing. Every bit of italic, bold, blockquotes, blah blah blah. It takes me about 3 hours to strip it, reformat it, and do a page-by-page compare with a PDF of the original printed output. THEN I export it as stripped HTML, take out the newly-created CSS (now down to something manageable like 30 lines of code), and THEN put it in Sigil, and THEN do the Chapter Breaks and add the miscellany, like "About the Author" pages and so on.
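A possible shortcut for the strip-and-reformat step, again only a sketch under assumptions: before throwing the formatting away, scan the giant stylesheet to find out which class names actually mean italic or bold, so those spans can be mapped to `<i>` and `<b>` instead of being redone by hand. The function name `classify_classes` is hypothetical, and the regex-based parsing assumes a flat stylesheet without CSS comments or other oddities, which may not hold for every Word export.

```python
import re

# selector list, then a declaration block with no nested braces
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# class names appearing in a selector, e.g. "span.s1" -> "s1"
CLASS = re.compile(r"\.([A-Za-z_][\w-]*)")
ITALIC = re.compile(r"font-style\s*:\s*italic")
BOLD = re.compile(r"font-weight\s*:\s*(bold|[6-9]00)")


def classify_classes(css):
    """Return (italic_classes, bold_classes) found in a stylesheet string."""
    italic, bold = set(), set()
    for selectors, body in RULE.findall(css):
        declarations = body.lower()
        names = CLASS.findall(selectors)
        if ITALIC.search(declarations):
            italic.update(names)
        if BOLD.search(declarations):
            bold.update(names)
    return italic, bold


css = """
span.s1 { font-style: italic }
.s2 { font-weight: bold; color: red }
p.body { margin: 0 }
"""
italic, bold = classify_classes(css)
print(sorted(italic), sorted(bold))  # prints ['s1'] ['s2']
```

With those two sets in hand, every other span class is a candidate for unwrapping, which would leave far less to compare against the PDF by eye.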

Sorry to be long-winded, but I'm trying to be clear about the problem. Here's my question:

Does ANYONE have a better way? Something faster and easier? Because honestly, it makes my head explode. I love using Sigil for the epub because I can create TOC items (like illustrations or maps or whatever) by using the H1 Title attribute, and just because it's cool.

Should I put these wanking things into Calibre, which flattens the CSS, and then pull the epub into Sigil??? Would that work????

So, if anyone has ANY suggestions that would make my life easier, make this go faster, or automate this process further, I'd be grateful.

OT: Happy Le Mans Weekend!! Coolest event all year. I've got popcorn and doughnuts for my 24-hour marathon viewing.

Hitch

Post #16 · Subject: My Spans are makin' me crazy...