Old 09-18-2010, 05:52 AM   #1
Hitch
Bookmaker & Cat Slave
Posts: 2,650
Karma: 15280959
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Hmmm, SGC's being added weirdness

Hi, Valloric, all:

I've now used Sigil on some 30-odd books, give or take, with good, solid results. However, the last two books I've done, both having been sent to me from Word files, have had bizarre problems that I thought I'd share here.

In both instances, I went through the files in NoteTab after converting Word->html. I stripped everything I could find, leaving only <p> tags and some <i> hither and yon. Apparently, a few old MS <spans> slipped by me, and this seems to have caused major disasters.

In both instances, after I put the files into Sigil, and they looked fine, for some inexplicable reason, Sigil inserted sgc-X styles into EVERY, and I mean EVERY, paragraph, completely overriding the included external ss and changing the fonts and the font size, usually to match a vestigial span lying about somewhere in the manuscript.

I didn't catch it, visually, the first time it happened (I was rushing to get a proof copy out to a good client), and she emailed me in a swivet, because the book was such a disaster. I opened it and looked at it, and what to my wondering eyes should appear, but a book with differing fonts appearing out of literally nowhere, so that instead of TNR 12, I would have 11 for half a paragraph, and then back to 12. Or an entire para would be 11 now, instead of the 12 it should have been when it only had <p> tags. I fixed that one by going through and S&R'ing all the damned sgc codes out, but I thought then that it had been caused by something weird that might have come through from its initial origin, which was Adobe ID. (InDesign->RTF->Word->to me).

But then, I had a perfectly normal, plain Word file from another client, which I converted into html (same as I always do), then stripped it down, added noindent paras for new scenes, the usual. It rendered fine; I put it into Sigil; it looked perfectly good...and then, ka-blammo!, out of nowhere, the whole thing was restyled with sgc tags. Every plain old <p> was suddenly <p style="sgc-3">, which looked NOTHING like what I'd put in my ss. The ss was clearly used by the Sigil file (the title page proved that)...I ended up AGAIN going through and deleting all the style="blahblah" text, and Tidy cleaned up the spans that ALSO appeared out of nowhere.

It's tres bizarre. I've pared down my stylesheet to the very bare essentials, so it's not that. I know that it's not helpful for me to post this here without the file, so next time it happens, I'll do a "save as" so you can see the result, BUT, I'm familiar enough with Sigil, and I've done enough books, to know that this is not normal behavior. I can't remember when the last update was, but this has only happened with the last two books. It seems to occur when Sigil encounters a piece of vestigial <span> code, leftover from Word, that sets a different font than that which was set for the <p> in the stylesheet...and for some odd reason, it creates an sgc class to match that span, which wouldn't be so bad, but it then assigns that newly-created sgc class to ALL paragraphs, which it should not fracking do.

If anyone has had similar experience, or if, Valloric, you are familiar with this, I'd be grateful for some input on how to avoid this in the future. I mean, sure, I can go through and rip out all the sgc-X coding, but it's a bit annoying after I've hand-coded most of the html file to get rid of all the accursed MS coding. (I used to use BookDesigner's templates as an interim step, to strip all the formatting but leave the italics, but I still ended up having to strip all the "MsoNormal" coding from the <p>'s, and the stylesheet it generated was godawful. Thousands of lines of embedded fonts for no apparent reason and no way to prevent it.) Thank the heavens for NoteTab, which is "da bomb."
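
For the search-and-replace cleanup described above, here is a minimal Python sketch. It is not from the thread (NoteTab was the tool actually used), and the function name `strip_generated_attrs` is hypothetical. It assumes the generated attributes are double-quoted and look like `class="MsoNormal"`, `class="sgc-3"`, or `style="sgc-3"`; anything else is left untouched.

```python
import re

# Assumption: Word/Tidy output uses double-quoted attributes whose whole
# value is either MsoNormal or sgc-<number>. Other attributes are kept.
GENERATED_ATTR = re.compile(r'\s+(?:class|style)="(?:MsoNormal|sgc-\d+)"')

def strip_generated_attrs(html: str) -> str:
    """Remove MsoNormal / sgc-N class or style attributes from tags."""
    return GENERATED_ATTR.sub("", html)

sample = '<p class="sgc-3">One</p><p class="MsoNormal">Two <i>three</i></p>'
print(strip_generated_attrs(sample))
```

This only removes the attributes; any stray spans themselves would still need to be found and fixed separately.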

Anyway...sorry for the length, but it's confusing and frustrating, coming out of left field when I'd gotten this process down to a pretty smooth operation. If anyone has any insight, that would be great.

Thanks,

Hitch
Old 09-18-2010, 06:11 AM   #2
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Is it related to this? Tidy goes haywire on unclosed spans.
Old 09-18-2010, 07:05 AM   #3
charleski
Wizard
Posts: 1,188
Karma: 727236
Join Date: Sep 2009
Device: PRS-505
Cleaning up Word-generated html is fine if you're a masochist. If you regularly generate epubs from Word files then Atlantis Word Processor is a far better option.

Valloric is right, this behaviour sounds very much like what happens when HTMLTidy tries to resolve an unclosed inline tag.
Old 09-18-2010, 11:44 AM   #4
theducks
Grand Sorcerer
Posts: 15,236
Karma: 6020307
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Been there, had that happen.
Mess up just one closing tag (non-)placement and either it truncates the file, or it garbages it up.

Is there a way to make it leave the file alone and toss the ball back to my court?

"You Broke it: Now, Fix me!"
(OK) (Cancel)
Old 09-18-2010, 12:11 PM   #5
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by theducks
Is there a way to make it leave the file alone and toss the ball back to my court?

"You Broke it: Now, Fix me!"
(OK) (Cancel)
Forthcoming. Once the validation system is in place, there should be a way to inform the user of any errors and let him deal with them by hand instead of handing it to Tidy.
Old 09-18-2010, 07:33 PM   #6
capidamonte
Not who you think I am...
Posts: 346
Karma: 5337
Join Date: Jan 2010
Location: Honolulu
Device: Sony PRS-350
Hitch, as a last step workaround, search for span> in NoteTab. It'll at least help you find all the span tags before you import.
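
The same check can be sketched outside NoteTab. The Python below is an illustration only, not anything Sigil or Tidy does; the names `SpanAudit` and `audit_spans` are hypothetical. It uses the standard-library `html.parser` to report the line numbers of `<span>` tags that are never closed and of `</span>` tags that have no opener, using a simple stack.

```python
from html.parser import HTMLParser

class SpanAudit(HTMLParser):
    """Track the line numbers of unmatched <span> and </span> tags."""

    def __init__(self):
        super().__init__()
        self.open_lines = []    # stack: lines of spans not yet closed
        self.stray_closes = []  # lines of </span> with nothing to close

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.open_lines.append(self.getpos()[0])

    def handle_endtag(self, tag):
        if tag == "span":
            if self.open_lines:
                self.open_lines.pop()
            else:
                self.stray_closes.append(self.getpos()[0])

def audit_spans(html):
    """Return (lines of unclosed <span>, lines of stray </span>)."""
    parser = SpanAudit()
    parser.feed(html)
    parser.close()
    return parser.open_lines, parser.stray_closes

unclosed, stray = audit_spans("<p><span>a</span></p>\n<p><span>b</p>")
print("unclosed <span> on lines:", unclosed)
print("stray </span> on lines:", stray)
```

Because it only counts, a stray `</span>` later in the file will cancel out an earlier unclosed `<span>`, so a clean report is a hint rather than proof that every span is properly nested.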

cap
Old 09-18-2010, 09:15 PM   #7
Hitch
Bookmaker & Cat Slave
It could very well be unclosed spans...but more likely unOPENed spans. I tend to find all the opened ones...not so much the closing tags. Wait..that can't be right, Sigil usually finds and nukes those. Must be unclosed ones, then, although the wondrous NoteTab usually makes unclosed tags obvious, even to dunderheads like yours truly.

Charleski: I don't really choose to "generate" epub files from Word; I receive Word files from clients (either direct or as Abbyy output from scanners). I tried Atlantis--briefly, admittedly--and wasn't blown away. Really, cleaning up Word files, as long as you have a method to tag the italicization, is easy as hell--you just click "Clear Formatting" and bobs-yer-uncle.

But I did NOT use BD for these last two, electing instead to output the html and clean it myself, because I get pissed-off at having to scroll through the 2000 lines of ss that BD puts in the exported html file--even "filtered," since BD seems to feel some compulsion to embed every font in the universe, for reasons that completely elude me--in order to cut it out of the html file.

I was able to get BD to export the html once, without nine bazillion lines of ss, but I've never been able to replicate it, so I was trying to eliminate that step and regex all the crapola out of the file. Clearly, I seemed to miss a span or two and holy moley, what a mess. Fortuitously, I was able to find them all (the sgc's) and nuke 'em, but it made my head hurt.

I'll just use BD again, much as I hate adding that extra step. It's only 5 minutes and it'll save me this type of "SURPRISE!!!" in the future.

@Charleski: I'll try Atlantis again, I still have it here somewhere. Can I import .doc and export clean-ish html??
@Cap: good idea. I'll try that on the one I have in here for edits.
@Valloric: Still love Sigil.
@Ducky: Hola, comrade!! I feel your pain. Really.

Hitch
Old 09-19-2010, 03:03 PM   #8
charleski
Wizard
Quote:
Originally Posted by Hitch
I'll try Atlantis again, I still have it here somewhere. Can I import .doc and export clean-ish html??
It converts everything to externally-styled xhtml. It also refactors the styling and separates it all out into paragraph and character-level styles, though if the styles are a mess in the source then that will carry over. Some character-level styles can be unnecessarily repeated, though it rips out all the needless system-level font and colour information. It's far easier than trying to clean up the disgusting html Word exports.

They still haven't fixed the incorrect method they use for doing page borders, though that's a lot easier to fix than the irritating embedded @page styles calibre uses. Internal margins and indents also need to be changed from points (or inches) to em, but that's a problem with the .doc format itself as much as with AWP. All automatic converters need some post-processing of the output, though for AWP this can all be done in the css, and most extraneous styles point to inappropriate styling that should be cleaned up in the Word file.
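
As a rough illustration of the points-to-em post-processing mentioned above, here is a minimal Python sketch. It is not part of Atlantis, calibre, or Sigil; the function name `pt_to_em` and the 12pt base size are assumptions (12pt matches the TNR 12 body text mentioned earlier in the thread). It only rewrites `pt` values inside `margin`, `padding`, and `text-indent` declarations and leaves font sizes and inch values alone.

```python
import re

# Declarations whose lengths should become relative (assumed list).
DECL = re.compile(r'\b((?:margin|padding)(?:-\w+)?|text-indent)(\s*:\s*)([^;}]+)')
PT_VALUE = re.compile(r'(\d+(?:\.\d+)?)pt\b')

def pt_to_em(css: str, base_pt: float = 12.0) -> str:
    """Convert pt lengths in margin/padding/text-indent to em."""
    def to_em(match):
        em = float(match.group(1)) / base_pt
        # Trim trailing zeros so 1.500 prints as 1.5
        return f"{em:.3f}".rstrip("0").rstrip(".") + "em"

    def fix_declaration(match):
        prop, colon, value = match.groups()
        return prop + colon + PT_VALUE.sub(to_em, value)

    return DECL.sub(fix_declaration, css)

print(pt_to_em("p { font-size: 12pt; text-indent: 18pt; margin: 0 0 6pt 0; }"))
```

The base size matters: if the body text is not 12pt, pass the real value as `base_pt`, otherwise the em values will be off.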
Old 09-26-2010, 05:01 AM   #9
Hitch
Bookmaker & Cat Slave
As a follow-up...it appears that the author in question had exported RTF from InDesign (curse you, Adobe!), then stuck it in Word, thence on to yours truly. Natch, when I exported it to HTML w/o using BD first, the nine gazillion spans came with it that were created by an imperfect InD export.

Having learned my lesson {sigh...AGAIN}, I'll just use BD's built-in macros to strip all the spans via the "clear all formatting" methodology, which allows me to tag the italics first, which is actually the big issue with "clear all." Yes, clearing the bloody stylesheet font embedding is maddening, but it's easier than cleaning up Sigil's attempts to "help me" with unfinished spans. Not Sigil's fault, but it did give me a few heart failures along the way.

Hitch
Tags
sgc, stylesheet, substitution, tags
All times are GMT -4.