Old 06-12-2010, 05:53 PM   #16
Hitch
Bookmaker & Cat Slave
Posts: 2,520
Karma: 13917553
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Question My Spans are makin' me crazy...

Hi, gang:

I have a similar "issue," for which I would appreciate input, even if it's just a "best practices," or "this worked for me..." kind of response, rather than a feature request:

I get a LOT of backlist, rights-reverted titles from authors. They don't have their own digital files, so the books are physically sent out to a scanning operation. Now, the scanner is a great guy, and he can output the files in a variety of ways, BUT (ain't there always a butt?), to benefit my authors the most, he outputs it in a Word file that is essentially ready for POD--all the page headers (author, book title, page #), etc. So, when I get the fracking files, for epubs and mobis, I have to remove all that crap. Fine, so far, so good. I do NOT do this in Sigil, because I'm running 32-bit XP, and Sigil, as much as I love it, is waaaaaaaaaaaaaay slow (make a change, go brew a cup of coffee, pour it, sip it, schlep back to the 'puter) before you break it into chapters.

HOWEVER, here's where I run into significant brain-damage: I read the thread on the span-space issue, but I'm having the opposite problem. The Scan-to-Word process creates a zillion (literally...I had an 1100-line CSS file out of this POS) spans that occur at "random," to the human eye. When you look at the html in an html editor, it's crapola. Multiple spans inside of paras, sometimes halfway through a word for no apparent reason, and so forth; almost all for para and font styling. The PROBLEM arises in that, if I import this into Sigil (even after removing the CSS and sticking it in a separate stylesheet for subsequent importation), the spans create a scenario in which the spaces between words are omitted. This is a frequent occurrence--many times in each paragraph--so I cannot output the final file like this.

My current "solution" (and I admit it sucks) is to strip ALL the formatting from the Word file, in Word 2003, and then manually reformat the whole bloody thing. Every bit of italic, bold, blockquotes, blah blah blah. It takes me about 3 hours to strip it, reformat it and do a page-by-page compare with a pdf of the original printed output. THEN I export it as stripped html, take out the newly-created css (now down to something manageable like 30 lines of code), and THEN put it in Sigil, and THEN do the Chapter Breaks and add the miscellany, like "About the Author" pages and so on.

Sorry to be long-winded, but I'm trying to be clear about the problem. Here's my question:

Does ANYONE have a better way? Something faster and easier? Because honestly, it makes my head explode. I love using Sigil for the epub because I can create TOC items (like illustrations or maps or whatever) by using the H1 Title attribute, and just because it's cool. Should I put these wanking things into Calibre, which flattens the CSS, and then pull the epub into Sigil??? Would that work????

So, if anyone has ANY suggestions that would make my life easier, make this go faster, or automate this process further, I'd be grateful.

OT: Happy LeMans Weekend!! Coolest event all year. I've got popcorn and doughnuts for my 24-hour marathon viewing.

Hitch
Old 06-12-2010, 08:47 PM   #17
yekim54
What the Dog Saw
Posts: 305
Karma: 981684
Join Date: Jul 2008
Location: Dunn Loring
Device: Sony PRS-505, PRS-650, Asus TF101
Quote:
Originally Posted by Hitch
So, if anyone has ANY suggestions that would make my life easier, make this go faster, or automate this process further, I'd be grateful.
You could try opening the Word file with Open Office and then saving it as HTML 3.2. This might get rid of most of the crapola but save your important formatting.
Old 06-12-2010, 09:58 PM   #18
Hitch
Hell, I tried opening it in Writer, but my results weren't that great--iirc, the spans were still present, but I'll try the save as html 3.2 again, just in case I missed a step somewhere, and see if it works. Thanks for the suggestion.

n.b.: Okay, I tried it again this evening and now I remember the issues; there are (heresy, I know) far too many OO Writer problems to make it viable, unless I do all the Search and Replace functions for section breaks, headers, P chars, etc., in Word and then import it into Writer and then save it. (I mean, the fact that you can't do a viable S&R on manual page breaks alone is more brain-damage than it's worth.) Also, Writer creates almost as many problems, with a zillion font declarations and paragraph style declarations, as the Word span issues, so it's a wash. I spent 3 hours just changing the para and font declarations, and I've still only made it through Chapter 1...so that's not working.

Any other suggestions by anyone?

Hitch

Last edited by Hitch; 06-13-2010 at 04:08 AM. Reason: Tried it, reporting back.
Old 06-13-2010, 06:36 AM   #19
capidamonte
Not who you think I am...
Posts: 346
Karma: 5337
Join Date: Jan 2010
Location: Honolulu
Device: Sony PRS-350
1) If you know the guy doing the scanning, why not get him to send you something a little more basic than a Word file? Surely he has an interim format that is more useful to you.

2) Any chance you could post some fragment of a file here that we could take a look at and try with various ideas?

cap

ps: I once reformatted a scanned PDF to HTML conversion that took something like 60 hours of work to make look right. I'm still not sure it's completely correct. I certainly won't do it again.
Old 06-13-2010, 07:15 PM   #20
Hitch
Hi, capidamonte:

First, I should have better explained, but I shortcutted when I made the POD reference. In order to best serve my clients, most of whom are digitizing rights-reverted books, having them receive a Word file that is already perfectly formatted for Print On Demand (with a few minor changes on the copyright page), is in their best interests. They can make a few changes to the copyright page, output it to pdf and the whole book is perfectly formatted for Print on Demand. This makes more work for yours truly, the slob who's doing the ebooks, because I have to remove all that crap, but it gives my clients the option to provide their backlist in EITHER format, without them paying additional monies to create differing files. Do you see what I'm saying?

Besides: if it's any more "basic" than a Word file, it doesn't really help me. I can have him send me a txt file, but that puts me right back where I am; once I remove all the bloody formatting, I have to go through the thing page-by-page and put back IN all the italics, blockquotes, bladdy-blah-blah. And the html he outputs is the same as what I convert from the Word file, so that doesn't help me.

I'm in the middle of an experiment using Word to do one set of things (change the page margins, eliminate the section breaks and s&r all the soft hyphens) and then using Sigil to do the regex searches to eliminate all the bloody spans. (I would use Crimson Editor for the regex, since Sigil is a titch sloooooooow during the saves, but the word-wrapping problem in CE is driving me bats.) If that doesn't work as I think it might, I'll copy a chunk of text and put it in a file and upload it here.

Thanks, seriously,

Hitch
Old 06-13-2010, 07:40 PM   #21
Hitch
Hi, cap:

Well, I thought I'd figured out a way to get this to work, but I can't seem to get around the span problems. I've attached a sampling from the live file I'm working in right now, as a txt file.

As you'll see, some of the spans need to be deleted and replaced with a space; some don't. There does not seem to be any rhyme or reason as to what type of span appears where. I had some success with uploading the file to OO Writer and then tweaking it in Sigil; so my process may end up being something horrid like open in Word; s&r the section breaks and soft hyphens; save; open in OOWriter, then save as html; THEN open in Sigil and remove all the bloody font declarations. No matter how you slice it, this is just a pain. Ignore the page headers that are still in there; I was going to experiment with using regex to simplify that process, as well.

Hitch
Attached Files
File Type: txt demo.txt (5.0 KB, 94 views)
Old 06-13-2010, 08:16 PM   #22
JSWolf
Resident Curmudgeon
Posts: 37,713
Karma: 18475602
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
Use Bookdesigner. Save your Word document as RTF and load it into BD. Make any edits you need to make. Save it as HTML and it will be a lot easier to clean up than HTML from Word.

I have done this with some pretty bad HTML code pulled out of some older Mobipocket eBooks. Was a lot easier to clean up the HTML and convert to ePub.
Old 06-13-2010, 08:47 PM   #23
Hitch
{sigh} Nothing ventured, nothing gained. I'll try it. I wonder how many more pieces of software I can schlep into this process?

Thanks for the info, I'll give it a shot.

Hitch
Old 06-13-2010, 09:07 PM   #24
capidamonte
Hitch,

First, this is obviously output from Sigil -- and at that, it's fairly doable with regex.

You've basically got useless, undifferentiable <span> tags marking spaces and hyphens. I ran the following in my text editor:

Code:
  1. REMOVE <p class="MsoNormal sgc-\d+"><span class="sgc-\d+">\d+(\&nbsp\;)*<span class="sgc-\d+">SALLY WRIGHT</span></span></p>
  2. REMOVE SPACEclass="MsoNormal sgc-\d+"
  3. REMOVE </i><i>
  4. REPLACE </span><span class="sgc-\d+">ing WITH ing
  5. REPLACE </span><span class="sgc-\d+"> WITH SPACE
  6. REMOVE <span class="sgc-\d+">
  7. REMOVE </span>
in the order listed.

(SPACE is a single blank space. REMOVE means replace with nothing.)

I was left with one error (Cum berland) that was found by spell-check.

Now, that's particular to this text. If your others are markedly similar, it could work. Worst cases can be found via spell check, and extra spaces can be easily fixed via regex (search: \s\s+, replace with: a single space).

Write all this regex as a macro, and it will take one click to fix an entire book.
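For anyone without a macro-capable editor, the seven steps listed above could also be run as a small script. This is only a rough sketch in Python, not what I actually ran: the function name clean_spans is made up for illustration, the patterns are copied from the list (minus the unneeded backslashes), and the final pass collapses runs of plain spaces rather than all whitespace so that line breaks survive.

```python
import re

# The seven steps from the list above, applied in order as (pattern, replacement).
# "REMOVE" becomes an empty replacement; "SPACE" becomes a single blank.
STEPS = [
    # 1. Page-header paragraphs: page number, optional &nbsp; run, author name.
    (r'<p class="MsoNormal sgc-\d+"><span class="sgc-\d+">\d+(&nbsp;)*'
     r'<span class="sgc-\d+">SALLY WRIGHT</span></span></p>', ''),
    # 2. Leftover class attributes on paragraphs (note the leading space).
    (r' class="MsoNormal sgc-\d+"', ''),
    # 3. Merge adjacent italic runs.
    (r'</i><i>', ''),
    # 4. A span boundary directly before "ing" is a broken word: join it.
    (r'</span><span class="sgc-\d+">ing', 'ing'),
    # 5. Every other span boundary is treated as a gap between words.
    (r'</span><span class="sgc-\d+">', ' '),
    # 6. Drop the remaining opening span tags.
    (r'<span class="sgc-\d+">', ''),
    # 7. Drop the remaining closing span tags.
    (r'</span>', ''),
]


def clean_spans(html: str) -> str:
    """Apply the substitutions in the listed order (hypothetical helper)."""
    for pattern, replacement in STEPS:
        html = re.sub(pattern, replacement, html)
    # Final tidy-up: collapse runs of spaces into one space.
    return re.sub(r' {2,}', ' ', html)
```

Order matters here: step 4 has to run before step 5, or the "ing" joins would already have been turned into spaces. Step 4 will also wrongly glue together any real word that happens to start with "ing", so a spell-check pass afterwards is still needed.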

One that I would worry about would be the SALLY WRIGHT header/footer -- is it consistent enough? Is there a pattern to the inconsistency?

Also, are there other identifiable prefixes/suffixes like ing that are recognizable?

If you're getting different, dissimilar results in every book, then I'd suggest posting a section of the actual Word or Writer HTML, as there may be loss of better identifiable patterns in translation to Sigil.

Also, Jon may have suggested a good answer; a Book Designer HTML0 to Sigil pre-processor script was recently posted here in the forums that's supposed to improve importation.

cap
Attached Files
File Type: txt demo-regexed.txt (2.9 KB, 65 views)
Old 06-14-2010, 02:58 AM   #25
Hitch
Hi, cap:

First, big-time thanks to you. Second, somehow, I managed to not provide the segment of code I thought I was providing; there are innumerable instances of the "Cumberland" problem in the full book (and all the other books produced by this process), in which some closing spans-opening spans should be spaces, some should be nothing, but not all "nothings" have recognizable suffixes; they're simply where the word was broken in the original justified text. Maybe a regex s&r would work better than deleting all the "optional hyphens" in Word. Hmmm.
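One way the "Cumberland" problem described above might be attacked, sketched in Python: instead of looking for suffixes, check the two halves around each span boundary against a word list, and join them only when the joined form is a known word and the halves are not both words on their own. This is a swapped-in heuristic, not something anyone in this thread has tested on the real files. KNOWN_WORDS is a tiny hypothetical stand-in for a proper dictionary file, and the function names are invented for illustration.

```python
import re

# Hypothetical stand-in for a real dictionary; in practice, load a word list.
KNOWN_WORDS = {"the", "cum", "cumberland", "river", "bank"}

# A word, a span boundary, and (peeked at, not consumed) the word after it.
BOUNDARY = re.compile(r'(\w+)</span><span class="sgc-\d+">(?=(\w+))')


def _resolve(match: re.Match) -> str:
    left, right = match.group(1), match.group(2)
    both_halves_known = (left.lower() in KNOWN_WORDS
                         and right.lower() in KNOWN_WORDS)
    # Join only when the joined form is a word and the halves are not
    # both words by themselves; otherwise treat the boundary as a space.
    if (left + right).lower() in KNOWN_WORDS and not both_halves_known:
        return left
    return left + " "


def fix_span_boundaries(html: str) -> str:
    """Replace word-to-word span boundaries with nothing or one space."""
    return BOUNDARY.sub(_resolve, html)
```

Boundaries that sit next to punctuation are left alone by this pattern, so the plain span-removal steps would still have to run afterwards. Pairs like "to" + "day" stay ambiguous (both halves and the joined form are words), so those get a space and would need a manual look.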

The page headers--the Title and the Author headers--really concern me the least of my issues. It's the spans that are driving me daft. I can, and have, "fix" this by doing the whole bloody book manually--it's about a 3-hour job--but I'd rather not, mostly because once I've stripped all the formatting, I worry that I'll miss italicization or some other small thing. Of course, any time I can do something faster, I'm happy with that, also.

I'm going to try the Bookdesigner thing, but I admit that I don't see how a template will "fix" the spans issue, which is created entirely by Word in its attempts to interpret what it's been fed by the OCR. As I said, though, nothing ventured, nothing gained. If this doesn't work, I'm going to try your regex "fix" on the raw html file (what html editor are you using? I seem to have a pretty endless series of mystery problems using Crimson Editor when it comes to word-wrapping, which apparently can't be "undone," because it apparently can't "see" wrapped words as contiguous for regex searches.)

THANKS, guys, your input keeps me from tearing my hair out,

Hitch
Old 06-14-2010, 06:28 AM   #26
Hitch
Hi:

quick update: the BD template has two very handy macros--tag italic and tag bold--which you can use to identify those separately, then strip all the existing formatting and replace it with a basic font/para/etc. selection. So far, so good; I think if I start in Word, and use its best features (the s&r for section breaks, etc.), then put it into the BD template and use the tag features, strip the funky formatting, create an epub and then put it into Sigil for the finishing touches, that should work great--I think. Famous Last Words, LOL!

I'm going to put up a new thread about this shortly, BUT, while we're all hanging out here, anyone found a good way to put in reviews as metadata? Or any other way to squeeze reviews in, since I can't manually input them onto the Amazon, etc. pages?

Thx,

Hitch
Old 06-14-2010, 06:26 PM   #27
capidamonte
Are you asking about where in an ePub to store such info? Try the comments field.

cap

ps: I used a text editor to do the regex. I highly recommend using a text editor instead of any sort of word processor to develop your xhtml. 'Course, I've been doing it that way for a while, and I've got some good practices. Once you've got it right, and clean, the import/conversion process goes very, very well.
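On the comments-field suggestion above, here is a rough Python sketch of what writing review text into an ePub's metadata could look like. It assumes the "comments field" ends up as a dc:description element in the book's OPF file, which is my understanding of how such fields are usually stored, not something confirmed in this thread. The function name add_description is made up, and whether any reseller actually displays the field is a separate, open question.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"


def add_description(opf_xml: str, text: str) -> str:
    """Append a dc:description element to the OPF <metadata> block.

    Assumption: the "comments field" maps to dc:description.
    """
    # Keep the usual prefixes when the tree is written back out.
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("", OPF_NS)
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        raise ValueError("no <metadata> element found in the OPF")
    description = ET.SubElement(metadata, f"{{{DC_NS}}}description")
    description.text = text
    return ET.tostring(root, encoding="unicode")
```

The OPF would have to be pulled out of the epub (it is a zip archive) and put back afterwards; that part is left out here.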
Old 06-14-2010, 07:09 PM   #28
capidamonte
Oh, I found the HTML0 to HTML script here.

cap
Old 06-14-2010, 07:56 PM   #29
Hitch
Howdy:

I thought I'd put in one of these posts (?) that I'm using Crimson Editor--and in fact asked you which one you were using, because I occasionally have problems with regex searches in CE when I am also using WordWrap.

I wouldn't consider using a wordprocessor to generate xhtml or html, for that matter. I only recently moved "up" to CE from notepad, LOL!!

Yes, I was wondering if anyone had had any luck "squishing" reviews into a comments field or other miscellaneous field that would feed the various resellers for display, like LibreDigital (for those of us who do NOT have Macs!!). For example, Sigil has a "reviewer" metadata field, but nothing for the review itself; I never have figured out the point of that one.

Thanks, @cap. I'm working on tweaking my way out of the page header issues (the Title and Author headers page by page) in the BD-Word template, then using its "tag italic" and "tag bold" features to mark what I need to save--that's really been the kicker--and using regex in Sigil for all the other wee bits.

I have a new author I'm starting tonight who fortunately is not enamored of nine bazillion bits of italicization, which will, I hope, make my life easier. I'm going to try to do my whole first "sweep" through the scanned doc in BD and see what happens...although I'm still a little unsure about saving the file as html while in BD...I'll experiment with that (wouldn't it be just as easy to output it as epub and pull it into Sigil for finalization and tweakage?).

I've been doing plain "conversions" of txt, Word, pdf files for a while for clients on their backlist, but this latest experiment with their backlists getting OCR'd and then sent to me has been a whole new joy. I am very grateful for all the help I've been getting here.

Hitch
Old 06-14-2010, 07:58 PM   #30
Hitch
Quote:
Originally Posted by capidamonte
THANKS!

Hitch