Old 07-17-2012, 02:39 PM   #1
Leverpullr
Member
Leverpullr began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Jul 2012
Location: Left Coast
Device: Kindle/generic android tablet
Best Pre-Sigil word processor tool/workflow?

Hi,
I've been a lurker for a while now, but after working on several ebook projects in Sigil, I have a question for others about best practices and tools for getting an ebook manuscript into shape BEFORE importing it into Sigil.

I know it isn't technically a Sigil question, but after struggling with _horrible_ HTML output from MS Word 97 (really, really bad) and Word 2003 (better, but so, so ugly and bloated), I figured there must be better workflow options and tools that would help me avoid fixing hundreds of EPUB validation issues with every book.

My key questions are:
1) Which word processors can export HTML that is cleaner for EPUB/XHTML?
2) What is your workflow like? E.g. original manuscript in MS Word (as most start there) to application X to do Y --> then use __ to ___ --> import to Sigil --> save to .epub.
Old 07-17-2012, 03:38 PM   #2
Rabbi Steve
Junior Member
Rabbi Steve began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Dec 2011
Location: Northern California
Device: Kindle Touch
Great question! I have it too, so thanks for posting it. I look forward to the replies. I just got Scrivener and am still learning what it can do. I know it has EPUB export options, so I wonder if it would fit the bill.

Anyway, thanks again for posting a question I'm sure a lot of us have.
Old 07-17-2012, 04:49 PM   #3
elibrarian
Connoisseur
elibrarian is a splendid one to behold
 
Posts: 87
Karma: 19700
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: Sony PRS-T1, iPad
Quote:
Originally Posted by Leverpullr View Post
Hi,

My key questions are:
1) Which word processors can export HTML that is cleaner for EPUB/XHTML?
2) What is your workflow like? E.g. original manuscript in MS Word (as most start there) to application X to do Y --> then use __ to ___ --> import to Sigil --> save to .epub.
I guess there's a lot of "religion" and personal preference involved here, but in my opinion you would be better off using a text editor or maybe an (X)HTML editor. My preferred one is NoteTab, but there are of course others, or you could copy the clean text into Sigil and edit it there. EPUBs don't need very many formatting codes to look nice, and judging from the posts here and in other mobile forums, people are spending more time cleaning up the nasty code from various word processors than it would take to format a text file.

- but that's of course just my $0.02, and I'm looking forward to the discussion coming up

Regards,

Kim

Last edited by elibrarian; 07-17-2012 at 04:50 PM. Reason: Typos
Old 07-18-2012, 02:55 AM   #4
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
 
Posts: 2,737
Karma: 2117255
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
I just do it in Word. I convert the Word document to HTML with my macro. I do not use the save as (filtered) HTML in Word. I import that in Sigil and I continue there.

I have also done quite some tests with DocToHTML and that produces rather clean code and you can immediately convert it to XHTML on the go. It can also create a stylesheet.

Others use OpenOffice with the Writer2Epub add-on.
Old 07-18-2012, 04:52 AM   #5
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
 
Posts: 2,004
Karma: 10381859
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Quote:
Originally Posted by Leverpullr View Post
Hi,
I've been a lurker for a while now, but after working on several ebook projects in Sigil, I have a question for others about best practices and tools for getting an ebook manuscript into shape BEFORE importing it into Sigil.

I know it isn't technically a Sigil question, but after struggling with _horrible_ HTML output from MS Word 97 (really, really bad) and Word 2003 (better, but so, so ugly and bloated), I figured there must be better workflow options and tools that would help me avoid fixing hundreds of EPUB validation issues with every book.

My key questions are:
1) Which word processors can export HTML that is cleaner for EPUB/XHTML?
2) What is your workflow like? E.g. original manuscript in MS Word (as most start there) to application X to do Y --> then use __ to ___ --> import to Sigil --> save to .epub.
I find most of the "anti-Word" hysteria to be hyperbolic. Yes, it produces messy HTML, but so does OO, no matter what the evangelists for it say; ditto LO. WordPerfect's output is just as bad. Scrivener's MOBIs, I think I read, are currently being rejected (which makes me wonder if they forked Calibre?), as are Calibre's, by and large. I was underwhelmed with Atlantis' output; I think Jutoh works pretty well for DIYers.

The bottom line for me is that you need to know regex. At least a little. At the end of the day, I haven't found a single magic bullet that will automagically clean up Word or any other word-processing output. We very simply clean up the Word file if necessary (we get a lotta, lotta, LOTTA crappy files--I mean, really awful), but mostly we clean the files in HTML, and the tool of choice here is NoteTab Pro. Not NotePad, NoteTab. We extract the HTML and then run a variety of standardized clips to clean it; then we clean up any residual oddities.

So, our process is:

Word (or other input source)-->HTML-->NoteTabPro-->Sigil.
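
For anyone wondering what a regex cleanup pass of this kind might look like, here is a minimal Python sketch. It is not the NoteTab clips mentioned above (those aren't shown in this thread); the function name clean_word_html and the particular patterns are assumptions, chosen to cover artifacts that Word's HTML export commonly leaves behind (conditional comments, <o:p> tags, Mso classes, inline styles).

```python
import re


def clean_word_html(html: str) -> str:
    """Strip a few common Word export artifacts (illustrative, not exhaustive)."""
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # Drop Office namespace tags such as <o:p> and </o:p>
    html = re.sub(r"</?o:[^>]*>", "", html)
    # Drop class=MsoNormal / class="MsoNormal" style attributes
    html = re.sub(r'\s+class="?Mso\w*"?', "", html)
    # Drop inline style attributes (styling is assumed to live in a stylesheet)
    html = re.sub(r'\s+style="[^"]*"', "", html)
    # Unwrap spans that are left without any attributes
    html = re.sub(r"<span>(.*?)</span>", r"\1", html, flags=re.DOTALL)
    # Remove paragraphs that contain only whitespace or &nbsp;
    html = re.sub(r"<p>(?:\s|&nbsp;)*</p>", "", html)
    return html


sample = (
    '<p class=MsoNormal style="margin:0in">'
    '<span style="color:black">Hello<o:p></o:p></span></p>'
    "<p class=MsoNormal><o:p>&nbsp;</o:p></p>"
)
print(clean_word_html(sample))
```

Regexes like these are blunt instruments (nested spans, for instance, would need more than one pass), which is presumably why a residual manual cleanup step still follows.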

From the ePUBs, we have custom Perl scripts and, again, NTP clips that we use to create an inline TOC from the NCX, as well as to make some other mods (usually the guide); we then drop the result on Kindle Previewer/KindleGen for MOBI versions.

That's it. So, our "magic bullet" is simply to work in a super HTML editor. We do the finalization in Sigil, and any post-production copyedits there as well. That's it.
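
As an illustration of the "inline TOC from the ncx" step, here is a rough Python sketch using only the standard library. It stands in for the custom Perl and NoteTab clips, which are not shown here; the function name ncx_to_inline_toc and the flat <ul> output are assumptions, and nested navPoints are simply flattened in document order.

```python
import xml.etree.ElementTree as ET
from html import escape

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}


def ncx_to_inline_toc(ncx_xml: str) -> str:
    """Build a flat HTML list of links from the navPoints of a toc.ncx string."""
    root = ET.fromstring(ncx_xml)
    items = []
    # Visit every navPoint in document order; nesting is flattened
    for nav_point in root.iter(f"{{{NCX_URI}}}navPoint"):
        label = nav_point.find("ncx:navLabel/ncx:text", NCX_NS)
        content = nav_point.find("ncx:content", NCX_NS)
        if label is None or content is None:
            continue  # skip malformed entries rather than guess
        title = escape(label.text or "")
        href = escape(content.get("src", ""), quote=True)
        items.append(f'<li><a href="{href}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


sample = (
    '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/"><navMap>'
    '<navPoint id="n1" playOrder="1"><navLabel><text>Chapter 1</text></navLabel>'
    '<content src="Text/ch1.xhtml"/></navPoint>'
    "</navMap></ncx>"
)
print(ncx_to_inline_toc(sample))
```

The resulting list could be pasted into its own XHTML file in Sigil; how the links should be styled or nested is left open.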

HTH,
Hitch
Old 07-18-2012, 05:36 AM   #6
elibrarian
Connoisseur
elibrarian is a splendid one to behold
 
Posts: 87
Karma: 19700
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: Sony PRS-T1, iPad
Oh well, forgot the question of our workflow, here goes:

  • Scan the original source (a printed book) with FineReader. The source is often printed in blackletter, as we specialize in 19th and early 20th century literature in Danish, so we use FineReader's training ability a lot: there are many variations in the blackletter fonts used back then!
  • Save the result as txt and pdf.
  • Load the text file into NoteTab and process it with a clip program that removes most of the common OCR errors etc. Then proofread on screen with NoteTab and a PDF reader side by side, modernising archaic language and inserting HTML formatting while reading.
  • Copy the proofread text to Word for spelling and grammar checking, then copy it back to NoteTab and process it with another NoteTab clip to produce XHTML source + CSS.
  • Check the XHTML in the W3C Markup Validation Service, then:
  • Load the file into Sigil (the XHTML contains codes that set the necessary metadata automagically and import the front page and other images) and split the file by hand into chunks of max. 250 kB.
  • Save as EPUB, and validate in Sigil and EPUBCheck.
Finis!
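
The splitting step in the list above is done by hand, but the size bookkeeping behind it can be sketched in a few lines of Python. This is only an illustration: the function name split_into_chunks is hypothetical, "250 kb" is assumed to mean 250 * 1024 bytes of UTF-8, and the input is assumed to be already cut into blocks (chapters, say) that must not be split further.

```python
def split_into_chunks(blocks, max_bytes=250 * 1024):
    """Greedily group text blocks so each chunk stays within max_bytes (UTF-8)."""
    chunks, current, size = [], [], 0
    for block in blocks:
        block_size = len(block.encode("utf-8"))
        # Start a new chunk when adding this block would exceed the limit
        if current and size + block_size > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(block)
        size += block_size
    if current:
        chunks.append("".join(current))
    return chunks


# Three 100-byte blocks with a 250-byte limit end up in two chunks
parts = split_into_chunks(["a" * 100, "b" * 100, "c" * 100], max_bytes=250)
print([len(p) for p in parts])
```

A single block larger than the limit still ends up in a chunk of its own, oversized, so very long chapters would need a manual split anyway.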

(Seems like a lot of work, but there aren't really any shortcuts available if you want to produce output of reasonable quality ... the longest part of the process is of course the proofreading - and we like reading here.)

Regards,

Kim
Old 07-18-2012, 04:25 PM   #7
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
 
Posts: 2,004
Karma: 10381859
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Quote:
Originally Posted by Toxaris View Post
I just do it in Word. I convert the Word document to HTML with my macro. I do not use the save as (filtered) HTML in Word. I import that in Sigil and I continue there.

I have also done quite some tests with DocToHTML and that produces rather clean code and you can immediately convert it to XHTML on the go. It can also create a stylesheet.

Others use OpenOffice with the Writer2Epub add-on.
Tox: I don't think I've tried DocToHTML--you say you're getting clean code with that? Better than your own macro?

You know me...always trolling the software, like a great white shark, looking for easier ways to do stuff...

H.
Old 07-19-2012, 02:32 AM   #8
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
 
Posts: 2,737
Karma: 2117255
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
It is rather clean, yes. It retains styles in a separate stylesheet (or an internal one, whatever you want) and has many bells and whistles to tune it to your liking. For example, you can specify that italics are retained in the stylesheet while color usage in the styles is ignored.
It is relatively fast and does not rely on Word's internal HTML conversion engine. Some things I like better in my macro, others better in the program.
The creator is also very helpful. I requested an option to specify which styles to retain, with all others converted to a standard paragraph, and it is on the to-do list. I found a small bug and it was fixed the next day.
One other thing that might interest you is the option to run various RegEx after the conversion as part of the process. I use it to convert placeholders as late as possible and also to do some cleanup.

If you want I can convert a document for you so you can see the result. You know how to reach me.
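
To give an idea of what a post-conversion RegEx pass like the one described above could look like outside the program, here is a small Python sketch. Everything specific in it is hypothetical: the {{SCENEBREAK}} placeholder, the scene-break class and the function name run_post_conversion are made up for illustration and are not taken from DocToHTML.

```python
import re

# Hypothetical rules: (pattern, replacement) pairs, applied in order
RULES = [
    # Turn a placeholder paragraph into a styled horizontal rule
    (r"<p>\s*\{\{SCENEBREAK\}\}\s*</p>", '<hr class="scene-break"/>'),
    # Trim spaces and tabs left in front of a closing paragraph tag
    (r"[ \t]+</p>", "</p>"),
]


def run_post_conversion(html: str, rules=RULES) -> str:
    """Apply each regex rule to the converted HTML, one after another."""
    for pattern, replacement in rules:
        html = re.sub(pattern, replacement, html)
    return html


print(run_post_conversion("<p>One. </p>\n<p>{{SCENEBREAK}}</p>\n<p>Two.</p>"))
```

Keeping the rules in an ordered list makes it easy to see, and to change, which replacement runs first.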
Old 07-19-2012, 04:50 AM   #9
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
 
Posts: 2,004
Karma: 10381859
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Thanks, Tox! I'll take a look at it.

Hitch
Old 07-20-2012, 02:37 AM   #10
bobcdy
Fanatic
bobcdy ought to be getting tired of karma fortunes by now.
 
Posts: 509
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
Quote:
Originally Posted by elibrarian View Post
Oh well, forgot the question of our workflow, here goes:

  • Scanning from original source (printed book) with Finereader (often source is printed with blackletter, as we specialize in 19th and early 20th century literature in danish - so we use the training ability of Finereader a lot, since there are many variations in the blackletter fonts used back then!)
Kim
In the past I've tried (and tried and tried) to train FineReader 10 to handle old English (with ligatures) texts, and I concluded from my efforts that FR10 was a very slow (no?) learner. Eventually I gave up and just used MS word auto-correct options, eventually reaching a fully corrected version of the text - but this was very time consuming because FR10/11 is confused by the ligatures; for example, it would interpret the c-t ligature (probably the most difficult for it) in many different ways.

Finally I purchased the upgraded FR XIX Fraktur edition (Recognition Server v3) when ABBYY greatly reduced its price. Still pretty expensive but certainly worth the price if one works with 18th/early 19th century books. What a difference! RS3 reduced correction/proofing time by many days per book!
Old 07-20-2012, 03:12 AM   #11
elibrarian
Connoisseur
elibrarian is a splendid one to behold
 
Posts: 87
Karma: 19700
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: Sony PRS-T1, iPad
Quote:
Originally Posted by bobcdy View Post
In the past I've tried (and tried and tried) to train FineReader 10 to handle old English (with ligatures) texts, and I concluded from my efforts that FR10 was a very slow (no?) learner. Eventually I gave up and just used MS word auto-correct options, eventually reaching a fully corrected version of the text - but this was very time consuming because FR10/11 is confused by the ligatures; for example, it would interpret the c-t ligature (probably the most difficult for it) in many different ways.

Finally I purchased the upgraded FR XIX Fraktur edition (Recognition Server v3) when ABBYY greatly reduced its price. Still pretty expensive but certainly worth the price if one works with 18th/early 19th century books. What a difference! RS3 reduced correction/proofing time by many days per book!
Well, some of our "blackletter libraries" in FineReader (I have about twenty for different font varieties) started life in FineReader 6.0 ten years ago, and even if we can't get 99.9% correctness (those "long" s's and h, k, l, and t will probably never be anything but guesses), they are by now pretty good at what they do. The FineReader Fraktur edition's price is still pretty stiff for a small-time publisher like us. Our latest project is a Danish edition of Walter Scott's "Old Mortality" from 1870 - at almost 600 pages, the cheapest FR XIX licence would only last 4 volumes ... Sometimes we also have to work with faint photocopies of magazine pages from the early 19th century, which I doubt even the Fraktur Edition would make much of.

It's funny how the human eye and brain are able to fill in the voids in such documents, so we actually can get something legible out of them, isn't it?

regards,

Kim
Old 07-27-2012, 09:36 AM   #12
J.C. Hendee
Author
J.C. Hendee began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Jul 2012
Device: Samsung Galaxy Tab 7+
Libre to Sigil to Vendor

I take an MS Word or any other format manuscript and open it in LibreOffice. All additional content needed for the actual "ebook" is added and positioned... though not the cover or sometimes even other graphics. Styles are cleaned up and streamlined.

Using the plugin "Writer2xhtml", I export the prepared ebook straight to EPUB format. This is then opened up in Sigil for cover addition, tweaking, etc.

I have used multiple layout programs, and things like DreamWeaver, Komodo Edit, etc. etc. I own InDesign and a full Adobe suite and three other packages for layout of traditional publications with export to EPUB as well. I own multiple office suites...

The workflow above is the leanest, quickest, cleanest way to produce purely an ebook and nothing else, if that is what you are after. And it's nearly glitch-free. I rarely see anything go wrong during validation unless I've personally tried to do something odd. I would not do a pure ebook release any other way, especially since I don't use aggregators and only distribute through direct publishing portals such as KDP, PubIt, WritingLife, etc. And only once has an EPUB run into a problem, due to a slightly non-standard cover image.
Old 08-03-2012, 12:58 AM   #13
roger64
Wizard
roger64 ought to be getting tired of karma fortunes by now.
 
Posts: 1,314
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
Quote:
Originally Posted by J.C. Hendee View Post
I take an MS Word or any other format manuscript and open it in LibreOffice. All additional content needed for the actual "ebook" is added and positioned... though not the cover or sometimes even other graphics. Styles are cleaned up and streamlined.

Using the plugin "Writer2xhtml", I export the prepared ebook straight to EPUB format. This is then opened up in Sigil for cover addition, tweaking, etc.

I have used multiple layout programs, and things like DreamWeaver, Komodo Edit, etc. etc. I own InDesign and a full Adobe suite and three other packages for layout of traditional publications with export to EPUB as well. I own multiple office suites...

The workflow above is the leanest, quickest, cleanest way to produce purely an ebook and nothing else, if that is what you are after. And it's nearly glitch-free. I rarely see anything go wrong during validation unless I've personally tried to do something odd. I would not do a pure ebook release any other way, especially since I don't use aggregators and only distribute through direct publishing portals such as KDP, PubIt, WritingLife, etc. And only once has an EPUB run into a problem, due to a slightly non-standard cover image.
I fully agree. Exactly the same experience with writer2xhtml, in French, though I work directly from Apache OpenOffice.

I complement writer2xhtml with Sigil, mainly to add drop caps and various decorations. Here is the latest example EPUB.
Old 08-03-2012, 09:38 PM   #14
FatDog
Witless protection Agent
FatDog ought to be getting tired of karma fortunes by now.
 
Posts: 265
Karma: 1002898
Join Date: Nov 2009
Location: Los Angeles
Device: Kindle
Here is something to add to your toolkit: JEdit

This is a free programmer's editor. It is outstanding with text.

But the best part is the macro language.

You can record temporary macros and it shows you all the commands it created. You can then use this as the basis for a more permanent macro.

You do NOT want to use this to create content, but for sucking in .txt or .html files and cleaning them up or doing regular expression search or replaces - it works great.

I take text files and re-format the paragraphs, then add <p> tags.

Then another macro searches for chapter breaks and converts them to <div id='chapter'> tags. Etc.
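
The two macro steps above (re-flow paragraphs and add <p> tags, then mark chapter breaks) can be sketched in Python for anyone who doesn't use jEdit. This is not the macro itself: the function name text_to_xhtml_body is hypothetical, paragraphs are assumed to be separated by blank lines, and a chapter break is assumed to be a paragraph starting with the word "Chapter". Since an id has to be unique within an XHTML document, the sketch uses class="chapter" instead of id='chapter'.

```python
import re
from html import escape


def text_to_xhtml_body(text: str) -> str:
    """Wrap blank-line-separated paragraphs in <p>, chapter headings in <div>."""
    out = []
    # Paragraphs are assumed to be separated by one or more blank lines
    for para in re.split(r"\n\s*\n", text.strip()):
        # Rejoin hard-wrapped lines into a single line
        para = " ".join(line.strip() for line in para.splitlines())
        if not para:
            continue
        if re.match(r"chapter\s+\w+", para, flags=re.IGNORECASE):
            out.append(f'<div class="chapter">{escape(para)}</div>')
        else:
            out.append(f"<p>{escape(para)}</p>")
    return "\n".join(out)


print(text_to_xhtml_body("Chapter One\n\nIt was a dark\nand stormy night."))
```

Escaping with html.escape keeps stray & and < characters in the text from breaking XHTML validation later on.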
Old 08-04-2012, 01:25 PM   #15
Serpentine
Evangelist
Serpentine ought to be getting tired of karma fortunes by now.
 
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
I just use plaintext markup and text2tags to output xhtml. I have a little wiki-like app for keeping it all together and providing a Kate kpart based interface. Might release it in the future, it's pretty similar to old Sigil in some ways.

I made a few modifications to the markup and generation to make things nicer. The output is pretty much just thrown into Sigil to be compiled into an EPUB; very little editing needs to be done unless different styles need to be applied.

All times are GMT -4.