MobileRead Forums > E-Book Software > Sigil

04-08-2013, 04:38 AM   #1
ecbritz
Book Concocter
ecbritz began at the beginning.
 
Posts: 59
Karma: 10
Join Date: Jun 2010
Location: China
Device: Sony Reader
How do I clean up WordPerfect-published HTML files?

I have just started learning to use Sigil and find myself in unknown territory. On the first pages of the Tutorials, I learned that HTML files published by MS Word must be filtered or cleaned up to remove unnecessary or undesirable HTML tags. I'm told a special macro will prepare Word-published HTML files for Sigil.

I use WordPerfect X6 (the latest version of WordPerfect), not MS Word. When I publish to HTML with WP X6, I have to choose between (i.e. check or uncheck) "Publish Comments", "Launch Browser" or "Plain HTML". When I choose (check) "Publish Comments", the HTML file published looks OK when opened with Sigil. When I choose (check) "Plain HTML", the file published does open in Sigil, but it seems to have lost much of its formatting.

How should I use Sigil in conjunction with WordPerfect X6? Is it necessary to filter or clean up WordPerfect-published HTML files, and if so, how should I do that? And is "Publish Comments" indeed the publish-to-HTML option I should use in WordPerfect X6?
04-08-2013, 06:16 AM   #2
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
 
Posts: 2,873
Karma: 2809711
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
The last time I worked with WordPerfect was with release 6.0. To me 'plain HTML' sounds the most logical, but hey, that's me.

Can you perhaps post a small example of the two different export methods? Based on these examples we can see how 'clean' it is.
04-08-2013, 06:38 AM   #3
exaltedwombat
Evangelist
exaltedwombat ought to be getting tired of karma fortunes by now.
 
Posts: 433
Karma: 1703930
Join Date: Nov 2011
Device: none
The other approach is to export plain text, then add the minimum necessary formatting in Sigil.
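As a rough illustration of that route, the Python sketch below turns an exported plain-text file into bare `<p>` paragraphs that could be pasted into Sigil's Code View. It assumes paragraphs are separated by blank lines; the function name `text_to_paragraphs` is hypothetical, not part of Sigil or WordPerfect.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in a <p> element.

    Assumption: the exported .txt file separates paragraphs with blank lines.
    """
    # Split on one or more blank lines (also tolerates Windows \r\n line endings).
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Collapse internal line breaks and runs of spaces into single spaces.
        words = " ".join(block.split())
        if words:
            # Escape &, < and > so the text is valid inside XHTML.
            paragraphs.append(f"<p>{html.escape(words, quote=False)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(text_to_paragraphs("First paragraph.\n\nSecond & last\nparagraph."))
```

Headings and emphasis would then be added by hand in Sigil, as suggested above.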
04-08-2013, 06:56 AM   #4
mrmikel
Book Twiddler
mrmikel ought to be getting tired of karma fortunes by now.
 
Posts: 1,943
Karma: 1405001
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
You don't need to Launch Browser, or include comments, unless they are a part of what you want to work on. Don't check Plain HTML either. The output is fairly simple and no problem for Sigil to cope with.

Wordperfect will put style information at the top of the document which you can leave there or move to a stylesheet which you can link to the document. Any images will need to be positioned by you in Sigil, though they will be there.

This is my experience with WP X5.
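A minimal sketch of the "move it to a stylesheet" option, assuming the styles sit in one or more `<style>` blocks in the `<head>` of the exported file. The function name and the default `../Styles/style.css` path are assumptions for illustration; check where your own Sigil project keeps its stylesheets.

```python
import re

# Matches a whole <style>...</style> block; group 1 is the CSS inside it.
STYLE_RE = re.compile(r"<style[^>]*>(.*?)</style>\s*", re.IGNORECASE | re.DOTALL)


def extract_stylesheet(doc: str, href: str = "../Styles/style.css") -> tuple[str, str]:
    """Move embedded <style> blocks into a separate stylesheet.

    Returns (new_document, css_text). The default href is a hypothetical path.
    """
    css_parts = []
    for block in STYLE_RE.findall(doc):
        # Older HTML often hides CSS from ancient browsers inside <!-- ... -->.
        css = block.strip().removeprefix("<!--").removesuffix("-->").strip()
        if css:
            css_parts.append(css)
    link = f'<link href="{href}" rel="stylesheet" type="text/css"/>'
    doc = STYLE_RE.sub("", doc)
    # Insert a single <link> just before </head>; a function avoids backslash escapes.
    doc = re.sub(r"</head>", lambda m: link + "\n" + m.group(0), doc,
                 count=1, flags=re.IGNORECASE)
    return doc, "\n".join(css_parts)


if __name__ == "__main__":
    page = ('<html><head><title>t</title><style type="text/css">'
            '<!-- p {margin:0} --></style></head><body><p>x</p></body></html>')
    new_page, css = extract_stylesheet(page)
    print(css)
    print(new_page)
```

The returned CSS text would be saved as the stylesheet file the `<link>` points to.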
04-08-2013, 05:27 PM   #5
ecbritz
Book Concocter
ecbritz began at the beginning.
 
Posts: 59
Karma: 10
Join Date: Jun 2010
Location: China
Device: Sony Reader
Thanks for the advice; I appreciate it.

In response to the request for an example of an HTML file published with WordPerfect X6, I have attached a zip file containing one. It is the first page of a document called "Sigil Instructions", which I am writing in WordPerfect for my personal use and plan to convert to HTML and finally to ePub with Sigil.

I added a photo and a footnote with WordPerfect. I also kept "Publish Comments" switched on when I did File > Publish to > HTML in WordPerfect - just to add possible "unclean" html tags.

My first question is: Does this particular html file need special filtering or cleaning before being imported into Sigil? Or is it clean enough as is? If it needs cleaning, how do I clean it?

A second question. The footnote I added in WordPerfect comes after the words "3. Prepare a HTML file for Sigil". However, in the HTML file (and also after importing it into Sigil) the footnote appears only as a call icon, and the note text itself cannot be viewed or read. How should I place this footnote so that it is readable in the ePub book I am creating with Sigil?
Attached Files
File Type: zip SIGIL INSTRUCTIONS.zip (2.6 KB, 18 views)
04-08-2013, 06:04 PM   #6
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 14,293
Karma: 5495472
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
All those script sections and inline styles...

Since it is for your own use... do you care if it is not really code?
04-08-2013, 07:02 PM   #7
mrmikel
Book Twiddler
mrmikel ought to be getting tired of karma fortunes by now.
 
Posts: 1,943
Karma: 1405001
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Why the JavaScript? If you are working on something for e-ink readers, they can't use it.

Fancy boxes can be displayed on some, but not all readers.

WP will generate relatively useful code, but not if you load it down with tons of styles and javascript.

You just can't control the placement of everything in e-readers as you can, and perhaps should, on paper. Tables are problematic and have trouble keeping a usable size if they contain much at all.

If it is for yourself, for a particular device, you can do anything so long as it works for you. Inline styles work, but they can be a real bugbear if you decide you want to change, for example, how the names of ships or aircraft should be displayed. Italic, bold, big, small: all of these are easily done with CSS, where you change the definition one time. The way this is structured, you will be editing it forever if it needs any changes.
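To make the inline-versus-CSS point concrete, here is a hedged Python sketch that replaces one repeated inline style (say, italics for ship names) with a class, and produces the single CSS rule that takes its place. `inline_to_class` and the `ship` class name are hypothetical; the sketch only handles `style` attributes whose value exactly matches the given declaration, and it does not merge with an existing `class` attribute.

```python
import re


def inline_to_class(doc: str, style: str, class_name: str) -> tuple[str, str, int]:
    """Swap style="<style>" for class="<class_name>" and build the matching CSS rule.

    Returns (new_document, css_rule, number_of_replacements).
    Limitation: an element that already has a class attribute would end up with two.
    """
    # Allow optional whitespace and a trailing semicolon inside the attribute value.
    pattern = re.compile(r'style="\s*' + re.escape(style) + r'\s*;?\s*"')
    new_doc, count = pattern.subn(f'class="{class_name}"', doc)
    rule = f".{class_name} {{{style};}}"
    return new_doc, rule, count


if __name__ == "__main__":
    doc = ('<p>The <span style="font-style:italic">Titanic</span> and the '
           '<span style="font-style:italic;">Bismarck</span>.</p>')
    new_doc, rule, n = inline_to_class(doc, "font-style:italic", "ship")
    print(n, rule)
    print(new_doc)
```

After this, changing how every ship name looks means editing only the `.ship` rule in the stylesheet.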

Many readers only display shades of gray and some will be unusable as text colors.

If you have a flurry of existing documents in WordPerfect, you can save them as HTML, but you will be using regular expressions to pull out all the superfluous stuff. I didn't try Plain HTML because the sample I pulled up didn't seem to need it, but that may be the best course for you if you want to avoid a lot of searching and replacing. Put in the emphasis yourself and change margins yourself in Sigil to get the effect that is possible in this limited format.
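One possible shape for that clean-up, sketched in Python under the assumption that the unwanted extras are `<script>` blocks and JavaScript event attributes such as `onload`. Regular expressions are a blunt tool for HTML, so check the result in Sigil's Code View; the function name is hypothetical.

```python
import re

# Whole <script ...>...</script> elements, plus any whitespace after them.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script>\s*", re.IGNORECASE | re.DOTALL)
# JavaScript event-handler attributes such as onload="..." or onclick="...".
# Caveat: this simple pattern does not check that the match is inside a tag.
EVENT_ATTR_RE = re.compile(r'\s+on[a-z]+\s*=\s*"[^"]*"', re.IGNORECASE)


def strip_web_extras(doc: str) -> str:
    """Remove scripting that e-ink readers cannot run anyway."""
    doc = SCRIPT_RE.sub("", doc)
    return EVENT_ATTR_RE.sub("", doc)


if __name__ == "__main__":
    page = ('<body onload="init()"><script type="text/javascript">'
            'var a = 1 < 2;</script>\n<p>Text</p></body>')
    print(strip_web_extras(page))
```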

If you've got to have fixed text and spacing, then WP will easily spit out PDFs, which many of the readers can display. Just make the page size conform to a typical reader, as they have done at feedbooks.com.

Footnotes are probably going to need to be constructed in Sigil as endnotes to each chapter to avoid jumping out of the document, which can slow things down. It is easily done in Sigil, but you will have to link to them and link back from them, since you should not assume every reader has a back button, nor that they should go back to the exact place they came from.
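A hedged sketch of that link-there-and-back structure, in Python. It assumes each footnote has been marked in the chapter text with a made-up `[[fn: ...]]` placeholder (not a WordPerfect or Sigil feature), turns each one into a numbered note reference, and appends the notes at the end of the same chapter, each linking back to its own reference. The ids (`ref1`, `note1`, ...) start with a letter, as XHTML ids should.

```python
import re

# Hypothetical placeholder syntax for a footnote typed into the chapter text.
FOOTNOTE_RE = re.compile(r"\[\[fn:(.*?)\]\]", re.DOTALL)


def footnotes_to_endnotes(chapter_html: str) -> str:
    """Turn [[fn: ...]] placeholders into linked endnotes at the end of the chapter."""
    notes = []

    def to_reference(match: re.Match) -> str:
        n = len(notes) + 1
        # The note links back to the exact reference it came from.
        notes.append(f'<p id="note{n}"><a href="#ref{n}">{n}.</a> {match.group(1).strip()}</p>')
        # The reference is a superscript link forward to the note.
        return f'<sup><a id="ref{n}" href="#note{n}">{n}</a></sup>'

    body = FOOTNOTE_RE.sub(to_reference, chapter_html)
    if not notes:
        return body
    return body + "\n<hr/>\n" + "\n".join(notes)


if __name__ == "__main__":
    print(footnotes_to_endnotes(
        "<p>Prepare an HTML file for Sigil.[[fn: See the tutorial.]]</p>"))
```

Keeping the notes in the same chapter file follows the advice above about not jumping out of the document.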

In short, it will not translate like you want it to translate. But then that is why Sigil exists.

You can download an EPUB from the MR library here and examine its stylesheet, either by unzipping it or by opening it in Sigil. You can copy the stylesheet to your hard drive and alter it to suit your needs. Then paragraph style, for instance, can be controlled by the stylesheet, after you use regular expressions to eliminate everything that follows <p inside the opening paragraph tags.
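For the `<p` clean-up, a regular expression along these lines could do the job. The Python wrapper is only a hedged sketch for testing the pattern; the same find pattern `<p\s[^>]*>` with the replacement `<p>` should also work in Sigil's Find & Replace with its mode set to regex. Note that it also drops any classes you might want to keep.

```python
import re

# "<p" followed by whitespace and any attributes, up to the closing ">".
# Requiring whitespace after "p" leaves <pre>, <param> and plain <p> alone.
P_ATTRS_RE = re.compile(r"<p\s[^>]*>")


def strip_paragraph_attributes(doc: str) -> str:
    """Reduce every <p ...> opening tag to a bare <p>, so the stylesheet's p rule applies."""
    return P_ATTRS_RE.sub("<p>", doc)


if __name__ == "__main__":
    print(strip_paragraph_attributes(
        '<p class="MsoNormal" style="margin:0">A</p><pre>x</pre><p>B</p>'))
```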
04-09-2013, 05:04 AM   #8
ecbritz
Book Concocter
ecbritz began at the beginning.
 
Posts: 59
Karma: 10
Join Date: Jun 2010
Location: China
Device: Sony Reader
Thanks for the comments, I will certainly study them closely.

We seem to have a misunderstanding regarding my knowledge of Sigil and HTML files. I know nothing of either. I am a Junior Forum Member in the Kindergarten phase. The "script" or "JavaScript" codes or tags you discovered in the HTML file I produced by doing File > Publish to > HTML in WordPerfect X6, have no meaning for me, due to my ignorance at this point in time.

What I want to achieve, is to learn to use Sigil properly, step by step, so that in the end, I will be able to produce a well-formatted eBook, suitable for all or most e-Readers. I would like to take the second step in my learning curve - namely learning Sigil formatting and editing techniques - after getting over a problem encountered on taking the first step.

The first step I took was to start reading the first, introductory Sigil tutorial. I noted down what I learnt on the first tutorial pages. I published these simple notes, filling about one page, as an HTML file with WordPerfect, and attached that page to my previous posting so you could see how clean or unclean my WordPerfect HTML publication was.

Before I had reached the end of the first tutorial and the first page of my personal "Sigil Instructions" document, I hit the first snag, the first obscurity or problem.

The tutorial instructed me to import a "cleaned up" HTML file into Sigil, using a cleaning macro available for HTML files published by MS Word. Huh? What? I asked in my newbie cluelessness. But I'm using WordPerfect X6! How am I going to "clean up" my WordPerfect-published "Sigil Instructions" document?

At this point - where I am still stuck - I turned to the Sigil Forum. By some magic it turned out that I was already a member. I must have joined when I bought a Sony Reader in 2010. The Sony Reader with its dark screen was no good, so I never used the Forum.

What I need to know at this point is how best to import a clean-enough HTML file produced with WordPerfect X6. Once I have imported content that suits Sigil and can serve as the basis for a good and proper e-Book, I can progress to step two. Step two, I suspect, will be instructions on how to format and edit the content imported into Sigil.

I hope I have made my dilemma clear now. Just help me to overcome or bypass the obstacle I encountered on taking the first step please. Then I can continue writing the previously attached document, as I progress to step two of learning to use Sigil.
04-09-2013, 05:25 AM   #9
Doitsu
Wizard
Doitsu ought to be getting tired of karma fortunes by now.
 
Posts: 1,871
Karma: 4630362
Join Date: Dec 2010
Device: Kindle PW2
If the HTML output generated by WordPerfect is causing problems, you could also try and save your document as an .rtf file, convert it to ePub with Calibre and fine-tune the .ePub file with Sigil.
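If you go the RTF route, calibre also ships a command-line converter, `ebook-convert`, which picks the output format from the output file's extension. A minimal Python wrapper, assuming calibre is installed and `ebook-convert` is on your PATH; the function names and file names are placeholders.

```python
import shutil
import subprocess


def rtf_to_epub_command(rtf_path: str, epub_path: str) -> list[str]:
    """Build the calibre command line; the .epub extension selects the output format."""
    return ["ebook-convert", rtf_path, epub_path]


def convert_rtf_to_epub(rtf_path: str, epub_path: str) -> None:
    """Run calibre's converter, failing loudly if it is missing or reports an error."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("calibre's ebook-convert was not found on PATH")
    subprocess.run(rtf_to_epub_command(rtf_path, epub_path), check=True)


if __name__ == "__main__":
    # Prints the command only; call convert_rtf_to_epub() to actually run it.
    print(" ".join(rtf_to_epub_command("book.rtf", "book.epub")))
```

The resulting EPUB can then be opened in Sigil for fine-tuning.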

Alternatively, if you didn't use any fancy formatting in your book, you could simply select all the text in WP (CTRL+A), copy it to the clipboard (CTRL+C), open a new Sigil project and paste the text into the default window (CTRL+V).

Then search for the chapter headings, highlight them and click the h3 button on the Sigil toolbar. When you're done, press CTRL+T to generate the TOC and you're almost done.
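If there are many chapters, the heading step could also be done on the pasted markup with a regular expression. A hedged Python sketch, assuming each heading was pasted as its own paragraph beginning with "Chapter" followed by a number, a Roman numeral or a capitalized word; adjust the pattern to your own headings. The function name is hypothetical.

```python
import re

# A paragraph made of "Chapter" + (digits | Roman numeral | capitalized word) + the rest.
HEADING_RE = re.compile(
    r"<p>\s*(Chapter\s+(?:\d+|[IVXLC]+|[A-Z][a-z]+)\b[^<]*?)\s*</p>"
)


def mark_chapter_headings(doc: str, level: int = 3) -> str:
    """Turn chapter-heading paragraphs into <hN> headings (h3 by default)."""
    return HEADING_RE.sub(lambda m: f"<h{level}>{m.group(1)}</h{level}>", doc)


if __name__ == "__main__":
    print(mark_chapter_headings(
        "<p>Chapter 1</p>\n<p>It began.</p>\n<p>Chapter Two: Home</p>"))
```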

Last edited by Doitsu; 04-09-2013 at 05:28 AM.
04-09-2013, 05:59 AM   #10
exaltedwombat
Evangelist
exaltedwombat ought to be getting tired of karma fortunes by now.
 
Posts: 433
Karma: 1703930
Join Date: Nov 2011
Device: none
If you're writing a book from scratch, just write plain text in a basic text editor - Wordpad maybe. Don't attempt ANY formatting at this stage. Or write directly into Sigil.

Then add a simple Header text style for chapter headings. Learn enough css to make paragraphs look the way you want - spacing, indent etc.

CREATING an eBook is easy. Putting in too much complicated layout in a WP program then trying to CONVERT it is a nightmare.

Printed books are about page layout. eBooks are about pouring text into a container.
04-09-2013, 07:59 AM   #11
mrmikel
Book Twiddler
mrmikel ought to be getting tired of karma fortunes by now.
 
Posts: 1,943
Karma: 1405001
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
I have experimented around a little this morning in X5. Publish to HTML is intended for web documents. Even in Plain HTML it throws in extra stuff that a browser will ignore but Sigil and an ereader will not.

If you want to stay in WP because it is familiar, then Doitsu's suggestion of copying in WP and pasting into book view works just fine.

You can just try typing into Sigil doing very much what you do in WP. If Sigil's functions seem limited, it is what you will be ultimately limited to anyway. Code view is like reveal codes for WP, though not so easy to decipher. You can find out about HTML at http://www.w3schools.com/html/default.asp

There will be some items, like footnotes, that will be copied, but you will have to link them to and from the text in Sigil. The anchor button puts in a place you can go to, and the chain-link button a place you can click to get there. You will have to keep track of the names yourself, but since you get to name them, you can make them something easy for you to remember. (Just don't start either name with a number; that is a no-no.)
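In plain XHTML, an anchor and a link to it look roughly like the markup this small helper produces. The helper is a hypothetical Python sketch that also enforces the naming rule above; its check is a simplified ASCII version of the XML name rule (start with a letter or underscore, no spaces).

```python
import re

# Simplified ASCII version of an XML name: first a letter or underscore,
# then letters, digits, underscores, hyphens or periods. No spaces.
ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def anchor_and_link(name: str, link_text: str) -> tuple[str, str]:
    """Return (anchor, link) markup for a named target in the same file."""
    if not ID_RE.fullmatch(name):
        raise ValueError(f"invalid anchor name {name!r}: start with a letter, no spaces")
    anchor = f'<a id="{name}"></a>'
    link = f'<a href="#{name}">{link_text}</a>'
    return anchor, link


if __name__ == "__main__":
    print(anchor_and_link("ships", "see Ships"))
```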

If you have images, you will be better off putting them in Sigil. You can add additional files, the + button, and bring them all into your document at once. Then put the cursor where you want, then select edit insert file, which will bring up a box where you can select which file you want to insert.

If it seems a lot of work, it is, if you want to do anything complicated. Otherwise, if it is just something like a novel, just text, you can type away in WP or Sigil; the main thing worth doing is to press Ctrl+Enter after the period at the end of each chapter. This will create a separate section for each chapter, which makes the book usable on older readers and faster on the rest. It also limits the damage you can do while finding and replacing, if you confine the search to the current HTML file.
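Outside Sigil, the same "one section per chapter" idea can be sketched in Python, assuming each chapter starts with an `<h3>` heading in a single body of markup. The function name and the `chapterNN.xhtml` file names are hypothetical, and each piece would still need the usual XHTML wrapper.

```python
import re


def split_chapters(body: str, tag: str = "h3") -> list[str]:
    """Split markup into chapters, starting a new chapter at each <tag> heading."""
    # Zero-width lookahead: split *before* each heading so it stays with its chapter.
    # (Splitting on empty matches requires Python 3.7 or later.)
    parts = re.split(rf"(?=<{tag}\b)", body)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    body = "<h3>One</h3><p>a</p>\n<h3>Two</h3><p>b</p>"
    for number, chapter in enumerate(split_chapters(body), start=1):
        print(f"chapter{number:02d}.xhtml: {chapter}")
```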
04-09-2013, 10:36 AM   #12
ecbritz
Book Concocter
ecbritz began at the beginning.
 
Posts: 59
Karma: 10
Join Date: Jun 2010
Location: China
Device: Sony Reader
Thanks for all the advice, I really appreciate it.

The answer to my original question, then, is that I cannot effectively filter or clean up an HTML file published by doing File > Publish To > HTML in WordPerfect. I must bypass the instruction in the Sigil tutorial to import an HTML file derived from a word processor (after cleaning it up).

Instead of importing an HTML file created with WordPerfect, I could simply copy text from a WordPerfect document and paste it into the Sigil editing window. Do you mean copying and pasting the text in its WordPerfect-formatted state? Or should the WordPerfect document first be saved as a .txt file and opened in WordPad, and the text then copied and pasted into Sigil?

I don’t understand “Learn enough css to make paragraphs look the way you want”. What is meant by “css”?

I will now bypass the instruction in the Sigil tutorial, read further, study your advice, and see how it goes. Thanks again for helping me so well.
04-09-2013, 11:41 AM   #13
mrmikel
Book Twiddler
mrmikel ought to be getting tired of karma fortunes by now.
 
Posts: 1,943
Karma: 1405001
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Yup, do your thing in WordPerfect, then select all, copy, go to a new Sigil document, and paste. You can save as a text file, but that may throw in carriage returns, which can show up as breaks at the end of each line or turn each line into a separate paragraph.
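If the text-file route does leave every line as its own paragraph, one heuristic repair is to join a paragraph to the next whenever it does not end in sentence-closing punctuation. A hedged Python sketch of that idea; the punctuation set is an assumption, and headings or list items without final punctuation would be joined too, so review the result.

```python
import re

# A closing </p> right after a character that is not ., !, ?, :, or a closing
# quote is taken to be a hard line break in the middle of a sentence.
BROKEN_P_RE = re.compile(r'([^.!?:"\u201d])</p>\s*<p>')


def merge_broken_paragraphs(doc: str) -> str:
    """Join paragraphs that were split at line ends back into one paragraph."""
    return BROKEN_P_RE.sub(r"\1 ", doc)


if __name__ == "__main__":
    print(merge_broken_paragraphs(
        "<p>This line was</p>\n<p>hard-wrapped.</p>\n<p>New one.</p>"))
```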

CSS stands for Cascading Style Sheets. What it means is that instead of putting in styling information every time you want to change the text from standard paragraph text, you use a stylesheet in which you define what you want to happen every time you use a given class.

Say you are writing about cities of the world, and you want their names to be bold and italicized in the document. You can put both of these into the text every time you want them that way. Or you could define a class called .cities in a stylesheet, which is where you define what .cities means. The advantage of this is that if you decide they should be underlined or whatever, you simply change the definition in the stylesheet and presto, every place where you have called forth that class in your document is changed.

Otherwise you have to go through your document and change every single one. You can try doing it by search and replace, but you might change something you did not intend, so you probably end up doing it one by blinking one.

A stylesheet can look like this:
.capt {margin-left:30%; margin-right:30%; font-size:80%; text-align:left; text-indent:0px;}
h2, h3, h4 {text-align:center;}
.quote {margin-left:5%; margin-right:5%; text-indent:0px;}
p {text-indent:30px;}

All paragraphs are indented; headers 2 to 4 are centered; captions sit in a narrower block with 30% margins on both sides, with smaller, left-aligned, unindented text; and quotes have margins brought in on both sides so they stand out from the surrounding text.

To make a quote paragraph it would start like this:

<p class="quote">This is a quote paragraph</p>

If you don't define a class you have to put in the information that is shown in the stylesheet into the document every time you want it.

There is info on CSS at:
http://www.w3schools.com/css/default.asp
but not all CSS for web pages works in EPUBs. That is why it is useful to check your work from time to time on your reader.

Last edited by mrmikel; 04-09-2013 at 01:14 PM.


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.