Old 05-04-2011, 02:32 PM   #22
Faster
Style

Styles from Word to Sigil

Method:
Mark styles with pseudo-tags and save as text Unicode (UTF-8).
Then open in Sigil and convert the pseudo-tags to proper HTML tags that are styled with CSS.

Bear in mind that Character styles are applied to a few words in a paragraph, whereas Paragraph styles are applied to whole paragraphs.
When you transfer a PDF file to Word, Character formatting is preserved (fonts, sizes, style, weight), but Paragraph formatting (indents, centering, spacing) is lost. That's if there ever was any in the PDF; I suspect not!
-------------
IN WORD
-------------
Note: in this section Wildcards can be left unchecked.

Using tags for Character styles
Italic style
Put i_ before italic word(s) and _i after.
Find/Replace:
Code:
Find:	blank	More > Format > Font > italic
Repl:	i_^&_i
Be careful with bold style: if the only bold sections are the chapter headings, they are better fixed with heading tags (see later). Only do this if there are lots of essential bolded words in the body of the story.
Code:
Find:	blank	More > Format > Font > bold
Repl:	b_^&_b
It's a good idea to click 'No formatting' before starting each Find/Replace.

If you have tried tagging 'bold' and it has tagged the chapter headings, you will need to modify the expressions given below (see ***).

You could create your own tags for other character styles.

Using pseudo-tags for chapter headings with Paragraph styles
Let's suppose that you've managed to convert all the PDF chapter headings into Word's 'Heading 2' paragraph style.

How? See *** at end of this post.

These heading styles will need to be tagged before you convert the lot to a text file. Here's how:
Code:
Find:	blank	More > Format > Style > Heading 2
Repl:	h2_^&
Note: There is no tag '_h2' at the end in the Replace box because this is a complete paragraph, so the closing </p> tag will mark the end of the heading in Sigil.

You can create your own tags for other Headings (h1_, h3_) if you've used these headings and their numbers warrant a global Find/Replace.

If any of your lines have indents, the indent could be replaced with multiple spaces in the text file.
To avoid this:
Format menu > Paragraph
Set all the indents to 'none' or zero.

Now it's time to save your file and you'll be saving it as a text file with special encoding.

File menu > Save As
Give the file a name.
Use the drop down menu of 'Save as type:' and select 'Plain Text (*.txt)'.
A dialogue, 'File Conversion' pops up.
Click 'Other encoding:'
Select 'Unicode (UTF-8)' and click 'OK'.
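
Before opening the text file in Sigil, it can be worth checking that it really is UTF-8 and that every opening pseudo-tag has a closing partner. Here is a minimal Python sketch, assuming only the i_ and b_ tags described above. The names count_pseudo_tags and check_export are made up for this example and are not part of Word or Sigil.

```python
import re

# An opening tag is i_ or b_ not glued to a preceding letter;
# a closing tag is _i or _b not followed by a letter.
TAG = re.compile(r"(?<![A-Za-z])([ib])_|_([ib])(?![A-Za-z])")

def count_pseudo_tags(text):
    """Return {tag: [opening_count, closing_count]} for i and b."""
    counts = {"i": [0, 0], "b": [0, 0]}
    for match in TAG.finditer(text):
        if match.group(1):              # opening tag such as i_
            counts[match.group(1)][0] += 1
        else:                           # closing tag such as _i
            counts[match.group(2)][1] += 1
    return counts

def check_export(path):
    """Read the saved file as UTF-8 and count its pseudo-tags."""
    # open() raises UnicodeDecodeError if the file is not valid UTF-8.
    with open(path, encoding="utf-8") as handle:
        return count_pseudo_tags(handle.read())
```

If the opening and closing counts for a tag differ, a Find/Replace in Word probably missed something and is easier to fix there than in Sigil.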

-----------
IN SIGIL
-----------
Open the file in Sigil

Add a stylesheet.
(Use the one below until you've developed your own.)

Copy into notepad and save as 'stylesheet.css'.
Code:
@namespace h "http://www.w3.org/1999/xhtml";
@page {
    margin-top: 12pt;
    margin-bottom: 1pt
    }
body {
    margin-left: 1%;
    margin-right: 1%
    }
p { 
    margin: 0;
    text-indent: 1em;
    font-family: "Times New Roman",serif;
    }
h1, h2 {
    margin: 0;
    text-align: center;
    }
.italic {
    font-style: italic;
    }
.bold {
    font-weight: bold
    }
.image { 
     margin-top: 1em; 
     margin-bottom: 1em; 
     margin-left: 1.2em; 
     text-align: center; 
     max-height: 100% 
    }
blockquote {
    font-family: Arial, sans-serif;
    font-style: italic;
    }
In Sigil right click the Styles folder.
Select 'Add existing items...' and locate the file 'stylesheet.css' you've just created.

Double click 'Section0001.xhtml' to open it.
Go into Code View.
Between the <head> </head> tags put this link to your stylesheet.
Code:
<link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />
Now to replace the pseudo-tags with HTML tags that use the CSS classes.
All these Find/Replaces are done in Code View.
CTRL H for Find/Replace
Check use Regex.

For Character styles use a Find/Replace like this:
Code:
Find:	i_([^_]*)_i
Repl:	<span class="italic">\1</span>

Find:	b_([^_]*)_b
Repl:	<span class="bold">\1</span>
Note: Finding italic tags when bold tags are nested inside them (i_b_some text_b_i) doesn't work, because [^_]* cannot cross the underscores of the inner tags.
Always Find/Replace the inner tags first.
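
To see why the order matters, here is a small Python sketch that applies the same two expressions outside Sigil. It is only an illustration: convert_character_tags is a made-up name, and Python's regex engine is assumed to behave like Sigil's for these simple patterns. Repeating the passes until nothing changes handles either nesting order.

```python
import re

# (pattern, replacement) pairs copied from the Find/Replace boxes above.
RULES = [
    (re.compile(r"b_([^_]*)_b"), r'<span class="bold">\1</span>'),
    (re.compile(r"i_([^_]*)_i"), r'<span class="italic">\1</span>'),
]

def convert_character_tags(html):
    """Replace i_..._i and b_..._b pseudo-tags with styled spans."""
    previous = None
    # Each pass converts the innermost tags; outer tags only match
    # once the underscores inside them have gone.
    while previous != html:
        previous = html
        for pattern, replacement in RULES:
            html = pattern.sub(replacement, html)
    return html

print(convert_character_tags("<p>i_b_some text_b_i</p>"))
```

On the first pass only the bold pair matches; the italic pair matches once the inner underscores have been replaced.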

For Paragraph styles use this type of Find/Replace to replace any 'h(digit)_' headings with <h1><h2> etc:
Code:
Find:	<p>h(\d)_([^<]*)</p>
Repl:	<h\1>\2</h\1>
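
The same heading conversion can be tried out in Python before running it on the whole book. This is a sketch with two assumptions of my own: the heading text contains no other tags, and a trailing _h1/_h2 pseudo-tag (which the wildcard method later in this post produces) should be dropped. convert_headings is a made-up name.

```python
import re

# h(digit)_ at the start of a paragraph, heading text with no tags in it,
# and an optional closing pseudo-tag such as _h2 before </p>.
HEADING = re.compile(r"<p>h(\d)_([^<]*?)(?:_h\1)?</p>")

def convert_headings(html):
    """Turn <p>h2_Title</p> (or <p>h2_Title_h2</p>) into <h2>Title</h2>."""
    return HEADING.sub(r"<h\1>\2</h\1>", html)

print(convert_headings("<p>h2_Chapter One</p><p>Body text.</p>"))
```

Paragraphs that do not start with an h(digit)_ pseudo-tag are left alone.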
*** Making Chapter headings in Word:
(You don't have to do this in Word. I've written a post in the Sigil section named 'Regex' which explains how to do this in Sigil.)
Code:
http://www.mobileread.com/forums/showthread.php?t=130763
But if you want to do it in Word...
There are two ways to do this. The first method relies on the transferred PDF chapter headings having a distinguishing character style. The second uses Find/Replace based on text content.

Using STYLE to alter Chapter headings to <h2>
Format menu > Styles and Formatting
Click in one of the chapter headings in the document.
Find the style in the Formatting panel by looking for a blue outline box.
Move the cursor to the right-hand side end of the box. A drop down menu arrow appears.
Click the arrow and choose 'Select All x Instance(s)'. The nearer the x is to the number of chapters the better, but don't forget 'Prologue' and 'Epilogue'.
With these multiple selections all you need to do is click 'Heading 2' once. Done.
If there are headings like 'Part One', 'Part Two' you can use the same method replacing with 'Heading 1'.

Using Find/Replace to alter Chapter headings to <h2>
For these searches you must check 'Use wildcards'.
We will find chapter headings and bracket them with the pseudo-tags h2_ ... _h2

1) The line starts with the word 'Chapter', 'CHAPTER' or 'chapter':
Code:
Find:	(^13)([Cc][Hh][Aa][Pp][Tt][!^13]@)(^13)
Repl:	\1h2_\2_h2\3
2) Chapter is labelled by DIGITS only (example '35'):
Code:
Find:	(^13)([0-9]{1,2})(^13)
Repl:	\1h2_\2_h2\3
Note: this assumes that there are fewer than 100 chapters and avoids lines like '1943'. If there are more than 99 chapters, change {1,2} to {1,3}.

3) Chapter is headed by NUMBERS in WORDs (possibly hyphenated) only (example 'Forty-five'):
Code:
Find:	(^13)([A-z-]@)(^13)
Repl:	\1h2_\2_h2\3
Note: be aware that any other single-word paragraphs will also be found and tagged as h2 headings.
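
For anyone who prefers to test the three searches outside Word, here is a rough Python equivalent. It is a sketch, not a translation of Word's wildcard engine: it assumes each paragraph is one line of the saved text file, and unlike the ^13 versions it will also match the very first line. tag_chapter_headings is a made-up name.

```python
import re

CHAPTER_PATTERNS = [
    re.compile(r"[Cc][Hh][Aa][Pp][Tt].+"),  # 1) line starts with 'Chapt...'
    re.compile(r"[0-9]{1,2}"),              # 2) one or two digits only
    re.compile(r"[A-Za-z-]+"),              # 3) a single, possibly hyphenated, word
]

def tag_chapter_headings(text):
    """Bracket every line that looks like a chapter heading with h2_ ... _h2."""
    tagged = []
    for line in text.split("\n"):
        # fullmatch means the whole paragraph must fit the pattern.
        if any(p.fullmatch(line) for p in CHAPTER_PATTERNS):
            line = f"h2_{line}_h2"
        tagged.append(line)
    return "\n".join(tagged)

sample = "Chapter 1\nIt was late.\n35\n1943\nForty-five"
print(tag_chapter_headings(sample))
```

The same warning applies here: pattern 3 will tag any one-word paragraph, so check the results.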

In Sigil:
At this stage you have a single HTML file. If you have created <h2> tags for Chapters you can use the following to split on chapter headings <h2>:
In Code View.
Code:
Find:	(<h2)
Repl:	<hr class="sigilChapterBreak" />\1
Put the focus back onto the page.
Now press F6
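
To preview where the chapter breaks will fall, the marker-and-split idea can be imitated in Python. This only mimics the effect on a string of body HTML; it is not how Sigil does the split internally, and split_on_h2 is a made-up name.

```python
import re

BREAK = '<hr class="sigilChapterBreak" />'

def split_on_h2(body):
    """Insert a break marker before each <h2 and split the text there."""
    marked = re.sub(r"(<h2)", BREAK + r"\1", body)
    # Drop the empty piece left in front of the first heading, if any.
    return [part for part in marked.split(BREAK) if part.strip()]

chapters = split_on_h2("<h2>One</h2><p>a</p><h2>Two</h2><p>b</p>")
print(len(chapters))
```

Each piece starts with its own <h2> heading, which is what you should see in each new file after the split.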