

Faster
04-28-2011, 04:55 PM
PDF to Epub workshop

The process consists of
transferring
editing
styling
assembling

In this first post I'll deal with transferring and editing.

MS Word is used because it can reveal hidden characters, retains some of the PDF's useful formatting and it has powerful styling capabilities.
Final assembly is in Sigil.
This post deals only with converting novels, ie all text apart from a cover, maybe a map and possibly a little chapter decoration.

----------------------------
TRANSFERRING
----------------------------
Open the PDF in Adobe Reader
Take a careful look at it. If it is so riddled with errors it may not be worth bothering with. Is there a better format or better version available [edited]? Realize that the intensive proof reading of a novel will often spoil the later reading experience for you.
Open MS Word (mine is Office 2003 because I hate the ribbon in Office 2007. It sits unused on another computer!)
In Adobe Reader from the Edit Menu select 'Copy File to Clipboard' (Alt-E, Alt-B)
Note: nothing may happen at this stage!
Switch focus to MS Word. Have a new document open in Web Layout. From Edit Menu select 'Paste' (CTRL-V)
Note: depending on the speed of your computer, you should see a progress bar in the bottom right of Adobe Reader and screen refresh may be slow.

Eventually you should get a copy of the PDF in document format in Word.
Some formatting will have carried over. In a moment we'll use variations in format to delete bits we don't want - such as page titles and numbers which are senseless junk on an ebook reader.

If all you see are a whole bunch of squares you won't be able to copy the PDF (without some stuff I'm not dealing with) so you'll need to either read the PDF on your computer or find an alternative format of the book.
---------------------------------------
DELETE UNWANTED PAGES
---------------------------------------
First delete any sections you don't want such as 'Also by This Author', TOC, Acknowledgements, Dedications, teasers, 'About the Author', 'About the Publisher' - in fact any parts you don't want to read later. You'll still have the PDF to read these.
----------------------------
FIND AND REPLACE
----------------------------
*Until you trust the expressions presented here use a copy of your file.
*Don't jump straight into 'Replace All', try a few single Replaces first
*If a dialogue says '0' replacements it's probably because you haven't checked or unchecked 'Use wildcards' when required, or you haven't cleared the formatting from a previous search. (Click 'No formatting' box under 'More'.)
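*Get into the habit of using ^13 in the Find box and ^p in the Replace box. With 'Use wildcards' checked Word rejects ^p in the Find box, and ^13 in the Replace box inserts a character that only looks like a paragraph mark and does not carry paragraph formatting.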
*Get used to working with hidden characters showing.
--------------
THE JUNK
--------------
Removing Amber/Nova/file junk:

Use F/R and leave Replace blank
Check 'Use wildcards'
Find: Generated by ABC *^13
Find: ABC Amber *^13
Find: Create PDF *^13
Find: file:*^13
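
For anyone who would rather script this on a plain-text copy, here is a rough Python sketch of the same junk removal. It assumes the text has been saved from Word as a text file with one paragraph per line ("\n"); the function name and the sample text are made up for illustration.

```python
import re

# Phrases taken from the Find expressions above; extend the list for your own file.
JUNK_STARTS = ["Generated by ABC ", "ABC Amber ", "Create PDF ", "file:"]

def remove_junk(text):
    """Delete from each junk phrase up to and including the end of its paragraph."""
    for start in JUNK_STARTS:
        # [^\n]*\n plays the role of Word's  *^13  (anything up to the paragraph end)
        text = re.sub(re.escape(start) + r"[^\n]*\n", "", text)
    return text

sample = "Chapter 1\nGenerated by ABC Amber LIT Converter\nIt was dark.\nfile:///C:/book.html\nThe end.\n"
print(remove_junk(sample))
```

As with the Word version, try it on a copy first, because a genuine sentence containing one of the phrases would be cut from that phrase onwards.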
-------------------------------------------------------------------------
HIDDEN CHARACTERS THAT CAN CAUSE PROBLEMS
-------------------------------------------------------------------------
Click the pilcrow (reversed P shape in toolbar) to see all the hidden characters.
At this stage, if paragraph characters occur at the end of each line of text instead of only at the end of paragraphs, IGNORE them. If you join the lines now, dealing with page numbers will be difficult: the page numbers will become embedded inside paragraphs and you don't want that.

First:
a) Look for optional hyphens (little lines inside words)
b) Tabs (little arrows)
c) A space (small dot) followed by a paragraph character (pilcrow)
d) Manual line breaks (bent arrows) (Inserted with SHIFT-ENTER)
e) Non-breaking space (small circle) (Inserted with CTRL-SHIFT-SPACE)

Uncheck 'Use wildcards'
Removal (a) Find: ^- Replace: blank
(b) Find: ^t Replace: blank
(c) Find: <space>^13 Replace: ^p (for <space> hit the spacebar)
(d) Find: ^l Replace: ^p (the char after ^ is a lowercase 'L', not the 'pipe'; picking 'Manual Line Break' from the 'Special' menu inserts it for you.)
(e) Find: ^s Replace: <space> (<space> means press space bar once)

Codes for these hidden characters can be inserted in the Find/Replace dialogue by clicking 'More' - 'Special' and selecting what you need.
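
The five removals above can be sketched in Python for a plain-text copy of the book. This is only an illustration under assumptions: that optional hyphens arrive as U+00AD, manual line breaks as the vertical-tab character, non-breaking spaces as U+00A0, and paragraph ends as "\n". Check how your own export encodes them before relying on it.

```python
import re

def clean_hidden_characters(text):
    text = text.replace("\u00ad", "")    # (a) optional (soft) hyphens
    text = text.replace("\t", "")        # (b) tabs
    # (c) spaces just before a paragraph end; unlike the Word step,
    #     this removes a run of several spaces in one go
    text = re.sub(r" +\n", "\n", text)
    text = text.replace("\x0b", "\n")    # (d) manual line breaks (assumed to be \x0b)
    text = text.replace("\u00a0", " ")   # (e) non-breaking spaces
    return text

sample = "co\u00adoperate\tnow \nline\x0bbreak\u00a0here"
print(clean_hidden_characters(sample))
```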
-----------------------
PAGE NUMBERS
-----------------------
Now use the formatting to remove repetitive items like Author, Book title, Chapter title, Page number.
Here's how:
Click on a sample piece of the text/number you want to remove. Check the formatting toolbar to see if the text has some distinguishing characteristic such as size or font.
It may be necessary to click the AA icon on the left of this toolbar to open a panel where there's more information. The format of the text where the cursor is will be identified by having a box around it and hovering the arrow over it brings up format info.
If you find something unique then put the cursor in the Find box in the Find/Replace dialogue and click More - Format. Set the format to be found.
Leave the Find and Replace box empty and Replace All.
(Remember to click 'No Formatting' before doing further searches!)

If there was no unique format for the numbers/authors/title we need plan B!
Here are some common page numberings:- (where x = page number)

(a) Author-x and x-Title OR Title-x and x-Author.
(b) Page x of y where y is the total pages
(c) Page x
(d) {x} where {} is some simple form of decoration
(e) x

(a) Author-x and x-Title OR Title-x and x-Author
You won't be able to deal with all of them in one scan because of Word's limited F/R compared to Regex.

Find a line that has the author's name and page number. Copy and paste this line into the Find box. Change the digits to [0-9]{1,3} (or [0-9]@)
and modify the line to look like (A) or (B) according to whether the number is after or before the text.
Click 'More' and check 'Use wildcards'.
(A)
Find: (^13)the title or author*[0-9]{1,3}*^13
Repl: \1
(B)
Find: (^13)[0-9]{1,3}*the title or author*^13
Repl: \1

Notes:
If you've copied and pasted the title or author, deselect this before searching.
[0-9] means any digit in the range 0-9 and @ means 1 or more.
* means zero or more of anything (always use with care. Have a back-up.)
^13 means a paragraph character (use only in Find box, never in Replace box)
\1 means the item captured by the expression in the first set of round brackets.
We are restoring the first found paragraph character because this contains formatting information about its preceding paragraph.

Repeat for the way the other page is done.
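
Outside Word, expressions (A) and (B) might look something like this in Python. It is a sketch only: it assumes a plain-text export with one paragraph per line, and the function name, the author and the title in the sample are invented.

```python
import re

def remove_running_headers(text, label):
    """Delete whole lines made of a title/author label plus a page number."""
    name = re.escape(label)
    # (A) label first, then a 1-3 digit number somewhere later on the line
    number_after = re.compile(rf"^{name}[^\n]*\d{{1,3}}[^\n]*\n", re.MULTILINE)
    # (B) number first, then the label
    number_before = re.compile(rf"^\d{{1,3}}[^\n]*{name}[^\n]*\n", re.MULTILINE)
    text = number_after.sub("", text)
    return number_before.sub("", text)

sample = "He ran.\nJane Doe - 12\nShe hid.\n13 - The Long Night\nThey met.\n"
sample = remove_running_headers(sample, "Jane Doe")
sample = remove_running_headers(sample, "The Long Night")
print(sample)
```

Like the Word expressions, this will also remove a real paragraph that happens to start with the label and contain a number, so check the result.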

Sometimes title/author, page number sneaks onto the end of a text paragraph. It's a good idea to search (Find Next) for both the author name and the title to check this out.
If the title/author page number is not in its own paragraph, use the following F/Rs:

Notes:
IT IS IMPORTANT TO DO THESE IN THE ORDER SHOWN
Use REPLACE not REPLACE ALL as use of * CAN SELECT TOO MUCH.
Where possible replace *s with actual text.
If there is a character that won't paste into the Find box use the '?' for any character.
In the expressions below paste the actual title or author.

FIRST
Check 'Use wildcards'
Find: the title/author[!^13]@[0-9]{1,3}*^13
Repl: blank or space Check with Find Next to see what is required
- it depends whether you've selected 'title/author' with a leading space
THEN
Find: [0-9]{1,3}[!^13]@the title/author*^13
Repl: blank or space Check with Find Next to see what is required

(b) Page x of y

Check 'Use wildcards'
Find: (^13)[Pp][Aa][Gg][Ee] [0-9]@ of [0-9]@^13
Repl: \1
Notes:
Finds all forms of capitalisation of 'page'
There are three spaces in the expression - one after [Ee] and two around " of ".

(c) Page x

Check 'Use wildcards'
Find: (^13)[Pp][Aa][Gg][Ee] [0-9]@^13
Repl: \1

(d) {x}
Check 'Use wildcards'
Find: (^13)[\decorative character ]@[0-9]@[\decorative character ]@^13
Repl: \1
Example: (^13)[\{\} ]@[0-9]@[\{\} ]@^13
Notes:
The backslash is to 'escape' the next character which otherwise could be interpreted as a wildcard.
The expression [\{\} ]@ means one or more of the characters '{' '}' and <space> in any order.
Replace {} with the decorative character you find in your document.

(e) x USE WITH CARE ***

Check 'Use wildcards'
Find: (^13)[0-9 ]@^13
Repl: \1
Notes: A space before or after the number or between digits is allowed for here by using [0-9<space>].

*** If you're retaining the TOC this Find/Replace could play havoc with it.
The solution is to get the Find/Replace ready then select the TOC and Cut it using Ctrl X.
Do the F/R then Paste the TOC back. Or select a Find format that excludes the TOC's format.
*** If chapters are headed only by digits you will need to avoid deleting those.
Often you can do this by specifying a format in the Find criteria (eg Size 10 or Not Bold) which will exclude the chapter numbers but include the page numbers.
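
Patterns (b) to (e) can be tried out in Python on a plain-text copy. This is a hedged sketch, not a replacement for the Word steps: it assumes one paragraph per line, it cannot see formatting (so the Size 10 / Not Bold trick has no equivalent here), and the function and parameter names are made up.

```python
import re

PAGE_PATTERNS = [
    r"(?i:page) \d+ of \d+",    # (b) Page x of y, any capitalisation
    r"(?i:page) \d+",           # (c) Page x
    r"[{} ]+\d+[{} ]+",         # (d) {x}; swap in your own decoration
]

def remove_page_number_lines(text, include_bare_digits=False):
    patterns = list(PAGE_PATTERNS)
    if include_bare_digits:
        # (e) digits and spaces only; off by default because it also
        #     removes chapter headings that are just a number
        patterns.append(r"[0-9 ]*[0-9][0-9 ]*")
    kept = []
    for line in text.split("\n"):
        if any(re.fullmatch(p, line) for p in patterns):
            continue  # the whole line is a page number: drop it
        kept.append(line)
    return "\n".join(kept)

sample = "He ran.\nPage 3 of 250\nPAGE 4\n{ 5 }\n6\nChapter 7 begins\nShe hid."
print(remove_page_number_lines(sample))
print(remove_page_number_lines(sample, include_bare_digits=True))
```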
----------------------------
UNHAPPY RETURNS
----------------------------
Empty and Broken paragraphs
Before going any further it would be a good idea to replace straight quotes with curly quotes. They may be present but not obvious. So select a 'straight' quote and change the font to Times New Roman which displays them clearly. If it looks curly then you don't need to do anything; otherwise type "curly quotes" into the help box and follow the instructions to change straight quotes into curly quotes. Use CTRL Z to revert the quote back to its original font.

Some PDFs reflow the text and when you paste into Word a line of text will fill the available width before wrapping around to the next line. If this is the case you can disregard this section.

After you've pasted into Word and clicked the pilcrow you may see a paragraph character at the end of each and every line. You'll want to remove all of these except those marking the end of a true paragraph.

Firstly, there should not be any empty paragraphs creating blank lines. Spacing should be accomplished by using format/style.
Remove empty paragraphs:

Check 'Use wildcards'
Find: ^13{2,10}
Repl: ^p

Broken paragraphs:

First method -
When not following certain punctuation marks such as full stop, question mark and so on, the paragraph character is removed.

(A)

Check 'Use wildcards'
With cursor in Find box click More, Font, Not Bold
Find: ([!.\?:"\!”'’\)0-9])^13 Note both straight and closing curly double quotes are required
Repl: \1 That's \1 followed by a <space>
*** Afterwards with cursor in Find box click 'No formatting' ***

The main problem with this is some of the punctuation may not actually end a paragraph - yet coincidentally it's at the end of a line followed by a paragraph character.
Here's an example created by the above F/R: </p> indicates a paragraph character

Even Palmer couldn’t ignore something like that. “Mwhuh?”</p>
she replied as she chewed her doughnut.</p>

'she replied ...' should be on the same line as “Mwhuh?”

So a different approach is to look for lines starting with lowercase letters:

(B)

Check 'Use wildcards'
Find: ^13([a-z]) Looks for lines starting with a lowercase letter after the paragraph character
Repl: \1 That's a <space> followed by \1

So the answer (still imperfect) is to use (A) then use (B)
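
Here is how "empty paragraphs, then (A), then (B)" could look in Python on a plain-text copy. It is only a sketch: there is no Bold/Not Bold to protect headings in plain text, paragraph ends are assumed to be "\n", and the names are invented.

```python
import re

# Characters after which a line break is kept (mirrors the class in (A)).
END_MARKS = '.?:"!”\'’)0123456789'

def rejoin_broken_paragraphs(text):
    # Remove empty paragraphs first
    text = re.sub(r"\n{2,}", "\n", text)
    # (A) a break NOT preceded by end punctuation becomes a space
    text = re.sub(rf"([^{re.escape(END_MARKS)}\n])\n", r"\1 ", text)
    # (B) a break followed by a lowercase letter becomes a space
    text = re.sub(r"\n([a-z])", r" \1", text)
    return text

sample = ("Even Palmer couldn’t ignore something\nlike that. “Mwhuh?”\n"
          "she replied as she chewed\n\nher doughnut.\n")
print(rejoin_broken_paragraphs(sample))
```

Step (B) is what pulls 'she replied ...' back onto the same line as “Mwhuh?” in the sample.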

Still there are problems:
(a) Chapter headings and poetry bits (epigraphs) don't have end punctuation so lose their line/paragraph characters.
Solution: Parts not to be scanned should be made bold then during F/R the Find Criteria should include Format, Font - Not Bold. Change back from Bold afterwards. Click 'No formatting' in F/R dialogue.

(b) In USA it seems the practice is to put a full stop (period) after Mr. Mrs. Ms. Dr. A coincidental paragraph character after the dot would survive and we'd get:
Mr.</p> </p> represents a paragraph character
Smith....
(C)

Solution: After the above F/Rs, (A) and (B)

Check 'Use wildcards'
Find: ([DM][rs]{1,2}.)^13([A-Z])
Repl: \1 \2 Optional space between \1 and \2 for Dr.Smith or Dr. Smith

(c) Similarly, if by coincidence the para-character is just after a full stop then it's left there even though it's mid paragraph.
Here's an example pre F/R: </p> represents paragraph character

“Okay, now you just sound like a scary boyfriend,” May</p>
said, reaching under her T-shirt to unhook her bra. “Explain.</p>
Why am I doing this?”</p>

Following a F/R the paragraph character after 'May' is removed but the one after 'Explain.' isn't because it just happens to come after a full stop.

(D)

Solution: This will only work for curly quotes, because of a need to distinguish between opening and closing quotes.

Check 'Use wildcards'
Find: (“[!^13”]@)^13([!”]@”)
Repl: \1 \2 Space between \1 and \2
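
Fixes (C) and (D) can be sketched in Python too. Assumptions: plain text with "\n" as the paragraph character, curly quotes already in place, and function names made up for this example. The quote fix is repeated until nothing changes because one pass only mends the first break inside each quotation.

```python
import re

def join_after_titles(text):
    """(C) Mr./Mrs./Ms./Dr. stranded at the end of a line."""
    return re.sub(r"\b((?:Mr|Mrs|Ms|Dr)\.)\n([A-Z])", r"\1 \2", text)

def join_inside_quotes(text):
    """(D) a line break between an opening and a closing curly quote."""
    pattern = re.compile(r"(“[^\n”]+)\n([^”]+”)")
    previous = None
    while previous != text:
        previous = text
        text = pattern.sub(r"\1 \2", text)
    return text

sample = "“Okay,” May said, reaching for the door. “Explain.\nWhy am I doing this?”\n"
print(join_inside_quotes(sample))
print(join_after_titles("Mr.\nSmith arrived.\n"))
```

A quotation that is never closed will pull the following paragraph in, so check the result afterwards.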
--------------------------------
MINOR CORRECTIONS
--------------------------------
You may have gone ahead with this conversion knowing that there were errors in the original PDF. Common errors are missing spaces and wrongly scanned or OCRed letters, eg 'rn' becomes 'm'. There's also sometimes a problem with a PDF that uses special characters such as stylistic ligatures.

Missing Spaces
a) Missing space after a punctuation mark

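Find: ([!A-Z])([.,:;”\!])([A-z])
Repl: \1\2 \3 That's \1\2<space>\3
Notes: The character before the punctuation is captured as \1 so that it is put back; left outside the brackets it would be deleted by the replace.
[!A-Z] is to avoid titles such as A.B.C. becoming A. B. C.; however Ph.D. will be split into Ph. D.
Unfortunately this also will put a space between punctuation and a closing quote;
Example: .” becomes . ” That's .<space>”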

Find: ” That's <space>”
Repl: ” That's ” only
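
A Python sketch of the missing-space fix, for a plain-text copy. The character before the punctuation is captured so that it is put back rather than lost; the function name is an invention for this example.

```python
import re

def add_missing_spaces(text):
    # group 1: any character that is not a capital letter (protects A.B.C.)
    # group 2: the punctuation mark
    # group 3: the letter that follows it with no space in between
    return re.sub(r"([^A-Z\n])([.,:;”!])([A-Za-z])", r"\1\2 \3", text)

print(add_missing_spaces("He stopped.Then he ran,fast."))
print(add_missing_spaces("The A.B.C. murders"))
```

As noted above, Ph.D. still comes out as Ph. D. with this approach.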


b) Missing space between a word in italic and a word non-italic.
You may see this: firstsecond

(This is a two part F/R)

UNCHECK 'Use wildcards'
(i)
With the cursor in the Find box, click Format, Font and select Italic.
Find: leave blank
Repl: ^& That's <space>^&<space>
If the Find button is dimmed, you haven't unchecked 'Use wildcards'.

(ii)
CHECK 'Use wildcards'
With the cursor in the Find box, click No Formatting
Find: {2} That's <space>{2}
Repl: That's <space>

Explanation: First we add a space on each side of every block of word(s) in italic.
Then we search for any double spaces and replace with a single space.

Problems: If there is a blockquote in italic (eg a poetic verse) or if a paragraph ends in italic a space will be added to the start of the following line.

Solution:

CHECK 'Use wildcards'
Find: ^13 That's (^13)<space>
Repl: \1

Spelling
Mistakes and missing spaces between regular style words
There's no easy solution to this, but remember you can right-click on a word underlined with a wavy red line and Word will suggest corrections including inserting a missing space (sometimes).
-------------------
VBA MACROS
-------------------
If you're happy with using macros most of these will be suitable. (The exceptions are where you need to enter specific text such as Author and Title. I may show you how to deal with this using an Input box in a later post if requested, but I'm not sure if this is the correct forum for that sort of thing).

In the sequence presented here put the Find and Replace data into the F/R dialogue in advance.
With the Visual Basic toolbar showing click the round red button. Accept the name. Click OK.
In the F/R dialogue click Replace All.
On the floating Visual Basic toolbar click the square button to stop recording.

Repeat for each F/R you want to use.

Go into the Visual Basic Editor. (ALT F11 or find the icon on the VB toolbar)
Now I'm not sure how this will open for you, so assuming you cannot see your macros this is what you do:
Go to View, click on Project Explorer.
In the Project Explorer open Modules by clicking the plus sign in a small box.
Right click New Macros and select View Code.

Each of your macros begins Sub Macro_whatever() and ends with End Sub
Leave the first Sub Macro_whatever() and leave the very last End Sub.
Remove all the other Sub Macro_s titles and End Subs in between to make one big sub-routine.
You can change the name by editing it -
example to Sub PdfToEpubEdit () note no spaces in name and empty ( )

When you close Word, the macro will be saved (in Normal.dot)
To run your macro in future click on the Run button (a triangle) on the VB toolbar and select your macro.

If you don't like it you can select it all in the VB Editor and delete.
Because you have recorded your macro(s) the VB uses Select. More efficient (faster) macros use Range but they have to be hand written.

In Word, Tools > Customize > Commands tab > Keyboard
Left panel find and select 'Macros'
Right panel shows macros:
Find your macro
Click on it.
Click in the box 'Press new shortcut key:'
Press a key combination, example ALT SHIFT P
This will be stored in the Normal.dot template along with your macro.
Click Assign, and close the dialogues. Try it out on a copy of a document.

-------------------------------------------------------
Still to come: Styles and the CSS in MS Word.
-------------------------------------------------------

Adjust
04-28-2011, 08:56 PM
Brilliant, Thanks for the excellent write up.

Faster
04-30-2011, 04:30 PM
Thanks for your comment Adjust; however, the lack of more responses suggests that there is little interest in the topic so I won't bother to continue with it.

DaleDe
04-30-2011, 05:15 PM
Actually it could be a great page in our wiki

Dale

Pablo
04-30-2011, 05:39 PM
Thanks for sharing your expertise!!!

GeoffC
05-01-2011, 06:52 AM
A masterful piece of work - and I agree - it should be in the Wiki .....

rakulos
05-01-2011, 09:46 AM
Brilliant summary - and this...

"Realize that the intensive proof reading of a novel will often spoil the later reading experience for you."

is all too true :(

JSWolf
05-01-2011, 09:56 AM
The OP is describing how to take a PDF downloaded illegally from the net and convert it into some other format. Do we want to really give credence to this illegal activity?

Pablo
05-01-2011, 10:04 AM
The OP is describing how to take a PDF downloaded illegally from the net and convert it into some other format. Do we want to really give credence to this illegal activity?

It's a technical post, I don't see why you should say this.

GeoffC
05-01-2011, 11:08 AM
The OP is describing how to take a PDF downloaded illegally from the net and convert it into some other format. Do we want to really give credence to this illegal activity?

Where, Jon, does it say that ?

JSWolf
05-01-2011, 09:28 PM
Open the PDF in Adobe Reader
Take a careful look at it. If it is so riddled with errors it may not be worth bothering with. Is there a better format or better version available (v5.0 is best)?

Riddled with errors and v5.0 give it away. I've not seen publisher PDF riddled with errors. Also, v5.0 is a version number for publisher copy. It's used by the people pirating eBooks. So yes, this is an article describing how to convert a pirated PDF. It was a dead giveaway.

DaleDe
05-01-2011, 09:52 PM
The OP is describing how to take a PDF downloaded illegally from the net and convert it into some other format. Do we want to really give credence to this illegal activity?

I saw no DRM breaking in the description or necessarily anything other than format shifting for personal use. This is legal in the US but may be illegal other places unless the document is not copyrighted. A copyright caution can easily be added to the wiki entry to tell the user check the laws of their country regarding format shifting. Of course redistributing any copyrighted document without permission is illegal whether the format is changed or not. Did you see something I missed?

Dale

JSWolf
05-01-2011, 10:04 PM
I saw no DRM breaking in the description or necessarily anything other than format shifting for personal use. This is legal in the US but may be illegal other places unless the document is not copyrighted. A copyright caution can easily be added to the wiki entry to tell the user check the laws of their country regarding format shifting. Of course redistributing any copyrighted document without permission is illegal whether the format is changed or not. Did you see something I missed?

Dale

The fact that the OP started off by telling us to try to get a v5.0 PDF means a pirated PDF.

Adjust
05-02-2011, 01:38 AM
The fact that the OP started off by telling us to try to get a v5.0 PDF means a pirated PDF.

No I read that to be open in Acrobat V5.0 reader.
(in my industry we refer to PDF from whatever version they were made from, v5.0 being one)

I have no idea where you are getting your assumption that its pirated.

I have every edition of Acrobat going back to V4.0 and now CS5 (v9.4.4)

CS3 Acrobat convert PDFs to text like vomit. CS5 does a good job.
CS does a better job.

I am constantly converting PDFs to text (Word files) and found the write up excellent.

And it has already helped me fast track my workflow

And I'm looking forward to his next one

DaleDe
05-02-2011, 02:05 AM
I would agree. Many people have reasons to convert PDF's. Even PDF files from scans they made themselves. There is no reason to believe this process condones copyright violation or stealing. It has legitimate purpose.

Dale

JSWolf
05-02-2011, 12:02 PM
It's not the process. It's that the OP is telling you to go look for publisher pirated PDF to convert. V5.0 is not any version of Acrobat or any other software. v5.0 is the version number given to publisher originals as served by the pirates. Why is this a hard concept to grasp?

Adjust
05-02-2011, 09:19 PM
Do you mean "Publisher" as in Book publisher or Microsoft Publisher...
I work for Book publishers and I've never heard the term you are referring to, this is why As you say the "Concept is hard to Grasp".

JSWolf
05-02-2011, 09:31 PM
Do you mean "Publisher" as in Book publisher or Microsoft Publisher...
I work for Book publishers and I've never heard the term you are referring to, this is why As you say the "Concept is hard to Grasp".

I mean the publishers putting out eBooks. You may not have heard of this because you've not gone to the dark side and looked around. I have and the OP is saying to start with a stolen publisher produced PDF.

Adjust
05-03-2011, 12:43 AM
Humm...ok I'll take your word for it.

Maybe if the OP were to omit that reference.

Because the rest of the stuff he wrote is helpful.

Toxaris
05-03-2011, 10:36 AM
I have to agree with JSWolf. The OP is absolutely talking about pirated PDF's. Otherwise the guide is nice, but not great. A lot of search and replaces could end up in undesired results...

JSWolf
05-03-2011, 11:16 AM
Yes, the OP should remove the piracy references. And I think the fact that the mods didn't pick up on this means we need new mods who know about eBook piracy who would pick up on this. Heck, I even reported it and still nothing.

Faster
05-04-2011, 03:32 PM
Styles from Word to Sigil

Method:
Mark styles with pseudo-tags and save as text Unicode (UTF-8).
Then open in Sigil and convert pseudo-tags to proper HTML tags that pick up their look from the CSS stylesheet.

Realize that there are Character styles applied to a few words in a paragraph and Paragraph styles applied to whole paragraphs.
When you transfer a PDF file to Word Character formatting is preserved (fonts, sizes, style, weight), but Paragraph formatting (indents, centering, spacing) is lost; that's if there ever was any in the PDF. I suspect not!
-------------
IN WORD
-------------
Note: in this section Wildcards can be left unchecked.

Using tags for Character styles
Italic style
Put i_ before italic word(s) and _i after.
Find/Replace:

Find: blank More > Format > Font > italic
Repl: i_^&_i

Regarding bold style, be careful, if the only bold sections are the chapter headings they are better fixed with heading tags (see later). Only do this if there are lots of essential bolded words in the body of the story.

Find: blank More > Format > Font > bold
Repl: b_^&_b


It's a good idea to click 'No formatting' before starting each Find/Replace.

If you have tried tagging 'bold' and it has tagged the chapter headings you would need to modify the expressions given below ***.

You could create your own tags for other character styles.

Using pseudo-tags for chapter headings with Paragraph styles
Let's suppose that you've managed to convert all the PDF chapter headings into Word's 'Heading 2' paragraph style.

How? See *** at end of this post.

These heading styles will need to be tagged before you convert the lot to a text file. Here's how:

Find: blank More > Format > Style > Heading 2
Repl: h2_^&

Note: There is no tag '_h2' at the end in the Replace box because this is a complete paragraph and we'll use the </p> tag in Sigil.

You can create your own tags for other Headings (h1_, h3_) if you've used these headings and their numbers warrant a global Find/Replace.

If any of your lines have indents, the indent could be replaced with multiple spaces in the text file.
To avoid this:
Format menu > Paragraph
Set all the indents to 'none' or zero.

Now it's time to save your file and you'll be saving it as a text file with special encoding.

File menu > Save As
Give the file a name.
Use the drop down menu of 'Save as type:' and select 'Plain Text (*.txt)'.
A dialogue, 'File Conversion' pops up.
Click 'Other encoding:'
Select 'Unicode (UTF-8)' and click 'OK'.

-----------
IN SIGIL
-----------
Open the file in Sigil

Add a stylesheet.
(Use the one below until you've developed your own.)

Copy into notepad and save as 'stylesheet.css'.

@namespace h "http://www.w3.org/1999/xhtml";
@page {
margin-top: 12pt;
margin-bottom: 1pt
}
body {
margin-left: 1%;
margin-right: 1%
}
p {
margin: 0;
text-indent: 1em;
font-family: "Times New Roman",serif;
}
h1, h2 {
margin: 0;
text-align: center;
}
.italic {
font-style: italic;
}
.bold {
font-weight: bold
}
.image {
margin-top: 1em;
margin-bottom: 1em;
margin-left: 1.2em;
text-align: center;
max-height: 100%
}
blockquote {
font-family: Arial, sans-serif;
font-style: italic;
}

In Sigil right click the Styles folder.
Select 'Add existing items...' and locate the file 'stylesheet.css' you've just created.

Double click 'Section0001.xhtml' to open it.
Go into Code View.
Between the <head> </head> tags put this link to your stylesheet.

<link href="../Styles/stylesheet.css" rel="stylesheet" type="text/css" />


Now to replace the _tags with HTML tags that use the CSS classes.
All these Find/Replaces are done in Code View.
CTRL H for Find/Replace
Check use Regex.

For Character styles use a Find/Replace like this:

Find: i_([^_]*)_i
Repl: <span class="italic">\1</span>

Find: b_([^_]*)_b
Repl: <span class="bold">\1</span>


Note: Finding italic tags when you have nested bold tags inside italic tags (i_b_some text_b_i) doesn't work.
Always Find/Replace the inner tags first.

For Paragraph styles use this type of Find/Replace to replace any 'h(digit)_' headings with <h1><h2> etc:

Find: <p>h(\d)_([^</]*)</p>
Repl: <h\1>\2</h\1>
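
If you want to test these three expressions before running them in Sigil, Python's re module accepts them as they stand. A rough sketch with a made-up function name and sample; the bold tags are replaced first so that nesting like i_b_some text_b_i works.

```python
import re

def pseudo_tags_to_html(html):
    # Inner tags first (bold inside italic), as advised above
    html = re.sub(r"b_([^_]*)_b", r'<span class="bold">\1</span>', html)
    html = re.sub(r"i_([^_]*)_i", r'<span class="italic">\1</span>', html)
    # Paragraphs that start with h1_, h2_ ... become real headings
    html = re.sub(r"<p>h(\d)_([^</]*)</p>", r"<h\1>\2</h\1>", html)
    return html

sample = "<p>h2_Chapter One</p>\n<p>She was i_very_i tired, i_b_really_b_i.</p>"
print(pseudo_tags_to_html(sample))
```

Text that contains an underscore of its own inside a tagged span will not match and is left untouched, so search for leftover i_ and b_ afterwards.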


*** Making Chapter headings in Word:
(You don't have to do this in Word. I've written a post in the Sigil section named 'Regex' which explains how to do this in Sigil.)
http://www.mobileread.com/forums/showthread.php?t=130763

But if you want to do it in Word...
There are two ways to do this. The first method relies on the transferred PDF chapter headings having a distinguishing character style. The second uses Find/Replace based on text content.

Using STYLE to alter Chapter headings to <h2>
Format menu > Styles and Formatting
Click in one of the chapter headings in the document.
Find the style in the Formatting panel by looking for a blue outline box.
Move the cursor to the right-hand side end of the box. A drop down menu arrow appears.
Click the arrow and choose 'Select All x Instance(s)'. The nearer the x is to the number of chapters the better, but don't forget 'Prologue' and 'Epilogue'.
With these multiple selections all you need to do is click 'Heading 2' once. Done.
If there are headings like 'Part One', 'Part Two' you can use the same method replacing with 'Heading 1'.

Using Find/Replace to alter Chapter headings to <h2>
For these searches you must check 'Use wildcards'.
We will find chapter headings and bracket them with the pseudo-tags h2_ ... _h2

1) The line starts with the word 'Chapter', 'CHAPTER' or 'chapter':

Find: (^13)([Cc][Hh][Aa][Pp][Tt][!^13]@)(^13)
Repl: \1h2_\2_h2\3


2) Chapter is labelled by DIGITS only (example '35'):

Find: (^13)([0-9]{1,2})(^13)
Repl: \1h2_\2_h2\3

Note: this assumes that there are less than 100 chapters and avoids lines like '1943'. If there are more than 99 chapters!!! then change {1,2} to {1,3}

3) Chapter is headed by NUMBERS in WORDs (possibly hyphenated) only (example 'Forty-five'):

Find: (^13)([A-z-]@)(^13)
Repl: \1h2_\2_h2\3

Note: be aware that any other single word paragraphs will also be found and made Heading 2.
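
The three heading searches can be tried in Python on a plain-text copy (one paragraph per line). A sketch only: the function and parameter names are invented, and search 3 is switched off by default for the reason given in the note above.

```python
import re

HEADING_PATTERNS = [
    r"(?i:chapt)[^\n]+",    # 1) starts with 'Chapter' in any capitalisation
    r"[0-9]{1,2}",          # 2) one or two digits only, so '1943' is left alone
]

def tag_chapter_headings(text, include_single_words=False):
    patterns = list(HEADING_PATTERNS)
    if include_single_words:
        patterns.append(r"[A-Za-z-]+")   # 3) one word, possibly hyphenated
    out = []
    for line in text.split("\n"):
        if any(re.fullmatch(p, line) for p in patterns):
            line = f"h2_{line}_h2"       # bracket the heading with pseudo-tags
        out.append(line)
    return "\n".join(out)

sample = "Chapter 1\nIt began.\n12\n1943\nForty-five\nThe end."
print(tag_chapter_headings(sample))
print(tag_chapter_headings(sample, include_single_words=True))
```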

In Sigil:
At this stage you have a single HTML file. If you have created <h2> tags for Chapters you can use the following to split on chapter headings <h2>:
In Code View.

Find: (<h2)
Repl: <hr class="sigilChapterBreak" />\1

Put the focus back onto the page.
Now press F6
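
To see what that last Find/Replace does to the code, here is a small Python imitation. It only mimics the text substitution and then splits on the marker itself; how Sigil actually builds the separate files is not shown here, and the function names are invented.

```python
import re

BREAK = '<hr class="sigilChapterBreak" />'

def mark_chapter_breaks(html):
    """Put the break marker in front of every <h2, as in the Find/Replace above."""
    return re.sub(r"(<h2)", BREAK + r"\1", html)

def split_at_breaks(html):
    """Cut the marked text into one chunk per chapter."""
    return [chunk for chunk in html.split(BREAK) if chunk.strip()]

body = "<p>Front matter</p><h2>One</h2><p>a</p><h2>Two</h2><p>b</p>"
for chunk in split_at_breaks(mark_chapter_breaks(body)):
    print(chunk)
```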