02-19-2010, 03:24 AM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: stanza
|
Make chapters for a document/HTML
Does anyone have ideas on how to make chapters for a master document? That is, if I have a PDF or DOC file, how can I split it into separate txt files, one per chapter?
|
02-19-2010, 06:36 AM | #2 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
What are you using to create your ePubs?
Atlantis Word Processor automatically splits the output ePub files when it comes across a paragraph with a Heading style (and will insert that paragraph into the ToC). Sigil allows you to choose exactly where to split the file by inserting a chapter break mark (elements tagged with <h1>, <h2>, etc. will be inserted into the ToC independently of the chapter breaks). |
|
02-19-2010, 06:28 PM | #3 | |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Quote:
I usually use <H2>...</H2> tags for chapter headings in my HTML code (or perhaps <H2 class="chaptertitle">...</H2>, etc.). Calibre allows you to set the XPath expression for chapter detection, but if memory serves, its default setting will pick up H2 tags. It'll do the splitting for you, at least with normal settings. |
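For anyone who wants to do the split by hand rather than through Calibre, here is a minimal Python sketch of the same idea: cut the HTML wherever an <h2> tag opens. It uses a plain regular expression, not Calibre's XPath detection, and the names `split_at_h2` and `H2_OPEN` are made up for illustration. It assumes chapter headings are the only <h2> elements in the file.

```python
import re

# Matches an opening <h2> tag, with or without attributes, in either case.
H2_OPEN = re.compile(r"<h2\b[^>]*>", re.IGNORECASE)


def split_at_h2(body_html):
    """Split an HTML body fragment into chunks that each start at an <h2>.

    Text before the first <h2> (front matter) becomes the first chunk
    if it is not empty.
    """
    starts = [m.start() for m in H2_OPEN.finditer(body_html)]
    if not starts:
        return [body_html]
    chunks = []
    if body_html[:starts[0]].strip():
        chunks.append(body_html[:starts[0]])
    bounds = starts + [len(body_html)]
    for begin, end in zip(bounds, bounds[1:]):
        chunks.append(body_html[begin:end])
    return chunks


sample = (
    "<p>Title page</p>"
    '<H2 class="chaptertitle">Chapter 1</H2><p>First text.</p>'
    "<h2>Chapter 2</h2><p>Second text.</p>"
)
chapters = split_at_h2(sample)
```

Each chunk would still need to be wrapped in its own <html>/<head>/<body> shell before being saved as a separate file.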
|
02-20-2010, 04:48 AM | #4 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: stanza
|
Quote:
Actually, I am using eCub as a compiler to generate the ePub file. eCub can only import plain txt or HTML files, so I wonder whether there is a way to make chapters from a PDF or DOC file that I can then compile with eCub. What I am really interested in, though, is how to split a PDF or DOC file into separate XHTML files, one per chapter. |
|
02-20-2010, 07:15 AM | #5 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Here's a very simple Word macro to select chapters and paste them into a new document. Place the cursor at the start of the first chapter, run it, save the new doc, run again, etc.
It assumes that chapters are properly marked with one paragraph at the start which has the Heading 1 style. You may need to edit it to adapt it to the particular formatting your document uses (if you have problems with that, then you're probably better off just using cut-and-paste). Code:
Sub Macro6()
'
' Extract Chapter
'
'
    Selection.MoveRight Unit:=wdSentence, Count:=2, Extend:=wdExtend
    Selection.Extend
    Selection.Find.ClearFormatting
    Selection.Find.Style = ActiveDocument.Styles("Heading 1")
    With Selection.Find
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute
    Selection.ExtendMode = False
    Selection.MoveLeft Unit:=wdSentence, Count:=2, Extend:=wdExtend
    Selection.Copy
    Selection.MoveRight Unit:=wdSentence, Count:=1
    Application.Documents.Add
    Selection.Paste
End Sub
|
|
02-20-2010, 10:25 AM | #6 | |
Boo-Frickety-Hoo-Erizer
Posts: 251
Karma: 686
Join Date: Oct 2007
Device: Kobo Glo HD!
|
Yay! Someone else brought up Word!
My turn: This macro will detect the word "Chapter", change it to H1, and add a line above it: Quote:
Calibre, eCub, and Sigil will recognize and use the Word-generated ToC. This proves quick & easy if you assign the macro to an alternate keystroke. -bjc |
|
02-20-2010, 12:57 PM | #7 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: stanza
|
Quote:
Thanks for your macro! It is really cool. I had written a similar macro, but yours is better than mine. I have run into a new problem, though: the document I am working with has images in it, so I can't separate the chapters and save them as txt; I need to save them as XHTML. Word doesn't seem to be able to save XHTML files. I've tried saving as HTML, but I can't import those .htm files into eCub. Is there any way to save a DOC file as XHTML? |
|
02-20-2010, 01:47 PM | #8 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
You can save it as (filtered) HTML, which should be good enough for ePub. If for some reason you need it as XHTML, just open the HTML and change the doctype at the top:
e.g., from
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
to
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
and rename the file to end in .xhtml. Almost everything in HTML can also be used in XHTML, as long as the markup is well-formed (every tag closed, attribute values quoted). |
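If there are many chapter files, the doctype swap can be scripted. The following is a rough Python sketch, not a full converter: the function name `swap_doctype` is invented for illustration, and it only replaces (or adds) the doctype line. It does not check that the rest of the markup is well-formed.

```python
import re

XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)

# Matches any <!DOCTYPE ...> declaration, case-insensitively.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def swap_doctype(html_text):
    """Replace the first doctype with the XHTML 1.0 Transitional one.

    If the file has no doctype at all, the XHTML doctype is prepended.
    """
    if DOCTYPE_RE.search(html_text):
        return DOCTYPE_RE.sub(lambda m: XHTML_DOCTYPE, html_text, count=1)
    return XHTML_DOCTYPE + "\n" + html_text


old = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n'
    "<html><body></body></html>"
)
new = swap_doctype(old)
```

The result can then be written out under a name ending in .xhtml.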
02-20-2010, 02:08 PM | #9 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Saving them as filtered html should be fine, but you'll need to make a few changes afterwards. Open the files in Notepad++ and check the head at the top of the file. If you have
<meta http-equiv=Content-Type content="text/html; charset=windows-1252"> there, then Word saved it in ANSI encoding. Go to the Format menu and select 'Convert to UTF-8', then change 'windows-1252' to 'utf-8' in that meta tag. You'll also probably want to delete all the unneeded @font-face definitions Word will have inserted at the top. |
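The same two steps (re-encode the bytes, then fix the declared charset) can be done outside Notepad++. This is a small Python sketch under the assumption that the file really is Windows-1252; `to_utf8` is an illustrative name, and the sample bytes stand in for a file Word exported.

```python
import re

# Matches the charset declaration Word writes into the meta tag.
CHARSET_RE = re.compile(r"charset\s*=\s*windows-1252", re.IGNORECASE)


def to_utf8(raw_bytes):
    """Decode Windows-1252 bytes, update the charset declaration,
    and return the document encoded as UTF-8."""
    text = raw_bytes.decode("cp1252")
    text = CHARSET_RE.sub("charset=utf-8", text)
    return text.encode("utf-8")


# Bytes 0x93/0x94 are curly quotes and 0xE9 is e-acute in Windows-1252.
word_output = (
    b'<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
    b"<p>\x93Quoted\x94 caf\xe9</p>"
)
converted = to_utf8(word_output)
```

For real files, read with `open(path, "rb")` and write the returned bytes back with `open(path, "wb")`.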
02-21-2010, 12:16 AM | #10 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: stanza
|
Quote:
That's exactly the problem I faced: I saved the files as filtered HTML, but they couldn't be imported into eCub. Thanks for the solution. Is there any way to tidy up the HTML files, though? They are too messy, with too much unwanted information. |
|
02-21-2010, 08:33 AM | #11 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Actually, I realised that Word's html export has a few other flaws. You'll probably want to run the html through HTML Tidy or something similar to fix all the flaws (mostly, Word fails to put quotes around attribute values). Notepad++'s TextFX plugin can do the HTML Tidy job for you.
Word does add a lot of needless fluff, like spans to define the language. Those are a pain to remove in Notepad++, as its regex engine doesn't handle newlines or non-greedy matches. Sigil, OTOH, has a regex engine that will remove them easily: set regular expression and minimal matching, then use
Find string: <span xml:lang="EN-US" lang="EN-US">(.*)</span>
Replace string: \1
Sigil also automatically does the HTML Tidy xhtml conversion for you. |
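For reference, the same find/replace can be written in Python. This is only a sketch of the equivalent regex, not what Sigil runs internally: `(.*?)` is the non-greedy ("minimal matching") form, and `re.DOTALL` lets the match cross line breaks. The name `strip_lang_spans` is made up, and the pattern would mis-pair tags if other spans were nested inside the language span.

```python
import re

# Non-greedy group plus DOTALL: the match may span newlines but stops
# at the first closing </span> it meets.
LANG_SPAN = re.compile(
    r'<span xml:lang="EN-US" lang="EN-US">(.*?)</span>',
    re.DOTALL,
)


def strip_lang_spans(html_text):
    """Remove the language spans Word adds, keeping their contents."""
    return LANG_SPAN.sub(r"\1", html_text)


sample = (
    '<p><span xml:lang="EN-US" lang="EN-US">Hello,\nworld</span> and '
    '<span xml:lang="EN-US" lang="EN-US">more</span></p>'
)
cleaned = strip_lang_spans(sample)
```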
02-21-2010, 08:37 AM | #12 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Also consider installing AbiWord or OpenOffice and using one of those to convert the .doc/.docx to .html. You can also convert through Google Docs. All are free, and do a better job. Switching away from reliance on Microsoft is always good for society. That company has too much power for anyone's good.
|
02-23-2010, 02:24 PM | #13 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: stanza
|
Quote:
|
|
|