#1
Member
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
XPath for chapter detection
I have a TXT-to-EPUB conversion that I'm trying to work through.
The TXT uses a line break and a tab to mark a new paragraph, and two line breaks before and after each chapter name. Since the chapter titles don't contain an obvious word like "Chapter", I copied the list of chapter titles from the TOC and used find/replace and the XPath expression wizard to create this XPath expression:

//*[re:test(., "^SOMEONE LIKE YOU$|^Taste$|^Lamb to the Slaughter$|^Man from the South$|^The Soldier$|^My Lady Love, My Dove$|^Dip in the Pool$|^Galloping Foxley$|^Skin$|^Poison$|^The Wish$|^Neck$|^The Sound Machine$|^Nunc Dimittis$|^The Great Automatic Grammatizator$|^Claud's Dog$|^The Ratcatcher$|^Rummins$|^Mr Hoddy$|^Mr Feasey$|^EIGHT FURTHER TALES OF THE UNEXPECTED$|^The Umbrella Man$|^Mr Botibol$|^Vengeance is Mine Inc.$|^The Butler$|^Ah, Sweet Mystery of Life$|^The Bookseller$|^The Hitchhiker$|^The Surgeon$", "i")]

To make this work, I need the expression to evaluate as true only if the entire 'chapter title' paragraph matches one of the literal strings. However, the expression seems to match any paragraph that contains the string. Can you help me make this work?
#2
Grand Sorcerer
Posts: 6,246
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Rather than typing all this into Calibre, you may have more joy with the following approach.
Edit the headings in your source .TXT file using Markdown markup, like this:

Code:
    # My Book Title
    # My Book's Author

    ///Table of Contents///

    ## SOMEONE LIKE YOU
    ... blah blah blah ...
    ## Taste
    ... blah blah blah ...
    ## Lamb to the Slaughter
    ... blah blah blah ...
    etc etc

The ///Table of Contents/// line should be placed where you want the internal TOC to appear. When you've finished editing, convert the TXT to EPUB, making sure you tick the [Convert] - [TXT Input] - "Process using markdown" box.
If you would also like a TOC that appears in the TOC sidebar on the left of the ebook viewer, you should also do this during the conversion:
- in [Convert] - [Structure Detection], set "Detect chapters at" to //h:h2
- in [Convert] - [Table of Contents], set "Level 1 TOC" to //h:h2
You can read more about Markdown here.
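If editing each book by hand is the obstacle, the Markdown preparation can be scripted. A minimal sketch, assuming each paragraph sits on a single line that starts with a tab and that the title list comes from the TOC; `txt_to_markdown` is a hypothetical name, not part of calibre. Besides prefixing `## ` to lines that equal a title (case-insensitively), it strips the leading tab and puts a blank line between paragraphs, because in standard Markdown a tab-indented line after a blank line is a code block and a single line break does not start a new paragraph.

```python
def txt_to_markdown(text, titles):
    """Turn a tab-indented TXT into Markdown with '## ' chapter headings.

    Assumes one paragraph per line; blank lines in the input are dropped and
    re-created as single blank lines between blocks.
    """
    wanted = {t.strip().casefold() for t in titles}
    blocks = []
    for line in text.replace("\r\n", "\n").split("\n"):
        stripped = line.strip()  # also removes the leading tab
        if not stripped:
            continue
        if stripped.casefold() in wanted:
            blocks.append("## " + stripped)
        else:
            blocks.append(stripped)
    return "\n\n".join(blocks) + "\n"


sample = "Taste\n\n\tFirst paragraph.\n\tSecond paragraph.\n\nLamb to the Slaughter\n\n\tThird.\n"
print(txt_to_markdown(sample, ["Taste", "Lamb to the Slaughter"]))
```

The output can be saved back as a .txt file and converted with "Process using markdown" ticked.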
#3
Member
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
Useful, but I can't think of a way to convert the text document to markup without doing it manually, which means I would have to edit every book I ever build a TOC for.
Well, I guess I could write a Python script, but learning XPath, if it can do this, seems easier.
#4
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
Part of the problem might be that in your regex, //*[re:test(., "^SOMEONE LIKE YOU", you have the asterisk, meaning all tags, but there are no tags at all in a .txt file. Also, in 'or' expressions like "(a|b|c)", the caret usually goes outside the opening parenthesis and the $ outside the closing parenthesis, i.e. "^(a|b|c)$". Can you use the two line breaks before and after the chapter name to detect the chapters? Something like "\n\n.*\n\n"? Of course, I imagine you'd have to do that in a program, since you still don't have any tags to match.
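The "two line breaks before and after" idea can be tried outside calibre with Python's `re`. A sketch, assuming the layout described in the first post (body paragraphs start with a tab, titles sit alone between blank lines); `find_title_lines` is a hypothetical helper. It uses lookarounds instead of a literal `\n\n.*\n\n`, because the literal version consumes the trailing line breaks, so a title immediately following another title (as "Taste" follows the "SOMEONE LIKE YOU" heading) is missed. It also uses `[^\n]+` so an empty line between blank lines is not reported as a title.

```python
import re

# A title is a non-empty line with two line breaks before and after it that
# does not start with a tab (tab-indented lines are body paragraphs here).
# The lookarounds leave the surrounding "\n\n" unconsumed.
TITLE_LINE = re.compile(r"(?<=\n\n)(?!\t)[^\n]+(?=\n\n)")


def find_title_lines(text):
    # Normalise Windows line endings and prepend two breaks so that a title
    # on the very first line also qualifies.
    text = "\n\n" + text.replace("\r\n", "\n")
    return TITLE_LINE.findall(text)


sample = "SOMEONE LIKE YOU\n\nTaste\n\n\tFirst.\n\tSecond.\n\nLamb to the Slaughter\n\n\tThird.\n"
print(find_title_lines(sample))
# The naive pattern stops after the first of two adjacent titles:
print(re.findall(r"\n\n.*\n\n", "\n\n" + sample))
```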
#5
Member
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
I was under the impression that calibre runs TOC creation after converting to XHTML, and that it turns everything separated by a carriage return into a new paragraph.
#6
Member
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
Oh, sorry, I wasn't explicit that I was using calibre.
#7
Member
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
If the special characters $ or ^ are put outside the quotation marks, e.g.:
//*[re:test(., ^"SOMEONE LIKE YOU"$,
calibre reports that the XPath expression is invalid.
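The anchors are part of the regular expression, and `re:test()` receives that expression as an XPath string literal, so `^` and `$` have to sit inside the quotes; outside them they are not valid XPath. A minimal Python sketch that assembles the whole expression under that rule; `title_xpath` is a hypothetical helper, the `//*` default is copied from the first post, and the quote handling reflects the fact that XPath 1.0 string literals have no escape sequences (so a title containing `"` forces single quotes around the pattern).

```python
import re


def title_xpath(titles, node_test="//*"):
    """Build an XPath expression selecting nodes whose whole text is one title.

    The ^ and $ anchors belong to the regular expression, so they go inside
    the quoted string literal passed as re:test()'s second argument.
    """
    regex = "^(?:" + "|".join(re.escape(t) for t in titles) + ")$"
    # XPath 1.0 literals cannot escape quotes: use whichever quote character
    # does not occur in the regex itself.
    for quote in ('"', "'"):
        if quote not in regex:
            return f'{node_test}[re:test(., {quote}{regex}{quote}, "i")]'
    raise ValueError("regex contains both quote characters; split it with concat()")


print(title_xpath(["SOMEONE LIKE YOU", "Taste", "Claud's Dog"]))
```

Note that `re.escape()` also backslash-escapes spaces; Python's `re` treats `\ ` as a plain space, so the pattern still matches the titles as written.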
#8
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
This is definitely the easiest way to do it. If you have a text editor that can handle multi-line regular expressions, you can do it with one search/replace. Alternatively, you can semi-automate it with a free editor such as Notepad++: use an Extended search to find \n\nChapter_Name and then run a macro to insert the ## Markdown characters. It only takes a few seconds per chapter, and it's worth the effort.
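The single multi-line search/replace can also be done from Python. A sketch under the thread's assumptions (titles known from the TOC, each preceded by two line breaks); `mark_chapters` is a hypothetical name. Like the Notepad++ search, it looks for two line breaks followed by a chapter name, and the lookahead requires the line to end right there, so a line such as "Taste and see" is not mistaken for "Taste".

```python
import re


def mark_chapters(text, titles):
    """Insert '## ' before each known title that follows two line breaks.

    The title must fill the whole line (it is followed by a line break or
    the end of the text). Matching is case-insensitive.
    """
    names = "|".join(re.escape(t) for t in titles)
    pattern = re.compile(r"(\n\n)(" + names + r")(?=\n|\Z)", re.IGNORECASE)
    # Prepend two line breaks so a title on the first line also matches,
    # then slice them off again after the substitution.
    return pattern.sub(r"\1## \2", "\n\n" + text.replace("\r\n", "\n"))[2:]


sample = "Taste\n\n\tBody text.\n\nMan from the South\n\n\tMore text.\n"
print(mark_chapters(sample, ["Taste", "Man from the South"]))
```

Because every title sits in one alternation, the whole file is handled in a single pass rather than one macro run per chapter.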
Tags: chapters, txt, xpath