MobileRead Forums > E-Book Software > Calibre

07-24-2010, 03:49 PM   #1
romnempire
Member
 
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
xpath for chapter detection

I have a TXT-to-EPUB conversion that I'm trying to work through.

The TXT uses a line break and a tab to mark a new paragraph, and two line breaks before and after each chapter name.

To detect chapters whose titles contain no obvious "chapter" keyword, I copied the list of chapter titles from the TOC and used find-and-replace and the XPath expression wizard to create this XPath expression:

//*[re:test(., "^SOMEONE LIKE YOU$|^Taste$|^Lamb to the Slaughter$|^Man from the South$|^The Soldier$|^My Lady Love, My Dove$|^Dip in the Pool$|^Galloping Foxley$|^Skin$|^Poison$|^The Wish$|^Neck$|^The Sound Machine$|^Nunc Dimittis$|^The Great Automatic Grammatizator$|^Claud's Dog$|^The Ratcatcher$|^Rummins$|^Mr Hoddy$|^Mr Feasey$|^EIGHT FURTHER TALES OF THE UNEXPECTED$|^The Umbrella Man$|^Mr Botibol$|^Vengeance is Mine Inc.$|^The Butler$|^Ah, Sweet Mystery of Life$|^The Bookseller$|^The Hitchhiker$|^The Surgeon$", "i")]

To make this work, I need the expression to match a paragraph only when the paragraph's entire text is one of the literal chapter titles. However, the expression seems to match any paragraph that merely contains one of the titles.

Can you help me make this work?
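The anchoring itself can be checked outside calibre. The sketch below is a minimal Python illustration that assumes re:test follows the same matching rules as Python's re module; it takes three of the titles and compares the anchored alternatives (written the same way as in the expression above) with the same alternatives unanchored.

```python
import re

# Three of the titles from the question. As in the XPath expression above,
# every alternative carries its own ^ and $ anchors.
anchored = r"^Taste$|^Skin$|^The Wish$"
unanchored = r"Taste|Skin|The Wish"

paragraphs = [
    "Taste",                          # a chapter heading on its own
    "He had a fine taste in wine.",   # body text that contains a title word
    "The Wish",                       # another chapter heading
]

for text in paragraphs:
    whole = bool(re.search(anchored, text, re.IGNORECASE))
    partial = bool(re.search(unanchored, text, re.IGNORECASE))
    print(f"{text!r}: anchored={whole}, unanchored={partial}")
```

Under these rules each ^...$ alternative accepts only a string that is entirely one title, so the anchors do not look like the source of the partial matches. The sketch does not model which elements calibre tests, which is what the //* part of the expression controls.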
07-24-2010, 10:35 PM   #2
jackie_w
Wizard
 
Posts: 2,784
Karma: 3973173
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"
Rather than typing all this into Calibre, you may have more joy with the following approach.

Edit the headings in your source .TXT file using Markdown markup, like this:

Code:
# My Book Title

# My Book's Author

///Table of Contents///

## SOMEONE LIKE YOU

... blah blah blah ...

## Taste

... blah blah blah ...

## Lamb to the Slaughter

... blah blah blah ... etc etc
The single # lines will be treated as <h1> and the ## lines as <h2>.
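To make the #/## mapping concrete, here is a toy Python sketch. It handles only those two heading levels plus plain paragraphs, so it illustrates the idea rather than calibre's actual Markdown processing, which does a great deal more; the function name headings_to_html is made up for the example.

```python
import html
import re

# Toy illustration only (headings_to_html is a made-up name): it recognises
# "# Title" and "## Title" lines and treats every other non-blank line as a paragraph.
HEADING = re.compile(r"^(#{1,2})\s+(.+?)\s*$")


def headings_to_html(markdown_text: str) -> list[str]:
    """Map '# X' to <h1>X</h1>, '## X' to <h2>X</h2>, other lines to <p>."""
    out = []
    for line in markdown_text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        m = HEADING.match(line)
        if m:
            level = len(m.group(1))  # number of '#' characters
            out.append(f"<h{level}>{html.escape(m.group(2))}</h{level}>")
        else:
            out.append(f"<p>{html.escape(line.strip())}</p>")
    return out


sample = "# My Book Title\n\n## Taste\n\n... blah blah blah ...\n"
print("\n".join(headings_to_html(sample)))
```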

The ///Table of Contents/// line should be placed where you want the internal TOC to be.

When you've finished editing, convert the TXT to EPUB, making sure you check the [Convert] - [TXT Input] - "Process using markdown" box.

If you would also like a TOC that appears in the TOC sidebar on the left of the ebook viewer, also set these options during the conversion:
in [Convert] - [Structure Detection], set "Detect chapters at" to //h:h2
in [Convert] - [Table of Contents], set "Level 1 TOC" to //h:h2
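In these settings the h: prefix refers to the XHTML namespace, so //h:h2 picks out every <h2> element in the converted book. The same selection can be sketched with Python's built-in ElementTree, which understands only a small subset of XPath (calibre evaluates the full expression itself). The XHTML fragment is hypothetical and simply mirrors the Markdown example above.

```python
import xml.etree.ElementTree as ET

# "h" is bound to the XHTML namespace, matching the h: prefix in //h:h2.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical XHTML standing in for what the Markdown example might become.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>My Book Title</h1>
<h2>SOMEONE LIKE YOU</h2><p>... blah blah blah ...</p>
<h2>Taste</h2><p>... blah blah blah ...</p>
<h2>Lamb to the Slaughter</h2><p>... blah blah blah ...</p>
</body></html>"""

root = ET.fromstring(xhtml)
# ".//h:h2" is ElementTree's limited equivalent of the XPath //h:h2:
# every h2 descendant, in document order. The h1 is not selected.
toc_entries = [h2.text for h2 in root.findall(".//h:h2", NS)]
print(toc_entries)
```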

You can read more about Markdown here.
07-26-2010, 03:29 PM   #3
romnempire
Member
 
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
Useful, but I can't think of a way to convert the text document to Markdown without doing it by hand, which means I would have to edit every book I ever want a TOC for.

Well, I guess I could write a Python script, but learning XPath, if it can do the job, seems easier.
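If you do go the Python route, the script can be small. A minimal sketch, assuming you paste in the chapter titles copied from the TOC (as in the first post) and that each heading sits on a line of its own: every line whose whole text is one of the titles gets a ## prefix, while lines that only contain a title are left alone. The function name add_markdown_headings is hypothetical.

```python
def add_markdown_headings(text: str, titles: list[str]) -> str:
    """Prefix '## ' to each line whose entire text is one of the titles.

    Hypothetical helper. Comparison ignores case and surrounding whitespace;
    a line that merely contains a title somewhere inside it is left unchanged.
    """
    wanted = {t.strip().casefold() for t in titles}
    out = []
    for line in text.splitlines():
        if line.strip().casefold() in wanted:
            out.append("## " + line.strip())
        else:
            out.append(line)
    return "\n".join(out) + "\n"


titles = ["SOMEONE LIKE YOU", "Taste", "Lamb to the Slaughter"]  # copied from the TOC
book = (
    "SOMEONE LIKE YOU\n\n"
    "Taste\n\n"
    "\tFirst paragraph of the story.\n"
    "\tA sentence about good taste in wine.\n"
)
print(add_markdown_headings(book, titles))
```

To process a real file, read it with open(path, encoding="utf-8") (the encoding is an assumption; use whatever your TXT is saved in), pass the text through the function, write the result to a new .txt file, and convert that with "Process using markdown" checked.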
07-26-2010, 04:17 PM   #4
susan_cassidy
Wizard
 
Posts: 1,878
Karma: 1698650
Join Date: Jan 2009
Device: Kindle, iPad (not used much for reading)
Part of the problem might be that your expression, which starts //*[re:test(., "^SOMEONE LIKE YOU$|..., uses the asterisk, which means all tags, but there are no tags at all in a .txt file. Also, in 'or' expressions like "(a|b|c)", the caret usually goes outside the opening parenthesis and the $ outside the closing parenthesis, as in "^(a|b|c)$". Can you use the two line breaks before and after each chapter name to detect the chapters? Something like "\n\n.*\n\n"? Of course, I imagine you'd have to do that in a program, since you still don't have any tags to match.
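Both points can be tried quickly in Python's re module (used here only as a convenient regex engine; the sample text is made up and assumes \n line endings). First, factoring the anchors out needs the group: ^Taste|Skin$ means (^Taste) or (Skin$), so it also accepts a line that only starts with "Taste". The expression in the first post sidesteps this by anchoring each alternative separately, which behaves like the grouped form. Second, the literal \n\n.*\n\n pattern consumes the blank lines it matches, so a heading that shares a blank line with the previous heading is skipped; a lookahead such as (?=\n\n) tests for the blank line without consuming it.

```python
import re

# Anchors factored out of an alternation need a group around it.
ungrouped = re.compile(r"^Taste|Skin$")    # means (^Taste) or (Skin$)
grouped = re.compile(r"^(?:Taste|Skin)$")  # whole string must be one title

print(bool(ungrouped.search("Taste of the town")))  # True: unwanted partial match
print(bool(grouped.search("Taste of the town")))    # False

# Made-up raw text: headings have a blank line on each side,
# body paragraphs start with a tab.
raw = ("\n\nSOMEONE LIKE YOU\n\nTaste\n\n"
       "\tFirst paragraph.\n\tSecond paragraph.\n\n"
       "Skin\n\n\tMore text.\n")

# The suggested pattern: each match swallows the blank lines after its heading,
# so "Taste", which needs those same blank lines in front of it, is missed.
print(re.findall(r"\n\n(.*)\n\n", raw))  # ['SOMEONE LIKE YOU', 'Skin']
```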
07-26-2010, 04:42 PM   #5
romnempire
Member
 
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
I was under the impression that calibre builds the TOC after converting to XHTML, and that it turns everything separated by a line break into a new paragraph.
07-26-2010, 04:44 PM   #6
romnempire
Member
 
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
Oh, sorry, I should have said explicitly that I'm using calibre.
07-26-2010, 04:58 PM   #7
romnempire
Member
 
Posts: 14
Karma: 10
Join Date: Dec 2009
Device: Kindle 2
If the special characters $ or ^ are put outside the quotation marks, e.g.:

//*[re:test(., ^"SOMEONE LIKE YOU"$,

calibre reports that the XPath expression is invalid.
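That error is expected: the ^ and $ belong to the regular expression, and XPath only sees the regex as an ordinary quoted string passed to re:test. Outside the quotes they are not valid XPath (a bare $ starts a variable reference), so the anchors, and any grouping around the alternatives, have to stay inside the string. Below is a sketch that generates such an expression from the TOC list in Python, assuming (as in the post above) that each chapter title ends up in its own paragraph element, hence h:p rather than *. re.escape is used so that punctuation such as the dot in "Inc." is matched literally.

```python
import re

# A few of the titles copied from the TOC in the first post.
titles = ["SOMEONE LIKE YOU", "Taste", "Claud's Dog", "Vengeance is Mine Inc."]

# The regex goes inside an XPath string literal delimited by double quotes
# (needed because of the apostrophe in "Claud's Dog"), so a title that itself
# contains a double quote cannot be embedded this way.
if any('"' in t for t in titles):
    raise ValueError("a title contains a double quote")

# Escape each title, join them into one alternation, and anchor the whole
# group so a paragraph matches only if its entire text is one of the titles.
pattern = "^(?:" + "|".join(re.escape(t) for t in titles) + ")$"
xpath = '//h:p[re:test(., "' + pattern + '", "i")]'

print(xpath)
```

The printed expression shows backslashes before the spaces, because re.escape also escapes whitespace. That is harmless: an escaped space still matches an ordinary space, and XPath passes the backslashes through untouched since its string literals have no escape sequences.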
07-26-2010, 05:34 PM   #8
Agama
Guru
 
Posts: 651
Karma: 436517
Join Date: Jul 2010
Location: UK
Device: PRS-300, PW2
Quote:
Originally Posted by jackie_w
Rather than typing all this into Calibre you may have more joy with the following approach.

Edit your source .TXT file's headings using Markdown ...
This is definitely the easiest way to do it. If you have a text editor that can handle multi-line regular expressions, you can do it with a single search/replace. Alternatively, you can semi-automate it with a free editor such as Notepad++: use an Extended search to find \n\nChapter_Name, then run a macro to insert the ## Markdown characters. It only takes a few seconds per chapter, and it's worth the effort.
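For the single multi-line search/replace, here is what it might look like as a Python regex substitution, under the same assumptions as the first post: headings have a blank line before and after them, body paragraphs start with a tab, and the text uses \n line endings (Python converts Windows line endings to \n when it reads a file in text mode). The lookbehind and lookahead check for the blank lines without consuming them, so consecutive headings are all caught. A heading on the very first line of the file has no blank line before it and is skipped. The function name mark_headings is hypothetical.

```python
import re

# A line that does not start with a tab and has a blank line on each side.
# The lookbehind/lookahead test for the blank lines without consuming them.
HEADING = re.compile(r"(?<=\n\n)([^\t\n][^\n]*)(?=\n\n)")


def mark_headings(text: str) -> str:
    """Hypothetical helper: put the Markdown '## ' marker before each detected heading."""
    return HEADING.sub(r"## \1", text)


sample = (
    "My Book Title\n\n"            # first line: no blank line before it, so not marked
    "SOMEONE LIKE YOU\n\n"
    "Taste\n\n"
    "\tFirst paragraph.\n\tSecond paragraph.\n\n"
    "Lamb to the Slaughter\n\n"
    "\tMore text.\n"
)
print(mark_headings(sample))
```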

Tags
chapters, txt, xpath

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Help with Chapter detection | ubergeeksov | Calibre | 0 | 09-02-2010 04:56 AM
chapter detection in any book | yuki86 | Calibre | 9 | 05-06-2009 06:54 AM
Chapter detection for LRF | HenryP | Calibre | 12 | 04-03-2009 08:22 AM
Cant find help for chapter detection | fallwood | Calibre | 6 | 12-10-2008 01:20 PM
Calibre chapter detection | AKninja04 | Calibre | 5 | 09-14-2008 12:09 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.