Old 06-02-2011, 12:50 PM   #1
buckm56
Member
Posts: 19
Karma: 10
Join Date: Feb 2010
Location: Colorado, US
Device: Kindle 3
TOC based on Capitalized Words

I have a book that isn't formatted well enough to create a TOC. The only formatting clue is that the first couple of words of each chapter are in all caps. Is there any formula that can create a TOC from this?

Thanks
Old 06-02-2011, 01:25 PM   #2
theducks
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
If you convert to EPUB and then use Sigil's regex search (with match case ticked), you can quickly step through the matches with Find Next or Replace. Then change the pattern to match the ones that were skipped, and repeat.
BTW: if the chapter start uses 'small-caps', a case-sensitive search and replace will not work, but the small-caps class is the best trigger you could ask for.

Assuming each 'chapter' starts in a separate segment, I would have to see everything from the <body> tag through to the lines that would make the TOC entry.
If a chapter also starts mid-file, I would need to see that as well, to check whether there is a pattern to trap.
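
As a rough illustration of the kind of case-sensitive pattern described above, here is a minimal Python sketch. It assumes the chapter openers are <p> elements whose text begins with at least two all-caps words; the sample markup and the function name find_chapter_starts are made up for the example, and a real book will likely need the pattern adjusted.

```python
import re

# Assumption: a chapter opener is a <p> whose text begins with two or more
# consecutive all-caps words, e.g. "<p>THE NIGHT WAS dark ...".
# No re.IGNORECASE flag, which mirrors ticking "match case" in the editor.
CAPS_START = re.compile(r"<p[^>]*>\s*((?:[A-Z][A-Z']*[ ,]+){1,}[A-Z][A-Z']*)\b")

sample = """
<p>THE NIGHT WAS dark and the road was empty.</p>
<p>He walked on for an hour.</p>
<p class="first">IT BEGAN with a letter.</p>
"""

def find_chapter_starts(xhtml):
    """Return the leading all-caps run of every paragraph that has one."""
    return [m.group(1) for m in CAPS_START.finditer(xhtml)]

print(find_chapter_starts(sample))  # ['THE NIGHT WAS', 'IT BEGAN']
```

Stepping through the results one at a time, as with Find Next, is a quick way to spot the chapter starts the pattern skipped before widening it.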
Old 06-03-2011, 02:19 AM   #3
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
You could do the same thing in the Calibre conversion process, I believe: you can use regex in XPath, so you'd have to formulate an expression that matches capitalized words.
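
Calibre's XPath fields accept a re:test() function for regex matching, so an expression along the lines of //h:p[re:test(., '^\s*[A-Z]{2,}\s+[A-Z]{2,}')] might be a starting point. That expression is an untested assumption, not something confirmed in this thread. The Python sketch below only emulates the same selection with the standard library, on made-up sample markup, to show what such an expression would pick up.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}
# Two or more all-caps words at the very start of the paragraph text.
CAPS_START = re.compile(r"^\s*[A-Z]{2,}\s+[A-Z]{2,}")

doc = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>THE NIGHT WAS dark and the road was empty.</p>
<p>He walked on for an hour.</p>
<p>IT BEGAN with a letter.</p>
</body></html>"""

def select_chapter_paragraphs(xhtml):
    """Emulate //h:p[re:test(., PATTERN)]: keep <p> elements whose text matches."""
    root = ET.fromstring(xhtml)
    hits = []
    for p in root.iterfind(".//h:p", XHTML_NS):
        text = "".join(p.itertext())  # roughly the XPath string value of the node
        if CAPS_START.search(text):
            hits.append(text)
    return hits

for text in select_chapter_paragraphs(doc):
    print(text)
```

Each hit here is the full text of the matched paragraph, not just the capitalized words.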

Another thing to try would be to activate the relevant heuristic options.
Old 06-03-2011, 06:46 PM   #4
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
If I understand the use case he's describing, it's not one currently covered by Heuristics. That said, next time I get back to tweaking Heuristics, it's a potential test case to add, but one that would work pretty differently from the existing function, as you wouldn't want to wrap those lines in H2 tags.

XPath with regex is about the only thing that would work, but all the TOC entries would then be a paragraph long.
Old 06-03-2011, 07:13 PM   #5
theducks
Good point.
I have handled this by inserting a non-visible header ahead of the matched text, then rebuilding the text from the back-reference:

<h2 class="very short" title="\1">&nbsp;</h2> <the original source's preceding stuff>\1
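
A minimal Python sketch of that hidden-heading replacement, for illustration only. The named groups, the sample markup and the function name add_hidden_headings are assumptions made for the example; the class and title attribute follow the replacement line above. Whether a given tool takes the TOC label from the title attribute is worth checking before relying on it.

```python
import re

# "tag" captures the opening <p ...> tag (the "preceding stuff");
# "caps" captures the leading run of two or more all-caps words.
CAPS_PARA = re.compile(
    r"(?P<tag><p[^>]*>)\s*(?P<caps>[A-Z]{2,}(?:\s+[A-Z]{2,})+)\b"
)

# Insert a near-empty heading whose title attribute carries the short label,
# then rebuild the original paragraph start from the captured groups.
REPLACEMENT = (
    r'<h2 class="very short" title="\g<caps>">&nbsp;</h2>' "\n" r"\g<tag>\g<caps>"
)

def add_hidden_headings(xhtml):
    """Put a hidden <h2> in front of every paragraph that opens in all caps."""
    return CAPS_PARA.sub(REPLACEMENT, xhtml)

print(add_hidden_headings("<p>IT BEGAN with a letter.</p>"))
```

The paragraph itself is left as it was, so the visible text does not change; only the short heading is available as a TOC target.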
Old 06-03-2011, 11:16 PM   #6
ldolse
That's kind of how I was thinking I would do it for Heuristics too. It's still fiddly, because lots of books that do have first paragraphs with that formatting have normal chapter headings anyway. That fact could also help reduce false positives for some books, though.

Last edited by ldolse; 06-03-2011 at 11:19 PM.