09-01-2008, 05:08 PM | #1 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
chapter detection, basic functionality
I ran into some problems with chapter detection while testing the bookit plugin, and realized that I don't understand exactly how it works.
Quite simply, I created a small HTML file with "<h1>some text</h1>" here and there to represent chapter headings. I found that the html2lrf switches --add-chapters-to-toc --chapter-regex="." (or --chapter-regex="h1", or --chapter-regex 'h1', or various other variations) accomplish exactly nothing: the LRF is created without any TOC. Can you please explain to me what I'm doing wrong? Thanks! alessandro |
09-01-2008, 11:02 PM | #2 |
creator of calibre
Posts: 44,353
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
download the calibre beta and use
Code:
--chapter-attr h1,none, |
09-02-2008, 02:25 AM | #3 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Seems like a sneaky way to recruit beta testers!
Joking aside, I was a bit afraid, but if it's not too unstable I'll give it a try. However, this seems to be functionality already present in the stable version (at least according to the man pages), and the switch does not raise an error. Of course, it still generates an LRF file without a TOC, as usual. alessandro |
09-02-2008, 03:37 AM | #4 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Well, I jumped in, so I'll report any problems I find with the beta (none yet).
I tried your suggestion, but I got:
[root@lambda2 Booklets in process]# html2lrf literature.htm --add-chapters-to-toc --chapter-attr h1,none,
Processing u'literature.htm'
Parsing HTML...
Converting to BBeB...
Traceback (most recent call last):
  File "<string>", line 2026, in <module>
  File "<string>", line 2020, in main
  File "<string>", line 1910, in process_file
  File "<string>", line 273, in __init__
  File "<string>", line 395, in add_file
  File "<string>", line 509, in parse_file
  File "<string>", line 723, in process_children
  File "<string>", line 1762, in parse_tag
  File "<string>", line 723, in process_children
  File "<string>", line 1449, in parse_tag
AttributeError: 'unicode' object has no attribute 'pattern'
/usr/bin/html2lrf: line 6: 32591 Segmentation fault ./html2lrf "$@"
whereas before I usually didn't get any error (apart from the missing TOC). Then I thought that the new html2lrf surely wasn't in /usr/bin anymore, and sure enough it was in /opt/calibre. Then I got this:
[root@lambda2 Booklets in process]# /opt/calibre/html2lrf literature.htm --add-chapters-to-toc --chapter-attr h1,none,
Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/home/kovid/build/pyinstaller/iu.py", line 346, in importHook
ImportError: No module named PyQt4
Now, I have this on my machine:
[root@lambda2]# rpm -qa|grep -i pyqt
PyQt-3.17.4-1.fc8
PyQt-devel-3.17.4-1.fc8
PyQt4-4.3.3-1.fc8
I wonder if it's related to the warning I got recently with the updates, which tells me this:
WARNING: You need PyQt >= 4.4.2 for the GUI. You have 4.3.3 You may experience crashes or other strange behavior.
alessandro |
09-02-2008, 10:34 AM | #5 |
creator of calibre
Posts: 44,353
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
oops typo, will be fixed in the next beta release.
/usr/bin/html2lrf is the correct way to run it. |
09-03-2008, 02:43 AM | #6 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Good, I'm waiting for it!
Anyway, coming back to my original question, I still have a basic doubt. I believed that the default setup of the engine that recognizes chapters was to look for content inside <h1></h1> (I don't know about h2, h3, ...), so I only needed the default functionality, at most activated with --add-chapters-to-toc. Does the need to use --chapter-attr h1,none, (and to install the beta, etc.) mean that the chapter detection engine does not work? Sorry for the questions, but the man pages are not very clear about it ... alessandro |
09-03-2008, 11:15 AM | #7 |
creator of calibre
Posts: 44,353
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The default is to search for the strings chapter, section or book inside either h1 or h2 tags and, if they are found, to mark them as chapters.
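For readers who want to see what that default amounts to, here is a minimal Python sketch of the heuristic as described above. It is not html2lrf's actual code: the names `DEFAULT_CHAPTER_RE`, `HeadingCollector` and `detect_chapters` are hypothetical, and the case-insensitive matching is an assumption.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: the words named in the post, matched case-insensitively.
# The real default in html2lrf may differ in detail.
DEFAULT_CHAPTER_RE = re.compile(r"chapter|section|book", re.IGNORECASE)


class HeadingCollector(HTMLParser):
    """Collect the text of every <h1> and <h2> element."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag name of the heading being read, if any
        self._buf = []        # text fragments of the current heading
        self.headings = []    # list of (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._current = tag
            self._buf = []

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buf).strip()))
            self._current = None


def detect_chapters(html, pattern=DEFAULT_CHAPTER_RE):
    """Return the heading texts that the pattern would mark as chapters."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return [text for _tag, text in parser.headings if pattern.search(text)]


sample = "<h1>some text</h1><p>...</p><h1>Chapter 1</h1><h2>Book Two</h2>"
print(detect_chapters(sample))  # ['Chapter 1', 'Book Two']
```

Under these assumptions a heading such as `<h1>some text</h1>` is skipped, because its text contains none of the three words.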
|
09-04-2008, 02:56 AM | #8 | |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Quote:
But where? In the id attribute (<h1 id="chapter 1">chapter title</h1>)??? If so, it may be harder than I thought to create a TOC from the Bookit editor. I believed that html2lrf was able to "harvest" the chapters just by looking at the <h1>chapter title</h1> text, so that in the Bookit editor I could simply mark the selected text as h1. alessandro |
|
09-04-2008, 10:28 AM | #9 |
creator of calibre
Posts: 44,353
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
--chapter-regex matches the text inside heading tags, i.e. <h1>chapter 1</h1> will match.
If you want to match on tag names or tag attributes, use --chapter-attr. |
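The difference between the two switches can be illustrated with a small Python sketch. This is only a model of the behaviour described in this thread, not calibre's implementation. The helper names are hypothetical. It also assumes that "heading tags" means h1 to h6, and that the three comma-separated fields of the `--chapter-attr` value are a tag name pattern, an attribute name (with `none` meaning "match on the tag name only") and an attribute value pattern, as `h1,none,` suggests.

```python
import re

# Each element is modelled as (tag, attributes, text); a real converter
# would get these from the parsed HTML.
elements = [
    ("h1", {}, "some text"),
    ("h1", {"class": "chapter"}, "Introduction"),
    ("h2", {}, "Chapter 1"),
    ("p", {}, "chapter 1 was long"),
]


def matches_text(element, regex):
    """Text-based matching: the regex is applied to the heading's text."""
    tag, _attrs, text = element
    # Assumption: only h1..h6 count as heading tags.
    return re.fullmatch(r"h[1-6]", tag) is not None and regex.search(text) is not None


def matches_spec(element, spec):
    """Tag/attribute matching for a 'tag,attribute,value' spec (assumed format)."""
    tag, attrs, _text = element
    tag_pat, attr, value_pat = spec.split(",", 2)
    if re.fullmatch(tag_pat, tag, re.IGNORECASE) is None:
        return False
    if attr.lower() == "none":
        return True  # match on the tag name alone
    return attr in attrs and re.search(value_pat, attrs[attr]) is not None


by_text = [e[2] for e in elements if matches_text(e, re.compile("chapter", re.IGNORECASE))]
by_tag = [e[2] for e in elements if matches_spec(e, "h1,none,")]
by_attr = [e[2] for e in elements if matches_spec(e, r"h\d,class,chapter")]

print(by_text)  # ['Chapter 1']
print(by_tag)   # ['some text', 'Introduction']
print(by_attr)  # ['Introduction']
```

In this model a spec like `h1,none,` picks up every h1 regardless of its text, which is the behaviour the original test file with `<h1>some text</h1>` needs.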
|