08-22-2010, 05:20 AM | #1 |
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
|
can't generate a toc from an html file
I am trying to convert an HTML file (Gideon's Band) to an EPUB from the command line or the GUI, but it won't generate a TOC. The file is in the same format as one I did successfully convert (Bonaventure). Both files are on Project Gutenberg. Here are the URLs:
Gideon's Band: http://www.gutenberg.org/files/19348...-h/19348-h.htm
Bonaventure: http://www.gutenberg.org/files/24078...-h/24078-h.htm
To reiterate, Bonaventure was fine; Gideon's Band would not generate the TOC. The files are stated by PG to be in the public domain. TIA, Paul |
08-22-2010, 07:43 AM | #2 |
Grand Sorcerer
Posts: 6,216
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Hi Paul,
I've had a look inside the HTML of Gideon's Band. The chapter tagging looks like this: Code:
<div class="c1">
<h1>GIDEON'S BAND</h1>
<h2>I</h2>
<h3>THE STEAMBOAT LEVEE</h3>
</div>
...
...
<div class="c1">
<h2>II</h2>
<h3>THE "VOTARESS"</h3>
</div>
Set [Convert] - [Structure Detection] - 'Detect chapters at' to //h:h2
Set [Convert] - [Table of Contents] - 'Level 1 TOC' to //h:h2 (or you could leave it blank in this particular case)
or you may prefer
Set [Convert] - [Structure Detection] - 'Detect chapters at' to //h:div[re:test(@class, "c1", "i")]
Set [Convert] - [Table of Contents] - 'Level 1 TOC' to //h:h2
or even
Set [Convert] - [Structure Detection] - 'Detect chapters at' to //h:div[re:test(@class, "c1", "i")]
Set [Convert] - [Table of Contents] - 'Level 1 TOC' to //h:div[re:test(@class, "c1", "i")]
All of these worked for me. Good Luck.
Last edited by jackie_w; 08-22-2010 at 07:48 AM. Reason: added 3rd option |
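For anyone wondering what those expressions actually pick out, here is a rough Python sketch using only the standard library. It is not how calibre evaluates XPath internally. It assumes the document has been parsed into the XHTML namespace (bound to the h prefix here), and it uses a plain @class='c1' test because ElementTree has no re:test() function. The sample markup is the snippet above wrapped in a made-up html/body shell, and the helper name labels is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the "h" prefix stands for the XHTML namespace.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# The chapter markup quoted above, wrapped in a minimal XHTML shell (illustrative only).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="c1"><h1>GIDEON'S BAND</h1><h2>I</h2><h3>THE STEAMBOAT LEVEE</h3></div>
<p>...</p>
<div class="c1"><h2>II</h2><h3>THE "VOTARESS"</h3></div>
</body></html>"""

root = ET.fromstring(SAMPLE)


def labels(xpath):
    """Return the whitespace-normalised text of every element matched by xpath."""
    return [" ".join(" ".join(el.itertext()).split()) for el in root.findall(xpath, NS)]


# Selecting the h2 elements gives just the chapter numerals.
h2_labels = labels(".//h:h2")

# Selecting the enclosing div gives numeral plus chapter name in one string.
div_labels = labels(".//h:div[@class='c1']")

print(h2_labels)
print(div_labels)
```

The difference between the two lists is one way to see why pointing 'Level 1 TOC' at the div rather than the h2 would put more text into each TOC entry.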
08-23-2010, 12:09 AM | #3 |
Captain Courageous
|
Well it did generate all the chapter titles, but not the correct pages!
Paul Last edited by p3aul; 08-23-2010 at 01:28 AM. |
08-23-2010, 06:24 AM | #4 |
Grand Sorcerer
|
To help you further, I need you to be more specific about what isn't working.
|
08-25-2010, 01:14 AM | #5 |
Captain Courageous
|
OK, I don't know in detail. All I know is that on the command line I type: ebook-convert gideon.html gideon.epub. This is supposed to convert the file to EPUB, right? I understand calibre looks for the <h1>, <h2> markup tags to create a TOC, and they are there. It doesn't create one. I've also tried to convert in the GUI. Same thing.
I tried your suggestions and replaced the default settings with yours. I get a complete listing of chapters, but the pages are wrong. Most of the entries just take you back to page 3 for some reason. I converted the book to RTF. This time all I got for a TOC was links to illustrations. Tomorrow I will delete the illustrations entirely and try again. |
08-25-2010, 09:51 AM | #6 |
Grand Sorcerer
|
Firstly, sticking with the GUI for the moment: I've converted the HTML to EPUB using option 2 of the three I listed above.
I have attached a screencap of the resulting EPUB when viewed on the PC using the calibre ebook viewer. When I open the TOC panel (left-hand side, 7th button from top), I see a list of all 63 chapters. If I click on one it takes me straight there. The screencap shows Chapter 2 selected.
Once the book is sent to the PRS505 using the GUI 'Send to Device', I select the book and press my 505's TOC button (button 5). It lists all the chapters and I can select whichever I want. I have also attached a screencap of the 505's TOC. Which of these differs from your own experience?
Secondly, if you are trying to use the 'inline TOC' (i.e. the one with hyperlinks which is actually contained in the early pages of the book), then you will find that the HTML has coded the labels BEFORE the <div> and <h2> tags. Consequently, when you press a hyperlink it will take you to a point just before your chapter heading, and you will need to turn to the next page to get to the selected chapter heading. Personally, I find these inline TOCs more trouble than they're worth.
Thirdly, I don't use the command-line version of ebook-convert myself, but I do know that it has a large number of options which need to be set to customise your conversion. Here's a link to the relevant part of the User Manual.
... and finally... I'm not sure how removing the images will solve your problem. They show up fine on my 505. |
08-25-2010, 06:30 PM | #7 |
Captain Courageous
|
I confess, I haven't tried your option 2 yet, only option 1.
The second time I tried, option 1 produced all the chapter headings, but if you pressed the appropriate key on the 505, it almost always sent you to page 3.
I use ebook-convert because calibre leaves so many child processes running when it exits that it slows down my computer. I tried to just copy the EPUB to the external card on my 505, but that leaves the metadata behind, so I have to use the GUI to copy the EPUB to the 505. I refer to the ebook-convert manual so much that I have a link to it on my Chrome toolbar! Also, using the command line makes it easier to troubleshoot when things go wrong.
I only tried removing the "links" to the images in the HTML, not the images themselves. I thought if I removed the links, it might fall through to the chapter headings.
IMPORTANT: From the calibre manual:
--level1-toc XPath expression that specifies all tags that should be added to the Table of Contents at level one. If this is specified, it takes precedence over other forms of auto-detection.
Does this mean a complete XPath expression, as in "Structure Detection" in the GUI's Convert books dialog, or just a partial one like "//h1"? Thanks, Paul |
08-25-2010, 07:26 PM | #8 | ||
Grand Sorcerer
|
All 3 options give very similar results. The only reason I used option 2 was that it centred the chapter headings and opt 1 didn't. (Opt 3 adds the chapter name to the TOC - which I thought was a bit cluttered) but it's personal preference.
Quote:
Quote:
Code:
ebook-convert "Gideon's Band - George W Cable.zip" gb2.epub --chapter "//h:h2" --level1-toc "//h:h2"
Last edited by jackie_w; 08-25-2010 at 07:30 PM. Reason: more info |
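If you would rather drive the same conversion from a script, something like the following Python sketch would assemble that command line. The function name build_convert_command is made up for illustration; the --chapter and --level1-toc options and the file names are simply the ones from the command above. Nothing is executed here, so it is safe to try without calibre installed.

```python
import shlex


def build_convert_command(source, target, chapter_xpath="//h:h2", toc_xpath="//h:h2"):
    """Build the ebook-convert argument list (hypothetical helper, mirrors the command above)."""
    return [
        "ebook-convert",
        source,
        target,
        "--chapter", chapter_xpath,     # where chapters are detected
        "--level1-toc", toc_xpath,      # which elements become level-one TOC entries
    ]


cmd = build_convert_command("Gideon's Band - George W Cable.zip", "gb2.epub")

# Show the command with shell quoting applied, for copy and paste.
print(shlex.join(cmd))

# To actually run it (requires calibre's ebook-convert on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to worry about how the shell treats the apostrophe and spaces in the file name.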
||
08-26-2010, 04:52 AM | #9 | |
US Navy, Retired
Posts: 9,865
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
Of course, if you have the 'Enable system tray icon' feature checked under Preferences - Interface, then you have to use Ctrl+Q to exit the program. Clicking on the big red X just minimizes calibre to the system tray. See attached. You can also go to Preferences - Advanced and lower the number of worker processes. Last edited by DoctorOhh; 08-26-2010 at 04:55 AM. |
|
08-26-2010, 05:30 PM | #10 | |
Captain Courageous
|
Jackie:
Quote:
I'm curious, though. In all the stuff I've read here, I thought the XPath thingy was just "//h2", not "//h:h2". Is there a reason for typing "//h:h2" and NOT "//h2"? Just curious. Thanks, Paul
Walt: Well, it's impolite to argue, but I know for a fact it does, either way. I guess if Adobe can ignore memory leaks in every version of PS up to CS3, I'll have to put up with the processes. It's the only game in town, and besides, I only use the GUI to transfer the books to my reader. I could use Sony for that, I guess. When neither program is perfect, you have to use a bit of each. It's no secret; Kovid knows the way I feel about the GUI. I'm just thankful for his command-line programs. If you can remember MS-DOS 3.2, the command line isn't so bad. |
|
08-26-2010, 07:57 PM | #11 |
Grand Sorcerer
|
Hurrah!
Er... I have no real understanding of XPath. I have to let the GUI Wizard (the Harry Potter magic wand button) generate my XPath for me. If you select a heading tag it always puts the //h: in front of it. If you select a div tag it becomes //h:div. I guess one would have to set to with an XPath manual to understand it fully. |
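On the "//h:h2" versus "//h2" question: as far as I understand it, the h: is an XML namespace prefix, and the usual explanation is that it is bound to the XHTML namespace, so an unprefixed h2 would look for an element in no namespace at all. The Python sketch below illustrates that general XPath behaviour with the standard library. It is an illustration of namespaces, not of calibre's own code, and the tiny document is made up.

```python
import xml.etree.ElementTree as ET

# Assumption: "h" is shorthand for the XHTML namespace URI.
XHTML = "http://www.w3.org/1999/xhtml"

# A made-up two-chapter XHTML document; the xmlns attribute puts every element in the namespace.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><h2>I</h2><h2>II</h2></body></html>"
)

# Without a prefix the search is for h2 in *no* namespace, so nothing matches.
plain = doc.findall(".//h2")

# With the prefix mapped to the XHTML namespace, both headings are found.
prefixed = doc.findall(".//h:h2", {"h": XHTML})

print(len(plain), len(prefixed))
print(prefixed[0].tag)  # the namespace is part of the element's full name
```

So the prefix is part of the element's name rather than decoration, which would explain why the wizard always adds it.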
08-26-2010, 08:59 PM | #12 | |
US Navy, Retired
|
Quote:
|
|
08-26-2010, 10:56 PM | #13 |
Captain Courageous
|
You know, I guess the "h:" is just a signal to the pre-processor that an h tag is coming. I should play with the GUI more, I guess; I had forgotten he had included the Wizard! It really is a wonderful program. The problem with so many of the PG books is that they just use the original's way of marking chapters. If the first word in a chapter heading is "Chapter", it's easy: I just load the text file in Word and do a search and replace, replacing every "Chapter" with "##Chapter", and then convert.
Anyway, Thanks for helping me, Paul |
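The Word search-and-replace step described above could also be scripted. Here is a small Python sketch under a couple of assumptions: the function name mark_chapters is invented, and unlike a blanket replace-all it only touches "Chapter" at the start of a line, so a mention of a chapter in running text is left alone. The "##" marker is just the one mentioned in the post.

```python
import re


def mark_chapters(text, marker="##"):
    """Prefix every line-initial 'Chapter' with the marker (hypothetical helper)."""
    # (?m) makes ^ match at the start of each line; \b avoids matching e.g. 'Chapters'.
    return re.sub(r"(?m)^(Chapter\b)", marker + r"\1", text)


sample = "Chapter I\nIt was a dark night.\nSee the next chapter.\nChapter II\nMorning came."
marked = mark_chapters(sample)
print(marked)
```

Restricting the match to line starts is a design choice for the sketch; drop the ^ if the headings in a particular file are indented or otherwise not at the start of a line.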
08-27-2010, 05:44 AM | #14 | ||
Captain Courageous
|
Quote:
Quote:
|
||
|