Old 08-22-2010, 05:20 AM   #1
p3aul
Captain Courageous
Posts: 235
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
Can't generate a TOC from an HTML file

I am trying to convert an HTML file (Gideon's Band) to an EPUB from the command line or the GUI, but it won't generate a TOC. The file is in the same format as one I did successfully convert (Bonaventure). Both files are on Project Gutenberg. Here are the URLs:

Gideon's Band: http://www.gutenberg.org/files/19348...-h/19348-h.htm

Bonaventure: http://www.gutenberg.org/files/24078...-h/24078-h.htm

To reiterate, Bonaventure was fine, but Gideon's Band would not generate the TOC. PG states that the files are in the public domain.

TIA,
Paul
Old 08-22-2010, 07:43 AM   #2
jackie_w
Wizard
Posts: 2,776
Karma: 3973173
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"
Hi Paul,

I've had a look inside the HTML of Gideon's Band. The chapter tagging looks like this:

Code:
  <div class="c1">
    <h1>GIDEON'S BAND</h1>

    <h2>I</h2>

    <h3>THE STEAMBOAT LEVEE</h3>
  </div>

... ...

  <div class="c1">
    <h2>II</h2>

    <h3>THE "VOTARESS"</h3>
  </div>
The easiest way to get a TOC in your EPUB is (option 1):

Set [Convert] - [Structure Detection] - 'Detect chapters at' to //h:h2
Set [Convert] - [Table of Contents] - 'Level 1 TOC' to //h:h2 (or you could leave it blank in this particular case)

or you may prefer (option 2):
Set [Convert] - [Structure Detection] - 'Detect chapters at' to //h:div[re:test(@class, "c1", "i")]
Set [Convert] - [Table of Contents] - 'Level 1 TOC' to //h:h2

or even (option 3):
Set [Convert] - [Structure Detection] - 'Detect chapters at' to //h:div[re:test(@class, "c1", "i")]
Set [Convert] - [Table of Contents] - 'Level 1 TOC' to //h:div[re:test(@class, "c1", "i")]

All of these worked for me. Good Luck.
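
To see what the two kinds of expression above pick out, here is a minimal sketch using Python's standard xml.etree.ElementTree on a cut-down copy of the markup quoted earlier. It is only an illustration, not how calibre itself evaluates the settings. Assumptions: the SAMPLE document and the text_of helper are made up for this example, and because ElementTree does not support the re:test() extension function, an exact match on class="c1" stands in for the regular-expression test.

```python
import xml.etree.ElementTree as ET

# "h" is bound to the XHTML namespace, matching the h: prefix used above.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical miniature of the Gideon's Band markup quoted in this post.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <div class="c1">
    <h1>GIDEON'S BAND</h1>
    <h2>I</h2>
    <h3>THE STEAMBOAT LEVEE</h3>
  </div>
  <p>Chapter text ...</p>
  <div class="c1">
    <h2>II</h2>
    <h3>THE "VOTARESS"</h3>
  </div>
</body></html>"""

root = ET.fromstring(SAMPLE)


def text_of(element):
    """Collapse all text inside an element into a single line."""
    return " ".join("".join(element.itertext()).split())


# The heading-based expression selects one short <h2> per chapter.
h2_matches = [text_of(e) for e in root.findall(".//h:h2", NS)]

# The div-based expression selects the whole heading block, so its text
# also contains the <h1>/<h3> lines that sit inside the same <div>.
div_matches = [text_of(e) for e in root.findall(".//h:div[@class='c1']", NS)]

print(h2_matches)
print(div_matches)
```

The div-based match carries the chapter name along with the numeral, which fits the later remark in this thread that option 3 adds the chapter name to the TOC.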

Last edited by jackie_w; 08-22-2010 at 07:48 AM. Reason: added 3rd option
Old 08-23-2010, 12:09 AM   #3
p3aul
Well, it did generate all the chapter titles, but they don't point to the correct pages!

Paul

Last edited by p3aul; 08-23-2010 at 01:28 AM.
Old 08-23-2010, 06:24 AM   #4
jackie_w
To help you further, I need you to be more specific about what isn't working.
Old 08-25-2010, 01:14 AM   #5
p3aul
OK, I don't know in detail. All I know is that on the command line I type: ebook-convert gideon.html gideon epub. This is supposed to convert the file to EPUB, right? I understand calibre looks for the <h1>, <h2> markup tags to create a TOC, and they are there, but it doesn't create one. I've also tried converting in the GUI. Same thing.

I tried your suggestions, replacing the default settings with yours. I get a complete listing of chapters, but the pages are wrong. Most of them just take you back to page 3 for some reason.

I converted the book to RTF. This time all I got for a TOC was links to illustrations. Tomorrow I will delete the illustrations entirely and try again.
Old 08-25-2010, 09:51 AM   #6
jackie_w
Firstly, sticking with the GUI for the moment. I've converted the HTML to epub using option 2 of the three I listed above.

I have attached a screencap of the resulting epub when viewed on the PC using the calibre ebook viewer.

When I open the TOC panel (left-hand side 7th button from top), I see a list of all 63 chapters. If I click on one it takes me straight there. Screencap shows Chapter 2 selected.

Once sent to the PRS505 using GUI 'Send to Device', I select the book and press my 505's TOC button (button 5). It lists all the chapters and I can select whichever I want. I have also attached a screencap of the 505's TOC.

Which of these differs from your own experience?

Secondly, if you are trying to use the 'inline TOC' (i.e. the one with hyperlinks which is actually contained in the early pages of the book) then you will find that the HTML has coded the labels BEFORE the <div> and <h2> tags. Consequently when you press a hyperlink it will take you to a point just before your chapter heading and you will need to turn to next page to get to the selected chapter heading. Personally, I find these inline TOCs more trouble than they're worth.

Thirdly, I don't use the commandline version of ebook-convert myself but I do know that it has a large number of options which need to be set to customise your conversion. Here's a link to the relevant part of the User Manual.

... and finally... I'm not sure how removing the images will solve your problem. They show up fine on my 505.
Attached thumbnails: GideonPC.jpg (141.9 KB), Gideon505.jpg (45.1 KB)
Old 08-25-2010, 06:30 PM   #7
p3aul
I confess, I haven't tried your option 2 yet, only option 1.

The second time I tried, option 1 gave me all the chapter headings, but when I pressed the appropriate key on the 505, it almost always sent me to page 3.

I use ebook-convert because calibre leaves so many child processes running when it exits that it slows down my computer. I tried to just copy the EPUB to the external card on my 505, but that leaves the metadata behind, so I have to use the GUI to copy the EPUB to the 505.

I refer to the manual (ebook-convert) so much that I have a link to it on my Chrome toolbar! Also, with the command line it's easier to troubleshoot when things go wrong.

I only tried removing the "links" to the images in the HTML, not the images themselves. I thought if I removed the links, it might fall through to the chapter headings.
IMPORTANT:
From the Calibre manual:
--level1-toc
XPath expression that specifies all tags that should be added to the Table of Contents at level one. If this is specified, it takes precedence over other forms of auto-detection.

Does this mean a complete XPath expression, as in "Structure Detection" under Convert books in the GUI, or just a partial one like "//h1"?


Thanks,
Paul
Old 08-25-2010, 07:26 PM   #8
jackie_w
Quote:
Originally Posted by p3aul
I confess, I haven't tried your option 2 yet, only 1.
All 3 options give very similar results. The only reason I used option 2 was that it centred the chapter headings and opt 1 didn't. (Opt 3 adds the chapter name to the TOC - which I thought was a bit cluttered) but it's personal preference.

Quote:
Originally Posted by p3aul
The second time I tried, option 1 gave me all the chapter headings, but when I pressed the appropriate key on the 505, it almost always sent me to page 3.
I cannot reproduce this problem. It works perfectly for me.

Quote:
Originally Posted by p3aul
Does this mean a complete XPath expression, as in "Structure Detection" under Convert books in the GUI, or just a partial one like "//h1"?
As I said, I don't use this method myself, but I tried this as a no-bells-or-whistles commandline approximation to opt 1 and it seems to work:
Code:
ebook-convert "Gideon's Band - George W Cable.zip" gb2.epub --chapter "//h:h2" --level1-toc "//h:h2"
where "Gideon's Band - George W Cable.zip" is the resulting file in my calibre library after drag-drop of the source html file into calibre.

Last edited by jackie_w; 08-25-2010 at 07:30 PM. Reason: more info
Old 08-26-2010, 04:52 AM   #9
DoctorOhh
US Navy, Retired
Posts: 8,838
Karma: 12535517
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by p3aul
I use ebook-convert because calibre leaves so many child processes running when it exits that it slows down my computer. I tried to just copy the EPUB to the external card on my 505, but that leaves the metadata behind, so I have to use the GUI to copy the EPUB to the 505.
Calibre doesn't leave any processes running if you exit the program.

Of course, if you have the 'Enable system tray icon' feature checked under Preferences - Interface, then you have to use Ctrl+Q to exit the program. Clicking on the big red X just minimizes calibre to the system tray. See attached.

You can also go to Preferences - Advanced and lower the number of worker processes.
Attached thumbnail: Preferences-Interface.jpg (168.3 KB)

Last edited by DoctorOhh; 08-26-2010 at 04:55 AM.
Old 08-26-2010, 05:30 PM   #10
p3aul
Jackie:

Quote:
ebook-convert "Gideon's Band - George W Cable.zip" gb2.epub --chapter "//h:h2" --level1-toc "//h:h2"
Now that one did the trick! Just what I was looking for.

I'm curious, though. In all the stuff I've read here, I thought the XPath thingy was just "//h2", not "//h:h2". Is there a reason for typing "//h:h2" and NOT "//h2"? Just curious.

Thanks,
Paul

Walt:
Well, it's impolite to argue, but I know for a fact it does, either way. If Adobe can ignore memory leaks in every version of PS up to CS3, I guess I'll have to put up with the processes. It's the only game in town, and besides, I only use the GUI to transfer the books to my reader. I could use Sony for that, I guess. When neither program is perfect, you have to use a bit of each. It's no secret; Kovid knows the way I feel about the GUI. I'm just thankful for his command-line programs. If you can remember MS-DOS 3.2, the command line isn't so bad.
Old 08-26-2010, 07:57 PM   #11
jackie_w
Quote:
Originally Posted by p3aul
Jackie:
Now that one did the trick! Just what I was looking for.
Hurrah!

Quote:
Originally Posted by p3aul
I'm curious, though. In all the stuff I've read here, I thought the XPath thingy was just "//h2", not "//h:h2". Is there a reason for typing "//h:h2" and NOT "//h2"? Just curious.
Er... I have no real understanding of XPath; I have to let the GUI Wizard (the Harry Potter magic wand button) generate my XPath for me. If you select a heading tag, it always puts //h: in front of it. If you select a div tag, it becomes //h:div. I guess one would have to set to with an XPath manual to understand it fully.
Old 08-26-2010, 08:59 PM   #12
DoctorOhh
Quote:
Originally Posted by p3aul
Walt:
Well, it's impolite to argue, but I know for a fact it does, either way. If Adobe can ignore memory leaks in every version of PS up to CS3, I guess I'll have to put up with the processes. It's the only game in town, and besides, I only use the GUI to transfer the books to my reader. I could use Sony for that, I guess. When neither program is perfect, you have to use a bit of each. It's no secret; Kovid knows the way I feel about the GUI. I'm just thankful for his command-line programs. If you can remember MS-DOS 3.2, the command line isn't so bad.
OK, I guess all I can state is that it leaves no processes running on my Windows XP machine. What OS do you run?
Old 08-26-2010, 10:56 PM   #13
p3aul
You know, I guess the "h:" is just a signal to the pre-processor that an h tag is coming. I should play with the GUI more, I guess; I had forgotten he had included the Wizard! It really is a wonderful program. The problem with so many of the PG books is that they just use the original's way of marking chapters. If the first word in a chapter heading is "Chapter", it's easy: I just load the text file in Word and do a search and replace, replacing every "Chapter" with "##Chapter", and then convert.
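
The search and replace described above can also be scripted. The following is a minimal sketch in Python, not part of calibre. Assumptions: the function name mark_chapters and the sample text are hypothetical, and unlike a plain replace-all in Word, this version only marks "Chapter" when it is the first word on a line, so that the word is left alone when it appears mid-sentence.

```python
import re


def mark_chapters(text, marker="##"):
    """Prefix every line whose first word is 'Chapter' with the marker."""
    return re.sub(
        r"^Chapter\b",                      # 'Chapter' at the start of a line
        lambda m: marker + m.group(0),      # keep the word, add the marker
        text,
        flags=re.MULTILINE,
    )


# Hypothetical sample text in the style of a plain-text PG book.
sample = "Chapter I\nIt was a dark night.\nThe next Chapter came.\nChapter II\n"
marked = mark_chapters(sample)
print(marked)
```

Restricting the match to the start of a line is a design choice for this sketch; dropping the ^ anchor would reproduce the replace-all behaviour described in the post.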

Anyway, Thanks for helping me,
Paul
Old 08-27-2010, 05:44 AM   #14
p3aul
Quote:
I'm curious, though. In all the stuff I've read here, I thought the XPath thingy was just "//h2", not "//h:h2". Is there a reason for typing "//h:h2" and NOT "//h2"? Just curious.
From the calibre XPath Tutorial:
Quote:
The h: prefix in the above examples is needed to match XHTML tags. This is because internally, calibre represents all content as XHTML. In XHTML tags have a namespace, and h: is the namespace prefix for HTML tags.
Well I guess that cleared that up!
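
The effect of the namespace can be reproduced outside calibre. Below is a minimal sketch using Python's standard xml.etree.ElementTree; it is an illustration of XML namespaces in general, not of calibre's internals, and the two-heading document is made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny XHTML document: the xmlns attribute puts every tag in the
# XHTML namespace, just as the quoted tutorial text describes.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<h2>I</h2><h2>II</h2>"
    "</body></html>"
)

# Without a prefix, the path asks for <h2> in no namespace: nothing matches.
plain = doc.findall(".//h2")

# With h bound to the XHTML namespace, both headings are found.
prefixed = doc.findall(".//h:h2", {"h": XHTML_NS})

print(len(plain), len(prefixed))
print(doc[0][0].tag)  # the fully qualified tag name of the first <h2>
```

The prefix letter itself is arbitrary in XPath; what matters is the namespace it is bound to. In the expressions used in this thread, h is the prefix calibre's wizard generates.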