#1 |
Enthusiast
Posts: 31
Karma: 10
Join Date: Oct 2010
Location: Arizona, USA
Device: None
I'm converting the 2nd of 6 original books/volumes from Word to MOBI and EPUB files which we will post on our website for free downloading. This book has 5 Parts, each of which contains a portion of 23 Chapters. Every chapter contains 8 titled Sections. The source files were created in Word 2008 for Mac on a MacBook Pro OS 10.5.8. Only parts are designated style "Heading 1", level 1 in Word; only chapters are H2, L2; only sections are H3, L3.
Within the text of the book, each chapter starts with the book title, part, chapter, and section centered on the page. For example:

Sacred Memories
Part 1 ~ Earth Time
Chapter 1: The Awakening
A Gift for Someone

Right now, only the parts and chapters are showing up in the table of contents. Within the book, "Sacred Memories" and the part number/name each appear alone on a page with no other text. "Sacred Memories" is not designated as a heading, just part of the body of the text.

I would like each chapter to appear in the Calibre (0.7.24) generated table of contents like this:

Part X ~ Name of Part
  Chapter X: Name of Chapter
    Name of section 1
    Name of section 2
    ...
    Name of section 8

How do I get Calibre to detect the sections and place them in the T of C? Each has just an individual name and isn't labeled with the word "section" (or any other designation).

Within the text of the book, how do I get Calibre to not place a page break after "Sacred Memories" and the part number/name?

From studying the manual and searching this forum, I have an idea how to do this, but the actual procedure still eludes me. Any help or suggestions will be deeply appreciated. Thanks!
#2 |
Grand Sorcerer
Posts: 6,249
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Hi,
Let me say up front that I may not be the ideal person to help you as I am a Windows user with an old version of Word (ver 10). However I can try. Before I get into details perhaps you can look at the attached EPUB. If this is something like what you're aiming for then I can give you the sample Word doc and the Calibre conversion settings used. |
#3 | |
Enthusiast
Quote:
I think I'll be able to read your Word file: I have opened other older Windows versions successfully in the past. I looked at the EPUB file you attached in the SONY Reader Library software and the table of contents looks just as I'd like: with all the parts, chapters, & sections. Bravo!

Within the text:

The title phrase "Sacred Memories" at the beginning of each chapter no longer has a page break after it (although it appears once at the beginning).

The title, part, chapter, and section names appear together on the same page at the beginning of each chapter, just as they should.

However, the part number/names appear a second time at the beginning of each part, each on an individual page with no other text, as do the chapter number/names at the beginning of each chapter. Can this be prevented?

Thanks so much for your help!
#4 |
Grand Sorcerer
Here is the source sample Word doc. The brightly coloured Heading styles I've used are only to try to make it clearer which Headings are in use.
As Calibre will not convert doc files, I prefer to save the .doc file as an HTML file before importing to Calibre, but in this case I saved as RTF as I wasn't sure whether you were comfortable with HTML. In either case the conversion settings for the earlier EPUB were:
Code:
Structure Detection - Detect Chapters at - //*[name()='h1' or name()='h2']
Structure Detection - Chapter mark - pagebreak
Structure Detection - Insert pagebreaks before - //h:h3
Table of Contents - Level 1 TOC - //h:h1
Table of Contents - Level 2 TOC - //h:h2
Table of Contents - Level 3 TOC - //h:h3
Look & Feel - Extra CSS - h3 {display:none}
Code:
Structure Detection - Chapter mark - none
Look & Feel - Extra CSS - h1, h2, h3 {display:none}
I've attached an updated EPUB which used the red conversion settings. Anyway I'm sure you'll have fun playing around until you settle on your preferred solution. Good luck
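To make the three "Level N TOC" expressions in the settings above more concrete, here is a minimal Python sketch of what they select. It is not Calibre's own code. It assumes the `h:` prefix stands for the XHTML namespace, uses only the standard library's `xml.etree.ElementTree`, and runs on a tiny made-up sample in which "Second Section", "Chapter 2: Placeholder" and "Another Section" are placeholder titles, not names from the book.

```python
import xml.etree.ElementTree as ET

# Assumption: the "h:" prefix in the XPath settings maps to the XHTML namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML_NS}

# Made-up sample: one Part (h1), two Chapters (h2), three Sections (h3).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Part 1 ~ Earth Time</h1>
<h2>Chapter 1: The Awakening</h2>
<h3>A Gift for Someone</h3>
<p>Body text.</p>
<h3>Second Section</h3>
<h2>Chapter 2: Placeholder</h2>
<h3>Another Section</h3>
</body></html>"""


def find_level(xhtml, n):
    """Return the text of every heading matched by the equivalent of //h:hN."""
    root = ET.fromstring(xhtml)
    return ["".join(el.itertext()).strip() for el in root.findall(f".//h:h{n}", NS)]


def build_toc(xhtml):
    """Walk the document in order and collect (level, title) for h1, h2, h3."""
    root = ET.fromstring(xhtml)
    levels = {f"{{{XHTML_NS}}}h{n}": n for n in (1, 2, 3)}
    toc = []
    for el in root.iter():
        level = levels.get(el.tag)
        if level is not None:
            toc.append((level, "".join(el.itertext()).strip()))
    return toc


if __name__ == "__main__":
    # Print an indented outline: parts, then chapters, then sections.
    for level, title in build_toc(SAMPLE):
        print("  " * (level - 1) + title)
```

The point of the sketch is only that each TOC level is driven by one tag: anything styled Heading 3 becomes a third-level entry, whether or not it was meant to be one.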
#5 | ||
Enthusiast
Quote:
Quote:
Your conversion settings have shed some light on XPath for me, and I'm planning to revisit the Calibre manual with that in mind, along with the adaptations to the source file. I will post my results (or possibly more questions!) here. Thanks for your help, Jackie. I appreciate the time and effort you took creating the test file. Color coding the conversion settings was very helpful.
#6 | |
Enthusiast
Quote:
Frustration & confusion have eclipsed the fun. I've tried many ways to get this to work, but it's not happening. To precisely show what I'm going after, I'm attaching a PDF file intended for reading and printing from a computer. The internal table of contents (which is deleted in the HTML file I'm using for the ebook) looks like what the Calibre generated T of C should look like. Within the text, the headings at the beginning of each chapter are all on one page, no page breaks in between as is happening in the e-book files. The top heading ("Sacred Memories") that begins each chapter wasn't designated as a heading in Word, simply assigned a style with a body text outline level. There are illustrations at the beginning of Chapters 5, 10, 11, 14, 16, 17, 20 and 21 that should be at the top of the page followed by the headings, but in the conversion are isolated on a page. I'm also attaching the ZIP file created from the Word source file. Thanks for any help or light that can be shed on this! |
#7 |
Grand Sorcerer
Hi webfolk, Here's an interim reply...

Mine worked and yours doesn't because I had far fewer <h1>, <h2> and <h3> tags in my sample doc than you do. I only placed one <h1>...</h1> para at the start of each Part, one <h2>...</h2> para at the start of each Chapter and one <h3>...</h3> para at the start of each Section. These "headings" are never displayed in the EPUB body text because the Extra CSS I detailed (h1, h2, h3 {display:none}) makes them "invisible". However, the TOC still sees them and uses their labels correctly.

I actually styled the 4 lines of each "visible" Section heading as 'Heading 4' (i.e. <h4>); these are not used in the TOC and not referenced in the Calibre conversion. I can see that you may want each of the 4 lines to be styled slightly differently, so I'll have a think about how you can achieve what you want with as few changes to your HTML as possible.

P.S. Another tip: when you save your Word doc as HTML, try using the option SaveAs WebPage-filtered. You will get less 'MS excess baggage' in the resulting HTML file.
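The marker-plus-visible-heading arrangement described above can be sketched as a small Python helper that emits the markup for one section opening. This is only an illustration under assumptions: `section_opening` and its `new_part` / `new_chapter` flags are hypothetical names, and the actual sample document may order or style the lines differently.

```python
from html import escape

BOOK_TITLE = "Sacred Memories"

# Extra CSS from the conversion settings: hides the TOC marker headings.
EXTRA_CSS = "h1, h2, h3 {display:none}"


def section_opening(part, chapter, section, new_part=False, new_chapter=False):
    """Build the markup for one section opening (hypothetical helper).

    Hidden markers: one <h1> per Part, one <h2> per Chapter, one <h3> per
    Section. Only these feed the TOC. The four visible lines are all <h4>,
    which the TOC settings never reference.
    """
    lines = []
    if new_part:
        lines.append(f"<h1>{escape(part)}</h1>")
    if new_chapter:
        lines.append(f"<h2>{escape(chapter)}</h2>")
    lines.append(f"<h3>{escape(section)}</h3>")
    for text in (BOOK_TITLE, part, chapter, section):
        lines.append(f"<h4>{escape(text)}</h4>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(section_opening(
        "Part 1 ~ Earth Time",
        "Chapter 1: The Awakening",
        "A Gift for Someone",
        new_part=True,
        new_chapter=True,
    ))
```

Because the visible lines are never h1, h2 or h3, repeating the part and chapter names at the top of every section adds nothing to the TOC.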
#8 | |||
Enthusiast
Hi Jackie,
Quote:
Quote:
Quote:
I'm going to try the above as soon as I can, juggling my time with a web forum that also relates to our work. Thank you, Jackie, for continuing with this! |
#9 |
Grand Sorcerer
Well I think I finally got there. I've attached updated HTML and the resulting EPUB.
Anyway, see what you think of the new EPUB.
Last edited by jackie_w; 10-30-2010 at 06:34 PM.
#10 |
Enthusiast
Thanks Jackie!
I've been zipping around today doing a lot of multitasking ... and there's more still. I'm going to work with this a little later, so perhaps I'll have something more substantial to add by the end of my evening/the beginning of your new day. With gratitude, webfolk |
#11 |
tenjooberrymuds
Posts: 58
Karma: 12
Join Date: Sep 2010
Device: Android
You may want to use something other than Word to generate your intermediate HTML, because Word is known to insert a LOT of extra tags and redundant full font instructions, causing considerable bloat in your final EPUB files.
For example: instead of just using <p> for each paragraph (linebreak in Word) it uses <p fontface=blabla fontsize=blablabla fontcolor=blablabla> even though that's completely unnecessary, unless you used a different font specification in between. That and other things add up sufficiently to potentially double your EPUB file sizes. Opening your doc file in Open Office and saving as HTML makes cleaner HTML. Also, Open Office has a plugin available to save files directly to EPUB. There are other ways to convert an office doc file to HTML, but I don't know what's available on the Mac.
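If you stay with Word output, the per-paragraph bloat described above can also be trimmed after the fact. The following is a rough Python sketch, not a tool mentioned in this thread: `slim_paragraph_tags` is a hypothetical helper that drops every attribute from opening <p> tags except `class`, on the assumption that the class names carry the named styles you want to keep. It is regex-based, so an attribute value containing ">" would confuse it.

```python
import re

# Opening <p ...> tag; \b keeps this from matching <pre>, <param>, etc.
P_OPEN = re.compile(r"<p\b([^>]*)>", re.IGNORECASE)
# A class attribute with a double-quoted, single-quoted or bare value.
CLASS_ATTR = re.compile(r"""\bclass\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def slim_paragraph_tags(html):
    """Strip all attributes except class from opening <p> tags."""
    def repl(match):
        cls = CLASS_ATTR.search(match.group(1))
        return f"<p {cls.group(0)}>" if cls else "<p>"
    return P_OPEN.sub(repl, html)


if __name__ == "__main__":
    before = '<p class="MsoNormal" style="font-size:12.0pt;font-family:Times">Text</p>'
    after = slim_paragraph_tags(before)
    print(after)
    print(len(before), "->", len(after), "characters")
```

Keeping `class` and moving the formatting into one stylesheet is a design choice: the paragraph styling survives, but it is stated once instead of on every paragraph.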
#12 | |
Enthusiast
Quote:
Thanks for the suggestion! |
#13 |
Enthusiast
Jackie,
It looks just like the PDF. The illustrations are right there on the same page as the headings they accompany in their respective chapters and all the headings are also together on the same page. Excellent! Thank you again, Jackie. One thing still happening, though, is "Sacred Memories" is appearing on a separate page before each chapter and towards the bottom of the page. I've looked at the file in Sigil and don't see any indication of a page break either in CSS or the HTML, but then I'm not very experienced in this area either. This is not happening in those chapters with illustrations, which look exactly like they should. One way to correct this is simply to delete "Sacred Memories" from the beginning of each chapter, a sacrifice of aesthetics I'm willing to make at this point. For those chapters with illustrations, clicking on the level 2 chapter name in the table of contents takes you right to that page. Where there is no illustration, clicking on the chapter takes you to the prior page with the solitary "Sacred Memories". I noticed there's a lot less code. Did you clean it up? One of the things Word does when you "save as HTML" is include a cornucopia of font CSS and I don't see that anymore. Thank you for all your efforts, Jackie. I deeply appreciate it! |
#14 | |||
Grand Sorcerer
Quote:
Code:
<p class="BeginChSpace"> </p>
Quote:
Quote:
Code:
@font-face {font-family:"Courier New"; panose-1:2 7 3 9 2 2 5 2 4 4;}
Compared to many Word-generated HTML files I've seen I thought yours was relatively clean. I suspect that save-as Webpage-filtered, or whatever the Mac equivalent is, would also have got rid of a lot of the mso-bidi-... type stuff. Word will generate very clean HTML if you apply named styles. It looked to me as if this was mainly what you'd done. However, its CSS is a lost cause in my experience. For my own books I remove all the Word-generated CSS and add a link to a standard ebook CSS file.

You're welcome.
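The "remove the Word-generated CSS and link a standard stylesheet" step can be approximated in a few lines of Python. This is a sketch under assumptions, not the procedure actually used here: `swap_stylesheet` is a hypothetical helper, `ebook.css` is a placeholder file name, and the regex approach assumes the Word CSS lives in ordinary <style> blocks inside <head>.

```python
import re

# Any <style>...</style> block, across lines.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)

# Tiny stand-in for a Word-generated file.
WORD_HTML = (
    '<html><head><title>Sample</title>\n'
    '<style>\n@font-face {font-family:"Courier New"; panose-1:2 7 3 9 2 2 5 2 4 4;}\n</style>\n'
    '</head><body><p class="BeginChSpace"> </p></body></html>'
)


def swap_stylesheet(html, css_href="ebook.css"):
    """Drop embedded <style> blocks and link one external stylesheet instead."""
    stripped = STYLE_BLOCK.sub("", html)
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}"/>\n'
    # Insert the link just before the first </head>; leave the body untouched.
    return HEAD_CLOSE.sub(lambda m: link + m.group(0), stripped, count=1)


if __name__ == "__main__":
    print(swap_stylesheet(WORD_HTML))
```

The class names in the body (such as `BeginChSpace`) stay in place, so the external stylesheet needs a rule for each class you still rely on.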
#15 | |||
Enthusiast
Jackie,
Quote:
I did this in Sigil and "Saved as" with an alternate file name (preserving the original file unchanged as backup). The new file, however, has a blank page inserted before the book cover. I've noticed this before when I saved from Sigil. How do I get rid of this and, for that matter, how do I prevent it from happening?
Quote:
Quote:
The EPUB file required a bit more tweaking, some aesthetic formatting issues: for example, I reduced the indentation of each paragraph (and also 2 places where quoted phrases were indented) by altering the CSS style sheet in Sigil.

I had also noticed the font was smaller and sans serif after the opening verse of some of the chapters. The style I had assigned to these in Word didn't carry over. I'm speculating this is because the style name I had used started with the numeral "1". I resolved this by creating a new CSS style and renaming the respective text block in HTML. The text looks a lot more balanced now in the Reader on my computer. I've attached the revised version for you to take a look. Please let me know if you agree or if there is some issue I'm not aware of with this.

I think I've resolved the table of contents by going into the T of C editor in Sigil ... which I just found. The T of C looks a bit strange when you open it, in that some chapters don't appear until you reveal them by clicking on the arrow preceding the part they're in, but--all things considered--I don't think this is a problem and preferable, in my opinion, to what was happening before.

I had framed this last bit as a question, but before hitting the "Submit Reply" button I got the inspiration to check out Sigil one more time. Unless you see or think of something else, the only thing left to do is remove the blank page before the cover.

Again, thanks for all your help, Jackie!
Tags |
calibre conversion, chapter detection, page breaks, table of contents, word conversion |