Old 02-14-2011, 10:28 AM   #1
RachDvn
Member
Posts: 24
Karma: 322
Join Date: Jan 2011
Device: Kindle
[Old Thread] html/zip to mobi not detecting chapter breaks

I have documents that I created in Word, saved as a web page, and am converting to mobi with Calibre. For the life of me, I can't get the chapters to be detected for a page break!

I posted a while back with a similar problem for azw to mobi, and someone was kind enough to help me edit the XPath for detecting chapters. That code is working beautifully for my azw files, but I'm having no luck with these html files. I've tried selecting the Heuristic option "Detect & markup unformatted chapter headings..." but it hasn't made a difference. I have run a debug and I can't figure it out.

If anyone has any suggestions, I would be very appreciative!

My current Xpath is:
Code:
//*[((name()='span' or name()='h2') and re:test(., 'chapter|ch|book|section|part|pt|prologue|epilogue\s+', 'i') and (@class = 'bold')) ]
An example section surrounding an undefined Chapter:
Code:
"Alright, what I'd
like to do now is shoot some backlit winter shots, something that might be good
for a January or March scene. I'd like to have you on the skis in a full tuck
position, as if you were rounding a corner on a downhill slope. If you're
comfortable, I'd like to have you strip completely, or if you prefer, I have a
thong you can wear."</span></p>
<p class="MsoNormal" align="center" style="mso-margin-top-alt:auto;mso-margin-bottom-alt: auto;text-align:center;line-height:normal"><span style="font-size:14.0pt; font-family:&quot;Times New Roman&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;; color:black">
</span></p>
<p class="MsoNormal" align="center" style="mso-margin-top-alt:auto;mso-margin-bottom-alt: auto;text-align:center;line-height:normal"><span style="font-size:14.0pt; font-family:&quot;Times New Roman&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;; color:black">Chapter 2</span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto; line-height:normal"><span style="font-size:14.0pt;font-family:&quot;Times New Roman&quot;; mso-fareast-font-family:&quot;Times New Roman&quot;;color:black">The incredulous look
must have been plain on my face. As she realized how her offer sounded, her
face turned red and she quickly clarified, "Not one of <i>my </i>thongs.
Not that I'm trying to say that I even <i>have</i> thongs," her cheeks
were starting to remind me of the <span class="SpellE"><span class="GramE">claymation</span></span>
Rudolph when his nose cover popped off. "I just mean that I have a brand
new men's thong you can wear and I can Photoshop out the lines on your
hips."
Thanks so much!
~Rach
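
One possible reading of why the XPath finds nothing in this sample: the expression requires a span or h2 whose class is 'bold', but the span around "Chapter 2" carries only a style attribute. The Python sketch below mimics the three conditions on a shortened copy of that snippet. It assumes Python's re module behaves like the regex engine behind re:test(); Probe and xpath_would_match are made-up names for illustration, not part of Calibre.

```python
import re
from html.parser import HTMLParser

# Regex copied from the XPath in the post. Assumption: Python's `re` is a
# fair stand-in for the engine Calibre uses for re:test().
PATTERN = re.compile(r'chapter|ch|book|section|part|pt|prologue|epilogue\s+', re.I)


class Probe(HTMLParser):
    """Collect (tag, class, text) for each run of text and its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, class) pairs
        self.found = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get('class')))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            tag, cls = self.stack[-1]
            self.found.append((tag, cls, data.strip()))


def xpath_would_match(tag, cls, text):
    """Mirror the three conditions of the XPath: name, regex, class."""
    return tag in ('span', 'h2') and bool(PATTERN.search(text)) and cls == 'bold'


# Shortened version of the "Chapter 2" paragraph from the post.
sample = ('<p class="MsoNormal" align="center">'
          '<span style="font-size:14.0pt">Chapter 2</span></p>')
probe = Probe()
probe.feed(sample)
for tag, cls, text in probe.found:
    print(tag, cls, text, xpath_would_match(tag, cls, text))
```

The name test and the regex both pass for the span; only the class test fails. The regex is also looser than it looks, since the bare 'ch' alternative matches inside ordinary words such as "which".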
Old 02-14-2011, 11:03 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Where exactly should the chapter start in your example?
Old 02-14-2011, 11:09 AM   #3
RachDvn
Member
Posts: 24
Karma: 322
Join Date: Jan 2011
Device: Kindle
Where it says "Chapter 2" in the middle of the html garb
Old 02-14-2011, 11:39 AM   #4
JSWolf
Resident Curmudgeon
Posts: 73,660
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I do have a suggestion.

Take the HTML you got from Word, load it into a text editor such as Notepad++, and clean up the mess Word left behind so you have nice, clean HTML code. Then change your chapter headings to look like <h2>Chapter 2</h2> and you'll get a good ToC.

The problem is that when you save as a webpage from Word, you get one hell of a mess from Word. Take a look and you will see what I mean. It's not good code at all. It's a real mess. I've cleaned up my share of Word's mess and it can take a good while to do so.
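
For a large batch of files, the heading cleanup described above could be scripted instead of done by hand. The following is a minimal Python sketch, assuming every chapter heading sits alone in its own <p> and reads like "Chapter 2", "Prologue" or "Epilogue". promote_headings and the three patterns are illustrative names, and regex-based HTML editing like this is only reasonable for simple, predictable markup such as the sample in post #1.

```python
import re

# A whole paragraph, any attributes, non-greedy up to the first </p>.
PARA = re.compile(r'<p\b[^>]*>(.*?)</p>', re.I | re.S)
# Any tag, used to strip the spans Word wraps around the text.
TAGS = re.compile(r'<[^>]+>')
# Assumption: headings look like "Chapter 2", "Prologue" or "Epilogue".
HEADING = re.compile(r'^(chapter\s+\w+|prologue|epilogue)$', re.I)


def promote_headings(html):
    """Replace paragraphs that contain only a chapter heading with <h2>."""
    def fix(match):
        # Drop inner tags and collapse whitespace to get the visible text.
        text = ' '.join(TAGS.sub('', match.group(1)).split())
        if HEADING.match(text):
            return '<h2>%s</h2>' % text
        return match.group(0)   # leave ordinary paragraphs untouched
    return PARA.sub(fix, html)
```

Calling promote_headings() on the text of each saved file and writing the result back would leave body paragraphs alone and turn the "Chapter 2" paragraph into <h2>Chapter 2</h2>. Entities such as &nbsp; inside a heading are not handled here.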
Old 02-14-2011, 11:39 AM   #5
JSWolf
Resident Curmudgeon
Posts: 73,660
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by RachDvn View Post
Where it says "Chapter 2" in the middle of the html garb
But how much of that garb is HTML and how much of that garb is Word?
Old 02-14-2011, 11:43 AM   #6
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Try using filtered HTML when saving from Word. Also try enabling the heuristic options.
Old 02-14-2011, 11:57 AM   #7
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by JSWolf View Post
But how much of that garb is HTML and how much of that garb is Word?
It is all HTML

It has just been trashed up by Word (with unnecessary tags).
If you set a nice <body class=...> the mso-normal could disappear.

Experiment! (Sigil is a good tool for this)
Rename a class in the CSS (mso-normal -> mXo-normal), leaving the usage in place.
See what happens to your masterpiece.
Old 02-14-2011, 12:06 PM   #8
susan_cassidy
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
You also probably want to change the tags for chapter headings from paragraph tags to an h2 or something, so that they are easy to recognize when generating a TOC, etc. You may want to manually add the special Mobipocket-specific page break tag: <mbp:pagebreak/>
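
Adding the page break tag by hand gets tedious across many files, so here is a small Python sketch of doing it in one pass. It assumes the chapter headings are already h2 elements and uses the spelling <mbp:pagebreak/> for the Mobipocket tag. add_page_breaks is an illustrative name, and whether a particular converter honors the tag is not something this sketch checks.

```python
import re

# Mobipocket-specific page break tag.
PAGEBREAK = '<mbp:pagebreak/>'
# Opening <h2> tag with or without attributes; \b keeps <h20>-style names out.
H2_OPEN = re.compile(r'<h2\b', re.I)


def add_page_breaks(html):
    """Insert a page break tag immediately before every opening <h2> tag."""
    return H2_OPEN.sub(lambda m: PAGEBREAK + m.group(0), html)
```

The function is not idempotent: running it twice on the same file would add a second tag before each heading, so it should be applied once to freshly cleaned HTML.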
Old 02-14-2011, 01:45 PM   #9
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Heuristics should work with that chapter - just enable Heuristics under the conversion options.
Old 02-15-2011, 04:09 PM   #10
RachDvn
Member
Posts: 24
Karma: 322
Join Date: Jan 2011
Device: Kindle
Thanks everyone! Happy to report that chapter detection is working. Manichean, thanks for the suggestion to save as a filtered web page. That seems to have done the trick!

Since I have a LARGE quantity of files to convert with Word origins, it would be too time consuming to hand edit the tags for each chapter of every file. But you're right, Word makes a mess of it.

Does anyone have a different recommendation for converting text to mobi? My project is that I'm organizing stories for my creative writing group, copy/pasting from our internet pages and then creating mobi's. Currently I paste to Word, save as web page, run the zip through Calibre.

TheDucks - sorry, but you lost me. I'm not that savvy with the lingo.

Thanks again everyone!
~Rach

Last edited by RachDvn; 02-15-2011 at 04:29 PM.
Old 02-15-2011, 05:02 PM   #11
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by RachDvn View Post
Does anyone have a different recommendation for converting text to mobi? My project is that I'm organizing stories for my creative writing group, copy/pasting from our internet pages and then creating mobi's. Currently I paste to Word, save as web page, run the zip through Calibre.
Try using Sigil instead of Word. It creates ePubs, which should convert pretty easily to Mobi.
Old 02-15-2011, 05:05 PM   #12
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Depending on what the web pages look like, you could just save the html directly from the website and load it into Calibre. If you need only a portion of the web pages you could look at Calibre's recipe framework, as it can grab web pages, extract the relevant portion, and convert that to an ebook (albeit one that uses 'news' features on some readers).

The problem you'll find with text is that you'll lose italics and other formatting with a straight copy/paste to a text editor. If the originals don't have any formatting that might be an ok option though.

If the recipe framework is too complicated for you, another thing you could look at is the Firebug plugin for Firefox. It's still a 'bit' complicated, but it provides you a GUI where you can get to just the relevant html that contains the story and copy just that into a text editor. If that's of interest I can explain in a bit more detail.
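
As a scripted alternative to picking out the story by hand in Firebug, the Python sketch below pulls the inner HTML of one element out of a saved page, keeping tags such as <i> so italics survive. It assumes the story sits inside a single element with a known id. The id 'story', the class StoryExtractor and the function extract_story are all hypothetical, and the depth counting only works on pages whose tags are properly closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID = {'br', 'hr', 'img', 'input', 'meta', 'link'}


class StoryExtractor(HTMLParser):
    """Collect the raw inner HTML of the first element with a given id."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.target_id = target_id
        self.depth = 0    # 0 means we are outside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID:
                self.depth += 1
        elif dict(attrs).get('id') == self.target_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        if self.depth:   # self-closing tag such as <br/>
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth:   # skip the target's own closing tag
                self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append('&%s;' % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append('&#%s;' % name)


def extract_story(html, target_id='story'):
    parser = StoryExtractor(target_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

The id to pass in has to be read off the actual page source first, for example with the browser's view-source or Firebug itself.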
Old 02-15-2011, 05:06 PM   #13
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Manichean View Post
Try using Sigil instead of Word. It creates ePubs, which should convert pretty easily to Mobi.
Forgot about that - I believe this should preserve italics/etc, so this might be easiest.
Old 02-15-2011, 05:49 PM   #14
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by ldolse View Post
Forgot about that - I believe this should preserve italics/etc, so this might be easiest.
I think you'll find that you'll still lose the formatting, though I do agree that Sigil will prove preferable to Word.

My current pet method for moving Web pages to MOBI format is to save the source while in my browser and edit that before converting with Calibre.

My workflow involves using the Opera browser with Notepad++ set as my app for viewing the source. I simply right-click on a page and select Source from the menu. The source HTML appears in Notepad++ and I then:
  1. save it as HTML; Notepad++ will then color code the tags.
  2. make the basic edits and resave
  3. then import into Calibre.
  4. (sometimes) convert to ePub and do further edits in Sigil
  5. convert to MOBI

However you approach saving the original HTML source, doing so will preserve the formatting (bold, italic, ...).
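
With many files, the import and convert steps of a workflow like this can be run from a script through Calibre's command-line converter, ebook-convert, rather than through the GUI. The Python sketch below only builds and runs the commands; it assumes the edited files are .html files in one folder and that chapter headings have already been turned into h2 elements. build_commands, convert_all and the default XPath are illustrative choices, not something taken from the posts above.

```python
import subprocess
from pathlib import Path


def build_commands(folder, chapter_xpath="//*[name()='h2']"):
    """Return one ebook-convert argument list per .html file in `folder`."""
    commands = []
    for src in sorted(Path(folder).glob('*.html')):
        dst = src.with_suffix('.mobi')   # same name, .mobi extension
        commands.append([
            'ebook-convert', str(src), str(dst),
            '--chapter', chapter_xpath,       # XPath used to detect chapters
            '--chapter-mark', 'pagebreak',    # start each chapter on a new page
        ])
    return commands


def convert_all(folder):
    """Run the conversions one after another; stops at the first failure."""
    for cmd in build_commands(folder):
        subprocess.run(cmd, check=True)
```

Calling convert_all() with the folder path would write a .mobi next to each .html file, provided ebook-convert is on the PATH. Printing the result of build_commands() first is a cheap way to check the commands before anything is converted.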
Old 02-16-2011, 08:43 AM   #15
RachDvn
Member
Posts: 24
Karma: 322
Join Date: Jan 2011
Device: Kindle
Thanks so much. Haven't used Sigil before, so I'll play with it as well as trying dwig's method.

~Rach
Tags
chapter break, detect chapter
