MobileRead Forums > E-Book Software > Calibre

01-05-2011, 06:47 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
structure detection - documentation?

I am told in other threads that the structure detection option is ignored for ePub sources. Is that true for any other source formats, or only ePub?

What's the recommended way to force structure detection on an ePub? Is it to convert to ZIP and then back again?

Is there (or should there be) any detailed documentation for the above, and of what processing operations structure detection actually performs? At present it seems like a tick-it-and-see black box.
01-05-2011, 07:43 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
What structure detection are you referring to? I've successfully used header/footer removal, which is part of the structure detection conversion settings, on ePub, so I don't think it's ignored. Preprocessing might be, though.
As for documentation, as usual, refer to the manual.
01-05-2011, 08:44 AM   #3
cybmole
Wizard
I was in a discussion about TOC and chapter detection. I wanted to automate flagging chapters with h1 or h2 tags so that they would be added to the TOC.
Someone (not you) said that structure detection is not applied if the source format is ePub, because the source is assumed to be "already good". It could be that they meant only the "preprocess to improve structure detection" box.
So, to rephrase my question: for which source formats is the "preprocess..." tick box ignored?

I re-read the manual, but it does not define exactly what takes place when the box labelled preprocessing is ticked. The manual gives an overview of what it's intended to do, but I was after a programmer's definition of what logic is applied and how.

Last edited by cybmole; 01-05-2011 at 08:47 AM.
01-05-2011, 09:00 AM   #4
Manichean
Wizard
Ah, yes, I remember that discussion. Try asking for (or waiting for) whoever programmed the preprocessing engine to explain.
As for a workaround, I believe that adding the ePub as a ZIP file and reconverting ought to work.
01-05-2011, 09:14 AM   #5
cybmole
Wizard
Quote:
Originally Posted by Manichean
Ah, yes, I remember that discussion. Try asking for (or waiting for) whoever programmed the preprocessing engine to explain.
As for a workaround, I believe that adding the ePub as a ZIP file and reconverting ought to work.
Based on a one-book test, converting from ePub to ZIP, followed by converting from ZIP back to ePub, mangles the chapter running order and also inserts some weird prev/next HTML pages when viewed in Sigil.

My input book was perfectly ordered, but the output from the test began at chapter 19!

Maybe copying the ePub, renaming it to .zip and then adding it to the library is safe, but converting from ePub to ZIP seems very flawed, unless I am misunderstanding what the conversion is meant to achieve.
01-05-2011, 09:25 AM   #6
Manichean
Wizard
I was thinking about exporting the ePub, renaming it to .zip, then re-importing it into the same book record and converting that to ePub. You might want to keep a copy of the original ePub somewhere safe outside calibre, as it is going to be overwritten.

As for the chapter order, there's a FAQ entry on file ordering in multi-file HTML books. I should still have a small script floating around, hacked together once to convert an HTML reference book, that creates the index file discussed in that FAQ entry. It only cares about which file comes first and dumps the rest of the files in as it finds them. Still, if that would be useful, I can post it here.
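As I understand the calibre FAQ approach mentioned above, it comes down to an extra HTML file that links every part of the book in the desired reading order, which is then the file you convert. A minimal Python sketch of such a script, assuming one flat folder of .html/.htm files; `write_index` and its "first file, then the rest sorted by filename" rule are hypothetical, not Manichean's actual script:

```python
import tempfile
from html import escape
from pathlib import Path
from urllib.parse import quote


def write_index(folder, first, index_name="index.html"):
    """Write an HTML index linking the book's files in reading order.

    Hypothetical helper: `first` goes at the top, the remaining .html/.htm
    files follow in sorted filename order (a deterministic variation on
    "as it finds them"). The index file itself is never linked.
    """
    base = Path(folder)
    others = sorted(
        p.name
        for p in base.iterdir()
        if p.suffix.lower() in {".html", ".htm"} and p.name not in {first, index_name}
    )
    # One link per file; quote() keeps spaces and similar characters valid in href.
    links = "\n".join(
        f'<p><a href="{quote(name)}">{escape(name)}</a></p>' for name in [first, *others]
    )
    out = base / index_name
    out.write_text(f"<html><body>\n{links}\n</body></html>\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("ch02.html", "ch01.html", "title.html"):
            (Path(d) / name).write_text("<p>...</p>", encoding="utf-8")
        print(write_index(d, "title.html").read_text(encoding="utf-8"))
```

Sorting by filename only helps if the files are already named in reading order (ch01, ch02, ...); otherwise the generated list still has to be reordered by hand.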
01-05-2011, 09:37 AM   #7
cybmole
Wizard
It seems to me that calibre should not allow conversion options that lose or destroy chapter ordering, or should at least warn against them.

Is there any e-book-related need for calibre to offer an ePub to ZIP conversion, or should its ePub to ZIP conversion option be disabled, along with any other combinations that do not preserve chapter ordering?

Yes, I'd taken a backup of the ePub, but others may be caught out.
01-05-2011, 09:54 AM   #8
Manichean
Wizard
It is called YAFIYGI (You Asked For It, You Got It), as opposed to WYSIWYG. Deal with it.
01-05-2011, 10:21 AM   #9
ldolse
Wizard
Posts: 1,337
Karma: 123457
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Calibre won't insert previous/next navigation; that would have to be built into the original source.

The only structure detection option that doesn't work on ePub is 'preprocess html to possibly improve structure detection'.

This is due to the way the conversion pipeline works, and to the fact that Calibre treats OEB as a sort of 'reference' format. All file types are converted from their native format to OEB internally, and then re-converted from OEB to whatever the desired output format is. Preprocessing occurs BEFORE conversion to OEB.
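To make that ordering concrete, here is a toy Python model of the pipeline as this post describes it. It only records stage names; `pipeline_steps`, the stage labels, and treating ePub as the only OEB-native input are simplifying assumptions for illustration, not how calibre's plugins are actually organised:

```python
# Toy model of the conversion pipeline described above. Stage names and the
# OEB_NATIVE_INPUTS set are illustrative assumptions, not calibre's code.
OEB_NATIVE_INPUTS = {"epub"}


def pipeline_steps(input_fmt, output_fmt, preprocess):
    """Return the ordered list of stages a conversion would go through."""
    fmt = input_fmt.lower()
    steps = []
    if fmt in OEB_NATIVE_INPUTS:
        # ePub is already OEB in a ZIP container, so it is simply unpacked
        # and never reaches the preprocess stage.
        steps.append("unzip to OEB")
    else:
        if preprocess:
            # Preprocessing happens before the source becomes OEB.
            steps.append("preprocess input")
        steps.append(f"convert {fmt} to OEB")
    # Per the post, the other structure detection options work for every
    # input format; modelled here as a single step on the OEB.
    steps.append("other structure detection")
    steps.append(f"convert OEB to {output_fmt.lower()}")
    return steps


print(pipeline_steps("epub", "mobi", preprocess=True))
print(pipeline_steps("zip", "epub", preprocess=True))
```

With the preprocess box ticked, only the ZIP run ever contains a "preprocess input" step, which is the behaviour described for ePub sources.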

ePub IS OEB, just in a zipped container. Therefore Calibre doesn't bother converting from ePub to OEB; it just unzips it and goes from there. Because of this, ePub bypasses the preprocess stage of the conversion pipeline.
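The "OEB in a zipped container" point is easy to check outside calibre: Python's standard zipfile module opens an ePub directly, and META-INF/container.xml (as defined by the OCF container format) points at the OPF package file. A small sketch using an in-memory stand-in book; the helper names are made up for this example:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the OCF container format.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_opf(epub_bytes):
    """Return the path of the OPF package document inside an ePub."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find(f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile")
    return rootfile.get("full-path")


def build_minimal_epub():
    """Build a tiny ePub-shaped ZIP in memory (stand-in data, not a real book)."""
    container = (
        '<?xml version="1.0"?>'
        f'<container version="1.0" xmlns="{CONTAINER_NS}"><rootfiles>'
        '<rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/>'
        "</rootfiles></container>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF expects 'mimetype' as the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("OEBPS/content.opf", "<package/>")
    return buf.getvalue()


print(find_opf(build_minimal_epub()))  # OEBPS/content.opf
```

Passing the bytes of a real .epub to `find_opf` works the same way; nothing beyond ZIP handling is needed to reach the OEB content.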

If you want the preprocess option to work on an ePub, just rename the .epub to .zip and add the ZIP back to the same book record as zipped HTML. HTML goes through the full conversion process, so it's eligible for preprocessing.
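If you'd rather not rename your only copy, a short Python sketch can create the .zip next to the original instead, leaving the ePub untouched as a backup. The helper name is hypothetical; adding the resulting .zip to the book record is still done in calibre as described above:

```python
import shutil
import tempfile
from pathlib import Path


def zip_copy_of_epub(epub_path):
    """Copy book.epub to book.zip beside it and return the new path.

    Only the extension changes, since an ePub is already a ZIP archive.
    Copying rather than renaming keeps the original ePub as a backup.
    """
    src = Path(epub_path)
    if src.suffix.lower() != ".epub":
        raise ValueError(f"not an .epub file: {src.name}")
    dst = src.with_suffix(".zip")
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        book = Path(d) / "book.epub"
        book.write_bytes(b"PK\x03\x04 stand-in bytes")  # placeholder content
        print(zip_copy_of_epub(book))
```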

As for preprocessing messing up your book's formatting: I highly doubt it could have changed the order of the book's contents; I don't see how that would even be possible. There's also no way it would insert next/previous links.

That said, preprocessing does look for potential chapter breaks in a fairly aggressive manner, and this can occasionally cause the book to be split in undesired places. It also checks whether lines need unwrapping and may change paragraphs, along with a variety of other things. While it's usually quite harmless and will generally improve a poorly formatted book, it can't be guaranteed to work for every single book. The help/documentation clearly states that preprocessing could make your book worse, which is why it is disabled by default.
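Line unwrapping is a good example of why this can backfire. As a rough illustration only (a deliberately naive heuristic written for this thread, not calibre's preprocessing code), joining a line to the next when it doesn't end in punctuation and the next line starts in lowercase repairs hard-wrapped text but will also merge lines that were meant to stay separate:

```python
import re

# Naive, hypothetical heuristic for illustration; not calibre's algorithm.
# A line "ends a sentence" if its last non-space character is one of these.
SENTENCE_END = re.compile(r'[.!?:;")\u201d]\s*$')


def unwrap_lines(text):
    """Join hard-wrapped lines inside each blank-line-separated paragraph.

    A line break is replaced by a space when the text so far does not end
    in sentence punctuation and the next line starts with a lowercase letter.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        if not lines:
            continue
        merged = lines[0]
        for ln in lines[1:]:
            if not SENTENCE_END.search(merged) and ln[:1].islower():
                merged += " " + ln
            else:
                merged += "\n" + ln
        paragraphs.append(merged)
    return "\n\n".join(paragraphs)


print(unwrap_lines("The beast ran\nthrough the forest.\nThen it stopped."))
```

A deliberately short line of verse followed by a line starting in lowercase would be merged just the same, which is the kind of damage the documentation warns about.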

The other structure detection options, which apply to ePub and all other formats, could also introduce unwanted split points in the document. If that's happening, you can tweak the 'insert page breaks before' option or the chapter detection XPath.
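To see how narrowing the expression changes where a book gets split, here is a standard-library approximation of testing h1/h2 elements, optionally filtered by a regex on their text (roughly what `re:test(., pattern, 'i')` does in a chapter XPath). It ignores the XHTML namespace, strips surrounding whitespace from the text, and is not calibre's implementation; the sample markup is made up:

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def split_points(xhtml, tags=("h1", "h2"), pattern: Optional[str] = None):
    """Return the text of each element that would start a new chapter.

    Approximates //*[name()='h1' or name()='h2'], optionally narrowed like
    [re:test(., pattern, 'i')]. Namespaces are not handled.
    """
    root = ET.fromstring(xhtml)
    rx = re.compile(pattern, re.IGNORECASE) if pattern else None
    hits = []
    for el in root.iter():
        if el.tag in tags:
            # itertext() gathers all descendant text, like XPath's string value.
            text = "".join(el.itertext()).strip()
            if rx is None or rx.search(text):
                hits.append(text)
    return hits


SAMPLE = """<body>
<h1>Chapter 1</h1><p>...</p>
<h2>A note on sources</h2><p>...</p>
<h1>Chapter 2</h1><p>...</p>
</body>"""

print(split_points(SAMPLE))                             # every h1/h2
print(split_points(SAMPLE, pattern=r"^chapter\s+\d+"))  # chapter headings only
```

The broad version splits before "A note on sources" as well; adding the text test drops that unwanted break.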
01-05-2011, 11:19 AM   #10
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by ldolse
As for preprocessing messing up your book's formatting: I highly doubt it could have changed the order of the book's contents; I don't see how that would even be possible. There's also no way it would insert next/previous links.
What I believe he said he did to mess up the order of his book was an ePub to ZIP conversion with unknown settings checked. Then he took the resulting ZIP file and converted it to ePub with unknown settings checked.

Why? I haven't a clue.
01-05-2011, 11:27 AM   #11
cybmole
Wizard
Quote:
Originally Posted by dwanthny
What I believe he said he did to mess up the order of his book was an ePub to ZIP conversion with unknown settings checked. Then he took the resulting ZIP file and converted it to ePub with unknown settings checked.

Why? I haven't a clue.
Why? Because someone here suggested that double conversion as a workaround for the preprocess... tick box being ignored with ePub sources.

EDIT: the thread is here:
https://www.mobileread.com/forums/sho...d.php?t=114420
But reading it again, I see that I misunderstood the suggestion, which was actually to copy, rename and re-import the ePub as a ZIP, NOT to convert it to ZIP in calibre.
END EDIT


I've not done any more testing, but I suspect the ePub to ZIP CONVERSION generates extra .xhtml pages with next/prev links.

If I go ePub to ZIP, then ZIP to ePub, then open the end result in Sigil, I see lots of .xhtml sheets that were not there before, and each sheet looks like a frame header with next and prev buttons on it. That would not be so bad if the order did not get mangled at the same time.

I think the messed-up order has nothing to do with the preprocessing options and is most likely a side effect of the ePub to ZIP stage of the conversion.

Try it and see for yourselves; all books should behave similarly.

Last edited by cybmole; 01-05-2011 at 11:36 AM.
01-05-2011, 11:37 AM   #12
kovidgoyal
creator of calibre
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The ZIP output plugin creates HTML that is suitable for a website. Do not do an ePub to ZIP conversion; rename your ePub to .zip and convert that if you want to use the preprocessing options.
01-11-2011, 10:42 AM   #13
Wolfgan
Avid reader
Posts: 19
Karma: 10
Join Date: Feb 2009
Location: Argentina
Device: Kindle 3 wifi
Double chapter detection

When converting a LIT to MOBI for my Kindle 3, the standard chapter detection XPath expression didn't detect anything (the source file is rather flat in that sense).

As the source file's chapters seem to be marked only by a # symbol at the start of a paragraph (i.e. "#And then the beast run thru the forest..."), I created the following XPath expression:

Code:
//*[re:test(., '(?s)^#\w+','i')]
It works well on the detection side; what's puzzling me is that it detects each chapter twice, one after the other (as if the parser ran twice).
Any clue? Thanks in advance, Wolf.
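For anyone adapting this expression, the regex itself can be checked outside calibre. Assuming the regex dialect behaves like Python's re (lxml, which calibre builds on, implements EXSLT regular expressions on top of it), `^` anchors to the very start of the element's text, `\w+` needs a word character immediately after the #, and the `(?s)` and `'i'` flags make no difference here because the pattern contains no `.` and no letters:

```python
import re

# The regex from the XPath above. re:test(., pattern, 'i') asks whether the
# element's text contains a match; re.search is the Python equivalent.
CHAPTER_RX = re.compile(r"(?s)^#\w+", re.IGNORECASE)


def looks_like_chapter(text):
    """True if the text starts with '#' immediately followed by a word character."""
    return CHAPTER_RX.search(text) is not None


for sample in ["#And then the beast...", "# And then...", "  #And then...", "Price #1"]:
    print(repr(sample), looks_like_chapter(sample))
```

So leading whitespace, or a space after the #, would stop a paragraph from being detected.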
01-11-2011, 10:44 AM   #14
kovidgoyal
creator of calibre
That means that the string is present in your input document twice.
01-11-2011, 12:33 PM   #15
Wolfgan
Avid reader
Quote:
Originally Posted by kovidgoyal
That means that the string is present in your input document twice.
Thanks, kovid, but I checked the HTML debug files and there is no duplicated text that would trigger the expression twice. What's weird is that the /processed and /structure files show double TOC entries, the first one empty.

/parsed
Code:
<p class="MsoPlainText"><span style="font-size:12.0pt;font-family:&quot;Trebuchet MS&quot;;">#Textbody...</span></p>
/structure
Code:
<hr/><p class="MsoPlainText" id="calibre_toc_2"><hr/><span style="font-size:12.0pt;font-family:&quot;Trebuchet MS&quot;;" id="calibre_toc_3">#Textbody...</span></p>
/processed
Code:
<p class="MsoPlainText1" id="calibre_toc_2"><hr class="calibre4"/><span id="calibre_toc_3" class="calibre3">#Textbody...</span></p>
Still without a clue; I hope this helps. Thanks, Wolf.
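The /structure output above marks both the `<p>` (calibre_toc_2) and its `<span>` (calibre_toc_3). One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis: `//*` tests every element, and an element's string value includes all of its descendants' text, so a paragraph whose only content is the span matches exactly as the span does. The stdlib sketch below (ElementTree's `itertext()` standing in for XPath's string value; the helper names are made up) reproduces that on the posted snippet. Restricting the expression to a single element type, or skipping elements whose child also matches, would under this hypothesis avoid the duplicate; that hasn't been tested in calibre here.

```python
import re
import xml.etree.ElementTree as ET

# The /parsed snippet from the post, and the regex from the XPath expression.
SNIPPET = ('<p class="MsoPlainText"><span style="font-size:12.0pt;'
           'font-family:&quot;Trebuchet MS&quot;;">#Textbody...</span></p>')
CHAPTER_RX = re.compile(r"(?s)^#\w+", re.IGNORECASE)


def string_value(el):
    """All descendant text of an element, like XPath's string value of '.'."""
    return "".join(el.itertext())


def matching_tags(fragment):
    """Tags of every element that //*[re:test(., ...)] would select."""
    root = ET.fromstring(fragment)
    return [el.tag for el in root.iter() if CHAPTER_RX.search(string_value(el))]


def matching_tags_innermost(fragment):
    """Same test, but skip an element if one of its children also matches."""
    root = ET.fromstring(fragment)
    return [
        el.tag
        for el in root.iter()
        if CHAPTER_RX.search(string_value(el))
        and not any(CHAPTER_RX.search(string_value(child)) for child in el)
    ]


print(matching_tags(SNIPPET))            # ['p', 'span']
print(matching_tags_innermost(SNIPPET))  # ['span']
```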

Similar Threads
Thread Thread Starter Forum Replies Last Post
Structure Detection - Remove Header (or Footer) Regex DarkKipper Conversion 69 11-09-2013 12:21 PM
Trouble w structure detection jeff47 Calibre 1 10-13-2010 12:51 AM
epub - force a 2nd pass to improve structure detection ? cybmole Calibre 10 10-08-2010 01:00 AM
Structure Detection Ceased To Exist? radiofred Calibre 3 10-01-2010 12:33 AM
Structure detection v5.5 and v6.2 AlexBell Calibre 2 07-29-2009 10:11 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.