|  01-05-2011, 06:47 AM | #1 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
				
				structure detection - documentation ?
			 
			
			I am told in other threads that the structure detection option is ignored for an epub source. Is that true for any other source formats, or for epub only? What is the recommended way to force structure detection on an epub: convert to zip and then back again? Is there (or should there be) any detailed documentation for the above, and for what processing operations structure detection actually performs? At present it seems like a tick-it-and-see black box. | 
|   |   | 
|  01-05-2011, 07:43 AM | #2 | 
| Wizard            Posts: 3,130 Karma: 91256 Join Date: Feb 2008 Location: Germany Device: Cybook Gen3 | 
			
			What structure detection are you referring to? I've used header/footer removal, which is part of the structure detection conversion settings, successfully on ePub, so I don't think it's ignored. Preprocessing might be, though. As for documentation, as usual, refer to the manual. | 
|   |   | 
|  01-05-2011, 08:44 AM | #3 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
			
			I was in a discussion about TOC and chapter detection. I wanted to automate having chapters flagged with h1 or h2 tags so that they would be added to the TOC. Someone (not you) said that structure detection is not applied if the source format is epub, because the source is assumed to be "already good". It could be that they meant only the "preprocess to improve structure detection" box, so a rephrase of my question is: for which source formats is the preprocess tick box ignored? I re-read the manual, but it does not define exactly what takes place when the box labelled preprocessing is ticked. The manual gives an overview of what it's intended to do, but I wanted a programmer's definition of what logic is applied and how. Last edited by cybmole; 01-05-2011 at 08:47 AM. | 
|   |   | 
|  01-05-2011, 09:00 AM | #4 | 
| Wizard            Posts: 3,130 Karma: 91256 Join Date: Feb 2008 Location: Germany Device: Cybook Gen3 | 
			
			Ah, yes, I remember that discussion. Try asking for (or waiting for) whoever programmed the preprocessing engine to explain. As for a workaround, I believe that adding the ePub as a ZIP file and reconverting ought to work. | 
|   |   | 
|  01-05-2011, 09:14 AM | #5 | |
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | Quote: 
 My input book was perfectly ordered, but my output from the test began at chapter 19! Maybe copying the epub, renaming it as zip, then adding it to the library is safe, but converting from epub to zip seems very flawed, unless I am misunderstanding what the conversion is meant to achieve. | |
|   |   | 
|  01-05-2011, 09:25 AM | #6 | 
| Wizard            Posts: 3,130 Karma: 91256 Join Date: Feb 2008 Location: Germany Device: Cybook Gen3 | 
			
			I was thinking about exporting the ePub, renaming it as a ZIP, then reimporting it into the book and converting to ePub. You might want to store the ePub somewhere safe externally, as it is going to be overwritten. As for the chapter order, there's a FAQ entry relevant to file ordering in multi-HTML books. I should still have a small script floating around for creating the index file discussed in that FAQ entry. I hacked it together once to convert an HTML reference book, but it only cares about which file comes first and dumps the rest of the files in as it finds them. Still, should that be useful, I can post it here. | 
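For readers wondering what such an index file generator could look like, here is a minimal sketch. It is not the script mentioned in the post above, which was never posted; the function name `build_index` and the file names are made up for illustration. It writes an `index.html` that links to a chosen first file and then to every other HTML file in the folder. Unlike the script described, it sorts the remaining files by name so the result is predictable.

```python
import html
from pathlib import Path
from urllib.parse import quote

HTML_SUFFIXES = {".html", ".htm", ".xhtml"}


def build_index(folder, first_file, index_name="index.html"):
    """Write an index page linking to first_file, then all other HTML files.

    Hypothetical helper: names and layout are assumptions, not calibre API.
    Returns the list of file names in the order they were linked.
    """
    folder = Path(folder)
    if not (folder / first_file).is_file():
        raise FileNotFoundError(first_file)

    # Every other HTML file, sorted by name (the index itself is skipped).
    others = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in HTML_SUFFIXES
        and p.name not in {first_file, index_name}
    )
    ordered = [first_file] + others

    # One link per file; the link order is what fixes the reading order.
    links = "\n".join(
        f'<p><a href="{quote(name)}">{html.escape(name)}</a></p>'
        for name in ordered
    )
    (folder / index_name).write_text(
        f"<html><body>\n{links}\n</body></html>\n", encoding="utf-8"
    )
    return ordered
```

Calling `build_index("my_book", "cover.html")` would leave an `index.html` in the folder that can be used as the entry point for the conversion; whether the resulting order suits a given book still needs checking by hand.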
|   |   | 
|  01-05-2011, 09:37 AM | #7 | 
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | 
			
			It seems to me that calibre should not allow conversion options that will lose or destroy chapter ordering, or should at least warn against them. Is there any e-book related need for calibre to offer an epub to zip conversion, or should its epub to zip conversion option be disabled, along with any other combinations that do not preserve chapter ordering? Yes, I'd taken a backup of the epub, but others may be caught out. | 
|   |   | 
|  01-05-2011, 09:54 AM | #8 | 
| Wizard            Posts: 3,130 Karma: 91256 Join Date: Feb 2008 Location: Germany Device: Cybook Gen3 | 
			
			It is called YAFIYGI, as opposed to WYSIWYG. Deal with it.
		 | 
|   |   | 
|  01-05-2011, 10:21 AM | #9 | 
| Wizard            Posts: 1,337 Karma: 123457 Join Date: Apr 2009 Location: Malaysia Device: PRS-650, iPhone | 
			
			Calibre won't insert previous/next navigation - that would have to be built into the original source. The only structure detection option that doesn't work on epub is 'preprocess html to possibly improve structure detection'. This is due to the way the conversion pipeline works and the fact that Calibre treats OEB as a sort of 'reference' format. All file types are converted from their native format to OEB internally, and then converted from OEB to whatever the desired output format is. Preprocessing occurs BEFORE conversion to OEB. Epub IS OEB, just in a zipped container. Therefore Calibre doesn't bother converting from epub to OEB; it just unzips it and goes from there. Because of this, epub bypasses the preprocess stage of the conversion pipeline. If you want the preprocess option to work on an epub, just rename the file from .epub to .zip and add the zip back to the same book record as zipped HTML. HTML goes through the full conversion process, so it's eligible for preprocessing. As for preprocessing messing up your book formatting - I highly doubt it could have changed the order of the book's contents; I don't see how that could even be possible. There's also no way it would insert next/previous links. That said, preprocess does look for potential chapter breaks in a fairly aggressive manner, and this can occasionally cause the book to be split in undesired places. It also checks whether lines need unwrapping and may change paragraphs, along with a variety of other things. While it's usually quite harmless and will generally improve a poorly formatted book, it can't be guaranteed to work for every single book. The help/documentation clearly states that preprocessing could make your book worse, which is why it is disabled by default. The other structure detection options, which apply to epub and all other formats, could also introduce new split points in the document that aren't necessarily desired. 
If it's doing something like that then you can tweak the 'insert page breaks before' option, or tweak the chapter detection xpath. | 
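The rename step described above can be done by hand in any file manager. For anyone who would rather script it, a minimal sketch follows. The function name `epub_to_zip_copy` is made up for illustration and is not part of calibre; it copies the file rather than renaming it, so the original epub stays untouched as a backup.

```python
import shutil
import zipfile
from pathlib import Path


def epub_to_zip_copy(epub_path):
    """Copy book.epub to book.zip in the same folder and return the new path.

    Hypothetical helper, not a calibre function. Nothing is converted:
    the bytes are identical, only the file extension differs.
    """
    src = Path(epub_path)
    # An epub is a zip container; refuse anything that is not one.
    if not zipfile.is_zipfile(src):
        raise ValueError(f"{src} is not a zip container")
    dst = src.with_suffix(".zip")
    shutil.copy2(src, dst)  # the original .epub is left in place
    return dst
```

The resulting `.zip` can then be added to the same book record and used as the conversion source, as described in the post above.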
|   |   | 
|  01-05-2011, 11:19 AM | #10 | |
| US Navy, Retired            Posts: 9,897 Karma: 13806776 Join Date: Feb 2009 Location: North Carolina Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen | Quote: 
 Why? I haven't a clue. | |
|   |   | 
|  01-05-2011, 11:27 AM | #11 | |
| Wizard            Posts: 3,720 Karma: 1759970 Join Date: Sep 2010 Device: none | Quote: 
 EDIT: the thread is here https://www.mobileread.com/forums/sho...d.php?t=114420 but reading it again, I see that I misunderstood the suggestion, which was actually to copy, rename and reimport the epub as zip, NOT to convert it to ZIP in calibre. END EDIT. I've not done any more testing, but I suspect the epub to zip CONVERSION generates extra .xhtml pages with next/prev links. If I go epub to zip, then zip to epub, then open the end result in Sigil, I see lots of .xhtml sheets that were not there before, and each sheet looks like a frame header with next and prev buttons on it. That would not be so bad if the order did not get mangled at the same time. I think that the messed-up order has nothing to do with pre-processing options and is most likely a side effect of the epub-to-zip stage of conversion. Try it and see for yourselves; all books should behave similarly. Last edited by cybmole; 01-05-2011 at 11:36 AM. | |
|   |   | 
|  01-05-2011, 11:37 AM | #12 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			The zip output plugin creates HTML that is suitable for a website. Do not do an epub to zip conversion; rename your epub to .zip and convert that if you want to use preprocessing options.
		 | 
|   |   | 
|  01-11-2011, 10:42 AM | #13 | 
| Avid reader  Posts: 19 Karma: 10 Join Date: Feb 2009 Location: Argentina Device: Kindle 3 wifi | 
				
				Double chapter detection
			 
			
			In converting a lit to mobi for my Kindle 3, the standard chapter detection XPath expression didn't detect anything (the source file is rather flat in that sense). As the source file chapters seem to be marked by just a # symbol starting a paragraph (i.e. #And then the beast run thru the forest... ), I created the following XPath expression: Code: //*[re:test(., '(?s)^#\w+','i')] Any clue? Thanks in advance, Wolf. | 
|   |   | 
|  01-11-2011, 10:44 AM | #14 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			That means that the string is present in your input document twice.
		 | 
|   |   | 
|  01-11-2011, 12:33 PM | #15 | |
| Avid reader  Posts: 19 Karma: 10 Join Date: Feb 2009 Location: Argentina Device: Kindle 3 wifi | Quote: 
 /parsed Code: <p class="MsoPlainText"><span style="font-size:12.0pt;font-family:"Trebuchet MS";">#Textbody...</span></p> Code: <hr/><p class="MsoPlainText" id="calibre_toc_2"><hr/><span style="font-size:12.0pt;font-family:"Trebuchet MS";" id="calibre_toc_3">#Textbody...</span></p> Code: <p class="MsoPlainText1" id="calibre_toc_2"><hr class="calibre4"/><span id="calibre_toc_3" class="calibre3">#Textbody...</span></p> | |
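One plausible reading of the output above, where both the `<p>` and the nested `<span>` end up with a `calibre_toc` id, is that `//*` tests every element, and the text of a paragraph and of the span inside it both start with `#`. The sketch below imitates that with the Python standard library only. It is an approximation, not calibre's own code: it applies the same regular expression to the concatenated text of each element, which is how an XPath string value is normally computed. The sample markup and the function name `matching_tags` are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as in the XPath expression; the 'i' flag maps to IGNORECASE.
CHAPTER_RE = re.compile(r"(?s)^#\w+", re.IGNORECASE)


def matching_tags(markup):
    """Return the tags of all elements whose full text starts with '#word'."""
    root = ET.fromstring(markup)
    hits = []
    for el in root.iter():  # every element, like //*
        text = "".join(el.itertext())  # element text plus all nested text
        if CHAPTER_RE.search(text):
            hits.append(el.tag)
    return hits


sample = (
    "<body><p>Intro</p>"
    '<p class="MsoPlainText"><span>#Textbody...</span></p></body>'
)
print(matching_tags(sample))  # prints ['p', 'span']
```

If that is indeed what is happening, restricting the expression to one element type, for example `//p[re:test(., '(?s)^#\w+','i')]`, might give a single match per chapter. That adjustment is untested here.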
|   |   | 