MobileRead Forums - View Single Post - structure detection

ldolse · 01-11-2011, 02:03 PM

Quote:

Originally Posted by cybmole

I have studied the example in that thread, plus other regex syntax stuff. I need to see a model answer for this particular challenge, please. I suspect the XPath thingie in calibre may be limited in how much text and/or complexity it can accept.

I would also, for the sake of clarity, appreciate nailing down, for the three cases below, whether calibre applies a) the structure detection line b) the preprocess option

i.e. could you complete the table with Y or N as needed?

                   apply preprocess    apply structure detect
1. epub to epub    N ?                 ???
2. epub to mobi
3. mobi to epub    y ?

Preprocess happens before the XPath chapter detection. It never happens on an epub-to-anything conversion, because epub input bypasses that entire stage of the conversion pipeline. All other formats can be preprocessed, although IIRC mobi to epub doesn't currently go through the full preprocessing logic; that could be changed if you are seeing a lot of badly formatted mobi files.

Preprocess goes through your document and looks for common chapter headings using heuristics. It should be able to mark up simple numeric headings like the ones you're listing in your code. If your doc already uses h1, h2, h3 tags, etc., then the heuristic processor disables itself, and you just need to look at your code and write the correct XPath.

If the preprocess stage finds a chapter heading during its search, it wraps the heading in <h2> tags. Subtitles, if they exist, are wrapped in <h3> tags.
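To make the idea concrete, here is a rough sketch in Python of that kind of heuristic markup. It is not calibre's actual preprocessing code: the pattern, the function name mark_up_headings, and the "skip if headings already exist" check are all assumptions for illustration, and it only handles the <h2> case, not subtitles.

```python
import re

# Hypothetical pattern: a paragraph whose entire text is a number,
# optionally preceded by the word "Chapter". Not calibre's real regex.
HEADING_RE = re.compile(
    r"<p\b[^>]*>\s*((?:chapter\s+)?\d+)\s*</p>",
    re.IGNORECASE,
)

# Used to detect documents that already contain h1/h2/h3 tags.
EXISTING_HEADING_RE = re.compile(r"<h[1-3][\s>]", re.IGNORECASE)


def mark_up_headings(html: str) -> str:
    """Wrap simple numeric chapter headings in <h2> tags."""
    # Mirror the behaviour described above: if the document already
    # uses heading tags, leave it alone.
    if EXISTING_HEADING_RE.search(html):
        return html
    return HEADING_RE.sub(r"<h2>\1</h2>", html)


if __name__ == "__main__":
    print(mark_up_headings("<p>1</p><p>Some text.</p>"))
```

The real heuristics cover far more heading styles than this; the point is only that the output of the stage is ordinary <h2> markup that a later step can target.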

If you have preprocess enabled and it's successfully detecting/marking up your chapters, then you need to have the XPath look for <h2> tags. I often use the default XPath and just change the regex to .* so that it matches any heading text.
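As an illustration of what "look for <h2> tags with a regex of .*" amounts to, here is a small Python sketch using only the standard library. It does not reproduce calibre's XPath expression; it emulates the same idea (select every h2, then filter by a regex on its text), and the sample document and the name find_chapters are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Made-up sample of what a document might look like after preprocessing.
XHTML = """<html><body>
<h2>1</h2><p>First chapter text.</p>
<h2>2</h2><p>Second chapter text.</p>
<h3>A subtitle</h3>
</body></html>"""


def find_chapters(xhtml: str, pattern: str = ".*") -> List[str]:
    """Return the text of every <h2> whose text matches the regex."""
    root = ET.fromstring(xhtml)
    regex = re.compile(pattern, re.IGNORECASE)
    titles = []
    for h2 in root.iter("h2"):
        text = "".join(h2.itertext()).strip()
        # With the default ".*" every <h2> is accepted.
        if regex.search(text):
            titles.append(text)
    return titles


if __name__ == "__main__":
    print(find_chapters(XHTML))
```

With the regex left at .*, every <h2> counts as a chapter, which is what you want once preprocess has already decided which lines are headings. A narrower regex such as "chapter" would miss the purely numeric headings.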

The XPath is processed later in the conversion of the document, I believe at the beginning of the output stage.

Also, if you're having trouble getting the XPath right but preprocess was successful, Sigil will automatically create a TOC you can edit, since Sigil builds the TOC exclusively from heading tags like h1, h2, h3, h4...
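For anyone curious what a header-based TOC looks like in practice, here is a minimal Python sketch of the general technique. It is not Sigil's code; the class and function names are hypothetical, and limiting it to h1 through h4 is an assumption taken from the tags listed above.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, title) pairs from h1-h4 tags in document order."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4}

    def __init__(self):
        super().__init__()
        self.toc = []        # list of (level, title) tuples
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.LEVELS.get(tag) == self._level:
            self.toc.append((self._level, "".join(self._parts).strip()))
            self._level = None


def build_toc(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.toc


if __name__ == "__main__":
    sample = "<h2>1</h2><h3>The Beginning</h3><p>Text</p><h2>2</h2>"
    for level, title in build_toc(sample):
        print("  " * (level - 1) + title)
```

The heading level gives the nesting, so chapters marked up as <h2> with <h3> subtitles come out as a two-level TOC without any XPath at all.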