#16
creator of calibre
Posts: 45,393
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I'm going to need the file to comment further.

#17
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Twice is not so bad; I had a case (in another thread) that was detecting everything four times!
XPath is a pig to configure. I spent ages earlier today telling structure detection to look for the default h1/h2 tags + part|section etc. OR for stand-alone (chapter) numbers in bold, and no matter what I tried I got "invalid" from the syntax checker. The wizard will not build "OR" structures, so I was trying to adapt the default and change the OR ... class = chapter ... construct to [class = bold and text is of form \d*], but no joy. Could someone please tell me if that is doable? I have books where chapters are numbered 1, 2, etc. but do not have h1 or h2 tags. If those chapters are within sections or parts, then structure detection (assisted by preprocess) seems unable or unwilling to build a full TOC; it just does the easy stuff and builds a TOC of parts/sections.
Last edited by cybmole; 01-11-2011 at 01:12 PM.
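The "OR" structure asked about above can be written by hand: in XPath 1.0 each alternative goes in its own parentheses inside one predicate, joined with `or`. The sketch below is a hypothetical Python helper (`or_xpath` is not part of calibre) that assembles such an expression. The second alternative, a `p` whose whole text is a number, is an illustrative assumption, and the resulting expression has not been run through calibre's syntax checker.

```python
def or_xpath(*predicates: str, axis: str = "//*") -> str:
    """Hypothetical helper: combine XPath predicates into one expression.

    Each alternative is wrapped in its own parentheses so that an "or"
    inside one alternative cannot leak into the others (XPath's "and"
    already binds more tightly than "or").
    """
    if not predicates:
        raise ValueError("need at least one predicate")
    joined = " or ".join(f"({p})" for p in predicates)
    return f"{axis}[{joined}]"


# First alternative: the h1/h2 test from the expression quoted in this thread.
HEADING_TEST = r"(name()='h1' or name()='h2') and re:test(., 'chapter|part\s+', 'i')"
# Second alternative (assumption): a p element whose entire text is a number.
NUMBER_TEST = r"name()='p' and re:test(., '^\s*\d+\s*$')"

expression = or_xpath(HEADING_TEST, NUMBER_TEST)
print(expression)
```

The only design choice is to parenthesise every alternative; that keeps the combined predicate correct even if one alternative itself contains `or`, as the h1/h2 test does.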

#18
Avid reader
Posts: 19
Karma: 10
Join Date: Feb 2009
Location: Argentina
Device: Kindle 3 wifi
Use the debug feature of the conversion process and check the operation log window to tweak your regex. Good luck! Wolf
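Alongside the debug log, it can help to check a regex against a few sample heading texts first. The Python sketch below assumes that calibre's `re:test(., pattern, 'i')` behaves like an unanchored, case-insensitive search (the EXSLT `re:test` semantics); the sample strings are invented for illustration.

```python
import re

# Invented heading texts to try a pattern against.
SAMPLES = [
    "Chapter One",
    "PART TWO",
    "7",
    "Counterpart of the plan",
    "Departure",
    "Chapters and verses",
]


def matching(pattern: str, samples=SAMPLES, flags=re.IGNORECASE):
    """Return the samples in which `pattern` matches anywhere in the text."""
    rx = re.compile(pattern, flags)
    return [s for s in samples if rx.search(s)]


# Unanchored: also catches "Counterpart of the plan" and "Chapters and verses".
print(matching(r"chapter|part\s+"))
# Anchored at both ends: only a line that is nothing but a number.
print(matching(r"^\s*\d+\s*$"))
# Anchored at the start with a word boundary: cuts the false positives above.
print(matching(r"^\s*(chapter|part)\b"))
```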

#19
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
I have studied the example in that thread, plus other regex syntax material. I need to see a model answer for this particular challenge, please. I suspect the XPath thingie in calibre may be limited in how long and/or how complex an expression it can accept.
I would also, for the sake of clarity, appreciate nailing down, for the three cases below, whether calibre applies a) the structure detection line and b) the preprocess option, i.e. could someone complete the table with Y or N as needed?
Conversion | Apply preprocess | Apply structure detection |
1. EPUB to EPUB | N? | ??? |
2. EPUB to MOBI | | |
3. MOBI to EPUB | Y | ? |

#20
creator of calibre
Posts: 45,393
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Your XPath expression is matching both the p and the span tags. Use
Code:
//h:p
instead of //*.
Last edited by kovidgoyal; 01-11-2011 at 01:54 PM.
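Why `//*` produces duplicates: XPath tests the string value of an element, which includes all the text of its descendants, so a `p` and the `span` inside it both contain "7". The stdlib Python sketch below emulates that with `xml.etree.ElementTree`; it is not calibre's XPath engine. The sample markup copies the line cybmole posts in the next reply, and the pattern and helper names are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

DOC = f"""<html xmlns="{XHTML}"><body>
<p class="calibre8"><span class="calibre3 bold">7</span></p>
<p class="calibre1">Some ordinary text.</p>
</body></html>"""


def string_value(el: ET.Element) -> str:
    """XPath string value of an element: all descendant text, concatenated."""
    return "".join(el.itertext())


def local_name(el: ET.Element) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return el.tag.rsplit("}", 1)[-1]


def select(root: ET.Element, pattern: str, tag: str = "*") -> list:
    """Emulate //*[re:test(., pattern)] (tag='*') or //h:TAG[re:test(., pattern)]."""
    rx = re.compile(pattern)
    wanted = None if tag == "*" else f"{{{XHTML}}}{tag}"
    return [
        local_name(el)
        for el in root.iter()
        if (wanted is None or el.tag == wanted) and rx.search(string_value(el))
    ]


root = ET.fromstring(DOC)
print(select(root, r"^\s*\d+\s*$"))       # ['p', 'span']: one heading found twice
print(select(root, r"^\s*\d+\s*$", "p"))  # ['p']
```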

#21
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Update:
Here's where I am stuck on syntax. I want to pick out lines like this, which are chapter starts, while still picking out sections and parts:
Code:
<p class="calibre8"><span class="calibre3 bold">7</span></p>
So I tried to amend the ending to test for both class = bold AND a value made of digits, e.g.:
Code:
//*[((name()='h1' or name()='h2') and re:test(., 'chapter|part\s+', 'i')) or (@class = 'bold' and re:test (\d*))]
Last edited by cybmole; 01-11-2011 at 01:55 PM.
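Three things look wrong in that attempt. `re:test` needs the text to test and a quoted pattern, so `re:test (\d*)` is a syntax error. `\d*` can match the empty string, so even when quoted it is true for every element. And `@class = 'bold'` compares the whole attribute, which here is `calibre3 bold`. An untested candidate in calibre's syntax, assuming the bold class always sits on an element inside the `p`, would be `//*[((name()='h1' or name()='h2') and re:test(., 'chapter|part\s+', 'i')) or (name()='p' and re:test(., '^\s*\d+\s*$') and .//*[contains(concat(' ', normalize-space(@class), ' '), ' bold ')])]`. The stdlib Python sketch below emulates each pitfall and that predicate; it is not calibre's XPath engine, and the sample document and helper names are made up.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h2>Part One</h2>
<p class="calibre8"><span class="calibre3 bold">7</span></p>
<p class="calibre8"><span class="calibre3 bold">A bold remark</span></p>
<p class="calibre1">1984 was a good year.</p>
</body></html>"""

# Pitfall 1: @class = 'bold' tests the whole attribute value.
print("calibre3 bold" == "bold")  # False
# Pitfall 2: \d* matches the empty string, so it "finds" digits everywhere.
print(re.search(r"\d*", "no digits here") is not None)  # True


def text(el):
    return "".join(el.itertext())


def has_class(el, name):
    """Token test, like contains(concat(' ', normalize-space(@class), ' '), ' bold ')."""
    return name in el.get("class", "").split()


def is_heading(el):
    r"""(name()='h1' or name()='h2') and re:test(., 'chapter|part\s+', 'i')"""
    return el.tag in (NS + "h1", NS + "h2") and bool(
        re.search(r"chapter|part\s+", text(el), re.IGNORECASE)
    )


def is_number_chapter(el):
    r"""name()='p' and re:test(., '^\s*\d+\s*$') and .//*[bold class token]"""
    return (
        el.tag == NS + "p"
        and re.search(r"^\s*\d+\s*$", text(el)) is not None
        and any(has_class(d, "bold") for d in el.iter() if d is not el)
    )


root = ET.fromstring(DOC)
hits = [text(el) for el in root.iter() if is_heading(el) or is_number_chapter(el)]
print(hits)  # ['Part One', '7']
```

Targeting the `p` rather than the bold `span` follows the //h:p advice given earlier in the thread, so each numbered chapter is found once.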

#22
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Preprocess will go through your document and look for common chapter headings using a heuristic method; it should be able to mark up simple numeric headings like the ones in your code. If your document already uses h1, h2, h3 tags, etc., then the heuristic processor disables itself, and you just need to look at your code and write the correct XPath. If the preprocess stage finds a chapter heading during its search, it wraps the heading in <h2> tags, and it wraps subtitles, if they exist, in <h3> tags. If you have preprocess enabled and it is successfully detecting and marking up your chapters, then you need the XPath to look for <h2> tags; I often use the default XPath and just change the regex to .*. The XPath is processed later in the conversion pipeline; I believe at the beginning of the output stage. Also, if you're having trouble getting the XPath right but preprocess was successful, then Sigil will automatically create a TOC you can edit, as Sigil builds the TOC exclusively from heading tags like h1, h2, h3, h4.
Last edited by ldolse; 01-11-2011 at 02:14 PM.
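The interaction described above (the heuristic stands down if the book already has heading tags; otherwise detected chapter lines become `h2`, which a later XPath can pick up) can be modelled in a few lines. This is a deliberately crude Python sketch under stated assumptions, not calibre's real heuristics: `toy_preprocess` is a hypothetical name, only whole-line numbers count as chapter headings, and the element is renamed to `h2` rather than wrapped.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"
HEADING_TAGS = {NS + f"h{i}" for i in range(1, 7)}
NUMBER_LINE = re.compile(r"^\s*\d+\s*$")


def toy_preprocess(root: ET.Element) -> int:
    """Crude model of the preprocess step; returns how many headings it marked."""
    # Existing heading tags anywhere in the book: the heuristic disables itself.
    if any(el.tag in HEADING_TAGS for el in root.iter()):
        return 0
    marked = 0
    for p in list(root.iter(NS + "p")):
        if NUMBER_LINE.match("".join(p.itertext())):
            p.tag = NS + "h2"  # simplification: rename instead of wrapping
            marked += 1
    return marked


def wrap(body: str) -> ET.Element:
    return ET.fromstring(f'<html xmlns="http://www.w3.org/1999/xhtml"><body>{body}</body></html>')


plain = wrap("<p>1</p><p>Text.</p><p>2</p><p>More text.</p>")
mixed = wrap("<h2>Part One</h2><p>1</p><p>Text.</p><p>2</p>")

print(toy_preprocess(plain))  # 2: both numbers are now h2, so an XPath on h2 finds them
print(toy_preprocess(mixed))  # 0: one existing h2 is enough to skip the numbers
```

In this model, the second case is the situation cybmole suspects in post #26: a single existing `h2` on a part heading means the bare chapter numbers are never marked up.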
|
#23
Avid reader
Posts: 19
Karma: 10
Join Date: Feb 2009
Location: Argentina
Device: Kindle 3 wifi

#24
creator of calibre
Posts: 45,393
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Looking at the code you posted, the first character inside the span tag is indeed a #.

#25
Avid reader
Posts: 19
Karma: 10
Join Date: Feb 2009
Location: Argentina
Device: Kindle 3 wifi

#26
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Reading your explanation again, maybe the logic engine sees SOME h2 tags (say, on the section headers) and disables itself before the chapter numbers are processed? PS: thanks for explaining how the preprocess and XPath steps interact.
Last edited by cybmole; 01-11-2011 at 04:58 PM.
|
#27
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
If you want, open a bug with your book that's not working and I can see if I can improve the function, but I can't guarantee anything: there is an extremely wide range of HTML out there, and some cases can't be easily handled by a general function.
Last edited by ldolse; 01-11-2011 at 11:21 PM.
|
#28
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Thread | Thread Starter | Forum | Replies | Last Post |
Structure Detection - Remove Header (or Footer) Regex | DarkKipper | Conversion | 69 | 11-09-2013 12:21 PM |
Trouble w structure detection | jeff47 | Calibre | 1 | 10-13-2010 12:51 AM |
epub - force a 2nd pass to improve structure detection ? | cybmole | Calibre | 10 | 10-08-2010 01:00 AM |
Structure Detection Ceased To Exist? | radiofred | Calibre | 3 | 10-01-2010 12:33 AM |
Structure detection v5.5 and v6.2 | AlexBell | Calibre | 2 | 07-29-2009 10:11 PM |