07-01-2013, 08:58 PM | #1 |
Zealot
Posts: 110
Karma: 972092
Join Date: Jan 2012
Device: iPhone
|
Are there efficient ways to make the table of contents for an EPUB made from scanned files?
Is there a way to at least partially automate finding the chapters of a book and marking them up as heading 1, heading 2, etc.?
|
07-01-2013, 10:32 PM | #2 | |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Preferences: Conversion. There are three areas: Input, Common, and Output. There are a large number of choices (some mutually exclusive, while others enable more), including a section on detecting headings that are not marked up. BTW, I prefer to run a 'vanilla' Calibre conversion and then find and fix the headings with Sigil. It is so easy to look at the code and write the perfect regex for that case (and step through the matches to see that it really does what you want) |
|
07-01-2013, 11:01 PM | #3 |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
|
The only thing you might be able to do is use Search & Replace to find the next use of the word "Chapter" (if it's used in your book) and then manually change the title's paragraph formatting.
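As a rough illustration of the "search for the word Chapter" idea, here is a minimal Python sketch that lists paragraphs beginning with that word, so they can be reviewed before being restyled by hand. The sample markup, the class names, and the function name `list_chapter_candidates` are assumptions made for the example, not anything taken from a real book or from Sigil.

```python
import re

# Hypothetical sample markup; real OCR output will differ.
SAMPLE = """<p class="c1">Chapter 1</p>
<p class="c2">It was a dark night. The next chapter of his life began.</p>
<p class="c1">CHAPTER 2</p>
<p class="c2">Morning came.</p>"""

# Match only paragraphs whose text starts with the word "Chapter" (any case).
CHAPTER_PARA = re.compile(r'<p\b[^>]*>\s*(chapter\b[^<]*)</p>', re.IGNORECASE)

def list_chapter_candidates(html):
    """Return (line_number, text) for each paragraph that looks like a chapter title."""
    hits = []
    for m in CHAPTER_PARA.finditer(html):
        line_no = html.count("\n", 0, m.start()) + 1  # 1-based line of the match
        hits.append((line_no, m.group(1).strip()))
    return hits

for line_no, text in list_chapter_candidates(SAMPLE):
    print(line_no, text)
```

The word "chapter" in the middle of a body paragraph is skipped because the pattern requires it right after the opening tag.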
|
07-02-2013, 03:15 AM | #4 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
I've ended up building a series of regexes that fit different scenarios. I inspect the document, then choose the most appropriate one and modify it if needed. I've got a TOC search group with entries for Chapter + Number, Roman Numerals, Numbers only, Numbers as words, etc.
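One way to support the "inspect, then choose" step is to count how many paragraphs each candidate pattern would hit. The sketch below is only an assumption about how that could look in Python; the pattern set and the names `PATTERNS`, `count_matches`, and `best_pattern` are hypothetical and much smaller than a real search group.

```python
import re

# Hypothetical pattern set, loosely modelled on the scenarios listed above.
PATTERNS = {
    "chapter_number": r'<p\b[^>]*>\s*Chapter\s+\d+\s*</p>',
    "roman": r'<p\b[^>]*>\s*[IVXLC]+\s*</p>',
    "digits_only": r'<p\b[^>]*>\s*\d+\s*</p>',
    "number_words": r'<p\b[^>]*>\s*(?:One|Two|Three|Four|Five)\s*</p>',
}

def count_matches(html):
    """Return how many paragraphs each named pattern matches."""
    return {name: len(re.findall(pat, html)) for name, pat in PATTERNS.items()}

def best_pattern(html):
    """Return the name of the pattern with the most hits, or None if nothing matches."""
    counts = count_matches(html)
    name = max(counts, key=counts.get)
    return name if counts[name] > 0 else None

sample = '<p class="x">I</p><p>text</p><p class="x">II</p><p class="x">III</p>'
print(count_matches(sample))
print(best_pattern(sample))
```

A high count is only a hint; the chosen pattern still needs to be stepped through by eye before replacing anything.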
|
07-02-2013, 08:49 AM | #5 |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
|
If you feel like sharing, I'm sure many folks would appreciate having your scripts readily available. You could post it here or in the Regex Examples topic. (If you're inclined.)
|
07-02-2013, 10:28 AM | #6 |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I wonder if we should start a 'Saved Searches' thread for frequently reused snippets.
I define frequent use as code that will see use over many books rather than a one-time fix-and-patch job (which can be far more complicated in many cases). Some of my saved searches are really models that need per-case tuning before use.
Code:
69\Name=Fixup/Promote Headings/Roman
69\Find="<p class=\"\\w\">([CLXVI]{1,7})</p>"
69\Replace="<hr class=\"sigil_split_marker\" /><h3 class=\"chapno\">\\1</h3>"
(I just copied the above directly from my saved search file) |
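For anyone who wants to try that Roman numeral search outside Sigil, here is a minimal Python sketch of the same find and replace. It assumes Python's `re` module behaves closely enough to Sigil's regex engine for this simple pattern; the function name `promote_roman_headings` and the sample text are made up for the example.

```python
import re

# Same pattern as the saved search, with the ini-file escaping removed.
# Note that \w matches a single character, so only one-letter class names are hit.
FIND = re.compile(r'<p class="\w">([CLXVI]{1,7})</p>')
REPLACE = r'<hr class="sigil_split_marker" /><h3 class="chapno">\1</h3>'

def promote_roman_headings(html):
    """Turn paragraphs that contain only a Roman numeral into h3 chapter headings."""
    return FIND.sub(REPLACE, html)

sample = '<p class="c">XIV</p>\n<p class="c">I saw her.</p>'
print(promote_roman_headings(sample))
```

The second paragraph starts with a capital "I" but is left alone, because the pattern requires the closing tag straight after the numeral.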
07-02-2013, 01:02 PM | #7 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
It should be clear from the above that there are ways, but few easy ones, because no single method answers every case.
In the time it takes to fiddle with a regex, you can do an ordinary search for whatever starts each chapter, highlight it, and hit h1, at least for a book with an ordinary number of chapters. Of course, if this is a rocket manual or the like, that's different. |
07-02-2013, 02:56 PM | #8 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
OK, here are some of mine. Remember that you'll usually have to tweak these before use.
Code:
52\Name=TOC & Metadata/Part
52\Find="<p class=\".*?\">(?:(Part|PART)) (?:(One|Two|Three|Four|Five|Six|ONE|TWO|THREE|FOUR|FIVE|SIX))</p>"
52\Replace="<hr class=\"sigilChapterBreak\" /><h1>\\1 \\2</h1><hr class=\"sigilChapterBreak\" />"
53\Name=TOC & Metadata/Chapter Finder
53\Find="<p class=\".*?\">(Prologue|PROLOGUE|Epilogue|EPILOGUE|Chapter|CHAPTER)([^>]*)</p>"
53\Replace="<hr class=\"sigilChapterBreak\" /><h2>\\1 \\2</h2>"
55\Name=TOC & Metadata/Numbered Chapters
55\Find="<p class=\".*?\">(\\d+)</p>"
55\Replace="<hr class=\"sigilChapterBreak\" /><h2>\\1</h2>"
56\Name=TOC & Metadata/Roman Chapters
56\Find="<p class=\".*?\">([XVI]+)</p>"
56\Replace="<hr class=\"sigilChapterBreak\" /><h2>\\1</h2>"
57\Name=TOC & Metadata/Numbers
57\Find="<p class=\".*?\">(?:(ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN|TWENTY|TWENTY-ONE|TWENTY-TWO|TWENTY-THREE|TWENTY-FOUR|TWENTY-FIVE|TWENTY-SIX|TWENTY-SEVEN|TWENTY-EIGHT|TWENTY-NINE|THIRTY|THIRTY-ONE|THIRTY-TWO|THIRTY-THREE|THIRTY-FOUR|THIRTY-FIVE|THIRTY-SIX|THIRTY-SEVEN|THIRTY-EIGHT|THIRTY-NINE|FORTY|FORTY-ONE|FORTY-TWO|FORTY-THREE|FORTY-FOUR|FORTY-FIVE|FORTY-SIX|FORTY-SEVEN|FORTY-EIGHT|FORTY-NINE|FIFTY|FIFTY-ONE|FIFTY-TWO|FIFTY-THREE|FIFTY-FOUR|FIFTY-FIVE|FIFTY-SIX|FIFTY-SEVEN|FIFTY-EIGHT|FIFTY-NINE|SIXTY|SIXTY-ONE|SIXTY-TWO|SIXTY-THREE|SIXTY-FOUR|SIXTY-FIVE|SIXTY-SIX|SIXTY-SEVEN|SIXTY-EIGHT|SIXTY-NINE|SEVENTY|SEVENTY-ONE|SEVENTY-TWO|SEVENTY-THREE|SEVENTY-FOUR|SEVENTY-FIVE|SEVENTY-SIX|SEVENTY-SEVEN|SEVENTY-EIGHT|SEVENTY-NINE|EIGHTY|EIGHTY-ONE|EIGHTY-TWO|EIGHTY-THREE|EIGHTY-FOUR|EIGHTY-FIVE|EIGHTY-SIX|EIGHTY-SEVEN|EIGHTY-EIGHT|EIGHTY-NINE|NINETY|NINETY-ONE|NINETY-TWO|NINETY-THREE|NINETY-FOUR|NINETY-FIVE|NINETY-SIX|NINETY-SEVEN|NINETY-EIGHT|NINETY-NINE))</p>"
57\Replace="<hr class=\"sigilChapterBreak\" /><h2>\\1</h2>"
58\Name=TOC & Metadata/Numbers2
58\Find="<p class=\".*?\">(?:(One|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten|Eleven|Twelve|Thirteen|Fourteen|Fifteen|Sixteen|Seventeen|Eighteen|Nineteen|Twenty|Twenty-One|Twenty-Two|Twenty-Three|Twenty-Four|Twenty-Five|Twenty-Six|Twenty-Seven|Twenty-Eight|Twenty-Nine|Thirty|Thirty-One|Thirty-Two|Thirty-Three|Thirty-Four|Thirty-Five|Thirty-Six|Thirty-Seven|Thirty-Eight|Thirty-Nine|Forty|Forty-One|Forty-Two|Forty-Three|Forty-Four|Forty-Five|Forty-Six|Forty-Seven|Forty-Eight|Forty-Nine|Fifty|Fifty-One|Fifty-Two|Fifty-Three|Fifty-Four|Fifty-Five|Fifty-Six|Fifty-Seven|Fifty-Eight|Fifty-Nine|Sixty|Sixty-One|Sixty-Two|Sixty-Three|Sixty-Four|Sixty-Five|Sixty-Six|Sixty-Seven|Sixty-Eight|Sixty-Nine|Seventy|Seventy-One|Seventy-Two|Seventy-Three|Seventy-Four|Seventy-Five|Seventy-Six|Seventy-Seven|Seventy-Eight|Seventy-Nine|Eighty|Eighty-One|Eighty-Two|Eighty-Three|Eighty-Four|Eighty-Five|Eighty-Six|Eighty-Seven|Eighty-Eight|Eighty-Nine|Ninety|Ninety-One|Ninety-Two|Ninety-Three|Ninety-Four|Ninety-Five|Ninety-Six|Ninety-Seven|Ninety-Eight|Ninety-Nine))</p>"
58\Replace="<hr class=\"sigilChapterBreak\" /><h2>\\1</h2>"
I agree, a saved searches sticky would be very handy. |
07-02-2013, 03:55 PM | #9 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
A sticky will end up with people asking questions in it and with other sidetracked issues. It would be better to build a reference document in the wiki and link to the forum for discussion.
Dale |
07-02-2013, 04:00 PM | #10 | |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
But since this is aimed at the Sigil (saved) Search and Replace interface, is that a good idea? |
|
07-02-2013, 07:11 PM | #11 |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
|
07-02-2013, 07:41 PM | #12 | |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
I wonder what format it should take. Cookbook-type sections? I want to do:
Chapter/Part Headings (finding and restyling)
Repair (broken paragraphs, mangled quotes, bad/invalid HTML ...)
Cleanup (removal of OCR leftovers: headers and footers, Word cruft, excessive spans)
Other (?)
And I guess there should be some way of indicating which programs this applies to and where (footnotes), e.g.:
1 Sigil
2 Calibre Add Books
3 Calibre conversions
4 Notepad++
Ideas? |
|
07-03-2013, 04:08 AM | #13 | |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
Quote:
Formatting/Join Paragraphs
Formatting/Speech/Broken dialog
Formatting/Line Endings
Formatting/Format Change
Formatting/Quotes
Formatting/Dashes
TOC & Metadata |
|
07-03-2013, 06:21 AM | #14 |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
|
Hmm... not sure what you might be looking for exactly, but I know I would like to see any "Known Exclusions" for a particular regex: things that the regex will not find, so there is some idea of its usefulness and limitations.
As for the Formatting/Quotes section, we might need two of them: Normal Speech Quotes and Smart/Curly Quotes. I've seen a few posts from people who hate one or the other. Not to mention that some non-English languages use other symbols (I think). My two cents for the moment. |
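A "Known Exclusions" note could be backed by a small check that runs a pattern over sample headings and reports what it misses. The Python sketch below is one possible shape for that, under the assumption that Python's `re` treats the pattern the same way the editor does; `report_exclusions` and the sample lines are hypothetical, and the pattern is the Roman numeral search posted earlier in the thread.

```python
import re

def report_exclusions(pattern, samples):
    """Split sample heading lines into those the pattern finds and those it misses."""
    rx = re.compile(pattern)
    found = [s for s in samples if rx.search(s)]
    missed = [s for s in samples if not rx.search(s)]
    return found, missed

ROMAN = r'<p class="\w">([CLXVI]{1,7})</p>'
SAMPLES = [
    '<p class="c">XIV</p>',
    '<p class="calibre1">XIV</p>',   # multi-character class name
    '<p class="c">xiv</p>',          # lower-case numerals
    '<p class="c"> XIV </p>',        # padding whitespace
    '<p class="c">MCM</p>',          # M and D are not in the character set
]

found, missed = report_exclusions(ROMAN, SAMPLES)
print("finds: ", found)
print("misses:", missed)
```

The "misses" list is the raw material for an exclusions note next to the saved search.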
07-03-2013, 03:14 PM | #15 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Hitch |
|