MobileRead Forums > E-Book Software > Sigil
Old 06-20-2020, 11:42 AM   #1
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Add title="" to h* based on existing TOC -- suggestion for new feature (or plugin?)

How easy (or possible) would it be to reverse-engineer an existing TOC and copy each entry's text, exactly as it appears in the TOC, into a title="" attribute on the h* tag at the corresponding point in the book? If the file is really badly made and the chapter titles are in some random tag like p or div, it might be necessary to add an empty h* with display:none.

I'm not sure whether this is better suited to being a feature in the Tools > Table of Contents menu or a plugin (super subtle WINK to any plugin coders who are bored and looking for a new challenge...), but it would be an amazing tool to have.

Use cases:

1. you need to combine several epubs into a "collected works" file, or

2. you need to separate a "collected works" file into its individual books and make independent epubs of each,

and

the original epubs you are given have chapter headings in two (or more) parts, and/or contain extraneous code that will make regenerating the TOC complicated, for instance:

Code:
    <h1 epub:type="title" class="part_n"><span>4</span></h1>

    <h1 epub:type="title" class="part_tit"><span>The&#160;Whale speaks of&#160;what&#160;she has&#160;learned about&#160;humans</span></h1>

Existing (desired) TOC entry:
4. The Whale speaks of what she has learned about humans

Or (even worse...)

Code:
    <h1 id="toc_marker-26">21</h1>

    <h2><span class="Cap">E</span><span class="SmallCap">N CHEMIN POUR</span> <span class="Cap">S</span><span class="SmallCap">HADAR</span> <span class="Cap">L</span><span class="SmallCap">OGOTH</span></h2>
Existing (desired) TOC entry:
21. En chemin pour Shadar Logoth

(Note, just to be PERFECTLY CLEAR, I had absolutely nothing to do with making these monstrosities originally, or I wouldn't have this problem.)


Both examples are taken straight from actual books I'm working on. Last week I had to deal with case 2, and this week I've got to tackle case 1: 14 epubs, not a single one of which has chapter titles that will make it easy to regenerate the TOC once I've pulled them all into the collection. That means a lot of fiddly regex work and/or hand-copying the TOC entries into title="" one by one. That's what I did last week because I couldn't think of a better solution, and it was pretty damned annoying on just one book, let alone 14. It's not the first time I've had to do this and it certainly won't be the last, so I'm hoping that by the next time there will be a better way.

If there already is a better way and I just don't know about it (I did go through the plugin index just in case...), by all means PLEASE tell me.
Old 06-20-2020, 03:29 PM   #2
Turtle91
A Hairy Wizard
Posts: 2,094
Karma: 13046250
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2 & Air/Surface Pro/Kindle PW
Quote:
Originally Posted by Mister L
...
If there already is a better way and I just don't know about it (I did go through the plugin index just in case...), by all means PLEASE tell me.
Unfortunately there are so many different examples of how people do them badly...it would be very difficult to encompass all cases. I usually just use regex.
Old 06-20-2020, 03:34 PM   #3
DiapDealer
Grand Sorcerer
Posts: 22,546
Karma: 125997190
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
The problem is that Sigil already does the exact reverse of what you want. The title attribute of an h tag (if present) is used by Sigil to generate the text of the ToC. That's what allows users to generate ToCs that have different text than what's between the h tags.
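As a rough illustration of the behaviour described here (not Sigil's actual code), this minimal Python sketch picks the ToC text for each heading: the title attribute if it is set, otherwise the visible text. It assumes well-formed XHTML that Python's xml.etree can parse (numeric entities like &#160; are fine, named HTML entities such as &nbsp; are not); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
HEADINGS = {XHTML + f"h{n}" for n in range(1, 7)}


def toc_label(heading: ET.Element) -> str:
    """ToC text for one heading: the title attribute if it is set,
    otherwise the visible text with whitespace collapsed."""
    title = heading.get("title")
    if title:
        return title.strip()
    return " ".join("".join(heading.itertext()).split())


def heading_labels(xhtml: str) -> list[str]:
    """Labels for every h1-h6 in the file, in document order."""
    root = ET.fromstring(xhtml)
    return [toc_label(el) for el in root.iter() if el.tag in HEADINGS]
```

So <h1 title="21. En chemin pour Shadar Logoth">21</h1> yields "21. En chemin pour Shadar Logoth", while a heading without a title yields its own text.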

Last edited by DiapDealer; 06-20-2020 at 09:40 PM.
Old 06-21-2020, 08:53 AM   #4
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by DiapDealer
The problem is that Sigil already does the exact reverse of what you want. The title attribute of an h tag (if present) is used by Sigil to generate the text of the ToC. That's what allows users to generate ToCs that have different text than what's between the h tags.
Yes, my point exactly. That is precisely what makes me think it should be possible to add a variation of that feature.
When the TOC is made, Sigil already knows what each TOC entry should say and which part of the document it links to (whether it harvested that information from title attributes or otherwise). It assembles this information into new files (nav.xhtml and toc.ncx), using the appropriate tags.
From there, it should be possible to ask it to redistribute the same elements in the opposite direction: the nav becomes the source of the information rather than the destination, and each title is copied back to its target. At the target, Sigil will find either a TOC id, an h* (with or without a title=""), or nothing if the link just points to the file. If there is already a title="", overwriting it could be useful if you've made changes directly in the nav. If there is no title, one can be added.
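A minimal sketch of the "nav as source" half of that idea: it collects (file, fragment, label) for every link inside the nav whose epub:type is toc, skipping landmarks and page lists. This is plain Python with xml.etree, not a Sigil API; read_nav_entries is a hypothetical name, and the hrefs are returned as written (still relative to the nav file's location).

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def read_nav_entries(nav_xhtml: str) -> list[tuple[str, str, str]]:
    """Return (file, fragment, label) for each TOC link, in reading order.
    fragment is '' when the link points at the top of a file."""
    root = ET.fromstring(nav_xhtml)
    entries = []
    for nav in root.iter(XHTML + "nav"):
        if "toc" not in (nav.get(EPUB_TYPE) or "").split():
            continue  # skip landmarks, page-list, etc.
        for a in nav.iter(XHTML + "a"):
            file_part, _, fragment = a.get("href", "").partition("#")
            # Label text as the reader sees it, whitespace collapsed.
            label = " ".join("".join(a.itertext()).split())
            entries.append((unquote(file_part), fragment, label))
    return entries
```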

If you are worried about potential conflicts with existing code, then rather than adding the text to a title="", it could be added as an HTML comment or some other code that seems appropriate to you, maybe something like <section title="Text of title" /> or <a title="Text of title" />. From there it would be fairly trivial to regex the text into a title="" and regenerate the TOC as needed.
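To illustrate the comment variant, here is a hedged Python sketch assuming a hypothetical marker format <!--toc:Text of title--> placed immediately before the heading. It moves the marker text into a title attribute and leaves untouched any heading that already has a title. It assumes double-quoted attributes and marker text that is plain, unescaped text; the names are made up for the example.

```python
import html
import re

# Hypothetical marker: <!--toc:Text of title--> right before an h1-h6 with no title yet.
MARKER_THEN_HEADING = re.compile(
    r"<!--toc:(?P<label>(?:(?!-->).)*)-->\s*"              # marker text, never crossing a comment end
    r"<(?P<tag>h[1-6])(?P<attrs>(?:(?!\btitle=)[^>])*)>",  # opening tag without a title attribute
    re.DOTALL,
)


def escape_attr(text: str) -> str:
    # Escape &, <, > and double quotes; apostrophes stay readable (l'homme).
    return html.escape(text, quote=False).replace('"', "&quot;")


def markers_to_titles(xhtml: str) -> str:
    """Replace each marker + heading pair with the heading carrying title="..."."""
    return MARKER_THEN_HEADING.sub(
        lambda m: f'<{m["tag"]}{m["attrs"]} title="{escape_attr(m["label"].strip())}">',
        xhtml,
    )
```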

Obviously this would be a separate feature from generating the TOC, even if it's closely related, just as there is a separate "epub3 tools" menu to generate the NCX from the nav. It wouldn't be necessary for every book, but it would be useful in a lot of cases, and when it's useful it's REALLY useful. I frequently get requests to modify files made by someone else: the cases I mentioned above, or things like adding a preview of the next book at the end of a book that's already published, or a new introduction. This almost always requires some intervention in the TOC, and I have never once seen a book that wasn't my own that made it easy to modify the TOC.

Quote:
Originally Posted by Turtle91
Unfortunately there are so many different examples of how people do them badly...it would be very difficult to encompass all cases. I usually just use regex.
Ah, so you have also dealt with this mess. Yes, I use regex too, but it can be really time-consuming because of all the variations, and ultimately it's always necessary to do some of it by hand. Wouldn't it be nice if you could just grab all the correct titles from the TOC, turn them into references and then just click "Generate TOC", instead of messing about with regex?

Last edited by Mister L; 06-21-2020 at 09:00 AM.
Old 06-21-2020, 10:07 AM   #5
KevinH
Wizard
Posts: 4,380
Karma: 2622176
Join Date: Nov 2009
Device: many
A plugin might be best for this case.

Just so that everyone is on the same page ...

It would take an existing nav or ncx, follow the links back to the target file and element, and add a title attribute to it (remembering to HTML-escape any text) based on the current TOC. If the existing link points to the top of a file, inject a new h1 tag with nodisplay set on it, carrying that title.

The idea is that after running this plugin, you should be able to regenerate the TOC from h tags in Sigil and get something very very close to the original TOC back.

Is that correct?

KevinH
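The steps KevinH lists could look roughly like the following Python sketch. It works on plain strings (a dict of file name to XHTML source, plus TOC entries as (file, fragment, label) tuples with plain-text labels, however they were read from the nav or NCX) and leaves out the wiring into Sigil's plugin API. It uses regular expressions and therefore assumes double-quoted attributes. The nodisplay class name and all function names are hypothetical, and the book's CSS would need a rule such as .nodisplay { display: none; }. Whether Generate TOC picks up an empty heading that only carries a title is worth checking on a test book first.

```python
import html
import re

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def escape_attr(text: str) -> str:
    # Escape &, <, > and double quotes; apostrophes stay readable (l'homme).
    return html.escape(text, quote=False).replace('"', "&quot;")


def hidden_h1(label: str) -> str:
    # Hypothetical class name; needs .nodisplay { display: none; } in the CSS.
    return f'<h1 class="nodisplay" title="{escape_attr(label)}"></h1>'


def apply_entry(src: str, fragment: str, label: str) -> str:
    if not fragment:
        # Link to the top of the file: put a hidden heading right after <body>.
        return re.sub(r"(<body\b[^>]*>)", lambda m: m.group(1) + hidden_h1(label), src, count=1)
    tag_re = re.compile(
        r'<(?P<name>[A-Za-z][\w:-]*)(?P<attrs>[^>]*\sid="' + re.escape(fragment)
        + r'"[^>]*?)(?P<end>/?)>'
    )
    m = tag_re.search(src)
    if m is None:
        return src  # target id not found: leave the file alone
    if m["name"].lower() in HEADINGS:
        # Drop any existing title so text edited in the nav wins.
        attrs = re.sub(r'\s+title="[^"]*"', "", m["attrs"])
        new_tag = f'<{m["name"]}{attrs} title="{escape_attr(label)}"{m["end"]}>'
        return src[: m.start()] + new_tag + src[m.end():]
    # Target is a <p>, <div>, <a id="..."/> etc.: insert a hidden heading just before it.
    return src[: m.start()] + hidden_h1(label) + src[m.start():]


def add_titles(files: dict[str, str], entries: list[tuple[str, str, str]]) -> dict[str, str]:
    """Return a copy of files with every TOC label pushed back into the text."""
    out = dict(files)
    for fname, fragment, label in entries:
        if fname in out:
            out[fname] = apply_entry(out[fname], fragment, label)
    return out
```

After running something like this, regenerating the TOC from headings should, in principle, give back labels very close to the original ones, which is the round trip described above.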
Old 06-21-2020, 10:19 AM   #6
Doitsu
Wizard
Posts: 4,948
Karma: 16727733
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Mister L
When the TOC is made, Sigil already knows what each TOC entry should say and what part of the document it is linked to (whether it harvested the info from title attributes or otherwise).
The problem is that heading formats aren't predictable. You yourself gave two examples. In the first example, the heading consisted of two <h1> tags and in the second example it consisted of <h1> and <h2> tags.

BTW, both problems can be easily fixed with the right regular expressions. For example, you could use the following expressions to merge the two <h1> tags:

Find:<h1 epub:type="title" class="part_n"><span>(\d+)</span></h1>\s+<h1 epub:type="title" class="part_tit"><span>(.*?)</span></h1>
Replace:<h1 epub:type="title" class="part_n" title="\1: \2"><span>\1</span><br /><span class="part_tit">\2</span></h1>

If you process the first heading format with it and then generate the TOC, Sigil will add the following entry:

4: The Whale speaks of what she has learned about humans
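For anyone scripting this outside Sigil, the same Find/Replace pair should run unchanged with Python's re module, since it only uses \d, \s, a lazy .*? and numbered groups. Sigil's own search is PCRE-based; I'm assuming no extra option (such as dot-matches-newline) is needed for this sample. merge_split_headings is a hypothetical name.

```python
import re

# Doitsu's Find/Replace pair, unchanged apart from being split across lines.
FIND = (
    r'<h1 epub:type="title" class="part_n"><span>(\d+)</span></h1>\s+'
    r'<h1 epub:type="title" class="part_tit"><span>(.*?)</span></h1>'
)
REPLACE = (
    r'<h1 epub:type="title" class="part_n" title="\1: \2">'
    r'<span>\1</span><br /><span class="part_tit">\2</span></h1>'
)


def merge_split_headings(xhtml: str) -> str:
    """Merge the number <h1> and the title <h1> into one heading with a title attribute."""
    return re.sub(FIND, REPLACE, xhtml)
```

Any &#160; entities in the title are copied into the title attribute as-is, which is valid and decodes to non-breaking spaces.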
Old 06-21-2020, 01:43 PM   #7
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by KevinH
A plugin might be best for this case.

Just so that everyone is on the same page ...

It would take an existing nav or ncx, follow the links back to the target file and element, and add a title attribute to it (remembering to HTML-escape any text) based on the current TOC. If the existing link points to the top of a file, inject a new h1 tag with nodisplay set on it, carrying that title.

The idea is that after running this plugin, you should be able to regenerate the TOC from h tags in Sigil and get something very very close to the original TOC back.

Is that correct?

KevinH
Yes that is exactly right. Would it be difficult to make a plugin for that?

Quote:
Originally Posted by Doitsu
The problem is that heading formats aren't predictable. You yourself gave two examples. In the first example, the heading consisted of two <h1> tags and in the second example it consisted of <h1> and <h2> tags.

BTW, both problems can be easily fixed with the right regular expressions. For example, you could use the following expressions to merge the two <h1> tags:

Find:<h1 epub:type="title" class="part_n"><span>(\d+)</span></h1>\s+<h1 epub:type="title" class="part_tit"><span>(.*?)</span></h1>
Replace:<h1 epub:type="title" class="part_n" title="\1: \2"><span>\1</span><br /><span class="part_tit">\2</span></h1>

If you process the first heading format with it and then generate the TOC, Sigil will add the following entry:

4: The Whale speaks of what she has learned about humans
Yes, so far I have been relying on regex for these cases (and if you have a regex for the fake small caps example I'd love to know it; I did that one last week and ended up just copying over the titles by hand). But precisely because they are not predictable, I have to work out a new regex every time based on the specific characteristics of the file, rather than just running a saved search. That can be very time-consuming (especially as my regex skills are somewhat limited), and it's a bit frustrating knowing that the exact information needed is already in the book but can't easily be exploited. I've had a whole series of these cases recently (and another very big one to do this week), which is why I started to think there must be a better way to do it.
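For the fake small caps example in the first post, here is a minimal Python sketch (not a Sigil saved search) that rebuilds the TOC text, assuming the class names Cap and SmallCap exactly as shown there. It keeps each capital and lowercases the rest of the word, which only works because every word that should keep its capital has its own Cap span; a proper noun buried inside a single SmallCap run would come out lowercase. unsmallcap and heading_title are hypothetical names.

```python
import re

# One "fake small caps" word: capital in span.Cap, rest of the word in span.SmallCap.
FAKE_SMALLCAPS = re.compile(
    r'<span class="Cap">([^<]*)</span><span class="SmallCap">([^<]*)</span>'
)
TAG = re.compile(r"<[^>]+>")


def unsmallcap(fragment: str) -> str:
    """'E' + 'N CHEMIN POUR' -> 'En chemin pour' for every Cap/SmallCap pair."""
    return FAKE_SMALLCAPS.sub(lambda m: m.group(1) + m.group(2).lower(), fragment)


def heading_title(number_html: str, title_html: str) -> str:
    """Build the 'number. title' text used in the TOC from the two headings' contents."""
    number = TAG.sub("", number_html).strip()
    title = TAG.sub("", unsmallcap(title_html))
    return f"{number}. {' '.join(title.split())}"
```

Fed the contents of <h1 id="toc_marker-26">21</h1> and the h2 above, heading_title returns "21. En chemin pour Shadar Logoth", the existing TOC entry.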
Old 06-21-2020, 02:02 PM   #8
Doitsu
Wizard
Posts: 4,948
Karma: 16727733
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Mister L
[...] and if you have a regex for the fake smallcaps example I'd love to know it, I did that one last week and ended up just copying over the titles by hand).
Why don't you post your fake smallcaps question in the Regex subforum?
Old 06-22-2020, 07:09 AM   #9
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by Doitsu
Why don't you post your fake smallcaps question in the Regex subforum?
Sure, why not, for my own education.
But to be clear, that question is independent of my real question here, because I really do believe there is a better way to handle this specific problem than regex.
Old 06-23-2020, 03:12 PM   #10
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Just curious: should I give up on this, or does anyone with the skills to make a plugin think it's a good idea? (One day I want to learn to code plugins myself, but I don't currently have the skills, nor the time to learn them right now.)

In case anyone is half-convinced of the usefulness of this hypothetical plugin: I'm working on the giant "collected works" book right now, preparing the headings, and here is yet another example of why regex is not the answer when it comes to redoing the TOC. Many parts of the books are in the TOC but have no title on the page at all (I put a nodisplay h1 in those cases), or have a different title in the TOC from the one displayed on the page: the author portrait, the copyright page, the cover, the title page (called "Title page" in the TOC but obviously not on the page), the "By the same author" / bibliography page, the publisher catalogue page... There's no choice in those cases but to do it all by hand.
Old 06-23-2020, 03:40 PM   #11
Tex2002ans
Wizard
Posts: 1,592
Karma: 7401109
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Mister L
[...] or have a different title in the TOC to the one displayed in the page, such as the portrait of the author, the copyright page, the cover, the title page which is called "Title page" in the TOC but obviously not in the page, the "By the same author" / bibliography page, Publisher catalogue page...
The real question is: Does this belong in a TOC at all?

I would strongly lean towards No.

Quote:
Originally Posted by Mister L
No choice for those cases but to do it all by hand.
Sometimes that's what you have to do. Especially if you get some hideous code that's inconsistent spaghetti gobbledygook like you brought up in this thread.

I'm going to pull a JSWolf and say clean the code up and make it consistent first, then your life will be much easier with the Regex going forward.

* * *

On your Title Casing problem: there are a few solutions, but I've found almost all of them to be suboptimal, each with its own issues on edge cases.

Back in 2014, I used this Regex:

https://www.mobileread.com/forums/sh...53#post2930153
https://www.mobileread.com/forums/sh...d.php?t=233018

(I still use similar nowadays.)

Calibre introduced a "Function Mode" and even has an entire section dedicated in the manual for it, "Automatically fixing the case of headings in the document".

But most of the solutions I've come across over the years don't take into account the nuances needed for proper Title Casing (different style guides require different rules).

This is the site I use:

https://capitalizemytitle.com/

It handles title casing better than many of the other tools I've run across over the years... and it does handle edge cases like caps after : or EM DASH.

But you always get stuff like: DNA, RNA, mRNA, First/Last names (DeSanto, McDonald), etc.

Last edited by Tex2002ans; 06-23-2020 at 03:42 PM.
Old 06-23-2020, 04:03 PM   #12
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by Tex2002ans
The real question is: Does this belong in a TOC at all?

I would strongly lean towards No.
Well, you might lean towards no, but sometimes the publisher disagrees, and in many cases I also disagree.

Either way, to be honest, at this point I would prefer for the real question of this thread to be whether or not there is any hope of seeing a plugin just as we've described. Everything else, at this point, is sort of extraneous to the discussion.

Quote:
Originally Posted by Tex2002ans

Sometimes that's what you have to do. Especially if you get some hideous code that's inconsistent spaghetti gobbledygook like you brought up in this thread.

I'm going to pull a JSWolf and say clean the code up and make it consistent first, then your life will be much easier with the Regex going forward.
Trust me, I don't need to be told to clean up the code. But part of doing a good job is having the right tools for the job. As I've already said, I DO USE REGEX to clean up the code including to prepare the TOC generation, however I am convinced there is a better way to do that specific thing. It's a separate question to cleaning up the code. Unfortunately I'm not (currently) capable of making the tool I need. If no-one else is interested, that's fine, I'm not going to keep pushing it, it just seemed to me that we got a bit distracted by discussions about regex so I wanted to check where things stood.


Quote:
Originally Posted by Tex2002ans

On your Title Casing problem: there are a few solutions, but I've found almost all of them to be suboptimal, each with its own issues on edge cases.

Back in 2014, I used this Regex:

https://www.mobileread.com/forums/sh...53#post2930153
https://www.mobileread.com/forums/sh...d.php?t=233018

(I still use similar nowadays.)

Calibre introduced a "Function Mode" and even has an entire section dedicated in the manual for it, "Automatically fixing the case of headings in the document".

But most of the solutions I've come across over the years don't take into account the nuances needed for proper Title Casing (different style guides require different rules).

This is the site I use:

https://capitalizemytitle.com/

It handles title casing better than many of the other tools I've run across over the years... and it does handle edge cases like caps after : or EM DASH.

But you always get stuff like: DNA, RNA, mRNA, First/Last names (DeSanto, McDonald), etc.
You're kind of making my point for me here to be honest... Do you see how complicated this is, when the work has already been done and the correct titles are already in the file?? Wouldn't it be easier to just click on a plugin and copy them over to each chapter?


Thanks for those suggestions though, and if I get stuck on something in future I will take a look. The small-caps titles were last week so I don't need that at the moment, I'm working on a different project now. Either way, I don't really want to fiddle around with a different site to fix the cases of 2 words every 3 chapters because frankly at that point it's just faster to do it by hand. Plus it looks like that site is in English, most of the books I work on are in French.

Like I said, I do know how to do this the hard way, I am trying to find a better way.
Old 06-24-2020, 02:48 AM   #13
Tex2002ans
Wizard
Posts: 1,592
Karma: 7401109
Join Date: Jul 2012
Device: Kobo Forma, Nook
Oh jeeze, I completely misread Mister L's and the other posts. I thought Title Casing methods were being discussed like:

Code:
<h2>TEXT</h2> -> <h2 title="Text">TEXT</h2>
<h2 title="Text">T<small>EXT</small></h2> -> <h2>Text</h2>
so I typed up the ultimate "Title Casing: Everything You Didn't Know You Ever Wanted to Know" post.

After rereading the entire thread, I see Mister L meant the EPUB's TOC (nav/NCX) already had the chapters capitalized the way he wanted.

I'll do very minor answers here, then toss the enormous tangent in the Workshop in a few days.

Quote:
Originally Posted by Mister L
Do you see how complicated this is, when the work has already been done and the correct titles are already in the file??
Well, the 2nd example you gave in Post #1 wasn't correct in the file... so those recommendations were mostly geared towards cleaning types like that.

But now I see what you mean by "correct in the file".

Quote:
Originally Posted by Mister L
Plus it looks like that site is in English, most of the books I work on are in French.
Yeah, French title casing probably brings its own issues, like lowercase l’ before words, or keeping "pour" lowercase.

I definitely don't know of any title casing tool that handles French exceptions. The ones I've seen only handle American English. (More details and edge cases will be in the forthcoming topic.)

Last edited by Tex2002ans; 06-24-2020 at 02:56 AM.
Old 06-24-2020, 08:32 AM   #14
Mister L
Zealot
Posts: 124
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by Tex2002ans
Oh jeeze, I completely misread Mister L's and the other posts. I thought Title Casing methods were being discussed like:

Code:
<h2>TEXT</h2> -> <h2 title="Text">TEXT</h2>
<h2 title="Text">T<small>EXT</small></h2> -> <h2>Text</h2>
so I typed up the ultimate "Title Casing: Everything You Didn't Know You Ever Wanted to Know" post.
Heh. Yes, I did have the impression we had wandered off a little bit.

Quote:
Originally Posted by Tex2002ans
After rereading entire thread, I see Mister L meant the EPUB's TOC (nav/NCX) already had the chapters capitalized the way he wanted.
Correct.


In fact it's not limited to questions of case. It can also be how the chapter number and title are presented (with a line break, or in two separate tags in the HTML, but separated by a period or a dash in the TOC) or something else. Either way, the point is that the entries have already been correctly formatted for the TOC, but there's no easy way to retrieve that information. So if you have to modify the TOC, you have to redo all the work that has already been done (on top of all the work of fixing someone else's terrible code). And as my examples show, there is no "one size fits all" solution when you start from the XHTML files, so there aren't even any shortcuts. It's really inefficient and frustrating.
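
For what it's worth, the first half of this (getting the already-correct labels back out of the TOC) is easy to script. A minimal Python sketch, assuming an EPUB 2 `toc.ncx` in the standard NCX namespace; `toc_titles_from_ncx` is a hypothetical helper name, not an existing Sigil tool or plugin API:

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the standard EPUB 2 NCX namespace
NCX = "{http://www.daisy.org/z3986/2005/ncx/}"


def toc_titles_from_ncx(ncx_text: str) -> list[tuple[str, str]]:
    """Return (src, label) pairs for every navPoint, in reading order."""
    root = ET.fromstring(ncx_text)
    pairs = []
    # iter() walks nested navPoints depth-first, i.e. in document order
    for nav_point in root.iter(NCX + "navPoint"):
        text = nav_point.find(f"{NCX}navLabel/{NCX}text")
        content = nav_point.find(NCX + "content")  # direct child only
        if text is None or content is None:
            continue  # skip malformed entries rather than guessing
        pairs.append((content.get("src", ""), (text.text or "").strip()))
    return pairs
```

Each `src` (file plus optional `#fragment`) says which heading a label belongs to, which is the information you'd need to push it back into a `title=""`. An EPUB 3 nav document would need a different parser.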

Quote:
Originally Posted by Tex2002ans
Well, the 2nd example you gave in Post #1 wasn't correct in the file... so those recommendations were mostly geared towards cleaning types like that.

But now I see what you mean by "correct in the file".
"For certain values of 'file'"

(But yes obviously I would never consider those title formats "correct" in the html files. I think we agree on that question. It's astonishing the terrible state of some files made by so-called "professionals" who have charged for their services. These are books made for publishers and on sale in bookstores.)

Quote:
Originally Posted by Tex2002ans
Yeah, French title casing probably brings in its own issues like lowercase l’ before words, or keeping "pour" lowercase.
French I think is simpler than English. The first word of the title is capitalised, and if that word is "The" then the second word generally is as well, but the rest is lower case, except for proper nouns, just like in a sentence. Obviously there can be other exceptions which further complicate the question for regex purposes (roman numerals, acronyms...).
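
The rule as described is simple enough to sketch. A minimal Python version, assuming "le", "la", "les" and an elided "l'" as the articles that trigger capitalizing the second word; `french_sentence_case` and its `keep` list are hypothetical, and proper nouns, acronyms and Roman numerals have to be listed by hand:

```python
import re
from collections.abc import Iterable

ARTICLES = {"le", "la", "les"}
# First word written with an elided article, e.g. "L'ÉTRANGER" or "L’ÉTRANGER"
ELIDED = re.compile(r"^(l['’])(.+)$", re.IGNORECASE)


def _cap(word: str) -> str:
    return word[:1].upper() + word[1:]


def french_sentence_case(title: str, keep: Iterable[str] = ()) -> str:
    """Sentence-case a title, capitalizing the word after a leading article."""
    # Words to keep as typed (proper nouns, acronyms, Roman numerals),
    # looked up case-insensitively so "PARIS" becomes "Paris".
    protected = {w.upper(): w for w in keep}
    words = title.split()
    out = []
    for i, word in enumerate(words):
        if word.upper() in protected:
            out.append(protected[word.upper()])
            continue
        lower = word.lower()
        if i == 0:
            m = ELIDED.match(lower)
            out.append(_cap(m.group(1)) + _cap(m.group(2)) if m else _cap(lower))
        elif i == 1 and words[0].lower() in ARTICLES:
            out.append(_cap(lower))
        else:
            out.append(lower)
    return " ".join(out)
```

The `keep` list does the heavy lifting: a regex like `[IVXLCDM]+` can't safely spot Roman numerals in an all-caps title, because words such as DIX or CIVIL are spelled entirely with those letters. Note that `split()` also turns every kind of whitespace, including the no-break space French puts before `:`, into a plain space.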

Last edited by Mister L; 06-24-2020 at 08:47 AM.
06-24-2020, 03:32 PM   #15
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 1,592
Karma: 7401109
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Mister L
It's astonishing the terrible state of some files made by so-called "professionals" who have charged for their services. These are books made for publishers and on sale in bookstores.)
Yeah, that's also why I was downplaying wanting to go from their NCX backwards into the HTML itself.

For the most part, the NCX is messed up and I actually want to overwrite it with my clean, beautiful code!

* * *

Another case which might also be helpful is:

Original TOC:

Code:
“Article Title” by Author Last
Original HTML:

Code:
<h2>Article Title</h2>
<p class="author">Author Last</p>
"Proper" Sigil HTML:

Code:
<h2 title="“Article Title” by Author Last">Article Title</h2>
<p class="author">Author Last</p>
99% of the time you want to go HTML->NCX (hence Sigil's Generate TOC), but 1% of the time you might want to go backwards.
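
Going that 1% direction for a single file could look roughly like this. A minimal Python sketch, assuming you already know which XHTML file a TOC entry points to and that its first h1-h6 without a title is the right target; `add_title_to_first_heading` is a hypothetical helper, and the regex is a shortcut rather than a real HTML parser:

```python
import html
import re

# An opening h1-h6 tag that does not already carry a title attribute
HEADING_OPEN = re.compile(r"<(h[1-6])(?![^>]*\btitle=)([^>]*)>", re.IGNORECASE)


def add_title_to_first_heading(xhtml: str, toc_label: str) -> str:
    """Copy a TOC label into title="" on the first heading lacking one."""
    # Escape &, <, >, and quotes so the label is safe inside the attribute
    value = html.escape(toc_label, quote=True)

    def repl(m: re.Match) -> str:
        # Keep the tag name and any existing attributes, then append title
        return f'<{m.group(1)}{m.group(2)} title="{value}">'

    return HEADING_OPEN.sub(repl, xhtml, count=1)
```

Headings that already have a `title` are left alone, so running it twice doesn't overwrite anything you fixed by hand.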

Quote:
Originally Posted by Mister L
French I think is simpler than English. The first word of the title is capitalised, and if that word is "The" then the second word generally is as well, but the rest is lower case, except for proper nouns, just like in a sentence. Obviously there can be other exceptions which further complicate the question for regex purposes (roman numerals, acronyms...).
Oh, I have it all written down... I have it all...

And French with their "XIVth Century" stuff, or their little superscript e.
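
The superscript e is at least regex-friendly if you anchor it to the word "siècle". A minimal Python sketch, assuming the century is written as a Roman numeral followed directly by "e" (or "er" for "Ier") and then "siècle"; `superscript_century` is a hypothetical helper:

```python
import re

# Roman numeral + French ordinal suffix, only when followed by "siècle(s)".
# Requiring "siècle" avoids false hits on ordinary words like "Le" or "De".
CENTURY = re.compile(r"\b([IVXLCDM]+)(er|e)(?=[ \u00a0\u202f]+siècles?\b)")


def superscript_century(text: str) -> str:
    """Wrap the ordinal suffix of a Roman-numeral century in <sup>."""
    return CENTURY.sub(r"\1<sup>\2</sup>", text)
```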

Side Note: One of my favorite games, Europa Universalis IV, takes place during the ~1450s-1850s, and has fans from around the world who are super into history. When discussing history on forums, since most are ESL (English as a Second Language), they bring in all these quirky language styles from around the world.

Last edited by Tex2002ans; 06-24-2020 at 03:39 PM.