MobileRead Forums > E-Book Software > Sigil
Old 07-06-2020, 01:52 PM   #46
DNSB
Bibliophagist
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Mister L View Post
Is there specifically a Scramble Epub plugin? I was thinking of Toxaris' Borkify plugin, is that the same you had in mind?
Sorry about that, it is ScrambleEbook not ScrambleEpub.

Jackie_w's ScrambleEbook is a separate plugin that basically replaces text and images making the book unusable except for looking for structural issues. See ScrambleEbook: Getting help with copyrighted books for the link or use calibre's search for plugins.

There is also a standalone version for those who don't want to add yet another plugin. See EbookScramble utility post #2

Last edited by DNSB; 07-06-2020 at 02:22 PM. Reason: Edit Standalone version no longer exists.
Old 07-06-2020, 04:55 PM   #47
Mister L
Groupie
Posts: 159
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by DNSB View Post
Sorry about that, it is ScrambleEbook not ScrambleEpub.

Jackie_w's ScrambleEbook is a separate plugin that basically replaces text and images making the book unusable except for looking for structural issues. See ScrambleEbook: Getting help with copyrighted books for the link or use calibre's search for plugins.

There is also a standalone version for those who don't want to add yet another plugin. See EbookScramble utility post #2
Oh, a Calibre plugin, right, I didn't know about that one, thank you. In the meantime I had been thinking about ways to be able to share this file without any copyright worries so I deleted most of the chapters and then all the text except the first paragraph of each chapter, since we are only interested in the titles currently. Now that I know about the scramble ebook plugin I went ahead and scrambled that result so there is definitely NO WAY this file could pose any problem at all.

In order to make it less confusing, I unscrambled the ncx to show the original titles and unscrambled all the titles in each file and I renamed the html files to match the toc titles, just to make it easy to see what is what.

I'm attaching the file here. This one is a pretty good test case for this because it manages to combine practically every problem you can think of in one file. As you can see, in this file:
  • All html files are referenced in the TOC, even if the toc reference (=the title displayed in the toc) is not visible in the html page.
  • There is only one toc reference per html file, even when there are multiple h* tags (because of the titles being formatted wrong).
  • Some html files have NO toc marker at all (eg the cover, titre.html...): in these files, the toc reference should be copied from the ncx into a new h1 tag OR an html comment at the top of the html file, whichever is easiest to code.
  • Some html files have an empty h1 tag which is only there to hold the toc-marker ID (eg dedicace.html). The toc title can be added as a title="" attribute to those OR an html comment immediately above the tag with the toc ID, whichever is easiest to code.
  • The chapters have a title split into 2 parts: h1 with the chapter number + h2 with the title, which has fake smallcaps. Chapter 6 has fake smallcaps with a bonus capitalised word for maximum span kludge fun. These are referenced in the toc as "1. Title of the chapter" (in sentence case; Chapter 6 is "Title of the chapter with Propernoun"). The toc reference should be copied exactly as it is (number followed by a period then the title, with zero modifications to the text or the case) to a title="" attribute in the h1 OR an html comment immediately above the tag with the toc ID, whichever is easiest to code.
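The first step implied by the list above is reading the title and the target of every TOC entry out of an epub2 toc.ncx. A minimal Python sketch of that, using only the standard library, might look like the following. The function name `read_ncx_entries`, the sample NCX and the file name `chapter1.html` are made up for illustration; this is not taken from any existing plugin.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}


def read_ncx_entries(ncx_text):
    """Return a (title, file, fragment) tuple for every navPoint, in document order."""
    root = ET.fromstring(ncx_text)
    entries = []
    for nav_point in root.iter("{" + NCX_URI + "}navPoint"):
        # Only look at the direct children, so nested navPoints keep their own label.
        label = nav_point.find("ncx:navLabel/ncx:text", NCX_NS)
        content = nav_point.find("ncx:content", NCX_NS)
        if label is None or content is None:
            continue
        file_name, _, fragment = content.get("src", "").partition("#")
        # The title is copied as-is, apart from trimming surrounding whitespace.
        entries.append(((label.text or "").strip(), file_name, fragment))
    return entries


# Hypothetical sample data, loosely modelled on the cases described above.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Dedication</text></navLabel>
      <content src="Text/dedicace.html#toc1"/>
    </navPoint>
    <navPoint id="np2" playOrder="2">
      <navLabel><text>1. Title of the chapter</text></navLabel>
      <content src="Text/chapter1.html"/>
    </navPoint>
  </navMap>
</ncx>"""

for entry in read_ncx_entries(SAMPLE_NCX):
    print(entry)
```

An entry with an empty fragment corresponds to the "no toc marker" case, where the title would go at the top of the file.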

Hopefully this makes it easier to see what I'm talking about. Let me know if you'd like me to make a 2nd epub which will have the desired results for each title.
Attached Files
File Type: epub Seigneur_ext_TEST_scrambled.epub (458.5 KB, 160 views)
Old 07-06-2020, 10:31 PM   #48
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
Thanks for your spec. I have to again question why you want a plugin to do that if, as I've said before, you can do that quite easily and quickly just using your current NCX TOC as a reference (unchanged). Just mark all the headings in the relevant xhtml files with an h1 tag if you want them in your TOC. You can use the 'h*' button in Sigil to do that. You only have to mark about 8 main headings with h1 in the xhtml files, which shouldn't take you long. After that, using your current NCX file as a reference (unchanged), you can then just use Sigil's Generate TOC dialog to edit and re-build the TOC the way you want it. And all that is achieved without using any regex! The alternative for me is to create a one-off, much more complex plugin that is only good for one person because it's so specific. That's also why I'm rapidly losing interest in the plugin now -- because, as I see it, a plugin isn't really necessary.

In my ebooks the epub TOC page and NCX TOC are usually different. I normally build a complex, multilevel TOC for my epub TOC page and then I build a simpler single-level NCX TOC with headings like "Chapter 1 ~ This is my chapter 1 subtitle", "Chapter 2 ~ This is my chapter 2 subtitle" etc. So my NCX TOC also combines the chapter heading with chapter subtitle just like in your spec. And I do all that quickly and easily just using the Generate TOC dialog in Sigil. You should first create your epub TOC page using Generate TOC + Create TOC and then you can create a different NCX TOC just using Generate TOC again on its own. In the Generate TOC dialog, you can edit the headings and just type in and combine the Chapter heading name with the Chapter subtitle name, exclude any unwanted headings and then press the OK button and you will have a brand new NCX TOC that is formatted to your own preference.

Last edited by slowsmile; 07-07-2020 at 04:09 AM.
Old 07-07-2020, 09:21 AM   #49
Mister L
Groupie
Posts: 159
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by slowsmile View Post
Thanks for your spec. I have to again question why you want a plugin to do that if, as I've said before, you can do that quite easily and quickly just using your current NCX TOC as a reference (unchanged). Just mark all the headings in the relevant xhtml files with an h1 tag if you want them in your TOC. You can use the 'h*' button in Sigil to do that. You only have to mark about 8 main headings with h1 in the xhtml files, which shouldn't take you long. After that, using your current NCX file as a reference (unchanged), you can then just use Sigil's Generate TOC dialog to edit and re-build the TOC the way you want it. And all that is achieved without using any regex! The alternative for me is to create a one-off, much more complex plugin that is only good for one person because it's so specific. That's also why I'm rapidly losing interest in the plugin now -- because, as I see it, a plugin isn't really necessary.

In my ebooks the epub TOC page and NCX TOC are usually different. I normally build a complex, multilevel TOC for my epub TOC page and then I build a simpler single-level NCX TOC with headings like "Chapter 1 ~ This is my chapter 1 subtitle", "Chapter 2 ~ This is my chapter 2 subtitle" etc. So my NCX TOC also combines the chapter heading with chapter subtitle just like in your spec. And I do all that quickly and easily just using the Generate TOC dialog in Sigil. You should first create your epub TOC page using Generate TOC + Create TOC and then you can create a different NCX TOC just using Generate TOC again on its own. In the Generate TOC dialog, you can edit the headings and just type in and combine the Chapter heading name with the Chapter subtitle name, exclude any unwanted headings and then press the OK button and you will have a brand new NCX TOC that is formatted to your own preference.
I am kind of at a loss here. Have you read my previous posts? Do you understand, the file I provided is intended only as an example? I am not asking for a plugin to handle one single specific file (especially not one with only 8 html files inside). Obviously this would be completely ridiculous. Apparently you have really misunderstood what I have been saying. I cannot just "easily and quickly" use the existing TOC, it is laborious and requires a lot of manual intervention. The point of the plugin is to simplify and automate that for use on many different files where "just generate the NCX" is not relevant; that's the entire point.

What is the precise function of the plugin you are working on? From your response here I think the problem you are trying to solve is not the same as the problem I am trying to solve. If you are not interested in pursuing this then please do not continue, it is starting to feel like we are talking at cross purposes and you are right, that is not a productive use of either of our time.
Old 07-07-2020, 12:27 PM   #50
DNSB
Bibliophagist
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Mister L View Post
I am kind of at a loss here. Have you read my previous posts? Do you understand, the file I provided is intended only as an example? I am not asking for a plugin to handle one single specific file (especially not one with only 8 html files inside). Obviously this would be completely ridiculous. Apparently you have really misunderstood what I have been saying. I cannot just "easily and quickly" use the existing TOC, it is laborious and requires a lot of manual intervention. The point of the plugin is to simplify and automate that for use on many different files where "just generate the NCX" is not relevant; that's the entire point.
The problem for me is that when I played with similar issues a few years back, I ended up spending more time on trying to have my code handle special cases and not convert my ebook to egarbage than it would have taken me to do a regex to generate/modify the headers and then copy/paste. Entries that did not have header tags (<p> or <div> are more fun), h? tags wrapped around images, multiple h? tags in the same file (e.g. for the chapter title and subtitle), multiple chapters in the same file mixed with chapters split across multiple files (thank you, Gutenberg!), headers with multiple <a>, <span> and <br /> tags between the h? tags.

It was not a simple project and I was never happy with the results but it taught me quite a bit about Python and making sure I had backup copies.
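One of the special cases listed above, a heading with several `<a>`, `<span>` and `<br />` tags inside it, can at least be reduced to plain text with the standard library's `html.parser`. The sketch below is only an illustration of that single case, not a reconstruction of the code described in this post; the names `_TextCollector` and `heading_text` are invented for the example.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the text of a fragment, treating <br> as a space."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br />, via the default handle_startendtag.
        if tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def heading_text(fragment):
    """Return the visible text of a heading with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return " ".join("".join(collector.parts).split())


# Fake small caps built from spans, plus an empty anchor and a line break.
sample = '<h2><a id="c1"></a><span class="sc">T</span>HE <span>TIGER</span><br />steps out</h2>'
print(heading_text(sample))
```

This handles only the markup inside a heading. It does nothing for the harder cases in the list, such as headings marked up as `<p>` or `<div>`, or chapters split across files.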
Old 07-07-2020, 01:28 PM   #51
Mister L
Groupie
Posts: 159
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by DNSB View Post
The problem for me is that when I played with similar issues a few years back, I ended up spending more time on trying to have my code handle special cases and not convert my ebook to egarbage than it would have taken me to do a regex to generate/modify the headers and then copy/paste. Entries that did not have header tags (<p> or <div> are more fun), h? tags wrapped around images, multiple h? tags in the same file (e.g. for the chapter title and subtitle), multiple chapters in the same file mixed with chapters split across multiple files (thank you, Gutenberg!), headers with multiple <a>, <span> and <br /> tags between the h? tags.

It was not a simple project and I was never happy with the results but it taught me quite a bit about Python and making sure I had backup copies.
Yes, depending on where your sources come from it can be a never-ending source of delight to discover all the ways people can "break" something as simple (in theory) as a chapter heading. Special mention for Gutenberg and their "an hr is as good as a new file" structure.

But, am I wrong in thinking that you also were using, as your starting point, the html files? I think you're completely right, if you do that, there are too many different possibilities to handle and you'll never manage to make something that can deal with all of them, and it's very very likely you'll break something. Which is exactly why I am not trying to do this using regex. But, BUT! if there is a good TOC in the file already and there could be a way to do a "reverse create TOC" basically, instead of having to resolve all those tricky problems you just go around them. I really believe it must be possible to automate that. Everything you need is already in the toc; the text is there, all you have to do is copy-paste, the link is there, all you have to do is follow it... all the necessary elements are already in the file.

I really do think it's as close to a perfect solution as it's possible to get to simply find a way to automate copying the original TOC titles back into the files they link to... if you copy the title into an html comment I cannot even see how you could break the file at all, and that would be one single operation so you don't even have to figure out multiple scenarios. Obviously a bit of work would still be required after that to stick this text into the proper tag or add the attribute or whathaveyou but the most fastidious and annoying part would already be done, no copying and pasting by hand between two files, no mucking about with regexes for various wEiRd CaSeS and random spans or one-to-three br's or a's or sup or anything else, and the whole process would be much smoother because you wouldn't have nearly as many variations to adjust for.
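To make the "copy the title into an html comment" idea concrete, here is a small Python sketch, offered as an assumption about how it could be done rather than a tested plugin. It puts a comment immediately above the tag that carries the TOC ID, or right after `<body>` when the TOC link has no ID. The function name `add_toc_comment` and the `TOC:` prefix are invented for the example. It does use one narrow regex, but only to locate the insertion point; the existing heading markup is never modified.

```python
import re


def add_toc_comment(html, title, fragment=""):
    """Insert <!-- TOC: title --> above the tag with id=fragment, else after <body>."""
    # "--" is not allowed inside an XML comment, so break up runs of hyphens.
    safe_title = re.sub(r"-(?=-)", "- ", title)
    comment = "<!-- TOC: " + safe_title + " -->\n"

    position = None
    if fragment:
        # Find the opening tag whose id attribute equals the TOC fragment.
        pattern = (r"<[A-Za-z][^<>]*\sid\s*=\s*[\"']"
                   + re.escape(fragment) + r"[\"'][^<>]*>")
        match = re.search(pattern, html)
        if match:
            position = match.start()
    if position is None:
        # No fragment, or the ID was not found: fall back to the top of the body.
        body = re.search(r"<body\b[^>]*>\s*", html)
        if body is None:
            return html  # leave unrecognised files untouched
        position = body.end()
    return html[:position] + comment + html[position:]


page = '<html><body>\n<h1 id="toc1"></h1>\n<p>Text</p>\n</body></html>'
print(add_toc_comment(page, "Dedication", "toc1"))
```

Because the only change is an added comment, the worst outcome of a bad match should be a comment in an unexpected place, which is the low-risk property argued for above.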

I guess I am going to have to do like you and use it to "learn a lot about Python" (lesson 1: apparently Python is what I'll have to learn if I want to make my own plugin). I already have learned the painful lesson about having backup copies during previous "experiments".

How hard is Python to learn? (serious question). I am completely comfortable with html and css but I don't know any programming languages.
Old 07-07-2020, 03:53 PM   #52
Hitch
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Mister L View Post
Yes, depending on where your sources come from it can be a never-ending source of delight to discover all the ways people can "break" something as simple (in theory) as a chapter heading. Special mention for Gutenberg and their "an hr is as good as a new file" structure.

But, am I wrong in thinking that you also were using, as your starting point, the html files? I think you're completely right, if you do that, there are too many different possibilities to handle and you'll never manage to make something that can deal with all of them, and it's very very likely you'll break something. Which is exactly why I am not trying to do this using regex. But, BUT! if there is a good TOC in the file already and there could be a way to do a "reverse create TOC" basically, instead of having to resolve all those tricky problems you just go around them. I really believe it must be possible to automate that. Everything you need is already in the toc; the text is there, all you have to do is copy-paste, the link is there, all you have to do is follow it... all the necessary elements are already in the file.

I really do think it's as close to a perfect solution as it's possible to get to simply find a way to automate copying the original TOC titles back into the files they link to... if you copy the title into an html comment I cannot even see how you could break the file at all, and that would be one single operation so you don't even have to figure out multiple scenarios. Obviously a bit of work would still be required after that to stick this text into the proper tag or add the attribute or whathaveyou but the most fastidious and annoying part would already be done, no copying and pasting by hand between two files, no mucking about with regexes for various wEiRd CaSeS and random spans or one-to-three br's or a's or sup or anything else, and the whole process would be much smoother because you wouldn't have nearly as many variations to adjust for.

I guess I am going to have to do like you and use it to "learn a lot about Python" (lesson 1: apparently Python is what I'll have to learn if I want to make my own plugin). I already have learned the painful lesson about having backup copies during previous "experiments".

How hard is Python to learn? (serious question). I am completely comfortable with html and css but I don't know any programming languages.
As far as I know, more than one person, like DNSB, has tried to automate that and so far, nobody has been successful. You're a formatter; you know full well that the circumstances are always different. If you assume that all your chapter titles will be a chapter number, in H1 and a subtitle in English (or whatever language), as an H2, then sure, you could probably engineer something--but I think that others have tried that and given up.

But, hey, if you find a way, good on ya.

Hitch
Old 07-07-2020, 09:21 PM   #53
Mister L
Groupie
Posts: 159
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by Hitch View Post
As far as I know, more than one person, like DNSB, has tried to automate that and so far, nobody has been successful. You're a formatter; you know full well that the circumstances are always different. If you assume that all your chapter titles will be a chapter number, in H1 and a subtitle in English (or whatever language), as an H2, then sure, you could probably engineer something--but I think that others have tried that and given up.

But, hey, if you find a way, good on ya.

Hitch
I had typed out a reply to this and then the stupid browser ate it.

Can't be bothered to try and remember all of it. In short: From what I've seen, previous attempts have been intended to fix or draw from the headings in the html files. DNSB will correct me if I'm wrong about his method. I agree, this is doomed to failure because it is too complex and unpredictable. No-one has said to me, "I (or someone) tried to automate copying the titles out of the toc.ncx or nav.xhtml and pasting them into the corresponding html files without touching the heading formatting at all, it's impossible for reason X."

KevinH understood precisely what I was trying to do and summed it up very succinctly (idea simplified since then to remove danger of breakage, now I just want to paste the titles into html comments instead of a title attribute or h1 tag) but apart from that the discussion has mostly been about how complicated / impossible it would be to make a plugin to fix the headings. Which is 1. 100% true, no argument from me, and also 2. 100% irrelevant, because I don't want a plugin to fix the bleeding headings, I want a plugin to copy the titles out of the toc.ncx or nav and paste them into an html comment in the corresponding files, which is an entirely different proposition and takes a detour around the whole problem of the unpredictable headings (that is a separate problem the way I see it and much better --and easier-- dealt with using regex).
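For the epub3 side of "copy the titles out of the toc.ncx or nav", the same kind of lookup can be sketched against a nav.xhtml. This assumes the nav document is well-formed XHTML that `xml.etree.ElementTree` can parse (a file using named entities such as `&nbsp;` would need a different parser). The name `read_nav_entries` and the sample file names other than titre.html are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def read_nav_entries(nav_text):
    """Return (title, file, fragment) for each link in the nav marked epub:type="toc"."""
    root = ET.fromstring(nav_text)
    entries = []
    for nav in root.iter(XHTML + "nav"):
        # epub:type may hold several space-separated values.
        if "toc" not in (nav.get(EPUB_TYPE) or "").split():
            continue
        for link in nav.iter(XHTML + "a"):
            file_name, _, fragment = link.get("href", "").partition("#")
            # Join the text of nested spans and collapse whitespace; case is untouched.
            title = " ".join("".join(link.itertext()).split())
            entries.append((title, file_name, fragment))
    return entries


# Hypothetical sample data.
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<nav epub:type="toc"><ol>
<li><a href="Text/titre.html">Title page</a></li>
<li><a href="Text/chapter6.html#toc6">6. Title of the chapter with <span>Propernoun</span></a></li>
</ol></nav>
<nav epub:type="landmarks"><ol>
<li><a href="Text/cover.html">Cover</a></li>
</ol></nav>
</body></html>"""

for entry in read_nav_entries(SAMPLE_NAV):
    print(entry)
```

The landmarks nav is skipped on purpose, since only the TOC titles are wanted here.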

You yourself mention the headings in the html file (h1, h2) which makes me think you haven't understood either what I've been getting at, which is understandable if you have only a passing / zero interest in the question and haven't been following the thread very closely, but in fact I am pretty sure I have found a way to do the specific thing I want to do, I just don't have the technical competencies to implement it. Maybe someday I will, in which case at least this thread will have forced me to think about precisely how to do this and what kind of problems need to be avoided.

Last edited by Mister L; 07-07-2020 at 09:27 PM.
Old 07-07-2020, 09:57 PM   #54
DNSB
Bibliophagist
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Mister L View Post
But, am I wrong in thinking that you also were using, as your starting point, the html files? I think you're completely right, if you do that, there are too many different possibilities to handle and you'll never manage to make something that can deal with all of them, and it's very very likely you'll break something. Which is exactly why I am not trying to do this using regex. But, BUT! if there is a good TOC in the file already and there could be a way to do a "reverse create TOC" basically, instead of having to resolve all those tricky problems you just go around them. I really believe it must be possible to automate that. Everything you need is already in the toc; the text is there, all you have to do is copy-paste, the link is there, all you have to do is follow it... all the necessary elements are already in the file.
My code—in theory—could pull from an epub2 toc.ncx, an epub3 nav.xhtml document or a html table of contents. The problem was the sheer number of special cases that had me wasting more and more time modifying the code as the complexity increased, time that I realized was taking longer than my manual process. I also ran into too many issues where trying to fix the code to work with one ebook broke it for a previously working ebook. Regressions 'Я Us.

Like most programming tasks, it is simple for the person who is not trying to implement it. For the person who is trying to implement it, you find yourself looking for a larger can so all the worms will fit back in.

"All the necessary elements are already in the file"? Bah, humbug. The issues are more that the structure of the epub is different. Even things like where the files are stored in the epub can be a PITA, as in a recent epub I edited where the text files were partly stored in the root of the archive and partly in a text folder.
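The file-location problem mentioned above comes down to the fact that a TOC link is relative to the TOC file itself, not to the root of the archive. A hedged Python sketch of resolving such a link to a path inside the epub follows; the function name `resolve_toc_target` and the example paths are assumptions for illustration only.

```python
import posixpath
from urllib.parse import unquote


def resolve_toc_target(toc_path, src):
    """Resolve a TOC link relative to the TOC file's own location in the archive.

    toc_path: path of toc.ncx or nav.xhtml inside the epub, e.g. "OEBPS/toc.ncx".
    src:      the link as written in the TOC, e.g. "Text/ch1.html#toc1".
    Returns (path_inside_archive, fragment).
    """
    file_part, _, fragment = src.partition("#")
    base_dir = posixpath.dirname(toc_path)
    # Links are URL-encoded; archive member names are not.
    target = posixpath.normpath(posixpath.join(base_dir, unquote(file_part)))
    return target, fragment


print(resolve_toc_target("OEBPS/toc.ncx", "Text/chapter1.html#toc1"))
print(resolve_toc_target("OEBPS/toc.ncx", "../cover.html"))
print(resolve_toc_target("toc.ncx", "text/ch%201.html"))
```

This only normalises paths. It does not check that the target actually exists in the archive, which a real tool would want to do before editing anything.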
Old 07-07-2020, 10:03 PM   #55
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
@Mister L...I have some good reasons why I am so frustrated with your plugin requirements. These reasons are:

1. In your post above that contains your bullet-point spec you said that you wanted to transform or combine the headings of h1 and h2 in your xhtml files using the NCX TOC headings. So after 3 whole pages of telling me what you want in your plugin, you then -- on the 4th page -- mention for the first time ever in that spec that you want to combine h1 and h2 for certain headings only. That p*ssed me off because that single requirement means that I will have to redesign the plugin more or less from scratch again.

2. Your requirement that your xhtml headings must look exactly the same as the NCX headings really surprises me. Why do you insist on that? In my ebooks my xhtml headings will look like this:

CHAPTER 1

The Tiger Steps Out

...but my NCX TOC headings will look like this:

Chapter 1 ~ The Tiger Steps Out

I don't change my xhtml headings.
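The mapping in the example above, from a two-part xhtml heading to a one-line NCX label, is simple enough to write down as a tiny Python function. This is only a sketch of the format shown in this post; the name `ncx_label` and the default separator argument are assumptions.

```python
def ncx_label(chapter_heading, subtitle, separator=" ~ "):
    """Build a one-line NCX label such as "Chapter 1 ~ The Tiger Steps Out"."""
    # capitalize() lowercases everything after the first letter, so a heading
    # like "CHAPTER IV" would need different handling.
    number = " ".join(chapter_heading.split()).capitalize()
    title = " ".join(subtitle.split())
    return number + separator + title if title else number


print(ncx_label("CHAPTER 1", "The Tiger Steps Out"))
```

The xhtml headings themselves are left alone; only the label used for the NCX entry is derived from them.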

Your apparent insistence on the combined xhtml headings looking exactly the same as the NCX headings makes more [unnecessary] work for me wrt the plugin. I honestly can't see any good reason for that requirement, since the two different heading formats that I use (shown above), which are very similar to your current epub headings, shouldn't cause any confusion whatsoever for any reader as far as I can see.

Last edited by slowsmile; 07-08-2020 at 02:39 AM.
Old 07-07-2020, 10:20 PM   #56
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
@slowsmile: One thing I have learned from years in IT is that unless the customer is willing to write a specification and stand by it, it's time to walk away while you still have your sanity.
Old 07-07-2020, 10:37 PM   #57
slowsmile
Witchman
slowsmile ought to be getting tired of karma fortunes by now.
 
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
@DNSB...Well said and so true! The worst are the ones that insist that they want this or that in their epub for no good reason. And even though you warn them or point out the dangers or mistakes, they will just keep on insisting. Like you say, if there's no bend then it's time to go...

Last edited by slowsmile; 07-07-2020 at 10:43 PM.
Old 07-07-2020, 11:18 PM   #58
Mister L
Groupie
Mister L is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 159
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by DNSB View Post
My code—in theory—could pull from an epub2 toc.ncx, an epub3 nav.xhtml document or a html table of contents. The problem was the sheer number of special cases that had me wasting more and more time modifying the code as the complexity increased, time that I realized was taking longer than my manual process. I also ran into too many issues where trying to fix the code to work with one ebook broke it for a previously working ebook. Regressions 'Я Us.

Like most programming tasks, it is simple for the person who is not trying to implement it. For the person who is trying to implement it, you find yourself looking for a larger can so all the worms will fit back in.

"All the necessary elements are already in the file"? Bah, humbug. The issues are more that the structure of the epub is different. Even things like where the files are stored in the epub can be a PITA, as in a recent epub I edited where the text files were partly stored in the root of the archive and partly in a text folder.
Very interesting to know! From your previous post I had the impression you were not starting from the toc files, sorry for the confusion.

If you don't mind my asking, did you try just pasting the toc title into an html comment, and then going back with a regex to move it to where it belonged (or even doing that by hand, which would still be easier if the title was already in the right page), rather than making a plugin to do *all* the steps including inserting a title attribute? Do you think, based on your experience, that if you only wanted to paste an html comment with the title, at the destination of the link, that would be possible to do without too much risk?

I googled "how hard is it to learn Python" and all the answers said "super easy" (lol) but at this point I've put enough energy into thinking about this stupid thing (plus that 14-book collection I did recently was really a gigantic pain in the a**) that I'm really pretty tempted to try and learn Python if I can manage to find the time, so I can break a few files myself before I give up completely.
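For what it's worth, the "paste an html comment at the destination of the link" idea can be sketched in a few lines of Python. This is only a rough illustration, not a tested plugin: the function name `insert_toc_comment` is made up for the example, and it assumes the target tag starts its own line and carries the id in double quotes. Anything else is left untouched.

```python
import re

def insert_toc_comment(xhtml_text, frag, label):
    """Insert <!-- label --> on its own line above the tag carrying id="frag".

    Assumption: the tag begins a line (after optional indentation).
    If no such tag is found, the text is returned unchanged.
    """
    # "--" is not allowed inside an XML comment, so break it up.
    safe = label.replace("--", "- -")
    pattern = re.compile(
        r'^([ \t]*)(<\w+[^<>]*\sid="%s"[^<>]*>)' % re.escape(frag),
        re.MULTILINE,
    )

    def repl(match):
        indent, tag = match.group(1), match.group(2)
        return "%s<!-- %s -->\n%s%s" % (indent, safe, indent, tag)

    # Only the first match is changed; ids should be unique within a file.
    return pattern.sub(repl, xhtml_text, count=1)


page = '<body>\n  <h1 id="toc_marker-6">1</h1>\n</body>'
print(insert_toc_comment(page, "toc_marker-6", "1. Le lion sur la colline"))
```

The comment can then be moved by hand or with a regex afterwards, as suggested above. Tags that share a line with other markup are exactly the kind of special case this sketch does not attempt to handle.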

Quote:
Originally Posted by slowsmile View Post
@Mister L...I have some good reasons why I am so frustrated with your plugin requirements. These reasons are:

1. In your post above that contains your bullet-point spec you said that you wanted to transform or combine the headings of h1 and h2 in your xhtml files using the NCX TOC headings. So after 3 whole pages of telling me what you want in your plugin, you then -- on the 4th page -- mention for the first time ever in that spec that you want to combine h1 and h2 for certain headings only. That p*ssed me off because that single requirement means that I will have to redesign the plugin more or less from scratch again.
This is what I meant when I said we were talking at cross purposes. It is very unfortunate that you got the wrong idea, and I am sorry you were frustrated, but I never said this. In post 47, which I think is what you mean by the bullet-point spec, I attached a sample test file and explained what the results should be based on the various cases in that file. I specifically said:
Quote:
The chapters have a title split into 2 parts: h1 with the chapter number + h2 with the title, which has fake smallcaps. Chapter 6 has fake smallcaps with a bonus capitalised word for maximum span kludge fun. These are referenced in the toc as "1. Title of the chapter" (in sentence case; Chapter 6 is "Title of the chapter with Propernoun"). The toc reference should be copied exactly as it is (number followed by a period then the title, with zero modifications to the text or the case) to a title="" attribute in the h1 OR an html comment immediately above the tag with the toc ID, whichever is easiest to code.
Add the toc version to a title="" attribute (or html comment), NOT "modify the h1 and h2 headings to match the toc". I also offered to make a second file which would show the results I was hoping for after running the plugin.

I tried very hard to explain precisely what I meant, which was to copy the text FROM THE NCX file into title="" attributes in the html file, WITHOUT modifying the html headings. I gave examples of why I might want to do this, which included cases where the titles as displayed in the NCX files are *already* combined from h1 and h2 tags. I gave these explanations multiple times in this thread starting in the first post, but specifically in post 28 I gave examples of code showing the html, the toc code, and the desired result:

Quote:
After running the plugin the result I expected was:
Code:
<h1 id="toc_marker-6" title="1. Le lion sur la colline">1</h1>

<h2><span class="Cap">L</span><span class="SmallCap">E LION SUR LA COLLINE</span></h2>
The blue code (here, the added title="1. Le lion sur la colline" attribute) is the code I wanted to add, i.e. the exact title as displayed in the toc, without modifying in any way the original headings in the html file (except to add the title="" attribute).
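To make the requested behaviour concrete, here is a minimal Python sketch of it, under several assumptions: the NCX is well-formed, each navPoint points at a `file#id` target, and the id is written in double quotes in the opening tag. The names `read_ncx_titles` and `add_title_attribute` are hypothetical and are not part of Sigil's plugin API. The headings themselves are never rewritten; only a title="" attribute is added.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

def read_ncx_titles(ncx_text):
    """Map (file, fragment id) -> TOC label, exactly as written in the NCX."""
    titles = {}
    root = ET.fromstring(ncx_text)
    for point in root.iterfind(".//ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        if content is None:
            continue
        href, _, frag = content.get("src", "").partition("#")
        if frag:  # entries without a fragment are skipped in this sketch
            titles[(href, frag)] = label.strip()
    return titles

def add_title_attribute(xhtml_text, frag, label):
    """Add title="label" to the opening tag carrying id="frag".

    The tag is left alone if it is not found or already has a title.
    """
    pattern = re.compile(r'<(\w+)([^<>]*?\sid="%s"[^<>]*?)(/?)>' % re.escape(frag))

    def repl(match):
        name, attrs, slash = match.groups()
        if re.search(r"\btitle\s*=", attrs):
            return match.group(0)
        return '<%s%s title="%s"%s>' % (name, attrs, escape(label, quote=True), slash)

    return pattern.sub(repl, xhtml_text, count=1)


ncx = (
    '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/"><navMap>'
    '<navPoint id="n1" playOrder="1"><navLabel><text>1. Le lion sur la colline</text></navLabel>'
    '<content src="Text/ch1.xhtml#toc_marker-6"/></navPoint></navMap></ncx>'
)
titles = read_ncx_titles(ncx)
print(add_title_attribute('<h1 id="toc_marker-6">1</h1>', "toc_marker-6",
                          titles[("Text/ch1.xhtml", "toc_marker-6")]))
```

A regex is used instead of re-serialising the XHTML so that the rest of the file stays byte-for-byte as it was. That choice is also what makes the sketch fragile on unusual markup.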

Quote:
Originally Posted by slowsmile View Post
2. Your requirement that your xhtml headings must look exactly the same as NCX headings really surprises me. Why do you insist on that? In my ebooks my xhtml headings will look like this:

CHAPTER 1

The Tiger Steps Out

...but my NCX TOC headings will look like this:

Chapter 1 ~ The Tiger Steps Out

I don't change my xhtml headings.

Your apparent insistence on the xhtml headings being combined and exactly the same as the NCX headings makes more [unnecessary] work for me wrt the plugin. I also can't see any good reasons for that requirement since those two different heading formats that I use (shown above) shouldn't cause any confusion whatsoever for any reader as far as I can see.
I did not say this either. I specifically said, in these files, the titles in the TOC are not the same as the titles displayed in the html pages (exactly like in your example here, among other examples), that is why I need to copy the TOC titles into a title="" attribute (or html comment) so that when I regenerate the TOC after modifying the files, the TOC version of the titles is not lost. I did not want to modify the headings in the html pages.

I am very sorry if this was not clear to you but I tried my best to explain it, multiple times, starting in the first post of the thread, including directly in response to your posts, with examples of the results I was hoping for. I don't think I could explain more clearly than with the example code I included in post 28. I wish, if you did not understand, you had asked me to clarify certain points before continuing, rather than just going ahead. Frankly this was very frustrating for me too.
Old 07-07-2020, 11:22 PM   #59
Mister L
Groupie
Mister L is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 159
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
Quote:
Originally Posted by DNSB View Post
@slowsmile: One thing I have learned from years in IT is that unless the customer is willing to write a specification and stand by it, it's time to walk away while you still have your sanity.
Quote:
Originally Posted by slowsmile View Post
@DNSB...Well said and so true! The worst are the ones that insist that they want this or that in their epub for no good reason. And even though you warn them or point out the dangers or mistakes, they will just keep on insisting. Like you say, if there's no bend then it's time to go...
I believe I gave ample, precise, detailed and consistent indications throughout this thread, including providing a test file and various code samples of the code before and after the plugin. I am sorry if they were not sufficient but the problem is not always on the side you both seem to be indicating.
Old 07-07-2020, 11:56 PM   #60
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Mister L View Post
I believe I gave ample, precise, detailed and consistent indications throughout this thread, including providing a test file and various code samples of the code before and after the plugin. I am sorry if they were not sufficient but the problem is not always on the side you both seem to be indicating.
Quote:
Originally Posted by DiapDealer View Post
You misunderstood me. I wasn't really asking for human answers to most of those questions. I was asking what kind of spaghetti logic would be required within a plugin to get the plugin to always make the "right" decisions on its own? A human can look at the code and easily (sometimes) see where the new tag should be created. A script has to parse all of the html (and make informed guesses about where it makes sense to put it semantically speaking) to avoid creating malformed or improperly nested tags.
Basically what DiapDealer said in his response is what I ran into. For one epub, it is not an issue. For a mass of epubs with the variances in structure, code and even file locations in the zip container, I could find no way to program a reliable tool. I've mentioned some of the issues I ran into, all of which the plugin you want would either have to handle or fail gracefully. As I said, I found using regex to add title="" to an existing header or adding a new header to each file and then copy/pasting ended up taking less time especially when fixing the code to work with epub #25 broke the fixes I put in to deal with epubs #18 and #22.
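As an illustration of just one of those variances (file locations in the zip container), the sketch below shows how a standalone Python script might avoid hard-coding folder names, by reading META-INF/container.xml to find the OPF and then resolving each manifest href relative to it. The function name `list_xhtml_files` is invented for the example, it is not the code discussed above, and it assumes a valid container file and manifest. It does nothing about the harder problem of differing markup inside the files.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def list_xhtml_files(epub_file):
    """Return the archive paths of all XHTML manifest items, wherever they live."""
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        opf_path = rootfile.get("full-path")
        opf_dir = posixpath.dirname(opf_path)  # "" when the OPF is in the root

        opf = ET.fromstring(zf.read(opf_path))
        paths = []
        for item in opf.iterfind("opf:manifest/opf:item", OPF_NS):
            href = item.get("href")
            if href and item.get("media-type") == "application/xhtml+xml":
                # hrefs are relative to the OPF and may be percent-encoded
                paths.append(posixpath.normpath(posixpath.join(opf_dir, unquote(href))))
        return paths
```

With an epub whose text files sit partly in the root and partly in a text folder, this would return both sets of paths in manifest order, so the rest of a script would not need to care where they are stored.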

Someone else may have had less trouble with the code. Perhaps someone without fond memories of a Fortran arithmetic if.