12-29-2018, 10:30 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
|
Mass Insertion of <a></a> tags
Hello, all. Author here, NOT a techie. I've been happily and successfully using Calibre to create my ebooks for years, and in the past whenever I've run into problems I just hack through it with trial and error until I figure it out. But this one has me stumped.
The latest version of Calibre (just downloaded yesterday) is responding to my p class = chapter and p class = title paragraphs by inserting <a> before each one and then </a> immediately after the next occurring </p>. This results in some ereaders showing pretty much the whole book as a hyperlink, besides throwing pages and pages of validation errors.
I code my text files in jEdit, convert to HTML, then feed that file into Calibre for conversion to EPUB. The problem happens ONLY in files for boxed sets that have a 2-tiered TOC structure (Title & Chapter tiers). Books with a simple 1-tier TOC structure do not have this problem, even with identical coding in the frontmatter. The only other difference I can see in the affected books is that they contain bookmark codes to connect internal text links with images (book covers) later in the document.
In one case, an HTML file converted by Calibre a year ago was fine, but the exact same file converted now with the new Calibre produces the problem. I can and have removed the unwanted tags manually within the edit function. But I don't need to tell you what a PITA that is. Many thanks. |
12-29-2018, 12:03 PM | #2 |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
(chapter|book|section|part)|prolog|prologue|epilogue are keywords used for TOC detection and splitting.
IMHO avoid using those for style classes. Take a look at the names other people use:
FP, TX1 (first paragraph)
TX (or nothing) for a typical (indented) paragraph
CN for chapter number, CT for chapter title
Some enumerate every tiny detail: (the_first_paragraph_following_the_change_in_point_of_view) |
12-29-2018, 04:10 PM | #3 | |
null operator (he/him)
Posts: 20,567
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
See "How to ask a question about conversion problems". The main thing is an example of the problem, as input and output files. Put them in a zip and attach it to a post via the Manage Attachments button below the Submit and Preview buttons. BR |
|
12-30-2018, 10:09 AM | #4 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
|
Okay, I've tried to attach the necessary files. The original document is a behemoth boxed set of 11 full-length novels, so it is difficult to work with, but I extracted just the frontmatter and first few pages and the error did indeed reproduce in the sample. Two notes:
1. My original description had a typo: the second of the inserted set of tags, </a>, appears just before the next occurring </body>, not </p>. Sorry.
2. The original frontmatter had bookmarks for the title page (linked to cover image) for all 11 books. I cropped out all but the first book to avoid incomplete references. But you can still see the architecture used.
Thanks! |
12-31-2018, 12:32 AM | #5 |
creator of calibre
Posts: 43,853
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Sorry, I'm a little confused. I looked at the epub file you attached and I can see no mismatched <a> tags, and it renders fine in the viewer and editor. Have you already removed them from the epub? Or maybe I am missing them?
I also tried converting the attached html with the same level 1 and level 2 TOC expressions as you used and could see no mismatched <a> tags either. |
12-31-2018, 01:13 AM | #6 |
Bibliophagist
Posts: 35,380
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
When I looked at the epub file in his sample.zip, there is an <a> tag immediately after the <body class="calibre"> which is closed by a </a> immediately before the </body> tag in his Sample_split_002.html, Sample_split_003.html, Sample_split_004.html and Sample_split_005.html segments.
No mismatch since the tag is closed but a rather odd addition. |
12-31-2018, 03:26 AM | #7 |
creator of calibre
Posts: 43,853
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Ah OK, I see it. I'll look into it.
|
12-31-2018, 09:34 AM | #8 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
|
Thanks for looking into this!
I've attached a file that includes the validation errors thrown by these <a> tags, as well as a screenshot of the tags in the edit view of Calibre and a screenshot of how Adobe Digital Editions (but not all other viewers) interprets all the text as a hyperlink because of them. Thanks again. |
12-31-2018, 01:46 PM | #9 | |
Bibliophagist
Posts: 35,380
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
As a quick and dirty fix, I'd suggest replacing the "<body class="calibre"> <a>" with "<body class="calibre">" and the "</a> </body>" with "</body>". Copy/paste should keep the EOL between the two parts of the first search elements. |
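For anyone who would rather script that search-and-replace than do it by hand, here is a rough Python sketch. The function name strip_wrapper_anchor is made up for illustration, and it assumes the stray tags look exactly as described above: a bare <a> right after the opening <body> tag and a </a> right before </body>. It is a stopgap for already-converted files, not a fix for whatever is producing the tags.

```python
import re

# A bare <a> (no attributes) directly after the opening <body ...> tag,
# allowing for whitespace or a line break between the two.
OPEN_RE = re.compile(r'(<body\b[^>]*>)\s*<a>', re.IGNORECASE)
# The matching </a> directly before </body>.
CLOSE_RE = re.compile(r'</a>\s*(</body>)', re.IGNORECASE)


def strip_wrapper_anchor(html: str) -> str:
    """Remove the <a>...</a> pair that wraps the whole body, if present."""
    # Only touch the file when both halves of the pair are found, so a
    # legitimate link at the very end of a chapter is left alone.
    if not (OPEN_RE.search(html) and CLOSE_RE.search(html)):
        return html
    html = OPEN_RE.sub(r'\1', html, count=1)
    html = CLOSE_RE.sub(r'\1', html, count=1)
    return html


if __name__ == "__main__":
    sample = '<body class="calibre">\n<a>\n<p class="chapter">One</p>\n</a>\n</body>'
    print(strip_wrapper_anchor(sample))
```

Real links such as <a href="..."> are not matched, because the pattern only looks for an <a> with no attributes.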
|
12-31-2018, 03:41 PM | #10 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
|
That fix will certainly be more efficient than the method I was using. But I still hope to fix the root cause so that I don't have to keep editing all my files after conversion.
What I can't understand, if the problem is in the code I'm inputting, is why the problem did not occur with whatever version of Calibre I was using a year ago. I have confirmed different results with the exact same html source file. Thanks. |
01-01-2019, 02:50 AM | #11 | |
Bibliophagist
Posts: 35,380
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
01-01-2019, 04:29 PM | #12 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
|
Thanks for the advice. I'll doctor the files I need this week and hope for a root fix soon.
|
01-02-2019, 02:47 AM | #13 |
creator of calibre
Posts: 43,853
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's caused by your original Sample.html file having an unclosed <a> tag in it.
Code:
<p class="centered"><span class="centered"><a id="titlebookmark1"><img src="NBCover.jpg" alt="Never Buried Cover" /></span></p> |
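One way to catch this kind of thing before conversion is to scan the source HTML for <a> tags that never get closed. Below is a rough sketch using Python's standard html.parser module. AnchorChecker and find_unclosed_anchors are hypothetical names, the check assumes anchors are not nested, and the "fixed" line in the usage is only one plausible repair (closing the anchor right after the image), not necessarily what the source file should contain.

```python
from html.parser import HTMLParser


class AnchorChecker(HTMLParser):
    """Tracks <a> start tags that have not yet seen a matching </a>."""

    def __init__(self):
        super().__init__()
        self.open_anchors = []  # (line number, attributes) per open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_anchors.append((self.getpos()[0], dict(attrs)))

    def handle_endtag(self, tag):
        # Each </a> closes the most recently opened <a>, if any.
        if tag == "a" and self.open_anchors:
            self.open_anchors.pop()


def find_unclosed_anchors(html):
    """Return (line, attrs) for every <a> still open at the end of the input."""
    checker = AnchorChecker()
    checker.feed(html)
    checker.close()
    return checker.open_anchors


if __name__ == "__main__":
    broken = ('<p class="centered"><span class="centered"><a id="titlebookmark1">'
              '<img src="NBCover.jpg" alt="Never Buried Cover" /></span></p>')
    # Assumed repair: close the anchor immediately after the image.
    fixed = broken.replace(' /></span>', ' /></a></span>')
    print(find_unclosed_anchors(broken))
    print(find_unclosed_anchors(fixed))
```

Run against the line quoted above, the first call reports the titlebookmark1 anchor and the second reports nothing.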
01-02-2019, 08:48 AM | #14 |
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
|
That makes perfect sense! I see exactly what you mean. Thanks so much, Kovid. I will go back and donate to you again!
|
Tags |
hyperlink, toc |
|