Old 12-29-2018, 10:30 AM   #1
EClaire
Junior Member
EClaire began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Dec 2018
Device: Kobo forma
Mass Insertion of <a></a> tags

Hello, all. Author here, NOT a techie. I've been happily and successfully using Calibre to create my ebooks for years, and in the past whenever I've run into problems I just hack through it with trial and error until I figure it out. But this one has me stumped.

The latest version of Calibre (just downloaded yesterday) is responding to my p class = chapter and p class = title paragraphs by inserting <a> before each one and then </a> immediately after the next occurring </p>. This results in some ereaders showing pretty much the whole book as a hyperlink, besides throwing pages and pages of validation errors.

I code my text files in jedit, convert to html, then feed that file into Calibre for conversion to epub. The problem described happens ONLY in files for boxed sets that have a 2-tiered TOC structure (Title & Chapter tiers). Books with a simple structure 1-tier TOC do not have this problem, even with identical coding in the frontmatter. The only other difference I can see in the affected books is that they contain bookmark codes to connect internal text links with images (book covers) later in the document.

In one case, an html file converted by Calibre a year ago was fine, but the exact same file converted now with the new Calibre produces the problem.

I can and have removed the unwanted tags manually within the edit function. But I don't need to tell you what a PITA that is.

Many thanks.
Old 12-29-2018, 12:03 PM   #2
theducks
Well trained by Cats
Posts: 29,784
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
chapter, book, section, part, prolog, prologue and epilogue are keywords used for (TOC) detection and splitting.

IMHO avoid using those for style classes.

Take a look at the names other people use:

FP, TX1 (first Paragraph)
TX (or nothing) for a typical (indented) paragraph
CN for chapter Number, CT for chapter title
Some enumerate every tiny detail: (the_first_paragraph_following_the_change_in_point_of_view)
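
For anyone who wants to check their own source file against the advice above, here is a minimal Python sketch. It only looks for class names that contain one of the keywords listed in this post; the function name risky_class_names is made up for illustration, and the simple substring match is an assumption, not a copy of what calibre itself does.

```python
import re

# Keywords quoted in the post above. How calibre actually applies them
# is not shown in this thread, so this substring match is an assumption.
TOC_KEYWORDS = re.compile(
    r"chapter|book|section|part|prolog|prologue|epilogue", re.IGNORECASE
)


def risky_class_names(html: str) -> list[str]:
    """Return the class names in the HTML that contain a TOC keyword."""
    names = set()
    # Collect every value of a double-quoted class="..." attribute.
    for value in re.findall(r'class\s*=\s*"([^"]*)"', html):
        names.update(value.split())
    return sorted(name for name in names if TOC_KEYWORDS.search(name))


sample = '<p class="chapter">One</p><p class="CT">Title</p><p class="title">T</p>'
print(risky_class_names(sample))
```

Because it matches substrings, a harmless name such as "department" would also be reported, so treat the result as a list of names to review rather than a list of errors.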
Old 12-29-2018, 04:10 PM   #3
BetterRed
null operator (he/him)
Posts: 20,558
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Do you know what version of calibre you were using before you installed the latest version? If you reinstall that version it won't affect your libraries or configuration.

See How to ask a question about conversion problems

The main thing is an example of the problem as input and output files. Put them in a zip and attach it to a post via the Manage Attachments button below the Submit and Preview buttons.

BR
Old 12-30-2018, 10:09 AM   #4
EClaire

Okay, I've tried to attach the necessary files. The original document is a behemoth boxed set of 11 full-length novels, so it is difficult to work with, but I extracted just the frontmatter and first few pages and the error did indeed reproduce in the sample. Two notes:

1. My original description had a typo - the second of the inserted set of tags </a> appears just before the next occurring </body>, not </p>. Sorry.

2. The original frontmatter had bookmarks for the title page (linked to cover image) for all 11 books. I cropped out all but the first book to avoid incomplete references. But you can still see the architecture used.

Thanks!
Attached Files
File Type: zip Sample.MobileRead.zip (296.2 KB, 186 views)
Old 12-31-2018, 12:32 AM   #5
kovidgoyal
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Sorry I'm a little confused. I looked at the epub file you attached and I can see no mis-matched <a> tags and it renders fine in the viewer and editor. Have you already removed them from the epub? Or maybe I am missing them?

I also tried converting the attached html with the same level 1 and level 2 TOC expressions as you used and could see no mismatched <a> tags either.
Old 12-31-2018, 01:13 AM   #6
DNSB
Bibliophagist
Posts: 35,311
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
When I looked at the epub file in the sample zip, there is an <a> tag immediately after the <body class="calibre"> which is closed by a </a> immediately before the </body> tag in the Sample_split_002.html, Sample_split_003.html, Sample_split_004.html and Sample_split_005.html segments.

No mismatch since the tag is closed but a rather odd addition.
Old 12-31-2018, 03:26 AM   #7
kovidgoyal
Ah, OK, I see it. I'll look into it.
Old 12-31-2018, 09:34 AM   #8
EClaire
Thanks for looking into this!

I've attached a file that includes the validation errors thrown by these <a> tags, as well as a screenshot of the tags in the edit view of Calibre and a screenshot of how Adobe Digital Editions (but not all other viewers) interprets all the text as a hyperlink because of them.

Thanks again.
Attached Files
File Type: docx SampleValidationErrors.docx (222.4 KB, 185 views)
Old 12-31-2018, 01:46 PM   #9
DNSB
I suspect most of the issue is that you are wrapping block level elements in an inline element.

As a quick and dirty fix, I'd suggest replacing the "<body class="calibre"> <a>" with "<body class="calibre">" and the "</a> </body>" with "</body>". Copy/paste should keep the EOL between the two parts of the first search elements.
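
The quick fix described above could be scripted roughly like this in Python. It is only a sketch under stated assumptions: the function name strip_body_wrapper_anchor is invented for illustration, the stray tag is assumed to be a bare <a> with no attributes sitting directly after the opening body tag, and each split file's text is assumed to have been read into a string already.

```python
import re


def strip_body_wrapper_anchor(xhtml: str) -> str:
    """Drop a bare <a> right after <body ...> and the </a> right before </body>."""
    # Remove the stray opening tag, keeping the <body ...> tag itself.
    opened = re.sub(r"(<body\b[^>]*>)\s*<a>", r"\1", xhtml, count=1)
    if opened == xhtml:
        # No stray opening tag found, so leave the closing tags alone too.
        return xhtml
    # Remove the matching stray closing tag just before </body>.
    return re.sub(r"</a>\s*(</body>)", r"\1", opened, count=1)


page = '<body class="calibre">\n<a>\n<p class="chapter">One</p>\n</a>\n</body>'
print(strip_body_wrapper_anchor(page))
```

Files that do not start their body with a bare <a> are returned unchanged, so it should be safe to run over every split file in the book.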
Old 12-31-2018, 03:41 PM   #10
EClaire
That fix will certainly be more efficient than the method I was using. But I still hope to fix the root cause so that I don't have to keep editing all my files after conversion.

What I can't understand, if the problem is in the code I'm inputting, is why the problem did not occur with whatever version of Calibre I was using a year ago. I have confirmed different results with the exact same html source file.

Thanks.
Old 01-01-2019, 02:50 AM   #11
DNSB
Possibly you are seeing a bug introduced by fixing a different bug? Check regression for more information. Now that Kovid is aware of the issue, and given his usual speedy bug-extermination habit, it should be resolved fairly shortly. What I suggested is a workaround until then.
Old 01-01-2019, 04:29 PM   #12
EClaire
Thanks for the advice. I'll doctor the files I need this week and hope for a root fix soon.
Old 01-02-2019, 02:47 AM   #13
kovidgoyal
It's caused by your original Sample.html file having an unclosed <a> tag in it.

Code:
<p class="centered"><span class="centered"><a id="titlebookmark1"><img src="NBCover.jpg" alt="Never Buried Cover" /></span></p>
The reason it has started happening with newer versions of calibre is that the HTML parser has been changed to comply with the HTML 5 spec, and that requires unclosed tags like that one to be handled like this. Close the <a> tag and you will be fine.
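
To hunt for other unclosed <a> tags in a source file before conversion, something like the following Python sketch may help. It uses the standard library's html.parser, not calibre's own parser, and the names UnclosedAnchorFinder and find_unclosed_anchors are made up for illustration. It reports an <a> as unclosed when another <a> opens before it is closed, or when the input ends while it is still open.

```python
from html.parser import HTMLParser


class UnclosedAnchorFinder(HTMLParser):
    """Collect (line number, tag text) for <a> tags that never get a </a>."""

    def __init__(self):
        super().__init__()
        self.open_anchor = None  # the <a> currently waiting for its </a>
        self.unclosed = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors cannot nest, so a still-open one was never closed.
            if self.open_anchor is not None:
                self.unclosed.append(self.open_anchor)
            self.open_anchor = (self.getpos()[0], self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            self.open_anchor = None

    def close(self):
        super().close()
        # Anything still open at the end of the input was never closed.
        if self.open_anchor is not None:
            self.unclosed.append(self.open_anchor)
            self.open_anchor = None


def find_unclosed_anchors(html: str) -> list:
    finder = UnclosedAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.unclosed


broken = ('<p class="centered"><span class="centered"><a id="titlebookmark1">'
          '<img src="NBCover.jpg" alt="Never Buried Cover" /></span></p>')
print(find_unclosed_anchors(broken))
```

An empty list means every <a> found has a matching </a>; each reported line number points at where a missing </a> needs to be added in the source.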
Old 01-02-2019, 08:48 AM   #14
EClaire
That makes perfect sense! I see exactly what you mean. Thanks so much, Kovid. I will go back and donate to you again!