#1 |
Librocubicularist
Posts: 617
Karma: 11908623
Join Date: Dec 2013
Location: Germany
Device: PocketBook InkPad 3 & Touch HD 3, Kobo Clara HD, Kindle Paperwhite 5
|
How to change wrong xml:lang in multiple files?
I have books in English and German in my library. When I add a new book, I download the metadata, do an EPUB to EPUB conversion and then polish it. I don't know when this error first occurred, but suddenly all my German books have the wrong language tag embedded in the title page file.
When I open a German EPUB in the editor, the titlepage shows this line: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">. Is there a way to change this to xml:lang="de" in all affected EPUB files at once?
#2 |
Librocubicularist
|
I've done some testing and it seems that Calibre adds xml:lang="en" during conversion, although the metadata show German as the correct language.
I've tried adding xml:lang="de" as a replacement text to the "search & replace" section in the conversion wizard, but it doesn't work. After conversion, the titlepage still shows xml:lang="en".
#3 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The titlepage language is English; it doesn't contain any German text. All it contains is an image.
|
#4 |
Librocubicularist
|
The problem is that xml:lang overrides the metadata language setting. So KOReader on my PocketBooks recognizes those books as English, not as German, and sets hyphenation accordingly.
When I change it manually to xml:lang="de", the files are recognized as German in KOReader.
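For readers who want to batch the manual change described above, here is a minimal Python sketch. It assumes the calibre-generated cover page is named `titlepage.xhtml`, that the affected German books have been copied into a separate folder, and that only `xml:lang="en"` or `lang="en"` on the root `<html>` tag should be changed. The names `fix_titlepage_markup` and `fix_epub` are made up for this illustration; nothing here is part of calibre.

```python
import re
import shutil
import tempfile
import zipfile
from pathlib import Path

# Assumption: the cover page generated by calibre is named "titlepage.xhtml".
TITLEPAGE_NAME = "titlepage.xhtml"
HTML_TAG = re.compile(r"<html\b[^>]*>")
# Matches xml:lang="en" or lang="en" (exactly "en", either quote style).
LANG_ATTR = re.compile(r"\b((?:xml:)?lang=)([\"'])en\2")


def fix_titlepage_markup(markup: str, new_lang: str) -> str:
    """Replace lang="en" with new_lang on the first <html> tag only."""
    def patch_tag(tag_match):
        return LANG_ATTR.sub(
            lambda m: m.group(1) + m.group(2) + new_lang + m.group(2),
            tag_match.group(0),
        )
    return HTML_TAG.sub(patch_tag, markup, count=1)


def fix_epub(path: Path, new_lang: str = "de") -> bool:
    """Rewrite one EPUB in place; return True if the title page was changed."""
    changed = False
    with tempfile.NamedTemporaryFile(suffix=".epub", delete=False) as tmp:
        tmp_path = Path(tmp.name)
    with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if Path(info.filename).name == TITLEPAGE_NAME:
                text = data.decode("utf-8")
                fixed = fix_titlepage_markup(text, new_lang)
                changed = changed or fixed != text
                data = fixed.encode("utf-8")
            # The mimetype entry must stay uncompressed; deflate everything else.
            method = (
                zipfile.ZIP_STORED
                if info.filename == "mimetype"
                else zipfile.ZIP_DEFLATED
            )
            dst.writestr(info.filename, data, compress_type=method)
    if changed:
        shutil.move(str(tmp_path), str(path))
    else:
        tmp_path.unlink()
    return changed
```

Calling `fix_epub(Path("book.epub"), "de")` rewrites that one file, and looping over `Path(folder).rglob("*.epub")` would cover a whole folder. It is safer to run this on copies than directly inside the calibre library.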
#5 |
creator of calibre
|
That would be a bug in the PocketBook. The lang attributes on individual HTML files must not override the book language.
|
#6 |
Librocubicularist
|
If the titlepage only contains an image, why should the language be defined for it? Wouldn't it be better to have xml:lang be the same language as the actual language of the book?
It would be great if that could be changed.
#7 |
creator of calibre
|
Because the execrable epubcheck complains if there is no language.
|
#8 | |
Still reading
Posts: 14,016
Karma: 105092227
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
|
Quote:
Say an English version of "Das Boot". Or a German version of "Three Men in a Boat". It's a bit redundant but does make some sense.
|
#9 |
creator of calibre
|
Not to mention that while the *calibre* titlepage is image only, in general titlepages may contain text.
|
#10 | |
Librocubicularist
|
Quote:
And if the titlepage contains any text, it is usually in the same language as the book, as well.
|
#11 |
Librocubicularist
|
The developers of KOReader referred me to this source which says that xml:lang on an element has priority over what's specified on a parent element or in some kind of global metadata. So, it's not an error on their part.
https://html.spec.whatwg.org/multipa...html#attr-lang
#12 |
creator of calibre
|
What nonsense. The lang attribute in an individual HTML file inside an ebook DOES NOT override the overall language for the book. Your PocketBook developers don't seem to understand what an ebook is. The spec they quote is for a *single HTML document*, which is completely irrelevant for an ebook that can be composed of multiple HTML documents, each having its own potentially conflicting lang attribute.
The overall language of an ebook must be read from the metadata of the book. In the case of EPUB books, that means from the <metadata> section of the OPF file. Relevant spec: http://idpf.org/epub/30/spec/epub30-...-metadata-elem
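As an illustration of where that book-level language lives, the following Python sketch reads the `dc:language` entries from the OPF `<metadata>` section. It locates the OPF through `META-INF/container.xml`, which is how an EPUB container points to its package file. The function name `book_languages` is hypothetical, and error handling is left out.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_LANGUAGE = "{http://purl.org/dc/elements/1.1/}language"


def book_languages(epub_path):
    """Return the dc:language values declared in the EPUB's OPF file."""
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml names the OPF package file via rootfile/@full-path.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    # A book may declare more than one language, so return all of them.
    return [el.text.strip() for el in opf.iter(DC_LANGUAGE) if el.text]
```

For a German book this would be expected to return a list such as `["de"]`, regardless of what any individual HTML file inside the EPUB declares.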
#13 | |
Not Quite Dead
Posts: 195
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
|
Quote:
Before this discussion, I never noticed that the calibre titlepage is handled differently. All my Spanish books have the attribute "en" there, but if I let calibre make a jacket page, there is no lang attribute at all.
|
#14 |
Librocubicularist
|
@Brett: That happens to my books when I do a spellcheck, too. That's why I thought that xml:lang overrides other language tags. At least this behaviour should be fixed, then.
|
#15 |
creator of calibre
|
xml:lang (or actually just lang in modern HTML) sets the language for the contents of the tag it appears on, *and that is all*. Not the whole book, and not even the whole HTML file (assuming the tag is not the root <html> tag). And spellcheck respects that, as it is supposed to. That has *nothing* to do with what the overall language for the book is.
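The scoping rule described here can be sketched in Python: the language that applies to an element is the one declared on the nearest enclosing tag that carries `xml:lang` or `lang`. The helper `effective_lang` is a made-up name and the sample markup is invented for illustration. What a reader should do when nothing is declared is left out, so the function simply returns `None` in that case.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XHTML = "{http://www.w3.org/1999/xhtml}"


def effective_lang(root, element):
    """Return the language declared on element or its nearest ancestor."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node is not None:
        lang = node.get(XML_LANG) or node.get("lang")
        if lang:
            return lang
        node = parents.get(node)
    return None


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">'
    '<body><p>Hallo <span xml:lang="en">world</span></p></body></html>'
)
span = next(doc.iter(XHTML + "span"))
para = next(doc.iter(XHTML + "p"))
print(effective_lang(doc, span))  # en
print(effective_lang(doc, para))  # de
```

The `<span>` overrides the language only for its own contents, while the paragraph around it still inherits `de` from the root `<html>` tag.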
|