URI issues

odedta · 04-12-2014, 01:05 PM

Hello,

When I validate an ePub I made using: http://validator.idpf.org/application/validate

I get an error saying:

Quote:

value of attribute "href" is invalid; must be a URI

Reading online, I see that this issue might be caused by the ampersand symbols in the URL given in the href attribute.

example link that gives an error:

Quote:

http://hebrewbooks.org/pdfpager.aspx...=&pgnum=19

How can I get those links to pass validation and still make them go where they should?

On another note, I have a file that is 372KB, and Calibre's Check Book tool reports that the file is too large. On which specific eBook readers might such files cause a problem?
Does this limitation still exist in ePub 3? I don't mind performance issues.

mrmikel · 04-13-2014, 07:01 AM

If you only care about iPads and EPUB3 reader programs, then it doesn't matter how big the files are. Large files will make many other readers either slow down or completely fail to open the book.

eschwartz · 04-13-2014, 09:22 AM

If the big file has multiple chapters in it, you should split anyway, for organizational and chapter-breaking purposes.
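To find out which files inside a book are candidates for splitting, something like the following Python sketch could be used. It relies only on the standard `zipfile` module (an EPUB is a ZIP archive). The function name `large_documents` and the 300,000-byte limit are assumptions made for illustration; the thread does not say which threshold Calibre applies, and the archive below is an in-memory stand-in, not a real book.

```python
import io
import zipfile

def large_documents(epub, limit_bytes):
    """Return (name, uncompressed size) for HTML/XHTML entries above limit_bytes."""
    with zipfile.ZipFile(epub) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.lower().endswith((".html", ".xhtml"))
            and info.file_size > limit_bytes
        ]

# Build a tiny stand-in archive in memory so the sketch runs without a real book.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("Text/index.html", "x" * 372_000)   # size taken from the first post
    archive.writestr("Text/short.xhtml", "x" * 1_000)

# limit_bytes is a hypothetical threshold, not Calibre's actual value.
print(large_documents(sample, limit_bytes=300_000))
```

With a real book, the path to the .epub file can be passed instead of the in-memory buffer.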

skreutzer · 04-13-2014, 01:46 PM

Your URL validated perfectly well in an EPUB2, if placed in an XHTML 1.1 file like this:

Code:

<a href="http://hebrewbooks.org/pdfpager.aspx?req=22413&amp;st=&amp;pgnum=19">Test</a>

The only way I found to reproduce the mentioned error message was something like

Code:

<a href="http:">Test</a>

where the syntax of a URI specification is invalid, because otherwise epubcheck would complain about missing files or wouldn't complain at all. It would be of great help if you could specify in greater detail what your EPUB (especially the <a> element) looks like.
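For anyone generating such links from a script, the ampersand escaping used in the first snippet above can be sketched in Python. This is only an illustration: `html.escape` and `xml.etree.ElementTree` are standard-library modules, while the helper name `href_attribute` is made up for this example.

```python
import html
import xml.etree.ElementTree as ET

def href_attribute(url):
    """Escape a raw URL so it can sit inside an XHTML attribute value."""
    return html.escape(url, quote=True)

raw = "http://hebrewbooks.org/pdfpager.aspx?req=22413&st=&pgnum=19"
escaped = href_attribute(raw)
link = '<a href="' + escaped + '">Test</a>'
print(link)

# An XML parser accepts the escaped form and hands back the original URL.
parsed = ET.fromstring(link)
print(parsed.get("href") == raw)

# The unescaped form is not well-formed XML and is rejected by the parser.
try:
    ET.fromstring('<a href="' + raw + '">Test</a>')
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False
print(raw_is_well_formed)
```

The point of the round trip is that `&amp;` only exists in the markup; a reader that parses the file still follows the URL with plain ampersands.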

odedta · 04-14-2014, 05:47 AM

Thanks for the answers guys!

I have created a sample ePub 3 file which would validate if it weren't for the URI error I get. Please view the attached file.

skreutzer · 04-14-2014, 01:41 PM

Well, since you've provided an EPUB3 example in your last post, I conclude that you're aiming to construct an EPUB3 file instead of EPUB2 (which wasn't explicitly stated in your initial post). Depending on the version of the EPUB standard, the package and the packaged files have to look different. Validating your example file with the IDPF EPUB validator results in

Code:

Type: ERROR, File: Text/index.html, Line: 8, Position: 786, Message: value of attribute "href" is invalid; must be a URI

With this information, the source of the problem can be localized. After unpacking your example file and looking at the mentioned index.html file in the "Text" subdirectory, it already looks suspicious, because EPUB3 uses HTML5, but index.html looks like an incomplete XHTML file. After validating index.html with the HTML validator of W3C, it at first seems to be OK, but as the validator points out in the warning message

Code:

No DOCTYPE found! Checking with default XHTML 1.0 Transitional Document Type.

that's only because the validator wasn't instructed to do HTML5 validation, and so it picked XHTML 1.0 Transitional, which results in validation success. However, in order to trigger HTML5 validation in the W3C validator and satisfy the requirement of HTML5 for EPUB3, you need to place the HTML5 doctype

Code:

<!doctype html>

between the XML processing instruction and the root element:

Code:

<?xml version='1.0' encoding='utf-8'?>
<!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml">

Now validating with the W3C validator leads to a rather different result: 3 errors, 2 warnings. The first complaint is

Code:

The character encoding was not declared. Proceeding using windows-1252.

and the second is

Code:

Saw <?. Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.) <?xml version='1.0' encoding='utf-8'?>

which are somewhat connected. Unfortunately, for whatever reason, the W3C designed HTML5 to be incompatible with XML in this respect, so there's no way to specify the character encoding in the XML processing instruction as one would usually expect. Instead, the encoding needs to be specified as a meta element in the head area:

Code:

<!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>URI test</title>
    <meta charset="utf-8"/>
    [...]
  </head>
  [...]
</html>

while the XML processing instruction needs to be removed. After that, 1 error and 1 warning remain:

Code:

Bad value http://www.hebrewbooks.org/pdfpager.aspx?sits=1&req=36784&st=%u05DE%u05D9%20%u05E9%u05D9%u05E9%20%u05DC%u05D5%20%u05E7%u05E6%u05EA%20%u05E9%u05E8%u05E8%u05D4 for attribute href on element a: Percentage ("%") is not followed by two hexadecimal digits.

Obviously, unlike in your initial post, the URL is much longer than the portion you posted there. The error message already points out the problem: URL escaping is done with a percent character followed by two hexadecimal digits. In your URL, the percent character is followed by the character 'u', and it looks very much like this was meant as a (non-standard) Unicode character escape, because 'u' probably stands for Unicode, and hexadecimal 05DE is the HEBREW LETTER MEM (מ). The main question that needs to be answered is whether the referenced website expects the data in this invalid format, and indeed, browsing the invalid URL leads to a different result than a URL without the st parameter in the HTTP GET request. To preserve the percent character while still complying with the URL escaping rules, the percent character itself needs to be URL escaped, because otherwise it would be interpreted as the start marker of a regular URL escape sequence: just replace every percent character with its URL-encoded representation %25, so %u05DE becomes %25u05DE. See

http://en.wikipedia.org/wiki/Percent-encoding

for more details on URL escaping. After that, the URL still won't pass HTML5 validation, because you're still missing the XML entity encoding of the ampersand as &amp;, which was present in your initially posted link. After this last adjustment, index.html is valid HTML5 and therefore shouldn't cause any further problems when packaged as EPUB3. I haven't tried to package the files together and validate the resulting EPUB3, because I guess that there will be invalid HTML5 in the other files as well, which you need to fix. If you need a packaging tool for EPUB3, just let me know; I could develop a simple one which would be absolutely sufficient for these kinds of files. In any case, I don't know how you got those HTML files, whether you've written them yourself or obtained them from an application, but whoever or whatever is responsible for producing these files seems not to be overly concerned about web standards, but should be.
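The escaping steps from this post can be sketched in Python as follows. This is a rough illustration using only the standard library; the names `describe_u_escapes` and `escape_stray_percents` are invented here. Note one deliberate difference from the description above: the sketch leaves well-formed escapes such as %20 untouched and rewrites only a '%' that is not followed by two hexadecimal digits. Whether the website accepts the result is an assumption that has to be checked by opening the link.

```python
import html
import re
import unicodedata

STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
U_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})")

def describe_u_escapes(text):
    """List the Unicode character names behind non-standard %uXXXX sequences."""
    return [unicodedata.name(chr(int(digits, 16))) for digits in U_ESCAPE.findall(text)]

def escape_stray_percents(url):
    """Turn every '%' that does not start a valid %XX escape into '%25'."""
    return STRAY_PERCENT.sub("%25", url)

# Shortened excerpt of the query string from the validator message above.
query = "st=%u05DE%u05D9%20%u05E9"
names = describe_u_escapes(query)
print(names[0])

fixed = escape_stray_percents(query)
print(fixed)

# Last step: XML-escape the ampersands before placing the value in href.
attribute_value = html.escape("sits=1&req=36784&" + fixed)
print(attribute_value)
```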

odedta · 04-14-2014, 05:34 PM

First of all, I have to say I'm impressed and surprised by your extensive reply. I don't take it for granted, and I want to thank you for explaining everything in such great detail.

I am familiar with the HTML 4.01 and HTML5 standards; however, it never crossed my mind to use the doctype declaration, since other ePubs were passing validation via ePubCheck for ePub3. Thanks for pointing out the correct form. I was actually wondering why Calibre automatically inserts a meta tag for charset; I guess next time I need to stop and think before I do.

I have used an online URL escape tool via google search, input data was:

Quote:

http://hebrewbooks.org/pdfpager.aspx...=&pgnum=19

Output data:

Quote:

http%3A%2F%2Fhebrewbooks.org%2Fpdfpager.aspx%3Freq%3D22413%26amp%3Bst%3D%26amp%3Bpgnum%3D19

Now the:

Quote:

value of attribute "href" is invalid; must be a URI

error is gone but I get a new one:

Quote:

'Text/http://hebrewbooks.org/pdfpager.aspx': referenced resource missing in the package.

Do I really need to declare each link resource in the content.opf file for it to get passed validation or am I doing something wrong?

Calibre says:

Quote:

The resource pointed to by this link does not exist. You should either fix, or remove the link.

I assume the url conversion is not supported or something similar...

skreutzer · 04-15-2014, 01:21 PM

I wouldn't replace :// with %3A%2F%2F. There's no reason to do that, and it might even cause trouble in some instances. You only need to URL escape characters which are invalid in URLs; the protocol specification with :// and forward slashes are allowed in general, except in the arguments of HTTP GET data.

If your href attribute doesn't start with a protocol specification (like http://, file://, ftp:// ...), its value will get interpreted as a reference to a local file, as a path relative to the location of the current document.

Code:

Text/http://hebrewbooks.org/pdfpager.aspx

therefore references a file

Code:

http://hebrewbooks.org/pdfpager.aspx

in the

Code:

Text

subdirectory of the directory in which the HTML file is located, which obviously doesn't make any sense, because you're actually trying to create an HTTP link to an external resource. Solution: just remove the 'Text' part; you might have copied and pasted it in error.
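The relative-reference behaviour can be reproduced with Python's standard `urllib.parse` module. This is just a sketch to illustrate the rule; it assumes the link sits in Text/index.html (the file named in the validator message) and uses only the part of the over-escaped value before the query string.

```python
from urllib.parse import unquote, urljoin, urlsplit

document = "Text/index.html"
over_escaped = "http%3A%2F%2Fhebrewbooks.org%2Fpdfpager.aspx"

# With ':' written as %3A there is no scheme left, so the value is a relative reference.
scheme = urlsplit(over_escaped).scheme
print(repr(scheme))

# A relative reference is resolved against the location of the containing document.
resolved = unquote(urljoin(document, over_escaped))
print(resolved)

# The normally written URL keeps its scheme and is treated as an external link.
print(urlsplit("http://hebrewbooks.org/pdfpager.aspx").scheme)
```

In this sketch the leading Text/ in the resolved value comes from the folder of the document that contains the link.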

eschwartz · 04-17-2014, 02:05 AM

the Text part is from the relative linking. try pasting in (to the url escape tool) only the part after the "?" in the link.

That should properly escape only the parts that need to be escaped.
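Escaping only the part after the "?" can also be done without an online tool. Below is a minimal Python sketch; `escape_query_only` is a made-up helper name, and keeping '=' and '&' unescaped is a design choice of this example so that the parameters stay separate (many online tools would escape them too). It percent-encodes a literal '%' as well, so it is meant for input that has not been escaped yet.

```python
from urllib.parse import quote

def escape_query_only(url):
    """Percent-encode only the part after the first '?', keeping '=' and '&'."""
    base, separator, query = url.partition("?")
    return base + separator + quote(query, safe="=&")

unchanged = escape_query_only("http://hebrewbooks.org/pdfpager.aspx?req=22413&st=&pgnum=19")
print(unchanged)

# Hypothetical parameter value with a space, to show what actually gets escaped.
with_space = escape_query_only("http://hebrewbooks.org/pdfpager.aspx?st=a b")
print(with_space)
```

The ampersands in the result still have to be written as &amp; when the URL is placed in the href attribute.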

odedta · 04-17-2014, 09:27 AM

Quote:

Originally Posted by eschwartz

the Text part is from the relative linking. try pasting in (to the url escape tool) only the part after the "?" in the link.

That should properly escape only the parts that need to be escaped.

Exactly what I was thinking! did that and it passes validation, thanks eschwartz and skreutzer

eschwartz · 04-17-2014, 12:46 PM

Quote:

Originally Posted by odedta

Exactly what I was thinking! did that and it passes validation, thanks eschwartz and skreutzer
