MobileRead Forums > E-Book Formats > ePub
Old 04-12-2014, 01:05 PM   #1
odedta
Addict
Posts: 398
Karma: 96448
Join Date: Dec 2013
Device: iPad
URI issues

Hello,

When I validate an ePub I made using: http://validator.idpf.org/application/validate

I get an error saying:
Quote:
value of attribute "href" is invalid; must be a URI
From what I've read online, the cause might be the ampersands in the URL given in the href attribute.

example link that gives an error:
How can I get those links to pass validation and still make them go where they should?

On another note, I have a file that is 372 KB, and Calibre's Check Book tool reports that the file is too large. On which specific e-book readers might such files cause problems?
Does this limitation still exist in ePub 3? I don't mind performance issues.
Old 04-13-2014, 07:01 AM   #2
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
If you only care about iPads and EPUB 3 reading programs, then it doesn't matter how big the files are. Many other readers, however, will either slow down or fail to open them entirely.
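To see which files a size check like this would flag, a short script can list the content documents inside the EPUB (which is a ZIP container) whose uncompressed size exceeds a limit. A minimal sketch; the 260 KB default is an assumption, not a figure stated in this thread or taken from Calibre:

```python
import zipfile

# ASSUMPTION: 260 KB is a hypothetical default, not a documented Calibre value;
# set whatever limit your target readers need.
DEFAULT_LIMIT = 260 * 1024

CONTENT_SUFFIXES = (".html", ".xhtml", ".htm")


def oversized_content_files(epub_path, limit=DEFAULT_LIMIT):
    """Return (name, uncompressed size in bytes) for each (X)HTML file over limit."""
    results = []
    with zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            # file_size is the uncompressed size, i.e. what a reader has to
            # parse for that chapter, regardless of ZIP compression.
            if info.filename.lower().endswith(CONTENT_SUFFIXES) and info.file_size > limit:
                results.append((info.filename, info.file_size))
    return results
```

For example, `oversized_content_files("book.epub")` (a hypothetical file name) returns an empty list when every content file is under the limit.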
Old 04-13-2014, 09:22 AM   #3
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
If the big file has multiple chapters in it, you should split it anyway, for organizational and chapter-breaking purposes.
Old 04-13-2014, 01:46 PM   #4
skreutzer
Software Developer
Posts: 190
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
Your URL validated perfectly well in an EPUB2 when placed in an XHTML 1.1 file like this:

Code:
<a href="http://hebrewbooks.org/pdfpager.aspx?req=22413&amp;st=&amp;pgnum=19">Test</a>
The only way I found to reproduce the mentioned error message was something like

Code:
<a href="http:">Test</a>
where the URI syntax itself is invalid; otherwise, epubcheck would either complain about missing files or not complain at all. It would be a great help if you could show in more detail what your EPUB (especially the <a> element) looks like.
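The `&amp;` in the validated example is what makes a raw `&` legal inside an XHTML attribute. As a minimal sketch, assuming links are generated by a script rather than typed by hand, Python's standard `xml.sax.saxutils.quoteattr` performs that entity encoding (the helper name `make_anchor` is hypothetical):

```python
from xml.sax.saxutils import escape, quoteattr


def make_anchor(url, text):
    """Build an <a> element that is well-formed in an XHTML content document."""
    # quoteattr() escapes &, < and > and returns the value already wrapped in
    # quotes, so each raw "&" in the URL becomes "&amp;" in the markup.
    return f"<a href={quoteattr(url)}>{escape(text)}</a>"
```

Calling `make_anchor("http://hebrewbooks.org/pdfpager.aspx?req=22413&st=&pgnum=19", "Test")` yields the same `<a>` element as the validated example.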
Old 04-14-2014, 05:47 AM   #5
odedta
Addict
Posts: 398
Karma: 96448
Join Date: Dec 2013
Device: iPad
URI test

Thanks for the answers guys!

I have created a sample ePub 3 file which would pass validation if it weren't for the URI error; please see the attached file.
Attached Files
File Type: epub uri-test.epub (76.7 KB, 410 views)
Old 04-14-2014, 01:41 PM   #6
skreutzer
Software Developer
Posts: 190
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
Well, since you've provided an EPUB3 example in your last post, I conclude that you're aiming to build an EPUB3 file rather than EPUB2 (which wasn't explicitly stated in your initial post). Depending on the version of the EPUB standard, the package and the packaged files must look different. Validating your example file with the IDPF EPUB validator results in

Code:
Type: ERROR, File: Text/index.html, Line: 8, Position: 786, Message: value of attribute "href" is invalid; must be a URI
With this information, the source of the problem can be located. After unpacking your example file and looking at index.html in the "Text" subdirectory, it already looks suspicious: EPUB3 uses HTML5, but index.html looks like an incomplete XHTML file. Validating index.html with the W3C HTML validator at first seems to succeed, but as the validator points out in the warning message

Code:
No DOCTYPE found! Checking with default XHTML 1.0 Transitional Document Type.
that's only because the validator wasn't instructed to do HTML5 validation, so it picked XHTML 1.0 Transitional, against which the file validates. However, to trigger HTML5 validation in the W3C validator and satisfy EPUB3's requirement for HTML5, you need to place the HTML5 doctype

Code:
<!DOCTYPE html>
between the XML processing instruction and the root element:

Code:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
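If many files lack the doctype, inserting it can be scripted. The sketch below (the function name is hypothetical, and the regex assumes a simple XML declaration at the very start of the file) adds `<!DOCTYPE html>` right after the XML declaration. It uses the uppercase spelling because XML parsers, unlike HTML parsers, treat the `DOCTYPE` keyword as case-sensitive.

```python
import re

# Matches an XML declaration such as <?xml version='1.0' encoding='utf-8'?>
# at the start of the file, plus any whitespace that follows it.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def ensure_html5_doctype(markup):
    """Insert <!DOCTYPE html> after the XML declaration if no doctype exists."""
    if re.search(r"<!doctype", markup, re.IGNORECASE):
        return markup  # a doctype is already present; leave the file alone
    match = XML_DECL.match(markup)
    if match:
        decl = match.group(0).rstrip()
        rest = markup[match.end():]
        return f"{decl}\n<!DOCTYPE html>\n{rest}"
    # No XML declaration: the doctype simply goes first.
    return "<!DOCTYPE html>\n" + markup
```

The function is idempotent, so running it twice over the same file does not add a second doctype.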
Now validating with the W3C validator leads to a rather different result: 3 errors, 2 warnings. The first complaint is

Code:
The character encoding was not declared. Proceeding using windows-1252.
and the second is

Code:
Saw <?. Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.) <?xml version='1.0' encoding='utf-8'?>
which are somewhat connected. Unfortunately, HTML5's HTML syntax (as opposed to its XML syntax, XHTML) is not XML-compatible, so there's no way to specify the character encoding in the XML processing instruction as one would usually expect. Instead, the encoding needs to be specified with a meta element in the head:

Code:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>URI test</title>
    <meta charset="utf-8"/>
    [...]
  </head>
  [...]
</html>
while the XML processing instruction needs to be removed. After that, 1 error and 1 warning remain:

Code:
Bad value http://www.hebrewbooks.org/pdfpager.aspx?sits=1&req=36784&st=%u05DE%u05D9%20%u05E9%u05D9%u05E9%20%u05DC%u05D5%20%u05E7%u05E6%u05EA%20%u05E9%u05E8%u05E8%u05D4 for attribute href on element a: Percentage ("%") is not followed by two hexadecimal digits.
Obviously, unlike what you showed in your initial post, the URL is much longer than the portion you posted. The error message already points out the problem: URL escaping is done with a percent character followed by two hexadecimal digits. In your URL, the percent character is followed by the character 'u', and it certainly looks as though this was meant as Unicode character escaping, because 'u' probably stands for Unicode, and hexadecimal 05DE is HEBREW LETTER MEM (מ). The main question is whether the referenced website expects the data in this invalid format, and indeed, browsing the invalid URL leads to a different result than a URL without the st parameter in the HTTP GET request. To preserve the percent character while still complying with URL escaping rules, the percent character itself needs to be URL-escaped, because otherwise it would be interpreted as the start of a regular escape sequence: just replace every percent character with its URL-encoded representation %25, so %u05DE becomes %25u05DE. See

http://en.wikipedia.org/wiki/Percent-encoding

for more details on URL escaping. After that, the URL still won't pass HTML5 validation, because the ampersands are still missing their XML entity encoding as &amp;, which was present in the link you initially posted. After this last adjustment, index.html is valid HTML5 and therefore shouldn't cause any further problems when packaged as EPUB3. I haven't tried packaging the files together and validating the resulting EPUB3, because I suspect the other files are invalid HTML5 as well, which you'll need to fix. If you need a packaging tool for EPUB3, just let me know; I could write a simple one that would be entirely sufficient for files like these. In any case, I don't know how you arrived at these HTML files, whether you wrote them yourself or obtained them from an application, but whoever or whatever produced them doesn't seem overly concerned with web standards, and really should be.
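The %25 substitution can also be applied selectively, only to percent characters that are not already followed by two hexadecimal digits, so that valid escapes such as %20 in the URL stay as they are. This is a variant of replacing every percent character, not the same thing; which form the site actually expects is an assumption to verify by following the resulting link. A minimal Python sketch (the names are hypothetical):

```python
import re

# A "%" is only a valid escape when it is followed by two hex digits.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_stray_percents(url):
    """Turn each invalid '%' into '%25', leaving valid %XX escapes untouched."""
    return STRAY_PERCENT.sub("%25", url)
```

Applied to `...&st=%u05DE%u05D9%20%u05E9`, this produces `...&st=%25u05DE%25u05D9%20%25u05E9`; the ampersands still need `&amp;` once the URL goes into the href attribute.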
Old 04-14-2014, 05:34 PM   #7
odedta
Addict
Posts: 398
Karma: 96448
Join Date: Dec 2013
Device: iPad
First of all, I have to say I'm impressed and surprised by your extensive reply. It's not something to take for granted, and I want to thank you for explaining everything in such great detail.

I am familiar with the HTML 4.01 and HTML5 standards; however, it never crossed my mind to use the doctype declaration, since other ePubs were passing ePubCheck validation for ePub3. Thanks for pointing out the correct form. I was actually wondering why Calibre automatically inserts a meta tag for the charset; I guess next time I need to stop and think before I do

I used an online URL escape tool I found via Google search. The input data was:
Output data:
Quote:
http%3A%2F%2Fhebrewbooks.org%2Fpdfpager.aspx%3Freq%3D22413%26amp%3Bst%3D%26amp%3Bpgnum%3D19
Now the:
Quote:
value of attribute "href" is invalid; must be a URI
error is gone but I get a new one:
Quote:
'Text/http://hebrewbooks.org/pdfpager.aspx': referenced resource missing in the package.
Do I really need to declare each linked resource in the content.opf file for it to pass validation, or am I doing something wrong?

Calibre says:
Quote:
The resource pointed to by this link does not exist. You should either fix, or remove the link.
I assume the url conversion is not supported or something similar...

Last edited by odedta; 04-14-2014 at 05:43 PM.
Old 04-15-2014, 01:21 PM   #8
skreutzer
Software Developer
Posts: 190
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
I wouldn't replace :// with %3A%2F%2F; there's no reason to do that, and it might even cause trouble in some cases. You only need to URL-escape characters that are invalid in URLs. The :// of the protocol specification and forward slashes are allowed in general; only within the values of HTTP GET parameters might they need escaping.

If your href attribute doesn't start with a protocol specification (like http://, file://, ftp:// ...), its value will get interpreted as a reference to a local file, as a path relative to the location of the current document.

Code:
Text/http://hebrewbooks.org/pdfpager.aspx
therefore references a file

Code:
http://hebrewbooks.org/pdfpager.aspx
in the

Code:
Text
subdirectory of the directory in which the HTML file is located, which obviously doesn't make any sense, because you're actually trying to create an HTTP link to an external resource. Solution: just remove the 'Text' part; you might have copied and pasted it by mistake.
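To see where the `Text/` prefix comes from, one can resolve both forms of the link against the document's location with Python's `urllib.parse.urljoin`, used here as a stand-in for how a reading system or epubcheck resolves relative references. The document path `Text/index.html` is an assumption based on the file named in the validator error:

```python
from urllib.parse import quote, unquote, urljoin

# ASSUMPTION: location of the content document inside the EPUB package.
DOC = "Text/index.html"
url = "http://hebrewbooks.org/pdfpager.aspx?req=22413&st=&pgnum=19"

# Escaping the whole URL also encodes ":" and "/", so the href no longer
# starts with a scheme such as "http:" and is treated as a relative path.
fully_escaped = quote(url, safe="")
relative = urljoin(DOC, fully_escaped)

# An href that keeps "http://" is absolute and stays an external link.
absolute = urljoin(DOC, url)

# Decoding the relative result gives a local path under Text/, much like
# the "referenced resource missing" error.
decoded = unquote(relative)
```

Here `relative` comes out as `Text/http%3A%2F%2Fhebrewbooks.org%2F...`, while `absolute` is the original URL unchanged.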
Old 04-17-2014, 02:05 AM   #9
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
the Text part is from the relative linking. try pasting in (to the url escape tool) only the part after the "?" in the link.

That should properly escape only the parts that need to be escaped.
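Pasting the entire query into an escape tool also encodes the `=` and `&` separators (as `%3D` and `%26`), which the server may then read as one long parameter name. A sketch of an alternative, assuming the parameters are known as key/value pairs, is to encode only each key and value with Python's `urllib.parse.urlencode` (the helper `build_link` is hypothetical):

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_link(base_url, params):
    """Return base_url with a query built from params (a list of pairs).

    Only the individual keys and values are percent-encoded (as UTF-8);
    the "?", "=" and "&" separators stay literal, so the server still
    receives separate parameters. Any query already on base_url is replaced.
    """
    scheme, netloc, path, _old_query, fragment = urlsplit(base_url)
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

Inside the XHTML file the `&` separators still need to be written as `&amp;`. Whether the site accepts UTF-8 escapes such as `%D7%9E` in place of its `%u05DE` form is not something this sketch can tell; check by opening the link.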
Old 04-17-2014, 09:27 AM   #10
odedta
Addict
Posts: 398
Karma: 96448
Join Date: Dec 2013
Device: iPad
Quote:
Originally Posted by eschwartz View Post
the Text part is from the relative linking. try pasting in (to the url escape tool) only the part after the "?" in the link.

That should properly escape only the parts that need to be escaped.
Exactly what I was thinking! did that and it passes validation, thanks eschwartz and skreutzer
Old 04-17-2014, 12:46 PM   #11
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by odedta View Post
Exactly what I was thinking! did that and it passes validation, thanks eschwartz and skreutzer
All times are GMT -4.