#16
Wizard
Posts: 3,042
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
I ran the following on the strace output:
Code:
grep " E" strace_flightcrew.log | grep -v ENOENT
It shows EISDIR errors, generated when the program tries to open paths that have a full directory path but no file name. That seems suspicious to me. The ENOENT errors are expected, since stat is being used to check whether a file or directory already exists before it is created.
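For reference, the same filter can be sketched in Python, which also makes it easy to tally which errno names show up. It assumes strace's usual "= -1 ERRNAME (message)" suffix on failed calls; the function name and the sample lines are made up for illustration and are not taken from the actual log.

```python
import re
from collections import Counter

# Matches the tail of a failed syscall line in strace output, e.g.
#   open("/some/dir/", O_RDONLY) = -1 EISDIR (Is a directory)
# Assumption: strace's default "= -1 ERRNAME (message)" format.
FAILED_CALL = re.compile(r"= -1 (E[A-Z0-9]+) \(")


def unexpected_errors(lines, ignore=("ENOENT",)):
    """Yield (errno_name, line) for failed calls whose errno is not ignored."""
    for line in lines:
        match = FAILED_CALL.search(line)
        if match and match.group(1) not in ignore:
            yield match.group(1), line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from strace_flightcrew.log.
    sample = [
        'stat("/tmp/book/OEBPS", 0x7ffd) = -1 ENOENT (No such file or directory)\n',
        'open("/tmp/book/OEBPS/", O_RDONLY) = -1 EISDIR (Is a directory)\n',
        'open("/tmp/book/mimetype", O_RDONLY) = 3\n',
    ]
    counts = Counter(name for name, _ in unexpected_errors(sample))
    print(counts)  # Counter({'EISDIR': 1})
```

With a real log, pass the open file object instead of the sample list, e.g. unexpected_errors(open("strace_flightcrew.log")).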

#17
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
Okay, I ran flightcrew-plugin from the command line inside lldb, pointed at an unzipped folder holding your test ebook, and then set a breakpoint to catch the throwing of an exception.
Here is the backtrace:
Code:
(lldb) process launch dug/
Process 808 launched: './flightcrew-plugin' (x86_64)
Error during run: std::exception
Process 808 exited with status = 1 (0x00000001)
(lldb) b __cxa_throw
Breakpoint 1: where = libc++abi.dylib`__cxa_throw, address = 0x00007fff501bb1f4
(lldb) run
Process 816 launched: './flightcrew-plugin' (x86_64)
Process 816 stopped
* thread #1: tid = 0x22343, 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
->  0x7fff501bb1f4 <+0>: pushq %rbp
    0x7fff501bb1f5 <+1>: movq %rsp, %rbp
    0x7fff501bb1f8 <+4>: pushq %r15
    0x7fff501bb1fa <+6>: pushq %r14
(lldb) bt
* thread #1: tid = 0x22343, 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw
    frame #1: 0x000000010002dc09 flightcrew-plugin`void boost::throw_exception<FlightCrew::PathNotInUtf8>(FlightCrew::PathNotInUtf8 const&) + 201
    frame #2: 0x0000000100029d63 flightcrew-plugin`void boost::exception_detail::throw_exception_<FlightCrew::PathNotInUtf8>(FlightCrew::PathNotInUtf8 const&, char const*, char const*, int) + 179
    frame #3: 0x00000001000293ed flightcrew-plugin`FlightCrew::Util::Utf8PathToBoostPath(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 765
    frame #4: 0x000000010006467b flightcrew-plugin`FlightCrew::ReachabilityAnalysis::GetLinkedResourcesFromOps(boost::filesystem::path const&) + 683
    frame #5: 0x0000000100063590 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::GetLinkedResourcesFromAllOps(boost::unordered::unordered_set<boost::filesystem::path, boost::hash<boost::filesystem::path>, std::__1::equal_to<boost::filesystem::path>, std::__1::allocator<boost::filesystem::path> > const&) + 208
    frame #6: 0x00000001000630d6 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::GetDirectlyReachableResources(boost::unordered::unordered_set<boost::filesystem::path, boost::hash<boost::filesystem::path>, std::__1::equal_to<boost::filesystem::path>, std::__1::allocator<boost::filesystem::path> > const&) + 54
    frame #7: 0x000000010005fb42 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::DetermineReachableResources(boost::unordered::unordered_set<boost::filesystem::path, boost::hash<boost::filesystem::path>, std::__1::equal_to<boost::filesystem::path>, std::__1::allocator<boost::filesystem::path> > const&) + 82
    frame #8: 0x000000010005ebc9 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::ValidateXml(xercesc_3_1::DOMDocument const&, boost::filesystem::path const&) + 105
    frame #9: 0x0000000100023ee3 flightcrew-plugin`FlightCrew::ValidateOpf(boost::filesystem::path const&) + 451
    frame #10: 0x0000000100014ccb flightcrew-plugin`FlightCrew::DescendToContentXml(boost::filesystem::path const&) + 363
    frame #11: 0x0000000100015a82 flightcrew-plugin`FlightCrew::ValidateEpubRootFolder(boost::filesystem::path const&) + 402
    frame #12: 0x000000010000ae95 flightcrew-plugin`FlightCrew::ValidateEpubRootFolder(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 37
    frame #13: 0x0000000100001bf2 flightcrew-plugin`ValidateFiles(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 50
    frame #14: 0x0000000100002695 flightcrew-plugin`main + 1445
    frame #15: 0x00007fff52118115 libdyld.dylib`start + 1
    frame #16: 0x00007fff52118115 libdyld.dylib`start + 1
I will try to get more info.

#18
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
Looking more closely at the code, it is walking the xhtml or ncx files to check that every link target exists when it runs into something that will not convert to/from utf-8.
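As a rough illustration of that kind of walk (not FlightCrew's actual C++ code, which goes through Utf8PathToBoostPath as the backtrace shows), here is a Python sketch that collects every href in an xhtml file and flags the ones whose percent-decoded bytes are not valid utf-8. The helper name non_utf8_hrefs is hypothetical, and it only checks the encoding, not whether the link target exists.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_to_bytes


def non_utf8_hrefs(xhtml_text):
    """Return every href whose percent-decoded bytes are not valid UTF-8.

    Hypothetical helper for illustration; not part of FlightCrew.
    """
    root = ET.fromstring(xhtml_text)
    bad = []
    for element in root.iter():
        href = element.get("href")  # unprefixed attribute, so no namespace
        if href is None:
            continue
        try:
            # %XX escapes become raw bytes; strict UTF-8 decoding then
            # fails on any byte sequence that is not valid UTF-8.
            unquote_to_bytes(href).decode("utf-8")
        except UnicodeDecodeError:
            bad.append(href)
    return bad


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<a href="Text/ch01.xhtml">ok</a>'
        '<a href="https://example.com/a%BFb">bad</a>'
        "</body></html>"
    )
    print(non_utf8_hrefs(sample))  # ['https://example.com/a%BFb']
```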

#19
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
Without recompiling, it is hard to track down exactly what the issue is, but from searching your ebook for href attributes, you seem to be using external links to wikipedia and even a mailto href.
Please try removing those and see if your exception goes away. If so, we may be able to track down what is going on.

#20
Grand Sorcerer
Posts: 28,358
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Just deleting the "The_Bellybuttons.xhtml" file from the epub was enough to get rid of the exception for me. So there's something in that file.

#21
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
And by doing a binary search of the hrefs in that file, it seems the culprit is this line:
href="https://en.wikipedia.org/wiki/Chamb%C3%A9ry"
So the boost library code that unquotes that url seems to be barfing on Chambéry. But the utf-8 encoding of é is the byte sequence 0xc3 0xa9, so I think this is a valid url. The problem must be somewhere in boost's library, or there is a missing url unquote somewhere, but I am not sure where.
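The byte sequence is easy to confirm with Python's standard urllib.parse, whose quote and unquote use utf-8 by default; this shows the href above is ordinary, well-formed utf-8 percent-encoding.

```python
from urllib.parse import quote, unquote

# é is U+00E9; in UTF-8 it is the two-byte sequence 0xC3 0xA9.
assert "é".encode("utf-8") == b"\xc3\xa9"

# quote() percent-encodes using UTF-8 by default...
encoded = quote("Chambéry")
print(encoded)           # Chamb%C3%A9ry

# ...and unquote() (also UTF-8 by default) recovers the original name.
print(unquote(encoded))  # Chambéry
```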

#22
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2018
Device: Nexus 9
Hello guys!
I compiled the latest flightcrew release from this link: https://github.com/Sigil-Ebook/fligh...e/0.9.2.tar.gz and it still reports version 0.9.1. Is that OK? I don't think so.
Quote:
About epubcheck vs flightcrew: we actually run epubcheck and then also validate the epub against flightcrew, to avoid any risk of uploading a bad epub/mobi to Amazon. I also tried compiling the latest flightcrew release on my local PC (Gentoo multilib), and I still get the same error.

#23
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2018
Device: Nexus 9
I'll check this problem with links, thank you for the help.

#24
Grand Sorcerer
Posts: 28,358
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:

#25
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
Okay, I found out that my change just hid the problem by causing a normal error, so flightcrew never reached the problem statement.
I rebuilt flightcrew with some debugging statements added, and it reported the following url as causing the incorrect utf-8 encoding problem:
Code:
Utf8PathToBoostPath: https://www.normacomics.com/ficha.as...crees_que_eres
Error during run: std::exception
Code:
The_Bellybuttons.xhtml: <a class="external text el" href="https://www.normacomics.com/ficha.asp?562829001/0/ombligos_01:_%BFtu_quien_te_crees_que_eres">
It is in fact an unconverted latin-1 character: 0xBF (¿, the inverted question mark). So when fc tests this href, it tries to decode the path and finds a byte that is not valid utf-8.

The problem is that servers do not need to use utf-8 for their internal files and paths, and as a result you can end up with strange urls that cannot be represented in a utf-8 encoded xhtml file.

I am not sure what to do about this. An epub with lots of external http:// links will simply not work on many e-readers, and not at all on e-readers that have no network connection. AFAIK, it is illegal/discouraged in epub2 to load external resources that way. Epub2 is not a website in a box and was never meant to be. External resources are allowed in epub3 for audio and video (not text), but only if they are marked as external resources and listed as such in the epub3 manifest.

The real issue is that old servers can still use file paths and things that are in latin-1 (although this is highly discouraged), but urls in xhtml inside an epub must be represented in the encoding of the xhtml page, and that is utf-8. Either way, the correct thing to do in this case is not to tell flightcrew your xhtml file is encoded as utf-8 and then include a url in it that is latin-1 encoded but hidden behind url encoding.
Last edited by KevinH; 03-01-2018 at 04:37 PM.
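To see the difference concretely, here is a small Python sketch that decodes the offending path segment both ways; the segment is copied from the href above.

```python
from urllib.parse import unquote_to_bytes

# The offending part of the href from The_Bellybuttons.xhtml.
segment = "ombligos_01:_%BFtu_quien_te_crees_que_eres"
raw = unquote_to_bytes(segment)  # %BF becomes the single byte 0xBF

# As latin-1, 0xBF is the inverted question mark.
print(raw.decode("latin-1"))     # ombligos_01:_¿tu_quien_te_crees_que_eres

# As utf-8, a lone 0xBF is a continuation byte with no lead byte: invalid.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not utf-8:", err.reason)  # not utf-8: invalid start byte
```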

#26
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
FWIW: the proper utf-8 encoding for that character is %C2%BF.
After changing that one external url, flightcrew passed with no errors. Manually loading this url into a browser, in either encoding, results in "page not found", so the url is not currently valid anyway.
Last edited by KevinH; 03-01-2018 at 10:45 AM.
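If you know a percent-encoded url part was produced from latin-1, it can be re-encoded as utf-8 with Python's urllib.parse. This is a sketch under that assumption only (latin1_url_part_to_utf8 is a hypothetical helper): applied to a part that is already utf-8, it would double-encode it, turning %C3%A9 into %C3%83%C2%A9.

```python
from urllib.parse import quote, unquote_to_bytes


def latin1_url_part_to_utf8(part):
    """Re-percent-encode a latin-1 percent-encoded URL part as UTF-8.

    Hypothetical helper. Assumes *every* escaped byte in `part` is latin-1,
    which is only safe if the part was never UTF-8 to begin with.
    """
    text = unquote_to_bytes(part).decode("latin-1")  # 0xBF -> '¿'
    return quote(text, safe="/:")                     # '¿' -> '%C2%BF'


if __name__ == "__main__":
    print(latin1_url_part_to_utf8("ombligos_01:_%BFtu_quien_te_crees_que_eres"))
    # ombligos_01:_%C2%BFtu_quien_te_crees_que_eres
```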

#27
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
Okay, I have now pushed commits to flightcrew master that should handle this a bit better.
Instead of silently barfing up an exception and terminating when a url to an external resource is not properly utf-8 encoded, flightcrew will now simply print a warning to stderr so that the end user knows something is up. Flightcrew will not report this as a real validation error or warning, since that path may be a legitimate file path on the external server and it is properly url-encoded latin-1.
Here is the result of flightcrew-plugin, built with the new code, running on an unpacked copy of your problem epub:
Code:
KevinsiMac:Desktop kbhend$ ./flightcrew-plugin hug/
Warning: URL Not properly utf-8 encoded: https://www.normacomics.com/ficha.as...crees_que_eres
No problems found.
So please pull from flightcrew master, rebuild, and let me know if this works better for you.
Last edited by KevinH; 03-01-2018 at 12:11 PM.
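The new behaviour can be sketched in Python as follows. This is not the code that was pushed to flightcrew master: the set of schemes treated as external and the choice to keep raising for badly encoded internal paths are assumptions made for illustration; only the warning text is copied from the output above.

```python
import sys
from urllib.parse import unquote_to_bytes, urlsplit

# Assumption: which schemes count as "external" is chosen for illustration.
EXTERNAL_SCHEMES = {"http", "https", "ftp", "mailto"}


def check_href(href):
    """Return True if href decodes as UTF-8, False if it only warrants a warning.

    Sketch only: a badly encoded external URL prints a warning to stderr and
    validation continues; a badly encoded internal path is assumed to still
    be a hard error (ValueError here).
    """
    try:
        unquote_to_bytes(href).decode("utf-8")
        return True
    except UnicodeDecodeError:
        if urlsplit(href).scheme.lower() in EXTERNAL_SCHEMES:
            print(f"Warning: URL Not properly utf-8 encoded: {href}", file=sys.stderr)
            return False
        raise ValueError(f"path not in utf-8: {href}") from None
```

Returning instead of raising for external links is what lets the run finish with "No problems found." while still telling the user which url is suspect.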

#28
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2018
Device: Nexus 9
Quote:
Thanks for the extra information. I get this result using Python's urllib.parse.quote method, and this url works fine in a browser:
https://en.wikipedia.org/wiki/Chamb%C3%A9ry
But the one you mentioned, https://en.wikipedia.org/wiki/Chamb%C2%BFry, seems incorrect to me.
Quote:

#29
Sigil Developer
Posts: 8,483
Karma: 5703586
Join Date: Nov 2009
Device: many
No,
the actual error was after that line, in that same file. See this post above for the real problem: https://www.mobileread.com/forums/sh...2&postcount=25
That url is incorrect because it uses only %BF (the latin-1 inverted question mark), which on its own is not a proper utf-8 character. In utf-8 that character is 0xc2 0xbf, so the url is not utf-8 compliant. I have modified flightcrew master to spit out a warning on this type of problem; see this post: https://www.mobileread.com/forums/sh...4&postcount=27
KevinH
Quote:

#30
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
@Sergey... I agree with Doitsu: it's better to use Epubcheck because it's more up to date. Or you could use the online validator from IDPF:
http://validator.idpf.org/
Doing either of the above should at least confirm whether your epub is OK or not.

Tags
flightcrew, std
Thread | Thread Starter | Forum | Replies | Last Post |
Standalone FlightCrew | capidamonte | Sigil | 8 | 04-25-2012 05:20 PM |
Is this a FlightCrew bug? | JSWolf | Sigil | 5 | 10-04-2011 04:01 AM |
Bug in FlightCrew | JSWolf | Sigil | 11 | 07-30-2011 04:12 AM |
Standalone flightcrew? | bfollowell | Sigil | 4 | 06-30-2011 11:21 AM |
FlightCrew and Norton IS | bobcdy | ePub | 5 | 11-16-2010 05:28 PM |