Old 02-28-2018, 12:18 PM   #16
rkomar
Wizard
Posts: 2,985
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
I ran the following on the strace output:

grep " E" strace_flightcrew.log | grep -v ENOENT

and I see EISDIR errors generated when an open is attempted on a full directory path with no file name at the end. That seems suspicious to me. The ENOENT errors are expected, since stat is being used to check whether a file or directory already exists before it is created.
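For anyone who would rather do the same filtering in a script, here is a rough Python sketch of the idea. It is not the grep pipeline itself: it assumes the usual strace line ending of `= -1 ERRNAME (description)`, and the function name `unexpected_errors` and the sample lines are made up for illustration.

```python
import re

# Tail of a failed syscall line in strace output, e.g.
#   open("/tmp/x/", O_WRONLY|O_CREAT, 0666) = -1 EISDIR (Is a directory)
FAILED_CALL = re.compile(r"= -1 (E[A-Z0-9]+) \(")

def unexpected_errors(lines, ignore=("ENOENT",)):
    """Return (errno_name, line) pairs for failed calls not listed in `ignore`."""
    hits = []
    for line in lines:
        match = FAILED_CALL.search(line)
        if match and match.group(1) not in ignore:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits

# Hypothetical sample lines standing in for strace_flightcrew.log
sample = [
    'stat("/tmp/book/OEBPS", 0x7ffd1234) = -1 ENOENT (No such file or directory)\n',
    'open("/tmp/book/OEBPS/", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 EISDIR (Is a directory)\n',
    'close(3) = 0\n',
]
for name, line in unexpected_errors(sample):
    print(name, line, sep="\t")
```

To run it on the real log, pass an open file object for strace_flightcrew.log instead of the sample list.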
Old 02-28-2018, 01:33 PM   #17
KevinH
Sigil Developer
Posts: 7,630
Karma: 5433388
Join Date: Nov 2009
Device: many
Okay, I ran flightcrew-plugin from the command line inside lldb, pointed at an unzipped folder holding your test ebook, and then set a breakpoint to catch the throwing of an exception.

Here is the backtrace:
Code:
(lldb) process launch dug/
Process 808 launched: './flightcrew-plugin' (x86_64)
Error during run: std::exception
Process 808 exited with status = 1 (0x00000001) 
(lldb) b __cxa_throw
Breakpoint 1: where = libc++abi.dylib`__cxa_throw, address = 0x00007fff501bb1f4
(lldb) run
Process 816 launched: './flightcrew-plugin' (x86_64)
Process 816 stopped
* thread #1: tid = 0x22343, 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
->  0x7fff501bb1f4 <+0>: pushq  %rbp
    0x7fff501bb1f5 <+1>: movq   %rsp, %rbp
    0x7fff501bb1f8 <+4>: pushq  %r15
    0x7fff501bb1fa <+6>: pushq  %r14
(lldb) bt
* thread #1: tid = 0x22343, 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x00007fff501bb1f4 libc++abi.dylib`__cxa_throw
    frame #1: 0x000000010002dc09 flightcrew-plugin`void boost::throw_exception<FlightCrew::PathNotInUtf8>(FlightCrew::PathNotInUtf8 const&) + 201
    frame #2: 0x0000000100029d63 flightcrew-plugin`void boost::exception_detail::throw_exception_<FlightCrew::PathNotInUtf8>(FlightCrew::PathNotInUtf8 const&, char const*, char const*, int) + 179
    frame #3: 0x00000001000293ed flightcrew-plugin`FlightCrew::Util::Utf8PathToBoostPath(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 765
    frame #4: 0x000000010006467b flightcrew-plugin`FlightCrew::ReachabilityAnalysis::GetLinkedResourcesFromOps(boost::filesystem::path const&) + 683
    frame #5: 0x0000000100063590 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::GetLinkedResourcesFromAllOps(boost::unordered::unordered_set<boost::filesystem::path, boost::hash<boost::filesystem::path>, std::__1::equal_to<boost::filesystem::path>, std::__1::allocator<boost::filesystem::path> > const&) + 208
    frame #6: 0x00000001000630d6 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::GetDirectlyReachableResources(boost::unordered::unordered_set<boost::filesystem::path, boost::hash<boost::filesystem::path>, std::__1::equal_to<boost::filesystem::path>, std::__1::allocator<boost::filesystem::path> > const&) + 54
    frame #7: 0x000000010005fb42 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::DetermineReachableResources(boost::unordered::unordered_set<boost::filesystem::path, boost::hash<boost::filesystem::path>, std::__1::equal_to<boost::filesystem::path>, std::__1::allocator<boost::filesystem::path> > const&) + 82
    frame #8: 0x000000010005ebc9 flightcrew-plugin`FlightCrew::ReachabilityAnalysis::ValidateXml(xercesc_3_1::DOMDocument const&, boost::filesystem::path const&) + 105
    frame #9: 0x0000000100023ee3 flightcrew-plugin`FlightCrew::ValidateOpf(boost::filesystem::path const&) + 451
    frame #10: 0x0000000100014ccb flightcrew-plugin`FlightCrew::DescendToContentXml(boost::filesystem::path const&) + 363
    frame #11: 0x0000000100015a82 flightcrew-plugin`FlightCrew::ValidateEpubRootFolder(boost::filesystem::path const&) + 402
    frame #12: 0x000000010000ae95 flightcrew-plugin`FlightCrew::ValidateEpubRootFolder(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 37
    frame #13: 0x0000000100001bf2 flightcrew-plugin`ValidateFiles(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 50
    frame #14: 0x0000000100002695 flightcrew-plugin`main + 1445
    frame #15: 0x00007fff52118115 libdyld.dylib`start + 1
    frame #16: 0x00007fff52118115 libdyld.dylib`start + 1
So it is walking the OPF, making sure that all files are there, and runs into a filename or path that cannot be converted to utf-8 via boost.

I will try to get more info.
Old 02-28-2018, 01:59 PM   #18
KevinH
Sigil Developer
Looking closer at the code, it is walking the xhtml or ncx to check for the existence of every link when it runs into something that will not convert to/from utf-8.
Old 02-28-2018, 02:22 PM   #19
KevinH
Sigil Developer
Without recompiling, it is hard to track down exactly what the issue is, but from searching your ebook for href attributes, you seem to be using external links to wikipedia and even a mailto href.

Please try removing those and see if your exception goes away. If so, we may be able to track down what is going on.
Old 02-28-2018, 04:07 PM   #20
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Just deleting the "The_Bellybuttons.xhtml" file from the epub was enough to get rid of the exception for me. So there's something in that file.
Old 02-28-2018, 04:47 PM   #21
KevinH
Sigil Developer
And by a binary search of hrefs in that file, it seems the culprit is this line:

href="https://en.wikipedia.org/wiki/Chamb%C3%A9ry"

So the boost library used to unquote that url seems to be barfing on Chambéry.

The utf-8 encoding of é is the byte sequence 0xc3 0xa9, so I think this is a valid url.

The problem must be somewhere in boost's library, but I am not sure where, or else there is a missing url unquote someplace.
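As a quick sanity check of the claim that this url is valid, the percent-escapes can be decoded to raw bytes and then tested as utf-8. This is only a Python sketch using the standard library, not what flightcrew or boost does internally, and the helper name `decodes_as_utf8` is made up for illustration.

```python
from urllib.parse import unquote_to_bytes

def decodes_as_utf8(url):
    """True if the percent-decoded bytes of `url` form valid UTF-8."""
    try:
        unquote_to_bytes(url).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

url = "https://en.wikipedia.org/wiki/Chamb%C3%A9ry"
raw = unquote_to_bytes(url)

print(raw[-5:])              # b'b\xc3\xa9ry'
print(decodes_as_utf8(url))  # True
```

If this check passes for the url, the offending href is more likely to be some other link in the same file.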
Old 02-28-2018, 05:35 PM   #22
Sergey Glazyrin
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2018
Device: Nexus 9
Hello guys!
I compiled the latest flightcrew release from this link: https://github.com/Sigil-Ebook/fligh...e/0.9.2.tar.gz and it still reports version 0.9.1.
Is that OK? I don't think so.
Quote:
sergeyg@sergeyg-gentoo ~/tmp/flightcrew/flightcrew-0.9.2 $ bin/flightcrew-cli --version
flightcrew-cli version: 0.9.1
It still produces std::exception for this epub.

About epubcheck vs flightcrew: we actually run epubcheck first and then validate the epub against flightcrew as well, to avoid any risk of uploading a bad epub/mobi to Amazon.

I also tried compiling the latest flightcrew release on my local PC (gentoo multilib), and I still get the same error.
Old 02-28-2018, 05:37 PM   #23
Sergey Glazyrin
Junior Member
I'll check this problem with links, thank you for the help.
Old 02-28-2018, 06:37 PM   #24
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Sergey Glazyrin View Post
Hello guys!
I compiled a flightcrew latest release from this link: https://github.com/Sigil-Ebook/fligh...e/0.9.2.tar.gz and it still shows me: 0.9.1
Is it ok ? I don't think so..
It's fine. We just forgot to advance the version string. You're using the latest.
Old 03-01-2018, 10:36 AM   #25
KevinH
Sigil Developer
Okay, I found out that my change just hid the problem by causing a normal error so flightcrew never reached the problem statement.

I rebuilt flightcrew with some debugging statements added and received the following url as causing the incorrect utf8-encoding problem:

Code:
Utf8PathToBoostPath: https://www.normacomics.com/ficha.as...crees_que_eres
Error during run: std::exception
A grep for this string in the book html found the following:

Code:
The_Bellybuttons.xhtml:
<a class="external text el" href="https://www.normacomics.com/ficha.asp?562829001/0/ombligos_01:_%BFtu_quien_te_crees_que_eres">
And in this case flightcrew is correct: the encoded part is %BF, that is, the single byte 0xbf, which on its own is not a valid utf-8 sequence.

It is in fact an unconverted latin-1 character:
BF ¿ &iquest; inverted question mark

So flightcrew, in trying to test this href, tries to decode the path and finds a byte that is not valid utf-8.
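The same effect is easy to reproduce outside flightcrew. The Python sketch below (standard library only, not flightcrew's own code) percent-decodes the tail of that href and shows that the bytes fail as utf-8 but decode cleanly as latin-1.

```python
from urllib.parse import unquote_to_bytes

escaped = "%BFtu_quien_te_crees_que_eres"
raw = unquote_to_bytes(escaped)   # first byte is 0xbf

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xbf is a utf-8 continuation byte and cannot start a character
    utf8_ok = False

as_latin1 = raw.decode("latin-1")

print(utf8_ok)    # False
print(as_latin1)  # ¿tu_quien_te_crees_que_eres
```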

The problem is that servers do not need to use utf-8 for their internal files and paths and as a result you can end up with strange urls that can not be represented in a utf-8 encoded xhtml file.

I am not sure what to do about this.

An epub with lots of external http:// links will simply not work on many e-readers and on e-readers that have no network connection.

AFAIK, it is illegal/discouraged in epub2 to load external resources that way. Epub2 is not a website in a box and was never meant to be.

They are allowed in epub3 for audio and video resources (not text) but only if marked as external resources and listed in the epub3 manifest as such.

The real issue is that old servers can still use file paths and other things that are in latin-1 (although this is highly discouraged), but urls in xhtml inside an epub must be represented in the encoding of the xhtml page, and that is utf-8.

Either way, the correct thing to do in this case is not to tell flightcrew your xhtml file is encoded as utf-8 and then have a url in it that is latin-1 encoded but hidden behind url-encoding.

Last edited by KevinH; 03-01-2018 at 04:37 PM.
Old 03-01-2018, 10:41 AM   #26
KevinH
Sigil Developer
FWIW: The proper utf-8 encoding for that character is %C2%BF

After changing that one external url, flightcrew passed with no errors.

Manually loading this url in either encoding into a browser results in page not found so the url is not currently valid anyway.
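If there are many such urls to repair, the re-encoding can be scripted. The following is a minimal Python sketch, assuming the escapes really are latin-1 and that it is applied only to a path fragment, not a whole url (any other escapes in the fragment, such as %2F, would also be decoded). The helper name `latin1_escapes_to_utf8` is hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

def latin1_escapes_to_utf8(fragment):
    """Re-encode latin-1 percent-escapes in a url fragment as utf-8 escapes."""
    text = unquote_to_bytes(fragment).decode("latin-1")
    # quote() encodes non-ASCII characters as utf-8 by default;
    # keep "/" and ":" unescaped so the fragment's shape is preserved.
    return quote(text, safe="/:")

print(quote("¿"))
# %C2%BF
print(latin1_escapes_to_utf8("ombligos_01:_%BFtu_quien_te_crees_que_eres"))
# ombligos_01:_%C2%BFtu_quien_te_crees_que_eres
```

Whether the server still resolves the re-encoded url is a separate question, since the server may expect the original latin-1 bytes.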

Last edited by KevinH; 03-01-2018 at 10:45 AM.
Old 03-01-2018, 11:29 AM   #27
KevinH
Sigil Developer
Okay, I have now pushed commits to flightcrew master to hopefully handle this a bit better.

Instead of silently barfing up an exception and terminating when a url to an external resource is not properly utf-8 encoded, flightcrew will now simply print a warning to stderr so that the end user knows something is up.

Flightcrew will not report a real error or warning in this case as that path may be a legitimate file path on that external server and it is properly url encoded latin-1.
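For readers who want to scan their own xhtml for such links before running flightcrew, the warn-and-continue behavior described above can be approximated in a few lines of Python. This is only an illustration of the idea and not the actual C++ change in flightcrew; the function name `check_hrefs`, the regular expressions, and the sample markup are assumptions made for the example.

```python
import re
import sys
from urllib.parse import unquote_to_bytes

HREF = re.compile(r'href="([^"]*)"')
EXTERNAL = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")  # url starts with a scheme

def check_hrefs(xhtml_text, warn=sys.stderr):
    """Warn about external hrefs whose percent-decoded bytes are not utf-8.

    Returns the offending urls; a bad url never raises an exception.
    """
    bad = []
    for url in HREF.findall(xhtml_text):
        if not EXTERNAL.match(url):
            continue  # internal links are not checked in this sketch
        try:
            unquote_to_bytes(url).decode("utf-8")
        except UnicodeDecodeError:
            print(f"Warning: URL Not properly utf-8 encoded: {url}", file=warn)
            bad.append(url)
    return bad

sample = (
    '<a href="https://en.wikipedia.org/wiki/Chamb%C3%A9ry">ok</a>\n'
    '<a class="external text el" href="https://www.normacomics.com/ficha.asp'
    '?562829001/0/ombligos_01:_%BFtu_quien_te_crees_que_eres">bad</a>\n'
)
flagged = check_hrefs(sample)
print(len(flagged))  # 1
```

Only the second link is flagged, which matches the single warning in the flightcrew output for this epub.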


Here is a result of flightcrew-plugin built with the new code running on an unpacked copy of your problem epub:

Code:
KevinsiMac:Desktop kbhend$ ./flightcrew-plugin hug/
Warning: URL Not properly utf-8 encoded: https://www.normacomics.com/ficha.as...crees_que_eres
No problems found.
This should enable you to track down these strange urls in your code and, if needed, "fix" them, if they indeed need to be "fixed" at all.

So please pull from flightcrew master and rebuild and let me know if this works better for you.

Last edited by KevinH; 03-01-2018 at 12:11 PM.
Old 03-01-2018, 07:35 PM   #28
Sergey Glazyrin
Junior Member
Quote:
Originally Posted by KevinH View Post
FWIW: The proper utf-8 encoding for that character is %C2%BF

After changing that one external url, flightcrew passed with no errors.

Manually loading this url in either encoding into a browser results in page not found so the url is not currently valid anyway.
Hello Kevin
Thanks for extra information.
I get this result using python urllib.parse.quote method
And this url works fine in a browser.
https://en.wikipedia.org/wiki/Chamb%C3%A9ry

But the one you mentioned, this one: https://en.wikipedia.org/wiki/Chamb%C2%BFry
seems to me incorrect.
Quote:
In [25]: from urllib.parse import unquote

In [26]: unquote(a)
Out[26]: '¿'

In [27]: unquote('%C2%BF')
Out[27]: '¿'

In [28]: unquote('%C3%A9')
Out[28]: 'é'
Old 03-01-2018, 09:27 PM   #29
KevinH
Sigil Developer
No,
The actual error was after that line in that same file.

See this post above for the real problem:

https://www.mobileread.com/forums/sh...2&postcount=25

That is an incorrect url as it uses only %BF (the Latin-1 inverted question mark), which is not a proper utf-8 character. That character in utf-8 is 0xc2 0xbf, and so the url is not utf-8 compliant.

I have modified flightcrew master to spit out a warning on this type of problem:

See this post:

https://www.mobileread.com/forums/sh...4&postcount=27

KevinH


See
Quote:
Originally Posted by Sergey Glazyrin View Post
Hello Kevin
Thanks for extra information.
I get this result using python urllib.parse.quote method
And this url works fine in a browser.
https://en.wikipedia.org/wiki/Chamb%C3%A9ry

But the one you mentioned, this one: https://en.wikipedia.org/wiki/Chamb%C2%BFry
seems to me incorrect.
Old 03-02-2018, 12:06 AM   #30
slowsmile
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
@Sergey...I agree with Doitsu -- it's better to use Epubcheck because it's more up to date. Or you could perhaps use the validator from IDPF online, shown here:

http://validator.idpf.org/

Doing the above should, at least, confirm whether your epub is OK or not.