08-12-2013, 03:10 PM | #31 | |
Junior Member
Posts: 8
Karma: 2062
Join Date: Aug 2011
Location: Queens, NYC
Device: Kindle
|
Quote:
With the inclusion of the Doctype entry, as has been stated previously, everything worked fine. There was no 'not well formed' message, the nbsp entries were included, and the file validated. My question is - Could the presence of 'Doctype' be noted, and inserted if not present, as the very first thing when the xhtml or html files are parsed? Bob |
|
08-12-2013, 03:59 PM | #32 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
But I'm not so sure a DOCTYPE is needed in order to use entities. It might be needed if you want to ensure XHTML compliance on its own, but inside an ePub the requirements for XHTML documents seem to be a bit different, and I'm not claiming I fully understand it. What does epubcheck say? |
|
08-12-2013, 05:58 PM | #33 |
Junior Member
Posts: 8
Karma: 2062
Join Date: Aug 2011
Location: Queens, NYC
Device: Kindle
|
Jellby, I think you lost me not too far after the above. I did run my test epub through EpubCheck after it was cleaned by Sigil and there were no problems noted.
At this point, I think I'm beyond my knowledge level. I just wanted to put out there the idea to see if someone may want to look at the order of how the files are parsed when cleaned, whether it's an xhtml or html file, to get the nbsp replacement code in v0.7.3 to work. Bob |
08-13-2013, 03:13 AM | #34 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I meant that adding a DOCTYPE when there was none to start with might be incorrect (whether or not flightcrew and/or epubcheck complain about it is a different matter).
Quote:
|
|
08-13-2013, 09:17 AM | #35 | ||
Junior Member
Posts: 8
Karma: 2062
Join Date: Aug 2011
Location: Queens, NYC
Device: Kindle
|
Quote:
Isn't it added though, every time an epub is cleaned by Sigil? Quote:
I'm still thinking that if DOCTYPE could be checked for first (If at all possible) at the start of the cleaning process, everything that followed would work like DOCTYPE had always been there. I realize it's easier said than done! |
||
08-13-2013, 11:06 AM | #36 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
With the new version of Sigil 0.7.3 (Linux 64 bits, compiled by DiapDealer), I can use nbsp normally, as I could before (for example, in version 0.5.3). They are no longer lost on opening, and this is very good. However, I failed to insert nnbsp (in either the &#_x202f; or &#_8239; form - without the _ of course) in an EPUB. When I saved and reopened the EPUB they were gone, replaced by a white space. So far, I cannot pin this problem squarely on Sigil, and I still have some checks to do. I hope to be able to post a test EPUB soon. This is because with ADE 2.0 I am also experiencing some other unexplained problems: the text-indent is ignored, and a space between paragraphs has been added without any code for it in the CSS. So, there could be some other murky things going on... Last edited by roger64; 08-13-2013 at 11:09 AM. |
08-14-2013, 02:15 AM | #37 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
nnbsp (narrow no-break space) bug in Sigil 0.7.3
Usually with Sigil, the nnbsp entities are not displayed in code view: this means that you can ascertain the presence of an nnbsp only by using your cursor. For example, when an nnbsp has been inserted between e and ; you will need three cursor moves to go from e to ; but you will see only two characters (in code view). In book view, you can see the small white space.
For many months now, I have been consistently producing epubs for MR using some nbsp and mostly nnbsp, according to French typographic rules. For this, I always used some version of Sigil to fine-tune the book and insert the nnbsp within the EPUB.
The version of Sigil (0.7.3) I use has been compiled by DiapDealer for Linux 64 bits, like the previous one I used. It just seems to ignore the nnbsp:
- when I insert an nnbsp (either as &#_x202f; or as &#_8239; - without the _ of course) it does not appear again once the EPUB has been saved and reopened. I consistently get a normal white space.
- worse, when I reopen one of my former ebooks which has been published with nnbsp throughout, these entities are not displayed. One can see white spaces instead of nnbsp. Here for example.
@DiapDealer In the meantime, I would need to reinstall your former deb version (0.7.0). Could you provide me with a link? Last edited by roger64; 08-14-2013 at 02:28 AM. |
08-14-2013, 03:40 AM | #38 | |
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
Quote:
The alternative, before 0.7.3, was that if you had an nbsp character in a UTF8 file, Sigil would remove it and replace it with a normal space - regardless of cleaning settings. So you were definitely losing information (enough that at least some people wanted it fixed). Now, Sigil is preserving the nbsp character but to do so it has to convert it from a character to an entity so it doesn't get lost. For files with DOCTYPE already defined it isn't an issue. But in files that don't have the DOCTYPE set it means you either need to manually add the DOCTYPE or allow Sigil to clean the file. You still have the issue that if you manually insert an nbsp character in Sigil (not entity) it will immediately become a normal space. I think the biggest issue was not knowing why it was suddenly giving the error - at least now it's a little clearer why the error message is shown. |
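As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not Sigil's code; the helper names (`has_doctype`, `nbsp_to_entity`, `needs_doctype`) are hypothetical, and the DOCTYPE check is a deliberately loose regular expression. It shows the character-to-entity conversion and why a file without a DOCTYPE then needs attention.

```python
import re

NBSP = "\u00a0"  # the no-break space character

# Assumption: a loose pattern is enough to spot an (X)HTML DOCTYPE declaration.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html", re.IGNORECASE)


def has_doctype(xhtml: str) -> bool:
    """Return True if the text contains something that looks like a DOCTYPE."""
    return DOCTYPE_RE.search(xhtml) is not None


def nbsp_to_entity(xhtml: str) -> str:
    """Replace every no-break space character with the &nbsp; entity."""
    return xhtml.replace(NBSP, "&nbsp;")


def needs_doctype(xhtml: str) -> bool:
    """True if the converted text uses &nbsp; but declares no DOCTYPE."""
    return "&nbsp;" in nbsp_to_entity(xhtml) and not has_doctype(xhtml)
```

For a file that has an nbsp character and no DOCTYPE, `needs_doctype` returns True, which corresponds to the case where you either add the DOCTYPE by hand or let Sigil clean the file.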
|
08-14-2013, 04:01 AM | #39 | |
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
Quote:
The attached example epub contains nbsp, nnbsp, and mdash entities and characters (the html has been hand edited outside of Sigil since Sigil won't save an nbsp character - if you open the epub in 0.7.3 you will see that both nbsp entries turn into entities). The nnbsp character and entity show up as a space in Code View. But if you actually do a Find for an nnbsp character it will find the character, even after saving and re-opening. (As you probably already know, you can create this character by, for example, opening gedit on linux, typing ctrl-shift-u followed by 202f [RETURN], and can then paste that into Sigil. Or just copy it from Book View.) |
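Since the nnbsp character looks like an ordinary space in Code View, one way to confirm it survived a save is to check the file outside Sigil. The following is a small Python sketch under that assumption; `find_nnbsp` and `show_nnbsp` are hypothetical helper names, and the `[NNBSP]` marker is only an illustrative choice.

```python
NNBSP = "\u202f"  # narrow no-break space


def find_nnbsp(text: str) -> list[int]:
    """Return the character offsets of every narrow no-break space."""
    return [i for i, ch in enumerate(text) if ch == NNBSP]


def show_nnbsp(text: str, marker: str = "[NNBSP]") -> str:
    """Return a copy of the text with each nnbsp replaced by a visible marker."""
    return text.replace(NNBSP, marker)
```

Running `show_nnbsp` over the text of an extracted XHTML file gives a copy in which the narrow spaces are visible, without touching the EPUB itself.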
|
08-14-2013, 04:22 AM | #40 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
@meme
Thanks very much for your quick and informative reply. nnbsp seems to get better and better support. For this version of Sigil (0.7.3), this new and unannounced display of nnbsp confused me and made me look for a nonexistent and unneeded solution for quite a long time. For your information, a bug report about the display of nnbsp has been sent on Bugzilla for LibreOffice users (odt files). https://bugs.freedesktop.org/show_bug.cgi?id=67669 It is probable that this display of nnbsp as a plain white space in Code view will confuse some other people. It would probably be better to use some kind of greyed character. Hopefully, this will be for Qt 6. ;-) Last edited by roger64; 08-14-2013 at 09:23 AM. |
08-14-2013, 09:49 AM | #41 |
Junior Member
Posts: 8
Karma: 2062
Join Date: Aug 2011
Location: Queens, NYC
Device: Kindle
|
If I'm comfortable with how my epub was constructed prior to opening it with Sigil, then all I need to do is answer YES to fixing the 'not well formed HTML', my nbsp will be back, and I'll be a happy camper?
|
08-14-2013, 08:16 PM | #42 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Feature Requests:
Tools - Reports - Characters in HTML Files
Currently, this report correctly goes from HTML entity names/decimal/hexadecimal and actual characters -> the report. What it does not do is go the opposite way. When double-clicking on a character in the report, it brings you to the next instance of only the actual character, but it does not find named/decimal/hexadecimal instances. I attached an image showing how clicking on 'ô' does not lead to "ô" in the HTML file.
Tools - Reports - Links
When clicking on a link in this report, you would expect it to lead you directly to the location. Instead, it seems to only open up the HTML file in which the link occurs. From there, you have to manually search for the link.
Entity -> Character + Character -> Entity
Also, a nice thing to have added might be a setting for Sigil to automatically go from entity (names/decimal/hexadecimal) -> character, and character -> entity (names/decimal/hexadecimal). Currently, I am doing this the slow way by mass running a huge batch of Saved Searches. |
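Until something like this exists in Sigil, the conversion can be approximated outside it. Below is a minimal Python sketch, not a Sigil feature: `entities_to_chars` and `chars_to_entities` are hypothetical names, the entity table is the HTML 4 set from Python's standard `html.entities` module, and the markup-significant characters (`<`, `>`, `&`, quotes) are deliberately left as entities so the XHTML stays well formed.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Characters that must stay escaped so the markup itself is not broken.
XML_SPECIALS = set("<>&\"'")

# Matches named (&ocirc;), decimal (&#244;) and hexadecimal (&#xf4;) references.
ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def _decode(match: re.Match) -> str:
    body = match.group(1)
    if body[:2].lower() == "#x":
        codepoint = int(body[2:], 16)
    elif body.startswith("#"):
        codepoint = int(body[1:])
    else:
        codepoint = name2codepoint.get(body)
        if codepoint is None:
            return match.group(0)  # unknown name: leave untouched
    try:
        char = chr(codepoint)
    except (ValueError, OverflowError):
        return match.group(0)  # not a valid code point: leave untouched
    return match.group(0) if char in XML_SPECIALS else char


def entities_to_chars(text: str) -> str:
    """Replace entity references with the characters they stand for."""
    return ENTITY_RE.sub(_decode, text)


def chars_to_entities(text: str) -> str:
    """Replace non-ASCII characters with named entities, or decimal ones if unnamed."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)
```

Note that named entities other than the five XML ones rely on the DOCTYPE discussed earlier in the thread, so the character-to-entity direction is probably best kept for comparison work rather than for the released EPUB.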
08-15-2013, 12:28 AM | #43 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
@Tex2002ans To my shame, I never used the "report" feature of Sigil and it's indeed a nice one. You have trouble displaying French (mostly) characters - à, é, è, ô, etc. I show you part of my report (figure one), how ô is displayed in code view (arrow as an example in fig 2) but also é, è, à, and the declaration I use for every xhtml file (Note). I make no use of a complicated translation table for entities. It all goes smoothly and displays conveniently all French characters. Maybe this could help. Note: <?xml version="1.0" encoding="UTF-8" standalone="no" ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr-FR"> Last edited by roger64; 08-15-2013 at 12:45 AM. |
08-15-2013, 01:29 AM | #44 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Over the past week and a half I have been importing 15 years worth of articles (~6500) into Sigil, and cleaning them all up to prepare a few gigantic yearly EPUB releases (~300 articles per EPUB). In this case, the original HTML used entity names. I wanted to do some cleanup in Sigil, then do code comparison to the originals (this is why I want entities there), then I want to easily be able to swap back to characters before proofreading and releasing the EPUB (actual characters allow me to read the code much easier, and be able to catch more mistakes). I rarely use the Link Report, but in this case, there are THOUSANDS of links pointing everywhere on the internet. The Link Report allows me to easily spot links which do not belong in the EPUB, footnotes I have not normalized (over 15 years... you can imagine all the different tools/programs that were used to generate these things). The Class Reports allow me to catch outliers in the code itself (a weird class name that was only used once in all the articles, etc.). I will definitely be using it more in the future, it is really helping me consolidate code. HUGE time savers. Quote:
And I just thought of another slight tweak on the Entity -> Character, Character -> Entity request. Perhaps it can be added to the Right Click Menu -> Reformat HTML. So you will get 4 extra options there:
- Characters to Entities
- Characters to Entities - All HTML files
- Entities to Characters
- Entities to Characters - All HTML files
Last edited by Tex2002ans; 08-15-2013 at 01:31 AM. |
||
08-15-2013, 05:25 PM | #45 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Feature Requests:
Duplicate Images Yes/No to All:
When adding images which already exist, you get this dialog: When adding tens/hundreds of potential duplicate files, you have to press "ok/cancel" tens/hundreds of times. It would be nice if a "Yes to All" and a "No to All" were added.
Duplicate HTML files Overwrite:
When adding duplicate XHTML files, you get this dialog: You have zero choice in what you want to do (in this case I want to overwrite, but I just have to press "ok"). This dialog also forces you to click "ok" tens/hundreds of times. Most of the time I just kill Sigil, instead of hurting my mouse finger. This dialog could also benefit from an "Overwrite", "Overwrite All", "Ok" choice. Or maybe it could complain once and give a list of the files it cannot insert. Currently, I either:
1. Open the EPUB in 7-zip, overwrite the XHTML files manually, and then reopen the EPUB in Sigil and continue working.
2. Use Sigil to mass delete all the XHTML, then add all the XHTML again using Sigil.
Easier Recognition of Not Well Formed Files
When you open an EPUB with malformed files, and tell Sigil not to automatically clean the files, you get this dialog: It would be nice if it were somehow possible to get a dialog such as this one without having to exit/reenter Sigil. Also, it would be nice if this dialog sorted the malformed files alphabetically. Currently, it seems like they are just randomly placed there: 5296.xhtml, 5357.xhtml, 5100.xhtml, 5050.xhtml, 5400.xhtml, ...
Currently, when you are in Sigil and try to save while you have a malformed document in your EPUB, it is very hard to tell which file is the exact culprit. Perhaps this sentence could at least mention which HTML file is causing the problem: "EPUB saved, but not all HTML files are well formed: 5396.xhtml" Yes yes, I know, you could always run FlightCrew and try to spot the broken file, but when there are hundreds/thousands of other non-EPUB-compliant pieces of code, it becomes impossible to spot which file is malformed. Perhaps the FlightCrew output could put CRITICAL problems at the top of the list.
HTML File Report Problem (?)
Also, this column in the HTML Files Report seems to me to be worthless. You cannot get into Reports unless the files are all well formed. Last edited by Tex2002ans; 08-15-2013 at 06:30 PM. Reason: More Suggestions |
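As a stopgap for spotting the culprit, the well-formedness check can be run outside Sigil and the results sorted by file name. This is only a sketch under stated assumptions: `malformed_files` is a hypothetical helper, it takes a mapping of file name to file text (for example, read from the extracted EPUB), and it uses Python's standard `xml.etree.ElementTree` parser, which does not load DTDs, so named entities such as &nbsp; are likely to be reported as errors as well.

```python
import xml.etree.ElementTree as ET


def malformed_files(documents: dict[str, str]) -> list[tuple[str, str]]:
    """Return (file name, parser message) pairs for files that fail to parse.

    The result is ordered alphabetically by file name.
    """
    problems = []
    for name in sorted(documents):
        try:
            ET.fromstring(documents[name])
        except ET.ParseError as err:
            problems.append((name, str(err)))
    return problems
```

Each message from the parser includes a line and column, which narrows down where in the file the problem starts.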
|