10-18-2003, 04:44 PM   #1
cwong15
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2003
Device: Palm IIIx
iSiloXC misconverting character entities

With both iSiloXC 3.x and 4.0, certain character entities are misconverted. A sample ixl file follows with one such offending document. Look at the first paragraph. With iSilo 4.0, I see text that reads "had neuroscience?indeed, all the life sciences?in a chokehold". If you view it in a browser, you will see that those question marks are actually em dashes.

The convertor for iSilo 2.x converts the document just fine.


<?xml version="1.0"?>
<iSiloXDocumentList>
<iSiloXDocument>
<Source>
<Sources>
<Path>http://www.firstthings.com/ftissues/ft0305/reviews/dembski.html</Path>
</Sources>
</Source>
<Destination>
<Title>Book review</Title>
<Files>
<Path>/tmp/foo.pdb</Path>
</Files>
</Destination>
<LinkOptions>
<MaximumDepth value="0"/>
</LinkOptions>
<ImageOptions>
<AltText value="exclude"/>
<Images value="exclude"/>
</ImageOptions>
<TableOptions>
<IgnoreTables value="yes"/>
</TableOptions>
</iSiloXDocument>
</iSiloXDocumentList>
10-19-2003, 07:35 AM   #2
Alexander Turcic
Fully Converged
Posts: 18,163
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
The problem here is that the site in question does not specify its text encoding and yet uses non-standard characters (the em dash).

To fix this, go to your channel settings -> tab Document -> click on Text Encoding Options -> and choose "Western European (ISO-8859-1)" in the second field.

You can also make this default for all sites that do not specify their encoding. To do so, go to iSiloX -> Document -> Default Properties.
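
As a rough illustration of why the em dashes come out as question marks (a sketch in Python, not iSiloX's actual code): on a Windows-1252 page the em dash is the single byte 0x97. The sample text is taken from the first post; the fallback to "?" is only an assumption about how a converter might treat a byte it cannot map.

```python
# Bytes as a Windows-1252 page would send them: 0x97 is the em dash in that code page.
raw = b"had neuroscience\x97indeed, all the life sciences\x97in a chokehold"

# Decoding with the correct source encoding recovers both em dashes (U+2014).
as_cp1252 = raw.decode("cp1252")

# Strict ISO-8859-1 maps the same byte to U+0097, an invisible C1 control character.
as_latin1 = raw.decode("latin-1")

# Assumption: with no usable encoding, a converter substitutes a placeholder
# for the unmappable byte, which is what the "?" in the output suggests.
as_ascii = raw.decode("ascii", errors="replace").replace("\ufffd", "?")

print(as_cp1252)
print(repr(as_latin1))
print(as_ascii)
```

Many programs treat the label ISO-8859-1 as Windows-1252 in practice, which may be what the "Western European (ISO-8859-1)" option does, though that is a guess about iSiloX rather than something confirmed here.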
10-19-2003, 04:25 PM   #3
cwong15
Thanks for your suggestion, but I don't think that is it. When the charset is not specified, iSilo will let you specify it on the PDA itself. But no encoding option allows the correct display of these characters.

The page I pointed to is almost certainly Windows-1252. At least, other pages on that site specify that encoding. But I have tried specifying that encoding, both in iSilo and in the ixl file for iSiloXC, to no avail. Also, there is no ISO-8859-1 encoding offered for either iSilo or iSiloXC.

So there does not seem to be a way for iSilo 3.x/4.0 to display these character entities, at least with their "native" formats. Again, the old iSilo 2.x convertor converts these characters fine.
10-20-2003, 02:56 AM   #4
Alexander Turcic
I converted your page,

http://www.firstthings.com/ftissues/...s/dembski.html

with the correct text encoding and it looks just fine in iSilo. I attached both a converted database and its corresponding ixl.
Attached Files
File Type: pdb EncodeTest.pdb (23.0 KB, 507 views)
File Type: ixl encodetest.ixl (2.9 KB, 575 views)
10-20-2003, 03:21 AM   #5
iSilo
Developer
Posts: 33
Karma: 2314
Join Date: Oct 2002
Location: US
Specifying the character set in iSilo (as opposed to specifying it in iSiloX during conversion) is really only useful for the case where the text of the document is raw such as plain text files or iSilo 2.x/3.x documents that definitely do not have any embedded encoding information (but even then the results might not be perfect in some instances).

For documents generated by iSiloX/C 4.0, if the source content itself does not specify its encoding, you really do need to specify what the source encoding is during conversion if you want predictable results.

Follow what Alexander suggests. It does work.
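
To make the point about undeclared source encodings concrete, here is a minimal Python sketch of the general idea: use the charset the page declares if there is one, and otherwise fall back to an encoding supplied at conversion time. The function name `pick_source_encoding`, the 1024-byte scan window, and the `cp1252` default are all hypothetical choices for illustration, not how iSiloX works internally.

```python
import re

def pick_source_encoding(html_bytes: bytes, fallback: str = "cp1252") -> str:
    """Return the charset declared near the top of the page, else the fallback."""
    # Only ASCII is needed to spot a charset declaration, so drop other bytes.
    head = html_bytes[:1024].decode("ascii", errors="ignore")
    match = re.search(r"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", head, re.IGNORECASE)
    return match.group(1).lower() if match else fallback

# A page with no declaration, like the one in this thread (sample bytes are made up).
undeclared = b"<html><body>all the life sciences\x97in a chokehold</body></html>"

# A page that declares its encoding in a meta tag.
declared = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=windows-1252"></head></html>'
)

print(pick_source_encoding(undeclared))
print(pick_source_encoding(declared))
print(undeclared.decode(pick_source_encoding(undeclared)))
```

Without the fallback there is nothing to tell the converter what byte 0x97 means, which is why the encoding has to be given for the source during conversion.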
10-25-2003, 09:46 PM   #6
cwong15
Thanks, you're both right

Thanks to both of you for replying. You're right: the CharSet setting did the trick. My mistake was to use it only under DocumentOptions, when I needed to use it under Sources to specify the input encoding.
All times are GMT -4.

