MobileRead Forums > E-Book Formats > Workshop
Old 03-28-2010, 03:16 PM   #1
Strether
Addict
Posts: 399
Karma: 848413
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1, 3, PW2, Voyage, Kindle 7" HD Fire
Problems editing XML documents

The last couple of books I've downloaded from Gutenberg in HTML format have arrived as XML documents, and so far I haven't discovered any way to edit these with MS Word. Anyone know if/how this can be done? I'm using Word 2003.

I do note that if I save the document as a Rich Text File, I can import it into Book Designer and all the extraneous material will be gone, but that means I have to completely edit a book in BD, and I'd rather do it in Word first.

Any advice would be appreciated.

Jim
Old 03-28-2010, 09:32 PM   #2
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Are you sure they're not html documents just with an .xml extension? Try renaming them.

Can you give an example of such a document, or a link to where on Gutenberg you found them? It might be easier to help if I knew what they were like.
Old 03-28-2010, 11:06 PM   #3
Strether
Addict
Posts: 399
Karma: 848413
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1, 3, PW2, Voyage, Kindle 7" HD Fire
Here's one of the books I downloaded:

http://www.gutenberg.org/etext/6801

Don't know if this is something new that Gutenberg's doing, or something to do with my version of Word.

Jim
Old 03-29-2010, 02:12 AM   #4
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
That is a normal xhtml file, but I think some word processors might be confused by the first line of the code, which is:

Code:
<?xml version='1.0' encoding='UTF-8'?>
HTML and XHTML are a species of XML file, but it's weird for it to have this declaration rather than the <!DOCTYPE ...> first.

I don't have MS Word installed, but I was able to convert it to .doc format with AbiWord without making any changes to the file at all. With OpenOffice, I only had to delete this first line in a text editor before opening it, and then it worked fine. You could try deleting the first line and then opening them in Word and see if they work then. (Use a text editor like Notepad or Wordpad to delete the first line.)
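For anyone who would rather script this step than use Notepad, here is a minimal Python sketch of the same idea: drop a leading XML declaration and leave everything else untouched. The names `strip_xml_declaration` and `strip_file` are made up for illustration, and the sketch assumes the declaration, if present, sits at the very start of the file (optionally after a UTF-8 byte-order mark).

```python
import re
from pathlib import Path

# Matches an XML declaration at the very start of the data, e.g.
# <?xml version='1.0' encoding='UTF-8'?>, plus the line break after it.
_XML_DECL = re.compile(rb"\A<\?xml\s[^>]*\?>[ \t]*\r?\n?")

UTF8_BOM = b"\xef\xbb\xbf"


def strip_xml_declaration(data: bytes) -> bytes:
    """Return data without a leading XML declaration (hypothetical helper)."""
    bom = b""
    if data.startswith(UTF8_BOM):
        # Keep the byte-order mark, but look for the declaration after it.
        bom, data = UTF8_BOM, data[len(UTF8_BOM):]
    return bom + _XML_DECL.sub(b"", data, count=1)


def strip_file(src: str, dst: str) -> None:
    """Copy src to dst, dropping a leading XML declaration if there is one."""
    Path(dst).write_bytes(strip_xml_declaration(Path(src).read_bytes()))
```

Working on raw bytes means the rest of the file, including its character encoding, is passed through unchanged. A call such as `strip_file("book.html", "book-for-word.html")` (placeholder file names) would write a copy without the first line.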

For good measure, I attach the .doc file created by OpenOffice here.
Attached Files
File Type: doc pg6801.doc (1.06 MB, 175 views)
Old 03-29-2010, 11:00 AM   #5
Strether
Addict
Posts: 399
Karma: 848413
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1, 3, PW2, Voyage, Kindle 7" HD Fire
That's brilliant, frabjous. Deleted the first line in Notepad, opened the XHTML document, and it opened as a normal file. I'm much obliged to you for spending the time and for your expertise.

Jim
Old 03-29-2010, 03:17 PM   #6
pietvo
Reader
Posts: 514
Karma: 24612
Join Date: Aug 2009
Location: Cochabamba, BO
Device: Onyx Boox 60, iPod Touch
In fact, you can also delete the first line in OpenOffice.org itself, then save the file and reopen it. The first time it will be treated as a text file, the second time as (X)HTML.
Old 03-30-2010, 04:35 PM   #7
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Quote:
Originally Posted by frabjous View Post
HTML and XHTML are a species of XML file, but it's weird for it to have this declaration rather than the <!DOCTYPE ...> first.
1. HTML has nothing to do with XML. XHTML on the other hand is HTML reformulated using XML syntax.

2. Having the XML declaration ("<?xml version='1.0' encoding='UTF-8'?>") at the start of an XHTML document (before the DOCTYPE) is mandatory. The fact that a lot of XHTML documents don't have it is beside the point.
Old 03-30-2010, 06:54 PM   #8
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
[accidental double post, sorry]

Last edited by frabjous; 03-30-2010 at 07:09 PM.
Old 03-30-2010, 07:03 PM   #9
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Quote:
2. Having the XML declaration ("<?xml version='1.0' encoding='UTF-8'?>") at the start of an XHTML document (before the DOCTYPE) is mandatory. The fact that a lot of XHTML documents don't have it is beside the point.
Sigh. The original poster comes here with a problem. I help to solve that problem, and you have to come and sound all holier-than-thou-esque.

You're right about regular html not being xml. I remembered that after I posted, but forgot to change it. Wasn't really the issue here; these documents are xhtml. (Saying "nothing to do" with it, however, is certainly misleading. They are two children of common ancestors so-to-speak. Still I apologize if anyone was misled.)

As for the tag, however, according to the W3C xhtml spec, this tag is not mandatory.

Quote:
An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.
Moreover, the lack of this tag does not prevent xhtml from validating.
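As a rough illustration of the rule quoted above, here is a small Python sketch that reads the encoding named in a file's XML declaration and reports whether it is UTF-8 or UTF-16, the cases in which the quoted passage does not require a declaration. The function names are invented for this example, and it assumes the file is in an ASCII-compatible encoding so that the declaration can be read directly from the raw bytes.

```python
import re
from typing import Optional

# Captures the encoding named in a leading XML declaration, if there is one.
_ENCODING = re.compile(
    rb"\A<\?xml\s[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None."""
    match = _ENCODING.search(data)
    return match.group(1).decode("ascii") if match else None


def safe_to_drop_declaration(data: bytes) -> bool:
    """True if no encoding is declared, or it is UTF-8 or UTF-16."""
    encoding = declared_encoding(data)
    return encoding is None or encoding.upper() in {"UTF-8", "UTF-16"}
```

For the Gutenberg file in this thread, whose first line declares UTF-8, this check would say the declaration can go. For a file declaring something like ISO-8859-1 it would not, and removing the line could leave a reader guessing at the encoding.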

But even if some other spec somewhere says it's mandatory, given what this thread is about, and the problem the original poster was having, what could possibly be more to the point than the fact that this is unusual? It's obvious that in this case it is this tag that was preventing the file from being correctly read and converted by Word, and, apparently, by OpenOffice.

Pointing out that it's "mandatory" is in fact, given what the OP asked, what is beside the point. What is to the point is that removing it solves the problem.

Last edited by frabjous; 03-30-2010 at 07:14 PM.
Old 03-30-2010, 08:24 PM   #10
charleski
Wizard
Posts: 1,188
Karma: 727236
Join Date: Sep 2009
Device: PRS-505
An even easier way is simply to open the document in Firefox (which has no problem with the xml declaration) then save the page.
Old 03-31-2010, 05:09 AM   #11
pietvo
Reader
Posts: 514
Karma: 24612
Join Date: Aug 2009
Location: Cochabamba, BO
Device: Onyx Boox 60, iPod Touch
Firefox saves the file as is, including the xml declaration. And rightly so, because it saves files verbatim.

And indeed, the XML declaration is not required. The XML spec says: `XML documents SHOULD begin with an XML declaration', not `XML documents MUST begin with an XML declaration'. The grammar also clearly marks it as optional.
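The grammar rule in question is the prolog production of the XML 1.0 spec, `prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?`, where the `?` after `XMLDecl` marks it as optional. A quick way to see this in practice is to feed the same small document to an XML parser with and without the declaration. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample document and the helper name `is_well_formed` are made up for the example, and it only checks well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# A tiny XHTML-like document body, invented for this example.
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)

WITH_DECL = "<?xml version='1.0' encoding='UTF-8'?>\n" + BODY
WITHOUT_DECL = BODY


def is_well_formed(document: str) -> bool:
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document.encode("utf-8"))
    except ET.ParseError:
        return False
    return True
```

Both `is_well_formed(WITH_DECL)` and `is_well_formed(WITHOUT_DECL)` should come back true, while a document with mismatched tags is rejected.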

It is, however, the fault of OpenOffice.org that it doesn't recognise the file as XHTML. This issue has been on the list of future enhancements since 2005. The problem is that OpenOffice.org doesn't have a proper XHTML import filter: it treats such files as HTML, and therefore doesn't recognise the XML declaration.
Old 03-31-2010, 09:52 AM   #12
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Quote:
Originally Posted by frabjous View Post
Sigh. The original poster comes here with a problem. I help to solve that problem, and you have to come and sound all holier-than-thou-esque.
Didn't mean to sound all "holier-than-thou-esque". I have a lot of respect for you frabjous.

Quote:
Originally Posted by frabjous View Post
(Saying "nothing to do" with it, however, is certainly misleading. They are two children of common ancestors so-to-speak. Still I apologize if anyone was misled.)
But it doesn't have anything to do with it, even though they are the children of common ancestors.

XML by itself is used for many, many things other than XHTML and was created primarily for those "other" things.

Quote:
Originally Posted by frabjous View Post
As for the tag, however, according to the W3C xhtml spec, this tag is not mandatory.
You're right, the XML spec uses the RFC 2119 term "SHOULD", not "MUST".

My bad, I was wrong.

Quote:
Originally Posted by frabjous View Post
But even if some other spec somewhere says it's mandatory, given what this thread is about, and the problem the original poster was having, what could possibly be more to the point than the fact that this is unusual? It's obvious that in this case it is this tag that was preventing the file from being correctly read and converted by Word, and, apparently, by OpenOffice.

Pointing out that it's "mandatory" is in fact, given what the OP asked, what is beside the point. What is to the point is that removing it solves the problem.
I didn't mean that as a reply to the OP, it's just that seeing someone say that having the XML declaration in an XHTML document is "weird" rubbed me the wrong way. It really should be there, and I've been burned many, many times by documents that would have been far better off having it.

And OO and Word not being able to handle it... that's just damn stupid on their part. And pretty surprising for OO.
Old 03-31-2010, 10:10 AM   #13
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
I'll admit I was surprised that OO couldn't handle it. I gather from Pietvo's remark that it's already been reported as a bug, but perhaps I'll put in a "vote" for it.

I'm never surprised by Microsoft products not working well, but this is Word 2003. Maybe it's been fixed by now.
Old 03-31-2010, 10:19 AM   #14
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Quote:
Originally Posted by frabjous View Post
I'm never surprised by Microsoft products not working well, but this is Word 2003. Maybe it's been fixed by now.
I've just opened this file in Word 2007, and it opens just fine. It opens in XML mode (there are special symbols for the tags displayed), but I believe those symbols can be hidden.

Removing the XML declaration makes Word just import it like any other HTML file.
Old 03-31-2010, 05:09 PM   #15
Strether
Addict
Posts: 399
Karma: 848413
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1, 3, PW2, Voyage, Kindle 7" HD Fire
Quote:
Originally Posted by Valloric View Post
I've just opened this file in Word 2007, and it opens just fine. It opens in XML mode (there are special symbols for the tags displayed), but I believe those symbols can be hidden.

Removing the XML declaration makes Word just import it like any other HTML file.
It opened just fine in Word 2003, too, but with the special symbols displayed; if they can be hidden, it would be nice to know how to do that. I couldn't find a way. In the meantime, though, removing the declaration with a text editor, as suggested by frabjous, works just great, and I'm grateful for his solution to my problem.

Jim