MobileRead Forums > E-Book Formats > ePub

Old 11-10-2015, 08:23 AM   #1
crankypants
Hmm.
Posts: 124
Karma: 2016606
Join Date: Oct 2015
Device: Android 4.2 Google Play Reader
How to allow extended ASCII German characters in my EPUB?

I'm writing a tool to make basic EPUB2 books from a configuration file and Markdown files (which are converted to XHTML). I didn't find anything else that suited my needs, as I have large text files I want to convert to EPUB and add basic formatting to.

Currently, the Firefox plugin EPUBReader, which I use for testing, seems to choke on characters above 127 (extended ASCII). How do I allow extended ASCII characters such as German umlauts in my EPUB? Do I have to change something in the XHTML header?

I can write my software to change these extended ASCII characters to Unicode, but EPUBReader seems to produce an error with any Unicode as well, except named entities.

The ERROR I'm getting from EPUBReader about the German characters is:

Code:
XML Parsing Error: not well-formed
Location: file:///C:/Users/XXX/AppData/Roaming/Mozilla/Firefox/Profiles/0ddipa6u.default/epub/54/OEBPS/Text/00intro.xhtml#H2_00intro_00003
Line Number 27, Column 4:
And then it points to an extended ASCII character.

The header for each XHTML file is this:


Quote:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<title>$bldtitle</title>

Last edited by crankypants; 11-10-2015 at 09:46 AM. Reason: added code tags to cope with very long line
Old 11-10-2015, 09:09 AM   #2
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,495
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
You must specify the character set used in the XHTML files properly. This is most easily done with Unicode text, IMO.

You could make one sample ePub using your converted Unicode text and something like Sigil, to ensure that you're creating your XHTML and the rest of the ePub with the proper declarations and metadata.
Old 11-10-2015, 09:26 AM   #3
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
You must use UTF-8 for that. All reading systems will expect that. The error you mention is probably caused by a different problem.
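
For anyone who wants to check whether the files inside a built EPUB really are UTF-8, here is a minimal Python sketch. It is only an illustration, not part of anyone's tool in this thread: the function names, the list of file suffixes, and the in-memory sample archive are all assumptions made up for the example.

```python
import io
import zipfile

# Assumed set of text-like files worth checking inside an EPUB.
TEXT_SUFFIXES = (".xhtml", ".html", ".opf", ".ncx", ".css")


def find_non_utf8_entries(epub_file):
    """Return (entry name, byte offset) for each text entry that is not valid UTF-8."""
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(TEXT_SUFFIXES):
                continue
            data = archive.read(name)
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first offending byte.
                problems.append((name, err.start))
    return problems


def build_sample_epub():
    """Build a tiny in-memory archive (hypothetical file names, for illustration only)."""
    text = "<p>Gr\u00fc\u00dfe</p>"  # "Grüße" written with escapes
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("OEBPS/Text/good.xhtml", text.encode("utf-8"))
        archive.writestr("OEBPS/Text/bad.xhtml", text.encode("latin-1"))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, offset in find_non_utf8_entries(build_sample_epub()):
        print(f"{name}: invalid UTF-8 at byte offset {offset}")
```

To check a real book, pass the path of the .epub file to `find_non_utf8_entries` instead of the sample archive. An empty result only means the bytes decode as UTF-8. It does not prove the XHTML is otherwise well-formed.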
Old 11-10-2015, 09:47 AM   #4
crankypants
Quote:
Originally Posted by Toxaris
You must use UTF-8 for that. All reading systems will expect that. The error you mention is probably caused by a different problem.
I already specified utf-8 in the header of each XHTML file. I edited my original post to add the XHTML header.

What else do I need to do? Modify some of the other files, like content.opf? This is the top of my content.opf file, which already specifies UTF-8.

Quote:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:a507e61b-f4b4-4982-be43-0cc046f4b053</dc:identifier>
<dc:title>Pblah</dc:title>
<dc:creator opf:role="aut">author name</dc:creator>
<dc:language>en</dc:language>
<dc:date opf:event="modification">2015-11-10</dc:date>
</metadata>
I even pasted some German text into Sigil and made my XHTML headers match exactly what Sigil produced. I still get an error in EPUBReader.

Last edited by crankypants; 11-10-2015 at 09:50 AM.
Old 11-10-2015, 09:55 AM   #5
pdurrant
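If you specify UTF-8, the text itself must actually be encoded as UTF-8, not in a single-byte "high-ASCII" German encoding such as ISO-8859-1 or Windows-1252. The declaration in the header only tells the parser how to read the bytes; it does not convert them.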
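
To make the point concrete, here is a small Python sketch. It is not from the original poster's tool. The sample word, the choice of Latin-1 as the "high-ASCII" encoding, and the helper name `parses_as_declared_utf8` are assumptions for illustration. It shows that the same German word is stored as different bytes in the two encodings, and that an XML parser rejects the single-byte version when the document declares UTF-8.

```python
import xml.etree.ElementTree as ET

word = "Gr\u00fc\u00dfe"  # "Grüße", written with escapes

latin1_bytes = word.encode("latin-1")  # one byte per character: ü = FC, ß = DF
utf8_bytes = word.encode("utf-8")      # two bytes each: ü = C3 BC, ß = C3 9F


def parses_as_declared_utf8(body_bytes):
    """Wrap the bytes in a document that declares UTF-8 and try to parse it."""
    document = b'<?xml version="1.0" encoding="utf-8"?><p>' + body_bytes + b"</p>"
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # The parser reports the document as not well-formed.
        return False


if __name__ == "__main__":
    print(latin1_bytes.hex())                     # 4772fcdf65
    print(utf8_bytes.hex())                       # 4772c3bcc39f65
    print(parses_as_declared_utf8(latin1_bytes))  # False
    print(parses_as_declared_utf8(utf8_bytes))    # True
```

The bytes FC and DF are not valid UTF-8 sequences on their own, which is consistent with a "not well-formed" error pointing right at the accented character.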
Old 11-10-2015, 12:11 PM   #6
crankypants
I believe Perl 5.18 has a mode for writing UTF-8. I'll look into it. Thanks.
Old 11-10-2015, 12:41 PM   #7
crankypants
It worked! Thank you. Here's the Perl open statement I used:

Quote:
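# Open the output file with a UTF-8 encoding layer, so characters are encoded as UTF-8 on write.
open($OUTFILE, ">:encoding(UTF-8)", $outfile) or die "$procname ERROR: Could not open $outfile: $!\n";
print $OUTFILE "blah blah foo\n";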
...
Old 11-10-2015, 01:00 PM   #8
Skeeve
Zealot
Posts: 142
Karma: 669192
Join Date: Nov 2013
Device: Kindle 4.1.1 no touch
You could simply have used any text editor that lets you change a file's encoding, such as (I guess) Notepad++ or (I know) jEdit.
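
The same re-encoding can also be scripted. The following is a minimal Python sketch, assuming the source files are in Windows-1252 (cp1252). The thread never says which encoding the originals used, so treat that default, the function name `reencode_to_utf8`, and the demo file names as hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Read a text file in source_encoding (an assumption) and rewrite it as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


def demo():
    """Write a small cp1252 file, convert it, and return the converted bytes."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "chapter-cp1252.md")
        dst = os.path.join(workdir, "chapter-utf8.md")
        with open(src, "wb") as handle:
            handle.write("Stra\u00dfe\n".encode("cp1252"))  # "Straße"
        reencode_to_utf8(src, dst)
        with open(dst, "rb") as handle:
            return handle.read()


if __name__ == "__main__":
    print(demo())  # b'Stra\xc3\x9fe\n'
```

If the guess about the source encoding is wrong, the script either raises a decode error or silently produces the wrong characters, so it is worth spot-checking a few accented words in the output.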
Old 11-10-2015, 04:04 PM   #9
dickloraine
Guru
Posts: 631
Karma: 7544080
Join Date: Apr 2013
Location: Berlin
Device: PRS 350, Kobo Aura
Not to discourage you from writing your own script, but you may want to look at calibre or pandoc to do what you want. With pandoc you can make almost anything out of Markdown: EPUB, HTML, DOCX, LaTeX, and excellent PDFs via LaTeX (without needing to know anything about LaTeX).
Tags
ascii, epub, german


All times are GMT -4.