Convert Ascii to UTF char


gardefjord
11-28-2011, 10:43 AM
Hi all,
I have HTML files with a bunch of ASCII signs inside, like so:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sv">
<head>
<title>De andra</title>
<link rel="stylesheet" href="Styles.css" type="text/css" />
<link rel="stylesheet" type="application/vnd.adobe-page-template+xml" href="page-template.xpgt" />
</head>
<body>
<div class="booksection">
<h1 id="ch001"><a id="page_011"></a>Molly Beslutet</h1>
<p class="noindent_j1">N&#x00E4;r Molly vaknade str&#x00E4;ckte hon ut ena armen mot den andra kudden. Den var lika tom som den varit det senaste halv&#x00E5;ret. Ingen kind att smeka, ingen kropp att krypa intill. Pelle fanns helt enkelt inte d&#x00E4;r.</p>
<p class="indent_j">Hon satte sig upp och sl&#x00E4;ppte ner f&#x00F6;tterna i f&#x00E5;rskinnsf&#x00E4;llen. Den mjuka, lockiga k&#x00E4;nslan fick hennes kropp att l&#x00E5;ngsamt vakna. Hon tog ett par steg fram till f&#x00F6;nstret, &#x00F6;ppnade det och drog f&#x00F6;rsiktigt in den kalla luften i lungorna. &#x00C4;ven om vintern h&#x00F6;ll p&#x00E5; att sl&#x00E4;ppa sitt grepp och det mesta av sn&#x00F6;n hade sm&#x00E4;lt undan var morgnarna fortfarande svartm&#x00E5;lade. Molly huttrade och drog igen f&#x00F6;nstret.</p>
<p class="indent_j">I k&#x00F6;ket sl&#x00E4;ngde hon n&#x00E5;gra vedklampar i spisen och kaminen. Det k&#x00E4;ndes som om hon inte hade gjort n&#x00E5;got annat den sista tiden &#x00E4;n huggit ved och eldat upp den igen.</p>

Does anyone know how I can switch all the ASCII to normal UTF?
For example: &#x00E5;l = ål

Jellby
11-28-2011, 12:37 PM
In linux, there is a small program called "recode":

recode html..utf8 file.html

(it will also change all &amp;, &lt; and &gt; to &, < and >, though)

I'm sure any decent HTML editor will have an option for that.

By the way, that way of coding characters is not "ascii", but numeric character references.
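
If recode isn't available, the same conversion can be sketched in a few lines of Python. This is only a minimal sketch, not what recode does internally: the function name decode_numeric_refs is made up for illustration, and it assumes you want &amp;, &lt;, &gt; and the quote characters to stay escaped so the XHTML remains well-formed.

```python
import re

# Matches hexadecimal (&#x00E4;) and decimal (&#228;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that stay escaped so the markup is not broken (an assumption of this sketch).
KEEP_ESCAPED = {"&", "<", ">", '"', "'"}


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they denote.

    Hypothetical helper: named entities such as &amp; are not touched at all.
    """

    def replace(m):
        hex_digits, dec_digits = m.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            return m.group(0)  # not a valid code point, leave the reference as-is
        if char in KEEP_ESCAPED:
            return m.group(0)  # keep markup-significant characters escaped
        return char

    return NUMERIC_REF.sub(replace, text)


sample = "<p>N&#x00E4;r Molly &amp; Pelle sl&#228;ppte &#x3C;b&#x3E;</p>"
print(decode_numeric_refs(sample))
```

To convert a whole file, read it with encoding="utf-8", pass the text through the function, and write the result back with the same encoding.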

susan_cassidy
11-28-2011, 12:57 PM
Also called HTML Entities.

Toxaris
11-28-2011, 01:08 PM
You could try Notepad++.

Jellby
11-29-2011, 06:11 AM
susan_cassidy wrote: Also called HTML Entities.

Strictly, HTML entities are named, e.g. &rsquo; (an entity) vs. &#8217; (a numeric character reference). Both produce the same character, ’.
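
To make the distinction concrete, here is a small check using Python's standard html module (my choice of tool for illustration, nothing the thread depends on). The named entity &rsquo; and the numeric references &#8217; and &#x2019; are different spellings of the same character, U+2019:

```python
import html
from html.entities import name2codepoint

named = html.unescape("&rsquo;")          # named entity
numeric_dec = html.unescape("&#8217;")    # decimal numeric character reference
numeric_hex = html.unescape("&#x2019;")   # hexadecimal numeric character reference

# All three spellings decode to the right single quotation mark.
same_char = named == numeric_dec == numeric_hex == "\u2019"
print(same_char)

# The entity table maps the name to the code point used by the numeric forms.
rsquo_code_point = name2codepoint["rsquo"]
print(rsquo_code_point)
```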

Doitsu
11-29-2011, 04:32 PM
Sigil (http://code.google.com/p/sigil/) does this automatically, if you add an .html file to a project. However, it'll also run HTMLTidy and will consolidate style elements, if present.

gardefjord
12-02-2011, 03:36 AM
Thanks a lot for all the different answers!
I'm running Oxygen XML, so I just used Unescape Selection.

But I'll be sure to refer my colleagues to this thread.