08-16-2010, 11:10 AM | #1 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
epub edited - now it's messed up
Hello everybody
I have a strange problem (new to me) with an epub file. Before I start, I just want to say: in the past I edited many xhtml files and turned them into epub with Calibre, and that was no problem. But this problem is different ...
I got this epub file from a "friend" ... However, I noticed two things that I want to change. In Germany (and Austria), quotation marks look like this: »text«. In Switzerland, however (or at least in the German-speaking part), they are used like this: «text». Since this is very unusual for me, I wanted to change that. I also noticed that there are often 3-4 spaces between two words in a normal sentence, where there should be only 1.
So I unpacked the epub file and edited all the html files (about 65) with Notepad++. I changed the quotation marks and replaced all double spaces with a single space. I also edited the css file to add justification. And I noticed that some p rules in the css set a font size. I think font size should be selected by the reader, so I removed that, too.
Then I packed the file again and opened it with the ebook viewer of Calibre. Everything looked fine. Then I put it on my ebook reader ... The first chapter starts normally with the headline. On the right side, you see the ADE page number 6 (that fits, because there are some other pages before chapter 1 starts). But then it becomes strange. After the headline, there is the first line and a new ADE page number. And right below that, there is another ADE page number. I took another look at the unedited epub file, and that one looked all right ...
So, what do you think I did wrong? Could it be the toc file? Or because I removed the double spaces? Or the css stuff? Thanks for your help. |
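For reference, the two text edits described above (swapping the quotation marks and collapsing runs of spaces) can also be done with a small script that reads and writes the files strictly as UTF-8, which takes editor encoding detection out of the picture. This is only a minimal sketch: the function names `fix_text` and `fix_file` are made up for illustration, and it assumes the html files really are UTF-8.

```python
import re

# Translation table that swaps the two guillemets in a single pass,
# so that neither replacement undoes the other.
SWAP_GUILLEMETS = str.maketrans({"«": "»", "»": "«"})


def fix_text(text: str) -> str:
    """Turn «text» into »text« and collapse runs of spaces into one space."""
    swapped = text.translate(SWAP_GUILLEMETS)
    # Two or more consecutive spaces become a single space.
    return re.sub(r" {2,}", " ", swapped)


def fix_file(path: str) -> None:
    """Rewrite one file in place, reading and writing strictly as UTF-8."""
    # newline="" keeps the existing line endings (Unix or Windows) untouched.
    with open(path, encoding="utf-8", newline="") as handle:
        original = handle.read()
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(fix_text(original))
```

Note that the space rule also collapses space-based indentation in the markup and would alter text inside `<pre>` blocks, so it is worth running on a copy of the files first.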
08-16-2010, 01:42 PM | #2 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
I've encountered a very similar problem and found that it was caused by Notepad++ failing to detect the encoding properly. If it edits the file in ANSI mode then it will insert codes causing strange behaviour in ADE. Open the source files again and check to see if Notepad++ is recognising them as utf-8. If not, convert to utf-8 before editing.
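A quick way to check the encoding independently of what the editor reports is to look at the raw bytes. The sketch below is only an illustration (the function name `describe_encoding` and its return strings are made up here); it tells apart a UTF-8 file with BOM, valid UTF-8 without BOM, plain ASCII, and bytes that are not valid UTF-8 at all, which usually points to a legacy "ANSI" code page.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes by how they relate to UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not decodable as UTF-8; a legacy code page is a likely cause.
        return "not utf-8"
    if data.isascii():
        # Pure ASCII is valid UTF-8 too, so it needs no conversion.
        return "ascii"
    return "utf-8 without BOM"
```

For a file on disk, pass in the result of `open(path, "rb").read()`.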
|
08-16-2010, 10:33 PM | #3 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Hello Charleski
I just opened one of the html files with Notepad++. The encoding is shown as "ANSI as UTF-8" (or "UTF-8 without BOM", as it is called in the menu). So this could really be the problem. I think I will convert the first few files to UTF-8 (I don't want to convert ALL files before I know that it is worth the work) and see if that works (but not right now, maybe later today). Thanks so far. I also noticed that the end-of-line format is set to Unix. Usually I use Windows, but I am not sure if that is a problem. |
08-16-2010, 11:01 PM | #4 |
Zealot
Posts: 112
Karma: 105
Join Date: Jan 2010
Device: Kindle 3 WiFi
|
I have noticed that some characters, such as a literal — in the original xhtml file, do not get converted properly by Calibre. For this reason I use the HTML code for the character instead (for — that would be &#8212; or &mdash;); this gets converted by Calibre and works fine.
For you, try using the HTML code for » (&#187; or &raquo;) and for « (&#171; or &laquo;). Edit: I have tested it, and using the HTML code works. For more HTML codes go to http://www.ascii.cl/htmlcodes.htm Last edited by Dark123; 08-16-2010 at 11:08 PM. |
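If you go the HTML-code route, the replacement does not have to be done by hand. As a rough sketch (the helper name `to_numeric_references` is made up for this example), Python's built-in `xmlcharrefreplace` error handler turns every non-ASCII character into its numeric character reference, which makes the file pure ASCII and therefore immune to encoding mix-ups:

```python
def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference like &#187;."""
    # Encoding to ASCII with xmlcharrefreplace emits &#NNN; for each
    # character that ASCII cannot represent; ASCII text passes through.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The trade-off is that the source becomes harder to read and edit, since every umlaut and quotation mark turns into a code.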
08-17-2010, 06:18 AM | #5 |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
To make sure Notepad++ uses utf-8 correctly, go to Settings->Preferences->New Document tab. Set New Document Encoding to UTF-8 without BOM and check 'Apply to opened ANSI files'.
|
08-17-2010, 09:52 AM | #6 | |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Quote:
@Charleski thanks for your hint. Every day I learn something new. And when I think about how I started: I had absolutely no knowledge of html. In the beginning, I used the normal Notepad that comes with Windows. I still remember when I changed German umlauts from their html code to the direct character: the search and replace in Notepad took about 20 seconds for 4000 replacements. With Notepad++ it takes about 5 seconds ... I love this program. |
|
08-17-2010, 11:13 AM | #7 | |
Zealot
Posts: 112
Karma: 105
Join Date: Jan 2010
Device: Kindle 3 WiFi
|
Quote:
|
|
08-17-2010, 11:14 AM | #8 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
just to keep you updated:
I took the first 5 html files and looked at the encoding: they were all "ANSI as UTF-8". I converted them to UTF-8 and repacked the files. Then I put the book on my ebook reader, but it still doesn't work. Maybe I have to convert all the remaining html files. |
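Since the repacking step comes up here, one more thing that may be worth ruling out is the zip layout. The thread does not say how the files were repacked, so this is only a guess at a possible cause, not a diagnosis. The EPUB container format expects the `mimetype` file to be the first entry in the archive and to be stored uncompressed. A minimal sketch with Python's standard `zipfile` module (the function name `pack_epub` is made up for this example):

```python
import zipfile
from pathlib import Path


def pack_epub(source_dir: str, epub_path: str) -> None:
    """Zip an unpacked epub folder with 'mimetype' first and uncompressed."""
    root = Path(source_dir)
    # epub_path should live outside source_dir, otherwise the archive
    # would try to include itself.
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes in first, stored without compression.
        archive.write(root / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            name = path.relative_to(root).as_posix()
            if path.is_file() and name != "mimetype":
                archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would look like `pack_epub("unpacked_book", "book_fixed.epub")`, where both names are placeholders.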
08-17-2010, 08:33 PM | #9 | |
Zealot
Posts: 112
Karma: 105
Join Date: Jan 2010
Device: Kindle 3 WiFi
|
Quote:
|
|
08-18-2010, 10:27 PM | #10 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Now, before I become totally confused:
"UTF-8 without BOM" is the same as "ANSI as UTF-8", right? Because when I open one of the files in Notepad++, it says "ANSI as UTF-8" in the right corner, but under Encoding it says "UTF-8 without BOM". So it seems the files are already encoded as UTF-8 without BOM.
By the way, the epubs that I created with Calibre are all based on UTF-8 files. Should I change them to UTF-8 without BOM? I googled, and all I found was the recommendation to use UTF-8. No page ever mentioned the BOM.
Oh, and to throw that in: I know the source file for an epub has to be valid XHTML 1.1. So I guess the same applies to the html files inside the epub? Because I just opened one with a validator and it gave me 8 errors. The 2 most interesting ones: the <!DOCTYPE> declaration is missing, and something is also wrong with the media content. I just looked at the original source html file that was also included, and it is missing there as well. So I let the validator check the source file, and within 3300 lines it found 106 errors. Some of them are very strange, like: it used div style when it should use div class ...
I guess that is the main problem. I expected the whole thing to be valid xhtml ... so when it comes to epubs, never rely on others using valid xhtml files. |
08-19-2010, 07:44 AM | #11 |
Zealot
Posts: 112
Karma: 105
Join Date: Jan 2010
Device: Kindle 3 WiFi
|
Sorry, I meant: extract the HTML files from the ePub and convert them to UTF-8 without BOM; otherwise some of the characters do not display on my eBook reader (I tested it).
I think it would be a lot better if you added the ePub to Calibre, clicked Convert, opened ePub Output (on the left side), ticked "Do not split on page breaks" and set "Split files larger than" to 999999 KB. This way Calibre will convert it into an ePub that has only 1 HTML file in it. You can then treat this file as your original, edit it the way you like, and use Calibre to make it into an ePub afterwards. |
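The conversion to UTF-8 without BOM mentioned above can also be scripted instead of clicking through each file in the editor. The following is a sketch under one loud assumption: files that are not valid UTF-8 are taken to be Windows-1252 ("ANSI" on a Western European Windows), which may not be true for every book. The function name `to_utf8_without_bom` is made up for illustration.

```python
def to_utf8_without_bom(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return the same text encoded as UTF-8 with no byte order mark."""
    try:
        # utf-8-sig decodes UTF-8 and silently drops a leading BOM if present.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # ASSUMPTION: anything that is not valid UTF-8 is in the fallback
        # code page. This raises if the bytes do not fit that code page.
        text = data.decode(fallback)
    return text.encode("utf-8")
```

Files that are already UTF-8 without BOM come back unchanged, so running it over all html files in the book should be harmless.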
08-19-2010, 08:27 AM | #12 | ||
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Quote:
Quote:
Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> |
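To find out which of the html files in a book lack header lines like the ones shown above, a simple text check is enough. This is a hypothetical helper (`header_report` is not part of any tool), it only looks for the presence of an XML declaration and an html DOCTYPE near the top of the file, and it is no substitute for a real validator.

```python
import re

# An XML declaration has to be the very first thing in the file
# (an optional BOM character is tolerated here).
XML_DECLARATION = re.compile(r"^\ufeff?<\?xml\s[^>]*\?>")
# The DOCTYPE only needs to appear somewhere near the top.
DOCTYPE_HTML = re.compile(r"<!DOCTYPE\s+html\b")


def header_report(text: str) -> dict:
    """Report whether the start of a file has an XML declaration and a DOCTYPE."""
    head = text[:1000]  # header lines sit at the very top of the file
    return {
        "xml_declaration": bool(XML_DECLARATION.match(head)),
        "doctype": bool(DOCTYPE_HTML.search(head)),
    }
```

Running it over every html file in the unpacked book gives a quick list of files to fix before repacking.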
||
08-19-2010, 09:38 AM | #13 | |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Quote:
When I started with epub, I knew nothing about html. xhtml? css? Valid files? I had never heard of stuff like that. I used Mobipocket to convert pdf files to html, then I used Calibre to convert them to epub. I put them on my ebook reader, and to my surprise, some lines were longer than the screen itself, so half a word, or maybe even 1 or 2 words, were missing. In the beginning I didn't know what was wrong, so I took a closer look at the html file and noticed that <p> tags and <br> tags were totally mixed up. I replaced all the p tags with br, and in the end the problem with the "too-long" lines was gone, but of course the book didn't look good: no text indent, and the text was not justified. When I found out what you can do with p tags, it got better, and then I learned how important validity is when it comes to xhtml files.
@ebooknewbie: thanks for your hint about how to create epubs with just 1 html file. I know that, in general, split html files are no problem, but when you have to edit them ... You helped me a lot :-) |
|
08-19-2010, 12:04 PM | #14 | |
Zealot
Posts: 112
Karma: 105
Join Date: Jan 2010
Device: Kindle 3 WiFi
|
Quote:
I know how you feel about ebooks displaying weirdly; I had to learn a bit of CSS and XML (I already knew HTML). My worst case was the header (Chapter 1) followed by half the ebook reader screen taken up by <br> tags. |
|
|