Old 08-16-2010, 11:10 AM   #1
NASCARaddicted
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
epub edited - now it's messed up

Hello everybody

I have a strange problem with an epub file that is new to me.

Before I start, I just want to say: in the past I have edited many XHTML files and turned them into epubs with Calibre without any problem. But this problem is different ...

I got this epub file from a "friend" ... However, I noticed two things that I wanted to change. In Germany (and Austria), quotation marks look like this: »text«. In Switzerland, however (or at least in the German-speaking part), they are used like this: «text»
Since this looks very unusual to me, I wanted to change it. I also noticed that in many places there were 3-4 spaces between two words in a normal sentence, where there should be only one.

So I unpacked the epub file and edited all the HTML files (about 65) with Notepad++. I changed the quotation marks and replaced all double spaces with a single space.

I also edited the CSS file to add justification. And I noticed that some p rules in the CSS set a font size. I think the font size should be chosen by the reader, so I removed that, too. Then I packed the file again and opened it with Calibre's ebook viewer. Everything looked fine.

Then I put it on my ebook reader ...

The first chapter starts normally with the headline. On the right side, you see ADE page number 6 (that fits, because there are some other pages before chapter 1 starts).
But then it gets strange. After the headline, there is the first line and a new ADE page number. And right below that, there is another ADE page number.

I took another look at the unedited epub file, and that one looked all right ...

So what do you think I did wrong? Could it be the TOC file? Or because I removed the double spaces? Or the CSS changes?

Thanks for your help.
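
The two text fixes described in this post (turning «text» into »text« and collapsing runs of spaces) could also be scripted instead of being done by hand in an editor. A minimal Python sketch, assuming the files are UTF-8; `fix_text` is only an illustrative helper name, not something from Calibre or Notepad++:

```python
import re

# Swap the two guillemets in one pass: « becomes » and » becomes «.
SWAP_QUOTES = str.maketrans({"«": "»", "»": "«"})


def fix_text(text: str) -> str:
    """Swap guillemet direction and collapse runs of spaces to one space."""
    swapped = text.translate(SWAP_QUOTES)
    return re.sub(r" {2,}", " ", swapped)


print(fix_text("Er sagte:   «Guten  Tag»"))  # Er sagte: »Guten Tag«
```

When applying something like this to real files, it is safer to pass `encoding="utf-8"` explicitly for both reading and writing, so the special characters are not re-encoded along the way.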
Old 08-16-2010, 01:42 PM   #2
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
I've encountered a very similar problem and found that it was caused by Notepad++ failing to detect the encoding properly. If it edits the file in ANSI mode, it will insert stray codes that cause strange behaviour in ADE. Open the source files again and check whether Notepad++ recognises them as UTF-8. If not, convert them to UTF-8 before editing.
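
The same check can be done without relying on an editor's auto-detection. A small Python sketch, with `describe_encoding` as a hypothetical helper name: it only tests whether the raw bytes decode as UTF-8 and whether they start with a byte order mark. Note that a pure-ASCII file also counts as valid UTF-8.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes as UTF-8 with BOM, UTF-8 without BOM, or not UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    return "UTF-8 without BOM"


print(describe_encoding("»text«".encode("utf-8")))   # UTF-8 without BOM
print(describe_encoding("»text«".encode("cp1252")))  # not valid UTF-8
```

For a file on disk, the bytes can be read with `pathlib.Path(name).read_bytes()` and passed in unchanged.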
Old 08-16-2010, 10:33 PM   #3
NASCARaddicted
Hello Charleski

I just opened one of the HTML files with Notepad++. The encoding is shown as "ANSI as UTF-8" (or "UTF-8 without BOM", as it is called in the menu). So this could really be the problem. I think I will convert the first few files to UTF-8 (I don't want to convert ALL the files before I know it is worth the work) and see if that works (but not right now, maybe later today).

Thanks so far.

I also noticed that the line-ending format is set to Unix. Usually I use Windows. But I am not sure whether that is a problem.
Old 08-16-2010, 11:01 PM   #4
Dark123
Zealot
Posts: 112
Karma: 105
Join Date: Jan 2010
Device: Kindle 3 WiFi
I have noticed that some characters, such as a literal — in the original XHTML file, do not get converted properly by Calibre. For this reason I use the HTML code &#8212; instead; Calibre converts that one and it works fine.
For you, try using
&#187; for »
and
&#171; for «

Edit: I have tested it, and using the HTML code works. For more HTML codes go to http://www.ascii.cl/htmlcodes.htm
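
The suggestion above (writing special characters as HTML codes rather than as literal characters) can be automated. A minimal Python sketch using the standard `xmlcharrefreplace` error handler, which turns every non-ASCII character into a decimal numeric character reference; `to_numeric_refs` is just an illustrative name:

```python
def to_numeric_refs(text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_refs("»text« — ok"))  # &#187;text&#171; &#8212; ok
```

This converts all non-ASCII characters, umlauts included, so it is broader than replacing only the quotation marks.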

Last edited by Dark123; 08-16-2010 at 11:08 PM.
Old 08-17-2010, 06:18 AM   #5
charleski
To make sure Notepad++ uses utf-8 correctly, go to Settings->Preferences->New Document tab. Set New Document Encoding to UTF-8 without BOM and check 'Apply to opened ANSI files'.
Old 08-17-2010, 09:52 AM   #6
NASCARaddicted
Quote:
Originally Posted by eBookNewbie123 View Post
I have noticed that some characters, such as a literal — in the original XHTML file, do not get converted properly by Calibre. For this reason I use the HTML code &#8212; instead; Calibre converts that one and it works fine.
For you, try using
&#187; for »
and
&#171; for «

Edit: I have tested it, and using the HTML code works. For more HTML codes go to http://www.ascii.cl/htmlcodes.htm
I always use HTML codes for special characters like &#8211;, &#187; and &#171;, because a normal German keyboard doesn't have these keys. The original document, however, used the special characters directly. The only special characters that I type directly are the German umlauts äöü, because they are on every German keyboard.

@Charleski thanks for your hint. Every day I learn something new. And when I think about how I started: I had absolutely no knowledge of HTML. In the beginning, I used the normal Notepad that comes with Windows. I still remember when I changed the German umlauts from their HTML codes to the direct characters: the search and replace in Notepad took about 20 seconds for 4000 replacements. With Notepad++ it takes about 5 seconds ... I love this program.
Old 08-17-2010, 11:13 AM   #7
Dark123
Quote:
Originally Posted by NASCARaddicted View Post
I always use HTML codes for special characters like &#8211;, &#187; and &#171;, because a normal German keyboard doesn't have these keys. The original document, however, used the special characters directly. The only special characters that I type directly are the German umlauts äöü, because they are on every German keyboard.

@Charleski thanks for your hint. Every day I learn something new. And when I think about how I started: I had absolutely no knowledge of HTML. In the beginning, I used the normal Notepad that comes with Windows. I still remember when I changed the German umlauts from their HTML codes to the direct characters: the search and replace in Notepad took about 20 seconds for 4000 replacements. With Notepad++ it takes about 5 seconds ... I love this program.
Change it in the .html files inside the ePub and it should work. Open the original, switch the characters to their HTML codes and put the files back; it should then display fine.
Old 08-17-2010, 11:14 AM   #8
NASCARaddicted
Just to keep you updated:

I took the first 5 HTML files and looked at the encoding: they were all "ANSI as UTF-8". I converted them to UTF-8 and repacked the files. Then I put the epub on my ebook reader, but it still doesn't work.

Maybe I have to convert all the remaining HTML files.
Old 08-17-2010, 08:33 PM   #9
Dark123
Quote:
Originally Posted by NASCARaddicted View Post
Just to keep you updated:

I took the first 5 HTML files and looked at the encoding: they were all "ANSI as UTF-8". I converted them to UTF-8 and repacked the files. Then I put the epub on my ebook reader, but it still doesn't work.

Maybe I have to convert all the remaining HTML files.
Don't use the plain "UTF-8" option. You need to convert them to "UTF-8 without BOM"; it's under the Encoding menu in Notepad++. Try that and see if it helps.
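
The conversion described here can also be done outside Notepad++. A minimal Python sketch, with `to_utf8_without_bom` as a hypothetical helper name: the standard `utf-8-sig` codec drops a leading BOM when decoding, and re-encoding as `utf-8` never writes one. For a file that really is in a Windows "ANSI" code page, the source encoding has to be named explicitly (`cp1252` is assumed below as a typical Western European case).

```python
def to_utf8_without_bom(data: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Decode bytes from source_encoding and re-encode as UTF-8 with no BOM."""
    return data.decode(source_encoding).encode("utf-8")


with_bom = b"\xef\xbb\xbf" + "»text«".encode("utf-8")
print(to_utf8_without_bom(with_bom) == "»text«".encode("utf-8"))  # True

ansi = "»text«".encode("cp1252")
print(to_utf8_without_bom(ansi, "cp1252") == "»text«".encode("utf-8"))  # True
```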
Old 08-18-2010, 10:27 PM   #10
NASCARaddicted
Now, before I become totally confused:

"UTF-8 without BOM" is the same as "ANSI as UTF-8", right?

Because when I open one of the files in Notepad++, it says "ANSI as UTF-8" in the right corner, but under Encoding it says "UTF-8 without BOM".

So it seems the files are already encoded as UTF-8 without BOM.

By the way, the epubs that I created with Calibre are all based on UTF-8 files. Should I change them to UTF-8 without BOM? I googled, and all I found was the recommendation to use UTF-8. No page ever mentioned a BOM.


Oh, and to throw this in: I know the source file for an epub has to be valid XHTML 1.1. So I guess the same applies to the HTML files inside the epub? Because I just opened one with a validator and it gave me 8 errors.

The two most interesting ones: the <!DOCTYPE> tag is missing.

And something is also wrong with the media content.

I just looked at the base source HTML file that was also included, and the DOCTYPE is missing there too. So I let the validator check the source file, and within 3300 lines it found 106 errors. Some of them are very strange, like: it used div style when it should use div class ....

I guess that is the main problem. I expected the whole thing to be valid XHTML ... so when it comes to epubs, never rely on others using valid XHTML files.
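
A quick first check that needs no online validator is to test whether a file is at least well-formed XML. This is only a sketch and much weaker than real XHTML 1.1 validation: it catches unclosed or mismatched tags, but not rules such as which attributes are allowed. `well_formedness_error` is a hypothetical helper name, and because no DTD is loaded, named entities such as `&nbsp;` may be reported as errors.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(data: bytes) -> Optional[str]:
    """Return the parser's error message, or None if the bytes parse as XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(well_formedness_error(b"<html><body><p>ok</p></body></html>"))  # None
print(well_formedness_error(b"<p>line one<br>line two</p>") is not None)  # True
```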
Old 08-19-2010, 07:44 AM   #11
Dark123
Sorry, I meant: extract the HTML files from the ePub and convert them to UTF-8 without BOM; otherwise some of the characters do not display on my eBook reader (I tested it).
I think it would be a lot better if you added the ePub to Calibre, clicked Convert, opened the ePub Output section (on the left side), ticked "Do not split on page breaks" and set "Split files larger than" to 999999 KB.
This way Calibre will convert it into an ePub that contains only one HTML file. You can then treat this as your original, edit it the way you like, and use Calibre to turn it into an ePub afterwards.
Old 08-19-2010, 08:27 AM   #12
charleski
Quote:
Originally Posted by NASCARaddicted View Post
Now, before I become totally confused:

"UTF-8 without BOM" is the same as "ANSI as UTF-8", right?

Because when I open one of the files in Notepad++, it says "ANSI as UTF-8" in the right corner, but under Encoding it says "UTF-8 without BOM".

So it seems the files are already encoded as UTF-8 without BOM.

By the way, the epubs that I created with Calibre are all based on UTF-8 files. Should I change them to UTF-8 without BOM? I googled, and all I found was the recommendation to use UTF-8. No page ever mentioned a BOM.
I wouldn't get too worried about the BOM. If it says 'UTF-8 without BOM' under the Encodings menu then you're fine. ePub readers shouldn't need a BOM (which is a bit archaic) anyway.

Quote:
The two most interesting ones: the <!DOCTYPE> tag is missing.

And something is also wrong with the media content.

I just looked at the base source HTML file that was also included, and the DOCTYPE is missing there too. So I let the validator check the source file, and within 3300 lines it found 106 errors. Some of them are very strange, like: it used div style when it should use div class ....
You'll want the top of each xhtml file to have
Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
at the very least, and more properly
Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
But if there are strange errors in the source then that might be the root of your problems. Calibre isn't very strict about checking the syntax, which is why I always use Sigil instead.
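
With around 65 files in the epub, checking each one by hand for these two header lines is tedious. A rough Python sketch that reports which of them is absent from the start of a file; `missing_headers` is an illustrative name, and looking only at the first 500 characters is an arbitrary assumption:

```python
from typing import List


def missing_headers(text: str) -> List[str]:
    """List which of the XML declaration and the DOCTYPE are absent."""
    head = text.lstrip("\ufeff")[:500]  # ignore a leading BOM, look at the start only
    missing = []
    if not head.startswith("<?xml"):
        missing.append("XML declaration")
    if "<!DOCTYPE" not in head:
        missing.append("DOCTYPE")
    return missing


print(missing_headers("<html><body/></html>"))  # ['XML declaration', 'DOCTYPE']
```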
Old 08-19-2010, 09:38 AM   #13
NASCARaddicted
Quote:
Originally Posted by charleski View Post
But if there are strange errors in the source then that might be the root of your problems. Calibre isn't very strict about checking the syntax, which is why I always use Sigil instead.
Yeah. I noticed that in the past. Small errors can cause big problems.

When I started with epub, I knew nothing about HTML. XHTML? CSS? Valid files? I had never heard of any of that.

I used Mobipocket to convert PDF files to HTML, then I used Calibre to convert them to epub. I put them on my ebook reader and, to my surprise, some lines were longer than the screen itself, so half a word, or maybe even one or two words, were missing. In the beginning I didn't know what was wrong, so I took a closer look at the HTML file and noticed that <p> tags and <br> tags were totally mixed up. I replaced all the p tags with br, and in the end the problem with the "too-long" lines was gone, but of course the book didn't look good: no text indent, and the text was not justified. When I found out what you can do with p tags, it got better, and then I learned how important validity is when it comes to XHTML files.

@ebooknewbie: thanks for your hint about how to create epubs with just one HTML file. I know that, in general, split HTML files are no problem, but when you have to edit them ... you helped me a lot :-)
Old 08-19-2010, 12:04 PM   #14
Dark123
Quote:
Originally Posted by NASCARaddicted View Post
@ebooknewbie: thanks for your hint about how to create epubs with just one HTML file. I know that, in general, split HTML files are no problem, but when you have to edit them ... you helped me a lot :-)
It's nothing. Hopefully it'll help you fix the problem you're having.
I know how you feel about ebooks displaying weirdly; I had to learn a bit of CSS and XML (I already knew HTML). My worst case was a header (Chapter 1) followed by half the ebook reader's screen taken up by <br> tags.