ePub creation tools : what's missing ? wishlist / dialogue


zelda_pinwheel
01-23-2009, 12:26 PM
We've been having an interesting discussion in the french forum recently about what is missing in the apps we currently have available to us for creating epub files (i'm talking about editors, not converters), and what we wish we had. In other threads, at least 2 people (that i have seen) have mentioned their intentions to write an epub editor. In still other threads, there have been references to various details that could be done better. calibre is great in many ways, but it's not a fully-featured editor. feedbooks does a brilliant job making valid code and hierarchical structure, but a lot of people want an offline app, and feedbooks doesn't handle images yet.

So, I thought I would create ONE THREAD TO RULE THEM ALL where users can list their desiderata and any interested developers could discuss what is possible / not possible, their intentions, etc. With a little luck we might even end up creating the ONE APP TO RULE THEM ALL right here ! wouldn't that be brilliant. :)

Here to start are some of the things we want as users and creators of epub books, taken from the discussion in french in this thread (http://www.mobileread.com/forums/showthread.php?t=36651). Please feel free to add your own, and I hope the developers will be interested in participating / responding as well !! Valloric, llasram, kovid, wallcraft, Komenor, et al. i'm thinking of you, for example. ;) (<-- non-exhaustive list !!)

DESIRED FEATURES

1. epub files must be valid (html tidy, epub check...) and conform to best practices.

2. the editor should be able to accept multiple html / xhtml flows and create one document with a hierarchical TOC. It should also be able to accept one html flow and semantically parse the hierarchy of the document (part, chapter, section...) according to the tags used (h1, h2, h3, etc.), creating logical divisions for a properly structured epub document and a hierarchical TOC, the way that feedbooks does.

3. It should be able to handle images and relatively advanced css markup (dropcaps, for instance).

4. Ideally, it should be accessible even to users with no knowledge of html / css code with a full wysiwyg UI, although advanced options should be available if you do know the code (direct code editing should be possible), again similar to the feedbooks interface.
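
As a rough illustration of item 2, here is a minimal Python sketch of deriving a hierarchical TOC from the heading tags of a single html flow. The names `HeadingCollector` and `build_toc` are invented for this example, and it assumes the headings are reasonably well nested; it is not a description of how feedbooks or any existing tool does it.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every h1..h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_toc(html):
    """Return a nested list of {'title', 'level', 'children'} dicts."""
    collector = HeadingCollector()
    collector.feed(html)
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in collector.headings:
        node = {"title": title, "level": level, "children": []}
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

Each nested entry could then become one navPoint in the ncx; cutting the content file at the same headings would give the logical divisions.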

I'll probably have more to add later but i've got to go for the moment, so i'll stop here and open the discussion. What do you want from an editor ? i'll be looking forward to seeing what comes out of this !

Jellby
01-23-2009, 12:53 PM
Since I like writing my XHTML and CSS code by hand, I won't ask this from an editor, but I would appreciate a user-friendly way to generate and edit the opf (metadata, manifest, spine, guide) and the ncx (hierarchical table of contents) files.
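
To make the opf request a little more concrete, here is a minimal Python sketch that generates the metadata, manifest and spine from a list of chapter files. The function name `build_opf`, the `chapN` ids and the `toc.ncx` file name are assumptions made for this example; a real editor would also need the guide, more metadata fields and the ncx itself.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_opf(title, language, identifier, chapters):
    """Return an OPF 2.0 package document as a string.

    `chapters` is a list of XHTML file names in reading order.
    """
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)

    package = ET.Element(
        f"{{{OPF_NS}}}package",
        {"version": "2.0", "unique-identifier": "BookId"},
    )

    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}language").text = language
    ident = ET.SubElement(metadata, f"{{{DC_NS}}}identifier", {"id": "BookId"})
    ident.text = identifier

    manifest = ET.SubElement(package, f"{{{OPF_NS}}}manifest")
    ET.SubElement(
        manifest,
        f"{{{OPF_NS}}}item",
        {"id": "ncx", "href": "toc.ncx", "media-type": "application/x-dtbncx+xml"},
    )

    # The spine lists manifest ids in reading order and points at the ncx.
    spine = ET.SubElement(package, f"{{{OPF_NS}}}spine", {"toc": "ncx"})
    for index, href in enumerate(chapters, start=1):
        item_id = f"chap{index}"  # hypothetical id scheme
        ET.SubElement(
            manifest,
            f"{{{OPF_NS}}}item",
            {"id": item_id, "href": href, "media-type": "application/xhtml+xml"},
        )
        ET.SubElement(spine, f"{{{OPF_NS}}}itemref", {"idref": item_id})

    return ET.tostring(package, encoding="unicode")
```

A form-based front end would only have to collect the title, language, identifier and file order, then call something like this on save.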

GeoffC
01-23-2009, 02:05 PM
5. Choice of end formats, to cater for readers that do not support ePub.

6. Pick and choose a title-page image.

brewt
01-23-2009, 02:05 PM
Font Embedding. The only thing that'll do that off-the-shelf now is Indesign, and well, that's the only thing it does "well".

Macros, search & replace (not just text but styles, fonts, sizes, etc), grammar checker, spell-checker, target viewing application compensation, multiple format input & output, auto paragraph re-justification & hyphenation, multiple language dictionaries, built-in browser, metadata management, multi-style select, discontinuous select, table generation/correction, cross references such as indices and footnotes, rss feeds....

You said "killer app", didn't you Z? :)

-bjc

Valloric
01-23-2009, 02:13 PM
So, I thought I would create ONE THREAD TO RULE THEM ALL where users can list their desiderata and any interested developers could discuss what is possible / not possible, their intentions, etc.

This is a very good idea.


1. epub files must be valid (html tidy, epub check...) and conform to best practices.

I've been thinking about this and writing down ideas in my "epub editor idea book"... there are certain problems here.

For one, we have the epub standard. Naturally, all editors should output valid epub, that's a given, but the best practices... currently, the Sony Reader--more specifically, mobile DE--has limits on chapter length (content file limit is 300KB). While this is unfortunate, I can live with it. But there are other problems and flukes of mobile DE that one has to take into account. The page numbers on the side, for instance. One needs to add margins so the numbers don't overlap the text, etc.

So a standards compliant epub may not work in mobile DE, and if it does, it may not look as nice as one specifically tailored to it. And I'm not blaming DE for certain limitations (like the 300KB limit) that are the result of the platform it runs on. Other epub reading applications for the Reader (or other devices) would probably hit a similar wall.

My current working idea is this: the editor exports two types of epub files. Both are standards compliant. One follows the standard and nothing else. No size limits, no special margins etc. Pure epub, without any consideration for the reading application or the platform. Let's call this "Standards Epub".

Then there's the second output option. This epub type is also fully compliant, but has the size limitations, special margins and other necessities for an enjoyable read on something like the Sony Reader. Let's call this "Mobile Epub".

I know no one likes the idea of two different epub files. One risks the creation of a sub-format. But I don't see a way around this. A purely standards compliant epub is a must; current practical limitations may (and will) disappear with time. But one must also be realistic: there's no point in high-horsing around with compliant epubs that nobody can read on portable devices like the Sony Reader.

I am very much interested in what other people think about this.
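
One piece of the "Mobile Epub" output could be sketched like this in Python: cutting a long chapter into parts that each stay under a size limit. The 300KB default comes from the limit mentioned above; the function name `split_chapter` and the choice to split only at paragraph boundaries are assumptions for this example, and the sketch ignores the bytes added by the surrounding XHTML wrapper.

```python
def split_chapter(paragraphs, max_bytes=300 * 1024):
    """Group paragraphs into parts whose UTF-8 size stays under max_bytes.

    A single paragraph larger than max_bytes still ends up alone in its
    own part, so the caller would have to warn about that case.
    """
    parts, current, size = [], [], 0
    for para in paragraphs:
        para_size = len(para.encode("utf-8"))
        # Start a new part when adding this paragraph would cross the limit.
        if current and size + para_size > max_bytes:
            parts.append(current)
            current, size = [], 0
        current.append(para)
        size += para_size
    if current:
        parts.append(current)
    return parts
```

The "Standards Epub" export would simply skip this step and write each chapter as one content file.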


3. It should be able to handle images and relatively advanced css markup (dropcaps, for instance).

I've been following the dropcaps thread(s), and it's a good example of why I would want an editor to export two kinds of epub files: one that follows the standard, and one that's tailored around flukes and idiosyncrasies of certain reading software.


4. Ideally, it should be accessible even to users with no knowledge of html / css code with a full wysiwyg UI, although advanced options should be available if you do know the code (direct code editing should be possible), again similar to the feedbooks interface.

My working idea is to have a WYSIWYG interface, but with a "view code" view, like for instance in Dreamweaver. If someone wants to mess with it directly, they should be able to.

kovidgoyal
01-23-2009, 02:19 PM
1. and 2. are incompatible.

kovidgoyal
01-23-2009, 02:43 PM
Also I think you need to more clearly define what the use case is for this tool. Is it for authors who want to write books, or is it for proofreaders who want to touch up and convert existing books?

If the former, far more important than the features you have outlined would be features to support actual writing, like for example, keeping notes associated with characters/places/events.

If the latter then all that's needed is a wrapper around any good html editor with a WYSIWYG mode.

Valloric
01-23-2009, 03:04 PM
If the former, far more important than the features you have outlined would be features to support actual writing, like for example, keeping notes associated with characters/places/events.

Who in their right mind would use an ebook editor to actually write a book? The editor I'm thinking of is for proofreaders and copyeditors.

zelda_pinwheel
01-23-2009, 03:14 PM
This is a very good idea.
i'm glad you think so ; i had you in mind when i thought of it. ;)

I know no one likes the idea of two different epub files. One risks the creation of a sub-format. But I don't see a way around this. A purely standards compliant epub is a must; current practical limitations may (and will) disappear with time. But one must also be realistic: there's no point in high-horsing around with compliant epubs that nobody can read on portable devices like the Sony Reader.

I am very much interested in what other people think about this.
well, you're right, this isn't the optimum situation, however it seems a reasonable compromise to me. particularly since the "standards epub" can also be used for archival purposes and as a source format for conversion to other formats as needed. i'm not an expert though ; i'll be interested to see other reactions.

My working idea is to have a WYSIWYG interface, but with a "view code" view, like for instance in Dreamweaver. If someone wants to mess with it directly, they should be able to.
right, that's rather the idea i had in my head as well.

1. and 2. are incompatible.
why are 1 and 2 incompatible, kovid ?

Also I think you need to more clearly define what the use case is for this tool. Is it for authors who want to write books, or is it for proofreaders who want to touch up and convert existing books.

the users of this tool as i see it are very much the *latter* case you describe : they are the same kinds of people using BookDesigner or ETI's eBook Publisher or Mobipocket Creator today, but who want a good tool for making epub files. so really closer to dreamweaver than to word or any writing tools. i imagine they would *already* have the finished text, either PD texts or their own manuscripts, and just want to turn them into ebooks. so, yes, a wrapper around a good html editor with a wysiwyg mode. but i think there are some particularities specific to epub that would need to be addressed (creation of the different file types, creation of the epub container...).
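
For the container part specifically, a minimal Python sketch might look like the following. The epub container is a zip file whose first entry must be an uncompressed file named `mimetype`, plus a `META-INF/container.xml` that points at the opf. The function name `write_epub` and the `OEBPS/content.opf` path are assumptions for this example.

```python
import zipfile

# Points the reading system at the package (opf) file.
# The OEBPS/content.opf location is an assumption of this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub(path, files):
    """Write an epub container.

    `path` is a file name or a writable binary file object.
    `files` maps archive names (e.g. "OEBPS/ch1.xhtml") to str or bytes.
    """
    with zipfile.ZipFile(path, "w") as epub:
        # The mimetype entry has to come first and must not be compressed.
        epub.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        epub.writestr(
            "META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED
        )
        for name, data in files.items():
            epub.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

The opf, ncx, stylesheets and content files would all be passed in through `files`; validation (epubcheck) would still be a separate step.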

anyway this discussion looks to be off to a good start ! i'm glad to see it !

Valloric
01-23-2009, 03:25 PM
particularly since the "standards epub" can also be used for archival purposes and as a source format for conversion to other formats as needed.

Exactly. There are people out there who want to use epubs purely for storage.


the users of this tool as i see it are very much the *latter* case you describe : they are the same kinds of people using BookDesigner or ETI's eBook Publisher or Mobipocket Creator today, but who want a good tool for making epub files. so really closer to dreamweaver than to word or any writing tools. i imagine they would *already* have the finished text, either PD texts or their own manuscripts, and just want to turn them into ebooks. so, yes, a wrapper around a good html editor with a wysiwyg mode. but i think there are some particularities specific to epub that would need to be addressed (creation of the different file types, creation of the epub container...).

Again, I agree completely. This is exactly how I see an epub editor: a tool that facilitates the manual creation of an epub book from pre-existing text. BD for epub... without the suck.

kovidgoyal
01-23-2009, 03:33 PM
why are 1 and 2 incompatible, kovid ?


If you accept arbitrary HTML as input and want to output standards compliant HTML, the only way to do that is to basically strip the HTML down to a basic internal markup and then re-export it. This is for example what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards compliant HTML output (and no, htmltidy doesn't do this).

So really what the tool will have to do is:

1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
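
Steps 1, 2 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the internal markup here is a plain list of (kind, text) blocks, the names `SimpleMarkupParser`, `parse_to_blocks` and `export_xhtml` are invented, and everything except headings and paragraphs is thrown away, which shows exactly why the conversion cannot be lossless.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical internal markup: only these block kinds survive the import.
BLOCK_TAGS = {"h1": "heading1", "h2": "heading2", "h3": "heading3", "p": "paragraph"}


class SimpleMarkupParser(HTMLParser):
    """Reduces arbitrary (possibly broken) HTML to (kind, text) blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._kind = None
        self._text = []

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self._text).split())
        if self._kind and text:
            self.blocks.append((self._kind, text))
        self._kind, self._text = None, []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()  # tolerate a previous block that was never closed
            self._kind = BLOCK_TAGS[tag]

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._kind:
            self._text.append(data)


def parse_to_blocks(html):
    parser = SimpleMarkupParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing unclosed block
    return parser.blocks


def export_xhtml(blocks):
    """Re-export the internal markup as clean, properly closed elements."""
    tags = {kind: tag for tag, kind in BLOCK_TAGS.items()}
    return "\n".join(
        f"<{tags[kind]}>{escape(text)}</{tags[kind]}>" for kind, text in blocks
    )
```

For example, `export_xhtml(parse_to_blocks("<H1>Title</h1><p>First <font color=red>para<p>Second"))` yields three clean elements, with the font tag and the missing end tags gone. Steps 3 and 4 would operate on the block list in between.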

mtravellerh
01-23-2009, 03:36 PM
That sounds really good. I would like to have a way for direct source access and editing, though, maybe a search and replace with RegEx (I use regular expressions quite a lot). Otherwise a GUI thing would be great. One free open source WYSIWYG editor that springs to mind is NVU; another open source editor I use a lot is Notepad++. As those are open source, they could be easily integrated as edit tools.

Valloric
01-23-2009, 03:37 PM
So really what the tool will have to do is:

1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB

My understanding exactly. Only I'm thinking of making the "simple" internal markup not so simple. But yes, one has to parse the initial HTML and create a new one.

mtravellerh
01-23-2009, 03:41 PM
If you accept arbitrary HTML as input and want to output standards compliant HTML, the only way to do that is to basically strip the HTML down to a basic internal markup and then re-export it. This is for example what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards compliant HTML output (and no, htmltidy doesn't do this).

So really what the tool will have to do is:

1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB

If you do that (5), people like Coolmicro will get up and shout again that the resulting epub does not conform to the standard and that the html code is not "clean". (I really do not care about "clean or dirty" code myself, as long as it does what it has to do, like Calibre does for example). So I am all for it.

zelda_pinwheel
01-23-2009, 03:47 PM
This is exactly how I see an epub editor: a tool that facilitates the manual creation of an epub book from pre-existing text. BD for epub... without the suck.
yes, please without the suck. :p
If you accept arbitrary HTML as input and want to output standards compliant HTML, the only way to do that is to basically strip the HTML down to a basic internal markup and then re-export it. This is for example what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards compliant HTML output (and no, htmltidy doesn't do this).

So really what the tool will have to do is:

1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
okay, i see what you mean. you are right of course ; i was making the assumption that the input would be clean and valid code, which cannot necessarily be assumed.

out of curiosity, Valloric, have you seen the feedbooks wysiwyg editor ? i HIGHLY recommend you take a look at it and at how the book creation process is handled ; it is excellent, and it is easily accessible even to people who know nothing at all about html / css, however also gives access to the source code for more knowledgeable users. really, in my mind, the tool i would like would be very close to the feedbooks interface, with just a few modifications. notably i definitely want to be able to insert images, which feedbooks does not yet support.
If you do that (5), people like Coolmicro will get up and shout again that the resulting epub does not conform to the standard and that the html code is not "clean". (I really do not care about "clean or dirty" code myself, as long as it does what it has to do, like Calibre does for example). So I am all for it.

why ? the exported code can easily be clean, that is one of the goals.

Valloric
01-23-2009, 03:50 PM
If you do that (5), people like Coolmicro will get up and shout again that the resulting epub does not conform to the standard and that the html code is not "clean". (I really do not care about "clean or dirty" code myself, as long as it does what it has to do, like Calibre does for example). So I am all for it.

An epub file either conforms to the standard, or it doesn't. It is not a matter of opinion.

On "clean" vs "dirty" code, this is very much a matter of opinion, so it depends on the person reading the code.

kovidgoyal
01-23-2009, 03:54 PM
Actually, I've been thinking about this problem on and off and to me it seems like the whole concept of WYSIWYG editors is flawed. Instead I've been thinking about a side-by-side editor.

The editor will allow you to edit txt files in a simple lightweight markup language like rest or markdown (it will have GUI controls to make it easy, rather like the editor we use to make posts to mobileread). As you make changes the result will be automatically updated and displayed in a pane to the side of the editor pane.

So editing books will be about as hard as making posts on mobileread.
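
The core of such a side-by-side editor is a function that turns the lightweight markup into something the preview pane can display, re-run on every change. Here is a minimal Python sketch. The tiny markup dialect (`#` headings, `**bold**`, `*italic*`, blank-line paragraphs) and the names `render_preview` and `inline` are assumptions for this example; it is not real reST or Markdown.

```python
import re
from html import escape


def inline(text):
    """Convert **bold** and *italic* spans after escaping the raw text."""
    text = escape(text, quote=False)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)


def render_preview(source):
    """Render the editor pane's text as XHTML blocks for the preview pane."""
    blocks = []
    # Blank lines separate blocks; line breaks inside a block are just spaces.
    for chunk in re.split(r"\n\s*\n", source.strip()):
        text = " ".join(line.strip() for line in chunk.splitlines())
        if not text:
            continue
        match = re.match(r"(#{1,3}) +(.*)", text)
        if match:
            level = len(match.group(1))
            blocks.append(f"<h{level}>{inline(match.group(2))}</h{level}>")
        else:
            blocks.append(f"<p>{inline(text)}</p>")
    return "\n".join(blocks)
```

The GUI buttons would only insert the markers into the text; the preview pane never needs to be editable.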

tompe
01-23-2009, 03:57 PM
Actually, I've been thinking about this problem on and off and to me it seems like the whole concept of WYSIWYG editors is flawed. Instead I've been thinking about a side-by-side editor.


Of course it is. Emacs is the ultimate editor and LaTeX the ultimate markup language...

I think the approach you describe is a good approach.

Jellby
01-23-2009, 03:58 PM
Font Embedding. The only thing that'll do that off-the-shelf now is Indesign, and well, that's the only thing it does "well".

Font embedding (or at least basic embedding) seems to be quite simple. I guess WYSIWYG support is a different thing.

My current working idea is this: the editor exports two types of epub files. Both are standards compliant. One follows the standard and nothing else. No size limits, no special margins etc. Pure epub, without any consideration for the reading application or the platform. Let's call this "Standards Epub".

Then there's the second output option. This epub type is also fully compliant, but has the size limitations, special margins and other necessities for an enjoyable read on something like the Sony Reader. Let's call this "Mobile Epub".

I would have different configurable parameters, like maximum file size, maximum nesting level, whether or not to support some optional features... and have the software issue a warning if these limits are exceeded. It shouldn't be so hard to have a "Warn if text files are larger than ____KB" setting.
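
A minimal Python sketch of such a check, assuming the limit applies to the uncompressed size of each text file inside the epub. The function name `oversized_files` and the list of extensions are made up for this example, and the 300KB default is simply the figure quoted earlier in the thread.

```python
import zipfile


def oversized_files(epub_path, max_kb=300, extensions=(".html", ".xhtml", ".htm")):
    """Return (name, size_in_bytes) for each text file above the limit.

    `epub_path` is a file name or a readable binary file object.
    """
    limit = max_kb * 1024
    with zipfile.ZipFile(epub_path) as epub:
        return [
            (info.filename, info.file_size)  # file_size is the uncompressed size
            for info in epub.infolist()
            if info.filename.lower().endswith(extensions) and info.file_size > limit
        ]
```

The editor could run this on export and show one warning per returned file, with `max_kb` exposed as the configurable setting.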

mtravellerh
01-23-2009, 04:00 PM
out of curiosity, Valloric, have you seen the feedbooks wysiwyg editor ? i HIGHLY recommend you take a look at it and at how the book creation process is handled ; it is excellent, and it is easily accessible even to people who know nothing at all about html / css, however also gives access to the source code for more knowledgeable users. really, in my mind, the tool i would like would be very close to the feedbooks interface, with just a few modifications. notably i definitely want to be able to insert images, which feedbooks does not yet support.

Hmm, yes. Actually, like I said elsewhere, if one were able to get Hadrien's codes, one could easily run an internal webserver with those on it and have an "offline" app (with a few addons for image manipulations, for example)

Valloric
01-23-2009, 04:01 PM
Actually, I've been thinking about this problem on and off and to me it seems like the whole concept of WYSIWYG editors is flawed. Instead I've been thinking about a side-by-side editor.

The editor will allow you to edit txt files in a simple lightweight markup language like rest or markdown (it will have GUI controls to make it easy, rather like the editor we use to make posts to mobileread). As you make changes the result will be automatically updated and displayed in a pane to the side of the editor pane.

So editing books will be about as hard as making posts on mobileread.

That sounds like a WYSIWYG editor with a side pane code view, and you can only edit the code. :)

But I get your idea. It sounds nice. It possibly has a few problems... for instance, you're displaying the same text twice on the screen. Seems more than a bit redundant. And if you make the display or markup view collapsible, you get a WYSIWYG editor again.

mtravellerh
01-23-2009, 04:06 PM
That sounds like a WYSIWYG editor with a side pane code view, and you can only edit the code. :)

But I get your idea. It sounds nice. It possibly has a few problems... for instance, you're displaying the same text twice on the screen. Seems more than a bit redundant. And if you make the display or markup view collapsible, you get a WYSIWYG editor again.

What I hate in NVU for example, is the way it just plays around with your code. For example the mobi pagebreak code will be changed without any notice to you. Therefore Kovid's idea with the permanent code view seems like a great idea to me. For advanced users, you could easily do a search and paste over the whole file or files and work a lot faster while still seeing the result of your work in real time.

Valloric
01-23-2009, 04:08 PM
I would have different configurable parameters, like maximum file size, maximum nesting level, whether or not to support some optional features... and have the software issue a warning if these limits are exceeded. It shouldn't be so hard to have a "Warn if text files are larger than ____KB" setting.

This is the "presets over options" debate.

Naturally, one would want to provide advanced options for advanced users, but the more options you provide, the higher the chances of someone misconfiguring them and ending up with epubs that work for them and not for anyone else.

I'm thinking of the mobileread community here, and the upload forum.

But the advanced options should be there for those who want them... just tucked away a bit.

Valloric
01-23-2009, 04:13 PM
What I hate in NVU for example, is the way it just plays around with your code. For example the mobi pagebreak code will be changed without any notice to you. Therefore Kovid's idea with the permanent code view seems like a great idea to me. For advanced users, you could easily do a search and paste over the whole file or files and work a lot faster while still seeing the result of your work in real time.

Hell, if you want to be able to see exactly what each and every keystroke or action you perform does, that's easy. WYSIWYG editor with a side pane code view that updates the code with each action in the main view pane. That's already in my design document, only the panes are horizontal.

But you should be able to turn off the code view and work in design view exclusively. I don't need to see all of the text duplicated on the screen all of the time.

zelda_pinwheel
01-23-2009, 04:25 PM
it occurs to me that not everybody on this forum is a native english speaker and maybe some of them aren't familiar with some tech jargon, so for the sake of clarity (just in case anybody doesn't know) :

WYSIWYG = What You See Is What You Get

it means an interface which allows you to click on a button marked "B" (like in the MR reply box) and shows you the result this way :
this text is bold.
and not this way :
this text is <strong>bold</strong>

mtravellerh
01-23-2009, 04:29 PM
Hell, if you want to be able to see exactly what each and every keystroke or action you perform does, that's easy. WYSIWYG editor with a side pane code view that updates the code with each action in the main view pane. That's already in my design document, only the panes are horizontal.

But you should be able to turn off the code view and work in design view exclusively. I don't need to see all of the text duplicated on the screen all of the time.

True. And the other way around (only source view) What about the RegEx (oh: taking Zelda as a model: RegEx=Regular Expressions) integration? That one is very important for me (to get rid of the pagenums in PG source, for example).
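
As an example of the kind of RegEx clean-up meant here, a minimal Python sketch for removing page numbers follows. The two patterns are assumptions about what the source looks like (a `<span class="pagenum">` wrapper, or bare `[Pg 12]` markers); real PG files vary, so the patterns would need adjusting per book. The name `strip_page_numbers` is invented for this example.

```python
import re

# Assumed page-number markup; adjust to match the actual source file.
PAGENUM_SPAN = re.compile(
    r'<span class="pagenum">.*?</span>', re.IGNORECASE | re.DOTALL
)
# Bare markers such as [Pg 12] or [Pg xii].
PAGENUM_TEXT = re.compile(r"\[Pg\s+[0-9ivxlcdm]+\]", re.IGNORECASE)


def strip_page_numbers(html):
    """Remove page-number spans first, then any bare markers left over."""
    html = PAGENUM_SPAN.sub("", html)
    return PAGENUM_TEXT.sub("", html)
```

The span pattern is non-greedy and would stop too early if a page-number span contained another span, which is one reason to keep the result visible in a preview while running it.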

kovidgoyal
01-23-2009, 04:33 PM
That sounds like a WYSIWYG editor with a side pane code view, and you can only edit the code. :)

But I get your idea. It sounds nice. It possibly has a few problems... for instance, you're displaying the same text twice on the screen. Seems more than a bit redundant. And if you make the display or markup view collapsible, you get a WYSIWYG editor again.

Yeah but coding a WYSIWYG editor is a lot more work, for relatively little benefit. And I would envisage the preview pane being collapsible, not the code pane.

And I would urge you to consider making the panes vertical, since most ebook readers are vertically oriented and in any case most people prefer to read text in narrow columns.

=X=
01-23-2009, 04:36 PM
I've yet to work with a tool that worked well when the user was allowed to modify the source as well as work in a WYSIWYG environment. Either the user is forced to work with both modes or has to choose one view. In other words most of these editors tend to favor one style of editing or implement both poorly.

When I created the BookCreator tool I wanted to be able to use a really good editor and not have to worry about HTML, format shift issues. I just wanted to edit the book how I wanted it to look. Kovid, calibre does a fantastic job format shifting HTML to LRF/ePUB/LIT/ and more to come.

Right now the direction of this thread is geared towards format shifters, not editors or writers. But the demand for a good editing and ePublishing tool is there: I've had several authors contact me, thanking me for the BookCreator tool and how it facilitates their eBook creation process without having to get down and dirty in the code.

If the developers on this group would really like to build an end all ePublishing tool I'd like to see us create a Plugin around Open Office, an excellent Word Processing tool.

Going this route we would build a house on a solid Word Processing foundation and could easily extend the UI to include other ePublishing features, such as a GUI TOC builder, format (HTML/LIT/ePUB…) imports, etc.

Just my 2cents
=X=

mtravellerh
01-23-2009, 04:36 PM
vertically oriented

:rofl:(sorry for my perverted thinking):o

When I do proofreading, I prefer to have the window panes side by side

Valloric
01-23-2009, 04:36 PM
True. And the other way around (only source view) What about the RegEx (oh: taking Zelda as a model: RegEx=Regular Expressions) integration? That one is very important for me (to get rid of the pagenums in PG source, for example).

Only source view --> editing in Notepad :D

RegExes are important and should be included. I'm not sure if they'll make it to the first release (which is probably a few months off), but they're something that is definitely on the list.

mtravellerh
01-23-2009, 04:42 PM
Only source view --> editing in Notepad :D

RegExes are important and should be included. I'm not sure if they'll make it to the first release (which is probably a few months off), but they're something that is definitely on the list.

You're so clever:rolleyes:! But hey, you ARE right on that one!

I guess I just was trying to get rid of application jumping. If you do 4 to 5 books a day in 4 different formats, you tend to get lazy.

Valloric
01-23-2009, 04:48 PM
If the developers on this group would really like to build an end all ePublishing tool I'd like to see us create a Plugin around Open Office, an excellent Word Processing tool.

I hate depending on other people's code. Also, to me this is supposed to be fun and challenging. I also don't want to depend on something like Word (which people have to buy first) or OpenOffice (which is a resource hog). Both are made for entirely different things.

I've stated my goal: something like BD for epubs, without the suck and with better features. Further down the line, the goal is to extend it to export most of the other ebook formats.

And who said anything about making a be-all-end-all editor for the publishing industry? When did that come into play? The target audience should be MobileRead members and other ebook enthusiasts. Naturally, the more people who like it and use it, the better.

If anyone wants to create a plugin for some pre-existing editor, great. The more editors we have, the higher the chances one of them won't suck.:rolleyes:

Valloric
01-23-2009, 04:54 PM
And I would urge you to consider making the panes vertical, since most ebook readers are vertically oriented and in any case most people prefer to read text in narrow columns.

I prefer horizontal, but I'll make it switchable to vertical. I'm writing this down.

Zelda, this is a very useful thread.

zelda_pinwheel
01-23-2009, 04:55 PM
Zelda, this is a very useful thread.

i am thrilled to see the response it's getting, and i really can't wait to see what tools will come out of it. :) :2thumbsup

mtravellerh
01-23-2009, 05:41 PM
Right now the direction of this thread is geared towards format shifters, not editors or writers. But the demand for a good editing and ePublishing tool is there: I've had several authors contact me, thanking me for the BookCreator tool and how it facilitates their eBook creation process without having to get down and dirty in the code.
=X=

Actually "format shifters" have to do a lot of editing if they want to create good books

=X=
01-23-2009, 07:11 PM
I hate depending on other people's code. Also, to me this is supposed to be fun and challenging. I also don't want to depend on something like Word (which people have to buy first) or OpenOffice (which is a resource hog). Both are made for entirely different things.

something like BD for epubs,
Understood. A couple of points.
I never mentioned using somebody else's code, so I don't understand why you are making such a statement.

If you are referring to using somebody else's product, there isn't a tool here that isn't using another person's product.

Writing everything from scratch might sound fun, but it is a daunting and long endeavor. Also remember that your idea of fun might not necessarily be the same as others'.

Actually "format shifters" have to do a lot of editing if they want to create good books

So you do want a feature-rich WYSIWYG editor? Or would you be happy editing with a tool like emacs/VI? (Note: nothing wrong with the latter. I use VI all the time, either for writing software or before importing html code generated from pdftohtml into Word.) I'm just not clear on the point you're making, is all.

=X=

Valloric
01-23-2009, 08:31 PM
Understood. A couple of points.
I never mentioned using somebody else's code, so I don't understand why you are making such a statement.

If you are referring to using somebody else's product, there isn't a tool here that isn't using another person's product.

Um... if you're building a plugin for an existing application, your code directly depends, by definition, on code someone else wrote. The code of the application your plugin is plugging. So Book Creator directly depends on Microsoft code: if they make certain changes that break your application, you can try to work around that, but that's it.

And since Book Creator (I believe) uses Calibre to actually create LRF, EPUB and other formats, it also depends on code Kovid writes (even though it's not a Calibre plugin). So if Kovid makes an unfortunate mistake in his, say, LRF output code, your application's LRF output breaks.

Book Creator is an amazing application for people who want a Word plugin. I don't. I'm sure a lot of people do.


Writing everything from scratch might sound fun, but it is a daunting and long endeavor. Also remember that your idea of fun might not necessarily be the same as others'.

I've never said I'd be writing everything from scratch. Do I look insane? I hope not. I plan on using as much GPL code as I can get. But stable GPL code, like wxWidgets, Webkit etc. Everything I don't really need I don't plan on using. I can't use Calibre, because it would defeat the whole purpose of the editor, which is eliminating the converter from the equation and having output on your display that is as near to final as you can get. Also, it's a dependency I don't need. Epub is an open standard, and I can write code that outputs it myself.

And although my primary objective is to create a useful epub editor, my second objective is to create a library of clean, fast, portable and very well documented C++ code for outputting ebook formats, starting with epub. Others can then use that code to create other editors, converters or whatever.

mtravellerh
01-23-2009, 08:37 PM
So you do want a feature rich WYSIWYG editor? Or would you be happy editing with a tool like emacs/VI? (Note: nothing wrong with the latter. I use VI all the time, either for writing software or before importing html code generated from pdftohtml into Word.) I'm just not clear on the point you're making, is all.

=X=

Well, I wasn't talking about me specifically at all. And my point was that sometimes converting a text to a readable ebook involves a lot of editing.

GeoffC
01-25-2009, 05:55 AM
:rofl:(sorry for my perverted thinking):o

When I do proofreading, I prefer to have the window panes side by side

I prefer horizontal, but I'll make it switchable to vertical. I'm writing this down.

Zelda, this is a very useful thread.

I prefer window panes vertical for proofreading, especially when each pane is looking directly at the same line.
But there ought to be no problem in giving either option?

HarryT
01-25-2009, 06:23 AM
I've yet to work with a tool that worked well when the user was allowed to modify the source as well as work in a WYSIWYG environment. Either the user is forced to work with both modes or has to choose one view. In other words, most of these editors tend to favor one style of editing or implement both poorly.


I like BD's approach - a WYSIWYG editor, with a separate "tool" for viewing and editing the underlying HTML of a region of selected text for "problem" situations. That, for me, is an excellent way of working. I do not like working directly in HTML - not because I don't understand it (I write websites as a part of my job), but because the tags get in the way of seeing the layout of the text.

Xenophon
01-25-2009, 03:12 PM
I like BD's approach - a WYSIWYG editor, with a separate "tool" for viewing and editing the underlying HTML of a region of selected text for "problem" situations. That, for me, is an excellent way of working. I do not like working directly in HTML - not because I don't understand it (I write websites as a part of my job), but because the tags get in the way of seeing the layout of the text.
Take a look at the LyX editor for TeX files. It provides exactly that sort of interface -- with the caveat that the on-screen view is representative of what you'll get rather than exactly what you'll get. You still need to look at the final output.

Anyway, LyX provides both direct insertion of raw TeX (it shows up on-screen as evil-red-boxed-text) and direct editing of the underlying TeX source when necessary.

Xenophon

=X=
01-27-2009, 12:52 PM
I like BD's approach - a WYSIWYG editor, with a separate "tool" for viewing and editing the underlying HTML of a region of selected text for "problem" situations. That, for me, is an excellent way of working. I do not like working directly in HTML - not because I don't understand it (I write websites as a part of my job), but because the tags get in the way of seeing the layout of the text.

Yes, the HTML editor is something BD did well, and it did a great job displaying the HTML in the WYSIWYG editor. However, my experience with the editor was terrible. I found doing anything but the most trivial edits extremely difficult.

It was actually Patricia who gave me the idea to write BookCreator. She mentioned that all her editing work was done in Word and then imported into BookDesigner. I thought, what a great idea, so when I created BookCreator all I wanted was a template with some macros to make the editing easy, and then import the result into BD.

Eventually it grew into building its own formats.
=X=

Komenor
02-09-2009, 12:59 PM
DESIRED FEATURES

1. epub files must be valid (html tidy, epub check...) and conform to best practices.

2. the editor should be able to accept multiple html / xhtml flows and create one document with a hierarchical TOC. It should also be able to accept one html flow and semantically parse the hierarchy of the document (part, chapter, section...) according to the tags used (h1, h2, h3, etc.), creating logical divisions for a properly structured epub document and a hierarchical TOC, the way that feedbooks does.

3. It should be able to handle images and relatively advanced css markup (dropcaps, for instance).

4. Ideally, it should be accessible even to users with no knowledge of html / css code with a full wysiwyg UI, although advanced options should be available if you do know the code (direct code editing should be possible), again similar to the feedbooks interface.
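
Feature 2 in the list above (deriving a hierarchical TOC from the h1/h2/h3 tags of a single HTML flow) can be sketched roughly as follows. This is only an illustration using Python's standard library; the names HeadingCollector and build_toc are made up for the example, and nothing here is taken from how feedbooks or any existing tool actually does it.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for h1/h2/h3 headings in document order."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.headings = []
        self._level = None   # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.LEVELS.get(tag) == self._level:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_toc(html):
    """Nest the headings: each h2 goes under the preceding h1, and so on."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    root = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node) pairs from the root down to the current branch
    for level, title in collector.headings:
        node = {"title": title, "children": []}
        # Climb back up until the top of the stack is a strictly higher-level heading.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]
```

The same nested structure could then be used both to split the flow into separate files and to write the NCX navigation entries.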

I'll probably have more to add later but i've got to go for the moment, so i'll stop here and open the discussion. What do you want from an editor ? i'll be looking forward to seeing what comes out of this !

It must be very interesting (and relatively hard ;)) to write a small XHTML editor with basic functions. I'm thinking of something like the interface used for writing the messages in this forum, or like this : http://www.kevinroth.com/rte/demo.htm. To be sure of being compliant with the XHTML format, the source code should not be shown to or editable by the user. Zelda, do you think that something like that, integrated into an ePub generator tool, could be useful ? :chinscratch:

zelda_pinwheel
02-09-2009, 01:50 PM
It must be very interesting (and relatively hard ;)) to write a small XHTML editor with basic functions. I'm thinking of something like the interface used for writing the messages in this forum, or like this : http://www.kevinroth.com/rte/demo.htm. To be sure of being compliant with the XHTML format, the source code should not be shown to or editable by the user. Zelda, do you think that something like that, integrated into an ePub generator tool, could be useful ? :chinscratch:
yes, possibly, although i think the source code should be accessible to users who know what they are doing. i prefer to check my source and write most of my code myself ; that should be possible.

Komenor
02-09-2009, 03:17 PM
yes, possibly, although i think the source code should be accessible to users who know what they are doing. i prefer to check my source and write most of my code myself ; that should be possible.

Ok, but if the user can edit the source code, it's harder to certify the quality of the ePub output. We can use Tidy, of course, but it is an additional prerequisite for the application...:chinscratch:

Valloric
02-09-2009, 07:22 PM
Ok, but if the user can edit the source code, it's harder to certify the quality of the ePub output. We can use Tidy, of course, but it is an additional pre-requirement to the application...:chinscratch:

You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.

Let me elaborate...

Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.

OK, so the application accepts an HTML file. Is the file valid HTML? You can't make that a precondition. I'm sorry, you just can't. Most HTML out there is nowhere near being valid, and the user could need to import HTML he didn't write himself.

So the app needs to accept invalid HTML, that is, HTML that displays OK in a modern browser but that does not follow the required standards. And with that, you just blew any possibility of having a guarantee that the epub you export will always be standards compliant.

Why?

Well, you can't design a useful algorithm that accepts invalid HTML and outputs valid HTML. A useful algorithm would have these requirements, for any input:

1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.

The first one is easy. If you remove the second one, for any input, just output whatever you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because it's incomputable.

Now, you could design an algorithm that fulfills both requirements for some input, but not for all. And no, not even Tidy can give you that, because it is theoretically impossible.

So you're stuck now. You can't guarantee your users that you will always output a valid epub file no matter what they import. You can do your best (and you should), but in the end... The second requirement is much more important than the first one. So you fix what you can and possibly tell the user about what you can't.

If they really care about producing a valid epub file, they will have to fix the errors your app can't fix themselves. And so you make it easy for them and give them access to the source. And if they introduce any errors whilst editing the source, it's their fault. They will probably have to fix it by editing the source, too.

Now if you wanted an editor that could only create epub files from scratch, then you could guarantee standard compliance if you disallow direct source code editing. But you don't want to make that kind of editor.

Your output can only be as good as the input (maybe slightly better, for trivial errors in the original file). The editor can't turn shit into gold, and can not give guarantees about compliance. Any that does is flat-out lying.

llasram
02-10-2009, 12:01 AM
1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.

The first one is easy. If you remove the second one, for any input, just output whatever you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because it's incomputable.

Well, it would depend on what you meant by "represent the content of the original HTML." It would be fairly easy to strip all semantic tag information from source HTML and translate it into nothing but <div/>, <span/>, <a/>, and <img/> tags with appropriate CSS. That would make it trivial to output valid XHTML which retained exactly the same formatting characteristics as specified by the author.

Jellby
02-10-2009, 05:25 AM
The program could accept invalid (X)HTML, and issue a warning if the final (X)HTML does not validate.
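
A minimal sketch of that "warn if it does not validate" idea, assuming Python and its standard XML parser. Note that this only checks XML well-formedness, which is just the first half of XHTML validity; checking against the XHTML schema or running a full epub checker is a separate step. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def wellformedness_warnings(xhtml_text):
    """Return a list of warnings; an empty list means the text parsed as XML.

    Assumption: the text uses no named entities such as &nbsp;, because the
    standard parser does not load the XHTML DTD and reports them as errors.
    """
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        line, column = err.position
        return [f"not well-formed at line {line}, column {column}: {err}"]
    return []


# The editor would still write the file; it just tells the user about the problem.
for warning in wellformedness_warnings("<html><body><p>oops</body></html>"):
    print("WARNING:", warning)
```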

Valloric
02-10-2009, 09:42 AM
Well, it would depend on what you meant by "represent the content of the original HTML." It would be fairly easy to strip all semantic tag information from source HTML and translate it into nothing but <div/>, <span/>, <a/>, and <img/> tags with appropriate CSS. That would make it trivial to output valid XHTML which retained exactly the same formatting characteristics as specified by the author.

Again, this would work for some input, but not for all. I also put "the intent of the author" in that prerequisite too. The author of the original file could write relatively complex HTML that does not validate and that you could not convert into standards compliant XHTML which faithfully represents the input file.

There's really no point discussing it, this is computer science 101: conversion of input from one language with non-deterministic rules (that is, non-validating HTML) to another with deterministic rules (standards compliant XHTML) whilst keeping all of the source information. An algorithm to perform this conversion for all input cannot be designed. It is theoretically impossible.

But that doesn't mean the application can't fix some errors and output valid XHTML. I'm just saying you can't guarantee compliance and not have to mangle the input in some situations. And even then it wouldn't work for some cases.

The program could accept invalid (X)HTML, and issue a warning if the final (X)HTML does not validate.

My working idea too. Fix what you can, inform about what you can't, but don't mangle the input in any way or form. It is more important to guarantee to the user that you won't make some tiny change half-way through the novel he's importing than it is to guarantee standards compliance.

You can't piss off your users by trying to twist and turn their HTML into something it can't automatically become.

llasram
02-10-2009, 11:15 AM
Again, this would work for some input, but not for all. I also put "the intent of the author" in that prerequisite too. The author of the original file could write relatively complex HTML that does not validate and that you could not convert into standards compliant XHTML which faithfully represents the input file.

There's really no point discussing it, this is computer science 101: conversion of input from one language with non-deterministic rules (that is, non-validating HTML) to another with deterministic rules (standards compliant XHTML) whilst keeping all of the source information. An algorithm to perform this conversion for all input cannot be designed. It is theoretically impossible.

I really don't understand what you're getting at, I'm afraid. I could write "fubby ducky loopy sunbird" and mean "Good morning, how are you?" and there would be no chance of conversion because the intent is all in my mind. With arbitrarily bad HTML the only possible interpretation of the author's intent is how some renderer renders that content. All contemporary HTML renderers use the same CSS box model for all rendering. Converting arbitrarily bad HTML into XHTML which displays the same is simply a matter of applying the same rules the browser does in order to produce the box model instance it renders.

XHTML validity is a property of two components: XML validity and adherence to the XHTML schema, yah? Conversion of HTML w/o closing tags to valid XML with complete elements can be tricky, but the browser necessarily does essentially the same thing in deciding what content ends up within what boxes. The Python lxml.html library calibre uses does an excellent job, matching for all practical purposes what most Web browsers produce. Producing schema-validating XHTML is where my proposal to strip all semantic tags comes in. CSS-based rendering doesn't care if you have a <div/> within a <p/> or a <sup/> within an <a/>. One just needs to extract the CSS applied to each element, then convert the element tags into ones which validate against the schema.
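
To make the "HTML w/o closing tags to XML with complete elements" step concrete, here is a deliberately small sketch using only Python's standard library. It is not what lxml.html or Tidy do internally; it only handles unclosed tags, void elements, stray close tags and the "a new <p> closes the previous <p>" rule, and it drops comments and doctypes. The names SoupCloser and close_tags are invented for the example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

VOID = {"br", "hr", "img", "meta", "link", "input"}  # never have a close tag
IMPLIED_CLOSE = {"p"}  # opening one of these closes an open element of the same name


class SoupCloser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # names of the elements that are currently open

    def _emit_open(self, tag, attrs, closed):
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        self.out.append(f"<{tag}{attr_text}{'/' if closed else ''}>")

    def handle_starttag(self, tag, attrs):
        if tag in IMPLIED_CLOSE and tag in self.stack:
            self.handle_endtag(tag)
        if tag in VOID:
            self._emit_open(tag, attrs, closed=True)
        else:
            self._emit_open(tag, attrs, closed=False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._emit_open(tag, attrs, closed=True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: drop it
        # Close everything that was left open inside the element being closed.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")


def close_tags(html):
    parser = SoupCloser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


repaired = close_tags("<div><p>one<p>two <b>bold<br></div>")
ET.fromstring(repaired)  # would raise ParseError if the result were not well-formed
```

The output is well-formed XML, but nothing here guarantees that it validates against the XHTML schema or that it renders the way a browser would have rendered the original.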

Valloric
02-10-2009, 12:11 PM
With arbitrarily bad HTML the only possible interpretation of the author's intent is how some renderer renders that content. All contemporary HTML renderers use the same CSS box model for all rendering.

The Python lxml.html library calibre uses does an excellent job, matching for all practical purposes what most Web browsers produce.

There is no argument here.

I agree that you could very well design an algorithm that converts non-valid HTML into valid XHTML for most HTML people will write. It's what your "lxml.html" library does (although I've never used it) and it's what Tidy does as well.

But you can't do it for all possible arbitrarily bad HTML. You're assuming the user checked how his source displayed in a browser. If he did, then it's not a matter of parsing arbitrarily bad HTML. It's not a non-deterministic rule system anymore: the source follows the deterministic rendering rules of the browser he used to check his work. Converting from a deterministic language to another deterministic language is certainly possible. And while you could say that the vast majority of HTML authors would do just that (check the display in a browser) before importing, you can't categorically state it.

So let's sum this up... you can create an algorithm that can convert most practical non-conforming HTML into valid XHTML, but not all HTML one could write. If one were to say he could, one would be showing a grave ignorance of computer science theory.

Komenor
02-11-2009, 10:25 AM
You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.

Let me elaborate...

Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.

I never said that my hypothetical editor would be able to import (X)HTML !

If it is only for modifying the fonts, the justification and other text formatting, the editor should accept only plain text files for import, and nothing else.
Then it should provide tools for text formatting (plus possibly "tables" and "pictures" support).

It is a choice : a "poor" editor with certified XHTML/ePub output or a good editor with no certification (or warnings on bad inputs).

Valloric
02-11-2009, 01:46 PM
It is a choice : a "poor" editor with certified XHTML/ePub output or a good editor with no certification (or warnings on bad inputs).

A "good" editor would embed some sort of validation of the final epub file. So if you don't get a warning when exporting, you're in the clear. And most of the time, the editor will be able to convert the user's non-conforming HTML into conforming XHTML.

Here are several use cases:

1. The user imports valid HTML. It is easily converted into XHTML. He then makes certain edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.

2. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and succeeds. The user then makes certain edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.

3. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and does not succeed: errors are thrown, the user is informed. The user opens the source view and tries to fix the problems. The user then makes certain other edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.

4. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and does not succeed: errors are thrown, the user is informed. The user opens the source view and tries to fix the problems. The user then makes certain other edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds errors. The user is informed, but the file remains--maybe the user doesn't care (if it's a file for personal use... who knows). If he does care, he makes more changes, and tries to export the file. The change/export process repeats until no errors are thrown.

So you see, the user can get an epub file that is certifiably valid.
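
The "export, then run a validator over the result" step in these use cases could be sketched like this in Python. The checks shown cover only the packaging rules of the epub container (the mimetype entry comes first, is stored uncompressed, and META-INF/container.xml exists); a real validator such as epubcheck looks at far more. The function names and the content.opf path are assumptions made for the example.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def export_epub(files):
    """Package a dict of {archive path: bytes} into epub bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry has to come first and must not be compressed.
        archive.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def container_problems(epub_bytes):
    """Return a list of packaging problems; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype must be the first entry")
        elif entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype must be stored uncompressed")
        elif archive.read("mimetype") != MIMETYPE:
            problems.append("mimetype has the wrong content")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


# As in use case 4: the file is always produced, and the user is told what is wrong.
book = export_epub({"content.opf": b"<package/>"})
for problem in container_problems(book):
    print("WARNING:", problem)
```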

mtravellerh
02-12-2009, 05:59 PM
Now THAT makes sense. Can't wait for that piece of software, honestly!

Timoleon
02-13-2009, 01:25 PM
Valloric's comments #53 should be used as a touchstone for any decent ePub editor. Great analysis and synopsis! :2thumbsup

GeoffC
02-13-2009, 01:39 PM
Seemingly complex, though?

Valloric
02-13-2009, 05:36 PM
Seemingly complex, though?

It's as complex as it needs to be. If you remove something, you negatively impact the quality and usefulness of the editor.

From the programmer's perspective though, it is fairly complex. But the user doesn't care about that, does he? Of course he doesn't, nor should he.

GeoffC
02-14-2009, 05:48 AM
It was, of course, the complexity of the programmers task that I was referring to.

richardigp
02-23-2009, 07:01 AM
We just threw an Open Office to ePub Convertor into the fray. It goes by the name of eScape. It does most of the advanced styling and formatting that is on the wish list above: auto generation of OPF, NCX, etc. and free form modification of stylesheets to create a book the way you want it to look. You can read about it and try it here (http://www.infogridpacific.com/igp/AZARDI/eScape%20-ODT2ePub/). It's completely free for non-commercial use, but not Open Source.

It's a different approach. Rather than try to interpret endless inline and para styles, we define custom Structure-Styles and you have to put those on. There is a growing online tutorial here (http://www.publisherdams.com/reader/content/c-0002184/?a=lc), so you can see if you can live with this different approach.

There are about 30 styles including drop & raised caps, small-caps, and lots of other blocks like epigraph, extract, notebox, code, boxed text, poem, notes, references, etc. All major book sections are predefined. If you want to comment or make suggestions, please do so at our Publishing With XML (http://infogridpacific.typepad.com/publishing_with_xml/2009/02/escape-open-office-to-epub-convertor.html) blog.

Valloric
02-23-2009, 03:31 PM
It's a different approach. Rather than try to interpret endless inline and para styles, we define custom Structure-Styles and you have to put those on.

Questions:

1. How do you convert existing epub books to your format? Is it even possible to load existing epub books and edit them?

2. How do you guarantee display fidelity? Last time I checked, OpenOffice.org did not have an advanced XHTML renderer.

3. SVG? OO.org doesn't support it. Do you?

4. How do you handle the "longdesc" attribute? Do you support it?

5. Object tags?

6. DTBook?

7. XML islands?

8. Font embedding?

These are just from the top of my head. Haven't yet had the time to try out eScape, but I'm going to.

richardigp
02-24-2009, 01:36 AM
Questions:
1. How do you convert existing epub books to your format? Is it even possible to load existing epub books and edit them?

Over on the XML blog (http://infogridpacific.typepad.com/publishing_with_xml/2009/02/escape-open-office-to-epub-convertor.html) we explain that we created eScape as an easy way to make more than respectable ePubs from Open Office, more or less because we could do it. This was in response to a lament by Dave on the Teleread blog about there being no way to work with ODT, and that there should be a plug-in. So it's ODT2EPUB only at present. eScape makes the packaging so easy: just focus on the editing (following a few basic rules), export XHTML and click.

I have also been on a lot of the forums here and seen how hard people find it to even make a simple drop cap. It doesn't have to be so hard! The problem is that the approach is wrong: it meshes content and styles instead of keeping the structure clear. The whole idea of eScape is to have an existing ODT and make editorial corrections there in a user friendly, powerful content editing environment. You can then generate an ePub in just seconds, corrections and all. But the output is pure, consistent XHTML, with class statements as structural identifiers. You then select the style sheet of your choice, and eScape processes it together into an ePub package.

We don't have an ePub to ODT importer, but probably could. The approach would be to strip all styles inside if they aren't the eScape Structure-Styles because that would be the only way to maintain the structural integrity. That I think would get howls of anguish. We are looking at putting a file edit/repackage mode into AZARDI, but that is only on the drawing board and we have a lot more to do there with package checking/reporting first.

2. How do you guarantee display fidelity? Last time I checked, OpenOffice.org did not have an advanced XHTML renderer.

Not quite sure what you mean by display fidelity - making it look like the OOo file? That is exactly what we are not doing with eScape. We are using OOo to apply consistent Structural styles so the XHTML that comes out is absolutely the same for all books, and addresses the core text book structures, block styles, paragraph styles and inline presentation issues that seem to plague all e-book production environments.

eScape is a different approach. It lets anyone achieve the XHTML nirvana of separating content, structure and style. The XHTML is so consistent you can have any number of optional stylesheets and apply them to any ePub package and they will present accordingly. In effect eScape is saying to the reader, I will handle the XHTML (trust me!?), just use these Structure styles to tell us what you want various content blocks to be.

This is the core of good e-book packaging - especially for reflow. So while we exist in a Babel of ePub readers you can have a stylesheet optimized for Stanza, one optimized for ADE/Sony, another for whatever, and we can all stop crying about how everything doesn't do everything.


3. SVG? OO.org doesn't support it. Do you?

We don't support it in eScape. In fact we don't support any images in eScape, for the reason that OOo makes it hard to get the output size from XHTML. We would have to put into place more "rules" which would make the tool more difficult to use at the OO level. We left it out of this version to see if there is significant interest. 99% of the books on MobileRead, and trade & retail books in general, don't have images (other than the cover). For those that do, there are other options.

The biggest problem most people seem to have is getting a drop cap, or other text presentation styles, or simple formatting for text-only books without a lot of anguish unless they are talented HTML/CSS experts. We tried to bring in clarification of structure vs. styling with advanced content block structures, lines and character styles applied directly to the content. We are waiting to see if it is of interest. eScape addresses the presentation issues of standard books by separating the XHTML structure and CSS at the point of origin. The CSS files can then be manipulated to your heart's content.

4. How do you handle the "longdesc" attribute? Do you support it?
No images - No longdesc necessary.

5. Object tags?

Not with eScape.

6. DTBook?

Not with eScape.

7. XML islands?

Not with eScape. Not easy in OOo, or any other visual editor probably. This would require some style based "include/insert" statement to process the remote XML in at an insertion point - Eg: to put native MML into the file as an Inline or Out of Line Island. This implies a level of expertise that I think goes beyond the target user of eScape - and just about any other system. It also goes beyond the ability of most reader apps to handle it. As with the MML example, the Reader has to be able to render it, or send it to a processor such as MML2SVG and then display it. Inserting some DocBook into XHTML using islands is probably trivial, but why bother?

8. Font embedding?

Not with eScape. We see font embedding as really important for e-books with class that compete with print for presentation quality. Interestingly (from what we have observed) ADE doesn't handle these correctly in that they display fonts that are not in the manifest but are declared in the CSS - I have a feeling (but can't say for sure) InDesign ePub packaging works like this. Our full commercial packager handles font embedding, and I suppose it wouldn't be hard to have an extra input directory - fonts; but the user would be responsible for the application of the fonts in the CSS to specific styles. We wouldn't do font embedding allowed permission checks however: that would be up to the integrity of the user.

eScape is a pure production environment for ODT to ePub (via an XHTML intermediary). It uses powerful separation of content, structure and styles to give new production options if someone wants to maintain their source content in an ODT.

All of the advanced issues you questioned (islands, fonts, objects, etc.) are non-trivial, and it is interesting to see these brought up. These issues are addressed in IGP:FLIP (http://www.infogridpacific.com/igp/Products/Publishing%20Products/IGP:FLIP/), but unfortunately we are not giving that away today, although there is a sandbox site (http://www.publisherdams.com/sandbox/) where anyone can play.

Valloric
02-24-2009, 10:39 AM
Thank you for the very thorough response. While the lack of image, SVG and font support is unfortunate, you do cover a large volume of books one would want to create.

By "display fidelity" I meant "It looks on my screen the same way it will look on conformant Reading Systems". You seem to be going a different route though (not necessarily a bad thing).

The other thing that bothers me a bit is that a power user does not have direct access to the source code that will end up in the epub file. While most people don't need this level of control, some do.

Lastly, it's disappointing that you use a nice cross-platform editor like Writer from OpenOffice.org and then make your final converter Windows only. :(

richardigp
02-24-2009, 11:23 AM
On the fidelity issue, we have created the OO stylesheet to look like a classic book where possible (except for the coloured lines), but in this version we were unsure about how to put leading lines above and below a block extract for example, and then adding a first para style for the non-indented paras so they look good in the OO file. That will be more XSL work, but make the using/learning curve steeper. We will look at that in the future.

Interesting point about the XHTML source code, and I may blog about that a bit further. The difference between Structure-Styling and other HTML environments is that the XHTML always looks the same for the same structure, so in some respects you don't really need to see it (if you believe this!). I am not sure that this is the place to get too technical on this matter, but I might put an extra "chapter" into the online tutorial. So assuming the XSLs and slicers and dicers are working nicely, the XHTML elements and class statements are totally predictable and I can just crack open the style sheet, or make a whole series of custom style sheets for a range of looks and feels.

In one development version we did have an XHTML exporter, but thought that got too complicated. It output the book as a single, fully processed XHTML file - the one that is used internally before being split apart to create the final package.

Having said that, I think we were trying to address the "make a good-looking eBook fast, with a bit more styling" case for someone who would prefer never to see the source. (Are there any of those on this forum?)

From the believe it or not department: the last point is our shame! :o We are a Linux development house working primarily in Python and mainly do Web services applications. Some silly little interface issue stopped the deb, but it's coming.

Mcnaz
04-22-2009, 08:22 AM
Hi all.

Sorry to jump into this thread at this late stage but I stumbled in from Google whilst doing research on an EPUB Maker type application I am working on.

Brief Background.

I am a Book Designer (BD)/PRS-505 fan but got very frustrated with the inability to customise my own searches/replaces through advanced RegEx.

I've started coding an app in Delphi that presents the user with a Rich Edit control (based on RichEd20.dll V3) which gives some semblance of WYSIWYG (i.e. bolding and simple formatting).

The goal is to have a BD type tool that is able to import from various formats (lit, pdf and so on) convert to html (using clit, pdf2html and so on),
clean the HTML and present to the user.

ATM I am at the import and clean HTML phase (using a few LIT -> clit.exe examples) and the produced HTML is dreadful! I am resorting to stripping all the HTML and rebuilding paragraphs. HTML stripping is done by loading the html into an IE COM object and saving as text... not ideal, so I am looking at a LIBXML2 implementation instead.

I can see one obstacle ahead and that is that internally the document is stored as RTF (for the WYSIWYG) and this will need to be converted into HTML/XML when exporting to EPUB. I am currently researching the ability to embed control characters into RTF (i.e. chapter 1, heading, subheading) for use during the export.

I've read the first few pages of this thread and although a few interesting features are mentioned my goal is primarily to develop a tool that is aimed more towards importing/converting/tarting up then EPUB output as opposed to a full blown publishing platform (maybe in version 4!).

Again, suggestions/help will be welcome (I will lurk here for a while)

I will keep the thread updated on my progress

Cheers.

zelda_pinwheel
04-23-2009, 08:37 AM
hi Mcnaz and welcome to the forum ! we are always really happy to hear about new epub creation tools being developed since at the moment we are still waiting for the "perfect" tool, and since different users have different needs. :) please do keep us informed of your progress ; you'll find plenty of people around here interested in trying it out and giving suggestions, if you are looking for that.

and don't hesitate to take a look around the epub forum to see what else is available, and get some useful information.

RealImages1
09-03-2010, 11:42 PM
Since I like writing my XHTML and CSS code by hand, I won't ask this from an editor, but I would appreciate a user-friendly way to generate and edit the opf (metadata, manifest, spine, guide) and the ncx (hierarchical table of contents) files.

I realise this is an old thread, but I agree absolutely with this post ! Has anyone found a way to do this ?
:thanks:

Adjust
09-03-2010, 11:58 PM
Just unpack an existing epub and reuse those files and simply change the content, and keep those files for new titles.

RealImages1
09-04-2010, 12:23 AM
Just unpack an existing epub and reuse those files and simply change the content, and keep those files for new titles.

Thanks for that, Adjust. That's exactly what I'm doing already. It just becomes very tedious (especially with the OPF). And it's too easy to make a mistake. I'm not a programmer, but I would have thought it would be relatively simple to build an applet which would scan the unzipped project files and generate an OPF. The NCX of course would need to be to generated first and would need some user input to determine play order.
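
A minimal sketch of the kind of applet described above, written in Python. It assumes the OPF will sit at the top of the unzipped project folder (so every href is relative to that folder) and that alphabetical file order is the reading order. The function name `build_opf`, the `BookId` identifier id and the item ids are illustrative choices, not something any existing tool produces.

```python
import mimetypes
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# Media types commonly needed in an EPUB 2 manifest; anything else falls
# back to Python's own guess from the file extension.
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".ncx": "application/x-dtbncx+xml",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
}


def build_opf(root, title, identifier, language="en"):
    """Return OPF text listing every content file under `root` (hypothetical helper)."""
    root = Path(root)
    manifest, spine, has_ncx = [], [], False
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for number, path in enumerate(files, start=1):
        href = path.relative_to(root).as_posix()
        suffix = path.suffix.lower()
        # Container files and any existing OPF do not belong in the manifest.
        if href == "mimetype" or href.startswith("META-INF/") or suffix == ".opf":
            continue
        media = (MEDIA_TYPES.get(suffix)
                 or mimetypes.guess_type(path.name)[0]
                 or "application/octet-stream")
        if suffix == ".ncx":
            item_id, has_ncx = "ncx", True
        else:
            item_id = f"item{number:03d}"
        manifest.append(f"    <item id={quoteattr(item_id)} href={quoteattr(href)} "
                        f"media-type={quoteattr(media)}/>")
        # Assumption: every XHTML file goes into the spine, in sorted order.
        if media == "application/xhtml+xml":
            spine.append(f"    <itemref idref={quoteattr(item_id)}/>")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" '
        'unique-identifier="BookId">',
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">',
        f"    <dc:title>{escape(title)}</dc:title>",
        f"    <dc:language>{escape(language)}</dc:language>",
        f'    <dc:identifier id="BookId">{escape(identifier)}</dc:identifier>',
        "  </metadata>",
        "  <manifest>", *manifest, "  </manifest>",
        '  <spine toc="ncx">' if has_ncx else "  <spine>", *spine, "  </spine>",
        "</package>",
    ]
    return "\n".join(lines) + "\n"
```

The result would still need checking with a validator, and the spine order is only as good as the file names. The NCX is not generated here, since it needs the chapter titles and play order as input.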

GeoffC
09-04-2010, 02:51 AM
well; there's the hard way, and there's the hard way, no one ever said it was easy !

Andrew Brooks
10-25-2010, 06:27 AM
Hi, not quite sure where on this forum to post this,

I'm a photographer who's pretty new to ePubs/iBooks. I've been working on one of my own work for the last few weeks and have it looking good on my iPhone and iPad, and I'm now really interested in making it available to my network through the iBooks store. On running it through http://threepress.org/document/epub-validate/ it came up with this error on every sound and video file in the book:

ERROR: New Worlds.epub/OPS/chapter-1.xhtml(5): unknown element "audio" from namespace "http://www.w3.org/1999/xhtml"

and

ERROR: New Worlds.epub/OPS/chapter-30.xhtml(5): unknown element "video" from namespace "http://www.w3.org/1999/xhtml"

Anyone know a good solution or way round this problem?

Also, is there a size limit on books that can be uploaded to iBooks? Mine features embedded video and sound, so it works out at just over 80 MB.

I have put together the iBook using iWork pages.

Any help would be very appreciated,

Thanks for your time,

www.andrewbrooksphotography.com

GeoffC
10-25-2010, 07:54 AM
:hatsoff: Andrew

Welcome to mobileread ....

An expert will be along, no doubt!
Hope you get a solution.....iPad/iBooks are beyond my ken....

Jellby
10-25-2010, 02:31 PM
It seems you are using the <audio> and <video> elements, and those are not allowed in ePUB (yet).

st_albert
10-25-2010, 03:28 PM
Since I like writing my XHTML and CSS code by hand, I won't ask this from an editor, but I would appreciate a user-friendly way to generate and edit the opf (metadata, manifest, spine, guide) and the ncx (hierarchical table of contents) files.

I realise this is an old thread, but I agree absolutely with this post ! Has anyone found a way to do this ?
:thanks:

You could import the XHTML and CSS files into sigil. It will do the grunt work.

RealImages1
10-31-2010, 07:30 PM
You could import the XHTML and CSS files into sigil. It will do the grunt work.

Thanks for that st_albert. I've found that Sigil does a very odd thing with the css when I import - it makes a new copy of the css for every xhtml file. Is there a way around this ? I should mention that I'm running vers 0.2.4 because I'm on a PPC G5 and the latest version is not universal binary.

theducks
10-31-2010, 08:33 PM
Thanks for that st_albert. I've found that Sigil does a very odd thing with the css when I import - it makes a new copy of the css for every xhtml file. Is there a way around this ? I should mention that I'm running vers 0.2.4 because I'm on a PPC G5 and the latest version is not universal binary.

You are importing multiple EPUB files and trying to combine them into a single EPUB?
Even if the stylesheets are identical, Sigil will rename them and keep them separate because it does not examine each item and compare.

If YOU know stylesheet1 is the same as (or a subset of) stylesheet:
just edit (CV) the line in each document (search and replace) to read stylesheet (or pick the one that is the superset of them all).

Then delete the 'extras'

Ankh
10-31-2010, 09:17 PM
Font embedding (or at least basic embedding) seems to be quite simple. I guess WYSIWYG support is a different thing.

Ancient thread, but my need is current. Like Jellby, I also like to edit files by hand.

But what I really lack at this time is a command line tool, accepting two parameters, input and output ePub file. What I want that tool to do for me is to parse all XHTML elements, and then use those results to do font subsetting for all embedded fonts (cut out all glyphs that are not used in that particular edition).
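
A rough sketch of the first half of such a tool, in Python with only the standard library: it walks the XHTML files inside an ePub and collects every character that appears in the text. The names `TextCollector` and `used_characters` are made up for this example. It pools characters across the whole book rather than per font, which is a simplification, and the actual subsetting step would still need a font library (fontTools is one option) and is not shown.

```python
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Gather the characters found in text nodes, ignoring script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chars = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars.update(data)


def used_characters(epub_path):
    """Return the set of characters used in all (X)HTML files of the ePub."""
    chars = set()
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = TextCollector()  # fresh parser state per file
                collector.feed(book.read(name).decode("utf-8", errors="replace"))
                collector.close()  # flush any buffered text
                chars |= collector.chars
    return chars
```

The resulting set is what a subsetter would be told to keep. Text generated by CSS (for example `content:` rules) would be missed by this approach.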

RealImages1
10-31-2010, 10:30 PM
You are importing multiple EPUB files and trying to combine them into a single EPUB?

No - it's a single ePub, but I'm importing the original unzipped files. There are 35 xhtml files, 1 css and 5 images. No problem with the import process itself. Just that Sigil creates new css files for each xhtml file. So I have 35 duplicate css files which seems a tad wasteful.

st_albert
11-01-2010, 08:52 PM
Hmm, I don't think I've had that happen to me, though my usual MO when I do this is to import a single (huge) xhtml with embedded CSS, then create my own separate CSS file using the embedded elements, then split the xhtml into chapters. Which all inherit the same CSS file after splitting.

So I'm guessing here: It is probably because all of the xhtml files contain a reference to the external CSS file, so sigil imports that too, over and over, and presumably renames it to something like Style001.css, Style002.css, etc.

Are all the imported CSS files identical (except in name, of course)? If so you could delete all but one, then modify the references in each xhtml file to point to the one css file. Note that you can do search-and-replace with a regular expression ("regex") over all your xhtml files, so all the modifications can be done in one lick, rather than having to edit them all individually.

Help on using the power of sigil's regex search and replace is available elsewhere in this forum. Or if you get stuck, ask.


ETA: I should read more thoroughly. What I just said above would seem to be the case, and my suggestion is the same (though perhaps more verbose) as that of theducks, a couple of posts previous. Should be an easy fix if our assumptions are correct.
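
For anyone who would rather do that search-and-replace outside Sigil, here is a minimal Python sketch. It assumes the stylesheet references are ordinary `<link ... href="....css">` tags with double-quoted hrefs; the function names and the `main.css` target in the usage note are hypothetical, and the `Style0002.css` style of naming is only the guess made above.

```python
import re
from pathlib import Path

# Captures: (1) everything up to the opening quote of href, (2) an optional
# folder part such as "../Styles/", then the CSS file name, (3) the closing quote.
LINK_RE = re.compile(r'(<link\b[^>]*\bhref=")([^"]*/)?[^"/]+\.css(")', re.IGNORECASE)


def point_to_single_css(xhtml_text, css_name):
    """Rewrite every stylesheet link so it refers to `css_name`, keeping the folder."""
    return LINK_RE.sub(
        lambda m: f"{m.group(1)}{m.group(2) or ''}{css_name}{m.group(3)}",
        xhtml_text,
    )


def rewrite_folder(folder, css_name):
    """Apply the rewrite to every .xhtml file under `folder`; return the changed names."""
    changed = []
    for path in sorted(Path(folder).rglob("*.xhtml")):
        old = path.read_text(encoding="utf-8")
        new = point_to_single_css(old, css_name)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling `rewrite_folder("unzipped_book", "main.css")` on a copy of the project would leave one stylesheet name in every file, after which the duplicate CSS files can be deleted by hand.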

RealImages1
11-02-2010, 05:09 AM

Many thanks for that clarification, St Albert,

Very interesting to hear of your workflow/MO. For myself, I prefer to get to the separate chapter (etc) xhtml files and css as quickly as possible because that is where I feel most comfortable working. In her excellent ePub book, Liz Castro goes into great detail on how to manipulate files from Word or InDesign. I appreciate the effort she's put into it and the clarity with which she writes, but I have to say I hate all that 'recovery from...' approach.

I see you are right in that theducks said pretty much the same thing as you did, but your (more detailed) version was more immediately understandable.

I have a number of thoughts in response, but rather than rambling on in the abstract, I would like to take a few days to experiment further. I'll probably get back on this !

The one thing I would say is that repeatedly importing the same file would seem to me to be a bug in Sigil. But then - I am on an older version. Maybe the latest one has fixed the problem.

Thanks again for your advice and especially the offer to help more.

Mike

st_albert
11-02-2010, 11:24 AM
The one thing I would say is that repeatedly importing the same file would seem to me to be a bug in Sigil. But then - I am on an older version. Maybe the latest one has fixed the problem.
Mike

Hmm, I don't think sigil checks each "new" possible css file to see if it's identical to one it has already imported. It just sees what resources are referred to in each xhtml, and tries to import them too.

RealImages1
11-03-2010, 08:37 PM
Hmm, I don't think sigil checks each "new" possible css file to see if it's identical to one it has already imported. It just sees what resources are referred to in each xhtml, and tries to import them too.

Well - I've learnt a bit of stuff !

I should probably begin by reiterating that all I'm wanting to do is find a simple way to generate clean, accurate OPF and NCX files once I've got all my xhtml and css done. Since my first post here in September, I have been working on an applescript to make the OPF. I have developed a version that gets me half-way, but still leaves a fair amount of hand coding. As I don't really know applescript (instead relying pretty heavily on snippets picked up from forums such as macscripter.net) I am not able to get it fully functional. Also it's very limited in that it has hard coded POSIX paths which are specific to my machine. I will probably continue to pick away at it as time allows, but for the moment I need something more immediately. Generating an NCX with applescript has defeated me altogether.

So - with Sigil, I decided the simplest way around the multiple css file imports was to make a duplicate of my main project folder (with all relevant sub-folders and files) and then do a multifile Replace to remove all links to the css file from every xhtml file. I can then import this modified version into Sigil with no problems. And Sigil then gives me a very nice OPF, which (after un-zipping the ePub) I can put back into my original project.

But the NCX is still a little problematic in that the TOC Editor in Sigil doesn't seem to allow for re-ordering. In fact it generates the TOC in alphabetical order and I cannot see a way to change this. Please tell me if I'm wrong about this ! The only solution I can think of is to name every original xhtml file with an ascending numeric system based on the order in which I want them to appear in the book - which is, of course, exactly what Sigil does when it's doing the whole job itself. It's important I think, to use a naming convention with increments of 10 (so: Chap010.xhtml, Chap020.xhtml, etc) in order to allow for later changes and additions. For instance an author might decide at the last minute that he or she wants to put in a dedication page.

Thanks to all those who've helped me here and if anyone's got comments or (hopefully, improvements) on this I would love to hear them.

Mike
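
The increments-of-10 naming convention described above is easy to script. This is only a sketch in Python; the helper name `numbered_names` and the `Chap` prefix are illustrative, and it just computes the new names, leaving the actual renaming (and fixing any links between files) to the user.

```python
def numbered_names(ordered_files, prefix="Chap", step=10, ext=".xhtml"):
    """Map each current file name to a new name numbered in reading order."""
    # Use at least three digits, more if the book is long enough to need them.
    width = max(3, len(str(len(ordered_files) * step)))
    return {
        old: f"{prefix}{index * step:0{width}d}{ext}"
        for index, old in enumerate(ordered_files, start=1)
    }
```

Given `["intro.xhtml", "one.xhtml", "two.xhtml"]` this yields Chap010.xhtml, Chap020.xhtml and Chap030.xhtml, so a last-minute dedication page could be slotted in as Chap005.xhtml without renaming anything else.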

theducks
11-03-2010, 09:35 PM
Just re-order (drag to the proper read order place) your book list and the TOC follows

RealImages1
11-03-2010, 11:17 PM
Just re-order (drag to the proper read order place) your book list and the TOC follows

Ha ! Of course. Silly me. Guess this is what happens when you try to use part of an application without actually learning the whole thing. On the other hand - if it's called a "TOC Editor", maybe it's not entirely crazy to expect it to edit the TOC...

Many thanks.

theducks
11-04-2010, 12:02 AM
Ha ! Of course. Silly me. Guess this is what happens when you try to use part of an application without actually learning the whole thing. On the other hand - if it's called a "TOC Editor", maybe it's not entirely crazy to expect it to edit the TOC...

Many thanks.

It does, in a way ;)
Say your Entry says "Bapter 7"
You can change it in the TOC editor to "Chapter 7" and it makes the change in the main document.
Anything you do in the TOC editor affects the main document. Un-ticking an entry makes a notation in the h# tag to hide it from the Sigil TOC.

The thing you CAN NOT do is make the TOC say one thing (from the editor) and have another value in the actual document. For that, you need to manually add title="BlahBlah" to the tag, which overrides the normal text.

st_albert
11-04-2010, 07:16 PM
Regarding the multiple css files issue, this appears to have been addressed in the new sigil 0.3.0 FINAL release.

Just announced today. See http://sigildev.blogspot.com/2010/11/030-final.html

RealImages1
11-04-2010, 09:38 PM
Regarding the multiple css files issue, this appears to have been addressed in the new sigil 0.3.0 FINAL release.

Just announced today. See http://sigildev.blogspot.com/2010/11/030-final.html

You're right. Unfortunately, although this version is headlined as "Universal", it doesn't run on an older PowerPC G5. :eek:

st_albert
11-05-2010, 01:03 AM
You're right. Unfortunately, although this version is headlined as "Universal", it doesn't run on an older PowerPC G5. :eek:

Sadly, support for Mac OS Tiger (10.4) was dropped a month or two ago. Maybe one of the early 10.5 (leopard) versions will run on your machine. Look into linux if you can.

Else, buy a cheap PC (and put linux on it).

[sardonic] You can't stop progress. [/sardonic off]

RealImages1
11-05-2010, 02:13 AM
Sadly, support for Mac OS Tiger (10.4) was dropped a month or two ago. Maybe one of the early 10.5 (leopard) versions will run on your machine. Look into linux if you can.

Else, buy a cheap PC (and put linux on it).

[sardonic] You can't stop progress. [/sardonic off]

In fact, I am running Leopard. It's not having an Intel processor that's the problem. Sigil is not alone in abandoning support for PPCs. Apple themselves now pretty much ignore them as well. So - it's not so much about 'progress' as such - it's more a form of prejudice like racism or sexism. Owning a PPC puts one very firmly on the wrong side of the tracks.

Having said that, this old G5 hums along very sweetly.

After 25 years on PCs I switched to Mac about 18 months ago. Exercising great self-restraint, I will refrain from giving you the usual "Mac-convert" quotes, but I will say that I weep now thinking of all those wasted years and all that unnecessary gnashing of teeth.

Valloric
11-05-2010, 06:32 AM
In fact, I am running Leopard. It's not having an Intel processor that's the problem. Sigil is not alone in abandoning support for PPCs. Apple themselves now pretty much ignore them as well. So - it's not so much about 'progress' as such - it's more a form of prejudice like racism or sexism. Owning a PPC puts one very firmly on the wrong side of the tracks.

I was about to respond with sensible technical reasons why Sigil dropped support for Tiger and PPC's, but then I read that bolded part, and now I don't see the point anymore. It's so patently absurd it doesn't deserve a response.

RealImages1
11-05-2010, 06:26 PM
I was about to respond with sensible technical reasons why Sigil dropped support for Tiger and PPC's, but then I read that bolded part, and now I don't see the point anymore. It's so patently absurd it doesn't deserve a response.

My sincere apologies, Valloric. I meant no offense. It was supposed to be tongue in cheek. I should have put a smilie there.:smack:

Sigil is a great piece of software and it leaves me gobsmacked that you make it available free.

If you can accept my apology, I would actually be very, very interested to learn something about the limitations of PPCs.

Mike

KLUTCH
11-16-2010, 09:14 AM
A full regex search and replace :)

N13L5
06-11-2011, 12:25 PM
Guys, why u stop posting? :dry:

This is such an interesting necro thread!



Valloric somehow manages that my eyes don't glaze over when he explains stuff about xhtml and such...


Anyway, Sigil rules and ID CS5.5 sux

ID5.5 It lets you format away without warning until you've overdone it and then just crashes on epub export :P

No wonder that old prick Jobs is kicking Adobe to the curb.

Sigil should cost $500 and ID be freeware... but everything is upside down on this lost planet in the boondocks of the milkyway...

:party4::dtw::party4:

ProDigit
04-14-2012, 05:46 AM
The perfect program for me would not be an automated program, but a manual one.
I use so many advanced features to get 'perfect' formatting, and I think a good manual format can often be better than an automatic one.

I ALWAYS format my books from basic HTML code. In HTML I can clearly see what codes/brackets are opened and unclosed etc.
For this purpose my "perfect" program would need to consist of a mix of many advanced programs out there (which I will never find).

First I use Kompozer, to copy and paste HTML text. It's a WYSIWYG editor, so I can see the overall layout.
Then I can remove most hyperlinks with a rightclick.

Then I need Notepad++ for its advanced editing functions and macros, something Kompozer and NVU do not have.

Once my HTML is cleaned up and manually edited, I need a program to compress my ePub book with. Since Notepad++ can be used to create all files, including toc.ncx, mimetype and all the others, all I'm missing is a compression program that will be intelligent enough to know that mimetype does not need to be compressed, and powerful enough to compress the rest of the files with a high compression ratio.
It also needs to be able to update files within its archive. I'm still looking for that program: something like WinRAR or 7z, but for ebooks.

And lastly it needs to have an internal epub reader, to verify if the epub is displaying correctly.

For all these steps I use separate programs.

Why I DON'T want a program that does it all, is because 7z specializes in compression. Notepad++ specializes in search and replace, and html editing as well as any other text (non binary) document; and Kompozer is a WYSIWYG editor, that also is able to remove certain anchors and links, that will help me clean up my HTML.

A single program would need to be so extensive, that it will take more than 4 years to develop (which is less than 1/4th of what it took to create Kompozer, Winrar/7z, adobe Digital Editions, and Notepad++ combined).
A single program performing just the basics of what I wrote will most likely be buggy and not very efficient.
Even if it was, it would probably put a lot of bogus code in a book. If I as a writer decide not to put a title in my book, I should have the freedom to do so.
If I decide to use 'class' and other CSS codes in my HTML files to save time and code, I should be able to. However, if I find that CSS takes too much code, and I can write my book more simply and better using no CSS, just basic HTML, I should also be able to do that.

I think large programs have issues with that. That's why I prefer ultimate control over what goes into my epub, and what does not.

democrite
07-03-2012, 04:58 PM
Does anyone have a favorite Word to ePub maker? I use Atlantis and it has most of the features I want except tables. Have tried a few others like Calibre but I'd prefer to have:

tables
chapters from word headings
footnotes

Not sure if there's anything that'll do that.

ProDigit
07-03-2012, 06:39 PM
Does anyone have a favorite Word to ePub maker? I use Atlantis and it has most of the features I want except tables. Have tried a few others like Calibre but I'd prefer to have:

tables
chapters from word headings
footnotes

Not sure if there's anything that'll do that.

Try making your documents in HTML. You'll have most flexibility when converting html to epub, as depending on which word editor you use, you may lose formatting over conversion.

democrite
07-03-2012, 08:16 PM
Try making your documents in HTML. You'll have most flexibility when converting html to epub, as depending on which word editor you use, you may lose formatting over conversion.

Tried it. Thanks. calibre seems to make footnote numbers superscript without the change in size. Maybe that's part of Word's HTML but there were other issues. I could copy and paste the tables but something simpler in the future would be nice.

Toxaris
07-04-2012, 03:08 AM
I just use Word and export it to HTML via my macro. That will keep your tables. However, only 'simple' tables. No merged cells.

ProDigit
07-05-2012, 02:55 PM
I have several tools to make HTML.
If you use MS OFFICE or OOO OpenOffice, they add too much crap to a HTML.

Create your document in MS Word (or something). Save it as HTML, open it in a browser, and select and copy all the information on your screen. Then paste it into NVU or KompoZer. You can further trim down with these programs if you wish. Once you finish with that, you'd have about 10-30% of your HTML cleaned up.

What I personally do after that is further trimming in Notepad++, because of its superior 'search and replace' features. It makes it very easy to trim an HTML file by 30 to 50% of its size, by only keeping the very basics of the HTML.

I don't really care about font, as one can always change it in the reader.

For that reason, fonts and classes matter least on e-ink devices, so they are among the first things to go from an HTML file.
Get rid of classes, font changes (size/type) and advanced commands, and stick to basic commands. Just keep basic headings (H1, H2, H3), perhaps define the font type once in the body, keep text formatting (bold, italic, underlined), and that's it.
Heading text will automatically be centered on some e-readers, and you can also get rid of strange numbers or font colors.

keep pagebreaks (HR; any heading will automatically have a page break before it, so no need to insert it after a chapter), in your case tables (TR), etc.
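
The trimming described above can be approximated with a couple of regular expressions. This is a crude Python sketch, not a real HTML cleaner: it assumes well-behaved markup, the function name `strip_formatting` is made up, and regexes will misfire on unusual input (for example a literal ` class="..."` inside the body text), so check the result by eye.

```python
import re

# class=... or style=... attributes, with double-quoted, single-quoted or
# unquoted values (Word tends to write class=MsoNormal without quotes).
CLASS_OR_STYLE = re.compile(
    r"""\s+(?:class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)
# Opening and closing <font> tags; the text between them is kept.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_formatting(html):
    """Drop <font> tags and class/style attributes, keeping everything else."""
    html = FONT_TAG.sub("", html)
    return CLASS_OR_STYLE.sub("", html)
```

Headings, bold, italic and tables pass through untouched, which matches the "keep only the basics" approach.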

There are many codes. Keep a list like this close to you:
http://www.htmlcodetutorial.com/quicklist.html

There are also sites out there that allow you to test your code in a browser, but even if it works on a computer, there's no guarantee how well, or even if, it displays on the e-ink device.


Not all HTML codes can be converted to epub, and some auto-converters, like calibre, might substitute codes for other (or similar) codes that may not have the effect you wished to achieve.
It's with trial and error that you'll figure out which codes are transported into Epub and which are not; and which programs transport the most codes. For that reason I prefer a tool I found somewhere on this site, that basically zips the HTML into an epub zip file, after you manually created all other docs (like TOC and so forth).

To have the best (perfect) results, best is to do it manually.
Create a good and clean HTML, trimmed down to the basic necessities of displaying the info, then convert it to Epub. You'd often find that apart from some specific font, or text alignment, applying this principle could save you lots of book space, and reduces auto-conversion errors (which are not solvable).
Doing things manually takes a lot more time, but gives you control of even the smallest details. In automatic conversion, you basically rely on the converter to understand or interpret your creation correctly (and you'll not have that annoying BookDesigner/Calibre sticker pasted at the end of your book).

N13L5
07-06-2012, 02:20 AM
Does anyone have a favorite Word to ePub maker? I use Atlantis and it has most of the features I want except tables. Have tried a few others like Calibre but I'd prefer to have:

tables
chapters from word headings
footnotes

Not sure if there's anything that'll do that.

I've tried all kinds of tools recommended here or found with google searches...

The first one I'm really happy with is Adobe InDesign CS 5.5...

I am normally not fond of Adobe, their user interfaces have been all backwards and unintuitive to me (except GoLive) and InDesign has some of that too, where you really need to study the manual first, cause the software UI doesn't make it obvious... but the capabilities are great, I can't think of anything you can't do in it. It has a great, integrated plain text editor too. I mean really integrated, with all the spell checking and search/replace functions you could want. And you have your choice to publish to print, ePub, PDF or html pages. ePubs can be optimized for multiple specific resolutions, so your book will display nice on small screens or big screens with embedded pictures, tables or videos staying where they're supposed to be.

Since Adobe apparently wants print publishers to switch from Quark Xpress to InDesign, they had to make things work in a very precise and predictable manner for print output, and the ePub creation benefits from all those features.

Blablabla, I never thought I'd shill for any expensive, bloated Adobe product, but I like this one. And last time I checked, there was a free 30 day trial to try if it does what you want first, but be ready to use Youtube extensively to figure out how...

grumbles
08-28-2012, 05:52 PM
ProDigit said

I'm missing is a compression program that will be intelligent enough to know that mimetype does not need to be compressed, and powerful enough to compress the rest of the files with a high compression ratio.

Funny you should say that. I am currently working on just such a program. It takes the name of the epub you want to create and the name of the directory holding all the files, and creates the zip. The mimetype is added first (stored) and then all the files it finds are added. It searches all found subdirectories as well. It only creates the zip; it will not add or replace files in an existing zip (epub). The code is rough at the moment but I tried it on a few unzipped epubs and it appeared to work. It's written in FreePascal and I have w32 and Linux executables. It might be useful to someone else. It will be useful to me.
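
The same idea can be sketched in Python with the standard zipfile module. This is not the FreePascal program described above, just an illustration of the technique; the function name `make_epub` is made up. The mimetype file goes in first and stored (the ePub container format expects it first and uncompressed), and everything else is deflated.

```python
import zipfile
from pathlib import Path


def make_epub(source_dir, epub_path):
    """Zip `source_dir` into `epub_path`, with mimetype first and stored."""
    source = Path(source_dir)
    target = Path(epub_path).resolve()
    with zipfile.ZipFile(epub_path, "w") as book:
        # The mimetype entry must come first and must not be compressed.
        book.write(source / "mimetype", "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            name = path.relative_to(source).as_posix()
            # Skip the mimetype (already added) and the output file itself,
            # in case it is being written inside the source folder.
            if name == "mimetype" or path.resolve() == target:
                continue
            book.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

Like the program described above, this only creates a new archive; it does not update files inside an existing epub.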

Wall
02-07-2013, 11:06 AM
I'm missing a programme that can take an existing EPUB file and create a short sample - basically, something that will automatically cut everything but the first "chapter" and make the corresponding changes in the TOC and .opf files. Does anyone know if there's anything out there like that?

Turtle91
02-07-2013, 04:14 PM
Sigil

Open ePub...delete everything after first chapter...click the "TOC" button to regenerate...save (I recommend saving as a different filename otherwise you will lose everything....ask me how I know!)

JSWolf
02-07-2013, 08:21 PM
Sigil

Open ePub...delete everything after first chapter...click the "TOC" button to regenerate...save (I recommend saving as a different filename otherwise you will lose everything....ask me how I know!)

Open ePub...save as (to a different filename)...delete everything after first chapter...click the "TOC" button to regenerate...save.

Wall
02-08-2013, 05:36 AM
Thanks!