View Full Version : ePub creation tools : what's missing ? wishlist / dialogue
zelda_pinwheel 01-23-2009, 12:26 PM We've been having an interesting discussion in the french forum recently about what is missing in the apps we currently have available to us for creating epub files (i'm talking about editors, not converters), and what we wish we had. In other threads, at least 2 people (that i have seen) have mentioned their intentions to write an epub editor. In still other threads, there have been references to various details that could be done better. calibre is great in many ways, but it's not a fully-featured editor. feedbooks does a brilliant job making valid code and hierarchical structure, but a lot of people want an offline app, and feedbooks doesn't handle images yet.
So, I thought I would create ONE THREAD TO RULE THEM ALL where users can list their desiderata and any interested developers could discuss what is possible / not possible, their intentions, etc. With a little luck we might even end up creating the ONE APP TO RULE THEM ALL right here ! wouldn't that be brilliant. :)
Here to start are some of the things we want as users and creators of epub books, taken from the discussion in french in this thread (http://www.mobileread.com/forums/showthread.php?t=36651). Please feel free to add your own, and I hope the developers will be interested in participating / responding as well !! Valloric, llasram, kovid, wallcraft, Komenor, et al. i'm thinking of you, for example. ;) (<-- non-exhaustive list !!)
DESIRED FEATURES
1. epub files must be valid (html tidy, epub check...) and conform to best practices.
2. the editor should be able to accept multiple html / xhtml flows and create one document with a hierarchical TOC. It should also be able to accept one html flow and semantically parse the hierarchy of the document (part, chapter, section...) according to the tags used (h1, h2, h3, etc.), creating logical divisions for a properly structured epub document and a hierarchical TOC, the way that feedbooks does.
3. It should be able to handle images and relatively advanced css markup (dropcaps, for instance).
4. Ideally, it should be accessible even to users with no knowledge of html / css code with a full wysiwyg UI, although advanced options should be available if you do know the code (direct code editing should be possible), again similar to the feedbooks interface.
I'll probably have more to add later but i've got to go for the moment, so i'll stop here and open the discussion. What do you want from an editor ? i'll be looking forward to seeing what comes out of this !
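A minimal sketch of the single-flow case in item 2, assuming the heading tags (h1, h2, h3...) really do reflect the part / chapter / section hierarchy. The names `HeadingCollector` and `build_toc` are made up for illustration; this is not how feedbooks or any existing tool does it.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1..h6 headings in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None


def build_toc(html):
    """Nest headings into a tree: each heading becomes a child of the
    nearest preceding heading with a smaller level number."""
    collector = HeadingCollector()
    collector.feed(html)
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in collector.headings:
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": title, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

The resulting tree could then be written out as the nested navPoints of an ncx; splitting the flow into separate files at each h1 would be a further step not shown here.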
Jellby 01-23-2009, 12:53 PM Since I like writing my XHTML and CSS code by hand, I won't ask this from an editor, but I would appreciate a user-friendly way to generate and edit the opf (metadata, manifest, spine, guide) and the ncx (hierarchical table of contents) files.
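To give an idea of what generating the opf involves, here is a rough sketch that builds a bare-bones OPF 2.0 package (metadata, manifest, spine) from a list of content files. The function name `build_opf`, the `bookid` id and the `toc.ncx` file name are assumptions for the example; the optional guide and the ncx itself are left out.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_opf(title, language, identifier, chapters):
    """Return an OPF package document as a string.

    chapters: list of (item_id, href) tuples in reading order.
    """
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)

    package = ET.Element(
        f"{{{OPF_NS}}}package",
        {"version": "2.0", "unique-identifier": "bookid"},
    )

    # Minimal Dublin Core metadata: title, language, identifier.
    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}language").text = language
    ident = ET.SubElement(metadata, f"{{{DC_NS}}}identifier", {"id": "bookid"})
    ident.text = identifier

    # The manifest lists every file; the spine gives the reading order.
    manifest = ET.SubElement(package, f"{{{OPF_NS}}}manifest")
    ET.SubElement(
        manifest,
        f"{{{OPF_NS}}}item",
        {"id": "ncx", "href": "toc.ncx", "media-type": "application/x-dtbncx+xml"},
    )
    spine = ET.SubElement(package, f"{{{OPF_NS}}}spine", {"toc": "ncx"})
    for item_id, href in chapters:
        ET.SubElement(
            manifest,
            f"{{{OPF_NS}}}item",
            {"id": item_id, "href": href, "media-type": "application/xhtml+xml"},
        )
        ET.SubElement(spine, f"{{{OPF_NS}}}itemref", {"idref": item_id})

    return ET.tostring(package, encoding="unicode")
```

A user-friendly editor would presumably put a form over the metadata part and a drag-and-drop list over the spine, then regenerate this file on save.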
GeoffC 01-23-2009, 02:05 PM 5. Choice of end formats, to cater for readers that do not support ePub.
6. Pick and choose title-page image.
brewt 01-23-2009, 02:05 PM Font Embedding. The only thing that'll do that off-the-shelf now is Indesign, and well, that's the only thing it does "well".
Macros, search & replace (not just text but styles, fonts, sizes, etc), grammar checker, spell-checker, target viewing application compensation, multiple format input & output, auto paragraph re-justification & hyphenation, multiple language dictionaries, built-in browser, metadata management, multi-style select, discontinuous select, table generation/correction, cross references such as indices and footnotes, rss feeds....
You said "killer app", didn't you Z? :)
-bjc
Valloric 01-23-2009, 02:13 PM So, I thought I would create ONE THREAD TO RULE THEM ALL where users can list their desiderata and any interested developers could discuss what is possible / not possible, their intentions, etc.
This is a very good idea.
1. epub files must be valid (html tidy, epub check...) and conform to best practices.
I've been thinking about this and writing down ideas in my "epub editor idea book"... there are certain problems here.
For one, we have the epub standard. Naturally, all editors should output valid epub, that's a given, but the best practices... currently, the Sony Reader--more specifically, mobile DE--has limits on chapter length (content file limit is 300KB). While this is unfortunate, I can live with it. But there are other problems and flukes of mobile DE that one has to take into account. The page numbers on the side, for instance. One needs to add margins so the numbers don't overlap the text, etc.
So a standards compliant epub may not work in mobile DE, and if it does, it may not look as nice as one specifically tailored to it. And I'm not blaming DE on certain limitations (like the 300KB limit) that are the result of the platform it runs on. Other epub reading applications for the Reader (or other devices) would probably hit a similar wall.
My current working idea is this: the editor exports two types of epub files. Both are standards compliant. One follows the standard and nothing else. No size limits, no special margins etc. Pure epub, without any consideration for the reading application or the platform. Let's call this "Standards Epub".
Then there's the second output option. This epub type is also fully compliant, but has the size limitations, special margins and other necessities for an enjoyable read on something like the Sony Reader. Let's call this "Mobile Epub".
I know no one likes the idea of two different epub files. One risks the creation of a sub-format. But I don't see a way around this. A purely standards compliant epub is a must; current practical limitations may (and will) disappear with time. But one must also be realistic: there's no point in high-horsing around with compliant epubs that nobody can read on portable devices like the Sony Reader.
I am very much interested in what other people think about this.
3. It should be able to handle images and relatively advanced css markup (dropcaps, for instance).
I've been following the dropcaps thread(s), and it's a good example of why I would want an editor to export two kinds of epub files: one that follows the standard, and one that's tailored around flukes and idiosyncrasies of certain reading software.
4. Ideally, it should be accessible even to users with no knowledge of html / css code with a full wysiwyg UI, although advanced options should be available if you do know the code (direct code editing should be possible), again similar to the feedbooks interface.
My working idea is to have a WYSIWYG interface, but with a "view code" view, like for instance in Dreamweaver. If someone wants to mess with it directly, they should be able to.
kovidgoyal 01-23-2009, 02:19 PM 1. and 2. are incompatible.
kovidgoyal 01-23-2009, 02:43 PM Also I think you need to more clearly define what the use case is for this tool. Is it for authors who want to write books, or is it for proofreaders who want to touch up and convert existing books.
If the former, far more important than the features you have outlined would be features to support actual writing, like for example, keeping notes associated with characters/places/events.
If the latter, then all that's needed is a wrapper around any good html editor with a WYSIWYG mode.
Valloric 01-23-2009, 03:04 PM If the former, far more important than the features you have outlined would be features to support actual writing, like for example, keeping notes associated with characters/places/events.
Who in their right mind would use an ebook editor to actually write a book? The editor I'm thinking of is for proofreaders and copyeditors.
zelda_pinwheel 01-23-2009, 03:14 PM This is a very good idea.
i'm glad you think so ; i had you in mind when i thought of it. ;)
I know no one likes the idea of two different epub files. One risks the creation of a sub-format. But I don't see a way around this. A purely standards compliant epub is a must; current practical limitations may (and will) disappear with time. But one must also be realistic: there's no point in high-horsing around with compliant epubs that nobody can read on portable devices like the Sony Reader.
I am very much interested in what other people think about this.
well, you're right, this isn't the optimum situation, however it seems a reasonable compromise to me. particularly since the "standards epub" can also be used for archival purposes and as a source format for conversion to other formats as needed. i'm not an expert though ; i'll be interested to see other reactions.
My working idea is to have a WYSIWYG interface, but with a "view code" view, like for instance in Dreamweaver. If someone wants to mess with it directly, they should be able to.
right, that's rather the idea i had in my head as well.
1. and 2. are incompatible.
why are 1 and 2 incompatible, kovid ?
Also I think you need to more clearly define what the use case is for this tool. Is it for authors who want to write books, or is it for proofreaders who want to touch up and convert existing books.
the users of this tool as i see it are very much the *latter* case you describe : they are the same kinds of people using BookDesigner or ETI's eBook Publisher or Mobipocket Creator today, but who want a good tool for making epub files. so really closer to dreamweaver than to word or any writing tools. i imagine they would *already* have the finished text, either PD texts or their own manuscripts, and just want to turn them into ebooks. so, yes, a wrapper around a good html editor with a wysiwyg mode. but i think there are some particularities specific to epub that would need to be addressed (creation of the different file types, creation of the epub container...).
anyway this discussion looks to be off to a good start ! i'm glad to see it !
Valloric 01-23-2009, 03:25 PM particularly since the "standards epub" can also be used for archival purposes and as a source format for conversion to other formats as needed.
Exactly. There are people out there who want to use epubs purely for storage.
the users of this tool as i see it are very much the *latter* case you describe : they are the same kinds of people using BookDesigner or ETI's eBook Publisher or Mobipocket Creator today, but who want a good tool for making epub files. so really closer to dreamweaver than to word or any writing tools. i imagine they would *already* have the finished text, either PD texts or their own manuscripts, and just want to turn them into ebooks. so, yes, a wrapper around a good html editor with a wysiwyg mode. but i think there are some particularities specific to epub that would need to be addressed (creation of the different file types, creation of the epub container...).
Again, I agree completely. This is exactly how I see an epub editor: a tool that facilitates the manual creation of an epub book from pre-existing text. BD for epub... without the suck.
kovidgoyal 01-23-2009, 03:33 PM why are 1 and 2 incompatible, kovid ?
If you accept arbitrary HTML as input and want to output standards compliant HTML the only way to do that is to basically strip the HTML down to a basic internal markup and then re-export it. This is for example what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards compliant HTML output (and no, htmltidy doesn't do this).
So really what the tool will have to do is:
1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
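Steps 1, 2 and 5 above can be sketched in a few lines, assuming a deliberately tiny internal markup of headings and paragraphs only (the `BLOCK_TAGS` table and the function names are invented for the example). Everything else in the input, such as font tags and inline formatting, is simply thrown away, which is exactly the lossy part.

```python
from html import escape
from html.parser import HTMLParser

# The whole "internal markup": input tag -> internal block kind.
BLOCK_TAGS = {"h1": "heading1", "h2": "heading2", "h3": "heading3", "p": "para"}


class ToInternal(HTMLParser):
    """Reduce arbitrary HTML to a flat list of (kind, text) blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._kind = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()              # an unclosed <p> ends where the next block starts
            self._kind = BLOCK_TAGS[tag]

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._kind is not None:
            self._text.append(data)

    def _flush(self):
        text = " ".join("".join(self._text).split())   # normalise whitespace
        if self._kind and text:
            self.blocks.append((self._kind, text))
        self._kind, self._text = None, []


def parse_to_internal(html):
    parser = ToInternal()
    parser.feed(html)
    parser.close()
    parser._flush()                    # pick up a trailing unclosed block
    return parser.blocks


def export_xhtml(blocks):
    """Re-export the internal blocks as clean, escaped markup."""
    tag_for = {kind: tag for tag, kind in BLOCK_TAGS.items()}
    return "\n".join(
        f"<{tag_for[kind]}>{escape(text)}</{tag_for[kind]}>" for kind, text in blocks
    )
```

Steps 3 and 4 (structure detection and the editor itself) would operate on the block list in between; a real internal markup would of course need to keep emphasis, images, links and so on.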
mtravellerh 01-23-2009, 03:36 PM That sounds really good. I would like to have a way for direct source access and editing, though, maybe a search and replace with RegEx (I use regular expressions quite a lot). Otherwise a GUI thing would be great. One free open source wysiwyg editor that springs to mind is NVU; another open source editor I use a lot is Notepad++. As those are open source, they could be easily integrated as edit tools.
Valloric 01-23-2009, 03:37 PM So really what the tool will have to do is:
1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
My understanding exactly. Only I'm thinking of making the "simple" internal markup not so simple. But yes, one has to parse the initial HTML and create a new one.
mtravellerh 01-23-2009, 03:41 PM If you accept arbitrary HTML as input and want to output standards compliant HTML the only way to do that is to basically strip the HTML down to a basic internal markup and then re-export it. This is for example what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards compliant HTML output (and no, htmltidy doesn't do this).
So really what the tool will have to do is:
1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
If you do that (5), people like Coolmicro will get up and shout again that the resulting epub does not conform to the standard and that the html code is not "clean". (I really do not care about "clean or dirty" code myself, as long as it does what it has to do, like Calibre does for example). So I am all for it.
zelda_pinwheel 01-23-2009, 03:47 PM This is exactly how I see an epub editor: a tool that facilitates the manual creation of an epub book from pre-existing text. BD for epub... without the suck.
yes, please without the suck. :p
If you accept arbitrary HTML as input and want to output standards compliant HTML the only way to do that is to basically strip the HTML down to a basic internal markup and then re-export it. This is for example what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards compliant HTML output (and no, htmltidy doesn't do this).
So really what the tool will have to do is:
1) Accept html input
2) parse the html input into some simple internal markup
3) Try to auto identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
okay, i see what you mean. you are right of course ; i was making the assumption that the input would be clean and valid code, which cannot necessarily be assumed.
out of curiosity, Valloric, have you seen the feedbooks wysiwyg editor ? i HIGHLY recommend you take a look at it and at how the book creation process is handled ; it is excellent, and it is easily accessible even to people who know nothing at all about html / css, however also gives access to the source code for more knowledgeable users. really, in my mind, the tool i would like would be very close to the feedbooks interface, with just a few modifications. notably i definitely want to be able to insert images, which feedbooks does not yet support.
If you do that (5), people like Coolmicro will get up and shout again that the resulting epub does not conform to the standard and that the html code is not "clean". (I really do not care about "clean or dirty" code myself, as long as it does what it has to do, like Calibre does for example). So I am all for it.
why ? the exported code can easily be clean, that is one of the goals.
Valloric 01-23-2009, 03:50 PM If you do that (5), people like Coolmicro will get up and shout again that the resulting epub does not conform to the standard and that the html code is not "clean". (I really do not care about "clean or dirty" code myself, as long as it does what it has to do, like Calibre does for example). So I am all for it.
An epub file either conforms to the standard, or it doesn't. It is not a matter of opinion.
On "clean" vs "dirty" code, this is very much a matter of opinion, so it depends on the person reading the code.
kovidgoyal 01-23-2009, 03:54 PM Actually, I've been thinking about this problem on and off and to me it seems like the whole concept of WYSIWYG editors is flawed. Instead I've been thinking about a side-by-side editor.
The editor will allow you to edit txt files in a simple lightweight markup language like rest or markdown (it will have GUI controls to make it easy, rather like the editor we use to make posts to mobileread). As you make changes the result will be automatically updated and displayed in a pane to the side of the editor pane.
So editing books will be about as hard as making posts on mobileread.
tompe 01-23-2009, 03:57 PM Actually, I've been thinking about this problem on and off and to me it seems like the whole concept of WYSIWYG editors is flawed. Instead I've been thinking about a side-by-side editor.
Of course it is. Emacs is the ultimate editor and LaTeX the ultimate markup language...
I think the approach you describe is a good approach.
Jellby 01-23-2009, 03:58 PM Font Embedding. The only thing that'll do that off-the-shelf now is Indesign, and well, that's the only thing it does "well".
Font embedding (or at least basic embedding) seems to be quite simple. I guess WYSIWYG support is a different thing.
My current working idea is this: the editor exports two types of epub files. Both are standards compliant. One follows the standard and nothing else. No size limits, no special margins etc. Pure epub, without any consideration for the reading application or the platform. Let's call this "Standards Epub".
Then there's the second output option. This epub type is also fully compliant, but has the size limitations, special margins and other necessities for an enjoyable read on something like the Sony Reader. Let's call this "Mobile Epub".
I would have different configurable parameters, like maximum file size, maximum nesting level, whether or not to support some optional features... and have the software issue a warning if these limits are exceeded. It shouldn't be so hard to have a "Warn if text files are larger than ____KB" setting.
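Such a size warning could look roughly like this: a sketch that opens an epub as a zip archive and lists the text files above a configurable limit. The 300KB default is the mobile DE content file limit mentioned earlier in the thread; measuring the uncompressed size, the file suffix list and the function name `oversized_entries` are assumptions made for the example.

```python
import zipfile


def oversized_entries(epub_file, max_kb=300, suffixes=(".xhtml", ".html", ".htm")):
    """Return (name, size_in_bytes) for each text file above the limit.

    epub_file may be a path or a file-like object. Sizes are the
    uncompressed sizes recorded in the zip directory.
    """
    limit = max_kb * 1024
    with zipfile.ZipFile(epub_file) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(suffixes) and info.file_size > limit
        ]
```

An editor could run this before export and show the list as warnings rather than refusing to save, leaving the limit itself as one of the configurable parameters.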
mtravellerh 01-23-2009, 04:00 PM out of curiosity, Valloric, have you seen the feedbooks wysiwyg editor ? i HIGHLY recommend you take a look at it and at how the book creation process is handled ; it is excellent, and it is easily accessible even to people who know nothing at all about html / css, however also gives access to the source code for more knowledgeable users. really, in my mind, the tool i would like would be very close to the feedbooks interface, with just a few modifications. notably i definitely want to be able to insert images, which feedbooks does not yet support.
Hmm, yes. Actually, like I said elsewhere, if one were able to get Hadrien's codes, one could easily run an internal webserver with those on it and have an "offline" app (with a few addons for image manipulations, for example)
Valloric 01-23-2009, 04:01 PM Actually, I've been thinking about this problem on and off and to me it seems like the whole concept of WYSIWYG editors is flawed. Instead I've been thinking about a side-by-side editor.
The editor will allow you to edit txt files in a simple lightweight markup language like rest or markdown (it will have GUI controls to make it easy, rather like the editor we use to make posts to mobileread). As you make changes the result will be automatically updated and displayed in a pane to the side of the editor pane.
So editing books will be about as hard as making posts on mobileread.
That sounds like a WYSIWYG editor with a side pane code view, and you can only edit the code. :)
But I get your idea. It sounds nice. It possibly has a few problems... for instance, you're displaying the same text twice on the screen. Seems more than a bit redundant. And if you make the display or markup view collapsible, you get a WYSIWYG editor again.
mtravellerh 01-23-2009, 04:06 PM That sounds like a WYSIWYG editor with a side pane code view, and you can only edit the code. :)
But I get your idea. It sounds nice. It possibly has a few problems... for instance, you're displaying the same text twice on the screen. Seems more than a bit redundant. And if you make the display or markup view collapsible, you get a WYSIWYG editor again.
What I hate in NVU for example, is the way it just plays around with your code. For example the mobi pagebreak code will be changed without any notice to you. Therefore Kovid's idea with the permanent code view seems like a great idea to me. For advanced users, you could easily do a search and paste over the whole file or files and work a lot faster while still seeing the result of your work in real time.
Valloric 01-23-2009, 04:08 PM I would have different configurable parameters, like maximum file size, maximum nesting level, whether or not to support some optional features... and have the software issue a warning if these limits are exceeded. It shouldn't be so hard to have a "Warn if text files are larger than ____KB" setting.
This is the "presets over options" debate.
Naturally, one would want to provide advanced options for advanced users, but the more options you provide, the higher the chances of someone misconfiguring them and ending up making epubs that work for them and not for anyone else.
I'm thinking of the mobileread community here, and the upload forum.
But the advanced options should be there for those who want them... just tucked away a bit.
Valloric 01-23-2009, 04:13 PM What I hate in NVU for example, is the way it just plays around with your code. For example the mobi pagebreak code will be changed without any notice to you. Therefore Kovid's idea with the permanent code view seems like a great idea to me. For advanced users, you could easily do a search and paste over the whole file or files and work a lot faster while still seeing the result of your work in real time.
Hell, if you want to be able to see exactly what each and every keystroke or action you perform does, that's easy. WYSIWYG editor with a side pane code view that updates the code with each action in the main view pane. That's already in my design document, only the panes are horizontal.
But you should be able to turn off the code view and work in design view exclusively. I don't need to see all of the text duplicated on the screen all of the time.
zelda_pinwheel 01-23-2009, 04:25 PM it occurs to me that not everybody on this forum is a native english speaker and maybe some of them aren't familiar with some tech jargon, so for the sake of clarity (just in case anybody doesn't know) :
WYSIWYG = What You See Is What You Get
it means an interface which allows you to click on a button marked "B" (like in the MR reply box) and shows you the result this way :
this text is bold.
and not this way :
this text is <strong>bold</strong>
mtravellerh 01-23-2009, 04:29 PM Hell, if you want to be able to see exactly what each and every keystroke or action you perform does, that's easy. WYSIWYG editor with a side pane code view that updates the code with each action in the main view pane. That's already in my design document, only the panes are horizontal.
But you should be able to turn off the code view and work in design view exclusively. I don't need to see all of the text duplicated on the screen all of the time.
True. And the other way around (only source view) What about the RegEx (oh: taking Zelda as a model: RegEx=Regular Expressions) integration? That one is very important for me (to get rid of the pagenums in PG source, for example).
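As an example of the kind of RegEx cleanup meant here, a small sketch that strips page markers from a text. It assumes the markers look like "[Pg 12]"; that format, the pattern and the function name are assumptions, and a real source may well use something else (in which case only the pattern changes).

```python
import re

# Hypothetical marker format: "[Pg 12]", optionally preceded by whitespace.
PAGE_MARKER = re.compile(r"\s*\[Pg\s+\d+\]")


def strip_page_markers(text):
    """Remove every page marker together with the whitespace before it."""
    return PAGE_MARKER.sub("", text)
```

In an editor this would just be one saved search-and-replace preset among others, run over the whole file or over all files of the book.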
kovidgoyal 01-23-2009, 04:33 PM That sounds like a WYSIWYG editor with a side pane code view, and you can only edit the code. :)
But I get your idea. It sounds nice. It possibly has a few problems... for instance, you're displaying the same text twice on the screen. Seems more than a bit redundant. And if you make the display or markup view collapsible, you get a WYSIWYG editor again.
Yeah but coding a WYSIWYG editor is a lot more work, for relatively little benefit. And I would envisage the preview pane being collapsible, not the code pane.
And I would urge you to consider making the panes vertical since most ebook readers are vertically oriented and in any case most people prefer to read text in narrow columns.
I've yet to work with a tool that worked well when the user was allowed to modify the source as well as work in a WYSIWYG environment. Either the user is forced to work with both modes or has to choose one view. In other words most of these editors tend to favor one style of editing or implement both poorly.
When I created the BookCreator tool I wanted to be able to use a really good editor and not have to worry about HTML, format shift issues. I just wanted to edit the book how I wanted it to look. Kovid, calibre does a fantastic job format shifting HTML to LRF/ePUB/LIT/ and more to come.
Right now the direction of this thread is geared towards format shifters, not editors or writers. But the demand for a good editing and ePublishing tool is there: I've had several authors contact me thanking me for the BookCreator tool and how it facilitates their eBook creation process without having to get down and dirty in the code.
If the developers on this group would really like to build an end all ePublishing tool I'd like to see us create a Plugin around Open Office, an excellent Word Processing tool.
Going this route we would build a house on a solid Word Processing foundation and can easily extend the UI to include other ePublishing features, such as a GUI TOC builder, format (HTML/LIT/ePUB…) imports, etc.
Just my 2cents
=X=
mtravellerh 01-23-2009, 04:36 PM vertically oriented
:rofl:(sorry for my perverted thinking):o
When I do proofreading, I prefer to have the window panes side by side
Valloric 01-23-2009, 04:36 PM True. And the other way around (only source view) What about the RegEx (oh: taking Zelda as a model: RegEx=Regular Expressions) integration? That one is very important for me (to get rid of the pagenums in PG source, for example).
Only source view --> editing in Notepad :D
RegExes are important and should be included. I'm not sure if they'll make it to the first release (which is probably a few months off), but they're something that is definitely on the list.
mtravellerh 01-23-2009, 04:42 PM Only source view --> editing in Notepad :D
RegExes are important and should be included. I'm not sure if they'll make it to the first release (which is probably a few months off), but they're something that is definitely on the list.
You're so clever:rolleyes:! But hey, you ARE right on that one!
I guess I just was trying to get rid of application jumping. If you do 4 to 5 books a day in 4 different formats, you tend to get lazy.
Valloric 01-23-2009, 04:48 PM If the developers on this group would really like to build an end all ePublishing tool I'd like to see us create a Plugin around Open Office, an excellent Word Processing tool.
I hate depending on other people's code. Also, to me this is supposed to be fun and challenging. I also don't want to depend on something like Word (which people have to buy first) or OpenOffice (which is a resource hog). Both are made for entirely different things.
I've stated my goal: something like BD for epubs, without the suck and with better features. Further down the line, the goal is to extend it to export most of the other ebook formats.
And who said anything about making a be-all-end-all editor for the publishing industry? When did that come into play? The target audience should be MobileRead members and other ebook enthusiasts. Naturally, the more people who like it and use it, the better.
If anyone wants to create a plugin for some pre-existing editor, great. The more editors we have, the higher the chances one of them won't suck.:rolleyes:
Valloric 01-23-2009, 04:54 PM And I would urge you to consider making the panes vertical since most ebook readers are vertically oriented and in any case most people prefer to read text in narrow columns.
I prefer horizontal, but I'll make it switchable to vertical. I'm writing this down.
Zelda, this is a very useful thread.
zelda_pinwheel 01-23-2009, 04:55 PM Zelda, this is a very useful thread.
i am thrilled to see the response it's getting, and i really can't wait to see what tools will come out of it. :) :2thumbsup
mtravellerh 01-23-2009, 05:41 PM Right now the direction of this thread is geared towards format shifters, not editors or writers. But the demand for a good editing and ePublishing tool is there: I've had several authors contact me thanking me for the BookCreator tool and how it facilitates their eBook creation process without having to get down and dirty in the code.
=X=
Actually "format shifters" have to do a lot of editing if they want to create good books
I hate depending on other people's code. Also, to me this is supposed to be fun and challenging. I also don't want to depend on something like Word (which people have to buy first) or OpenOffice (which is a resource hog). Both are made for entirely different things.
something like BD for epubs,
Understand. A couple points.
I never mentioned using somebody else's code, so I don't understand why you are making such a statement.
If you are referring to using somebody else's product, there isn't a tool here that isn't using another person's product.
Writing everything from scratch might sound fun but it is a daunting and long endeavor; also remember your idea of fun might not necessarily mean the same for others.
Actually "format shifters" have to do a lot of editing if they want to create good books
So you do want a feature rich WYSIWYG editor? Or would you be happy editing with a tool like emacs/VI? (Note: nothing wrong with the latter; I use VI all the time, either for writing software or before importing html code generated from pdftohtml into Word.) I'm just not clear on the point you're making is all.
=X=
Valloric 01-23-2009, 08:31 PM Understand. A couple points.
I never mentioned using somebody else's code, so I don't understand why you are making such a statement.
If you are referring to using somebody else's product, there isn't a tool here that isn't using another person's product.
Um... if you're building a plugin for an existing application, your code directly depends, by definition, on code someone else wrote. The code of the application your plugin is plugging. So Book Creator directly depends on Microsoft code: if they make certain changes that break your application, you can try to work around that, but that's it.
And since Book Creator (I believe) uses Calibre to actually create LRF, EPUB and other formats, it also depends on code Kovid writes (even though it's not a Calibre plugin). So if Kovid makes an unfortunate mistake in his, say, LRF output code, your application's LRF output breaks.
Book Creator is an amazing application for people who want a Word plugin. I don't. I'm sure a lot of people do.
Writing everything from scratch might sound fun but it is a daunting and long endeavor; also remember your idea of fun might not necessarily mean the same for others.
I've never said I'd be writing everything from scratch. Do I look insane? I hope not. I plan on using as much GPL code as I can get. But stable GPL code, like wxWidgets, Webkit etc. Everything I don't really need I don't plan on using. I can't use Calibre, because it would defeat the whole purpose of the editor, which is eliminating the converter from the equation and having output on your display that is as near to final as you can get. Also, it's a dependency I don't need. Epub is an open standard, and I can write code that outputs it myself.
And although my primary objective is to create a useful epub editor, my second objective is to create a library of clean, fast, portable and very well documented C++ code for outputting ebook formats, starting with epub. Others can then use that code to create other editors, converters or whatever.
mtravellerh 01-23-2009, 08:37 PM So you do want a feature rich WYSIWYG editor? Or would you be happy editing with a tool like emacs/VI? (Note: nothing wrong with the latter; I use VI all the time, either for writing software or before importing html code generated from pdftohtml into Word.) I'm just not clear on the point you're making is all.
=X=
Well, I wasn't talking about me specifically at all. And my point was that sometimes converting a text to a readable ebook involves a lot of editing.
GeoffC 01-25-2009, 05:55 AM :rofl:(sorry for my perverted thinking):o
When I do proofreading, I prefer to have the window panes side by side
I prefer horizontal, but I'll make it switchable to vertical. I'm writing this down.
Zelda, this is a very useful thread.
I prefer window panes vertical for proofreading, especially when each pane is looking directly at the same line.
But there ought to be no problem in giving either option?
HarryT 01-25-2009, 06:23 AM I've yet to work with a tool that worked well when the user was allowed to modify the source as well as work in a WYSIWYG environment. Either the user is forced to work with both modes or has to choose one view. In other words most of these editors tend to favor one style of editing or implement both poorly.
I like BD's approach - a WYSIWYG editor, with a separate "tool" for viewing and editing the underlying HTML of a region of selected text for "problem" situations. That, for me, is an excellent way of working. I do not like working directly in HTML - not because I don't understand it (I write websites as a part of my job), but because the tags get in the way of seeing the layout of the text.
Xenophon 01-25-2009, 03:12 PM I like BD's approach - a WYSIWYG editor, with a separate "tool" for viewing and editing the underlying HTML of a region of selected text for "problem" situations. That, for me, is an excellent way of working. I do not like working directly in HTML - not because I don't understand it (I write websites as a part of my job), but because the tags get in the way of seeing the layout of the text.
Take a look at the LyX editor for TeX files. It provides exactly that sort of interface -- with the caveat that the on-screen view is representative of what you'll get rather than exactly what you'll get. You still need to look at the final output.
Anyway, LyX provides both direct insertion of raw TeX (it shows up on-screen as evil-red-boxed-text) and direct editing of the underlying TeX source when necessary.
Xenophon
I like BD's approach - a WYSIWYG editor, with a separate "tool" for viewing and editing the underlying HTML of a region of selected text for "problem" situations. That, for me, is an excellent way of working. I do not like working directly in HTML - not because I don't understand it (I write websites as a part of my job), but because the tags get in the way of seeing the layout of the text.
Yes, the HTML editor is something BD did well, and it did a great job displaying the HTML in the WYSIWYG editor. However my experience with the editor was terrible. I found doing anything but the most trivial was extremely difficult.
It was actually Patricia that gave me the idea to write BookCreator; she mentioned all her editing work was in Word, then imported to BookDesigner. I thought what a great idea, so when I created BookCreator all I wanted was a template with some macros to make the editing easy. Then import it to BD.
Eventually it grew to building its own formats.
=X=
Komenor 02-09-2009, 12:59 PM DESIRED FEATURES
1. epub files must be valid (html tidy, epub check...) and conform to best practices.
2. the editor should be able to accept multiple html / xhtml flows and create one document with a hierarchical TOC. It should also be able to accept one html flow and semantically parse the hierarchy of the document (part, chapter, section...) according to the tags used (h1, h2, h3, etc.), creating logical divisions for a properly structured epub document and a hierarchical TOC, the way that feedbooks does.
3. It should be able to handle images and relatively advanced css markup (dropcaps, for instance).
4. Ideally, it should be accessible even to users with no knowledge of html / css code with a full wysiwyg UI, although advanced options should be available if you do know the code (direct code editing should be possible), again similar to the feedbooks interface.
I'll probably have more to add later but i've got to go for the moment, so i'll stop here and open the discussion. What do you want from an editor ? i'll be looking forward to seeing what comes out of this !
It must be very interesting (and relatively hard ;)) to write a small XHTML editor with basic functions. I think about something like the interface used for writing the messages in this forum or like that : http://www.kevinroth.com/rte/demo.htm. To be sure to be compliant with XHTML format, the source code should not be shown/editable to/by the user. Zelda, do you think that a thing like that, integrated in an ePub generator tool, can be useful ? :chinscratch:
zelda_pinwheel 02-09-2009, 01:50 PM It must be very interesting (and relatively hard ;)) to write a small XHTML editor with basic functions. I think about something like the interface used for writing the messages in this forum or like that : http://www.kevinroth.com/rte/demo.htm. To be sure to be compliant with XHTML format, the source code should not be shown/editable to/by the user. Zelda, do you think that a thing like that, integrated in an ePub generator tool, can be useful ? :chinscratch:
yes, possibly, although i think the source code should be accessible to users who know what they are doing. i prefer to check my source and write most of my code myself ; that should be possible.
Komenor 02-09-2009, 03:17 PM yes, possibly, although i think the source code should be accessible to users who know what they are doing. i prefer to check my source and write most of my code myself ; that should be possible.
Ok, but if the user can edit the source code, it's harder to certify the quality of the ePub output. We can use Tidy, of course, but it is an additional pre-requirement to the application...:chinscratch:
Valloric 02-09-2009, 07:22 PM Ok, but if the user can edit the source code, it's harder to certify the quality of the ePub output. We can use Tidy, of course, but it is an additional pre-requirement to the application...:chinscratch:
You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.
Let me elaborate...
Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.
OK, so the application accepts an HTML file. Is the file valid HTML? You can't make that a precondition. I'm sorry, you just can't. Most HTML out there is nowhere near being valid, and the user could need to import HTML he didn't write himself.
So the app needs to accept invalid HTML, that is, HTML that displays OK on a modern browser but that does not follow the required standards. And with that, you just blew any possibility of having a guarantee that the epub you export will always be standards compliant.
Why?
Well, you can't design a useful algorithm that accepts invalid HTML and outputs valid HTML. A useful algorithm would have these requirements, for any input:
1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.
The first one is easy. If you remove the second one, for any input, just output whatever you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because it's incomputable.
Now, you could design an algorithm that fulfills both requirements for some input, but not for all. And no, not even Tidy can give you that, because it is theoretically impossible.
So you're stuck now. You can't guarantee your users that you will always output a valid epub file no matter what they import. You can do your best (and you should), but in the end... The second requirement is much more important than the first one. So you fix what you can and possibly tell the user about what you can't.
If they really care about producing a valid epub file, they will have to fix the errors your app can't fix themselves. And so you make it easy for them and give them access to the source. And if they introduce any errors whilst editing the source, it's their fault. They will probably have to fix it by editing the source, too.
Now if you wanted an editor that could only create epub files from scratch, then you could guarantee standard compliance if you disallow direct source code editing. But you don't want to make that kind of editor.
Your output can only be as good as the input (maybe slightly better, for trivial errors in the original file). The editor can't turn shit into gold, and can not give guarantees about compliance. Any that does is flat-out lying.
llasram 02-10-2009, 12:01 AM 1. Always output valid HTML.
2. The resultant HTML would always correctly represent the content of the original HTML and the intent of its author.
The first one is easy. If you remove the second one, for any input, just output whatever you like. But with the second requirement, you get a specification that cannot be fulfilled by any implementation, because it's incomputable.
Well, it would depend on what you meant by "represent the content of the original HTML." It would be fairly easy to strip all semantic tag information from source HTML and translate it into nothing but <div/>, <span/>, <a/>, and <img/> tags with appropriate CSS. That would make it trivial to output valid XHTML which retained exactly the same formatting characteristics as specified by the author.
Jellby 02-10-2009, 05:25 AM The program could accept invalid (X)HTML, and issue a warning if the final (X)HTML does not validate.
Valloric 02-10-2009, 09:42 AM Well, it would depend on what you meant by "represent the content of the original HTML." It would be fairly easy to strip all semantic tag information from source HTML and translate it into nothing but <div/>, <span/>, <a/>, and <img/> tags with appropriate CSS. That would make it trivial to output valid XHTML which retained exactly the same formatting characteristics as specified by the author.
Again, this would work for some input, but not for all. I also put "the intent of the author" in that prerequisite too. The author of the original file could write relatively complex HTML that does not validate and that you could not convert into standards compliant XHTML which faithfully represents the input file.
There's really no point discussing it, this is computer science 101: conversion of input from one language with non-deterministic rules (that is, non-validating HTML) to another with deterministic rules (standards compliant XHTML) whilst keeping all of the source information. An algorithm to perform this conversion for all input cannot be designed. It is theoretically impossible.
But that doesn't mean the application can't fix some errors and output valid XHTML. I'm just saying you can't guarantee compliance and not have to mangle the input in some situations. And even then it wouldn't work for some cases.
The program could accept invalid (X)HTML, and issue a warning if the final (X)HTML does not validate.
My working idea too. Fix what you can, inform about what you can't, but don't mangle the input in any way or form. It is more important to guarantee to the user that you won't make some tiny change half-way through the novel he's importing than it is to guarantee standards compliance.
You can't piss off your users by trying to twist and turn their HTML into something it can't automatically become.
llasram 02-10-2009, 11:15 AM Again, this would work for some input, but not for all. I also put "the intent of the author" in that prerequisite too. The author of the original file could write relatively complex HTML that does not validate and that you could not convert into standards compliant XHTML which faithfully represents the input file.
There's really no point discussing it, this is computer science 101: conversion of input from one language with non-deterministic rules (that is, non-validating HTML) to another with deterministic rules (standards compliant XHTML) whilst keeping all of the source information. An algorithm to perform this conversion for all input cannot be designed. It is theoretically impossible.
I really don't understand what you're getting at, I'm afraid. I could write "fubby ducky loopy sunbird" and mean "Good morning, how are you?" and there would be no chance of conversion because the intent is all in my mind. With arbitrarily bad HTML the only possible interpretation of the author's intent is how some renderer renders that content. All contemporary HTML renderers use the same CSS box model for all rendering. Converting arbitrarily bad HTML into XHTML which displays the same is simply a matter of applying the same rules the browser does in order to produce the box model instance it renders.
XHTML validity is a property of two components: XML validity and adherence to the XHTML schema, yah? Conversion of HTML w/o closing tags to valid XML with complete elements can be tricky, but the browser necessarily does essentially the same thing in deciding what content ends up within what boxes. The Python lxml.html library calibre uses does an excellent job, matching for all practical purposes what most Web browsers produce. Producing schema-validating XHTML is where my proposal to strip all semantic tags comes in. CSS-based rendering doesn't care if you have a <div/> within a <p/> or a <sup/> within an <a/>. One just needs to extract the CSS applied to each element, then convert the element tags into ones which validate against the schema.
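A rough sketch of the tag-stripping half of that proposal, in Python. It assumes the input has already been repaired into well-formed XML, and it only records the old tag name as a class; generating the matching CSS for each class is left out. The `BLOCK_TAGS` set, the `was-` class prefix and the `neutralize` name are made up for illustration, not taken from calibre or lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of which source tags count as block-level.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "blockquote", "center"}
# Tags the proposal keeps as they are.
KEEP_TAGS = {"div", "span", "a", "img"}

def neutralize(root):
    """Rename every other element to div or span, keeping the old tag as a class."""
    for el in root.iter():
        if el.tag in KEEP_TAGS:
            continue
        old = el.tag
        el.tag = "div" if old in BLOCK_TAGS else "span"
        classes = el.get("class", "").split()
        classes.append("was-" + old)
        el.set("class", " ".join(classes))
    return root

source = "<div><center><b>Title</b></center><p>Some <font>odd</font> text</p></div>"
result = ET.tostring(neutralize(ET.fromstring(source)), encoding="unicode")
print(result)
```

A stylesheet would then need one rule per generated class (for example bold for `was-b`) to reproduce the original look; that part is where the real work is.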
Valloric 02-10-2009, 12:11 PM With arbitrarily bad HTML the only possible interpretation of the author's intent is how some renderer renders that content. All contemporary HTML renderers use the same CSS box model for all rendering.
The Python lxml.html library calibre uses does an excellent job, matching for all practical purposes what most Web browsers produce.
There is no argument here.
I agree that you could very well design an algorithm that converts non-valid HTML into valid XHTML for most HTML people will write. It's what your "lxml.html" library does (although I've never used it) and it's what Tidy does as well.
But you can't do it for all possible arbitrarily bad HTML. You're assuming the user checked how his source displayed in a browser. If he did, then it's not a matter of parsing arbitrarily bad HTML. It's not a non-deterministic rule system anymore: the source follows the deterministic rendering rules of the browser he used to check his work. Converting from a deterministic language to another deterministic language is certainly possible. And while you could say that the vast majority of HTML authors would do just that (check the display in a browser) before importing, you can't categorically state it.
So let's sum this up... you can create an algorithm that can convert most practical non-conforming HTML into valid XHTML, but not all HTML one could write. If one were to say he could, one would be showing a grave ignorance of computer science theory.
Komenor 02-11-2009, 10:25 AM You cannot guarantee that your application's output will be valid epub. Not in any realistic (and useful) editor.
Let me elaborate...
Any ebook editor needs to be able to import (X)HTML. That's a given. If it's a good editor, then it will handle a lot more than just HTML, but let's stick to just that for now.
I never said that my hypothetical editor will be able to import (X)HTML !
If it is only for modifying the fonts, the justification and other text formatting, the editor should accept only plain text files for import and nothing else.
Then give tools for text formatting (plus eventually "tables" and "pictures" support).
It is a choice : a "poor" editor with certified XHTML/ePub output or a good editor with no certification (or warnings on bad inputs).
Valloric 02-11-2009, 01:46 PM It is a choice : a "poor" editor with certified XHTML/ePub output or a good editor with no certification (or warnings on bad inputs).
A "good" editor would embed some sort of validation of the final epub file. So if you don't get a warning when exporting, you're in the clear. And most of the time, the editor will be able to convert the user's non-conforming HTML into conforming XHTML.
Here are several use cases:
1. The user imports valid HTML. It is easily converted into XHTML. He then makes certain edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.
2. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and succeeds. The user then makes certain edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.
3. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and does not succeed: errors are thrown, the user is informed. The user opens the source view and tries to fix the problems. The user then makes certain other edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds no errors. All is well in ebook land.
4. The user imports invalid HTML. An algorithm tries to correct the input and create valid XHTML, and does not succeed: errors are thrown, the user is informed. The user opens the source view and tries to fix the problems. The user then makes certain other edits, and tries to export the book as an epub file. The epub file is created, the validator runs through it and finds errors. The user is informed, but the file remains--maybe the user doesn't care (if it's a file for personal use... who knows). If he does care, he makes more changes, and tries to export the file. The change/export process repeats until no errors are thrown.
So you see, the user can get an epub file that is certifiably valid.
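The export step in those use cases could look roughly like this in Python. This is only a sketch: a plain XML well-formedness check from the standard library stands in for a real epub validator, and the `find_errors` and `export` names are invented for the example. The point it shows is that the file is always produced and the user is told what the check found.

```python
import xml.etree.ElementTree as ET

def find_errors(xhtml):
    """Return a list of problems; an empty list means the stand-in check passed."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as exc:
        return [str(exc)]
    return []

def export(xhtml):
    """Always keep the user's content untouched, and report what the check found."""
    errors = find_errors(xhtml)
    return {"content": xhtml, "valid": not errors, "errors": errors}

# Use cases 1 and 2: the check finds nothing.
good = export("<html><body><p>Chapter one</p></body></html>")
# Use case 4: the check finds an error, but the output is kept anyway.
bad = export("<html><body><p>Chapter one</body></html>")

for report in (good, bad):
    print("valid" if report["valid"] else "errors: %s" % report["errors"])
```

The user would repeat edit and export until `errors` comes back empty, which is the loop described in use case 4.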
mtravellerh 02-12-2009, 05:59 PM Now THAT makes sense. Can't wait for that piece of software, honestly!
Timoleon 02-13-2009, 01:25 PM Valloric's comments #53 should be used as a touchstone for any decent ePub editor. Great analysis and synopsis! :2thumbsup
GeoffC 02-13-2009, 01:39 PM Seemingly complex, though?
Valloric 02-13-2009, 05:36 PM Seemingly complex, though?
It's as complex as it needs to be. If you remove something, you negatively impact the quality and usefulness of the editor.
From the programmer's perspective though, it is fairly complex. But the user doesn't care about that, does he? Of course he doesn't, nor should he.
GeoffC 02-14-2009, 05:48 AM It was, of course, the complexity of the programmer's task that I was referring to.
richardigp 02-23-2009, 07:01 AM We just threw an Open Office to ePub Convertor into the fray. It goes by the name of eScape. It does most of the advanced styling and formatting that is on the wish list above. Auto generation of OPF, NCX, etc. and free form modification of Stylesheets to create a book the way you want it to look. You can read about it and try it here (http://www.infogridpacific.com/igp/AZARDI/eScape%20-ODT2ePub/). It's completely free for non-commercial use, but not Open Source.
It's a different approach. Rather than try and interpret endless inline and para styles, we define custom Structure-Styles and you have to put those on. There is a growing online tutorial here (http://www.publisherdams.com/reader/content/c-0002184/?a=lc), so you can see if you can live with this different approach.
There are about 30 styles including drop & raised caps, small-caps, and lots of other blocks like epigraph, extract, notebox, code, boxed text, poem, notes, references, etc. All major book sections are predefined. If you want to comment, suggest please do so at our Publishing With XML (http://infogridpacific.typepad.com/publishing_with_xml/2009/02/escape-open-office-to-epub-convertor.html) blog.
Valloric 02-23-2009, 03:31 PM It's a different approach. Rather than try and interpret endless inline and para styles, we define custom Structure-Styles and you have to put those on.
Questions:
1. How do you convert existing epub books to your format? Is it even possible to load existing epub books and edit them?
2. How do you guarantee display fidelity? Last time I checked, OpenOffice.org did not have an advanced XHTML renderer.
3. SVG? OO.org doesn't support it. Do you?
4. How do you handle the "longdesc" attribute? Do you support it?
5. Object tags?
6. DTBook?
7. XML islands?
8. Font embedding?
These are just from the top of my head. Haven't yet had the time to try out eScape, but I'm going to.
richardigp 02-24-2009, 01:36 AM Questions:
1. How do you convert existing epub books to your format? Is it even possible to load existing epub books and edit them?
Over on the XML blog (http://infogridpacific.typepad.com/publishing_with_xml/2009/02/escape-open-office-to-epub-convertor.html) we explain that we created eScape as an easy way to make more than respectable ePubs from Open Office, more or less because we could do it. This was in response to a lament by Dave on the Teleread blog about there being no way to work with ODT and that there should be a plug-in. So it's ODT2EPUB only at present. eScape makes the packaging so easy: just focus on the editing (following a few basic rules), export XHTML and click.
I have also been on a lot of the forums here and seen how hard people find it to even make a simple drop cap. It doesn't have to be so hard! The problem is the approach is wrong meshing content & styles, instead of having clarity with structure. The whole idea of eScape is to have an existing ODT and make editorial corrections there in a user friendly, powerful content editing environment. You can then generate an ePub in just seconds, corrections and all. But the output is pure, consistent XHTML, with class statements as structural identifiers. You then select the style sheet of your choice, and eScape processes it together into an ePub package.
We don't have an ePub to ODT importer, but probably could. The approach would be to strip all styles inside if they aren't the eScape Structure-Styles because that would be the only way to maintain the structural integrity. That I think would get howls of anguish. We are looking at putting a file edit/repackage mode into AZARDI, but that is only on the drawing board and we have a lot more to do there with package checking/reporting first.
2. How do you guarantee display fidelity? Last time I checked, OpenOffice.org did not have an advanced XHTML renderer.
Not quite sure what you mean by display fidelity - making it look like the OOo file? That is exactly what we are not doing with eScape. We are using OOo to apply consistent Structural styles so the XHTML that comes out is absolutely the same for all books, and addresses the core text book structures, block styles, paragraph styles and inline presentation issues that seem to plague all e-book production environments.
eScape is a different approach. It lets anyone achieve the XHTML nirvana of separating content, structure and style. The XHTML is so consistent you can have any number of optional stylesheets and apply them to any ePub package and they will present accordingly. In effect eScape is saying to the reader, I will handle the XHTML (trust me!?), just use these Structure styles to tell us what you want various content blocks to be.
This is the core of good e-book packaging - especially for reflow. So while we exist in a Babel of ePub readers you can have a stylesheet optimized for Stanza, one optimized for ADE/Sony, another for whatever, and we can all stop crying about how everything doesn't do everything.
3. SVG? OO.org doesn't support it. Do you?
We don't support it in eScape; in fact we don't support any images in eScape, because OOo makes it hard to get the output size from XHTML. We would have to put into place more "rules" which would make the tool more difficult to use at the OO level. We left it out of this version to see if there is significant interest. 99% of the books on MobileRead, and trade & retail books in general, don't have images (other than the cover). For those that do, there are other options.
The biggest problem most people seem to have is getting a drop cap, or other text presentation styles, or simple formatting for text-only books without a lot of anguish unless they are talented HTML/CSS experts. We tried to bring in clarification of structure vs. styling with advanced content block structures, lines and character styles applied directly to the content. We are waiting to see if it is of interest. eScape addresses the presentation issues of standard books by separating the XHTML structure and CSS at the point of origin. The CSS files can then be manipulated to your heart's content.
4. How do you handle the "longdesc" attribute? Do you support it?
No images - No longdesc necessary.
5. Object tags?
Not with eScape.
6. DTBook?
Not with eScape.
7. XML islands?
Not with eScape. Not easy in OOo, or any other visual editor probably. This would require some style based "include/insert" statement to process the remote XML in at an insertion point - Eg: to put native MML into the file as an Inline or Out of Line Island. This implies a level of expertise that I think goes beyond the target user of eScape - and just about any other system. It also goes beyond the ability of most reader apps to handle it. As with the MML example, the Reader has to be able to render it, or send it to a processor such as MML2SVG and then display it. Inserting some DocBook into XHTML using islands is probably trivial, but why bother?
8. Font embedding?
Not with eScape. We see font embedding as really important for e-books with class that compete with print for presentation quality. Interestingly (from what we have observed) ADE doesn't handle these correctly in that they display fonts that are not in the manifest but are declared in the CSS - I have a feeling (but can't say for sure) InDesign ePub packaging works like this. Our full commercial packager handles font embedding, and I suppose it wouldn't be hard to have an extra input directory - fonts; but the user would be responsible for the application of the fonts in the CSS to specific styles. We wouldn't check whether font embedding is permitted, however: that would be up to the integrity of the user.
eScape is a pure production environment for ODT to ePub (via an XHTML intermediary). It uses powerful separation of content, structure and styles to give new production options if someone wants to maintain their source content in an ODT.
All of the advanced issues you questioned (islands, fonts, objects, etc.) are non-trivial, and it is interesting to see these brought up. These issues are addressed in IGP:FLIP (http://www.infogridpacific.com/igp/Products/Publishing%20Products/IGP:FLIP/), but unfortunately we are not giving that away today, although there is a sandbox site (http://www.publisherdams.com/sandbox/) where anyone can play.
Valloric 02-24-2009, 10:39 AM Thank you for the very thorough response. While the lack of image, SVG and font support is unfortunate, you do cover a large volume of books one would want to create.
By "display fidelity" I meant "It looks on my screen the same way it will look on conformant Reading Systems". You seem to be going a different route though (not necessarily a bad thing).
The other thing that bothers me a bit is that a power user does not have direct access to the source code that will end up in the epub file. While most people don't need this level of control, some do.
Lastly, it's disappointing that you use a nice cross-platform editor like Writer from OpenOffice.org and then make your final converter Windows only. :(
richardigp 02-24-2009, 11:23 AM On the fidelity issue, we have created the OO stylesheet to look like a classic book where possible (except for the coloured lines), but in this version we were unsure about how to put leading lines above and below a block extract for example, and then adding a first para style for the non-indented paras so they look good in the OO file. That will be more XSL work, but make the using/learning curve steeper. We will look at that in the future.
Interesting point about the XHTML source code, and I may blog about that a bit further. The difference between the Structure-Styling and other HTML environments is that it always looks the same for the same structure, so in some respect you don't really need to see it (if you believe this!). I am not sure that this is the place to get too technical on this matter, but I might put an extra "chapter" into the online tutorial. So assuming the XSLs and slicers and dicers are working nicely, the XHTML elements and class statements are totally predictable and I can just crack open the style sheet, or make a whole series of custom style sheets for a range of looks and feels.
In one development version we did have an XHTML exporter, but thought that got too complicated. It output the book as a single, fully processed XHTML file - the one that is used internally before being split apart to create the final package.
Having said that, I think we were trying to address the "make a good looking eBook fast, with a bit more styling" case for someone who would prefer never to see the source. (are there any of those on this forum?)
From the believe it or not department: the last point is our shame! :o We are a Linux development house working primarily in Python and mainly do Web services applications. Some silly little interface issue stopped the deb, but it's coming.
Mcnaz 04-22-2009, 08:22 AM Hi all.
Sorry to jump into this thread at this late stage but I stumbled in from Google whilst doing research on an EPUB Maker type application I am working on.
Brief Background.
I am a Book Designer (BD)/PRS-505 fan but got very frustrated with the inability to customise my own searches/replaces through advanced RegEx.
I've started coding an APP in Delphi that presents the user with a Rich Edit (based in RichEd20.dll V3) control which gives some semblance of WYSIWYG (i.e. bolding and simple formatting).
The goal is to have a BD type tool that is able to import from various formats (lit, pdf and so on) convert to html (using clit, pdf2html and so on),
clean the HTML and present to the user.
ATM I am at the import and clean HTML phase (using a few LIT -> clit.exe examples) and the produced HTML is dreadful! I am resorting to stripping all the HTML and rebuilding paragraphs. HTML stripping is done via loading the html into an IE COM object and saving as text... not ideal, so I am looking at a LIBXML2 implementation instead.
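For what it's worth, the "strip everything and rebuild paragraphs" step can be sketched without a browser object. The Python below uses only the standard library's `html.parser`; it is an illustration of the idea, not the Delphi/LIBXML2 code being described, and the `BLOCK_TAGS` set and the `TextExtractor` and `rebuild_paragraphs` names are assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical choice of tags that start or end a paragraph.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "tr"}

class TextExtractor(HTMLParser):
    """Collect text only, cutting a new paragraph at every block-level tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = []

    def flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._buffer.append(" ")  # a line break becomes a space
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)

def rebuild_paragraphs(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.paragraphs

messy = "<P><FONT size=2>It was a dark<BR>and stormy night.</font><p>The rain  fell &amp; fell."
paragraphs = rebuild_paragraphs(messy)
clean = "\n".join("<p>%s</p>" % escape(p) for p in paragraphs)
print(clean)
```

All inline formatting is thrown away here, which matches the "strip it all" approach; keeping italics or headings would need extra handling in the tag callbacks.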
I can see one obstacle ahead and that is that internally the document is stored as RTF (for the WYSIWYG) and this will need to be converted into HTML/XML when exporting to EPUB. I am currently researching the ability to embed control characters into RTF (i.e. chapter 1, heading, subheading) for use during the export.
I've read the first few pages of this thread and although a few interesting features are mentioned my goal is primarily to develop a tool that is aimed more towards importing/converting/tarting up then EPUB output as opposed to a full blown publishing platform (maybe in version 4!).
Again, suggestions/help will be welcome (I will lurk here for a while)
I will keep the thread updated on my progress
Cheers.
zelda_pinwheel 04-23-2009, 08:37 AM hi Mcnaz and welcome to the forum ! we are always really happy to hear about new epub creation tools being developed since at the moment we are still waiting for the "perfect" tool, and since different users have different needs. :) please do keep us informed of your progress ; you'll find plenty of people around here interested in trying it out and giving suggestions, if you are looking for that.
and don't hesitate to take a look around the epub forum to see what else is available, and get some useful information.