Old 01-05-2014, 02:35 PM   #1
skreutzer
PublishingToolsDeveloper
Posts: 87
Karma: 38234
Join Date: Jan 2014
Location: Germany
Device: Only devices that support open formats!
Sigil as front end for automated XML based processing workflows?

Hello,

I'm developing automated XML processing workflows, both for my own projects and for general use, as free software and with free software. I'm interested in improving Sigil to make it a valuable front end for such processing workflows.

For obvious reasons, direct formatting (for instance with inline CSS declarations) is a pretty bad idea, since it will lead to the loss of logical information and also will make the output files specific to a target format. With semantic markup, the information is of universal use and can be processed to various output formats.

I have heard that Sigil already relies on formatting by style templates (no direct formatting), so I assume Sigil's output is quite usable for automated processing workflows. However, I would also like a feature for exporting and importing style templates, so that a definition of CSS classes could be loaded and used by the Sigil user to prepare a text for an automated processing system. The definitions should be editable, both to give the user a comfortable WYSIWYG rendering and to prepare styles that are going to be exported. Selecting a style template should be a matter of one or two clicks (most-used styles at the top, or applying the currently selected style multiple times).
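
To make the idea of an imported style template a bit more concrete, here is a minimal sketch in Python (standard library only). It is an illustration under assumptions, not anything Sigil does today: the class names, the function names and the naive regex scan of the stylesheet are all hypothetical. It reads the class names a template defines, then reports elements that still carry inline `style` attributes and classes that the template does not know.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def template_classes(css_text):
    """Collect class names from the selectors of a stylesheet (naive scan)."""
    # Drop the declaration blocks so only the selectors are searched.
    selectors = re.sub(r"\{[^}]*\}", " ", css_text)
    return set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors))


def check_document(xhtml_text, allowed):
    """Return (names of elements with inline style, classes not in the template)."""
    root = ET.fromstring(xhtml_text)
    inline, unknown = [], set()
    for el in root.iter():
        if "style" in el.attrib:
            inline.append(el.tag.replace(XHTML_NS, ""))
        for name in el.get("class", "").split():
            if name not in allowed:
                unknown.add(name)
    return inline, unknown


# Hypothetical template and document, for illustration only.
css = "p.verse { margin-left: 2em; } .chapter-title { font-size: 1.5em; }"
doc = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1 class="chapter-title">One</h1>
<p class="verse">A line</p>
<p class="motto" style="color: red">Direct formatting</p>
</body></html>"""

allowed = template_classes(css)
inline, unknown = check_document(doc, allowed)
print("inline styles on:", inline)
print("unknown classes:", sorted(unknown))
```

A real importer would need a proper CSS parser; the point is only that a known list of classes makes the document checkable by a machine.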

I've already implemented one automated processing workflow for one of my projects (the project description is in German, but you may just look at the images; an English translation will follow, as will videos in English showing the workflow and the tools), and I'm now going to build such workflows for more common input, that is XHTML or EPUB to process any kind of text automatically into various output formats. As a first step, I started to develop a small tool html2epub (which is in a very early stage, CSS in the header gets lost etc.). I plan to work on a full processing backend to support output in XHTML, EPUB2, EPUB3, PDF via FO, PDF via LaTeX, plain text. Supported input formats could be custom XML, XHTML, EPUB. Each step of the processing workflow could be adjusted by XML manipulation, for instance to automatically add linked footnotes (back and forth references) for EPUB2 output. It also remains possible to adjust the files involved in this process manually to get a perfect result, if the automated result isn't good enough or some parts of the processing can't be automated yet. Little helper tools could make configuration easier, or provide a GUI for the command line tools.
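
As an illustration of the footnote step, here is a rough sketch in Python (standard library only). It assumes, purely for the example, that footnotes are marked up inline as `<span class="footnote">`; that convention, the `fn`/`fnref` id scheme and the function name are hypothetical and not taken from any existing tool. Each span is replaced by a numbered link, and the note text is appended at the end of the body with a link back.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)  # serialize without an ns0: prefix


def q(tag):
    """Qualified XHTML tag name as ElementTree stores it."""
    return "{%s}%s" % (XHTML, tag)


def link_footnotes(xhtml_text):
    root = ET.fromstring(xhtml_text)
    body = root.find(q("body"))
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in body.iter() for child in parent}
    spans = [el for el in body.iter(q("span")) if el.get("class") == "footnote"]
    notes = []
    for number, span in enumerate(spans, start=1):
        parent = parents[span]
        ref = ET.Element(q("a"), {"id": "fnref%d" % number, "href": "#fn%d" % number})
        ref.text = "[%d]" % number
        ref.tail = span.tail  # keep the text that followed the footnote
        parent[list(parent).index(span)] = ref
        notes.append((number, "".join(span.itertext())))
    for number, text in notes:
        note = ET.SubElement(body, q("p"), {"id": "fn%d" % number, "class": "footnote"})
        back = ET.SubElement(note, q("a"), {"href": "#fnref%d" % number})
        back.text = "[%d]" % number
        back.tail = " " + text
    return ET.tostring(root, encoding="unicode")


source = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<p>Text<span class="footnote">A note.</span> goes on.</p>'
          '</body></html>')
result = link_footnotes(source)
print(result)
```

Footnotes nested inside other footnotes are not handled; this is only meant to show that the back-and-forth linking is a mechanical transformation once the markup is semantic.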

I assume that it would be beneficial for a lot of people to have software that encourages semantic markup, so authors could be required to use this software if they want to feed into the automated processing system and get their files out in various formats with good quality. Even if authors refuse to use such software, they could hand their plain text over to some other person who would do the semantic markup for them, as part of a service. The question is whether Sigil could become this software, since it is already specialized for this kind of task. Unfortunately, Calibre is going in the opposite direction, making Calibre output less usable.

For an author, importing the style definitions of, let's say, a publishing house, an online self-publishing platform or the typesetter would ensure that the resulting files fit the automated processing system, or can easily be converted to fit it, since the definition of the style templates used is known. In any case, an author could also just specify his own style templates or use a default, so that other software will be able to interpret them after configuration (just match the style templates to the formatting options a processing system supports).

Since I'm a C++ developer (no Qt or Boost yet though) and Sigil is licensed under GNU GPL3 (which I really appreciate), I may start to play around with it a little and write code for it. I tried to build it on the entirely free operating system gNewSense 3.0, but unfortunately it refuses to link due to
Code:
Qt5.2.0/5.2.0/gcc/lib/libQt5WebKit.so.5.2.0: undefined reference to `gst_x_overlay_set_window_handle'
I don't know how to fix this. I have already installed the gstreamer dev packages, but I don't know how to make cmake link against a more recent version from the gstreamer website. The problem might also be that the needed package is not entirely free, in which case Qt5.2.0 and Sigil wouldn't be usable for the free software world.

Please note that I'm not interested in working on the support of non-free, proprietary operating systems or software, nor on secret formats or anything like it. I'm trying to free things up, not the opposite.

So I would like to get ideas from people who run and/or implement such processing workflows, or who would like to have one available; there might be an opportunity to collaborate on a solution as and with free software. You could also use the results I've produced so far, but as always, time is limited, so progress is made in small steps ;-)


Sincerely,
Stephan Kreutzer



Please note, the original post was much longer, talking about the advantages of semantic markup and the disadvantages of direct formatting (inline CSS), but as I've learned that Sigil is already using a style template approach, I cut all that out.

Last edited by skreutzer; 02-20-2014 at 04:28 PM.
Old 01-06-2014, 11:30 AM   #2
st_albert
Fanatic
Posts: 544
Karma: 64420
Join Date: Feb 2010
Device: none
Do you envision Sigil being used as the primary "word processor" (i.e. creating the document in Sigil, rather than, say, LibreOffice or such)?

And FWIW, I heartily agree with the exclusive use of styles. I often have to convert Word docs (written by others) to ebooks, and I spend a lot of time cleaning up the messes of direct formatting, and converting it to styles, before exporting to epub via "writer2epub" and doing some final tweaking in Sigil.

But if I were writing a book from scratch, I don't think I'd choose Sigil as the tool for that.

Albert
Old 01-06-2014, 11:34 AM   #3
Hitch
Bookmaker & Cat Slave
Posts: 2,414
Karma: 13022651
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Hi, Stephan:

As I skimmed the post (sorry), can you just clarify: are you talking about a variant on the XML-->XSLT-->desired output approach, or are you discussing something new? When I read this:

Quote:
I'm now going to build such workflows for more common input, that is XHTML or EPUB to process any kind of text automatically into various output formats. As a first step, I started to develop a small tool html2epub (which is in a very early stage, CSS in the header gets lost etc.). I plan to work on a full processing backend to support output in XHTML, EPUB2, EPUB3, PDF via FO, PDF via LaTeX, plain text.
Am I to understand from this that your intent is to take existing ePUBs and XHTML, and...what? Work it back to XML, so as to then support the output to these other various formats?

FWIW, a company in Chennai has already developed, and will be showing demos on, a process that takes existing XML input, parses it, and then puts the existing "book" into a browser-like interface that "looks" like Word for authors and editors, so that they can edit on the fly, collaboratively, save the output, and then it's parsed back to XML, and provided (this is the part that cracks me up) "to the bookmakers" so that the bookmakers can make the book all over again from XML.

So...in short, do you mind restating what it is you're doing? I admit, from my point, XHTML/ePUB is the output, not the input, so I'm pondering, other than converting ePUB to Markup, what the goal is here?

Thanks!
Hitch
Old 01-06-2014, 03:42 PM   #4
skreutzer
PublishingToolsDeveloper
@st_albert: No, not as primary "word processor". The primary software for authors should be a writing program, with lots of features to help the writing process itself (note taking, timeline and stuff like this). Some initial (semantic!) markup could also be involved. Such writing software shouldn't deal with typesetting issues and e-book formatting at all. Its output would be the input for Sigil, and with Sigil, the author (or the formatter guy of choice) would add markup wherever it was missing. The Sigil output would then be passed to an automated XML processing system. All three steps could be done by the author himself using the tools, but also by another person or a service (online or offline). Cleaning up the messes of direct formatting would no longer be necessary, and the file would only get better, even if the writer didn't do the markup himself while writing (at some point somebody has to add such structural information anyway, right?).

@Hitch: Yes, XML via XSLT and/or specialized tools to various output formats. Since XHTML is XML and EPUB is a collection of XHTML files, XHTML and EPUB could be considered as input formats for automated processing workflows alongside plain XML. Those formats can also easily be converted into each other. XHTML can be used in a very generic way, so that it won't be much different from a non-document-specific XML, but a web representation could be "suggested" for the XHTML file by defining the CSS classes.

As long as you have the data in XML or any XML based format (with proper semantic markup of course), it can be used to generate nearly all output formats, be it directly or after intermediate conversion. There's no technical reason why it should be impossible to convert EPUBs "back" for online use as a website, to create PDFs from EPUB or to convert them into custom XML formats. Same in the other direction: from a custom XML format, other custom XML formats, XHTML, EPUB or the input XML format of a processing workflow can be created with relative ease (well, implement it once, use it any time).
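
A tiny sketch of what "generate another output format from the same XML" can look like, here XHTML to plain text in Python with the standard library only. The set of block elements and the choice to render headings in upper case are arbitrary assumptions made for the example, not rules from any existing tool.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
BLOCKS = {"h1", "h2", "h3", "p", "li"}  # assumed not to be nested in each other


def to_plain_text(xhtml_text):
    """Render block elements as paragraphs separated by blank lines."""
    root = ET.fromstring(xhtml_text)
    lines = []
    for el in root.iter():
        name = el.tag.replace(XHTML_NS, "")
        if name in BLOCKS:
            # Join the text of inline children and normalize whitespace.
            text = " ".join("".join(el.itertext()).split())
            if name.startswith("h"):
                text = text.upper()  # headings stand out without markup
            lines.append(text)
    return "\n\n".join(lines)


source = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<h1>Chapter One</h1>'
          '<p>Some <em>emphasised</em>\n text.</p>'
          '</body></html>')
plain = to_plain_text(source)
print(plain)
```

The same traversal, with different rules per element, could just as well emit LaTeX or another XML format; that only works because the input says what each piece of text is rather than how it should look.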

For my own use, I've started with a very primitive XHTML to EPUB converter, since I want to reuse texts from my website for e-book distribution platforms. Another project of mine is digitizing and proofreading an old book (printed in blackletter) into XHTML and generating EPUB and PDF from it. Once that works, my website XHTML texts could also be converted to PDF, even if I don't plan to do so now. With any of those projects, I want to automatically replicate any changes of the text into all output formats (which is possible with high quality if the semantic markup is done right, unless it is a very special work which is not worth automating). In general, one could come up with a lot of ideas for how to interconvert XML based content into the various formats; some of them are "end formats", and others are XML based and can be reused in the processing workflow.

To make full use of such workflows, it is important to get valid, semantic XHTML (or any XML) in the first place. Sigil could be a tool to create it from plain text or horrible pseudo-HTML, so a lot of self-publishers could benefit from the automated processing of their texts. If, as with the company in Chennai, the author is enabled to provide semantic XML/XHTML himself, then e-book publication as well as the printed book via book-on-demand technology could be fully automated, without any human intervention needed (but possible, for instance to change the book layout the author has specified for his books for a single print run or so).
Old 01-06-2014, 04:15 PM   #5
Hitch
Bookmaker & Cat Slave
Stephan:

Well, if you can create what is always the missing piece--getting the source material INTO XML, that would be nice. Everyone here, I'm sure, is well aware of the XML-->XSLT discussions that go back quite a way; I remember when we all thought it was going to be the next big thing, but of course, that never really happened, except for medical records and the like.

The problem, as I see it, is that most content creators can't be weaned from their existing tools, and have zero interest in writing in markup. Markup is reasonably fashionable and popular amongst geeks, but not authors. And with regard to inline styling, everyone in the business has to clear that crap out every day, ranging from any type of Adobe output (whether INDD or simply Acrobat Pro) to Word to WordPerfect, and the like.

It's one of the reasons that I usually recommend XYWriter to people who want a "writing" program, because it simply outputs RTF, which at least doesn't bollix up the content with ad hoc styling other than bold, italic, underline. But even that isn't a fix.

My {sigh} moment about all of this is just that training writers to read the instructions and use the software correctly may be your hurdle. I know that I have maybe 1 out of several hundred authors who actually understands styles, or even headers versus paragraphs, in Word, et al., so...I presume that your writing program will somehow force them to do this. In that case, it has possibilities.

Hitch
Old 01-06-2014, 07:44 PM   #6
skreutzer
PublishingToolsDeveloper
Well, yes! Basically, it would be enough to disable direct font and color selection (and other stuff that is direct formatting), and instead provide predefined styles, which could be changed and applied to the text easily, also imported and exported. If this isn't going to happen with the current word processors, there is still the alternative that advanced writer software like Scrivener would provide such a feature, since it would improve their very own typesetting facility (currently manual work via direct formatting word processor methodology) and directly benefit the self-publisher. Even without it, Sigil could be used as specialized software for the encoding task by people who do e-book and print book preparation in volume. Sigil already provides XML based output that is usable for all kinds of target formats. Sigil won't, for instance, have to deal with PDF creation; EPUB would be sufficient to feed into a processing system that would generate the PDF from it. Yes, the crucial point is to get valid, semantic XML, and the author himself or the writing software of the author should take care of it. If neither of those two does, a tool would be very valuable to strip down an input file to the pure text (probably still showing the visual appearance of the original) and allow the person who is producing the target formats to do the semantic encoding. This tool could be Sigil or a separate tool, creating valid XHTML input for Sigil, so Sigil would be the backend to create an EPUB. Alternatively, the software of the writer would initially take care of semantic encoding. Hopefully, writer and output producer talk to each other in order to either save the time of useless direct formatting on the writer's side, or save time on the side of the output producer by getting semantically encoded XML from the writer. If writer and output producer are the same person, the software should educate and require him to do the right thing: semantic markup.

I don't know whether Sigil itself could be used to create semantic XML from junk input (the semantic XHTML files in the EPUBs Sigil produces are usable as XML input for an automated processing workflow), or whether a separate tool would be more reasonable, one that provides Sigil with semantic XHTML input in order to create an EPUB. Since Sigil as a semantic EPUB editor could also be considered a semantic XML editor, my main question is whether Sigil could be changed to make the process of converting junk input into semantic XML as easy as possible, probably by the approach mentioned above (throwing all direct formatting away and applying style templates). For users who do conversions in volume and self-publishers who are serious about their process, this would be of great value, I guess.
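
The "throw all direct formatting away" half of that approach can be sketched in a few lines of Python (standard library only). This is a hedged illustration, not a description of Sigil: the list of presentational attributes and the placeholder class name `body-text` are assumptions, and the input must already be well-formed XHTML, which real junk input often is not. Choosing the right style for each paragraph remains a human decision; the placeholder class only marks what still needs one.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)  # serialize without an ns0: prefix

# Attributes treated as direct formatting in this sketch (an assumption).
PRESENTATIONAL_ATTRS = {"style", "align", "bgcolor"}


def strip_direct_formatting(xhtml_text, default_class="body-text"):
    root = ET.fromstring(xhtml_text)
    for el in root.iter():
        for attr in PRESENTATIONAL_ATTRS & set(el.attrib):
            del el.attrib[attr]
        # Paragraphs without a class get a placeholder style to review later.
        if el.tag == "{%s}p" % XHTML and "class" not in el.attrib:
            el.set("class", default_class)
    return ET.tostring(root, encoding="unicode")


junk = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        '<p style="text-indent: 1em" align="center">Hello</p>'
        '<p class="verse" style="color: red">A line</p>'
        '</body></html>')
cleaned = strip_direct_formatting(junk)
print(cleaned)
```

Presentational elements such as `<font>` are deliberately left alone here; unwrapping them while keeping their text is a separate step.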

I would personally be interested in why you think that XML -> XSLT never took off, which seems to be true; I wonder if it is for the simple reason that such workflows were always implemented as proprietary, restricted software, and not as something you could set up and run on your own computer. Are there other reasons than this? Indeed, the topic looks complex, and one needs to know several aspects of XML technology (Schema, XPath, XSLT, etc.), but all of that can be built into tools and hidden away from the user. Today, in some programming languages, an XSLT processor is part of the libraries, so you could pack it all together into user-friendly software, or use it as separate tools together with some automation scripts on a server in order to provide an online service or your own processing environment to generate output for people you work for/with. As an example, I wrote XSLTs and a shell script called hag2epub2 to produce an EPUB from custom XML, and have now written html2epub in Java, which could have used the very same XSLTs if hag2epub2 weren't written for custom XML input. html2epub, however, is a single tool without external dependencies except a Java VM, while hag2epub2 uses several external tools like a zip tool and XMLStarlet. html2epub is easier to use for more people, but hag2epub2 is easier to change and allows more flexibility.
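
For readers wondering what the zip step involves: an EPUB is a zip archive whose first entry must be a file named `mimetype`, stored uncompressed, containing `application/epub+zip`. Below is a minimal sketch of that packaging step in Python's standard library. It is not how hag2epub2 or html2epub do it; the function name and the placeholder file contents are made up for the example, and the resulting archive is not a complete, valid EPUB.

```python
import io
import zipfile


def write_epub(target, files):
    """Write an EPUB container; files maps archive paths to bytes."""
    with zipfile.ZipFile(target, "w") as epub:
        # The mimetype entry has to come first and must not be compressed.
        epub.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        for path, data in files.items():
            epub.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)


# Placeholder contents only; a real book needs a proper container.xml,
# package document and content files.
files = {
    "META-INF/container.xml": b"<!-- placeholder -->",
    "OEBPS/chapter1.xhtml": b"<!-- placeholder -->",
}
buffer = io.BytesIO()
write_epub(buffer, files)

archive = zipfile.ZipFile(buffer)
print(archive.namelist())
```

Pass a file name instead of the `BytesIO` buffer to write the archive to disk.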

Last edited by skreutzer; 01-06-2014 at 07:54 PM.
Old 01-07-2014, 03:29 AM   #7
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by skreutzer
Well, yes! Basically, it would be enough to disable direct font and color selection (and other stuff that is direct formatting), and instead provide predefined styles, which could be changed and applied to the text easily, also imported and exported. If this isn't going to happen with the current word processors, there is still the alternative that advanced writer software like Scrivener would provide such a feature, since it would improve their very own typesetting facility (currently manual work via direct formatting word processor methodology) and directly benefit the self-publisher. Even without it, Sigil could be used as specialized software for the encoding task by people who do e-book and print book preparation in volume. Sigil already provides XML based output that is usable for all kinds of target formats. Sigil won't, for instance, have to deal with PDF creation; EPUB would be sufficient to feed into a processing system that would generate the PDF from it. Yes, the crucial point is to get valid, semantic XML, and the author himself or the writing software of the author should take care of it. If neither of those two does, a tool would be very valuable to strip down an input file to the pure text (probably still showing the visual appearance of the original) and allow the person who is producing the target formats to do the semantic encoding. This tool could be Sigil or a separate tool, creating valid XHTML input for Sigil, so Sigil would be the backend to create an EPUB. Alternatively, the software of the writer would initially take care of semantic encoding. Hopefully, writer and output producer talk to each other in order to either save the time of useless direct formatting on the writer's side, or save time on the side of the output producer by getting semantically encoded XML from the writer. If writer and output producer are the same person, the software should educate and require him to do the right thing: semantic markup.

I don't know whether Sigil itself could be used to create semantic XML from junk input (the semantic XHTML files in the EPUBs Sigil produces are usable as XML input for an automated processing workflow), or whether a separate tool would be more reasonable, one that provides Sigil with semantic XHTML input in order to create an EPUB. Since Sigil as a semantic EPUB editor could also be considered a semantic XML editor, my main question is whether Sigil could be changed to make the process of converting junk input into semantic XML as easy as possible, probably by the approach mentioned above (throwing all direct formatting away and applying style templates). For users who do conversions in volume and self-publishers who are serious about their process, this would be of great value, I guess.
I'm now not really sure if we're saying the same thing, or different things. At this juncture, I don't see any tool, at all, that is assisting in providing clean, properly-formatted XML into Sigil or any other workflow. My comprehension of your posts is that this is what you were considering creating, as it's extremely unlikely that any of the current writing tools on the market, whether Word, Scrivener, etc., are going to go in that direction?

Quote:
I would personally be interested in why you think that XML -> XSLT never took off, which seems to be true; I wonder if it is for the simple reason that such workflows were always implemented as proprietary, restricted software, and not as something you could set up and run on your own computer.
I don't think that's hard; the truth is that you take a word-processed document, or something that's been through, say, INDD, and you can either (a) clean it, (b) export it to HTML in order to create an ePUB for instant commercial use, and (c) export that into MOBI for commercial use, or you can clean it to create semantic XML in the first place, which then has to be processed again to create an ePUB and/or MOBI. In the former case, you essentially run 1+ processes, in the latter it's 2+ or 3, as creating a mobi from a good ePUB is simplicity itself. I think it's as simple as, XML isn't natively suited for print or faux-print layout, as it's basically Markup. Writers and editors don't write in Markup.

When Amazon came into the marketplace, they bought Mobipocket creator, and the Kindle ran on HTML 3.2. This drove the bookmaking market. I can't say I've done a boatload of XML cleanup, but the XML I've tried to export from Word, to investigate this idea (XML to XSLT) hasn't looked like a party to clean. Moreover, the retailers change their standards and their devices every 5 minutes. No major reader runs on XML; so...I think it was, quite simply, creating a process that would be able to reuse a file, to create other outputs, in a market that is primarily driven by entertainment books, seemed like extra work and an extra step that's unnecessary. PLUS, even if you assume arguendo that it's a good idea, then you have the problem of (say, with Textbooks), trying to export the initial content into a usable form for the author/editor to do an UPDATED version...whereas, with HTML, you can reimport the content easily back into Word or another word-processor for an author and editor to work collaboratively to update the material for a next Edition or updated textbook. Trust me: they are NOT going to sit there over something that looks like an RSS feed or XML and edit it. I think that's a major hurdle, too.


Quote:
Are there other reasons than this? Indeed, the topic looks complex, and one needs to know several aspects of XML technology (Schema, XPath, XSLT, etc.), but all of that can be built into tools and hidden away from the user. Today, in some programming languages, an XSLT processor is part of the libraries, so you could pack it all together into user-friendly software, or use it as separate tools together with some automation scripts on a server in order to provide an online service or your own processing environment to generate output for people you work for/with.
If you say so. I admit, I've not seen anything that looks remotely user-friendly to which I could point my clients. And as I said somewhere in this thread, a major converter of books in India just invested a ton of money to invent/develop a system by which XML could be displayed in a Word-like, browser interface in order to provide a collaborative environment for textbook revisers/editors to work in. I'd have thought that if the environment existed, they wouldn't have spent all that money to create it, specifically for one client. I know someone else on this very forum was considering creating a markup editor at one point in time; I don't know what happened with that.

Quote:
As an example, I wrote XSLTs and a shell script called hag2epub2 to produce an EPUB from custom XML, and have now written html2epub in Java, which could have used the very same XSLTs if hag2epub2 weren't written for custom XML input. html2epub, however, is a single tool without external dependencies except a Java VM, while hag2epub2 uses several external tools like a zip tool and XMLStarlet. html2epub is easier to use for more people, but hag2epub2 is easier to change and allows more flexibility.
Yes, but again: all of those, every single one, all depend on the cleaned, ready-to-go XML being prepared and ready. I see that as the huge stumbling block, myself. For commercial users, it would have to be as simple, and as easy, as "simply" exporting and cleaning to HTML/XHTML, and it would have to be something that we could convince our users that they want, and are willing to pay for. THAT would also be a fairly big block; convincing them that they want a cleaned XML file that they themselves likely won't ever open or use, or even foresee a need for. But, I could be wrong.

It's just that I could swear that I remember, way back, Josh Tallent being big on the whole XML-->XSLT-->DocBook Idea, and I'm pretty sure if it made any type of economic sense, then, he would have pursued it heavily. That he didn't is telling, at least, as a commercial producer. And I don't see XML--> XSLT--> etc. being a big seller for private users or DIY authors, but again: I could be utterly wrong.

I'm surprised more of the gang haven't chimed in here.???

Hitch
Old 01-07-2014, 04:46 AM   #8
Toxaris
Wizard
Posts: 2,969
Karma: 3427611
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
Well, I was actually still making up my mind around this. It sounds like a great tool, but I doubt if Sigil would be the best for it.

My biggest question mark would actually be the first step, from a word processor to 'clean' XML. Clean XML is important, because if it is clean, it is relatively easy to go from it to anything else. I agree with Hitch that an ePUB would just be an export/conversion product of that XML, just as e.g. a PDF could be.
The first step, however, is where the pain is. There are a couple of pain points there, most of them already mentioned by Hitch.
  1. Most writers are still stuck in the typewriter age; they are not making use of the actual tooling available (styles, etc.)
  2. Writers don't want to change/learn; it should work as they want (tabs or spaces instead of a style with indent, Enter instead of margins, etc.)
  3. The large number of word processing programs, all with different formats
  4. The garbage these programs spit out as HTML/XML/etc.

It will be an almost impossible task to filter/convert the output of all these programs to XML/XHTML while maintaining all the markup and taking into account the bizarre things writers do in their documents. I only do it for Word, and that is already a nightmare sometimes. Writers still surprise me with their working methods and output.
There are basically two methods: either take the native format and convert that into clean XML, or use the XML/XHTML output from a word processor and clean that up if you can. I don't know if you have tried it with Word output and XSLT, but good luck. In all the years it has been available, there is not even one good XSLT out there that actually works. And that is just the major word processor.
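
To make the "clean that up if you can" route concrete, here is a minimal Python sketch of the easy part only: it assumes the paragraphs have already been extracted as plain strings, and it normalises the typewriter habits listed above (tabs or spaces used as indents, empty paragraphs used as spacing). The function name `paragraphs_to_xml` and the `<document>`/`<p>` element names are hypothetical, chosen for illustration; nothing here handles real Word output or preserves inline markup.

```python
import re
from xml.etree import ElementTree as ET

def paragraphs_to_xml(raw_paragraphs):
    """Turn raw paragraph strings into a <document> of clean <p> elements."""
    root = ET.Element("document")  # hypothetical root element name
    for raw in raw_paragraphs:
        # Tabs and non-breaking spaces used as "formatting" become plain spaces.
        text = raw.replace("\t", " ").replace("\u00a0", " ")
        # Collapse runs of whitespace and drop leading/trailing padding.
        text = re.sub(r"\s+", " ", text).strip()
        # Empty paragraphs were only used as vertical spacing: skip them.
        if not text:
            continue
        ET.SubElement(root, "p").text = text
    return root

sample = ["\tChapter one starts here.", "", "     Indented   with spaces. ", "   "]
xml_text = ET.tostring(paragraphs_to_xml(sample), encoding="unicode")
print(xml_text)
```

This kind of whitespace cleanup is the trivial case. It cannot decide whether an indent or a blank line was meant to carry meaning (a scene break, a block quote), which is exactly where the manual work comes in.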

The ambition is good, but the number of writers who want to be bothered with this is very small, especially among novelists.

However, the second part of converting the clean XML to other outputs could be very useful. That being said, XML itself is meaningless without the structure. What structure should be used? XHTML? A kind of LaTeX perhaps?
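
As an illustration of that second part, here is a small Python sketch that maps a made-up semantic vocabulary onto XHTML elements. The source element names (`chapter`, `title`, `para`, `emphasis`) and the `TAG_MAP` table are assumptions for the example, not a proposed structure; a real workflow would more likely use XSLT against an agreed schema. The output is a bare fragment without the XHTML namespace or document wrapper.

```python
from xml.etree import ElementTree as ET

# Hypothetical mapping from semantic source elements to XHTML elements.
TAG_MAP = {"title": "h1", "para": "p", "emphasis": "em"}

def to_xhtml(node):
    """Recursively copy a source element into its XHTML counterpart."""
    # Anything not in the mapping falls back to a generic <div>.
    out = ET.Element(TAG_MAP.get(node.tag, "div"))
    out.text = node.text
    out.tail = node.tail  # keeps the text that follows an inline element
    for child in node:
        out.append(to_xhtml(child))
    return out

source = ET.fromstring(
    "<chapter><title>One</title>"
    "<para>It was <emphasis>very</emphasis> dark.</para></chapter>"
)
xhtml = ET.tostring(to_xhtml(source), encoding="unicode")
print(xhtml)
```

Because the source only says what each piece of text is, a second mapping table could target a different output format from the same input, which is the point of keeping the XML clean.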
Old 01-07-2014, 01:46 PM   #9
Hitch
Bookmaker & Cat Slave
Posts: 2,414
Karma: 13022651
Join Date: Apr 2010
Location: Phoenix, AZ
Device: Kindle2, iPad, KindleFire and NookColor
Just for a chuckle, on this topic (you can't make this stuff up), this week I received a manuscript from a prospect (upon whom I unfortunately wasted a lot of time, but that's a bitch for another day), who, at the end of each "paragraph," hit the space bar to get the cursor to wrap to the "next line" to start his new paragraph. For 59,000 words. (I think I already told this story here on MR, apologies to those who read it if I did.)

So...I'm with Tox. Getting the clean XML is the major hurdle. I just don't know how to get there from here. And I test-exported a *clean* Word file to XML last night...and, ayup, good luck with THAT.

Hitch
Old 01-07-2014, 03:41 PM   #10
eschwartz
Irrational Optimist
Posts: 6,155
Karma: 9662058
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch (Wifi only)
Quote:
Originally Posted by Hitch View Post
Just for a chuckle, on this topic (you can't make this stuff up), this week I received a manuscript from a prospect (upon whom I unfortunately wasted a lot of time, but that's a bitch for another day), who, at the end of each "paragraph," hit the space bar to get the cursor to wrap to the "next line" to start his new paragraph. For 59,000 words. (I think I already told this story here on MR, apologies to those who read it if I did.)

So...I'm with Tox. Getting the clean XML is the major hurdle. I just don't know how to get there from here. And I test-exported a *clean* Word file to XML last night...and, ayup, good luck with THAT.

Hitch
Sounds like some people I knew at school; the other thing they did was press enter at the end of each line.
Old 01-07-2014, 04:01 PM   #11
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by eschwartz View Post
Sounds like some people I knew at school; the other thing they did was press enter at the end of each line.
Oh, yeah. We get that one all the time. The "sit on the spacebar" approach, though...we get that far more rarely. Or the "linefeed" at the end of half the lines, with the pilcrow at the other half.

Somewhat OT:
My personal favorite? The "every paragraph is aligned differently" approach. I don't know what the hell is going on out there, educationally, but we've had a number of manuscripts in which dialogue paragraphs are unindented and narrative paragraphs are indented, or vice versa. No, these aren't the James Joyces of the future; they're illiterate (literally. I'm not being mean. The books are usually hardly readable). There appears to be someone out there "teaching" aspiring authors that this is the correct way to write.

I had a book come in recently (some of you on the V&R thread will remember me bemoaning it; it was kinda-porn, with a 14-y.o. black female protagonist, written by a man from the girl's FP POV), with this type of half-deranged paragraph formatting, except it wasn't organized this way; paragraphs just started willy-nilly, wherever. It also had a lovely number of typos and grammar errors ("he road me until...")...and when I asked the guy, before I realized what it was and declined it muy pronto, if the paragraph formatting was deliberate, he said (I swear: really, you cannot make this s**t up): "this is how I got it back from the person I paid to do my editing." (italic emphasis added). I was positively aghast.

When I realized what the book was, and declined it, I couldn't help myself; I told him that truly, it was the worst-formatted book, with the most typos, grammar, punctuation and spelling errors, we'd ever seen (true). And that he needed to find his so-called "editor" and get every penny back. I mean, EVERY type of horrid mistake, from not closing dialogue tags, to opening new dialogue tags inside unclosed dialogue tags...you name it. Homonym errors, the whole schmear. I can only assume, given the nature of the material, that he lied to me about having any type of editor, because, in US terms, a Fifth-Grader would have caught and corrected most of them. Horrid, horrid stuff all 'round; the content AND the formatting.

Back OT:

So: how does a front-end piece of software fix THAT and produce clean XML?

Hitch
Old 01-07-2014, 04:59 PM   #12
eschwartz
Irrational Optimist
Quote:
Originally Posted by Hitch View Post
Oh, yeah. We get that one all the time. The "sit on the spacebar" approach, though...we get that far more rarely. Or the "linefeed" at the end of half the lines, with the pilcrow at the other half.
Wait, doesn't that take extra work to do???
Quote:
Somewhat OT:
My personal favorite? The "every paragraph is aligned differently" approach. I don't know what the hell is going on out there, educationally, but we've had a number of manuscripts in which dialogue paragraphs are unindented, and narrative are indented, or vice-versa. No, these aren't the James Joyce's of the future; they're illiterate (literally. I'm not being mean. The books are usually hardly readable). There appears to be someone out there "teaching" aspiring authors that this is the correct way to write.

I had a book come in recently (some of you on the V&R thread will remember me bemoaning it; it was kinda-porn, with a 14-y.o. black female protagonist, written by a man from the girl's FP POV), with this type of half-deranged paragraph formatting, except it wasn't organized this way; paragraphs just started willy-nilly, wherever. It also had a lovely number of typos and grammar errors ("he road me until...")...and when I asked the guy, before I realized what it was and declined it muy pronto, if the paragraph formatting was deliberate, he said (I swear: really, you cannot make this s**t up): "this is how I got it back from the person I paid to do my editing." (italic emphasis added). I was positively aghast.

When I realized what the book was, and declined it, I couldn't help myself; I told him that truly, it was the worst-formatted book, with the most typos, grammar, punctuation and spelling errors, we'd ever seen (true). And that he needed to find his so-called "editor" and get every penny back. I mean, EVERY type of horrid mistake, from not closing dialogue tags, to opening new dialogue tags inside unclosed dialogue tags...you name it. Homonym errors, the whole schmear. I can only assume, given the nature of the material, that he lied to me about having any type of editor, because, in US terms, a Fifth-Grader would have caught and corrected most of them. Horrid, horrid stuff all 'round; the content AND the formatting.
Yikes!
Old 01-07-2014, 05:14 PM   #13
Hitch
Bookmaker & Cat Slave
Quote:
Originally Posted by eschwartz View Post
Wait, doesn't that take extra work to do???
Yeah, that's the killer, isn't it? And then, when I quoted him, even giving him a break on the clean-up, he wanted me to do it for one-third of our published rates. I'm like...What the frack?


Quote:
Yikes!
Yeah. We get a lot of that. One of the hardest parts of quoting is weeding out the tire-kickers. I get a ton of emails from people asking every conceivable type of pre-publishing question (utterly unrelated to the deliverables, ranging from "where do I get my ISBN" to "how to market my book") who then never send in a ms once they have the answers, or want SAMPLES formatted for their inspection and approval (we're not talking complex books here, mind you), or...you name it. I could understand it if we were talking $1K, but...really? It's amazing. I find the lack of research truly shocking, and it's reinforced every day.

H
Old 01-07-2014, 07:17 PM   #14
MyDK
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2014
Device: none
Hi
I read your discussion with great interest. Looking forward to someone (maybe skreutzer) improving Sigil's import options for text.
Old 01-08-2014, 01:38 AM   #15
Toxaris
Wizard
Quote:
Originally Posted by MyDK View Post
Hi
I read your discussion with great interest. Looking forward to someone (maybe skreutzer.) improving Sigils import options for text
Will not happen. There is only so much you can clean up automatically. As always: GIGO (garbage in, garbage out).

Hitch, dialogue quote issues? Those can be solved easily; there are tools for that... I also knew it was bad, but not that bad... I understand your desire for automated analysis more and more...
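
For readers wondering what such a check might look like, here is a rough Python sketch (not one of the existing tools referred to above). It flags paragraphs where curly double quotes do not pair up, including an opening quote inside a still-open quote. The function name `unbalanced_quote_paragraphs` is made up for the example, and it assumes the text already uses curly quotes rather than straight ones.

```python
def unbalanced_quote_paragraphs(paragraphs, open_q="\u201c", close_q="\u201d"):
    """Return indexes of paragraphs whose curly double quotes do not pair up."""
    flagged = []
    for index, text in enumerate(paragraphs):
        depth = 0
        ok = True
        for ch in text:
            if ch == open_q:
                depth += 1
                if depth > 1:      # opening quote inside an unclosed quote
                    ok = False
            elif ch == close_q:
                depth -= 1
                if depth < 0:      # closing quote with nothing open
                    ok = False
                    depth = 0
        if depth != 0 or not ok:   # also catches a quote left open at the end
            flagged.append(index)
    return flagged

paragraphs = [
    "\u201cHello,\u201d she said.",
    "\u201cHello, she said.",
    "\u201cHe said \u201cno\u201d twice.\u201d",
    "No dialogue here.",
]
flagged = unbalanced_quote_paragraphs(paragraphs)
print(flagged)
```

A check like this only reports candidates. Dialogue that legitimately runs over several paragraphs leaves the opening quote unclosed at the end of each paragraph but the last, so a human still has to review what gets flagged.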