Old 09-18-2015, 07:17 AM   #3
SBT
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
Quote:
Originally Posted by eschwartz View Post
But I imagine you could do quite a bit using calibre's APIs from a custom calibre-debug script. Of course it only supports EPUB2.
Haven't looked at that, I admit. Though a great admirer of the sheer power of Calibre, I'm not particularly fond of what it does to the epub code.
Quote:
Scripted creation of EPUBs seems a bit counterintuitive, since at its core an EPUB GUI like Sigil or calibre's ebook-edit is an API for automatically resolving name changes, wrapped around a plaintext editor and a ZIP compressor.

And it doesn't get much more basic than a plaintext editor.
Well, I actually use UNIX sed quite a lot.
I repeat my belief in GUIs being just a passing fad. More seriously, I'm very fond of something more batch-like. My workflow currently looks something like this:
  • OCR
  • grep for obvious OCR errors
  • insert single-character "home-made" tags for headings, images, footnotes etc.
  • filter text through scripts to join words split over lines, handle footnotes, images, etc., and convert to XHTML.
  • Automatically split text into chapters & generate NCX, OPF and stuff.
  • Edit stylesheet.
  • Zip up & proofread.
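To make the filtering step a bit more concrete, here is a minimal sketch in Python of joining words split over lines and turning tagged text into XHTML fragments. It is not the actual script described above: the `#` heading tag, the function names and the choice of `<h2>` are all assumptions made for illustration, since the post does not say which single-character tags are used.

```python
import html
import re

# Hypothetical single-character tag for headings; the real tag set is not given.
HEADING_TAG = "#"


def join_split_words(text: str) -> str:
    """Rejoin words hyphenated across a line break: 'exam-\\nple' -> 'example'."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def to_xhtml_body(text: str) -> str:
    """Convert blank-line-separated blocks into <h2> or <p> elements."""
    parts = []
    for block in re.split(r"\n\s*\n", join_split_words(text).strip()):
        # Collapse the remaining line breaks inside a block into single spaces.
        content = " ".join(block.split())
        if content.startswith(HEADING_TAG):
            heading = content[len(HEADING_TAG):].strip()
            parts.append(f"<h2>{html.escape(heading)}</h2>")
        else:
            parts.append(f"<p>{html.escape(content)}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "# Chapter One\n\nIt was a dark and stor-\nmy night."
    print(to_xhtml_body(sample))
```

Note that this naive join also strips the hyphen from genuine compounds that happen to break at a line end (such as "well-" followed by "known"), so a real filter would probably want a dictionary check or a manual review pass.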
If I feel the need for something closer to WYSIWYG editing, I have the relevant xhtml file(s) open in a browser, edit them in an editor, and refresh the browser as required.
Currently I've a shell script to do the job, but it's a bit clunky, and generally I find that any good ideas I come up with regarding software can already be found on GitHub.
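For the final "zip up" step, a rough Python equivalent might look like the sketch below. It is not the shell script mentioned above; the function name `zip_epub` and the directory layout are assumptions. The one detail taken from the EPUB container format is that the `mimetype` file has to be the first entry in the archive and stored uncompressed.

```python
import zipfile
from pathlib import Path


def zip_epub(source_dir: str, epub_path: str) -> None:
    """Pack an unpacked EPUB directory into a .epub archive.

    Assumes epub_path lies outside source_dir, so the archive does not
    try to include itself.
    """
    root = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            arcname = path.relative_to(root).as_posix()
            # Skip directories and any mimetype file already on disk.
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname)
```

Calling something like `zip_epub("book", "book.epub")` on a directory holding `META-INF/container.xml`, the OPF, NCX, XHTML and stylesheet files should give an archive that e-readers accept, though running a validator over the result is still a good idea.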

Apart from being old-fashioned to an almost bloody-minded degree, I do find there are definite advantages to this methodology. First and foremost, licking an OCR text into shape with the least effort requires access to an astounding variety of command-line tools, which are harder to reach from even a sensibly designed GUI system. Secondly, it makes the separation of content and presentation very natural. Thirdly, it will always beat a GUI tool for flexibility, for example when editing several volumes at once. My chief gripe against GUI systems is that they make tasks easy to learn, but not easy to do.