Old 06-28-2016, 06:30 AM   #1
capidamonte
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
Interactive Conversion?

Would such a feature be possible?

Where all the elements in the source file are parsed individually and group-assigned via a GUI to a predetermined set of selectors/classes/markup? I.e., whatever selectors you choose from a CSS file you've designed.

There would be a preview of how each element displayed originally, so that you can recognize what part of a book it is -- chapter heading, epigram, first letter of the chapter, etc. Maybe you could jump through each occurrence in a set of identically styled elements until you were satisfied that you knew what it was. Then you assign them all to the structure and classes that you prefer from your pre-developed list. It wouldn't stop you from adding later occurrences to selectors already assigned (i.e., you've found some more chapter headings with a slightly different style, or another picture description or epigram or whatever).

When you finish, you would have the structure of the book determined as something like a pure html file that could be opened and edited easily or convert pretty darn cleanly.
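The workflow described above (group identically styled elements, preview them, then assign each group to a preferred structure) can be roughed out as below. This is only a sketch under stated assumptions, not how calibre works internally: `StyleGrouper`, `Remapper`, `group_by_style`, `apply_mapping` and the sample class names are all hypothetical, the grouping key is simply (tag, class), and the input is assumed to be a well-formed XHTML fragment. Comments and declarations are dropped by this simple re-emitter.

```python
from collections import defaultdict
from html import escape
from html.parser import HTMLParser


class StyleGrouper(HTMLParser):
    """Collect text samples for every (tag, class) combination in the source."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)  # (tag, class) -> text samples for preview
        self._open = []                  # stack of currently open (tag, class) keys

    def handle_starttag(self, tag, attrs):
        self._open.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        if self._open and data.strip():
            self.groups[self._open[-1]].append(data.strip())


class Remapper(HTMLParser):
    """Re-emit the markup, swapping each mapped (tag, class) for the preferred one."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.out = []
        self._closers = []  # replacement tag names awaiting their end tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        old_class = attrs.pop("class", None) or ""
        new_tag, new_class = self.mapping.get((tag, old_class), (tag, old_class))
        if new_class:
            attrs["class"] = new_class
        rendered = "".join(f' {k}="{escape(v or "")}"' for k, v in attrs.items())
        self._closers.append(new_tag)
        self.out.append(f"<{new_tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{self._closers.pop() if self._closers else tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def group_by_style(xhtml):
    parser = StyleGrouper()
    parser.feed(xhtml)
    parser.close()
    return dict(parser.groups)


def apply_mapping(xhtml, mapping):
    parser = Remapper(mapping)
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out)


# Hypothetical converter output: headings are just styled paragraphs.
SOURCE = (
    '<p class="calibre7">Chapter One</p>'
    '<p class="calibre3">It was a dark night.</p>'
    '<p class="calibre7">Chapter Two</p>'
)
groups = group_by_style(SOURCE)  # what the human would preview
# The human's decision after looking at each group:
MAPPING = {("p", "calibre7"): ("h2", "chapter"), ("p", "calibre3"): ("p", "body")}
cleaned = apply_mapping(SOURCE, MAPPING)
```

The human judgement lives entirely in `MAPPING`; the program only does the grouping and the mechanical rewrite.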

I don't know why, but this idea has recurred to me several times since the ebook editor went into Calibre. I know I'm not knowledgeable enough about how complex this might be, but it seems to me that the conversion process does most of this determination already when it assigns individual "calibre#" classes to things. Lots of these classes are redundant, and if the original source was lackadaisical about properly distinguishing similar things... I'm thinking particularly about conversion from DOC files -- so many paragraphs that are all variations on the same thing, generally the body paragraphs that make up the work overall. Or headers that are stylized paragraphs instead of properly labeled header styles, etc.
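To make the point about redundant classes concrete, here is a minimal sketch that finds classes whose declarations are identical once property order and whitespace are ignored. The function names and the sample stylesheet are assumptions for illustration; the regular expression only understands simple `.class { ... }` rules, not the full CSS grammar.

```python
import re
from collections import defaultdict

# Matches simple class rules such as ".calibre1 { margin: 0 }".
RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")


def normalise(body):
    """Turn a declaration block into a sorted tuple of (property, value) pairs."""
    decls = []
    for decl in body.split(";"):
        if ":" in decl:
            prop, value = decl.split(":", 1)
            decls.append((prop.strip().lower(), " ".join(value.split())))
    return tuple(sorted(decls))


def redundant_classes(css):
    """Map each duplicate class to the first class with identical declarations."""
    by_style = defaultdict(list)
    for name, body in RULE.findall(css):
        by_style[normalise(body)].append(name)
    return {dup: names[0] for names in by_style.values() for dup in names[1:]}


# Hypothetical stylesheet of the kind a conversion might produce.
CSS = """
.calibre1 { text-indent: 1em; margin: 0 }
.calibre2 { margin: 0; text-indent: 1em; }
.calibre3 { font-weight: bold }
"""
duplicates = redundant_classes(CSS)
```

Exact duplicates are the easy case. Near-duplicates (a slightly different indent, say) would still need the human confirmation step described in the post.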

Seems to me this would be a way to use a wizard to harness human judgement into the conversion, collapse the number of classifications and assign structure efficiently. With the ebook editor, all the display stuff is there. With the conversion ability, all the parsing of groups of similar elements is being done already. It's just the assigning of those into structure that is the hard part to program -- so why not ask our pattern-recognition-genius brains instead? It could still occasionally be a lot of classes to look through, but at least the computer is doing a lot of the hard part of grouping things at the largest possible levels.

I hope I've explained clearly; it's a simple idea in my head but hard to put into words tonight. I feel a bit like I understand the concepts but lack the proper terminology.

Also, please point me to where it is if something like this already exists! It seems so obvious to me that this would result in clean conversions that I can't help feeling that it had to occur to someone else earlier that something like this is possible?

PS: this might work best as a conversion to FB2 -- pure structure? Then develop a CSS file for FB2 that converts easily to ePub, etc. Not sure. Maybe it limits you from adding things that didn't occur to the FB2 developers?

Last edited by capidamonte; 06-28-2016 at 06:33 AM. Reason: addendum
Old 06-29-2016, 12:44 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
The conversion pipeline actually does a lot more changes than necessary to format-shift, since it disambiguates styles and fixes some device-specific bugs.

Directly opening a DOCX or HTML file in the Editor, for example, does a much more direct "conversion" to EPUB.
That is mainly because Conversion is meant for end-users, to get the book onto the device as fast as possible, whereas the Editor is designed for people who care about the book internals and/or want to fine-tune the look and feel of the book.

Any such feature would never be implemented for Conversion, but could, arguably, be implemented as a tool or import mode for the Editor.

...

Rather than inventing an IMHO over-engineered styles wizard, I think most people who clean up ebooks use personal collections of regular expressions to find common patterns. Which can handle more types of fixes.

Is there anything that you feel could be done more easily than by using a regular expression (and possibly a Python replacement macro)?
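As an illustration of the regex-plus-function approach, the sketch below promotes paragraphs that merely look like chapter headings to real headings. It uses plain `re.sub` from the standard library rather than the Editor's own function mode, whose callback signature is documented in the calibre manual and differs from this one. The pattern, the `chapter` class and the function names are assumptions chosen for the example.

```python
import re

# A paragraph whose whole content is "Chapter <word>", whatever class it carries.
HEADING = re.compile(r"<p\b[^>]*>\s*(chapter\s+\w+)\s*</p>", re.IGNORECASE)


def to_heading(match):
    """Build the replacement: tidy the whitespace and capitalisation of the title."""
    title = " ".join(match.group(1).split()).title()
    return f'<h2 class="chapter">{title}</h2>'


def fix_headings(xhtml):
    return HEADING.sub(to_heading, xhtml)


SAMPLE = '<p class="calibre7">CHAPTER  one</p><p class="calibre3">Text</p>'
fixed = fix_headings(SAMPLE)
```

The function is what lets one pattern cover several fixes at once, here both the markup change and the title clean-up.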

Last edited by eschwartz; 06-29-2016 at 12:56 PM.
Old 06-30-2016, 07:30 AM   #3
Agama
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
Quote:
Originally Posted by eschwartz
Rather than inventing an IMHO over-engineered styles wizard, I think most people who clean up ebooks use personal collections of regular expressions to find common patterns. Which can handle more types of fixes.
+1 for this: it works very well.
___

The editor also has a very useful built-in tool to remove:

1) Redundant style classes - declared in the CSS file(s) but not actually used in the book's XHTML files.

2) Undefined style classes - referenced in the book files but not declared in CSS.
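The two checks in that list amount to a pair of set differences. The following is a rough sketch of the idea only, not the Editor's actual implementation: `class_report` is a hypothetical name, and the regular expressions assume simple stylesheets and double-quoted `class` attributes.

```python
import re


def class_report(css, xhtml):
    """Report classes declared but never used, and used but never declared."""
    # Drop the declaration blocks so that values such as "0.5em" are not mistaken
    # for class selectors, then collect every ".name" left in the selectors.
    selectors = re.sub(r"\{[^}]*\}", " ", css)
    declared = set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors))

    used = set()
    for value in re.findall(r'class="([^"]*)"', xhtml):
        used.update(value.split())

    return {"unused": declared - used, "undefined": used - declared}


CSS = ".chapter { font-weight: bold } .unused { color: red } p.body { margin: 0.5em }"
XHTML = '<p class="body first">x</p><h2 class="chapter">y</h2>'
report = class_report(CSS, XHTML)
```

The sketch only reports. Whether to delete an undefined class is left to the person, since an unstyled class can still carry useful structural markup.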
Old 06-30-2016, 11:32 PM   #4
capidamonte
I agree that it could and possibly should be an import mode into the editor. But I don't see it as "fixes" exactly, although one can argue that any attempt to simplify and make more robust is a fix.

I use regex when I do ebook fixes too, although not everyone can. But my idea here was to avoid having to examine the entire book for minor variations in styling: the wizard would group as many as possible for me, without my having to design regex to collect them myself. Then I would just push each collected group (after some sort of visual confirmation that each one is a proper member of the group) into a set of selectors I already prefer, covering the elements I find common to most books (fiction or non-fiction, at least). Those minor variations are more difficult to find when designing regex on a case-by-case basis. And as I implied, you cannot easily catch the individual variations of books that were poorly designed in the first place.

It could be tedious in a book with terrible styling (and thus hundreds of practically individual styles), but certainly no more tedious than determining regex for some of the books I've seen.

It would also allow you to get as fine-grained with your selectors as you like, and help to standardize the markup in your collection of books. You could perhaps optionally even transfer the original styling of the book into your preferred selectors, which would preserve much of the design if you found you liked that book's look -- then you're just editing a familiar set of CSS selectors to any style you later prefer instead of trying to mentally model the result of whatever terrible naming convention the Word/InDesign/Calibre intersection has produced. (I'm not criticizing Calibre, here, it's just the nature of the conversion beast.)

Of course, I'm also talking about working on the classes that actually apply to something in the book, but wouldn't want to rid the book of something referenced in the book that had no style applied -- that markup information might actually be useful. I'm thinking for instance of something like every paragraph's first-letter -- it could be marked up but not styled and I certainly wouldn't want to lose that information.

I think what I'm focused on here is the book structure, which is the most difficult part of properly marking up a book and definitely the most difficult part of cleaning up bad design. I think it is generally the first and most important thing to get right, and the hardest part to deal with programmatically.
Old 06-30-2016, 11:43 PM   #5
eschwartz
Well, if you want a view of the styles in play, there is the reports feature.

The way I see it, most if not all of the functionality is already there. It is arguable whether a wizard would be more helpful in doing the things you can already do -- by which I mean there are arguments both ways.

So, I guess it depends on whether anyone cares sufficiently to implement it. I guess it couldn't hurt.
I suspect Kovid would say he isn't interested, "patches accepted".
Old 06-30-2016, 11:50 PM   #6
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by eschwartz
I suspect Kovid would say he isn't interested, "patches accepted".
Yeah, this seems like a lot of work for relatively little gain over existing methods. However, if someone wants to try, I have no objections; it just won't be me -- I have enough other calibre things on my plate as it is.
Old 07-02-2016, 11:25 AM   #7
Agama
@capidamonte:

I see what you're getting at and agree there are times when this would be useful but it sounds a lot of work to implement.

If you start from scratch with a plain text source, you can quickly apply Markdown (calibre can process this) and thereby generate output with minimal style classes, which can then be used with a standardised stylesheet of your own design.

With purchased books I've given up prettifying the source files as it just takes too long. If the styling is causing display problems then it's worth fixing but otherwise I'd sooner get on reading.

If you do write a plugin for this wizard then I'll be happy to beta test it.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.