Go Back   MobileRead Forums > E-Book Formats > ePub

Old 08-01-2015, 10:53 PM   #31
mattmc
Connoisseur
Posts: 89
Karma: 185923
Join Date: May 2015
Device: iPad 1/2/Air, K3/PW2/Fire1, Kobo Touch, Samsung Tab, Nook Color/Touch
Quote:
Originally Posted by Turtle91 View Post
Is there an idiot's guide to the specificity rules?? I knew they were out there as well, but don't totally understand them...
Probably depends on just how "idiot" you mean, but this is what I read after seeing dgatwood's post: http://www.vanseodesign.com/css/css-...ance-cascaade/
Old 08-02-2015, 06:02 AM   #32
Notjohn
mostly an observer
Posts: 1,519
Karma: 987654
Join Date: Dec 2012
Device: Kindle
Quote:
Originally Posted by mattmc View Post
Would you recommend doing style inlining for all of my Kindle files? I didn't want to Juice the whole file, because it's going to bloat it with style="..." attributes on practically every tag, but if it's necessary for KF7, I could make it part of my Kindle workflow...
As you probably know, the KDP conversion rips out the style sheet and inserts inline styles for the KF7 / Mobi 7 version of the book. I have never seen a problem arising therefrom. However, my styling is fairly simple, and I don't use media calls. You can see the stylesheet at notjohnkdp.blogspot.com
Old 08-02-2015, 07:33 PM   #33
Turtle91
A Hairy Wizard
Posts: 3,336
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Quote:
Originally Posted by mattmc View Post
Probably depends on just how "idiot" you mean, but this is what I read after seeing dgatwood's post: http://www.vanseodesign.com/css/css-...ance-cascaade/
Awesome! Thank You!!
Old 08-03-2015, 12:12 AM   #34
mattmc
Connoisseur
Posts: 89
Karma: 185923
Join Date: May 2015
Device: iPad 1/2/Air, K3/PW2/Fire1, Kobo Touch, Samsung Tab, Nook Color/Touch
Quote:
Originally Posted by Turtle91 View Post
Awesome! Thank You!!
No problem

Quote:
Originally Posted by dgatwood View Post
For KF7, Kindlegen converts the CSS into HTML markup, but does it very badly, with a CSS parser that doesn't properly handle selectors containing multiple elements, multiple class names on single elements, lists of selectors applied to a single rule set, etc., resulting in all sorts of joy for those of us who routinely use nontrivial CSS.
Do you have this precisely documented in any way? Do I really need to do this:

Code:
// Before
p.blah, p.foo, p.bar { ... }

// After
p.blah { ... }
p.foo { ... }
p.bar { ... }
And this:
Code:
// Before
<p class="blah foo">Words</p>
p.blah.foo { ... }

// After
<p class="blah-foo">Words</p>
p.blah-foo { ... }
And this:
Code:
// Before
<p><span>Words</span></p>
p span { ... }

// After
<p><span class="p-span">Words</span></p>
span.p-span { ... }
What about this, is this necessary?
Code:
// Before
<p class="blah foo">Words</p>
p.blah { color:red }
p.foo {size:1.2em }

// After
<p class="blah-foo">Words</p>
p.blah-foo { color:red;size:1.2em }
Hungry for details, if they're available.

Quote:
Originally Posted by dgatwood View Post
Of course, if I were producing a general-purpose solution, I'd use a different approach, using a WebKit WebView to render the content, walk the DOM tree, and blow in tags based on the computed styles for each node. It would probably take only double-digit lines of code in total, and it would put Kindlegen to shame by being 100% correct in its interpretation of the CSS every freaking time.
Although you did point out that my iBooks problem with the popovers was probably a specificity issue, I'm still wondering if it would be worth using Juice to do this to my Kindle file. (It'd obviously affect KF8 as well as KF7 since you can't split source files.)

I could probably get this going using NodeJS in a few hours, make the script available, maybe even set up an NPM package. Any thoughts?
Old 08-04-2015, 10:40 PM   #35
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
Quote:
Originally Posted by mattmc View Post
Do you have this precisely documented in any way? Do I really need to do this:

Code:
// Before
p.blah, p.foo, p.bar { ... }

// After
p.blah { ... }
p.foo { ... }
p.bar { ... }
I'd expect comma-separated rules to be fine. The rules that haven't worked correctly for me are multi-selector rules like:

Code:
p.blah span.foo { ... }
because kindlegen treats that as though you had specified:

Code:
p.blah span.foo, span.foo { ... }
if memory serves, but I'm not 100% certain—it might have interpreted it as:

Code:
p.blah, span.foo { ... }
Either way, all the span tags with the foo class matched, including span tags that shouldn't have matched because they weren't in the specified enclosing element.
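
To see which rules in a stylesheet are exposed to this, a small helper can flag the selectors that chain more than one element. This is only a sketch: `findMultiElementSelectors` is a hypothetical name, it splits on commas naively (so it assumes no commas inside attribute values or functional pseudo-classes), and an attribute selector containing `~` would be flagged by mistake.

```javascript
// Hypothetical helper: returns the selectors in a comma-separated list
// that use a descendant, child, or sibling combinator.
function findMultiElementSelectors(selectorText) {
    return selectorText
        .split(",")
        .map(function (sel) { return sel.trim(); })
        .filter(function (sel) {
            // Whitespace or one of > + ~ sitting between two other characters.
            return sel.length > 0 && /\S\s*[\s>+~]\s*\S/.test(sel);
        });
}

// Reports "p.blah span.foo" and "div > em", but not "p.bar".
console.log(findMultiElementSelectors("p.blah span.foo, p.bar, div > em"));
```

A plain comma-separated list such as `p.blah, p.foo, p.bar` comes back empty, which matches the expectation above that those rules are fine.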


Quote:
Originally Posted by mattmc View Post

And this:
Code:
// Before
<p class="blah foo">Words</p>
p.blah.foo { ... }

// After
<p class="blah-foo">Words</p>
p.blah-foo { ... }
I have had problems with styles not getting merged correctly, though it is possible that those problems were false positives caused by the gross misinterpretation of multi-element selectors as previously mentioned.

Whether combined rules were actually necessary or not... would require lots of software archaeology to determine definitively.




Quote:
Originally Posted by mattmc View Post
And this:
Code:
// Before
<p><span>Words</span></p>
p span { ... }

// After
<p><span class="p-span">Words</span></p>
span.p-span { ... }
Yes, that's very similar to the rules I've had trouble with.


Quote:
Originally Posted by mattmc View Post
What about this, is this necessary?
Code:
// Before
<p class="blah foo">Words</p>
p.blah { color:red }
p.foo {size:1.2em }

// After
<p class="blah-foo">Words</p>
p.blah-foo { color:red;size:1.2em }
Unclear, for the same reason that the second one was unclear.


Quote:
Originally Posted by mattmc View Post
Although you did point out that my iBooks problem with the popovers was probably a specificity issue, I'm still wondering if it would be worth using Juice to do this to my Kindle file. (It'd obviously affect KF8 as well as KF7 since you can't split source files.)
Blowing things into inline styles is kind of messy, so if you can avoid it, that's probably better.


Quote:
Originally Posted by mattmc View Post
I could probably get this going using NodeJS in a few hours, make the script available, maybe even set up an NPM package. Any thoughts?
If you want to take a crack at implementing the full set of sanity checks and fixup tweaks, JavaScript would be a really good way to do it. My first thought as far as implementation would be something like this:
  1. Iterate all the elements and css rules, and create an associative array with every single CSS class name used in either one. You'll need this for uniqueness checks later.
  2. Iterate all the elements looking for elements with multiple classes. Change the spaces to hyphens or underscores. Add a random number if needed to make the resulting class name unique. Create an associative array mapping each of the original classes to an array, and in that array, put the names of any combined classes based on the original class. That way, you can quickly obtain a list of all the new combined classes that are derived from each of the original classes.
  3. Iterate all the CSS rules. For each rule, if any part of the rule matches one of the original classes, add a new copy of that rule with the combined class substituted in place of the original class, but do not remove the original rule, because it might also affect other non-combining elements.
  4. Iterate all the CSS styles a second time. This time, for each rule with a complex selector (multi-element or multi-class or both), take the selector, remove all the characters that would be invalid in a class name, add a random number if needed to make the class name unique, and create a copy of that rule under the new name. Delete the original rule. Then use querySelectorAll to obtain a list of elements that match the original selector. For each element, add the newly generated class name to its list of classes.
  5. Repeat steps 2–4 in a loop until nothing changes. (Step 4 can cause you to need to repeat step 2.)
  6. Optional: For each CSS style, call querySelectorAll with its selector, and if it returns an empty list of elements, delete the unused style.
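
Steps 2 and 3 above can be sketched on plain data, without a browser. Everything here is an assumption made for illustration: `combineClasses` is a hypothetical name, elements and rules are simple objects rather than DOM nodes or CSSOM rules, only selectors of the form `tag.class` or `.class` are handled, and the uniqueness check from step 2 is left out.

```javascript
// Elements and rules are plain objects here (an assumption for the sketch);
// a real tool would read them from the parsed XHTML and CSS.
function combineClasses(elements, rules) {
    // original class name -> combined class names derived from it
    var combinedFor = Object.create(null);

    // Step 2: give every multi-class element one hyphen-joined class.
    elements.forEach(function (el) {
        var classes = el.className.trim().split(/\s+/).sort();
        if (classes.length < 2) return;
        var combined = classes.join("-");
        classes.forEach(function (cls) {
            combinedFor[cls] = combinedFor[cls] || [];
            if (combinedFor[cls].indexOf(combined) === -1) {
                combinedFor[cls].push(combined);
            }
        });
        el.className = combined;
    });

    // Step 3: keep each original rule and add one copy per combined class.
    var out = [];
    rules.forEach(function (rule) {
        out.push(rule);
        var match = /^(\w*)\.([\w-]+)$/.exec(rule.selector);
        if (!match) return; // not a simple "tag.class" selector
        (combinedFor[match[2]] || []).forEach(function (combined) {
            out.push({ selector: match[1] + "." + combined, style: rule.style });
        });
    });
    return out;
}

var elements = [{ className: "blah foo" }, { className: "foo" }];
var rules = [
    { selector: "p.blah", style: "color: red" },
    { selector: "p.foo", style: "font-size: 1.2em" }
];
var combinedRules = combineClasses(elements, rules);
console.log(elements[0].className); // blah-foo
console.log(combinedRules.map(function (r) { return r.selector; }).join(", "));
```

The copies are inserted directly after the rule they come from, so the relative order of the declarations (and therefore which one wins) stays the same for the combined class.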


Be careful when handling the @ rules, e.g. @media, @font-face, etc. You can probably skip processing @font-face rules entirely and just emit them unchanged in the final output.

You'll want to process the rules within an @media query just like you would any other rule, being sure to add any replacement rules inside the same @media query that the original came from. I wouldn't try to process the @media rules separately, because my gut says that could cause a nasty mess.
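
One way to keep replacements inside the @media block they came from is to recurse into the block with the same rewrite function. The sketch below assumes a made-up rule shape (plain objects with a `type` of "rule", "media", or "font-face") and a hypothetical `rewriteRules` helper; it is not tied to any particular CSS parser.

```javascript
// Rule objects are plain data loosely modeled on a CSS parser's AST
// (an assumption): { type: "rule" | "media" | "font-face", ... }.
function rewriteRules(rules, rewriteStyleRule) {
    var out = [];
    rules.forEach(function (rule) {
        if (rule.type === "media") {
            // Recurse, so replacements stay inside the same @media block.
            out.push({
                type: "media",
                media: rule.media,
                rules: rewriteRules(rule.rules, rewriteStyleRule)
            });
        } else if (rule.type === "rule") {
            // The callback returns an array: the original rule,
            // a replacement, or both.
            out = out.concat(rewriteStyleRule(rule));
        } else {
            // @font-face and anything else passes through unchanged.
            out.push(rule);
        }
    });
    return out;
}

var sheet = [
    { type: "font-face", declarations: "font-family: Body" },
    { type: "media", media: "screen", rules: [
        { type: "rule", selector: "p span", style: "color: red" }
    ] }
];
var rewritten = rewriteRules(sheet, function (rule) {
    return [{ type: "rule", selector: ".p-span", style: rule.style }];
});
console.log(JSON.stringify(rewritten, null, 2));
```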

I'm suddenly tempted to try this.

Last edited by dgatwood; 08-04-2015 at 10:49 PM.
Old 08-10-2015, 01:39 PM   #36
mattmc
Connoisseur
Posts: 89
Karma: 185923
Join Date: May 2015
Device: iPad 1/2/Air, K3/PW2/Fire1, Kobo Touch, Samsung Tab, Nook Color/Touch
Quote:
Originally Posted by dgatwood View Post
If you want to take a crack at implementing the full set of sanity checks and fixup tweaks, JavaScript would be a really good way to do it. My first thought as far as implementation would be something like this:
  1. Iterate all the elements and css rules, and create an associative array with every single CSS class name used in either one. You'll need this for uniqueness checks later.
  2. Iterate all the elements looking for elements with multiple classes. Change the spaces to hyphens or underscores. Add a random number if needed to make the resulting class name unique. Create an associative array mapping each of the original classes to an array, and in that array, put the names of any combined classes based on the original class. That way, you can quickly obtain a list of all the new combined classes that are derived from each of the original classes.
  3. Iterate all the CSS rules. For each rule, if any part of the rule matches one of the original classes, add a new copy of that rule with the combined class substituted in place of the original class, but do not remove the original rule, because it might also affect other non-combining elements.
  4. Iterate all the CSS styles a second time. This time, for each rule with a complex selector (multi-element or multi-class or both), take the selector, remove all the characters that would be invalid in a class name, add a random number if needed to make the class name unique, and create a copy of that rule under the new name. Delete the original rule. Then use querySelectorAll to obtain a list of elements that match the original selector. For each element, add the newly generated class name to its list of classes.
  5. Repeat steps 2–4 in a loop until nothing changes. (Step 4 can cause you to need to repeat step 2.)
  6. Optional: For each CSS style, call querySelectorAll with its selector, and if it returns an empty list of elements, delete the unused style.


Be careful when handling the @ rules, e.g. @media, @font-face, etc. You can probably skip processing @font-face rules entirely and just emit them unchanged in the final output.

You'll want to process the rules within an @media query just like you would any other rule, being sure to add any replacement rules inside the same @media query that the original came from. I wouldn't try to process the @media rules separately, because my gut says that could cause a nasty mess.

I'm suddenly tempted to try this.
Heh. I may have to build this, because I have all my processing scripts in JS and I actually need to confront making sure I have my ducks in a row for KF7. Maybe in a few days I'll get to it, I'm tied up right now.

Question: It seems like you could do point #4 first, and then you'd only have to do point #2 once. Right?
Old 08-11-2015, 10:47 PM   #37
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
Quote:
Originally Posted by mattmc View Post
Heh. I may have to build this, because I have all my processing scripts in JS and I actually need to confront making sure I have my ducks in a row for KF7. Maybe in a few days I'll get to it, I'm tied up right now.

Question: It seems like you could do point #4 first, and then you'd only have to do point #2 once. Right?
I think so, now that you mention it. At least I can't think of any cases where that wouldn't work.

For fun, I took a crack at #2 and #3 last week. I ran into a bit of a wall where I couldn't do anything useful with the results from within a web browser, and I also ran into some data loss because Safari's CSS objects only include the bits that Safari uses and leave out properties that are specific to other browsers, but it might be useful as a starting point.


Code:
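function translate() {
    var changed;

    do {
        // Repeat until a full pass changes nothing. Each fixup function
        // is expected to return true when it modified the document.
        changed = false;
        if (fixup_multi_class_elements()) changed = true;
        if (fixup_complex_styles()) changed = true;

    } while (changed);

}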

function fixup_multi_class_elements()
{
    var changed = false;
    var CSSClassMap = new Array();

    var nodeIterator = document.createNodeIterator(document,
        NodeFilter.SHOW_ELEMENT,
        { acceptNode: function(node) { return NodeFilter.FILTER_ACCEPT; } },
        false);

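    var currentNode;
    while ((currentNode = nodeIterator.nextNode())) {
        // className is only a plain string on HTML elements (SVG differs).
        if (typeof currentNode.className == "string" &&
            currentNode.className.match(/\S\s+\S/)) {
            /* Multiple words separated by whitespace. */

            var bits = currentNode.className.trim().split(/\s+/).sort();

            // Join with hyphens: a bare join() inserts commas, which
            // are not valid in a class name.
            var newCSSName = bits.join("-");

            for (var bitnum = 0; bitnum < bits.length; bitnum++) {
                var bit = bits[bitnum];

                if (!CSSClassMap[bit]) {
                    CSSClassMap[bit] = new Array();
                }
                CSSClassMap[bit].push(newCSSName);
            }

            currentNode.className = newCSSName;

            changed = true;
        }
    }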

    var replacements = new Array();

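    var stylesheets = document.styleSheets;
    for (var sheetIndex = 0; sheetIndex < stylesheets.length; sheetIndex++)
    {
        var newRules = new Array();

        // The sheet needs its own variable: reusing the loop counter's
        // name would end the loop after the first stylesheet.
        var stylesheet = stylesheets[sheetIndex];

        var pos = 0;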
        for (var i = 0; i < stylesheet.cssRules.length; i++)
        {
            var rule = stylesheet.cssRules[i];

            if (rule.type == CSSRule.MEDIA_RULE) {
	        // CSS media rule

		alert("Media rules not implemented yet.\n");
		newRules.push(constructStyleRule(null, rule.cssText))

		// Eventually, need to iterate the child rules, and
		// build a new rule.

	    } else if (rule.type == CSSRule.STYLE_RULE) {
		var newrule = combinedClassRules(rule, CSSClassMap);
		newRules.push(constructStyleRule(newrule["selector"], newrule["style"]))

	    } else {
		// Just add the rule.
		newRules.push(constructStyleRule(null, rule.cssText));
	    }
        }
        while (stylesheet.cssRules.length) {
	    stylesheet.deleteRule(0);
	}

	var newStylesheet = "";

	var pos = 0;
	for (var i=0; i < newRules.length; i++) {
	    var newrule = newRules[i];

	    //console.log("INSERTING RULE\n"); console.log(newrule);
	    // stylesheet.insertRule(newrule["selector"], newrule["style"], i);

	    if (newrule["selector"]) {
	        newStylesheet += "\n"+newrule["selector"]+"\n{\n"+newrule["style"]+"\n}\n";
	    } else {
	        newStylesheet += "\n"+newrule["style"]+"\n";
	    }
	}

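        console.log(newStylesheet);
    }

    // Report whether any element was rewritten, so the caller can decide
    // whether another pass is needed.
    return changed;
}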

function constructStyleRule(selectorString, ruleString)
{
    return {
	"selector": selectorString,
	"style": ruleString + ""
    };
}


function combinedClassRules(rule, CSSClassMap)
{
    var selector = rule.selectorText;

    var parts = selector.split(",");
    var newSelectors = new Array();

    // console.log("PARTS:"); console.log(parts);

    for (var partnum in parts) {
	var part = parts[partnum];

	// console.log("PART: "+part);
	if (part.match(/\S/)) {
		var replacedSelectors = mapSelectors(part, CSSClassMap)
		if (replacedSelectors.length) newSelectors.push(replacedSelectors);
		newSelectors.push(part);
	}
    }

    return constructStyleRule(newSelectors.join(","), rule.style.cssText);
}

function classesAndIDsInSelector(selector, CSSClassMap)
{
    var parts = selector.split(/[#.]/).slice(1);
    var classes = new Array();

    for (var partnum = 0; partnum < parts.length; partnum++) {
	var part = parts[partnum];
	// -?[_a-zA-Z]+[_a-zA-Z0-9-]*

	var possibleClassOrID = part.replace(/^\s*/, "").replace(/(\s|[#.>+~[]).*$/, "");

	if (possibleClassOrID.length) {
	    /* Valid class or ID */
	    classes.push(possibleClassOrID);
	}
    }

    // console.log("SEL "+selector+" contains "+classes.join(","));

    return classes;
}

function mapSelectors(selector, CSSClassMap)
{
    var candidates = classesAndIDsInSelector(selector, CSSClassMap);

    var tempArray = new Array();
    for (var oldClassNum in candidates) {
	var oldClass = candidates[oldClassNum];
	var newClasses = CSSClassMap[oldClass];

	if (newClasses) {
	    for (var newClassNum in newClasses) {
		var newClass = newClasses[newClassNum];

		tempArray.push(replaceClassInSelector(selector, oldClass, newClass));
	    }
	}
    }
    return tempArray.join(",");
}

/* From http://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex */
function escapeRegExp(str) {
    return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}

function replaceClassInSelector(selector, oldClass, newClass)
{
    var quoted = escapeRegExp(oldClass);
    // "\\s" is required: inside a string literal, "\s" collapses to a plain "s".
    var searchRE = new RegExp("([#.])"+quoted+"(?=$|\\s|[#.>+~[])", "g");

    return selector.replace(searchRE, "$1"+newClass);
}

function fixup_complex_styles()
{
}
Old 08-17-2015, 11:33 PM   #38
mattmc
Connoisseur
Posts: 89
Karma: 185923
Join Date: May 2015
Device: iPad 1/2/Air, K3/PW2/Fire1, Kobo Touch, Samsung Tab, Nook Color/Touch
Quote:
Originally Posted by dgatwood View Post
I think so, now that you mention it. At least I can't think of any cases where that wouldn't work.

For fun, I took a crack at #2 and #3 last week. I ran into a bit of a wall where I couldn't do anything useful with the results from within a web browser, and I also ran into some data loss because Safari's CSS objects only include the bits that Safari uses and leave out properties that are specific to other browsers, but it might be useful as a starting point.
Nice! That's some nice JS

I do understand the wall there, though--I think it's best to stay away from the browser on this one, due to quirks and portability issues. I ended up writing #4 in a script that just processes the files on-disk in an unzipped ePub, basically. With the Cheerio module this is fairly easy:
  1. Iterate the CSS rules
  2. Find any selectors for those rules that are "too complex" for Kindlegen
  3. Swap the complex selectors for simplified class selectors
  4. In the HTML, use Cheerio to select elements that the complex selectors match
  5. Add the simple classes to those elements

So something like p span becomes .p-span, voila.

I put it here as a Gist with syntax highlighting, or you can see it here:

Code:
#!/usr/bin/env node
// The above "shebang" tells the bash terminal to run this script as a Node.js script.

// Import NodeJS packages
var argv = require('yargs').argv;   // for processing command-line arguments
var fs = require("fs");             // filesystem
var path = require('path');         // resolving directory paths
var wrench = require('wrench');     // for deep recursive copying
var css = require('css');           // for parsing CSS into an AST
var walk = require('rework-walk'); // for walking ASTs generated by the css module
var cheerio = require('cheerio');   // jquery-like access to an HTML document

// Define an "ends with" method that is useful later.
if (typeof String.prototype.endsWith !== 'function') {
    String.prototype.endsWith = function(suffix) {
        return this.indexOf(suffix, this.length - suffix.length) !== -1;
    };
}

// Fetches all files in a folder, recursively
function filesRecursive(dir) {
    var results = [];
    var list = fs.readdirSync(dir);
    list.forEach(function(file) {
        file = dir + '/' + file;
        var stat = fs.statSync(file);
        if (stat && stat.isDirectory()) {
            results = results.concat(filesRecursive(file));
        } else {
            results.push(file);  
        } 
    });
    return results;
}

// Same as filesRecursive, but filters for files that end with an ext, such as ".css"
function filesRecursiveWithExt(dir, ext){
    var results = filesRecursive(dir);
    return results.reduce(function(filteredContents, filename){
        if (filename.endsWith(ext)) {
            filteredContents.push(filename);
        }
        return filteredContents;
    }, []);
}

// Convenience function: 
// reads a file, calls a function that you give it, 
// and then writes back to the file whatever your function returns
function readWrite(filePath, callback){
    var content = '';
    var stats = fs.statSync(filePath);
    if (!stats.isDirectory()) {
        content = fs.readFileSync(filePath).toString();
    }
    if (callback) {
        content = callback(content, stats);
        if (!stats.isDirectory() && content) {
            fs.writeFileSync(filePath, content);
        }
    } else {
        return content;
    }
}

/////////////////////
// MAIN SCRIPT STARTS
/////////////////////

// Get the path to the epub directory, as an argument
var targetDirectory = argv._[0];
if (!targetDirectory) {
    console.log("Please specify the directory of the epub you want to work with.");
    process.exit(1); // nothing to do without a target directory
}
var scriptDir = path.dirname(require.main.filename); // current directory of the main script
var resolvedTargetDir = path.resolve(targetDirectory);

// Clone the directory so we don't taint the original
var newDirectory = resolvedTargetDir + '_kf7';
wrench.copyDirSyncRecursive(resolvedTargetDir, newDirectory, {
    forceDelete: true // overwrites any "_kf7" directory that's already there
});

// Simplify our CSS rules
var complexSelectorMap = {};
var cssFiles = filesRecursiveWithExt(newDirectory, "css");
cssFiles.forEach(function(filename){
    readWrite(filename, function(content, stats){
        if (!content) return;

        // Parse the CSS into an AST that can be walked
        var ast = css.parse(content);

        // Walk the AST
        walk(ast.stylesheet, function(rule, node){

            if (!rule.selectors) return;

            var remove = [];
            var add = [];
            rule.selectors.forEach(function(sel, idx){

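                // If the selector contains a space, it's too complex for Kindlegen
                var parts = sel.trim().split(/\s+/);
                if (parts.length > 1) {

                    // Create a simplified version of the selector.
                    // Regexes with the g flag replace every occurrence; a string
                    // pattern would only replace the first one.
                    var newSel = '.' + parts.join('-')
                        .replace(/#/g, '-id-')
                        .replace(/\./g, '-clz-')
                        .replace(/>/g, '-chd-')
                        .replace(/\+/g, '-adj-')
                        .replace(/~/g, '-pre-')
                        .replace(/\[/g, '-lbr-')
                        .replace(/\]/g, '-rbr-')
                        .replace(/[^A-Za-z0-9_-]/g, '-') // anything else invalid in a class name
                        .replace(/-{2,}/g, '-')
                        .replace(/^-+|-+$/g, '');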

                    // Add it to our list to add to this rule
                    add.push(newSel);

                    // Map the complex selector to the simplified version (for later adjustments we do in the markup)
                    complexSelectorMap[sel] = newSel;

                    // Add this selector to our list to remove
                    remove.push(idx);
                }
            });

            // Remove the complex selectors, if any.
            // Note that we go backwards through the list, otherwise our indexes will be messed up.
            for (var i = remove.length - 1; i >= 0; i--) {
                rule.selectors.splice(remove[i], 1);
            };

            // Add the simplified selectors, if any
            rule.selectors = rule.selectors.concat(add);

        });

        // Stringify the modified AST and return it, so it gets written back to the file
        return css.stringify(ast);
    });
});

// Now convert everything in the markup to the simpler selectors, as it were
var compoundClassMap = {};
var htmlFiles = filesRecursiveWithExt(newDirectory, "html"); // will include "xhtml"
htmlFiles.forEach(function(filename){
    readWrite(filename, function(content, stats){
        if (!content) return;

        // Load up everything into Cheerio!
        var $ = cheerio.load(content, {
            xmlMode: true
        });

        // Find everything that the complex selectors applied to, and stick the simpler class on them
        Object.keys(complexSelectorMap).forEach(function(key) {
            // addClass expects a bare class name, so strip the leading dot
            $(key).addClass(complexSelectorMap[key].replace(/^\./, '')); // ...Well, that was easy.
        });

        return $.xml();
    });
});


I do want to evolve it and probably make it into a proper NPM package with tests and all that, but I figured I'd post my immediate results.

-----

Okay, now there's the question of #2 and #3. You basically mention elements that have multiple classes, but I think for a truly universal solution a more complex approach is required. Correct me if I'm wrong, but it's not so much elements with multiple classes as it is elements that multiple selectors apply to, right?

Like, what if you have span.blah and you have a <p class="blah"> for whatever reason? The selector wouldn't actually apply in that scenario, but if you were just looking at classes, you would think it did.

Or if you have selectors #super and .duper, and element <p class="duper" id="super"> then both rules would apply to that element.

It's really all dependent on what kind of CSS is being used by the book creator; if you're just using classes then obviously that's fine, I'm just thinking it all the way through to the conclusion.

I suppose if you already got rid of all of the complex selectors, basically anything matching /[+~\[\] ]/, then all you have to worry about is IDs and classes? So you could walk the DOM with that in mind, I suppose.
Old 08-19-2015, 02:01 AM   #39
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
Quote:
Originally Posted by mattmc View Post
Okay, now there's the question of #2 and #3. You basically mention elements that have multiple classes, but I think for a truly universal solution a more complex approach is required. Correct me if I'm wrong, but it's not so much elements with multiple classes as it is elements that multiple selectors apply to, right?
I don't think so. Kindlegen seems to handle multiple matches correctly, with the last one taking precedence. The only multiple match issue I remember was when an element had multiple class names (class="foo bar").


Quote:
Originally Posted by mattmc View Post
Like, what if you have span.blah and you have a <p class="blah"> for whatever reason? The selector wouldn't actually apply in that scenario, but if you were just looking at classes, you would think it did.
I'm pretty sure that span.blah and p.blah are correctly treated as distinct.


Quote:
Originally Posted by mattmc View Post
Or if you have selectors #super and .duper, and element <p class="duper" id="super"> then both rules would apply to that element.
My recollection is that multiple rules matching a single element don't cause problems.


Quote:
Originally Posted by mattmc View Post
It's really all dependent on what kind of CSS is being used by the book creator; if you're just using classes then obviously that's fine, I'm just thinking it all the way through to the conclusion.

I suppose if you already got rid of all of the complex selectors, basically anything matching /[+~\[\] ]/, then all you have to worry about is IDs and classes? So you could walk the DOM with that in mind, I suppose.
Exactly. You can trivially get rid of the complex selectors by just replacing them with an arbitrary class name, then getting all the elements that match the original selector, and adding that class name to each of them. From there, the remaining mess is ensuring that each element has exactly one class name.
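The replacement step described here can be sketched roughly as follows. This is an illustration under assumptions, not anyone's actual tool: the rule objects (`selector`, `declarations`), the `gen-N` class names, and both function names are invented for the example, and comma-separated selector lists are assumed to have been split beforehand. The `doc` argument is anything exposing the standard DOM `querySelectorAll` method.

```javascript
// Same "complex selector" test as in the earlier post.
var COMPLEX = /[+~\[\] ]/;

// Swap every complex selector for a generated class selector and remember
// which original selector each generated class stands in for.
function simplifyRules(rules) {
    var mapping = [];
    var simplified = rules.map(function (rule) {
        var selector = rule.selector.trim();
        if (!COMPLEX.test(selector)) {
            return rule; // already simple, leave untouched
        }
        var className = 'gen-' + (mapping.length + 1); // hypothetical naming scheme
        mapping.push({ original: selector, className: className });
        return { selector: '.' + className, declarations: rule.declarations };
    });
    return { rules: simplified, mapping: mapping };
}

// Add each generated class to every element the original selector matched.
function tagElements(doc, mapping) {
    mapping.forEach(function (entry) {
        var matches = doc.querySelectorAll(entry.original);
        Array.prototype.forEach.call(matches, function (el) {
            el.classList.add(entry.className);
        });
    });
}

var out = simplifyRules([
    { selector: 'div p', declarations: 'color: red' },
    { selector: '.note', declarations: 'font-style: italic' }
]);
console.log(out.rules, out.mapping);
```

One caveat worth keeping in mind: a generated class selector does not have the same specificity as the selector it replaces, so rule order may need checking afterwards.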
Old 08-19-2015, 08:36 AM   #40
Turtle91
A Hairy Wizard
Posts: 3,336
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Quote:
Originally Posted by dgatwood View Post
Exactly. You can trivially get rid of the complex selectors by just replacing them with an arbitrary class name, then getting all the elements that match the original selector, and adding that class name to each of them. From there, the remaining mess is ensuring that each element has exactly one class name.
I can't add anything to this discussion...way beyond me... but I have a question.

Aren't multiple classes defined in the ePub spec? Shouldn't a compliant reader/app be able to handle <p class="super duper"> in exactly the way you describe...combining the two classes' CSS with the class listed first given higher priority? I rarely use multiple classes, but it seems Sigil and Marvin handle them just fine.

Is this just a workaround for Kindle's shortcomings rather than an ePub issue??

Cheers,
Old 08-19-2015, 06:48 PM   #41
mattmc
Connoisseur
Quote:
Originally Posted by dgatwood View Post
Exactly. You can trivially get rid of the complex selectors by just replacing them with an arbitrary class name, then getting all the elements that match the original selector, and adding that class name to each of them. From there, the remaining mess is ensuring that each element has exactly one class name.
Excellent summary.

I'm building out a Node module on GitHub that I dubbed allscribe, intended to serve as a library for various ebook processing tasks. This KF7 business is the first thing I'm adding to it.

The first part (selector simplification into classes) is done, and the tests are passing. I'll be moving on to the class de-duping next, and I can report when I'm done. The idea is that you can do something like this:

Code:
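// Open the source book, then work on a copy so the original stays untouched.
var book = openEpub('/Documents/Books/Magnum-Opus');
var clone = book.clone(book.path + '_copy');
// Flatten the copy's CSS into KF7-friendly selectors.
clone.simplifyCSS();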
(Not that you need this lib, dgatwood, what with your whole perl setup, but maybe someone will find it useful.)

Quote:
Originally Posted by Turtle91 View Post
Aren't multiple classes defined in the ePub spec? Shouldn't a compliant reader/app be able to handle <p class="super duper"> in exactly the way you describe...combining the two classes' CSS with the class listed first given higher priority? I rarely use multiple classes, but it seems Sigil and Marvin handle them just fine.

Is this just a workaround for Kindle's shortcomings rather than an ePub issue??
Yep, precisely. Obviously this thread is about ePubs originally, but we sorta segued into how to format ePubs in a way that they can be converted into KF7 without breaking into a million pieces. Here was dgatwood's earlier post:

Quote:
Originally Posted by dgatwood View Post
I'm not producing the KF7 content myself. That list is what I had to do to my EPUB source just to make the latest version of Kindlegen convert it properly to KF7/KF8. For KF7, Kindlegen converts the CSS into HTML markup, but does it very badly, with a CSS parser that doesn't properly handle selectors containing multiple elements, multiple class names on single elements, lists of selectors applied to a single rule set, etc., resulting in all sorts of joy for those of us who routinely use nontrivial CSS.
Ah, dgatwood (er, can I just say David? I read some of your blog, btw), I noticed in this earlier post that you said "lists of selectors applied to a single rule set", as in body, span, p { color:red; }, but then later you said comma-separated selectors weren't an issue. I'm guessing your memory's hazy on this one? I'm obviously relying on your experience for all this, so forgive me for being such a lawyer over your comments.

I guess if I really wanted to get serious about this, I'd build a testing framework that ran things through Kindlegen and then somehow examined the KF7 markup, which could be used to write unit tests for all this. Frankly, I wouldn't mind except that extracting data from an old .MOBI file seems super arcane. There ain't no JS library for that one, son.
Old 08-22-2015, 02:21 PM   #42
dgatwood
Curmudgeon
Quote:
Originally Posted by mattmc View Post
Ah, dgatwood (er, can I just say David? I read some of your blog, btw), I noticed in this earlier post that you said "lists of selectors applied to a single rule set", as in body, span, p { color:red; }, but then later you said comma-separated selectors weren't an issue. I'm guessing your memory's hazy on this one? I'm obviously relying on your experience for all this, so forgive me for being such a lawyer over your comments.
I think my first comment was erroneous, and I think that comma-delimited selectors are okay, but I'm not 100% certain. It has been many months since I analyzed what was going on with kindlegen.


Quote:
Originally Posted by mattmc View Post
I guess if I really wanted to get serious about this, I'd build a testing framework that ran things through Kindlegen and then somehow examined the KF7 markup, which could be used to write unit tests for all this. Frankly, I wouldn't mind except that extracting data from an old .MOBI file seems super arcane. There ain't no JS library for that one, son.
That's for sure.
Old 08-24-2015, 06:05 PM   #43
mattmc
Connoisseur
Quote:
Originally Posted by dgatwood View Post
I think my first comment was erroneous, and I think that comma-delimited selectors are okay, but I'm not 100% certain. It has been many months since I analyzed what was going on with kindlegen.
Hm, okay cool. I'll assume for now that this is fine. Thanks!

It actually makes it a little easier, because instead of cloning rules for the class-combination selectors, I can just add the combo selector to any rule that has one of the component selectors. So:

Code:
<span class="apple banana">Hello World!</span>

span.apple {
    color: red;
}
Becomes:

Code:
<span class="apple-banana">Hello World!</span>

span.apple,
.apple-banana {
    color: red;
}
It's just easier than copying rules and declarations around wholesale...
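For readers who want to see the idea in code, here is a minimal sketch of that merge. It is not allscribe's actual implementation: the rule shape (`selectors`, `declarations`) and the function names are assumptions made for the example, and it assumes the hyphen-joined name does not collide with a class that already exists in the book.

```javascript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Collapse a multi-class attribute into one class name, and append the
// combined selector to every rule that mentions one of the component classes.
function mergeClasses(classAttr, rules) {
    var names = classAttr.split(/\s+/).filter(Boolean);
    if (names.length < 2) {
        return { className: classAttr.trim(), rules: rules }; // nothing to merge
    }
    var merged = names.join('-');
    var updated = rules.map(function (rule) {
        var usesComponent = rule.selectors.some(function (selector) {
            return names.some(function (name) {
                // ".apple" must not match inside ".apple-banana" or ".apples"
                var pattern = new RegExp('\\.' + escapeRegExp(name) + '(?![\\w-])');
                return pattern.test(selector);
            });
        });
        var alreadyListed = rule.selectors.indexOf('.' + merged) !== -1;
        if (!usesComponent || alreadyListed) {
            return rule;
        }
        return {
            selectors: rule.selectors.concat('.' + merged),
            declarations: rule.declarations
        };
    });
    return { className: merged, rules: updated };
}

var result = mergeClasses('apple banana', [
    { selectors: ['span.apple'], declarations: 'color: red' },
    { selectors: ['p.cherry'], declarations: 'color: blue' }
]);
console.log(result.className);          // apple-banana
console.log(result.rules[0].selectors); // [ 'span.apple', '.apple-banana' ]
```

As in the example above, the combined selector drops the element name, so it relies on the merged class only ever being written onto elements that the original rule would have matched.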

Anyway, I was sick for a couple of days, but I have it working now. At least all of my unit tests are passing. Assuming you have gorilla.epub you can do:

Code:
var allscribe = require('allscribe');

var gorilla = allscribe.openEpub('gorilla.epub');
gorilla.process(function(unpacked){
    unpacked.simplifyCssAndMarkup(); // handles multi-element and such complex selectors
    unpacked.mergeMarkupClasses();   // handles multiple classes on an element
    return 'silverback';
});
That'll get you silverback.epub with the various modifications done. Whew.

Anyway, I'll be incorporating all this into my current workflow, start testing how my book looks on K1, etc. But it's a good start, I think. Too bad it's probably not all that accessible to non-scripters.

(I would ask you to give it a shot, seeing that you are a scripter, but all your stuff is custom anyway.)
Old 08-26-2015, 10:51 PM   #44
mattmc
Connoisseur
Quote:
Originally Posted by dgatwood View Post
You sure it wasn't just a specificity issue? The iBooks stylesheets throw a lot of stuff into universal selectors like this:

Code:
:root[__ibooks_internal_theme*="Night"] * { ... }
Try this:

Code:
body element.class, element.class * { ... }
which gives you the same specificity as the built-in rules, and by virtue of being later, wins. (The universal selector in the second part is so that it applies to all elements inside. This overrides the universal selector for those. Do not include that for styles that set relative font sizes, obviously, or for box model stuff.)
By the by, I did try this out, and it didn't work on iBooks for OSX v1.1.1. Unfortunately, while I can Web Inspect the body of the book, I can't seem to do that for the popovers, so I can't tell what is going on exactly. It may just be that the HTML is being fed into some custom function that renders with a certain set of CSS and doesn't even pull in my main stylesheets, for example.

Any other ideas? I also tried * element.class, in case it's "not in a body".
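One thing that may be worth checking is the specificity arithmetic itself. The sketch below is a rough counter written for this example (the function names are made up, and it ignores things like `:not()`); it follows the standard CSS rule of counting IDs, then classes/attributes/pseudo-classes, then element names. Whether the built-in iBooks rule and the book's rule really compete on specificity alone is an assumption the thread does not settle.

```javascript
// Rough specificity of one selector (no commas): [ids, classes+attrs+pseudo-classes, elements].
function specificity(selector) {
    var counts = [0, 0, 0];
    // Remove every match of pattern, adding one to the given slot per match.
    function strip(text, pattern, slot) {
        return text.replace(pattern, function () {
            counts[slot] += 1;
            return ' ';
        });
    }
    var rest = selector;
    rest = strip(rest, /\[[^\]]*\]/g, 1); // attribute selectors (first, so their contents are ignored)
    rest = strip(rest, /::[\w-]+/g, 2);   // pseudo-elements count like elements
    rest = strip(rest, /:[\w-]+/g, 1);    // pseudo-classes such as :root
    rest = strip(rest, /#[\w-]+/g, 0);    // IDs
    rest = strip(rest, /\.[\w-]+/g, 1);   // classes
    strip(rest, /[a-zA-Z][\w-]*/g, 2);    // remaining element names; * counts for nothing
    return counts;
}

// Negative when a is less specific than b, positive when more, 0 when equal.
function compareSpecificity(a, b) {
    for (var i = 0; i < 3; i += 1) {
        if (a[i] !== b[i]) {
            return a[i] - b[i];
        }
    }
    return 0;
}

var builtIn = specificity(':root[__ibooks_internal_theme*="Night"] *');
var override = specificity('body element.class');
console.log(builtIn, override, compareSpecificity(override, builtIn));
```

By this count the built-in selector comes out at 0,2,0 and `body element.class` at 0,1,2, so the override would lose on specificity regardless of source order. Adding a second class or attribute selector to the override would be one way to test that theory.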
Old 08-27-2015, 02:16 AM   #45
dgatwood
Curmudgeon
Quote:
Originally Posted by mattmc View Post
By the by, I did try this out, and it didn't work on iBooks for OSX v1.1.1. Unfortunately, while I can Web Inspect the body of the book, I can't seem to do that for the popovers, so I can't tell what is going on exactly. It may just be that the HTML is being fed into some custom function that renders with a certain set of CSS and doesn't even pull in my main stylesheets, for example.
Very possible. Have you tried looking at the stylesheets inside the app?

Control-click on the app bundle and choose "Show Package Contents". Then go to Contents, PlugIns, then control-click on BKAssetEpub.bundle, choose "Show Package Contents", then Contents, Resources, and open the various .css.tmpl files.