Old 07-14-2015, 01:29 PM   #1
playfetch
Junior Member
playfetch began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jul 2015
Device: Kindle Paperwhite
Create eBook from MDN & FB Developer sites via Calibre

Hello!

I'm trying to figure out how to convert a couple of websites into eBooks for easy reading on my Kindle Paperwhite.

For example, there are two in particular that I would like to convert:

1. Mozilla Developer Network's JavaScript guides:
https://developer.mozilla.org/en-US/docs/Web/JavaScript (all pages on the left panel).

2. Facebook's React Native guide:
https://facebook.github.io/react-nat...d.html#content

I've tried setting up a custom news source within the calibre software, but I get garbled text, presumably from the JavaScript. I have also had trouble keeping the code snippets formatted properly (they get reduced to plain text).

Can someone help me set up my scripts to make this work for MDN & FB?

I'll include the script for MDN & a part of the output below, so you can see what's happening.

Thanks a lot!

SCRIPT (partial)

#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1436894783(BasicNewsRecipe):
    title = 'MDN JavaScript Guide'
    oldest_article = 9999
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('Introduction', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Introduction'),
        ('Grammar & Types', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Grammar_and_Types'),
        ('Control flow and error handling', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Control_flow_and_error_handling'),
        ('Loops and iteration', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Loops_and_iteration'),
        ('Functions', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Functions'),
        ('Expressions and operators', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Expressions_and_Operators'),
        ('Numbers and dates', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Numbers_and_dates'),
        ('Text formatting', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Text_formatting'),
        ('Regular Expressions', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions'),
        ('Indexed collections', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Indexed_collections'),
        ('Keyed collections', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Keyed_collections'),
        ('Working with objects', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Working_with_Objects'),
        ('Details of the object model', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Details_of_the_Object_Model'),
        ('Iterators and generators', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_generators'),
        ('Meta programming', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Meta_programming'),
    ]

    # More feeds would go here, unless we could automate the download of
    # everything contained within developer.mozilla.org/en-US/docs/Web/JavaScript/
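
The automation wished for in the comment above could be approached by collecting the links from an index page instead of listing every URL by hand. What follows is only a sketch in Python 3 using the standard library, not a tested recipe: `LinkCollector`, `collect_doc_links` and `to_index` are hypothetical names, and the sample HTML is invented for illustration. The list-of-tuples shape returned by `to_index` follows what calibre documents for `BasicNewsRecipe.parse_index()`, the method meant for sites that are parsed directly rather than read through a feed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect (text, absolute URL) pairs for <a> tags under a URL prefix."""

    def __init__(self, base_url, prefix):
        super().__init__()
        self.base_url = base_url
        self.prefix = prefix
        self.links = []      # (title, url) pairs in document order
        self._seen = set()   # URLs already collected
        self._href = None    # URL of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                # Resolve relative links and drop any #fragment.
                self._href = urldefrag(urljoin(self.base_url, href))[0]
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            url = self._href
            if title and url.startswith(self.prefix) and url not in self._seen:
                self._seen.add(url)
                self.links.append((title, url))
            self._href = None


def collect_doc_links(html, base_url, prefix):
    parser = LinkCollector(base_url, prefix)
    parser.feed(html)
    parser.close()
    return parser.links


def to_index(section_title, links):
    """Shape the links the way parse_index() is documented to return them."""
    articles = [{'title': t, 'url': u, 'date': '', 'description': ''}
                for t, u in links]
    return [(section_title, articles)]


# Invented sample markup standing in for a downloaded index page.
SAMPLE = '''
<ul>
  <li><a href="/en-US/docs/Web/JavaScript/Guide/Functions">Functions</a></li>
  <li><a href="/en-US/docs/Web/JavaScript/Guide/Functions#top">Functions</a></li>
  <li><a href="/en-US/docs/Web/CSS">CSS</a></li>
</ul>
'''
BASE = 'https://developer.mozilla.org/en-US/docs/Web/JavaScript'
index = to_index('MDN JavaScript Guide',
                 collect_doc_links(SAMPLE, BASE, BASE + '/'))
print(index)
```

Inside a recipe, the HTML would come from the live page (recipes commonly fetch it with `self.index_to_soup(url)`) and the resulting list would be returned from `parse_index()` in place of the `feeds` list. Whether that alone cures the garbled output is not established in this thread.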



OUTPUT (partial)
US/docs/Web/JavaScript/Reference/Statements/for...of" title="The for...of statement creates a loop Iterating over iterable objects (including Array, Map, Set, arguments object and so on), invoking a custom iteration hook with statements to be executed for the value of each distinct property."><code>for..of</code></a> construct. Some built-in types, such as <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array" title="The JavaScript Array global object is a constructor for arrays, which are high-level, list-like objects."><code>Array</code></a> or <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map" title="The Map object is a simple key/value map. Any value (both objects and primitive values) may be used as either a key or a value."><code>Map</code></a>, have a default iteration behavior, while other types (such as <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object" title="The Object constructor creates an object wrapper."><code>Object</code></a>) do not.</p> <p>In order to be <strong>iterable</strong>, an object must implement the <strong>@@iterator</strong> method, meaning that the object (or one of the objects up its <a href="/en-US/docs/Web/JavaScript/Guide/Inheritance_and_the_prototype_chain">prototype chain</a>) must have a property with a <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/iterator" title="The Symbol.iterator well-known symbol specifies the default iterator for an object. 
Used by for...of."><code>Symbol.iterator</code></a> key:</p> <h3 id="User-defined_iterables">User-defined iterables</h3> <p>We can make our own iterables like this:</p> <pre class="brush: js">var myIterable = {} myIterable[Symbol.iterator] = function* () { yield 1; yield 2; yield 3; }; [...myIterable] // [1, 2, 3] </pre> <h3 id="Built-in_iterables">Built-in iterables</h3> <p><a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/String" title="The String global object is a constructor for strings, or a sequence of characters."><code>String</code></a>, <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array" title="The JavaScript Array global object is a constructor for arrays, which are high-level, list-like objects."><code>Array</code></a>, <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray" title="A TypedArray object describes an array-like view of an underlying binary data buffer. There is no global property named TypedArray, nor is there a directly visible TypedArray constructor. Instead, there are a number of different global properties, whose values are typed array constructors for specific element types, listed below. On the following pages you will find common properties and methods that can be used with any typed array containing elements of any type."><code>TypedArray</code></a>, <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map" title="The Map object is a simple key/value map. 
Any value (both objects and primitive values) may be used as either a key or a value."><code>Map</code></a> and <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set" title="The Set object lets you store unique values of any type, whether primitive values or object references."><code>Set</code></a> are all built-in iterables, because the prototype objects of them all have a <a href="/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/iterator" title="The Symbol.iterator well-known symbol specifies the default iterator for an object. Used by for...of."><code>Symbol.iterator</code></a> method.</p> <h3 id="Syntaxes_expecting_iterables">Syntaxes expecting iterables</h3> <p>Some statements and expressions are expecting iterables, for example the <code><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...of">for-of</a></code> loops, <a href="https://developer.mozilla.org/en
// "b" // "c" [..."abc"] // ["a", "b", "c"] function* gen(){ yield* ["a", "b", "c"] } gen().next() // { value:"a", done:false } [a, b, c] = new Set(["a", "b", "c"]) a // "a" </pre> <h2 id="Generators">Generators</h2> <p>While custom iterators are a useful tool, their creation requires careful programming due to the need to explicitly maintain their internal state. Generators provide a powerful alternative: they allow you to define an iterative algorithm by writing a single function which can maintain its own state.</p> <p>A generator is a special type of function that works as a factory for iterators. A function becomes a generator if it contains one or more <code>yield</code> expressions and if it uses the <code>function*</code> syntax.</p> <pre class="brush: js">function* idMaker(){ var index = 0; while(true) yield index++; } var gen = idMaker(); console.log(gen.next().value); // 0 console.log(gen.next().value); // 1 console.log(gen.next().value); // 2 // ...</pre> <h2 id="Advanced_generators">Advanced generators</h2> <p>Generators compute their yielded values on demand, which allows them to efficiently represent sequences that are expensive to compute, or even infinite sequences as demonstrated above.</p> <p>The <code>next()</code> method also accepts a value which can be used to modify the internal state of the generator. 
A value passed to <code>next()</code> will be treated as the result of the last <code>yield</code> expression that paused the generator.</p> <p>Here is the fibonacci generator using <code>next(x)</code> to restart the sequence:</p> <pre class="brush: js">function* fibonacci(){ var fn1 = 1; var fn2 = 1; while (true){ var current = fn2; fn2 = fn1; fn1 = fn1 + current; var reset = yield current; if (reset){ fn1 = 1; fn2 = 1; } } } var sequence = fibonacci(); console.log(sequence.next().value); // 1 console.log(sequence.next().value); // 1 console.log(sequence.next().value); // 2 console.log(sequence.next().value); // 3 console.log(sequence.next().value); // 5 console.log(sequence.next().value); // 8 console.log(sequence.next().value); // 13 console.log(sequence.next(true).value); // 1 console.log(sequence.next().value); // 1 console.log(sequence.next().value); // 2 console.log(sequence.next().value); // 3</pre> <div class="note"><strong>Note:</strong> As a point of interest, calling <code>next(undefined)</code> is equivalent to calling <code>next()</code>. However, starting a newborn generator with any value other than undefined when calling <code>next()</code> will result in a <code>TypeError</code> exception.</div> <p>You can force a generator to throw an exception by calling its <code>throw()</code> method and passing the exception value it should throw. 
This exception will be thrown from the current suspended context of the generator, as if the <code>yield</code> that is currently suspended were instead a <code>throw <em>value</em></code> statement.</p> <p>If a <code>yield</code> is not encountered during the processing of the thrown exception, then the exception will propagate up through the call to <code>throw()</code>, and subsequent calls to <code>next()</code> will result in the <code>done</code> property being <code>true</code>.</p> <p>Generators have a <code>return(value)</code> method that returns the given value and finishes the generator itself.</p> <h2 id="Generator_comprehensions">Generator comprehensions</h2> <p>A significant drawback of <a href="/en-US/docs/Web/JavaScript/Reference/Operators/Array_comprehensions" title="en/JavaScript/Guide/Predefined Core Objects#Array comprehensions">array comprehensions</a> is that they cause an entire new array to be constructed in memory. When the input to the comprehension is itself a small array the overhead involved is insignificant — but when the input is a large array or an expensive (or indeed infinite) generator the creation of a new array can be problematic.</p> <p>Generators enable lazy computation of sequences, with items calculated on-demand as they are needed. <a href="/en-
Old 07-14-2015, 01:47 PM   #2
playfetch
FYI, this is the code that I have for the Facebook React Guide eBook, which has the same issue: the JavaScript is not fetched properly.


#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1436895312(BasicNewsRecipe):
    title = 'Facebook React Guide'
    oldest_article = 999
    max_articles_per_feed = 100
    auto_cleanup = True

    feeds = [
        ('Getting Started', 'https://facebook.github.io/react-native/docs/getting-started.html#content'),
        ('Tutorial', 'https://facebook.github.io/react-native/docs/tutorial.html#content'),
        ('Guides: Style', 'https://facebook.github.io/react-native/docs/style.html#content'),
        ('Guides: Gesture Responder System', 'https://facebook.github.io/react-native/docs/gesture-responder-system.html#content'),
        ('Guides: Animations', 'https://facebook.github.io/react-native/docs/animations.html#content'),
        ('Guides: Accessibility', 'https://facebook.github.io/react-native/docs/accessibility.html#content'),
        ('Guides: Native Modules (iOS)', 'https://facebook.github.io/react-native/docs/nativemodulesios.html#content'),
        ('Guides: Native UI Components (iOS)', 'https://facebook.github.io/react-native/docs/nativecomponentsios.html#content'),
        ('Guides: Direct Manipulation', 'https://facebook.github.io/react-native/docs/direct-manipulation.html#content'),
        ('Guides: Linking Libraries', 'https://facebook.github.io/react-native/docs/linking-libraries.html#content'),
        ('Guides: Debugging', 'https://facebook.github.io/react-native/docs/debugging.html#content'),
        ('Guides: Testing', 'https://facebook.github.io/react-native/docs/testing.html#content'),
        ('Guides: Running On Device', 'https://facebook.github.io/react-native/docs/runningondevice.html#content'),
        ('Guides: Integration with Existing App', 'https://facebook.github.io/react-native/docs/embedded-app.html#content'),
        ('Guides: JavaScript Environment', 'https://facebook.github.io/react-native/docs/javascript-environment.html#content'),
        ('Guides: Navigator Comparison', 'https://facebook.github.io/react-native/docs/navigator-comparison.html#content'),
        ('Components: ActivityIndicatorIOS', 'https://facebook.github.io/react-native/docs/activityindicatorios.html#content'),
        ('Components: DatePickerIOS', 'https://facebook.github.io/react-native/docs/datepickerios.html#content'),
        ('Components: Image', 'https://facebook.github.io/react-native/docs/image.html#content'),
        ('Components: ListView', 'https://facebook.github.io/react-native/docs/listview.html#content'),
        ('Components: MapView', 'https://facebook.github.io/react-native/docs/mapview.html#content'),
        ('Components: Navigator', 'https://facebook.github.io/react-native/docs/navigator.html#content'),
        ('Components: NavigatorIOS', 'https://facebook.github.io/react-native/docs/navigatorios.html#content'),
        ('Components: PickerIOS', 'https://facebook.github.io/react-native/docs/pickerios.html#content'),
        ('Components: ScrollView', 'https://facebook.github.io/react-native/docs/scrollview.html#content'),
        ('Components: SegmentedControlIOS', 'https://facebook.github.io/react-native/docs/segmentedcontrolios.html#content'),
        ('Components: SliderIOS', 'https://facebook.github.io/react-native/docs/sliderios.html#content'),
        ('Components: SwitchIOS', 'https://facebook.github.io/react-native/docs/switchios.html#content'),
        ('Components: TabBarIOS', 'https://facebook.github.io/react-native/docs/tabbarios.html#content'),
        ('Components: TabBarIOS.item', 'https://facebook.github.io/react-native/docs/tabbarios-item.html#content'),
        ('Components: Text', 'https://facebook.github.io/react-native/docs/text.html#content'),
        ('Components: TextInput', 'https://facebook.github.io/react-native/docs/textinput.html#content'),
        ('Components: TouchableHighlight', 'https://facebook.github.io/react-native/docs/touchablehighlight.html#content'),
        ('Components: TouchableOpacity', 'https://facebook.github.io/react-native/docs/touchableopacity.html#content'),
        ('Components: TouchableWithoutFeedback', 'https://facebook.github.io/react-native/docs/touchablewithoutfeedback.html#content'),
        ('Components: View', 'https://facebook.github.io/react-native/docs/view.html#content'),
        ('Components: WebView', 'https://facebook.github.io/react-native/docs/webview.html#content'),
        ('APIs: ActionSheetIOS', 'https://facebook.github.io/react-native/docs/actionsheetios.html#content'),
        ('APIs: AlertIOS', 'https://facebook.github.io/react-native/docs/alertios.html#content'),
        ('APIs: AppRegistry', 'https://facebook.github.io/react-native/docs/appregistry.html#content'),
        ('APIs: AppStateIOS', 'https://facebook.github.io/react-native/docs/appstateios.html#content'),
        ('APIs: AsyncStorage', 'https://facebook.github.io/react-native/docs/asyncstorage.html#content'),
        ('APIs: CameraRoll', 'https://facebook.github.io/react-native/docs/cameraroll.html#content'),
        ('APIs: InteractionManager', 'https://facebook.github.io/react-native/docs/interactionmanager.html#content'),
        ('APIs: LayoutAnimation', 'https://facebook.github.io/react-native/docs/layoutanimation.html#content'),
        ('APIs: LinkingIOS', 'https://facebook.github.io/react-native/docs/linkingios.html#content'),
        ('APIs: NetInfo', 'https://facebook.github.io/react-native/docs/netinfo.html#content'),
        ('APIs: PanResponder', 'https://facebook.github.io/react-native/docs/panresponder.html#content'),
        ('APIs: PixelRatio', 'https://facebook.github.io/react-native/docs/pixelratio.html#content'),
        ('APIs: PushNotificationIOS', 'https://facebook.github.io/react-native/docs/pushnotificationios.html#content'),
        ('APIs: StatusBarIOS', 'https://facebook.github.io/react-native/docs/statusbarios.html#content'),
        ('APIs: StyleSheet', 'https://facebook.github.io/react-native/docs/stylesheet.html#content'),
        ('APIs: VibrationIOS', 'https://facebook.github.io/react-native/docs/vibrationios.html#content'),
        ('Polyfills: Flexbox', 'https://facebook.github.io/react-native/docs/flexbox.html#content'),
        ('Polyfills: Geolocation', 'https://facebook.github.io/react-native/docs/geolocation.html#content'),
        ('Polyfills: Network', 'https://facebook.github.io/react-native/docs/network.html#content'),
        ('Polyfills: Timers', 'https://facebook.github.io/react-native/docs/timers.html#content'),
    ]
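
Every URL in the list above follows the same pattern, so the list could also be generated from (title, slug) pairs rather than typed out in full. A minimal sketch in plain Python with no calibre dependency; `URL_TEMPLATE`, `PAGES` and `make_feeds` are hypothetical names introduced here, and only three of the entries are shown:

```python
URL_TEMPLATE = 'https://facebook.github.io/react-native/docs/{slug}.html#content'

# (title, slug) pairs; a subset of the entries listed above.
PAGES = [
    ('Getting Started', 'getting-started'),
    ('Tutorial', 'tutorial'),
    ('Guides: Style', 'style'),
]


def make_feeds(pages, template=URL_TEMPLATE):
    """Expand (title, slug) pairs into (title, url) tuples."""
    return [(title, template.format(slug=slug)) for title, slug in pages]


feeds = make_feeds(PAGES)
for title, url in feeds:
    print(title, url)
```

Inside the recipe class the result would simply be assigned to `feeds`. This only shortens the listing; it does not change what calibre downloads.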
Old 07-14-2015, 10:04 PM   #3
kovidgoyal
creator of calibre
Posts: 45,251
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I don't actually see what is garbled in your output. Are you saying that the output contains HTML tags instead of normal text?

If so, you can implement preprocess_raw_html() in your recipe to fix the parsing, something like this:

Code:
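    def preprocess_raw_html(self, raw, url):
        # Re-parse the downloaded page with html5lib and return well-formed markup
        from lxml import etree
        import html5lib
        from calibre.utils.cleantext import clean_xml_chars
        root = html5lib.parse(
            clean_xml_chars(raw), treebuilder='lxml',
            namespaceHTMLElements=False)
        # `unicode` is the Python 2 text type, matching the python2 recipe
        return etree.tostring(root, encoding=unicode)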
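
The snippet above passes the raw page through `clean_xml_chars` before parsing. As an illustration of what a cleaning step of that kind does, the sketch below (Python 3, standard library only) removes characters that the XML 1.0 specification does not allow. `strip_invalid_xml_chars` is a hypothetical name, and this is not calibre's implementation, whose details may differ.

```python
import re

# Anything outside the XML 1.0 "Char" production is matched:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML = re.compile(
    '[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]')


def strip_invalid_xml_chars(text):
    """Return text with the characters that XML 1.0 forbids removed."""
    return _INVALID_XML.sub('', text)


# A NUL byte and a vertical tab are dropped; newline and tab survive.
raw = '<pre>var x = 1;\x00\x0b\n\tyield x;</pre>'
print(repr(strip_invalid_xml_chars(raw)))
```

Stray control characters like these can make an XML serializer reject otherwise valid text, which is presumably why the cleaning happens before the parsed tree is serialized.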
Old 07-15-2015, 12:58 PM   #4
playfetch
I still can't seem to get it to work. I've added your code to my script, but I'm still getting raw HTML rather than properly formatted results.

Here is a link to my recipe. Would you mind taking a look and letting me know what's wrong with it? Thanks so much!

https://www.dropbox.com/s/djp9lodqx2...02.recipe?dl=0