OK, I downloaded a smaller version of the articles, but it turned out to be just ONE big XML file, so there's nothing to do with that one.
I'm still extracting the big 19 GB file with all the articles. After that I have to remove the "user comments" and other extra stuff that we don't need, and then I'll start converting the HTML files.
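For the cleanup step, something like this rough sketch might do. It only uses Python's standard library and drops every element carrying a given CSS class, together with everything nested inside it. I haven't looked at the extracted pages yet, so the class name "user-comments" and the helper names (CommentStripper, strip_blocks) are made up for illustration. It also assumes the tags inside the dropped block are properly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CommentStripper(HTMLParser):
    """Re-emits the HTML it is fed, minus elements with a given class."""

    def __init__(self, drop_class):
        # convert_charrefs=False keeps entities such as &amp; intact.
        super().__init__(convert_charrefs=False)
        self.drop_class = drop_class  # hypothetical class name, adjust to the real markup
        self.out = []
        self.skip_depth = 0  # > 0 while we are inside a dropped element

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.drop_class in classes:
            if tag not in VOID_TAGS:
                self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> never open a nested level.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")


def strip_blocks(html, drop_class="user-comments"):
    """Return html with every element of class drop_class removed."""
    parser = CommentStripper(drop_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Running strip_blocks over each file's text and writing the result back out would be the whole cleanup pass, once the real class or id of the comment block is known. HTML comments (the `<!-- -->` kind) are dropped too, since the sketch doesn't re-emit them.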
I still have to decide whether I'll use html2epub directly (I'd have to learn how to run it from the command prompt) or calibre's GUI.
If anybody knows another way to easily convert a bunch of HTML files to another format, please let me know.
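One option I may try is calibre's command-line tool, ebook-convert, which takes an input file and an output file and picks the format from the output extension. Below is a minimal sketch of a batch run over a folder, assuming calibre is installed and ebook-convert is on the PATH. The function names and the folder names are placeholders I made up, not anything that exists yet.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(src_dir, out_dir, ext="epub"):
    """Build one ebook-convert command per .html file in src_dir."""
    commands = []
    for html_file in sorted(Path(src_dir).glob("*.html")):
        target = Path(out_dir) / f"{html_file.stem}.{ext}"
        # ebook-convert infers the output format from the target's extension.
        commands.append(["ebook-convert", str(html_file), str(target)])
    return commands


def convert_all(src_dir, out_dir, ext="epub"):
    """Run the conversions one by one, stopping at the first failure."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(src_dir, out_dir, ext):
        subprocess.run(command, check=True)
```

Calling convert_all("articles_html", "articles_epub") would then produce one EPUB per HTML file. Keeping the command list separate from the part that runs it makes it easy to print the commands first and check them before starting a long conversion.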