08-27-2009, 12:37 AM   #1
Waltarro
Device: Sony PRS505
Extract HTML from EPUB

I got a little tired of manually extracting the HTML from EPUB
files when I just wanted to read a book in a browser. While messing
around with bash I came up with a simple script to do the job.

It's pretty crude. I know I should have read the metadata.opf,
and probably would have if I had done this in Java or Python, but I
thought I would share it anyway. It works on Linux and might work
on a Mac with a few tweaks. Just pass in the EPUB file as the first
parameter.
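
On the Mac point, one tweak I would expect to need: the image copy step in the script relies on `cp -t`, which is a GNU option and is not available in the stock BSD `cp`. Here is a rough sketch of a portable replacement. `copy_images` is a made-up helper name, not something the script already has, and I have only reasoned this through rather than tried it on a Mac.

```shell
#!/bin/bash

# Copy .jpg and .png files from one directory to another without `cp -t`.
# copy_images is a hypothetical helper name used for illustration only.
copy_images() {
  local src=$1 dest=$2 f
  mkdir -p "$dest"
  for f in "$src"/*.jpg "$src"/*.png; do
    # An unmatched glob stays as literal text, so skip anything that is not a real file
    [ -f "$f" ] && cp "$f" "$dest"/
  done
  return 0
}
```

Called as `copy_images /tmp/epub2html/content/resources ./resources`, it should do the same job as the `cp` line in the script.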

Code:
#!/bin/bash

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

bookname=$1
# Unpack the epub (it is just a zip archive) into a scratch directory
unzip "$1" -d /tmp/epub2html > /dev/null

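# Find the first chapter file (named <prefix>_1.html)
str0=$(find /tmp/epub2html/content/* -regex '.*_1\.html' | head -n 1)

# Strip the directory and the trailing "1.html", leaving only the shared prefix
substr=${str0#/tmp/epub2html/content/}
substr=${substr%1.html}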

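# Count the chapter files so the loop knows how far to count
files=$(ls /tmp/epub2html/content/"$substr"*.html | wc -l)

for x in $(seq 0 "$files"); do
  filepart="/tmp/epub2html/content/$substr$x.html"

  # Append each chapter that exists, in numeric order, to <bookname>.html
  if [ -e "$filepart" ]; then
    cat "$filepart" >> "${bookname%.epub}.html"
  fi
done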

# Copy over the images if you want them
mkdir -p resources

cp /tmp/epub2html/content/resources/*.jpg /tmp/epub2html/content/resources/*.png -t ./resources 2> /dev/null

rm -R /tmp/epub2html
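
For anyone who wants the less crude version: instead of guessing at `_1.html`, `_2.html` and so on, the reading order can be taken from the OPF file. The following is only a sketch of that idea in bash. `opf_path` and `spine_hrefs` are names I made up, and it assumes every `<item>`, `<itemref>` and `<rootfile>` tag sits on a single line with double-quoted attributes. A real XML parser would be more robust.

```shell
#!/bin/bash

# Print the OPF location recorded in META-INF/container.xml,
# relative to the directory the epub was unzipped into.
# (opf_path is a hypothetical helper name.)
opf_path() {
  sed -n 's/.*full-path="\([^"]*\)".*/\1/p' "$1/META-INF/container.xml" | head -n 1
}

# Print the manifest hrefs of an OPF file in spine (reading) order.
# Assumes each tag fits on one line; grep cannot match across lines.
# (spine_hrefs is a hypothetical helper name.)
spine_hrefs() {
  local opf=$1 idref
  # 1. Collect the idref of every <itemref> in the order they appear
  grep -o '<itemref[^>]*>' "$opf" |
    sed -n 's/.*idref="\([^"]*\)".*/\1/p' |
    while read -r idref; do
      # 2. Look up the manifest <item> with that id and print its href
      grep -o '<item [^>]*>' "$opf" |
        grep -F " id=\"$idref\"" |
        sed -n 's/.* href="\([^"]*\)".*/\1/p'
    done
}
```

With those in place, the numbering loop could in principle be replaced by reading each line of `spine_hrefs "/tmp/epub2html/$(opf_path /tmp/epub2html)"` and appending the matching file, with each href resolved relative to the directory that holds the OPF.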