Old 11-22-2010, 03:04 AM   #1
nkormanik
Enthusiast
Posts: 26
Karma: 10
Join Date: Oct 2010
Device: none
Question: Deleting html links to sites in document

I'm putting together a personal e-book from a collection of HTML pages I gathered from the Internet.

I merged these pages into one large HTML document. The merged document contains many links to outside sites. When I open it in my web browser, I can see at the bottom of the screen a lot of downloading from those various sites.

My question is: what's the easiest way to remove all of those links to the outside?

One way I found is to use an editor and simply remove "http://" wherever it occurs in the HTML document.

Is that the best way?

Thanks,
Nicholas Kormanik
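
For reference, here is a rough sketch of what deleting "http://" actually does, using only Python's standard library. The sample markup and the base path are made up for illustration. The `<a>` tag stays in place and its href turns into a relative path that resolves next to the local file, so the link ends up broken rather than removed, and any "https://" addresses are not touched at all.

```python
from urllib.parse import urljoin, urlparse


def strip_scheme(html: str) -> str:
    """Mimic the editor approach: delete every literal 'http://'."""
    return html.replace("http://", "")


# Hypothetical sample markup, not taken from the original document.
sample = (
    '<a href="http://www.example.com/page.html">Example</a> '
    '<a href="https://www.example.org/">Secure</a>'
)
stripped = strip_scheme(sample)

# The first href is now "www.example.com/page.html", which has no scheme,
# so a browser treats it as a path relative to the current document.
base = "file:///home/user/book/merged.html"  # hypothetical location of the merged file
resolved = urljoin(base, "www.example.com/page.html")

print(stripped)
print(resolved)
print(urlparse(resolved).scheme)
```

So this stops the fetching for plain http addresses, but the anchors themselves remain in the document as dead links.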
Old 11-22-2010, 03:06 AM   #2
nkormanik
I was hoping Calibre would have some auto-stripping function for HTML import that would automatically eliminate such links.
Old 11-22-2010, 04:35 AM   #3
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
If you want to remove anything from your document, you can always use the header/footer removal regexes in structure detection.
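
As a rough illustration of the regex idea, here is a sketch using Python's `re` module on its own, outside calibre. The pattern and the function name `unwrap_external_links` are assumptions made for this example, and whether the same pattern behaves identically when pasted into calibre's dialog is not something the thread confirms. It unwraps anchors whose href starts with http:// or https://, keeping the link text, and leaves local links alone. Regexes are fragile on messy HTML, so treat it as a starting point.

```python
import re

# Matches <a ... href="http(s)://..." ...>text</a> and captures the text.
# Assumes the href value is quoted and the anchor is not nested in another anchor.
EXTERNAL_LINK = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*["\']https?://[^"\']*["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def unwrap_external_links(html: str) -> str:
    """Replace each external anchor with just its inner text."""
    return EXTERNAL_LINK.sub(r"\1", html)


# Hypothetical sample markup for illustration.
sample = (
    '<p>See <a href="http://example.com/a">this page</a> '
    'and <a href="#ch1">chapter 1</a>.</p>'
)
print(unwrap_external_links(sample))
```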
Old 11-22-2010, 12:06 PM   #4
kovidgoyal
creator of calibre
Posts: 25,445
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just convert your HTML file; links pointing to non-local resources are not followed by calibre's conversion engine anyway.
Old 11-22-2010, 08:51 PM   #5
nkormanik
Glad to hear Calibre strips the links.

The problem, however, is that if there are lots of links, it takes Calibre a long time to convert the file. It seems better to delete them up front, from the HTML document, before bringing it into Calibre.
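
Before deleting anything up front, it may help to see what the document actually points at. The sketch below uses Python's standard `html.parser` to list every `href` or `src` attribute that starts with http:// or https://. The class and function names are made up for this example, and it only reports references; it does not modify the file.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect (tag, attribute, url) for every external href/src."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                if value.lower().startswith(("http://", "https://")):
                    self.refs.append((tag, name, value))


def find_external_refs(html: str):
    finder = ExternalRefFinder()
    finder.feed(html)
    finder.close()
    return finder.refs


# Hypothetical sample markup for illustration.
sample = (
    '<p><a href="http://example.com/">site</a>'
    '<img src="https://example.com/a.png">'
    '<a href="ch2.html">next</a></p>'
)
for tag, attr, url in find_external_refs(sample):
    print(tag, attr, url)
```

Entries with `src` (images, scripts) are the ones a browser fetches on page load, while plain `<a href>` links are only followed when clicked, so the list shows which references are worth removing first.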
Old 11-22-2010, 09:05 PM   #6
nkormanik
Things are rough enough for Calibre to handle. Cut the junk first.
Old 11-22-2010, 09:07 PM   #7
nkormanik
Converting to txt is not the answer, though, as too much formatting is given up.
All times are GMT -4.