Old 11-19-2007, 03:22 PM   #46
kovidgoyal
creator of calibre
Posts: 44,482
Karma: 24495778
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
https://libprs500.kovidgoyal.net/wiki/UserProfiles
Old 11-19-2007, 06:53 PM   #47
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by kovidgoyal
By the way, your wiki reference reminded me that I put a short article about libprs500 in the MobileRead wiki. You may want to flesh it out with more data.

Dale
Old 11-19-2007, 07:00 PM   #48
kovidgoyal
Code:
pydoc str
Look for rpartition
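For anyone who does not read Python, here is a minimal sketch of what str.rpartition returns. The sample strings are made up for illustration; the behaviour shown is standard Python.

```python
# rpartition splits on the LAST occurrence of the separator and
# returns a 3-tuple: (text before, the separator, text after).
parts = 'a/b/c.html'.rpartition('/')
print(parts)      # ('a/b', '/', 'c.html')
print(parts[0])   # a/b

# If the separator is missing, the whole string ends up in the last slot.
no_sep = 'abc'.rpartition('/')
print(no_sep)     # ('', '', 'abc')
```

So index [0] of the result is "everything before the last separator", which is the handy part for trimming the tail off a path or URL.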
Old 11-20-2007, 10:48 AM   #49
Silvayn
Member
Posts: 10
Karma: 10
Join Date: Jun 2007
Location: Slovakia
Device: HTC Touch Diamond, Sony Reader 505
If I understand it correctly, rpartition divides a string into a 3-member array. This doesn't really help me that much, as I don't "speak" Python and it's different from the languages that I know. So... if I could ask some Python-knowledgeable person to give me the exact command for the string conversion... I assume it would cost you about 5 secs of your life.

Thank you in advance... in return I offer (rusty) Pascal & VBScript support.

I need
http://www.sme.sk/c/3592953/Ceskoslovenska-esej.html

to become
http://www.sme.sk/clanok_tlac.asp?cl=3592953

replace('/c/', '/clanok_tlac.asp?cl=') is step one... but after that I'm stuck
Old 11-20-2007, 12:17 PM   #50
kovidgoyal
Ah well, here you go:
Code:
url = 'http://www.sme.sk/c/3592953/Ceskoslovenska-esej.html'.rpartition('/')[0].replace('c/', 'clanok_tlac.asp?cl=')
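The one-liner above can also be spelled out step by step. This is only a sketch: the function name print_url is made up for illustration, and it matches on '/c/' with a count of 1 rather than on 'c/', on the assumption that only the first /c/ path segment should ever change.

```python
def print_url(article_url):
    """Turn an article URL into its print-view URL (hypothetical helper)."""
    # Step 1: drop the trailing slug, i.e. everything after the last '/'.
    head, _sep, _slug = article_url.rpartition('/')
    # Step 2: swap the first '/c/' path segment for the print-page script.
    return head.replace('/c/', '/clanok_tlac.asp?cl=', 1)


url = print_url('http://www.sme.sk/c/3592953/Ceskoslovenska-esej.html')
print(url)  # http://www.sme.sk/clanok_tlac.asp?cl=3592953
```

Matching the slashes on both sides keeps a stray "c/" elsewhere in the address from being rewritten by accident.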
Old 11-21-2007, 04:45 AM   #51
JTravers
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
Text links being dropped

Kovid,
I noticed that web2lrf entirely drops words that have underlying links. This makes some articles a little hard to understand, since key words are sometimes left out.

As an example, in the following article the names "David Beckham," "Adidas," and "Pepsi" are all dropped when it is converted to an LRF.
http://www.nytimes.com/2007/11/17/bu...gewanted=print

I noticed the same thing happens when downloading the html file and running it through html2lrf. I've attached the lrf I generated as an example.

Is there something about linked text that makes it difficult to parse? Or is this simply a bug that needs to be eliminated?

Thanks a lot for your help.

BTW, still trying to get some profiles made. Not knowing Python is proving to be a rather large stumbling block, however.
Attached Files
File Type: lrf 17interview.lrf (7.1 KB, 817 views)

Last edited by JTravers; 11-21-2007 at 04:55 AM. Reason: added lrf attachment
Old 11-21-2007, 10:31 AM   #52
kovidgoyal
That's a bug, actually a regression I introduced a few versions back. It will be fixed in the next release.
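Linked text is ordinary character data sitting inside an <a> element, so nothing about it is inherently hard to parse. Below is a minimal sketch of a check that linked words survive text extraction, using Python's standard html.parser module. This is not libprs500's code; the TextExtractor class, the sample markup and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect all character data, including text inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text, whatever element encloses it.
        self.chunks.append(data)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks)


sample = ('<p>Ads for <a href="http://example.com/adidas">Adidas</a> and '
          '<a href="http://example.com/pepsi">Pepsi</a>.</p>')
text = visible_text(sample)
print(text)  # Ads for Adidas and Pepsi.
```

A check like this, run against a page with linked names, would flag the dropped words described in the report.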
Old 11-21-2007, 04:41 PM   #53
kovidgoyal
Quote:
Originally Posted by JTravers
BTW, still trying to get some profiles made. Not knowing Python is proving to be a rather large stumbling block, however.
Here's a link to a Python tutorial that may be of some help:

http://docs.python.org/tut/tut.html
Old 11-21-2007, 07:01 PM   #54
JTravers
Quote:
Originally Posted by kovidgoyal
Here's a link to a python tutorial that may be of some help

http://docs.python.org/tut/tut.html
Thanks for the link

I'm really looking forward to getting some more interesting web content onto my 505.

BTW, does web2lrf only accept RSS feeds as input, or can one give it a regular webpage to process?
Old 11-22-2007, 12:58 PM   #55
kovidgoyal
web2lrf --url http://mypage default

will process a website.
Old 11-22-2007, 03:47 PM   #56
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Can you stop the processing after the HTML has been cleaned up but before the HTML file tree is removed? (Or how do you get web2html?)
Old 11-22-2007, 05:27 PM   #57
kovidgoyal
web2disk
Old 11-22-2007, 06:32 PM   #58
tompe
Does web2disk really do the cleanup of the HTML code? If I only want the files, I suppose wget will work also. Or does web2disk do something that wget does not do?
Old 11-22-2007, 07:43 PM   #59
kovidgoyal
It's optimized for downloading websites for conversion to ebooks. It has link filters, recursion level control and a bunch of other features:
Code:
web2disk --help
Cleanup is done by regexps. I don't remember whether the regexps are passed to web2disk or html2lrf. I think it is web2disk, but there may not be a command line interface to it.
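To give a rough idea of what regexp-based cleanup can look like, here is a minimal sketch. It is not the actual libprs500 code: the rule list, the preprocess function and the sample HTML are all hypothetical, chosen only to show the pattern of applying (regexp, replacement) pairs in order.

```python
import re

# Hypothetical cleanup rules: each entry is (compiled regexp, replacement).
CLEANUP_RULES = [
    # Remove <script> blocks together with their contents.
    (re.compile(r'<script\b.*?</script>', re.IGNORECASE | re.DOTALL), ''),
    # Remove HTML comments.
    (re.compile(r'<!--.*?-->', re.DOTALL), ''),
]


def preprocess(html, rules=CLEANUP_RULES):
    """Apply every rule in order and return the cleaned HTML."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


raw = '<p>Story</p><script>track()</script><!-- ad slot -->'
cleaned = preprocess(raw)
print(cleaned)  # <p>Story</p>
```

Keeping the rules in a plain list means a site-specific profile could supply its own pairs without touching the loop.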
Old 11-22-2007, 08:19 PM   #60
tompe
But if you run web2lrf it seems like the cleanup is done just before the conversion to another format. With --debug it says:

[INFO] convert_from.py:330: Processing 7108374.stm
[INFO] convert_from.py:283: Parsing HTML...
[INFO] convert_from.py:318: Written preprocessed HTML to /tmp/html2lrf-verbose.html
[INFO] convert_from.py:333: Converting to BBeB...


But since "web2disk bbc" is not implemented, I have not been able to get the result after the preprocessing, so I have not been able to check how it looks.
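The debug output quoted above does name a file for the preprocessed HTML. A minimal sketch for pulling that path out of a captured log follows. It assumes the log line keeps exactly the wording shown above, and whether the file is still on disk after the run finishes is not something the log tells us. The function name preprocessed_paths is made up for illustration.

```python
import re

# The four log lines quoted in the post, captured as one string.
DEBUG_LOG = """\
[INFO] convert_from.py:330: Processing 7108374.stm
[INFO] convert_from.py:283: Parsing HTML...
[INFO] convert_from.py:318: Written preprocessed HTML to /tmp/html2lrf-verbose.html
[INFO] convert_from.py:333: Converting to BBeB...
"""


def preprocessed_paths(log_text):
    """Return every path reported as 'Written preprocessed HTML to ...'."""
    return re.findall(r'Written preprocessed HTML to (\S+)', log_text)


paths = preprocessed_paths(DEBUG_LOG)
print(paths)  # ['/tmp/html2lrf-verbose.html']
```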