MobileRead Forums

Thread: German / Deutsch - Zeit Online broken (https://www.mobileread.com/forums/showthread.php?t=107064)

feodor 11-16-2010 05:31 AM

German / Deutsch - Zeit Online broken
 
1 Attachment(s)
Hi,
does anyone happen to know why Zeit Online news download is broken?
Please see the file attached.

regards,
feodor

miwie 11-16-2010 07:21 AM

Try again. I managed to fetch "Zeit Online" just now without any problems.

feodor 11-16-2010 07:45 AM

I tried several times. It doesn't work for me.

EeeGrill 11-16-2010 07:57 AM

It's still broken.
The log contains many error messages like:
Quote:

Downloading
Fetching http://www.zeit.de/kultur/musik/2010...all&print=true
Failed to download article: Neu Delhi: Dutzende Menschen sterben in eingestürztem Haus from http://www.zeit.de/gesellschaft/zeit...all&print=true
Traceback (most recent call last):
File "site-packages/calibre/utils/threadpool.py", line 95, in run
File "site-packages/calibre/web/feeds/news.py", line 838, in fetch_article
File "site-packages/calibre/web/feeds/news.py", line 834, in _fetch_article
Exception: Could not fetch article. Run with -vv to see the reason
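When a run produces dozens of these tracebacks, it can help to list which articles failed before rerunning with -vv as the message suggests. A minimal sketch, assuming the log was saved to a text file and that each failure appears on one line in the format quoted above; `failed_articles` is a hypothetical helper, not part of calibre.

```python
import re

# Assumption: each failure is logged on a single line in the form
# "Failed to download article: <title> from <url>", as in the log above.
FAILED_RE = re.compile(r"^Failed to download article: (?P<title>.+) from (?P<url>\S+)$")


def failed_articles(log_text):
    """Return (title, url) pairs for every failed article in a saved calibre log."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            failures.append((match.group("title"), match.group("url")))
    return failures


if __name__ == "__main__":
    sample = (
        "Downloading\n"
        "Failed to download article: Example headline from http://www.example.com/a1\n"
        "Traceback (most recent call last):\n"
    )
    print(failed_articles(sample))  # [('Example headline', 'http://www.example.com/a1')]
```

Because the title group is greedy, a headline that itself contains " from " is still split at the last " from " before the URL.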

Starson17 11-16-2010 11:11 AM

Quote:

Originally Posted by feodor (Post 1218809)
Hi,
does anyone happen to know why Zeit Online news download is broken?
Please see the file attached.

regards,
feodor

1) The thumbnail image ("file attached") doesn't seem to show an error; it looks like a normal progress bar that isn't done yet.
2) The recipe downloads and completes for me.
3) The recipe doesn't seem to have much content, but I don't know how much content that site normally has.
4) I see errors during the download, but that does not necessarily mean there's a problem. Malformed pages can generate errors during download, as can pages with unsuitable content such as video, audio, or links to PDF files. Whether this happens depends on how the recipe is written. It is also possible that the errors do point to a real problem.
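Point 4 can be illustrated with a filter that drops links unlikely to yield an article page before they are fetched. This is a sketch under assumptions: the extension list and the `is_fetchable` name are hypothetical, and a real calibre recipe would apply such a check through its own hooks rather than as a standalone function.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions a text-oriented recipe might skip;
# adjust it to whatever media the site actually links to.
UNSUITABLE_EXTENSIONS = (".pdf", ".mp3", ".mp4", ".flv")


def is_fetchable(url):
    """Return False for links whose path points at a media or PDF file."""
    # Only the path is checked, so query strings like "?x=1" are ignored.
    path = urlparse(url).path.lower()
    return not path.endswith(UNSUITABLE_EXTENSIONS)
```

Matching on the lowercased path keeps the check independent of query parameters and of upper-case extensions such as `.PDF`.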

EeeGrill 11-16-2010 12:53 PM

Yes, the recipe download completes and produces a valid file, but the file contains only a table of contents with the section names and a short description of each section. There are no articles at all. The file is now 90 KB; it used to be several MB.
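A quick way to confirm this symptom is to look inside the generated EPUB, which is a ZIP archive: a book that contains only the table of contents will have very few HTML pages. A minimal sketch using Python's standard `zipfile` module; the function name `content_summary` is hypothetical.

```python
import zipfile


def content_summary(epub_path):
    """Count the (X)HTML files in an EPUB and sum their uncompressed sizes in bytes."""
    with zipfile.ZipFile(epub_path) as book:
        pages = [
            info
            for info in book.infolist()
            if info.filename.lower().endswith((".html", ".xhtml", ".htm"))
        ]
    return len(pages), sum(info.file_size for info in pages)


if __name__ == "__main__":
    import sys

    count, size = content_summary(sys.argv[1])
    print(f"{count} HTML pages, {size} bytes of markup")
```

Comparing the output for a good and a bad download makes the difference between "several MB" and "90 KB" concrete.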

Starson17 11-16-2010 02:30 PM

Quote:

Originally Posted by EeeGrill (Post 1219292)
Yes, the recipe download completes and produces a valid file, but the file contains only a table of contents with the section names and a short description of each section. There are no articles at all. The file is now 90 KB; it used to be several MB.

It sounds like the site has changed its format.

Artemis_A 11-18-2010 05:56 PM

I just tried to download the ZEIT again, still with the same bad result that EeeGrill described.
I checked the links in the recipe. They are all OK and evidently haven't changed on the ZEIT homepage, so that's not the cause. I also checked the div tags, and they seem to be OK too. Does anybody have more ideas?
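Checking the div tags by hand can be partly automated: save a copy of an article page and count how many `<div>` elements still carry the class the recipe relies on. A sketch using Python's standard `html.parser` module; `DivClassCounter`, `count_divs`, and the class name `article` in the test are placeholders, not what the Zeit recipe actually uses.

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count <div> elements whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_divs(html, class_name):
    """Return how many <div> elements in html carry class_name."""
    parser = DivClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count
```

A count of zero on a fresh page would suggest the site's markup changed even if the feed links still work.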

-Thomas- 11-18-2010 07:55 PM

I had success by applying the following changes to the recipe:

Code:

--- /usr/share/calibre/recipes/zeitde.recipe    2010-11-12 21:33:30.000000000 +0100
+++ /tmp/zeitde.recipe  2010-11-19 00:58:10.000000000 +0100
@@ -11,7 +11,8 @@
 
    title = 'Zeit Online'
    description = 'Zeit Online'
-    language = 'de'
+    lang = 'de'
+    encoding = 'UTF-8'
 
    __author__ = 'Martin Pitt, Sujata Raman, Ingo Paschke and Marc Toensing'

The encoding is hard-coded, which isn't ideal, but it works for me.
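Setting `encoding = 'UTF-8'` tells calibre how to decode the downloaded pages instead of leaving it to detect the encoding. This thread does not establish that a wrong guess was the actual cause, but the effect of decoding with the wrong codec is easy to demonstrate in plain Python (the sample text is made up):

```python
# German text with umlauts, encoded the way a UTF-8 web page delivers it.
raw = "Köln, Düsseldorf, Straße".encode("utf-8")

# Decoding with the right codec restores the original text.
correct = raw.decode("utf-8")

# Decoding with a single-byte codec such as cp1252 turns each two-byte
# UTF-8 character into two wrong characters (mojibake).
wrong = raw.decode("cp1252")

print(correct)  # Köln, Düsseldorf, Straße
print(wrong)    # KÃ¶ln, DÃ¼sseldorf, StraÃŸe
```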

Rod Laird 11-18-2010 08:52 PM

Die Zeit fix
 
Many thanks, Thomas!

Kind regards from Australia

Rod

feodor 11-19-2010 05:38 AM

First of all, I want to say that I'm really surprised how supportive this community is. Thank you!

Your proposal didn't work for me, Thomas. I located the Zeit recipe on my disk.
It's "C:\Program Files (x86)\Calibre2\resources\recipes\zeitde.recipe" for me.
I removed the line you marked with "-" and added the lines you marked with "+".
Was that correct?

I saved the file and tried again, with the same failure.

Regards,
Andreas

PS: Is it possible that Zeit implemented some kind of "mass-query prevention" to keep us leechers away? :-)

miwie 11-19-2010 06:01 AM

I just successfully generated an epub using "zeitde.recipe".

Michael

EeeGrill 11-19-2010 08:47 AM

I applied the changes but the result is the same.
I still get errors like:
Quote:

Formel 1: Sebastian Vettel ist Weltmeister from Sport
http://www.zeit.de/sport/2010-11/vet...all&print=true
Traceback (most recent call last):
File "site-packages/calibre/utils/threadpool.py", line 95, in run
File "site-packages/calibre/web/feeds/news.py", line 838, in fetch_article
File "site-packages/calibre/web/feeds/news.py", line 834, in _fetch_article
Exception: Could not fetch article. Run with -vv to see the reason

Parsing all content...
Parsing feed_1/index.html ...
Initial parse failed:
Traceback (most recent call last):
File "site-packages/calibre/ebooks/oeb/base.py", line 818, in first_pass
File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48270)
File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71812)
File "parser.pxi", line 1417, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70608)
File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67148)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63824)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64745)
File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64088)
XMLSyntaxError: Opening and ending tag mismatch: hr line 29 and div, line 30, column 7
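The XMLSyntaxError at the end of this log is what a strict XML parser reports when it meets HTML-style markup: an `<hr>` that is never closed leaves the parser expecting `</hr>` when it reaches `</div>`. The sketch below reproduces the same kind of failure with Python's standard `xml.etree.ElementTree` (the log shows lxml, whose message wording differs); the snippets are made up, not taken from the Zeit page.

```python
import xml.etree.ElementTree as ET

html_style = "<div><p>Text</p><hr></div>"    # HTML allows <hr> without a closing tag
xhtml_style = "<div><p>Text</p><hr/></div>"  # XML requires <hr/> (or <hr></hr>)


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parses_as_xml(html_style))   # False
print(parses_as_xml(xhtml_style))  # True
```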

EeeGrill 11-19-2010 09:04 AM

Hello miwie,

did you really get a complete EPUB? How big is yours?
Mine is approx. 100 KB and contains only the table of contents.

miwie 11-19-2010 09:12 AM

The resulting "zeit.epub" is approx. 5.7 MB in size. I did not check every article in it, but it looks like it is complete.

Michael
PS: I'm working with calibre 0.7.28 (WinXP)


All times are GMT -4.