German / Deutsch - Zeit Online broken


feodor
11-16-2010, 05:31 AM
Hi,
does anyone happen to know why Zeit Online news download is broken?
Please see the file attached.

regards,
feodor

miwie
11-16-2010, 07:21 AM
Try again. I managed to fetch "Zeit Online" just now w/o any problems.

feodor
11-16-2010, 07:45 AM
I tried several times. It doesn't work for me.

EeeGrill
11-16-2010, 07:57 AM
It's still broken.
The log contains many error messages like:

Downloading
Fetching http://www.zeit.de/kultur/musik/2010-11/klassik-im-netz?page=all&print=true
Failed to download article: Neu Delhi: Dutzende Menschen sterben in eingestürztem Haus from http://www.zeit.de/gesellschaft/zeitgeschehen/2010-11/delhi-einsturz?page=all&print=true
Traceback (most recent call last):
File "site-packages/calibre/utils/threadpool.py", line 95, in run
File "site-packages/calibre/web/feeds/news.py", line 838, in fetch_article
File "site-packages/calibre/web/feeds/news.py", line 834, in _fetch_article
Exception: Konnte Artikel nicht abrufen. Mit -vv starten, um den Grund dafür zu sehen

Starson17
11-16-2010, 11:11 AM
1) The thumbnail image ("file attached") doesn't seem to show an error; it looks like a normal progress bar that isn't done yet.
2) The recipe downloads and completes for me.
3) The recipe doesn't seem to have much content, but I don't know how much content that site normally has.
4) I see errors during the download, but that does not necessarily mean there's a problem. Malformed pages can generate errors during download, and so can pages with unsuitable content, such as video, audio, or links to PDF files. It depends on how the recipe is written. It is also possible that the errors are a problem.

EeeGrill
11-16-2010, 12:53 PM
Yes, the recipe download completes and produces a valid file,
but the file contains only a table of contents with the names of the sections
and a short description of each section.
There are no articles at all.
The file is now 90 KB; it used to be several MB.

Starson17
11-16-2010, 02:30 PM
It sounds like the site has changed its format.

Artemis_A
11-18-2010, 05:56 PM
I just tried to download Die Zeit again, with the same bad result that EeeGrill described.
I checked the links in the recipe. They are all OK and evidently haven't changed on the Zeit homepage, so that's not the cause. I also checked the div tags, and they seem to be OK as well. Does anybody have more ideas?

-Thomas-
11-18-2010, 07:55 PM
I had success by applying the following changes to the recipe:

--- /usr/share/calibre/recipes/zeitde.recipe 2010-11-12 21:33:30.000000000 +0100
+++ /tmp/zeitde.recipe 2010-11-19 00:58:10.000000000 +0100
@@ -11,7 +11,8 @@

title = 'Zeit Online'
description = 'Zeit Online'
- language = 'de'
+ lang = 'de'
+ encoding = 'UTF-8'

__author__ = 'Martin Pitt, Sujata Raman, Ingo Paschke and Marc Toensing'



The encoding is kind of hard-coded, but it works for me.
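
To illustrate why an explicit encoding can matter, here is a minimal sketch in plain Python (it is not calibre code). It assumes the site serves UTF-8 and uses cp1252 as a stand-in for a wrong guess; both are assumptions for illustration, and the sample text is borrowed from the headline in the log earlier in the thread.

```python
# Hypothetical illustration: the same bytes decoded with two different encodings.
# Assumption: the site serves UTF-8; cp1252 stands in for an incorrect guess.
raw = "eingestürztem Haus".encode("utf-8")  # bytes as they might arrive over HTTP


def decode_page(data: bytes, encoding: str) -> str:
    """Decode downloaded bytes with an explicitly chosen encoding."""
    return data.decode(encoding, errors="replace")


good = decode_page(raw, "utf-8")   # umlaut survives
bad = decode_page(raw, "cp1252")   # umlaut turns into two unrelated characters

print(good)
print(bad)
```

If the decoder guesses wrong, every non-ASCII character in the page is corrupted, which is why pinning the encoding can change the result even when the download itself succeeds.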

Rod Laird
11-18-2010, 08:52 PM
Many thanks, Thomas!

Kind regards from Australia

Rod

feodor
11-19-2010, 05:38 AM
First of all, I want to say that I'm really surprised by how supportive this community is. Thank you!

Your proposal didn't work for me, Thomas. I located the zeit recipe on my disk.
It's "C:\Program Files (x86)\Calibre2\resources\recipes\zeitde.recipe" for me.
I removed the line you marked with "-" and added the lines you marked with "+".
Was that correct?

I saved and tried again -> same failure.

Regards,
Andreas

PS: Is it possible that Zeit implemented some kind of "mass-query-prevention"? To keep us leechers away :-)

miwie
11-19-2010, 06:01 AM
I just successfully generated an epub using "zeitde.recipe".

Michael

EeeGrill
11-19-2010, 08:47 AM
I applied the changes but the result is the same.
I still get errors like:

Formel 1: Sebastian Vettel ist Weltmeister from Sport
http://www.zeit.de/sport/2010-11/vettel-formel-eins-weltmeister?page=all&print=true
Traceback (most recent call last):
File "site-packages/calibre/utils/threadpool.py", line 95, in run
File "site-packages/calibre/web/feeds/news.py", line 838, in fetch_article
File "site-packages/calibre/web/feeds/news.py", line 834, in _fetch_article
Exception: Konnte Artikel nicht abrufen. Mit -vv starten, um den Grund dafür zu sehen

Parsing all content...
Parsing feed_1/index.html ...
Initial parse failed:
Traceback (most recent call last):
File "site-packages/calibre/ebooks/oeb/base.py", line 818, in first_pass
File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48270)
File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71812)
File "parser.pxi", line 1417, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70608)
File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67148)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63824)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64745)
File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64088)
XMLSyntaxError: Opening and ending tag mismatch: hr line 29 and div, line 30, column 7
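
The XMLSyntaxError above is what a strict XML parser reports when HTML contains an unclosed `<hr>` inside a `<div>`. The following minimal sketch reproduces the same class of failure with Python's standard-library `xml.etree.ElementTree`, used here as a stand-in for lxml; the markup is invented for illustration. The "Initial parse failed" wording suggests that a fallback parse may follow, so this error is not necessarily the reason the articles are missing.

```python
import xml.etree.ElementTree as ET

# Invented markup: <hr> is never closed, which is legal in HTML but not in XML.
broken = "<div><p>Sport</p><hr></div>"
# Same markup with a self-closing <hr/>, which is well-formed XML.
fixed = "<div><p>Sport</p><hr/></div>"


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as strict XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError as err:
        print("parse failed:", err)
        return False


print(is_well_formed(broken))
print(is_well_formed(fixed))
```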

EeeGrill
11-19-2010, 09:04 AM
Hello miwie,

did you really get a complete epub?
How big is your epub?
My epub is approx. 100 KB.
This is only the table of contents.
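
Since an EPUB is a ZIP archive, one way to tell a contents-only file from a complete one is to count the HTML documents inside it. This is a minimal sketch using only the standard library; the list of file extensions is an assumption, and the demo archive with its entry names is made up so that the code runs without a real file.

```python
import io
import zipfile

# Assumption: content documents use one of these typical extensions.
HTML_SUFFIXES = (".html", ".xhtml", ".htm")


def summarize_epub(source):
    """Return (number of HTML documents, total uncompressed bytes).

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        infos = archive.infolist()
        html_docs = [i for i in infos if i.filename.lower().endswith(HTML_SUFFIXES)]
        size = sum(i.file_size for i in infos)
    return len(html_docs), size


# Demo on a tiny in-memory archive with made-up entries.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("index.html", "<html/>")
    archive.writestr("feed_1/index.html", "<html/>")
sample.seek(0)

doc_count, total_bytes = summarize_epub(sample)
print(doc_count, total_bytes)
```

For a real file, calling `summarize_epub("zeit.epub")` (a hypothetical file name) on both the small and the large result would show whether the article pages are actually missing.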

miwie
11-19-2010, 09:12 AM
The resulting "zeit.epub" is approx. 5.7 MB in size. I did not check every article in it, but it looks like it is complete.

Michael
PS: I'm working with calibre 0.7.28 (WinXP)

Starson17
11-19-2010, 09:47 AM
Did you download via the internal Calibre recipe system and the GUI, or did you run ebook-convert on the command line? I get complete output when I run the recipe from the command line and incomplete output when I run it via the GUI.

I hesitate to report this, as there should be no difference between the two, and I haven't had time to track down what's going on (and it's just barely possible that there is a subtle difference - I may have done some subtle cleanup of tabs before running it on the CL). There should be no difference, but I have seen a difference before (in one rare case).

Try running from command line.
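
A command-line run can be sketched as follows. This is a minimal example that assumes `ebook-convert` is on the PATH; the file names `zeitde.recipe` and `zeit.epub` are placeholders. The `-vv` option is the one the error message in the log asks for.

```python
import shutil
import subprocess


def build_command(recipe_path, output_path):
    """Build an ebook-convert invocation with doubly verbose logging (-vv)."""
    return ["ebook-convert", recipe_path, output_path, "-vv"]


def run_recipe(recipe_path, output_path):
    """Run the recipe if ebook-convert is installed; return the exit code or None."""
    if shutil.which("ebook-convert") is None:
        print("ebook-convert not found on PATH")
        return None
    return subprocess.run(build_command(recipe_path, output_path)).returncode


# Placeholder file names; this only prints the command, it does not run it.
print(" ".join(build_command("zeitde.recipe", "zeit.epub")))
```

Typing the printed command directly into a terminal is equivalent; the Python wrapper only makes the check for a missing executable explicit.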

miwie
11-19-2010, 10:00 AM
I used "ebook-convert" from the command line.

Starson17
11-19-2010, 10:50 AM
That's as I suspected. I took a brief look at this 2 days ago. Initially I ran the recipe and was surprised when it completed successfully. I noticed it was fairly small, but I don't speak German and wasn't sure if it was supposed to contain more.

Then, when it was reported that it had changed and used to have more content, I looked at it again. As usual, I put it into my test recipe area and was very surprised when it worked perfectly to pull the article content. Initially I thought I might have changed it to make it start working.

(I suspect that's what happened with the changes by -Thomas-. Changes are made, tests run on the CL, and then they fail on the GUI)

I intended to go back to it, which I did briefly this morning. Again, it worked correctly from the command line. I then ran it from the GUI and it failed. I then saw your report that it worked. Would you kindly run it from the GUI and report success/failure?

As I said, I've seen the CL behave differently from the GUI only once before (and even there I wasn't 100% certain that's what we were seeing). If this can be verified, it's a bug and we can track it down. I don't have access to a test machine at the moment.

EeeGrill
11-19-2010, 10:57 AM
I just tried "ebook-convert" from the command line and now I have a complete epub.
The GUI method seems to work in a different way.

-Thomas-
11-19-2010, 05:37 PM
I used both the CLI and GUI to test the changes, and both work. I'm using Calibre 0.7.28 (compiled from source) on a Debian Linux 5.0 machine.

Maybe there is a connection between the charset option I included and my Linux machine, as Debian 5.0 is UTF-8 based while Windows uses some other encoding.

Artemis_A
12-06-2010, 04:58 PM
- language = 'de'
+ lang = 'de'



That worked for me. I'm glad to have DIE ZEIT back :-)
Thank you!!

Pinguin
12-12-2010, 02:22 PM
The change posted by Artemis_A works for me, too. And I am very glad, too. Many thanks! :)

tommy123
12-14-2010, 06:27 AM
I still don't have the Zeit back. I only get an empty ebook sent to my Kindle with only the Topic pages without any content.

Please advise me on which log file to upload, so that an experienced person can look at it.

Greetings
Thomas

kovidgoyal
12-14-2010, 12:32 PM
I fixed the builtin recipe. Try it again; it should work now.

tommy123
12-17-2010, 02:57 AM
This probably means that I have to wait for the new build, right?

tommy123
12-19-2010, 02:59 PM
I have just tested "Die Zeit" with Calibre 0.7.34. It works perfectly again. Thank you very much.
Die Zeit (http://www.zeit.de/index) is one of the best weekly magazines in Germany. It comes out every Monday, but they also maintain an online editorial department.

Pinguin
12-20-2010, 01:08 PM
Yes, with calibre 0.7.34 it works excellently without any changes. Thank you! :)