Originally Posted by gambarini
Is there an option to add one or more lines to the downloaded article (for example the article's signature, when the signature is a gif inside a table cell (td) without a tag)?
I'm not 100% certain what you are asking. preprocess_html or postprocess_html will let you add anything you want. You can add tags to the HTML with any content, including images. On your question about the table, are you asking how to put things into a table, or how to extract them from a table? Generally, both are possible with BeautifulSoup.
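In case it helps, here is a rough sketch of both directions with BeautifulSoup. It is written against the standalone bs4 package so it can be tried outside calibre; the helper names add_signature and extract_signature_from_table and the file name sig.gif are made up for illustration, and the soup object calibre hands to a recipe may behave slightly differently.

```python
from bs4 import BeautifulSoup


def extract_signature_from_table(soup):
    """Return the src of the first <img> found inside a <td>, or None."""
    for cell in soup.find_all("td"):
        img = cell.find("img")
        if img is not None and img.get("src"):
            return img["src"]
    return None


def add_signature(soup, image_url, alt_text="signature"):
    """Append a <p> holding a signature image at the end of <body>."""
    img = soup.new_tag("img", src=image_url, alt=alt_text)
    para = soup.new_tag("p")
    para.append(img)
    # Fall back to the document root if the page has no <body>.
    target = soup.body if soup.body is not None else soup
    target.append(para)
    return soup


if __name__ == "__main__":
    # Hypothetical page: the signature gif sits in a bare table cell.
    page = (
        "<html><body><table><tr><td><img src='sig.gif'></td></tr></table>"
        "<p>Article text</p></body></html>"
    )
    soup = BeautifulSoup(page, "html.parser")
    src = extract_signature_from_table(soup)
    if src is not None:
        add_signature(soup, src)
    print(soup)
```

In a recipe, the same idea would go inside postprocess_html(self, soup, first_fetch), calling something like add_signature on the soup and returning it.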
Some newspapers let you read the entire paper in various formats (a jpg for every page, or a single pdf file for every page) directly in the browser. Is it possible to download these files?
Right now I use the first jpg (pdf) for the cover image, so I am able to find the correct page and the correct date, but it is only the first page, and at a fixed resolution.
At least this is a good way to get an overall image of the whole newspaper, though it does not make for comfortable reading.
Are you asking how to split up pdfs to get images found on pages 2 and beyond, or how to use content you already have access to?