#1 |
Connoisseur
Posts: 97
Karma: 10
Join Date: Aug 2022
Device: PC
|
Recipe request: Bloomberg
I'd like to request a recipe for Bloomberg. Can anyone help? Thanks a lot.
|
#2 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
I tried this once.
All links redirect to an "Are you a robot? Solve the captcha" page, because JavaScript is disabled. If you can find a way to stop it redirecting, the whole article can be loaded from the raw HTML. |
#3 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you don't want to follow the redirect, do this:
Code:

    def get_browser(self, *a, **kw):
        br = super().get_browser(*a, **kw)
        # Raise an HTTP error on 3xx responses instead of following them
        br.set_handle_redirect(False)
        return br

|
#4 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
I had to use a VPN to test this. Don't try it twice in a row.
Code:

    Traceback (most recent call last):
      File "calibre\web\fetch\simple.py", line 275, in fetch_url
      File "mechanize\_mechanize.py", line 241, in open_novisit
      File "mechanize\_mechanize.py", line 313, in _mech_open
    mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 307: s2s_high_score

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "calibre\web\fetch\simple.py", line 533, in process_links
      File "calibre\web\fetch\simple.py", line 280, in fetch_url
    calibre.web.fetch.simple.FetchError: Temporary Redirect

But if it works, all articles will load.
Last edited by unkn0wn; 10-25-2022 at 12:43 PM. |
#5 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
I found this Google RSS feed, but it needs to follow the redirect from the Google link to Bloomberg and not the one from Bloomberg to the captcha page. How can I do this?
https://news.google.com/rss/search?q...=US&ceid=US:en |
#6 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can implement get_obfuscated_article() to fetch them manually, something like this:
Code:

    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            br.open(url)
        except Exception as e:
            # Redirects are disabled, so take the target from the Location header
            url = e.hdrs.get('location')
        html = br.open(url).read()

|
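For anyone adapting the snippet above, here is a slightly fuller sketch. calibre's documentation says get_obfuscated_article() should return the path to a file that contains the article HTML, so this version writes the fetched bytes to a temporary file. It assumes redirects are disabled on the browser (as in post #3), so a 3xx response raises an exception whose hdrs carries the Location header. The class name, the use of the stdlib tempfile module, and the import fallback are illustrative assumptions, not the recipe attached later in this thread.

```python
import tempfile

try:
    # Only importable inside calibre's own Python environment.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be imported and read outside calibre.
    BasicNewsRecipe = object


class BloombergSketch(BasicNewsRecipe):  # hypothetical name
    title = 'Bloomberg (sketch)'
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        try:
            raw = br.open(url).read()
        except Exception as e:
            # With redirects disabled, a 3xx surfaces as an error; follow
            # exactly one hop by hand using its Location header.
            hdrs = getattr(e, 'hdrs', None)
            location = hdrs.get('location') if hdrs is not None else None
            if not location:
                raise
            raw = br.open(location).read()
        # Assumption: a plain temporary file is an acceptable return value;
        # calibre expects the path of a file holding the article HTML.
        with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as f:
            f.write(raw)
        return f.name
```

Only the first redirect is followed by hand. If the second request is itself redirected to the captcha page, the error propagates and that article fails, which matches the behaviour reported in the posts that follow.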
#7 | |
Connoisseur
Posts: 97
Karma: 10
Join Date: Aug 2022
Device: PC
|
Quote:
Pursuits, Last Thing: the above three sections failed to download. Nevertheless, it works very well, thank you very much. |
|
#8 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
Thanks.
Code:

    Traceback (most recent call last):
      File "calibre\web\fetch\simple.py", line 275, in fetch_url
      File "mechanize\_mechanize.py", line 241, in open_novisit
      File "mechanize\_mechanize.py", line 313, in _mech_open
    mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 307: s2s_high_score

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "calibre\web\fetch\simple.py", line 533, in process_links
      File "calibre\web\fetch\simple.py", line 280, in fetch_url
    calibre.web.fetch.simple.FetchError: Temporary Redirect

I was able to fetch the whole recipe, but at other times not all articles load.
Last edited by unkn0wn; 10-26-2022 at 02:37 AM. |
#9 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That will be bot protection. You can add the delay field to the recipe so that it sends only one request every delay seconds. Experiment a bit and see if a delay of 1 or 2 does the trick.
|
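As a rough illustration of the suggestion above, delay is a class-level attribute on the recipe. The class name and title below are made up, and the import fallback is only there so the snippet can be loaded outside calibre. As far as I know, calibre's documentation also says simultaneous downloads are automatically reduced to 1 when delay is greater than 0, so nothing else should need changing.

```python
try:
    # Only importable inside calibre's own Python environment.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be imported and read outside calibre.
    BasicNewsRecipe = object


class ThrottledSketch(BasicNewsRecipe):  # hypothetical name
    title = 'Bloomberg (throttled sketch)'
    # Seconds to wait between consecutive downloads. 1 or 2 is the range
    # suggested above; raise it if the site still answers with HTTP 307.
    delay = 2
```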
#10 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
I added a delay and changed some things. I was able to download both recipes completely.
|
#11 | |
Connoisseur
Posts: 97
Karma: 10
Join Date: Aug 2022
Device: PC
|
Quote:
    calibre, version 6.7.1 (win32, embedded-python: True)
    Bloomberg Businessweek
    Bloomberg Businessweek
    Conversion options changed from defaults:
      verbose: 2
      output_profile: 'generic_eink'
    Resolved conversion options
    calibre version: 6.7.1
    {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'epub_version': '2', 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x000001D2F850E7A0>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.GenericEink object at 0x000001D2F850F130>, 'page_breaks_before': None, 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'search_replace': None, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'transform_css_rules': None, 'transform_html_rules': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2}
    InputFormatPlugin: Recipe Input running
    Downloading recipe urn: custom:1002
    Traceback (most recent call last):
      File "runpy.py", line 196, in _run_module_as_main
      File "runpy.py", line 86, in _run_code
      File "site.py", line 82, in <module>
      File "site.py", line 77, in main
      File "site.py", line 49, in run_entry_point
      File "calibre\utils\ipc\worker.py", line 215, in main
      File "calibre\gui2\convert\gui_conversion.py", line 31, in gui_convert_recipe
      File "calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
      File "calibre\ebooks\conversion\plumber.py", line 1108, in run
      File "calibre\customize\conversion.py", line 242, in __call__
      File "calibre\ebooks\conversion\plugins\recipe_input.py", line 138, in convert
      File "calibre\web\feeds\news.py", line 1058, in download
      File "calibre\web\feeds\news.py", line 1227, in build_index
      File "<string>", line 31, in parse_index
      File "calibre\web\feeds\news.py", line 707, in index_to_soup
      File "mechanize\_mechanize.py", line 241, in open_novisit
      File "mechanize\_mechanize.py", line 313, in _mech_open
    mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 307: s2s_high_score
    Using proxies: {'http': '127.0.0.1:7890', 'https': '127.0.0.1:7890', 'ftp': '127.0.0.1:7890'}

|
|
#12 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
That may be because your IP was already flagged yesterday. Open Bloomberg in a browser and complete the verification, then try again, or wait 2 or 3 days.
I was able to load both recipes one after another, yesterday and today, from the same IP. Maybe increase the delay to 3 seconds.
Last edited by unkn0wn; 10-28-2022 at 04:30 AM. |
#13 |
want to learn what I want
Posts: 1,611
Karma: 7891011
Join Date: Sep 2020
Device: none
|
The bberg-businessweek recipe worked for me. It's pretty cool (24 articles fetched), with all images included.
The other one returns: <urlopen error [Errno 11001] getaddrinfo failed>
I used the recipes from the latest source; I'm not sure whether they're the same as those attached in post #10. |
#14 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
Retry, and check your internet access.
|
#15 |
Connoisseur
Posts: 97
Karma: 10
Join Date: Aug 2022
Device: PC
|
Success: Bloomberg.recipe (3.6 KB)
Failure (tried many times): Bloomberg Businessweek.recipe (4.7 KB) |