Bing search is broken #171

Open
bentsi opened this issue Jul 13, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@bentsi
Contributor

bentsi commented Jul 13, 2022

Describe the bug
Running a simple script based on the README (see To Reproduce below) fails with the following error:

ENGINE FAILURE: Bing
Traceback (most recent call last):
  File "/home/bentsi/pycharm-community-2021.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/bentsi/pycharm-community-2021.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/bentsi/devel/continueai/backend/src/scraping/search_engine_query.py", line 17, in <module>
    bresults = bsearch.search(**search_args)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 288, in search
    return self.get_results(soup, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 247, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic

After digging into the root cause, I found the following:

  1. The HTTP request to Bing returns an HTML response that contains no results.
  2. After adding the cookie that Google Chrome sends in its GET request headers, the code starts working.

So the solution is to add cookie data, but I am not sure exactly what should be added, since the cookie looks complicated.
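
For anyone who wants to experiment with this, here is a minimal sketch of attaching a Cookie header to a Bing search request by hand, using only the standard library. The cookie string is a placeholder (the issue does not say which cookie is required), the User-Agent value is illustrative, and `build_bing_request` is a hypothetical helper that is not part of search-engine-parser.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: copy the real value from the browser's developer tools.
PLACEHOLDER_COOKIE = "NAME=VALUE"


def build_bing_request(query, cookie=PLACEHOLDER_COOKIE):
    """Build (but do not send) a GET request to Bing that carries a Cookie header."""
    url = "https://www.bing.com/search?" + urlencode({"q": query})
    headers = {
        "User-Agent": "Mozilla/5.0",  # illustrative value, not taken from the issue
        "Cookie": cookie,
    }
    return Request(url, headers=headers)


request = build_bing_request("samsung electronics corp official website")
print(request.full_url)
print(request.get_header("Cookie"))
```

The request is only constructed here; passing it to `urllib.request.urlopen` would actually send it, and whether Bing then returns results depends on the real cookie value.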

To Reproduce

from search_engine_parser.core.engines.bing import Search as BingSearch
company_name = "samsung electronics corp official website"

search_args = {"query": company_name, "page": 1}
bsearch = BingSearch()
bsearch.clear_cache()
bresults = bsearch.search(**search_args)

Expected behavior
Search returns results

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • Python Version: 3.10.5
  • Search-engine-parser version: 0.6.6
@bentsi bentsi added the bug Something isn't working label Jul 13, 2022
@bentsi
Contributor Author

bentsi commented Jul 14, 2022

I managed to find the correct cookie, but now I am getting a result-parsing error:

Traceback (most recent call last):
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 252, in get_results
    search_results = self.parse_result(results, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 151, in parse_result
    rdict = self.parse_single_result(each, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/engines/bing.py", line 68, in parse_single_result
    rdict["descriptions"] = desc.text
AttributeError: 'NoneType' object has no attribute 'text'

I will work on a fix.
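
The traceback suggests that the lookup for a result's description element returned `None` for at least one result, so `desc.text` fails. One possible guard is sketched below. It is not necessarily the fix that was committed; `text_or_default` is a hypothetical helper, and `SimpleNamespace` stands in for a parsed HTML element so that the example runs without any third-party package.

```python
from types import SimpleNamespace


def text_or_default(node, default=""):
    """Return node.text, or `default` when the lookup found nothing (node is None)."""
    return node.text if node is not None else default


# Stand-ins for parsed HTML elements; a real parser returns a tag object or None.
found = SimpleNamespace(text="Official Samsung website")
missing = None

rdict = {}
rdict["descriptions"] = text_or_default(found)
print(rdict["descriptions"])

rdict["descriptions"] = text_or_default(missing)
print(repr(rdict["descriptions"]))
```

With a guard like this, a result that lacks a description yields an empty string instead of aborting the parse of the whole results page.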

bentsi pushed a commit to bentsi/search-engine-parser that referenced this issue Jul 14, 2022
@deven96
Member

deven96 commented Jul 23, 2022

Thanks for the detailed investigation and working on a fix
