CopyPastor

Detecting plagiarism made easy.

Score: 1; Reported for: Exact paragraph match

Possible Plagiarism

Reposted on 2023-05-25
by Denis Skopa


Original - Posted on 2023-01-21
by Denis Skopa



            

You can control the number of results per page with the [`_ipg` parameter](https://serpapi.com/ebay-search-api#api-parameters-pagination--ipg), which is capped at 200 listings.
Additionally, you can collect data from all pages of a website, regardless of how many there are, using a `while` loop with [non-token-based pagination](https://python.plainenglish.io/pagination-techniques-to-scrape-data-from-any-website-in-python-779cd32bd514#5a76).
Check the code with pagination in the [online IDE](https://replit.com/@denisskopa/scrape-ebay-iphone-pagination-bs4#main.py).

```python
from bs4 import BeautifulSoup
import requests, lxml
import pandas as pd

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
# https://www.whatismybrowser.com/detect/what-is-my-user-agent/
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
}
params = {
    "_nkw": "iphone 8 plus",  # search query
    "_pgn": 1,                # page number
    "_ipg": 200               # 200 listings per page
}

data = []
limit = 5  # page limit (if needed)

# pagination
while True:
    page = requests.get("https://www.ebay.com/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, "lxml")

    for products in soup.select(".s-item__info"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text

        data.append({
            "title": title,
            "price": price
        })

    # exit on the specified page limit
    if params["_pgn"] == limit:
        break

    # exit if there is no "next page" button on the page
    if soup.select_one(".pagination__next"):
        params["_pgn"] += 1
    else:
        break

# add data to a CSV file
pd.DataFrame(data=data).to_csv(
    "listings.csv",  # name of the CSV file
    index=False      # remove pandas row index (numeration)
)
```

Output: a `listings.csv` file will be created.
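As a quick sanity check, you can read the file back with pandas (a minimal sketch, assuming the script above has run in the same directory):

```python
import pandas as pd

# load the scraped listings and preview the first rows
df = pd.read_csv("listings.csv")
print(df.shape)  # (rows, columns)
print(df.head())
```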
____
Also, you can use [Ebay Organic Results API](https://serpapi.com/ebay-organic-results) from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
Example code with pagination:

```python
from serpapi import EbaySearch
import pandas as pd

params = {
    "api_key": "...",           # serpapi key, https://serpapi.com/manage-api-key
    "engine": "ebay",           # search engine
    "ebay_domain": "ebay.com",  # ebay domain
    "_nkw": "iphone 8 plus",    # search query
    "_pgn": 1                   # page number
}

search = EbaySearch(params)  # where data extraction happens

limit = 5
page_num = 0
data = []

while True:
    results = search.get_dict()  # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price
        })

    page_num += 1
    print(page_num)

    # exit on the specified page limit
    if params["_pgn"] == limit:
        break

    # exit if there is no "next page" in the pagination data
    if "next" in results.get("pagination", {}):
        params["_pgn"] += 1
    else:
        break

pd.DataFrame(data=data).to_csv(
    "listings.csv",  # name of the CSV file
    index=False      # remove pandas row index (numeration)
)
```

Output: a `listings.csv` file will be created.
The response may be empty because the `requests` request may be blocked: the default `user-agent` in the `requests` library is [`python-requests`](https://github.com/psf/requests/blob/89c4547338b592b1fb77c65663d8aa6fbb7e38b/requests/utils.py#L808-L814), which tells the website that a bot or script is sending the request. [Check what user agent you have](https://www.whatismybrowser.com/detect/what-is-my-user-agent/).
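As a quick illustration, here is a minimal sketch that prints the default user-agent (httpbin.org is used here purely as a header-echoing service; any similar endpoint would do):

```python
import requests

# what requests reports about itself when no custom headers are set
print(requests.utils.default_user_agent())  # e.g. python-requests/2.28.2

# the same value as seen by a server (httpbin echoes request headers back)
response = requests.get("https://httpbin.org/headers", timeout=30)
print(response.json()["headers"]["User-Agent"])
```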
An additional step besides providing a browser `user-agent` could be to [rotate the `user-agent`](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#rotate-user-agents), for example switching between PC, mobile, and tablet, as well as between browsers, e.g. Chrome, Firefox, Safari, Edge, and so on.
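A minimal sketch of such rotation, assuming a small hand-maintained pool of user-agent strings (the strings below are illustrative examples, not a guaranteed-current list):

```python
import random
import requests

# illustrative pool covering desktop and mobile browsers
user_agents = [
    # desktop Chrome on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    # desktop Firefox on Windows
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
    # mobile Safari on iPhone
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1",
]

# pick a fresh user-agent for every request
headers = {"User-Agent": random.choice(user_agents)}
page = requests.get("https://www.ebay.com/sch/i.html",
                    params={"_nkw": "iphone 8 plus", "_pgn": 1},
                    headers=headers, timeout=30)
print(page.status_code)
```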
It is also possible to fetch [all results from all pages using pagination](https://dimitryzub.medium.com/pagination-techniques-to-scrape-data-from-any-website-in-python-779cd32bd514#5a11). The solution is an infinite `while` loop with a test for something (a button or element) that causes it to exit.
In our case, this is the presence of the "next page" button on the page (the `.pagination__next` selector).
Check the code in the [online IDE](https://replit.com/@denisskopa/scrape-ebay-sold-item-csv-bs4#main.py).

```python
from bs4 import BeautifulSoup
import requests, lxml
import pandas as pd

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}
params = {
    "_nkw": "oakley+sunglasses",  # search query
    "LH_Sold": "1",               # shows sold items
    "_pgn": 1                     # page number
}

data = []

while True:
    page = requests.get("https://www.ebay.com/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, "lxml")

    print(f"Extracting page: {params['_pgn']}")
    print("-" * 10)

    for products in soup.select(".s-item__pl-on-bottom"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text

        # the sold-date tag is not present on every listing
        try:
            sold_date = products.select_one(".s-item__title--tagblock .POSITIVE").text
        except AttributeError:
            sold_date = None

        data.append({
            "title": title,
            "price": price,
            "sold_date": sold_date
        })

    # exit if there is no "next page" button on the page
    if soup.select_one(".pagination__next"):
        params["_pgn"] += 1
    else:
        break

# save to CSV
pd.DataFrame(data=data).to_csv("ebay_products.csv", index=False)
```

Output: an `ebay_products.csv` file is created.
____
As an alternative, you can use [Ebay Organic Results API](https://serpapi.com/ebay-organic-results) from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
Example code:

```python
from serpapi import EbaySearch
import os
import pandas as pd

params = {
    "api_key": os.getenv("API_KEY"),  # serpapi key, https://serpapi.com/manage-api-key
    "engine": "ebay",                 # search engine
    "ebay_domain": "ebay.com",        # ebay domain
    "_nkw": "oakley+sunglasses",      # search query
    "LH_Sold": "1",                   # shows sold items
    "_pgn": 1                         # page number
}

search = EbaySearch(params)  # where data extraction happens

page_num = 0
data = []

while True:
    results = search.get_dict()  # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price
        })

    page_num += 1
    print(page_num)

    # exit if there is no "next page" in the pagination data
    if "next" in results.get("pagination", {}):
        params["_pgn"] += 1
    else:
        break

pd.DataFrame(data=data).to_csv("ebay_products.csv", index=False)
```

Output: an `ebay_products.csv` file is created.
There's a [13 ways to scrape any public data from any website](https://serpapi.com/blog/13-ways-to-scrape-any-data-from-any-website/) blog post if you want to know more about website scraping.

        