CopyPastor

Detecting plagiarism made easy.

Score: 1; Reported for: Exact paragraph match

Possible Plagiarism

Reposted on 2023-05-25
by Denis Skopa

Original Post

Original - Posted on 2022-10-20
by Denis Skopa



            

You can control the number of results with the [`_ipg` parameter](https://serpapi.com/ebay-search-api#api-parameters-pagination--ipg), which has a limit of 200 listings per page.
Additionally, you can collect data from all pages of a website, regardless of how many there are, by using a `while` loop with [non-token-based pagination](https://python.plainenglish.io/pagination-techniques-to-scrape-data-from-any-website-in-python-779cd32bd514#5a76).
Check the code with pagination in the [online IDE](https://replit.com/@denisskopa/scrape-ebay-iphone-pagination-bs4#main.py).

```python
from bs4 import BeautifulSoup
import requests, json, lxml
import pandas as pd

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
# https://www.whatismybrowser.com/detect/what-is-my-user-agent/
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
}

params = {
    "_nkw": "iphone 8 plus",  # search query
    "_pgn": 1,                # page number
    "_ipg": 200               # 200 listings per page
}

data = []
limit = 5  # page limit (if needed)

# pagination
while True:
    page = requests.get("https://www.ebay.com/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, "lxml")

    for products in soup.select(".s-item__info"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text

        data.append({
            "title": title,
            "price": price
        })

    # exit on the specified page limit
    if params['_pgn'] == limit:
        break

    # exit if there is no "next page" button on the page
    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

# add data to a CSV file
pd.DataFrame(data=data).to_csv(
    'listings.csv',  # name of the CSV file
    index=False      # remove pandas row index (numeration)
)
```

Output: a `listings.csv` file will be created.
____
Also, you can use [Ebay Organic Results API](https://serpapi.com/ebay-organic-results) from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
Example code with pagination:

```python
from serpapi import EbaySearch
import json
import pandas as pd

params = {
    "api_key": "...",           # serpapi key, https://serpapi.com/manage-api-key
    "engine": "ebay",           # search engine
    "ebay_domain": "ebay.com",  # ebay domain
    "_nkw": "iphone 8 plus",    # search query
    "_pgn": 1                   # page number
}

search = EbaySearch(params)  # where data extraction happens

limit = 5
page_num = 0
data = []

while True:
    results = search.get_dict()  # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price
        })

    page_num += 1
    print(page_num)

    if params['_pgn'] == limit:
        break

    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1
    else:
        break

pd.DataFrame(data=data).to_csv(
    'listings.csv',  # name of the CSV file
    index=False      # remove pandas row index (numeration)
)
```

Output: a `listings.csv` file will be created.
It is not necessary to use `Selenium` for eBay scraping: the data is not rendered by JavaScript, so it can be extracted from the plain HTML. The [`BeautifulSoup`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) web scraping library is enough.
Keep in mind that parsing problems may arise when you request a site many times: eBay may decide the requests come from a bot rather than a real user.
One way to avoid this is to send `headers` that contain a [user-agent in the request](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#user-agent); the site will then assume you're a regular user and display the information.
An additional step is to [rotate those user-agents](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#rotate-user-agents). The ideal scenario is to use [proxies](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#proxies) in combination with rotated user-agents (plus a CAPTCHA solver).
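A minimal sketch of the rotation idea, using `random.choice` over a small pool of user-agents (the user-agent strings and the proxy address below are illustrative placeholders, not values from this answer):

```python
import random

# A small pool of user-agent strings to rotate between requests
# (example strings; in practice keep a larger, up-to-date list).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0",
]

def make_headers() -> dict:
    """Build headers with a randomly picked user-agent for this request."""
    return {"User-Agent": random.choice(USER_AGENTS)}

# Combining rotation with a proxy (the address is a placeholder):
# proxies = {"https": "http://user:pass@proxy.example.com:8080"}
# page = requests.get("https://www.ebay.com/sch/i.html",
#                     headers=make_headers(), proxies=proxies, timeout=30)
```

Calling `make_headers()` before each request in the loop below would make consecutive requests look like they come from different browsers.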
```python
from bs4 import BeautifulSoup
import requests, json, lxml

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

params = {
    '_nkw': 'oakley+sunglasses',  # search query
    'LH_Sold': '1',               # shows sold items
    '_pgn': 1                     # page number
}

data = []

while True:
    page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

    print(f"Extracting page: {params['_pgn']}")
    print("-" * 10)

    for products in soup.select(".s-item__info"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text
        link = products.select_one(".s-item__link")["href"]

        data.append({
            "title": title,
            "price": price,
            "link": link
        })

    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))
```
Example output:

```json
Extracting page: 1
----------
[
  {
    "title": "Shop on eBay",
    "price": "$20.00",
    "link": "https://ebay.com/itm/123456?hash=item28caef0a3a:g:E3kAAOSwlGJiMikD&amdata=enc%3AAQAHAAAAsJoWXGf0hxNZspTmhb8%2FTJCCurAWCHuXJ2Xi3S9cwXL6BX04zSEiVaDMCvsUbApftgXEAHGJU1ZGugZO%2FnW1U7Gb6vgoL%2BmXlqCbLkwoZfF3AUAK8YvJ5B4%2BnhFA7ID4dxpYs4jjExEnN5SR2g1mQe7QtLkmGt%2FZ%2FbH2W62cXPuKbf550ExbnBPO2QJyZTXYCuw5KVkMdFMDuoB4p3FwJKcSPzez5kyQyVjyiIq6PB2q%7Ctkp%3ABlBMULq7kqyXYA"
  },
  {
    "title": "Oakley X-metal Juliet Men's Sunglasses",
    "price": "$280.00",
    "link": "https://www.ebay.com/itm/265930582326?hash=item3deab2a936:g:t8gAAOSwMNhjRUuB&amdata=enc%3AAQAHAAAAoH76tlPncyxembf4SBvTKma1pJ4vg6QbKr21OxkL7NXZ5kAr7UvYLl2VoCPRA8KTqOumC%2Bl5RsaIpJgN2o2OlI7vfEclGr5Jc2zyO0JkAZ2Gftd7a4s11rVSnktOieITkfiM3JLXJM6QNTvokLclO6jnS%2FectMhVc91CSgZQ7rc%2BFGDjXhGyqq8A%2FoEyw4x1Bwl2sP0viGyBAL81D2LfE8E%3D%7Ctkp%3ABk9SR8yw1LH9YA"
  },
  {
    "title": " Used Oakley PROBATION Sunglasses Polished Gold/Dark Grey (OO4041-03)",
    "price": "$120.00",
    "link": "https://www.ebay.com/itm/334596701765?hash=item4de7847e45:g:d5UAAOSw4YtjTfEE&amdata=enc%3AAQAHAAAAoItMbbzfQ74gNUiinmOVnzKlPWE%2Fc54B%2BS1%2BrZpy6vm5lB%2Bhvm5H43UFR0zeCU0Up6sPU2Wl6O6WR0x9FPv5Y1wYKTeUbpct5vFKu8OKFBLRT7Umt0yxmtLLMWaVlgKf7StwtK6lQ961Y33rf3YuQyp7MG7H%2Fa9fwSflpbJnE4A9rLqvf3hccR9tlWzKLMj9ZKbGxWT17%2BjyUp19XIvX2ZI%3D%7Ctkp%3ABk9SR8yw1LH9YA"
  },
```
____
As an alternative, you can use the [Ebay Organic Results API](https://serpapi.com/ebay-organic-results) from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
Example code that paginates through all pages:

```python
from serpapi import EbaySearch
import os, json

params = {
    "api_key": os.getenv("API_KEY"),  # serpapi api key
    "engine": "ebay",                 # search engine
    "ebay_domain": "ebay.com",        # ebay domain
    "_nkw": "oakley+sunglasses",      # search query
    "_pgn": 1,                        # page number
    "LH_Sold": "1"                    # shows sold items
}

search = EbaySearch(params)  # where data extraction happens

page_num = 0
data = []

while True:
    results = search.get_dict()  # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        link = organic_result.get("link")
        price = organic_result.get("price")

        data.append({
            "price": price,
            "link": link
        })

    page_num += 1
    print(page_num)

    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2))
```

Output:

```json
[
  {
    "price": {
      "raw": "$68.96",
      "extracted": 68.96
    },
    "link": "https://www.ebay.com/itm/125360598217?epid=20030526224&hash=item1d3012ecc9:g:478AAOSwCt5iqgG5&amdata=enc%3AAQAHAAAA4Ls3N%2FEH5OR6w3uoTlsxUlEsl0J%2B1aYmOoV6qsUxRO1d1w3twg6LrBbUl%2FCrSTxNOjnDgIh8DSI67n%2BJe%2F8c3GMUrIFpJ5lofIRdEmchFDmsd2I3tnbJEqZjIkWX6wXMnNbPiBEM8%2FML4ljppkSl4yfUZSV%2BYXTffSlCItT%2B7ZhM1fDttRxq5MffSRBAhuaG0tA7Dh69ZPxV8%2Bu1HuM0jDQjjC4g17I3Bjg6J3daC4ZuK%2FNNFlCLHv97w2fW8tMaPl8vANMw8OUJa5z2Eclh99WUBvAyAuy10uEtB3NDwiMV%7Ctkp%3ABk9SR5DKgLD9YA"
  },
  {
    "price": {
      "raw": "$62.95",
      "extracted": 62.95
    },
    "link": "https://www.ebay.com/itm/125368283608?epid=1567457519&hash=item1d308831d8:g:rnsAAOSw7PJiqMQz&amdata=enc%3AAQAHAAAA4AwZhKJZfTqrG8VskZL8rtfsuNtZrMdWYpndpFs%2FhfrIOV%2FAjLuzNzaMNIvTa%2B6QUTdkOwTLRun8n43cZizqtOulsoBLQIwy3wf19N0sHxGF5HaIDOBeW%2B2sobRnzGdX%2Fsmgz1PRiKFZi%2BUxaLQpWCoGBf9n8mjcsFXi3esxbmAZ8kenO%2BARbRBzA2Honzaleb2tyH5Tf8%2Bs%2Fm5goqbon%2FcEsR0URO7BROkBUUjDCdDH6fFi99m6anNMMC3yTBpzypaFWio0u2qu5TgjABUfO1wzxb4ofA56BNKjoxttb7E%2F%7Ctkp%3ABk9SR5DKgLD9YA"
  },
  # ...
]
```
> Disclaimer, I work for SerpApi.

        