CopyPastor

Detecting plagiarism made easy.

Score: 1; Reported for: Exact paragraph match

Possible Plagiarism

Reposted on 2023-05-25
by Denis Skopa

Original Post

Original - Posted on 2022-10-20
by Denis Skopa



            

Extraction problems can also be caused by choosing the wrong container class or the wrong elements inside the container.
To quickly find the necessary element on the page, you can search by CSS selectors using the [SelectorGadget](https://serpapi.com/blog/13-ways-to-scrape-any-data-from-any-website/#selectorgadgetchromeextension) extension for Chrome, although it does not always work perfectly if the page relies heavily on JavaScript.
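For instance, a selector suggested by SelectorGadget can be tested in a few lines before building the full scraper. This is only a rough sketch that reuses the `.s-item__title span` selector and the headers style from the example further down:

```python
from bs4 import BeautifulSoup
import requests

# fetch one eBay search results page
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
}
page = requests.get("https://www.ebay.co.uk/sch/i.html",
                    params={"_nkw": "logitech"}, headers=headers, timeout=30)
soup = BeautifulSoup(page.text, "lxml")

# try out a selector suggested by SelectorGadget: print the first few matched titles
for title in soup.select(".s-item__title span")[:5]:
    print(title.text)
```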
Also, if you want to get listings from all pages, you can paginate through all possible pages dynamically with a `while` loop using the [non-token pagination technique](https://python.plainenglish.io/pagination-techniques-to-scrape-data-from-any-website-in-python-779cd32bd514#5a76).
Code example with pagination in the [online IDE](https://replit.com/@denisskopa/scrape-ebay-logitech-pagination-bs4#main.py).

```python
from bs4 import BeautifulSoup
import requests, json, lxml

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
}

params = {
    "_nkw": "logitech",   # search query
    "_pgn": 1,            # page number
    "LH_Auction": "1"     # auction items
    # "LH_Sold": "1"      # sold items
}

data = []
limit = 5  # page limit (if needed)

# pagination
while True:
    page = requests.get("https://www.ebay.co.uk/sch/i.html", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, "lxml")

    for products in soup.select(".s-item__info"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text

        data.append({
            "title": title,
            "price": price
        })

    # exit on the specified page limit
    if params['_pgn'] == limit:
        break

    # exit if there is no "next page" button on the page
    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))
```

Example output:

```json
[
  {
    "title": "Logitech PC Flight Yoke System",
    "price": "£49.00"
  },
  {
    "title": "Logitech G213 UK Layout RGB Gaming Keyboard New",
    "price": "£4.00"
  },
  other results ...
]
```
____
As an alternative, you can use [Ebay Organic Results API](https://serpapi.com/ebay-organic-results) from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
Example code with pagination:

```python
from serpapi import EbaySearch
import json

params = {
    "api_key": "...",             # serpapi key, https://serpapi.com/manage-api-key
    "engine": "ebay",             # search engine
    "ebay_domain": "ebay.co.uk",  # ebay domain
    "_nkw": "logitech",           # search query
    "_pgn": 1                     # page number
}

search = EbaySearch(params)  # where data extraction happens

limit = 5
page_num = 0
data = []

while True:
    results = search.get_dict()  # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        title = organic_result.get("title")
        price = organic_result.get("price")

        data.append({
            "title": title,
            "price": price
        })

    page_num += 1
    print(page_num)

    if params['_pgn'] == limit:
        break

    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))
```

Output:

```json
[
  {
    "title": "Logitech Speakers Z506",
    "price": {
      "raw": "£40.00",
      "extracted": 40.0
    }
  },
  {
    "title": "Logitech C615 1080p Full HD Portable USB Webcam Black C Grade",
    "price": {
      "raw": "£22.99",
      "extracted": 22.99
    }
  },
  other results ...
]
```

There's a [13 ways to scrape any public data from any website](https://serpapi.com/blog/13-ways-to-scrape-any-data-from-any-website/) blog post if you want to know more about website scraping.
It is not necessary to use `Selenium` for eBay scraping: the data is not rendered by JavaScript, so it can be extracted from the plain HTML. The [`BeautifulSoup`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) web scraping library is enough.
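As a quick sanity check, you can request the page once and look for the item markup in the raw response; a rough sketch:

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}
html = requests.get("https://www.ebay.com/sch/i.html",
                    params={"_nkw": "oakley+sunglasses"}, headers=headers, timeout=30).text

# the listing markup is already present in the raw response, so no browser rendering is needed
print("s-item__title" in html)
```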
Keep in mind that parsing problems may arise when you request the site many times: eBay may decide that the requests come from a bot rather than a real user.
One way to avoid this is to send `headers` that contain a [user-agent in the request](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#user-agent), so the site assumes the request comes from a real user and returns the page.
An [additional step is to rotate those user-agents](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#rotate-user-agents). The ideal scenario is to use [proxies](https://serpapi.com/blog/how-to-reduce-chance-of-being-blocked-while-web/#proxies) in combination with rotated user-agents (besides a CAPTCHA solver).
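A minimal sketch of that idea, with a hand-made user-agent list and a placeholder proxy URL (both are just examples); `requests` takes the proxy via its `proxies` argument:

```python
import random
import requests

# a small pool of user-agents to rotate (placeholders; keep such a list up to date)
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
]

# hypothetical proxy URL; replace with your own or drop the proxies argument entirely
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}

# pick a different user-agent for each request
headers = {"User-Agent": random.choice(user_agents)}

page = requests.get(
    "https://www.ebay.com/sch/i.html",
    params={"_nkw": "oakley+sunglasses", "_pgn": 1},
    headers=headers,
    proxies=proxies,
    timeout=30,
)
print(page.status_code)
```

The full example below keeps a single static user-agent, which is often enough for a small number of requests.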
```python
from bs4 import BeautifulSoup
import requests, json, lxml

# https://requests.readthedocs.io/en/latest/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
}

params = {
    '_nkw': 'oakley+sunglasses',  # search query
    'LH_Sold': '1',               # shows sold items
    '_pgn': 1                     # page number
}

data = []

while True:
    page = requests.get('https://www.ebay.com/sch/i.html', params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(page.text, 'lxml')

    print(f"Extracting page: {params['_pgn']}")
    print("-" * 10)

    for products in soup.select(".s-item__info"):
        title = products.select_one(".s-item__title span").text
        price = products.select_one(".s-item__price").text
        link = products.select_one(".s-item__link")["href"]

        data.append({
            "title": title,
            "price": price,
            "link": link
        })

    if soup.select_one(".pagination__next"):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2, ensure_ascii=False))
```
Example output:

```json
Extracting page: 1
----------
[
  {
    "title": "Shop on eBay",
    "price": "$20.00",
    "link": "https://ebay.com/itm/123456?hash=item28caef0a3a:g:E3kAAOSwlGJiMikD&amdata=enc%3AAQAHAAAAsJoWXGf0hxNZspTmhb8%2FTJCCurAWCHuXJ2Xi3S9cwXL6BX04zSEiVaDMCvsUbApftgXEAHGJU1ZGugZO%2FnW1U7Gb6vgoL%2BmXlqCbLkwoZfF3AUAK8YvJ5B4%2BnhFA7ID4dxpYs4jjExEnN5SR2g1mQe7QtLkmGt%2FZ%2FbH2W62cXPuKbf550ExbnBPO2QJyZTXYCuw5KVkMdFMDuoB4p3FwJKcSPzez5kyQyVjyiIq6PB2q%7Ctkp%3ABlBMULq7kqyXYA"
  },
  {
    "title": "Oakley X-metal Juliet Men's Sunglasses",
    "price": "$280.00",
    "link": "https://www.ebay.com/itm/265930582326?hash=item3deab2a936:g:t8gAAOSwMNhjRUuB&amdata=enc%3AAQAHAAAAoH76tlPncyxembf4SBvTKma1pJ4vg6QbKr21OxkL7NXZ5kAr7UvYLl2VoCPRA8KTqOumC%2Bl5RsaIpJgN2o2OlI7vfEclGr5Jc2zyO0JkAZ2Gftd7a4s11rVSnktOieITkfiM3JLXJM6QNTvokLclO6jnS%2FectMhVc91CSgZQ7rc%2BFGDjXhGyqq8A%2FoEyw4x1Bwl2sP0viGyBAL81D2LfE8E%3D%7Ctkp%3ABk9SR8yw1LH9YA"
  },
  {
    "title": " Used Oakley PROBATION Sunglasses Polished Gold/Dark Grey (OO4041-03)",
    "price": "$120.00",
    "link": "https://www.ebay.com/itm/334596701765?hash=item4de7847e45:g:d5UAAOSw4YtjTfEE&amdata=enc%3AAQAHAAAAoItMbbzfQ74gNUiinmOVnzKlPWE%2Fc54B%2BS1%2BrZpy6vm5lB%2Bhvm5H43UFR0zeCU0Up6sPU2Wl6O6WR0x9FPv5Y1wYKTeUbpct5vFKu8OKFBLRT7Umt0yxmtLLMWaVlgKf7StwtK6lQ961Y33rf3YuQyp7MG7H%2Fa9fwSflpbJnE4A9rLqvf3hccR9tlWzKLMj9ZKbGxWT17%2BjyUp19XIvX2ZI%3D%7Ctkp%3ABk9SR8yw1LH9YA"
  },
```
____
As an alternative, you can use [Ebay Organic Results API](https://serpapi.com/ebay-organic-results) from SerpApi. It's a paid API with a free plan that handles blocks and parsing on their backend.
Example code that paginates through all pages:

```python
from serpapi import EbaySearch
import os, json

params = {
    "api_key": os.getenv("API_KEY"),  # serpapi api key
    "engine": "ebay",                 # search engine
    "ebay_domain": "ebay.com",        # ebay domain
    "_nkw": "oakley+sunglasses",      # search query
    "_pgn": 1,                        # page number
    "LH_Sold": "1"                    # shows sold items
}

search = EbaySearch(params)  # where data extraction happens

page_num = 0
data = []

while True:
    results = search.get_dict()  # JSON -> Python dict

    if "error" in results:
        print(results["error"])
        break

    for organic_result in results.get("organic_results", []):
        link = organic_result.get("link")
        price = organic_result.get("price")

        data.append({
            "price": price,
            "link": link
        })

    page_num += 1
    print(page_num)

    if "next" in results.get("pagination", {}):
        params['_pgn'] += 1
    else:
        break

print(json.dumps(data, indent=2))
```

Output:

```json
[
  {
    "price": {
      "raw": "$68.96",
      "extracted": 68.96
    },
    "link": "https://www.ebay.com/itm/125360598217?epid=20030526224&hash=item1d3012ecc9:g:478AAOSwCt5iqgG5&amdata=enc%3AAQAHAAAA4Ls3N%2FEH5OR6w3uoTlsxUlEsl0J%2B1aYmOoV6qsUxRO1d1w3twg6LrBbUl%2FCrSTxNOjnDgIh8DSI67n%2BJe%2F8c3GMUrIFpJ5lofIRdEmchFDmsd2I3tnbJEqZjIkWX6wXMnNbPiBEM8%2FML4ljppkSl4yfUZSV%2BYXTffSlCItT%2B7ZhM1fDttRxq5MffSRBAhuaG0tA7Dh69ZPxV8%2Bu1HuM0jDQjjC4g17I3Bjg6J3daC4ZuK%2FNNFlCLHv97w2fW8tMaPl8vANMw8OUJa5z2Eclh99WUBvAyAuy10uEtB3NDwiMV%7Ctkp%3ABk9SR5DKgLD9YA"
  },
  {
    "price": {
      "raw": "$62.95",
      "extracted": 62.95
    },
    "link": "https://www.ebay.com/itm/125368283608?epid=1567457519&hash=item1d308831d8:g:rnsAAOSw7PJiqMQz&amdata=enc%3AAQAHAAAA4AwZhKJZfTqrG8VskZL8rtfsuNtZrMdWYpndpFs%2FhfrIOV%2FAjLuzNzaMNIvTa%2B6QUTdkOwTLRun8n43cZizqtOulsoBLQIwy3wf19N0sHxGF5HaIDOBeW%2B2sobRnzGdX%2Fsmgz1PRiKFZi%2BUxaLQpWCoGBf9n8mjcsFXi3esxbmAZ8kenO%2BARbRBzA2Honzaleb2tyH5Tf8%2Bs%2Fm5goqbon%2FcEsR0URO7BROkBUUjDCdDH6fFi99m6anNMMC3yTBpzypaFWio0u2qu5TgjABUfO1wzxb4ofA56BNKjoxttb7E%2F%7Ctkp%3ABk9SR5DKgLD9YA"
  },
  # ...
]
```
> Disclaimer: I work for SerpApi.

        