Amazon posting is less sensitive to URL changes
Amazon URLs carry a lot of extra information, which makes them poor unique identifiers. However, each URL contains an ASIN, which is unique to the product. Uniqueness is now determined by ASIN rather than by URL, and URLs are now reconstructed from the ASIN. This reduces how often the bot treats a previously seen item with a new URL as a new item.
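As a rough illustration of the idea, here is a minimal Python sketch. It assumes the ASIN is the 10-character code that follows `/dp/` or `/gp/product/` in the URL and that `https://www.amazon.com/dp/<ASIN>` works as a clean link. The function names and the in-memory `seen_asins` set are hypothetical and are not the bot's actual code.

```python
import re
from typing import Optional

# Assumption: the ASIN is the 10-character code after /dp/ or /gp/product/.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?#]|$)")

# Hypothetical in-memory store of ASINs that were already posted.
seen_asins = set()


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN embedded in an Amazon product URL, or None."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


def canonical_url(asin: str) -> str:
    """Rebuild a short product URL from the ASIN alone."""
    return f"https://www.amazon.com/dp/{asin}"


def is_new_item(url: str) -> bool:
    """Treat an item as new only if its ASIN has not been seen before."""
    asin = extract_asin(url)
    if asin is None or asin in seen_asins:
        return False
    seen_asins.add(asin)
    return True
```

With this approach, two URLs that differ only in tracking or search parameters resolve to the same ASIN, so only the first one counts as a new item.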
Target should stop alerting on non-Funko products
An improvement to the regex used for scraping Target’s site should reduce the number of non-Funko products being found. However, the new regex slows down scraping and still needs to be improved.
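The actual regex is not shown here, so the following is only a sketch of the general technique in Python: match "Funko" as a whole word in a product title, and compile the pattern once so it is not rebuilt for every title. The pattern, the function name, and the sample titles are all assumptions for illustration.

```python
import re

# Assumption: a product counts as Funko if "funko" appears as a whole word
# in its title. Compiled once at import time rather than on every call.
FUNKO_PATTERN = re.compile(r"\bfunko\b", re.IGNORECASE)


def filter_funko(titles):
    """Keep only the titles that mention Funko as a standalone word."""
    return [title for title in titles if FUNKO_PATTERN.search(title)]
```

Running the pattern against short extracted titles rather than the whole page is one way to keep the matching cost down, although whether that fits the Target scraper depends on how the page is parsed.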
Target and Amazon will now be more reliably scraped
The GET request now includes more relevant headers, so the server is more likely to treat it as a legitimate browser request and return the full HTML.
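For illustration, a request with browser-like headers might look like the Python sketch below, which uses only the standard library. The specific header values and the function names are assumptions, not the headers the bot actually sends.

```python
import urllib.request

# Assumption: illustrative browser-like headers; the real values may differ.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and decode the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Keeping the headers in one dictionary makes it easy to adjust them if a site starts returning partial pages again.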