Product Cataloging and Intelligence
Product comparison sites scrape data from various e-commerce websites and give users a one-stop page of pricing details. The goal of this project is similarly to build a product database, extract vendor pricing information, and perform analytics over the data.
Main system
The main system consists of the following modules:
- Master catalogue development (Scrape)
- Product dump extraction
- Vendor dump extraction
- Price dump extraction
- Product pricing sensitivity analysis
- Product master creation
- Intelligence to avoid screen scraping blockages
- HTML GUI query interface along with the required backend.
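As a rough illustration of how the product, vendor, and price dumps might feed the product master, here is a minimal sketch. The field names (`product_id`, `vendor_id`, `name`, `rating`, `price`) and the function `build_product_master` are assumptions made for this example, not the project's actual schema or code.

```python
from collections import defaultdict


def build_product_master(product_dump, vendor_dump, price_dump):
    """Join the three dumps into one record per product.

    Each dump is a list of dicts; the keys used here are hypothetical.
    """
    # Index vendors by id so each price row can be resolved quickly.
    vendors = {v["vendor_id"]: v for v in vendor_dump}

    # Group price rows by product, attaching vendor details to each offer.
    offers = defaultdict(list)
    for row in price_dump:
        vendor = vendors.get(row["vendor_id"])
        if vendor is None:
            continue  # skip prices from vendors missing in the vendor dump
        offers[row["product_id"]].append(
            {
                "vendor": vendor["name"],
                "rating": vendor.get("rating"),  # None when no rating was scraped
                "price": row["price"],
            }
        )

    # One master record per product, with offers sorted cheapest first.
    master = {}
    for product in product_dump:
        pid = product["product_id"]
        master[pid] = {
            "name": product["name"],
            "offers": sorted(offers[pid], key=lambda o: o["price"]),
        }
    return master
```

A record shaped like this maps naturally onto a document store, with one document per product and the offers embedded as a list.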
A more detailed model of the system is shown in the flowchart below:
Applications and Advantages
- An aggregated view of products across e-retailers on the web can be created, with detailed comparison of aspects such as price and vendor reliability.
- Users can gauge the quality of a product by reviewing its ratings from multiple websites in one place.
- The method applies to any sellable product, such as electronic appliances or clothing.
- Analytics such as the history of vendor ratings, product ratings, and product prices further help the user decide on a purchase.
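The price-history analytics mentioned above could be sketched as follows. This is only an illustration: the input shape (tuples of ISO date, vendor name, and price) and the function `summarize_price_history` are assumptions for the example, and prices are assumed to be positive.

```python
from statistics import mean


def summarize_price_history(observations):
    """Summarize price observations per vendor.

    observations: iterable of (iso_date, vendor, price) tuples (hypothetical shape).
    """
    # ISO dates sort chronologically as strings, so sorting the tuples
    # puts each vendor's prices in time order.
    by_vendor = {}
    for _day, vendor, price in sorted(observations):
        by_vendor.setdefault(vendor, []).append(price)

    summary = {}
    for vendor, prices in by_vendor.items():
        first, latest = prices[0], prices[-1]
        summary[vendor] = {
            "first": first,
            "latest": latest,
            "lowest": min(prices),
            "average": round(mean(prices), 2),
            # Percentage change from the first to the latest observation.
            "change_pct": round(100.0 * (latest - first) / first, 2),
        }
    return summary
```

The same pattern would apply to vendor or product ratings by swapping the price column for the rating.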
Tools Used
Future Improvements
- A comparison feature between products.
- Location-based analytics of vendors, and heuristics for ranking vendor preference based on multiple factors.
- Distributing the scraping task across multiple machines.
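One possible way to split scraping across machines is to assign each URL to a worker by hashing its host name. The sketch below is an assumption about how this could be done, not part of the existing tool; `assign_worker` and `partition_urls` are hypothetical names.

```python
import hashlib
from urllib.parse import urlparse


def assign_worker(url, num_workers):
    """Map a URL to a worker index in [0, num_workers) based on its host."""
    host = urlparse(url).netloc.lower()
    # A cryptographic hash is used because Python's built-in hash() for
    # strings is not stable across processes or machines.
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition_urls(urls, num_workers):
    """Split URLs into one list per worker, keeping each host on one worker."""
    shards = [[] for _ in range(num_workers)]
    for url in urls:
        shards[assign_worker(url, num_workers)].append(url)
    return shards
```

Keeping every URL of a given site on the same machine makes it simpler to apply per-site request delays, which also helps with avoiding scraping blockages.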
Authors and Contributors
This tool was made by Karan Mangla (@manglakaran), Shailendra Joshi (@pshall), and Karan Aggarwal of the International Institute of Information Technology, Hyderabad, as part of the Information Retrieval and Extraction course project.
Support or Contact
Having trouble with Pages? Check out our video or our presentation, or raise an issue and we’ll help you sort it out.
Tags
Information Retrieval and Extraction
IIIT-H
Major Project
Analytics
Crawlers
Beautiful Soup
MongoDB