GISP Registry Scraper (API-Direct)
This project provides a fast, robust API for searching the Russian industry portal GISP (gisp.gov.ru). Instead of relying on brittle UI automation (Selenium), it interacts directly with the portal's internal REST API.
Features
- Fast & Reliable: Bypasses browser rendering and DOM-scraping.
- Filtering: Allows querying by registry number without default UI filter constraints (like date range).
- Lightweight: No need for Selenium Grid or heavy headless browsers.
API Usage
- Endpoint: /scrape/{registry_number}
- Example: GET /scrape/10084557
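A minimal client sketch for the endpoint above, using only the Python standard library. It assumes the app is reachable at a base URL such as http://localhost:8000 and that the response body is JSON. The README does not specify the response format, so treat that as an assumption. The helper names build_scrape_url and fetch_record are illustrative and not part of the project.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_scrape_url(base_url: str, registry_number) -> str:
    """Return the full URL for GET /scrape/{registry_number}."""
    # Percent-encode the registry number so it is safe as a single path segment.
    segment = quote(str(registry_number), safe="")
    return f"{base_url.rstrip('/')}/scrape/{segment}"


def fetch_record(base_url: str, registry_number, timeout: float = 30.0):
    """Call the scraper and decode the body (ASSUMPTION: the body is JSON)."""
    url = build_scrape_url(base_url, registry_number)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the app running locally, fetch_record("http://localhost:8000", "10084557") would issue the example request shown above.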
Setup
- Environment: Ensure you have Python 3.12+.
- Install Dependencies: pip install fastapi uvicorn httpx
- Run the App: uvicorn app.main:app --host 0.0.0.0 --port 8000
Why API-Direct?
The GISP portal uses a complex DevExtreme grid. Under browser automation the grid is prone to race conditions, and it applies default date filters that can hide matching records. By targeting the /pub/prod/b/ endpoint directly, we eliminate the need for containerized browser nodes and significantly reduce scraping latency.
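To illustrate the idea, here is a rough sketch of how a direct request to /pub/prod/b/ could be assembled without any date filter. Only the host and the path come from this README. The HTTP method, the JSON body shape, and the field name product_reg_number are assumptions made for illustration. The filter/skip/take keys follow the general DevExtreme load-options convention, and the real portal may expect something different.

```python
import json
from urllib.request import Request

GISP_BASE_URL = "https://gisp.gov.ru"   # portal host named in this README
GRID_PATH = "/pub/prod/b/"              # endpoint named in this README
REGISTRY_FIELD = "product_reg_number"   # HYPOTHETICAL field name, not confirmed


def build_grid_payload(registry_number, take: int = 10) -> dict:
    """Build DevExtreme-style load options that filter by registry number only."""
    return {
        # A single [field, operator, value] condition; no date range is added,
        # so the UI's default date filter never comes into play.
        "filter": [REGISTRY_FIELD, "=", str(registry_number)],
        "skip": 0,
        "take": take,
        "requireTotalCount": True,
    }


def build_grid_request(registry_number) -> Request:
    """Prepare (but do not send) the request. ASSUMPTION: POST with a JSON body."""
    body = json.dumps(build_grid_payload(registry_number)).encode("utf-8")
    return Request(
        GISP_BASE_URL + GRID_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request object is only constructed here. Passing it to urllib.request.urlopen (or sending the same payload with httpx) would perform the actual call once the real payload shape has been confirmed against the portal.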