GISP Registry Scraper (API-Direct)

This project provides an HTTP API for searching the Russian industry portal GISP (gisp.gov.ru). Instead of relying on brittle UI automation (Selenium), it talks directly to the portal's internal REST API.

Features

  • Fast & Reliable: Skips browser rendering and DOM scraping entirely.
  • Filtering: Queries by registry number without the constraints the UI applies by default (such as a date range).
  • Lightweight: No need for Selenium Grid or heavy headless browsers.

API Usage

  • Endpoint: /scrape/{registry_number}
  • Example: GET /scrape/10084557
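
A minimal client sketch for the endpoint above, using only the Python standard library. The base URL assumes the app is running locally on port 8000 as in the Setup section. The helper names (`build_scrape_url`, `fetch_entry`) are hypothetical, and the response is assumed to be JSON, which this README does not confirm.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the app runs locally on the port used in the Setup section.
BASE_URL = "http://localhost:8000"


def build_scrape_url(registry_number: str, base_url: str = BASE_URL) -> str:
    """Return the /scrape/{registry_number} URL for a registry number."""
    # Percent-encode the number so unusual characters cannot break the path.
    encoded = quote(str(registry_number), safe="")
    return f"{base_url.rstrip('/')}/scrape/{encoded}"


def fetch_entry(registry_number: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """GET the endpoint and decode the body, assuming the service returns JSON."""
    url = build_scrape_url(registry_number, base_url)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `fetch_entry("10084557")` would issue the same request as the example above.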

Setup

  1. Environment: Ensure you have Python 3.12+.
  2. Install Dependencies:
    pip install fastapi uvicorn httpx
    
  3. Run the App:
    uvicorn app.main:app --host 0.0.0.0 --port 8000
    

Why API-Direct?

The GISP portal uses a complex DevExtreme grid. Automating it through a browser is prone to race conditions, and the grid applies default date filters that hide matching records. By targeting the /pub/prod/b/ endpoint directly, we eliminate the need for containerized browser nodes and significantly reduce scraping latency.
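
To illustrate what querying without the date filter could look like, here is a rough sketch that builds DevExtreme-style load options with a single filter on the registry number. The field name `registry_number`, the helper names, and the assumption that the portal accepts standard DevExtreme load options (`filter`, `skip`, `take`) are all illustrative. The portal's real request format is not documented here and may differ.

```python
from urllib.parse import urljoin

PORTAL_BASE = "https://gisp.gov.ru"
GRID_PATH = "/pub/prod/b/"          # endpoint path named in this README
REGISTRY_FIELD = "registry_number"  # hypothetical field name, not confirmed


def grid_url() -> str:
    """Join the portal host and the grid endpoint path."""
    return urljoin(PORTAL_BASE, GRID_PATH)


def build_grid_query(registry_number: str, take: int = 10) -> dict:
    """Build DevExtreme-style load options with one filter and no date range."""
    return {
        # DevExtreme filter expressions use the form [field, operator, value].
        "filter": [REGISTRY_FIELD, "=", registry_number],
        "skip": 0,
        "take": take,
        "requireTotalCount": True,
    }
```

Because the only filter is the registry number, no date constraint is sent. How these options are transmitted (query string or JSON body) depends on the portal and is left open here.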
