Add production-ready scraper with README

2026-04-10 15:29:44 +00:00
commit 75f51121ea
1882 changed files with 350270 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,26 @@
+# GISP Registry Scraper (API-Direct)
+
+This project exposes a small HTTP API for looking up registry entries on the Russian industry portal (gisp.gov.ru). Instead of driving the site through brittle UI automation (Selenium), it calls the portal's internal REST API directly.
+
+## Features
+- **Fast & Reliable**: Bypasses browser rendering and DOM-scraping.
+- **Filtering**: Queries by registry number without the UI's default filter constraints (such as the date range).
+- **Lightweight**: No need for Selenium Grid or heavy headless browsers.
+
+## API Usage
+- **Endpoint**: `/scrape/{registry_number}`
+- **Example**: `GET /scrape/10084557`
+
+## Setup
+1. **Environment**: Ensure you have Python 3.12+.
+2. **Install Dependencies**:
+   ```bash
+   pip install fastapi uvicorn httpx
+   ```
+3. **Run the App**:
+   ```bash
+   uvicorn app.main:app --host 0.0.0.0 --port 8000
+   ```
+
+## Why API-Direct?
+The GISP portal's UI is built on a complex DevExtreme grid that is prone to race conditions under browser automation and applies default date filters to results. Targeting the `/pub/prod/b/` endpoint directly avoids both problems, removes the need for containerized browser nodes, and significantly reduces scraping latency.