# Copilot Instructions for wohn-bot

## Project Overview

A Python-based apartment monitoring bot for Berlin's public housing portal (inberlinwohnen.de) and WG rooms (wgcompany.de). Monitors listings from 6 housing companies (HOWOGE, Gewobag, Degewo, Gesobau, Stadt und Land, WBM) plus WGcompany, and sends Telegram notifications with optional auto-application via Playwright browser automation.

## Architecture

**Modularized structure** with the following key components:

- `main.py`: Entry point for the bot. Runs the monitoring loop and triggers autocleaning every 48 hours.
- `handlers/`: Contains company-specific handlers for auto-apply functionality. Each handler automates the application process for one housing company. Includes:
  - `howoge_handler.py`
  - `gewobag_handler.py`
  - `degewo_handler.py`
  - `gesobau_handler.py`
  - `stadtundland_handler.py`
  - `wbm_handler.py`
- `wgcompany_notifier.py`: Handles WGcompany listing fetching, deduplication, and notification.
- `base_handler.py`: Provides shared functionality for all handlers.
- `application_handler.py`: Delegates application tasks to the appropriate handler based on the company. Enforces a valid browser context.
- `telegram_bot.py`: Fully async Telegram bot handler for commands and notifications. Uses httpx for messaging.
- `autoclean_debug.py`: Deletes debug files (screenshots, HTML) older than 48 hours.
- `helper_functions/`: Contains data merge utilities for combining stats from multiple sources:
  - `merge_listing_times.py`
  - `merge_applications.py`
  - `merge_dict_json.py`
  - `merge_wgcompany_times.py`

**Data flow**: Fetch listings → Compare with `listings.json` / `wgcompany_listings.json` → Detect new → Log to CSV → Auto-apply if autopilot enabled → Save to `applications.json` → Send Telegram notification → Autoclean debug files every 48 hours.
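
The "compare and detect new" step of the data flow can be sketched as follows. This is a minimal illustration, assuming the seen-listings file is a JSON object keyed by listing ID; the function names `load_seen`, `detect_new`, and `save_seen` are hypothetical and not taken from the codebase.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> dict:
    """Return previously seen listings keyed by ID, or {} if the file does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def detect_new(current: dict, seen: dict) -> dict:
    """Return only the listings whose IDs are not yet in `seen`."""
    return {lid: data for lid, data in current.items() if lid not in seen}


def save_seen(path: Path, listings: dict) -> None:
    """Persist the full set of listings so the next cycle can compare against it."""
    path.write_text(json.dumps(listings, indent=2, ensure_ascii=False), encoding="utf-8")
```

Only the result of `detect_new` would then go on to CSV logging, auto-apply, and notification, which keeps repeated fetches of the same listing from producing duplicate messages.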

## Key Patterns

### Company-specific handlers

Each housing company has a dedicated handler in the `handlers/` directory. When adding support for a new company:

1. Create a new handler file in `handlers/` (e.g., `newcompany_handler.py`).
2. Implement the handler by extending `BaseHandler` and overriding necessary methods.
3. Update `application_handler.py` to include the new handler in the `handlers` dictionary.
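
The three steps above might look roughly like this. It is a sketch under assumptions: the `BaseHandler` shown here is a stand-in (the real class in `base_handler.py` may expose different methods), and the `apply` method, its return shape, the `HANDLERS` name, and `get_handler` are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseHandler(ABC):
    """Stand-in for the project's BaseHandler; the real interface may differ."""

    def __init__(self, browser_context):
        self.context = browser_context

    @abstractmethod
    async def apply(self, listing: dict) -> dict:
        """Submit an application for `listing` and report the outcome."""


class NewCompanyHandler(BaseHandler):
    """Hypothetical handler that would live in handlers/newcompany_handler.py."""

    async def apply(self, listing: dict) -> dict:
        # A real handler would open a page on self.context and fill in the form here.
        return {"success": False, "message": "not implemented", "listing_id": listing.get("id")}


# Assumption: application_handler.py maps a company key to its handler class.
HANDLERS = {"newcompany": NewCompanyHandler}


def get_handler(company: str, browser_context) -> BaseHandler:
    """Pick the handler for `company`, refusing to run without a browser context."""
    if browser_context is None:
        raise ValueError("a valid browser context is required")
    try:
        handler_cls = HANDLERS[company.lower()]
    except KeyError:
        raise ValueError(f"no handler registered for {company!r}") from None
    return handler_cls(browser_context)
```

Keeping the lookup in one dictionary means a new company only needs one new file and one new dictionary entry.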

### Listing identification

Listings are hashed by `md5(key_fields)[:12]` to generate stable IDs:

- InBerlin: `md5(rooms+size+price+address)`
- WGcompany: `md5(link+price+size)`
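
A minimal sketch of this hashing scheme, assuming the key fields are plain strings that are concatenated without a separator, encoded as UTF-8, and hashed to a hex digest. The function name `listing_id` is hypothetical, and the real code may normalize the fields differently.

```python
import hashlib


def listing_id(*key_fields: str) -> str:
    """Return the first 12 hex characters of the MD5 digest of the joined key fields."""
    # Assumption: fields are joined directly, mirroring the `rooms+size+price+address` notation.
    joined = "".join(str(field) for field in key_fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[:12]


# InBerlin-style ID (rooms, size, price, address); the values are made up for illustration.
inberlin_id = listing_id("2", "54.3", "612.00", "Musterstr. 1")

# WGcompany-style ID (link, price, size); the link is a placeholder.
wg_id = listing_id("https://example.invalid/room/1", "450", "18")
```

Because the ID depends only on the key fields, the same listing yields the same ID on every fetch, while a price change yields a new ID and is therefore treated as a new listing.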

### State management

- `state.json` - Runtime state (autopilot toggle)
- `listings.json` - Previously seen inberlinwohnen listings
- `wgcompany_listings.json` - Previously seen WGcompany listings
- `applications.json` - Application history with success/failure status, timestamps, and listing details
- `listing_times.csv` / `wgcompany_times.csv` - Time-series data for pattern analysis
- `monitor.log` - Centralized logs with rotation (RotatingFileHandler)
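
As an illustration of the runtime state file, here is a small sketch of reading and toggling the autopilot flag. The `"autopilot"` key, the default of `False`, and the function names are assumptions; the actual layout of `state.json` is not documented here.

```python
import json
from pathlib import Path

# Assumption: autopilot is off unless state.json says otherwise.
DEFAULT_STATE = {"autopilot": False}


def load_state(path: Path) -> dict:
    """Load state.json, falling back to defaults if it is missing or unreadable."""
    try:
        stored = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        stored = {}
    if not isinstance(stored, dict):
        stored = {}
    return {**DEFAULT_STATE, **stored}


def set_autopilot(path: Path, enabled: bool) -> dict:
    """Persist the autopilot toggle and return the updated state."""
    state = load_state(path)
    state["autopilot"] = enabled
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

Falling back to defaults on a corrupt file keeps the bot running with autopilot disabled instead of crashing at startup.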

### Logging

All modules use centralized logging configured in `main.py`:

- `RotatingFileHandler` writes to `data/monitor.log` (max 5 MB, 5 backups)
- `StreamHandler` outputs to console/Docker logs
- All handlers, notifiers, and utilities use `logging.getLogger(__name__)` for consistent logging
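
A setup along these lines would produce the behavior described above. This is a sketch, not the project's actual configuration: the function name `configure_logging`, the format string, and the reading of the size limit as 5 MiB are assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_path: str = "data/monitor.log", level: int = logging.INFO) -> logging.Logger:
    """Attach a rotating file handler and a console handler to the root logger."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the format string is illustrative only.
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    file_handler = RotatingFileHandler(
        log_path,
        maxBytes=5 * 1024 * 1024,  # assumption: the 5 MB limit is read as 5 MiB
        backupCount=5,
        encoding="utf-8",
    )
    console_handler = logging.StreamHandler()  # writes to stderr, which Docker captures

    root = logging.getLogger()
    root.setLevel(level)
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return root


# In any other module, only a named logger is needed:
logger = logging.getLogger(__name__)
```

The function should be called once at startup; calling it repeatedly would attach duplicate handlers and duplicate every log line.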

### Autocleaning

Debug material (screenshots, HTML files) older than 48 hours is automatically deleted by `autoclean_debug.py`, which runs every 48 hours in the main loop.

## Development

### Run locally

```bash
# Install dependencies (requires Playwright)
pip install -r requirements.txt
playwright install chromium

# Set env vars and run
export TELEGRAM_BOT_TOKEN=... TELEGRAM_CHAT_ID=...
python main.py
```

### Docker (production)

```bash
cp .env.example .env  # Configure credentials
docker compose up -d
docker compose logs -f
```

### Debugging

- Screenshots are saved to `data/` on application failures (`*_nobtn_*.png`)
- HTML is saved to `data/debug_page.html` (inberlin) and `data/wgcompany_debug.html`
- Full logs are in `data/monitor.log` with rotation
- Debug files older than 48 hours are autocleaned

## Environment Variables

Required: `TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID`

InBerlin login: `INBERLIN_EMAIL`, `INBERLIN_PASSWORD`

Form data: `FORM_ANREDE`, `FORM_VORNAME`, `FORM_NACHNAME`, `FORM_EMAIL`, `FORM_PHONE`, `FORM_STRASSE`, `FORM_HAUSNUMMER`, `FORM_PLZ`, `FORM_ORT`, `FORM_PERSONS`, `FORM_CHILDREN`, `FORM_INCOME`

WGcompany: `WGCOMPANY_ENABLED`, `WGCOMPANY_MIN_SIZE`, `WGCOMPANY_MAX_SIZE`, `WGCOMPANY_MIN_PRICE`, `WGCOMPANY_MAX_PRICE`, `WGCOMPANY_BEZIRK`
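
One way to read these variables at startup is sketched below. The function name `load_config`, the returned dictionary shape, and the boolean parsing of `WGCOMPANY_ENABLED` (including its default of disabled) are assumptions, not the project's documented behavior.

```python
import os

REQUIRED = ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")
# Assumption: these spellings count as "enabled"; anything else is treated as disabled.
TRUTHY = ("1", "true", "yes", "on")


def load_config(env=None) -> dict:
    """Read settings from the environment and fail early if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {
        "telegram_bot_token": env["TELEGRAM_BOT_TOKEN"],
        "telegram_chat_id": env["TELEGRAM_CHAT_ID"],
        "wgcompany_enabled": env.get("WGCOMPANY_ENABLED", "false").strip().lower() in TRUTHY,
    }
```

Failing at startup with the names of the missing variables is easier to diagnose than a Telegram request that fails later with an empty token.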

## Telegram Commands

- `/autopilot on|off` - Enable or disable automatic applications
- `/status` - Show current status and statistics (autopilot state, application counts by company)
- `/plot` - Generate and send a weekly listing-patterns plot
- `/errorrate` - Generate and send an autopilot success vs failure plot
- `/retryfailed` - Retry all failed applications
- `/resetlistings` - Reset seen listings (marks all current as failed to avoid spam)
- `/help` - Show available commands and usage information

## Common Tasks

### Fix a broken company handler

Check the `data/*_nobtn_*.png` screenshots and `data/debug_page.html` to see the actual page structure, then update the selectors in the corresponding handler file in `handlers/`.

### Add Telegram command

1. Add a case in `TelegramBot._handle_update()`.
2. Implement the corresponding `_handle_{command}_command()` method.
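
The two steps above could take the following shape. This is a simplified stand-in for the real `TelegramBot` class: the `/ping` command, the `_send_message` helper, and the in-memory `sent` list are hypothetical, and the real class sends messages over httpx. Only the `message.text` field of the update follows the Telegram Bot API.

```python
import asyncio


class TelegramBot:
    """Simplified stand-in for the real bot class, showing only command dispatch."""

    def __init__(self):
        self.sent = []  # stands in for real message delivery

    async def _send_message(self, text: str) -> None:
        self.sent.append(text)

    async def _handle_update(self, update: dict) -> None:
        text = update.get("message", {}).get("text", "")
        if not text.startswith("/"):
            return
        command, _, args = text.partition(" ")
        # Drop the leading slash and an optional "@botname" suffix.
        command = command[1:].split("@")[0].lower()
        if command == "help":
            await self._handle_help_command()
        elif command == "ping":  # step 1: the new case (hypothetical command)
            await self._handle_ping_command(args)
        else:
            await self._send_message(f"Unknown command: /{command}")

    async def _handle_help_command(self) -> None:
        await self._send_message("Available commands: /help, /ping")

    async def _handle_ping_command(self, args: str) -> None:
        # Step 2: the handler method matching the `_handle_{command}_command` naming.
        await self._send_message(f"pong {args}".strip())


bot = TelegramBot()
asyncio.run(bot._handle_update({"message": {"text": "/ping hello"}}))
```

After the call, `bot.sent` holds the reply that the real bot would have delivered to the chat.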

### Modify listing extraction

- InBerlin: Update regex patterns in `InBerlinMonitor.fetch_listings()`. Test against `data/debug_page.html`.
- WGcompany: Update parsing in `WGCompanyMonitor.fetch_listings()`. Test against `data/wgcompany_debug.html`.

### Merge data from another machine

Use the helper scripts in `helper_functions/`:

- `merge_listing_times.py` - Merge listing_times.csv files
- `merge_applications.py` - Merge applications.json files
- `merge_dict_json.py` - Merge listings.json and wgcompany_listings.json
- `merge_wgcompany_times.py` - Merge wgcompany_times.csv files

All scripts deduplicate by key and timestamp, and output merged results to the current data folder.
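
The deduplication idea can be sketched for the CSV case as follows. The column names `listing_id` and `timestamp`, the function name `merge_rows`, and the assumption that timestamps are ISO 8601 strings (which sort correctly as text) are illustrative only; the actual scripts may use different columns and rules.

```python
def merge_rows(*sources, key_field: str = "listing_id", time_field: str = "timestamp") -> list[dict]:
    """Combine rows from several sources, keeping one row per (key, timestamp) pair."""
    seen = set()
    merged = []
    for rows in sources:
        for row in rows:
            marker = (row[key_field], row[time_field])
            if marker in seen:
                continue  # same listing at the same time: already merged
            seen.add(marker)
            merged.append(row)
    # Assumption: ISO 8601 timestamps, so string order equals chronological order.
    merged.sort(key=lambda row: row[time_field])
    return merged


local = [{"listing_id": "a1", "timestamp": "2024-05-01T10:00:00"}]
remote = [
    {"listing_id": "a1", "timestamp": "2024-05-01T10:00:00"},
    {"listing_id": "b2", "timestamp": "2024-04-30T09:00:00"},
]
combined = merge_rows(local, remote)
```

Here `combined` contains two rows, with the `b2` row first because its timestamp is earlier.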

## Unit Tests

### Overview

The project includes unit tests to ensure functionality and reliability. Key test files:

- `tests/test_telegram_bot.py`: Tests the Telegram bot's commands and messaging functionality.
- `tests/test_error_rate_plot.py`: Tests the error rate plot generator for autopilot applications.

### Running Tests

To run the tests, use:

```bash
pytest tests/
```

Ensure all dependencies are installed and the environment is configured correctly before running the tests.