Aron Petau 2026-01-01 15:27:25 +01:00
parent d596ed7e19
commit aa6626d80d
21 changed files with 1051 additions and 333 deletions


@@ -8,7 +8,7 @@ A Python-based apartment monitoring bot for Berlin's public housing portal (inbe
**Modularized structure** with the following key components:
- `main.py`: Entry point for the bot.
- `main.py`: Entry point for the bot. Runs the monitoring loop and autocleaning every 48 hours.
- `handlers/`: Contains company-specific handlers for auto-apply functionality. Each handler is responsible for automating the application process for a specific housing company. Includes:
- `howoge_handler.py`
- `gewobag_handler.py`
@@ -16,11 +16,18 @@ A Python-based apartment monitoring bot for Berlin's public housing portal (inbe
- `gesobau_handler.py`
- `stadtundland_handler.py`
- `wbm_handler.py`
- `wgcompany_notifier.py`: Handles WGcompany listing fetching, deduplication, and notification.
- `base_handler.py`: Provides shared functionality for all handlers.
- `application_handler.py`: Delegates application tasks to the appropriate handler based on the company.
- `telegram_bot.py`: Handles Telegram bot commands and notifications.
- `application_handler.py`: Delegates application tasks to the appropriate handler based on the company. Enforces valid browser context.
- `telegram_bot.py`: Fully async Telegram bot handler for commands and notifications. Uses httpx for messaging.
- `autoclean_debug.py`: Deletes debug files (screenshots, HTML) older than 48 hours.
- `helper_functions/`: Contains data merge utilities for combining stats from multiple sources:
- `merge_listing_times.py`
- `merge_applications.py`
- `merge_dict_json.py`
- `merge_wgcompany_times.py`
**Data flow**: Fetch listings → Compare with `listings.json` / `wgcompany_listings.json` → Detect new → Log to CSV → Auto-apply if autopilot enabled → Save to `applications.json` → Send Telegram notification.
**Data flow**: Fetch listings → Compare with `listings.json` / `wgcompany_listings.json` → Detect new → Log to CSV → Auto-apply if autopilot enabled → Save to `applications.json` → Send Telegram notification → Autoclean debug files every 48 hours.
## Key Patterns
@@ -39,8 +46,18 @@ Listings are hashed by `md5(key_fields)[:12]` to generate stable IDs:
- `state.json` - Runtime state (autopilot toggle)
- `listings.json` - Previously seen inberlinwohnen listings
- `wgcompany_listings.json` - Previously seen WGcompany listings
- `applications.json` - Application history with success/failure status
- `applications.json` - Application history with success/failure status, timestamps, and listing details
- `listing_times.csv` / `wgcompany_times.csv` - Time-series data for pattern analysis
- `monitor.log` - Centralized logs with rotation (RotatingFileHandler)
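
The `md5(key_fields)[:12]` ID scheme named in this section can be sketched as follows. This is a minimal illustration, not the repository's code: the `listing_id` helper, the `|` separator, the whitespace/case normalization, and the choice of key fields are all assumptions.

```python
import hashlib


def listing_id(*key_fields: str) -> str:
    """Return a stable 12-character hex ID for a listing (hypothetical helper)."""
    # Assumption: fields are normalized and joined with "|" before hashing,
    # so cosmetic whitespace or case differences do not produce a new ID.
    joined = "|".join(field.strip().lower() for field in key_fields)
    # md5 is used only as a fingerprint here, not for security.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()[:12]
```

The same input fields always yield the same ID, which is what makes comparison against a JSON file of previously seen listings possible.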
### Logging
All modules use centralized logging configured in `main.py`:
- `RotatingFileHandler` writes to `data/monitor.log` (max 5MB, 5 backups)
- `StreamHandler` outputs to console/Docker logs
- All handlers, notifiers, and utilities use `logging.getLogger(__name__)` for consistent logging
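
A centralized setup along these lines might look like the sketch below. The `configure_logging` function name, the log format string, and the `INFO` default level are assumptions; only the file path, the 5 MB limit, the 5 backups, and the two handler types come from the description above.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_path: str = "data/monitor.log", level: int = logging.INFO) -> None:
    """Configure the root logger once; hypothetical helper called from main.py."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    file_handler = RotatingFileHandler(
        log_path,
        maxBytes=5 * 1024 * 1024,  # rotate after roughly 5 MB
        backupCount=5,             # keep monitor.log.1 ... monitor.log.5
        encoding="utf-8",
    )
    stream_handler = logging.StreamHandler()  # console / Docker logs
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",  # assumed format
        handlers=[file_handler, stream_handler],
        force=True,  # replace any handlers installed earlier
    )
```

Other modules then only need `logger = logging.getLogger(__name__)`; their records propagate to the root logger and reach both handlers.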
### Autocleaning
Debug material (screenshots, HTML files) older than 48 hours is automatically deleted by `autoclean_debug.py`, which runs every 48 hours in the main loop.
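
The cleanup step could be sketched roughly as below. This is not the actual `autoclean_debug.py`: the function signature, the glob patterns, and the use of file modification time as the age criterion are assumptions made for illustration.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 48 * 60 * 60  # 48 hours
# Assumption: debug material is identified by extension only.
DEBUG_PATTERNS = ("*.png", "*.html")


def autoclean_debug(data_dir="data", max_age=MAX_AGE_SECONDS, now=None):
    """Delete debug files older than max_age seconds; return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    for pattern in DEBUG_PATTERNS:
        for path in Path(data_dir).glob(pattern):
            # Age is measured from the last modification time (assumption).
            if path.is_file() and now - path.stat().st_mtime > max_age:
                path.unlink()
                removed.append(path)
    return removed
```

Restricting deletion to explicit patterns keeps state files such as `listings.json` or `applications.json` out of reach, whatever their age.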
## Development
@@ -65,7 +82,8 @@ docker compose logs -f
### Debugging
- Screenshots saved to `data/` on application failures (`*_nobtn_*.png`)
- HTML saved to `data/debug_page.html` (inberlin) and `data/wgcompany_debug.html`
- Full logs in `data/monitor.log`
- Full logs in `data/monitor.log` with rotation
- Debug files older than 48 hours are autocleaned
## Environment Variables
@@ -74,6 +92,16 @@ InBerlin login: `INBERLIN_EMAIL`, `INBERLIN_PASSWORD`
Form data: `FORM_ANREDE`, `FORM_VORNAME`, `FORM_NACHNAME`, `FORM_EMAIL`, `FORM_PHONE`, `FORM_STRASSE`, `FORM_HAUSNUMMER`, `FORM_PLZ`, `FORM_ORT`, `FORM_PERSONS`, `FORM_CHILDREN`, `FORM_INCOME`
WGcompany: `WGCOMPANY_ENABLED`, `WGCOMPANY_MIN_SIZE`, `WGCOMPANY_MAX_SIZE`, `WGCOMPANY_MIN_PRICE`, `WGCOMPANY_MAX_PRICE`, `WGCOMPANY_BEZIRK`
## Telegram Commands
- `/autopilot on|off` - Enable or disable automatic applications
- `/status` - Show current status and statistics (autopilot state, application counts by company)
- `/plot` - Generate and send a weekly listing-patterns plot
- `/errorrate` - Generate and send an autopilot success vs failure plot
- `/retryfailed` - Retry all failed applications
- `/resetlistings` - Reset seen listings (marks all current as failed to avoid spam)
- `/help` - Show available commands and usage information
## Common Tasks
### Fix a broken company handler
@@ -87,6 +115,15 @@ Check `data/*_nobtn_*.png` screenshots and `data/debug_page.html` to see actual
- InBerlin: Update regex patterns in `InBerlinMonitor.fetch_listings()`. Test against `data/debug_page.html`.
- WGcompany: Update parsing in `WGCompanyMonitor.fetch_listings()`. Test against `data/wgcompany_debug.html`.
### Merge data from another machine
Use the helper scripts in `helper_functions/`:
- `merge_listing_times.py` - Merge listing_times.csv files
- `merge_applications.py` - Merge applications.json files
- `merge_dict_json.py` - Merge listings.json and wgcompany_listings.json
- `merge_wgcompany_times.py` - Merge wgcompany_times.csv files
All scripts deduplicate by key and timestamp, and output merged results to the current data folder.
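
One way to read "deduplicate by key and timestamp" for the JSON files is sketched below. It assumes that each file is a dict keyed by listing ID, that every entry carries an ISO-8601 `timestamp` string, and that the newer entry wins on collision. The `merge_by_key` name and these rules are hypothetical and may differ from the actual helper scripts.

```python
def merge_by_key(local: dict, remote: dict, ts_field: str = "timestamp") -> dict:
    """Merge two ID-keyed dicts, keeping the newer entry when a key collides."""
    merged = dict(local)
    for key, entry in remote.items():
        current = merged.get(key)
        # ISO-8601 strings in the same format and timezone sort chronologically,
        # so a plain string comparison is enough here (assumption about the data).
        if current is None or entry.get(ts_field, "") > current.get(ts_field, ""):
            merged[key] = entry
    return merged
```

For example, merging `applications.json` from two machines this way would keep one record per listing ID, preferring whichever machine recorded it last.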
## Unit Tests
### Overview