Aron Petau 2026-01-01 15:27:25 +01:00
parent d596ed7e19
commit aa6626d80d
21 changed files with 1051 additions and 333 deletions

296
README.md

@@ -4,11 +4,14 @@ A Python bot that monitors Berlin's public housing portal (inberlinwohnen.de) an
## What it does
- Monitors inberlinwohnen.de for new apartment listings from 6 housing companies (HOWOGE, Gewobag, Degewo, Gesobau, Stadt und Land, WBM)
- Monitors wgcompany.de for WG room listings with configurable filters
- Sends Telegram notifications with listing details
- Logs listing times to CSV for pattern analysis
- Auto-apply feature for supported housing companies
- **Monitors** inberlinwohnen.de for new apartment listings from 6 housing companies (HOWOGE, Gewobag, Degewo, Gesobau, Stadt und Land, WBM)
- **Monitors** wgcompany.de for WG room listings with configurable filters
- **Notifies** via Telegram with rich listing details and application status
- **Logs** listing times to CSV for pattern analysis and visualization
- **Auto-applies** to new listings when autopilot is enabled (all 6 companies supported)
- **Generates** weekly listing pattern plots and autopilot performance analytics
- **Autocleans** debug files older than 48 hours to manage disk space
- **Tracks** application history with success/failure reasons in JSON
## Auto-Apply Support
@@ -48,11 +51,22 @@ playwright install chromium
export TELEGRAM_BOT_TOKEN=your_token
export TELEGRAM_CHAT_ID=your_chat_id
# ... other env vars
# ... other env vars (see .env.example)
python monitor.py
python main.py
```
### Helper Scripts
The `helper_functions/` directory contains utilities for merging data from multiple machines:
- `merge_listing_times.py` - Merge listing_times.csv files
- `merge_applications.py` - Merge applications.json files
- `merge_dict_json.py` - Merge listings.json and wgcompany_listings.json
- `merge_wgcompany_times.py` - Merge wgcompany_times.csv files
All scripts deduplicate by key and timestamp.
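Deduplicating by key and timestamp could look roughly like the sketch below. It is not taken from the `helper_functions/` scripts: the function name `merge_csv_rows` and the column names `listing_id` and `timestamp` are assumptions made for illustration, and timestamps are assumed to be ISO 8601 strings so that they sort lexicographically.

```python
import csv
import io


def merge_csv_rows(sources, key_fields=("listing_id", "timestamp")):
    """Merge rows from several CSV texts, keeping the first row seen per key.

    `sources` is an iterable of CSV strings that share a header.
    The column names in `key_fields` are hypothetical.
    """
    seen = set()
    merged = []
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(row[field] for field in key_fields)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    # ISO 8601 timestamps sort correctly as plain strings.
    merged.sort(key=lambda row: row["timestamp"])
    return merged
```

Rows that appear on both machines collapse into one entry, and the result is ordered chronologically so it can be written back as a single time series.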
## Configuration
### Required environment variables
@@ -91,100 +105,248 @@ python monitor.py
## Telegram Commands
- `/autopilot on|off` - Enable or disable automatic applications (use `/autopilot on` or `/autopilot off`).
- `/status` - Show current status and statistics (autopilot state, application counts by company).
- `/plot` - Generate and send a weekly listing-patterns plot (`data/weekly_plot.png`).
- `/errorrate` - Generate and send an autopilot success vs failure plot (`data/error_rate.png`).
- `/help` - Show available commands and usage information.
- `/autopilot on|off` - Enable or disable automatic applications
- `/status` - Show current status and statistics (autopilot state, application counts by company)
- `/plot` - Generate and send a weekly listing-patterns plot with heatmap and charts (high-res, seaborn-styled)
- `/errorrate` - Generate and send an autopilot performance analysis with success/failure rates by company (high-res, seaborn-styled)
- `/retryfailed` - Retry all previously failed applications
- `/resetlistings` - Reset seen listings (marks all current listings as failed to avoid spam)
- `/help` - Show available commands and usage information
Note: The bot only processes commands from the configured `TELEGRAM_CHAT_ID`. Use `/autopilot off` while testing selector changes or after modifying configuration to avoid accidental submissions.
**Important:** The bot only processes commands from the configured `TELEGRAM_CHAT_ID`. Use `/autopilot off` while testing selector changes or after modifying configuration to avoid accidental submissions.
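The chat restriction described above can be sketched as a small guard on incoming updates. This is an illustration, not the code in `telegram_bot.py`: the helper name `is_authorized` is hypothetical, and the update is assumed to be the decoded JSON of a Telegram Bot API `Update` object, where the chat ID sits at `message.chat.id`.

```python
def is_authorized(update: dict, allowed_chat_id: str) -> bool:
    """Return True only if the update comes from the configured chat.

    Telegram sends chat IDs as integers, while environment variables
    are strings, so both sides are compared as strings.
    """
    message = update.get("message") or {}
    chat = message.get("chat") or {}
    chat_id = chat.get("id")
    if chat_id is None:
        return False
    return str(chat_id) == str(allowed_chat_id)
```

A command loop would call this before dispatching, for example with `os.environ["TELEGRAM_CHAT_ID"]` as the second argument, and silently drop anything that fails the check.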
**Plot Features:** All plots are generated at 300 DPI with seaborn styling for publication-quality output.
## Data files
All data is stored in the `data/` directory:
- `listings.json` - Previously seen inberlinwohnen listings
- `wgcompany_listings.json` - Previously seen WGcompany listings
- `applications.json` - Application history
- `listing_times.csv` - Time series data for listings
- `state.json` - Runtime state (autopilot toggle)
- `monitor.log` - Application logs
**Persistent State:**
- `listings.json` - Previously seen inberlinwohnen listings (deduplicated by hash)
- `wgcompany_listings.json` - Previously seen WGcompany listings (deduplicated by hash)
- `applications.json` - Application history with timestamps, success/failure status, and error messages
- `listing_times.csv` - Time series data for inberlinwohnen listings (for pattern analysis)
- `wgcompany_times.csv` - Time series data for WGcompany listings
- `state.json` - Runtime state (autopilot toggle, persistent across restarts)
- `monitor.log` - Rotating application logs (max 5MB, 5 backups)
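A rotating log with the limits listed above can be set up with the standard library's `RotatingFileHandler`. The sketch below is one possible configuration, not the bot's actual logging setup: the function name `build_logger`, the logger name, and the format string are assumptions, and "5MB" is interpreted here as 5 * 1024 * 1024 bytes.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(log_path, name="monitor"):
    """Create a logger that rotates at about 5 MB and keeps 5 backups."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = RotatingFileHandler(
            log_path,
            maxBytes=5 * 1024 * 1024,  # assumed meaning of "5MB"
            backupCount=5,
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With these settings the handler renames a full `monitor.log` to `monitor.log.1`, shifts older backups up to `monitor.log.5`, and discards anything beyond that.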
**Generated Plots:**
- `weekly_plot.png` - Weekly listing patterns (heatmap + charts, 300 DPI)
- `error_rate.png` - Autopilot performance analysis (3-panel chart, 300 DPI)
**Debug Files (auto-cleaned after 48 hours):**
- `data/<company>/*.png` - Screenshots from failed applications
- `data/<company>/*.html` - Page HTML snapshots for debugging
- `data/debug_page.html` - InBerlin page snapshot
- `data/wgcompany_debug.html` - WGcompany page snapshot
**Note:** Debug files (screenshots, HTML) are automatically deleted after 48 hours to save disk space. Listing data, applications, and logs are never deleted.
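The 48-hour cleanup rule can be sketched as follows. This is an illustration under stated assumptions rather than the contents of `autoclean_debug.py`: the function name `clean_debug_files` is hypothetical, file age is taken from the modification time, and the two generated plots are excluded by name because the README does not say whether they count as debug files.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 48 * 3600
DEBUG_SUFFIXES = {".png", ".html"}
# Assumption: generated plots are kept even though they are PNG files.
KEEP_NAMES = {"weekly_plot.png", "error_rate.png"}


def clean_debug_files(data_dir, now=None):
    """Delete screenshots and HTML snapshots older than 48 hours.

    Returns the list of removed paths. JSON, CSV, and log files are
    never touched because their suffixes are not in DEBUG_SUFFIXES.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix not in DEBUG_SUFFIXES or path.name in KEEP_NAMES:
            continue
        if now - path.stat().st_mtime > MAX_AGE_SECONDS:
            path.unlink()
            removed.append(path)
    return removed
```

Filtering by suffix rather than by directory keeps the rule simple and makes it hard to delete persistent state by accident.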
## Debugging
When applications fail, the bot saves:
When applications fail, the bot saves debug material to help diagnose issues:
- Screenshots to `data/*.png`
- Page HTML to `data/debug_page.html`
**Company-specific folders:**
Check these files to understand why an application failed.
- `data/howoge/` - HOWOGE screenshots and HTML
- `data/gewobag/` - Gewobag screenshots and HTML
- `data/degewo/` - Degewo screenshots and HTML
- `data/gesobau/` - Gesobau screenshots and HTML
- `data/stadtundland/` - Stadt und Land screenshots and HTML
- `data/wbm/` - WBM screenshots and HTML
**General debug files:**
- `data/debug_page.html` - InBerlin page snapshot
- `data/wgcompany_debug.html` - WGcompany page snapshot
Check `applications.json` for error messages and timestamps. Debug files are automatically cleaned after 48 hours but can be manually inspected while fresh.
## Code Structure
The bot has been modularized for better maintainability. The main components are:
- `main.py`: The entry point for the bot.
- `handlers/`: Contains company-specific handlers for auto-apply functionality. Each company has its own handler file:
- `howoge_handler.py`
- `gewobag_handler.py`
- `degewo_handler.py`
- `gesobau_handler.py`
- `stadtundland_handler.py`
- `wbm_handler.py`
- `application_handler.py`: Orchestrates the application process by delegating to the appropriate handler.
- `telegram_bot.py`: Handles Telegram bot commands and notifications.
**Core:**
The `handlers/` directory includes a `BaseHandler` class that provides shared functionality for all company-specific handlers.
- `main.py` - Entry point, orchestrates monitoring loop and autoclean
- `application_handler.py` - Delegates applications to company handlers, generates plots
- `telegram_bot.py` - Async Telegram bot with httpx for commands and notifications
- `state_manager.py` - Manages persistent state (autopilot toggle)
- `autoclean_debug.py` - Deletes debug files older than 48 hours
**Handlers:**
- `handlers/base_handler.py` - Abstract base class with shared functionality (cookie handling, consent, logging)
- `handlers/howoge_handler.py` - HOWOGE application automation
- `handlers/gewobag_handler.py` - Gewobag application automation
- `handlers/degewo_handler.py` - Degewo application automation (Wohnungshelden)
- `handlers/gesobau_handler.py` - Gesobau application automation
- `handlers/stadtundland_handler.py` - Stadt und Land application automation
- `handlers/wbm_handler.py` - WBM application automation
- `handlers/wgcompany_notifier.py` - WGcompany monitoring (notification only, no autopilot)
**Utilities:**
- `helper_functions/` - Data merge utilities for combining stats from multiple sources
- `merge_listing_times.py`
- `merge_applications.py`
- `merge_dict_json.py`
- `merge_wgcompany_times.py`
**Tests:**
- `tests/` - Comprehensive unit tests (48 tests total)
- `test_telegram_bot.py` - Telegram bot commands and messaging
- `test_error_rate_plot.py` - Plot generation
- `test_wgcompany_notifier.py` - WGcompany monitoring
- `test_handlers.py` - Handler initialization
- `test_application_handler.py` - Application orchestration
- `test_helper_functions.py` - Merge utilities
- `test_autoclean.py` - Autoclean script validation
## Unit Tests
The project includes unit tests to ensure functionality and reliability. Key test files:
The project includes comprehensive unit tests (48 tests total) to ensure functionality and reliability:
- `tests/test_telegram_bot.py`: Tests the Telegram bot's commands and messaging functionality.
- `tests/test_error_rate_plot.py`: Tests the error rate plot generator for autopilot applications.
- `test_telegram_bot.py` - Telegram bot commands and messaging (13 tests)
- `test_error_rate_plot.py` - Plot generation and data analysis (2 tests)
- `test_wgcompany_notifier.py` - WGcompany monitoring (7 tests)
- `test_handlers.py` - Handler initialization and structure (6 tests)
- `test_application_handler.py` - Application orchestration (10 tests)
- `test_company_detection.py` - Company detection from URLs (6 tests)
- `test_state_manager.py` - State persistence (2 tests)
- `test_helper_functions.py` - Merge utilities (2 tests)
- `test_autoclean.py` - Autoclean script validation (1 test)
### Running Tests
To run the tests, use:
```bash
pytest tests/
pytest tests/ -v
```
Ensure all dependencies are installed and the environment is configured correctly before running the tests.
All tests use mocking to avoid external dependencies and can run offline.
## Workflow Diagram
```mermaid
flowchart TD
A([Start]) --> B[Fetch Listings]
B --> C[Load Previous Listings]
C --> D[Deduplicate: Find New Listings]
D --> E{Any New Listings?}
E -- No --> Z1([Sleep & Wait])
E -- Yes --> F[Log New Listings to CSV]
F --> G[Save Current Listings]
G --> H[Check Autopilot State]
H -- Off --> I[Send Telegram Notification (No Apply)]
H -- On --> J[Attempt Auto-Apply to Each New Listing]
J --> K{Application Success?}
K -- Yes --> L[Log Success, Save to applications.json]
K -- No --> M[Log Failure, Save to applications.json]
L --> N[Send Telegram Notification (Success)]
M --> O[Send Telegram Notification (Failure)]
N --> P([Sleep & Wait])
O --> P
I --> P
Z1 --> B
P --> B
%% Details for error/debugging
J --> Q{Handler Error?}
Q -- Yes --> R[Save Screenshot/HTML for Debug]
R --> M
Q -- No --> K
Start([Start Bot]) --> Init[Initialize Browser & Telegram Bot]
Init --> Loop{Main Loop}
%% InBerlin Monitoring
Loop --> InBerlin[Fetch InBerlin Listings]
InBerlin --> ParseIB[Parse & Hash Listings]
ParseIB --> LoadIB[Load Previous InBerlin Listings]
LoadIB --> DedupeIB{New InBerlin Listings?}
DedupeIB -- Yes --> LogIB[Log to listing_times.csv]
LogIB --> SaveIB[Save to listings.json]
DedupeIB -- No --> WG
%% WGcompany Monitoring
SaveIB --> WG[Fetch WGcompany Listings]
WG --> ParseWG[Parse & Hash Listings]
ParseWG --> LoadWG[Load Previous WGcompany Listings]
LoadWG --> DedupeWG{New WGcompany Listings?}
DedupeWG -- Yes --> LogWG[Log to wgcompany_times.csv]
LogWG --> SaveWG[Save to wgcompany_listings.json]
DedupeWG -- No --> CheckAutopilot
%% Autopilot Decision
SaveWG --> CheckAutopilot{Autopilot Enabled?}
SaveIB --> CheckAutopilot
CheckAutopilot -- Off --> NotifyOnly[Send Telegram Notifications]
NotifyOnly --> CheckClean
CheckAutopilot -- On --> CheckApplied{Already Applied?}
CheckApplied -- Yes --> Skip[Skip Listing]
CheckApplied -- No --> DetectCompany[Detect Company]
%% Application Flow
DetectCompany --> SelectHandler[Select Handler]
SelectHandler --> OpenPage[Open Listing Page]
OpenPage --> Check404{404 or Deactivated?}
Check404 -- Yes --> MarkPermanent[Mark permanent_fail]
MarkPermanent --> SaveFail[Save to applications.json]
SaveFail --> NotifyFail[Notify: Application Failed]
Check404 -- No --> HandleCookies[Handle Cookie Banners]
HandleCookies --> FindButton[Find Application Button]
FindButton --> ButtonFound{Button Found?}
ButtonFound -- No --> Screenshot1[Save Screenshot & HTML]
Screenshot1 --> SaveFail
ButtonFound -- Yes --> ClickButton[Click Application Button]
ClickButton --> MultiStep{Multi-Step Form?}
MultiStep -- Yes --> NavigateSteps[Navigate Form Steps]
NavigateSteps --> FillForm
MultiStep -- No --> FillForm[Fill Form Fields]
FillForm --> SubmitForm[Submit Application]
SubmitForm --> CheckConfirm{Confirmation Detected?}
CheckConfirm -- Yes --> SaveSuccess[Save success to applications.json]
SaveSuccess --> NotifySuccess[Notify: Application Success]
CheckConfirm -- No --> Screenshot2[Save Screenshot & HTML]
Screenshot2 --> SaveFail
NotifySuccess --> CheckClean
NotifyFail --> CheckClean
Skip --> CheckClean
%% Autoclean
CheckClean{Time for Autoclean?}
CheckClean -- Yes --> RunClean[Delete Debug Files >48h]
RunClean --> Sleep
CheckClean -- No --> Sleep[Sleep CHECK_INTERVAL]
Sleep --> TelegramCmd{Telegram Command?}
TelegramCmd -- /autopilot --> ToggleAutopilot[Toggle Autopilot State]
TelegramCmd -- /status --> ShowStatus[Show Status & Stats]
TelegramCmd -- /plot --> GenPlot[Generate Weekly Plot]
TelegramCmd -- /errorrate --> GenError[Generate Error Rate Plot]
TelegramCmd -- /retryfailed --> RetryFailed[Retry Failed Applications]
TelegramCmd -- /resetlistings --> ResetListings[Reset Seen Listings]
TelegramCmd -- /help --> ShowHelp[Show Help]
TelegramCmd -- None --> Loop
ToggleAutopilot --> Loop
ShowStatus --> Loop
GenPlot --> Loop
GenError --> Loop
RetryFailed --> Loop
ResetListings --> Loop
ShowHelp --> Loop
style Start fill:#90EE90
style SaveSuccess fill:#90EE90
style SaveFail fill:#FFB6C1
style MarkPermanent fill:#FFB6C1
style RunClean fill:#87CEEB
style CheckAutopilot fill:#FFD700
style Check404 fill:#FFD700
style ButtonFound fill:#FFD700
style CheckConfirm fill:#FFD700
```
This diagram illustrates the workflow of the bot, from fetching listings to logging, notifying, and optionally applying to new listings.
**Key Features:**
- **Dual Monitoring**: Tracks both InBerlin (6 companies) and WGcompany listings
- **Smart Deduplication**: MD5 hashing prevents duplicate notifications
- **Autopilot**: Automated application with company-specific handlers
- **Error Handling**: 404 detection, permanent fail tracking, debug screenshots
- **Autoclean**: Automatic cleanup of debug files older than 48 hours
- **Rich Commands**: Status, plots, retry failed, reset listings
- **High-Res Analytics**: 300 DPI seaborn-styled plots for pattern analysis
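The MD5-based deduplication listed above could work along the lines of this sketch. The README does not say which fields go into the hash, so the field names (`address`, `rooms`, `rent`, `link`) and the helpers `listing_hash` and `find_new` are hypothetical.

```python
import hashlib


def listing_hash(listing, fields=("address", "rooms", "rent", "link")):
    """Build a stable MD5 fingerprint from selected listing fields.

    The field names are assumptions chosen for illustration.
    """
    raw = "|".join(str(listing.get(field, "")) for field in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def find_new(current, seen_hashes):
    """Return (hash, listing) pairs whose hash is not in seen_hashes."""
    new = []
    for listing in current:
        digest = listing_hash(listing)
        if digest not in seen_hashes:
            new.append((digest, listing))
    return new
```

Here MD5 serves only as a compact fingerprint for change detection, not as a security measure, so its known collision weaknesses are not a practical concern.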
## License