# wohn-bot
A Python bot that monitors Berlin's public housing portal (inberlinwohnen.de) and shared-flat (WG) room listings on wgcompany.de. It sends Telegram notifications when new listings appear and can automatically apply to listings from the six supported housing companies.

## What it does
- **Monitors** inberlinwohnen.de for new apartment listings from 6 housing companies (HOWOGE, Gewobag, Degewo, Gesobau, Stadt und Land, WBM)
- **Monitors** wgcompany.de for WG room listings with configurable filters
- **Notifies** via Telegram with rich listing details and application status
- **Logs** listing times to CSV for pattern analysis and visualization
- **Auto-applies** to new listings when autopilot is enabled (all 6 companies supported)
- **Generates** weekly listing pattern plots and autopilot performance analytics
- **Autocleans** debug files older than 48 hours to manage disk space
- **Tracks** application history with success/failure reasons in JSON
## Auto-Apply Support

All six housing companies monitored by this bot support the autopilot (automatic application) feature. Use autopilot with care: automatic form submissions cannot be undone, and a misconfigured bot may send many requests.

| Company | Status | Notes |
|---------|--------|-------|
| HOWOGE | Working | Fully automated and tested |
| Degewo | Working | Uses Wohnungshelden portal; automated |
| Stadt und Land | Working | Embedded form handled automatically |
| Gewobag | Working | Wohnungshelden iframe handled automatically |
| Gesobau | Working | Automated form submission implemented |
| WBM | Working | Automated form submission implemented |
| WGcompany | Monitoring only | Notifications only; no autopilot |

Recommended precautions:
- Run with `/autopilot off` while testing new selectors or after changing config.
- Inspect `data/applications.json` and saved screenshots in `data/` after enabling autopilot.
- Respect site terms of use and rate limits; set `CHECK_INTERVAL` appropriately.
## Setup
### Docker (recommended)
```bash
cp .env.example .env
# Edit .env with your credentials
docker compose up -d
```
### Local development
```bash
pip install -r requirements.txt
playwright install chromium

export TELEGRAM_BOT_TOKEN=your_token
export TELEGRAM_CHAT_ID=your_chat_id
# ... other env vars (see .env.example)

python main.py
```
### Helper Scripts
The `helper_functions/` directory contains utilities for merging data from multiple machines:
- `merge_listing_times.py` - Merge listing_times.csv files
- `merge_applications.py` - Merge applications.json files
- `merge_dict_json.py` - Merge listings.json and wgcompany_listings.json
- `merge_wgcompany_times.py` - Merge wgcompany_times.csv files

All scripts deduplicate entries by key and timestamp.
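
A minimal sketch of the merge-and-deduplicate step, assuming each CSV has a header row with hypothetical `listing_id` and ISO 8601 `timestamp` columns (check the real headers in `listing_times.csv` before reusing it). The function names are illustrative and not taken from `merge_listing_times.py`.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Hypothetical dedup key; adjust to the real column names.
KEY_FIELDS = ("listing_id", "timestamp")


def merge_csv_streams(streams: Iterable[TextIO], key_fields=KEY_FIELDS) -> List[dict]:
    """Merge rows from several CSV exports, keeping the first row seen per key."""
    seen = set()
    merged = []
    for stream in streams:
        for row in csv.DictReader(stream):
            key = tuple(row[field] for field in key_fields)
            if key in seen:
                continue  # same entry was already logged on another machine
            seen.add(key)
            merged.append(row)
    # ISO 8601 timestamps with the same format and offset sort chronologically as strings
    merged.sort(key=lambda row: row["timestamp"])
    return merged


def write_csv(rows: List[dict], out: TextIO) -> None:
    """Write merged rows back out, taking the header from the first row."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    laptop = io.StringIO("listing_id,timestamp\na1,2025-12-08T10:00:00\nb2,2025-12-08T11:00:00\n")
    server = io.StringIO("listing_id,timestamp\nb2,2025-12-08T11:00:00\nc3,2025-12-08T09:30:00\n")
    rows = merge_csv_streams([laptop, server])
    print([row["listing_id"] for row in rows])  # ['c3', 'a1', 'b2']
```

Keying on the pair rather than on the timestamp alone keeps two different listings that happened to appear in the same second.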
## Configuration
### Required environment variables
- `TELEGRAM_BOT_TOKEN` - Bot token from @BotFather
- `TELEGRAM_CHAT_ID` - Your Telegram chat ID
### InBerlin login (required for auto-apply)
- `INBERLIN_EMAIL` - Your inberlinwohnen.de email
- `INBERLIN_PASSWORD` - Your inberlinwohnen.de password
### Form data (for auto-apply)
- `FORM_ANREDE` - Salutation (Herr/Frau)
- `FORM_VORNAME` - First name
- `FORM_NACHNAME` - Last name
- `FORM_EMAIL` - Email address
- `FORM_PHONE` - Phone number
- `FORM_STRASSE` - Street name
- `FORM_HAUSNUMMER` - House number
- `FORM_PLZ` - Postal code
- `FORM_ORT` - City
- `FORM_PERSONS` - Number of persons in household
- `FORM_ADULTS` - Number of adults (used by Gewobag forms; defaults to 1)
- `FORM_CHILDREN` - Number of children (defaults to 0)
- `FORM_INCOME` - Monthly net income
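
Failing at startup on a missing field is cheaper than discovering it halfway through an automated submission. The sketch below assumes every `FORM_*` variable except `FORM_ADULTS` and `FORM_CHILDREN` is mandatory; `FormData` and `load_form_data` are hypothetical names, not the bot's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FormData:
    anrede: str
    vorname: str
    nachname: str
    email: str
    phone: str
    strasse: str
    hausnummer: str
    plz: str
    ort: str
    persons: int
    adults: int
    children: int
    income: str


def load_form_data(env: Mapping[str, str] = os.environ) -> FormData:
    """Build form data from FORM_* variables, failing early if one is missing."""

    def required(name: str) -> str:
        value = env.get(name, "").strip()
        if not value:
            raise ValueError(f"Missing required environment variable: {name}")
        return value

    return FormData(
        anrede=required("FORM_ANREDE"),
        vorname=required("FORM_VORNAME"),
        nachname=required("FORM_NACHNAME"),
        email=required("FORM_EMAIL"),
        phone=required("FORM_PHONE"),
        strasse=required("FORM_STRASSE"),
        hausnummer=required("FORM_HAUSNUMMER"),
        plz=required("FORM_PLZ"),
        ort=required("FORM_ORT"),
        persons=int(required("FORM_PERSONS")),
        # Documented defaults: one adult, no children
        adults=int(env.get("FORM_ADULTS", "1")),
        children=int(env.get("FORM_CHILDREN", "0")),
        income=required("FORM_INCOME"),
    )
```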
### WGcompany filters
- `WGCOMPANY_ENABLED` - Enable WGcompany monitoring (true/false)
- `WGCOMPANY_MIN_SIZE` - Minimum room size in sqm
- `WGCOMPANY_MAX_SIZE` - Maximum room size in sqm
- `WGCOMPANY_MIN_PRICE` - Minimum price in EUR
- `WGCOMPANY_MAX_PRICE` - Maximum price in EUR
- `WGCOMPANY_BEZIRK` - District filter (optional)
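
A sketch of how these filters could be read and applied locally, assuming inclusive bounds, that an unset variable means "no limit", and a case-insensitive district comparison. The bot may instead pass the values to the wgcompany.de search form; `WGFilter` and `wgcompany_enabled` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WGFilter:
    """Room filter built from WGCOMPANY_* variables; None means no limit."""

    min_size: Optional[float] = None
    max_size: Optional[float] = None
    min_price: Optional[float] = None
    max_price: Optional[float] = None
    bezirk: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "WGFilter":
        def number(name: str) -> Optional[float]:
            raw = env.get(name, "").strip()
            return float(raw) if raw else None

        return cls(
            min_size=number("WGCOMPANY_MIN_SIZE"),
            max_size=number("WGCOMPANY_MAX_SIZE"),
            min_price=number("WGCOMPANY_MIN_PRICE"),
            max_price=number("WGCOMPANY_MAX_PRICE"),
            bezirk=env.get("WGCOMPANY_BEZIRK", "").strip() or None,
        )

    def matches(self, size_sqm: float, price_eur: float, bezirk: str = "") -> bool:
        """Inclusive bounds; the district check ignores case."""
        if self.min_size is not None and size_sqm < self.min_size:
            return False
        if self.max_size is not None and size_sqm > self.max_size:
            return False
        if self.min_price is not None and price_eur < self.min_price:
            return False
        if self.max_price is not None and price_eur > self.max_price:
            return False
        if self.bezirk and self.bezirk.casefold() != bezirk.casefold():
            return False
        return True


def wgcompany_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Treat only the literal "true" (any case) as enabled."""
    return env.get("WGCOMPANY_ENABLED", "false").strip().lower() == "true"
```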
## Telegram Commands
- `/autopilot on|off` - Enable or disable automatic applications
- `/status` - Show current status and statistics (autopilot state, application counts by company)
- `/plot` - Generate and send a weekly listing-patterns plot with heatmap and charts (high-res, seaborn-styled)
- `/errorrate` - Generate and send an autopilot performance analysis with success/failure rates by company (high-res, seaborn-styled)
- `/retryfailed` - Retry all previously failed applications
- `/resetlistings` - Reset seen listings (marks all current as failed to avoid spam)
- `/help` - Show available commands and usage information

**Important:** The bot only processes commands from the configured `TELEGRAM_CHAT_ID`. Use `/autopilot off` while testing selector changes or after modifying the configuration to avoid accidental submissions.

**Plot Features:** All plots are generated at 300 DPI with seaborn styling for publication-quality output.
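
The chat-ID check and the persisted autopilot toggle can be sketched as follows. It assumes updates arrive as decoded Telegram Bot API `Update` objects (dicts with `message.chat.id` and `message.text`) and that `state.json` holds a single `autopilot` flag; `StateStore` and `handle_update` are hypothetical names, not the code in `telegram_bot.py` or `state_manager.py`.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class StateStore:
    """Persist runtime state as JSON (assumed layout: {"autopilot": bool})."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {"autopilot": False}  # assumed default: autopilot off

    def set_autopilot(self, enabled: bool) -> None:
        state = self.load()
        state["autopilot"] = enabled
        # Write a temp file, then rename it over the original (atomic on POSIX)
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state), encoding="utf-8")
        tmp.replace(self.path)


def handle_update(update: dict, allowed_chat_id: str, store: StateStore) -> Optional[str]:
    """Return a reply text, or None if the update must be ignored."""
    message = update.get("message") or {}
    chat_id = str(message.get("chat", {}).get("id", ""))
    if chat_id != str(allowed_chat_id):
        return None  # silently ignore every chat except TELEGRAM_CHAT_ID
    parts = (message.get("text") or "").split()
    if not parts:
        return None
    command = parts[0].split("@")[0]  # "/status@MyBot" -> "/status"
    if command == "/autopilot" and len(parts) == 2 and parts[1] in ("on", "off"):
        store.set_autopilot(parts[1] == "on")
        return f"Autopilot {parts[1]}"
    if command == "/status":
        return "Autopilot: " + ("on" if store.load()["autopilot"] else "off")
    return "Unknown command, see /help"


if __name__ == "__main__":
    store = StateStore(Path(tempfile.mkdtemp()) / "state.json")
    print(handle_update({"message": {"chat": {"id": 42}, "text": "/autopilot on"}}, "42", store))
    print(handle_update({"message": {"chat": {"id": 7}, "text": "/autopilot on"}}, "42", store))
```

Ignoring foreign chats without replying avoids confirming to strangers that the bot exists.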
## Data files
All data is stored in the `data/` directory:

**Persistent State:**
- `listings.json` - Previously seen inberlinwohnen listings (deduplicated by hash)
- `wgcompany_listings.json` - Previously seen WGcompany listings (deduplicated by hash)
- `applications.json` - Application history with timestamps, success/failure status, and error messages
- `listing_times.csv` - Time series data for inberlinwohnen listings (for pattern analysis)
- `wgcompany_times.csv` - Time series data for WGcompany listings
- `state.json` - Runtime state (autopilot toggle, persistent across restarts)
- `monitor.log` - Rotating application logs (max 5 MB, 5 backups)

**Generated Plots:**
- `weekly_plot.png` - Weekly listing patterns (heatmap + charts, 300 DPI)
- `error_rate.png` - Autopilot performance analysis (3-panel chart, 300 DPI)

**Debug Files (auto-cleaned after 48 hours):**
- `data/<company>/*.png` - Screenshots from failed applications
- `data/<company>/*.html` - Page HTML snapshots for debugging
- `data/debug_page.html` - InBerlin page snapshot
- `data/wgcompany_debug.html` - WGcompany page snapshot

**Note:** Debug files (screenshots, HTML) are automatically deleted after 48 hours to save disk space. Listing data and application history are never deleted; logs are kept within their size limit by rotation.
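
Size-capped rotation like the one described for `monitor.log` maps directly onto the standard library's `logging.handlers.RotatingFileHandler`. A minimal sketch, assuming "5 MB" means 5 MiB and using a hypothetical logger name `wohn-bot`:

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(log_dir: Path = Path("data")) -> logging.Logger:
    """Attach a size-based rotating file handler to a named logger (once)."""
    logger = logging.getLogger("wohn-bot")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if logger.handlers:
        return logger  # already configured; avoid duplicate log lines
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_dir / "monitor.log",
        maxBytes=5 * 1024 * 1024,  # roll over at 5 MiB
        backupCount=5,             # keep monitor.log.1 ... monitor.log.5; older files are dropped
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```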
## Debugging
When applications fail, the bot saves debug material to help diagnose issues:

**Company-specific folders:**

- `data/howoge/` - HOWOGE screenshots and HTML
- `data/gewobag/` - Gewobag screenshots and HTML
- `data/degewo/` - Degewo screenshots and HTML
- `data/gesobau/` - Gesobau screenshots and HTML
- `data/stadtundland/` - Stadt und Land screenshots and HTML
- `data/wbm/` - WBM screenshots and HTML

**General debug files:**
- `data/debug_page.html` - InBerlin page snapshot
- `data/wgcompany_debug.html` - WGcompany page snapshot

Check `data/applications.json` for error messages and timestamps. Debug files are deleted automatically after 48 hours, so inspect them soon after a failure.
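
A sketch of the 48-hour cleanup, assuming file age is judged by modification time. It only touches the debug locations listed above, so `weekly_plot.png` and `error_rate.png` (also PNGs in `data/`) survive; `autoclean` and `debug_files` are hypothetical names, not the contents of `autoclean_debug.py`.

```python
import time
from pathlib import Path
from typing import Iterator, List, Optional

MAX_AGE_SECONDS = 48 * 60 * 60
COMPANY_DIRS = ("howoge", "gewobag", "degewo", "gesobau", "stadtundland", "wbm")
TOP_LEVEL_DEBUG_FILES = ("debug_page.html", "wgcompany_debug.html")


def debug_files(data_dir: Path) -> Iterator[Path]:
    """Yield only debug artifacts, never plots or state files."""
    for company in COMPANY_DIRS:
        folder = data_dir / company
        if folder.is_dir():
            for pattern in ("*.png", "*.html"):
                yield from folder.glob(pattern)
    for name in TOP_LEVEL_DEBUG_FILES:
        path = data_dir / name
        if path.is_file():
            yield path


def autoclean(data_dir: Path, max_age: float = MAX_AGE_SECONDS,
              now: Optional[float] = None) -> List[Path]:
    """Delete debug files whose modification time is older than max_age seconds."""
    now = time.time() if now is None else now
    deleted = []
    for path in list(debug_files(data_dir)):  # materialize before unlinking
        if now - path.stat().st_mtime > max_age:
            path.unlink()
            deleted.append(path)
    return deleted
```

Using an explicit allow-list of folders rather than "every PNG under `data/`" is the safer design when generated plots share the same directory.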
## Code Structure
The bot is split into modules for maintainability. The main components are:

**Core:**

- `main.py` - Entry point; orchestrates the monitoring loop and autoclean
- `application_handler.py` - Delegates applications to company handlers and generates plots
- `telegram_bot.py` - Async Telegram bot built on httpx, handling commands and notifications
- `state_manager.py` - Manages persistent state (autopilot toggle)
- `autoclean_debug.py` - Deletes debug files older than 48 hours

**Handlers:**
- `handlers/base_handler.py` - Abstract base class with shared functionality (cookie handling, consent, logging)
- `handlers/howoge_handler.py` - HOWOGE application automation
- `handlers/gewobag_handler.py` - Gewobag application automation
- `handlers/degewo_handler.py` - Degewo application automation (Wohnungshelden)
- `handlers/gesobau_handler.py` - Gesobau application automation
- `handlers/stadtundland_handler.py` - Stadt und Land application automation
- `handlers/wbm_handler.py` - WBM application automation
- `handlers/wgcompany_notifier.py` - WGcompany monitoring (notification only, no autopilot)

**Utilities:**

- `helper_functions/` - Data merge utilities for combining stats from multiple sources
  - `merge_listing_times.py`
  - `merge_applications.py`
  - `merge_dict_json.py`
  - `merge_wgcompany_times.py`

**Tests:**

- `tests/` - Unit tests (per-file counts are listed under Unit Tests)
  - `test_telegram_bot.py` - Telegram bot commands and messaging
  - `test_error_rate_plot.py` - Plot generation
  - `test_wgcompany_notifier.py` - WGcompany monitoring
  - `test_handlers.py` - Handler initialization
  - `test_application_handler.py` - Application orchestration
  - `test_company_detection.py` - Company detection from URLs
  - `test_state_manager.py` - State persistence
  - `test_helper_functions.py` - Merge utilities
  - `test_autoclean.py` - Autoclean script validation
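
The "Detect Company" and "Select Handler" steps from the workflow diagram could look roughly like this. The domain-to-company mapping, the `apply` method, and every class and function name here are assumptions for illustration; the real handlers in `handlers/` drive Playwright pages and are not reproduced.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Assumed domain -> company mapping; verify against the URLs the bot actually sees.
COMPANY_DOMAINS = {
    "howoge.de": "howoge",
    "gewobag.de": "gewobag",
    "degewo.de": "degewo",
    "gesobau.de": "gesobau",
    "stadtundland.de": "stadtundland",
    "wbm.de": "wbm",
}


def detect_company(url: str) -> Optional[str]:
    """Match the URL's host exactly or as a subdomain of a known company domain."""
    host = (urlparse(url).hostname or "").lower()
    for domain, company in COMPANY_DOMAINS.items():
        if host == domain or host.endswith("." + domain):
            return company
    return None


class BaseHandler(ABC):
    """In the real project, shared cookie/consent/logging logic lives in the base class."""

    @abstractmethod
    def apply(self, url: str) -> bool:
        """Return True if the application was confirmed."""


class DryRunHandler(BaseHandler):
    """Stand-in handler that records URLs instead of submitting forms."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def apply(self, url: str) -> bool:
        self.seen.append(url)
        return False


def dispatch(url: str, handlers: Dict[str, BaseHandler]) -> bool:
    company = detect_company(url)
    handler = handlers.get(company) if company else None
    if handler is None:
        return False  # unknown company: notify only
    return handler.apply(url)
```

Matching on the parsed hostname instead of a substring search keeps a URL like `https://example.com/notwbm.de` from being routed to a handler.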
## Unit Tests
The project includes unit tests covering the following modules (number of tests per file in parentheses):

- `test_telegram_bot.py` - Telegram bot commands and messaging (13 tests)
- `test_error_rate_plot.py` - Plot generation and data analysis (2 tests)
- `test_wgcompany_notifier.py` - WGcompany monitoring (7 tests)
- `test_handlers.py` - Handler initialization and structure (6 tests)
- `test_application_handler.py` - Application orchestration (10 tests)
- `test_company_detection.py` - Company detection from URLs (6 tests)
- `test_state_manager.py` - State persistence (2 tests)
- `test_helper_functions.py` - Merge utilities (2 tests)
- `test_autoclean.py` - Autoclean script validation (1 test)
### Running Tests
```bash
pytest tests/ -v
```
All tests use mocking to avoid external dependencies and can run offline.
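
As an illustration of the offline, mock-based style, here is a self-contained pytest-style test. `notify_new_listings` is a hypothetical stand-in, not a function from the project; `unittest.mock.MagicMock` replaces the network call so nothing leaves the machine.

```python
from unittest.mock import MagicMock


def notify_new_listings(listings, send_message):
    """Hypothetical helper: send one Telegram message per new listing."""
    for listing in listings:
        send_message(f"New listing: {listing['title']} ({listing['url']})")
    return len(listings)


def test_sends_one_message_per_listing():
    send = MagicMock()  # replaces the HTTP call to the Telegram API
    count = notify_new_listings(
        [{"title": "2-Zimmer-Wohnung", "url": "https://example.org/1"}], send
    )
    assert count == 1
    send.assert_called_once_with("New listing: 2-Zimmer-Wohnung (https://example.org/1)")


def test_sends_nothing_without_listings():
    send = MagicMock()
    assert notify_new_listings([], send) == 0
    send.assert_not_called()
```

Passing the sender in as a parameter makes the mock trivial to inject; code that imports its HTTP client at module level would need `unittest.mock.patch` instead.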
## Workflow Diagram
```mermaid
flowchart TD
Start([Start Bot]) --> Init[Initialize Browser & Telegram Bot]
Init --> Loop{Main Loop}
%% InBerlin Monitoring
Loop --> InBerlin[Fetch InBerlin Listings]
InBerlin --> ParseIB[Parse & Hash Listings]
ParseIB --> LoadIB[Load Previous InBerlin Listings]
LoadIB --> DedupeIB{New InBerlin Listings?}
DedupeIB -- Yes --> LogIB[Log to listing_times.csv]
LogIB --> SaveIB[Save to listings.json]
DedupeIB -- No --> WG
%% WGcompany Monitoring
SaveIB --> WG[Fetch WGcompany Listings]
WG --> ParseWG[Parse & Hash Listings]
ParseWG --> LoadWG[Load Previous WGcompany Listings]
LoadWG --> DedupeWG{New WGcompany Listings?}
DedupeWG -- Yes --> LogWG[Log to wgcompany_times.csv]
LogWG --> SaveWG[Save to wgcompany_listings.json]
DedupeWG -- No --> CheckAutopilot
%% Autopilot Decision
SaveWG --> CheckAutopilot{Autopilot Enabled?}
SaveIB --> CheckAutopilot
CheckAutopilot -- Off --> NotifyOnly[Send Telegram Notifications]
NotifyOnly --> CheckClean
CheckAutopilot -- On --> CheckApplied{Already Applied?}
CheckApplied -- Yes --> Skip[Skip Listing]
CheckApplied -- No --> DetectCompany[Detect Company]
%% Application Flow
DetectCompany --> SelectHandler[Select Handler]
SelectHandler --> OpenPage[Open Listing Page]
OpenPage --> Check404{404 or Deactivated?}
Check404 -- Yes --> MarkPermanent[Mark deactivated]
MarkPermanent --> SaveFail[Save to applications.json]
SaveFail --> NotifyFail[Notify: Application Failed]
Check404 -- No --> HandleCookies[Handle Cookie Banners]
HandleCookies --> FindButton[Find Application Button]
FindButton --> ButtonFound{Button Found?}
ButtonFound -- No --> Screenshot1[Save Screenshot & HTML]
Screenshot1 --> SaveFail
ButtonFound -- Yes --> ClickButton[Click Application Button]
ClickButton --> MultiStep{Multi-Step Form?}
MultiStep -- Yes --> NavigateSteps[Navigate Form Steps]
NavigateSteps --> FillForm
MultiStep -- No --> FillForm[Fill Form Fields]
FillForm --> SubmitForm[Submit Application]
SubmitForm --> CheckConfirm{Confirmation Detected?}
CheckConfirm -- Yes --> SaveSuccess[Save success to applications.json]
SaveSuccess --> NotifySuccess[Notify: Application Success]
CheckConfirm -- No --> Screenshot2[Save Screenshot & HTML]
Screenshot2 --> SaveFail
NotifySuccess --> CheckClean
NotifyFail --> CheckClean
Skip --> CheckClean
%% Autoclean
CheckClean{Time for Autoclean?}
CheckClean -- Yes --> RunClean[Delete Debug Files >48h]
RunClean --> Sleep
CheckClean -- No --> Sleep[Sleep CHECK_INTERVAL]
Sleep --> TelegramCmd{Telegram Command?}
TelegramCmd -- /autopilot --> ToggleAutopilot[Toggle Autopilot State]
TelegramCmd -- /status --> ShowStatus[Show Status & Stats]
TelegramCmd -- /plot --> GenPlot[Generate Weekly Plot]
TelegramCmd -- /errorrate --> GenError[Generate Error Rate Plot]
TelegramCmd -- /retryfailed --> RetryFailed[Retry Failed Applications]
TelegramCmd -- /resetlistings --> ResetListings[Reset Seen Listings]
TelegramCmd -- /help --> ShowHelp[Show Help]
TelegramCmd -- None --> Loop
ToggleAutopilot --> Loop
ShowStatus --> Loop
GenPlot --> Loop
GenError --> Loop
RetryFailed --> Loop
ResetListings --> Loop
ShowHelp --> Loop
style Start fill:#90EE90
style SaveSuccess fill:#90EE90
style SaveFail fill:#FFB6C1
style MarkPermanent fill:#FFB6C1
style RunClean fill:#87CEEB
style CheckAutopilot fill:#FFD700
style Check404 fill:#FFD700
style ButtonFound fill:#FFD700
style CheckConfirm fill:#FFD700
```
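
The "Parse & Hash" and deduplication steps can be sketched with `hashlib.md5` over a canonical JSON form of each listing (the `usedforsecurity` flag needs Python 3.9+). Which fields go into the hash is an assumption here: hashing every field means a price change produces a "new" listing, while hashing only a stable ID would not. `listing_hash` and `find_new_listings` are hypothetical names, and the `seen` mapping stands in for the contents of `listings.json`.

```python
import hashlib
import json
from typing import Dict, List


def listing_hash(listing: dict) -> str:
    """MD5 fingerprint of a listing; sort_keys makes it independent of key order."""
    canonical = json.dumps(listing, sort_keys=True, ensure_ascii=False)
    # MD5 serves as a cheap fingerprint here, not as a security measure
    return hashlib.md5(canonical.encode("utf-8"), usedforsecurity=False).hexdigest()


def find_new_listings(current: List[dict], seen: Dict[str, dict]) -> List[dict]:
    """Return listings not yet in `seen` and record them there."""
    new = []
    for listing in current:
        digest = listing_hash(listing)
        if digest not in seen:
            seen[digest] = listing
            new.append(listing)
    return new


if __name__ == "__main__":
    seen: Dict[str, dict] = {}  # would be loaded from / saved to listings.json
    first = find_new_listings([{"id": "1", "rooms": 2}], seen)
    again = find_new_listings([{"rooms": 2, "id": "1"}], seen)
    print(len(first), len(again))  # 1 0
```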
**Key Features:**
- **Dual Monitoring**: Tracks both InBerlin (6 companies) and WGcompany listings
- **Smart Deduplication**: MD5 hashing prevents duplicate notifications
- **Autopilot**: Automated application with company-specific handlers
- **Error Handling**: 404 detection, permanent fail tracking, debug screenshots
- **Autoclean**: Automatic cleanup of debug files older than 48 hours
- **Rich Commands**: Status, plots, retry failed, reset listings
- **High-Res Analytics**: 300 DPI seaborn-styled plots for pattern analysis
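
The weekly heatmap boils down to counting listings per weekday and hour. A stdlib-only sketch of that aggregation, assuming ISO 8601 timestamps in `listing_times.csv` (the column format is an assumption); `weekly_grid` is a hypothetical name:

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def weekly_grid(timestamps: Iterable[str]) -> List[List[int]]:
    """Count listings per (weekday, hour): 7 rows (Monday..Sunday) x 24 columns."""
    counts: Counter = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        counts[(dt.weekday(), dt.hour)] += 1
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


if __name__ == "__main__":
    grid = weekly_grid(["2025-12-08T14:44:59", "2025-12-08T14:05:00", "2025-12-09T11:44:40+01:00"])
    print(grid[0][14], grid[1][11])  # 2 1
```

With pandas and seaborn installed, the resulting 7 x 24 grid could be drawn with `seaborn.heatmap(grid)` and saved at the documented resolution via `fig.savefig("weekly_plot.png", dpi=300)`.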
## License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

You are free to:

- **Share**: copy and redistribute the material in any medium or format
- **Adapt**: remix, transform, and build upon the material

Under the following terms:

- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- **NonCommercial**: You may not use the material for commercial purposes.

For more details, see the [full license text](https://creativecommons.org/licenses/by-nc/4.0/).