prod
parent
d596ed7e19
commit
aa6626d80d
21 changed files with 1051 additions and 333 deletions
296
README.md
@@ -4,11 +4,14 @@ A Python bot that monitors Berlin's public housing portal (inberlinwohnen.de) an

## What it does

- Monitors inberlinwohnen.de for new apartment listings from 6 housing companies (HOWOGE, Gewobag, Degewo, Gesobau, Stadt und Land, WBM)
- Monitors wgcompany.de for WG room listings with configurable filters
- Sends Telegram notifications with listing details
- Logs listing times to CSV for pattern analysis
- Auto-apply feature for supported housing companies
- **Monitors** inberlinwohnen.de for new apartment listings from 6 housing companies (HOWOGE, Gewobag, Degewo, Gesobau, Stadt und Land, WBM)
- **Monitors** wgcompany.de for WG room listings with configurable filters
- **Notifies** via Telegram with rich listing details and application status
- **Logs** listing times to CSV for pattern analysis and visualization
- **Auto-applies** to new listings when autopilot is enabled (all 6 companies supported)
- **Generates** weekly listing pattern plots and autopilot performance analytics
- **Autocleans** debug files older than 48 hours to manage disk space
- **Tracks** application history with success/failure reasons in JSON

## Auto-Apply Support

@@ -48,11 +51,22 @@ playwright install chromium

export TELEGRAM_BOT_TOKEN=your_token
export TELEGRAM_CHAT_ID=your_chat_id
# ... other env vars
# ... other env vars (see .env.example)

python monitor.py
python main.py
```

### Helper Scripts

The `helper_functions/` directory contains utilities for merging data from multiple machines:

- `merge_listing_times.py` - Merge listing_times.csv files
- `merge_applications.py` - Merge applications.json files
- `merge_dict_json.py` - Merge listings.json and wgcompany_listings.json
- `merge_wgcompany_times.py` - Merge wgcompany_times.csv files

All scripts deduplicate by key and timestamp.
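
To make "deduplicate by key and timestamp" concrete, here is a minimal sketch of how such a merge could work. It is not the code from `helper_functions/`: the function name `merge_csv_rows` and the column names `id` and `timestamp` are assumptions chosen for illustration.

```python
import csv
import io


def merge_csv_rows(sources, key_fields=("id", "timestamp")):
    """Merge rows from several CSV sources, keeping the first row seen per key.

    `key_fields` are hypothetical column names; the real CSV layout may differ.
    """
    seen = set()
    merged = []
    for source in sources:
        for row in csv.DictReader(source):
            key = tuple(row[field] for field in key_fields)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    # ISO 8601 timestamps sort chronologically when compared as text.
    merged.sort(key=lambda row: row["timestamp"])
    return merged


# Two in-memory "files" standing in for exports from two machines.
machine_a = io.StringIO("id,timestamp\nabc,2024-01-01T10:00:00\nxyz,2024-01-02T09:30:00\n")
machine_b = io.StringIO("id,timestamp\nabc,2024-01-01T10:00:00\nqrs,2024-01-01T12:00:00\n")
rows = merge_csv_rows([machine_a, machine_b])
print([row["id"] for row in rows])  # ['abc', 'qrs', 'xyz']
```

The duplicate `abc` row appears once in the result because both its key and its timestamp match.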

## Configuration

### Required environment variables

@@ -91,100 +105,248 @@ python monitor.py

## Telegram Commands

- `/autopilot on|off` - Enable or disable automatic applications (use `/autopilot on` or `/autopilot off`).
- `/status` - Show current status and statistics (autopilot state, application counts by company).
- `/plot` - Generate and send a weekly listing-patterns plot (`data/weekly_plot.png`).
- `/errorrate` - Generate and send an autopilot success vs failure plot (`data/error_rate.png`).
- `/help` - Show available commands and usage information.
- `/autopilot on|off` - Enable or disable automatic applications
- `/status` - Show current status and statistics (autopilot state, application counts by company)
- `/plot` - Generate and send a weekly listing-patterns plot with heatmap and charts (high-res, seaborn-styled)
- `/errorrate` - Generate and send an autopilot performance analysis with success/failure rates by company (high-res, seaborn-styled)
- `/retryfailed` - Retry all previously failed applications
- `/resetlistings` - Reset seen listings (marks all current as failed to avoid spam)
- `/help` - Show available commands and usage information

Note: The bot only processes commands from the configured `TELEGRAM_CHAT_ID`. Use `/autopilot off` while testing selector changes or after modifying configuration to avoid accidental submissions.
**Important:** The bot only processes commands from the configured `TELEGRAM_CHAT_ID`. Use `/autopilot off` while testing selector changes or after modifying configuration to avoid accidental submissions.
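
The chat restriction can be pictured as a small guard applied to every incoming update. The sketch below is an illustration, not the code in `telegram_bot.py`: `is_authorized` is a hypothetical name, and the only thing taken from Telegram itself is the `message.chat.id` path of the Bot API `Update` object.

```python
def is_authorized(update: dict, allowed_chat_id: str) -> bool:
    """Return True only if the update comes from the configured chat.

    `update` is a decoded Telegram Bot API Update object. Updates without a
    message (or without a chat id) are rejected.
    """
    chat_id = update.get("message", {}).get("chat", {}).get("id")
    # Telegram sends chat ids as integers, while environment variables are
    # strings, so both sides are compared as text.
    return chat_id is not None and str(chat_id) == str(allowed_chat_id)


# Hypothetical updates: one from the configured chat, one from a stranger.
own = {"message": {"chat": {"id": 12345}, "text": "/status"}}
other = {"message": {"chat": {"id": 99999}, "text": "/autopilot on"}}
print(is_authorized(own, "12345"), is_authorized(other, "12345"))
```

Comparing as strings avoids a silent mismatch between the integer id in the update and the string read from `TELEGRAM_CHAT_ID`.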

**Plot Features:** All plots are generated at 300 DPI with seaborn styling for publication-quality output.

## Data files

All data is stored in the `data/` directory:

- `listings.json` - Previously seen inberlinwohnen listings
- `wgcompany_listings.json` - Previously seen WGcompany listings
- `applications.json` - Application history
- `listing_times.csv` - Time series data for listings
- `state.json` - Runtime state (autopilot toggle)
- `monitor.log` - Application logs
**Persistent State:**

- `listings.json` - Previously seen inberlinwohnen listings (deduplicated by hash)
- `wgcompany_listings.json` - Previously seen WGcompany listings (deduplicated by hash)
- `applications.json` - Application history with timestamps, success/failure status, and error messages
- `listing_times.csv` - Time series data for inberlinwohnen listings (for pattern analysis)
- `wgcompany_times.csv` - Time series data for WGcompany listings
- `state.json` - Runtime state (autopilot toggle, persistent across restarts)
- `monitor.log` - Rotating application logs (max 5MB, 5 backups)
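
A rotating log with these limits can be set up with the standard library's `RotatingFileHandler`. This is a sketch rather than the bot's actual logging setup: the helper name `build_logger`, the logger name `monitor`, the message format, and the reading of "5MB" as 5 × 1024 × 1024 bytes are all assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(path, max_bytes=5 * 1024 * 1024, backups=5):
    """Create a logger that rotates `path` at `max_bytes`, keeping `backups` old files."""
    logger = logging.getLogger("monitor")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With `backupCount=5`, the handler keeps `monitor.log` plus `monitor.log.1` through `monitor.log.5` and discards anything older.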

**Generated Plots:**

- `weekly_plot.png` - Weekly listing patterns (heatmap + charts, 300 DPI)
- `error_rate.png` - Autopilot performance analysis (3-panel chart, 300 DPI)

**Debug Files (auto-cleaned after 48 hours):**

- `data/<company>/*.png` - Screenshots from failed applications
- `data/<company>/*.html` - Page HTML snapshots for debugging
- `data/debug_page.html` - InBerlin page snapshot
- `data/wgcompany_debug.html` - WGcompany page snapshot

**Note:** Debug files (screenshots, HTML) are automatically deleted after 48 hours to save disk space. Listing data, applications, and logs are never deleted.
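
The cleanup rule can be sketched as a walk over `data/` that removes only screenshot and HTML files whose modification time is more than 48 hours old. This is an illustration, not the contents of `autoclean_debug.py`: the function name, the suffix-based selection, and the `keep_names` default that spares the generated plots are assumptions.

```python
import time
from pathlib import Path

DEBUG_SUFFIXES = {".png", ".html"}  # assumed: debug files are identified by suffix
MAX_AGE_SECONDS = 48 * 60 * 60


def clean_debug_files(data_dir, now=None, max_age=MAX_AGE_SECONDS,
                      keep_names=("weekly_plot.png", "error_rate.png")):
    """Delete old debug files under `data_dir` and return the removed paths.

    `keep_names` is a hypothetical safeguard so generated plots are not
    treated as debug screenshots.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(data_dir).rglob("*"):
        if not path.is_file() or path.name in keep_names:
            continue
        if path.suffix.lower() not in DEBUG_SUFFIXES:
            continue  # JSON, CSV and log files are never touched
        if now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed
```

Because the selection is by suffix, files such as `listings.json`, `listing_times.csv`, and `monitor.log` fall outside the rule regardless of age.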

## Debugging

When applications fail, the bot saves:
When applications fail, the bot saves debug material to help diagnose issues:

- Screenshots to `data/*.png`
- Page HTML to `data/debug_page.html`
**Company-specific folders:**

Check these files to understand why an application failed.
- `data/howoge/` - Howoge screenshots and HTML
- `data/gewobag/` - Gewobag screenshots and HTML
- `data/degewo/` - Degewo screenshots and HTML
- `data/gesobau/` - Gesobau screenshots and HTML
- `data/stadtundland/` - Stadt und Land screenshots and HTML
- `data/wbm/` - WBM screenshots and HTML

**General debug files:**

- `data/debug_page.html` - InBerlin page snapshot
- `data/wgcompany_debug.html` - WGcompany page snapshot

Check `applications.json` for error messages and timestamps. Debug files are automatically cleaned after 48 hours but can be manually inspected while fresh.

## Code Structure

The bot has been modularized for better maintainability. The main components are:

- `main.py`: The entry point for the bot.
- `handlers/`: Contains company-specific handlers for auto-apply functionality. Each company has its own handler file:
  - `howoge_handler.py`
  - `gewobag_handler.py`
  - `degewo_handler.py`
  - `gesobau_handler.py`
  - `stadtundland_handler.py`
  - `wbm_handler.py`
- `application_handler.py`: Orchestrates the application process by delegating to the appropriate handler.
- `telegram_bot.py`: Handles Telegram bot commands and notifications.
**Core:**

The `handlers/` directory includes a `BaseHandler` class that provides shared functionality for all company-specific handlers.
- `main.py` - Entry point, orchestrates monitoring loop and autoclean
- `application_handler.py` - Delegates applications to company handlers, generates plots
- `telegram_bot.py` - Async Telegram bot with httpx for commands and notifications
- `state_manager.py` - Manages persistent state (autopilot toggle)
- `autoclean_debug.py` - Deletes debug files older than 48 hours

**Handlers:**

- `handlers/base_handler.py` - Abstract base class with shared functionality (cookie handling, consent, logging)
- `handlers/howoge_handler.py` - HOWOGE application automation
- `handlers/gewobag_handler.py` - Gewobag application automation
- `handlers/degewo_handler.py` - Degewo application automation (Wohnungshelden)
- `handlers/gesobau_handler.py` - Gesobau application automation
- `handlers/stadtundland_handler.py` - Stadt und Land application automation
- `handlers/wbm_handler.py` - WBM application automation
- `handlers/wgcompany_notifier.py` - WGcompany monitoring (notification only, no autopilot)
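
The relationship between the abstract base class, the company handlers, and the delegating `application_handler.py` might look roughly like the sketch below. Only the name `BaseHandler` comes from this README. The method name `apply`, the result dictionary, the `HANDLERS` registry, and detecting the company by looking for its name in the URL host are assumptions, and the real handlers drive a browser rather than returning a stub.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class BaseHandler(ABC):
    """Shared interface; the real class also holds cookie/consent helpers."""

    company = "base"

    @abstractmethod
    def apply(self, url: str) -> dict:
        """Attempt an application and return a result record."""


class HowogeHandler(BaseHandler):
    company = "howoge"

    def apply(self, url: str) -> dict:
        # Stub: a real handler would open the page and fill in the form.
        return {"company": self.company, "url": url, "success": False,
                "message": "not implemented in this sketch"}


# Hypothetical registry mapping a company key to its handler class.
HANDLERS = {"howoge": HowogeHandler}


def detect_company(url: str):
    """Guess the company from the URL host; returns None if unknown."""
    host = urlparse(url).netloc.lower()
    for name in HANDLERS:
        if name in host:
            return name
    return None


def apply_to_listing(url: str) -> dict:
    """Delegate to the matching handler, or report an unsupported company."""
    company = detect_company(url)
    if company is None:
        return {"company": None, "url": url, "success": False,
                "message": "no handler for this URL"}
    return HANDLERS[company]().apply(url)
```

A registry keeps the orchestrator free of per-company branches: supporting another company means adding one class and one dictionary entry.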

**Utilities:**

- `helper_functions/` - Data merge utilities for combining stats from multiple sources
  - `merge_listing_times.py`
  - `merge_applications.py`
  - `merge_dict_json.py`
  - `merge_wgcompany_times.py`

**Tests:**

- `tests/` - Comprehensive unit tests (48 tests total)
  - `test_telegram_bot.py` - Telegram bot commands and messaging
  - `test_error_rate_plot.py` - Plot generation
  - `test_wgcompany_notifier.py` - WGcompany monitoring
  - `test_handlers.py` - Handler initialization
  - `test_application_handler.py` - Application orchestration
  - `test_helper_functions.py` - Merge utilities
  - `test_autoclean.py` - Autoclean script validation

## Unit Tests

The project includes unit tests to ensure functionality and reliability. Key test files:
The project includes comprehensive unit tests (48 tests total) to ensure functionality and reliability:

- `tests/test_telegram_bot.py`: Tests the Telegram bot's commands and messaging functionality.
- `tests/test_error_rate_plot.py`: Tests the error rate plot generator for autopilot applications.
- `test_telegram_bot.py` - Telegram bot commands and messaging (13 tests)
- `test_error_rate_plot.py` - Plot generation and data analysis (2 tests)
- `test_wgcompany_notifier.py` - WGcompany monitoring (7 tests)
- `test_handlers.py` - Handler initialization and structure (6 tests)
- `test_application_handler.py` - Application orchestration (10 tests)
- `test_company_detection.py` - Company detection from URLs (6 tests)
- `test_state_manager.py` - State persistence (2 tests)
- `test_helper_functions.py` - Merge utilities (2 tests)
- `test_autoclean.py` - Autoclean script validation (1 test)

### Running Tests

To run the tests, use:

```bash
pytest tests/
pytest tests/ -v
```

Ensure all dependencies are installed and the environment is configured correctly before running the tests.
All tests use mocking to avoid external dependencies and can run offline.
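
As an illustration of what "mocking to avoid external dependencies" means in practice, the sketch below tests a notification helper without touching the network. It is not taken from `tests/`: `notify` is a hypothetical helper that receives its HTTP client as an argument, and the client is replaced by a `MagicMock` from the standard library.

```python
from unittest.mock import MagicMock


def notify(client, chat_id, text):
    """Hypothetical helper: send a message through an injected HTTP client."""
    response = client.post(
        "https://api.telegram.org/bot<token>/sendMessage",
        json={"chat_id": chat_id, "text": text},
    )
    return response.status_code == 200


def test_notify_uses_client_without_network():
    client = MagicMock()
    client.post.return_value.status_code = 200  # simulate a successful reply

    assert notify(client, "123", "New listing") is True

    # The request was issued exactly once with the expected payload.
    client.post.assert_called_once()
    _, kwargs = client.post.call_args
    assert kwargs["json"] == {"chat_id": "123", "text": "New listing"}


test_notify_uses_client_without_network()
```

Passing the client in as a parameter is what makes the substitution possible; code that constructs its own client would need `unittest.mock.patch` instead.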

## Workflow Diagram

```mermaid
flowchart TD
A([Start]) --> B[Fetch Listings]
B --> C[Load Previous Listings]
C --> D[Deduplicate: Find New Listings]
D --> E{Any New Listings?}
E -- No --> Z1([Sleep & Wait])
E -- Yes --> F[Log New Listings to CSV]
F --> G[Save Current Listings]
G --> H[Check Autopilot State]
H -- Off --> I[Send Telegram Notification (No Apply)]
H -- On --> J[Attempt Auto-Apply to Each New Listing]
J --> K{Application Success?}
K -- Yes --> L[Log Success, Save to applications.json]
K -- No --> M[Log Failure, Save to applications.json]
L --> N[Send Telegram Notification (Success)]
M --> O[Send Telegram Notification (Failure)]
N --> P([Sleep & Wait])
O --> P
I --> P
Z1 --> B
P --> B
%% Details for error/debugging
J --> Q{Handler Error?}
Q -- Yes --> R[Save Screenshot/HTML for Debug]
R --> M
Q -- No --> K
Start([Start Bot]) --> Init[Initialize Browser & Telegram Bot]
Init --> Loop{Main Loop}

%% InBerlin Monitoring
Loop --> InBerlin[Fetch InBerlin Listings]
InBerlin --> ParseIB[Parse & Hash Listings]
ParseIB --> LoadIB[Load Previous InBerlin Listings]
LoadIB --> DedupeIB{New InBerlin Listings?}

DedupeIB -- Yes --> LogIB[Log to listing_times.csv]
LogIB --> SaveIB[Save to listings.json]
DedupeIB -- No --> WG

%% WGcompany Monitoring
SaveIB --> WG[Fetch WGcompany Listings]
WG --> ParseWG[Parse & Hash Listings]
ParseWG --> LoadWG[Load Previous WGcompany Listings]
LoadWG --> DedupeWG{New WGcompany Listings?}

DedupeWG -- Yes --> LogWG[Log to wgcompany_times.csv]
LogWG --> SaveWG[Save to wgcompany_listings.json]
DedupeWG -- No --> CheckAutopilot

%% Autopilot Decision
SaveWG --> CheckAutopilot{Autopilot Enabled?}
SaveIB --> CheckAutopilot

CheckAutopilot -- Off --> NotifyOnly[Send Telegram Notifications]
NotifyOnly --> CheckClean

CheckAutopilot -- On --> CheckApplied{Already Applied?}
CheckApplied -- Yes --> Skip[Skip Listing]
CheckApplied -- No --> DetectCompany[Detect Company]

%% Application Flow
DetectCompany --> SelectHandler[Select Handler]
SelectHandler --> OpenPage[Open Listing Page]
OpenPage --> Check404{404 or Deactivated?}

Check404 -- Yes --> MarkPermanent[Mark permanent_fail]
MarkPermanent --> SaveFail[Save to applications.json]
SaveFail --> NotifyFail[Notify: Application Failed]

Check404 -- No --> HandleCookies[Handle Cookie Banners]
HandleCookies --> FindButton[Find Application Button]
FindButton --> ButtonFound{Button Found?}

ButtonFound -- No --> Screenshot1[Save Screenshot & HTML]
Screenshot1 --> SaveFail

ButtonFound -- Yes --> ClickButton[Click Application Button]
ClickButton --> MultiStep{Multi-Step Form?}

MultiStep -- Yes --> NavigateSteps[Navigate Form Steps]
NavigateSteps --> FillForm
MultiStep -- No --> FillForm[Fill Form Fields]

FillForm --> SubmitForm[Submit Application]
SubmitForm --> CheckConfirm{Confirmation Detected?}

CheckConfirm -- Yes --> SaveSuccess[Save success to applications.json]
SaveSuccess --> NotifySuccess[Notify: Application Success]

CheckConfirm -- No --> Screenshot2[Save Screenshot & HTML]
Screenshot2 --> SaveFail

NotifySuccess --> CheckClean
NotifyFail --> CheckClean
Skip --> CheckClean

%% Autoclean
CheckClean{Time for Autoclean?}
CheckClean -- Yes --> RunClean[Delete Debug Files >48h]
RunClean --> Sleep
CheckClean -- No --> Sleep[Sleep CHECK_INTERVAL]

Sleep --> TelegramCmd{Telegram Command?}
TelegramCmd -- /autopilot --> ToggleAutopilot[Toggle Autopilot State]
TelegramCmd -- /status --> ShowStatus[Show Status & Stats]
TelegramCmd -- /plot --> GenPlot[Generate Weekly Plot]
TelegramCmd -- /errorrate --> GenError[Generate Error Rate Plot]
TelegramCmd -- /retryfailed --> RetryFailed[Retry Failed Applications]
TelegramCmd -- /resetlistings --> ResetListings[Reset Seen Listings]
TelegramCmd -- /help --> ShowHelp[Show Help]
TelegramCmd -- None --> Loop

ToggleAutopilot --> Loop
ShowStatus --> Loop
GenPlot --> Loop
GenError --> Loop
RetryFailed --> Loop
ResetListings --> Loop
ShowHelp --> Loop

style Start fill:#90EE90
style SaveSuccess fill:#90EE90
style SaveFail fill:#FFB6C1
style MarkPermanent fill:#FFB6C1
style RunClean fill:#87CEEB
style CheckAutopilot fill:#FFD700
style Check404 fill:#FFD700
style ButtonFound fill:#FFD700
style CheckConfirm fill:#FFD700
```

This diagram illustrates the workflow of the bot, from fetching listings to logging, notifying, and optionally applying to new listings.
**Key Features:**

- **Dual Monitoring**: Tracks both InBerlin (6 companies) and WGcompany listings
- **Smart Deduplication**: MD5 hashing prevents duplicate notifications
- **Autopilot**: Automated application with company-specific handlers
- **Error Handling**: 404 detection, permanent fail tracking, debug screenshots
- **Autoclean**: Automatic cleanup of debug files older than 48 hours
- **Rich Commands**: Status, plots, retry failed, reset listings
- **High-Res Analytics**: 300 DPI seaborn-styled plots for pattern analysis
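
The MD5-based deduplication mentioned above can be sketched as follows. The field names (`address`, `rooms`, `rent`, `link`) and both function names are assumptions for illustration; which fields the bot actually hashes is not stated here. MD5 serves only as a stable fingerprint, not as a security measure.

```python
import hashlib

HASH_FIELDS = ("address", "rooms", "rent", "link")  # hypothetical listing fields


def listing_hash(listing: dict) -> str:
    """Return a stable MD5 fingerprint built from the identifying fields."""
    canonical = "|".join(str(listing.get(field, "")) for field in HASH_FIELDS)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def find_new(current: list, seen_hashes: set) -> list:
    """Return listings whose fingerprint is not in `seen_hashes`."""
    return [item for item in current if listing_hash(item) not in seen_hashes]


# Example: one listing was seen in a previous cycle, one is new.
known = {"address": "Musterstr. 1", "rooms": 2, "rent": 650, "link": "https://example.com/1"}
fresh = {"address": "Beispielweg 7", "rooms": 3, "rent": 900, "link": "https://example.com/2"}
seen = {listing_hash(known)}
print([item["address"] for item in find_new([known, fresh], seen)])
```

Hashing a fixed tuple of fields means cosmetic changes elsewhere in a listing do not trigger a second notification, while a change to any hashed field does.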

## License