🏠 Apartment & flat listings scraper for the Marvelous City
aperj searches across 15 Brazilian real-estate websites simultaneously, collects apartment/flat listings, and presents results in a beautiful rich-text table on the terminal or exports them to CSV.
- 15 sources scraped in parallel using `asyncio` + `aiohttp`
- Beautiful rich-text terminal output via the `rich` library
- CSV export for spreadsheet analysis
- YAML-based configuration with sensible defaults
- Per-source cookies for sites that require authentication
- Modular, extensible architecture - one module per source
- CLI powered by `argparse`
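
As a rough illustration of the parallel-scraping feature listed above, the sketch below runs several sources concurrently with `asyncio.gather` and keeps partial results when one of them fails. It is not aperj's actual code: `scrape_source` and `scrape_all` are hypothetical names, and the sources sleep instead of making `aiohttp` requests so the example runs offline.

```python
import asyncio


async def scrape_source(name: str, delay: float, fail: bool = False) -> list[str]:
    """Hypothetical stand-in for one source module; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} is unavailable")
    return [f"{name}: listing {i}" for i in range(2)]


async def scrape_all(sources: dict[str, tuple[float, bool]]) -> tuple[list[str], dict[str, str]]:
    """Run every source concurrently and keep partial results when one fails."""
    names = list(sources)
    # return_exceptions=True hands failures back as values instead of
    # cancelling the whole batch on the first error.
    outcomes = await asyncio.gather(
        *(scrape_source(name, *sources[name]) for name in names),
        return_exceptions=True,
    )
    listings: list[str] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = str(outcome)
        else:
            listings.extend(outcome)
    return listings, errors


listings, errors = asyncio.run(
    scrape_all({"zapimoveis": (0.01, False), "olx": (0.01, False), "mysite": (0.01, True)})
)
print(len(listings), errors)
```

Because results come back in the order the coroutines were passed in, each outcome can be matched to its source name with a plain `zip`.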
🗺️ Not in Rio? Although `aperj` was designed with Rio de Janeiro in mind, it can search for apartments anywhere in Brazil. Edit `config.yml` to change the `region` field (e.g. `"São Paulo, SP"`), disable sources that are exclusive to Rio (such as `portalrjimoveis`, `judicearaujo`, or `patrimovel`), and optionally add real-estate source modules for your state. The nationwide sources (ZAP, VivaReal, OLX, Mercado Livre, …) will work anywhere.
| # | Source | Website |
|---|---|---|
| 1 | ZAP Imóveis | https://www.zapimoveis.com.br |
| 2 | VivaReal | https://www.vivareal.com.br |
| 3 | ImovelWeb | https://www.imovelweb.com.br |
| 4 | OLX | https://www.olx.com.br |
| 5 | Nuroa | https://www.nuroa.com.br |
| 6 | Nestoria | https://www.nestoria.com.br |
| 7 | Portal RJ Imóveis | http://www.portalrjimoveis.com.br |
| 8 | Imoveis.net | https://www.imoveis.net |
| 9 | Mercado Livre | https://imoveis.mercadolivre.com.br |
| 10 | Judice & Araujo | https://www.judicearaujo.com.br |
| 11 | Lopes | https://www.lopes.com.br |
| 12 | Patrimóvel | https://www.patrimovel.com.br |
| 13 | Cyrela | https://www.cyrela.com.br |
| 14 | Even | https://www.even.com.br |
| 15 | MRV | https://www.mrv.com.br |
# Clone and install in editable mode
git clone <repo-url> && cd aperj
pip install -e ".[dev]"
pre-commit install
aperj --init-config

This creates:
- `~/.config/aperj/config.yml` - main configuration file
- `~/.config/aperj/cookies/` - directory for per-source cookie files
# ~/.config/aperj/config.yml
keywords: []
region: "Rio de Janeiro, RJ"
max_results_per_source: 200
output:
  format: rich
  csv_path: apes.csv
sources:
  zapimoveis:
    enabled: true
  vivareal:
    enabled: true
  # … etc.
auth:
  # source_name:
  #   username: you@example.com
  #   password: secret

# Uses keywords from config
aperj
# Override / add keywords from CLI
aperj --keywords "cobertura duplex" "zona sul"
# Only specific sources
aperj --sources zapimoveis vivareal olx
# Export to CSV
aperj --export apes.csv
# Verbose (INFO-level logging)
aperj -v
# Debug-level logging
aperj -vv

usage: aperj [-h] [--version] [--init-config] [--config CONFIG]
             [--keywords KEYWORDS [KEYWORDS ...]]
             [--sources SOURCES [SOURCES ...]]
             [--export FILE] [--no-rich] [--max-results MAX_RESULTS]
             [--verbose] [--flaresolverr URL]
             [--min-price BRL] [--max-price BRL]
             [--min-condo BRL] [--max-condo BRL]
             [--min-iptu BRL] [--max-iptu BRL]
             [--min-area-m2 M2] [--max-area-m2 M2]
             [--min-bedrooms N] [--max-bedrooms N]
             [--min-suites N] [--max-suites N]
             [--min-bathrooms N] [--max-bathrooms N]
             [--min-parking N] [--max-parking N]
             [--listing-type {venda,aluguel}]
             [--property-type TYPE [TYPE ...]]
             [--neighborhood NAME [NAME ...]]
Options:
--version Show version and exit
--init-config Create a default config file and exit
--config CONFIG Path to config file (default: ~/.config/aperj/config.yml)
-k, --keywords ... Search keywords (merged with config keywords)
-s, --sources ... Only scrape these sources (by name)
--export FILE Write results to a CSV file
--no-rich Suppress pretty-printed table output
--max-results N Override max results per source
-v, --verbose Increase logging verbosity (-v INFO, -vv DEBUG)
--flaresolverr URL FlareSolverr endpoint for Cloudflare-protected sites
Filters (post-scrape):
--min-price BRL Minimum price in BRL (e.g. 200000)
--max-price BRL Maximum price in BRL (e.g. 800000)
--min-condo BRL Minimum condo fee in BRL
--max-condo BRL Maximum condo fee in BRL
--min-iptu BRL Minimum IPTU in BRL
--max-iptu BRL Maximum IPTU in BRL
--min-area-m2 M2 Minimum area in square metres
--max-area-m2 M2 Maximum area in square metres
--min-bedrooms N Minimum number of bedrooms
--max-bedrooms N Maximum number of bedrooms
--min-suites N Minimum number of suites
--max-suites N Maximum number of suites
--min-bathrooms N Minimum number of bathrooms
--max-bathrooms N Maximum number of bathrooms
--min-parking N Minimum number of parking spots
--max-parking N Maximum number of parking spots
--listing-type TYPE Keep only venda (sale) or aluguel (rent)
--property-type TYPE Keep only these property types (e.g. apartamento cobertura)
--neighborhood NAME Keep only listings in these neighborhoods (substring match)
# Apartments up to R$ 800k with at least 2 bedrooms
aperj --max-price 800000 --min-bedrooms 2
# Rentals only, in Copacabana or Ipanema, area ≥ 60 m²
aperj --listing-type aluguel --neighborhood Copacabana Ipanema --min-area-m2 60
# Only coberturas (penthouses) with parking
aperj --property-type cobertura --min-parking 1
# Combine filters: 2–3 bedrooms, price between R$ 400k and R$ 700k
aperj --min-price 400000 --max-price 700000 --min-bedrooms 2 --max-bedrooms 3

Note: Filters are applied after scraping. Listings where the filtered field is unknown (`None`) are excluded: if a source doesn't report an area, for instance, `--min-area-m2` will drop those listings.
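
The handling of unknown values described in the note can be sketched as follows. This is a minimal illustration under assumptions, not aperj's implementation: the `Listing` dataclass here is a hypothetical subset of fields, `in_range` and `apply_filters` are made-up helpers, and treating the bounds as inclusive is a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    # Hypothetical subset of fields; the real aperj.models.Listing may differ.
    title: str
    price: Optional[float] = None
    area_m2: Optional[float] = None
    bedrooms: Optional[int] = None


def in_range(value, minimum=None, maximum=None) -> bool:
    """Return True when value satisfies the active bounds (assumed inclusive).

    With no bound set the field is not filtered at all, so None passes.
    With a bound set, an unknown (None) value is rejected.
    """
    if minimum is None and maximum is None:
        return True
    if value is None:
        return False
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def apply_filters(listings, *, max_price=None, min_area_m2=None, min_bedrooms=None):
    """Keep only the listings that pass every active filter."""
    return [
        item for item in listings
        if in_range(item.price, maximum=max_price)
        and in_range(item.area_m2, minimum=min_area_m2)
        and in_range(item.bedrooms, minimum=min_bedrooms)
    ]


sample = [
    Listing("A", price=650_000, area_m2=80, bedrooms=2),
    Listing("B", price=700_000, area_m2=None, bedrooms=3),  # area unknown
    Listing("C", price=900_000, area_m2=95, bedrooms=3),
]
kept = apply_filters(sample, max_price=800_000, min_area_m2=60)
print([item.title for item in kept])
```

Listing B is within the price limit but is dropped as soon as an area filter is active, because its area is unknown. Without `min_area_m2` it would be kept.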
Some sources may require authentication. You can provide credentials in config.yml:
auth:
  zapimoveis:
    username: you@example.com
    password: yourpassword

For sites that need browser cookies (e.g. after an SSO / Google login):
- Log in to the site in your browser.
- Export the cookies (e.g. with a browser extension like EditThisCookie).
- Save them to `~/.config/aperj/cookies/<source_name>` in Netscape/curl format or as simple `key=value` lines.
# Example: ~/.config/aperj/cookies/zapimoveis
session_id=abc123
csrf_token=xyz789
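
For readers curious how a `key=value` cookie file like the one above could be read, here is a minimal sketch. `parse_cookie_lines` is a hypothetical helper, not aperj's loader, and it only covers the simple format. Netscape-format files could be read with the standard library's `http.cookiejar.MozillaCookieJar` instead.

```python
def parse_cookie_lines(text: str) -> dict[str, str]:
    """Parse simple key=value cookie lines into a dict (illustrative only)."""
    cookies: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            continue
        cookies[name.strip()] = value.strip()
    return cookies


example = """\
# Example: ~/.config/aperj/cookies/zapimoveis
session_id=abc123
csrf_token=xyz789
"""
cookies = parse_cookie_lines(example)
print(cookies)
```

A dict of this shape can be passed as the `cookies` argument when creating an `aiohttp.ClientSession`.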
- Create `aperj/sources/mysite.py` with a class inheriting from `BaseSource`.
- Implement `_do_scrape(self, keywords: list[str]) -> list[Listing]`.
- Register the class in `aperj/sources/__init__.py`.
- Add a `mysite` entry under `sources:` in the default config.
from aperj.models import Listing
from aperj.sources.base import BaseSource


class MySiteSource(BaseSource):
    name = "mysite"
    base_url = "https://www.mysite.com.br"

    async def _do_scrape(self, keywords: list[str]) -> list[Listing]:
        async with self._build_session() as session:
            html = await self._fetch(session, f"{self.base_url}/search?q={'+'.join(keywords)}")
            soup = self._soup(html)
            # … parse and return list[Listing]
            return []

Some sources (notably ImovelWeb) are behind Cloudflare managed challenges that block automated requests. When aperj encounters a 403 response, it automatically tries two levels of fallback:
- aiohttp - standard HTTP request
- curl_cffi - retries with a real browser TLS fingerprint (bypasses basic bot detection)
However, sites with aggressive Cloudflare protection will block both approaches, especially from cloud/data-centre IPs. For these sources you need a third layer: FlareSolverr, a proxy server that runs a real browser to solve Cloudflare challenges on your behalf.
The easiest way to run FlareSolverr is via Docker:
docker run -d \
  --name flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  --restart unless-stopped \
  ghcr.io/flaresolverr/flaresolverr:latest

Verify it's running:
curl http://localhost:8191/

You can point aperj at your FlareSolverr instance in two ways:
Option 1 - CLI flag (one-off):
aperj --flaresolverr http://localhost:8191/v1

Option 2 - Configuration file (persistent):
Add the flaresolverr_url key to ~/.config/aperj/config.yml:
flaresolverr_url: "http://localhost:8191/v1"

With FlareSolverr configured, the fallback chain becomes:
- aiohttp -> 403? ->
- curl_cffi (TLS impersonation) -> still 403? ->
- FlareSolverr (real browser solves the challenge)
Without FlareSolverr configured, Cloudflare-protected sources will simply fail gracefully and the remaining sources will continue to work normally.
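
The fallback chain can be pictured with the sketch below. It is an assumption-laden illustration rather than aperj's code: the three fetchers are fakes standing in for `aiohttp`, `curl_cffi` and FlareSolverr, `fetch_with_fallback` is a hypothetical name, and only a 403 status is treated as "blocked".

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A fetcher takes a URL and returns (HTTP status, body). The fetchers below
# are fakes so the example runs without network access.
Fetcher = Callable[[str], Awaitable[tuple[int, str]]]


async def fake_aiohttp(url: str) -> tuple[int, str]:
    return 403, ""  # pretend the plain request was blocked


async def fake_curl_cffi(url: str) -> tuple[int, str]:
    return 403, ""  # pretend TLS impersonation was blocked too


async def fake_flaresolverr(url: str) -> tuple[int, str]:
    return 200, "<html>listings</html>"  # pretend the browser solved the challenge


async def fetch_with_fallback(
    url: str, fetchers: list[tuple[str, Fetcher]]
) -> Optional[tuple[str, str]]:
    """Try each fetcher in order, moving on only when the response is a 403."""
    for name, fetcher in fetchers:
        status, body = await fetcher(url)
        if status != 403:
            return name, body
    return None  # every layer was blocked; the caller can skip this source


url = "https://example.com/search"
chain = [("aiohttp", fake_aiohttp), ("curl_cffi", fake_curl_cffi)]
without_solver = asyncio.run(fetch_with_fallback(url, chain))
with_solver = asyncio.run(
    fetch_with_fallback(url, chain + [("flaresolverr", fake_flaresolverr)])
)
print(without_solver)
print(with_solver)
```

Returning `None` instead of raising mirrors the graceful-failure behaviour described above: a blocked source yields no listings while the other sources carry on.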
