Merged
6 changes: 3 additions & 3 deletions .gitignore
@@ -15,9 +15,9 @@
.serena
.windsurf
.zed-ai
-AGENTS.md
-CLAUDE.md
-GEMINI.md
+AGENTS.local.md
+CLAUDE.local.md
+GEMINI.local.md

# Cache
__pycache__
116 changes: 116 additions & 0 deletions .rules.md
@@ -0,0 +1,116 @@
# Coding guidelines

This file provides guidance to programming agents when working with code in this repository.

## Project Overview

The Apify SDK for Python (`apify` package on PyPI) is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the [Crawlee](https://crawlee.dev/python) web scraping framework and the [Apify API Client](https://docs.apify.com/api/client/python). Supports Python 3.10–3.14. Build system: hatchling.

## Common Commands

```bash
# Install dependencies (including dev)
uv sync --all-extras

# Install dev dependencies + pre-commit hooks
uv run poe install-dev

# Format code (also auto-fixes lint issues via ruff check --fix)
uv run poe format

# Lint (format check + ruff check)
uv run poe lint

# Type check
uv run poe type-check

# Run all checks (lint + type-check + unit tests)
uv run poe check-code

# Unit tests (no API token needed)
uv run poe unit-tests

# Run a single test file
uv run pytest tests/unit/actor/test_actor_lifecycle.py

# Run a single test by name
uv run pytest tests/unit/actor/test_actor_lifecycle.py -k "test_name"

# Integration tests (needs APIFY_TEST_USER_API_TOKEN)
uv run poe integration-tests

# E2E tests (needs APIFY_TEST_USER_API_TOKEN, builds/deploys Actors on platform)
uv run poe e2e-tests
```

## Code Style

- **Formatter/Linter**: Ruff (line length 120, single quotes for inline, double quotes for docstrings)
- **Type checker**: ty (targets Python 3.10)
- **All ruff rules enabled** with specific ignores — see `pyproject.toml` `[tool.ruff.lint]` for the full ignore list
- Tests are exempt from docstring rules (`D`), assert warnings (`S101`), and private member access (`SLF001`)
- Unused imports are allowed in `__init__.py` files (re-exports)
- **Pre-commit hooks**: lint check + type check run automatically on commit
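
As a rough illustration of the quoting convention above, a snippet in that style might look like the following. The function name and its behavior are hypothetical and not taken from the SDK.

```python
def build_greeting(name: str) -> str:
    """Return a greeting for the given name (docstrings use double quotes)."""
    # Inline strings use single quotes.
    return f'Hello, {name}!'
```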

## Architecture

### Core (`src/apify/`)

- **`_actor.py`** — The `_ActorType` class is the central API. `Actor` is a lazy object proxy (`lazy_object_proxy.Proxy`, from the `lazy-object-proxy` package) wrapping `_ActorType`. It acts as both a class (e.g. `Actor.is_at_home()`) and an instance-like context manager (`async with Actor:`). On `__aenter__`, the proxy's `__wrapped__` is replaced with the active `_ActorType` instance. It manages the full Actor lifecycle (`init`, `exit`, `fail`), provides access to storages (`open_dataset`, `open_key_value_store`, `open_request_queue`), handles events, proxy configuration, charging, and platform API operations (`start`, `call`, `metamorph`, `reboot`).

- **`_configuration.py`** — `Configuration` extends Crawlee's `Configuration` with Apify-specific settings (API URL, token, Actor run metadata, proxy settings, charging config). Configuration is populated from environment variables (`APIFY_*`).

- **`_charging.py`** — Pay-per-event billing system. `ChargingManager` / `ChargingManagerImplementation` handle charging events against pricing info fetched from the API.

- **`_proxy_configuration.py`** — `ProxyConfiguration` manages Apify proxy setup (residential, datacenter, groups, country targeting).

- **`_models.py`** — Pydantic models for API data structures (Actor runs, webhooks, pricing info, etc.).
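
The proxy behavior described for `_actor.py` can be sketched with the standard library alone. This is a minimal illustration, not the SDK's implementation: `LazyProxy` is a hypothetical replacement for `lazy_object_proxy.Proxy`, the `_ActorType` below is a stub that only borrows the name, and the swap of the wrapped instance on `__aenter__` is simplified to lazy creation on first use.

```python
import asyncio
from collections.abc import Callable
from typing import Any


class _ActorType:
    """Hypothetical, heavily simplified stand-in for the SDK class of the same name."""

    def __init__(self) -> None:
        self.active = False

    def is_at_home(self) -> bool:
        # Stub: always reports a local run.
        return False

    async def __aenter__(self) -> '_ActorType':
        self.active = True
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self.active = False


class LazyProxy:
    """Creates the wrapped object on first use and forwards attribute access to it."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._wrapped: Any = None

    def _target(self) -> Any:
        if self._wrapped is None:
            self._wrapped = self._factory()
        return self._wrapped

    def __getattr__(self, name: str) -> Any:
        # Invoked only for names missing on the proxy, so regular calls reach the target.
        return getattr(self._target(), name)

    # `async with` looks up dunder methods on the type, so they are forwarded explicitly.
    async def __aenter__(self) -> Any:
        return await self._target().__aenter__()

    async def __aexit__(self, *exc_info: object) -> None:
        await self._target().__aexit__(*exc_info)


Actor = LazyProxy(_ActorType)


async def main() -> bool:
    # Instance-like usage: the proxy works as an async context manager.
    async with Actor:
        return Actor.active
```

Here `Actor.is_at_home()` works without entering the context (class-like usage), while `asyncio.run(main())` exercises the context-manager path through the same object.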

### Storage Clients (`src/apify/storage_clients/`)

Four storage client implementations, all implementing Crawlee's abstract storage client interface:

- **`_apify/`** — `ApifyStorageClient`: talks to the Apify API for dataset, key-value store, and request queue operations (separate sub-clients for single vs. shared request queues). Used when running on the Apify platform.
- **`_file_system/`** — `FileSystemStorageClient` (alias `ApifyFileSystemStorageClient`): extends Crawlee's file system client with Apify-specific key-value store behavior.
- **`_smart_apify/`** — `SmartApifyStorageClient`: hybrid client that writes to both API and local file system for resilience.
- **`MemoryStorageClient`** — re-exported from Crawlee for in-memory storage.

### Storages (`src/apify/storages/`)

Re-exports Crawlee's `Dataset`, `KeyValueStore`, and `RequestQueue` classes.

### Events (`src/apify/events/`)

- **`_apify_event_manager.py`** — `ApifyEventManager` extends Crawlee's event system with platform-specific events received via WebSocket connection.

### Request Loaders (`src/apify/request_loaders/`)

- **`_apify_request_list.py`** — `ApifyRequestList` creates request lists from Actor input URLs (supports both direct URLs and "requests from URL" sources).
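
To make the two source kinds concrete, here is a small standard-library sketch of flattening them into one URL list. It does not use `ApifyRequestList`. The function name, the `url` / `requestsFromUrl` input keys, the regex-based extraction, and the injected `fetch_text` callable are all assumptions made for illustration.

```python
import re
from collections.abc import Callable

# Simplified URL matcher; real-world extraction would likely need to be stricter.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def expand_start_urls(sources: list[dict[str, str]], fetch_text: Callable[[str], str]) -> list[str]:
    """Flatten direct and remote URL sources into one ordered list of URLs."""
    urls: list[str] = []
    for source in sources:
        if 'url' in source:
            # Direct source: take the URL as given.
            urls.append(source['url'])
        elif 'requestsFromUrl' in source:
            # Remote source: download the text and collect every URL found in it.
            urls.extend(URL_PATTERN.findall(fetch_text(source['requestsFromUrl'])))
        else:
            raise ValueError(f'Unsupported source: {source!r}')
    return urls
```

Passing the fetcher in as a parameter keeps the sketch free of network access, so a test can supply a fake that returns canned text.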

### Scrapy Integration (`src/apify/scrapy/`)

Optional integration (`apify[scrapy]` extra) providing Scrapy scheduler, middlewares, pipelines, and extensions for running Scrapy spiders as Apify Actors.

### Key Dependencies

- **`crawlee`** — Base framework providing storage abstractions, event system, configuration, service locator pattern
- **`apify-client`** — HTTP client for the Apify API (`ApifyClientAsync`)
- **`apify-shared`** — Shared constants and utilities (`ApifyEnvVars`, `ActorEnvVars`, etc.)

## Testing

Three test levels in `tests/`:

- **`unit/`** — Fast tests with no external dependencies. Use mocked API clients (`ApifyClientAsyncPatcher` fixture). Run with `uv run poe unit-tests`.
- **`integration/`** — Tests making real Apify API calls but not deploying Actors. Requires `APIFY_TEST_USER_API_TOKEN`. Run with `uv run poe integration-tests`.
- **`e2e/`** — Full end-to-end tests that build and deploy Actors on the platform. Slowest. Requires `APIFY_TEST_USER_API_TOKEN`. Use `make_actor` and `run_actor` fixtures. Run with `uv run poe e2e-tests`.

All test levels use `pytest-asyncio` with `asyncio_mode = "auto"` (no need for `@pytest.mark.asyncio`). Tests run in parallel via `pytest-xdist` (`--numprocesses`). Each test gets isolated state via the autouse `_isolate_test_environment` fixture which resets `Actor`, `service_locator`, and `AliasResolver` state. Conftest files live in each subdirectory (`tests/unit/conftest.py`, etc.) — there is no top-level `tests/conftest.py`.
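
As a minimal sketch of what the auto mode means in practice, an async test can be written as a plain coroutine function. `fetch_value` and the test name are hypothetical and not part of this repository.

```python
import asyncio


async def fetch_value() -> int:
    """Hypothetical coroutine under test."""
    await asyncio.sleep(0)
    return 42


async def test_fetch_value() -> None:
    # No @pytest.mark.asyncio decorator: with asyncio_mode = "auto",
    # pytest-asyncio collects and awaits async test functions itself.
    assert await fetch_value() == 42
```

Outside pytest, the same coroutine can be driven manually with `asyncio.run(test_fetch_value())`.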

### Key Test Fixtures

- **`apify_client_async_patcher`** (unit) — `ApifyClientAsyncPatcher` instance for mocking `ApifyClientAsync` methods. It patches by `method`/`submethod` and tracks call history in `.calls`.
- **`make_httpserver`/`httpserver`** (unit) — session-scoped `HTTPServer` via `pytest-httpserver` for HTTP interception.
- **`apify_client_async`** (integration/e2e) — real `ApifyClientAsync` using `APIFY_TEST_USER_API_TOKEN`.
- **`make_actor`** (e2e) — creates a temporary Actor on the platform from a function, `main_py` string, or source files dict; cleans up after the session.
- **`run_actor`** (e2e) — calls an Actor and waits up to 10 minutes for completion.
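
The call-tracking idea behind a patcher fixture can be sketched as follows. This is not the real `ApifyClientAsyncPatcher`, whose signature is not shown here: `CallRecordingPatcher`, `FakeClient`, and `get_user` are hypothetical names used only to show how a replacement coroutine can return a canned value while recording its calls.

```python
import asyncio
from collections import defaultdict
from typing import Any


class FakeClient:
    """Hypothetical client standing in for an async API client."""

    async def get_user(self, user_id: str) -> dict[str, Any]:
        raise RuntimeError('real network call; should be patched in unit tests')


class CallRecordingPatcher:
    """Replaces async methods with canned results and records each call."""

    def __init__(self) -> None:
        # Maps method name -> list of (args, kwargs) tuples, one per call.
        self.calls: defaultdict[str, list[tuple[tuple, dict]]] = defaultdict(list)

    def patch(self, target: object, method: str, return_value: Any) -> None:
        async def replacement(*args: Any, **kwargs: Any) -> Any:
            self.calls[method].append((args, kwargs))
            return return_value

        # Set on the instance, so the replacement receives no implicit `self`.
        setattr(target, method, replacement)


client = FakeClient()
patcher = CallRecordingPatcher()
patcher.patch(client, 'get_user', return_value={'id': 'u1'})
result = asyncio.run(client.get_user('u1'))
```

After the call, `patcher.calls['get_user']` holds one entry with the positional and keyword arguments, which a test can assert on.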
1 change: 1 addition & 0 deletions AGENTS.md
1 change: 1 addition & 0 deletions CLAUDE.md
1 change: 1 addition & 0 deletions GEMINI.md