Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -63,6 +63,7 @@ Session.vim

# Docs
docs/changelog.md
website/versioned_docs/*/changelog.md

# Website build artifacts, node dependencies
website/build
3 changes: 2 additions & 1 deletion typos.toml
@@ -16,5 +16,6 @@ extend-exclude = [
"*.lock",
"*.min.js",
"*.min.css",
"CHANGELOG.md",
"**/CHANGELOG.md",
"**/changelog.md",
]
67 changes: 67 additions & 0 deletions website/versioned_docs/version-0.2/01-introduction/index.mdx
@@ -0,0 +1,67 @@
---
id: introduction
title: Overview
sidebar_label: Overview
slug: /overview
description: 'The official library for creating Apify Actors in Python, providing tools for web scraping, automation, and data storage integration.'
---

The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python.
It provides useful features like automatic retries and convenience functions that improve the experience of using the Apify API.

```python
import asyncio
from apify import Actor

async def main():
    async with Actor:
        actor_input = await Actor.get_input()
        print('Actor input:', actor_input)
        await Actor.push_data([{'result': 'Hello, world!'}])
        await Actor.set_value('OUTPUT', 'Done!')

asyncio.run(main())
```

## What are Actors?

Actors are serverless cloud programs that can do almost anything a human can do in a web browser. They range from small tasks, such as filling in forms or unsubscribing from online services, to scraping and processing vast numbers of web pages.

Actors can be run either locally, or on the [Apify platform](https://docs.apify.com/platform/), where you can run them at scale, monitor them, schedule them, and even publish and monetize them.

If you're new to Apify, learn [what Apify is](https://docs.apify.com/platform/about) in the Apify platform documentation.

## Quick start

To create and run Actors through Apify Console, see the [Console documentation](https://docs.apify.com/academy/getting-started/creating-actors#choose-your-template). For creating and running Python Actors locally, refer to the [quick start guide](./quick-start).

## Installation

The Apify SDK for Python requires Python 3.8 or above. You can install it from [PyPI](https://pypi.org/project/apify/):

```bash
pip install apify
```

## Features

### Local storage emulation

When running Actors locally, the Apify SDK performs storage operations like `Actor.push_data()` or `Actor.set_value()` on the local filesystem, in the `storage` folder in the Actor project directory.

### Automatic configuration

When running Actors on the Apify platform, the SDK automatically configures the Actor using the environment variables the platform provides to the Actor's container. This means you don't have to specify your Apify API token, your Apify Proxy password, or the default storage IDs manually.
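
To make the idea concrete, here is a minimal, stdlib-only sketch of collecting such settings from environment variables. The variable names (`APIFY_TOKEN`, `APIFY_PROXY_PASSWORD`, `APIFY_DEFAULT_DATASET_ID`, `APIFY_DEFAULT_KEY_VALUE_STORE_ID`, `APIFY_DEFAULT_REQUEST_QUEUE_ID`) and the `read_platform_settings` helper are assumptions made for illustration. The SDK does this for you, and its actual configuration logic may differ.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed variable names, used here for illustration only.
ASSUMED_ENV_VARS = {
    'token': 'APIFY_TOKEN',
    'proxy_password': 'APIFY_PROXY_PASSWORD',
    'default_dataset_id': 'APIFY_DEFAULT_DATASET_ID',
    'default_key_value_store_id': 'APIFY_DEFAULT_KEY_VALUE_STORE_ID',
    'default_request_queue_id': 'APIFY_DEFAULT_REQUEST_QUEUE_ID',
}


def read_platform_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Collect settings from the environment; missing variables map to None."""
    source = os.environ if env is None else env
    return {name: source.get(var) for name, var in ASSUMED_ENV_VARS.items()}


if __name__ == '__main__':
    # On a local machine most of these are usually unset.
    for name, value in read_platform_settings().items():
        print(f'{name}: {"<set>" if value else "<not set>"}')
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to test without touching the real environment.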

### Interacting with other Actors

The SDK provides convenience wrappers around the API for interacting with other Actors:
- `Actor.start(other_actor_id, run_input=...)` starts a run of another Actor.
- `Actor.call(other_actor_id, run_input=...)` starts a run and waits for it to finish.
- `Actor.call_task(actor_task_id)` starts an Actor task run and waits for it to finish.
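
The difference between starting a run and calling it (starting and then waiting) can be illustrated with plain `asyncio`. This is a toy model only: `fake_actor_run`, `start`, and `call` are hypothetical stand-ins, nothing here talks to the Apify API, and the real methods return run information rather than a task handle or a status string.

```python
import asyncio


async def fake_actor_run(duration: float) -> str:
    """Hypothetical stand-in for a remote Actor run."""
    await asyncio.sleep(duration)
    return 'SUCCEEDED'


async def start(duration: float) -> 'asyncio.Task[str]':
    """Kick off the run and return a handle immediately, without waiting."""
    return asyncio.ensure_future(fake_actor_run(duration))


async def call(duration: float) -> str:
    """Kick off the run and wait until it has finished."""
    return await fake_actor_run(duration)


async def demo() -> dict:
    handle = await start(0.01)
    done_after_start = handle.done()      # the run is still in progress
    status_after_call = await call(0.01)  # the run has already finished
    await handle                          # let the background run complete
    return {
        'done_after_start': done_after_start,
        'status_after_call': status_after_call,
    }


if __name__ == '__main__':
    print(asyncio.run(demo()))
```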

:::note API client alternative

If you need to interact with the Apify API programmatically without creating Actors, use the [Apify API client for Python](https://docs.apify.com/api/client/python) instead.

:::
87 changes: 87 additions & 0 deletions website/versioned_docs/version-0.2/01-introduction/quick-start.mdx
@@ -0,0 +1,87 @@
---
id: quick-start
title: Quick start
sidebar_label: Quick start
description: 'Get started with the Apify SDK for Python by creating your first Actor and learning the basics.'
---

Learn how to create and run Actors using the Apify SDK for Python.

---

## Step 1: Create Actors

To create a new Apify Actor on your computer, you can use the [Apify CLI](https://docs.apify.com/cli) and select one of the Python Actor templates.

```bash
apify create my-python-actor --template python-sdk
cd my-python-actor
```

This will create a new folder called `my-python-actor`, download and extract the Python SDK Actor template there, create a virtual environment in `my-python-actor/.venv`, and install the Actor dependencies in it.

## Step 2: Run Actors

To run the Actor, you can use the [`apify run` command](https://docs.apify.com/cli/docs/reference#apify-run):

```bash
apify run
```

This command:

- Activates the virtual environment in `.venv` (if no other virtual environment is activated yet)
- Starts the Actor with the appropriate environment variables for local running
- Configures it to use local storages from the `storage` folder

The Actor input, for example, will be in `storage/key_value_stores/default/INPUT.json`.
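
If you want to prepare that input file from a script, a small helper along these lines should do. It is a sketch that assumes the `storage/key_value_stores/default/INPUT.json` layout mentioned above; the `write_local_input` helper and the example input are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def write_local_input(project_dir: Path, actor_input: Any) -> Path:
    """Write the Actor input to storage/key_value_stores/default/INPUT.json."""
    input_path = project_dir / 'storage' / 'key_value_stores' / 'default' / 'INPUT.json'
    input_path.parent.mkdir(parents=True, exist_ok=True)
    input_path.write_text(json.dumps(actor_input, indent=2), encoding='utf-8')
    return input_path


def demo() -> Any:
    """Write a hypothetical input into a throwaway directory and read it back."""
    with tempfile.TemporaryDirectory() as project_dir:
        input_path = write_local_input(Path(project_dir), {'url': 'https://example.com'})
        return json.loads(input_path.read_text(encoding='utf-8'))


if __name__ == '__main__':
    print(demo())
```

Calling `write_local_input(Path('.'), {...})` from the Actor project directory before `apify run` would place the input where the locally running Actor looks for it.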

## Step 3: Understand Actor structure

The `.actor` directory contains the [Actor configuration](https://docs.apify.com/platform/actors/development/actor-config), such as the Actor's definition and input schema, and the Dockerfile necessary to run the Actor on the Apify platform.

The Actor's runtime dependencies are specified in the `requirements.txt` file, which follows the [standard requirements file format](https://pip.pypa.io/en/stable/reference/requirements-file-format/).

The Actor's source code is in the `src` folder with two important files:

- `main.py`, which contains the main function of the Actor.
- `__main__.py`, which is the entrypoint of the Actor package. It sets up the Actor logger and executes the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).

```python title="src/main.py"
from apify import Actor

async def main():
    async with Actor:
        print('Actor input:', await Actor.get_input())
        await Actor.set_value('OUTPUT', 'Hello, world!')
```

```python title="src/__main__.py"
import asyncio
import logging

from apify.log import ActorLogFormatter

from .main import main

handler = logging.StreamHandler()
handler.setFormatter(ActorLogFormatter())

apify_logger = logging.getLogger('apify')
apify_logger.setLevel(logging.DEBUG)
apify_logger.addHandler(handler)

asyncio.run(main())
```

If you want to modify the Actor structure, you need to make sure that your Actor is executable as a module, via `python -m src`, as that is the command started by `apify run` in the Apify CLI.
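
To check what "executable as a module" means in practice, the following sketch builds a throwaway `src` package with a `__main__.py` and runs it with `python -m src`. The file contents are simplified placeholders rather than the real template, and the empty `__init__.py` is an assumption added for safety.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Simplified placeholder for src/main.py (no Apify SDK involved).
MAIN_PY = '''
async def main():
    print('Hello from main()')
'''

# Simplified placeholder for src/__main__.py, the module entrypoint.
DUNDER_MAIN_PY = '''
import asyncio

from .main import main

asyncio.run(main())
'''


def run_as_module() -> str:
    """Create a throwaway `src` package and run it with `python -m src`."""
    with tempfile.TemporaryDirectory() as project_dir:
        package_dir = Path(project_dir) / 'src'
        package_dir.mkdir()
        (package_dir / '__init__.py').write_text('', encoding='utf-8')
        (package_dir / 'main.py').write_text(MAIN_PY, encoding='utf-8')
        (package_dir / '__main__.py').write_text(DUNDER_MAIN_PY, encoding='utf-8')
        completed = subprocess.run(
            [sys.executable, '-m', 'src'],
            cwd=project_dir,  # `-m` resolves the package relative to this directory
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout.strip()


if __name__ == '__main__':
    print(run_as_module())
```

If you rename or restructure the package, the same check tells you quickly whether the entrypoint is still reachable.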

## Next steps

To learn more about the features of the Apify SDK and how to use them, check out the Concepts section, especially:

- [Actor lifecycle](../concepts/actor-lifecycle)
- [Working with storages](../concepts/storages)
- [Working with proxies](../concepts/proxy-management)
- [Managing Actor events](../concepts/actor-events)
- [Direct access to the Apify API](../concepts/access-apify-api)
@@ -0,0 +1,81 @@
---
title: Actor lifecycle
sidebar_label: Actor lifecycle
---

## Lifecycle methods

### Initialization and cleanup

At the start of its run, the Actor needs to initialize itself, its event manager, and its storages,
and at the end of the run it needs to close them cleanly.
The Apify SDK provides several ways to manage this.

#### `Actor.init()` and `Actor.exit()`

The `Actor.init()` method initializes the Actor,
the event manager which processes the Actor events from the platform event websocket,
and the storage client used in the execution environment.
It should be called before performing any other Actor operations.

The `Actor.exit()` method then exits the Actor cleanly,
tearing down the event manager and the storage client.
There is also the `Actor.fail()` method, which exits the Actor while marking it as failed.

```python title="src/main.py"
import asyncio
from apify import Actor
from apify.consts import ActorExitCodes

async def main():
    await Actor.init()
    try:
        print('Actor input:', await Actor.get_input())
        await Actor.set_value('OUTPUT', 'Hello, world!')
        await Actor.exit()
    except Exception as e:
        print('Error while running Actor', e)
        await Actor.fail(exit_code=ActorExitCodes.ERROR_USER_FUNCTION_THREW, exception=e)

asyncio.run(main())
```

#### Context manager

So that you don't have to call the lifecycle methods manually, the `Actor` class provides a context manager,
which calls the `Actor.init()` method on enter,
the `Actor.exit()` method on a clean exit,
and the `Actor.fail()` method when there is an exception during the run of the Actor.

This is the recommended way to work with the `Actor` class.

```python title="src/main.py"
import asyncio
from apify import Actor

async def main():
    async with Actor:
        print('Actor input:', await Actor.get_input())
        await Actor.set_value('OUTPUT', 'Hello, world!')

asyncio.run(main())
```
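
The enter/exit behavior described in this section can be sketched with a toy class that only records which lifecycle hook ran. `ToyActor` is a hypothetical stand-in, not the SDK's implementation. In particular, it swallows the exception so the demo keeps running, which the real `Actor` may not do.

```python
import asyncio
from typing import List


class ToyActor:
    """Hypothetical stand-in that only records which lifecycle hooks ran."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def init(self) -> None:
        self.calls.append('init')

    async def exit(self) -> None:
        self.calls.append('exit')

    async def fail(self, exception: BaseException) -> None:
        self.calls.append(f'fail({type(exception).__name__})')

    async def __aenter__(self) -> 'ToyActor':
        await self.init()
        return self

    async def __aexit__(self, exc_type, exc_value, traceback) -> bool:
        if exc_value is None:
            await self.exit()            # clean exit
        else:
            await self.fail(exc_value)   # an exception escaped the block
        return True  # swallow the error so the demo can inspect the calls


async def run(should_raise: bool) -> List[str]:
    actor = ToyActor()
    async with actor:
        if should_raise:
            raise ValueError('boom')
    return actor.calls


if __name__ == '__main__':
    print(asyncio.run(run(False)))
    print(asyncio.run(run(True)))
```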

#### Main function

Another option is to pass a function to the Actor via the `Actor.main(main_func)` method,
which causes the Actor to initialize, run the main function, and exit, catching any runtime errors in the passed function.

```python title="src/main.py"
import asyncio
from apify import Actor

async def actor_main_func():
    print('Actor input:', await Actor.get_input())
    await Actor.set_value('OUTPUT', 'Hello, world!')

async def main():
    await Actor.main(actor_main_func)

asyncio.run(main())
```
40 changes: 40 additions & 0 deletions website/versioned_docs/version-0.2/02-concepts/02-storages.mdx
@@ -0,0 +1,40 @@
---
title: Working with storages
sidebar_label: Storages
---

The `Actor` class provides methods to work either with the default storages of the Actor, or with any other storage, named or unnamed.

## Convenience methods for default storages

There are several methods for directly working with the default key-value store or default dataset of the Actor.

- `Actor.get_value('my-record')` reads a record from the default key-value store of the Actor.
- `Actor.set_value('my-record', 'my-value')` saves a new value to the record in the default key-value store.
- `Actor.get_input()` reads the Actor input from the default key-value store of the Actor.
- `Actor.push_data([{'result': 'Hello, world!'}, ...])` saves results to the default dataset of the Actor.

## Opening other storages

The `Actor.open_dataset()`, `Actor.open_key_value_store()` and `Actor.open_request_queue()` methods can be used to open any storage for reading and writing. You can either use them without arguments to open the default storages, or you can pass a storage ID or name to open another storage.

```python
import asyncio
from apify import Actor

async def main():
    async with Actor:
        # Work with the default dataset of the Actor
        dataset = await Actor.open_dataset()
        await dataset.push_data({'result': 'Hello, world!'})

        # Work with the key-value store with ID 'mIJVZsRQrDQf4rUAf'
        key_value_store = await Actor.open_key_value_store(id='mIJVZsRQrDQf4rUAf')
        await key_value_store.set_value('record', 'Hello, world!')

        # Work with the request queue with name 'my-queue'
        request_queue = await Actor.open_request_queue(name='my-queue')
        await request_queue.add_request({'uniqueKey': 'v0Ngr', 'url': 'https://example.com'})

asyncio.run(main())
```
@@ -0,0 +1,45 @@
---
title: Proxy management
sidebar_label: Proxy management
---

To work with proxies in your Actor, you can use the `Actor.create_proxy_configuration()` method, which allows you to generate proxy URLs either for the Apify Proxy, or even for your own proxies, with automatic proxy rotation and support for sessions.

```python
import asyncio
import httpx
from apify import Actor

async def main():
    async with Actor:
        # You can either set the proxy config manually
        proxy_configuration = await Actor.create_proxy_configuration(
            groups=['RESIDENTIAL'],
            country_code='US',
        )

        # --- OR ---
        # You can get the proxy config from the Actor input
        actor_input = await Actor.get_input()
        selected_proxy_config = actor_input['proxyConfiguration']
        proxy_configuration = await Actor.create_proxy_configuration(
            actor_proxy_input=selected_proxy_config,
        )

        # --- OR ---
        # You can use your own proxy servers
        proxy_configuration = await Actor.create_proxy_configuration(
            proxy_urls=[
                'http://my-proxy.com:8000',
                'http://my-other-proxy.com:8000',
            ],
        )

        proxy_url = await proxy_configuration.new_url(session_id='my_session')

        async with httpx.AsyncClient(proxies=proxy_url) as client:
            response = await client.get('http://example.com')
            await Actor.set_value('OUTPUT', response.text)

asyncio.run(main())
```
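
To illustrate what "automatic proxy rotation and support for sessions" can mean for a list of custom proxy URLs, here is a toy rotator. It assumes round-robin rotation and a sticky URL per session ID. `ToyProxyRotator` is hypothetical and synchronous, whereas the SDK's `new_url()` is awaited and may choose URLs differently.

```python
import itertools
from typing import Dict, List, Optional


class ToyProxyRotator:
    """Hypothetical round-robin rotator with sticky sessions."""

    def __init__(self, proxy_urls: List[str]) -> None:
        if not proxy_urls:
            raise ValueError('At least one proxy URL is required')
        self._cycle = itertools.cycle(proxy_urls)
        self._session_urls: Dict[str, str] = {}

    def new_url(self, session_id: Optional[str] = None) -> str:
        # Without a session, simply hand out the next URL in the rotation.
        if session_id is None:
            return next(self._cycle)
        # With a session, pick a URL once and keep returning the same one.
        if session_id not in self._session_urls:
            self._session_urls[session_id] = next(self._cycle)
        return self._session_urls[session_id]


if __name__ == '__main__':
    rotator = ToyProxyRotator(['http://my-proxy.com:8000', 'http://my-other-proxy.com:8000'])
    print(rotator.new_url())
    print(rotator.new_url(session_id='my_session'))
    print(rotator.new_url(session_id='my_session'))
```

Reusing a session ID is useful when a sequence of requests should appear to come from the same address.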
52 changes: 52 additions & 0 deletions website/versioned_docs/version-0.2/02-concepts/04-actor-events.mdx
@@ -0,0 +1,52 @@
---
title: Actor events
sidebar_label: Actor events
---

The Apify platform sends several events to the Actor. If you want to work with them, you can use the `Actor.on()` and `Actor.off()` methods.

## Available events

- **`ActorEventTypes.SYSTEM_INFO`**: Emitted every minute. The event data contains information about the Actor's resource usage.
- **`ActorEventTypes.MIGRATING`**: Emitted when the Actor running on the Apify platform is about to be migrated to another worker server. You can use it to persist the Actor's state and abort the run, which speeds up the migration.
- **`ActorEventTypes.PERSIST_STATE`**: Emitted at regular intervals (every 60 seconds by default) to notify the Actor that it should persist its state, so that it does not have to repeat all its work after a restart.
- **`ActorEventTypes.ABORTING`**: Emitted when a user gracefully aborts an Actor run on the Apify platform, which gives the Actor some time to clean up before it is terminated.

## Example

```python
import asyncio
from pprint import pprint
from apify import Actor
from apify.consts import ActorEventTypes

async def print_system_info(event_data):
    print('Actor system info from platform:')
    pprint(event_data)

async def react_to_abort(event_data):
    print('The Actor is aborting!')
    pprint(event_data)

async def persist_state(event_data):
    print('The Actor should persist its state!')
    pprint(event_data)
    # Add your state persisting logic here

async def main():
    async with Actor:
        Actor.on(ActorEventTypes.SYSTEM_INFO, print_system_info)
        Actor.on(ActorEventTypes.ABORTING, react_to_abort)
        Actor.on(ActorEventTypes.PERSIST_STATE, persist_state)

        # Do some work here
        ...

        # Remove the event handler when no longer needed
        Actor.off(ActorEventTypes.SYSTEM_INFO, print_system_info)

        # Do some more work here
        ...

asyncio.run(main())
```
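
The `on()`/`off()` pattern itself is easy to try out without the platform. The sketch below uses a toy event manager to show that a handler stops receiving events once it is removed. `ToyEventManager`, its `emit()` method, and the `'persistState'` event name are hypothetical and are not the SDK's implementation.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict, List

Handler = Callable[[Any], Awaitable[None]]


class ToyEventManager:
    """Hypothetical in-process event manager with on/off/emit."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def off(self, event_name: str, handler: Handler) -> None:
        if handler in self._handlers[event_name]:
            self._handlers[event_name].remove(handler)

    async def emit(self, event_name: str, event_data: Any) -> None:
        # Iterate over a copy so handlers may unregister themselves while running.
        for handler in list(self._handlers[event_name]):
            await handler(event_data)


async def demo() -> List[Any]:
    saved_states: List[Any] = []

    async def persist_state(event_data: Any) -> None:
        # Stand-in for real state persisting logic.
        saved_states.append(event_data)

    events = ToyEventManager()
    events.on('persistState', persist_state)
    await events.emit('persistState', {'progress': 1})
    events.off('persistState', persist_state)
    await events.emit('persistState', {'progress': 2})  # no handler is left
    return saved_states


if __name__ == '__main__':
    print(asyncio.run(demo()))
```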