Arrow Development Environment Part 2: Python

arrow
dev-env
Author

Will Jones

Published

August 13, 2022

This is part of a series on setting up an Arrow development environment. If you haven’t gone through part 1 on setting up C++, start there.

Installation

First, install archery from the root of your Arrow checkout:

pip install -e "dev/archery[lint]"

Then, add the following to your .envrc file:

# Number of threads to use while building PyArrow C++ and Cython
export PYARROW_PARALLEL=8

# Enable/disable optional PyArrow components
export PYARROW_WITH_PARQUET=ON
export PYARROW_WITH_DATASET=ON
export PYARROW_WITH_GANDIVA=OFF
export PYARROW_WITH_S3=ON
export PYARROW_WITH_HDFS=OFF
export PYARROW_WITH_GCS=OFF
export PYARROW_WITH_PLASMA=OFF
export PYARROW_WITH_FLIGHT=OFF
export PYARROW_WITH_PARQUET_ENCRYPTION=OFF

For these changes to take effect, you need to run direnv allow and then reopen VS Code. Close VS Code and run:

direnv allow
code .
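
To confirm that the variables actually reached your shell (and therefore VS Code), a quick check like the following can help. This is a minimal sketch: the helper name `pyarrow_build_settings` is made up for illustration, and it only reads the environment; it does not validate the values.

```python
import os


def pyarrow_build_settings(environ=None):
    """Return the PYARROW_* variables found in the given environment mapping."""
    env = os.environ if environ is None else environ
    # Keep only the build-related variables, sorted for stable output.
    return {
        key: value
        for key, value in sorted(env.items())
        if key.startswith("PYARROW_")
    }


if __name__ == "__main__":
    settings = pyarrow_build_settings()
    if not settings:
        print("No PYARROW_* variables found; did you run `direnv allow`?")
    for key, value in settings.items():
        print(f"{key}={value}")
```

Run it from a terminal opened inside the repository. If nothing is listed, direnv has probably not loaded the .envrc yet.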

Configuring VS Code

To configure VS Code you’ll need to update .vscode/tasks.json and .vscode/launch.json.

Add the following tasks to .vscode/tasks.json:

    {
        "type": "process",
        "label": "Build Python",
        "command": "python",
        "args": [
            "setup.py",
            "build_ext",
            "--inplace",
            "--build-type=debug"
        ],
        "group": "build",
        "options": {"cwd": "${workspaceFolder}/python"}
    },
    {
        "type": "process",
        "label": "Test Python",
        "command": "pytest",
        "args": ["pyarrow"],
        "group": "test",
        "options": {"cwd": "${workspaceFolder}/python"}
    },
    {
        "type": "process",
        "label": "Lint Python",
        "command": "archery",
        "args": [
            "lint",
            "--python",
            "--fix"
        ],
        "group": "test"
    }

Finally, for debugger support, add the following launch configuration to .vscode/launch.json:

    {
        "name": "LLDB Attach to Python",
        "type": "lldb",
        "request": "attach",
        "program": "${command:python.interpreterPath}",
        // Use `import os; os.getpid()` to get the process id.
        "pid": "${command:pickProcess}",
        "args": [],
        "stopAtEntry": false,
        "cwd": "${workspaceFolder}",
        "environment": [],
        "externalConsole": true,
        "MIMode": "lldb",
        "setupCommands": [
            {
                "description": "Enable pretty-printing for gdb",
                "text": "-enable-pretty-printing",
                "ignoreFailures": true
            }
        ]
    }
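
Since attaching requires the process id, it can be convenient to have the Python script print its own PID and pause until the debugger is attached. The following is one possible sketch; `attach_message` and `wait_for_debugger` are hypothetical helper names, not part of PyArrow or VS Code.

```python
import os


def attach_message(pid=None):
    """Build the message telling you which process to pick in the debugger."""
    pid = os.getpid() if pid is None else pid
    return f"Attach the debugger to PID {pid}, then press Enter to continue."


def wait_for_debugger(read_line=input):
    """Print this process's PID and block until the user confirms the attach."""
    print(attach_message())
    # Blocks here, so breakpoints can be set before the code of interest runs.
    read_line()
    return os.getpid()
```

Calling `wait_for_debugger()` near the top of a script, before the PyArrow call you want to step into, gives you time to select the printed PID in the process picker.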

How to Use the Environment

Within VS Code

You can build the Python module with:

   CMD + SHIFT + B > Build Python

Run all tests:

   CMD + P, type “task”, then select Test Python

Format and lint:

   CMD + P, type “task”, then select Lint Python

To use the debugger, see part 4.

From CLI

To build PyArrow:

pushd python
python setup.py build_ext --inplace --build-type=debug
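
After the build finishes, a short import check can confirm that the optional components you enabled were actually built. This is a rough sketch under the assumption that the Parquet and Dataset components are switched on, as in the .envrc shown earlier; the `importable` helper and the `EXPECTED` list are illustrative names.

```python
import importlib


def importable(module_names):
    """Map each module name to True if it imports cleanly, False otherwise."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # Covers both a missing package and a component that was not built.
            results[name] = False
    return results


# Assumed to match the components set to ON in .envrc; adjust to your setup.
EXPECTED = ["pyarrow", "pyarrow.parquet", "pyarrow.dataset"]

if __name__ == "__main__":
    for name, ok in importable(EXPECTED).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Because the build is in place, run this from inside the python directory so that the freshly built pyarrow package is the one being imported.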

To test:

pytest pyarrow

To format and lint:

archery lint --python --fix

Next Steps

Now that you’ve built the C++ Arrow library and PyArrow, you can move on to the R libraries: