This is part of a series on setting up an Arrow development environment. If you haven’t gone through part 1 on setting up C++, start there.
Installation
First, install archery:
pip install -e "dev/archery[lint]"
Then, add the following to your .envrc file:
# Number of threads to use while building PyArrow C++ and Cython
export PYARROW_PARALLEL=8
# Enable/disable optional PyArrow components
export PYARROW_WITH_PARQUET=ON
export PYARROW_WITH_DATASET=ON
export PYARROW_WITH_GANDIVA=OFF
export PYARROW_WITH_S3=ON
export PYARROW_WITH_HDFS=OFF
export PYARROW_WITH_GCS=OFF
export PYARROW_WITH_PLASMA=OFF
export PYARROW_WITH_FLIGHT=OFF
export PYARROW_WITH_PARQUET_ENCRYPTION=OFF
You will need to run direnv allow and reopen VS Code to propagate those changes. Close VS Code, then run:
direnv allow
code .
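Once VS Code is back up, it can be worth confirming that direnv actually exported the variables into the integrated terminal before you start a long build. The following is a minimal sketch that only reads the environment. The helper name `pyarrow_build_flags` is hypothetical, and the flag list is simply copied from the .envrc example above.

```python
import os

# Flag names copied from the .envrc example; extend this if you add more.
PYARROW_FLAGS = [
    "PYARROW_WITH_PARQUET",
    "PYARROW_WITH_DATASET",
    "PYARROW_WITH_GANDIVA",
    "PYARROW_WITH_S3",
    "PYARROW_WITH_HDFS",
    "PYARROW_WITH_GCS",
    "PYARROW_WITH_PLASMA",
    "PYARROW_WITH_FLIGHT",
    "PYARROW_WITH_PARQUET_ENCRYPTION",
]


def pyarrow_build_flags(env=None):
    """Return {flag: value} for each PYARROW_WITH_* flag (None if unset)."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in PYARROW_FLAGS}


if __name__ == "__main__":
    print("PYARROW_PARALLEL =", os.environ.get("PYARROW_PARALLEL"))
    for name, value in pyarrow_build_flags().items():
        # An unset value usually means direnv did not load the .envrc here.
        print(f"{name} = {value if value is not None else '<unset>'}")
```

If every flag prints as `<unset>`, direnv most likely has not been allowed for this directory, or VS Code was started from a shell that never loaded it.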
Configuring VS Code
To configure VS Code you’ll need to update .vscode/tasks.json and .vscode/launch.json.
Add the following tasks to the "tasks" array in .vscode/tasks.json:
{
"type": "process",
"label": "Build Python",
"command": "python",
"args": [
"setup.py",
"build_ext",
"--inplace",
"--build-type=debug"
],
"group": "build",
"options": {"cwd": "${workspaceFolder}/python"}
},
{
"type": "process",
"label": "Test Python",
"command": "pytest",
"args": ["pyarrow"],
"group": "test",
"options": {"cwd": "${workspaceFolder}/python/"}
},
{
"type": "process",
"label": "Lint Python",
"command": "archery",
"args": [
"lint",
"--python",
"--fix"
],
"group": "test"
}
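The task objects above are entries, not a complete file: VS Code expects them inside a top-level "tasks" array next to the schema "version" (2.0.0). If you don't have a .vscode/tasks.json yet, this sketch writes one with the three tasks. The helper `tasks_document` is hypothetical, and the script refuses to overwrite an existing file so you can merge by hand instead.

```python
import json
from pathlib import Path

# The three task objects from above, expressed as Python dicts.
TASKS = [
    {
        "type": "process",
        "label": "Build Python",
        "command": "python",
        "args": ["setup.py", "build_ext", "--inplace", "--build-type=debug"],
        "group": "build",
        "options": {"cwd": "${workspaceFolder}/python"},
    },
    {
        "type": "process",
        "label": "Test Python",
        "command": "pytest",
        "args": ["pyarrow"],
        "group": "test",
        "options": {"cwd": "${workspaceFolder}/python/"},
    },
    {
        "type": "process",
        "label": "Lint Python",
        "command": "archery",
        "args": ["lint", "--python", "--fix"],
        "group": "test",
    },
]


def tasks_document(tasks):
    """Wrap task objects in the top-level structure tasks.json expects."""
    return {"version": "2.0.0", "tasks": tasks}


if __name__ == "__main__":
    path = Path(".vscode/tasks.json")
    if path.exists():
        raise SystemExit(f"{path} already exists; merge the tasks by hand")
    path.parent.mkdir(exist_ok=True)
    path.write_text(json.dumps(tasks_document(TASKS), indent=2) + "\n")
```

Run it from the repository root so the file lands in the workspace's .vscode folder.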
Finally, for debugger support, add the following launch configuration to the "configurations" array in .vscode/launch.json:
{
"name": "LLDB Attach to Python",
"type": "lldb",
"request": "attach",
"program": "${command:python.interpreterPath}",
// Use `import os; os.getpid()` to get the process id.
"pid": "${command:pickProcess}",
"args": [],
"stopAtEntry": false,
"cwd": "${workspaceFolder}",
"environment": [],
"externalConsole": true,
"MIMode": "lldb",
"setupCommands": [
{
"description": "Enable pretty-printing for gdb",
"text": "-enable-pretty-printing",
"ignoreFailures": true
}
]
}
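The comment in the configuration suggests using `import os; os.getpid()` to find the process to attach to. A small variation on that idea is to print the PID and then pause, so you have time to pick the process and set C++ breakpoints before the interesting code runs. This is a minimal sketch; `wait_for_debugger` is a hypothetical helper, not part of PyArrow.

```python
import os


def wait_for_debugger(prompt=input):
    """Print this interpreter's PID, then block until the user confirms."""
    pid = os.getpid()
    print(f"Attach 'LLDB Attach to Python' to PID {pid}")
    prompt("Press Enter once the debugger is attached... ")
    return pid


if __name__ == "__main__":
    wait_for_debugger()
    # Everything below runs with LLDB attached, so breakpoints in the
    # Arrow C++ code are hit from here on.
    import pyarrow as pa

    print(pa.array([1, 2, None]).null_count)
```

When the "pick process" list appears in VS Code, look for the PID the script printed.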
How to Use the Environment
Within VS Code
You can build the Python module with:
CMD + SHIFT + B > Build Python
Run all tests:
CMD + P, type “task”, then select Test Python
Format and lint:
CMD + P, type “task”, then select Lint Python
To use the debugger, see part 4.
From the CLI
To build PyArrow:
pushd python
python setup.py build_ext --inplace --build-type=debug
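After the build finishes, you can check which optional components were actually compiled. This sketch asks Python whether each component's extension module can be found. The flag-to-module mapping is an assumption based on the .pyx file names in python/pyarrow, so verify it against your checkout. Run it from the python/ directory (where pushd left you) so the in-place build is the one that gets imported.

```python
import importlib.util

# Assumed mapping from .envrc flags to the compiled extension modules they
# produce; adjust if the names differ in your version of PyArrow.
COMPONENT_MODULES = {
    "PYARROW_WITH_PARQUET": "pyarrow._parquet",
    "PYARROW_WITH_DATASET": "pyarrow._dataset",
    "PYARROW_WITH_S3": "pyarrow._s3fs",
    "PYARROW_WITH_FLIGHT": "pyarrow._flight",
    "PYARROW_WITH_GANDIVA": "pyarrow.gandiva",
}


def built_components(modules=COMPONENT_MODULES):
    """Return {flag: bool} indicating whether each module can be found."""
    result = {}
    for flag, module in modules.items():
        try:
            result[flag] = importlib.util.find_spec(module) is not None
        except ImportError:
            # Raised when the parent package (pyarrow) cannot be imported.
            result[flag] = False
    return result


if __name__ == "__main__":
    import pyarrow

    print("pyarrow", pyarrow.__version__, "from", pyarrow.__file__)
    for flag, ok in built_components().items():
        print(f"{flag}: {'built' if ok else 'missing'}")
```

If `pyarrow.__file__` points outside your checkout, an installed wheel is shadowing the in-place build.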
To test:
pytest pyarrow
To format and lint:
archery lint --python --fix
Next Steps
Now that you’ve built the C++ Arrow library and PyArrow, you can move on to the R libraries: