3.3.3. Packaging and Distribution
Packaging and distributing Python applications ensures that software can be easily shared, installed, and run across different environments. This process involves preparing code for public or private distribution while handling dependencies and configurations in a standardized way. Effective packaging streamlines deployment, fosters collaboration, and enables scalability by providing consistent methods for delivering software.
The packaging and distribution process involves several key steps, each playing a critical role in ensuring compatibility, reliability, and ease of use for end users. These include preparing import packages, configuring metadata, building the package, and publishing it to package indexes like PyPI and Anaconda.org. Users can then download and install the package on their machines using a supported package manager, such as pip for PyPI, and conda or mamba for Anaconda.org.
The rest of this page provides a general overview of key steps in the packaging and distribution of Python packages. For information about Conda packages, see Packaging and Distribution in the Anaconda ecosystem.
Learn More
For more detailed information, see:
Python Packaging User Guide—The official Python packaging guide by PyPA.
pyOpenSci Python Package Guide—Packaging guides and tutorials for scientific packages by the pyOpenSci community.
3.3.3.1. Package Structure
The first step in packaging involves organizing the project’s code into one or several import packages. This requires developers to follow a specific directory structure and naming scheme, so that the package and its components can be correctly recognized by Python’s import system. Organizing code into reusable and logically structured packages also makes it easier for developers to maintain and extend their projects. Moreover, proper use of namespaces and directory structures is essential for clarity and functionality.
Python packages are directories containing a special __init__.py file (except for namespace packages), which marks them as importable packages. The name of this directory defines the import name of the package; it must be a valid Python identifier and should follow the naming conventions defined in PEP 8. All the source code of the package must be placed inside this directory, organized into subpackages and modules, which can be nested to any depth.
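The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended project template: the package name demo_pkg, its modules, and the functions inside them are hypothetical and exist only for this example. The script writes the package into a temporary directory and then imports it, which shows how the directory name becomes the import name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PACKAGE_NAME = "demo_pkg"


def create_demo_package(root: Path) -> Path:
    """Create demo_pkg/ with one module and one subpackage under `root`."""
    pkg = root / PACKAGE_NAME
    sub = pkg / "utils"
    sub.mkdir(parents=True)
    # __init__.py marks each directory as a regular package.
    (pkg / "__init__.py").write_text('__version__ = "0.1.0"\n')
    (pkg / "core.py").write_text(
        "def greet(name):\n    return f'Hello, {name}!'\n"
    )
    (sub / "__init__.py").write_text("")
    (sub / "text.py").write_text("def shout(s):\n    return s.upper()\n")
    return pkg


# The directory name is the import name, so it must be a valid identifier.
print(PACKAGE_NAME.isidentifier())  # True
print("demo-pkg".isidentifier())    # False, hyphens are not allowed

tmp = tempfile.TemporaryDirectory()
create_demo_package(Path(tmp.name))
sys.path.insert(0, tmp.name)  # make the parent directory importable
importlib.invalidate_caches()

core = importlib.import_module("demo_pkg.core")
text = importlib.import_module("demo_pkg.utils.text")
print(core.greet("world"))   # Hello, world!
print(text.shout("nested"))  # NESTED
```

Note that the hyphenated spelling is rejected as an import name, which is why distribution names and import names often differ in their separators.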
3.3.3.2. Configuration and Metadata
Import packages must define additional build settings and package metadata, allowing them to be built into binaries and installed on other machines.
History
Previously, building Python packages was done by distutils, the original Python packaging system, which used a setup.py file for configurations. As Setuptools started replacing distutils, it added its own setup.cfg file to enable declarative configurations in .ini format and reduce boilerplate code. Additionally, PEP 517 and PEP 518 proposed standardization of configurations in a build-system independent format, using a pyproject.toml file. First introduced in 2016, this new standard was established in 2021, after the acceptance and implementation of PEP 621 and PEP 660.
While setup.py and setup.cfg files are still valid configuration files for Setuptools, it is highly recommended to use pyproject.toml for defining static configurations and metadata in a declarative format.
All package configurations can be declaratively defined in the standardized pyproject.toml file specification. Written in TOML format, the pyproject.toml file defines three main tables for build system dependencies, project metadata, and tool configuration.
3.3.3.2.1. Build System Dependencies
The build-system table specifies the dependencies required to execute the build. This includes the packaging system (the build backend), e.g., setuptools, hatch, flit, poetry, or pdm, as well as other build tools and plugins, such as setuptools-scm for versioning. Backend-specific settings, such as the location of the source code, files to include or exclude, and how to handle different aspects of the build process, are not part of this table; they are usually placed in the backend's own sub-table under tool. The exact format and syntax of these configurations depend on the selected build backend.
3.3.3.2.2. Project Metadata
The project table specifies the project’s core metadata, including:
Name: The name of the package on the online repository, used by the package manager to uniquely identify and locate the package. The package name must follow the PyPA specifications introduced in PEP 503 and PEP 508.
Version: The version identifier of the package, used by the package manager to identify and install the correct version of the package. It must be a valid public version identifier according to the PyPA specifications first introduced in PEP 440, and must be incremented for every new release of the package, following specific rules.
Python Version: The minimum Python version required by the package, used by the package manager to ensure that the package is compatible with the user’s Python interpreter. It must be a valid version specifier according to the PyPA specifications, and must be incremented whenever the package drops support for older Python versions.
Dependencies: The required and optional dependencies of the package (i.e., other software that the package depends on to function correctly), which are automatically installed by the package manager along with the package. These must be specified in a standardized format defined in PEP 508, and must be kept up to date and synchronized with the dependencies used in the source code. Note that only packages available from PyPI are allowed.
Entry Points: The entry points of the package, such as console scripts, GUI scripts, and other callable objects, which are automatically registered by the package manager and made available to the user. These must follow the PyPA specifications, and must refer to actual objects (e.g., functions) defined in the source code.
In addition, several other metadata must be provided so that the online package index can correctly categorize and display the package, facilitating its discovery by users, and providing them with a clear overview of the project. These include:
Description: A short description of the package, which is displayed on the package index and used by the package manager to provide a brief overview of the project.
Keywords: A list of keywords describing the package, which are used by the package index to categorize the package, and help users find it through various search and filtering options.
License: The license of the package, so that users can know under which terms they can use the project.
Authors and Maintainers: Names and emails of the authors and maintainers of the package, so that users can know who is responsible for the project and how to contact them.
Project URLs: A list of URLs related to the project, such as the project’s homepage, documentation, source code, issue tracker, and changelog, which are displayed on the package index and used by the package manager to provide users with additional information and resources for the project.
Classifiers: A list of Trove classifiers as defined in PEP 301, to describe each release of the package (e.g., development status, supported Python versions and operating systems, project topics, intended audience, natural language, license, etc.). These standardized classifiers are used by the package index to categorize the package, and help users find it through various search and filtering options.
README: A README file similar to the repository’s README, containing a detailed and up-to-date description of the package, which is displayed on the package index to provide users with a clear overview of the project. As the first thing that users notice when viewing the project on the package index, it is crucial to have an informative, engaging, and visually appealing README that captures the attention of visitors and provides them with all the necessary information and resources for the project. Both PyPI and Anaconda.org support markup languages such as Markdown and reStructuredText for defining the contents of the README file. However, like GitHub, they impose several restrictions on the supported features, and perform additional post-processing and sanitization after rendering the contents to HTML. For example, PyPI uses the Readme Renderer library to render the README file, which only supports a limited subset of HTML tags and attributes. Since these do not completely overlap with the features supported by GitHub, a separate PyPI-friendly README must be provided for PyPI, to ensure that the contents are correctly rendered on the package index.
3.3.3.2.3. Tool Configuration
The tool table can contain arbitrary configurations for tools and services used in the entire project, including but not limited to build tools, linters, formatters, and testing tools. Each tool defines its own configuration structure, which can be added in a sub-table within tool.
3.3.3.3. Build
Import package(s) must be transformed into distribution packages, which are versioned archives containing the import packages and other required files and resources. Distribution packages are the files that are actually uploaded to the online package index, to be downloaded and installed by the end-users via package managers. There are two major distribution formats for Python packages:
Source Distributions or sdists are tar.gz archive files providing the source code of the package, along with the required configuration files, metadata, and resources that are needed for generating various built distributions.
Built Distributions are binary archives containing files and metadata that only need to be moved to the correct location on the target system, to be installed. Wheel is the standard binary distribution format for Python packages, designed to expedite the installation process by eliminating the need for building packages from source. A wheel archive contains all the necessary files for a Python package, including compiled binaries for specific platforms if needed. As a platform-independent standard defined in PEP 427, wheels are widely supported by tools like pip, replacing the older Egg format. Wheels can be either platform-independent or platform-specific, depending on whether the package is pure-Python or contains compiled extensions.
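Whether a wheel is platform-independent or platform-specific is visible in its file name, which PEP 427 defines as distribution-version, an optional build tag, then the Python, ABI, and platform tags. The sketch below splits such a name into its parts. The helper names are hypothetical, and the check for a pure-Python wheel (ABI tag "none" and platform tag "any") is a simplification that ignores compressed tag sets.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into the components defined in PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


def is_pure_python(wheel: WheelName) -> bool:
    """Simplified check: no ABI requirement and no platform requirement."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"


pure = parse_wheel_filename("demo_pkg-0.1.0-py3-none-any.whl")
native = parse_wheel_filename(
    "numpy-2.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(is_pure_python(pure))    # True
print(is_pure_python(native))  # False
```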
PyPA recommends always uploading a source distribution to PyPI, along with one or more built distributions for each supported platform. These can be generated using the build (for source distributions and pure-Python wheels) and cibuildwheel (for platform-specific wheels) packages provided by PyPA.
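For a pure-Python project, invoking the build package could look like the sketch below. It assumes that build has been installed separately (for example with pip install build) and that the project directory contains a pyproject.toml file. The helper functions are hypothetical wrappers around the build command line, and platform-specific wheels produced with cibuildwheel are not covered here.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path


def build_command(project_dir, outdir="dist"):
    """Return the command that builds an sdist and a wheel with `build`."""
    return [
        sys.executable, "-m", "build",
        "--sdist", "--wheel",
        "--outdir", str(outdir),
        str(project_dir),
    ]


def build_distributions(project_dir, outdir="dist"):
    """Run the build and return the generated files (hypothetical helper)."""
    if importlib.util.find_spec("build") is None:
        raise RuntimeError("the 'build' package is not installed")
    subprocess.run(build_command(project_dir, outdir), check=True)
    return sorted(Path(outdir).iterdir())


# Only show the command here; call build_distributions(".") to run it.
print(" ".join(build_command(".")))
```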
3.3.3.4. Publication
The final step in the process involves publishing the package to a repository, such as PyPI, for distribution. For PyPI, this requires developers to create an account on the platform, generate an API token for authentication, and use the twine library to upload the packages. Alternatively, trusted publishing (OpenID Connect standard) can be used in conjunction with the PyPI publish GitHub Action to publish packages directly from GitHub Actions, without the need for an API token.
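A token-based upload with twine might be scripted roughly as follows. This sketch assumes twine is installed and that an API token has already been generated; the helper names and the file names in the usage lines are hypothetical. It targets TestPyPI by default so that experiments do not reach the main index, and it only prints the command instead of running it.

```python
import os
import sys


def upload_command(files, repository="testpypi"):
    """Return the twine command for uploading sdists and wheels."""
    dists = sorted(f for f in files if f.endswith((".whl", ".tar.gz")))
    return [
        sys.executable, "-m", "twine", "upload",
        "--repository", repository,
        *dists,
    ]


def token_environment(token):
    """Return a copy of the environment carrying the API token for twine."""
    env = dict(os.environ)
    env["TWINE_USERNAME"] = "__token__"  # fixed user name for API tokens
    env["TWINE_PASSWORD"] = token
    return env


# Hypothetical file names; a real run would list the contents of dist/.
files = [
    "dist/demo_pkg-0.1.0.tar.gz",
    "dist/demo_pkg-0.1.0-py3-none-any.whl",
    "dist/notes.txt",
]
print(" ".join(upload_command(files)))
# To upload: subprocess.run(upload_command(files),
#                           env=token_environment(token), check=True)
```

Passing the token through environment variables keeps it out of the command line and out of shell history.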
Sdists and wheels can also be published to any other supporting repository. For example, they can be uploaded to pypi.anaconda.org using the Anaconda Client.