The new major release of the Apache Airflow workflow management platform will soon be one year old. If you haven't considered upgrading before, now is a good time to do so. Moving to the new major release brings new features, faster workflow execution, and improved security. Nevertheless, upgrading involves effort and carries the risk of data loss and temporary system downtime.
To ensure that nothing stands in the way of a smooth changeover, this article gives you practical tips on what to bear in mind during the upgrade.
New features in Apache Airflow 2.1
To motivate the upgrade, it is worth taking a look at the new features. As an open-source tool, Apache Airflow is developed by a broad community that derives feature ideas from its own use cases. The new features therefore reflect what users currently need from workflow management. What does this mean in concrete terms for version 2.1?
- Improved Scheduler
With the major release, the scheduler component was thoroughly overhauled. Larger installations can now run multiple scheduler instances that work together in an active-active design. Thanks to the optimizations, even a single scheduler is noticeably faster.
- Expanded API
While the old API remains intact for the time being, a new REST API based on the OpenAPI specification opens up new possibilities for automation and integration between the systems in use. Variables, connections, workflows and users can be managed conveniently via HTTP requests. Access can be authorized through the authentication already used for the user interface.
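As a rough illustration of such an HTTP request, the following sketch builds a call that starts a workflow run through the stable REST API. It assumes that the basic-auth API backend is enabled in the Airflow configuration; the function names, host, DAG id and credentials are hypothetical placeholders and not part of Airflow.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, username, password, conf=None):
    """Build (but do not send) a POST request that triggers a DAG run.

    Assumption: the server accepts HTTP basic auth for API calls.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def trigger_dag(base_url, dag_id, username, password, conf=None):
    """Send the request and return the decoded JSON answer of the server."""
    request = build_trigger_request(base_url, dag_id, username, password, conf)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `trigger_dag("http://localhost:8080", "example_dag", "admin", "admin")` against a test instance would then start one run of the (hypothetical) workflow `example_dag`. Separating request construction from sending keeps the sketch easy to check without a running server.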
- User-friendly monitoring
Two new functions help you keep track of your workflows. The calendar view shows the success rate of your workflows graphically, broken down by day of the week and month.
If you want to follow a workflow live, the auto-refresh functionality comes in handy: the status of the tasks is updated automatically in the graph view, so the page does not have to be reloaded.
Of course, further functionality will be added over the lifecycle of the new major release. For example, a more flexible way to define the execution times of a workflow is currently in development. In the future, this will make schedules such as the last working day of the month possible.
Relevant changes for users
Some aspects remain the same in the new major version of Airflow. It is still possible to start a workflow via the web interface, and workflows are still defined in Python files. Typical use cases remain, and Apache Airflow best practices are still valid.
What is new, however, is the revised, modern web user interface. It is also the focus of the further development of Apache Airflow. While initially only the design was adapted, in the future Airflow will be transformed into a modern web application with an up-to-date technology stack.
Also relevant for users is the separation of Apache Airflow into a core application and provider packages. This keeps the installation more lightweight: if Airflow is installed locally on a single server, the Kubernetes package can be left out. The provider packages are also decoupled from the official Airflow releases, so bug fixes can be delivered faster. For users, the separation shows up as changed operator import paths, which have to be adjusted in new and existing workflows.
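To give an impression of what the changed operator paths look like, here is a small sketch that rewrites a few old import paths to their Airflow 2 counterparts. The mapping covers only a handful of well-known modules as they were named around Airflow 2.0 and 2.1 (later provider versions may use other names), and `MOVED_MODULES` and `rewrite_imports` are hypothetical helper names, not Airflow functions.

```python
import re

# Old (1.10) module path -> new (2.x) module path.
# Illustrative subset only; the upgrade documentation has the full list.
MOVED_MODULES = {
    "airflow.operators.bash_operator": "airflow.operators.bash",
    "airflow.operators.python_operator": "airflow.operators.python",
    "airflow.operators.postgres_operator":
        "airflow.providers.postgres.operators.postgres",
    "airflow.contrib.operators.kubernetes_pod_operator":
        "airflow.providers.cncf.kubernetes.operators.kubernetes_pod",
}


def rewrite_imports(source: str) -> str:
    """Replace known old module paths in workflow source code."""
    for old, new in MOVED_MODULES.items():
        # \b leaves longer names that merely start with `old` untouched.
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source


old_line = "from airflow.operators.bash_operator import BashOperator"
print(rewrite_imports(old_line))
# prints: from airflow.operators.bash import BashOperator
```

A plain text replacement like this is only a starting point; each adapted workflow should still be tested, since some operators also changed their parameters.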
Practical tips for the changeover
For a successful upgrade, you can refer to the detailed documentation provided by Airflow. The necessary steps and background information are summarized there. Nevertheless, you should take the following tips to heart:
- Data security comes first
Create a backup of the Airflow metadata database so that you do not lose your historical workflow runs. Keeping a copy of the configuration file also helps you carry all settings over during the upgrade and prevents unexpected behavior.
- Iterative migration via bridge release 1.10.15
With the bridge release, the new import paths of the provider packages can be tested in advance. Workflows can thus be adapted in a production-grade environment before the actual upgrade.
- Using the upgrade check
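The upgrade check provides a list of potential problems that must be fixed before the upgrade. It is not part of the core installation and must be installed separately (pip install apache-airflow-upgrade-check); it is then run with airflow upgrade_check.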
- Customizing the configuration file
After the upgrade, the database upgrade (airflow db upgrade) provides further hints regarding the configuration file. Since many settings have moved, it usually saves time to take the official configuration file from the GitHub repository, apply your own settings to it, and replace the old file with it.
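To get an overview of which settings have moved before editing the new configuration file, a comparison along the following lines can help. This is a minimal sketch using Python's standard `configparser`; `load_config` and `find_moved_options` are hypothetical helper names, and the two short configuration snippets are made-up stand-ins for your old file and the new reference file.

```python
import configparser


def load_config(text):
    # interpolation=None because Airflow configs contain literal "%" signs
    # (for example in log format strings).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def find_moved_options(old_text, new_text):
    """List options that are missing from their old section in the new file.

    Returns tuples (option, old_section, sections_in_new_file). An empty
    list of sections means the option was not found anywhere in the new file.
    """
    old, new = load_config(old_text), load_config(new_text)
    moved = []
    for section in old.sections():
        for option in old.options(section):
            if new.has_option(section, option):
                continue
            candidates = [s for s in new.sections() if new.has_option(s, option)]
            moved.append((option, section, candidates))
    return moved


# Made-up excerpts: logging settings moved from [core] to [logging] in 2.0.
OLD_CFG = "[core]\ndags_folder = /opt/dags\nbase_log_folder = /var/log/airflow\n"
NEW_CFG = "[core]\ndags_folder = /opt/dags\n[logging]\nbase_log_folder = /opt/logs\n"

print(find_moved_options(OLD_CFG, NEW_CFG))
# prints: [('base_log_folder', 'core', ['logging'])]
```

For real files, the texts could be read with `open(path).read()`. Options that were renamed rather than moved appear with an empty section list and have to be looked up in the upgrade documentation.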
We wish you every success with the upgrade to the new major version of Apache Airflow. If you need assistance with the transition or with implementing an integration scenario using the new API, feel free to contact us. Our consultants are ready to assist you with practical know-how.