Using Apache Airflow for SAP S/4HANA Change Data Capture

Written By: Robin - 30 November 2023

Handling complex IT system landscapes and the variety of data they produce has long been a significant challenge for businesses and data teams. The good news is that state-of-the-art technologies can help your business process and make sense of data originating from all kinds of source systems. SAP products like SAP BW/4HANA or SAP S/4HANA typically store a lot of valuable information about all kinds of business processes, which makes extracting and evaluating that data a rich source of business insights. In a previous blog article, we described how you can use Apache Airflow, the leading open source technology for workflow orchestration, to continuously extract data from your SAP BW data warehouse. Here we want to expand on that and show how a very similar approach can be used to collect the ERP data contained in your SAP S/4HANA system in an external database, using the OData adapter capabilities of SAP S/4HANA.

As with almost any use case for Apache Airflow, its code-first approach allows for extensive customization. Even if this example doesn’t cover your specific use case, we hope to provide building blocks you can expand on for your very own tailor-made implementation.

Preparing the data source

SAP S/4HANA offers a wide range of data objects that can be exposed using the OData protocol, including master data entities, transactional data, custom tables, views, standard application modules like the Production Planning module, and CDS (Core Data Services) views like “I_PRODUCT”, “I_CUSTOMER” or “I_COMPANYCODE”. For our example use case, we want to regularly extract the data and its changes from an SAP S/4HANA CDS view. In other words, we want to implement a Change Data Capture (CDC) mechanism using ODP-based data extraction via OData.

We start by creating a new project for our OData service and giving it a new technical name.

[Image: creating the OData service project in SAP]

Afterwards, we prepare the OData access for this specific extractor. After choosing the data model defined by our selected CDS view, we need to prepare the extractor in the SAP S/4HANA interface.

[Image: extractor setup, step 2]

In the final step of creating the extractor, we tick all the top-level checkboxes to select what will be accessible through the OData service.

[Image: extractor setup, step 3]



[Image: adding the service in SAP]

This header configuration is essential: it allows requests to return only the actual changes to the data source instead of loading the whole content each time we query data from the OData service.
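To illustrate what such a change-tracking request involves on the client side, here is a minimal Python sketch. It rests on assumptions that are not taken from this article: that the service honours the `Prefer: odata.track-changes` request header, and that it answers in OData V2 JSON with the records under `d.results` and a follow-up link under `d.__delta`. The function name, the payload and the URL are hypothetical, so check them against the responses of your own service.

```python
from typing import List, Optional, Tuple

# Assumption: change-tracking preference header for ODP-based OData services;
# verify against your own service configuration.
TRACK_CHANGES_HEADERS = {
    "Prefer": "odata.track-changes",
    "Accept": "application/json",
}


def split_delta_response(payload: dict) -> Tuple[List[dict], Optional[str]]:
    """Return (records, delta_link) from an OData V2 JSON payload.

    Assumes the layout {"d": {"results": [...], "__delta": "..."}}.
    """
    body = payload.get("d", {})
    records = body.get("results", [])
    # The link is missing if the service did not apply change tracking.
    delta_link = body.get("__delta")
    return records, delta_link


# Hypothetical payload, shaped like the response to an initial load.
sample = {
    "d": {
        "results": [{"Product": "P-100"}, {"Product": "P-200"}],
        "__delta": "https://sap.example.com/odata/ZDEMO_SRV/Products?!deltatoken='D1'",
    }
}

records, delta_link = split_delta_response(sample)
print(len(records), delta_link)
```

The idea is to send `TRACK_CHANGES_HEADERS` with the first request, persist the returned delta link, and call that link on the next run so that only changed records come back.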

[Image: HTTP header configuration]

The OData service is now ready and can be called via its URL endpoint. We suggest that you create a connection object within Airflow to store the base URL of your S/4HANA instance as well as the credentials to access the OData service.
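One possible way to define such a connection without clicking through the Airflow UI is an environment variable. The sketch below assumes Airflow 2.3 or later, which accepts JSON-serialized connections in variables named `AIRFLOW_CONN_<CONN_ID>`. The helper name, host and credentials are placeholders, and in a real deployment the password would come from a secrets backend rather than source code.

```python
import json
import os


def sap_connection_env(conn_id: str, host: str, login: str, password: str) -> dict:
    """Return {env var name: JSON value} describing an Airflow HTTP connection."""
    value = {
        "conn_type": "http",
        "schema": "https",  # the HTTP hook takes the protocol from the schema field
        "host": host,
        "login": login,
        "password": password,
    }
    return {f"AIRFLOW_CONN_{conn_id.upper()}": json.dumps(value)}


# Hypothetical host and credentials, for illustration only.
env = sap_connection_env("sap_conn", "s4hana.example.com", "ODATA_USER", "change-me")
os.environ.update(env)
print(list(env))
```

With this variable set in the scheduler and worker environment, tasks can refer to the connection by the ID `sap_conn`.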

When it comes to the Airflow part for interacting with the OData service, you can use the DAG we published in the previous article regarding SAP OData extraction. Just switch out the SAP connection object defined as `sap_conn` in the source code and you should be good to go.

[Image: GitLab code snippet]
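Independently of the published DAG, the step that writes extracted changes to the external database could look roughly like the following sketch. It is not the code from the previous article. It assumes that each delta record carries a change indicator in a field named `ODQ_CHANGEMODE` (with `D` marking deletions), uses an in-memory SQLite table as a stand-in for the target database, and works with illustrative field names `Product` and `ProductType`.

```python
import sqlite3


def apply_changes(conn: sqlite3.Connection, records: list) -> None:
    """Insert or overwrite created/updated rows and delete removed ones."""
    for rec in records:
        if rec.get("ODQ_CHANGEMODE") == "D":  # assumed deletion marker
            conn.execute("DELETE FROM product WHERE product = ?", (rec["Product"],))
        else:
            # INSERT OR REPLACE overwrites the row with the same primary key.
            conn.execute(
                "INSERT OR REPLACE INTO product (product, product_type) VALUES (?, ?)",
                (rec["Product"], rec["ProductType"]),
            )
    conn.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (product TEXT PRIMARY KEY, product_type TEXT)")

# Hypothetical initial load followed by one delta run.
apply_changes(db, [
    {"Product": "P-100", "ProductType": "FERT", "ODQ_CHANGEMODE": "C"},
    {"Product": "P-200", "ProductType": "ROH", "ODQ_CHANGEMODE": "C"},
])
apply_changes(db, [
    {"Product": "P-100", "ProductType": "HALB", "ODQ_CHANGEMODE": "U"},
    {"Product": "P-200", "ProductType": "ROH", "ODQ_CHANGEMODE": "D"},
])

rows = db.execute("SELECT product, product_type FROM product ORDER BY product").fetchall()
print(rows)
```

In an Airflow task, a function like this would run after each OData request, so the target table always reflects the latest state reported by the source.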

The benefits of change data capture for ERP data

While an ERP system like SAP S/4HANA might not offer the sheer volume of data contained in a data warehouse like SAP BW, the data extracted from it can still offer significant benefits to your business. Combined with the flexible scheduling options of Apache Airflow, the real-time availability of ERP data lays the groundwork for fast operational decision-making and quick responses to market changes. Another benefit of getting this data straight from its source is that it goes through fewer transformation steps, which helps preserve the integrity of the queried data.

SAP S/4HANA change data capture using Apache Airflow - Our Conclusion

The flexibility of both the OData protocol as implemented in SAP platforms and Apache Airflow as an orchestration service for business intelligence platforms allows you to generate meaningful, real-time insights into your data and make informed decisions. The ERP data contained in SAP S/4HANA lends itself especially well to efficient operational decision-making while maintaining a high degree of data accuracy. We at NextLytics will be happy to advise you on the best solution - for this specific use case or other challenges you might face.
