A few weeks ago we wrote about SAP’s Virtual Data Model and how well it fits together with SAP Datasphere. Today we want to dive deeper into a particularly interesting aspect we briefly mentioned there: Simplifying Data Integration with Change Data Capture.
What is Change Data Capture (CDC) in CDS Views and why is it important?
An extraction-enabled S/4HANA CDS View allows the corresponding data to be exported by SAP Datasphere (among others) to satisfy your reporting needs. With large and numerous data sets, however, reloading the full data set (full load) becomes inefficient and time-consuming.
For S/4HANA, SAP has provided a sophisticated mechanism allowing efficient data replication out of the box, without much manual development. This mechanism is called Change Data Capture. It detects and captures any changes in the source system (S/4HANA), which can in turn be picked up periodically by consuming systems.
When SAP Datasphere with Replication Flows acts as such a consuming system, only the changed data is transferred (delta load), which makes load times very fast. As a side effect, more frequent loading becomes possible, and near real-time reporting becomes a feasible approach.
One example of an SAP-delivered CDS View that already includes CDC out of the box is “C_SalesDocumentItemDEX_1”, which exposes sales documents such as orders for consumption. This CDS View can be extracted by Datasphere without any manual changes or development effort on the S/4HANA side.
But not all CDS Views that SAP predelivers are created in the same way. For example, the Address CDS View (“I_Address”), which contains additional address information, lacks the required CDC definition. By default, the address information can therefore only be extracted by a scheduled full load. To avoid that, we will showcase how the CDC definition can be added manually with a few simple S/4HANA development steps.
Everything shown below is part of the extension framework for CDS Views that SAP has envisioned and outlined. It is the supported and intended approach for extending this functionality.
Scenario
First we compare the definitions of both CDS Views. In the following screenshots we have highlighted the CDC definition part, which is present in the sales document View but not in the address View. Our goal is to build a similar CDC definition for I_Address. Since SAP doesn’t allow customer changes to pre-delivered, SAP-managed CDS Views, we will create a new (customer-managed) CDS View that consumes the existing I_Address. We will then enhance that new CDS View with specific annotations for the CDC definition.
Pillars of a Custom View
When creating a new view such as Z_I_Address, there are two areas you should focus on:
- Annotations
This is where you define all the necessary “attributes”, labels or semantics of your View. This is also where the CDC definition resides. (Annotations are highlighted in blue - see screenshot for reference)
- View Definition
This is where you select fields, filter or transform data. For our goal we want to SELECT every field, since the purpose of the View is only to add CDC, not to exclude any existing field.
Note: It is important to first SELECT all the key fields from the underlying table. Otherwise the CDC definition you are about to build will fail.
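The two areas described above can be sketched as a minimal wrapper view. This is an illustration only, not the exact view from the screenshots: the element names (AddressID, CityName, PostalCode, StreetName, Country), the label text and the authorization setting are assumptions and need to be checked against I_Address in your own system. The sketch uses a view entity; depending on your release you may instead need a classic “define view” with an SQL view name.

```cds
// Area 1: annotations (labels, semantics; the CDC definition would also go here)
@AccessControl.authorizationCheck: #NOT_REQUIRED   // assumption: adjust to your authorization concept
@EndUserText.label: 'Address with CDC (custom wrapper)'
// Area 2: view definition
define view entity Z_I_Address
  as select from I_Address
{
      // Key fields come first; the CDC mapping later refers to them by these names
  key AddressID,

      // Non-key fields: list every remaining element of I_Address here so that
      // nothing is lost compared to the standard view (names below are assumed)
      CityName,
      PostalCode,
      StreetName,
      Country
}
```

The wrapper deliberately contains no filters or calculations, because its only job is to carry the additional annotations.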
Building the CDC definition
The key component of the CDC definition is the mapping of the tables and the necessary filters. To find out which tables are relevant for CDC (and hence should be mapped here), you can simply navigate through the source of each underlying CDS View until you arrive at the base level - the table(s). This can be more or less complicated depending on the complexity of the CDS View, but most of them will be a simple 1:1 scenario.
In our case the core table is “ADRC” - that’s why we have set its “role” to “#MAIN”. The “tableElement” and “viewElement” annotations should contain, respectively, the names of the key fields of the table and their aliases (names) in the current View definition.
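As a rough sketch, a mapping for this simple 1:1 case could look like the following annotation block, placed above the view definition. The view element name AddressID is an assumption and must match the alias actually used in your view; ADDRNUMBER is the address number key field of ADRC. If your view exposes further key fields of ADRC, they would be added to both lists in the same order.

```cds
@Analytics.dataExtraction: {
  enabled: true,                       // makes the view available for extraction
  delta.changeDataCapture: {
    mapping: [
      {
        table: 'ADRC',                 // base table found by drilling down from I_Address
        role: #MAIN,                   // the table that drives the delta
        viewElement:  ['AddressID'],   // assumed alias of the key field in the view
        tableElement: ['ADDRNUMBER']   // corresponding key field in ADRC
      }
    ]
  }
}
```

The two lists are matched by position, so the n-th entry in “viewElement” has to correspond to the n-th entry in “tableElement”.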
If your CDS View is a more complex case, the CDC definition can account for that complexity. We have shown in the screenshot how a “filter” annotation and a “role”: “#LEFT_OUTER_TO_ONE_JOIN” annotation can be used. Generally, all of these additional properties are optional. Ultimately they govern how your CDC mechanism will function.
If you do not define a filter here, your CDC mechanism will trigger for every change in the table.
With a filter, only changes that match that filter will trigger the CDC mechanism, and any other changes will be ignored.
Similarly, if you do not define a second table (“ADCP”), then any changes to the data in that table will not trigger a corresponding change for the consumer (Datasphere), regardless of any joins you might have in the View definition outside the annotations. The CDC mechanism will then be solely based on changes in our main table “ADRC”. All of these scenarios could match the desired loading behavior and as such depend on your requirements in each case.
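To make these options concrete, here is a hedged sketch of an extended mapping with a filter on the main table and a second table. The filter field and value (ADDR_GROUP equal to 'BP'), the view element PersonID and the choice of join fields are assumptions for illustration only; they are not taken from the screenshots and have to be aligned with the WHERE clause and joins of your actual view.

```cds
@Analytics.dataExtraction.delta.changeDataCapture.mapping: [
  {
    table: 'ADRC',
    role: #MAIN,
    viewElement:  ['AddressID'],
    tableElement: ['ADDRNUMBER'],
    // Only changes to rows matching this condition trigger a delta record.
    // Assumption: the view itself restricts to the same rows.
    filter: [ { tableElement: 'ADDR_GROUP', operator: #EQ, value: 'BP' } ]
  },
  {
    table: 'ADCP',
    role: #LEFT_OUTER_TO_ONE_JOIN,     // changes in ADCP are now tracked as well
    viewElement:  ['AddressID', 'PersonID'],      // PersonID is a hypothetical alias
    tableElement: ['ADDRNUMBER', 'PERSNUMBER']
  }
]
```

Removing the second entry gives the behavior described above, where only changes in ADRC reach the consumer.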
Note: To expose your new CDS View and consume it, e.g. via Datasphere, you have to set the “dataExtraction” annotation to “enabled: true”.
CDC Everywhere?
Keep in mind that every CDC-enabled CDS View you replicate via Replication Flows will consume additional resources on both the source (S/4HANA) and the target (Datasphere) side.
On the source side, database triggers are set on the corresponding tables, which makes any related write operation take longer.
Additionally, the Replication Flow periodically checks the source to pull any new changes, which requires a process in both the target and the source.
This strain can add up quickly when many objects are polled at a high frequency.
While the CDC mechanism is really useful, there are scenarios where the benefit is minimal. For instance, it is possible to apply CDC to text Views. However, such a dataset is usually quite small and rarely changes, so a full load is more than enough.
Another aspect is the stability contract of the involved CDS Views. It’s recommended to use at least C1-released Views (regardless of CDC or full replication). Otherwise there is no guarantee that the structure of the View will stay unchanged in future S/4HANA releases, and a structural change can, at minimum, break your replication. Sometimes using unreleased CDS Views cannot be avoided, but be aware of the risks.
Delta mechanism (CDC) on a CDS View: Our Conclusion
The delta mechanism via Change Data Capture (CDC) is a powerful tool for efficient data replication from S/4HANA to SAP Datasphere & Business Data Cloud as a whole.
While SAP provides many pre-delivered CDS Views for this purpose, we have showcased part of the SAP recommended extension approach for adding the same functionality manually to CDS Views that lack the CDC definition.
This process, however, consumes additional resources on both the source and the target system. Therefore, it is important to consider beforehand, for each scenario, whether a delta load provides a significant enough benefit over a full load.
Do you have questions about SAP Datasphere? Are you trying to build up the necessary know-how in your department or do you need support with a specific issue?
Please do not hesitate to contact us. We look forward to exchanging ideas with you!
FAQ
What is CDC and why is it important?
CDC (Change Data Capture) transfers only the changes instead of the entire dataset. This reduces load times, saves resources, and enables near real-time reporting in SAP Datasphere.
Do all SAP-delivered CDS Views support CDC out of the box?
No. Some CDS Views, such as the one for sales documents, already come with CDC enabled. Others, like the I_Address View, do not and require a custom Z_* wrapper to enable CDC.
How can CDC be added to a CDS View that lacks it?
You can create a custom Z_* View on top of the standard View and supply the necessary CDC annotations there.
Does CDC have an impact on system performance?
Yes. CDC creates database triggers and polling overhead. This can slow down write operations on source tables. It’s best to enable CDC only where real-time updates provide tangible business value or where the total data volume is too large for a periodic full load.
When is a full load the better choice?
For datasets with very few changes (e.g., static text tables), the overhead of CDC isn’t justified. In those cases, periodic full loads are simpler and more efficient.
