Recently, SAP and Databricks announced a strategic partnership that delivers Databricks as a native component of the SAP Business Data Cloud. The seamless communication between the two systems enables data practitioners to develop data science solutions, leverage machine learning capabilities such as experiment tracking and versioning of deployed models, and of course benefit from Delta Lake, the powerful open source storage framework. Bringing Databricks workspaces into the SAP ecosystem can be a shortcut for two topics that have been notoriously difficult before: low-cost but readily accessible mass storage and native tooling to ingest third-party data. This article shows one example of how you can now achieve the latter without any additional licensed integration tools or products.
In a recent article, we provided a bird's-eye view of the SAP Databricks integration and answered some key questions regarding its strategic significance, the architecture, and some potential limitations. Now it's time for a hands-on example. We will develop a simple data application that pulls data from the ServiceNow API as a real-world example and ingests it into a historized table in SAP Datasphere. The process will be scheduled to run at a predetermined interval, leveraging the scheduling capabilities of Databricks, where monitoring and alerts are available out of the box. Let's start with the architectural overview of the application.
Architecture
The application consists of a single Python notebook that orchestrates the data flow. Utilizing notebooks allows for cooperation between engineers and data analysts, and offers a quick and easy way to schedule the entire process. The diagram below outlines the flow of the data, and how each component relates to the others.
The source system is ServiceNow, a cloud-based platform that helps organizations automate and manage digital workflows across IT, HR, customer service, and other business functions. We will query the REST API and retrieve customer support tickets incrementally; only the recently updated entries are returned on each run, which reduces the size of the payload significantly. The destination will be a historized table in Databricks or SAP Datasphere. A historized table tracks changes in dimensional data over time by recording the period during which each record version is considered active. When a change is detected in a record, a new entry is added and the previous version is retired. A typical SCD type 2 historized table consists of two metadata columns (valid_from and valid_to timestamps), natural and surrogate keys, and multiple other attributes that are subject to change.
Such historized tables can be easily created and maintained with dlt (data load tool), an open source Python library that offers lightweight APIs for loading data from multiple sources, storing it in popular database formats or cloud storage, and tracking changes in datasets in an intuitive way. It is just a matter of defining our source and target (the ServiceNow REST API and SAP Datasphere in this case), providing some configuration, and letting the pipeline run. dlt takes care of extracting data from the source incrementally and ingesting it into the target system.
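To illustrate the idea, here is a minimal, self-contained sketch of dlt's SCD2 handling against a local DuckDB database; the table, column names, and sample records are purely illustrative and not part of the actual application.

```python
import dlt


@dlt.resource(
    name="customers",
    # "merge" with the "scd2" strategy keeps history: when a record changes,
    # the old version is retired (its valid-to timestamp is set) and a new
    # version with the current values is inserted.
    write_disposition={"disposition": "merge", "strategy": "scd2"},
)
def customers():
    yield [
        {"customer_id": 1, "name": "ACME", "segment": "enterprise"},
        {"customer_id": 2, "name": "Initech", "segment": "smb"},
    ]


pipeline = dlt.pipeline(pipeline_name="scd2_demo", destination="duckdb", dataset_name="crm")
print(pipeline.run(customers()))
# dlt adds validity columns (by default _dlt_valid_from / _dlt_valid_to)
# that mark the period in which each record version was active.
```

Re-running the pipeline with an updated extract closes the outdated versions and inserts the new ones, so the table keeps its full history.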
Implementation
The application consists of three simple methods that define the source configuration, connect to the destination (SAP Datasphere), and run the dlt pipeline. Sensitive information like credentials can be stored securely in a Databricks secret scope and retrieved within the notebook.
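The article does not reproduce the full notebook, but the following hedged sketch shows how the three building blocks could look. The secret scope and key names, the ServiceNow instance and table, and in particular the destination wiring (dlt's generic sqlalchemy destination with the sqlalchemy-hana dialect as one possible way to reach an Open SQL schema in SAP Datasphere) are assumptions, not the article's exact implementation.

```python
import dlt
import requests


@dlt.resource(
    name="incident",
    # SCD2 merge keeps the full change history of every ticket. Depending on your
    # dlt version, additional key/merge settings may be required for partial loads.
    write_disposition={"disposition": "merge", "strategy": "scd2"},
)
def servicenow_incidents(
    updated_at=dlt.sources.incremental("sys_updated_on", initial_value="1970-01-01 00:00:00"),
):
    """Source configuration: pull only records updated since the last successful run."""
    # dbutils is available inside Databricks notebooks; scope and key names are placeholders.
    instance = dbutils.secrets.get(scope="servicenow", key="instance")
    auth = (
        dbutils.secrets.get(scope="servicenow", key="user"),
        dbutils.secrets.get(scope="servicenow", key="password"),
    )
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    offset, limit = 0, 1000
    while True:
        response = requests.get(
            url,
            auth=auth,
            params={
                "sysparm_query": f"sys_updated_on>={updated_at.last_value}^ORDERBYsys_updated_on",
                "sysparm_limit": limit,
                "sysparm_offset": offset,
            },
            timeout=60,
        )
        response.raise_for_status()
        rows = response.json()["result"]
        if not rows:
            break
        yield rows
        offset += limit


def datasphere_destination():
    """Destination: SAP Datasphere reached via SQLAlchemy and the hdbcli driver (assumed setup)."""
    host = dbutils.secrets.get(scope="datasphere", key="host")
    user = dbutils.secrets.get(scope="datasphere", key="user")
    password = dbutils.secrets.get(scope="datasphere", key="password")
    return dlt.destinations.sqlalchemy(f"hana+hdbcli://{user}:{password}@{host}:443")


def run_pipeline():
    """Wire source and destination together and execute the incremental load."""
    pipeline = dlt.pipeline(
        pipeline_name="servicenow_ingest",
        destination=datasphere_destination(),
        dataset_name="servicenow",
    )
    print(pipeline.run(servicenow_incidents()))


run_pipeline()
```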
Scheduling the process is straightforward and can be achieved through the SAP Databricks UI. It is possible to define complex cron expressions for the interval, receive notification emails on success and failure, and even provide parameters for the execution.
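Scheduling in the UI needs no code at all; for completeness, here is a hedged sketch of what an equivalent job definition could look like with the Databricks Python SDK, assuming a standard workspace. The notebook path, cron expression, parameters, and e-mail addresses are placeholders, and compute settings are omitted (serverless or a cluster specification would be added depending on the workspace).

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # credentials are resolved from the environment / CLI profile

w.jobs.create(
    name="servicenow_to_datasphere",
    tasks=[
        jobs.Task(
            task_key="run_pipeline",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Shared/servicenow_ingest",
                base_parameters={"full_refresh": "false"},  # optional execution parameters
            ),
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 */2 * * ?",  # every two hours
        timezone_id="Europe/Berlin",
    ),
    email_notifications=jobs.JobEmailNotifications(
        on_success=["data-team@example.com"],
        on_failure=["data-team@example.com"],
    ),
)
```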
The result will be a fully historized table with data from the ServiceNow table API. Every subsequent pipeline execution will only ingest new data and will keep track of changes and updates in existing records based on the defined composite key.
Interoperability in Practice
The presented approach is highly portable and a prime example of what interoperability between systems in an ecosystem of open API specifications and open source tools can achieve. The Python dlt module we use for ELT-style data loading comes with numerous predefined source and destination connectors and harmonizes all data transports to a common standard. Instead of extracting data from the ServiceNow API, we could easily switch to any REST API, relational database, or object storage source; Salesforce, Shopify, Google Analytics, Jira, Asana, and many others are supported data sources. On the other hand, configuring a different destination is just a minor change in the pipeline configuration and opens up various options to work with Databricks and SAP systems. Databricks Unity Catalog is a typical destination for data loads and can easily serve as a persistent, fully replayable record of the changes retrieved from the source. Keep a Delta table like this as the archive layer and build any refined downstream data objects from there.
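As a sketch of that flexibility: switching the target is essentially a different destination factory in the pipeline definition, while source, schedule, and SCD2 settings stay untouched. Catalog and schema names below are assumptions, and the Databricks connection details would come from dlt's configuration or secrets.

```python
import dlt

# Land the raw, replayable change history as Delta tables in Unity Catalog ...
archive_pipeline = dlt.pipeline(
    pipeline_name="servicenow_ingest",
    destination="databricks",
    dataset_name="servicenow_archive",
)

# ... or keep the SAP Datasphere / HANA destination from the sketch above by
# passing that destination factory instead; the rest of the code is unchanged.
```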
SAP systems can be written to directly via SAP HANA ODBC connections. This allows us to use the presented pipeline to ingest data from practically any source system into any SAP system running on a HANA database: S/4HANA, BW, or Datasphere. Better yet, we do not even need access to an SAP Databricks workspace for this to work. Any Databricks workspace on Azure, Google Cloud, or AWS can run this kind of pipeline. And if you really want to push the limits of efficiency, you do not even need Databricks: the Python code can run anywhere, be it in serverless application frameworks, on local servers, or inside Apache Airflow tasks. A beautiful example of interoperability and of the endless possibilities for tailoring automated data processing systems to your specific process requirements or technical preferences.
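To make the "runs anywhere" point concrete, here is a minimal sketch of the same pipeline function scheduled as an Apache Airflow task; the DAG id, schedule, and import path are assumptions (and outside Databricks, credentials would of course come from Airflow connections or another secret store instead of dbutils).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from servicenow_ingest import run_pipeline  # the function from the sketch above

with DAG(
    dag_id="servicenow_to_hana",
    start_date=datetime(2025, 1, 1),
    schedule="0 */2 * * *",  # every two hours (Airflow 2.4+ "schedule" argument)
    catchup=False,
) as dag:
    PythonOperator(task_id="run_pipeline", python_callable=run_pipeline)
```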
The now imminent future with SAP Databricks and its native Delta Sharing integration into the SAP Business Data Cloud is what really makes this approach compelling for us: instead of writing data through an ODBC bottleneck, SAP Databricks can store any incoming data in its native Delta table format and simply expose it to BDC without any data replication.
How to integrate ServiceNow data with SAP Databricks: Our Conclusion
This hands-on example illustrates a straightforward yet powerful use case of SAP Databricks for building a data extraction and ingestion application. By utilizing dltHub, we can easily ingest information from ServiceNow incrementally into SAP Datasphere or any other HANA database, enabling consistent and auditable change tracking. The integration not only streamlines development and scheduling through an intuitive pro-code interface, but also enhances operational reliability with built-in monitoring and alerting. While our example focused on a simple ServiceNow integration, the same approach scales to complex enterprise scenarios, making it a solid foundation for advanced analytics, machine learning, and (near) real-time data products in the SAP Business Data Cloud.
Do you have questions on this or another topic? Simply get in touch with us - we look forward to exchanging ideas with you!