With SAP Datasphere gaining popularity in the market, we see increasing demand for guidance on governance topics such as space setup, layer architecture, and the self-service approach. SAP BW customers in particular are asking how they can design a future-proof BI landscape that takes advantage of the far greater possibilities Datasphere offers over a traditional data warehouse. After partnering with various organizations to design and implement Datasphere systems with a variety of focus areas, we have compiled our learnings into a comprehensive Datasphere Reference Architecture.
This architecture represents a superset of best practices and follows a modular design, allowing you to select the parts that fit your needs and your system landscape. In this blog, we give a high-level explanation of its individual components, followed by a final section that puts all of these ideas together. Stay tuned for more detailed breakdowns of individual topic areas in future posts. At a later stage, we also plan to share our best practices on other important Datasphere governance topics, such as developer guidelines and naming conventions, object and row level security concepts, and ETL and data integration strategy.
Key Components Explained
Sources & Consumers
The key focus here is to show how and where the core Datasphere architecture design needs to account for different types of source systems and consuming systems.
For this purpose, source systems are categorized into two major types:
- IT Managed: Typical ERP systems, possibly an on-premise BW system running side-by-side with Datasphere.
- Business Managed: File uploads or standalone small databases managed by individual departments.
Additionally, two particularly interesting source cases are covered separately:
- BW Bridge: A special case involving migration from a legacy BW system, where Datasphere objects are generated from BW data models using the entity import feature.
- 3rd Party Sources via Middleware: Covers cases where Datasphere offers no native connector for a particular source system type. A notable alternative to SAP Open Connectors that we have used successfully is a low-cost Python runtime on HANA XSA & HDI that consumes REST APIs.
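To make the middleware pattern more concrete, here is a minimal Python sketch of what such a REST consumer might do. It is an illustration under stated assumptions, not our production code: the endpoint is assumed to return JSON with a top-level `items` list, the inbound table is assumed to exist already, and the database driver is assumed to follow the Python DB-API with `?` placeholders. `sqlite3` stands in for HANA only so the sketch runs locally; on XSA/HDI you would pass a cursor from your HANA client library instead and confirm its placeholder style. All table, column, and source-system names are hypothetical. Records are mapped 1:1 and enriched only with a source system field and a load time field, matching the Inbound Layer principles of the layer architecture.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Callable, Iterable, List, Tuple
from urllib.request import urlopen


def fetch_json(url: str) -> Any:
    """Fetch a REST endpoint and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def to_inbound_rows(records: Iterable[dict], columns: List[str],
                    source_system: str, load_time: str) -> List[Tuple]:
    """Map each record 1:1 onto the given columns (missing keys become NULL)
    and append the source system and load time fields."""
    return [tuple(rec.get(col) for col in columns) + (source_system, load_time)
            for rec in records]


def load_rows(cursor, table: str, columns: List[str], rows: List[Tuple]) -> None:
    """Bulk-insert rows through a DB-API cursor that uses '?' placeholders.
    Table and column names come from trusted configuration, not user input."""
    all_columns = columns + ["SOURCE_SYSTEM", "LOAD_TIME"]
    placeholders = ", ".join("?" for _ in all_columns)
    sql = f"INSERT INTO {table} ({', '.join(all_columns)}) VALUES ({placeholders})"
    cursor.executemany(sql, rows)


def run_pipeline(fetch: Callable[[], Any], cursor, table: str,
                 columns: List[str], source_system: str) -> int:
    """Fetch one payload, transform it and load it; returns the row count."""
    payload = fetch()
    load_time = datetime.now(timezone.utc).isoformat()
    rows = to_inbound_rows(payload["items"], columns, source_system, load_time)
    load_rows(cursor, table, columns, rows)
    return len(rows)


if __name__ == "__main__":
    # Local stand-in for the HANA inbound table (hypothetical names).
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IL_ORDERS (ID, AMOUNT, SOURCE_SYSTEM, LOAD_TIME)")

    def fake_api():
        # In production this would call fetch_json("<your endpoint URL>").
        return {"items": [{"ID": 1, "AMOUNT": 9.5}, {"ID": 2}]}

    print(run_pipeline(fake_api, cur, "IL_ORDERS", ["ID", "AMOUNT"], "REST_SHOP"))
    conn.commit()
```

Passing the fetch step in as a function keeps the transformation testable without network access; a real deployment would wrap `fetch_json` with the actual endpoint URL and add authentication, paging, and error handling, which are omitted here.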
Datasphere outbound integration can be grouped into four major categories:
- ODBC: The most basic, yet reliable and universal technique. Third-party applications and clients pull Datasphere data directly from the underlying HANA database. However, ODBC does not support application-layer concepts such as semantics, associations, DACs (data access controls), or non-relational objects like Analytic Models.
- OData/Native: Native connections, such as SAP Analytics Cloud (SAC) and the Excel add-in, are first-class citizens for data consumption and support all functionality. In theory, OData is almost on par, but it lacks good out-of-the-box client support (e.g. the limitations of Power BI when consuming Datasphere via OData).
- Premium Outbound Integration: Replication flows push data to various target systems. This method comes with significant additional costs.
- Other: Any scenario not covered by the other three categories (e.g. REST) has to be implemented with the help of middleware.
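The trade-offs between the four categories can be summarized as a simple decision heuristic. The Python sketch below is illustrative only: the `Consumer` attributes and the order of the rules are our assumptions, and real decisions also depend on licensing, data volumes, and the specific client tool.

```python
from dataclasses import dataclass
from enum import Enum


class Outbound(Enum):
    ODBC = "ODBC"
    ODATA_NATIVE = "OData/Native"
    PREMIUM_OUTBOUND = "Premium Outbound Integration"
    OTHER = "Other (via middleware)"


@dataclass
class Consumer:
    name: str
    push_required: bool = False    # data must be replicated into the target system
    native_client: bool = False    # e.g. SAC or the Excel add-in
    supports_odata: bool = False
    supports_sql: bool = False     # can pull via an ODBC driver
    needs_semantics: bool = False  # semantics, associations, DACs, Analytic Models


def choose_outbound(c: Consumer) -> Outbound:
    """Pick an outbound category; an illustrative heuristic, not an SAP rule."""
    if c.push_required:
        return Outbound.PREMIUM_OUTBOUND  # replication flows, high additional cost
    if c.native_client:
        return Outbound.ODATA_NATIVE      # full functionality
    if c.needs_semantics and c.supports_odata:
        return Outbound.ODATA_NATIVE      # ODBC would drop semantics and DACs
    if c.supports_sql:
        return Outbound.ODBC              # relational data only
    if c.supports_odata:
        return Outbound.ODATA_NATIVE
    return Outbound.OTHER                 # e.g. REST, via middleware


if __name__ == "__main__":
    for consumer in [Consumer("SAC", native_client=True),
                     Consumer("SQL client", supports_sql=True),
                     Consumer("REST service")]:
        print(consumer.name, "->", choose_outbound(consumer).value)
```

ODBC is preferred over OData for SQL-capable clients that need only relational data, reflecting its reliability and universality. A client that needs semantics or DACs but can only speak SQL still lands on ODBC here; such a case deserves a closer look, since ODBC will not carry those concepts over.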
Simple Layer Architecture
For the layer setup, we follow a minimal approach with three layers:
Inbound Layer (IL):
- data ingestion from various sources
- mostly 1:1, with slight enhancements such as a source system field or a load time field
- obligatory persistence layer (local/remote tables)
Propagation Layer (PL):
- harmonization across different sources
- semantic enrichment and data access controls are applied here at the latest
- contains the bulk of expensive business logic and transformations
- optional second persistence layer (view persistence) depending on performance impact
Reporting Layer (RL):
- facilitates data consumption as the access layer for reporting clients and consumers
- limited modeling (only run-time relevant logic) and no persistence
- object type used depends on consumer system (Analytic Model or exposed View)
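The persistence rules of the three layers lend themselves to an automated check. The Python sketch below encodes them as data and reports violations for a list of model objects. The `ModelObject` structure is hypothetical (in practice the metadata could be derived from exported object definitions), and the rule that an object may read only from its own or a lower layer is our assumption about the IL → PL → RL data flow, not a Datasphere feature.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Layer(IntEnum):
    """Layers ordered by data flow: IL -> PL -> RL."""
    IL = 1  # Inbound Layer
    PL = 2  # Propagation Layer
    RL = 3  # Reporting Layer


# Persistence rule per layer, taken from the layer description:
# IL obligatory, PL optional (view persistence), RL none.
PERSISTENCE = {Layer.IL: "required", Layer.PL: "optional", Layer.RL: "forbidden"}


@dataclass
class ModelObject:
    name: str
    layer: Layer
    persisted: bool
    sources: List[str] = field(default_factory=list)  # objects it reads from


def check_layers(objects: List[ModelObject]) -> List[str]:
    """Return human-readable rule violations (empty list means compliant)."""
    layer_of = {o.name: o.layer for o in objects}
    findings = []
    for o in objects:
        rule = PERSISTENCE[o.layer]
        if rule == "required" and not o.persisted:
            findings.append(f"{o.name}: {o.layer.name} objects must be persisted")
        if rule == "forbidden" and o.persisted:
            findings.append(f"{o.name}: {o.layer.name} objects must not be persisted")
        for src in o.sources:
            # Assumption: objects read only from their own or a lower layer.
            if src in layer_of and layer_of[src] > o.layer:
                findings.append(f"{o.name}: reads from higher layer object {src}")
    return findings


if __name__ == "__main__":
    model = [
        ModelObject("IL_SALES", Layer.IL, persisted=True),
        ModelObject("PL_SALES", Layer.PL, persisted=False, sources=["IL_SALES"]),
        ModelObject("RL_SALES", Layer.RL, persisted=True, sources=["PL_SALES"]),
    ]
    for finding in check_layers(model):
        print(finding)
```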
Space Concept - “less is more”
Datasphere architects often face a decision: do we create a separate space for topic XYZ or not?
This question is nuanced, but in short, the two biggest factors to consider are:
- Is a separate set of object level authorizations required?
- Does the workload need to be managed separately?
Barring any special cases (see below), if your answer to both is “no”, we recommend keeping it simple and not creating additional spaces, as they offer little benefit and increase maintenance effort.
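The two questions translate into a simple rule. In this Python sketch, a topic gets its own space only if it needs separate object level authorizations, separate workload management, or is a special case; otherwise it is assigned to an existing space. The `Topic` structure, the `plan_spaces` helper, and all topic and space names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Topic:
    name: str
    own_object_authorizations: bool  # separate set of object level authorizations?
    own_workload_management: bool    # workload must be managed separately?
    special_case: bool = False       # e.g. a technical requirement such as BW Bridge


def plan_spaces(topics: List[Topic], default_space: str) -> Dict[str, str]:
    """Map each topic to a space: its own only if one of the criteria applies."""
    plan = {}
    for t in topics:
        if t.special_case or t.own_object_authorizations or t.own_workload_management:
            plan[t.name] = t.name
        else:
            plan[t.name] = default_space  # "less is more": reuse the existing space
    return plan


if __name__ == "__main__":
    topics = [Topic("FINANCE", True, False), Topic("LOGISTICS", False, False)]
    print(plan_spaces(topics, default_space="CENTRAL_IT"))
```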
Following this “less is more” approach, we have compiled a generic representation of a space setup that is common across our client base. It is divided into two parts:
Generic Spaces:
Layer-agnostic spaces used for authorization, monitoring, and administration.
Main Spaces:
- a Central IT Space with data models for central reporting and IT-managed data products,
- a few special spaces: the BW Bridge Inbound Space (a technical requirement for a BW Bridge system) and an ODBC Consumption Space (a necessary workaround for a variety of ODBC use cases),
- multiple Business Spaces representing different self-service maturity levels. Their integration into the Simple Layer Architecture varies in depth accordingly, with objects from the Central IT Space shared via the appropriate layer.
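The resulting space landscape can be derived from a few facts about the system. This Python sketch assumes one generic space per purpose (authorization, monitoring, administration) and uses hypothetical space IDs, since naming conventions are a topic for a later post. The special spaces are only added when their use case exists, and one Business Space is created per team.

```python
from typing import List

# Hypothetical space IDs; one generic space per purpose is an assumption.
GENERIC_SPACES = ["GEN_AUTHORIZATION", "GEN_MONITORING", "GEN_ADMINISTRATION"]


def build_space_landscape(business_teams: List[str], uses_bw_bridge: bool,
                          has_odbc_consumers: bool) -> List[str]:
    """List the spaces of a 'less is more' setup for the given landscape."""
    spaces = list(GENERIC_SPACES)           # layer-agnostic spaces
    spaces.append("CENTRAL_IT")             # central reporting, IT-managed data products
    if uses_bw_bridge:
        spaces.append("BW_BRIDGE_INBOUND")  # technical requirement for BW Bridge
    if has_odbc_consumers:
        spaces.append("ODBC_CONSUMPTION")   # workaround for ODBC use cases
    spaces.extend(f"BUS_{team.upper()}" for team in business_teams)
    return spaces


if __name__ == "__main__":
    print(build_space_landscape(["finance", "sales"], uses_bw_bridge=True,
                                has_odbc_consumers=False))
```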
Self Service Maturity Model
To implement an effective self-service strategy for your organization, you first need to analyze your user base. Shown here is a simple framework for doing just that. We can group users, e.g. by their functional business teams, and define three maturity levels into which these teams are classified. Each team then gets its own instance of these template spaces, with authorizations that correspond to its classification.
The “Central Reporting Consumers” do not actively work in the Datasphere system. They are solely interested in consuming the IT-managed central reporting in various frontend applications.
The “Self Service Modellers” want to work within Datasphere to enrich data models, build their own KPIs and combine data models in new ways.
The “Data Product Team” is responsible for providing its data models from A to Z. Only the initial integration into the Inbound Layer and the monitoring of data loads are still handled by the central IT team.
Each classification is handled differently. The higher the maturity level, the more organizational autonomy a team receives, the more technical privileges it gets in the system, and the more responsibility it carries for providing data products to the rest of the organization.
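The maturity model can be captured as a small lookup that derives each team's privileges and the split of responsibilities with central IT. The Python sketch below paraphrases the three classifications; the task labels and the `setup_team` helper are hypothetical, and your organization's classifications may differ.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Maturity(IntEnum):
    CENTRAL_REPORTING_CONSUMERS = 1
    SELF_SERVICE_MODELLERS = 2
    DATA_PRODUCT_TEAM = 3


# Central IT always keeps inbound integration and load monitoring.
CENTRAL_IT_BASELINE = frozenset({"inbound layer integration", "data load monitoring"})

# Tasks owned by the business team at each level (paraphrased from the text).
TEAM_TASKS = {
    Maturity.CENTRAL_REPORTING_CONSUMERS: frozenset({"consume central reporting"}),
    Maturity.SELF_SERVICE_MODELLERS: frozenset({
        "consume central reporting", "enrich data models",
        "build own KPIs", "combine data models"}),
    Maturity.DATA_PRODUCT_TEAM: frozenset({"build data models", "provide data products"}),
}


@dataclass(frozen=True)
class TeamSetup:
    team: str
    maturity: Maturity
    can_model: bool                  # modeling privileges in the team's space
    team_tasks: FrozenSet[str]
    central_it_tasks: FrozenSet[str]


def setup_team(team: str, maturity: Maturity) -> TeamSetup:
    """Derive privileges and the responsibility split from the maturity level."""
    central = CENTRAL_IT_BASELINE
    if maturity < Maturity.DATA_PRODUCT_TEAM:
        # Below the top level, central IT still builds the underlying data models.
        central = central | {"build data models"}
    return TeamSetup(team, maturity, maturity >= Maturity.SELF_SERVICE_MODELLERS,
                     TEAM_TASKS[maturity], central)


if __name__ == "__main__":
    for name, level in [("sales", Maturity.CENTRAL_REPORTING_CONSUMERS),
                        ("finance", Maturity.DATA_PRODUCT_TEAM)]:
        setup = setup_team(name, level)
        print(name, setup.can_model, sorted(setup.central_it_tasks))
```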
This whole process is less a technical challenge within Datasphere and more an organizational and change management effort. It could simply involve an analysis of the status quo, or it could be part of a larger data democratization strategy that targets a defined to-be level of self-service capability. It could also involve classifications and responsibilities different from those shown here, depending on your organization's user base and goals.
In any case, the Datasphere architecture has to take these circumstances into account.
Datasphere Reference Architecture - Putting it all together
By combining these core ideas of:
- a simple layer architecture,
- a “less is more” approach to a space concept,
- a self-service strategy that considers user base maturity classifications,
- and how various types of upstream and downstream systems interact with the core architecture,
we provide a modular framework to help your organization drive a future-proof Datasphere implementation that supports your overall data strategy.
Stay tuned for detailed blogs and deep dives on individual areas of the architecture. Let us know your thoughts and questions, as well as which areas you would like to hear more about.
Would you rather exchange thoughts personally? No problem, just follow the link: https://www.nextlytics.com/meetings/irvin-rodin
We look forward to exchanging ideas with you!