The IT industry is constantly redefining the term “efficient” by creating new software and hardware solutions designed to increase the performance of applications. The task is made harder by a parallel trend: the constant growth in the amount of data. This is particularly important for analytical systems, where most of the processing relates to data sets of millions of records. Therefore, data warehouses were among the first to adopt “in-memory computing” solutions, that is, solutions in which reading data from “slow” hard drives is eliminated through the increased use of main memory.
The basic expectations of “in-memory” technologies are twofold: firstly, rapid access to current information and to the analyses needed to respond adequately to changing market conditions; secondly, a lower cost of purchasing and operating the IT infrastructure required to store huge data sets.
However, from the point of view of the end user, it is the effect, namely the speed of operation, that counts most of all. These expectations keep rising, of course. Some time ago analysts were satisfied when generating a report in the data warehouse took a dozen or so seconds; now they expect it to take no more than a few seconds. The ever-increasing demands result from a change in the way people work with the tool.
Currently, working with business intelligence systems increasingly involves ad-hoc analyses: following a data drill-down path or analysis path defined “on the fly” to find answers to atypical questions. The analyst must be able to freely select a data drill-down criterion, without constraints imposed by the developer of the program or of the data warehouse. That means performance challenges: pre-prepared summaries cannot be relied on, and the analyst needs full access to the data set at all times during his work. This is where using SAP HANA can bring the largest added value to a company.
An absolute novelty
SAP High-Performance Analytic Appliance 1.0 (SAP HANA) is a new product, available to SAP AG customers since June 20th, 2011. It uses an entirely new technology – SAP In-Memory Computing. An important feature of this solution is its openness, which allows in-memory processing of data regardless of its source. A common source of data for SAP HANA 1.0 is the SAP ERP system and other systems included in the SAP Business Suite (including SAP CRM and SAP BI). Importantly, the implemented integration mechanisms also allow non-SAP applications to act as data sources.
The premiere of the solution coincided with the release of the new version of the reporting platform, SAP BI 4.0. The combination of these two products – the reporting platform with SAP HANA – constitutes a complete Business Intelligence solution, ready for analytical work on huge data sets in near real time.
Processing in the SAP memory
SAP In-Memory Computing is an innovative technology based on the SAP transactional system. Generally speaking, it moves the management of large data sets from SAP ERP, and the business analytics performed on them, into main memory. The resulting performance gain considerably accelerates access to data and delivers the results of complicated calculations in real time. The development of the in-memory technology is enabled by the parallel development of the hardware layer: the servers available on the market are increasingly powerful, and at the same time main memory has become considerably cheaper.
Integral elements of the SAP In-Memory Computing technology are new business applications which, within a few seconds, can display the information necessary for the planning, forecasting and optimisation of business processes that require large amounts of data to be processed.
In addition to the application layer, the HANA solution also includes SAP In-Memory Appliance. These are servers optimised in terms of capacity and performance, which are supplied by leading SAP hardware partners. The dedicated hardware solutions use a multi-core processor architecture (e.g. 64 processor cores per machine), 64-bit address space allowing the use of 1TB RAM, and a very fast disk system which uses the SAS/SSD drive technology or Fusion-io ioDrive storage technology. The above-mentioned disk system is required for backup and recovery operations using snapshots for the database which resides entirely in RAM.
The SAP In-Memory Database is a new-generation database and the final element of the SAP In-Memory Computing technology. It contains innovative solutions for optimising information storage and management, including columnar data storage as the primary storage method (alongside typical row-based storage), data compression, table partitioning, and the use of delta structures for inserting new data into tables.
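To make the difference concrete, below is a minimal sketch in plain Python (not HANA code, and with invented sample data) of what column-oriented storage means in practice: an aggregation reads only the single column it needs, and repetitive column values compress well even with a simple run-length encoding.

# Minimal sketch contrasting row-oriented and column-oriented storage
# of the same small sales table (invented data, illustration only).

rows = [
    # (document, region, amount)
    ("D001", "PL", 120.0),
    ("D002", "PL", 310.0),
    ("D003", "DE", 95.0),
    ("D004", "DE", 410.0),
]

# Row store: each record is kept together; summing one column still touches every record.
row_store = rows

# Column store: each column is kept as its own contiguous array.
column_store = {
    "document": [r[0] for r in rows],
    "region":   [r[1] for r in rows],
    "amount":   [r[2] for r in rows],
}

def total_amount_row_store():
    return sum(record[2] for record in row_store)   # scans whole records

def total_amount_column_store():
    return sum(column_store["amount"])              # scans one array only

def run_length_encode(values):
    """Very simple compression that benefits from repetitive column values."""
    encoded, prev, count = [], None, 0
    for v in values:
        if v == prev:
            count += 1
        else:
            if prev is not None:
                encoded.append((prev, count))
            prev, count = v, 1
    if prev is not None:
        encoded.append((prev, count))
    return encoded

print(total_amount_row_store(), total_amount_column_store())   # 935.0 935.0
print(run_length_encode(column_store["region"]))               # [('PL', 2), ('DE', 2)]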
The fundamental feature that distinguishes the SAP In-Memory Database from a classical relational database is that it resides entirely in RAM and uses the disk system for backup. This increases the speed of operations on data hundreds of times by eliminating slow disk input/output operations from database queries.
The In-Memory Computing Engine (IMCE), the central element of SAP HANA 1.0, contains several technical layers. Sybase Replication Server is used to replicate data from a source database, e.g. that of the SAP ERP system, to the SAP In-Memory Database, where relational engines store the data in columns and rows (Relational Engines, Row Store, Column Store). The central layer of IMCE is the Persistence Layer: this is where ultra-fast information processing takes place (Page Management) and from where regular snapshots are written (Logger) to the last layer of SAP HANA 1.0, which is Disk Storage.
Two volumes are implemented in the disk file system: Log Volume (1x RAM) and Data Volume (4x RAM). Such disk sizing allows frequent snapshots of the SAP In-Memory Database to be written to disk, increasing the safety of the solution.
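As a simple illustration of the sizing rule above, the disk volumes can be derived directly from the appliance RAM; the 512 GB figure below is only an assumed example, while the 1x and 4x multipliers come from the text.

# Illustration of the sizing rule quoted above: Log Volume = 1x RAM,
# Data Volume = 4x RAM. The 512 GB appliance size is an assumed example.

ram_gb = 512                     # assumed appliance RAM
log_volume_gb = 1 * ram_gb       # 1x RAM for the log volume
data_volume_gb = 4 * ram_gb      # 4x RAM for the data volume (snapshots)

print(f"RAM:         {ram_gb} GB")
print(f"Log volume:  {log_volume_gb} GB")
print(f"Data volume: {data_volume_gb} GB")
print(f"Total disk:  {log_volume_gb + data_volume_gb} GB")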
With regard to the installation of SAP HANA 1.0, SAP AG recommends the following steps:
– installation of IMCE and Sybase Replication Server in SAP In-Memory Appliance,
– installation of SAP HostAgent and LoadController (on the side of SAP ERP),
– installation of Client drivers (on the side of IMCE) – required for connecting the software of SAP HANA 1.0 clients. SAP HANA 1.0 clients are MS Excel and the SAP BI 4.0 clients (WebI, Explorer, Analysis),
– installation of IMCE Studio on a dedicated server – in order to carry out the basic configuration of SAP HANA 1.0.
The configuration of SAP HANA 1.0 involves the creation of models providing access to key data in the process of making decisions pertaining to planning, forecasting and optimisation of business processes. A typical implementation of the model consists of the following configuration elements:
– selecting the required SAP ERP tables – with transactional data as well as master data or texts,
– loading the selected SAP ERP tables into SAP HANA 1.0 – at this stage, we carry out the initial loading and implement delta mechanisms (a simple sketch of this pattern follows the list),
– restoring relationships between tables in SAP HANA 1.0,
– creating authorisations for access to analytical data,
– creating reports using standard tools (Excel, Explorer, WebI, Xcelsius, Crystal, Analysis),
– verifying whether SAP HANA 1.0 works properly in terms of the data, model and processing.
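The initial-load-plus-delta pattern mentioned in the list can be sketched as follows. This is a hypothetical Python illustration with invented table contents; the actual replication is performed by Sybase Replication Server and the LoadController rather than by custom code.

# Hypothetical sketch of the "initial load + delta" pattern mentioned above.
# Table and field names are invented for illustration only.

from datetime import datetime

def initial_load(source_rows):
    """Copy the full source table once into the in-memory target."""
    target = {row["key"]: row for row in source_rows}
    watermark = max(row["changed_at"] for row in source_rows)
    return target, watermark

def delta_load(source_rows, target, watermark):
    """Apply only records changed since the last load (the 'delta')."""
    new_watermark = watermark
    for row in source_rows:
        if row["changed_at"] > watermark:
            target[row["key"]] = row            # insert or update
            new_watermark = max(new_watermark, row["changed_at"])
    return new_watermark

# Example data standing in for an SAP ERP table extract.
erp_rows = [
    {"key": "4711", "amount": 100.0, "changed_at": datetime(2011, 6, 1)},
    {"key": "4712", "amount": 250.0, "changed_at": datetime(2011, 6, 2)},
]

memory_table, watermark = initial_load(erp_rows)

# A later extract contains one changed and one new record.
erp_rows[0] = {"key": "4711", "amount": 110.0, "changed_at": datetime(2011, 6, 4)}
erp_rows.append({"key": "4713", "amount": 80.0, "changed_at": datetime(2011, 6, 3)})

watermark = delta_load(erp_rows, memory_table, watermark)
print(len(memory_table), watermark)   # 3 records, watermark = 2011-06-04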
Data needed “this minute”
Information needs differ from business to business, but there are also needs which may be called universal. That is, for example, the case of using BI as a tool supporting the closing of the accounting period. This is a critical process (not only for tax purposes but also for management reporting), the most important task of the month for many employees in controlling and accounting departments, and one in which time plays a decisive role.
For BI solutions based on a data warehouse (such as SAP BW), the process is supported by building summary reports that contain the data as it was last loaded into the data warehouse (usually the previous night, or ad hoc by a user during working hours). This means that in the warehouse we operate on data from the end of the previous day, and it is not possible to carry out full and effective analyses when, for example, you need to quickly find the cause of discrepancies between data from the mobile salespersons’ system and the SAP SD module.
It looks different when SAP HANA is used. Analytical operations are carried out directly on the transactional data from SAP ERP, bypassing the intermediate layer of a traditional data warehouse. The data is then almost always up to date, and thanks to a cross-module data model you can very quickly navigate from an SAP document item to the source documents of another module of the SAP system or to a reference document from an external system.
To know the unknown
The advantage of in-memory processing solutions over classical solutions based on relational databases lies in giving the analyst greater flexibility of work. The data model implemented in BI solutions based on a data warehouse determines the way of looking at the data. Sometimes this is an obstacle, for example when we do not want to impose a direction of inference on the analyst who must verify a number of possible scenarios before finding the answer to his question.
An example might be a situation where the analyst observes a significant increase in the cost of contractual penalties pertaining to the downtime of the forwarder’s trucks waiting to be loaded. First, he notices the increasing cost when scanning the profit and loss account items. He tries to associate this information with data from the warehouse system, then from the production system (repair notifications), and finally, in combination with the time recording system, he discovers a pattern that suggests which employees should be trained to operate the packaging line machinery in order to remedy such situations.
The execution of such analytical work would not be possible without an adequate tool. In the context of SAP HANA, the most suitable one is BusinessObjects Analysis edition for OLAP, which is the most important client of the BusinessObjects 4.0 platform for ad-hoc work in the Excel environment. The great value of this tool is its full support for BW hierarchies, which was missing in the previous version and required custom solutions.
When the amount of data increases
SAP HANA is a response to ever-expanding data sets, especially in companies such as large retail chains or manufacturing companies from the FMCG sector, which today must accept a loss of analytical data quality caused by the excessive time needed to load data into the data warehouse and/or the excessive time spent waiting for a report to be generated.
Of course, the methods currently used include partitioning, indexing, a data archiving strategy (“near-line storage”), aggregates, caching, and generating reports outside peak hours. However, each of these approaches has downsides, which may include a reduced level of detail, less flexibility in selecting a data drill-down criterion, or no possibility of working with the tool online.
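The trade-off described above can be illustrated with a small, hypothetical Python sketch: a pre-built aggregate answers the question it was designed for instantly, but an unplanned drill-down is impossible once the other dimensions have been summed away, whereas full detail kept in memory supports any grouping on the fly. The data and dimension names are invented for illustration.

# Hypothetical sketch: why pre-built aggregates limit ad-hoc drill-down.

from collections import defaultdict

detail = [
    # (region, product, customer, amount) -- full-detail fact records
    ("PL", "P1", "C01", 100.0),
    ("PL", "P2", "C02", 200.0),
    ("DE", "P1", "C03", 150.0),
    ("DE", "P2", "C01",  50.0),
]

# Classic approach: an aggregate prepared in advance for the expected report
# (sales by region). The product and customer dimensions are gone.
aggregate_by_region = defaultdict(float)
for region, product, customer, amount in detail:
    aggregate_by_region[region] += amount

# The planned question is answered instantly...
print(dict(aggregate_by_region))                 # {'PL': 300.0, 'DE': 200.0}

# ...but an unplanned drill-down ("sales of customer C01 per product") cannot
# be answered from the aggregate at all. With full detail kept in memory, any
# grouping can still be computed on the fly:
adhoc = defaultdict(float)
for region, product, customer, amount in detail:
    if customer == "C01":
        adhoc[product] += amount
print(dict(adhoc))                               # {'P1': 100.0, 'P2': 50.0}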
It is different with HANA 1.0: the first comparative analyses, even against solutions using the accelerated version of BW, show up to a hundredfold reduction in system response time when generating a report, without any loss of data quality (all possible dimensions of the analysis remain available). This means that for companies that must not lose the quality of analysed data as its volume increases (or that, on the contrary, want to increase the level of detail), and for which the effectiveness of analysts and a quick reaction to changing business conditions matter as the scale of operations grows, solutions based on SAP HANA will be a natural direction for the development of BI.
From detail to the world of science
SAP HANA means not only faster data processing. Above all, this solution represents a major step forward in flexibility and in reducing the cost of analytics.
In traditional data warehouse systems, data is stored in static structures that must be properly designed and adjusted to reporting requirements known in advance. HANA’s ability to manage large volumes of data in memory allows companies to carry out more ad-hoc analyses, reducing the need to build predefined cubes and queries. This introduces a new level of flexibility into the data analysis process as a whole.
It is already apparent that HANA will be used first of all in retail and in manufacturing and sales companies – above all, big retail chains that process vast amounts of data for sales forecasting, market basket analyses or customer segmentation. The more that time-to-information matters in reporting, and the more a non-standard approach to analytics is expected, the greater the benefit from in-memory processing technology. For the same reasons, HANA will be of interest to institutions that provide financial services.
An additional advantage of HANA 1.0 is that it gives companies the opportunity to make a breakthrough in IT technology without disrupting the existing system architecture.
I think that the opportunities provided by the new technology will quickly be transformed into innovative applications in other branches of industry and in non-commercial areas as well, such as medical care or many scientific fields – in short, wherever there is a strong need to analyse large amounts of structured and unstructured data.
Aneta Suchanecka, Leader of the Organisation Effectiveness Management Team, All for One Poland