Research
The first step in implementing automation is selecting the processes to automate. The prerequisite is that they are repeatable; the greatest benefits come from choosing processes that are also time-consuming.
The main processes that our consultants perform on the customer systems we administer, and which we selected for automation, are operating system updates and updates of SAP Host Agent and SAP Diagnostic Agent. We don’t stop there, though: we are constantly looking for more processes to optimize.
We also see potential in automating audit processes. Regularly collecting selected parameters of operating systems, databases, and SAP systems significantly reduces the time needed to prepare and analyze audit data.
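Once parameters are collected regularly, much of the audit analysis reduces to comparing them against an agreed baseline. The sketch below illustrates that comparison under stated assumptions: the parameter names and expected values in `BASELINE` are hypothetical examples, and the collection step itself (reading values from the OS, database, or SAP system) is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical baseline: parameter name -> expected value.
# A real audit would define these per OS, database and SAP system.
BASELINE: Dict[str, str] = {
    "os.ssh.PermitRootLogin": "no",
    "db.audit_enabled": "true",
    "sap.login/min_password_lng": "8",
}


def find_deviations(
    collected: Dict[str, str], baseline: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (parameter, expected, actual) for every baseline parameter
    whose collected value differs from the baseline or is missing."""
    deviations = []
    for name, expected in sorted(baseline.items()):
        actual = collected.get(name, "<missing>")
        if actual != expected:
            deviations.append((name, expected, actual))
    return deviations


if __name__ == "__main__":
    # Example snapshot as it might come back from one host.
    snapshot = {"os.ssh.PermitRootLogin": "yes", "db.audit_enabled": "true"}
    for name, expected, actual in find_deviations(snapshot, BASELINE):
        print(f"{name}: expected {expected!r}, found {actual!r}")
```

Reporting only deviations keeps the auditor's attention on what changed, rather than on the full parameter dump.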
Classic approach
Before we describe how we optimize administrative processes, let’s estimate how time-consuming the tasks mentioned above are. To keep the arithmetic simple, we assume that our environment consists of five classic landscapes, each with three hosts running SAP systems: one development system, one test system, and one production system. The databases run on the same hosts. This gives us five development, five test, and five production systems in total. For clarity, we focus on a single task, the simplest one: an operating system update.
In the context of SAP systems, the OS update process should be split into at least two iterations: one for the dev/tst systems and one for the prd systems. The period between them is used to observe how the systems behave after the OS update and to test the databases and SAP systems. If any abnormality appears, an extended analysis is needed to find the source of the problem and resolve it.
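The two-iteration rule can be expressed as a simple split of the system inventory into waves. This is a minimal sketch assuming a hypothetical inventory that maps system IDs to a role (`dev`, `tst` or `prd`); the observation and test period between the waves remains a manual gate and is not modelled.

```python
from typing import Dict, List


def plan_waves(systems: Dict[str, str]) -> List[List[str]]:
    """Split systems into two update waves: non-production first,
    production second. Roles other than dev/tst/prd are ignored."""
    first = sorted(s for s, role in systems.items() if role in ("dev", "tst"))
    second = sorted(s for s, role in systems.items() if role == "prd")
    return [first, second]


if __name__ == "__main__":
    # Hypothetical inventory for one landscape.
    inventory = {"D01": "dev", "T01": "tst", "P01": "prd"}
    print(plan_waves(inventory))
```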
Updating a single host requires a series of tasks performed in a fixed order. In our view, the minimum scope is:
- Notifying users about the work and the planned downtime;
- Verifying that the system backup is correct;
- Verifying the operation of the SAP system, database, and OS before the work;
- Scheduling a maintenance window in the monitoring tools;
- Shutting down the SAP system;
- Shutting down the database;
- Taking a snapshot of the system;
- Restarting the OS before the update;
- Updating the OS;
- Restarting the OS after the update;
- Peer verification of the operating system;
- Starting the database;
- Starting the SAP system;
- Verifying the SAP system;
- Deleting the snapshot if no abnormalities are found.
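Because the order of these tasks matters, an automated version has to run them strictly in sequence and stop as soon as one fails, so that, for example, the snapshot is never deleted after a failed update. The sketch below shows that control flow only; the step names follow the checklist, but every action is a hypothetical placeholder (`ok`) standing in for real calls to backup, monitoring, database, SAP and OS tooling.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[str], bool]]


def run_host_update(host: str, steps: List[Step]) -> Optional[str]:
    """Run the steps for one host in order.

    Returns None if every step succeeded, otherwise the name of the first
    failed step. Later steps are skipped, so the snapshot is kept for a
    rollback whenever anything before its deletion went wrong.
    """
    for name, action in steps:
        if not action(host):
            return name
    return None


def ok(host: str) -> bool:
    """Hypothetical placeholder action that always succeeds."""
    return True


STEPS: List[Step] = [
    ("notify users", ok),
    ("verify backup", ok),
    ("pre-checks: SAP, database, OS", ok),
    ("schedule monitoring maintenance window", ok),
    ("stop SAP", ok),
    ("stop database", ok),
    ("take snapshot", ok),
    ("restart OS", ok),
    ("update OS", ok),
    ("restart OS", ok),
    ("verify OS", ok),
    ("start database", ok),
    ("start SAP", ok),
    ("verify SAP", ok),
    ("delete snapshot", ok),
]

if __name__ == "__main__":
    print(run_host_update("sapdev01", STEPS))  # hypothetical host name
```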
The whole process takes one person about 150 minutes per system. Some of the tasks can be done in parallel, so we assume that updating one system takes two hours. Updating the ten dev/tst systems therefore takes 20 hours of work, and the five prd systems another 10 hours: a total of 30 hours for all five SAP landscapes. Let’s remember this value; it will come in handy later.
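The estimate above is straightforward multiplication, shown here so the inputs can be varied. The only assumptions are the ones stated in the text: five landscapes, one dev, one test and one prd system each, and two hours per system.

```python
LANDSCAPES = 5
HOURS_PER_SYSTEM = 2  # ~150 minutes of work, reduced by running some tasks in parallel


def classic_effort_hours(landscapes: int, hours_per_system: float) -> dict:
    """Manual update effort for the dev/tst wave, the prd wave, and in total."""
    dev_tst = landscapes * 2 * hours_per_system  # one dev and one test system per landscape
    prd = landscapes * 1 * hours_per_system      # one production system per landscape
    return {"dev_tst": dev_tst, "prd": prd, "total": dev_tst + prd}


print(classic_effort_hours(LANDSCAPES, HOURS_PER_SYSTEM))
# {'dev_tst': 20, 'prd': 10, 'total': 30}
```

Without any parallel work (the full 150 minutes, i.e. 2.5 hours per system), the same calculation gives 37.5 hours.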
Available tools
There is a very wide range of tools and technologies to choose from. Looking for the ones best suited to our needs and compatible with our operating model in terms of licensing, we chose Ansible together with AWX, which pulls from our Git repository.
Ansible lets us manage multiple systems and tasks in parallel in a controlled way, and AWX adds further functionality, such as scheduling recurring jobs automatically and running multiple jobs at once from a single central console. At the core of Ansible are scenarios (playbooks), which can use Ansible’s built-in modules and plug-ins as well as external scripts. This gives consultants and administrators tools with virtually unlimited capabilities: they can run repetitive activities on any number of systems in parallel.
Importantly, to manage and run tasks with Ansible easily, consultants and administrators should share the scenarios and external scripts. This is where Git comes in handy: every consultant and administrator working on Ansible tasks can see the current version of the scenarios and scripts, correct them, and extend them with new functionality. In addition, AWX can be synchronized with a designated Git repository, which gives it automatic access to the current versions of the scenarios and scripts.
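Running one task on many hosts "in parallel in a controlled manner" essentially means bounding how many hosts are worked on at once; Ansible exposes a similar limit as its `forks` setting. The Python sketch below only illustrates the idea with a thread pool and is not how Ansible is implemented; the host names and the task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def _safe(task: Callable[[str], bool], host: str) -> bool:
    """Run the task, treating any exception as a failure on that host only."""
    try:
        return bool(task(host))
    except Exception:
        return False


def run_in_parallel(
    hosts: List[str], task: Callable[[str], bool], forks: int = 5
) -> Dict[str, bool]:
    """Run `task` on every host with at most `forks` hosts in flight.
    A failure on one host does not stop the others."""
    with ThreadPoolExecutor(max_workers=forks) as pool:
        results = pool.map(lambda h: _safe(task, h), hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    hosts = [f"sap{i:02d}" for i in range(15)]  # hypothetical host names
    outcome = run_in_parallel(hosts, lambda h: h != "sap07", forks=5)
    print([h for h, ok in outcome.items() if not ok])
```

Keeping the limit explicit is the design choice that makes parallelism "controlled": raising it shortens the run, lowering it reduces the load on shared infrastructure.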
Automation at the customer’s site
We have successfully used OS update automation in cooperation with one of our customers.
The high security standards maintained at the customer’s company generated a heavy workload for our teams, so that is where we piloted the automation. No infrastructure changes were required on the customer’s side, because all the infrastructure needed to run the automation tools is located in the All for One Data Center. The only configuration change required on the customer’s hosts was to create a user with the appropriate permissions at the operating system level.
Once we received the login credentials, we began testing and built a database of input parameters that control the process. The first iteration covered only a few selected development systems; in subsequent iterations we expanded the range of hosts covered by the automated OS update process.
We now perform most OS updates this way and are developing our tools further, aiming to update all systems with this method in the second quarter of 2023.
Automation promotes security
As the person responsible for the critical systems of the Amica Group, I support initiatives that speed up repetitive administrative tasks, which have become critical because of increasing security threats. All for One Poland implements the described automation in the Group, supporting our efforts to provide the best possible security for our internal systems. As the IT department, we have implemented a patch management system for the Linux operating systems that our main systems run on; it lets us supervise the patches being distributed and helps All for One manage this part of the process better.
In addition, our central monitoring system verifies the progress of the work and reports problems and deficiencies to the dedicated IT team.
Grzegorz Smolański, Critical IT Infrastructure Manager, IT Department, Amica SA
Gains
Developing the target solution took many hours of work, meetings, and discussions. When we started, we did not expect to build a fully automated process; today we know it is entirely achievable. Even at this stage, we can already point to the gains from this approach.
Users are notified of the planned work a few days in advance. Preparing the work for all 15 systems takes 2 hours in total and includes checking the backups, the operating systems, and the operation of selected SAP systems. We then schedule the update in AWX. This takes no more than 15 minutes, because the scenarios already exist and only need to be scheduled, and it can be done in advance rather than during the maintenance window as in the classic approach. At the time we set, the scripts run automatically and in parallel, without an administrator’s involvement. Of course, we don’t leave the work to chance: a consultant monitors its progress and only needs to act when the automation raises a signal. If an abnormality occurs, the automation immediately stops work on the affected host and sends a notification.
With automation, updating the five development and five test systems of our hypothetical environment takes 30-60 minutes, and the production run takes no more than 30 minutes. Adding the post-update verification of the systems, which takes about 2 hours, the total comes to about five hours of work for the entire landscape, compared with 30 hours in the classic approach. This is the best measure of the cost optimization that automation brings.
There are further benefits: we shorten the maintenance window and the customer’s system downtime to the minimum possible, because the scripts do not wait for human intervention but assess the current situation and act on their own.
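The behaviour described above, where hosts are updated in parallel and a problem halts only the affected host and triggers a notification, can be sketched as follows. This is an illustration under assumptions, not our AWX configuration: the step actions, host names and the `notify` callback (here simply appending to a list) are hypothetical stand-ins for real playbook tasks and alerting channels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[str], bool]]
Notify = Callable[[str], None]


def update_host(host: str, steps: List[Step], notify: Notify) -> Optional[str]:
    """Run the steps in order. On the first failure, stop this host only,
    send a notification, and return the name of the failed step."""
    for name, action in steps:
        try:
            succeeded = action(host)
        except Exception:
            succeeded = False
        if not succeeded:
            notify(f"{host}: stopped at step '{name}'")
            return name
    return None


def update_all(
    hosts: List[str], steps: List[Step], notify: Notify, forks: int = 5
) -> Dict[str, Optional[str]]:
    """Update all hosts in parallel; other hosts continue if one fails."""
    with ThreadPoolExecutor(max_workers=forks) as pool:
        outcomes = pool.map(lambda h: update_host(h, steps, notify), hosts)
        return dict(zip(hosts, outcomes))


if __name__ == "__main__":
    messages: List[str] = []  # hypothetical stand-in for a real alert channel
    steps: List[Step] = [
        ("stop SAP", lambda h: True),
        ("update OS", lambda h: h != "prd01"),  # simulated failure on one host
        ("start SAP", lambda h: True),
    ]
    print(update_all(["dev01", "tst01", "prd01"], steps, messages.append))
    print(messages)
```

The consultant then only needs to look at the hosts that produced a notification, which matches the "react only to the automation signal" way of working.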
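One way to reconcile the "about five hours" figure with the numbers in this section is to add up the 2 hours of preparation, the dev/tst run, the prd run and the 2 hours of post-update verification. The breakdown below is our reading of the text, not a separately measured figure; it excludes the 15 minutes of AWX scheduling, which would add another 0.25 hours.

```python
# Figures from the text, in hours; grouping them this way is an assumption.
PREPARATION = 2.0         # backups, OS and SAP checks for all 15 systems
DEV_TST_RUN = (0.5, 1.0)  # 30-60 minutes for the ten dev/tst systems
PRD_RUN = 0.5             # at most 30 minutes for the five prd systems
POST_VERIFICATION = 2.0
CLASSIC_TOTAL = 30.0      # manual estimate from the "Classic approach" section


def automated_total(dev_tst_hours: float) -> float:
    """Total automated effort for the whole environment, in hours."""
    return PREPARATION + dev_tst_hours + PRD_RUN + POST_VERIFICATION


low, high = (automated_total(h) for h in DEV_TST_RUN)
print(f"automated: {low}-{high} h vs classic {CLASSIC_TOTAL} h")
print(f"effort reduced by {1 - high / CLASSIC_TOTAL:.0%} to {1 - low / CLASSIC_TOTAL:.0%}")
```

At the lower bound this gives exactly five hours, roughly a sixth of the 30 hours needed in the classic approach.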