Integration Between Monitoring and ITSM

  • Tutorial
By popular demand, we present new material from our series of articles on integrating various IT systems in a customer's infrastructure. This time we will take a closer look at one such symbiosis: a monitoring system and an ITSM system.

What each of these systems is on its own could be a long story. A properly configured and working monitoring system helps to avoid or prevent many troubles, while the ITSM system lets you manage IT processes and record events that occur in the infrastructure. We will not delve into the intricacies of how these systems operate separately; instead, we will learn how to combine them for the benefit of the customer as a whole and, in particular, of the service company maintaining the IT infrastructure.

In the case of monitoring integration, we are primarily interested in having events recorded by the monitoring system (alerts) initiate the creation of an incident in the ITSM system.

An alert is an event recorded by the monitoring system when a device or service reaches a configured threshold value.

An incident is any event that is not part of the standard operation of a service and that causes, or may cause, an interruption of service or a reduction in service quality.

We will look at the integration using two of the most popular and in-demand systems on the market: ServiceNow and Microsoft System Center Operations Manager (SCOM). However, the same approach can be implemented with other similar systems.

There are two main ways to integrate a monitoring system and an ITSM system.

1. Using a connector, the so-called MID server. Since ServiceNow is a cloud platform, using such an intermediary is practically a prerequisite for this bundle to function normally.



2. Using the REST API (REpresentational State Transfer Application Program Interface). Most modern web applications, ServiceNow among them, provide the user with such an interface.



REST is a style of software architecture for building distributed, scalable web services.

Each implementation path has its pros and cons. In the first case, you need to install a MID server, open certain ports, and so on; it also requires coordination between two support teams, which can be difficult in some cases. In the second case, all we need is an account with certain rights in ServiceNow, and the rest of the work can be done by the monitoring team.

After weighing the pros and cons and evaluating our experience, we decided to take the second path and use the REST API.

Baseline Analysis


So, the task was to automate raising incidents from events in the monitoring system. Depending on the source of the event, incidents should be prioritized and assigned to the appropriate responsible teams.

First, the events in the monitoring system were analyzed, and some regularities were identified that had to be taken into account when building the integration. For example, several alerts sometimes arrive for a single object or a single location. Such events are typically grouped, and only one incident is created per group.



A list of responsible teams was defined, and a list of priorities was compiled depending on the type of event.

Implementation


We will carry out the bulk of the work on the SCOM side, so it is more convenient to use PowerShell together with the OperationsManager module, which ships with the monitoring system. We will call the ServiceNow REST API both to fetch certain data and to create the incidents themselves.

ServiceNow has a fairly rich programming interface (API) that allows you to control the system almost completely. We only need the Table API, which allows you to create, update, read, and delete records in ServiceNow. In our case, these are Table API - GET /api/now/table/, with which we will read data from ServiceNow, and Table API - POST /api/now/table/, with which we will actually add new incidents.

More methods are described in the documentation: docs.servicenow.com/bundle/geneva-servicenow-platform/page/integrate/inbound_rest/concept/c_TableAPI.html. The parameters are passed to a method as a structured hash table with the data we need. The script fragment below shows which parameters are used in our case:

#Define hash table with incident fields
$HashTable = @{
    'u_snow_category'   = 'Infrastructure';
    'u_affected_user'   = 'scom';
    'caller_id'         = 'scom';
    'assignment_group'  = $record.ResolverGroup;
    'cmdb_ci'           = $record.CI.name;
    'location'          = $record.Location;
    'short_description' = [System.Web.HttpUtility]::HtmlEncode($record.ShortDescription);
    'description'       = [System.Web.HttpUtility]::HtmlEncode($record.Description);
    'impact'            = $record.Impact;
    'contact_type'      = 'Own Observation';
};

#Posting new incident
$RaisedIncident = Invoke-RestMethod -uri "$SNOW$SNOWtable`incident" -Headers $PostHeader -Method Post -Body ($HashTable | ConvertTo-Json);

In the example above, the line

$RaisedIncident = Invoke-RestMethod -uri "$SNOW$SNOWtable`incident" -Headers $PostHeader -Method Post -Body ($HashTable | ConvertTo-Json);

directly creates an incident in ServiceNow, where:
"$SNOW$SNOWtable`incident" is the address of the API endpoint, which in general looks like CustomDomain.service-now.com/api/now/table/incident;
$PostHeader is a variable that passes the authorization data and the content type;
$HashTable is the hash table with the required data.

The simplified script algorithm is as follows:
1. Reading the environment parameters
We define all the variables for the entire script scope so that we can later access them from functions.

An example of reading variables:

New-Variable -Name ScriptConfiguraion -Value (Get-Content '.\Configuration.txt' -Raw -ErrorAction Stop | ConvertFrom-Json -ErrorAction Stop) -Option AllScope, ReadOnly -ErrorAction Stop;
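The article does not show Configuration.txt itself; a hypothetical sketch of its structure, built only from the configuration paths the script actually references, could look like this:

```json
{
    "Configuration": {
        "SNOW": {
            "APIURL": "https://CustomDomain.service-now.com/",
            "APITables": "api/now/table/"
        },
        "SCOM": {
            "ManagementServers": [ "scom-ms01", "scom-ms02" ],
            "AlertCriteria": "(CustomField2 IS NULL OR CustomField2 = '') AND Severity = 2 AND ResolutionState <> 255"
        },
        "Rules": {
            "DefaultRules": ".\\DefaultRules.csv"
        }
    }
}
```

The management server names and the rules file name here are placeholders; only the key paths ($ScriptConfiguraion.Configuration.SNOW.APIURL, .SNOW.APITables, .SCOM.ManagementServers, .SCOM.AlertCriteria, .Rules.DefaultRules) are taken from the script itself. With these values, "$SNOW$SNOWtable`incident" expands to the endpoint form shown earlier.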

In addition to the configuration file, we will need a list of rules by which we determine whether an event maps to a particular priority and a particular responsible team:

New-Variable -Name DefaultRules -Value (Import-Csv $ScriptConfiguraion.Configuration.Rules.DefaultRules -ErrorAction Stop) -Option AllScope, ReadOnly -ErrorAction Stop;

An example entry from the rules file in our case looks like this:


	{
		"Parameters":"",
		"Properties":"Name",
		"Expression":"{0} -match \"http\\://www\\.domainname\\.com/\"",
		"ResolverGroup":"Application Team",
		"Priority":"2"
	},
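The article does not show how these rules are applied; one plausible reading, given the {0} placeholder in Expression, is that the alert property named in Properties is substituted in and the resulting string is evaluated as a PowerShell condition. A hypothetical sketch with a synthetic alert object:

```powershell
# Hypothetical sketch (not shown in the article): one way the rule's Expression
# could be applied. The {0} placeholder is filled with the value of the alert
# property named in 'Properties', and the resulting string is evaluated.
$rule = [pscustomobject]@{
    Properties    = 'Name';
    Expression    = '{0} -match "http\://www\.domainname\.com/"';
    ResolverGroup = 'Application Team';
    Priority      = '2';
}

# A synthetic alert object for illustration
$alert = [pscustomobject]@{ Name = 'http://www.domainname.com/ is unavailable' }

# Substitute the property value into the expression and evaluate the condition
$value     = $alert.($rule.Properties);
$condition = $rule.Expression -f "'$($value -replace "'","''")'";
if (Invoke-Expression $condition) {
    $resolver = $rule.ResolverGroup;   # the team the incident is assigned to
    $priority = $rule.Priority;        # the incident priority
}
```

The single quotes in the property value are doubled before substitution so the generated expression stays valid PowerShell.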


Settings for connecting to ServiceNow:

New-Variable -Name SNOW -Value $ScriptConfiguraion.Configuration.SNOW.APIURL -Option AllScope, ReadOnly -ErrorAction Stop;
    New-Variable -Name SNOWtable -Value $ScriptConfiguraion.Configuration.SNOW.APITables -Option AllScope, ReadOnly -ErrorAction Stop;
    New-Variable -Name GetHeader -Value (@{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($SNOWacc+":"+$SNOWaccPass))}) -Option AllScope, ReadOnly -ErrorAction Stop;
    New-Variable -Name PostHeader -Value (@{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($SNOWacc+":"+$SNOWaccPass));"Content-Type" = "application/json"}) -Option AllScope, ReadOnly -ErrorAction Stop;

As you may have noticed, we take the file paths, settings, and service data from the main configuration file, which we read at the very beginning and whose contents we placed in the $ScriptConfiguraion variable.

Next, we load the modules for working with ServiceNow and SCOM:

Add-Type -AssemblyName System.Web -ErrorAction Stop;
    Import-Module OperationsManager -ErrorAction Stop;


2. Reading data from the monitoring system
To read alert data from SCOM, you first need to establish a connection:

  #Connecting to SCOM API
    if(-not(Get-SCOMManagementGroupConnection | ?{$_.IsActive})){
        foreach($ManagementServer in $ScriptConfiguraion.Configuration.SCOM.ManagementServers){
            New-SCManagementGroupConnection $ManagementServer;
            if(Get-SCOMManagementGroupConnection | ?{$_.IsActive}){
                break;
            }
        }
        if(-not(Get-SCOMManagementGroupConnection | ?{$_.IsActive})){
            Write-CustomLog "Script failed to connect to all SCOM management servers [$($ScriptConfiguraion.Configuration.SCOM.ManagementServers)] supplied in the configuration file. Please review debug log for more info.";
            exit;
        }
    }

And read the data about alerts:

$AlertList = Get-SCOMAlert -Criteria "$($ScriptConfiguraion.Configuration.SCOM.AlertCriteria)";

As a result, $AlertList will contain all alerts from SCOM that match the criterion $ScriptConfiguraion.Configuration.SCOM.AlertCriteria. We set the following criterion:

(CustomField2 IS NULL OR CustomField2 = '') AND Severity = 2 AND ResolutionState <> 255

In our infrastructure, CustomField2 stores the incident number data. This makes it easy to later find the relevant alert by incident number, and we also use this field to group incidents for alerts of the same type.

3. Reading data from the ITSM system
After receiving the incident data, you need to read the information on the CI (Configuration Item) - a configuration unit in the ITSM system. This is necessary in order to compare the data from the monitoring system with the data from the ITSM system and set the priority for the created incident.

The script fragment that fetches the CI data looks like this:

$CI = (Invoke-RestMethod -uri "$SNOW$SNOWtable`cmdb_ci" -Headers $GetHeader -Method Get -Body @{sysparm_query="nameLIKE$SourceObject";sysparm_fields='name,location,u_environment,u_service_level,sys_updated_on,install_status'}).result;

4. Grouping alerts by location
We group all SCOM alerts by location:

$Alerts = $Alerts | group Location;
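To illustrate what this grouping produces, here is a self-contained sketch with made-up alert objects (the Name and Location values are hypothetical):

```powershell
# Illustration with synthetic alert objects: Group-Object returns one entry per
# location; in the real script only one incident is raised per such group
$Alerts = @(
    [pscustomobject]@{ Name = 'Ping failed'; Location = 'Office-A' },
    [pscustomobject]@{ Name = 'Disk full';   Location = 'Office-A' },
    [pscustomobject]@{ Name = 'CPU high';    Location = 'Office-B' }
)

$Groups = $Alerts | Group-Object Location;

foreach ($g in $Groups) {
    # $g.Name is the location, $g.Group holds all alerts recorded there
    Write-Host "$($g.Name): $($g.Count) alert(s)";
}
```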

Then we check whether an incident already exists for a given group in the ITSM system. Since we record incident information in the SCOM database, we perform this check on the SCOM side as well.

foreach($entry in ($OperationalData | ?{$_.GroupingType -eq 'LocationAndMonitorID'})){
                if($entry.Location -eq $Location -and $entry.MonitorId -eq $MonitorId){
                    return(@{sysID=$entry.sysID;Number=$entry.Number});
                }
            }


5. Checking the alert status over time and creating the incident
The last check before raising the incident is whether the alert was closed while all the previous checks were running:

if($record.Alert.ResolutionState -ne 255){
    #the alert is still open - proceed with raising the incident
}

When all the information has been gathered, the incident itself is raised, as described a little earlier:

#Define hash table with incident fields
$HashTable = @{
    'u_snow_category'   = 'Infrastructure';
    'u_affected_user'   = 'scom';
    'caller_id'         = 'scom';
    'assignment_group'  = $record.ResolverGroup;
    'cmdb_ci'           = $record.CI.name;
    'location'          = $record.Location;
    'short_description' = [System.Web.HttpUtility]::HtmlEncode($record.ShortDescription);
    'description'       = [System.Web.HttpUtility]::HtmlEncode($record.Description);
    'impact'            = $record.Impact;
    'contact_type'      = 'Own Observation';
};

#Posting new incident
$RaisedIncident = Invoke-RestMethod -uri "$SNOW$SNOWtable`incident" -Headers $PostHeader -Method Post -Body ($HashTable | ConvertTo-Json);



Do not forget to update CustomField2, so that in the future we do not have to go to ServiceNow for this information.

$AlertUpdateResult = $Alert | Update-Alert -ParamStr "-CustomField2 'Raised' -CustomField3 '$($result['IncidentNumber'])'";




Conclusions


As you can see, implementing a basic integration with ServiceNow is not that difficult. Without going into details, the whole integration boils down to running a scheduled script that downloads data from the monitoring system and, based on it, raises incidents in the ITSM system. The incident is then processed by the Service Desk or goes directly to the responsible team.

With the integration in place, the overall reaction time to events in the system decreases, which allows emerging problems to be addressed and eliminated in a timely manner. Human errors when raising an incident are reduced (assignment to the wrong team, an incorrect incident priority, missing data needed for the initial diagnosis). Total labor costs go down.