What term does syslog use when referring to the monitored aspect of the system?

Creating a Network with Hyper Resiliency and Availability at its Core

A Guide to Network Monitoring and Incident Response

Network monitoring, incident response and IT security generally can be difficult to understand, with endless abbreviations and feature-specific terms. To help you compare and understand the benefits of network monitoring solutions, we have compiled a glossary to get you started.

SNMP

SNMP, the Simple Network Management Protocol, is the grandfather of all things network and device monitoring. It is used to monitor applications, UNIX-based operating systems and network devices such as routers and switches.

SNMP is split into two mechanisms: regular SNMP on UDP port 161 and SNMP traps on UDP port 162. SNMP traps are alert messages sent from a monitored device to the network monitoring solution, drawing attention to a problem. For example, a hard disk array in a server can be configured to send SNMP traps to a network monitoring solution when one of its disks fails.

Regular SNMP, by contrast, is a querying technology in which the workflow is reversed: instead of the device sending an alert, the network monitoring solution queries the monitored device for its health status.

OIDs, or Object Identifiers, are numerical references used by the SNMP portion of a network monitoring tool to query very specific parts of the monitored device. The response can be compared to a known good response to indicate health. Because there are potentially thousands of OIDs per monitored device, many of them specific to that device, manufacturers publish OID libraries which can be imported into network monitoring solutions to make things easier. These are known as MIBs (Management Information Bases).
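
To make the OID and MIB relationship concrete, here is a minimal Python sketch. It assumes a tiny hand-written table of standard MIB-2 object identifiers in place of a real imported MIB file, and a hypothetical polled value rather than a live SNMP query; `resolve` and `is_healthy` are illustrative names, not part of any monitoring product.

```python
# A tiny, hand-written slice of the standard MIB-2 tree. A real monitoring tool
# would import complete vendor MIB files rather than hard-coding names like this.
MIB = {
    (1, 3, 6, 1, 2, 1, 1, 1): "sysDescr",
    (1, 3, 6, 1, 2, 1, 1, 3): "sysUpTime",
    (1, 3, 6, 1, 2, 1, 1, 5): "sysName",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 8): "ifOperStatus",  # 1 = up, 2 = down
}


def parse_oid(text: str) -> tuple:
    """Turn a dotted OID such as '1.3.6.1.2.1.1.5.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def resolve(oid: str) -> str:
    """Translate a numeric OID into 'name.index' using the longest matching MIB entry."""
    parts = parse_oid(oid)
    for length in range(len(parts), 0, -1):
        name = MIB.get(parts[:length])
        if name is not None:
            index = ".".join(str(p) for p in parts[length:])
            return f"{name}.{index}" if index else name
    return oid  # not in our MIB slice, so leave it numeric


def is_healthy(polled_value, known_good) -> bool:
    """At its simplest, an OID monitor asks: does the polled value match the known-good one?"""
    return polled_value == known_good


# Hypothetical poll result: interface 3 reports ifOperStatus = 2 (down).
print(resolve("1.3.6.1.2.1.2.2.1.8.3"), is_healthy(2, known_good=1))  # ifOperStatus.3 False
```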

Since its creation, SNMP has come under scrutiny for its lack of inherent security. SNMPv1 and v2c have no authentication capabilities beyond their community strings, which are sent in plain text, even though they provide diagnostic information and, when write mode is enabled, even allow remote configuration changes. SNMPv3 has sought to cure this problem by introducing encryption and authentication, but it is far from universally available and is thought of as complex.

Where SNMPv3 is available, it is recommended that you use it. Where it is not, the recommendation is to change the default community string and never enable write mode.

Are you interested in learning more about the security of SNMP? Read our blog on the topic here.

WMI

An SNMP-style protocol developed for Windows operating systems, available since Windows 2000.

The purpose of WMI (Windows Management Instrumentation) is to define a proprietary set of specifications which allow management information to be shared with a querying network monitoring solution. It provides query access to all manner of operating system functions, such as services, file shares, hardware health and file properties.

The initial WMI request is made to TCP port 135 (the RPC endpoint mapper); subsequent communication regarding that specific request then takes place on a dynamically assigned, randomised port.

As SNMP is not enabled by default on Windows, the de facto standard for monitoring Windows devices is WMI. Note, however, that WMI can be used to query SNMP OIDs too.

SNMP and WMI are the two primary protocols used in device monitoring.

Agent and Agentless

Network monitoring solution vendors will pick one of two methods for monitoring devices, agent-based or agentless.

An agent is a small application or piece of software, which is installed onto a device to be monitored. It is then the job of the agent to query the device for health information and pass this back to the network monitoring solution.

The alternative is that nothing is installed on the monitored device and instead the network monitoring solution is provided with credentials for the devices it is to monitor so that it may log onto them remotely and collect diagnostic information.

While both kinds of solution are available, agent-based solutions are usually seen in the industry as a disadvantage, for a number of reasons:

  • Agents have a resource overhead which might negatively affect the monitored device.
  • Agents might be using frameworks such as Java which need updating.
  • There might be incompatibilities between the agent and software installed on monitored devices.
  • There is an initial rollout of installing agents which can be labour and time intensive.
  • If the network monitoring solution is decommissioned, the agents will need to be removed.

Agentless, or credential-based, network monitoring is generally favoured by users and vendors alike.

Monitors

At an application level, monitors are the individual health queries which the network monitoring solution carries out. For example, ping being used to monitor the uptime of a device's network card is an individual monitor.

Monitoring typically takes one of three forms:

  • Active monitoring.
  • Passive monitoring.
  • Performance monitoring.

This topic will be covered in more detail in a later section of the article.

Alerting

When a device changes state, an alert, whether positive or negative, can be generated to notify one or more people of the change.

Email or SMS notifications are common; however, most modern network monitoring tools offer a number of different notification types, some of which integrate with other systems.

Types of alert include:

  • Email.
  • SMS.
  • Pager alerts.
  • IFTTT.
  • Syslog.
  • Script actions.
  • Integrations with VMware.
  • SSH actions.
  • ServiceNow integrations.

In addition to changes in state, network monitoring tools which have been configured to collect and store performance information can generate notifications based on thresholds. For example, a list of all devices which have used more than 90% of available hard disk capacity in the preceding week.

Alerts like this are useful in predicting future events and being able to correct them before they cause a network or service outage.
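
A threshold report like the one described above boils down to filtering stored performance samples. The following Python sketch assumes the samples are already collected as simple records holding a device name, a timestamp and a disk-usage percentage; the record layout and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DiskSample:
    device: str
    taken_at: datetime
    used_percent: float


def devices_over_threshold(samples, threshold=90.0, window=timedelta(days=7), now=None):
    """Names of devices whose disk usage exceeded `threshold` at any point in the last `window`."""
    now = now or datetime.now()
    since = now - window
    return sorted({s.device for s in samples
                   if since <= s.taken_at <= now and s.used_percent > threshold})


# Hypothetical stored samples; only fs01 breached 90% during the preceding week.
now = datetime(2024, 1, 8)
samples = [
    DiskSample("fs01", datetime(2024, 1, 7), 93.5),
    DiskSample("fs02", datetime(2024, 1, 7), 71.0),
    DiskSample("db01", datetime(2023, 12, 20), 95.0),  # too old to count
]
print(devices_over_threshold(samples, now=now))  # ['fs01']
```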

The network monitoring solution Ipswitch WhatsUp Gold includes IFTTT integration, which can be used to connect all manner of devices and technology, such as Amazon Alexa and other automation solutions, to a monitored condition. Take a look at this blog post, where we were able to create a mobile phone alert from a monitored device.

With all network monitoring solutions, whether agent-based or agentless, the initial starting point is to discover devices for monitoring.

The discovery phase will cover the entire known network or a specified subnet, depending on the scope of the project and what is to be discovered. In the case of agentless network monitoring, a device is classified as discovered when it responds to any of three tests:

  • Ping.
  • SNMP.
  • WMI.

If the discovery scan has been paired with SNMP connectivity parameters (a community string) or credentials, it will also attempt to identify each device, displaying its manufacturer, software version and device type.

Although discovery is used primarily as a first step before full monitoring, discovery scans should be configured to run automatically and frequently. This will reveal any new devices added to the network, which is especially useful when those additions are unauthorised. Scan results can often be output into recurring reports, which can in turn be used as asset registers; these are useful for good IT management and for compliance drivers such as ISO 27001 and the UK's Cyber Essentials programme.
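
The discovery rule above (a device counts once it answers ping, SNMP or WMI) and the value of repeated scans can be sketched in a few lines of Python. The probe results here are hypothetical inputs; a real scan would send the ICMP, SNMP and WMI requests itself.

```python
PROBES = {"ping", "snmp", "wmi"}


def discovered(responses: dict) -> set:
    """Hosts that answered at least one discovery probe; `responses` maps host -> probes answered."""
    return {host for host, answered in responses.items() if answered & PROBES}


def compare_scans(previous: set, current: set) -> dict:
    """Diff two scans: new hosts may be unauthorised additions, missing ones may be down or retired."""
    return {"new": sorted(current - previous), "missing": sorted(previous - current)}


# Hypothetical results from two scheduled scans of the same subnet.
last_week = discovered({"10.0.0.1": {"ping", "snmp"}, "10.0.0.2": {"wmi"}, "10.0.0.3": set()})
this_week = discovered({"10.0.0.1": {"ping"}, "10.0.0.9": {"ping"}})
print(compare_scans(last_week, this_week))  # {'new': ['10.0.0.9'], 'missing': ['10.0.0.2']}
```

Feeding each scan's `discovered` set into a dated report is one simple way to build the asset register mentioned above.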

Want to compare Ipswitch WhatsUp Gold and SolarWinds NPM? Take a look at our comparison blog here.

Network Monitoring Maps

With the network discovered and an initial set of devices recorded, those that are to be monitored are marked as such, moving them into monitored mode. Once marked as monitored, devices are displayed in some form of status screen, often represented as a map. One such example from Ipswitch WhatsUp Gold is shown below.

Network maps serve multiple functions and are useful for IT teams and operation centres for the following reasons.

  • Maps provide an overall health screen with colour coded indicators.
  • Maps will show both the physical and logical links between devices in the network and even indicate the health of those connections.
  • Maps are interactive, and clicking on an individual device can open additional diagnostic controls or displays of supplementary information.
  • Maps can be paired with geographical maps or building blueprints to show physical location.

In most solutions, the mapping function is dynamic, reacting to changes in conditions across the network. For this reason and those highlighted above, it is very common to see such maps on large displays in IT team or operation centre offices.
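
Behind the colour-coded indicators, a map has to reduce many monitor results to one colour per device. A common approach, assumed here rather than taken from any particular product, is "worst state wins"; the state names and colours below are illustrative.

```python
from enum import IntEnum


class State(IntEnum):
    # Higher value = worse, so max() picks the most severe state.
    UP = 0
    WARNING = 1
    DOWN = 2


COLOURS = {State.UP: "green", State.WARNING: "amber", State.DOWN: "red"}


def device_colour(monitor_states) -> str:
    """Colour a device on the map by the worst of its monitor states (green if it has none)."""
    return COLOURS[max(monitor_states, default=State.UP)]


print(device_colour([State.UP, State.WARNING, State.UP]))  # amber
```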

Five Things to Look for in a Network Monitoring Solution Map

If you are looking for the right network monitoring solution for your organisation, the suitability of the mapping function is crucial, as this is likely to be the most used function on a day-to-day basis.

Consider our top five mapping features when looking at solutions:

  • Colour coded indicators for each monitor on a device to indicate health.
  • Grouped maps, whereby you can view a subset of the network. For example, a map which shows just boundary routers or devices located in the US offices.
  • Interactivity, whether that be tools such as SSH or drill-down information about the selected device.
  • Logical and physical links, showing virtualisation and Wi-Fi AP-to-WLC relationships, as well as devices connected to each other by cable.
  • Link utilisation and bandwidth monitoring. Devices are not the only things which need monitoring; the connecting links need it too, so that bottlenecks can be found.
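
Link utilisation is usually derived from interface octet counters polled over SNMP, such as `ifInOctets` or its 64-bit counterpart `ifHCInOctets`, together with the interface speed. This Python sketch shows the arithmetic under those assumptions, including the modulo step that copes with a counter wrapping once between polls; the polled numbers are made up.

```python
def utilisation_percent(octets_before: int, octets_after: int, interval_seconds: float,
                        if_speed_bps: int, counter_bits: int = 32) -> float:
    """Percentage of link capacity used between two polls of an octet counter."""
    # Counters wrap back to zero at 2**bits; modulo arithmetic absorbs a single wrap.
    delta_octets = (octets_after - octets_before) % (2 ** counter_bits)
    bits_sent = delta_octets * 8
    return 100.0 * bits_sent / (interval_seconds * if_speed_bps)


# Hypothetical 100 Mbit/s link polled 60 seconds apart: 375,000,000 octets moved.
print(utilisation_percent(1_000, 375_001_000, 60, 100_000_000))  # 50.0
```

On fast links a 32-bit counter can wrap more than once between polls, which this arithmetic cannot detect; that is why 64-bit counters are preferred there.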

We have put together a list of all the network monitoring solution must-haves in this blog post here. Check it out.

Active, Passive and Performance Monitors

Network monitoring solutions typically have three types of monitors, which can be used in combination to get an overall and comprehensive idea of device health.

Active monitors - used to display immediate health, active monitors poll a device for an open port, service status or ping response and then compare the result to a known good or healthy value. Active monitors can poll as frequently as every 60 seconds to keep the status of that monitor as current as possible.

Active monitors lack any real intelligence and are at a basic level, "are you alive?" tests.
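
An "are you alive?" test really is that simple. The Python sketch below checks whether a TCP port accepts connections and maps the result to up or down. It deliberately uses a TCP connect rather than an ICMP ping, because sending raw ICMP usually needs elevated privileges; the host and port are whatever you choose to monitor.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable or unresolvable
        return False


def active_monitor(host: str, port: int, expected_open: bool = True) -> str:
    """Compare the live result with the known-good expectation, as an active monitor does."""
    return "up" if tcp_port_open(host, port) == expected_open else "down"
```

A scheduler would call something like `active_monitor("192.0.2.10", 443)` every 60 seconds (the address is a documentation example) and raise an alert when the result changes.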

Passive monitors - In the opposite fashion to an active monitor, a passive monitor is alerted by the monitored device to a condition which it has logged. Depending on the device type, this could be an SNMP trap, a syslog message or a Microsoft event log.

The network monitoring tool will be configured to look out for particular error codes or event IDs which are of interest and flag these using an alerting type when they appear.

Examples of passive monitor usage include software updates being installed, services restarting, license changes and administrative logon events.
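
For syslog, a passive monitor first decodes the `<PRI>` prefix each message starts with (PRI = facility * 8 + severity, where the facility identifies which part of the system produced the message) and then matches the text against patterns of interest. The Python sketch below shows both steps; the watch-list entries are hypothetical.

```python
import re

# Severity names in numeric order 0-7, as defined for syslog.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]
PRI_PATTERN = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)

# Hypothetical watch list: the events this monitor should flag.
WATCH_LIST = {
    "failed_logon": re.compile(r"failed for", re.IGNORECASE),
    "service_restart": re.compile(r"restart", re.IGNORECASE),
}


def parse_pri(message: str):
    """Split a raw syslog message into (facility number, severity name, remaining text)."""
    match = PRI_PATTERN.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity], match.group(2)


def passive_check(message: str, alert_at_or_above: str = "warning") -> list:
    """Names of watch-list hits, plus a severity hit if the message is severe enough."""
    _facility, severity, text = parse_pri(message)
    hits = [name for name, pattern in WATCH_LIST.items() if pattern.search(text)]
    if SEVERITIES.index(severity) <= SEVERITIES.index(alert_at_or_above):
        hits.append(f"severity:{severity}")
    return hits


# <34> = facility 4 (security/authorisation), severity 2 (critical).
print(passive_check("<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"))
```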

Performance monitors - Unlike the active and passive monitors, performance monitors are not indicators of immediate health. Instead they collect long term data regarding hardware such as disk space, RAM utilisation and CPU usage. This can then be plotted on graphs and other analysis tools for trend purposes.

Performance monitors are less about the short term availability of devices and more focused on the longer term.

The last of the three monitor types, performance monitoring, can provide rich insight into the health of network devices over long periods of time through dashboards and reports.

Take the image below as a sample. This top 10 dashboard highlights network interface utilisation and ping availability statistics over a 24-hour period, giving insight into areas of the network where poor connectivity might be experienced.
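
The figures behind a "top 10" availability dashboard are straightforward to compute. This Python sketch assumes ping results have been stored as a list per device, with a latency in milliseconds for each reply and `None` for each timeout; the device names and readings are hypothetical.

```python
from statistics import mean


def availability(samples) -> float:
    """Percentage of ping attempts that received a reply (None marks a timeout)."""
    if not samples:
        return 0.0
    return 100.0 * sum(s is not None for s in samples) / len(samples)


def mean_latency(samples):
    """Average round-trip time of the replies, or None if nothing replied."""
    replies = [s for s in samples if s is not None]
    return mean(replies) if replies else None


def worst_availability(results: dict, n: int = 10) -> list:
    """The n least available devices as (device, availability %) pairs, worst first."""
    scored = [(device, availability(samples)) for device, samples in results.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))[:n]


# Hypothetical 24 hours of ping results, heavily truncated for brevity.
results = {
    "core-sw1": [0.9, 0.9, 1.0, 0.9],
    "edge-rtr1": [None, None, 5.0, 4.0],
    "access-sw7": [1.2, 1.1, None, 1.3],
}
print(worst_availability(results, n=2))  # [('edge-rtr1', 50.0), ('access-sw7', 75.0)]
```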

In theory any active monitor can be turned into a performance monitor, however the most common performance monitor types include:

  • Ping Latency and Availability.
  • CPU Utilisation.
  • Disk Utilisation.
  • Memory or RAM Utilisation.
  • UPS performance statistics such as total charge.
  • Hyper-V and VMware statistics.

Crucially, long term performance statistics can be turned into threshold alerts which allow you to monitor for breached thresholds. For example, you might be interested to receive an alert each month for devices which have consumed more than 90% of their available disk capacity. This type of quick-to-hand information is invaluable when avoiding a major outage due to exhausted disk space on a critical resource.

The network of today very rarely lacks some element of cloud infrastructure. Whether it be for development, hosting an externally facing service or to reduce your data center footprint, networks are becoming ever more hybrid and cloud aligned.

There are a number of cloud providers offering hosting options today. Some of the more popular include:

  • Microsoft Azure.
  • Amazon AWS.
  • Google Cloud.

Irrespective of the cloud provider you are using, the infrastructure you place in the cloud is likely to have some importance to your organisation and its data processing activities. As a result, these cloud-hosted devices will need some of the protections you would expect for any other device.

A good example of this is antivirus software. In today's security focussed world, it is almost inconceivable not to have antivirus software installed on a server hosted in an on-site data center. The same should be true for the cloud.

Check out our blog entitled "Extending Network Monitoring to AWS and Azure" to see how you can monitor cloud-hosted servers and devices just as though they were inside your network.

Network monitoring solutions are being used to bridge the management and monitoring gaps between the cloud and internal networks as the two begin to merge. Cloud is no longer seen as developmental and new; instead, it is expected to afford the same capabilities and more.

For example, cloud-hosted infrastructure is priced on both the size of the hosted infrastructure, in terms of its resource requirements, and the time it spends in an online state. This is not typically the case for equipment hosted in your own network, so there is an additional need to monitor cost in the cloud.
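
Because cloud cost depends on both instance size and time online, a monitoring tool can estimate accumulated cost from the online intervals it has observed. The Python sketch below assumes a flat hourly rate per instance; the rate shown is hypothetical, and real providers bill at varying granularities and add charges such as storage and data egress.

```python
from datetime import datetime, timedelta


def online_hours(intervals) -> float:
    """Total hours covered by a list of (went_online, went_offline) datetime pairs."""
    total = sum((end - start for start, end in intervals), timedelta())
    return total.total_seconds() / 3600


def accumulated_cost(hourly_rate: float, intervals) -> float:
    """Estimated cost: hourly rate multiplied by hours spent in an online state."""
    return round(hourly_rate * online_hours(intervals), 2)


# Hypothetical VM at 0.25 per hour, online for two eight-hour working days.
intervals = [
    (datetime(2024, 3, 4, 9), datetime(2024, 3, 4, 17)),
    (datetime(2024, 3, 5, 9), datetime(2024, 3, 5, 17)),
]
print(accumulated_cost(0.25, intervals))  # 4.0
```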

These capabilities are, of course, all achievable using the APIs provided by the cloud hosting service provider. In the case of Azure, Microsoft provides API credentials which can be fed into a supporting network monitoring tool so that it can read the properties of the cloud-hosted devices.

From this, there are a number of metrics which can be derived, such as:

  • Bandwidth.
  • Device health, such as disk space.
  • Online and offline states.
  • Total accumulated cost for a period of time.
  • Connected users.
  • Running services.

This is not an exhaustive list, as the API capabilities of cloud hosting service providers are being constantly updated. For a full list of Azure's API references, click here.

Once you are monitoring your cloud infrastructure in a similar manner to your in-network devices, you can also benefit from the same alerting and incident response features, such as email alerts on state changes, alerts when thresholds indicating impending issues are met, and corrective actions being executed.

Netflow is a diagnostic and analytics protocol originally created by industry giant Cisco. It is used to collect and record all IP traffic going to and from a network device which has the netflow capability enabled. This collected flow data is then usually forwarded to a netflow analyser or network monitoring solution, where it is collated and presented in a readable format.

Netflow is incredibly revealing. Whereas network device monitoring using SNMP can reveal hardware issues or network interface utilisation, netflow can reveal detailed information about the data packets themselves, such as:

  • Source.
  • Destination.
  • Size.
  • Port number.
  • Protocol.
  • Class of service.
  • Errors.

This data can then be presented in a format which highlights problem areas or trends. For example, spikes in traffic overnight to a cloud-based backup solution using HTTPS.
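
Once flow records reach the analyser, much of the presentation is aggregation. Here is a Python sketch, assuming flow records have already been decoded into simple tuples (the field choice and sample data are hypothetical), that ranks conversations by bytes and totals overnight HTTPS traffic to one destination, as in the backup example above.

```python
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    dst_port: int
    octets: int
    hour: int  # hour of day the flow was seen, 0-23


def top_talkers(flows, n=5):
    """Rank (source, destination, port) conversations by total bytes."""
    totals = Counter()
    for flow in flows:
        totals[(flow.src, flow.dst, flow.dst_port)] += flow.octets
    return totals.most_common(n)


def overnight_bytes(flows, dst, dst_port=443, hours=range(0, 6)):
    """Bytes sent to one destination and port during the night-time hours."""
    return sum(f.octets for f in flows
               if f.dst == dst and f.dst_port == dst_port and f.hour in hours)


flows = [
    Flow("10.0.0.5", "203.0.113.20", 443, 40_000_000, 2),   # overnight backup
    Flow("10.0.0.5", "203.0.113.20", 443, 35_000_000, 3),
    Flow("10.0.0.8", "198.51.100.7", 443, 2_000_000, 14),   # daytime browsing
]
print(top_talkers(flows, n=1))  # [(('10.0.0.5', '203.0.113.20', 443), 75000000)]
print(overnight_bytes(flows, "203.0.113.20"))  # 75000000
```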

Due to its obvious advantages, other vendors and network device manufacturers were quick to produce netflow-like features, such as Juniper's J-Flow and sFlow (supported by HP, among others).

How Does Netflow Work?

Netflow consists of two main components: the netflow cache and the netflow exporter.

In the case of the cache, this is a temporary holding space in system memory where data flows are held before being handed to the exporter for delivery to your configured netflow analyser or network monitoring tool.

Netflow attempts to identify flows or strings of related network packets, rather than treat each individually. This helps to understand the context of network conversations.

Each time a packet is received on a network device, its source and destination addresses, port numbers, protocol, TOS byte and input interface are analysed to determine the flow it belongs to. Once identified, the packet is added to its respective flow in the netflow cache.

Once the netflow cache reaches its maximum size or its time-to-live value expires, the contents of the netflow cache are exported to a destination configured by you. This could be a dedicated netflow analyser tool or a full network management suite which accepts netflow as a complementary feature.
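
The cache-and-export cycle described above can be modelled as a dictionary keyed on the fields that define a flow. This Python sketch is a simplified model rather than router code: it uses a single inactivity timeout (real implementations typically distinguish active and inactive timeouts) and a caller-supplied `exporter` function standing in for sending records to the analyser.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    packets: int = 0
    octets: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


class FlowCache:
    def __init__(self, max_entries, timeout_seconds, exporter):
        self.max_entries = max_entries
        self.timeout = timeout_seconds
        self.exporter = exporter  # called with a dict of finished flows
        self.flows = {}

    def add_packet(self, src, dst, src_port, dst_port, protocol, tos, input_if, size, now):
        # The flow key: packets sharing all of these fields belong to the same flow.
        key = (src, dst, src_port, dst_port, protocol, tos, input_if)
        stats = self.flows.get(key)
        if stats is None:
            if len(self.flows) >= self.max_entries:
                self.flush()  # cache full: export everything to make room
            stats = self.flows[key] = FlowStats(first_seen=now)
        stats.packets += 1
        stats.octets += size
        stats.last_seen = now

    def expire(self, now):
        """Export flows that have been idle for at least the timeout."""
        idle = {k: v for k, v in self.flows.items() if now - v.last_seen >= self.timeout}
        for key in idle:
            del self.flows[key]
        if idle:
            self.exporter(idle)

    def flush(self):
        if self.flows:
            self.exporter(self.flows)
            self.flows = {}
```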

Network monitoring solutions such as the widely acclaimed Ipswitch WhatsUp Gold include an extension for netflow analysis. With drill-down reports and real-time dashboards, you have complete visibility of your network traffic.

How to Turn on Netflow

For detailed and precise steps for turning on netflow or any of its rival derivatives, it is recommended that you refer to the manufacturer's guidance.

Network monitoring solutions passively receive netflow traffic sent to them, typically on UDP port 9999 or 9995. Some network monitoring solutions will allow you to use their existing SNMP connection to the network device to configure and enable netflow, saving you from having to find the manufacturer's instructions.
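
For a feel of what arrives on the collector's port, here is a Python sketch that decodes NetFlow version 5 export datagrams using only the standard library. It assumes version 5, whose fixed 24-byte header and 48-byte records make it the simplest format; version 9 and IPFIX use templates and would need considerably more code. Port 9995 is taken from the text above; use whatever your collector is configured for.

```python
import socket
import struct

HEADER = struct.Struct("!HHIIIIBBH")                # 24-byte NetFlow v5 header
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")  # 48-byte NetFlow v5 flow record


def parse_v5(datagram: bytes) -> list:
    """Decode one NetFlow v5 export datagram into a list of flow dictionaries."""
    version, count, *_rest = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("datagram shorter than its header claims")
    flows = []
    for i in range(count):
        (src, dst, _next_hop, input_if, output_if, packets, octets, _first, _last,
         src_port, dst_port, _pad1, _tcp_flags, protocol, tos,
         _src_as, _dst_as, _src_mask, _dst_mask, _pad2) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
            "src_port": src_port, "dst_port": dst_port, "protocol": protocol,
            "tos": tos, "input_if": input_if, "output_if": output_if,
            "packets": packets, "octets": octets,
        })
    return flows


def listen(port: int = 9995) -> None:
    """Receive export datagrams on a UDP port and print each decoded flow."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            datagram, (exporter_ip, _) = sock.recvfrom(65535)
            for flow in parse_v5(datagram):
                print(exporter_ip, flow)
```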

Benefits to Using Netflow

While some free analysers do exist, they are limited in functionality and will often restrict the number of sources, so you will be left asking whether paying for a netflow solution or plug-in is a nice-to-have or a worthwhile investment.

A number of our customers use netflow analysing features and have cited different reasons, including:

  • Understanding why network speeds would slow at particular times in the day.
  • Discovering how much traffic related to internet browsing during working hours.
  • Monitoring large file transfers or cloud destined backups during the night.
  • Understanding the makeup of traffic in the network.
  • Discovering bottlenecks which need correcting.
  • Discovering outbound routes, some of which were thought to be disused.

In all cases, our customers have been happy about the information which netflow analysis has revealed; and have been able to apply some corrective action where the result was undesired.

For more information about why you should be monitoring netflow information, check out our blog entitled "Six Reasons to Monitor Netflow with Network Traffic Analysis".

For almost any network monitoring project, a core business outcome will be proactive alerting of service outages before they take place, so that disruption can either be avoided or contained early enough to minimise it.

Therefore, the alerting and incident response capabilities of any selected or implemented network monitoring solution are of paramount importance.

Alerting functions tend to take two major forms:

  • Immediate alerting.
  • Threshold alerting.

In the case of immediate alerting, a message of some kind is sent to alert someone to a current or just-changed state. For example, if a network switch fails to reply to ICMP (ping) packets within a 60-second window, its state is assumed to be down and an immediate alert is sent.

Threshold alerting, by contrast, warns of thresholds being met, which could indicate a problem developing in the near future. An example of this might be a hard disk in a server reaching 90% capacity. The server is still operational, but it has been flagged as a device which may need some remedial action to avoid a future outage. This could also be referred to as predictive trending analysis.
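
Predictive trending can be as simple as fitting a straight line through recent usage and projecting when it crosses capacity. The Python sketch below uses ordinary least-squares linear regression; that is an assumed, deliberately simple model (real growth is often not linear), and the sample readings are hypothetical.

```python
def days_until_full(samples, capacity=100.0):
    """Fit used% = intercept + slope * day and return days from the last sample until capacity.

    Returns None when there is too little data or usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        return None
    slope = sum((day - mean_x) * (used - mean_y) for day, used in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(day for day, _ in samples)
    return (capacity - intercept) / slope - last_day


# Hypothetical daily readings for one server disk: growing 2% per day from 80%.
print(days_until_full([(0, 80.0), (1, 82.0), (2, 84.0), (3, 86.0)]))  # 7.0
```

Alerting when the projected figure drops below, say, 14 days gives the team notice well before the 90% threshold itself is reached.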

In either case, the mechanism for delivering the alert may be any of the following:

  • Email.
  • SMS.
  • Pager alert.
  • Syslog.
  • SNMP trap.
  • Write to log file.
  • IFTTT interaction.
  • Integration with a third-party solution, such as ServiceNow.
  • Post into Slack or other team chat utility.
  • Push notification on a smartphone app.

Have you come across IFTTT before? In a recent blog post, we used IFTTT to generate alerts from a network monitoring solution which can be sent to almost any internet enabled device. Read more here.

Different devices may be owned and maintained by different teams, meaning alerts must be routed to the correct people. It might also be wise to consider escalating alerts, whereby if a device remains in a state for a period after the first alert has been sent, another alert is sent by a different means or to a different recipient. For example, should a VMware host become unavailable, email the virtualisation team in the first instance. If the host remains unavailable for a further 30 minutes, send an SMS to the manager of the virtualisation team.
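
An escalation policy like the VMware example above can be expressed as an ordered list of steps, each with a delay measured from when the device went down. This Python sketch assumes the monitoring loop tracks how long the device has been down and which steps have already fired; the channel and recipient names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Step:
    after: timedelta  # how long the device must have been down before this step fires
    channel: str
    recipient: str


# Hypothetical policy: email the team straight away, SMS the manager after 30 minutes.
POLICY = [
    Step(timedelta(0), "email", "virtualisation-team"),
    Step(timedelta(minutes=30), "sms", "virtualisation-manager"),
]


def due_notifications(policy, down_for: timedelta, already_sent: set) -> list:
    """Steps whose delay has elapsed and which have not been sent yet."""
    return [step for step in policy if down_for >= step.after and step not in already_sent]


sent = set()
for minutes in (1, 10, 31):  # three checks while the host stays down
    for step in due_notifications(POLICY, timedelta(minutes=minutes), sent):
        print(f"{minutes} min: {step.channel} -> {step.recipient}")
        sent.add(step)
```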

With support teams mobilised as soon as an outage takes place, the road to resumed service should be much shorter, not to mention the pre-emptive fixes prompted by threshold-based alerts.

Where a fix is known, network monitoring solutions can become incident response tools and perform corrective actions. For example, should a known problematic Windows service stop, a network monitoring solution can detect this and restart it, resulting in minimal impact.

The following are some of the possible corrective actions:

  • Execute a PowerShell script.
  • Execute a batch file.
  • Take a VMware based action.
  • Interact with an API.
  • Restart a service.
  • Run an SSH command or an SNMP write command.

With the use of APIs or scripting, almost anything is achievable as a corrective action.
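
Mapping a known condition to a known fix can be sketched as a small runbook lookup. In this Python example the Windows `sc start` command is used to restart a service; the service name `ExampleSvc` and the runbook structure are hypothetical, and the command runner is injectable so the logic can be tried without touching a real server.

```python
import subprocess


def restart_windows_service(name: str, run=subprocess.run):
    """Ask the Windows service control manager to start a service (needs admin rights)."""
    return run(["sc", "start", name], check=False)


# Hypothetical runbook: (monitor, state) -> corrective action.
RUNBOOK = {
    ("ExampleSvc", "stopped"): lambda: restart_windows_service("ExampleSvc"),
}


def respond(monitor: str, state: str, runbook=RUNBOOK) -> str:
    """Run the known fix for this condition, or fall back to alerting only."""
    action = runbook.get((monitor, state))
    if action is None:
        return "no automated fix; alert only"
    action()
    return "corrective action executed"
```

On a Windows host, `respond("ExampleSvc", "stopped")` would run `sc start ExampleSvc`; any other condition simply falls through to the normal alerting path.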

Take another example: two servers provide a service, and one of them is accidentally taken offline. A network monitoring solution could detect this and send an SSH command to a critical router, changing the routing path away from the offline server to one which has been sitting in a cold backup site.

In today's world, it is increasingly important to maintain high levels of availability for both internal services and those which are public facing. Employees demand remote working capabilities, which is leading to an increase in non-standard working hours, and an organisation's online presence means it is expected to provide a service around the clock.

This style of hyper-availability has ultimately led to the need for hyper-resilience in the face of both cyber threats and loss of service.

So you are interested in using a network monitoring solution? Good, our article writing skills were not in vain.

The question then becomes whether to go with an off-the-shelf commercial solution or a freeware option. The freeware, open source or DIY option is a question which arises in any new project as a way of saving on cost. After all, good software solutions are not cheap, and justifying the need to senior managers can often be an art in itself.

DIY Options

A DIY build of a network monitoring solution is usually not formally planned. It just starts and evolves as your requirements dictate. Over time, it usually becomes the responsibility of a very small number of people or even a sole individual, within the organisation who becomes the owner for the home grown solution.

One thing which all organisations who adopt freeware or create their own solutions agree on is that ongoing maintenance of such solutions can consume a considerable amount of time from at least one person, and usually several. Smaller organisations have reported that, on average, one of their skilled IT operations personnel needs to spend up to 40% of their time maintaining their home-grown tool.

The cost of building the initial version of the tool is often not pre-calculated, but let’s assume that a small IT services organisation allocates one experienced IT operations engineer for 50% of his or her time to develop a solution over a period of six months (half a year). If that engineer has a £45,000 salary, the initial build cost is £11,250 (£45,000 x 0.5 x 0.5).

The cost of maintenance.

Given the constant rate of change in the technology sector, it’s reasonable to assume that up to 30% of an engineer’s time will be needed to maintain and update the DIY monitoring solution, which would include adding new functionality to meet ongoing requirements. The annual maintenance cost would therefore conservatively be £13,500 (£45,000 x 0.3) each year.

The opportunity cost.

This is usually a hidden cost that many organisations fail to factor in at all. IT services organisations and many in-house IT departments charge their customers a fixed hourly or daily rate for their qualified engineers. So again, let’s assume that one of the IT engineers is spending 30% of their time annually maintaining the in-house solution instead of providing the service they would usually provide to your customers or departments. We can make the following additional assumptions based on typical industry norms:

  • The engineer’s Daily Charge Out Rate is £325/day
  • The number of billable days per annum per engineer is 220

The opportunity cost, or “lost revenue” that your business has missed out on because your engineer is maintaining your in-house tool, is £325 x 220 x 0.3 = £21,450 per annum.

Therefore, the initial cost is £11,250 and the ongoing annual cost is £34,950 (£13,500 maintenance plus £21,450 opportunity cost).
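
The arithmetic above is easy to rerun with your own figures. This Python sketch simply parameterises the same calculation; the function and parameter names are illustrative, and the example call uses the assumptions stated in the text.

```python
def diy_costs(salary, build_share, build_years, maintenance_share, day_rate, billable_days):
    """Return (initial build cost, ongoing annual cost) for a home-grown monitoring tool."""
    build = salary * build_share * build_years
    maintenance = salary * maintenance_share
    opportunity = day_rate * billable_days * maintenance_share  # lost billable revenue
    return round(build, 2), round(maintenance + opportunity, 2)


# £45,000 salary, 50% of time for six months, 30% ongoing, £325/day over 220 billable days.
print(diy_costs(45_000, 0.5, 0.5, 0.3, 325, 220))  # (11250.0, 34950.0)
```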

This might be a palatable amount depending on the size of the organisation and the use case.

Freeware & Open Source Options

Free network monitoring tools are popular among smaller organisations who find it harder to justify IT spending.

Organisations who use freeware options tend to move to off-the-shelf commercial offerings later on, whether because they have larger budgets or because they have had a poor experience. Some of the reasons cited to us are:

  • Significant difficulty in customisation, often having to be achieved with in-depth scripting knowledge.
  • Lack of support for when things go wrong or a customisation is required.
  • Vulnerabilities remaining unresolved in the solutions long after disclosure.
  • Lack of support for new devices to be monitored.
  • Difficulties in upgrading or migrating.

Freeware and open source software might be a quick win for the finance or procurement department, however the ongoing difficulties mean that network monitoring projects are far too often abandoned.

Off-the-Shelf Commercial Software

In our opinion, commercial network monitoring software offered by industry vendors is the best option, with dedicated support, routine development of the solution and lower ongoing costs.

Network monitoring solutions are primarily licensed in one of two ways: monitor-based (sometimes called sensor-based) licensing, and device-based licensing.

Monitor based licensing is priced per monitor applied to a device. For example, if you wanted to monitor all the ports on a standard 48-port switch, you would need to factor in a cost of 48 x monitor price. The price of a monitor is typically much less but you will need to purchase more of them.

Device-based licensing takes the view that all monitors are free so long as a license is available for the device in question. Monitoring 100 devices costs 100 x the device price, and each device can be monitored with an unlimited number of monitors. In our example with the 48-port switch, the switch would consume one license and all ports would be monitored by default. Of course, this makes the price of a device license higher than a monitor license, but you will need fewer of them.

Whichever option suits best, it is important to factor in a five-year pricing plan with growth expectations to ensure that you are investing in the right tool. With many solutions offering perpetual licensing, the year-one investment is high, so a mistake in solution choice is a costly one.
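
A five-year comparison of the two licensing models can be roughed out as below. This Python sketch assumes perpetual licenses bought once as each unit is needed, a fixed number of new devices each year and a fixed number of monitors per device; all prices and counts in the example are hypothetical, and support or maintenance renewals are ignored.

```python
def five_year_license_cost(devices, new_devices_per_year, unit_price,
                           monitors_per_device=1, years=5):
    """Total perpetual-license spend over `years`.

    With device licensing, pass monitors_per_device=1 (one license per device).
    With monitor licensing, every monitor on every device needs its own license.
    """
    starting_units = devices * monitors_per_device
    units_added_per_year = new_devices_per_year * monitors_per_device
    return (starting_units + units_added_per_year * (years - 1)) * unit_price


# Hypothetical: 100 devices growing by 10 a year, about 20 monitors each.
device_based = five_year_license_cost(100, 10, unit_price=250)
monitor_based = five_year_license_cost(100, 10, unit_price=15, monitors_per_device=20)
print(device_based, monitor_based)  # 35000 42000
```

With these made-up prices device licensing comes out cheaper, but halving the monitors per device reverses the result, which is why the comparison should be run with your own numbers.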

Which common tool comes with all versions of Windows and is used to create a baseline on Windows systems?

Sysprep (System Preparation Tool). Sysprep is Microsoft's System Preparation tool, intended to duplicate, test and deliver new installations of the Windows operating system based on an established installation. It is a command-line tool that can be run manually or through a script.

Which core function is sent by the agent after the SNMP manager queries an agent with a Getrequest or Getnextrequest?

Get-response: sent by the SNMP agent in response to a get-request, get-next-request or set-request message.
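
The difference between get-request and get-next-request is easiest to see with a toy agent. In this Python sketch, which is an in-memory model and not a real SNMP implementation, get-next returns the first OID that sorts strictly after the one requested, comparing OIDs numerically arc by arc; this ordering is what lets a manager walk a table it does not know in advance. The stored values are hypothetical.

```python
import bisect


def oid_key(oid: str) -> tuple:
    """Numeric sort key: '1.3.6.1.2.1.1.10' must sort after '1.3.6.1.2.1.1.9'."""
    return tuple(int(part) for part in oid.split("."))


class ToyAgent:
    def __init__(self, values: dict):
        self.values = values
        self.oids = sorted(values, key=oid_key)
        self.keys = [oid_key(oid) for oid in self.oids]

    def get(self, oid: str):
        """get-request: the exact OID, or noSuchObject if the agent does not hold it."""
        return ("get-response", oid, self.values.get(oid, "noSuchObject"))

    def get_next(self, oid: str):
        """get-next-request: the first OID strictly after the one asked for."""
        i = bisect.bisect_right(self.keys, oid_key(oid))
        if i == len(self.oids):
            return ("get-response", oid, "endOfMibView")
        next_oid = self.oids[i]
        return ("get-response", next_oid, self.values[next_oid])


agent = ToyAgent({
    "1.3.6.1.2.1.1.1.0": "Example switch",  # sysDescr.0
    "1.3.6.1.2.1.1.3.0": 123456,            # sysUpTime.0
    "1.3.6.1.2.1.1.5.0": "core-sw1",        # sysName.0
})
print(agent.get_next("1.3.6.1.2.1.1"))  # ('get-response', '1.3.6.1.2.1.1.1.0', 'Example switch')
```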

What is SNMP protocol and how it works?

SNMP relies on a client-server application model, where a software server component (the SNMP Manager) collects information by querying a software client component (the SNMP Agent), which runs on a network device. You can also configure the SNMP Agent to send information to the manager without being queried.

What user datagram protocol UDP ports does SNMP use for secure communication?

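SNMP generally uses User Datagram Protocol (UDP) port numbers 161 and 162: port 161 for queries to the agent and port 162 for traps sent to the manager. SNMPv3 security operates over these same ports, while SNMP over TLS or DTLS (RFC 6353) uses ports 10161 and 10162.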
