Partial Truth Only Results in Assumptions
A common gripe for Network Engineers is that their current network monitoring solution doesn’t provide the depth of information needed to quickly ascertain the true cause of a network issue. Imagine reading a book that is missing 4 out of every 6 words, understanding the context will be hopeless and the book has near to no value. Many already have over-complicated their monitoring systems and methodologies by continuously extending their capabilities with a plethora of add-ons or relying on disparate systems that often don’t interface very well with each other. There is also an often-mistaken belief that the network monitoring solutions that they have invested in will suddenly give them the depth they need to have the required visibility to manage complex networks.
A best-value approach to NDR, NTA and general network monitoring is to use a flow-based analytics methodology such as NetFlow, sFlow or IPFIX.
The Misconception & What Really Matters
In this market, it’s common for the industry to express a flow software’s scaling capability in flows-per-second. Using Flows-per-second as a guide to scalability is misleading as it is often used to hide a flow collector’s inability to archive flow data by overstating its collection capability and enables them to present a larger number considering they use seconds instead of minutes. It’s important to look not only at flows-per-second but to understand the picture created once all the elements are used together. Much like a painting of a detailed landscape, the finer the brush and the more colors used will ultimately provide the complete and truly detailed picture of what was being looked at when drawing the landscape.
Granularity is the prime factor to start focusing on, specifically referring to granularity retained per minute (flow retention rate). Naturally, speed impediment is a significant and critical factor to be aware of as well. The speed and flexibility of alerting, reporting, forensic depth, and diagnostics all play a strategic role but will be hampered when confronted with scalability limitations. Observing the behavior when impacted by high-flow-variance or sudden-bursts and considering the number of devices and interfaces can enable you to appreciate the absolute significance of scalability in producing actionable insights and analytics. Not to mention the ability to retain short-term and historical collections, which provide vital trackback information, would be nonexistent. To provide the necessary visibility to accomplish the ever-growing number of tasks analysts and engineers must deal with daily along with resolving issues to completion, NDR, NTA and general Network Monitoring System (NMS) must have the ability to scale in all its levels of consumption and retention.
How Should Monitoring Solutions Scale?
Flow-Based Network Detection and Response (NDR) / Network Traffic Analysis (NTA) software needs to scale in its collection of data in five ways:
Ingestion Capability – Also referred to as Collection, means the number of flows that can be consumed by a single collector. This is a feat that most monitoring solutions are able to accomplish, unfortunately, it is also the one they pride themselves on. It is an important ability but is only the first step of several crucial capabilities that will determine the quality of insights and intelligence of a monitoring system. Ingestion is only the ability to take in data, it does not mean “retention”, and therefore could do very little on its own.
Digestion Capability – Also referred to as Retention, means the number of flow records that can be retained by a single collector. The most overlooked and difficult step in the network monitoring world. Digestion / Flow retention rates are particularly critical to quantify as they dictate the level of granularity that allows a flow-based NMS to deliver the visibility required to achieve quality Predictive AI Baselining, Anomaly Detection, Network Forensics, Root Cause Analysis, Billing Substantiation, Peering Analysis, and Data Retention compliance. Without retaining data, you cannot inspect it beyond the surface level, losing the value of network or cloud visibility.
Multitasking Processes– Pertains to the multitasking strength of a solution and its ability to scale and spread a load of collection processes across multiple CPUs on a single server. This seems like an obvious approach to the collection but many systems have taken a linear serial approach to handle and ingest multiple streams of flow data that don’t allow their technologies to scale when new flow generating devices, interfaces, or endpoints are added forcing you to deploy multiple instances of a solution which becomes ineffective and expensive.
Clustered Collection – Refers to the ability of a flow-based solution to run a single data warehouse that takes its input from a cluster of collectors as a single unit as a means to load balance. In a large environment, you mostly have very large equipment that sends massive amounts of data to collectors. In order to handle all that data, you must distribute the load amongst a number of collectors in a cluster to multiple machines that make sense of it instead of a single machine that will be overloaded. This ability enables organizations to scale up in data use instead of dropping it as they attempt to collect it.
Hierarchical Correlation – The purpose of Hierarchical correlation is to take information from multiple databases and aggregate them into a single Super SIEM. With the need to consume and retain huge amounts of data, comes the need to manage and oversee that data in an intelligent way. Hierarchical correlation is designed to enable parallel analytics across distributed data warehouses to aggregate their results. In the field of network monitoring, getting overwhelmed with data to the point where you cannot find what you need is a as useful as being given all the books in the world and asked a single question that is answered in only one.
Network traffic visibility is considerably improved by reducing network blindspots and providing qualified sources and reasons of communications that impair business continuity.The capacity to capture flow at a finer level allows for new Predictive AI Baselining and Machine Learning application analysis and risk mitigation.
There are so many critical abilities that a network monitoring solution must enable its user, all are affected by whether or not the solution can scale.
Visibility is a range and not binary, you do not have or don’t have visibility, its whether you have enough to achieve your goals and keep your organization productive and safe.