The proprietary Cisco NetFlow protocol was invented circa 1990. It was designed to collect network traffic metadata, i.e., data about data. NetFlow thus collects information on network protocols, ports, and other activity, but does not collect the actual communication content that is traversing the network. This network management data, which may be used for troubleshooting and otherwise fine-tuning network performance, is acquired by analyzing network flows.
What Is a Flow?
By definition, a network flow is a series of packets that share the same source and destination IP addresses, source and destination ports, and IP protocol. The typical flows found on a network, and which comprise the primary source of network traffic for examination, are TCP, UDP, and ICMP flows. The term flow may also have the connotation of an aggregate of individual flows, and a flow record may be considered a summary of information about a flow. This record may outline which hosts have communicated with another specific host, and may include temporal data as to when the communication took place. In addition, the record may indicate by which means the data was transmitted, and provide other basic information that is germane to the host communications.
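The definition above may be illustrated with a minimal Python sketch that groups packets by their five-tuple and summarizes each group as a flow record. The Packet type, its field names, and the sample addresses are hypothetical simplifications chosen for illustration; they do not correspond to any real capture or export format.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, simplified packet summary (not a real capture format)."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int


def summarize_flows(packets):
    """Group packets by five-tuple and return one summary record per flow.

    Assumes the packets are supplied in chronological order.
    """
    flows = defaultdict(
        lambda: {"packets": 0, "bytes": 0, "first": None, "last": None}
    )
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        record = flows[key]
        record["packets"] += 1
        record["bytes"] += p.size
        if record["first"] is None:
            record["first"] = p.timestamp
        record["last"] = p.timestamp
    return dict(flows)


packets = [
    Packet(0.00, "192.0.2.10", "198.51.100.5", 49152, 80, "TCP", 60),
    Packet(0.02, "198.51.100.5", "192.0.2.10", 80, 49152, "TCP", 1500),
    Packet(0.05, "192.0.2.10", "198.51.100.5", 49152, 80, "TCP", 400),
    Packet(0.10, "192.0.2.10", "203.0.113.53", 53000, 53, "UDP", 72),
]

flow_records = summarize_flows(packets)
for key, record in flow_records.items():
    print(key, record)
```

Because the key includes both source and destination, the two directions of the TCP conversation appear as two separate flows in this sketch, so the four packets yield three flow records.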
Flow System Architecture
A typical flow-based management system comprises three components: a sensor, a collector, and a reporting system. The sensor, which may also be known as a probe, is a device or program that acquires the network traffic data and forwards it to the collector. A sensor may be a switch, router, or firewall that has flow export capability. Alternatively, a sensor may be a software application that is monitoring an Ethernet tap, while in other scenarios it may be a switch port set to monitor mode.
Ultimately, a sensor will take note of the status of network connections, and when it perceives that a connection has terminated or timed out, the sensor will transmit the data to the collector. Ideally, if system resources permit, we should attempt to record as much traffic as possible. Failing that, we should acquire as large a sample set as possible to facilitate troubleshooting and the production of accurate flow analysis reports.
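The export behavior described above may be sketched as a small flow cache. This is an illustration under stated assumptions rather than the logic of any particular sensor: the FlowCache class, the tcp_fin flag, and the 15-second idle timeout are hypothetical, and the exported list merely stands in for transmission to the collector.

```python
class FlowCache:
    """Toy sensor-side cache: tracks active flows and exports finished ones."""

    def __init__(self, idle_timeout=15.0):
        self.idle_timeout = idle_timeout  # hypothetical value, in seconds
        self.active = {}    # five-tuple -> packet/byte counters and last-seen time
        self.exported = []  # stands in for records sent to the collector

    def observe(self, key, size, now, tcp_fin=False):
        """Account for one packet; export at once if it ends the connection."""
        record = self.active.setdefault(
            key, {"packets": 0, "bytes": 0, "last_seen": now}
        )
        record["packets"] += 1
        record["bytes"] += size
        record["last_seen"] = now
        if tcp_fin:
            self._export(key)

    def expire(self, now):
        """Export every flow that has been idle for at least idle_timeout."""
        idle = [
            key for key, record in self.active.items()
            if now - record["last_seen"] >= self.idle_timeout
        ]
        for key in idle:
            self._export(key)

    def _export(self, key):
        self.exported.append((key, self.active.pop(key)))


cache = FlowCache(idle_timeout=15.0)
web = ("192.0.2.10", "198.51.100.5", 49152, 80, "TCP")
dns = ("192.0.2.10", "203.0.113.53", 53000, 53, "UDP")

cache.observe(web, 60, now=0.0)
cache.observe(dns, 72, now=1.0)
cache.observe(web, 40, now=2.0, tcp_fin=True)  # connection terminated
cache.expire(now=10.0)  # DNS flow idle for 9 s: still active
cache.expire(now=20.0)  # DNS flow idle for 19 s: timed out
print(len(cache.exported), len(cache.active))
```

The TCP flow is exported when its terminating packet is seen, while the UDP flow, which has no notion of termination, is exported only once it exceeds the idle timeout.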
As stated, the collector will receive records from the sensor and write them to the hard disk, though there is no official standard as to the disk format for storing flow records. The collector runs on a UNIX-like operating system and will function on virtually any modern UNIX variant, though a BSD variant is typically recommended. As a caveat, some commercial UNIX systems such as AIX, HP-UX, and SunOS have been reported to introduce errors and complications to flow analysis implementation (Lucas, 2010). As to system resources, flow collection will utilize very little other than disk space, though additional memory and CPU resources will enhance flow reporting.
The reporting system will read the collector files, and then generate reports for the network administration personnel to examine. As such, the reporting system must be capable of understanding the file format of the collector, whose data is typically exported from the network devices in Cisco NetFlow format. Best practice dictates that the flow data be stored in a database. As an example, the publisher of “ntop” utilizes a Perl script that will read NetFlow version 5 data from the system, which is then stored in an open-source MySQL database management system (Zhenqi & Xinyu, 2008). Optimally, a good flow analysis system will provide us with the ability to collect flow information, and a good reporting system will enable us to search, filter, and print actionable flow information.
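To illustrate how a database makes flow data searchable and filterable, the following sketch stores a few flow records and asks which source hosts sent the most bytes. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL; the table layout, column names, and sample records are assumptions made for illustration and are not the schema used by ntop.

```python
import sqlite3

# In-memory database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flows (
           src_ip TEXT, dst_ip TEXT, src_port INTEGER, dst_port INTEGER,
           protocol INTEGER, packets INTEGER, bytes INTEGER)"""
)

# Hypothetical flow records: (src, dst, sport, dport, IP protocol, packets, bytes)
sample_flows = [
    ("192.0.2.10", "198.51.100.5", 49152, 80, 6, 12, 9000),
    ("192.0.2.11", "198.51.100.5", 49153, 443, 6, 30, 42000),
    ("192.0.2.10", "203.0.113.53", 53000, 53, 17, 1, 72),
]
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?, ?)", sample_flows)

# Report: total bytes per source host, largest first.
top_talkers = conn.execute(
    """SELECT src_ip, SUM(bytes) AS total_bytes
       FROM flows
       GROUP BY src_ip
       ORDER BY total_bytes DESC"""
).fetchall()
print(top_talkers)
```

The same table could be filtered by port, protocol, or destination with an ordinary WHERE clause, which is the kind of search and filter capability a reporting system is expected to offer.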
There have been a number of NetFlow iterations throughout the history of the Cisco proprietary application. In this section, we will discuss them, as well as provide a cursory overview of the open standard IPFIX.
NetFlow Version 1
The first NetFlow iteration from Cisco was Version 1, which was quickly reverse-engineered by vendors and sold as their own proprietary flow analysis protocol. This version may still be found on a number of older devices, and is still offered by a few vendors. When considering its implementation, NetFlow Version 1 may be adequate for some use cases where minimal flow information is required.
NetFlow Version 5
Of the NetFlow versions, Version 5 is the oldest widely deployed flow record format. This version can report on source and destination IP addresses, source and destination ports for TCP and UDP, IP protocol, the interface a flow arrived on, as well as the IP type of service. NetFlow Version 5 will also report on BGP information, the exporter IP address, and additional network traffic features. Although there have been advancements in flow record reporting, NetFlow Version 5 may still fit the bill for most network entities.
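The fields listed above may be seen in a minimal parsing sketch. It assumes the commonly documented Version 5 export layout, a 24-byte header followed by fixed 48-byte records in network byte order; the function name parse_v5, the dictionary keys, and the sample values are our own, and only a subset of the fields is returned.

```python
import socket
import struct

# Assumed NetFlow v5 layout (network byte order):
# header: version, count, sys_uptime, unix_secs, unix_nsecs,
#         flow_sequence, engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")
# record: srcaddr, dstaddr, nexthop, input, output, packets, octets,
#         first, last, srcport, dstport, pad, tcp_flags, prot, tos,
#         src_as, dst_as, src_mask, dst_mask, pad
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_v5(datagram):
    """Decode one NetFlow v5 export datagram into a list of dictionaries."""
    version, count, *_ = V5_HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version={version})")
    records = []
    for i in range(count):
        offset = V5_HEADER.size + i * V5_RECORD.size
        (src, dst, nexthop, in_if, out_if, pkts, octets, first, last,
         sport, dport, tcp_flags, proto, tos,
         src_as, dst_as, src_mask, dst_mask) = V5_RECORD.unpack_from(datagram, offset)
        records.append({
            "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
            "sport": sport, "dport": dport, "protocol": proto, "tos": tos,
            "input_if": in_if, "packets": pkts, "bytes": octets,
            "src_as": src_as, "dst_as": dst_as,
        })
    return records


# Build a sample datagram with one hypothetical record, then parse it back.
header = V5_HEADER.pack(5, 1, 1000, 1700000000, 0, 1, 0, 0, 0)
record = V5_RECORD.pack(
    socket.inet_aton("192.0.2.10"), socket.inet_aton("198.51.100.5"),
    socket.inet_aton("192.0.2.1"), 1, 2, 12, 9000, 100, 900,
    49152, 80, 0x1B, 6, 0, 64500, 64501, 24, 24,
)
parsed = parse_v5(header + record)
print(parsed[0])
```

The fixed record size is what makes Version 5 simple to process: a collector can locate any record by arithmetic alone, at the cost of being unable to carry fields the format did not anticipate.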
NetFlow Version 7
NetFlow Version 7 is typically found in high-end Cisco Catalyst switches. Its flow record format provides switching and routing information, which is lacking in Version 5. Key among the information that it does not provide is reporting on the next hop IP address of the flow.
NetFlow Version 8
Cisco is the only provider of this rarely used version. NetFlow Version 8 provides a mixture of formats that will aggregate information. This mixture may be useful in environments that consist of multiple high-bandwidth connections, and where care must be taken to reduce the quantity of system resources that are allocated to flow record collection and analysis.
NetFlow Version 9
NetFlow Version 9 is extensible, and as such, it permits the addition of third-party information by other vendors (Claise, 2004). It is template based, and while it is the first version by Cisco to support IPv6, it is also the last iteration of Cisco’s proprietary flow reporting software.
The Internet Engineering Task Force (IETF) attempted in the early 2000s to outline a definitive flow format standard, and to halt the existing flow format fragmentation amongst vendors. The IETF utilized NetFlow Version 9 as the foundation upon which it developed its new standard. As of this writing, IP Flow Information eXport (IPFIX) is the current flow format standard, with vendors providing IPFIX functionality, as well as backwards compatibility with older versions. While the standard is an advancement, it is not universally deployed, as it is more complicated to implement and requires the use of greater system resources.
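The idea of a template-based format may be sketched as follows: a template lists the type and length of each field, and a data record is decoded by walking that list. This is a deliberately reduced illustration, not a Version 9 parser; it omits the packet header, FlowSet framing, and template IDs. The field type numbers are those we understand RFC 3954 to assign, while the function name and sample values are hypothetical.

```python
import ipaddress

# A few field types as we understand them from RFC 3954.
FIELD_NAMES = {
    1: "IN_BYTES", 2: "IN_PKTS", 4: "PROTOCOL",
    7: "L4_SRC_PORT", 8: "IPV4_SRC_ADDR",
    11: "L4_DST_PORT", 12: "IPV4_DST_ADDR",
}
ADDRESS_FIELDS = {8, 12}


def decode_with_template(template, payload):
    """Decode one data record using a list of (field_type, length) pairs."""
    record, offset = {}, 0
    for field_type, length in template:
        raw = payload[offset:offset + length]
        if len(raw) != length:
            raise ValueError("payload shorter than the template describes")
        # Unknown field types are kept under a generic name, not rejected.
        name = FIELD_NAMES.get(field_type, f"field_{field_type}")
        if field_type in ADDRESS_FIELDS:
            record[name] = str(ipaddress.IPv4Address(raw))
        else:
            record[name] = int.from_bytes(raw, "big")
        offset += length
    return record


# Hypothetical template announced by an exporter, and one matching record.
template = [(8, 4), (12, 4), (7, 2), (11, 2), (4, 1), (1, 4)]
payload = (
    ipaddress.IPv4Address("192.0.2.10").packed
    + ipaddress.IPv4Address("198.51.100.5").packed
    + (49152).to_bytes(2, "big")
    + (443).to_bytes(2, "big")
    + bytes([6])
    + (9000).to_bytes(4, "big")
)
decoded = decode_with_template(template, payload)
print(decoded)
```

Because the decoder is driven by the template rather than a fixed layout, a vendor may add a new field type without breaking collectors, which is the extensibility referred to above.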
Comparing SNMP to NetFlow
We would be remiss in our discussion if we did not compare and contrast the features and usage of the application-layer Simple Network Management Protocol (SNMP) with NetFlow.
SNMP surpasses NetFlow for real-time usage, as it may provide granular reporting every second, as opposed to the minute-by-minute aggregate reporting of NetFlow. By default, SNMP utilizes UDP port 161 for general communication and port 162 for TRAP/INFORM messages. SNMP “traps”, or notifications, afford the ability to send data to an administrator when one or more conditions have occurred. Traps consist of network packets that contain data on the system component sending the trap, and may be statistical in nature or status related. NetFlow is more verbose and takes up more disk space, as it provides information on who is consuming bandwidth and with what protocol. SNMP, in contrast, may collect CPU and memory utilization, which NetFlow cannot (Patterson, 2014).
As shown in Fig. 1, the architecture of SNMP consists of three components: the managed device, the agent, and the network management station (NMS). The NMS is software that is responsible for communication with the SNMP agent and the managed device. The NMS is tasked with querying the agent, receiving agent responses, setting agent variables, and responding to asynchronous events from agents. The managed devices are typical network devices, such as routers, switches, servers, workstations, printers, UPS units, and the like.
The SNMP agent is a program that will collect data from the management information base (MIB) for processing by the SNMP manager when it is queried. The agents may utilize the open-source Net-SNMP, or a vendor’s proprietary agent. As may be deduced, agents collect local environment management data, store and retrieve management data as defined by the MIB, and notify the manager of an event.
Alternative Network Flow Analysis Applications
Many network equipment vendors and operating systems, such as the UNIX variant BSD, may utilize NetFlow to track network protocols, ports, and other activity. Though NetFlow is a very useful and widely deployed tool, in this section we will examine other applications that may perform network flow analysis.
NetFlow may be utilized to provide a report on flow records based on sampling the traffic traversing a link. However, there may be deficiencies in these reports, as router memory and network bandwidth may not be sufficient during flooding attacks. That issue, along with the difficulty of determining an accurate static sampling rate, since no single rate affords the correct balance between memory and accuracy across varied traffic mixes, is illustrated in Fig. 2. Also, heuristic routing, which uses algorithms to determine a good, though perhaps not optimal, path to a destination, may not be the wisest choice for use with time bins. Flows in NetFlow records may span these bins, the fixed intervals of time that divide a traffic stream, which may introduce inaccuracy and complexity. Finally, it is extremely difficult to measure the number of active flows of aggregates with non-TCP traffic.
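The tension between memory and accuracy under a static sampling rate may be illustrated with a toy example. The sketch below uses deterministic 1-in-N sampling and scales each sampled packet by N to estimate per-flow byte counts. It is a simplified illustration only, not the algorithm of NetFlow or of any published alternative, and the traffic mix is hypothetical.

```python
def sample_one_in_n(packets, n):
    """Deterministic 1-in-N sampling; returns estimated bytes per flow.

    Each sampled packet is scaled by n to stand for the n packets it represents.
    """
    estimates = {}
    for index, (flow_key, size) in enumerate(packets):
        if index % n == 0:
            estimates[flow_key] = estimates.get(flow_key, 0) + size * n
    return estimates


# Hypothetical mix: one large flow "A" (90 packets of 1000 bytes, 90000 bytes
# in total) followed by ten single-packet flows of 100 bytes each.
packets = [("A", 1000)] * 90 + [(f"small-{i}", 100) for i in range(10)]

results = {}
for n in (1, 10, 50):
    estimates = sample_one_in_n(packets, n)
    # (number of flow entries kept in memory, estimated bytes for flow "A")
    results[n] = (len(estimates), estimates.get("A", 0))
    print(n, results[n])
```

As the sampling interval grows, fewer flow entries need to be held in memory, but the small flows vanish from the report and the estimate for the large flow drifts from its true value. Which interval is acceptable depends on the traffic mix, which is why a single static rate is difficult to choose.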
To mitigate these issues, Estan, Keys, Moore, and Varghese (2004) have proffered an alternative to NetFlow, known as Adaptive NetFlow. Their alternative may be deployed as a router software update to dynamically vary sampling without sacrificing accuracy. Furthermore, to address the issue of measuring non-TCP flows, they propose an optional Flow Counting Extension that requires augmenting existing hardware at routers.
The open-source application Nagios has also proven to be very useful for network flow analysis and monitoring network problems. Originally, Nagios was created for implementation with Linux distributions, but it may also be utilized with UNIX variants. We may configure Nagios to execute assessments on specific resources and services, such as memory usage, disk usage, CPU load, active processes, and log files (Barth, 2008). As to services, we may use it to monitor SMTP, POP3, HTTP, and other common network protocols. Nagios may be used with a web-based GUI to monitor system temperature, humidity, or barometric pressure, and may be configured to restrict access to only authorized users.
The open-source Cacti application utilizes SNMP to produce graphs of the network traffic that traverses between device interfaces, and will store the outcomes in fixed-size databases. Cacti was created as a front-end GUI for the data logging tool RRDtool (Urban, 2011). Typically, it will graph metrics such as CPU load and network bandwidth utilization, and may be configured for multiple users, replete with their own individual graph sets.
The Network Pseudo Device pflow
Within the UNIX variant OpenBSD, there had been NetFlow sensors and collectors prior to the introduction of the network pseudo device pflow in OpenBSD 4.5. For its functionality, pflow takes advantage of the OpenBSD Packet Filter (PF), which filters TCP/IP traffic and performs Network Address Translation. To accomplish this, the state option pflow is added to the PF rules for which we desire to collect NetFlow data. The pflow interface will then export pflow data from the OpenBSD kernel by means of UDP packets. The pflow interface is capable of working with NetFlow Version 5, as well as IPFIX.
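Since the export travels as UDP datagrams, the receiving side may be sketched in a few lines of Python. This is a minimal illustration of what a collector does on arrival, not a substitute for a real collector. It relies on the export version occupying the first two bytes of each datagram, with IPFIX identifying itself as version 10; the function names, the listening port 9995, and the datagram limit are assumptions chosen for illustration.

```python
import socket
import struct

VERSION_NAMES = {5: "NetFlow v5", 9: "NetFlow v9", 10: "IPFIX"}


def classify_export(datagram):
    """Identify the export format from the 16-bit version field."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to carry a version field")
    (version,) = struct.unpack_from("!H", datagram, 0)
    return VERSION_NAMES.get(version, f"unknown ({version})")


def listen(host="0.0.0.0", port=9995, max_datagrams=10):
    """Receive a few export datagrams and report what each one carries.

    The port is an assumption; it must match the destination configured
    on the exporting interface.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        for _ in range(max_datagrams):
            datagram, (sender, _sender_port) = sock.recvfrom(65535)
            print(sender, classify_export(datagram), len(datagram), "bytes")


# Offline check with hand-built datagrams; listen() is not started here.
print(classify_export(struct.pack("!H", 5) + bytes(22)))
print(classify_export(struct.pack("!H", 10) + bytes(14)))
```

Calling listen() on the collector host would print one line per datagram received from the exporter, which offers a quick way to confirm that exports are arriving before a full collector is configured.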
However, as with all NetFlow analysis, we require a sensor and a collector. In this instance, the sensor is the pflow interface, which may be configured temporarily through ifconfig commands, or permanently through the /etc/hostname.pflow0 interface configuration file. We will then need to set up and configure NFdump as our NetFlow collector (Hansteen, 2014). This installation will also install nfcapd, nfdump, nfreplay, nfexpire, nftest, and nfgen. Finally, we will install NfSen, the graphical web-based front end for NFdump, by invoking the command $ sudo pkg_add nfsen. We then edit the configuration file /etc/nfsen.conf, after which we may collect and analyze NetFlow data with OpenBSD.