Visualizing Network Traffic

Manny Lara
Apr 13, 2019

I. Introduction

When you think of traffic analysis, Wireshark comes to mind. Wireshark is a great tool for packet capture and analysis, as it provides everything you need to be effective. However, to those picking it up for the first time, it can be intimidating. Even just scrolling through all the packets can be overwhelming if you don’t know what you’re doing. Personally, we’ve never used Wireshark to its full potential; as a matter of fact, we don’t know anyone who has. At most, we’ve used it to look up a few things; we’ve never done serious work with it.

To that end, a script isn’t a bad idea if you only need general information. We think most people have dealt with scripts more than they’ve dealt with Wireshark, so why not streamline the process with one? This paper outlines our approach to simplifying packet analysis without the hassle of learning how to use Wireshark.

II. Prerequisites

The script we wrote requires general knowledge of Python, because it is somewhat incomplete and may need to be updated in the future. Beyond that, a basic understanding of the language will do. Some knowledge of Python packages would be great, but you can learn those on the fly.

You also need some understanding of traffic analysis. None of us are pros here, but it’s important to know what you’re looking for; the script is only meant to expedite the process. For the scope of this project, you should understand several pieces of information: source IPs, destination IPs, protocols, source ports, and destination ports.

III. Python

Learning Python isn’t hard, which is part of why it’s popular. You can get a lot done with very little code, and that’s exactly what we took advantage of. The most complex construct in this script is the function, which, depending on who you talk to, is relatively simple.

At the start of the project, we wanted to make a CLI because we thought that made more sense, but after several failures and with the deadline getting closer, we decided a script would suffice.

To get a bit more in depth, we used a few packages to simplify the process. The packages were helpful because they allowed us to keep the code clean and not worry about functionality that would otherwise take too long to implement.

A. Scapy is a packet manipulation tool that makes the daunting task of processing PCAP files easy.

B. Matplotlib for graphical representation

C. Pandas for data manipulation and analysis

IV. Traffic Analysis

A. The network packet

The packet contains all the information that we need. It carries source IPs, destination IPs, source ports, destination ports, protocols, payloads, and more. It is arguably the most important piece of this project.

B. System details

When you begin to iterate through the PCAP, you will come across a lot of information, some useful, some not so much. You need to be familiar with some terminology relating to browsers, operating systems, websites, languages, and IP addresses.

C. Source & destination ports

Another important aspect is understanding ports and services. By convention, certain services run on certain ports; in reality, however, any service can run on any port. Some ports to take note of are:

21, 22, 23, 25, 53, 80, 143, 194, 443
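
For reference, the conventional services behind those port numbers can be kept in a small lookup table. This is a minimal sketch; `WELL_KNOWN_PORTS` and `service_name` are hypothetical names used only for illustration, and the mapping reflects convention rather than a guarantee about what is actually running.

```python
# Conventional service for each port listed above.
WELL_KNOWN_PORTS = {
    21: "FTP",
    22: "SSH",
    23: "Telnet",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    143: "IMAP",
    194: "IRC",
    443: "HTTPS",
}


def service_name(port):
    """Return the conventional service for a port, or 'unknown'."""
    return WELL_KNOWN_PORTS.get(port, "unknown")


for port in (22, 443, 8080):
    print(port, service_name(port))
```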

D. Source & destination IPs

Being able to recognize these is half the battle. Being able to turn these into something useful is the other half. These will tell you who went where.
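
One possible way to turn addresses into "who went where" is to count how often each source and destination pair appears. The sketch below uses only the standard library; the `flows` list is made-up sample data standing in for pairs pulled from a capture.

```python
from collections import Counter

# Hypothetical (source IP, destination IP) pairs, one per packet.
flows = [
    ("10.0.0.5", "10.0.0.1"),
    ("10.0.0.5", "93.184.216.34"),
    ("10.0.0.7", "10.0.0.1"),
    ("10.0.0.5", "93.184.216.34"),
]

# Count how many packets each pair exchanged, busiest first.
conversations = Counter(flows)
for (src, dst), packets in conversations.most_common():
    print(f"{src} -> {dst}: {packets} packets")

top_pair, top_count = conversations.most_common(1)[0]
```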

E. Data Processing

This is where the analysis comes into play. With all the information we harvested, we now need to turn it into something understandable; after all, that’s the reason we’re writing this script instead of using Wireshark. For this project, however, we will only look at general data.

V. Implementation

A. The first couple of lines of the script are the imports. As previously mentioned, we’re taking advantage of the packages available in Python. This makes the code shorter, cleaner, and ultimately easier to understand and modify yourself.

  1. Scapy: this library handles everything to do with the PCAP file. From capturing, to reading, to manipulating, this package is the backbone of the entire project. Today, packets are mostly captured with packet sniffers, which gather data from the network.
  2. PrettyTable: this library helps visualize our data as a table. It is used together with the other visuals.
  3. Matplotlib: this library helps us represent the data visually.
  4. Pandas: this library gives our data structure and makes it easier to manipulate.

B. Functions are what make the wheels turn. They will produce the outcome as long as we provide the right files and call them at the appropriate time.

  1. printFrame( suppliedPCAP ): this function takes a PCAP file as a parameter and showcases the layers of the packet.
  2. packetInfo( suppliedPCAP ): iterates through each packet and prints out certain elements on the IP frame. In the example, the protocol, source IP and destination IP are printed.
  3. pandasFrame( suppliedPCAP ): this function starts by creating lists for all the data we want to look at: source IP, destination IP, protocol, source port, and destination port. From there, it iterates through the PCAP, adding data to the appropriate lists. When it finishes, we add all that data to a dictionary called ‘packetDetails’. Lastly, we feed the dictionary into the pandas DataFrame constructor to get the data frame.
  4. pTable( suppliedPCAP ): this function uses the PrettyTable library. First, it creates a list to store every source IP in the capture. Next, it creates a dictionary using Counter from Python’s collections module. We then iterate through the data frame series, adding each source IP as a key and incrementing its count. The most_common() method returns the entries sorted from high to low, and we iterate through them, adding each entry to our table.
  5. plot( suppliedPCAP ): it starts by calling pTable() and catching the returned ‘count’ dictionary. Next, ‘count’ is parsed and its keys and values are split into separate lists. From there, the bar plot is constructed and shown.

VI. Future plans

In the future, we’d like to add the ability to plot the rest of the data, such as destination IPs, source ports, and destination ports. In addition, we’d like to see if there’s an option to plot them all on the same graph and switch views on the fly. We think this type of data visualization is important because it helps you construct a clear image of what the traffic is doing and where it’s going.

On top of that, it tells a story about the traffic. A picture of what the packet capture looks like in Wireshark was included, and although Wireshark is a great resource, you can’t tell what’s going on at a glance. With the data visualization in this script, you’re able to see what’s going on right off the bat. Also included was a pandas data frame to help with the data structure, so if more is to be added in the future, it can be done simply by parsing the data frame.
