
Visualizing Network Data Using Python: Part 3


If you have read Part 1 and Part 2 of the Visualizing Network Data with Python posts, you have probably noticed one missing piece: graphing data over time. At first this might seem very easy: just create a list of timestamps and bytes and you are done. The problem is that you will have thousands of packets, which are hard to view, and most of them will be at or close to your MTU (usually 1500 bytes). It would look something like this and not be very helpful:



So the next thought is to group the data into bins of a set time, which is easy to do if you use the Pandas library in your code. The first step is to install Pandas:

pip3 install pandas
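
Before handing the work to Pandas, it may help to see what binning actually does. The sketch below is a standard-library illustration only, not part of the original program: the function name bin_bytes and the sample packets are made up for the example. It floors each epoch timestamp to the start of its bin and sums the bytes that land there.

```python
from collections import defaultdict

def bin_bytes(samples, width=2):
    """Sum byte counts into fixed-width time bins.

    samples: iterable of (epoch_seconds, byte_count) pairs (hypothetical input shape).
    width:   bin size in seconds.
    Returns a dict mapping each bin's start time to its byte total.
    """
    totals = defaultdict(int)
    for ts, size in samples:
        bin_start = int(ts // width) * width  # floor to the start of the bin
        totals[bin_start] += size
    return dict(sorted(totals.items()))

# Hypothetical packets: (timestamp, bytes)
samples = [(10.2, 1500), (11.9, 1500), (12.1, 60), (15.0, 1500)]
binned = bin_bytes(samples)
print(binned)
```

Note that this sketch simply leaves out bins that received no packets, whereas the Pandas resample used later in this post also reports the empty bins.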

Building on our last example, you will import Scapy, Plotly, and datetime along with Pandas.

from scapy.all import *
import plotly
from datetime import datetime
import pandas as pd

Recall that the first few steps are to read the PCAP using Scapy and then loop through the file, adding bytes and timestamps to lists. In Scapy, the bytes of an IP packet are accessed with pkt[IP].len and the time with pkt.time. Times are in Unix Epoch, which I like to convert to a string using strftime so I can easily print them using PrettyTable or other tools.

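#Read the packets from file
packets = rdpcap('example.pcap')

#Lists to hold packet info
pktBytes=[]
pktTimes=[]

#Read each packet and append to the lists.
for pkt in packets:
    if IP in pkt:
        try:
            #float() keeps fromtimestamp working when Scapy returns a Decimal-based time
            pktTime=datetime.fromtimestamp(float(pkt.time))
            #Append both values together so the two lists stay the same length
            pktBytes.append(pkt[IP].len)
            pktTimes.append(pktTime.strftime("%Y-%m-%d %H:%M:%S.%f"))
        except Exception:
            #Skip packets whose length or time cannot be read
            pass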
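
To make the time conversion easier to check on its own, here is a small sketch of the epoch-to-string step. The helper name epoch_to_string is hypothetical, and unlike the loop above, which uses your local time zone, this version pins the result to UTC so the output is the same on any machine.

```python
from datetime import datetime, timezone
from decimal import Decimal

def epoch_to_string(epoch):
    """Format an epoch time as 'YYYY-MM-DD HH:MM:SS.ffffff' in UTC."""
    # float() accepts int, float, or Decimal inputs alike
    moment = datetime.fromtimestamp(float(epoch), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")

# A made-up epoch value, passed as a Decimal to mimic a non-float timestamp
stamp = epoch_to_string(Decimal("1500000000.25"))
print(stamp)
```

The %f directive keeps the microseconds, which matters later because several packets often arrive within the same second.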

Next we start using Pandas. In Pandas the key element is a data frame. Our data frame is made of the bytes and timestamps. Bytes are a series created from the list we built earlier. We do that by using pd.Series with astype(int), so the values are stored as integers rather than strings.

#This converts list to series
bytes = pd.Series(pktBytes).astype(int)

You will then convert the list of times to a Pandas datetime series. You will use to_datetime with the option errors='coerce', which turns any value that cannot be parsed into NaT instead of raising an error.

#Convert the timestamp list to a pd date_time
times = pd.to_datetime(pd.Series(pktTimes).astype(str), errors='coerce')
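
As a quick illustration of what errors='coerce' does, the sketch below feeds to_datetime one good timestamp and one deliberately bad string. Both values are made up for the example.

```python
import pandas as pd

# One valid timestamp string and one that cannot be parsed (both hypothetical)
raw = pd.Series(["2017-07-14 02:40:00.250000", "not a timestamp"])

# With errors="coerce" the bad value becomes NaT instead of raising an error
times = pd.to_datetime(raw, errors="coerce")

print(times)
print(times.isna().tolist())
```

Counting the NaT values with times.isna().sum() is an easy way to find out how many timestamps failed to parse.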

Now you create the dataframe with the elements.

#Create the dataframe
df = pd.DataFrame({"Bytes": bytes, "Times": times})

Then you will use the Times column as the index.

#Set the index from a range to the timestamps
df = df.set_index('Times')

If you want to do a little troubleshooting, you have some options. To print the data, simply issue a print(df). Or issue a print(df.describe()) to see summary statistics, or df.info() to see the types of data. I also usually do a print(df.tail()) to see just the last few lines of the data.
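
One more check that can save time: resampling only works on a time-based index, so it is worth confirming the index type before going further. The sketch below uses three made-up packets and hypothetical variable names to show the idea.

```python
import pandas as pd

# Made-up packet sizes and timestamps standing in for the lists built earlier
sizes = pd.Series([1500, 60, 1500]).astype(int)
times = pd.to_datetime(pd.Series([
    "2017-07-14 02:40:00.100000",
    "2017-07-14 02:40:00.900000",
    "2017-07-14 02:40:03.500000",
]), errors="coerce")

df = pd.DataFrame({"Bytes": sizes, "Times": times}).set_index("Times")

# The index should be a DatetimeIndex with no missing (NaT) entries
is_time_index = isinstance(df.index, pd.DatetimeIndex)
missing_times = int(df.index.isna().sum())

print(df.dtypes)
print(is_time_index, missing_times)
```

If missing_times is not zero, some timestamps were coerced to NaT and are worth a closer look before you trust the graph.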

We still haven't binned the data. To do that we will create a new dataframe using resample(timePeriod). This example bins the data into 2 second bins, summing the bytes in each. You can also take an average using .mean().

#Create a new dataframe of 2 second sums to pass to plotly
df2=df.resample('2S').sum()
print(df2)
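
To see how the sum and the mean differ, here is a small sketch with four made-up packets, one of which arrives after a gap. It uses the lowercase "2s" alias, on the assumption that you are running a recent Pandas release, where the uppercase form is deprecated.

```python
import pandas as pd

# Four hypothetical packets; nothing arrives between 02:40:04 and 02:40:06
times = pd.to_datetime([
    "2017-07-14 02:40:00.500000",
    "2017-07-14 02:40:01.500000",
    "2017-07-14 02:40:02.100000",
    "2017-07-14 02:40:07.000000",
])
df = pd.DataFrame({"Bytes": [100, 200, 300, 400]}, index=times)

sums = df.resample("2s").sum()    # total bytes per 2 second bin
means = df.resample("2s").mean()  # average packet size per 2 second bin

print(sums)
print(means)
```

The empty bin shows up as 0 in the sums and as NaN in the means, so the gap in traffic stays visible on the graph either way.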

And just like before, we will create a graph using Plotly with the newly binned data.

#Create the graph
plotly.offline.plot({
    "data":[plotly.graph_objs.Scatter(x=df2.index, y=df2['Bytes'])],
    "layout":plotly.graph_objs.Layout(title="Bytes over Time",
        xaxis=dict(title="Time"),
        yaxis=dict(title="Bytes"))})

Output

The complete program looks like this:

#!/usr/bin/env python3
from scapy.all import *
import plotly
from datetime import datetime
import pandas as pd

#Read the packets from file
packets = rdpcap('example.pcap')

#Lists to hold packet info
pktBytes=[]
pktTimes=[]

#Read each packet and append to the lists.
for pkt in packets:
    if IP in pkt:
        try:
            #First we need to convert Epoch time to a datetime
            #float() keeps fromtimestamp working when Scapy returns a Decimal-based time
            pktTime=datetime.fromtimestamp(float(pkt.time))
            #Append both values together so the two lists stay the same length
            pktBytes.append(pkt[IP].len)
            #Then convert to a format we like
            pktTimes.append(pktTime.strftime("%Y-%m-%d %H:%M:%S.%f"))
        except Exception:
            #Skip packets whose length or time cannot be read
            pass

#This converts list to series
bytes = pd.Series(pktBytes).astype(int)

#Convert the timestamp list to a pd date_time
times = pd.to_datetime(pd.Series(pktTimes).astype(str), errors='coerce')

#Create the dataframe
df = pd.DataFrame({"Bytes": bytes, "Times": times})

#Set the index from a range to the timestamps
df = df.set_index('Times')

#Create a new dataframe of 2 second sums to pass to plotly
df2=df.resample('2S').sum()
print(df2)

#Create the graph
plotly.offline.plot({
    "data":[plotly.graph_objs.Scatter(x=df2.index, y=df2['Bytes'])],
    "layout":plotly.graph_objs.Layout(title="Bytes over Time",
        xaxis=dict(title="Time"),
        yaxis=dict(title="Bytes"))})

Conclusion
I hope you found this helpful. You can easily build on this to create more complex and unique programs that solve security and other IT problems. Since there was so much interest in this topic, I will follow up with one more complex model of data visualization in the next week. So stay tuned for Part 4! Thanks for reading, and please reach out if you have any questions.

Joe McManus, CISO


Joe is a Senior Cyber Security Researcher at CERT and a Professor at the University of Colorado College of Engineering, where he teaches graduate courses in information security and forensics. Recently, Joe was the Director of Security at SolidFire (acquired by NetApp [NTAP]). He is an avid cyclist and climber, and leads the Automox security team.
