This is the first of a few blog entries discussing honeypots, further experiments and measurements.

During the last few months, I started operating multiple (2) honeypots. This blog article describes how to set up the honeypots, how to monitor them and what we can learn from operating honeypots (and what not).

What is a honeypot?

A honeypot is a piece of software (or, sometimes, even hardware - we'll focus on software honeypots this time) which pretends to be a specific service - for example: SSH - but doesn't process incoming requests the way a normal SSH server would. Instead, it logs the incoming request and may execute further actions. For example, one could write and configure a honeypot in a way that it locks the connecting client out using a firewall. That'd lock attackers out of your sensitive services if they bump into the honeypot first.

Motivation

Simply put, it's possible to use a honeypot either directly on the internet, or inside a network ("intranet"). There are other ways to categorize honeypots, but for this introduction, let's focus on these two types.

On the intranet, a honeypot can be used to detect attackers who were able to circumvent your carefully placed firewall and your thoroughly configured Intrusion Detection System. If we notice a connection to our honeypot, although no service in our intranet uses the services provided by the honeypot, we can be certain that we've got an attacker in our network. To raise the bar for the attacker even further, one could set up the service landscape so that some servers connect to the honeypot but are whitelisted. That way, an attacker couldn't even determine whether or not a particular service is in fact a honeypot by monitoring the network traffic. However, the whole intranet-honeypot concept relies on an attacker being interested in the honeypot's services; if it's not interesting (or suspiciously too interesting), an attacker will leave the honeypot untouched. That means that honeypots are a good addition to the existing services protecting your intranet.

Directly placed on the internet, a honeypot can detect botnets, port scans or other types of clustered, "big range" attacks. That's useful to generate firewall rules protecting your sensitive servers from automated scans. It cannot protect you from targeted attacks, of course. The most interesting part is that you are able to analyze, in an automated way, how a botnet or script attacks your server.

In this article, we'll focus on analyzing usernames and passwords used to log in to an SSH service - or, at least, to try to log in - and on monitoring the login attempts based on their locations.

Setup

We spawned three servers in three locations: Nürnberg, Falkenstein and Helsinki (thanks, Hetzner Server Cloud!). Their IP addresses were certainly used before, which means it can't be ensured that a certain login attempt was targeted at our service rather than at the service of the previous owner of the server. This may falsify the results - however, let's ignore that fact, as there is no way to tell the connection attempts apart.

Then, we installed a small service @maride wrote in Go, which simply logs the username and password of incoming SSH connections:

package main

import (
    "log"

    "github.com/gliderlabs/ssh"
)

func main() {
    // Accept any SSH connection on port 22 and hand password attempts to handlePass.
    log.Fatal(ssh.ListenAndServe(":22", nil, ssh.PasswordAuth(handlePass)))
}

// handlePass logs the attempted username and password, then rejects the login.
func handlePass(ctx ssh.Context, pass string) bool {
    log.Printf("%s: '%s'", ctx.User(), pass)
    return false
}

Isn't that simple? The code is that short thanks to gliderlabs' great SSH library. Check it out!
The example code was then extended with an HTTP server and a /metrics endpoint, and the information was collected with a Prometheus instance. If you don't know what Prometheus is: it pulls values from pre-defined HTTP endpoints and logs them into a time-series database.
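As a rough sketch of what that extension can look like - the metric name honeypot_hits_total and the use of the prometheus/client_golang library are assumptions here, the actual code is linked below - the honeypot can count every attempt and expose the counter on port 80:

package main

import (
    "log"
    "net/http"

    "github.com/gliderlabs/ssh"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric counting all login attempts seen by the honeypot.
var hits = promauto.NewCounter(prometheus.CounterOpts{
    Name: "honeypot_hits_total",
    Help: "Number of login attempts seen by the honeypot.",
})

func main() {
    // Export the metrics on port 80 so Prometheus can scrape them.
    go func() {
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":80", nil))
    }()

    // The bait: an SSH server that logs and rejects every login attempt.
    log.Fatal(ssh.ListenAndServe(":22", nil, ssh.PasswordAuth(handlePass)))
}

func handlePass(ctx ssh.Context, pass string) bool {
    hits.Inc()
    log.Printf("%s: '%s'", ctx.User(), pass)
    return false
}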
You can find the full code here, and again, it's super simple and easy to skim through...
The stack was then completed by a Grafana instance to show the collected information in nice graphs. The complete setup can be seen below:

The Honeypot

The actual container running the honeypot exposes two ports: port 22 and port 80.
Port 22 is used as the bait. It looks like a normal SSH server from the outside, but disconnects directly after the user has tried to log in, logging the entered username and password. Port 80 is used to export metrics such as the total amount of hits (attempted logins) and the amount of hits from an individual country or city.

Prometheus

The Prometheus container is configured to scrape the metrics from the honeypot every few seconds. This makes it possible to track the exposed values over time. Prometheus exposes the time-series it generates on port 9090, making it possible for other services such as Grafana to work with the data.

Grafana

Grafana is configured to take the time-series from Prometheus and display nice graphs. As an example, the graphs can contain the total amount of hits over time.

In the end, the logs can be displayed in a human-friendly way using Grafana, but a lot more can be done with Grafana and the aggregated data.

Grafana Worldmap

In addition to the common methods for displaying metrics, Grafana provides some plugins with more panels. One of these is the worldmap plugin. It can display circles at given locations. The locations can be provided using two-letter country codes and other methods listed in the plugin under Map Data Options → Location Data.

Using a geoip service, I generated metrics mapping the IP addresses of the hits to geo-locations that can be displayed using Grafana.
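The geoip provider, its response format and the metric name in the following sketch are placeholders - the real service and code may differ - but the idea is to resolve the remote address of every hit and count it under a location label:

// Package geo sketches the geoip lookup step; the provider URL and response
// fields are assumptions, not necessarily the service actually used here.
package geo

import (
    "encoding/json"
    "fmt"
    "net"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical metric counting login attempts per city of origin.
var hitsPerCity = promauto.NewCounterVec(prometheus.CounterOpts{
    Name: "honeypot_hits_per_city_total",
    Help: "Login attempts per city of origin.",
}, []string{"city"})

// geoipResponse holds the fields we care about; the actual schema depends on the provider.
type geoipResponse struct {
    City      string  `json:"city"`
    Latitude  float64 `json:"latitude"`
    Longitude float64 `json:"longitude"`
}

// CountHit resolves the remote address of a hit via the geoip service and
// increments the matching city counter.
func CountHit(addr net.Addr) {
    host, _, err := net.SplitHostPort(addr.String())
    if err != nil {
        return
    }
    // Placeholder URL - substitute the geoip provider you actually use.
    resp, err := http.Get(fmt.Sprintf("https://geoip.example.com/json/%s", host))
    if err != nil {
        return
    }
    defer resp.Body.Close()

    var loc geoipResponse
    if err := json.NewDecoder(resp.Body).Decode(&loc); err != nil {
        return
    }
    hitsPerCity.WithLabelValues(loc.City).Inc()
}

In the honeypot itself, CountHit would then be called with ctx.RemoteAddr() from the password handler shown earlier.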

The resulting maps can be seen below:

Individual countries (green = 10+, yellow = 0-10, red = <0)
Individual cities (green = <10, yellow = 10-20, red = 20+)

The circles are dynamically updated, as are the locations of the cities. So if a new hit comes from a city not registered before, a new location entry gets generated using the data from the geoip-provider.

After creating metrics exposing two-letter country codes, I dug deeper and realized that the geoip service I was using returns the location data of an individual IP with more precision than just the country.

The logical next step was to try to map the hits as precisely as possible.

This can be done by providing a custom JSON endpoint for the worldmap plugin to get the locations of the hits from. So let's say we've got a hit. We can then use the geoip service to locate the position of the hit, add it to the location endpoint and let Grafana update the map using the new location we added to the endpoint, by matching it with a label we set in our metric.

# metrics

...
a_metric{city="cairo"} 2
...

The metric above tells us that there were two hits from Cairo. In order to tell Grafana where to draw the circle, we need to add cairo to the custom JSON endpoint as seen below.
The endpoint has a field named "key" used for mapping the metrics to specific locations. The fields "latitude" and "longitude" describe where to draw the individual point, and the field "name" describes what to display when hovering over the circle.

# custom json endpoint

[
    ...
    {
        "key": "cairo",
        "latitude": "30.0778",
        "longitude": "31.2852",
        "name": "Cairo"
    },
    ...
]

Errors

Setting up the worldmap plugin with cities worked flawlessly, but providing a custom JSON endpoint just didn't work. I later found out why: First, the JSON endpoint did not support HTTPS, resulting in Grafana having problems fetching data from it. The second problem had to do with missing HTTP headers: the "Access-Control-Allow-Origin" header needs to be set so that Grafana is allowed to access the data (set the value of the header field to the URL of your Grafana instance).
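As a sketch of how the fixed endpoint can be served - the Grafana URL, the file name and the certificate paths are placeholders for your own setup - a tiny Go server is enough:

package main

import (
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/locations", func(w http.ResponseWriter, r *http.Request) {
        // Allow the Grafana instance (placeholder URL) to fetch the endpoint cross-origin.
        w.Header().Set("Access-Control-Allow-Origin", "https://grafana.example.com")
        // locations.json contains the list of keys and coordinates shown above.
        http.ServeFile(w, r, "locations.json")
    })

    // Serve via TLS, since the plain-HTTP endpoint caused problems with Grafana.
    log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil))
}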

Learnings

The main thing I learned from operating two honeypots for a few months was that 95% of all hits came from devices in China, and that overall they've probably tried to log in about 1.5 million times.

About 150,000 hits in exactly two weeks.

It can also clearly be seen that there is constant growth in the number of hits, hence a constant background "noise".