Build a Homelab Dashboard: Part 9, IPMI
After a break for some homelab diagrams, it’s time to dive back into our homelab dashboards! This post will focus on collecting stats from IPMI sources using Telegraf. Before we dive in, as always, we’ll look at the series so far:
- An Introduction
- Organizr Continued
- Telegraf Introduction
- Grafana Introduction
What is IPMI?
If you are reading this, you likely already know what IPMI is, but just in case, here’s a brief intro. IPMI is the acronym for Intelligent Platform Management Interface. Most server manufacturers and server motherboard manufacturers have some sort of IPMI. It provides a variety of functionality like remote console redirection and live statistics often including CPU temperatures, system temperatures, fan speeds, and the like. Most allow you to even remotely mount an ISO for booting and installing operating systems.
IPMI has various names depending on the manufacturer of your hardware. Dell has iDRAC, HP has iLO, and many others just keep it simple by naming it…IPMI. Not all servers come with IPMI, so be sure to keep this in mind as you build out your homelab. Many implementations will have a dedicated RJ45 port while others have chosen to share a port.
Telegraf and IPMI
Telegraf has plugins for just about everything, it would seem. Among those plugins is one for IPMI. But, for Telegraf to collect stats from an IPMI source, we must first have ipmitool installed.
What is ipmitool?
ipmitool is an open-source utility that essentially connects to an IPMI-enabled server and returns sensor readings. It can do far more than that, like control server power and other settings, but for our purposes, we just need it to read sensor values. We’ll start by installing ipmitool on our dashboard server. Fire up an SSH connection and use this command:
sudo apt-get install ipmitool
If you are installing ipmitool on a server with IPMI, you will likely see no errors. If, however, you are installing this on a VM with no IPMI like me, you will see something like this:
In short, it appears to be attempting to interface with local IPMI hardware that it can’t find. Let’s try a command to see if the install was actually successful:
ipmitool -H 10.0.0.193 -U username -P password sensor
Assuming our installation was actually successful, you should see something like this:
This command returns all of the sensors and values. Now let’s move on to Telegraf.
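Before moving on, one variation is worth knowing. Some BMCs refuse the default interface and report that a LAN session cannot be established. In that case, ipmitool's -I lanplus option (IPMI v2.0) may help. Here is a minimal sketch that reuses the placeholder address and credentials from above; the variable names and the fallback order are only for illustration:

```shell
# Placeholder values from the example above; substitute your own.
BMC_HOST="10.0.0.193"
BMC_USER="username"
BMC_PASS="password"

# Try the default interface first, then fall back to lanplus (IPMI v2.0).
# The fallback order is an assumption for illustration, not a requirement.
if command -v ipmitool >/dev/null 2>&1; then
  ipmitool -H "$BMC_HOST" -U "$BMC_USER" -P "$BMC_PASS" sensor \
    || ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -P "$BMC_PASS" sensor \
    || echo "Neither lan nor lanplus worked for $BMC_HOST" >&2
else
  echo "ipmitool is not installed or not in PATH" >&2
fi
```

If lanplus turns out to be the one that works, the Telegraf server entry later in this post would use lanplus(10.0.0.193) in place of lan(10.0.0.193).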
Next up, we’ll go back to our Telegraf config file and make a few modifications. We’ll open the file in pico:
sudo pico /etc/telegraf/telegraf.conf
Once the file is open, I would suggest pressing Ctrl-W and then searching for IPMI as the settings are way down in the file. Now insert this:
[[inputs.ipmi_sensor]]
  path = "/usr/bin/ipmitool"
  servers = ["username:password@lan(10.0.0.193)"]
  interval = "30s"
  timeout = "20s"
The path should match up with the default installation directory of ipmitool. The server will require a username, password, and IP address, so change those as necessary. Mine looks like this:
Use Ctrl-O to save the file and Ctrl-X to exit and we should be ready to restart Telegraf:
sudo systemctl restart telegraf
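To check that the plugin is actually collecting data before building anything in Grafana, Telegraf can run a single input once and print the results to the terminal. This is a sketch, assuming the config path used above and that the package created a telegraf service account:

```shell
CONF="/etc/telegraf/telegraf.conf"

# Run only the ipmi_sensor input once and print the metrics to the terminal
# instead of writing them to InfluxDB. Running as the telegraf user (assumed
# to be the service account created by the package) mirrors how the service
# itself calls ipmitool.
if command -v telegraf >/dev/null 2>&1; then
  sudo -u telegraf telegraf --config "$CONF" --input-filter ipmi_sensor --test \
    || echo "ipmi_sensor test run failed; check the credentials and timeout" >&2
else
  echo "telegraf is not installed or not in PATH" >&2
fi
```

If all is well, the output should include lines mentioning ipmi_sensor along with the sensor names and values.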
If you happen to have more than one server with IPMI, it would look something like this:
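As a sketch of the multi-server case, the same five lines can simply be repeated once per server. The second address below, 10.0.0.194, is a made-up placeholder, as are the credentials:

```toml
# First server (same block as above)
[[inputs.ipmi_sensor]]
  path = "/usr/bin/ipmitool"
  servers = ["username:password@lan(10.0.0.193)"]
  interval = "30s"
  timeout = "20s"

# Second server: 10.0.0.194 is a hypothetical address for illustration
[[inputs.ipmi_sensor]]
  path = "/usr/bin/ipmitool"
  servers = ["username:password@lan(10.0.0.194)"]
  interval = "30s"
  timeout = "20s"
```

Since servers is a list, putting both entries in a single block should also work, but separate blocks let each server keep its own interval and timeout.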
Now we can move on to making it pretty with Grafana.
Building Our Dashboard
Because we have Telegraf configured to go to the same database in InfluxDB, we already have a connection; we just need to create a new dashboard. We’ll start simple with CPU temperatures.
If everything so far has gone according to plan, we should see a new table named ipmi_sensor. This table has several fields that we can filter on. The two most important are server and name. Server is, as the name would imply, the server you are getting statistics from. Name is the name of the metric that you are going to visualize. There are also filters for host (the Telegraf server) and unit (the unit of measure).
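If you would rather see which servers and sensor names are available before clicking through Grafana, they can be listed from the command line. This is only a sketch. It assumes an InfluxDB 1.x install with the influx CLI on the dashboard server and a database called telegraf, so adjust the name to match yours:

```shell
DB="telegraf"   # assumption: the database Telegraf writes to

if command -v influx >/dev/null 2>&1; then
  # List every server reporting under the ipmi_sensor measurement.
  influx -database "$DB" -execute 'SHOW TAG VALUES FROM "ipmi_sensor" WITH KEY = "server"' \
    || echo "server query failed; is InfluxDB running?" >&2
  # List every sensor name, which is what the Grafana queries filter on.
  influx -database "$DB" -execute 'SHOW TAG VALUES FROM "ipmi_sensor" WITH KEY = "name"' \
    || echo "name query failed; is InfluxDB running?" >&2
else
  echo "influx CLI is not installed or not in PATH" >&2
fi
```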
These fields will vary greatly from manufacturer to manufacturer. Here is a configuration for an ASRock Rack motherboard:
And from a Supermicro motherboard:
Let’s set up our axis for temperatures:
Notice that we define a min and a max so that our graph levels out our changes in temperature. Next we can get our nice table set up for our legend:
And here’s the final result:
CPU Fan Speed
While we are on the topic of CPUs, we can take a look at CPU fan speed. I find that RPMs work best on a singlestat panel. So we’ll start with a new singlestat panel with these settings:
In this example, we are looking at the CPU fan speed. Let’s take a look at the options for our Singlestat panel:
I’ve changed the stat to be Current so that we are looking at the current reading for the fan speed. Next, I’ve included a Postfix of RPM since I can’t find a unit that matches up nicely with fan speeds (let me know if I’m just missing something here). I also chose the unit of locale format as it gave me a thousands comma. Obviously this setting could look different for you. Finally, I enabled a spark line. Essentially, a spark line shows the history of the statistic at the bottom of our Singlestat panel. The end result is this:
For our last IPMI statistic of the day, I thought I would add in voltage. You have a lot of options here. My boards show CPU voltage, Memory Voltage (by channel), various main voltages (3V, 5V, etc). I decided on CPU voltage for this example. Here are my settings:
This time I’ve selected vcore1 and vcore2. Now let’s check out the settings for the y-axis:
Notice I’ve selected Volt (V) for the unit, but I’ve also defined a range. I chose 0.6 and 1.35 because that is the defined operating range for my E5-2670s. I’ve also set my decimals to 2, as the default is four places with voltage…no thanks. And now on to my table legend:
The biggest change here is that I also changed the decimals to a setting of 2. Four decimals was just too much for me. So what’s the final product?
Putting It All Together
Now that we have a variety of IPMI metrics, what does it look like when we put it all together? Let’s see:
As I mentioned, you can go further with your IPMI dashboard than what I’ve created. But, hopefully this post will provide a pretty good starting point for some pretty great IPMI dashboards. Happy dashboarding!
Nice read, as always! Already looking forward to the next one!
If I want to monitor two different servers, would I just make a second paragraph in Telegraf’s config file?
You can just take the same 5 lines and paste them in. I may go back and clarify that when I get a moment. Thanks!
I’ve added this to the guide as well. Thanks for the suggestion!
I’m having issues with IPMI tool, I can’t do a lan session. I am trying to do this to an iDRAC and it says unable to establish a LAN session. Does it work with iDRAC?
I don’t have access to a system with iDRAC, so I can’t say for sure. But, when I google IPMItool with iDRAC I get a ton of great content. I would start there. Does it just time out?
I get the following error.
[inputs.ipmi_sensor]: Error in plugin: failed to run command /usr/bin/ipmitool -H 192.168.1.99 -U user -P password -I lanplus sdr: signal: killed –
I get output if I run the same command in a shell. Any help is appreciated. Thanks.
All caps on purpose: THANK YOU.
For future nerd readers, I had to add a -I lanplus to my iLo3 setup to test it. ipmitool -I lanplus -H 10.0.1.4 -U eeprom -P ~~~~~~~~~~~~ sensor