[HOWTO] 0.8.7 and 1 minute polling

tekbot
Posts: 49
Joined: Tue Jun 07, 2005 7:42 pm
Location: Venice, CA

Post by tekbot »

Settings Amiss (as mentioned above):

Your Cisco CPU usage Data Template appears to be the default, 5 minute averages all around with a Step of 300 and a Heartbeat (presumably) of 600. This is fine, but the fact that you're not getting data on this graph in addition to your interface graphs says to me that the issue is not in the fact that you created NEW RRAs as described in my first post on this thread.

So, the issue lies elsewhere. A few questions I'd like you to answer:
Are you using any plugins (namely boost)?
Are you using the 0.8.7a Spine binary or cmd.php for your poller?
Why are you running the poller as root? (this shouldn't really matter, but it's best practice to run the poller as a cacti user, and make sure the log file and RRA directory are owned by this cacti user).
How many devices in your infrastructure are you polling on this system? A lot? A few? Routers only? Windows / Linux boxes, switches, etc...?
Is this the only cacti installation on this server?

Please answer these questions and post the output from the tail | grep I requested previously and we should have a better idea of what's going on.

Thanks,
tekbot
schef4711
Posts: 19
Joined: Tue Jul 12, 2005 1:49 pm
Location: Argentina
Contact:

Post by schef4711 »

Hi,
tekbot wrote:There are a few things that looked amiss on your Data Template that I'll comment on shortly. Sorry about the delay in getting back to you, it's been a very busy week for me.
No hurry. The delay is not a problem, because at the moment I am only using the 1min avg in "test mode" :) So other work may well be more important 8)
tekbot wrote:I don't think that's it, Schef, but zoom in on the gaps to verify. If you see 0, it's fine, if you see NaN (which I expect) it's a bug with the poller / Data Template / Configuration.
You're right - I see "NaN" for Current/Average/Maximum, and 0 Bytes (in the Bytes graph) or 0 mbit in+out (in the 95% graph), when I look inside the gaps.
tekbot wrote: Do me one other favor when you have a chance: go to your cacti.log directory and run the following for 5-10 polling intervals.
I can do this, but it will take a little time to pick out the right graphs, because at the moment each router/switch has many graphs. To reduce the output, I will create a new device with only 2 interfaces that shows the problem.

I will be back soon

thx a lot
alex
schef4711
Posts: 19
Joined: Tue Jul 12, 2005 1:49 pm
Location: Argentina
Contact:

Post by schef4711 »

tekbot wrote:Settings Amiss (as mentioned above):

Your Cisco CPU usage Data Template appears to be the default, 5 minute averages all around with a Step of 300 and a Heartbeat (presumably) of 600. This is fine, but the fact that you're not getting data on this graph in addition to your interface graphs says to me that the issue is not in the fact that you created NEW RRAs as described in my first post on this thread. So, the issue lies elsewhere.
Sure, there may be another problem, given that I did not change anything in that data template, but the real question is where we should look for it. I am thinking more and more that there is a bug in the poller.
tekbot wrote: A few questions I'd like you to answer:
Are you using any plugins (namely boost)?
Is this the only cacti installation on this server?
No, it is a fresh installation on a new server with no previous installation, so no Cacti files were upgraded, added, or changed.
tekbot wrote: Are you using the 0.8.7a Spine binary or cmd.php for your poller?
Why are you running the poller as root? (this shouldn't really matter, but it's best practice to run the poller as a cacti user, and make sure the log file and RRA directory are owned by this cacti user).
I use cmd.php as my poller; Spine isn't installed on that machine. True, the poller shouldn't run as root, and I can change that, but I don't think it is the problem or that changing it will have any effect :cry:
tekbot wrote: How many devices in your infrastructure are you polling on this system? A lot? A few? Routers only? Windows / Linux boxes, switches, etc...?
At the moment "localhost" is disabled. I have two Cisco switches (one with 63 graphs, the other with 35), two Cisco routers (23 and 29 graphs), and 3 ZyXEL modems (4 graphs each). So in total 4 Ciscos with 150 graphs and 3 modems with 12 graphs :) That is next to nothing for a Dual Opteron 848 with 4GB and 1TB storage 8)
tekbot wrote: Please answer these questions and post the output from the tail | grep I requested previously and we should have a better idea of what's going on.
The "tail -f" output is at http://www.buenosair.es/mrtg/20071207cactilog.txt and covers the last 13 polling runs. DS[71] is the 5min CPU graph; DS[74] (95% graph) and DS[88] (Bytes Total graph) are the graphs with the gaps, whose images I posted before.
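
For anyone repeating this kind of "tail | grep" check, here is a minimal sketch in Python that keeps only the log lines mentioning particular data sources. It assumes nothing about the cacti.log line format beyond the DS[<id>] tokens quoted above; the function name filter_ds_lines is hypothetical and is not part of Cacti.

```python
import re
from typing import Iterable, Iterator

# Matches tokens such as DS[71] and captures the numeric id.
DS_TOKEN = re.compile(r"DS\[(\d+)\]")


def filter_ds_lines(lines: Iterable[str], ds_ids: Iterable[int]) -> Iterator[str]:
    """Yield only the lines that mention one of the given data source ids."""
    wanted = {str(ds_id) for ds_id in ds_ids}
    for line in lines:
        # A line qualifies if any DS[<id>] token on it is in the wanted set.
        if any(found in wanted for found in DS_TOKEN.findall(line)):
            yield line
```

Passing an open file handle for the log together with [71, 74, 88] would yield the lines for the three data sources discussed here; the path to the log depends on the installation.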

If you need some other output please let me know.

thx a lot for helping
alex
Last edited by schef4711 on Fri Dec 07, 2007 2:15 pm, edited 1 time in total.
schef4711
Posts: 19
Joined: Tue Jul 12, 2005 1:49 pm
Location: Argentina
Contact:

Post by schef4711 »

Here is also the complete poller output (cacti.log) for the graphs where I have gaps (these are not the only ones affected):

DS[71] (5min avg CPU): http://www.buenosair.es/mrtg/20071207cactiCISCOCPU.txt (270KB)
DS[74] (95% graph): http://www.buenosair.es/mrtg/20071207cactiCISCODS74.txt (2.9MB)
DS[88] (Bytes graph): http://www.buenosair.es/mrtg/20071207cactiCISCODS88.txt (2.9MB)

These greps cover the whole period since I installed Cacti and configured the graphs.

bye alex
krap_rz
Posts: 26
Joined: Thu May 18, 2006 5:23 am
Location: Cyberjaya, Malaysia
Contact:

Post by krap_rz »

Hi tekbot

After reading your guide, I was quite interested in updating my Cacti from 0.8.6j to 0.8.7a (waiting for a stable version), and I am looking to change the polling time to less than 5 minutes (maybe 3 or 4 minutes).

But your post has me thinking about whether to go for 1min polling instead.

It would be much appreciated if tekbot could offer us some screen captures of your RRA and console setup, and of the adjustments made to the data source, poller, etc.
Hope you have time for this, as I believe many would like to see how you do it. :wink:

Appreciate your help on this. Please advise. Thanks again.
KeyBoarD Is MightieR ThaN ThE sWorD, iF onLy ConNecTed tO tHe InTernET..
tekbot
Posts: 49
Joined: Tue Jun 07, 2005 7:42 pm
Location: Venice, CA

Post by tekbot »

Sorry about the delay in getting back to you guys. Here's a handful of screenshots. The first is of my custom RRA settings. The next is of a modified CPU Data Template. I threw in one of my 10 second graphs as well to show the granularity: this is a 12 hour view of two 10 second data sources with a CDEF that calculates the Net Gain and Loss. For more detailed information, refer to my earlier posts in this thread.

Hope all this helps!
Attachments
12 hour view of a 10s graph.  This graph includes 2 10 second data sources, and a CDEF that calculates the Net Gain / Loss.
03 - Client Connect.png (49.89 KiB) Viewed 21576 times
Modified Data Template for standard CPU Data Source.  Note the selected RRAs, Step and Heartbeat values.
02 - CPU Data Template (1 min).png (90.76 KiB) Viewed 21576 times
Custom RRA Settings for storing 10s, 1m, and 5m graph data as per my first post in this thread.
01 - RRA Settings.png (61 KiB) Viewed 21576 times
marcmo
Posts: 27
Joined: Wed Sep 21, 2005 3:39 pm

Post by marcmo »

If one wanted to experiment with 10 second polling would the Data Source step be 10 and the heartbeat be 20?
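
The question has no direct answer in this thread, but the values it proposes follow the same 2x pattern as the other settings mentioned here (Step 300 with Heartbeat 600, and 60 with 120). As a rough sketch under that assumption, the Python below builds the argument list for an "rrdtool create" call with a 10 second step and a 20 second heartbeat. The file name, the data source name, the GAUGE type and the row count are all placeholders chosen for illustration, and the helper rrd_create_args is hypothetical.

```python
from typing import List


def rrd_create_args(path: str, step_s: int, rows: int, ds_name: str = "value") -> List[str]:
    """Build an 'rrdtool create' argument list with heartbeat = 2 x step.

    The 2x ratio mirrors the values quoted in this thread; it is a common
    convention, not a requirement of rrdtool.
    """
    if step_s <= 0 or rows <= 0:
        raise ValueError("step_s and rows must be positive")
    heartbeat_s = 2 * step_s
    return [
        "rrdtool", "create", path,
        "--step", str(step_s),
        # DS:<name>:<type>:<heartbeat>:<min>:<max>
        f"DS:{ds_name}:GAUGE:{heartbeat_s}:0:U",
        # RRA:<cf>:<xff>:<steps>:<rows>; steps=1 keeps every primary data point
        f"RRA:AVERAGE:0.5:1:{rows}",
    ]


# 8640 rows of 10 s samples cover 24 hours (an arbitrary example span).
args_10s = rrd_create_args("example_10s.rrd", step_s=10, rows=8640)
```

Cacti normally generates this command itself from the Data Template and RRA settings, so the sketch is only meant to show where the step and heartbeat values end up.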
soloslinger
Posts: 32
Joined: Fri Jan 19, 2007 2:11 pm

Post by soloslinger »

The part I don't understand about the 1 minute polling is, if the poller is scheduled in cron to run every 5 minutes, how is data gathered in between those runs?? In other words, if the poller isn't gathering data every 60 seconds, what is?? Where do the other 4 sampled numbers come from??


soloslinger
agreusel
Posts: 2
Joined: Tue Mar 25, 2008 3:27 pm

Post by agreusel »

soloslinger wrote:The part I don't understand about the 1 minute polling is, if the poller is scheduled in cron to run every 5 minutes, how is data gathered in between those runs?? In other words, if the poller isn't gathering data every 60 seconds, what is?? Where do the other 4 sampled numbers come from??


soloslinger
This is confusing me as well...

From what I've gathered, for this to work, you need the following:

- The poller.php entry in the crontab set to */5 (every 5 minutes).
- [Settings -> Poller -> Cron Interval] set to "Every 5 Minutes".
- [Settings -> Poller -> Poller Interval] set to "Every 1 Minute".

If tek, or someone else, could confirm this for me, I'd greatly appreciate it.
TheWitness
Developer
Posts: 16897
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA
Contact:

Post by TheWitness »

Quite simply, if you set the cron interval to 5 minutes and the poller interval to 1 minute, the poller will run 5 times and exit.

If you set the cron interval to 5 minutes and the poller interval to 10 seconds, the poller will run 30 times and exit.
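
The arithmetic behind those two cases can be sketched in a few lines of Python. The function name polls_per_cron_run is hypothetical, and the requirement that the cron interval be a whole multiple of the poller interval is an assumption made for this sketch, not something stated in this post.

```python
def polls_per_cron_run(cron_interval_s: int, poller_interval_s: int) -> int:
    """Number of polling passes one cron-started poller makes before it exits."""
    if cron_interval_s <= 0 or poller_interval_s <= 0:
        raise ValueError("intervals must be positive")
    if cron_interval_s % poller_interval_s != 0:
        # Assumption for this sketch: only evenly dividing intervals are handled.
        raise ValueError("cron interval must be a multiple of the poller interval")
    return cron_interval_s // poller_interval_s


one_minute_polling = polls_per_cron_run(300, 60)   # cron every 5 min, poll every 1 min
ten_second_polling = polls_per_cron_run(300, 10)   # cron every 5 min, poll every 10 s
```

With a 300 second cron interval this gives 5 passes for 1 minute polling and 30 passes for 10 second polling, matching the numbers above.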

If you ever change a poller interval for an existing data source, you have to delete the corresponding rrdfiles (sorry, it's rrdtool).

If you change a poller interval for a data template, you should likely repopulate your poller cache to re-distribute the polling of data sources.

If you were previously polling at 1 minute with a 5 minute RRD to compensate for not having a 64bit counter available, then you have a problem as that was not considered as a part of the design. What I mean by that is that if you have 32bit counters and you poll a device 5 times in 5 minutes to allow RRDtool to store the average of those 5 samples, the design of the poller interval did not take that into account. I suspect that is a corner case as most "high bandwidth" devices are "modern" (net-snmp 5.2++) and otherwise are network electronics which typically support snmpv2/3 and 64bit counters.
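
To make the 32bit counter concern concrete, here is a small Python sketch of how quickly an octet counter wraps at sustained line rate. It assumes the counter counts bytes (as the SNMP interface octet counters do) and that the link runs flat out; the function name seconds_to_wrap and the link speeds are illustrative only.

```python
def seconds_to_wrap(counter_bits: int, link_bits_per_second: float) -> float:
    """Seconds for an octet (byte) counter to wrap at full, sustained line rate."""
    if counter_bits <= 0 or link_bits_per_second <= 0:
        raise ValueError("arguments must be positive")
    max_octets = 2 ** counter_bits
    return max_octets * 8 / link_bits_per_second


wrap_32_at_100m = seconds_to_wrap(32, 100_000_000)    # roughly 344 seconds
wrap_32_at_1g = seconds_to_wrap(32, 1_000_000_000)    # roughly 34 seconds
```

At 100 Mbit/s a 32bit counter lasts a little longer than one 5 minute interval, while at 1 Gbit/s it can wrap more than once even within a 1 minute interval, which is why 64bit counters are the robust answer on fast links.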

TheWitness
CPF
Posts: 27
Joined: Sun Aug 28, 2005 8:25 am

Post by CPF »

Hi tekbot,

Just a quick post to say thanks, this has proved to be a really useful thread to understanding how the 1 minute poller works.

The last post from TheWitness also really made it click for me.

This 1 minute polling is already giving me greater visibility of my network. See picture for how easy it is to miss some short-lived traffic spikes.

To The Forum Admins - I know it's already a Sticky, but could/should tekbot's post be moved or linked to the 'How To' Section of the Forum?

(I'm also sure that we're all eagerly awaiting the release of version 0.8.8 too)

Thanks to all the Cacti Team.

Graham.
Attachments
1-min-test.png (69.98 KiB) Viewed 20547 times
niobe
Cacti User
Posts: 228
Joined: Mon Mar 10, 2008 6:52 pm
Location: Australia

Post by niobe »

I have my cron and poller intervals set to 1 minute, as well as my crontab. I have also created most data sources with a step of 60 and a heartbeat of 120.

After reading this, I am still unclear on what I am actually missing. Are the RRDs being updated every minute for five minutes with the same number?

One reason I do this is so that weather maps are recreated every minute, which works a treat, but I am confused, as I am pretty sure the numbers change every minute.
phila
Posts: 20
Joined: Fri Mar 07, 2008 3:47 pm

Not have to recreate graphs

Post by phila »

Hi all,

Is there a way to avoid recreating the 1500+ graphs that I have if I want 1 min resolution? Of course old data would stay at the old resolution, but could new data be added at 1 min resolution?
Recreating them all by hand would be such a waste.

Thanks,
gandalf
Developer
Posts: 22383
Joined: Thu Dec 02, 2004 2:46 am
Location: Muenster, Germany
Contact:

Post by gandalf »

I'm quite positive that there's no chance. The main problem is the step size/heartbeat. Step size cannot be changed for existing rrd files.
Reinhard
tmnewton
Posts: 1
Joined: Wed May 21, 2008 8:27 am

Completely confused

Post by tmnewton »

Please forgive my ignorance. I have read this thread over and over trying to understand how to get one minute polling to work and be reflected in my graphs. I understand leaving cron to run every five minutes and setting the poller interval to one minute: this starts the poller every five minutes, polls the devices once a minute for five minutes, and then the poller process ends. My understanding ends here. In tekbot's long post regarding the custom RRAs and templates, he states that step is defined as how many polls are required to average the data and enter it into the RRD. So, to get the 10 second granularity he describes, his polling interval has to be set to 10 seconds, right? For his 1 minute average, his step is defined as 1; shouldn't that be 6 (6 polls x 1 minute)? For his five minute average, his step is still defined as 1; shouldn't that be 30 (6 polls x 5 minutes)?
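
One way to look at this, sketched in Python below: in RRDtool, an RRA's "steps" value counts primary data points, and each primary data point is one data source step long, so the resolution of an RRA is the data source step multiplied by the RRA steps. The sketch assumes that each of tekbot's data templates uses its own step in seconds (10, 60 or 300); his actual values are only visible in the screenshots, so treat these numbers as illustrative. The function name is hypothetical.

```python
def rra_resolution_seconds(ds_step_s: int, rra_steps: int) -> int:
    """Seconds covered by one stored row: data source step times RRA steps."""
    if ds_step_s <= 0 or rra_steps <= 0:
        raise ValueError("arguments must be positive")
    return ds_step_s * rra_steps


# With a separate data source step per template, steps=1 is enough each time.
res_10s = rra_resolution_seconds(10, 1)     # 10 second rows
res_1m = rra_resolution_seconds(60, 1)      # 1 minute rows
res_5m = rra_resolution_seconds(300, 1)     # 5 minute rows

# The values 6 and 30 from the question apply when the base step is 10 seconds.
res_1m_from_10s = rra_resolution_seconds(10, 6)
res_5m_from_10s = rra_resolution_seconds(10, 30)
```

Under that reading, a steps value of 1 is consistent for the 1 minute and 5 minute averages as long as the data source step itself is 60 or 300 seconds, and steps of 6 or 30 would only be needed to build those averages out of 10 second primary data points.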