NetApp sells software called Operations Manager (formerly DFM), which is quite expensive, so I chose to write the important parts myself. The NetApp plugin consists of two parts: a gatherer and a displayer.
Overview:
1) The gatherer (get_netapp_stats-plugin.pl) collects the stats from the filer and writes them to a text file, which is nice because it lets you see the raw stats.
2) The displayer (netapp_stats-plugin.pl) just parses that data to fit the PDK requirements.
3) The XML is the HQ plugin XML.
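Judging from the displayer's print statements, what HQ receives is a series of name=value lines, one per metric. A minimal sketch of that output shape follows; the metric values and the helper name format_metrics are made up for illustration and are not part of the plugin.

```perl
use strict;
use warnings;

# Hypothetical values; real numbers come from the gatherer's data file.
my %metrics = (
    read_data  => 2048,
    read_ops   => 7,
    write_data => 512,
);

# Build one "name=value" line per metric, sorted by name for stable output.
# format_metrics is an illustrative helper, not a function from the plugin.
sub format_metrics {
    my ($m) = @_;
    return join "", map { "$_=$m->{$_}\n" } sort keys %$m;
}

print format_metrics(\%metrics);
```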
Requirements:
1) The filer needs passwordless ssh access for the user "hyperic", already set up and working. See the NetApp documentation for details on doing that.
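Before scheduling anything, it can help to confirm that ssh really works without a password prompt. The sketch below is one way to do that, assuming OpenSSH on the HQ host; BatchMode=yes makes ssh fail instead of prompting. The helper name ssh_check_command, the 10 second timeout, and the use of the filer's "version" command as a harmless probe are all assumptions.

```perl
use strict;
use warnings;

# Build the ssh command used for the check (illustrative helper).
# BatchMode=yes disables password prompts, so a missing key shows up as a failure.
sub ssh_check_command {
    my ($user, $host) = @_;
    return ("ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            "$user\@$host", "version");
}

# Only contact a filer when an address is given on the command line.
if (@ARGV) {
    my $host   = $ARGV[0];
    my $status = system(ssh_check_command("hyperic", $host));
    if ($status == 0) {
        print "passwordless ssh to $host works\n";
    } else {
        print "ssh to $host failed (exit ".($status >> 8).")\n";
    }
}
```

Run it with the filer IP as the only argument; without an argument it does nothing.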
Brittle Parts:
Some parts are brittle and will need to be changed for your environment:
1) The location of the scripts and of the data file.
2) %hosts is a hash of filer IP to volumes, so you'll need to change that.
3) Under get_aggregate() you must map the aggregate to the volume. DFM doesn't even do that.
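One way to make the aggregate-to-volume mapping less brittle is to keep it in a lookup table instead of hard-coding it in get_aggregate(). This is only a sketch: the table %aggr_to_volume, the helper volume_for_aggregate, the IP address, and the sample line are hypothetical, and it assumes the aggregate name appears literally in the statit line.

```perl
use strict;
use warnings;

# Hypothetical lookup: filer IP => { aggregate name => volume name }.
my %aggr_to_volume = (
    "192.0.2.10" => { aggr0 => "vol0", aggr1 => "vol1" },
);

# Return the volume mapped to the aggregate named in $line, or nothing
# when the filer or the aggregate is unknown.
sub volume_for_aggregate {
    my ($ip, $line) = @_;
    my $map = $aggr_to_volume{$ip} or return;
    for my $aggr (sort keys %$map) {
        return $map->{$aggr} if $line =~ /\Q$aggr\E\b/;
    }
    return;
}

my $volume = volume_for_aggregate("192.0.2.10", "/aggr1/plex0/rg0:");
print defined $volume ? "$volume\n" : "no mapping\n";
```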
Usage:
1) I generally cron the gatherer to run every 3 minutes. Each run collects stats for 30 seconds.
2) I have the HQ poller run every 5 minutes.
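Because the gatherer opens the data file for writing before it sleeps for 30 seconds, a poll that lands in that window may see an empty file. A check along the following lines could guard against empty or stale data. It is a sketch, not part of the plugin: the helper name data_file_usable and the 300 second threshold are assumptions.

```perl
use strict;
use warnings;

# Return 1 when $file exists, is non-empty, and was modified within
# $max_age seconds; otherwise return 0. Illustrative helper.
sub data_file_usable {
    my ($file, $max_age) = @_;
    my @st = stat($file) or return 0;   # missing or unreadable file
    return 0 if $st[7] == 0;            # size 0: gatherer is mid-run
    return (time - $st[9]) <= $max_age ? 1 : 0;
}

# Only check a file when a path is given on the command line.
if (@ARGV) {
    print data_file_usable($ARGV[0], 300) ? "usable\n" : "empty or stale\n";
}
```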
Command line:
./get_netapp_stats-plugin.pl
./netapp_stats-plugin.pl [volume] [IP of filer]
HQ GUI:
Takes the arguments [volume] and [filer].
Gather:
#!/usr/bin/perl -w
use strict;
use IO::File;

# CHANGE THESE
###############
my $ip      = "127.0.0.1";
my $volume1 = "vol0";
my $volume2 = "vol1";
###############

my $user = "hyperic";

# Map each filer IP to the volumes to collect stats for.
my %hosts = (
    "$ip" => ["$volume1", "$volume2"],
);

foreach my $host (keys %hosts) {
    # Get All Filer Data
    my $data = "/home/hyperic/scripts/netapp_".$host."_stats.txt";
    open(my $out, '>', $data) or die "Cannot write $data: $!\n";

    # Start a statit sample, let it run for 30 seconds, then end it and keep the report.
    system("ssh $user\@$host -C \"priv set advanced; statit -b; logout telnet\"");
    sleep 30;
    my $aggregate_data = `ssh $user\@$host -C \"priv set advanced; statit -e; logout telnet\"`;

    # Get Volume Specific Data
    my $volumes = $hosts{ $host };
    for my $volume (@$volumes) {
        print $out "\n".$volume."\n";
        my $volume_data = `ssh $user\@$host -C \"stats show volume:$volume\n logout telnet\"`;

        # Strip prefixes, separators and unit suffixes so each line reads "name value".
        $volume_data =~ s/sfdb.*? //g;
        $volume_data =~ s/$volume//g;
        $volume_data =~ s/volume:://g;
        $volume_data =~ s/:/ /g;
        $volume_data =~ s/ms//g;
        $volume_data =~ s/b\/s//g;
        $volume_data =~ s/\/s//g;
        $volume_data =~ s/No(.*)//g;
        print $out $volume_data;
    }

    print $out $aggregate_data;
    close($out);
}
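The chain of substitutions in the gatherer is easier to follow on a single line of input. The sketch below applies the same substitutions to sample lines. The input format "volume:vol0:counter:value" is an assumption inferred from the regexes, the sample values are made up, and clean_line is an illustrative helper rather than a function from the script.

```perl
use strict;
use warnings;

# Apply the gatherer's substitutions to one line of "stats show" output.
# clean_line is an illustrative helper, not part of the plugin.
sub clean_line {
    my ($line, $volume) = @_;
    $line =~ s/sfdb.*? //g;
    $line =~ s/$volume//g;     # drop the volume name
    $line =~ s/volume:://g;    # drop what is left of the "volume:NAME:" prefix
    $line =~ s/:/ /g;          # counter and value become two columns
    $line =~ s/ms//g;          # latency unit
    $line =~ s/b\/s//g;        # throughput unit
    $line =~ s/\/s//g;         # rate unit
    $line =~ s/No(.*)//g;
    return $line;
}

# Hypothetical sample lines.
for my $raw ("volume:vol0:read_ops:7/s",
             "volume:vol0:read_data:2048b/s",
             "volume:vol0:read_latency:0.52ms") {
    print clean_line($raw, "vol0"), "\n";
}
```

Each cleaned line is then a counter name and a value separated by a space, which is the shape the displayer splits on.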
Displayer:
#!/usr/bin/perl
use strict;
use warnings;

# Arguments: [volume] [IP of filer]
die "Usage: $0 [volume] [IP of filer]\n" unless @ARGV == 2;
my $aggregate_check = $ARGV[0];
my $ip              = $ARGV[1];

my $datafile = "/home/hyperic/scripts/netapp_".$ip."_stats.txt";
open(my $in, '<', $datafile) or die "Cannot open $datafile: $!\n";
my @data = <$in>;
close($in);
print "Checking file $datafile....\n";

my $aggregate;            # current aggregate label, set by get_aggregate()
my $aggregate_avg = 0;    # running sum of the per-disk values
my $disk_count    = 0;

my $volume_ref = get_vol_data();
my $aggr_ref   = get_aggr_data();

# Per-volume metrics, one name=value line each.
for my $a1 (sort keys %$volume_ref) {
    if ($a1 eq $aggregate_check) {
        for my $metric (qw(read_data read_latency read_ops write_data write_latency write_ops)) {
            print "$metric=$volume_ref->{$a1}{$metric}\n";
        }
    }
}

# Sum the per-disk values of each aggregate and print the average.
for my $k1 (sort keys %$aggr_ref) {
    $aggregate_avg = 0;
    $disk_count    = 0;
    for my $k2 (keys %{ $aggr_ref->{$k1} }) {
        $aggregate_avg += $aggr_ref->{$k1}{$k2};
        $disk_count++;
    }
    aggregate_stats($k1);
}

# Collect the "name value" lines that follow the requested volume's name.
sub get_vol_data {
    my $volume_ref = {};
    my $volume_name;
    my $server_volume = $aggregate_check;

    foreach my $line (@data) {
        chomp($line);
        if ($line =~ m/$server_volume/) {
            $volume_name = $line;
            $volume_ref->{$volume_name} = {};
        }
        if (defined $volume_name
            && $line =~ m/avg_latency|read_data|read_ops|read_latency|write_data|write_latency|write_ops/) {
            my @volume_lines = split(/\s+/, $line);
            $volume_ref->{$volume_name}->{ $volume_lines[0] } = $volume_lines[1];
        }
        # A blank line ends the block for the volume we were asked about.
        if ($line =~ m/^\s*$/ && defined $volume_name) {
            return $volume_ref;
        }
    }
    return $volume_ref;
}

# Collect the second column of every disk row, grouped by aggregate.
sub get_aggr_data {
    my $aggr_ref = {};
    foreach my $line (@data) {
        if ($line =~ m/aggr/) {
            get_aggregate($line);
            $aggr_ref->{$aggregate} = {} if defined $aggregate;
        }
        # Disk rows start with one of these prefixes.
        if (defined $aggregate && $line =~ m/^(?:0a|0b|0c|1b|2b|2c)/) {
            my @lines = split(/\s+/, $line);
            $aggr_ref->{$aggregate}->{ $lines[0] } = $lines[1];
        }
        # Stop at the first line containing "Aggregate".
        if ($line =~ m/Aggregate/) {
            return $aggr_ref;
        }
    }
    return $aggr_ref;
}

sub aggregate_stats {
    my $k1 = shift;
    # Average over ($disk_count - 2) disks; skip when that would be zero or negative.
    return unless $disk_count > 2;
    if ($k1 eq $aggregate_check) {
        print $k1."=".sprintf("%.2f", ($aggregate_avg / ($disk_count - 2)))."\n";
    }
}

sub get_aggregate {
    my $line = shift;
    # Filer: put your filer IP in the comparison below and map each
    # aggregate to its volume name.
    if ($ip eq "") {
        if ($line =~ m/aggr0/) {
            $aggregate = "root";
        }
    }
}
XML:
<?xml version="1.0"?>
<plugin>
  <!-- define service type name -->
  <service name="Netapp Aggregate Stats">
    <!-- Log messages from script based metrics -->
    <plugin type="log_track" />
    <filter name="service.template" value="exec:file=%script%,args=%server.host% %server.ip%"/>
    <config>
      <option name="script" description="Netapp Aggregate script" default="/home/hyperic/scripts/netapp_stats-plugin.pl"/>
      <option name="server.host" description="Volume Name" default="localhost"/>
      <option name="server.ip" description="Filer IP" default="localhost"/>
    </config>
    <metric name="Availability" template="${service.template}:availability" indicator="true"/>
    <metric name="Read Data" template="${service.template}:read_data" category="UTILIZATION" units="B" collectionType="dynamic" indicator="true"/>
    <metric name="Read Latency" template="${service.template}:read_latency" category="UTILIZATION" units="none" collectionType="dynamic" indicator="true"/>
    <metric name="Read Operations" template="${service.template}:read_ops" category="UTILIZATION" units="none" collectionType="dynamic" indicator="true"/>
    <metric name="Write Data" template="${service.template}:write_data" category="UTILIZATION" units="B" collectionType="dynamic" indicator="true"/>
    <metric name="Write Operations" template="${service.template}:write_ops" category="UTILIZATION" units="none" collectionType="dynamic" indicator="true"/>
    <metric name="Write Latency" template="${service.template}:write_latency" category="UTILIZATION" units="none" collectionType="dynamic" indicator="true"/>
    <metric name="Aggregate Utilization" template="${service.template}:%server.host%" category="UTILIZATION" units="percent" collectionType="dynamic" indicator="true"/>
  </service>
</plugin>
Mirko Pluhar says (Sep 25, 2007 10:01):
Hi Dan,
Very, very nice! I don't even own a filer, but monitoring SAN components is very important in large environments.
Are you interested in checking your filers via SNMP?