Situation
Giggle.com is a rapidly growing social networking site. The site has 2 million unique visitors and 35 million total
visits per month. The user base has doubled in the last three months, and the company deploys new dual-processor
servers at a rate of 5 per week.
Problem
Kim Tolliver, Director of Operations for Giggle, is responsible for data center provisioning, and is concerned with
maintaining service levels during explosive growth while managing to a tight equipment budget.
Giggle's objective is to keep the average CPU load below 3 across all servers in the infrastructure, so that the
load on each dual-processor server does not exceed 1.5 times its number of CPUs. Kim needs an at-a-glance view of
current and historical CPU load averages.
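The target of 3 follows from the rule of 1.5 times the CPU count applied to a dual-processor server. A minimal
sketch of that arithmetic, assuming hypothetical helper names (load_ceiling and within_target are illustrative and
not part of IQ):

```python
def load_ceiling(cpu_count: int, factor: float = 1.5) -> float:
    """Return the highest acceptable load average for a server with cpu_count CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return factor * cpu_count


def within_target(load_average: float, cpu_count: int, factor: float = 1.5) -> bool:
    """Return True when the load average stays below the ceiling for this server."""
    return load_average < load_ceiling(cpu_count, factor)


# A dual-processor server has a ceiling of 1.5 * 2 = 3.0.
print(load_ceiling(2))
```

The same function would give a different ceiling for servers with more processors, which matters if the server
hardware changes later.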
Track CPU Load Average
In IQ, Kim selects the Chart Selected Metrics for All Resources report from the Trend Analysis and Capacity Planning
folder and sets the following parameters:
- Range of Days - Controls the number of days of history in the report. Kim sets it to 60 days.
- End Time - Controls the end date of the historical range; the default value is the current time. Kim does not
change the default.
- Metrics - Controls the metric(s) to be charted. Kim selects "Load Average 5 Minutes".
Kim clicks Run Report to generate the report. The results look similar to this illustration, where each line represents
a resource.

The report shows that the load metric peaks every 7 days, approaching or exceeding 3, and then drops.
Kim knows that the weekly decrease results from new server deployments that distribute the workload. She is
concerned that the load average is trending upward and that the minimum load value is increasing as well.
Each time the load drops after server provisioning, it drops to a higher level than after the previous drop, so the
weekly low is steadily increasing. Soon, the CPU load will not only spike above 3, it will never fall below it.
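The rising weekly low that Kim reads off the chart can also be checked numerically. The following is a sketch only:
it assumes the load values have been exported as one value per day, and the sample numbers are invented for
illustration and are not Giggle's data.

```python
from typing import List, Sequence


def weekly_lows(daily_values: Sequence[float], days_per_week: int = 7) -> List[float]:
    """Return the minimum value of each complete week in a daily series."""
    lows = []
    # Step through the series one full week at a time; a trailing partial week is ignored.
    for start in range(0, len(daily_values) - days_per_week + 1, days_per_week):
        lows.append(min(daily_values[start:start + days_per_week]))
    return lows


def is_strictly_rising(values: Sequence[float]) -> bool:
    """Return True when there are at least two values and each exceeds the previous one."""
    return len(values) >= 2 and all(
        later > earlier for earlier, later in zip(values, values[1:])
    )


# Hypothetical three weeks of daily load averages (illustrative values only).
sample = [
    1.2, 1.6, 2.0, 2.4, 2.7, 2.9, 3.0,
    1.5, 1.9, 2.2, 2.6, 2.8, 3.0, 3.1,
    1.8, 2.1, 2.4, 2.7, 2.9, 3.1, 3.2,
]

lows = weekly_lows(sample)
print(lows, is_strictly_rising(lows))
```

A strictly rising list of weekly lows is the pattern described above: each provisioning cycle brings the load down
less far than the one before.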
Correlate Load with Running Processes
Kim wants to know what activity drives the load up. In the Giggle environment, the number of instances of a
particular daemon process correlates with the number of concurrent users on a server.
Given the rapid increase in users, Kim speculates that each server is handling more concurrent users than previously.
She decides to look at how the number of daemon processes corresponds to the CPU load. (She can do so because
Giggle already uses a custom plugin to report a "Number of Daemon Processes" metric.)
In IQ, Kim selects the Correlate Two Aggregate Metric Values for All Resources report from the Trend Analysis and
Capacity Planning folder and chooses the following report parameter values:
- Range of Days - Kim sets the report range to 60 days.
- End Time - Kim wants to see the trend over the immediately preceding 60 days, so she accepts the default End
Time, which is the current time.
- First Metric - Kim chooses "Load Average 5 Minutes" as the first metric to chart. The left axis of the chart
is the Y-axis for this metric.
- Second Metric - Kim chooses "Number of Daemon Processes". The right axis of the chart is the Y-axis for this
metric.
- Avg/Min/Max - Kim chooses "Average", so the metric values are averaged across all resources.
She clicks Run Report. The results look similar to the following illustration, where each line represents the average
metric value across all resources collecting the metric.

The report shows that the trends for the two metrics are closely correlated. Like CPU load, the Number of Daemon
Processes peaks, then drops, on a weekly basis when new servers are deployed. The weekly highs and lows, in terms
of daemons per server, are steadily increasing. Servers are running an increasing number of daemons over time.
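Kim judges the correlation visually from the chart. If the two averaged series were exported, the strength of the
relationship could be quantified with a Pearson correlation coefficient. This is a sketch under that assumption;
the pearson function below is a hypothetical helper and is not something the report itself computes.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("both series need the same length and at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations per series.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)
```

A value close to 1 would support the reading that load and daemon count rise and fall together; it would not by
itself show that the daemons cause the load.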
Analyze and Act
Kim wants to smooth the CPU load trend and keep it below the target value of 3. She identifies these options:
- Scale horizontally - Reduce the number of daemons per server by increasing the rate of new server
deployment to more than the current 5 per week.
- Scale vertically - Increase the number of processors in existing and new servers so that each server can run
more daemon processes while still lowering the CPU load value.
- Improve daemon efficiency - Ask the developers to increase the performance and efficiency of each daemon
process so that fewer daemon processes are required on each server.
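For the horizontal option, the number of servers required follows from the total daemon count and a per-server
cap. The sketch below is illustrative only: it assumes daemons are spread evenly across servers, and the cap and
the counts in the example call are hypothetical values, not figures from Giggle.

```python
def servers_needed(total_daemons: int, max_daemons_per_server: int) -> int:
    """Return the smallest server count that keeps each server at or below the cap."""
    if total_daemons < 0 or max_daemons_per_server < 1:
        raise ValueError("total_daemons must be >= 0 and the cap must be >= 1")
    # Integer ceiling division avoids floating-point rounding.
    return (total_daemons + max_daemons_per_server - 1) // max_daemons_per_server


def extra_servers(total_daemons: int, max_daemons_per_server: int, current_servers: int) -> int:
    """Return how many servers must be added to reach the required count (never negative)."""
    return max(0, servers_needed(total_daemons, max_daemons_per_server) - current_servers)


# Hypothetical numbers: 1001 daemons, a cap of 40 per server, 20 servers deployed.
print(extra_servers(1001, 40, 20))
```

Raising the cap in the call models the other two options: vertical scaling or more efficient daemons would let each
server carry more work for the same load.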
Next Steps
- Scheduling - Kim decides to review the load analysis reports every week from now on. She creates a report
version that saves the parameters she used, and schedules the reports to be run and emailed to her once a week.
- More analysis - To make sure she has the whole picture, Kim checks whether specific servers skew the
aggregate load average. She runs the Rank Resources by Selected Metric Value report to find the top 5
platforms for the "Load Average 5 Minutes" metric.
- Alerting - Kim decides to set a Resource Type Alert so that she will be notified whenever CPU load exceeds 3.