Part 1, Disk I/O overview and long-term monitoring tools (sar, nmon, and topas)
Introduction
A critical component of disk I/O tuning involves implementing best practices prior to building your system. Because it is much more difficult to move things around when you are already up and running, it is extremely important that you do things right the first time when planning your disk and I/O subsystem environment. This includes the physical architecture, logical disk geometry, and logical volume and file system configuration.
When a system administrator hears that there might be a disk contention issue, the first thing he or she turns to is iostat. iostat, the equivalent of using vmstat for your memory reports, is a quick and dirty way of getting an overview of what is currently happening on your I/O subsystem. While running iostat is not an inappropriate reaction at all, the time to start thinking about disk I/O is long before tuning becomes necessary. All the tuning in the world will not help if your disks are not configured appropriately for your environment from the beginning. Furthermore, it is extremely important to understand the specifics of disk I/O and how it relates to AIX® and your System p™ hardware.
When it comes to disk I/O tuning, generic UNIX® commands and tools help you much less than specific AIX tools and utilities that have been developed to help you optimize your native AIX disk I/O subsystem. In this article, we will define and discuss the AIX I/O stack and correlate it to both the physical and logical aspects of disk performance. We will discuss direct, concurrent, and asynchronous I/O: what they are, how to turn them on, and how to monitor and tune them. We will also introduce some of the long-term monitoring tools that you should use to help tune your system. You might be surprised to hear that iostat is not one of the tools recommended to help you with long-term gathering of statistical data.
This article looks at the support and changes present in a beta release of AIX 7, including the ways in which the configuration of the different subsystems has changed. The main changes in AIX 7 further simplify the operation and configuration of many of the I/O subsystems, work that had originally been started in AIX 6. The result is that many of the different I/O subsystems no longer need to be enabled and configured. Instead, they are supplied in a pre-configured state and are automatically enabled and started when an application requests that functionality.
The article also concentrates on changes that will help identify and improve the subsystem you are looking to tune. The best time to start monitoring your systems is when you first put your system in production, and it is running well (rather than waiting until your users are screaming about slow performance). You really need to have a baseline of what the system looked like when it was behaving normally to analyze data when it is presumably not performing adequately. When making changes to your I/O subsystem, make these changes one at a time so that you will be able to assess fully the impact of your change. To assess that impact, you'll be capturing data using one of the long-term monitoring tools recommended in this article.
Disk I/O overview
It shouldn't surprise you that the slowest operation for running any program is the time actually spent on retrieving the data from disk. This all comes back to the physical component of I/O. The actual disk arms must find the correct cylinder, the control needs to access the correct blocks, and the disk heads have to wait while the blocks rotate to them. The physical architecture of your I/O system should be understood prior to any work on tuning activities for systems, since all the tuning in the world won't help a poorly architected I/O subsystem that consists of a slow disk or inefficient use of adapters.
Figure 1 illustrates how tightly integrated the physical I/O components relate to the logical disk and its application I/O. This is what is commonly referred to as the AIX I/O stack.
Figure 1. The AIX I/O stack
You need to be cognizant of all the layers when tuning, as each impacts performance in a different way. When first setting up your systems, start from the bottom (the physical layer) as you configure your disk, the device layer, its logical volumes, file systems, and the files and application. We can't emphasize enough the importance in planning your physical storage environment. This involves determining the amount of disk, type (speed), size, and throughput. One important challenge with storage technology to note is that while storage capabilities of disk are increasing dramatically, the rotational speed of the disk increases at a much slower pace. You must never lose sight of the fact that while RAM access takes about 540 CPU cycles, disk access can take 20 million CPU cycles. Clearly, the weakest link on a system is the disk I/O storage system, and it's your job as the system administrator to make sure it doesn't become even more of a bottleneck. As alluded to earlier, poor layout of data affects I/O performance much more than any tunable I/O parameter. Looking at the I/O stack helps you to understand this, as Logical Volume Manager (LVM) and disk placement are closer to the bottom than the tuning parameters (ioo and vmo).
Now let's discuss some best practices of data layout. One important concept is making sure that your data is evenly spread across your entire physical disk. If your data resides on only a few spindles, what is the purpose of having multiple logical unit numbers (LUNs) or physical disks? If you have a SAN or another type of storage array, you should try to create your arrays of equal size and type. You should also create them with one LUN for each array and then spread all your logical volumes across all the physical volumes in your Volume Group.
As stated previously, the time to do this is when you first configure your system, as it is much more cumbersome to fix I/O problems than memory or CPU problems, particularly if it involves moving data around in a production environment. You also want to make certain that your mirrors are on separate disks and adapters. Databases pose separate, unique challenges; so, if possible, your indexes and redo logs should also reside on separate physical disks. The same is true for temporary tablespaces often used for performing sort operations.
Using high-speed adapters to connect the disk drives are extremely important, but you must make certain that the bus itself does not become a bottleneck. To prevent this from happening, make sure to spread the adapters across multiple buses. At the same time, do not attach too many physical disks or LUNs to any one adapter, as this also significantly impacts performance. The more adapters that you configure, the better, particularly if there are large amounts of heavily utilized disk. You should also make sure that the device drivers support multi-path I/O (MPIO), which allows for load balancing and availability of your I/O subsystem.
Direct I/O
Let's return to some of the concepts mentioned earlier, such as direct I/O. What is direct I/O? First introduced in AIX Version 4.3, this method of I/O bypasses the Virtual Memory Manager (VMM) and transfers data directly to disk from the user's buffer. Depending on your type of application, it is possible to have improved performance when implementing this technique. For example, files that have poor cache utilization are great candidates for using direct I/O. Direct I/O also benefits applications that use synchronous writes, as these writes have to go to disk. CPU usage is reduced because the dual data copy piece is eliminated. This copy occurs when the disk is copied to the buffer cache and then again from the file. One of the major performance costs of direct I/O is that while it can reduce CPU usage, it can also result in processes taking longer to complete for smaller requests. Note that this applies to persistent segments files that have a permanent location on disk. When the file is not accessed through direct I/O with the IBM Enhanced Journaled File System for AIX 5L™ (JFS2), the file is cached as local pages and the data copied into RAM. Direct I/O, in many ways, gives you the similar performance of using raw logical volumes, while still keeping the benefits of having a JFS filesystem (for example, ease of administration). When mounting a file system using direct I/O, you should avoid large, file-enabled JFS filesystems.
Concurrent I/O
First introduced in AIX Version 5.2, this feature invokes direct I/O, so it has all the other performance considerations associated with direct I/O. With standard direct I/O, inodes (data structures associated with a file) are locked to prevent a condition where multiple threads might try to change the consults of a file simultaneously. Concurrent I/O bypasses the inode lock, which allows multiple threads to read and write data concurrently to the same file. This is due to the way JFS2 is implemented with a write-exclusive inode lock, allowing multiple users to read the same file simultaneously. As you can imagine, direct I/O can cause major problems with databases that continuously read from the same file. Concurrent I/O solves this problem, which is why it's known as a feature that is used primarily for relational databases. Similar to direct I/O, you can implement this either through an open system call or by mounting the file system, as follows:
# mount -o cio /u
.
When you mount the file system with this command, all its files use concurrent I/O. Even more so than using direct I/O, concurrent I/O provides almost all the advantages of using raw logical volumes, while still keeping the ease of administration available with file systems. Note that you cannot use concurrent I/O with JFS (only JFS2). Further, applications that might benefit from having a file system read ahead or high buffer cache hit rates might actually see performance degradation.
Asynchronous I/O
What about asynchronous I/O? Synchronous and asynchronous I/O refers to whether or not an application is waiting for the I/O to complete to begin processing. Appropriate usage of asynchronous I/O can significantly improve the performance of writes on the I/O subsystem. The way it works is that it essentially allows an application to continue processing while its I/O completes in the background. This improves performance because it allows I/O and application processing to run at the same time. Turning on asynchronous I/O really helps in database environments. How can you monitor asynchronous I/O server utilization? Both iostat and nmon can monitor asynchronous I/O server utilization. Monitoring asynchronous I/O and changing the parameters is only possible if you have executed an application that requires asynchronous I/O. The AIX kernel enables the asynchronous I/O components. This can lead to confusion when trying to alter parameters as the ability to change them is unavailable until the module has been loaded.
To determine whether asynchronous I/O has been enabled, you can check the output of the
ioo
command, as shown in Listing 1.
Listing 1. Checking the output of the
ioo
command# ioo -a
aio_active = 0
aio_maxreqs = 65536
aio_maxservers = 30
aio_minservers = 3
aio_server_inactivity = 300
j2_atimeUpdateSymlink = 0
j2_dynamicBufferPreallocation = 16
j2_inodeCacheSize = 200
j2_maxPageReadAhead = 128
j2_maxRandomWrite = 0
j2_metadataCacheSize = 200
j2_minPageReadAhead = 2
j2_nPagesPerWriteBehindCluster = 32
j2_nRandomCluster = 0
j2_syncPageCount = 0
j2_syncPageLimit = 16
lvm_bufcnt = 9
maxpgahead = 8
maxrandwrt = 0
numclust = 1
numfsbufs = 196
pd_npages = 65536
posix_aio_active = 0
posix_aio_maxreqs = 65536
posix_aio_maxservers = 30
posix_aio_minservers = 3
posix_aio_server_inactivity = 300
You can see from this listing that both the
aio_active
and posix_aio_active
values are set to zero. The other parameters are configurable and will become enabled when the corresponding subsystem has been used.
The aio kernel processes are now available as
aioLpool
and aioPpool
(see Listing 2).
Listing 2. aio kernel processes are available as
aioPpool
and aioLpool
.
l488pp065_pub[/] > pstat -a|grep aio
37 a 250068 1 250068 0 0 1 aioPpool
38 a 260052 1 260052 0 0 1 aioLpool
The result is that the aio system takes up less memory and process space. The tunable parameters, for example
aio_maxservers
, are now configured per CPU tunable and specify the maximum number of servers that can be created. Note that changing these values will not change the immediate number of servers available, only the maximum created by the kernel when there is existing outstanding I/O.
Additional parameters you may want to change are the maximum number of asynchronous I/O requests (
aio_maxreqs
) which alter the request queue size, and the aio_server_inactivity
which controls when asynchronous services are killed when no more requests exist.
To change the parameters, you can use either ioo or smit. You can find the asynchronous parameters within Performance & Resource Scheduling, Tuning Kernel & Network Parameters, and Tuning IO Parameters. Within smit, you can get good idea of both the current and the maximum possible values.
The iostat
-A
command reports back asynchronous I/O statistics if the kernel modules are loaded (see Listing 3).
Listing 3. iostat
-A
command
# iostat -A
System configuration: lcpu=2 drives=3 ent=0.60 paths=4 vdisks=4
aio: avgc avfc maxgc maxfc maxreqs avg-cpu: % user % sys % idle % iowait physc % entc
0 0 32 0 4096 6.4 8.0 85.4 0.2 0.1 16.0
Disks: % tm_ act Kbps tps Kb_read Kb_wrtn
hdisk0 0.5 2.0 0.5 0 4
hdisk1 1.0 5.9 1.5 8 4
hdisk2 0.0 0.0 0.0 0 0
What does this all mean?
- avgc: This reports back the average global asynchronous I/O request per second of the interval you specified.
- avfc: This reports back the average fastpath request count per second for your interval.
- maxgc: This reports back the max global asynchronous I/O request since the last time this value was fetched.
- maxfc: This reports back the maximum fastpath request count since the last time this value was fetched.
- maxreqs: This is the maximum asynchronous I/O requests allowed.
The major difference between
aio
and posixaio
is that the two involve different parameter passing, so you really need to configure both.
In AIX 7, as in AIX 6, the fsfastpath and fastpath tunables are no longer modifiable. They are now classed as restricted tunables and are set to 1 (enabled) by default. As such, they both enable asynchronous I/O requests to be sent directly to underlying disk (instead of through the corresponding subsystem and filesystem support), thus producing better performance.
One last concept is I/O pacing. This is an AIX feature that prevents disk I/O-intensive applications from flooding the CPU and disks. Appropriate usage of disk I/O pacing helps prevent programs that generate very large amounts of output from saturating the system's I/O and causing system degradation. Tuning the maxpout and minpout helps prevent threads performing sequential writes to files from dominating system resources.
You can also limit the effect of setting global parameters by mounting file systems using an explicit 0 for minput and maxpout:
# mount -o minpout=0,maxpout=0 /u
.
Since AIX 6, the I/O pacing is enabled by default on the sys0 device, but you can also control the pacing on your other drives.
Note that you can also remount existing filesystems and set the I/O pacing, which can be helpful if you want to alter the performance of a disk that is already actively providing service.
Monitoring
AIX-specific tools (sar, topas, and nmon) are available to monitor disk I/O activity. These tools allow you to troubleshoot quickly a performance problem and capture data for historical trending and analysis.
Don't expect to see iostat in this section, as iostat is a UNIX utility that allows you to determine quickly if there is an imbalanced I/O load between your physical disks and adapters. Unless you decide to write your own scripting tools using iostat, it will not help you with long-term trending and capturing data.
sar is one of those older generic UNIX tools that have been improved over the years. While I generally prefer the use of more specific AIX tools, such as topas or nmon, sar provides strong information with respect to disk I/O. Let's run a typical
sar
command to examine I/O activity (see Listing 4).
Listing 4. Using
sar
# sar -d 1 2
AIX l488pp065_pub 1 7 00F604884C00 08/11/10
System configuration: lcpu=4 drives=1 ent=0.25 mode=Uncapped
11:38:44 device %busy avque r+w/s Kbs/s avwait avserv
11:38:45 hdisk0 1 0.0 6 24 0.0 1.9
11:38:46 hdisk0 0 0.0 3 15 0.0 2.3
Average hdisk0 0 0.0 4 19 0.0 2.1
Let's break down the column headings from Listing 4.
- %busy: This command reports back the portion of time that the device was busy servicing transfer requests.
- avque: In AIX Version 5.3, this command reports back the number of requests waiting to be sent to disk.
- r+w/s: This command reports back the number of read or write transfers to or from a device (512 byte units).
- avwait: This command reports the average wait time per request (milliseconds).
- avserv: This command reports the average service time per request (milliseconds).
You want to be wary of any disk that approaches 100 percent utilization or a large amount of queue requests waiting for disk. While there is some activity on the sar output, there really are no I/O problems because there is no waiting for I/O. You need to continue to monitor the system to make sure that other disks are also being used besides hdisk0. Where sar is different than iostat is that it has the ability to capture data for long-term analysis and trending through its system activity data collector (sadc) utility. Usually turned off in cron, this utility allows you to capture data for historic trending and analysis.
Here's how this works. As delivered on AIX systems by default, there are two shell scripts that are normally commented out (/usr/lib/sa/sa1 and /usr/lib/sa/sa2) that provide daily reports on the activity of the system. The
sar
command actually calls the sadc routine to access system data (see Listing 5).
Listing 5. Example cronjob
# crontab -l | grep sa1
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
0 * * * 0,6 /usr/lib/sa/sa1 &
0 18-7 * * 1-5 /usr/lib/sa/sa1 &
What about something a little more user-friendly? Did you say topas? topas is a nice performance monitoring tool that you can use for a number of purposes, including, but not limited to, your disk I/O subsystem (see Figure 2).
Figure 2. topas
Take a look at the topas output from a disk perspective. There is no I/O activity going on here at all. Besides the physical disk, pay close attention to "Wait" (in the CPU section up top), which also helps determine if the system is I/O bound. If you see high numbers here, you can then use other tools (such as filemon, fileplace, lsof, or lslv) to help you figure out which processes, adapters, or file systems are causing your bottlenecks. topas is good for quickly troubleshooting an issue when you want a little more than iostat. In a sense, topas is a graphical mix of iostat and vmstat, though with recent improvements, it now allows the ability to capture data for historical analysis.
Also useful is the topas physical hard disk output (
-D
). It shows disk statistics and can show you if a single hardware disk is being hammered and would benefit from having filesystems or information spread and moved over other disks. You can see a sample of the output in Figure 3.
Figure 3. Sample output for disk statistics
In particular, you should check the ART/AWT and MRT/MWT which show the average and maximum wait times for reads and writes to the disk. High values indicate a very busy disk. The AQW shows the average number of queues waiting per request to the I/O device. Again, high values may indicate a disk that is unable to keep up with the demands being requested of it.
This is nmon (my favorite AIX performance tool). While nmon provides a front-end similar to topas, it is much more useful in terms of long-term trending and analyses. Further, it gives the system administrator the ability to output data to an Excel spreadsheet that comes back in charts (tailor-made for senior management and functional teams) that clearly illustrate your bottlenecks. This is done through a tool called nmon analyzer, which provides the hooks into nmon. With respect to disk I/O, nmon reports back the following data: disk I/O rates, data transfers, read/write ratios, and disk adapter statistics.
Here is one small example of where nmon really shines. Say you want to know which processes are taking most of the disk I/O and you want to be able to correlate it with the actual disk to clearly illustrate I/O per process. nmon usage helps you more than any other tool. To do this with nmon, use the
-t
option; set your timing and then sort by I/O channel.
How do you use nmon to capture data and import it into the analyzer? Use the
sudo
command and run nmon for three hours, taking a snapshot every 30 seconds: # sudo nmon -f -t -r test1 -s 30 -c 180
. Then sort the output file that gets created: # sort -A testsystem_yymmdd.nmon > testsystem_yymmdd.csv
.
When this is completed, ftp the .csv file to your PC, start the nmon analyzer spreadsheet (enable macros), and click on analyze nmon data. You can download the nmon analyzer from here.
Figure 4 provides a disk summary for each disk in kilobytes per second for reads and writes.
Figure 4. Disk summary for each disk in kilobytes per second for reads and writes
Conclusion
This article addressed the relative importance of the disk I/O subsystem. It defined and discussed the AIX I/O stack and how it related to both physical and logical disk I/O. It also covered some best practices for disk configuration in a database environment, looked at the differences between direct and concurrent I/O, and also discussed asynchronous I/O and I/O pacing. You tuned your asynchronous I/O servers and configured I/O pacing. You started up file systems in concurrent I/O mode and studied when to best implement concurrent I/O. Further, you learned all about iostat and captured data using sar, topas, and nmon. You also examined different types of output and defined many of the flags used in sar and iostat. Part 2 of this series will drill down to the logical volume manager layer of the AIX I/O stack and looks at some of the snapshot-type tools, which help you quickly access the state of your disk I/O subsystem. Part 3 will focus primarily on tracing I/O usage using tools, such as filemon and fileplace, and how to improve file system performance overall.
0 blogger-disqus:
Post a Comment