Sunday 20 August 2017

AIX System Resource Controller (SRC) Overview


The System Resource Controller (SRC) provides a set of commands and subroutines that make it easier for system administrators and users to create and control subsystems.

What is a Subsystem?
A subsystem is any program or process, or set of programs or processes, that is usually capable of operating independently or with a controlling system. A subsystem is designed as a unit to provide a designated function.

Features of System Resource Controller (SRC):
  • A subsystem is a set of related programs designed to perform one particular function.
  • Subsystems can be subdivided into subservers (daemons).
  • SRC helps you manage whole subsystems and their respective subservers by creating subsystem groups.
  • SRC allows you to start, stop, trace, list, and refresh subsystems and subservers (daemons).
  • SRC is started during system initialization by a record for the /usr/sbin/srcmstr daemon in the /etc/inittab file.

Basic Components of SRC:

There are three basic components of SRC:
Subsystem group –> Subsystem –> Subserver (daemon)

Subserver
A subserver is a program or process that belongs to a subsystem. Subservers are also referred to as daemons.
Eg: ftp, telnet (subservers of the inetd subsystem)

Subsystem
A subsystem can have multiple subservers and is responsible for starting, stopping, and providing status of subservers.
Eg: gated, inetd, named, etc.

Subsystem Group
A subsystem group is a group of any specified subsystems. Grouping subsystems together allows the control of several subsystems at one time. 
Eg: TCP/IP, Network Information System (NIS), and Network File System (NFS).



For example, the subsystem group "tcpip" contains the subsystem "inetd", which in turn controls a subserver called "ftp".

Subsystem Operational Commands:

Item            Description
srcmstr daemon  Starts the System Resource Controller
startsrc        Starts a subsystem, subsystem group, or subserver
stopsrc         Stops a subsystem, subsystem group, or subserver
refresh         Refreshes a subsystem
traceson        Turns on tracing of a subsystem, a group of subsystems, or a subserver
tracesoff       Turns off tracing of a subsystem, a group of subsystems, or a subserver
lssrc           Gets the status of a subsystem

Command                    Description
lssrc -a                   To list the status of all subsystems
lssrc -h node1 -a          To list the status of all subsystems on the foreign host node1
lssrc -s inetd             To list the status of the subsystem inetd
lssrc -g tcpip             To get the status of the subsystem group tcpip
startsrc -s inetd          To start the subsystem inetd
startsrc -g tcpip          To start the subsystem group tcpip
stopsrc -s inetd           To stop the subsystem inetd (if the process is under srcmstr, i.e. PPID of process = PID of srcmstr)
stopsrc -g tcpip           To stop the subsystem group tcpip
refresh -s nfsd            To refresh the nfsd subsystem
refresh -g tcpip           To refresh the tcpip subsystem group
lssrc -p [PID of process]  To get the status of the subsystem by process ID
kill [PID of process]      To kill a process that was not started by srcmstr

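The status listing lends itself to simple filtering. The following is a minimal sketch, assuming the usual lssrc -a layout in which the first column is the subsystem name and the last column is the status. The function name inoperative_subsystems and the sample PIDs are made up for illustration.

```shell
#!/bin/sh
# inoperative_subsystems: read "lssrc -a" style output on stdin and print the
# names of the subsystems whose last column is "inoperative".
# Assumption: first column = subsystem name, last column = status.
inoperative_subsystems() {
    awk 'NR > 1 && $NF == "inoperative" { print $1 }'
}

# Illustrative sample only (made-up PIDs). On AIX you would run:
#   lssrc -a | inoperative_subsystems
inoperative_subsystems <<'EOF'
Subsystem         Group            PID          Status
 inetd            tcpip            4718750      active
 gated            tcpip                         inoperative
 named            tcpip                         inoperative
EOF
```

The header line is skipped with NR > 1, so the filter works whether or not a subsystem has a PID or a group.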
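Subsystem Config Commands:

mkssys  ==> Create Subsystem
chssys  ==> Change or modify Subsystem Parameters
rmssys  ==> Remove Subsystem

Example of creating a subsystem with mkssys (the /* ... */ annotations explain each flag and are not part of the command):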
# mkssys -p /usr/sbin/sshd \   /* Absolute path to the subsystem executable
                                  program. */
         -s sshd_adm \         /* Name that uniquely identifies the subsys. */
         -u 0 \                /* User id for the subsystem. */
         -a "-D -f /etc/ssh/sshd_config_adm" \   /* Arguments that must be
                                                    passed to the command. */
         -e /dev/console \     /* Where the subsystem standard error data is
                                  placed. */
         -i /dev/console \     /* Where the subsys. standard input is routed. */
         -o /dev/console \     /* Where the subsys. standard output is placed. */
         -R \                  /* Subsystem is restarted if the subsystem stops
                                  abnormally. */
         -Q \                  /* Multiple instances of the subsystem are not
                                  allowed to run at the same time. */
         -S \                  /* Subsystem uses the signals communication
                                  method. */
         -f 9 \                /* Signal sent to the subsystem when a forced
                                  stop of the subsystem is requested. */
         -n 15 \               /* Signal sent to the subsystem when a normal
                                  stop of the subsystem is requested. */
         -E 20 \               /* Execution priority of the subsystem. */
         -G ssh \              /* Subsystem belongs to the group specified. */
         -d \                  /* Inactive subsystems are displayed when the
                                  lssrc -a command request is made. */
         -w 20                 /* Time, in seconds, allowed to elapse between a
                                  stop cancel (SIGTERM) signal and a subsequent
                                  SIGKILL signal. */
Check the service's configuration:

# lssrc -S -s sshd_adm                
#subsysname:synonym:cmdargs:path:uid:auditid:standin:standout:standerr:action:multi:contact:svrkey:svrmtype:\
 priority:signorm:sigforce:display:waittime:grpname:
sshd_adm::-D -f /etc/ssh/sshd_config_adm:/usr/sbin/sshd:0:0:/dev/console:/dev/console:/dev/console:-R:-Q:-S:0:0:\
20:15:9:-d:20:ssh:

# odmget -q subsysname=sshd_adm SRCsubsys

SRCsubsys:
        subsysname = "sshd_adm"
        synonym = ""
        cmdargs = "-D -f /etc/ssh/sshd_config_adm"
        path = "/usr/sbin/sshd"
        uid = 0
        auditid = 0
        standin = "/dev/console"
        standout = "/dev/console"
        standerr = "/dev/console"
        action = 1
        multi = 0
        contact = 2
        svrkey = 0
        svrmtype = 0
        priority = 20
        signorm = 15
        sigforce = 9
        display = 1
        waittime = 20
        grpname = "ssh"

Thursday 7 May 2015

Default ports of ITM components

1) PORT 1920:

By default, HTTP port 1920 is allocated when the first ITM component starts. This port serves service console requests as well as TEPS and SOAP server requests. If this port is not available, a random port is allocated and used for HTTP requests.

The owner of the base HTTP port 1920 redirects calls to the random HTTP ports allocated by the other ITM components. The same applies to the HTTPS port 3661.

Note that the allocated random HTTP port is bound to 1920, so HTTP requests are served through the random port as well as through 1920.

The random HTTP and HTTPS ports allocated by the ITM components can be identified from the RAS1 log of each ITM component:

(4FEDD526.0015-32B4:kdhslqm.c,349,"add_listener") listening: ip.tcp.http:59029
(4FEDD526.0027-32B4:kdhslqm.c,349,"add_listener") listening: ip.ssl.https:59031
(4FEDD526.0013-32B4:kdhslqm.c,349,"add_listener") listening: ip6.tcp.http:59028
(4FEDD526.0021-32B4:kdhslqm.c,349,"add_listener") listening: ip6.ssl.https:59030
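
The listener lines can be pulled out of a RAS1 log with a small filter. This is a minimal sketch that assumes the log lines have exactly the "listening: <protocol>:<port>" shape shown above. The function name list_listeners is made up for illustration.

```shell
#!/bin/sh
# list_listeners: read RAS1 log text on stdin and print "<protocol> <port>"
# for every add_listener line. Assumes the line format quoted above.
list_listeners() {
    sed -n 's/.*"add_listener") listening: \(.*\):\([0-9][0-9]*\)[[:space:]]*$/\1 \2/p'
}

# Example using two of the log lines quoted above.
list_listeners <<'EOF'
(4FEDD526.0015-32B4:kdhslqm.c,349,"add_listener") listening: ip.tcp.http:59029
(4FEDD526.0027-32B4:kdhslqm.c,349,"add_listener") listening: ip.ssl.https:59031
EOF
```

In practice the function would be fed the component's RAS1 log file on stdin instead of the here-document.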

2) PORT 3661:

By default, HTTPS port 3661 is allocated for secure HTTP requests when the first ITM component starts. This port serves service console requests as well as TEPS and SOAP server requests. If this port is not available, a random port is allocated and used for HTTPS requests.

3) PORT 4096:

A basic services port is allocated based on the ITM port allocation algorithm:
(well-known port + count * 4096)

where the well-known port is the port number assigned to the monitoring server. The default port number assigned to the monitoring server is 1918.

For example, if the well-known port assigned to the monitoring server is 1918, the first monitoring agent started gets 1918 + 4096 = 6014. If port 6014 is busy, port 1918 + 2 * 4096 = 10110 is allocated to the monitoring agent, and so on. If port 6014 is RESERVED, the agent fails to start.
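
The arithmetic above can be sketched in a few lines of shell. The function name itm_port is made up for illustration, and stopping at 65535 (the highest TCP port number) is an assumption of this sketch, not something the allocation algorithm is documented to do here.

```shell
#!/bin/sh
# itm_port BASE COUNT: candidate port = BASE + COUNT * 4096
itm_port() {
    echo $(( $1 + $2 * 4096 ))
}

# List the candidate ports for the default well-known port 1918.
# Assumption: candidates above 65535 are not valid TCP ports, so stop there.
base=1918
count=1
while [ $(( base + count * 4096 )) -le 65535 ]; do
    echo "count=$count port=$(itm_port "$base" "$count")"
    count=$(( count + 1 ))
done
```

The first two lines printed are count=1 port=6014 and count=2 port=10110, matching the example above.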

4) PORT 15001:

In addition, the TEPS uses port 15001 for its own purposes.

NOTE: All the above ports can be controlled using the POOL parameter. The explanation of the POOL parameter is outside the scope of this technote.

5) PORT 9999:

By default, the Eclipse help server uses port 9999. This can be changed by reconfiguring the Eclipse help server.

6) Loopback Ports:

Also, whenever ITM components (including agents) are started, a few outbound loopback ports are allocated. Below are some examples of loopback ports:

127.0.0.1.1052
127.0.0.1.1053
127.0.0.1.1054

7) PORT 1978:

Default port 1978 is used by the remote deployment process.

NOTE: The loopback ports and the default remote deployment port cannot be directed to a specific port range.

Saturday 2 May 2015

Where has my space gone in an AIX/Linux filesystem?


A friend of mine ran into a situation where a filesystem with 9 GB allocated was reported as 100% utilized, but the actual usage was only 4 GB when verified with the "du" command.
# df -k /mytest
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/mytestlv  9216 9216 100% 48804 12% /mytest 
She was wondering where the other 5 GB had gone.

Reason: 

This happens when a process opens a file and keeps writing data into it, and the file is removed while the process still has it open. The process continues to hold the file's space even though the file has been deleted, so df still counts the space while du no longer sees the file.
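
The behaviour is easy to reproduce on a small scale. The sketch below is only a demonstration, not part of the fix: it assumes the mktemp command is available, and the function name demo_deleted_open_file is made up for illustration.

```shell
#!/bin/sh
# demo_deleted_open_file: show that a deleted file stays readable (and keeps
# its disk blocks) for as long as a process holds it open.
demo_deleted_open_file() {
    tmpfile=$(mktemp) || return 1
    echo "still here" > "$tmpfile"
    exec 3< "$tmpfile"      # hold the file open on descriptor 3
    rm -f "$tmpfile"        # remove the directory entry
    if [ ! -e "$tmpfile" ]; then
        echo "name removed"
    fi
    cat <&3                 # the data is still readable through the open descriptor
    exec 3<&-               # closing the descriptor lets the filesystem free the blocks
}

demo_deleted_open_file
```

The script prints "name removed" followed by "still here": the name is gone, but the data and its space remain until the descriptor is closed.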

How to Rectify ?

First, check which processes are using the filesystem with the "fuser" command.
# fuser -c /mytest
/mytest:  2567 4006c 6548c 8657
You need to kill the above processes if you want to free up the space.

Note: If the filesystem is used by an application, inform the respective application owner/support team and arrange application downtime first.

How to Kill the Processes?

# fuser -kc /mytest
This will kill all the processes, and the space will be freed up.

Check the space now:
# df -k /mytest
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/mytestlv  9216 4096 45% 48804 12% /mytest 

Saturday 18 April 2015

MS Word Secret Code

I think most of you already know about this, but I want to remind you again.

Do you want a Word document full of text without typing it all?

This is especially useful for print testing.
Open Microsoft Word from the Run dialog: type winword and press Enter.

Open a new Word document and type:

=rand(3,9)


then press Enter.

3 ==> number of paragraphs
9 ==> number of sentences per paragraph
You can choose any number of paragraphs and sentences.

After you press Enter, Word fills the document with the requested sample text.


Sunday 11 January 2015

Not enough free space to shrink the file system issue in AIX


I recently hit an issue while reducing a JFS2 filesystem on AIX 6.1, even though the filesystem had enough free space to be reduced.
root@umaix /tmp>df -g /orafs1
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/oralv1   100.00    75.00   25%      555     1%  /orafs1

root@umaix /tmp>chfs -a size=-15G /orafs1
chfs: There is not enough free space to shrink the file system.

This issue occurs when you try to shrink the filesystem by a large amount (15 GB in this case) and the free space is not contiguous because files are scattered across the filesystem.

Try the following methods one by one until the issue is fixed.

1. Try to defrag the FS:

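# defragfs /orafs1
(defragfs -s /orafs1 only reports the fragmentation; run defragfs without a report flag to actually defragment.)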

2. Reduce in smaller chunks:

If you still can't reduce it after this, try reducing the filesystem in smaller chunks. Instead of 15 GB at a time, try reducing it by 1 or 2 GB, then repeat the operation.

3. Check the processes:

Sometimes processes open big files and use a lot of temporary space in the filesystem.
Check which processes/applications are running against the filesystem and stop them temporarily, if you can.
# fuser -cu[x] <filesystem>

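4. Move the large files and try to shrink:

Look for large files with the find command and move them out temporarily, to see whether the filesystem can be shrunk without them (-size +2048 matches files larger than 2048 512-byte blocks, i.e. 1 MB; the sort orders them by size in bytes, largest first):
# find /<filesystem> -xdev -size +2048 -ls | sort -rn -k 7,7 | pg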

If none of the above methods works, the last resort is to recreate the filesystem.

==> Be very careful: take a backup of the filesystem and coordinate with the application team before removing the filesystem.

5. Recreate the filesystem:

  • Take a data backup of the filesystem (very important, don't skip this).
    You can either use your backup tools such as TSM / NetBackup or move the data to a temporary directory.

  • Unmount and remove the filesystem:
    # umount /orafs1
    # rmfs /orafs1
  • Create the filesystem again:
    # mklv -y oralv1 -t jfs2 oravg 600  (in this case we need 75 GB and the PP size is 128 MB)
    # crfs -v jfs2 -d oralv1 -m /orafs1 -A yes  (create the orafs1 filesystem)

  • Mount the filesystem (# mount /orafs1) and restore the data to it.
  • Verify the filesystem size:

    root@umaix /tmp>df -g /orafs1
    Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
    /dev/oralv1   75.00    50.00   33%      555     1%  /orafs1
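
The number of physical partitions passed to mklv (600) follows from the target size and the PP size of the volume group, which lsvg reports. A minimal sketch of that arithmetic follows. The function name pps_needed is made up for illustration, and rounding up is a choice of this sketch so that the logical volume is never smaller than requested.

```shell
#!/bin/sh
# pps_needed SIZE_GB PP_SIZE_MB: number of physical partitions to request,
# rounded up so the logical volume is never smaller than SIZE_GB.
pps_needed() {
    size_mb=$(( $1 * 1024 ))
    echo $(( (size_mb + $2 - 1) / $2 ))
}

pps_needed 75 128    # 75 GB with 128 MB partitions
```

For 75 GB and a 128 MB PP size this prints 600, the value used in the mklv command.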

Wednesday 7 January 2015

How to mirror VIOS Boot Disk?

Here is the procedure to mirror the VIOS boot disk.
# lspv
NAME             PVID                 VG               STATUS
hdisk0           00c122d4341c6e62     rootvg           active
hdisk1           00cd55a4fg6b676f     None
hdisk2           00c5524409a99b77     None
Here hdisk0 is the rootvg disk. Now we need to find a free disk.
You can use the lspv -free command to list the unmapped free disks.
$ lspv -free
NAME            PVID                                SIZE(megabytes)
hdisk1         00cd55a4fg6b676f                     256000
hdisk2         00c5524409a99b77                     256000
In this case, hdisk1 is free and unmapped, so we are going to use hdisk1 to mirror hdisk0.

Add hdisk1 into rootvg:
# extendvg rootvg hdisk1
0516-1254 extendvg: Changing the PVID in the ODM.
Now mirror the disk but defer the automatic reboot:
$ mirrorios -defer hdisk1
Now check the boot list:
$ bootlist -mode normal -ls
hdisk0 blv=hd5 pathid=0
We only have hdisk0 at the moment, so we need to add hdisk1 to the boot list:
$ bootlist -mode normal hdisk0 hdisk1
Check that it worked:
$ bootlist -mode normal -ls
hdisk0 blv=hd5 pathid=0
hdisk1 blv=hd5 pathid=0
You now have a mirrored rootvg.