Degradation

From MidrangeWiki
Revision as of 19:38, 27 June 2005 by Al Mac (talk | contribs)

We get accustomed to some level of performance from our 400, then unexpectedly it is like someone has put on the brakes. Whoa, how come it is so sluggish right now? Here are links to tips and techniques you can use when this happens, to figure out what is going on and fix it.

Tips & Techniques

These tips can also help when you experience what might be called a slide into oblivion, in which the 400 seems to be getting steadily slower over time. What the heck is going on, and how do we fix it?

  • Backup Save Restore might be worth a review before any major changes get made.
  • BPCS Files can have millions of records, going back a year or more, while 90% of the users really only need the last few weeks' worth when they run inquiries against those files. There are enhancements available to archive the older data into another library, so it is not part of the standard inquiry but is still available for reports when needed. Check the BPCS-L archives for discussion of alternative 3rd party solutions for this.
  • Check Disk Space Health
  • DSPJOBTBL preferably via several benchmark check points.
  • DSPMSG QCFGMSGQ = If you create this message queue, IBM will send it messages about hardware problems. If some work station has gone flaky, it can connect, disconnect, connect again, and produce a string of unwanted garbage.
  • DSPMSG QSYSMSG = If you create this message queue, IBM will send it some messages about very bad stuff, like perhaps the cache battery on your disk drive is going flaky and needs to be replaced.
  • DSPMSG QSYSOPR = is there some problem right now awaiting a response? Do you know what a runaway job is?
  • Kill Jobs Preparation
  • Locks and Deadly Locks
  • Manuals that can be helpful in this scenario:
    • Work Management
  • Performance Tuning may be needed.
  • Remote Printer Hung
  • SYSCMDUSNO = one of the CLP/400 examples. This CLP/400 program lists bad stuff that's recently been going on, such as:
    • CPF4058 = Here's a file with significant growth, better do something before it explodes.
    • CPI1479 = Your 400 has become over-taxed with interactive activity. Your choices include:
      • Grin and Bear it
      • Bare your company wallet to IBM
      • Check TIMES this is happening (my first choice)
        • If it happens same time each day, and that time coincides with shift change, or lunch break, suggest to some co-workers that if they sign on or off a few minutes before or after shift change, it might go faster.
        • If it happens same time each day, and that time is like the middle of people's work day, then use DSPLOG to see what kind of tasks typically run at that hour.
        • If you identify a particular program that seems like it might be the culprit, take a look at which files it accesses and how. Perhaps there is a poorly designed join of some humongous files.
      • Analyse interactive tasks to see if any can be moved to JOBQ (my second choice)
        • Teach co-workers how to send Query to JOBQ.
      • Do something that can get you in big trouble with IBM
      • Downsize the company
      • Update your resume
  • Sub System abuse such as a batch job running in Interactive mode
  • Consider the merits of Temporary Logicals (http://wiki.midrange.com/index.php/DB2#Temporary_Access_Paths)
  • WHO BAD = CLP/400 program to display which jobs are using 3% of system resources or more. You can customize where you do your cut-off.
  • WRKPRB = Get a list of recent events that IBM categorizes as hardware problems
  • 400 101 could be reviewed in case of any common misconceptions
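
The "Check TIMES" advice under CPI1479 above comes down to counting how often the message shows up in each hour of the day, and on how many different days. Here is a minimal sketch in Python, assuming you have already copied the message timestamps off the system (for example by reading them from DSPLOG output) into a list of text strings. The timestamp format, the function names, and the sample data are all made up for illustration; none of this is part of any IBM tool.

```python
from collections import Counter
from datetime import datetime

# Hypothetical format for timestamps copied out of the history log.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def count_by_hour(stamps):
    """Return a Counter mapping hour of day (0-23) to number of occurrences."""
    hours = Counter()
    for text in stamps:
        hours[datetime.strptime(text, STAMP_FORMAT).hour] += 1
    return hours


def busiest_hours(stamps, min_days=2):
    """List the hours in which the message appeared on at least min_days different days."""
    days_seen = {}
    for text in stamps:
        when = datetime.strptime(text, STAMP_FORMAT)
        days_seen.setdefault(when.hour, set()).add(when.date())
    return sorted(hour for hour, days in days_seen.items() if len(days) >= min_days)


if __name__ == "__main__":
    # Made-up sample: three lunch-hour hits on different days, one stray afternoon hit.
    sample = [
        "2005-06-20 12:02:11",
        "2005-06-21 12:05:40",
        "2005-06-22 12:01:03",
        "2005-06-22 15:30:00",
    ]
    print(count_by_hour(sample))
    print(busiest_hours(sample))
```

An hour that turns up on several different days is the one worth comparing against shift changes, lunch breaks, and what DSPLOG shows running at that time. The min_days cut-off is arbitrary, so adjust it to however many days of history you collected.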

Related Troubleshooting

Other problems call for many of the same problem-solving tools.

  • An error occurs in a job running from a Job Queue and is not immediately noticed. Other jobs pile up on the JOBQ until someone who is accustomed to work going into the queue and completing in a predictable time asks a question. By then we have a huge pile of jobs waiting, and we need to alter their sequence in addition to dealing with the hung job.