Friday, October 9, 2009

Youdunit: embarrassing computer Snafus; learn from this collection of "IT anonymous" blunders.(STRATEGIC DIRECTIONS).


A famous poet once said, "None of us is as dumb as all of us." When applied to technology, the quote reminds us that in the battle of man versus machine (in this case the computer), man (and woman) generally emerges as "dumb."

A technology mistake can be as simple as failing to plug something in or as complicated as misunderstanding how to use a programming language efficiently. "Did you hear the one about the guy whose stapler was heavy enough to activate the field exit key?" morphs into "The IBM i Reclaim Storage command is an overused approach for managing the health and correctness of the IBM DB2 for i system cross-reference."


So, if you've ever made a computing mistake, laughable or costly, hang on as these authors and readers share some of their unfortunate boo-boos. By understanding their blunders, you can avoid making your own.

The 911 Call Center Design Disaster

Decisions made by committee can be especially dangerous, as there is a tendency for everyone to think someone else on the team is paying attention to everything. We experienced this in a big way with a 911 call center design disaster. A life-critical resource such as this must have redundant systems, and we proposed them in spades: dual UPS systems, backup power generator, fully symmetric network redundancy, RAID disk storage--you name it, we added it. As we discussed which devices should be powered on the generator-backed circuits, one of the committee members mentioned emergency lighting. We analyzed the power draw of the special battery-powered emergency lights we planned to install. Could the generator handle it? Yes, it could, we decided. We could even power the emergency lights down the hall to the exit doors, an added safety bonus. Committee members summarily signed off on the emergency lights, convinced that they had total generator backup power protection.



An architect, a general contractor, an electrical engineer, and two electricians reviewed the design. The electrical engineer double-checked our power calculations and recommended sizing the generator slightly larger--we had neglected to allow for the emergency light battery recharging current. Finally, we constructed the call center and installed everything exactly to plan--including the emergency lights.

Redundancy needs to be tested, of course, so we scheduled an all-hands dry run of the system. At the appointed time, with staff members, contractors, and managers all crowded into the gleaming new 911 call center, we threw the switch to cut the utility power. Immediately, the main lights went out, but the emergency lights came on and the 911 computer systems switched to their battery backup while the generator started up. Seconds later, the generator thrummed to life, but the lights went out. The eerie glow from equipment bulbs and computer screens illuminated our surprised faces, but the emergency lights had failed, plunging the room into gloomy darkness.

Five minutes went by before someone figured it out: Emergency lights remain on only if they have no AC power input. That condition lasted just for the time it took the generator to come online--perhaps 30 seconds. With the generator happily feeding 120V AC to the emergency lights, the lights figured the emergency was over and promptly shut off. Putting the emergency lights on the generator was exactly the wrong thing to do!

--Mel Beckman System iNEWS Senior Technical Editor

The Case of the Obvious Names

I once blew away, without realizing it, months' worth of important data at a client location. I had used an SQL script from a PC-based SQL utility to delete records before and after testing. The client had an i for production and one for test. The IPs for the systems were off by one. I thought I remembered which one was for production, but I ran the SQL against the wrong IP. I felt pretty safe because the library name was CONVDTALIB. What could go wrong with deleting records from a "conversion" library? The big problem was that the client did not see the issue until the month-end processing. By then the recovery process was extremely complicated.

What did I learn? Always use logical names rather than IPs for your systems--obvious names such as Production and Test--and don't assume anything from the object names of existing applications.

--Don Denoncourt System iNEWS Technical Editor
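The lesson about logical names can be sketched in a few lines. This is only an illustration, not the tooling used in the story: the SYSTEMS table, its addresses (taken from the documentation-only 192.0.2.0/24 range), and the resolve_target helper are all hypothetical.

```python
# Hypothetical registry mapping obvious logical names to addresses.
# The addresses are placeholders from the documentation-only range.
SYSTEMS = {
    "PRODUCTION": "192.0.2.10",
    "TEST": "192.0.2.11",
}


def resolve_target(name, destructive=False, confirm=None):
    """Return the address for a logical system name.

    Destructive work against PRODUCTION is refused unless the caller
    retypes the logical name in `confirm`.
    """
    key = name.strip().upper()
    if key not in SYSTEMS:
        raise KeyError(f"unknown system name: {name!r}")
    if destructive and key == "PRODUCTION" and confirm != "PRODUCTION":
        raise PermissionError(
            "destructive run against PRODUCTION needs confirm='PRODUCTION'"
        )
    return SYSTEMS[key]
```

A delete script would call resolve_target("Test", destructive=True) and never handle a raw IP at all; pointing it at production would take a deliberate second step rather than an off-by-one digit.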

(Editor's Note: Contributors to the System iNetwork and Midrange-L forums provided the stories that follow.)

The Disappearance of QSYS/QCMD

A couple years ago while I was temporarily signed on with *ALLOBJ authority, I deleted QSYS/QCMD--the core of the i operating system. I had created a program called QCMD in QTEMP and had erased it from the command line using DLTPGM QCMD. (I know, really bad idea to create a program called QCMD and to delete anything without qualifying it first.) Later on, I used F9 to go back and repeat previous commands. I pressed F9 several times and then pressed Enter without seeing that the machine had slowed down/locked up. When it came back, I realized that I'd pressed F9 one time too many.

Slow motion followed. It took a few seconds for QCMD to disappear, and the computer was finally able to send the "Program QCMD in library QSYS deleted" message. However, as soon as I tried to do anything else, it all went horribly wrong.

The Over-Ticking Realtime Clock

I used to help a customer who had a batch program that consumed 30 percent of the CPU (a model 810 with a 700 CPW) while it waited to process its data during a five-minute period. This meant that the job lingered until every five-minute mark of the realtime clock (8:05 a.m., 8:10 a.m., 8:15 a.m., and so on) to process the information, which generally took less than one minute to finish. Why was there 30 percent CPU consumption during that wait period?

The problem was doubled: two such batch jobs, running the same program to process data from two different sales coverage areas, together consumed 60 percent of the CPU while they were supposed to be idling. On many occasions during the day, the CPU showed ++++ in the WRKSYSSTS screen when a few more users ran their queries.

After completing a source code review, I found that the program read the i realtime clock and compared that time with the schedule data row stored in a file (e.g., "Is it now 8:05 a.m. so that I can do my job?"). If the time did not match the schedule, it immediately went back into the program loop that noted the time and compared again. Reading the realtime clock from the i repeatedly in a loop without any delay cycle consumed a great many CPU cycles!

I asked the customer to insert a DLYJOB for 15 seconds if the schedule mark was not met before the program went back into the loop, so that it would read the time and compare it again only after the delay. The CPU consumption of this program dropped to zero during the wait period. There was no need to read and compare the time more than once every 15 seconds or so for such work.

The laughable turn for this case was that the manager of my customer's IT shop did not deploy this simple change for another two years, citing a pending user acceptance test.
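The fix in this story (poll, then delay, instead of spinning on the clock) looks roughly like the sketch below. It is written in Python rather than the customer's actual code, which the story does not show; the function names, the five-minute test, and the injectable clock and sleep parameters are assumptions made for illustration.

```python
import time
from datetime import datetime


def is_schedule_mark(now, interval_minutes=5):
    """True when the minute falls on a schedule mark (8:05, 8:10, ...)."""
    return now.minute % interval_minutes == 0


def wait_for_mark(clock=datetime.now, sleep=time.sleep,
                  delay_seconds=15, interval_minutes=5):
    """Block until the next schedule mark; return how many times the clock was read.

    The sleep between reads plays the role of the 15-second DLYJOB:
    without it, the loop would read the clock continuously and burn CPU.
    """
    polls = 0
    while True:
        polls += 1
        if is_schedule_mark(clock(), interval_minutes):
            return polls
        sleep(delay_seconds)
```

With a 15-second delay the clock is read about four times a minute instead of as fast as the processor allows. One caveat of this sketch: the mark stays true for a full minute, so a job that finishes quickly would need to remember the last mark it processed to avoid running twice.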

Where Oh Where Did the "Where" Go?

When I worked at a pension fund office, we had to apply a tax rule to the accumulated benefits field of particular participants who had met certain conditions, and we had to adjust the value in the field appropriately. This was a change to be made to the participant master file, where we kept the accumulated benefits field. This, of course, was one of the major files of the whole application.

We finally developed the SQL statement to use to make the update and tested it thoroughly. We gave the SQL statement to the operator and asked him to run it in the production environment at 4:50 p.m.

At 4:55 p.m., the operator approached me with a hard copy of the email I sent him containing the SQL statement. He said, "I ran this SQL statement, but was I also supposed to type in this 'where' part? I didn't do that."

My supervisor and I worked until about 10 p.m. restoring the participant master file. Did I mention that we had a blizzard that night?
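One way to keep a missing "where" from reaching a master file is to wrap the statement in a guard that checks for the clause and for the number of rows touched before committing. The sketch below uses Python's sqlite3 module purely for illustration; the story's system was not SQLite, the guarded_update helper is hypothetical, and the WHERE check is a crude text match, not an SQL parser.

```python
import re
import sqlite3


def guarded_update(conn, sql, params=(), *, expected_max_rows):
    """Run one UPDATE or DELETE; commit only if it looks sane.

    Refuses statements with no WHERE clause, and rolls back when more
    rows are affected than the caller said to expect.
    """
    if not re.search(r"\bwhere\b", sql, re.IGNORECASE):
        raise ValueError("refusing to run UPDATE/DELETE without a WHERE clause")
    # sqlite3 opens a transaction implicitly before the statement runs,
    # so nothing is permanent until commit() is called.
    cur = conn.execute(sql, params)
    if cur.rowcount > expected_max_rows:
        conn.rollback()
        raise RuntimeError(
            f"{cur.rowcount} rows affected, expected at most {expected_max_rows}"
        )
    conn.commit()
    return cur.rowcount
```

The operator would then pass along the expected row count from the test run as well as the statement; a run that touches the whole file is undone before anyone has to restore from backup.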

The Vanishing Service Program

Once, as a consultant, I perused the *SRVPGM objects listed in QSYS. Dumb, I know. I scrolled along, typing 5 and pressing Enter a couple times to get to the list of modules. I worked up a rhythm doing this, keying 5 and pressing Enter, when, of course, I accidentally keyed 4 and ended up deleting a service program in QSYS before I even realized that the delete confirmation screen had appeared.

No big deal, I thought, I'll just do a restore from backup. Unfortunately, the restore command used the service program. Also, the compile commands used the service program, so all the other programmers found out that I had done something bad. Not good. Eventually, one of the other consultants figured out how to fix it. To this day, he hasn't told me what he did. I think he's still mad.

The Runaway Keyboard

A young programmer somewhat new to the i needed a command to do something. He knew the first part of the command's name but not the rest of it. He typed SLTCMD ASN* and pressed Enter, thinking he would receive a list of all the commands that started with ASN. Unfortunately, he accidentally typed a "D" instead of an "S" on the command line, making it DLTCMD ASN*. You can imagine the rest. In fairness, the "D" key is right next to the "S" key on the keyboard.

The Runaway Keyboard Redux

I had to train a programmer fresh out of technical school to use CL. I gave him Ernie Malaga's book Complete CL and told him to read the first few chapters. The introduction explains typesetting and what italic, boldface, and so on mean. Unfortunately, Ernie used the PWRDWNSYS command with OPTION(*IMMED). Also unfortunate was that the only available terminal that day was the system console. As I sat in my office talking to a customer, my terminal suddenly went belly-up. I rushed to the programmers' room and asked if any of the programmers were experiencing problems. At the far end of the space I heard "Yeah, I just went down too." The next programmer, then the next, and then the rest of the programmers in sequence announced the same. I had to admonish the new programmer-in-training to read the book, not type the book.

The Stolen Password

Someone broke into one of our sales offices and stole several laptops. The incident occurred in the middle of the night, but our employees didn't discover it until they opened the office building in the morning.

One of the salesmen immediately called me at home, waking me up. He reported the theft and asked me to change his password. I did so, and the salesman promptly called me again, announcing angrily that his password no longer worked. "I don't want it changed on the whole network," he whined, "just on the stolen computer."

The Slippery Screwdriver

Back in the S/38 days, we had an IBM IT pro who came once a month just to make sure that everything operated properly. On one of his visits, we reported that one of the lights on the panel was lit. The 8-inch floppy disk drive on the S/38 was on the left. When the IT pro leaned over to take a look, the screwdriver in his pocket dropped into the open drive bay. We ended up with new disk drives, new cards, and a reload that day.

The Mystery of the Magnetic Field

We had just installed our first UPS to support our B50. It was the size of a small refrigerator and contained approximately 30 batteries, each about the size of an average car battery. The company president stopped by one morning to see the new UPS hardware and talk about other management information system issues. He leaned up against the front of the UPS to participate in the conversation.

When the president went to lunch after the discussion, he discovered that his credit card didn't work. Later we learned that all the cards in his wallet containing magnetic stripes had been wiped clean in the short time he spent leaning on the UPS.

The Riddle of the Sleeping S/38 and the Scrambled Diskettes

An S/38 at an insurance company in the mid-1980s developed a habit of dying in the middle of the day for no apparent reason. This created a huge problem because it shut down the whole business, and the average time to rebuild the database access paths on that machine was 11-13 hours. Several months of this dilemma frustrated the organization, and we were ready to rip out the S/38 and replace it with anything.

An IBM IT pro stopped by multiple times and replaced every card and component he could think of, but nothing seemed to ease the problem. The randomness of it baffled us. Any thought of blaming it on the operations manager (who worked on the day shift) was futile because the shutdown occurred even when he was off duty.

Meanwhile, at a programming school I attended, several people accused operations students of deleting the source code of programming students, who typed their code at a keypunch machine and wrote the data over to 8-inch diskettes. The operations students were responsible for loading the data to the mainframe. On several (also random) occasions, programming students complained that their source code was missing or corrupted after this process.

Finally, the operations instructor solved the riddle. He noticed that operations students sometimes placed a large, heavy telephone on top of the diskettes. If a call came in, the ringing of the phone emitted enough of a magnetic discharge to scramble the data on the diskettes. The instructor tested his theory several times and conclusively proved that the phone indeed caused the problem for the students.

The same day that the instructor solved the mystery of the damaged diskettes, I walked into the computer room at the insurance office and noticed the same kind of big, black telephone sitting on top of the S/38 between the diskette magazines and the display panel. I mentioned to the operations manager that a similar phone caused the data loss at my programming school, but he assured me that the telephone at the insurance company could not possibly be the source of the S/38 outages. However, he never put the phone on top of the diskettes again, and the problem never recurred.

The Paper Disk Caper

A customer reported that he was having a problem installing software from a diskette we had sent to him. We asked him to make a copy of the diskette and return it to us. A couple days later, we received a photocopy of the diskette.

The Tell-Tale Characters

One employee called IT from time to time reporting that characters she did not type appeared mysteriously on her screen. After careful observation, we determined that the buxom worker should lean back from her desk, raise her chair, or lower her keyboard. I don't think anyone ever really explained the true nature of the "problem" to her.

Vicki Hamende

Vicki Hamende is senior editor of System iNEWS.

Source Citation: Hamende, Vicki. "Youdunit: embarrassing computer Snafus; learn from this collection of 'IT anonymous' blunders.(STRATEGIC DIRECTIONS)." Iseries News (Dec 2008): 33(4). General OneFile. Gale. Alachua County Library District. 9 Oct. 2009.


