Score:0

Mitigating Windows Server Update risk?


I'm new to the IT industry; I'm not a sysadmin...I'm a road construction guy by trade. So let me know if I've mixed up any terminology, concepts, etc.


My organization's IT department is very small. We seem to be barely treading water when it comes to keeping our IT systems "up".

The thing that causes the most system outages is Windows Server updates. Windows updates seem to bring down our application servers about once per month.

The updates are scheduled to happen automatically -- after-hours on a monthly basis.


Here's an example of a recent outage:

  1. The application server that contains the WebSphere JVMs for our work order management system was automatically updated late at night (via the 1-month schedule).

  2. Today, when users started using the work order management system, we got a bunch of integration/java errors from the JVM.

  3. When we investigated, it seemed clear that the issue was caused by the updates. We've never had that issue before, and it happened right after the updates were applied.

  4. Our sysadmin restarted the server, which seemed to solve the issue right away. We dealt with the failed integration messages, etc. and life went on (at least until the next update happens).


I know that my organization isn't the only one that struggles with Windows update issues. It seems to be a pretty widespread problem.

But my question is:

Are there techniques for handling updates that my organization might have overlooked?

For example, it occurred to me that we could:

  1. Manually apply the updates and restart the servers on Saturday mornings
  2. Thoroughly test all of our systems
  3. Have all weekend to deal with issues and restart servers if we need to (instead of doing it live during business hours).

Are there any standard practices like that for mitigating the risk that comes with updating Windows servers?
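
As a rough illustration of step 2 ("thoroughly test all of our systems"), part of the post-update check could be scripted. This is only a sketch in Python: the URL, the `CHECKS` table, and the function names are hypothetical placeholders, and it assumes each application exposes some HTTP endpoint that answers with a 2xx status when it is healthy.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical health-check URLs; replace with whatever your applications expose.
CHECKS: Dict[str, str] = {
    "work-orders": "http://appserver.example.internal:9080/health",
}


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False


def run_checks(
    checks: Dict[str, str], probe: Callable[[str], bool] = http_ok
) -> Dict[str, bool]:
    """Probe every endpoint and return {name: passed}."""
    return {name: probe(url) for name, url in checks.items()}


def failed(results: Dict[str, bool]) -> List[str]:
    """Names of the checks that did not pass, sorted for stable output."""
    return sorted(name for name, ok in results.items() if not ok)


def main() -> int:
    """Print one line per check; return 1 if anything failed, else 0."""
    results = run_checks(CHECKS)
    for name, ok in results.items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
    return 1 if failed(results) else 0
```

Calling `main()` after the Saturday restart would give a quick pass/fail list before users log in on Monday. A check like this only shows that a service responds, so it would complement manual testing of the integrations, not replace it.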

anx
The *"right way"* might well have very little to do with the Windows OS and much more to do with improving the general quality of the business-critical software that runs on top of it. Software targeting the Windows OS should not fail on things that are normal during operation of the Windows OS. From the outside it is hard to tell, though; experienced admins tend to be more careful about "it happened right after X, so it must be caused by X" conclusions.
Score:1

Sorry to hear about all the errors in your production environment caused by the Windows Server updates. I don't work with Windows servers, but I'm fairly sure automatic installation of those updates can be disabled. The usual approach is to maintain two environments, one for testing and one for production, so that nothing is deployed to production until it has been tested in the test environment.
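
To make the test-then-production idea concrete, here is a small Python sketch of a monthly schedule. It assumes updates arrive on the second Tuesday of the month (Microsoft's usual monthly release day), that the test environment is patched on the following Saturday, and that production follows after a soak period. The function names and the 7-day default are illustrative assumptions, not a recommendation for any particular setup.

```python
from datetime import date, timedelta
from typing import Dict


def patch_tuesday(year: int, month: int) -> date:
    """Second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday=0, Tuesday=1, ..., Sunday=6
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def next_saturday(d: date) -> date:
    """First Saturday strictly after d."""
    return d + timedelta(days=(5 - d.weekday() - 1) % 7 + 1)


def schedule(year: int, month: int, soak_days: int = 7) -> Dict[str, date]:
    """Release day, test-environment patch day, and production patch day."""
    release = patch_tuesday(year, month)
    test = next_saturday(release)
    production = test + timedelta(days=soak_days)
    return {"release": release, "test": test, "production": production}
```

For January 2024, `schedule(2024, 1)` gives a release on 9 January, test patching on 13 January, and production patching on 20 January. Production would only be patched on that date if the test environment ran cleanly during the week in between.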

You may also find topics such as DevOps, ITIL, and cloud computing interesting.

If the information was helpful please don't forget to up vote or accept the answer.
