Category: Higher Ed

  • Crowdstrike Outage “Not What You Thought”

    It’s been six months since the Crowdstrike outage – enough time to reflect on the incident and take stock. I had lunch with my CISO about a week after the outage. It was the first time we had seen each other in several weeks. “So,” I asked sheepishly, “how have you been since the outage?” “I’ve been fine. But the Service Desk has been swamped. Since my security team wasn’t that busy, we pitched in to help remediate the outage. They touched 15,000 servers and client machines in three days.” I inquired further. His role focused on the management of encryption keys that were necessary to unlock and manually patch the operating systems of the affected machines. “The hard part of the recovery was managing the keys,” he said. As his team was jointly responsible for the security of those keys, that was the extent of his involvement. You see, Crowdstrike pushed a bad patch – one file – but an important one that loads at the kernel level. This caused all of those Windows machines to “blue screen.”

    Something didn’t compute. I thought he was going to be falling asleep at the table, eyes bloodshot, bags under them, a quart jug of coffee in his hand. Instead, he seemed rather chipper. Then it hit me. This wasn’t a security incident. Rather, it’s what we call in ITSM a deployment and release management issue. It’s not that Security Management wasn’t involved; they were. But it was apparent early in the Problem cycle that this wasn’t a cyberattack.

    The response from our university IT was quick and appropriate. Within thirty seconds of the patches being applied, customers began to call and report “blue screens.” This spawned a number of related incidents at the Service Desk. These incidents were quickly correlated into a Problem record, which was upgraded to a major incident (i.e., outage) record in less than an hour, all of this happening around midnight on July 19th. During the early morning hours, an incident response team did a root cause analysis and quickly determined the problem was a vendor patch.

    The vendor response was quick and the patch was available by early morning, although the CEO of Crowdstrike was criticized in subsequent days for not issuing a timely apology. The damage to Crowdstrike’s reputation was done. After all, the outage affected roughly 8.5 million computers. Crowdstrike was quickly seen as the responsible party and IT folks around the world became heroes as the outage response progressed. But Microsoft was also responsible for letting Crowdstrike play in the Windows kernel. Microsoft distanced themselves from responsibility by asserting, “Although this was not a Microsoft incident, given it impacts our ecosystem, we want to provide an update on the steps we’ve taken with CrowdStrike and others to remediate and support our customers.” In this instance, Microsoft was acting as an integrator, more specifically, as a Service Guardian, where they both managed a third-party vendor (Crowdstrike) and provided services (Windows). In such cases, ITIL best practices dictate that we have a high level of communication and trust with the integrator, but also acknowledge that our customers will hold us – not our vendors – responsible. After all, who are our customers going to blame – us or our vendor?

    I see a double failure here. Crowdstrike failed by deploying a service with a critical bug in it, which they should’ve uncovered in their acceptance testing. This is not George Kurtz’s first high-visibility failure. In 2010, he was CTO of McAfee when a similar outage occurred. The second failure was Microsoft’s mismanagement of their vendor. One may ask why they allowed a vendor to deploy a file at the kernel level without sufficient testing. You would also expect Microsoft to have caught the error prior to approving the release of the errant file. Was Microsoft’s trust of Crowdstrike so great that they didn’t do acceptance testing and simply passed the updates through? If so, they need to review their Deployment and Release Management practices. Of course, this is pure speculation.

    Meanwhile, back at “the ranch,” the IRT created a Change Request that included testing of the patch on a number of machines. Procedures to apply the patch were documented at both the individual asset level and the more strategic coordination level. On the communication side, customer communication began as soon as the Problem was identified, about an hour into the incident, with a number of communications happening in the early morning hours via IT staff in the colleges and university communications to stakeholders. Communication continued through the next few days as the incidents were remediated and non-reported servers and endpoints patched. An After Action Review was conducted less than a week after the initial incident was reported. Lessons learned were documented. DONE!! YAY!!!

    Since I retired from IT, I’m an “observer” these days and I can tell you that I don’t miss the excitement surrounding outages. Been there, done that, got the t-shirt. But I must say that I’m very proud of the way our university handled this major incident – responsive, professional, by the book. I don’t think our response would’ve been as good five years ago. We’ve come a long way in our journey in understanding ITSM.

    In summary, what ITSM practice areas were involved in this outage?

    1. Service Desk
    2. Incident Management
    3. Problem Management
    4. Continuity Management (via Major Incident/Outage)
    5. Vendor Management
    6. Asset Management
    7. Relationship Management (i.e., communication with stakeholders)
    8. Change Management
    9. Security Management (indirectly)

    This is a pretty impressive slice of the ITIL ITSM Practices for a single issue. I think our IT folks would report that we have varying levels of maturity in each of the Practice areas, but I can tell you from experience that this kind of outage hones our skills to respond better the next time. Iron sharpens iron.

  • Home for the Holidays

    Don’t you love spending quality time with family and friends eating and drinking lots of things you don’t need? You gotta love those conversations around the dinner table and if you’re anything like me, you’re thankful that Christmas comes but once a year. I had a number of interesting conversations this holiday, but I’m only going to share one of them.

    This conversation happened at the dinner table and was initiated by one of my “in-laws.” I won’t say which one to protect the guilty. Now that I think of it, it wasn’t much of a conversation – more like a diatribe. “Higher education is a complete waste of time and money. All they do is indoctrinate your kids and teach them a bunch of radical ideology. Nobody should go to college. Anything you need to know, you can teach yourself for free. You just have to love learning, that’s the key.” It was an interesting topic to broach when you’re sitting next to two college professors.

    Except for that last part, you can guess that I disagreed with just about all of it. But it was unsettling on a number of levels. For one, this type of argument against higher education used to be the exclusive purview of liberal-minded folks, but I’m hearing it more and more frequently from conservative types such as those at the Daily Wire. Matt Walsh is especially venomous in his attacks on “liberal college professors.”

    While there is some truth in the “irrelevance” argument against higher education, I believe that we have a lot to offer students, if we can remember what higher education is really about. But poor leadership, a decline in the classic liberal arts education, and a rash of institutions in the news lately for behaving badly have fueled the fire. Higher ed is not making a good value proposition anymore. Steeply rising costs, mainly a result of falling state and federal support over the last 35 years, coupled with the explosion of the administrative university, have hindered our ability to provide value. We’ve allowed others to define the goal of higher education solely as skill development. This is why for-profit, online, nimble educational corporations are beating us, at least for the moment.

    So what is education really about? I believe that colleges should re-focus on teaching the classic liberal arts education which, by definition, develops a student’s intellectual and moral character, rather than simply teaching them a set of skills. This type of education is designed to provide students with a broad understanding of the world and its history, as well as to teach them how to think critically and communicate effectively. It includes subjects such as literature, philosophy, history, and the fine arts.

    And this is important: we need to educate students in these things in addition to teaching them skills that are useful to employers. After all, skills help us succeed in the workplace, but virtue helps us succeed in life.

    What makes me sad is that our colleges seem to have lost this vision. Colleges of liberal arts are under attack and being cut at every turn. While part of this result is self-inflicted, it seems that not many college professors are interested in mentoring our students to pursue truth and virtue.

    Higher education needs to rediscover what made it great in the first place.

  • You want me to do WHAT?!?!

    Nobody likes email retention, especially me. Believe me. No … really. Why on earth would I ask you, a productive researcher with no time for chit-chat, much less time to police your email, to spend time looking at your email, pondering it, and (hopefully) deleting a bunch of it? No sane person would do that.

    As it turns out, email retention, while it can be painful, is actually a good thing, for you and for the agency. There are two main risk-related reasons to implement email retention.

    Risk Reduction

    First, email retention policies, “…are driven by the risk of TPIA requests, litigation subpoenas, and discovery requests, along with the requirement to eliminate transitory information and to properly maintain other state records,” says TAMUS General Counsel Brooks Moore, who is our System expert on this topic. “An automatic delete policy is a best practice,” he states. Furthermore, researchers who are reluctant to implement email retention “have [probably] not been involved in the voluminous email and document production from a number of TTI open records requests and discovery requests/subpoenas (guardrail litigation, etc.). In my experience, once a researcher has experienced this, they become an advocate for automatic delete policies.”

    Retention Compliance

    Second, important information, i.e., state records, should not be maintained in email, “but should be retained in approved systems for electronic files and state records. An automatic delete policy encourages compliance with these requirements by forcing employees to properly file emails outside of email for continued preservation.” What are those approved systems? For TTI, OneDrive, mainly. On the flip side of this argument, most emails are NOT considered state records, but transitory information. Transitory information should NOT be kept as it is not subject to records retention. Keeping it incurs risk as noted above.

    Reducing risk and increasing retention compliance are a one-two punch that helps keep us safe and secure.

  • NIS Strategic Plan Posted

    NIS recently published an updated strategic plan. High-level objectives include:

    • Move Toward a Cloud-First Environment
    • Support Mobile-First Computing
    • Support Research Computing
    • Develop NIS Staff
    • Mature IT Service Management Practices
  • Hackers Exploit Post-COVID Return to Work

    As we return to the office, cyber threat actors are changing their tactics, as they did during the work-from-home transition. Here at TTI, we’ve seen a number of these sophisticated attacks that attempt to redirect victims to a site to enter their credentials. But the site is NOT our Microsoft login site.

    The latest scam includes pelting recipients with emails, purportedly from their CIOs or other highly placed executives, welcoming employees back into offices. Threatpost indicates that the fake CIO email prompts victims to follow a link to a fake Microsoft SharePoint page with two company-branded documents, both outlining new business operations. However, if a victim clicks on either document, a login panel appears and prompts the recipient to provide login credentials to access the files.

    What can I do to stay safe?

    Whenever you enter credentials in a browser window, be sure that the address of the site is login.microsoftonline.com/… AND that there is a lock icon next to the address.
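
    For the scripting-minded, the same address check can be sketched in a few lines of Python. This is only a minimal illustration, not an official TTI or Microsoft tool: the function name looks_like_microsoft_login is my own invention, and it assumes that "safe" means the address uses https and the host is exactly login.microsoftonline.com. It cannot see the lock icon or certificate that your browser shows.

```python
from urllib.parse import urlsplit

# The host named in the advice above; anything else is treated as suspect.
EXPECTED_HOST = "login.microsoftonline.com"


def looks_like_microsoft_login(url: str) -> bool:
    """Return True only for https URLs whose host is exactly EXPECTED_HOST.

    Hypothetical helper for illustration. It compares the parsed host,
    not the raw text, so look-alike addresses that merely contain the
    expected name are rejected.
    """
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased host, without userinfo or port
    except ValueError:
        # Malformed addresses are never trusted.
        return False
    return parts.scheme == "https" and host == EXPECTED_HOST


if __name__ == "__main__":
    samples = [
        "https://login.microsoftonline.com/common/oauth2/authorize",
        "http://login.microsoftonline.com/",
        "https://login.microsoftonline.com.example.net/signin",
        "https://login.microsoftonline.com@example.net/",
    ]
    for sample in samples:
        print(sample, looks_like_microsoft_login(sample))
```

    The design choice worth noting is the exact match on the host. A scam page can easily put "login.microsoftonline.com" somewhere in its address, so checking whether the text merely appears in the URL is not enough.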
  • Baby, it’s cold outside!

    David’s back yard, 2/16/21

    Brrrrrr!

    I don’t remember it being this cold ever, and I’ve been living in Texas my whole life. Like you, I’ve been adapting to the cold weather over the last several days. My numerous cats (I won’t say how many) are doing gymnastics in the upstairs classroom, my chickens are getting warm bowls of water every few hours, and my dog is enjoying the relative warmth of the laundry room. Candles have been collected and dusted off, ready for use this evening. Dinner will be a quick Mediterranean shrimp stew using a cast iron skillet and my propane stove. All in all, we are comfortable, not too inconvenienced.

    One area this doesn’t extend to is work. I live in the country and so have very poor network connections. My house uses DSL from CenturyLink (wouldn’t wish it on my worst enemy). Until recently, this was the best game in town. But my local co-op electricity provider has decided to provide fiber-based broadband to its subscribers. It advertises speeds up to 1 Gbps – we’ll see if they can deliver. After a series of scheduling fiascos and installing the fiber box in the wrong location, they were due to install it last Friday before all of this bad weather hit but didn’t quite make it. So it looks like I missed it by a few days. Rats!

    Even so, I had purchased wireless hotspot service on my mobile phone to use in emergencies. While the service can provide speeds up to 10 Mbps, that requires a good wireless signal, and my house doesn’t always get that signal.

    I started out naively using my normal desktop workstation in my home office, but the constant on and off of the power made it clear that using the DSL/desktop solution wasn’t going to work. So, with a few rerouted wires, I hooked up my laptop to an external monitor and fired up my mobile phone hotspot – voila! Now I’m “immune” from rolling power outages, although throughput is limited. When the power is out, the batteries in my laptop and phone keep things going. I was sure to also hook up the chargers so that they would charge when the power was on.

    Things are still not working as well as I would like. For one, even with the hotspot connection, DNS lookups (i.e., website locations) fail regularly, the hotspot has a tendency to disconnect, and sometimes the throughput crawls to 10KBps making even simple tasks such as loading and sending email difficult.

    What’s the lesson learned? Connectivity counts! TTI has a small task force working on connectivity recommendations post-COVID. I’m looking forward to implementing those recommendations as we move forward.

    In the meantime, please pass the hot cocoa!

  • ITIL MP Transition Course

    How to Transition from ITIL v3 to ITIL v4

    TTI is partnering with Global Knowledge to host an ITILv4 Managing Professional Transition course on October 12-16, 2020. This is an online course offering. The purpose of the course is to transition those of you with ITILv3 Expert certification into the new ITILv4 Managing Professional certification.

    The cost of the five-day course is $2,300. The format is online. The cost includes a Global Knowledge instructor for five days, printed courseware, and a voucher to take the certification exam. A retake of the exam is not included. This is roughly 30% off the regular course price ($3,250), and you don’t have to pay for travel.

    A course overview can be found here. A detailed course syllabus can be found here. You must have the ITILv3 Expert designation or have a minimum of 17 credits under the ITILv3 scheme. ITILv4 Foundation is not necessary, although helpful.

  • Managing Effectively When You’re Working Remotely

    Mark Horstman

    One of my favorite resources is Manager Tools. I’ve been listening to their management advice for more than 15 years. No matter what the topic, these folks are all about actionable management advice.

    This week, MT released a freely available audio podcast on how to manage effectively during a pandemic. Here’s a link to Part 2. I would encourage you to check it out. Some of the larger points in the cast:

    • Working remotely increases employees’ fear, anxiety, and mistrust.
    • The best way to combat this is through increased communication. You can do this by:
      • Continuing (or starting) weekly one-on-ones with your directs
      • Starting a daily 10-minute morning “standup” via video call
      • Communicating aggressively:
        • Video call each of your directs briefly each day
        • Don’t settle for text-only communications, even if your direct says it’s what they prefer