Category: ITSM

  • How much is that DOGE in the window?

    I was re-reading a really good book this week. A quote stuck out:

    In most governmental services, there is no market to capture. In place of capture of the market, a governmental agency should deliver economically the service prescribed by law or regulation. The aim should be distinction in service. Continual improvement in government service would earn appreciation of the American public and would hold jobs in the service, and help industry to create more jobs.

    W. Edwards Deming: Out of the Crisis, 1982, MIT Press.

    This seems especially relevant this week as we had the first meeting of the DOGE Subcommittee of the US House Oversight and Government Reform Committee. This is not to be confused with President Trump’s DOGE, headed by Elon Musk. A lot of ink has been spilled on the nature of, and relationship between, these two DOGEs – enough to perplex and confuse most of the American public. I’m not here to lend an opinion about the relationship or constitutionality of the two organizations. I want to focus on higher questions in light of the above quote, because it seems that we may be losing sight of the bigger picture – that one of the purposes of DOGE is to improve our government by making it more fiscally efficient.

    No Market to Capture

    “No market to capture” means no competition. No competition results in an organizational culture of complacency and mediocrity operating with increasing inefficiency and producing less valuable programs and services unless/until someone/something holds them to account. It’s clear that an organization with no competition is an abnormal condition in a capitalist society. This is the crux of the Marxist argument – that competition should be replaced with socialism and eventually communism. But the end result is centralized control of the means of production, and we have seen what kind of society that leads to.

    Deming makes a key observation. He asserts that since it has a captured market, government has an exceptional duty to deliver economically efficient services in the absence of market forces. Is our government delivering on this promise?

    Distinction in Service

    Distinction in service would seem to indicate that the efficiency and effectiveness of government programs should be exemplary. The way to achieve exemplary services in any sector is to engage in a culture of continual improvement. Since our government services don’t appear to be exemplary in many cases, is this an indication of a lack of focus on continual improvement? How do we change that?

    The first two steps of the ITIL Continual Improvement process are 1) What is the vision? and 2) Where are we now? The vision (or strategy, if you will) comes from our executive branch, i.e., the president. This is the way our government is structured, whether we like it or not. Where are we now? I would point out that the debt-to-GDP ratio of the U.S. has increased roughly fourfold over the last 45 years. In 1980, the ratio was 31%. Today, the ratio is 120%. We can argue about whether or not this fiscal path is sustainable, but that’s not my point. It would seem obvious to anyone that our current state is not efficient and arguably not effective. It is definitely not exemplary. How do we change this?
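
    For readers who like to check the arithmetic behind "Where are we now?", here is a minimal sketch in Python. It uses only the two ratios quoted above (31% and 120%), which are taken from this post and not re-verified here; the variable names are mine.

```python
# Debt-to-GDP ratios as quoted in the post (percent of GDP).
# These are the post's figures, not independently sourced here.
ratio_1980 = 31.0
ratio_today = 120.0

# How many times larger is today's ratio than the 1980 ratio?
fold_change = ratio_today / ratio_1980

# The same change expressed in percentage points.
point_change = ratio_today - ratio_1980

print(f"Fold change: {fold_change:.2f}x")
print(f"Change in percentage points: {point_change:.0f}")
```

    The ratio works out to a bit under four, which is why "roughly fourfold" is a fair summary of the baseline.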

    Continual Improvement

    Deming points out HOW this is done – by focusing on continual improvement. As an ITSM practitioner and educator, I frequently think about continual improvement and how it affects value. Having worked in the government sector, I have seen how a lack of competition can lead to complacency and mediocrity. But I’ve also seen the results of having the RIGHT people in charge. My observation is that the biggest difference between the right people and the wrong people is a focus on developing a culture of continual improvement within the organization. In the case of our government, these people understand that they have an awesome and sacred responsibility to use their position with honesty and integrity, and in so doing will earn the respect and appreciation of the American people. This is what I believe our government can and should become.

  • Crowdstrike Outage “Not What You Thought”

    It’s been six months since the Crowdstrike outage – enough time to reflect on the incident and take stock. I had lunch with my CISO about a week after the outage. It was the first time we had seen each other in several weeks. “So,” I asked sheepishly, “how have you been since the outage?” “I’ve been fine. But the Service Desk has been swamped. Since my security team wasn’t that busy, we pitched in to help remediate the outage. They touched 15,000 servers and client machines in three days.” I inquired further. His role focused on the management of encryption keys that were necessary to unlock and manually patch the operating systems of the affected machines. “The hard part of the recovery was managing the keys,” he said. As his team was jointly responsible for the security of those keys, that was the extent of his involvement. You see, Crowdstrike pushed a bad patch – one file – but an important one that loads at the kernel level. This caused all of those Windows machines to “blue screen.”

    Something didn’t compute. I thought he was going to be falling asleep at the table, eyes bloodshot, bags under them, a quart jug of coffee in his hand. Instead, he seemed rather chipper. Then it hit me. This wasn’t a security incident. Rather, it’s what we call in ITSM a deployment and release management issue. It’s not that Security Management wasn’t involved; they were. But it was apparent early in the Problem cycle that this wasn’t a cyberattack.

    The response from our university IT was quick and appropriate. Within thirty seconds of the patches being applied, customers began to call and report “blue screens.” This spawned a number of related incidents at the Service Desk. These incidents were quickly correlated into a Problem record, which was upgraded to a major incident (i.e., outage) record in less than an hour, all of this happening around midnight on July 19th. During the early morning hours, an incident response team did a root cause analysis and quickly determined the problem was a vendor patch.
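
    The escalation path described above (related incidents, then a Problem record, then a major incident) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names, the 30-minute window, and the thresholds are all hypothetical and are not taken from our university's tooling or from ITIL itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Incident:
    ticket_id: str
    symptom: str
    reported_at: datetime


@dataclass
class ProblemRecord:
    symptom: str
    incident_ids: List[str] = field(default_factory=list)
    major_incident: bool = False


def correlate(incidents, window=timedelta(minutes=30),
              problem_threshold=3, major_threshold=5):
    """Group incidents that share a symptom and arrive within `window`
    of the first report. Thresholds are illustrative assumptions."""
    first_seen: Dict[str, datetime] = {}
    grouped: Dict[str, ProblemRecord] = {}
    for inc in sorted(incidents, key=lambda i: i.reported_at):
        start = first_seen.setdefault(inc.symptom, inc.reported_at)
        if inc.reported_at - start > window:
            continue  # too late to belong to this cluster
        record = grouped.setdefault(inc.symptom, ProblemRecord(inc.symptom))
        record.incident_ids.append(inc.ticket_id)

    problems = []
    for record in grouped.values():
        # Only clusters with enough related incidents become Problem records.
        if len(record.incident_ids) >= problem_threshold:
            record.major_incident = len(record.incident_ids) >= major_threshold
            problems.append(record)
    return problems


# Hypothetical tickets: six "blue screen" calls just after midnight,
# plus one unrelated request.
midnight = datetime(2024, 7, 19, 0, 0)
tickets = [Incident(f"INC{n}", "blue screen", midnight + timedelta(minutes=n))
           for n in range(6)]
tickets.append(Incident("INC99", "password reset", midnight))

problems = correlate(tickets)
for p in problems:
    print(p.symptom, len(p.incident_ids), "major" if p.major_incident else "normal")
```

    A real service desk tool would correlate on far richer data (affected asset, patch level, location), but the shape of the logic is the same: many tickets, one underlying cause, one record to manage it.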

    The vendor response was quick and the patch was available by early morning, although the CEO of Crowdstrike was criticized in subsequent days for not issuing a timely apology. The damage to Crowdstrike’s reputation was done. After all, the outage affected roughly 8.5 million computers. Crowdstrike was quickly seen as the responsible party and IT folks around the world became heroes as the outage response progressed. But Microsoft was also responsible for letting Crowdstrike play in the Windows kernel. Microsoft distanced itself from responsibility by asserting, “Although this was not a Microsoft incident, given it impacts our ecosystem, we want to provide an update on the steps we’ve taken with CrowdStrike and others to remediate and support our customers.” In this instance, Microsoft was acting as an integrator, more specifically, as a Service Guardian, where it both managed a third-party vendor (Crowdstrike) and provided services (Windows). In this situation, ITIL best practices dictate that we have a high level of communication and trust with the integrator, but also acknowledge that our customers will hold us – not our vendors – responsible. After all, who are our customers going to blame – us or our vendor?

    I see a double failure here. Crowdstrike failed by deploying a service with a critical bug in it, which they should’ve uncovered in their acceptance testing. This is not George Kurtz’s first high-visibility failure. In 2010, he was CTO of McAfee when a similar outage occurred. The second failure was Microsoft’s mismanagement of their vendor. One may ask why they allowed a vendor to deploy a file at the kernel level without sufficient testing. You would also expect Microsoft to have caught the error prior to approving the release of the errant file. Was Microsoft’s trust of Crowdstrike so great that they didn’t do acceptance testing and simply passed the updates through? If so, they need to review their Deployment and Release Management practices. Of course, this is pure speculation.

    Meanwhile, back at “the ranch,” the IRT created a Change Request that included testing of the patch on a number of machines. Procedures to apply the patch were documented at both the individual asset level and the more strategic coordination level. On the communication side, customer communication began as soon as the Problem was identified, about an hour into the incident, with a number of communications happening in the early morning hours via IT staff in the colleges and university communications to stakeholders. Communication continued through the next few days as the incidents were remediated and non-reported servers and endpoints patched. An After Action Review was conducted less than a week after the initial incident was reported. Lessons learned were documented. DONE!! YAY!!!

    Since I retired from IT, I’m an “observer” these days and I can tell you that I don’t miss the excitement surrounding outages. Been there, done that, got the t-shirt. But I must say that I’m very proud of the way our university handled this major incident – responsive, professional, by the book. I don’t think our response would’ve been as good five years ago. We’ve come a long way in our journey in understanding ITSM.

    In summary, what ITSM practice areas were involved in this outage?

    1. Service Desk
    2. Incident Management
    3. Problem Management
    4. Continuity Management (via Major Incident/Outage)
    5. Vendor Management
    6. Asset Management
    7. Relationship Management (i.e., communication with stakeholders)
    8. Change Management
    9. Security Management (indirectly)

    This is a pretty impressive slice of the ITIL ITSM Practices for a single issue. I think our IT folks would report that we have varying levels of maturity in each of the Practice areas, but I can tell you from experience that this kind of outage hones our skills to respond better the next time. Iron sharpens iron.

  • Lost Improvements: An Analogy to Defects

    Defects are not free. Somebody makes them, and gets paid for making them.

    W. Edwards Deming

    To summarize Deming’s teaching on defects, they cost an organization thrice. First, the defect is made, which robs the organization of a “working” product or service. Second, the defect must be identified, which also takes time and resources. Lastly, the defect must be resolved, thus taking more resources away from producing non-defective products and services. If this isn’t bad enough, these costs don’t include opportunity costs which could be mitigated with improvements.

    In manufacturing (and IT ;-)), a defect happens because of a quality failure either at the source or somewhere upstream. Once a defect is built into a product, there are two ways to detect it. First, it may be detected prior to shipping. Second, the customer may see the defect, which is significantly worse from a CX perspective. To draw the analogy to lost improvements, if there is no system in place to record improvements, that’s the equivalent of allowing a defect to get to the customer. Lack of improvement causes more technical debt and operational overhead down the line and will be reflected in much of the work that is done by the organization. These defects will be visible to customers, one way or another. How does an organization create a culture of continual improvement?

    First, an organization must embrace a culture of improvement. According to ITIL4, a culture of improvement requires three things: transparency, managing by example, and building trust (CDS, 2.3.4, 2.3.8). I’ll treat these three topics in more detail in a future post, but suffice it to say that my perspective is that the former are dependent on the latter – that is, trust is the “coin of the realm” and other aspects of an improvement culture are dependent on it. For example, organizations that have a high degree of trust manifest a correspondingly high level of transparency.

    Trust is the “coin of the realm” and other aspects of an improvement culture are dependent on it.

    Second, an organization must provide mechanisms for conserving, prioritizing, and executing improvement initiatives. Starting with a Continual Improvement Register (CIR) is a good first step. If systems are too prescriptive, or improvement processes are not defined, team members don’t feel empowered (or able) to record improvement ideas. Without improvement, the organization will continue to produce defects. Making the CIR accessible at all levels of the organization is also recommended. Appointing a dedicated improvement person or small team responsible for prioritizing and executing on those improvement opportunities closes the loop. Communicating the status of improvement opportunities creates buy-in from the organization and keeps the suggestions rolling in. In my experience, organizations go awry in the second requirement. They may build a culture of trust and improvement, but that culture must be operationalized to realize the true benefits.
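
    To make the idea of a CIR a little more concrete, here is a minimal sketch in Python of a register that anyone can log to, that can be prioritized, and whose status can be reported back. Everything in it is an assumption of mine for illustration: the field names, the 1-to-5 scales, and the simple value-over-effort score are hypothetical and are not prescribed by ITIL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Improvement:
    title: str
    submitted_by: str
    value: int          # assumed scale: 1 (low) to 5 (high)
    effort: int         # assumed scale: 1 (low) to 5 (high)
    status: str = "logged"

    @property
    def score(self) -> float:
        # Hypothetical prioritization rule: more value per unit of effort wins.
        return self.value / self.effort


class ContinualImprovementRegister:
    def __init__(self) -> None:
        self._items: List[Improvement] = []

    def log(self, item: Improvement) -> Improvement:
        """Anyone in the organization can record an idea."""
        self._items.append(item)
        return item

    def prioritized(self) -> List[Improvement]:
        """Open items, highest score first."""
        open_items = [i for i in self._items if i.status != "done"]
        return sorted(open_items, key=lambda i: i.score, reverse=True)

    def status_report(self) -> List[str]:
        """What gets communicated back to the people who submitted ideas."""
        return [f"{i.title}: {i.status}" for i in self._items]


cir = ContinualImprovementRegister()
cir.log(Improvement("Automate password resets", "service desk", value=4, effort=2))
cir.log(Improvement("Rewrite CMDB import job", "asset team", value=5, effort=5))
cir.log(Improvement("Fix onboarding checklist typo", "HR liaison", value=1, effort=1))

# The improvement team picks up the top-ranked idea and reports back.
top = cir.prioritized()[0]
top.status = "in progress"
for line in cir.status_report():
    print(line)
```

    A shared spreadsheet does the same job. What matters is that the three operations exist: anyone can log an idea, someone owns the ranking, and the status is visible to everyone.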

  • The Non-Technical Economy

    It seems that everything these days is about AI and how the world as we know it will end. Some are prophesying that entire swaths of our economy will be replaced by AI. Writers are lining up on both sides of the argument. For those of us who are IT professionals, much has been written (or assumed) about the use of AI in IT.

    In 2019, Brian Merchant wrote, “A robot is not ‘coming for’, or ‘stealing’ or ‘killing’ or ‘threatening’ to take away your job. Management is.” Is that true?

    While most people are focusing on those things that AI can do to replace human work, there are a significant number of things that AI can’t do better than humans. Interestingly, I asked AI (i.e., MS CoPilot) for a list of things that humans do better. Here’s the list:

    1. Emotional Intelligence: Humans are naturally equipped with the ability to understand, interpret, and respond to emotions in a way that AI cannot. This includes empathy, compassion, and the ability to read subtle social cues, which are essential for meaningful human interactions.
    2. Creativity and Innovation: While AI can generate new content based on existing data, humans have the innate ability to think outside the box, imagine completely new concepts, and create innovative solutions to problems.
    3. Ethical and Moral Judgment: Humans can navigate complex ethical dilemmas and make decisions based on a nuanced understanding of morality, societal norms, and personal values. AI lacks the consciousness to make such judgments.
    4. Adaptability and Flexibility: Humans can easily adapt to new situations, learn from diverse experiences, and apply knowledge in various contexts. AI systems, on the other hand, can struggle with tasks that deviate from their training data or require adaptation to new environments.
    5. Complex Problem-Solving: Humans are capable of solving poorly defined or novel problems that require intuition, judgment, and the ability to find connections between seemingly unrelated information. AI systems are typically limited to solving algorithmic problems and can struggle with tasks that cannot be systematically defined.
    6. Physical Dexterity and Mobility: Humans have fine motor skills and the ability to navigate complex physical environments, which are challenging for AI and robotics to replicate with the same level of proficiency.
    7. Interpersonal Skills: The ability to build relationships, collaborate effectively, and navigate social networks is a distinctly human trait. AI can assist with communication but cannot fully replicate the depth of human interaction.
    8. Learning from Limited Data: Humans can learn new skills or understand new concepts from a few examples, whereas AI often requires large datasets to learn effectively.
    9. Understanding Context and Nuance: Humans excel at understanding context, sarcasm, irony, and nuanced language, which can be challenging for AI to interpret correctly.

    It’s important to recognize that AI is a tool designed to augment human abilities, not replace them. The collaboration between human intelligence and AI has the potential to enhance productivity and innovation across various fields.

    What’s interesting about this list is that most of these skills are closely related to those needed to provide excellent IT service management. As the emphasis in IT has grown over the last three decades from technical to customer-service competencies, the identification of these soft skills has been one of the ways the profession has defined and delineated itself. Take, for example, the list of skills necessary to provide excellent service desk support (ITIL4 Foundation Training, 2024):

    • Customer service
    • Empathy
    • Incident analysis and prioritization
    • Effective communication
    • Emotional Intelligence

    It would appear, at least at this moment in time, that AI will not be able to do some of the fundamental things we do in IT service management. Indeed, surveys of the industries most susceptible to “takeover” by AI list manufacturing, finance, healthcare, cybersecurity, and education. Note that these fields don’t rely heavily on stakeholder interactions to be effective.

    So why are “managers” still trying to replace us? I think the answer is that they are thinking in a binary way – either we use AI to do work or we use humans. The real answer is that AI will augment and complement humans in IT service management, not replace them. The collaboration between human intelligence and AI has the potential to enhance productivity and innovation across various fields. This is reflected in the newest ITIL4 Create, Deliver, Support curriculum which stresses the effective integration of AI, among other tools. Mature IT Managers will realize that AI is a tool that can automate steps of the value stream, but at the end of the day, customers will have better outcomes and realize more value if humans are left to do what humans do best.

  • ITIL 4 and Aggregation Theory

    Back in the days of ITILv3, focusing on process was the right thing to do at the time. Building out robust, documented, repeatable processes went a long way toward consistent service delivery, and for many years, this approach to service management was enough. Then in the late two-thousands, significant changes in availability of IT service suppliers and the flattening of service delivery created a situation in which our customers, who had historically been a “captive” audience, now had choices. They quickly learned that we weren’t the only game in town. They had choices from outside the organization. Enter shadow IT. Were we still relevant to our customers? If our role wasn’t service provision, what was it?

    When ITIL 4 came around, the framework transitioned from an internal process-heavy focus to an external, customer-centered focus. At the time, the shift toward customer value “felt” right, but I couldn’t put my finger on the reason why. For a number of years, I had noticed that our customers were reasonably happy with the services we provided. But when we started engaging them strategically with BRM (Business Relationship Management) by fostering a relationship in order to understand their business and what they really valued, their happiness increased significantly. This practice worked in a big way, but why?

    Today, I made a connection between the outsized results we reaped with BRM and Aggregation Theory. The basic idea of aggregation theory is that value chains have three different groups: suppliers, distributors, and consumers/users [1]. Before the Internet disrupted everything, distribution was expensive. Take the example of newspapers. Newspapers had to be physically distributed. Competitive advantage was gained by the distributors (e.g., The New York Times, The Washington Post, etc.) integrating the suppliers (i.e., journalists). This worked because customers outnumbered suppliers. A distributor that integrated supplier relationships had a significant advantage over distributors that didn’t. This was integration up the value stream.

    Post-Internet 2.0, the cost of the customer transaction decreased to practically zero as distribution became aggregated. Using our example, newspapers moved to digital editions and the cost of distribution decreased. But along with lowering customer transaction costs came de-personalization of the relationship. I missed the sight of my paperboy meandering down the street on his bike only to toss my paper in the bushes. In the new era, customers became weary of thousands of scattershot email solicitations, the rampant buying and selling of their information, and the always annoying automated feedback requests.

    “You’ve been chosen as one of our special customers to give us feedback today. For your time and effort, you’ll be eligible to receive a totally worthless coupon that you can’t redeem unless you stand on your head, pat your belly, and cough three times.”

    Customers actually missed drop-in visits from support team members, calls from their sales reps, and conversations with the engineering teams. The ubiquity of low-value customer connections had increased the value of the personal relationship. And it wasn’t just the relationship; it was the nature of what we did for them. While we continued to provide IT services (if not all of them), our role had to shift to that of a strategic partner. We had to grieve that we would no longer have the exclusive affections of our customers and accept that they had become polyamorous, so to speak.

    This is why the focus on value and relationship has taken center stage today. Successful organizations will be those that provide the best user experience. This means an increased focus on customer relationships and a careful curation of customer experiences – integrating customers down the value stream. It means continuously understanding what the customer really values. It means getting out and talking to our customers, and I don’t mean our robots talking to their robots. I mean WE have to talk with THEM.


    [1] Incidentally, ITIL 4 simplifies this model by describing two top-level roles: providers and consumers, and then extending the concept to the three-part model by stating that organizations are both consumers and providers. ITIL 4 focuses on the relationships between organizations in the service relationship model.

  • Van Halen Breaks AI

    I’m a huge Spotify fan. As a GenX’er, I love ’80s arena rock – Queen, Van Halen, AC/DC, Journey, Yes – you get the picture. Spotify feeds my hard rock nostalgia hunger. Spotify released a new feature earlier this year called DJ that is essentially an AI bot that examines your musical tastes and curates sets based upon them, including new selections it thinks you may like. According to Spotify, “The DJ knows you and your music taste so well that it will scan the latest releases we know you’ll like, or take you back to that nostalgic playlist you had on repeat last year.”

    After using DJ for some time, I can attest that it’s pretty good at creating sets of my favorites but pretty bad at selecting new music. For example, I can’t stand Rap, but it keeps setting me up with Rap/Hip-Hop sets which I quickly fast-forward through. DJ doesn’t seem to think fast-forwarding is disapprobation, so after a few songs, I finally get fed up and hit the “DJ” button to explicitly show my displeasure. It makes me wonder if they are selling plays of those artists which would explain why they keep pushing genres that the AI should know I don’t like.

    Several weeks ago, DJ selected a Van Halen song from their album 1984. The song was the instrumental first track on the album. Now, I had that record on vinyl when I was a kid. I know every lyric, every note, every riff. So when I heard the instrumental track 1984, I was expecting to also hear the second track on the album, Jump. I’ve always considered 1984 an instrumental prelude to Jump. This is a “by design” artifact of “album rock” that creates associations and meaning out of the order in which tracks are set on the album. If you doubt me, take a look at Pink Floyd’s The Wall. Other examples of preludes include Yes’ Cinema (followed by Leave It) on their 90125 album and Queen’s We Will Rock You (followed by We Are the Champions) from their album News of the World. Regarding this last example, I always get irritated when radio stations play We Will Rock You by itself. In the 80s and 90s, they would ALWAYS play them together. Somewhere along the road, radio media forgot this tacit knowledge and wisdom.

    And here’s the point. When I heard 1984, I let it run expecting Jump to play next, but it never came. Because I let it run, in subsequent weeks, DJ kept serving that song solo, aggravating me all the more. Why didn’t DJ know that those two songs were supposed to be together? Because AI doesn’t understand tacit knowledge. Van Halen had “broken” AI.

    AI will never be able to recreate the beautiful complexity of learning through living in the world with all of the experiences – personal, communal, cultural, and national – that come with it. This is why I don’t think AI is as far along as some would have us believe. Some folks agree with me. While AI may be helpful in many contexts, I don’t think it will ever replace the complexity of human reason completely.

  • It’s for fun…

    On June 30, 2023, I retired from the Texas A&M University System after 29 years of service. Recent events on the main TAMU campus have led me to return almost immediately as a “working retiree.” What would bring me back, you ask?

    Over the last three years, the Technology Management program has completely reshaped and updated its degree to reflect a foundation in IT Service Management (ITSM). The new degree is a Bachelor of Arts in IT Service Management and receives its first cohort of students in the fall of 2024. This degree reflects what I and many of our faculty believe is a new focus for IT professionals – one based on a foundation of IT Service Management with a strong portfolio of technical skills and plenty of room for our students to build out concentrations of study for the myriad IT-related jobs in and around the perimeter of the industry.

    But the changes in the degree are just the tip of the iceberg. We are working with educational psychologists (I’m one) to focus the program heavily on career development during the student’s academic tenure. This is a holistic approach based upon career assessment starting on day one and extending to graduation and beyond. With ITSM as the foundation, our program can be shaped and molded to accommodate virtually any IT career our students can think of.

    For example, this semester, I have a student who wants to be an IT-focused auditor. She has identified a minor in finance that, when coupled with her IT technical skills and ITSM foundation, will position her for a strong IT career, most likely in the banking sector, when she graduates.

    Why did I come back? It’s for fun.

  • ITIL MP Transition Course

    How to Transition from ITIL v3 to ITIL v4

    TTI is partnering with Global Knowledge to host an ITILv4 Managing Professional Transition course on October 12-16, 2020. This is an online course offering. The purpose of the course is to transition those of you with ITILv3 Expert certification into the new ITILv4 Managing Professional certification.

    The cost of the five-day course is $2,300. The format is online. The cost includes a Global Knowledge instructor for five days, printed courseware, and a voucher to take the certification exam. A retake of the exam is not included. This is a discount of nearly 30% off the regular course price ($3,250), and you don’t have to pay for travel.

    A course overview can be found here. A detailed course syllabus can be found here. You must have the ITILv3 Expert designation or have a minimum of 17 credits under the ITILv3 scheme. ITILv4 Foundation is not necessary, although helpful.
