How Harness leverages agentic AI to help improve enterprise incident response with automated data collection and playbooks


Incident response, the process of responding to system outages and slowdowns, is a critical aspect of IT operations. It is also an activity that has traditionally involved many manual, time-consuming processes.

That’s a challenge Harness is targeting with a new incident response service. The technology entered early access today as a module of the company’s eponymous platform. Harness started in 2017 with an initial focus on continuous integration/continuous delivery (CI/CD) automation for DevOps. In the years since, the company has expanded into a software delivery platform with multiple modules. In the fall of 2024, Harness moved into agentic AI, initially to help support software development.

Now the company is extending that same core agentic AI foundation to incident response. The new solution also benefits from licensed capabilities originally developed by development workflow vendor Transposit. Tina Huang, cofounder of Transposit, along with several members of her team, joined Harness in September 2024.

The goal of Harness Incident Response is to reduce the mean time to resolution (MTTR) for an incident.
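For readers unfamiliar with the metric, MTTR is the average time between an incident starting and being resolved. The sketch below is a minimal illustration, assuming each incident is a record with hypothetical `started_at` and `resolved_at` fields; it is not how Harness computes the figure.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average of (resolved_at - started_at) over resolved incidents.

    Field names are hypothetical. Unresolved incidents are skipped.
    Returns None when no incident has been resolved yet.
    """
    durations = [
        i["resolved_at"] - i["started_at"]
        for i in incidents
        if i.get("resolved_at") is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

# Illustrative data only: one 30-minute incident, one 90-minute
# incident, and one that is still open.
incidents = [
    {"started_at": datetime(2025, 1, 20, 9, 0), "resolved_at": datetime(2025, 1, 20, 9, 30)},
    {"started_at": datetime(2025, 1, 21, 14, 0), "resolved_at": datetime(2025, 1, 21, 15, 30)},
    {"started_at": datetime(2025, 1, 22, 8, 0), "resolved_at": None},
]
mttr = mean_time_to_resolution(incidents)
```

With this sample data the two resolved incidents average out to one hour.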

“If you think about what DevOps platforms have been, up to now, it’s mainly been about helping you do deployments,” Huang told VentureBeat. “I think the natural place to go after that is, ‘How do I handle your deployments after they hit production?’”

How Harness does autonomous incident response with agentic AI

At the core of Harness’ Incident Response module is the company’s AI agent architecture, first introduced in September 2024.

Jyoti Bansal, Harness CEO and cofounder, explained to VentureBeat that its AI agents are designed to provide autonomous assistance, which goes beyond alerting engineers to incidents. Traditional incident response technology uses an approach known as a playbook. IT teams, often working with site reliability engineers (SREs), define playbooks that outline step-by-step processes for recovering from different types of service disruptions.

Instead of relying solely on predefined playbooks, AI agents can suggest actions, identify potential triggers and even create new playbooks on the fly.

“A workflow agent suggests actions that need to be taken,” Bansal said.

Huang explained that the AI agents carry out several steps that help organizations respond to incidents faster. Even before a playbook can be run, there’s a certain amount of triage that needs to happen, Bansal explained. General triage can, for example, identify which services are affected or determine the upstream and downstream dependencies that are also affected by the incident.
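The dependency part of that triage can be pictured as a graph walk. The sketch below is only an illustration under stated assumptions: the service names and the `CALLS` map are hypothetical, and nothing here reflects how Harness actually models dependencies. It finds the services a failing service relies on and the services that rely on it.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it calls.
CALLS = {
    "web": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": [],
    "inventory": [],
}

def reachable(graph, start):
    """Return every node reachable from start (excluding start), breadth-first."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def invert(graph):
    """Build the reverse graph: for each service, the services that call it."""
    reverse = {name: [] for name in graph}
    for caller, callees in graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    return reverse

def triage(graph, failing):
    """Summarize what a failing service depends on and what depends on it."""
    return {
        "service": failing,
        "depends_on": sorted(reachable(graph, failing)),
        "depended_on_by": sorted(reachable(invert(graph), failing)),
    }

report = triage(CALLS, "inventory")
```

In this toy map, a failure in `inventory` touches `checkout`, `search` and, through them, `web`, which is the kind of blast-radius summary a responder would want before picking a playbook.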

The Harness system has agents that are aware of and plugged into multiple systems and can automatically collect information, including discussions from Slack channels. That information helps other agents alert people and provide autonomous assistance.

While the system has a high level of automation, Huang emphasized that people are still in the loop. But instead of someone being alerted to a problem and then having to figure out if a playbook exists — and if so how to run it — the system recommends a fix and the person just has to approve it.

Incident response requires more than technology

The Harness Incident Response module can run on its own, meaning organizations do not need to run any other Harness modules.

Bansal expects, however, that the integrated offering, which enables integration with many other workflows including DevOps and chaos engineering, will be beneficial. Chaos engineering is the process of injecting unexpected variables and events into an application to see how it responds. Harness has had a chaos engineering module as part of its platform since 2022.

Huang explained that as part of the incident response platform, an organization can run ‘fire drills’ in conjunction with the chaos engineering module to test different scenarios.

“Incidents happen rarely, and they’re often the unfortunate result of something you didn’t know about beforehand,” Huang said. “We want to take a proactive approach to incident response.”

How businesses can benefit from agentic AI-driven incident response

One Harness customer using the incident response module is Tyler Technologies, which develops software for the public sector.

The company uses the Harness platform for continuous deployment, cloud cost management and feature flags. Adding incident response will help solve a significant challenge, explained Jeff Green, CTO of Tyler Technologies.

“Our main challenge is to integrate all the operational data, metrics and processes, then correlate them with a unified approach to managing incidents and automating our response to them,” he told VentureBeat. “Our portfolio includes over 100 products built on different technologies using a wide variety of DevOps tools and platforms.”

The incident response capability will complement the existing operations Tyler Technologies already performs in Harness. For example, the company will be able to correlate deployments with incidents, or feature flags with incidents.
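One simple way to picture that correlation is a time-window match: look for deployments to the affected service that finished shortly before the incident began. The sketch below assumes hypothetical record fields (`service`, `finished_at`, `started_at`) and an arbitrary one-hour default window; it is not a Harness API or schema.

```python
from datetime import datetime, timedelta

# Hypothetical records; field names are illustrative, not a Harness schema.
DEPLOYMENTS = [
    {"id": "deploy-1", "service": "checkout", "finished_at": datetime(2025, 1, 23, 8, 0)},
    {"id": "deploy-2", "service": "checkout", "finished_at": datetime(2025, 1, 23, 9, 50)},
    {"id": "deploy-3", "service": "search", "finished_at": datetime(2025, 1, 23, 9, 55)},
]
INCIDENT = {"id": "incident-1", "service": "checkout", "started_at": datetime(2025, 1, 23, 10, 0)}

def recent_deployments(incident, deployments, window=timedelta(hours=1)):
    """Return deployments to the incident's service that finished within
    `window` before the incident started, newest first."""
    start = incident["started_at"]
    matches = [
        d for d in deployments
        if d["service"] == incident["service"]
        and start - window <= d["finished_at"] <= start
    ]
    return sorted(matches, key=lambda d: d["finished_at"], reverse=True)

suspects = [d["id"] for d in recent_deployments(INCIDENT, DEPLOYMENTS)]
```

Here only `deploy-2` falls inside the default window. The same matching could be applied to feature flag changes by swapping in flag-change records, and a wider window trades fewer missed suspects for more noise.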

“We think that the AI capabilities built into the product will save a lot of time by helping us with root cause analysis, identifying ways to mitigate or resolve incidents, and with incident prevention,” said Green. “Much of this work is now done by people pulling data from multiple sources, scouring logs and application performance monitoring (APM) data and looking for patterns, all of which is a task better suited to AI.”

Agentic AI’s ROI for incident response

Another Harness customer evaluating the incident response module is InStride, where Omar Alwattar is a senior DevOps engineer.

Alwattar told VentureBeat that his company uses the Harness Continuous Delivery module. He noted that when it comes to incident response, his organization has two key challenges: preventive monitoring and identifying the root cause. Harness’s new incident response tool is interesting to his company, he said, because it helps with faster issue identification and automated repair suggestions.

“In terms of ROI, the most important impact is the reduction of downtime, as it directly influences SLA compliance and customer satisfaction,” Alwattar said. “Additionally, by automating aspects of incident response, our 11-person DevOps team can focus more on strategic projects and innovation rather than constant troubleshooting.”


