OpenAI's AgentKit Goes GA — Autonomous Workflows Are Now a Procurement Decision, Not a Pilot
OpenAI, AgentKit, AI Agents, Workflow Automation, Enterprise AI

T. Krause

OpenAI's AgentKit graduated from preview to general availability this week, with native handoffs between agents, durable execution, and a permissions model built for IT. The pilot phase of enterprise AI agents is ending. The procurement phase is beginning.

The phrase "AI agents are the future" has been doing a lot of work in enterprise software for the last 18 months. Vendor demos showing agents booking flights, filing tickets, and orchestrating multi-step workflows have been polished, impressive, and almost entirely unbuyable. Production deployments stayed pilots because the underlying tooling treated agents as a research toy, not infrastructure.

OpenAI's AgentKit GA release this week ends that pattern. The framework moved out of preview with three additions that change its character: durable execution state, declarative permissions tied to identity providers, and a native handoff protocol that lets specialized agents pass work to each other without losing context. None of those features sound dramatic. Together they cross the line from "interesting experiment" to "thing your CIO can sign a contract for."

The procurement conversations starting this quarter look different from the ones from twelve months ago. The question is no longer "Can agents do useful work?" It's "Whose framework are we standardizing on?"

What Actually Shipped — And Why It Matters to IT

The features that grab attention in a launch post — better model performance, broader tool integrations, slicker UIs — are not the ones that determine whether enterprise IT signs the deployment. The ones that do are unglamorous and uniformly about operability.

Durable execution. Earlier versions of AgentKit ran agent workflows in memory. If the process restarted mid-execution — because of a deploy, a crash, or just a long-running task hitting a timeout — the state was gone. GA adds durable state: agent workflows can be paused, resumed, inspected, and replayed. This is the boring infrastructure change that makes agents acceptable to operate in production. It's also why agents can now run multi-hour workflows safely.
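
The idea behind durable execution can be sketched in a few lines. This is a minimal illustration, not AgentKit's API: the step names, the JSON checkpoint file, and the run_workflow function are all hypothetical. It assumes each step is safe to skip once it has been recorded as done.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical step names; a real workflow would call models and tools here.
STEPS = ["fetch_ticket", "summarize", "file_report"]


def run_workflow(checkpoint: Path, fail_at: Optional[str] = None) -> List[str]:
    """Run STEPS in order, persisting progress after each completed step."""
    # Load prior progress if a checkpoint exists, otherwise start fresh.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"done": []}
    for step in STEPS:
        if step in state["done"]:
            continue  # completed before the restart, so skip it
        if step == fail_at:
            raise RuntimeError(f"simulated crash during {step}")
        state["done"].append(step)
        checkpoint.write_text(json.dumps(state))  # durable record of progress
    return state["done"]


def demo() -> Tuple[List[str], List[str]]:
    """Crash mid-run, then resume from the checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "workflow.json"
        try:
            run_workflow(checkpoint, fail_at="summarize")
        except RuntimeError:
            pass  # the process "died"; only the checkpoint survives
        after_crash = json.loads(checkpoint.read_text())["done"]
        resumed = run_workflow(checkpoint)
        return after_crash, resumed
```

Calling demo() shows the second run picking up at "summarize" rather than repeating "fetch_ticket". A production system would also need to handle steps that fail halfway through a side effect, which this sketch ignores.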

Permissions tied to identity. AgentKit GA integrates with Entra ID, Okta, and Google Workspace identity providers. An agent acting on behalf of a user inherits that user's permissions, scoped down by per-agent policy. This is the unlock that makes "agent reads my email and summarizes" a feature rather than a security incident. It's also what makes auditing possible — every action an agent takes is logged against an identity, which is what auditors will demand.
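
A rough sketch of "inherits the user's permissions, scoped down by per-agent policy" is a set intersection plus an audit record. Everything here is an assumption for illustration: the AgentSession class, the permission strings, and the log format are hypothetical and do not reflect AgentKit's or any identity provider's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List


@dataclass
class AgentSession:
    """Hypothetical session for an agent acting on behalf of one user."""
    user: str
    user_permissions: FrozenSet[str]   # what the identity provider grants the user
    agent_policy: FrozenSet[str]       # what this agent is allowed to use at most
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    @property
    def effective(self) -> FrozenSet[str]:
        # The agent can never exceed the user, and the policy narrows it further.
        return self.user_permissions & self.agent_policy

    def perform(self, action: str) -> bool:
        """Check an action and log the attempt against the user's identity."""
        allowed = action in self.effective
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": self.user,
            "action": action,
            "allowed": allowed,
        })
        return allowed


session = AgentSession(
    user="alice@example.com",
    user_permissions=frozenset({"mail.read", "mail.send", "calendar.read"}),
    agent_policy=frozenset({"mail.read", "files.read"}),
)
can_read = session.perform("mail.read")    # user has it and policy allows it
can_send = session.perform("mail.send")    # user has it, policy does not allow it
can_files = session.perform("files.read")  # policy allows it, user lacks it
```

Denied attempts are logged as well as allowed ones, since an auditor will likely want to see both.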

Native handoffs. GA introduces an explicit handoff protocol: an agent specializing in legal review can pass a draft contract to an agent specializing in financial terms, which can pass it to a final reviewer agent, with shared context preserved across handoffs. This is what makes "agent teams" actually work — the alternative (one giant prompt that tries to do everything) collapses past a certain task complexity.
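
The contract-review chain described above can be sketched as agents that share one context object and name their successor. This is an illustration of the pattern only. The Context class, the agent functions, and the run_handoffs loop are hypothetical and are not the handoff protocol AgentKit ships.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Context:
    """Shared state that travels with the work across every handoff."""
    document: str
    notes: List[str] = field(default_factory=list)
    trail: List[str] = field(default_factory=list)


# An agent reads and updates the context, then names the next agent (or None to stop).
Agent = Callable[[Context], Optional[str]]


def legal_review(ctx: Context) -> Optional[str]:
    ctx.notes.append("legal: indemnity clause needs a cap")
    return "financial_terms"


def financial_terms(ctx: Context) -> Optional[str]:
    ctx.notes.append("finance: payment terms are net 90, policy is net 30")
    return "final_review"


def final_review(ctx: Context) -> Optional[str]:
    # The last agent sees every earlier note because the context was never reset.
    ctx.notes.append(f"final: {len(ctx.notes)} open issues to resolve")
    return None


def run_handoffs(start: str, agents: Dict[str, Agent], ctx: Context,
                 max_hops: int = 10) -> Context:
    """Follow handoffs until an agent returns None, with a guard against loops."""
    current: Optional[str] = start
    hops = 0
    while current is not None:
        if hops >= max_hops:
            raise RuntimeError("too many handoffs, possible cycle")
        ctx.trail.append(current)
        current = agents[current](ctx)
        hops += 1
    return ctx


AGENTS: Dict[str, Agent] = {
    "legal_review": legal_review,
    "financial_terms": financial_terms,
    "final_review": final_review,
}
result = run_handoffs("legal_review", AGENTS, Context(document="draft contract"))
```

The trail field doubles as a simple record of which agent touched the work and in what order.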

How This Reshapes the Build-vs-Buy Question

Most enterprise teams considering AI agents face a forking choice: build on top of a framework (LangGraph, CrewAI, AutoGen, AgentKit) or buy a vendor product that bundles agent capabilities into a vertical application. The GA release shifts that calculus in specific ways.

Build-on-framework just got more viable. The operational gaps that pushed teams toward vendor products — durability, identity, observability — are now framework features. Internal IT teams that already have FastAPI/Node services running in production can stand up agentic workflows without learning a new ops stack. The build-vs-buy line moves toward "build" for any workflow that's specific to your business.

Vendor products built on weaker frameworks lose differentiation. A SaaS company whose product was, in effect, a wrapper around LangGraph or AutoGen plus IT-grade operability now has less to sell. The operability layer was the differentiator, and that differentiator has just been commoditized. Expect those products to pivot toward vertical-specific data, integrations, and workflows, the parts that genuinely don't fit in a horizontal framework.

Multi-agent orchestration becomes a real architecture pattern. Until now, "we'll have multiple agents that work together" was a deck slide. AgentKit's handoff protocol makes it a thing you can actually deploy. Expect to see reference architectures land in the next quarter for things like "intake agent → analyst agent → reviewer agent → human" pipelines, with clear contracts between stages.

Where This Lands in Specific Departments

The AgentKit GA release changes the timeline for agent deployment in the departments that were always going to adopt first, and it also opens the door for departments that were holding back.

IT operations. Already running agents for ticket triage, incident summarization, and runbook execution. GA's durability features matter most here because IT operations workflows are long-running and intolerant of mid-execution failure. Expect this department to move from pilot to production fastest.

Finance and accounting. Holding back because of the audit and permissions story. The identity integration in GA is the unlock — an agent doing accounts payable triage can now be tied to a service account with defined approval thresholds, and every action is logged. CFO offices that put agent rollouts on hold pending compliance review can take them off hold.
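
"Defined approval thresholds" could look something like the routing function below. The limits, route names, and route_invoice function are invented for illustration; real thresholds would come from your finance policy, not from this sketch.

```python
from decimal import Decimal

# Hypothetical limits; substitute the values from your own approval policy.
AUTO_APPROVE_LIMIT = Decimal("500.00")
MANAGER_LIMIT = Decimal("10000.00")


def route_invoice(amount: Decimal) -> str:
    """Decide who must approve an invoice of the given amount."""
    if amount <= 0:
        raise ValueError("invoice amount must be positive")
    if amount <= AUTO_APPROVE_LIMIT:
        return "auto_approve"       # the agent may approve on its own
    if amount <= MANAGER_LIMIT:
        return "manager_review"     # the agent prepares, a manager approves
    return "controller_review"      # large amounts always go to a human controller


small = route_invoice(Decimal("120.50"))
medium = route_invoice(Decimal("500.01"))
large = route_invoice(Decimal("25000"))
```

Decimal is used instead of float so that amounts sitting right at a threshold compare exactly.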

Customer support. Already the most-deployed agent category, but mostly in vendor products (Zendesk, Intercom, etc.). GA gives in-house support teams the tooling to build escalation workflows that vendors can't match, because an in-house team can integrate the support agent with its own internal systems (CRM, billing, fulfillment) more deeply than a vendor can.

Legal and compliance. Slowest adopters historically. AgentKit's handoff protocol opens up workflows where a research agent collects relevant precedents, a drafting agent produces a memo, and a senior-lawyer review step is the final stage. The combination of identity-scoped permissions and explicit handoffs maps onto how legal teams already think about delegation and review.

HR. A surprise dark horse. Onboarding workflows that span IT, payroll, and benefits enrollment are exactly the kind of multi-system handoff that AgentKit GA is designed for. Expect to see "first AI hire" framings in the next quarter, which is more marketing than substance but reflects real workflow consolidation.

What to Actually Do This Quarter

The GA release is recent enough that "let's pilot AgentKit" is not, by itself, a useful answer. The teams that capture value early will do specific things this quarter.

Map your existing automation surface area. Most enterprises have an inventory of business workflows that are partially automated — RPA bots, scheduled scripts, Zapier flows, internal cron jobs. That inventory is your AgentKit target list. Pick three workflows that combine "high volume" with "frequent edge cases that break the existing automation" — those are where agentic flexibility pays off relative to brittle scripting.
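
One way to turn "high volume" plus "frequent edge cases" into a shortlist is to rank workflows by how many runs per month end up needing manual repair. The scoring rule, the Workflow fields, and the sample inventory below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workflow:
    name: str
    runs_per_month: int
    exception_rate: float  # fraction of runs where the existing automation breaks


def manual_interventions(workflow: Workflow) -> float:
    """Estimated runs per month that a person currently has to fix by hand."""
    return workflow.runs_per_month * workflow.exception_rate


def top_candidates(inventory: List[Workflow], n: int = 3) -> List[str]:
    """Return the names of the n workflows with the most manual interventions."""
    ranked = sorted(inventory, key=manual_interventions, reverse=True)
    return [workflow.name for workflow in ranked[:n]]


# Made-up inventory; replace it with figures from your own RPA, cron, and script logs.
INVENTORY = [
    Workflow("invoice_matching", runs_per_month=4000, exception_rate=0.12),
    Workflow("password_reset", runs_per_month=9000, exception_rate=0.01),
    Workflow("vendor_onboarding", runs_per_month=300, exception_rate=0.40),
    Workflow("report_export", runs_per_month=1500, exception_rate=0.02),
]
shortlist = top_candidates(INVENTORY)
```

With these sample numbers, the highest-volume workflow (password_reset) ranks below vendor_onboarding, because volume alone says little about where brittle automation is costing people time.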

Stand up the platform layer before the use cases. The temptation is to start with a use case and build outward. Resist it. Stand up your AgentKit deployment with the platform pieces (identity integration, observability, audit logging, prompt and tool registries) as shared infrastructure that any use case can consume. The first use case will take longer this way, but the second and third ship much faster because the platform work is already done.

Define your agent oversight model now. Who reviews what an agent did? Who is accountable when it does something wrong? What does "the agent made a mistake" look like in your incident management system? Most organizations defer these questions until after deployment, then have a quiet crisis the first time an agent does something visibly bad. Define them before you deploy.

Negotiate your data-residency and retention terms upfront. AgentKit running through OpenAI's API has implications for where your data sits and how long it's retained. The GA release added enterprise data controls; the work for your procurement team is to verify those controls map to your specific compliance posture (GDPR, HIPAA, SOC 2, sector-specific regs) before rather than after rollout.

The Strategic Shift: Agents Become Procurement, Not Innovation

The most consequential change in this release isn't technical. It's that AI agents have crossed the line from "innovation team experiment" to "vendor selection decision." The conversation moves out of the AI team and into procurement, security, and IT.

Organizations that handled this transition well in earlier waves of enterprise software — the move from on-prem to cloud, the move from monolith to microservices, the move from custom code to SaaS — share a pattern: they treated the transition as a platform problem, not a use-case problem. They invested in the substrate (identity, observability, governance) early and let use cases multiply on top. Organizations that handled it badly chased point use cases and ended up with a sprawl of unmanageable point solutions.

The pattern repeats here. The AgentKit GA release means agents are now a platform decision. The teams that recognize that and invest in the platform layer this quarter will deploy faster, more safely, and more broadly than the teams that treat each agent use case as a one-off. The pilot phase is over. What you do next determines whether the next 18 months look like a controlled rollout or a sprawl.
