Claude Workflows: Revisiting Process Modeling Languages Three Decades Later

Claude workflows revive the software process dream — the same problems, three decades later.

Jun 01, 2026

Claude has announced a new workflow feature. As soon as I started reading about it, I had a precise, almost physical déjà vu. With a Claude workflow, you describe a process: these are the steps, here you call a tool, here you wait for a decision, here you start over if the check fails. The machine generates a script that runs it, coordinates the tools and the subagents, and returns a result. To anyone who wasn’t there, it looks like a 2026 novelty. To me, it’s the return of something I worked on for years, starting in the early Nineties. The research area is called software process.

A Thirty-Year-Old Déjà Vu

In the summer of 1994, I was at NTNU in Norway with Vincenzo Ambriola and the late Reidar Conradi (a dear friend who is sorely missed). Together, we wrote an article that appeared in ACM Transactions on Software Engineering and Methodology (TOSEM), “Assessing Process-Centered Software Engineering Environments”. It was an attempt to provide a framework for summarizing and assessing all the approaches developed in previous years in the areas of Process Modeling Languages (PMLs), Process-Centered Software Engineering Environments (PSEEs), and related software process technologies: a grid to evaluate and compare systems that sought to explicitly describe a software development process for a machine to carry out. As an initial testbed, we applied the grid to three European environments we were working on: OIKOS, EPOS, and SPADE; the last of these was developed at Politecnico di Milano.

The idea behind the software process research area (PMLs and PSEEs) was simple to state but brutally hard to implement. You wrote the process in a formal language, a Process Modeling Language. The model said who does what, in what order, with which tools, under which conditions. Then an engine (PSEE) executed the model, in what we called enactment, and offered a set of services: assistance to the people working, automation of routine tasks, invocation and control of the tools, and even, to quote the text from back then, “enforcement of mandatory rules and practices”. SPADE, our environment at Politecnico, evolved in that direction: it grew into OPSS, which ran on an event-based infrastructure called JEDI (Java Event-based Distributed Infrastructure) and coordinated human agents and external tools within a single, described, and governed process. In parallel, the same scheme was applied to business processes, and there it was called a Workflow Management System.
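
To make enactment concrete, here is a deliberately tiny sketch in JavaScript, the language used later in this post. It is not SPADE, OPSS, or any real PML. The process model is a hypothetical list of steps with roles and preconditions, and the engine enforces it by rejecting any action that does not match the model. Deviation is an error by construction.

```javascript
// A hypothetical process model: who does what, in what order, under which condition.
const processModel = [
  { step: 'design', role: 'architect', requires: [] },
  { step: 'implement', role: 'developer', requires: ['design'] },
  { step: 'review', role: 'reviewer', requires: ['implement'] },
  { step: 'release', role: 'manager', requires: ['review'] },
];

// A rigid enactment engine: every action must match the model exactly.
function createEngine(model) {
  const done = new Set();
  return {
    perform(step, role) {
      const def = model.find(s => s.step === step);
      if (!def) throw new Error(`Unknown step: ${step}`);
      if (def.role !== role) throw new Error(`${role} may not perform ${step}`);
      const missing = def.requires.filter(r => !done.has(r));
      if (missing.length > 0) {
        // "Enforcement of mandatory rules": the out-of-order action is refused.
        throw new Error(`Cannot ${step} before ${missing.join(', ')}`);
      }
      done.add(step);
      return [...done];
    },
  };
}

const engine = createEngine(processModel);
engine.perform('design', 'architect');
engine.perform('implement', 'developer');
try {
  engine.perform('release', 'manager'); // skipping the review is refused
} catch (e) {
  console.log(e.message); // Cannot release before review
}
```

Nothing in the engine can tell a careless skip from a competent one: both are rejected the same way.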

The dream underneath was clear and precise: to take intellectual work — coordination, decisions, handoffs between people and tools — and make it explicit enough to support, automate, and improve. The same dream that animates AI agent orchestration today.

Why It Didn’t Succeed

Thirty years ago, in practice, it didn’t work, at least not where it mattered most, in the attempt to govern human, intellectual work. The reason matters because that’s where you see what has changed.

The problem wasn’t computing power or the quality of the tools. It was that real work doesn’t follow a prescriptive model. A process described in a PML is rigid by construction: it anticipates only the cases you imagined when you wrote it. But people deviate constantly: they skip a step because, in this specific case, it isn’t needed; they swap two steps; they handle an exception the model never contemplated; they reinterpret a rule in light of a new context. This is not indiscipline but competence: it is how skilled work actually functions. And a system that imposes “mandatory rules” on the people doing that work becomes a straitjacket, not a help. Robert Balzer had said it in 1991, in “Tolerating Inconsistency”: a system should tolerate inconsistency rather than try to eliminate it. It was a heresy then, and the seed of much of what I would work on next.

I spent years on this problem. At that time, again in ACM TOSEM, together with Gianpaolo Cugola, Elisabetta Di Nitto, and Carlo Ghezzi, I co-authored an article whose title says it all: “A framework for formalizing inconsistencies and deviations in human-centered systems”. The thesis was that a process support system should tolerate deviations, not fight them: accept that real behavior and the model diverge, and manage that divergence rather than demand strict compliance. It was the hardest point of all, and the one where the technology of the time broke down. A formal language that tolerates ambiguity is almost a contradiction in terms. The more flexible you made the model, the more complicated it became to write and maintain, up to the point where describing it cost more than doing the work by hand.
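
The difference between enforcing and tolerating can be sketched in a few lines. In this hypothetical variant (not the formalism of that TOSEM paper, which is far richer), the engine still knows the model, but instead of refusing an out-of-order action it lets the action happen and records the deviation. The divergence between model and reality is kept as information rather than erased.

```javascript
// Hypothetical model: each step maps to the steps expected to precede it.
const model = new Map([
  ['design', []],
  ['implement', ['design']],
  ['review', ['implement']],
  ['release', ['review']],
]);

// A tolerant engine: actions are never blocked; mismatches are recorded.
function createTolerantEngine(model) {
  const done = new Set();
  const deviations = [];
  return {
    perform(step, actor) {
      const expected = model.get(step);
      if (expected === undefined) {
        deviations.push({ step, actor, kind: 'unplanned step' });
      } else {
        const skipped = expected.filter(s => !done.has(s));
        if (skipped.length > 0) deviations.push({ step, actor, kind: 'skipped', skipped });
      }
      done.add(step); // the action happens anyway: nothing is blocked
    },
    deviations: () => deviations.slice(),
  };
}

const engine = createTolerantEngine(model);
engine.perform('design', 'ann');
engine.perform('implement', 'bob');
engine.perform('hotfix', 'bob');   // not in the model
engine.perform('release', 'carl'); // review skipped
console.log(engine.deviations());  // two recorded deviations, zero refusals
```

The hard part the paragraph describes is everything this sketch leaves out: deciding what a recorded deviation means and what should happen next.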

What a Claude Workflow Is

For a moment, looking at Claude’s new feature, I thought the missing piece had finally arrived: that it was enough to describe the process in words and let the model fill the gaps. Then I went to look at how these workflows work. And I found something quite interesting.

A Claude workflow is not a natural-language description. It’s a script (JavaScript code) that orchestrates the execution: it defines the steps, sequences them or runs them in parallel, decides the branches, and coordinates dozens or hundreds of subagents. The structure that governs the process is, once again, deterministic code. It is, in effect, a Process Modeling Language reincarnated in a different syntax.

The rigid skeleton hasn’t disappeared: it’s back.

The trigger is expressed in natural language (you include the word “workflow” in a prompt), but what Claude returns is code: a JavaScript script that defines the orchestration logic (which subagents to start, whether to run them sequentially or in parallel, how to pass data between them, how to aggregate results), and a separate runtime executes that script in the background. Each subagent is an independent Claude instance with its own context: it receives a specific task and returns its result to the orchestrator. So the orchestration logic lives in deterministic, inspectable JavaScript, while the individual subagents handle the parts that require linguistic intelligence and judgment. This separation between deterministic orchestration and probabilistic execution is the core idea of the feature. The official documentation states it plainly: “A dynamic workflow is a JavaScript script that orchestrates subagents at scale. Claude writes the script for the task you describe, and a runtime executes it in the background while your session stays responsive.” And it spells out the shift: “A workflow moves the plan into code. With subagents and skills, Claude is the orchestrator: it decides turn by turn what to spawn next, and every result lands in Claude’s context. A workflow script holds the loop, the branching, and the intermediate results itself.” (See “Orchestrate subagents at scale with dynamic workflows”, Claude Code documentation.)

Two Natures in One File

The natural language is the request, not the program. The program is JavaScript code. The primitives below — phase, parallel, agent — are the real ones, and the shape is the fan-out-reduce-synthesize skeleton that Claude Code’s own /deep-research is built on. I’ve left out the boilerplate a runnable script carries — the metaheader, the per-agent options, the schema that would back a field like confidence — to keep the two natures readable; the feature is in research preview, so the exact structure will still move. The sketch researches several questions in parallel, discards the weak findings, and then synthesizes the rest:

```javascript
phase('Research')

// Boilerplate omitted: `questions` (an array of strings) is assumed to be defined above.

// DETERMINISTIC: the script launches one agent per question, in parallel.
const findings = await parallel(
  // PROBABILISTIC: each agent() call goes to an LLM.
  questions.map(q => () => agent(`Research and report verified facts: ${q}`))
)

// DETERMINISTIC: ordinary code filtering the agents' results. No model here.
const solid = findings.filter(f => f && f.confidence > 0.7)

// DETERMINISTIC: a plain branch.
if (solid.length === 0) {
  return { report: 'No reliable findings.' }
}

phase('Synthesize')

// PROBABILISTIC: one last agent writes the briefing from what the script collected.
const report = await agent('Combine these findings into one briefing: ' + JSON.stringify(solid))

// DETERMINISTIC: the script returns the result.
return { questionCount: solid.length, report }
```

Read it top to bottom, and you see exactly who calls whom. The script is the orchestrator, and it is deterministic: .map, .filter, the if, and JSON.stringify are ordinary JavaScript, and parallel, a runtime primitive, only launches the calls and collects what comes back. The machine runs this logic locally, as pure logic, and given the same input it returns the same result every time. The script calls the agents, never the other way around.

The crucial distinction lies in what agent(...) actually does, because its appearance may mislead you. It is written as a normal JavaScript function call, but it does not execute logic the way the surrounding lines do. It takes the string you pass it, sends it to a language model (a fresh instance of Claude), waits, and returns whatever the model produces. The JavaScript is only the plumbing: it carries the prompt out and the answer back. The real work (reading the question, searching, reasoning, writing the prose) happens inside the model, outside the JavaScript. The closest analogy for a programmer is a network call to an external service, a fetch() to some API: the line is JavaScript, but what happens at the other end of the wire is not. And the service at the other end is not a database returning a fixed record; it is a model that reasons and composes something new on each request.

This is why that one line is probabilistic and the others are not. JSON.stringify(solid) returns the exact same string every time. agent("write a briefing") does not: send the identical prompt twice, and you can get two different answers, because the model does not compute a unique result; it generates one token at a time, sampling from a probability distribution. So the file is entirely JavaScript, yet two natures live inside it. The logic lines simply execute, deterministically. The agent(...) lines are doors that open onto a probabilistic model, and what comes back through those doors is never guaranteed to be identical. The agents do not decide what runs next or talk to each other; they receive a task and return an answer. Control lives in the code; intelligence lives in the leaves.
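
The contrast can be reproduced locally with a stand-in for agent(...). The fakeAgent stub below is hypothetical and imitates only the one property that matters here, that the same prompt can return different text; it does not call any model. The deterministic lines around it behave as described: same input, same output.

```javascript
// Hypothetical stand-in for agent(): NOT the real workflow primitive.
// It imitates a model by picking one of several phrasings at random.
async function fakeAgent(prompt) {
  const phrasings = [
    `Briefing: ${prompt}`,
    `Summary of findings: ${prompt}`,
    `In short: ${prompt}`,
  ];
  return phrasings[Math.floor(Math.random() * phrasings.length)];
}

// Deterministic: the same filter as in the sketch, pure logic.
function keepSolid(findings, threshold = 0.7) {
  return findings.filter(f => f && f.confidence > threshold);
}

const findings = [
  { claim: 'A', confidence: 0.9 },
  null, // an agent that returned nothing
  { claim: 'B', confidence: 0.4 },
];

const first = JSON.stringify(keepSolid(findings));
const second = JSON.stringify(keepSolid(findings));
console.log(first === second); // true: the logic lines are reproducible

fakeAgent('write a briefing').then(a =>
  fakeAgent('write a briefing').then(b => {
    // May print true or false: the "model" line is not reproducible.
    console.log(a === b);
  })
);
```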

The runtime is built around that separation. The documentation notes that a workflow run is resumable: if it stops partway, the agents that already finished return their saved results, and only the remaining ones run again. For this to hold, the orchestration code must replay identically each time, which is exactly what determinism guarantees. The skeleton must stay rigid, by design. Flexibility is allowed only where it belongs: inside the agents.
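
Why resuming depends on determinism can be shown with a small journal. This is a hypothetical mechanism, not a description of how the Claude Code runtime actually stores results: each agent call is keyed by its position in the run, finished results are saved, and a rerun returns the saved value instead of calling the model again. The key means something only if the orchestration code issues the same calls in the same order every time, which is why the sketch treats a different prompt at the same position as a divergence.

```javascript
// Hypothetical journal-based resume; the real runtime's mechanism may differ.
function createRuntime(journal, callModel) {
  let position = 0; // the n-th agent call in this run
  return async function agent(prompt) {
    const key = position++;
    const saved = journal.get(key);
    if (saved !== undefined) {
      if (saved.prompt !== prompt) {
        // The script took a different path on replay: the journal no longer applies.
        throw new Error(`Replay diverged at call ${key}`);
      }
      return saved.result; // finished earlier: reuse, do not call the model
    }
    const result = await callModel(prompt);
    journal.set(key, { prompt, result });
    return result;
  };
}

// Deterministic orchestration: same calls, same order, every run.
async function script(agent, questions) {
  const answers = [];
  for (const q of questions) answers.push(await agent(`Research: ${q}`));
  return answers;
}

const journal = new Map();
let modelCalls = 0;
let failAfter = 2; // simulate an interruption after two successful calls
const flakyModel = async prompt => {
  if (failAfter-- <= 0) throw new Error('interrupted');
  modelCalls++;
  return `answer to ${prompt}`;
};

const questions = ['q1', 'q2', 'q3'];
script(createRuntime(journal, flakyModel), questions)
  .catch(() => {
    failAfter = Infinity; // resume: the model is available again
    return script(createRuntime(journal, flakyModel), questions);
  })
  .then(answers => console.log(answers.length, modelCalls)); // 3 3
```

Three answers for three model calls: the two calls completed before the interruption were replayed from the journal, not repeated.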

Where the Rigid Model Went

What has changed is the rigid model’s location, not its disappearance. Deviation, ambiguity, and contextual judgment haven’t been eliminated: they’ve been pushed down, into the leaves. Every single step of the script invokes a subagent (an LLM), and there, and only there, lives the flexibility we tried thirty years ago to encode into the PMLs. The skeleton is deterministic by necessity; the fuzzy part is confined to the terminal nodes.

At this point, my déjà vu became precise, because it is exactly the thesis of an article I have just written and submitted for publication, “Why Software Engineering Is Indispensable in the Age of Coding Agents”.


What makes an agent reliable is what surrounds the model: how you decompose the task, how you orchestrate the steps, how you handle errors, and how you verify results. It’s the orchestration layer that determines the system’s reliability. And the design principle is the one I’ve been repeating for months: keep deterministic what can be deterministic, and reserve the model for what requires interpretation and judgment under ambiguity.
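
The principle translates directly into code. In this sketch the date-normalization task and the askModel function are hypothetical: the routine first tries ordinary parsing, which is deterministic and testable, and hands the input to a model only when the rule-based path cannot interpret it.

```javascript
// Deterministic path: ISO dates or DD/MM/YYYY, nothing else.
function parseDateStrictly(text) {
  const t = text.trim();
  let m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(t);
  if (m) return `${m[1]}-${m[2]}-${m[3]}`;
  m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(t);
  if (m) return `${m[3]}-${m[2].padStart(2, '0')}-${m[1].padStart(2, '0')}`;
  return null; // not something the rules can handle
}

// Route: code where code suffices, the model only for the ambiguous rest.
async function normalizeDate(text, askModel) {
  const exact = parseDateStrictly(text);
  if (exact !== null) return { value: exact, source: 'code' };
  // Hypothetical model call: interpretation under ambiguity.
  const guess = await askModel(`Convert to YYYY-MM-DD, or reply UNKNOWN: ${text}`);
  return { value: guess, source: 'model' };
}

const stubModel = async () => '2026-06-01'; // stand-in for a real model
normalizeDate('1/6/2026', stubModel).then(r => console.log(r));
// logs { value: '2026-06-01', source: 'code' }
normalizeDate('the first of June, 2026', stubModel).then(r => console.log(r));
// logs { value: '2026-06-01', source: 'model' }
```

The `source` field keeps the boundary visible: a downstream step can treat code-derived values as exact and model-derived ones as interpretations.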

The problem of thirty years ago and the problem of today are the same problem: where to put the boundary between the rigid rule and flexibility, between what must be enforced and what must be left to judgment. The difference is that back then, we managed that boundary in a single layer, the PML, asking it to be at once rigid (to control) and flexible (so as not to suffocate real work), and that contradiction made those approaches collapse. It’s the point I described this way in my latest article:

The vision was intellectually compelling. The execution was too rigid. […] A deterministic process engine either blocks or forces premature resolution that erases information the system cannot handle.

Today, the two requirements are separated into two layers: the orchestration code provides control, and the subagent provides flexibility. The deviation that once was a system error becomes a signal for human judgment, what in the article I call the shift from a deterministic model to a navigational one: no longer Process Model → Process Engine → enforces → Software Process, where deviation is an error, but Intent + Method + Structure → AI Agent → navigates → Software Process, where deviation is a signal. The problem hasn’t been solved; it has been decomposed. And decomposing it is, in a precise sense, an act of software engineering.

And here is the point: process technology, which had failed back then, may now finally be possible thanks to the combination of determinism and probability. The deterministic engine, on its own, crashed against the incompleteness and ambiguity of real work: it either blocked or imposed a premature resolution. A probabilistic engine, on its own, navigates ambiguity but offers no guarantee of control. It’s the composition of the two — the deterministic skeleton that governs, the probabilistic leaves that interpret — that makes governable what previously wasn’t.

As I put it:

AI resolves the problem at a deeper level. A probabilistic engine does not rely on formal constraint satisfaction to operate: it navigates incomplete specs, ambiguous requirements, and partially inconsistent context without blocking. Inconsistencies surface naturally as outputs — signals for human judgment — rather than as system failures.

The dream of process technology wasn’t wrong. What was missing was a component capable of closing the gap between the model and reality, and until now, only a person could do it. Today, part of that capacity is built into the tool, and for the first time, the two regimes can coexist rather than exclude each other.

The Same Tension, Relocated

This also explains why the old tension hasn’t disappeared; it has only repositioned itself. If the orchestration is code, it’s verifiable and reproducible, the way a good SPADE model was. But the leaves aren’t: a subagent can do the right thing for reasons you don’t control, and the wrong thing the same way. The questions from back then — control, verifiability, reproducibility, trust — all remain, shifted from the global model to the individual probabilistic nodes.
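
Some control can still be pushed toward the leaves. A common pattern, sketched here under assumptions (callAgent and the expected output shape are hypothetical), wraps each probabilistic call in deterministic checks: validate the output, retry a bounded number of times, and if it still fails, return an explicit escalation instead of passing an unverified answer downstream. Note that it checks form, not truth, which is exactly why the questions above remain open.

```javascript
// Deterministic check of a leaf's output: the shape we expect from the agent.
function isValidFinding(x) {
  return x !== null && typeof x === 'object' &&
    typeof x.claim === 'string' && x.claim.length > 0 &&
    typeof x.confidence === 'number' && x.confidence >= 0 && x.confidence <= 1;
}

// Wrap a probabilistic call with validation, bounded retries, and escalation.
async function guardedLeaf(callAgent, prompt, maxAttempts = 3) {
  const rejected = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = await callAgent(prompt);
    if (isValidFinding(output)) return { status: 'ok', finding: output, attempt };
    rejected.push(output);
  }
  // Not a failure that stops the workflow: a signal for human judgment.
  return { status: 'needs-review', prompt, rejected };
}

// Hypothetical stub: malformed on the first call, well-formed on the second.
let calls = 0;
const stubAgent = async () =>
  ++calls === 1 ? { claim: '', confidence: 2 } : { claim: 'X holds', confidence: 0.8 };

guardedLeaf(stubAgent, 'Is X true?').then(r => console.log(r.status, r.attempt)); // ok 2
```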

They are the same themes that run through everything I write about today’s AI, and it’s no coincidence: they are the themes of process technology, resurfaced thirty years later.

And nothing in the underlying idea determines which regime sits on top. In a Claude workflow, the deterministic layer always sits above and the probabilistic one below; that ordering is built into the feature, not into the idea of composing the two. The two regimes compose, and they can nest the other way round.

Picture an open, ill-defined task whose overall shape has to be worked out by judgment and then resolves into a sequence of build steps that must run exactly; there, the framing is probabilistic and the execution deterministic. The model sets the course at the top, and the rigid code carries it out at the bottom. It is, in fact, what you get with plain subagents, where Claude decides turn by turn what to do next, and any single step may be a deterministic command. Determinism and probability are not “the skeleton” and “the leaves”. They are two approaches, and the design act is to decide, at every level, which one governs and which one executes, in whichever order the work demands.
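
The inverted nesting can be sketched too. Here a hypothetical planner, standing in for a model, proposes a sequence of build steps; deterministic code then rejects any step it does not recognize and executes the plan exactly, with each command checking its own preconditions. The command names are illustrative, not a real build tool.

```javascript
// Deterministic layer: the only commands the executor knows how to run.
const commands = new Map([
  ['install', state => ({ ...state, deps: true })],
  ['compile', state => {
    if (!state.deps) throw new Error('compile requires install');
    return { ...state, built: true };
  }],
  ['test', state => {
    if (!state.built) throw new Error('test requires compile');
    return { ...state, tested: true };
  }],
]);

// Probabilistic layer (stubbed): a model would write this plan from a vague goal.
async function planner(_goal) {
  return ['install', 'compile', 'test']; // hypothetical model output
}

// Deterministic execution: reject unknown steps up front, then run them exactly.
function execute(plan) {
  const unknown = plan.filter(step => !commands.has(step));
  if (unknown.length > 0) return { ok: false, unknown };
  let state = {};
  for (const step of plan) state = commands.get(step)(state);
  return { ok: true, state };
}

planner('get this repository into a releasable state')
  .then(plan => console.log(execute(plan)));
// logs { ok: true, state: { deps: true, built: true, tested: true } }
```

Here the judgment sits on top and the rigid code below, the mirror image of the workflow sketch earlier in the post.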

There is an objection worth meeting head-on, because the analogy invites it: that I am dressing up a success as a resurrection, and that the deterministic skeleton never died at all. The branch of this research that tried to govern human work — the PSEEs, with their “mandatory rules” imposed on the people doing the job — is the branch that failed. But the branch that coordinated computational steps, the workflow and orchestration engines, is the one that succeeded; those engines are everywhere today, and the event-based coordination JEDI belonged to is on that side. Read this way, a Claude workflow is not the return of a failed idea but the extension of a working one, with a new kind of leaf attached. I think both readings are true and not in conflict. What collapsed thirty years ago was the attempt to make a single deterministic layer govern ambiguous, deviation-prone human work. What is new is not the orchestration skeleton (that always worked where the steps were computational) but that the leaves can now be probabilistic, and that is what lets the skeleton reach into the ambiguous work it could never govern before.

But the distinction that matters most in practice is a different one. The orchestration layer, written in JavaScript, is as rigid as a PML was: it anticipates only the cases its author imagined. For some processes, that is what you want. A test suite, a deployment, a large mechanical migration: structured activities where deviation is an error, and where pinning the steps down in code is a feature. For others, the open, exploratory, knowledge-dependent ones, where the path is discovered as you walk it, that same rigidity reintroduces the problem we crashed into thirty years ago. The script that orchestrates is excellent at coordinating the foreseeable, and useless, or worse, harmful, when the work is not. Knowing which of the two you are in is the whole game, and it is a judgment, not a setting. The encouraging novelty is that the foreseeable part can now be deterministic code, and the unforeseeable part can be delegated to a probabilistic agent. But deciding where the line falls is no easier than it ever was.

Anthropic's own examples show where that line really falls. Most of what they run as workflows — codebase audits, security sweeps, migrations, a language port across thousands of files — is foreseeable work, where a rigid script is a virtue. But one of their flagship workflows, deep research, is open, exploratory work, the kind I just called hostile to rigid orchestration, and it works as a workflow anyway. The reason is the whole point: a workflow fixes the how, not the what. Deep research always follows the same recipe — fan out across sources, cross-check them, discard what doesn't hold, synthesize the rest — while the answer comes out different every time. The recipe is rigid, so it lives in the script; the answer is open, so it stays in the probabilistic leaves. That is the real test, sharper than "structured versus open work": not whether the work is predictable, but whether its method is. Some open problems have a fixed method, and those are exactly the ones a workflow can hold.
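
The difference between a fixed method and an open answer is easy to see in code. This is a hypothetical sketch of the recipe described above, not Anthropic's /deep-research implementation: the claims come from stubbed agents, while the cross-check rule (keep a claim only if at least two distinct sources report it) is plain, fixed logic.

```javascript
// Fixed method: the cross-check rule never changes between runs.
function crossCheck(reports, minSources = 2) {
  const support = new Map(); // claim -> set of distinct sources reporting it
  for (const { source, claims } of reports) {
    for (const claim of claims) {
      if (!support.has(claim)) support.set(claim, new Set());
      support.get(claim).add(source);
    }
  }
  return [...support]
    .filter(([, sources]) => sources.size >= minSources)
    .map(([claim]) => claim);
}

// The recipe: fan out, cross-check, discard, synthesize.
async function research(question, sources, readSource, synthesize) {
  const reports = await Promise.all(
    sources.map(async source => ({ source, claims: await readSource(source, question) }))
  );
  const kept = crossCheck(reports); // discard what doesn't hold
  return synthesize(question, kept);
}

// Hypothetical stubs standing in for agents.
const stubRead = async (source, _q) => ({ a: ['X', 'Y'], b: ['X'], c: ['Y', 'Z'] })[source];
const stubSynthesize = async (q, claims) => `${q}: ${claims.join(', ')}`;

research('What holds?', ['a', 'b', 'c'], stubRead, stubSynthesize)
  .then(console.log); // What holds?: X, Y
```

Swap the stubs for real agents and the answer changes from run to run, while the recipe stays exactly the same.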

A Warning I’ve Already Raised

And here I have to add a warning, because I have already written it once. In 2000, in “Software Process: A Roadmap”, I took a critical look at that whole research season. The diagnosis was blunt: we had produced interesting results, but lacked focus and tended to build whatever the technology of the day made possible. Researchers proposed “yet another PML or yet another PSEE,” each more expressive than the last. The real failure, though, was subtler than that appetite for technology: we underestimated the one problem that mattered and was hard. Not how to describe a process, which we kept doing with growing sophistication, but how to govern what no description can hold: flexibility, deviation, the unexpected, the inconsistency that real work produces. Every system broke on that same rock, and we kept mistaking a more expressive notation for progress.

I reread that lesson today and find it applicable to agent orchestration, word for word. The risk is identical, and it hides in plain sight. The field is impressive at what it can already do, and it measures itself on benchmarks that reward exactly that. But such a benchmark says nothing about the hard part. It does not test what happens when the task is ambiguous, when an agent deviates, when two runs disagree, or when the answer is fluent and wrong. That is where the old engines died, and it is where the new frameworks will be judged once the demos are over. Thirty years ago, that mistake left us with elegant systems that almost no one used; today, with the stakes and the investments incomparably larger, the same mistake costs far more. The question to start from is not “what can I get these agents to do,” but “how does the system behave when the work stops being foreseeable, and where does the inconsistency go?” Everything else, however brilliant, is the same mistake in new clothes.
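
One concrete way to ask what happens when two runs disagree is to run the same probabilistic step more than once and compare. The sketch below rests on assumptions: callAgent is hypothetical, and exact string comparison stands in for whatever notion of agreement a real task would need. If the runs agree, the answer passes; if they do not, the disagreement is returned as an explicit signal rather than silently resolved by picking one.

```javascript
// Run the same probabilistic step several times and surface disagreement.
async function withAgreementCheck(callAgent, prompt, runs = 3) {
  const answers = await Promise.all(
    Array.from({ length: runs }, () => callAgent(prompt))
  );
  const counts = new Map();
  for (const a of answers) counts.set(a, (counts.get(a) || 0) + 1);
  if (counts.size === 1) {
    return { status: 'consistent', answer: answers[0] };
  }
  // The inconsistency goes somewhere visible: to a human, with the evidence.
  return { status: 'disagreement', variants: Object.fromEntries(counts) };
}

// Hypothetical stub that answers differently on its second call.
let n = 0;
const stubAgent = async () => (++n === 2 ? 'no' : 'yes');

withAgreementCheck(stubAgent, 'Is the migration safe?').then(r => console.log(r));
// logs { status: 'disagreement', variants: { yes: 2, no: 1 } }
```

Agreement is not correctness: three runs can agree on a fluent, wrong answer. The sketch only makes one kind of inconsistency visible.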

And there is a corollary that follows directly: we already know a great deal about this. Three decades of software process research mapped, often the hard way, exactly where the boundary between rigid and flexible lies and how a deterministic model behaves when it meets real work. That experience is not a museum piece. It is the knowledge needed to decide where a workflow belongs and where it does not. The worst move now would be to throw it away and start over. Let’s not reinvent the wheel or make the same mistake twice.

The research is already converging on this, which is the surest sign the problem is real and not a private déjà vu. Through 2025, a line of work has been instantiating the classical process models, Waterfall, TDD, Scrum, and the V-Model, as explicit coordination scaffolds for teams of LLM agents. SOEN-101 (ICSE 2025) hands the agents the old roles: requirements engineer, architect, developer, tester, and scrum master, and reports that a Scrum-shaped process increases the correctness of the generated code by about 15% over an unstructured baseline. A companion study runs the same comparison across Waterfall, V-Model, and Agile, and lands where we landed thirty years ago: the processes do work as coordination, but each carries a trade-off; the rigid one is the most efficient, the agile one buys quality at a higher cost. It is not a vindication of the old enactment engines; rather, the same problem resurfaces with the same tensions, which is the point.

What Doesn’t Change

There’s something I learned back then that still holds. The value lies less in the engine that executes than in the ability to think the process through well: to understand what must be made explicit and what must be left to judgment, where rigor is needed and where flexibility is needed, which steps matter and which are noise. Thirty years ago, that ability served to write a good model in a PML. Today, that same ability serves to write a good workflow and decide which steps to entrust to a subagent. The tool is incomparably more powerful. The thinking it requires is exactly the same, and it is, once again, software engineering.

That’s why, when I read about Claude workflows, I didn’t feel like I was facing something completely new. What I had was rarer and more useful: recognizing an old problem that returns with new tools and the same underlying question.

And none of this cancels the human being. If anything, it finally puts the person in the right place. For thirty years, the human was the only thing holding a broken frame together: the component that, by hand, absorbed every gap between the model and reality, and paid for it when the model turned into a straitjacket. What changes now is not that the human is removed, but that the ambiguity which once fell entirely on the person is shared — the probabilistic leaves take on the interpretation, the deterministic skeleton takes on the coordination — and the human is left with what only the human can do: deciding what the process is for, where the boundary falls, what counts as a good result. The role is not erased but harmonized, at last, inside a frame that might work.

P.S.: Feel free to correct me or add to what I’ve written. Happy to learn more.

This post was written with the assistance of Claude. The ideas, the positions, and the reasoning are mine.

Referenced papers

V. Ambriola, R. Conradi, A. Fuggetta, “Assessing Process-Centered Software Engineering Environments”, ACM Transactions on Software Engineering and Methodology, vol. 6, no. 3, July 1997, pp. 283–328.
R. Balzer, “Tolerating Inconsistency”, Proceedings of the 13th International Conference on Software Engineering (ICSE), 1991, pp. 158–165.
G. Cugola, E. Di Nitto, A. Fuggetta, C. Ghezzi, “A Framework for Formalizing Inconsistencies and Deviations in Human-Centered Systems”, ACM Transactions on Software Engineering and Methodology, vol. 5, no. 3, 1996.
G. Cugola, E. Di Nitto, A. Fuggetta, “The JEDI Event-Based Infrastructure and Its Application to the Development of the OPSS WFMS”, IEEE Transactions on Software Engineering, 2001.
A. Fuggetta, “Software Process: A Roadmap”, Future of Software Engineering, ICSE 2000, pp. 25–34.
Anthropic, “Introducing dynamic workflows in Claude Code”, May 28, 2026 — the launch post, with the use cases referenced here (codebase audits, large migrations, deep research, and the Bun Zig→Rust port).
F. Lin, D. J. Kim, T.-H. Chen, “SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents”, ICSE 2025.
A. Nguyen Duc et al., “Evaluating Classical Software Process Models as Coordination Mechanisms for LLM-Based Software Generation”, arXiv:2509.13942, 2025.
A. Fuggetta, “Why Software Engineering Is Indispensable in the Age of Coding Agents”, submitted for publication.


© 2026 Alfonso Fuggetta & Sonia Montegiove. Unless otherwise stated, all content in this publication is protected by copyright and released under the CC BY-NC-ND 4.0 license: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.it

Mauro Labate

Great parallel with the process modeling languages and thanks for sharing your thoughts on the deterministic vs probabilistic portion of the execution.

One additional thought is that the Claude approach also allows you to define the workflow in human language, by describing a use case. With traditional modeling languages, you needed to understand a specific formalism to define the workflow, which prevented business users from using the tool correctly. Claude lowers that barrier, but it also risks having the deterministic part defined by the probabilistic LLM: for bespoke workflows, you prompt, and Claude tries to build the steps.

I have run the engineering department of a company selling a BPMN-based workflow system for 10 years, and despite our promise to bring business users into the development process, we never really managed it: the business preferred to delegate understanding of the model notation to the engineers. What we learned through all these years was that the notation was only part of the story. The best engineers led the most successful implementations because they could manage the inherent complexity of the problem at hand. The incidental complexity introduced by the modeling notation was already relatively low, and Claude will lower it further by allowing the use of natural human language. What we don't yet know is how well Claude will manage the inherent complexity of complex problems without the structured thinking of an engineer.
