Introduction
When you look at typescript-go (tsgo), the native implementation of TypeScript, there is a moment where your mental model of it shifts quite naturally.
At first, it looks like a story about making tsc faster.
The TypeScript 6.0 announcement presents 6.0 as a bridge release from the current JavaScript implementation toward the Go implementation that will power TypeScript 7.0 and beyond, and describes that Go codebase as a new implementation using native code and shared-memory multi-threading.
But when I write code that integrates with tsgo from the outside, my interest slowly moves somewhere else.
What I want is not only a faster single run of tsgo --noEmit.
I want a shape where multiple consumers, such as editors, linters, code generators, refactoring tools, AI agents, and build servers, gather around the same TypeScript program graph and checker, then repeatedly ask only the questions they need.
At that point, tsgo starts looking to me less like a CLI and more like a language processor server.
And corsa-bind is where I am pushing that view into implementation.
I do not fork tsgo or rewrite its internals.
Instead, I keep corsa-bind on the stdio API and LSP boundaries that upstream provides, then orchestrate workers, sessions, snapshots, transports, and caches around them.
A small note on the name: corsa is not the name of my project by itself.
It comes from the code name used on the tsgo side.
My project is corsa-bind, and its README describes the repository as bindings and orchestration layers for using typescript-go from Rust and Node.js.
In that sense, bind does not mean exposing a single FFI function.
It means building the connection and operations layer around tsgo for Rust, Node, and other native-language surfaces.
I want to call this idea language processor orchestration.
stdio API Is Not a Function Call
When you first see the tsgo stdio API, you might think it is just "sending requests and responses over standard input and standard output."
But that view drops something important.
On the other side of stdio, there is a language processor running as another process. We send requests. It builds programs, reads source files, resolves types, holds symbols, manages snapshots, and returns responses.
In other words, the boundary looks like this.
caller process
-> transport
-> tsgo worker process
-> program / checker / snapshot state
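The transport in the middle of that boundary is doing real work: it has to frame discrete messages on a raw byte stream. As a hedged sketch of what that layer deals with (this is not corsa-bind's actual wire code; its transports are implemented in Rust and may frame messages differently), LSP-style Content-Length framing looks roughly like this:

```typescript
// Illustrative LSP-style Content-Length framing over a byte stream.
// Not corsa-bind's real transport code; JSON is used here for clarity,
// while a msgpack transport would encode the body differently.

export function encodeFrame(message: object): Buffer {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  const header = `Content-Length: ${body.length}\r\n\r\n`;
  return Buffer.concat([Buffer.from(header, "ascii"), body]);
}

// Try to decode one frame from the front of a buffer.
// Returns the parsed message and remaining bytes, or null if incomplete.
export function decodeFrame(
  buf: Buffer,
): { message: unknown; rest: Buffer } | null {
  const sep = buf.indexOf("\r\n\r\n");
  if (sep === -1) return null; // header not fully arrived
  const header = buf.subarray(0, sep).toString("ascii");
  const match = /Content-Length: (\d+)/.exec(header);
  if (!match) throw new Error("missing Content-Length header");
  const length = Number(match[1]);
  const start = sep + 4;
  if (buf.length < start + length) return null; // body not fully arrived
  const body = buf.subarray(start, start + length).toString("utf8");
  return { message: JSON.parse(body), rest: buf.subarray(start + length) };
}
```

The point of the sketch is that even the "dumb pipe" in the diagram has state of its own: partially arrived frames, ordering, and backpressure all live here.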
The important part is that the tsgo worker has state.
If you start a process, read the config, open the project, type-check, and throw everything away each time, that is the CLI way of using it.
But if you keep the worker alive, initialize once, reuse snapshots, and keep sending small queries into the same session, that is the server way of using it.
The corsa_bind_client docs on docs.rs describe this layer as high-level client bindings for the typescript-go stdio API.
Concretely, it can spawn a tsgo worker process, initialize a session once and reuse it, create and reuse snapshots, and ask type, symbol, and syntax questions through typed helpers.
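The "initialize a session once and reuse it" behavior can be modeled in a few lines. The `Transport` interface and method names below are assumptions made up for the sketch; corsa_bind_client's real Rust API is shaped differently, but the reuse invariant is the same:

```typescript
// Illustrative model of "initialize once, reuse the session".
// Transport and the method names are hypothetical for this sketch.

interface Transport {
  request(method: string, params: unknown): Promise<unknown>;
}

export class SessionClient {
  private initialized: Promise<unknown> | null = null;
  constructor(private transport: Transport) {}

  // Initialization runs at most once; every later query reuses it.
  private ensureSession(): Promise<unknown> {
    this.initialized ??= this.transport.request("initialize", {});
    return this.initialized;
  }

  async query(method: string, params: unknown): Promise<unknown> {
    await this.ensureSession();
    return this.transport.request(method, params);
  }
}
```

A caller can then send many small questions without ever paying initialization again, which is exactly the server way of using the worker described above.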
At that point, what I am doing in corsa-bind is no longer just binding.
It is not simply exposing one function through FFI.
The layer has to deal with process lifetime, transport choice, request types, snapshot handle lifetime, cleanup, timeouts, and observability together.
I am not building only a thin binding for calling a language processor. I am writing a layer for operating one.
Change the Shape of Work, Not the Compiler
There is one misunderstanding to avoid here.
I am not building corsa-bind to become a smarter compiler engine than tsgo.
If you open the same project, resolve the same types, and produce the same diagnostics, the fundamental work is still done by tsgo.
A wrapper does not magically make the engine itself faster.
What I want to change with corsa-bind is the shape of the work.
For example, these two workflows are very different.
CLI-shaped workflow:
run tsgo
load config
open project
build program
answer one big question
exit
orchestrated workflow:
spawn worker
initialize once
open project once
create snapshot
answer many small questions
reuse worker and snapshot
close explicitly
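The cost difference between the two shapes can be made concrete with a toy model. The numbers below are invented; only the structure of the comparison matters:

```typescript
// Toy cost model contrasting the two workflow shapes.
// initMs stands for spawn + config load + program build; the values
// used in any comparison are illustrative, not measurements.

export function cliShapeCost(
  questions: number,
  initMs: number,
  queryMs: number,
): number {
  // Every question pays the full startup again.
  return questions * (initMs + queryMs);
}

export function orchestratedCost(
  questions: number,
  initMs: number,
  queryMs: number,
): number {
  // Startup is paid once; each question only pays the query.
  return initMs + questions * queryMs;
}
```

With one big question the two shapes cost about the same, which is why the CLI shape stays reasonable for batch CI runs; the gap only opens as the question count grows.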
The first shape is natural for batch processing. If you are checking the whole project in CI, this is still a very reasonable shape.
The second shape is closer to editors, lint rules, and agents. You want to know the type of one node. Then you want to resolve a symbol. Then you want to slightly change a virtual document and ask again. Reopening the entire project, CLI-style, for each of those questions is like starting a fresh database server for every query.
So the performance model I use for corsa-bind is not "do the same work and beat tsgo."
It is closer to "reuse the same engine state and avoid unnecessary work."
The benchmarking guide follows that line of thought: separate engine speed from wrapper speed, separate cold runs from warm runs, and look at session reuse and transport choices as different questions.
A good claim is not "corsa-bind is faster than tsgo."
A good claim is "in a warm editor workflow, reusing a live session can be cheaper than rerunning tsgo --noEmit every time."
That difference may look like wording, but it changes the problem. The topic is no longer "how to build a faster compiler." It becomes a question of where to place the processing system around the compiler, which state to reuse, and which queries to send to which worker.
In other words, this is not a move away from compilers. It is a move toward treating a compiler not as a single executable or one-shot function call, but as a long-lived service. The parser, binder, checker, and program graph remain the core of the compiler, but when to start them, where to keep them, which consumers can share them, and what query granularity to expose become design problems outside the compiler itself.
That outside layer is what I want to work on in corsa-bind.
I leave the compiler engine semantics to upstream tsgo.
Then I design the transport, lifetime, and cache layers around that compiler.
A Snapshot Is a Handle to Remote State
One concept I care about in corsa-bind is the snapshot.
A snapshot is not just a JSON value. At least from my point of view while writing the orchestrator, it is a handle to language processor state living inside the worker.
Very roughly, the relationship looks like this.
ApiClient
owns worker process
owns transport
initializes tsgo session
ManagedSnapshot
points to snapshot state inside that session
can be reused for type / symbol / syntax queries
releases remote state when dropped or closed
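The handle discipline can be sketched directly. The class and callback below are illustrative stand-ins, not corsa-bind's real `ManagedSnapshot` API; the release callback models the "release this snapshot" request sent to the worker:

```typescript
// Sketch of a snapshot as a handle to remote state.
// The release callback stands in for the request that frees the
// snapshot inside the worker; all names here are hypothetical.

export class SnapshotHandle {
  private released = false;
  constructor(
    public readonly id: number,
    private release: (id: number) => void,
  ) {}

  // Queries must go through a live handle.
  assertLive(): void {
    if (this.released) {
      throw new Error(`snapshot ${this.id} already released`);
    }
  }

  // Releasing is explicit and idempotent: remote state is freed once.
  close(): void {
    if (this.released) return;
    this.released = true;
    this.release(this.id);
  }
}
```

The two properties worth noticing are that release is idempotent (double-close must not free remote state twice) and that a closed handle refuses further queries instead of silently pointing at freed state.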
The fact that it is a handle matters.
In a normal library call, you pass a value into a function and receive a return value.
But with tsgo over stdio, the heavy state remains on the worker side.
The caller holds a handle that identifies that state, then refers to that handle in the next request.
This is quite close to a database connection or an actor system. You ask, "for this snapshot, tell me the type at this file position." You ask, "create a snapshot that reflects this virtual document." You say, "release this, because I do not need it anymore."
In other words, you are managing the lifetime of language processor internal state from outside the process boundary. If you handle this casually, correctness breaks before performance even matters. If you forget to release snapshots, resources remain on the worker side. If you treat process cleanup as unimportant, you can leave zombie processes behind or distort later benchmarks.
That is why I bring operational concerns like timeouts, graceful shutdown, observer events, and queue capacity into the same design surface as the typed client in corsa-bind.
It is less a binding and more a small operations layer.
Think Like Distributed Servers
If I push this idea one step further, corsa-bind starts to resemble a distributed server system to me.
Of course, if you are only spawning several tsgo workers on one machine, it is not literally a distributed system over the network.
But the design problems are very similar.
Which worker should receive this request?
When should a worker start, and when should it stop?
How should cold workers and warm workers be treated?
Which snapshots can be reused?
If a worker crashes, how much can be recovered?
What should happen to a timed-out request?
Should the transport be JSON-RPC or msgpack?
At what granularity should observable events be emitted?
This is not just "calling a compiler API." It is closer to operating a small cluster.
For example, a concept like ApiProfile is not merely a name for a spawn config.
It is a stable way to describe the character of a worker: which tsgo executable to use, which cwd to start it in, which transport to use, and which timeout and observer to attach.
Conceptually, it looks like this.
profile: "default-msgpack"
-> worker pool
-> worker 1
-> session
-> snapshots
-> worker 2
-> session
-> snapshots
profile: "lsp-jsonrpc"
-> worker pool
-> worker 1
-> lsp session
-> virtual documents
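That tree can be sketched as a profile-keyed pool. The `ApiProfile` fields and pool shape below are assumptions for the sketch (the real ApiProfile carries more, such as the tsgo executable path, cwd, and observer); spawning is injected so the example stays self-contained:

```typescript
// Sketch of profile-keyed worker reuse. ApiProfile is modeled as a
// plain config record; the real profile carries more fields.

export interface ApiProfile {
  name: string;
  transport: "json-rpc" | "msgpack";
  timeoutMs: number;
}

export class WorkerPool<W> {
  private warm = new Map<string, W>();
  constructor(private spawn: (profile: ApiProfile) => W) {}

  // Reuse a warm worker for this profile, or spawn a cold one.
  acquire(profile: ApiProfile): W {
    let worker = this.warm.get(profile.name);
    if (worker === undefined) {
      worker = this.spawn(profile);
      this.warm.set(profile.name, worker);
    }
    return worker;
  }
}
```

Keying on the profile name is what makes "warm vs cold" an explicit property of the system rather than an accident of which process happens to be running.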
It might be an overstatement to say that workers behave like database shards. But in the sense that an orchestrator on the client side manages remote, stateful processing entities, much of the same design vocabulary becomes useful.
A process is a node. A snapshot is a handle to remote state. A request is a message. A transport is a wire format. A profile is a deployment unit. An observer is telemetry. Cleanup is correctness.
When you look at it through this metaphor, language processor API design suddenly becomes much more interesting.
Why Not Fork Upstream?
I made the corsa-bind README very explicit about its no forks, no patches policy.
It treats ref/typescript-go as an exact upstream checkout, uses upstream-supported entry points, and does not carry local patches.
This is not just my purism. If you are building an orchestration layer, it is important not to silently change the semantics of the foundation.
If I patched tsgo itself to make something faster, it would become unclear whether the speed came from corsa-bind's orchestration or from compiler engine changes.
Benchmarks would become harder to read.
Tracking upstream would become harder.
Compatibility, as seen from other language bindings, would become harder to trust.
So I split the boundary.
tsgo is treated as the upstream engine for language processing.
In corsa-bind, I handle the outside: transport, typed requests and responses, session reuse, snapshot lifetime, Node bindings, C ABI, and other language bindings.
I think this separation is healthy. Instead of entering the compiler and taking hold of everything, this approach respects the compiler as a specialized server and designs the way it is used around that server.
This is Unix-like, and it is also microservice-like. The only difference is that the server here is not an HTTP service. It is a type checker.
Node Bindings Are an Entry Point for Author Experience
I put not only the Rust-side client and orchestrator into corsa-bind, but also Node bindings built with napi-rs.
This is not only about making Rust callable from JavaScript.
The authors of tools that use TypeScript type information often live on the JS / TS side. People writing lint rules, code mods, editor extensions, framework plugins, and AI agent tool adapters do not necessarily want to touch a Rust crate directly.
At the same time, writing heavy protocol handling, process management, and snapshot lifetime management on the JS side every time is painful. So I make Rust own the hot paths and process orchestration, while Node gets a convenient surface.
The direction I am taking with corsa-oxlint fits this idea well.
Rule authors write rules in JS / TS.
But retrieving type information and talking to the tsgo worker is handled by a Rust-backed layer.
Here too, the subject for me is not just "bindings." The subject is how to distribute the language processor as a shared resource.
From One Compiler to Many Consumers
With a traditional CLI, the relationship between compiler and consumer tends to be one-to-one.
tsc command
-> compiler
-> diagnostics
But the actual shape of editors and toolchains is closer to many-to-one.
editor hover
editor completion
lint rule
code action
AI agent
test runner
build watcher
-> shared language-processing state
Of course, this does not mean everything should be truly concentrated into one process. Fault isolation matters, and it may be better to separate workers depending on the nature of the query. Whether read-heavy type queries and workflows that aggressively rewrite virtual documents should live in the same session is a question to treat carefully.
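One way to treat that question carefully is to make the routing decision explicit code rather than an accident. The request kinds and session names below are hypothetical; the sketch only shows the shape of the policy:

```typescript
// Sketch of routing consumers to sessions by query character.
// Whether read-heavy queries and virtual-document rewrites should
// share a session is policy; this makes the policy one function.
// All kind and session names here are illustrative.

type Request =
  | { kind: "typeAt"; file: string; pos: number }
  | { kind: "symbolAt"; file: string; pos: number }
  | { kind: "editVirtualDoc"; file: string; text: string };

// Keep mutating workflows away from the shared read session,
// so an aggressive rewriter cannot disturb hover and lint queries.
export function routeRequest(req: Request): "read-session" | "edit-session" {
  return req.kind === "editVirtualDoc" ? "edit-session" : "read-session";
}
```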
That is exactly why an orchestrator is needed.
An orchestrator is not a wrapper that hides the compiler. It is a control plane for arranging multiple consumers around the compiler.
"Control plane" may sound a little grand, but the work is similar. Which profile should create a worker? Which worker is warm? Which snapshot is still usable? Which request is stuck? Which transport fits this workflow? Which failures should be retried, and which should be returned to the caller?
These decisions live in a layer separate from the compiler engine itself.
Language Processors Will Stay Resident
I think language processors will increasingly be treated as resident processes.
The reason is simple: developer experience is moving from batch toward conversation.
It is not only that a human saves a file and the project is checked once. The editor offers suggestions while text is being entered. The linter runs rules while looking at the AST and types. An AI agent asks small questions like "what is the type of this node?", "where does this symbol come from?", and "did this fix remove the diagnostic?" Frameworks generate virtual files and map them back to the source files.
In that world, a language processor is less a command you start every time and more a service that keeps receiving questions.
And if it becomes a service, it needs orchestration. It needs worker pools, caches, snapshots, backpressure, timeouts, observability, graceful shutdown, version pinning, and compatibility policy.
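To make one item from that list concrete, here is a minimal per-request timeout wrapper, a generic pattern rather than corsa-bind's actual implementation:

```typescript
// Sketch of one operational concern from the list above: a per-request
// timeout. Generic promise pattern, not corsa-bind's real code.

export function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms,
    );
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}
```

The interesting design question is not the wrapper itself but what happens next: a timed-out request against a stateful worker may leave that worker busy, which is exactly why timeouts belong in the same design surface as worker lifetime and cleanup.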
I am building that layer in corsa-bind through the concrete target of tsgo.
A language processor does not end at the parser and checker. Keeping it alive, sharing it, recovering it when it breaks, and handing it to author experiences in other languages are all part of the future of language tooling.
Conclusion
tsgo is a faster tsc.
But stopping there would be a little wasteful.
Seen through its stdio API, tsgo is also a stateful language processor server.
And what I am building in corsa-bind is an orchestration layer for using that server from Rust, Node, and multiple native bindings.
The important point for me is not to forcefully absorb the compiler engine itself.
Respect upstream tsgo, do not fork it, do not patch it, and instead reuse sessions, manage snapshots, choose transports, and coordinate workers from the outside.
This is close to the way we think about distributed servers. There are processes, messages, handles to remote state, lifetimes, failures, observations, and cleanup.
Do not only call the language processor as a library. Operate it as a service.
That is the language processor orchestration I am trying to build in corsa-bind.