feat: add headless gRPC server for external agent integration (#278)
* gRPC Server
* gRPC fix
* Update proto
* fix: address PR review feedback for gRPC server
- Update bun.lock for new dependencies (frozen-lockfile CI fix)
- Add multi-turn session persistence via initialMessages
- Replace hardcoded done payload with real token counts
- Default bind to localhost instead of 0.0.0.0 (startup sketch below)
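A minimal startup sketch (the entrypoint shape and env-var names are assumptions for illustration; GrpcServer.start(port, host) is the actual API added in src/grpc/server.ts):

    // Sketch only: start the headless gRPC server, bound to localhost by default.
    // OPENCLAUDE_GRPC_PORT / OPENCLAUDE_GRPC_HOST are hypothetical overrides.
    import { GrpcServer } from './src/grpc/server.js'

    const port = Number(process.env.OPENCLAUDE_GRPC_PORT ?? 50051)
    const host = process.env.OPENCLAUDE_GRPC_HOST ?? 'localhost' // not 0.0.0.0

    new GrpcServer().start(port, host)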
* fix(grpc): startup parity, cancel interrupt, and cli text fallback
- Replace enableConfigs() with await init() in start-grpc.ts for full
bootstrap parity with the main CLI (env vars, CA certs, mTLS, proxy,
OAuth, Windows shell)
- Call engine.interrupt() before call.end() in the cancel handler so
in-flight model/tool execution is actually stopped
- Show done.full_text in the CLI client when no text_chunk was received,
preventing silent drops when streaming is unavailable (sketched below)
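A sketch of that client-side fallback (the stream handle and printing are illustrative; text_chunk and done.full_text are the real proto fields):

    // Sketch: given the Chat duplex stream (construction omitted), print streamed
    // text chunks, and fall back to done.full_text if none arrived.
    let sawChunk = false
    stream.on('data', (msg: any) => {
      if (msg.text_chunk) {
        sawChunk = true
        process.stdout.write(msg.text_chunk.text)
      } else if (msg.done && !sawChunk && msg.done.full_text) {
        process.stdout.write(msg.done.full_text)
      }
    })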
* fix(grpc): wire session_id end-to-end and remove dead provider field
- Move session_id from ClientMessage into ChatRequest to fix proto-loader
oneofs encoding bug and make the field functional
- Implement in-memory session store so reconnecting with the same
session_id resumes conversation context across streams (client sketch below)
- Remove ChatRequest.provider — per-request provider routing requires
global process.env mutation, unsafe for concurrent clients; provider
is configured via env vars at server startup
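Roughly how a client opens the stream and resumes a session (address, proto path, and message text are illustrative assumptions; service and field names match the proto as loaded with keepCase):

    // Sketch: reconnecting with the same session_id resumes conversation context
    // from the server's in-memory session store.
    import * as grpc from '@grpc/grpc-js'
    import * as protoLoader from '@grpc/proto-loader'

    const def = protoLoader.loadSync('src/proto/openclaude.proto', { keepCase: true, oneofs: true })
    const proto = grpc.loadPackageDefinition(def) as any
    const client = new proto.openclaude.v1.AgentService('localhost:50051', grpc.credentials.createInsecure())

    const stream = client.Chat()
    stream.on('data', (msg: any) => { if (msg.done) stream.end() })
    stream.write({ request: { session_id: 'demo-session', message: 'Remember the number 42.' } })
    // A later stream that sends the same session_id starts from this turn's
    // messages (via initialMessages) instead of a fresh conversation.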
* fix(grpc): mirror CLI auth bootstrap in start-grpc and fix tool_name field
scripts/start-grpc.ts now runs the same provider/auth bootstrap as the
normal CLI entrypoint: enableConfigs, safe env vars, Gemini/GitHub token
hydration, saved-profile resolution with warn-and-fallback, and provider
validation before the server binds.
ToolCallResult.tool_name was being populated with the tool_use_id UUID.
Added a toolNameById map (filled in canUseTool) so tool_name now carries
the actual tool name (e.g. "Bash"). The UUID moves to a new tool_use_id
field (proto field 4) for client-side correlation.
* fix(grpc): add tool_use_id to ToolCallStart and interrupt engine on stream close
Two blocker-level issues flagged in code review:
- ToolCallStart was missing tool_use_id, making it impossible for clients
to correlate tool_start events with tool_result when the same tool runs
multiple times. Added tool_use_id = 3 to the proto message and populated
it from the toolUseID parameter in canUseTool (correlation sketch below).
- On stream close without an explicit CancelSignal the server only nulled
the engine reference, leaving the underlying model/tool work running
as an orphan. Added engine.interrupt() in the call.on('end') handler
to stop work immediately when the client disconnects.
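On the client this allows pairing tool_start with the matching tool_result even when the same tool runs several times in one turn; a sketch (the Map and logging are illustrative, `stream` is the Chat duplex stream from the sketch above):

    // Sketch: correlate tool_start and tool_result by tool_use_id.
    const startedTools = new Map<string, { name: string; args: string }>()
    stream.on('data', (msg: any) => {
      if (msg.tool_start) {
        startedTools.set(msg.tool_start.tool_use_id, {
          name: msg.tool_start.tool_name,
          args: msg.tool_start.arguments_json,
        })
      } else if (msg.tool_result) {
        const started = startedTools.get(msg.tool_result.tool_use_id)
        console.log(`${started?.name ?? msg.tool_result.tool_name} finished`, msg.tool_result.is_error ? '(error)' : '')
        startedTools.delete(msg.tool_result.tool_use_id)
      }
    })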
* fix(grpc): resolve pending promises on disconnect and guard post-cancel writes
Four lifecycle and contract issues identified during proactive review:
- Pending permission Promises in canUseTool would hang forever if the
client disconnected mid-stream. On call 'end', all pending resolvers
are now called with 'no' so the engine can unblock and terminate (the
permission round-trip is sketched below).
- The done message and session save could fire after call.end() when
a CancelSignal arrived mid-generation. Added an `interrupted` flag
set on both cancel and stream close to gate all post-loop writes.
- The session map had no eviction policy, allowing unbounded memory
growth. Capped at MAX_SESSIONS=1000 with FIFO eviction of the
oldest entry.
- Field 3 was silently absent from ChatRequest. Added `reserved 3`
to document the gap and prevent accidental reuse in future.
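For context, the permission round-trip that those resolvers unblock looks roughly like this from the client side (the auto-approve policy is purely illustrative; prompt_id and reply are the real proto fields, `stream` as above):

    // Sketch: answer an action_required permission prompt. In the real CLI
    // client the reply comes from the user, not a hardcoded policy.
    stream.on('data', (msg: any) => {
      if (msg.action_required) {
        const approve = true // e.g. ask the user msg.action_required.question
        stream.write({
          input: {
            prompt_id: msg.action_required.prompt_id,
            reply: approve ? 'yes' : 'no',
          },
        })
      }
    })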
* fix(grpc): reset previousMessages on each new request to prevent session history leak
previousMessages was declared at stream scope and only overwritten when
the incoming session_id already existed in the session store. A second
request on the same stream with a new session_id would silently inherit
the first request's conversation history in initialMessages instead of
starting fresh, violating the session contract.
Fix: reset previousMessages to [] at the start of each ChatRequest
before the session-store lookup.
* fix(grpc): reset interrupted flag between requests and guard against concurrent ChatRequest
Two stream-scoped state bugs found during proactive audit:
- The `interrupted` flag was never reset between requests on the same
stream. If the first request was cancelled, all subsequent requests
would silently skip the done message, causing the client to hang.
- A second ChatRequest arriving while the first was still processing
would overwrite the engine reference, corrupting the lifecycle of
both requests. Now returns ALREADY_EXISTS error instead. Engine is
nulled after the for-await loop completes so subsequent requests
can proceed normally (client-side queuing sketch below).
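Clients should therefore serialize requests per stream; a sketch of waiting for done (or error) before sending the next request (the queue helper is illustrative, `stream` as in the earlier sketches):

    // Sketch: one in-flight ChatRequest per stream.
    const queue: any[] = [
      { session_id: 'demo-session', message: 'First question' },
      { session_id: 'demo-session', message: 'Follow-up question' },
    ]
    let busy = false
    const sendNext = () => {
      if (!busy && queue.length > 0) {
        busy = true
        stream.write({ request: queue.shift() })
      }
    }
    stream.on('data', (msg: any) => {
      if (msg.done || msg.error) {
        busy = false
        sendNext()
      }
    })
    sendNext()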
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
src/grpc/server.ts (new file, 252 lines)
@@ -0,0 +1,252 @@
import * as grpc from '@grpc/grpc-js'
import * as protoLoader from '@grpc/proto-loader'
import path from 'path'
import { randomUUID } from 'crypto'
import { QueryEngine } from '../QueryEngine.js'
import { getTools } from '../tools.js'
import { getDefaultAppState } from '../state/AppStateStore.js'
import { AppState } from '../state/AppState.js'
import { FileStateCache, READ_FILE_STATE_CACHE_SIZE } from '../utils/fileStateCache.js'

const PROTO_PATH = path.resolve(import.meta.dirname, '../proto/openclaude.proto')

const packageDefinition = protoLoader.loadSync(PROTO_PATH, {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true,
})

const protoDescriptor = grpc.loadPackageDefinition(packageDefinition) as any
const openclaudeProto = protoDescriptor.openclaude.v1

const MAX_SESSIONS = 1000

export class GrpcServer {
  private server: grpc.Server
  private sessions: Map<string, any[]> = new Map()

  constructor() {
    this.server = new grpc.Server()
    this.server.addService(openclaudeProto.AgentService.service, {
      Chat: this.handleChat.bind(this),
    })
  }

  start(port: number = 50051, host: string = 'localhost') {
    this.server.bindAsync(
      `${host}:${port}`,
      grpc.ServerCredentials.createInsecure(),
      (error, boundPort) => {
        if (error) {
          console.error('Failed to start gRPC server', error)
          return
        }
        console.log(`gRPC Server running at ${host}:${boundPort}`)
      }
    )
  }

  private handleChat(call: grpc.ServerDuplexStream<any, any>) {
    let engine: QueryEngine | null = null
    let appState: AppState = getDefaultAppState()
    const fileCache: FileStateCache = new FileStateCache(READ_FILE_STATE_CACHE_SIZE, 25 * 1024 * 1024)

    // To handle ActionRequired (ask user for permission)
    const pendingRequests = new Map<string, (reply: string) => void>()

    // Accumulated messages from previous turns for multi-turn context
    let previousMessages: any[] = []
    let sessionId = ''
    let interrupted = false

    call.on('data', async (clientMessage) => {
      try {
        if (clientMessage.request) {
          if (engine) {
            call.write({
              error: {
                message: 'A request is already in progress on this stream',
                code: 'ALREADY_EXISTS'
              }
            })
            return
          }
          interrupted = false
          const req = clientMessage.request
          sessionId = req.session_id || ''
          previousMessages = []

          // Load previous messages from session store (cross-stream persistence)
          if (sessionId && this.sessions.has(sessionId)) {
            previousMessages = [...this.sessions.get(sessionId)!]
          }

          const toolNameById = new Map<string, string>()

          engine = new QueryEngine({
            cwd: req.working_directory || process.cwd(),
            tools: getTools(appState.toolPermissionContext), // Gets all available tools
            commands: [], // Slash commands
            mcpClients: [],
            agents: [],
            ...(previousMessages.length > 0 ? { initialMessages: previousMessages } : {}),
            includePartialMessages: true,
            canUseTool: async (tool, input, context, assistantMsg, toolUseID) => {
              if (toolUseID) {
                toolNameById.set(toolUseID, tool.name)
              }
              // Notify client of the tool call first
              call.write({
                tool_start: {
                  tool_name: tool.name,
                  arguments_json: JSON.stringify(input),
                  tool_use_id: toolUseID
                }
              })

              // Ask user for permission
              const promptId = randomUUID()
              const question = `Approve ${tool.name}?`
              call.write({
                action_required: {
                  prompt_id: promptId,
                  question,
                  type: 'CONFIRM_COMMAND'
                }
              })

              return new Promise((resolve) => {
                pendingRequests.set(promptId, (reply) => {
                  if (reply.toLowerCase() === 'yes' || reply.toLowerCase() === 'y') {
                    resolve({ behavior: 'allow' })
                  } else {
                    resolve({ behavior: 'deny', reason: 'User denied via gRPC' })
                  }
                })
              })
            },
            getAppState: () => appState,
            setAppState: (updater) => { appState = updater(appState) },
            readFileCache: fileCache,
            userSpecifiedModel: req.model,
            fallbackModel: req.model,
          })

          // Track accumulated response data for FinalResponse
          let fullText = ''
          let promptTokens = 0
          let completionTokens = 0

          const generator = engine.submitMessage(req.message)

          for await (const msg of generator) {
            if (msg.type === 'stream_event') {
              if (msg.event.type === 'content_block_delta' && msg.event.delta.type === 'text_delta') {
                call.write({
                  text_chunk: {
                    text: msg.event.delta.text
                  }
                })
                fullText += msg.event.delta.text
              }
            } else if (msg.type === 'user') {
              // Extract tool results
              const content = msg.message.content
              if (Array.isArray(content)) {
                for (const block of content) {
                  if (block.type === 'tool_result') {
                    let outputStr = ''
                    if (typeof block.content === 'string') {
                      outputStr = block.content
                    } else if (Array.isArray(block.content)) {
                      outputStr = block.content.map(c => c.type === 'text' ? c.text : '').join('\n')
                    }
                    call.write({
                      tool_result: {
                        tool_name: toolNameById.get(block.tool_use_id) ?? block.tool_use_id,
                        tool_use_id: block.tool_use_id,
                        output: outputStr,
                        is_error: block.is_error || false
                      }
                    })
                  }
                }
              }
            } else if (msg.type === 'result') {
              // Extract real token counts and final text from the result
              if (msg.subtype === 'success') {
                if (msg.result) {
                  fullText = msg.result
                }
                promptTokens = msg.usage?.input_tokens ?? 0
                completionTokens = msg.usage?.output_tokens ?? 0
              }
            }
          }

          if (!interrupted) {
            // Save messages for multi-turn context in subsequent requests
            previousMessages = [...engine.getMessages()]

            // Persist to session store for cross-stream resumption
            if (sessionId) {
              if (!this.sessions.has(sessionId) && this.sessions.size >= MAX_SESSIONS) {
                // Evict oldest session (Map preserves insertion order)
                this.sessions.delete(this.sessions.keys().next().value)
              }
              this.sessions.set(sessionId, previousMessages)
            }

            call.write({
              done: {
                full_text: fullText,
                prompt_tokens: promptTokens,
                completion_tokens: completionTokens
              }
            })
          }

          engine = null

        } else if (clientMessage.input) {
          const promptId = clientMessage.input.prompt_id
          const reply = clientMessage.input.reply
          if (pendingRequests.has(promptId)) {
            pendingRequests.get(promptId)!(reply)
            pendingRequests.delete(promptId)
          }
        } else if (clientMessage.cancel) {
          interrupted = true
          if (engine) {
            engine.interrupt()
          }
          call.end()
        }
      } catch (err: any) {
        console.error("Error processing stream:", err)
        call.write({
          error: {
            message: err.message || "Internal server error",
            code: "INTERNAL"
          }
        })
        call.end()
      }
    })

    call.on('end', () => {
      interrupted = true
      // Unblock any pending permission prompts so canUseTool can return
      for (const resolve of pendingRequests.values()) {
        resolve('no')
      }
      if (engine) {
        engine.interrupt()
      }
      engine = null
      pendingRequests.clear()
    })
  }
}