feat: add headless gRPC server for external agent integration (#278)
* gRPC Server
* gRPC fix
* Update proto
* fix: address PR review feedback for gRPC server
- Update bun.lock for new dependencies (frozen-lockfile CI fix)
- Add multi-turn session persistence via initialMessages
- Replace hardcoded done payload with real token counts
- Default bind to localhost instead of 0.0.0.0 (startup sketch below)
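A minimal startup sketch (the entrypoint shape and env-var names are assumptions for illustration; GrpcServer.start(port, host) is the actual API added in src/grpc/server.ts):

    // Sketch only: start the headless gRPC server, bound to localhost by default.
    // OPENCLAUDE_GRPC_PORT / OPENCLAUDE_GRPC_HOST are hypothetical overrides.
    import { GrpcServer } from './src/grpc/server.js'

    const port = Number(process.env.OPENCLAUDE_GRPC_PORT ?? 50051)
    const host = process.env.OPENCLAUDE_GRPC_HOST ?? 'localhost' // not 0.0.0.0

    new GrpcServer().start(port, host)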
* fix(grpc): startup parity, cancel interrupt, and cli text fallback
- Replace enableConfigs() with await init() in start-grpc.ts for full
bootstrap parity with the main CLI (env vars, CA certs, mTLS, proxy,
OAuth, Windows shell)
- Call engine.interrupt() before call.end() in the cancel handler so
in-flight model/tool execution is actually stopped
- Show done.full_text in the CLI client when no text_chunk was received,
preventing silent drops when streaming is unavailable (sketched below)
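A sketch of that client-side fallback (the stream handle and printing are illustrative; text_chunk and done.full_text are the real proto fields):

    // Sketch: given the Chat duplex stream (construction omitted), print streamed
    // text chunks, and fall back to done.full_text if none arrived.
    let sawChunk = false
    stream.on('data', (msg: any) => {
      if (msg.text_chunk) {
        sawChunk = true
        process.stdout.write(msg.text_chunk.text)
      } else if (msg.done && !sawChunk && msg.done.full_text) {
        process.stdout.write(msg.done.full_text)
      }
    })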
* fix(grpc): wire session_id end-to-end and remove dead provider field
- Move session_id from ClientMessage into ChatRequest to fix proto-loader
oneofs encoding bug and make the field functional
- Implement in-memory session store so reconnecting with the same
session_id resumes conversation context across streams (client sketch below)
- Remove ChatRequest.provider — per-request provider routing requires
global process.env mutation, unsafe for concurrent clients; provider
is configured via env vars at server startup
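Roughly how a client opens the stream and resumes a session (address, proto path, and message text are illustrative assumptions; service and field names match the proto as loaded with keepCase):

    // Sketch: reconnecting with the same session_id resumes conversation context
    // from the server's in-memory session store.
    import * as grpc from '@grpc/grpc-js'
    import * as protoLoader from '@grpc/proto-loader'

    const def = protoLoader.loadSync('src/proto/openclaude.proto', { keepCase: true, oneofs: true })
    const proto = grpc.loadPackageDefinition(def) as any
    const client = new proto.openclaude.v1.AgentService('localhost:50051', grpc.credentials.createInsecure())

    const stream = client.Chat()
    stream.on('data', (msg: any) => { if (msg.done) stream.end() })
    stream.write({ request: { session_id: 'demo-session', message: 'Remember the number 42.' } })
    // A later stream that sends the same session_id starts from this turn's
    // messages (via initialMessages) instead of a fresh conversation.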
* fix(grpc): mirror CLI auth bootstrap in start-grpc and fix tool_name field
scripts/start-grpc.ts now runs the same provider/auth bootstrap as the
normal CLI entrypoint: enableConfigs, safe env vars, Gemini/GitHub token
hydration, saved-profile resolution with warn-and-fallback, and provider
validation before the server binds.
ToolCallResult.tool_name was being populated with the tool_use_id UUID.
Added a toolNameById map (filled in canUseTool) so tool_name now carries
the actual tool name (e.g. "Bash"). The UUID moves to a new tool_use_id
field (proto field 4) for client-side correlation.
* fix(grpc): add tool_use_id to ToolCallStart and interrupt engine on stream close
Two blocker-level issues flagged in code review:
- ToolCallStart was missing tool_use_id, making it impossible for clients
to correlate tool_start events with tool_result when the same tool runs
multiple times. Added tool_use_id = 3 to the proto message and populated
it from the toolUseID parameter in canUseTool (correlation sketch below).
- On stream close without an explicit CancelSignal the server only nulled
the engine reference, leaving the underlying model/tool work running
as an orphan. Added engine.interrupt() in the call.on('end') handler
to stop work immediately when the client disconnects.
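On the client this allows pairing tool_start with the matching tool_result even when the same tool runs several times in one turn; a sketch (the Map and logging are illustrative, `stream` is the Chat duplex stream from the sketch above):

    // Sketch: correlate tool_start and tool_result by tool_use_id.
    const startedTools = new Map<string, { name: string; args: string }>()
    stream.on('data', (msg: any) => {
      if (msg.tool_start) {
        startedTools.set(msg.tool_start.tool_use_id, {
          name: msg.tool_start.tool_name,
          args: msg.tool_start.arguments_json,
        })
      } else if (msg.tool_result) {
        const started = startedTools.get(msg.tool_result.tool_use_id)
        console.log(`${started?.name ?? msg.tool_result.tool_name} finished`, msg.tool_result.is_error ? '(error)' : '')
        startedTools.delete(msg.tool_result.tool_use_id)
      }
    })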
* fix(grpc): resolve pending promises on disconnect and guard post-cancel writes
Four lifecycle and contract issues identified during proactive review:
- Pending permission Promises in canUseTool would hang forever if the
client disconnected mid-stream. On call 'end', all pending resolvers
are now called with 'no' so the engine can unblock and terminate (the
permission round-trip is sketched below).
- The done message and session save could fire after call.end() when
a CancelSignal arrived mid-generation. Added an `interrupted` flag
set on both cancel and stream close to gate all post-loop writes.
- The session map had no eviction policy, allowing unbounded memory
growth. Capped at MAX_SESSIONS=1000 with FIFO eviction of the
oldest entry.
- Field 3 was silently absent from ChatRequest. Added `reserved 3`
to document the gap and prevent accidental reuse in future.
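For context, the permission round-trip that those resolvers unblock looks roughly like this from the client side (the auto-approve policy is purely illustrative; prompt_id and reply are the real proto fields, `stream` as above):

    // Sketch: answer an action_required permission prompt. In the real CLI
    // client the reply comes from the user, not a hardcoded policy.
    stream.on('data', (msg: any) => {
      if (msg.action_required) {
        const approve = true // e.g. ask the user msg.action_required.question
        stream.write({
          input: {
            prompt_id: msg.action_required.prompt_id,
            reply: approve ? 'yes' : 'no',
          },
        })
      }
    })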
* fix(grpc): reset previousMessages on each new request to prevent session history leak
previousMessages was declared at stream scope and only overwritten when
the incoming session_id already existed in the session store. A second
request on the same stream with a new session_id would silently inherit
the first request's conversation history in initialMessages instead of
starting fresh, violating the session contract.
Fix: reset previousMessages to [] at the start of each ChatRequest
before the session-store lookup.
* fix(grpc): reset interrupted flag between requests and guard against concurrent ChatRequest
Two stream-scoped state bugs found during proactive audit:
- The `interrupted` flag was never reset between requests on the same
stream. If the first request was cancelled, all subsequent requests
would silently skip the done message, causing the client to hang.
- A second ChatRequest arriving while the first was still processing
would overwrite the engine reference, corrupting the lifecycle of
both requests. Now returns ALREADY_EXISTS error instead. Engine is
nulled after the for-await loop completes so subsequent requests
can proceed normally (client-side queuing sketch below).
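Clients should therefore serialize requests per stream; a sketch of waiting for done (or error) before sending the next request (the queue helper is illustrative, `stream` as in the earlier sketches):

    // Sketch: one in-flight ChatRequest per stream.
    const queue: any[] = [
      { session_id: 'demo-session', message: 'First question' },
      { session_id: 'demo-session', message: 'Follow-up question' },
    ]
    let busy = false
    const sendNext = () => {
      if (!busy && queue.length > 0) {
        busy = true
        stream.write({ request: queue.shift() })
      }
    }
    stream.on('data', (msg: any) => {
      if (msg.done || msg.error) {
        busy = false
        sendNext()
      }
    })
    sendNext()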
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
src/grpc/server.ts (new file, 252 lines)
@@ -0,0 +1,252 @@
import * as grpc from '@grpc/grpc-js'
import * as protoLoader from '@grpc/proto-loader'
import path from 'path'
import { randomUUID } from 'crypto'
import { QueryEngine } from '../QueryEngine.js'
import { getTools } from '../tools.js'
import { getDefaultAppState } from '../state/AppStateStore.js'
import { AppState } from '../state/AppState.js'
import { FileStateCache, READ_FILE_STATE_CACHE_SIZE } from '../utils/fileStateCache.js'

const PROTO_PATH = path.resolve(import.meta.dirname, '../proto/openclaude.proto')

const packageDefinition = protoLoader.loadSync(PROTO_PATH, {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true,
})

const protoDescriptor = grpc.loadPackageDefinition(packageDefinition) as any
const openclaudeProto = protoDescriptor.openclaude.v1

const MAX_SESSIONS = 1000

export class GrpcServer {
  private server: grpc.Server
  private sessions: Map<string, any[]> = new Map()

  constructor() {
    this.server = new grpc.Server()
    this.server.addService(openclaudeProto.AgentService.service, {
      Chat: this.handleChat.bind(this),
    })
  }

  start(port: number = 50051, host: string = 'localhost') {
    this.server.bindAsync(
      `${host}:${port}`,
      grpc.ServerCredentials.createInsecure(),
      (error, boundPort) => {
        if (error) {
          console.error('Failed to start gRPC server', error)
          return
        }
        console.log(`gRPC Server running at ${host}:${boundPort}`)
      }
    )
  }

  private handleChat(call: grpc.ServerDuplexStream<any, any>) {
    let engine: QueryEngine | null = null
    let appState: AppState = getDefaultAppState()
    const fileCache: FileStateCache = new FileStateCache(READ_FILE_STATE_CACHE_SIZE, 25 * 1024 * 1024)

    // To handle ActionRequired (ask user for permission)
    const pendingRequests = new Map<string, (reply: string) => void>()

    // Accumulated messages from previous turns for multi-turn context
    let previousMessages: any[] = []
    let sessionId = ''
    let interrupted = false

    call.on('data', async (clientMessage) => {
      try {
        if (clientMessage.request) {
          if (engine) {
            call.write({
              error: {
                message: 'A request is already in progress on this stream',
                code: 'ALREADY_EXISTS'
              }
            })
            return
          }
          interrupted = false
          const req = clientMessage.request
          sessionId = req.session_id || ''
          previousMessages = []

          // Load previous messages from session store (cross-stream persistence)
          if (sessionId && this.sessions.has(sessionId)) {
            previousMessages = [...this.sessions.get(sessionId)!]
          }

          const toolNameById = new Map<string, string>()

          engine = new QueryEngine({
            cwd: req.working_directory || process.cwd(),
            tools: getTools(appState.toolPermissionContext), // Gets all available tools
            commands: [], // Slash commands
            mcpClients: [],
            agents: [],
            ...(previousMessages.length > 0 ? { initialMessages: previousMessages } : {}),
            includePartialMessages: true,
            canUseTool: async (tool, input, context, assistantMsg, toolUseID) => {
              if (toolUseID) {
                toolNameById.set(toolUseID, tool.name)
              }
              // Notify client of the tool call first
              call.write({
                tool_start: {
                  tool_name: tool.name,
                  arguments_json: JSON.stringify(input),
                  tool_use_id: toolUseID
                }
              })

              // Ask user for permission
              const promptId = randomUUID()
              const question = `Approve ${tool.name}?`
              call.write({
                action_required: {
                  prompt_id: promptId,
                  question,
                  type: 'CONFIRM_COMMAND'
                }
              })

              return new Promise((resolve) => {
                pendingRequests.set(promptId, (reply) => {
                  if (reply.toLowerCase() === 'yes' || reply.toLowerCase() === 'y') {
                    resolve({ behavior: 'allow' })
                  } else {
                    resolve({ behavior: 'deny', reason: 'User denied via gRPC' })
                  }
                })
              })
            },
            getAppState: () => appState,
            setAppState: (updater) => { appState = updater(appState) },
            readFileCache: fileCache,
            userSpecifiedModel: req.model,
            fallbackModel: req.model,
          })

          // Track accumulated response data for FinalResponse
          let fullText = ''
          let promptTokens = 0
          let completionTokens = 0

          const generator = engine.submitMessage(req.message)

          for await (const msg of generator) {
            if (msg.type === 'stream_event') {
              if (msg.event.type === 'content_block_delta' && msg.event.delta.type === 'text_delta') {
                call.write({
                  text_chunk: {
                    text: msg.event.delta.text
                  }
                })
                fullText += msg.event.delta.text
              }
            } else if (msg.type === 'user') {
              // Extract tool results
              const content = msg.message.content
              if (Array.isArray(content)) {
                for (const block of content) {
                  if (block.type === 'tool_result') {
                    let outputStr = ''
                    if (typeof block.content === 'string') {
                      outputStr = block.content
                    } else if (Array.isArray(block.content)) {
                      outputStr = block.content.map(c => c.type === 'text' ? c.text : '').join('\n')
                    }
                    call.write({
                      tool_result: {
                        tool_name: toolNameById.get(block.tool_use_id) ?? block.tool_use_id,
                        tool_use_id: block.tool_use_id,
                        output: outputStr,
                        is_error: block.is_error || false
                      }
                    })
                  }
                }
              }
            } else if (msg.type === 'result') {
              // Extract real token counts and final text from the result
              if (msg.subtype === 'success') {
                if (msg.result) {
                  fullText = msg.result
                }
                promptTokens = msg.usage?.input_tokens ?? 0
                completionTokens = msg.usage?.output_tokens ?? 0
              }
            }
          }

          if (!interrupted) {
            // Save messages for multi-turn context in subsequent requests
            previousMessages = [...engine.getMessages()]

            // Persist to session store for cross-stream resumption
            if (sessionId) {
              if (!this.sessions.has(sessionId) && this.sessions.size >= MAX_SESSIONS) {
                // Evict oldest session (Map preserves insertion order)
                this.sessions.delete(this.sessions.keys().next().value)
              }
              this.sessions.set(sessionId, previousMessages)
            }

            call.write({
              done: {
                full_text: fullText,
                prompt_tokens: promptTokens,
                completion_tokens: completionTokens
              }
            })
          }

          engine = null

        } else if (clientMessage.input) {
          const promptId = clientMessage.input.prompt_id
          const reply = clientMessage.input.reply
          if (pendingRequests.has(promptId)) {
            pendingRequests.get(promptId)!(reply)
            pendingRequests.delete(promptId)
          }
        } else if (clientMessage.cancel) {
          interrupted = true
          if (engine) {
            engine.interrupt()
          }
          call.end()
        }
      } catch (err: any) {
        console.error("Error processing stream:", err)
        call.write({
          error: {
            message: err.message || "Internal server error",
            code: "INTERNAL"
          }
        })
        call.end()
      }
    })

    call.on('end', () => {
      interrupted = true
      // Unblock any pending permission prompts so canUseTool can return
      for (const resolve of pendingRequests.values()) {
        resolve('no')
      }
      if (engine) {
        engine.interrupt()
      }
      engine = null
      pendingRequests.clear()
    })
  }
}