* fix: make OpenAI fallback context window configurable and support external lookup table
Unknown OpenAI-compatible models fell back to a hardcoded 128k constant,
causing auto-compact to fire prematurely on models with larger windows
(issue #635 follow-up). Three escape hatches are added without touching the
built-in table:
- CLAUDE_CODE_OPENAI_FALLBACK_CONTEXT_WINDOW (number): overrides the 128k
default for all unknown models.
- CLAUDE_CODE_OPENAI_CONTEXT_WINDOWS (JSON object): per-model overrides that
take precedence over the built-in OPENAI_CONTEXT_WINDOWS table; supports
the same provider-qualified and prefix-matching lookup as the built-in path.
- CLAUDE_CODE_OPENAI_MAX_OUTPUT_TOKENS (JSON object): same pattern for output
token limits.
This lets operators deploy new or private models without patching
openaiContextWindows.ts on every model release.
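A rough sketch of the intended resolution order (helper names and the
prefix-matching details are illustrative, not the real exports of
openaiContextWindows.ts):

  const DEFAULT_FALLBACK_CONTEXT_WINDOW = 128_000;

  function resolveContextWindow(
    model: string,
    builtinTable: Record<string, number>,
    env: NodeJS.ProcessEnv = process.env,
  ): number {
    // 1. Per-model JSON overrides win over the built-in table.
    const overridesRaw = env.CLAUDE_CODE_OPENAI_CONTEXT_WINDOWS;
    if (overridesRaw) {
      try {
        const overrides = JSON.parse(overridesRaw) as Record<string, number>;
        const hit = lookupWithPrefixMatch(model, overrides);
        if (hit !== undefined) return hit;
      } catch {
        // Malformed JSON: ignore and fall through to the built-in table.
      }
    }
    // 2. Built-in table (provider-qualified / prefix-matching lookup).
    const builtin = lookupWithPrefixMatch(model, builtinTable);
    if (builtin !== undefined) return builtin;
    // 3. Configurable fallback for unknown models, then the 128k default.
    const fallback = Number(env.CLAUDE_CODE_OPENAI_FALLBACK_CONTEXT_WINDOW);
    return Number.isFinite(fallback) && fallback > 0
      ? fallback
      : DEFAULT_FALLBACK_CONTEXT_WINDOW;
  }

  // Simplified stand-in for the provider-qualified / prefix lookup.
  function lookupWithPrefixMatch(
    model: string,
    table: Record<string, number>,
  ): number | undefined {
    if (model in table) return table[model];
    const bare = model.includes(':') ? model.split(':').pop()! : model;
    if (bare in table) return table[bare];
    const prefixKey = Object.keys(table).find((k) => bare.startsWith(k));
    return prefixKey ? table[prefixKey] : undefined;
  }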
* docs: add new OpenAI context window env vars to .env.example
Document CLAUDE_CODE_OPENAI_FALLBACK_CONTEXT_WINDOW,
CLAUDE_CODE_OPENAI_CONTEXT_WINDOWS, and
CLAUDE_CODE_OPENAI_MAX_OUTPUT_TOKENS with usage examples.
Addresses reviewer feedback on PR #861.
---------
Co-authored-by: opencode <dev@example.com>
Gates three injection sites behind OPENCLAUDE_DISABLE_TOOL_REMINDERS:
- FileReadTool cyber-risk mitigation reminder (appended to every Read
result when the model is not in MITIGATION_EXEMPT_MODELS)
- todo_reminder attachment for TodoWrite usage
- task_reminder attachment for TaskCreate/TaskUpdate usage
All three reminders are model-only side-channel instructions the user
cannot see today. Users who want full transparency over what the model
receives can now opt out without patching dist/cli.mjs on every upgrade.
Default behavior is unchanged when the flag is unset.
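A minimal sketch of the gate (the reminder text and exempt-model set are
placeholders; only the env var name comes from this change):

  const MITIGATION_EXEMPT_MODELS = new Set<string>([]); // placeholder
  const CYBER_RISK_REMINDER = '<system-reminder>…</system-reminder>'; // placeholder

  function toolRemindersDisabled(env: NodeJS.ProcessEnv = process.env): boolean {
    return env.OPENCLAUDE_DISABLE_TOOL_REMINDERS === '1';
  }

  // Shape of one injection site: the Read-result reminder.
  function maybeAppendReadReminder(result: string, model: string): string {
    if (toolRemindersDisabled()) return result;              // user opted out
    if (MITIGATION_EXEMPT_MODELS.has(model)) return result;  // exempt models skip it
    return result + '\n\n' + CYBER_RISK_REMINDER;
  }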
Closes #809
mock.module('./teammate.js', ...) only declared getAgentName/getTeamName/
getTeammateColor. Bun applies module mocks process-globally and
mock.restore() does not undo them, so whenever another test file ran
after hookChains.integration.test.ts and expected the real teammate
module, it received undefined for isTeammate/isPlanModeRequired/
getAgentId/getParentSessionId.
This surfaced in CI as intermittent failures in
src/commands/provider/provider.test.tsx (TextEntryDialog / wizard
remount / ProviderWizard hides Codex OAuth), because getDefaultAppState
in AppStateStore.ts calls teammateUtils.isTeammate().
Match the mock surface to the real teammate.ts exports so downstream
consumers keep working even after the integration test pollutes the
module cache. Keeps the same behavioral overrides this test needed.
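A sketch of the widened mock surface (return values are illustrative; the
export names are the ones listed above):

  import { mock } from 'bun:test';

  mock.module('./teammate.js', () => ({
    getAgentName: () => 'mock-agent',
    getTeamName: () => 'mock-team',
    getTeammateColor: () => 'cyan',
    // Previously missing: AppStateStore.ts and other consumers call these too.
    isTeammate: () => false,
    isPlanModeRequired: () => false,
    getAgentId: () => undefined,
    getParentSessionId: () => undefined,
  }));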
Closes #839
- Add exponential backoff retry to DuckDuckGo adapter (3 attempts with
jitter) to handle transient rate-limiting and connection errors; see the
retry sketch after this list.
- Add native fetch() fallback in WebFetch when axios hangs with custom
DNS lookup in bundled contexts.
- Prevent broken native-path fallback for web search on OpenAI shim
providers (minimax, moonshot, nvidia-nim, etc.) that do not support
Anthropic's web_search_20250305 tool.
- Cherry-pick existing fixes:
- a48bd56: cover codex/minimax/nvidia-nim in getSmallFastModel()
- 31f0b68: 45s budget + raw-markdown fallback for secondary model
- 446c1e8: sparse Codex /responses payload parsing
- ae3f0b2: echo reasoning_content on assistant tool-call messages
- Fix domainCheck.test.ts mock modules to include isFirstPartyAnthropicBaseUrl
and isGithubNativeAnthropicMode exports.
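The retry policy from the first bullet, as a hedged sketch (delay constants
are illustrative, not the adapter's actual values):

  async function withBackoffRetry<T>(
    fn: () => Promise<T>,
    attempts = 3,
    baseDelayMs = 500,
  ): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt < attempts; attempt++) {
      try {
        return await fn();
      } catch (err) {
        lastError = err;
        if (attempt === attempts - 1) break;
        // Exponential backoff (500ms, 1s, 2s, ...) plus up to 250ms of jitter.
        const delay = baseDelayMs * 2 ** attempt + Math.random() * 250;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
    throw lastError;
  }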
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
Non-Anthropic / non-codex providers (minimax, kimi, generic OpenAI-compatible)
fell through to the DDG adapter when no paid search key was configured. DDG's
scraper is blocked on most IPs, so web_search surfaced an opaque "anomaly in
the request" error. Catch that response in the DDG provider and rethrow with
a message naming the exact env vars that would unblock the tool, or the
option to switch to a native-search provider.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kimi / Moonshot's chat completions endpoint requires that every assistant
message carrying tool_calls also carry reasoning_content when the
"thinking" feature is active. When an agent sends prior-turn assistant
history back (standard multi-turn / subagent / Explore patterns), the
shim previously stripped the thinking block:
case 'thinking':
case 'redacted_thinking':
// Strip thinking blocks for OpenAI-compatible providers.
break
That's correct for providers that would mis-interpret serialized
<thinking> tags, but Moonshot validates the schema strictly and rejects
with:
API Error: 400 {"error":{"message":"thinking is enabled but
reasoning_content is missing in assistant tool call message at
index N","type":"invalid_request_error"}}
Reproducer: launch with Kimi profile, run any tool-using command
(Explore, Bash, etc.) — every request after the first one fails with a 400.
Fix: in convertMessages(), when the per-request flag
preserveReasoningContent is set (only for Moonshot baseUrls today),
attach the original thinking block's text as reasoning_content on the
outgoing OpenAI-shaped assistant message. Other providers continue to
strip (unknown-field rejection risk).
OpenAIMessage type grows a reasoning_content?: string field.
convertMessages() accepts an options object and threads the flag
through; the only call site (_doOpenAIRequest) gates via
isMoonshotBaseUrl(request.baseUrl).
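Sketch of the Moonshot-only branch (types simplified; only the field name and
flag come from this change):

  interface OpenAIMessage {
    role: 'assistant' | 'user' | 'system' | 'tool';
    content: string | null;
    tool_calls?: unknown[];
    reasoning_content?: string; // new optional field
  }

  function attachReasoningContent(
    out: OpenAIMessage,
    thinkingText: string | undefined,
    opts: { preserveReasoningContent?: boolean },
  ): OpenAIMessage {
    // Only Moonshot base URLs set the flag today; other providers keep
    // stripping thinking blocks to avoid unknown-field rejections.
    if (opts.preserveReasoningContent && out.tool_calls && thinkingText) {
      return { ...out, reasoning_content: thinkingText };
    }
    return out;
  }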
Tests (openaiShim.test.ts):
- Moonshot: echoes reasoning_content on assistant tool-call messages
(regression for the reported 400)
- non-Moonshot providers do NOT receive reasoning_content (guards
against leaking the field to strict-parse endpoints)
Full suite: 1195/1195 pass under --max-concurrency=1. PR scan clean.
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
* Integrate request logging and streaming optimizer
- Add logApiCallStart/End for API request tracking with correlation IDs
- Add streaming state tracking with processStreamChunk
- Flush buffer and log stream stats at stream end
- Resolve merge conflict with main branch
* feat: add streaming optimizer and structured request logging
* fix: address PR review feedback
- Remove buffering from streamingOptimizer - now purely observational
- Use logForDebugging instead of console.log for structured logging
- Remove dead code (streamResponse, bufferedStreamResponse, etc.)
- Use existing logging infrastructure instead of raw console.log
- Keep only used functions: createStreamState, processStreamChunk, getStreamStats
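Illustrative shape of the observational tracker (field names may differ from
streamingOptimizer.ts):

  interface StreamState {
    startedAt: number;
    chunkCount: number;
    byteCount: number;
    firstChunkAt?: number;
  }

  function createStreamState(): StreamState {
    return { startedAt: Date.now(), chunkCount: 0, byteCount: 0 };
  }

  function processStreamChunk(state: StreamState, chunk: string): void {
    if (state.chunkCount === 0) state.firstChunkAt = Date.now();
    state.chunkCount += 1;
    state.byteCount += chunk.length; // observe only, never buffer
  }

  function getStreamStats(state: StreamState) {
    return {
      chunks: state.chunkCount,
      bytes: state.byteCount,
      timeToFirstChunkMs: (state.firstChunkAt ?? Date.now()) - state.startedAt,
      totalMs: Date.now() - state.startedAt,
    };
  }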
* test: add unit tests for requestLogging and streamingOptimizer
- streamingOptimizer.test.ts: 6 tests for createStreamState, processStreamChunk, getStreamStats
- requestLogging.test.ts: 6 tests for createCorrelationId, logApiCallStart, logApiCallEnd
* fix: correct durationMs test to be >= 0 instead of exactly 0
* fix: address PR #703 blockers and non-blockers
1. BLOCKER FIX: Skip clone() for streaming responses
- Only call response.clone() + .json() for non-streaming requests
- For streaming, usage comes via stream chunks anyway
2. NON-BLOCKER: Document dead code in flushStreamBuffer
- Added comment explaining it's a no-op kept for API compat
3. NON-BLOCKER: vi.mock in tests - left as-is (test framework issue)
* fix: address all remaining non-blockers for PR #703
1. Remove dead code: flushStreamBuffer call and unused import
2. Fix test for Bun: remove vi.mock, use simple no-throw tests
The test "never returns negative even for unknown 3P models (issue #635)"
asserted that getEffectiveContextWindowSize() returns >= 33_000 for an
unknown 3P model under the OpenAI shim. That specific number assumes
reservedTokensForSummary = 20_000 (MAX_OUTPUT_TOKENS_FOR_SUMMARY), which
holds only when the tengu_otk_slot_v1 GrowthBook flag is disabled.
When the flag is ON — which is the case in CI but not always locally —
getMaxOutputTokensForModel() caps the model's default output at
CAPPED_DEFAULT_MAX_TOKENS (8_000). Then reservedTokensForSummary = 8_000,
floor = 8_000 + 13_000 = 21_000, and the test fails with 21_000 < 33_000.
The test reliably passes locally and reliably fails in CI, manifesting as
the intermittent PR-check failure.
Fix: relax the lower bound to 21_000 (cap-enabled worst case), which is
still well above zero — preserving the anti-regression intent of
issue #635 (no infinite auto-compact from a negative effective window)
without binding the test to GrowthBook flag state.
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
getUserSpecifiedModelSetting() decides which env var to consult based on
the active provider. The check included openai and github but omitted
codex, nvidia-nim, and minimax — even though all three use the OpenAI
shim transport and get their model routing via CLAUDE_CODE_USE_OPENAI=1
+ OPENAI_MODEL (set by applyProviderProfileToProcessEnv).
Concrete failure: user switches from Moonshot profile (which persisted
settings.model='kimi-k2.6') to the Codex profile. The new profile
correctly writes OPENAI_MODEL=codexplan + base URL to
chatgpt.com/backend-api/codex. Startup banner reflects Codex / gpt-5.4
correctly. But at request time getUserSpecifiedModelSetting() returns
early for provider='codex' (not in the env-consult list), falls through
to the stale settings.model='kimi-k2.6', and the Codex API rejects:
API Error 400: "The 'kimi-k2.6' model is not supported when using
Codex with a ChatGPT account."
Fix: extract an isOpenAIShimProvider flag covering openai|codex|github|
nvidia-nim|minimax — all providers that set OPENAI_MODEL as their model
env var. The Gemini and Mistral branches stay as-is (they use
GEMINI_MODEL / MISTRAL_MODEL).
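Sketch of the resolution order with the new flag (function name and shape are
illustrative):

  const OPENAI_SHIM_PROVIDERS = new Set([
    'openai', 'codex', 'github', 'nvidia-nim', 'minimax',
  ]);

  function resolveUserSpecifiedModel(
    provider: string,
    env: NodeJS.ProcessEnv,
    settingsModel?: string,
  ): string | undefined {
    const isOpenAIShimProvider = OPENAI_SHIM_PROVIDERS.has(provider);
    if (isOpenAIShimProvider && env.OPENAI_MODEL) return env.OPENAI_MODEL;
    if (provider === 'gemini' && env.GEMINI_MODEL) return env.GEMINI_MODEL;
    if (provider === 'mistral' && env.MISTRAL_MODEL) return env.MISTRAL_MODEL;
    // The persisted settings.model is only consulted after the env vars, so a
    // stale 'kimi-k2.6' can no longer leak into a Codex request.
    return settingsModel;
  }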
Five regression tests pin the fix for each OpenAI-shim provider plus
guard tests for openai and github that already worked.
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
Adds Atomic Chat as a first-class preset inside the in-session /provider
slash command, mirroring the Ollama auto-detect flow. Picking it probes
127.0.0.1:1337/v1/models, lists loaded models for direct selection, and
falls back to "Enter manually" / "Back" when the server is unreachable
or no models are loaded. README updated to reflect the new setup path.
Made-with: Cursor
* fix(provider): saved profile ignored when stale CLAUDE_CODE_USE_* in shell
Users reported "my saved /provider profile isn't picked up at startup —
the banner shows gpt-4o / api.openai.com even though I saved Moonshot".
Root cause: applyActiveProviderProfileFromConfig() bailed out whenever
hasProviderSelectionFlags(processEnv) was true — i.e. whenever ANY
CLAUDE_CODE_USE_* flag was present. But a bare `CLAUDE_CODE_USE_OPENAI=1`
with no paired OPENAI_BASE_URL / OPENAI_MODEL is almost always a stale
shell export left over from a prior manual setup, not genuine startup
intent. Respecting it skipped the saved profile and let StartupScreen.ts
fall through to the hardcoded `gpt-4o` / `https://api.openai.com/v1`
defaults — the exact symptom users see.
Fix: narrow the guard from "any flag set" to "flag set AND at least one
concrete config value (BASE_URL, MODEL, or API_KEY)". A bare stale flag
no longer blocks the saved profile. A real shell selection (flag + URL
or flag + model) still wins, preserving the "explicit startup intent
overrides saved profile" contract.
New helper: hasCompleteProviderSelection(env). Per-provider check for a
paired concrete value. Bedrock/Vertex/Foundry keep the flag-alone
semantic since they rely on ambient AWS/GCP credentials rather than env
config.
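Minimal sketch of the narrowed guard; only the OpenAI-shim branch is spelled
out, and the Bedrock/Vertex flag names are the usual ones rather than quoted
from this change:

  function hasCompleteProviderSelection(env: NodeJS.ProcessEnv): boolean {
    // Bedrock/Vertex/Foundry keep flag-alone semantics: they authenticate via
    // ambient AWS/GCP credentials, so the bare flag is genuine intent.
    if (env.CLAUDE_CODE_USE_BEDROCK === '1' || env.CLAUDE_CODE_USE_VERTEX === '1') {
      return true;
    }
    // A bare CLAUDE_CODE_USE_OPENAI=1 is treated as a stale shell export; it
    // only blocks the saved profile when paired with a concrete value.
    if (env.CLAUDE_CODE_USE_OPENAI === '1') {
      return Boolean(env.OPENAI_BASE_URL || env.OPENAI_MODEL || env.OPENAI_API_KEY);
    }
    return false;
  }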
Three new tests cover the bug and the two counter-cases:
- bare USE flag → profile applies (fixes the bug)
- USE flag + BASE_URL → profile blocked (preserves explicit intent)
- USE flag + MODEL → profile blocked (preserves explicit intent)
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
* fix(provider): don't overlay stale legacy profile on plural-managed env
Second half of the "saved profile not picked up in banner" bug. The prior
commit fixed the guard that prevented applyActiveProviderProfileFromConfig()
from firing when a stale CLAUDE_CODE_USE_* flag was in the shell. But even
when the plural system applies correctly, buildStartupEnvFromProfile() was
then loading the legacy .openclaude-profile.json AND overwriting the
plural-managed env with whatever that file contained.
addProviderProfile() (the call path the /provider preset picker uses) does
NOT sync the legacy file, so a user who went:
manual setup: CLAUDE_CODE_USE_OPENAI=1 + OPENAI_MODEL=gpt-4o
→ writes .openclaude-profile.json as { openai, gpt-4o, ... }
/provider: add Moonshot preset, mark active
→ writes plural config; legacy file UNCHANGED
would see startup reliably apply Moonshot env first, then get it clobbered
by the stale legacy file. Banner shows gpt-4o / api.openai.com while
runtime ends up with the correct env via a different code path — exactly
the user-reported symptom.
Fix: in buildStartupEnvFromProfile, when the plural system has already
set env (CLAUDE_CODE_PROVIDER_PROFILE_ENV_APPLIED === '1'), skip the
legacy-file overlay entirely and return processEnv unchanged. Legacy is
now strictly a first-run / fallback path for users who haven't adopted
the plural system.
Also removes the stripped-then-rebuilt env construction that was part of
the old overlay path — no longer needed.
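Sketch of the new early return (the sentinel key is quoted from this change,
the surrounding shape is simplified):

  function buildStartupEnvFromProfile(
    processEnv: NodeJS.ProcessEnv,
    loadLegacyProfileEnv: () => NodeJS.ProcessEnv | undefined,
  ): NodeJS.ProcessEnv {
    // Plural system already applied its env: never overlay the legacy file.
    if (processEnv.CLAUDE_CODE_PROVIDER_PROFILE_ENV_APPLIED === '1') {
      return processEnv;
    }
    // First-run / fallback path for users who never adopted the plural system.
    const legacy = loadLegacyProfileEnv();
    return legacy ? { ...processEnv, ...legacy } : processEnv;
  }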
Test updates:
- Replaced "lets saved startup profile override profile-managed env"
(encoded the old broken behavior) with a regression test that pins
the new semantic: plural env survives when legacy is stale.
- Added "falls back to legacy when plural hasn't applied" to pin the
first-run path still works.
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
---------
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
First-run users with a credential already exported (ANTHROPIC_API_KEY,
OPENAI_API_KEY, etc.) currently still have to navigate the provider picker
or set CLAUDE_CODE_USE_* flags manually. Selecting the right provider from
ambient state should be automatic.
New module src/utils/providerAutoDetect.ts:
- detectProviderFromEnv() — synchronous env scan in a deterministic priority
order (anthropic → codex → github → openai → gemini → mistral → minimax).
Also detects Codex via ~/.codex/auth.json presence.
- detectLocalService() — parallel probes for Ollama (:11434) and LM Studio
(:1234), honoring OLLAMA_BASE_URL / LM_STUDIO_BASE_URL overrides.
Short 1.2s default timeout so first-run latency stays low when no local
service is running.
- detectBestProvider() — orchestrator. Env scan short-circuits the probe;
only hits the network when env has nothing.
All detection paths are side-effect-free: returns a DetectedProvider
descriptor describing what was found and why. Callers decide whether to
apply it (gated on hasExplicitProviderSelection() / profile file existence)
and how to hydrate the launch env.
Codex auth-file check is injectable (hasCodexAuth option) so tests are
hermetic from the dev machine's ~/.codex/auth.json state.
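Rough shape of the env scan (priority order from above; env var names other
than ANTHROPIC_API_KEY / OPENAI_API_KEY are assumptions):

  interface DetectedProvider {
    provider: string;
    reason: string;
  }

  function detectProviderFromEnv(
    env: NodeJS.ProcessEnv,
    hasCodexAuth: () => boolean = () => false, // injectable for hermetic tests
  ): DetectedProvider | undefined {
    if (env.ANTHROPIC_API_KEY) return { provider: 'anthropic', reason: 'ANTHROPIC_API_KEY is set' };
    if (hasCodexAuth()) return { provider: 'codex', reason: '~/.codex/auth.json exists' };
    if (env.GITHUB_TOKEN) return { provider: 'github', reason: 'GITHUB_TOKEN is set' };
    if (env.OPENAI_API_KEY) return { provider: 'openai', reason: 'OPENAI_API_KEY is set' };
    if (env.GEMINI_API_KEY) return { provider: 'gemini', reason: 'GEMINI_API_KEY is set' };
    if (env.MISTRAL_API_KEY) return { provider: 'mistral', reason: 'MISTRAL_API_KEY is set' };
    if (env.MINIMAX_API_KEY) return { provider: 'minimax', reason: 'MINIMAX_API_KEY is set' };
    return undefined; // caller falls through to the local-service probe
  }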
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
* feat: add thinking token tracking and historical analytics
- extractThinkingTokens(): separate thinking from output tokens
- TokenUsageTracker class for historical analytics
- Track: cache hit rate, most used model, requests per hour/day
- Analytics: average tokens per request, totals
- Add tests (7 passing)
PR 4B: Features 1.10 + 1.11
* refactor: extract thinking and analytics to separate files
- Create thinkingTokenExtractor.ts with ThinkingTokenAnalyzer
- Create tokenAnalytics.ts with TokenUsageTracker
- Add production-grade methods and tests
- Update test imports
Fixes #774. When tool_result content contains multiple text blocks,
they were serialized as arrays instead of strings, causing DeepSeek
to reject the request with 400 error.
Changes:
- convertToolResultContent: collapse all-text arrays to joined string
- convertContentBlocks: defensive collapse for user/assistant messages
- Arrays with images are preserved (not collapsed)
Tests: 3 new tests added, 53 pass, 0 fail
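A minimal sketch of the collapse rule (not the exact upstream code):

  type ContentBlock =
    | { type: 'text'; text: string }
    | { type: 'image'; source: unknown };

  function collapseToolResultContent(
    content: string | ContentBlock[],
  ): string | ContentBlock[] {
    if (typeof content === 'string') return content;
    const textBlocks = content.filter(
      (block): block is { type: 'text'; text: string } => block.type === 'text',
    );
    // Arrays containing images (or any non-text block) are preserved as-is.
    if (textBlocks.length !== content.length) return content;
    // All-text arrays collapse to one joined string so strict providers such
    // as DeepSeek receive a plain string tool result.
    return textBlocks.map((block) => block.text).join('\n');
  }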
Co-authored-by: nick.mesen <nickmesen@users.noreply.github.com>
Most everyday turns ("ok", "thanks", "yep go ahead", "what does that do?")
get no measurable quality improvement from Opus-tier models over Haiku-tier,
but cost ~10x more and stream slower. Smart routing lets a user opt into
automatically routing obviously-simple turns to a cheaper model while
keeping the strong model for anything non-trivial.
New module src/services/api/smartModelRouting.ts:
- routeModel(input, config) → { model, complexity, reason }
- Pure primitive: no env reads, no state, caller supplies everything.
- Config is opt-in (enabled: false by default).
Routes to strong (conservative) when ANY of:
- First turn of session (task-setup is worth the quality)
- Code fence or inline code span present
- Reasoning/planning keyword (plan, design, refactor, debug, architect,
investigate, root cause, etc. — 20+ anchors)
- Multi-paragraph input
- Over char/word cutoff (defaults: 160 chars, 28 words; matches hermes)
Routes to simple only for clearly-trivial chatter.
Decision includes a reason string for a future UI indicator that shows
which tier handled the turn.
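Hedged sketch of the primitive (thresholds mirror the defaults above, keyword
list abbreviated):

  interface RouteConfig {
    enabled: boolean;
    strongModel: string;
    simpleModel: string;
    maxChars?: number; // default 160
    maxWords?: number; // default 28
  }

  interface RouteDecision {
    model: string;
    complexity: 'simple' | 'complex';
    reason: string;
  }

  const REASONING_KEYWORDS =
    /\b(plan|design|refactor|debug|architect|investigate|root cause)\b/i;

  function routeModel(input: string, isFirstTurn: boolean, config: RouteConfig): RouteDecision {
    const strong = (reason: string): RouteDecision =>
      ({ model: config.strongModel, complexity: 'complex', reason });

    if (!config.enabled) return strong('routing disabled');
    if (isFirstTurn) return strong('first turn of session');
    if (/```|`[^`]+`/.test(input)) return strong('code fence or inline code');
    if (REASONING_KEYWORDS.test(input)) return strong('reasoning/planning keyword');
    if (/\n\s*\n/.test(input)) return strong('multi-paragraph input');
    const words = input.trim().split(/\s+/).filter(Boolean).length;
    if (input.length > (config.maxChars ?? 160) || words > (config.maxWords ?? 28)) {
      return strong('over length cutoff');
    }
    return { model: config.simpleModel, complexity: 'simple', reason: 'clearly-trivial chatter' };
  }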
Integration into query path is intentionally deferred to a follow-up PR so
the heuristics can be reviewed and tuned in isolation first.
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
* feat(provider): first-class Moonshot (Kimi) direct-API support
Moonshot's direct API (api.moonshot.ai/v1) is OpenAI-compatible and works
today via the generic OpenAI shim, including the reasoning_content channel
that Kimi returns alongside the user-visible content. But the UX was rough:
unknown context window triggered the conservative 128k fallback + a warning,
and the provider displayed as "Local OpenAI-compatible".
Makes Moonshot a recognized provider:
- src/utils/model/openaiContextWindows.ts: add the Kimi K2 family and
moonshot-v1-* variants to both the context-window and max-output tables.
Values from Moonshot's model card — K2.6 and K2-thinking are 256K,
K2/K2-instruct are 128K, moonshot-v1 sizes are embedded in the model id.
- src/utils/providerDiscovery.ts: recognize the api.moonshot.ai hostname
and label it "Moonshot (Kimi)" in the startup banner and provider UI.
Users can now launch with:
CLAUDE_CODE_USE_OPENAI=1 \
OPENAI_BASE_URL=https://api.moonshot.ai/v1 \
OPENAI_API_KEY=sk-... \
OPENAI_MODEL=kimi-k2.6 \
openclaude
and get accurate compaction + correct labeling + correct max_tokens out
of the box.
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
* fix(openai-shim): Moonshot API compatibility — max_tokens + strip store
Moonshot's direct API (api.moonshot.ai and api.moonshot.cn) uses the
classic OpenAI `max_tokens` parameter, not the newer `max_completion_tokens`
that the shim defaults to. It also hasn't published support for `store`
and may reject it on strict-parse — same class of error as Gemini's
"Unknown name 'store': Cannot find field" 400.
- Adds isMoonshotBaseUrl() that recognizes both .ai and .cn hosts.
- Converts max_completion_tokens → max_tokens for Moonshot requests
(alongside GitHub / Mistral / local providers).
- Strips body.store for Moonshot requests (alongside Mistral / Gemini).
Two shim tests cover both the .ai and .cn hostnames.
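Illustrative host check and body rewrite (the real shim composes this with
the existing GitHub / Mistral / Gemini / local branches):

  function isMoonshotBaseUrl(baseUrl: string | undefined): boolean {
    if (!baseUrl) return false;
    try {
      const host = new URL(baseUrl).hostname;
      return host.endsWith('moonshot.ai') || host.endsWith('moonshot.cn');
    } catch {
      return false;
    }
  }

  function adjustBodyForMoonshot(
    body: Record<string, unknown>,
    baseUrl?: string,
  ): Record<string, unknown> {
    if (!isMoonshotBaseUrl(baseUrl)) return body;
    const out = { ...body };
    // Moonshot expects the classic parameter name.
    if ('max_completion_tokens' in out) {
      out.max_tokens = out.max_completion_tokens;
      delete out.max_completion_tokens;
    }
    // No published support for `store`; strict parsers may reject it.
    delete out.store;
    return out;
  }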
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
* fix: null-safe access on getCachedMCConfig() in external builds
External builds stub src/services/compact/cachedMicrocompact.ts so
getCachedMCConfig() returns null, but two call sites still dereferenced
config.supportedModels directly. The ?. operator was in the wrong place
(config.supportedModels? instead of config?.supportedModels), so the null
config threw "Cannot read properties of null (reading 'supportedModels')"
on every request.
Reproduces with any external-build provider (notably Kimi/Moonshot just
enabled in the sibling commits, but equally DeepSeek, Mistral, Groq,
Ollama, etc.):
❯ hey
⏺ Cannot read properties of null (reading 'supportedModels')
- prompts.ts: early-return from getFunctionResultClearingSection() when
config is null, before touching .supportedModels.
- claude.ts: guard the debug-log jsonStringify with ?. so the log line
never throws.
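Hedged illustration of the optional-chaining fix (types simplified):

  interface MCConfig { supportedModels?: string[] }

  function isModelSupported(config: MCConfig | null, model: string): boolean {
    // Broken form: config.supportedModels?.includes(model) still dereferences
    // a null config before the `?.` can help.
    // Fixed form: guard the config object itself first.
    return config?.supportedModels?.includes(model) ?? false;
  }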
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
* fix(startup): show "Moonshot (Kimi)" on the startup banner
The startup-screen provider detector had regex branches for OpenRouter,
DeepSeek, Groq, Together, Azure, etc., but nothing for Moonshot. Remote
Moonshot sessions fell through to the generic "OpenAI" label —
getLocalOpenAICompatibleProviderLabel() only runs for local URLs, and
api.moonshot.ai / api.moonshot.cn are not local.
Adds a Moonshot branch matching /moonshot/ in the base URL OR /kimi/ in
the model id. A launch with
OPENAI_BASE_URL=https://api.moonshot.ai/v1 OPENAI_MODEL=kimi-k2.6
now shows the Provider row as "Moonshot (Kimi)" instead of "OpenAI".
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
* refactor(provider): sort preset picker alphabetically; Custom at end
The /provider preset picker was in ad-hoc order (Anthropic, Ollama,
OpenAI, then a jumble of third-party / local / codex / Alibaba / custom /
nvidia / minimax). Hard to scan when you know the provider name you want.
Sorts the list alphabetically by label A→Z. Pins "Custom" to the end —
it's the catch-all / escape hatch so it's scanned last, not shuffled into
the alphabetical run where a user looking for a named provider might
grab it by mistake. First-run-only "Skip for now" stays at the very
bottom, after Custom.
Test churn:
- ProviderManager.test.tsx: four tests hardcoded press counts (1 or 3 'j'
presses) that broke when targets moved. Replaces them with a
navigateToPreset(stdin, label) helper driven from a declared
PRESET_ORDER array, so future list edits only update the array.
- ConsoleOAuthFlow.test.tsx: the 13-row test frame only renders the first
~13 providers. "Ollama", "OpenAI", "LM Studio" sentinels moved below
the fold; swap them for alphabetically-early providers still visible
in-frame ("Azure OpenAI", "DeepSeek", "Google Gemini"). Test intent
(picker opened with providers listed) is preserved.
Co-Authored-By: OpenClaude <openclaude@gitlawb.com>
---------
Co-authored-by: OpenClaude <openclaude@gitlawb.com>
* feat: add model caching and benchmarking utilities
- Add modelCache.ts for disk caching of model lists
- Add benchmark.ts for testing model speed/quality
* fix: address review feedback - async fs, multi-provider support, error handling
* feat: add /benchmark slash command and unit tests
* feat: add /benchmark slash command and unit tests
* feat: enable 16 additional feature flags in open build
Activate features whose source is fully available in the mirror and
that have no Anthropic-internal infrastructure dependencies:
UI/UX: MESSAGE_ACTIONS, HISTORY_PICKER, QUICK_SEARCH, HOOK_PROMPTS
Reasoning: ULTRATHINK, TOKEN_BUDGET, SHOT_STATS
Agents: FORK_SUBAGENT, VERIFICATION_AGENT, MCP_SKILLS
Memory: EXTRACT_MEMORIES, AWAY_SUMMARY
Optimization: CACHED_MICROCOMPACT, PROMPT_CACHE_BREAK_DETECTION
Safety: TRANSCRIPT_CLASSIFIER
Debug: DUMP_SYSTEM_PROMPT
Also reorganize featureFlags into documented sections (disabled/upstream/new)
with inline comments explaining each flag's purpose.
* feat: add centralized GrowthBook defaults map for open build
Add _openBuildDefaults in the GrowthBook stub (no-telemetry-plugin.ts)
with all 66 runtime feature keys, organized by category with inline
comments describing each flag's purpose.
Override tengu_sedge_lantern (AWAY_SUMMARY) and tengu_hive_evidence
(VERIFICATION_AGENT) to true so these features work out of the box
without requiring manual ~/.claude/feature-flags.json setup.
Priority: feature-flags.json > _openBuildDefaults > upstream default
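Sketch of that precedence (function shape illustrative; only the two key
overrides are quoted from this change):

  const _openBuildDefaults: Record<string, boolean> = {
    tengu_sedge_lantern: true, // AWAY_SUMMARY
    tengu_hive_evidence: true, // VERIFICATION_AGENT
    // ...remaining runtime keys keep their upstream defaults
  };

  function resolveFlag(
    key: string,
    userOverrides: Record<string, boolean>, // parsed from ~/.claude/feature-flags.json
    upstreamDefault: boolean,
  ): boolean {
    if (key in userOverrides) return userOverrides[key];
    if (key in _openBuildDefaults) return _openBuildDefaults[key];
    return upstreamDefault;
  }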
* feat: replace refusal language with positive security guidance
Remove refusal instructions from CYBER_RISK_INSTRUCTION since they are
redundant for Anthropic models (applied server-side) and useless for
uncensored models in multi-provider setups. Keep positive guidance for
security testing contexts and add red teaming support.
* Revert "feat: replace refusal language with positive security guidance"
This reverts commit 0463676a8f.
* fix: add EXTRACT_MEMORIES runtime gate overrides to open-build defaults
EXTRACT_MEMORIES was enabled at build-time but its runtime GrowthBook
gates (tengu_passport_quail, tengu_coral_fern) still defaulted to false,
preventing the feature from activating. Add both keys to
_openBuildDefaults so memory extraction works out of the box.
Also adds test coverage for _openBuildDefaults precedence behavior.
* docs: update GrowthBook runtime keys catalog to 88 keys
Expand the reference catalog in no-telemetry-plugin.ts from ~62 to 88
unique keys, covering all tengu_* call sites found in src/. Adds 27
previously undocumented keys including VSCode gates, dynamic configs
(auto-mode, cron, bridge), security gates, and KAIROS cron keys.
Adds "not exhaustive" disclaimer as suggested by Copilot reviewer.
Reorganizes categories with section dividers for readability.
* fix(security): harden project settings trust boundary + MCP sanitization
- Sanitize MCP tool result text with recursivelySanitizeUnicode() to prevent
Unicode injection via malicious MCP servers (tool definitions and prompts
were already sanitized, but tool call results were not)
- Read sandbox.enabled only from trusted settings sources (user, local, flag,
policy) — exclude projectSettings to prevent malicious repos from silently
disabling the sandbox via .claude/settings.json
- Disable git hooks in plugin marketplace clone/pull/submodule operations
with core.hooksPath=/dev/null to prevent code execution from cloned repos
- Remove ANTHROPIC_FOUNDRY_API_KEY from SAFE_ENV_VARS to prevent credential
injection from project-scoped settings without trust verification
- Add ssrfGuardedLookup to WebFetch HTTP requests to block DNS rebinding
attacks that could reach cloud metadata or internal services
Security: closes trust boundary gap where project settings could override
security-critical configuration. Follows the existing pattern established
by hasAllowBypassPermissionsMode() which already excludes projectSettings.
Co-authored-by: auriti <auriti@users.noreply.github.com>
* fix(security): remove unauthenticated file-based permission polling
Remove the legacy file-based permission polling from useSwarmPermissionPoller
that read from ~/.claude/teams/{name}/permissions/resolved/ — an unauthenticated
directory where any local process could forge approval files to auto-approve
tool uses for swarm teammates.
The file polling was dead code:
- The useSwarmPermissionPoller() hook was never mounted by any component
- resolvePermission() (the file writer) was never imported outside its module
- Permission responses are delivered exclusively via the mailbox system:
Leader: sendPermissionResponseViaMailbox() → writeToMailbox()
Worker: useInboxPoller → processMailboxPermissionResponse()
Changes:
- Remove file polling loop, processResponse(), and React hook imports from
useSwarmPermissionPoller.ts (now a pure callback registry module)
- Mark 7 file-based functions as @deprecated in permissionSync.ts
- Add 4 regression tests verifying the removal
No exported functions removed — only deprecated. All 5 consumer modules
verified: they import only mailbox-based functions that remain unchanged.
---------
Co-authored-by: auriti <auriti@users.noreply.github.com>
* feat(api): compress old tool_result content for small-context providers
Adds a shim-layer pass that tiers tool_result content by age on providers
with small effective context windows (Copilot gpt-4o 128k, Mistral,
Ollama). Recent turns remain full; mid-tier results are truncated to 2k
chars; older results are replaced with a stub that preserves tool name
and arguments so the model can re-invoke if needed.
Tier sizes auto-tune via getEffectiveContextWindowSize, same calculation
used by auto-compact. Reuses COMPACTABLE_TOOLS and
TOOL_RESULT_CLEARED_MESSAGE to complement (not duplicate) microCompact.
Configurable via /config toolHistoryCompressionEnabled.
Addresses active-session context accumulation on Copilot where
microCompact's time-based trigger never fires, which surfaces as
"tools appearing in a loop" and prompt_too_long errors after ~15 turns.
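Rough sketch of the age tiering (tier boundaries are illustrative; the real
pass derives them from getEffectiveContextWindowSize):

  type ToolResultTier = 'full' | 'truncated' | 'stub';

  function tierForAge(turnsAgo: number, recentTurns = 4, midTurns = 10): ToolResultTier {
    if (turnsAgo < recentTurns) return 'full';
    if (turnsAgo < midTurns) return 'truncated';
    return 'stub';
  }

  function compressToolResult(
    text: string,
    tier: ToolResultTier,
    toolName: string,
    args: string,
  ): string {
    switch (tier) {
      case 'full':
        return text;
      case 'truncated':
        return text.length > 2_000 ? text.slice(0, 2_000) + '\n…[truncated]' : text;
      default:
        // 'stub': keep tool name + arguments so the model can re-invoke if needed.
        return `[older tool result cleared] ${toolName}(${args})`;
    }
  }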
* fix: config tool history
Updates both the model config mappings (configs.ts) and the runtime
fallback in getDefaultOpusModel() (model.ts) so Gemini mode no longer
falls back to the discontinued preview model when GEMINI_MODEL is unset.
Fixes #398
ProviderManager was blocking the main thread with synchronous file I/O
on mount (useState initializer), activation (setActiveProviderProfile),
and refresh (getProviderProfiles). This caused noticeable lag on Windows
where disk I/O can be slow due to antivirus scans, NTFS metadata, or
cache misses.
Changes to ProviderManager:
- Deferred initialization: useState now starts empty, loads via queueMicrotask
- Added isInitializing state with loading UI
- refreshProfiles() now defers reads via queueMicrotask
- activateSelectedProvider() now defers writes via queueMicrotask
- Memoized menuOptions array to prevent re-renders during navigation
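Sketch of the deferred-read pattern (hook and type names are illustrative):

  import { useEffect, useState } from 'react';

  interface ProviderProfile { name: string; provider: string }

  function useDeferredProfiles(loadProfiles: () => ProviderProfile[]) {
    // Start empty: no synchronous disk read in the useState initializer.
    const [profiles, setProfiles] = useState<ProviderProfile[]>([]);
    const [isInitializing, setIsInitializing] = useState(true);

    useEffect(() => {
      // Defer the read off the mount path, mirroring the queueMicrotask deferral.
      queueMicrotask(() => {
        setProfiles(loadProfiles());
        setIsInitializing(false);
      });
    }, [loadProfiles]);

    return { profiles, isInitializing };
  }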
Note: ProviderChooser useMemo change was reverted as it's dead code
(ProviderWizard is not used in production - /provider uses ProviderManager).
Co-authored-by: Ali Alakbarli <ali.alakbarli@users.noreply.github.com>
* fix: rename .claude.json to .openclaude.json with legacy fallback
Rename the global config file from ~/.claude.json to ~/.openclaude.json,
following the same migration pattern as the config directory
(~/.claude → ~/.openclaude).
- getGlobalClaudeFile() now prefers .openclaude.json; falls back to
.claude.json only if the legacy file exists and the new one does not
- Add .openclaude.json to filesystem permissions allowlist (keep
.claude.json for legacy file protection)
- Update all comment/string references from ~/.claude.json to
~/.openclaude.json across 12 files
New installs get .openclaude.json from the start. Existing users
continue using .claude.json until they rename it (or a future explicit
migration).
* test: add unit tests for getGlobalClaudeFile migration branches
Covers the three cases:
- new install (neither file exists) → .openclaude.json
- existing user (only legacy .claude.json exists) → .claude.json
- migrated user (both files exist) → .openclaude.json
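A simplified sketch of the preference order those tests pin down:

  import { existsSync } from 'node:fs';
  import { homedir } from 'node:os';
  import { join } from 'node:path';

  function getGlobalClaudeFile(): string {
    const next = join(homedir(), '.openclaude.json');
    const legacy = join(homedir(), '.claude.json');
    // Prefer the new name; fall back to the legacy file only when it is the
    // only one that exists (existing users who have not migrated yet).
    if (!existsSync(next) && existsSync(legacy)) return legacy;
    return next;
  }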
---------
Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
* feat: native Anthropic API mode for Claude models on GitHub Copilot
When using Claude models through GitHub Copilot, automatically switch from
the OpenAI-compatible shim to Anthropic's native messages API format.
The Copilot proxy (api.githubcopilot.com) supports Anthropic's native API
for Claude models. This enables cache_control blocks to be sent and
honoured, allowing explicit prompt caching control (as opposed to relying
solely on server-side auto-caching).
Changes:
- Add isGithubNativeAnthropicMode() in providers.ts that auto-enables when
the resolved model starts with "claude-" and the GitHub provider is active
- Create a native Anthropic client in client.ts using the GitHub base URL
and Bearer token authentication when native mode is detected
- Enable prompt caching in claude.ts for native GitHub mode so cache_control
blocks are sent (previously only allowed for firstParty/bedrock/vertex)
- CLAUDE_CODE_GITHUB_ANTHROPIC_API=1 env var to force native mode for any
model
Benefits:
- Proper Anthropic message format (no lossy OpenAI translation)
- Explicit cache_control blocks for fine-grained caching control
- Potentially better Claude model behaviour with native format
Related: #515
* fix: scope force flag to Claude models and add isGithubNativeAnthropicMode tests
- CLAUDE_CODE_GITHUB_ANTHROPIC_API=1 now returns false for non-Claude models
(force flag still useful for aliases like 'github:copilot' with no model
resolved yet, where it returns true when model is empty)
- Add 7 focused tests covering mode detection: off without GitHub provider,
auto-detect via OPENAI_MODEL and resolvedModel, non-Claude model rejection,
and force-flag behaviour for claude/non-claude/no-model cases
* fix: detect github:copilot:claude- compound format, remove force flag
OPENAI_MODEL for GitHub Copilot uses the format 'github:copilot:MODEL'
(e.g. 'github:copilot:claude-sonnet-4'), which does not start with 'claude-'.
Auto-detection now handles both bare model names and the compound format.
The CLAUDE_CODE_GITHUB_ANTHROPIC_API force flag is removed: with proper
compound-format detection there is no remaining gap it could fill, and
keeping a broad override flag without a concrete use case invites misuse.
Tests updated to cover the compound format, generic alias (false), and
non-Claude compound model (github:copilot:gpt-4o → false).
* fix: use includes('claude-') for model detection, remove force flag
Detection was broken for the standard GitHub Copilot compound format
'github:copilot:claude-sonnet-4' which does not start with 'claude-'.
Using includes('claude-') handles bare names, compound names, and any
future variants without needing updates.
The CLAUDE_CODE_GITHUB_ANTHROPIC_API force flag is removed as it was
a workaround for the broken detection, not a genuine use case.
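Illustrative detection matching the includes('claude-') approach (argument
shape simplified):

  function isGithubNativeAnthropicMode(opts: {
    isGithubProvider: boolean;
    resolvedModel?: string; // bare name or 'github:copilot:<model>'
  }): boolean {
    if (!opts.isGithubProvider) return false;
    const model = opts.resolvedModel ?? '';
    // Handles 'claude-sonnet-4', 'github:copilot:claude-sonnet-4', and future
    // variants; 'github:copilot:gpt-4o' and the bare 'github:copilot' alias
    // stay false.
    return model.includes('claude-');
  }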
---------
Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
Reasoning models (MiniMax M2.7, GLM-4.5/5, DeepSeek, Kimi K2) inline
chain-of-thought inside <think>...</think> tags in the content field
rather than using the reasoning_content channel. The prior phrase-matching
sanitizer (looksLikeLeakedReasoningPrefix) only caught English-prose
preambles like "I should"/"the user asked", missed tag-based leaks
entirely, and risked false-stripping legitimate assistant output.
Replace with a structural tag-based approach (same pattern as hermes-agent):
- createThinkTagFilter() — streaming state machine that buffers partial
tags across SSE delta boundaries (<th| + |ink>), so tags split mid-chunk
still parse correctly.
- stripThinkTags() — whole-text cleanup for non-streaming responses and
as a safety net. Handles closed pairs, unterminated opens at block
boundaries, and orphan tags.
- Recognizes think, thinking, reasoning, thought, REASONING_SCRATCHPAD
case-insensitively, including tags with attributes.
- False-negative bias: flush() discards buffered partial tags at stream
end rather than leaking them.
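A simplified, non-streaming sketch of the whole-text cleanup (the real
stripThinkTags() also handles unterminated opens and orphan tags, and the
streaming filter buffers partial tags across SSE chunks):

  const THINK_TAG_NAMES = 'think|thinking|reasoning|thought|REASONING_SCRATCHPAD';

  function stripThinkTagsSketch(text: string): string {
    // Closed pairs only, case-insensitive, attributes allowed on the open tag.
    const closedPair = new RegExp(
      `<(${THINK_TAG_NAMES})(\\s[^>]*)?>[\\s\\S]*?<\\/\\1\\s*>`,
      'gi',
    );
    return text.replace(closedPair, '').trimStart();
  }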
Existing phrase-based shim tests updated to exercise the actual <think>
tag leak. Added regression tests confirming legitimate prose starting
with "I should..." is preserved (the old sanitizer's main false-positive).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When set, disables strict schema normalization for non-Gemini providers.
Useful for OpenAI-compatible endpoints that reject MCP tools with complex
optional params (e.g. list[dict]) with "Extra required key ... supplied"
errors.
* fix: remove cached mcpClient in diagnostic tracking to prevent stale references
Resolves TODO comment about not caching the connected mcpClient since it can change.
Changes:
- Remove cached mcpClient field from DiagnosticTrackingService
- Add currentMcpClients storage to track active clients
- Update beforeFileEdited, getNewDiagnostics, and ensureFileOpened to accept client parameter
- Add backward-compatible methods to maintain existing API
- Update all callers to use new methods
- Add comprehensive test coverage
This prevents using stale MCP client references during reconnections,
making diagnostic tracking more reliable.
Fixes #TODO
* docs: add my contributions section to README
Add fork-specific section highlighting:
- Diagnostic tracking enhancement (PR #727)
- Technical skills demonstrated
- Links to original project and my work
- Professional contribution showcase
* revert: remove README.md contributions section to comply with reviewer request
- Remove 'My Fork & Contributions' section from README.md
- Keep README.md focused on original project documentation
- Maintain clean, project-focused README as requested by reviewer
* fix(api): drop orphan tool results to satisfy Mistral/OpenAI strict role sequence
* test: add test for orphan tool results and restore gemini comments
Problem: After auto-compaction with DeepSeek models (e.g., deepseek-chat),
the status line displayed ~16% remaining until next auto-compact, but users
expected ~30% (since compaction reduces usage to roughly half of the full
128k context).
Root cause: calculateTokenWarningState() used the auto-compaction threshold
(effectiveContextWindow - 13k buffer) as the denominator for percentLeft.
For DeepSeek-chat:
- Raw context: 128,000
- Effective: 119,808 (128k - 8,192 output reservation)
- Threshold: 106,808 (effective - 13k buffer)
At 90k usage:
- Old: (106,808 - 90k) / 106,808 ≈ 16%
- Expected: (128,000 - 90k) / 128,000 ≈ 30%
Fix: Change percentLeft calculation to use raw context window from
getContextWindowForModel() as denominator, while keeping threshold-based
warnings/triggers unchanged. This makes the displayed percentage show
remaining capacity relative to the model's full context size.
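Minimal sketch of the display change (threshold logic untouched):

  function percentLeftForDisplay(usedTokens: number, rawContextWindow: number): number {
    // Old denominator was the auto-compact threshold (effective window minus
    // the 13k buffer), which understated remaining capacity: 16% instead of
    // 30% in the DeepSeek example above.
    return Math.max(0, Math.round(((rawContextWindow - usedTokens) / rawContextWindow) * 100));
  }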
Impact:
- UI now shows correct % of total context remaining
- Auto-compaction trigger point unchanged (still ~90% of effective window)
- All other threshold calculations unaffected
Testing:
- Manual verification: DeepSeek-chat at 90k tokens shows 30% remaining (was 16%)
- Manual verification: Threshold still triggers at ~106k tokens
- Build succeeds: npm run build
- No breaking changes: Callers only depend on percentLeft for display; threshold logic unchanged
Fixes the user-reported discrepancy for DeepSeek and other OpenAI-compatible models.
* fix(mcp): sync required array with properties in tool schemas
MCP servers can emit schemas where the required array contains keys
not present in properties. This causes API 400 errors:
"Extra required key 'X' supplied."
- Add sanitizeSchemaRequired() to filter required arrays
- Apply it to MCP tool inputJSONSchema before sending to API
- Also fix filterSwarmFieldsFromSchema to update required after
removing properties
Fixes #525
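Sketch of the sanitizer (JSON Schema typing simplified):

  interface JSONSchemaObject {
    type?: string;
    properties?: Record<string, unknown>;
    required?: string[];
    [key: string]: unknown;
  }

  function sanitizeSchemaRequired(schema: JSONSchemaObject): JSONSchemaObject {
    if (!Array.isArray(schema.required)) return schema;
    const known = new Set(Object.keys(schema.properties ?? {}));
    // Drop required keys with no matching entry in properties, so the API no
    // longer rejects the tool with "Extra required key 'X' supplied."
    return { ...schema, required: schema.required.filter((key) => known.has(key)) };
  }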
* test: add MCP schema required sanitization test
* add mistral and gemini provider type for profile provider field
* load latest locally selected
* env variables take precedence over json save
* add gemini context windows and fix gemini defaulting for env
* load on startup fix
* fix failing tests
* clarify test message
* fix variable mismatches
* fix failing test
* delete keys and set profile.apiKey for mistral and gemini
* switch model as well when switching provider
* set model when adding a new model
Previously, the startup intro screen always displayed
'https://api.anthropic.com' as the endpoint for the Anthropic provider,
even when a custom endpoint was configured via ANTHROPIC_BASE_URL.
This fix reads ANTHROPIC_BASE_URL from environment and displays the
actual configured endpoint, providing accurate information to users
about where their API requests will be sent (proxy gateways, staging,
custom Anthropic-compatible APIs).
Also adds isLocal detection for local endpoints to show the appropriate
visual indicator in the startup banner.
Co-authored-by: Ali Alakbarli <ali.alakbarli@users.noreply.github.com>