Commit Graph

10 Commits

Author SHA1 Message Date
euxaristia
a02c44143b fix(web-search): close SSRF bypasses in custom provider hostname guard (#610)
The previous `isPrivateHostname` used a list of regexes against
`URL.hostname`. Several literal-address forms slipped past it:

- IPv4-mapped IPv6 `[::ffff:127.0.0.1]` (WHATWG URL normalizes to
  `[::ffff:7f00:1]`, which no regex matched) — lets callers reach
  loopback and other private v4 via an IPv6 literal.
- ULA `fc00::/7` (e.g. `[fc00::1]`) — not covered.
- Link-local `fe80::/10` (e.g. `[fe80::1]`) — not covered.
- IPv4 `169.254.0.0/16` (cloud metadata, including 169.254.169.254),
  `100.64.0.0/10` (CGNAT), and the full `0.0.0.0/8` — not covered.
- The IPv6 regex `/^\[::1?\]$/` also required brackets, but `URL.hostname`
  returns bracketed form anyway, so this part happened to work.

WHATWG `new URL(...)` already normalizes short-form / numeric / hex /
octal IPv4 to dotted-quad before we see it, so those cases were in fact
handled — the remaining gaps were IPv6 and a few missing v4 ranges.

Replace the regex list with:
- a dotted-quad IPv4 parser + int range check covering 0/8, 10/8,
  100.64/10, 127/8, 169.254/16, 172.16/12, 192.168/16;
- a small IPv6 parser (handles `::` compression and embedded v4 suffix)
  + a byte-range check covering `::`, `::1`, IPv4-mapped (recursing
  into the v4 classifier), IPv4-compatible, `fc00::/7`, `fe80::/10`,
  and `fec0::/10`.

Export `isPrivateHostname` and add unit tests covering every bypass
listed above plus public-address negatives.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 21:09:46 +08:00
euxaristia
7817fe88bd fix(web-search): stop leaking abort listeners in custom provider retry (#611)
`fetchWithRetry` created a fresh `AbortController` per attempt and did:

    signal?.addEventListener('abort', () => controller.abort(), { once: true })

The listener was never removed. Consequences:

- On retry, a second listener was attached to the caller's signal,
  each closing over a different controller.
- After a successful fetch, the listener remained on the caller's
  signal indefinitely, referencing a controller whose work was done.
  For a long-lived caller signal this is a slow leak.
- The `{ once: true }` only helps if the signal actually fires — on
  non-aborted signals the listener stays attached forever.

Replace the manual controller + timer + listener dance with
`AbortSignal.any([signal, AbortSignal.timeout(ms)])`, which the
codebase already uses elsewhere (see src/services/mcp/xaa.ts). This:

- has no user-code listener to leak,
- gives each attempt a fresh independent timeout,
- cleanly distinguishes caller-initiated abort from timeout via
  `signal.aborted` vs `timeoutSignal.aborted` before rewriting the
  error as "Custom search timed out after Ns".

Also resets `lastStatus` per attempt so a 5xx on attempt 0 can't leak
into attempt 1's retry decision, and collapses the two redundant
retry branches (`lastStatus >= 500` and `lastStatus === undefined`)
into one.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 19:37:08 +08:00
FluxLuFFy
91e4cfb15b fix: WebSearch providers + MCPTool bugs (#593)
* fix: WebSearch providers + MCPTool bugs

WebSearchTool:
- custom.ts: fix buildAuthHeadersForPreset WEB_AUTH_HEADER opt-out
- custom.ts: fix WEB_AUTH_SCHEME empty string handling
- custom.ts: fix walkJsonPath null safety for jsonPath parsing
- duckduckgo.ts: use SafeSearchType enum instead of raw 0
- mojeek.ts: always send Accept: application/json header
- README: fix timeout documentation (15s -> 120s to match code)
- custom.test.ts: add tests for auth header behavior

MCPTool:
- MCPTool.ts: fix outputSchema to accept ContentBlockParam[] (not just string)
- MCPTool.ts: fix isResultTruncated for array output (iterates text blocks)

* fix: address PR #593 review feedback

1. Export buildAuthHeadersForPreset and add direct tests for:
   - WEB_AUTH_HEADER="" explicit opt-out behavior
   - WEB_AUTH_SCHEME="" stripping scheme prefix
   - Preset defaults (authHeader + authScheme)
   - No WEB_KEY returns empty headers

2. Add duckduckgo.test.ts verifying SafeSearchType.STRICT === 0,
   confirming the enum change is semantically identical to the
   previous raw value.

Addresses review by @Vasanthdev2004 at
pullrequestreview-4093533095

---------

Co-authored-by: FluxLuFFy <flux@openclaude.dev>
Co-authored-by: Fix Bot <fix@openclaude.local>
2026-04-11 21:07:20 +08:00
FluxLuFFy
32fbd0c7b4 fix: custom web search — WEB_URL_TEMPLATE not recognized, timeout too short, silent native fallback (#537)
* fix: custom web search — WEB_URL_TEMPLATE not recognized, timeout too short, silent native fallback

1. custom.ts: Add WEB_URL_TEMPLATE to isConfigured() so the custom provider
   is recognized when configured via URL template alone.

2. custom.ts: Bump DEFAULT_TIMEOUT_SECONDS from 15s to 120s.
   Self-hosted search APIs (SearXNG, internal) commonly need 30-90s.

3. WebSearchTool.ts: When an explicit adapter is selected via
   WEB_SEARCH_PROVIDER=custom, do not silently fall through to the
   native Anthropic path on adapter errors or 0-hit results.
   - 0 hits: return directly (no fallback)
   - Error: throw the real error (no fallback)
   - Auto mode: existing fallback behavior preserved

* fix: tighten auto-mode adapter fallback — only swallow transient errors

Address review feedback: in auto mode, only fall through to native on
transient errors (network failure, timeout, HTTP 5xx). Config and
guardrail errors (SSRF, HTTPS, bad URL, header allowlist, etc.) now
surface properly instead of being silently swallowed.

---------

Co-authored-by: FluxLuFFy <fluxluffy@users.noreply.github.com>
2026-04-09 20:41:58 +08:00
FluxLuFFy
4ad6bc50c1 refactor: provider adapter system + 7 new search providers (bug-fixed) (#512)
* refactor: provider adapter system + 7 new search providers

Architecture:
- Each search backend is a small adapter implementing SearchProvider
- 12 providers: custom, tavily, exa, you, jina, bing, mojeek, linkup, firecrawl, duckduckgo + native
- WEB_SEARCH_PROVIDER controls selection: auto (fallback chain) or specific provider
- Auth always in headers, never in query strings

Bug fixes from review feedback:
- Fix applyDomainFilters catch block: keep hits with malformed URLs on blocked_domains
  (can't confirm blocked), drop on allowed_domains (can't confirm allowed)
- Add safeHostname() helper: safely extract hostname from URLs without throwing
- Replace unsafe new URL(r.url).hostname in 7 providers with safeHostname()
- Remove dead code: buildAllHeaders, buildAuthHeaders, parseExtraHeaders from types.ts
- Fix WEB_PARMS typo: consistently use WEB_QUERY_PARAM everywhere
- AbortSignal forwarded to fetch() in all 12 providers
- DuckDuckGo: wrap dynamic import in try/catch for graceful error
- Exa: remove double domain filtering (server-side already)
- runSearch(): aggregate all provider errors instead of throwing only the last one
- Retry logic: check numeric status code directly, retry 5xx/network, skip 4xx

Test coverage (44 tests, all passing):
- types.test.ts: safeHostname, normalizeHit, applyDomainFilters (20 tests)
- index.test.ts: getProviderMode, getProviderChain, getAvailableProviders (13 tests)
- custom.test.ts: extractHits flexible response parsing (11 tests)

Co-authored-by: FluxLuFFy <195792511+FluxLuFFy@users.noreply.github.com>

* security: add guardrails to custom search provider (Option B)

- HTTPS-only by default (opt-out: WEB_CUSTOM_ALLOW_HTTP=true)
- Private/localhost IPs blocked by default (opt-out: WEB_CUSTOM_ALLOW_PRIVATE=true)
- Header allowlist: only known-safe headers allowed unless WEB_CUSTOM_ALLOW_ARBITRARY_HEADERS=true
- Configurable timeout in seconds (WEB_CUSTOM_TIMEOUT_SEC, default 15)
- Configurable POST body limit (WEB_CUSTOM_MAX_BODY_KB, default 300)
- Removed max URL size restriction
- Audit log warning on first custom search call
- Updated .env.example and README_SEARCH_PROVIDERS.md with all new options

* fix: remove custom provider from auto chain (Option 1)

Remove customProvider from the auto fallback chain so it is only
available when WEB_SEARCH_PROVIDER=custom is explicitly selected.

Changes:
- Remove customProvider from ALL_PROVIDERS array in providers/index.ts
- Add 3 new tests verifying custom is excluded from auto chain
- Update README_SEARCH_PROVIDERS.md: auto priority, mode table, note
- Update .env.example: auto priority comment, custom mode annotation

All 47 tests pass (44 existing + 3 new).

Co-Authored-By: @Vasanthdev2004

* fix: address review blockers (routing, abort, config check, domain matching)

1. Native/Codex routing precedence in auto mode
   shouldUseAdapterProvider() now checks if native/first-party/vertex/foundry
   or Codex paths are available before falling back to adapter providers.
   Auto mode: native paths take precedence; adapter is fallback only.

2. AbortError stops provider chain immediately
   runSearch() now checks for AbortError/aborted signal before continuing
   the fallback chain. Cancelled searches don't create extra outbound requests.

3. Explicit provider mode fails fast on missing credentials
   runSearch() validates isConfigured() for explicit modes before attempting
   requests. Throws clear error: 'Search provider "X" is not configured.'

4. Domain filter exact-or-subdomain matching (fixes suffix collision)
   New hostMatchesDomain() helper: exact match or .subdomain match.
   badexample.com no longer matches example.com.

5. Tests: 56 pass (9 new) covering all 4 fixes

Co-Authored-By: @Vasanthdev2004

---------

Co-authored-by: Claude Fix <fix@openclaude.local>
Co-authored-by: FluxLuFFy <195792511+FluxLuFFy@users.noreply.github.com>
Co-authored-by: bot <bot@openclaw.ai>
2026-04-09 02:51:25 +08:00
Anandan
462a985d7e Remove embedded source map directives from tracked sources (#329)
Inline base64 source maps had been checked into tracked src files. This strips those comments from the repository without changing runtime behavior or adding ongoing guardrails, per the requested one-time cleanup scope.

Constraint: Keep this change limited to tracked source cleanup only
Rejected: Add CI/source verification guard | user requested one-time cleanup only
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If these directives reappear, fix the producing transform instead of reintroducing repo-side cleanup code
Tested: rg -n "sourceMappingURL" ., bun run smoke, bun run verify:privacy, bun run test:provider, npm run test:provider-recommendation
Not-tested: bun run typecheck (repository has many pre-existing unrelated failures)

Co-authored-by: anandh8x <test@example.com>
2026-04-04 21:19:27 +08:00
Meetpatel006
e5c9a6f629 Enable Free DDG WebSearch For Non-Claude Models (#234)
* added duck duck go for websearch tools that allowed free searching

* update readme

* Replace @phukon/duckduckgo-search with duck-duck-scrape and fix Firecrawl routing priority, and add DDG error handling

* refactor: streamline DuckDuckGo search fallback to use Firecrawl directly on rate limit

* docs: update README to clarify DuckDuckGo web search fallback and its limitations with TOS
2026-04-04 09:21:54 +08:00
Leonardo Grigorio
ac4efae870 feat: add Firecrawl backend for WebSearch and WebFetch tools
WebSearch is currently disabled for all non-Anthropic providers (OpenAI
shim, DeepSeek, Ollama, etc.) because those providers have no native
search backend. This adds Firecrawl as a fallback that activates when
FIRECRAWL_API_KEY is set, unlocking web search for every model
openclaude supports.

WebFetch uses basic HTTP + Turndown for HTML-to-markdown conversion,
which fails silently on JS-rendered SPAs and bot-protected pages.
Firecrawl scrape replaces the fetch layer when FIRECRAWL_API_KEY is set,
returning clean markdown that handles dynamic content correctly.

Changes:
- WebSearchTool: add runFirecrawlSearch() using @mendable/firecrawl-js,
  respects allowed_domains (post-filter) and blocked_domains (-site: operators),
  includes result snippets alongside links. shouldUseFirecrawl() ensures
  firstParty/Vertex/Foundry/Codex providers keep their native backends.
- WebFetchTool: add scrapeWithFirecrawl(), drops into the existing
  applyPromptToMarkdown() pipeline so prompt processing is unchanged.
- Remove "Web search is only available in the US" restriction from
  prompt when Firecrawl is active (it works globally).
2026-04-02 12:18:20 -03:00
erdemozyol
cec3629017 fix: support codex web tools and non-git agents 2026-04-02 14:08:22 +03:00
did:key:z6MkqDnb7Siv3Cwj7pGJq4T5EsUisECqR8KpnDLwcaZq5TPr
d2542c9a62 asdf
Squash the current repository state back into one baseline commit while
preserving the README reframing and repository contents.

Constraint: User explicitly requested a single squashed commit with subject "asdf"
Confidence: high
Scope-risk: broad
Reversibility: clean
Directive: This commit intentionally rewrites published history; coordinate before future force-pushes
Tested: git status clean; local history rewritten to one commit; force-pushed main to origin and instructkr
Not-tested: Fresh clone verification after push
2026-03-31 03:34:03 -07:00