refactor: provider adapter system + 7 new search providers (bug-fixed) (#512)

* refactor: provider adapter system + 7 new search providers

Architecture:
- Each search backend is a small adapter implementing SearchProvider
- 12 providers: custom, tavily, exa, you, jina, bing, mojeek, linkup, firecrawl, duckduckgo + native
- WEB_SEARCH_PROVIDER controls selection: auto (fallback chain) or specific provider
- Auth always in headers, never in query strings

Bug fixes from review feedback:
- Fix applyDomainFilters catch block: keep hits with malformed URLs on blocked_domains
  (can't confirm blocked), drop on allowed_domains (can't confirm allowed)
- Add safeHostname() helper: safely extract hostname from URLs without throwing
- Replace unsafe new URL(r.url).hostname in 7 providers with safeHostname()
- Remove dead code: buildAllHeaders, buildAuthHeaders, parseExtraHeaders from types.ts
- Fix WEB_PARMS typo: consistently use WEB_QUERY_PARAM everywhere
- AbortSignal forwarded to fetch() in all 12 providers
- DuckDuckGo: wrap dynamic import in try/catch for graceful error
- Exa: remove double domain filtering (server-side already)
- runSearch(): aggregate all provider errors instead of throwing only the last one
- Retry logic: check numeric status code directly, retry 5xx/network, skip 4xx

Test coverage (44 tests, all passing):
- types.test.ts: safeHostname, normalizeHit, applyDomainFilters (20 tests)
- index.test.ts: getProviderMode, getProviderChain, getAvailableProviders (13 tests)
- custom.test.ts: extractHits flexible response parsing (11 tests)

Co-authored-by: FluxLuFFy <195792511+FluxLuFFy@users.noreply.github.com>

* security: add guardrails to custom search provider (Option B)

- HTTPS-only by default (opt-out: WEB_CUSTOM_ALLOW_HTTP=true)
- Private/localhost IPs blocked by default (opt-out: WEB_CUSTOM_ALLOW_PRIVATE=true)
- Header allowlist: only known-safe headers allowed unless WEB_CUSTOM_ALLOW_ARBITRARY_HEADERS=true
- Configurable timeout in seconds (WEB_CUSTOM_TIMEOUT_SEC, default 15)
- Configurable POST body limit (WEB_CUSTOM_MAX_BODY_KB, default 300)
- Removed max URL size restriction
- Audit log warning on first custom search call
- Updated .env.example and README_SEARCH_PROVIDERS.md with all new options

* fix: remove custom provider from auto chain (Option 1)

Remove customProvider from the auto fallback chain so it is only
available when WEB_SEARCH_PROVIDER=custom is explicitly selected.

Changes:
- Remove customProvider from ALL_PROVIDERS array in providers/index.ts
- Add 3 new tests verifying custom is excluded from auto chain
- Update README_SEARCH_PROVIDERS.md: auto priority, mode table, note
- Update .env.example: auto priority comment, custom mode annotation

All 47 tests pass (44 existing + 3 new).

Co-Authored-By: @Vasanthdev2004

* fix: address review blockers (routing, abort, config check, domain matching)

1. Native/Codex routing precedence in auto mode
   shouldUseAdapterProvider() now checks if native/first-party/vertex/foundry
   or Codex paths are available before falling back to adapter providers.
   Auto mode: native paths take precedence; adapter is fallback only.

2. AbortError stops provider chain immediately
   runSearch() now checks for AbortError/aborted signal before continuing
   the fallback chain. Cancelled searches don't create extra outbound requests.

3. Explicit provider mode fails fast on missing credentials
   runSearch() validates isConfigured() for explicit modes before attempting
   requests. Throws clear error: 'Search provider "X" is not configured.'

4. Domain filter exact-or-subdomain matching (fixes suffix collision)
   New hostMatchesDomain() helper: exact match or .subdomain match.
   badexample.com no longer matches example.com.

5. Tests: 56 pass (9 new) covering all 4 fixes

Co-Authored-By: @Vasanthdev2004

---------

Co-authored-by: Claude Fix <fix@openclaude.local>
Co-authored-by: FluxLuFFy <195792511+FluxLuFFy@users.noreply.github.com>
Co-authored-by: bot <bot@openclaw.ai>
This commit is contained in:
FluxLuFFy
2026-04-09 00:21:25 +05:30
committed by GitHub
parent 284d9bda36
commit 4ad6bc50c1
18 changed files with 2407 additions and 176 deletions

View File

@@ -0,0 +1,47 @@
/**
* Bing Web Search API adapter.
* GET https://api.bing.microsoft.com/v7.0/search?q=...
* Auth: Ocp-Apim-Subscription-Key: <key>
*/
import type { SearchInput, SearchProvider } from './types.js'
import { applyDomainFilters, type ProviderOutput } from './types.js'
export const bingProvider: SearchProvider = {
name: 'bing',
isConfigured() {
return Boolean(process.env.BING_API_KEY)
},
async search(input: SearchInput, signal?: AbortSignal): Promise<ProviderOutput> {
const start = performance.now()
const url = new URL('https://api.bing.microsoft.com/v7.0/search')
url.searchParams.set('q', input.query)
url.searchParams.set('count', '10')
const res = await fetch(url.toString(), {
headers: { 'Ocp-Apim-Subscription-Key': process.env.BING_API_KEY! },
signal,
})
if (!res.ok) {
throw new Error(`Bing search error ${res.status}: ${await res.text().catch(() => '')}`)
}
const data = await res.json()
const hits = (data.webPages?.value ?? []).map((r: any) => ({
title: r.name ?? '',
url: r.url ?? '',
description: r.snippet,
source: r.displayUrl,
}))
return {
hits: applyDomainFilters(hits, input),
providerName: 'bing',
durationSeconds: (performance.now() - start) / 1000,
}
},
}