

Building a Unified API Gateway for Secure and Scalable Cross-Cloud AI Service

# Cross-Cloud AI Gateway — Unified API Gateway Solution

## Presentation Time

12:00 - 12:30

## Topic Introduction

This is a multi-functional API Gateway system focused on integrating cloud AI services and handling enterprise internal authentication and authorization.

### Core Architecture Directions

- Authentication and Authorization
- Multi-cloud AI Backend Integration (Azure OpenAI, Anthropic, etc.)
- Traffic Control and Resource Management
- Monitoring and Metrics Aggregation

Together, these directions form the core architecture of the project: security and authentication, traffic control, multi-backend AI service invocation, and automated continuous integration and deployment, making it well suited to enterprise-level applications.

https://cloudsummit.ithome.com.tw/2025/session-page/3684


Bo-Yi Wu

July 02, 2025


Transcript

  1. Building a Unified API Gateway for Secure and Scalable Cross-Cloud AI Service (Bo-Yi Wu @ Mediatek, https://blog.wu-boy.com/, 2025/07/03, 12:00 - 12:30)
  2. Core Features

     - API Gateway & Routing Management
     - Authentication & Authorization
     - Quota & Billing Management
     - Multi AI/ML Service Integration
     - Monitoring & Logging
  3. Multi-cloud AI Backend Routing

```go
// Azure OpenAI Routing
r.Any("/openai/*path", azureOpenAIHandler)
// AWS Bedrock Routing
r.Any("/bedrock/*path", bedrockHandler)
// Google Gemini Routing
r.Any("/gemini/*path", geminiHandler)
// Azure Cognitive Routing
r.Any("/cognitive/*path", cognitiveHandler)
// CSES LLM Routing
r.Any("/cses/*path", csesHandler)
```
  4. Endpoint Configuration / Dispatcher Configuration

```go
type endpointconfig struct {
	weight int // Weight value
	url    string
	labels []string
	// ... other configurations
}

type dispatcher struct {
	online   []*endpoint // Online endpoint list
	fallback *endpoint   // Fallback endpoint
	weight   int         // Total weight
}
```
  5. Weight Distribution Algorithm

```go
func (d *dispatcher) pick(req *http.Request) *endpoint {
	// 1. Check for a specific destination
	if dst := req.Header.Get("X-DST"); dst != "" {
		for _, e := range d.online {
			if strings.Contains(e.config.url, dst) {
				return e
			}
		}
	}
	// 2. Weighted random selection
	c := 0
	w := rand.Intn(d.weight) // Generate random number
	for _, e := range d.online {
		// 3. Special handling for PTU mode
		if e.config.ptu {
			if len(ptuVIPs) == 0 || ptuVIPs[req.Header.Get("X-User-Id")] {
				return e
			}
			if e.isOnline() {
				return e
			}
		}
		// 4. Cumulative weight comparison
		if c += e.config.weight; w < c {
			if !e.isOnline() {
				continue // Skip offline endpoints
			}
			return e
		}
	}
	return d.fallback // Return fallback endpoint
}
```
  6. Endpoint State Management / Health Check Mechanism

```go
type endpoint struct {
	success uint64 // Successful request count
	total   uint64 // Total request count
	disable int32  // Disable flag (atomic operation)
}

func (e *endpoint) isOnline() bool {
	return atomic.LoadInt32(&e.disable) == 0
}

func (e *endpoint) offline() bool {
	return atomic.CompareAndSwapInt32(&e.disable, 0, 1)
}
```
  7. Intelligent Health Assessment

```go
func (e *endpoint) recordStatus(resp *http.Response, ttfb time.Duration, logger *slog.Logger) {
	// 1. Determine whether the request was successful
	if e.config.judgeFn == nil || (resp != nil && e.config.judgeFn(resp, ttfb)) {
		atomic.AddUint64(&e.success, 1)
		if atomic.AddUint64(&e.total, 1) == 61 {
			e.reset() // Reset counters
		}
	} else {
		atomic.AddUint64(&e.total, 1)
	}

	// 2. Calculate success rates for all endpoints
	endpoints := e.othersFn()
	rates := make([]float64, 0, len(endpoints))
	for _, oe := range endpoints {
		ptotal := atomic.LoadUint64(&oe.total)
		psuccess := atomic.LoadUint64(&oe.success)
		rates = append(rates, float64(psuccess)/float64(ptotal))
	}

	// 3. Statistical analysis: derive the health threshold
	// (mean/variance accumulation elided on the original slide)
	var mean, stdv float64
	for _, r := range rates {
		mean += r
	}
	mean /= float64(len(rates))
	for _, r := range rates {
		stdv += (r - mean) * (r - mean)
	}
	stdv = math.Sqrt(stdv / float64(len(rates)))
	threshold := mean - 1.9*stdv // 1.9 standard deviations
	if threshold > 0.9 {
		threshold = 0.9
	}
	// threshold then drives the automatic failover check on the next slide
}
```
  8. Automatic Failover

```go
// Automatically take the endpoint offline when its success rate
// falls below the threshold
if (!e.config.ptu && rates[j] < threshold) && e.config.testReqFn != nil && e.offline() {
	logger.Info("endpoint_off", "req", e.config.labels)
	// Background automatic recovery check
	go func(threshold float64, testReqFn func() *http.Request) {
		for i := 0; ; i++ {
			// Execute a test request
			res := httptest.NewRecorder()
			e.proxy.ServeHTTP(res, testReqFn())
			// Check whether the success rate has recovered
			// (psuccess/ptotal are re-read from the endpoint's atomic counters,
			// elided on the original slide)
			rate := float64(psuccess) / float64(ptotal)
			if rate > threshold {
				e.online() // Bring the endpoint back online
				logger.Info("endpoint_on", "req", e.config.labels)
				break
			}
			time.Sleep(1 * time.Second)
		}
	}(threshold, e.config.testReqFn)
}
```
  9. Weight Configuration / Judge Function Configuration

```go
// Judge function: decide whether a response counts as healthy
cfg.judgeFn = func(resp *http.Response, ttfb time.Duration) bool {
	return resp.StatusCode != 429 && // Not rate limited
		resp.StatusCode < 500 && // Not a server error
		ttfb < deadline // TTFB within deadline
}

// Weight configuration
"gpt-4o": NewDispatcher(tr, logger, true, spendFn, []endpointconfig{
	aoai(600000, "openaijapaneast", "gpt-4o"), // High weight
	aoai(60000, "openaieastus2", "gpt-4o"),    // Standard weight
	aoai(60000, "aideopenaiwestus", "gpt-4o"), // Standard weight
}),
```
  10. Features Summary

      1. Weighted Random Distribution: probabilistic distribution based on endpoint weights
      2. Real-time Health Monitoring: intelligent assessment based on success rates and response times
      3. Statistical Threshold: dynamic health-threshold calculation using standard deviation
      4. Automatic Recovery: background continuous testing of offline endpoints with automatic recovery
      5. Special Routing Support: PTU mode and specific destination routing
      6. Graceful Degradation: fallback endpoint used when all endpoints are unavailable
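The statistical threshold in point 3 can be isolated into a small helper. This is a sketch of the slide's mean-minus-1.9-standard-deviations rule with a 0.9 cap; the function name and empty-input handling are mine, not the project's:

```go
package main

import (
	"fmt"
	"math"
)

// healthThreshold derives a dynamic health threshold from per-endpoint
// success rates: the mean rate minus 1.9 population standard deviations,
// capped at 0.9 (the multiplier and cap are the values shown on the slides).
func healthThreshold(rates []float64) float64 {
	if len(rates) == 0 {
		return 0
	}
	var mean float64
	for _, r := range rates {
		mean += r
	}
	mean /= float64(len(rates))

	var variance float64
	for _, r := range rates {
		variance += (r - mean) * (r - mean)
	}
	stdv := math.Sqrt(variance / float64(len(rates)))

	threshold := mean - 1.9*stdv
	if threshold > 0.9 {
		threshold = 0.9
	}
	return threshold
}

func main() {
	// One weak endpoint pulls the threshold well below the healthy rates,
	// so the healthy endpoints stay online while the weak one trips it.
	fmt.Printf("%.3f\n", healthThreshold([]float64{0.99, 0.98, 0.97, 0.60}))
	// With uniform rates the standard deviation is 0 and the 0.9 cap applies.
	fmt.Printf("%.3f\n", healthThreshold([]float64{1, 1, 1}))
}
```

The cap matters: with tightly clustered rates the raw formula would demand near-perfect success, so clamping at 0.9 keeps the offline trigger reasonable.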
  11. Permission Control

      - Verify JWT Token
      - Check Model Permission
      - Check User Tokens
      - Cross-check Multiple Domains (Secure Region)
  12. Core Interface Design

      - Quota Consumption: Spend
      - Alert Mechanism: Alert
      - Rate Limiting: Limit
      - Quota Configuration: SetCap / GetCap
      - Rate Limit Configuration: SetLimit / GetLimit
  13. QuotaClient Interface

```go
type QuotaClient interface {
	Spend(ctx context.Context, user string, amount int) (remain int, err error)
	Alert(ctx context.Context, user string) (cap, amount, alert int, err error)
	Limit(ctx context.Context, user string) (ok bool, req, cap, gap int64, err error)
	GetLimit(ctx context.Context, user string) (result []int, err error)
	SetLimit(ctx context.Context, user string, req, gap int64) (err error)
	SetCap(ctx context.Context, user string, amount int, ts int64) (err error)
	GetCap(ctx context.Context, user string) (usage, cap int64, err error)
}
```

      Redis Key Design Strategy:

      - {user}.cap - Quota limit
      - {user}.YYYYMMDD - Daily usage
      - {user}.alert.YYYYMMDD - Daily alert count
      - {user}.req - Rate limit request count
      - {user}.gap - Rate limit time window
      - {user}.set - Rate limit request records
  14. Quota Consumption Mechanism

```go
func (c *RedisQuotaClient) Spend(ctx context.Context, user string, amount int) (remain int, err error) {
	if user == "" {
		return 1, nil // Return remaining 1 for an empty username
	}
	// Redis key design
	k1 := fmt.Sprintf("{%s}.cap", user)                               // Quota limit key
	k2 := fmt.Sprintf("{%s}.%s", user, time.Now().Format("20060102")) // Daily usage key
	ret := [2]int64{0, 0}
	// Execute atomic operations using a Redis transaction
	resps, err := c.client.DoMulti(
		ctx,
		c.client.B().Multi().Build(),                                   // Begin transaction
		c.client.B().Incrby().Key(k1).Increment(0).Build(),             // Read quota limit
		c.client.B().Incrby().Key(k2).Increment(int64(amount)).Build(), // Increment usage
		c.client.B().Expire().Key(k2).Seconds(86400*7).Build(),         // Set 7-day expiry
		c.client.B().Exec().Build(),                                    // Execute transaction
	)[4].ToArray() // Get the EXEC result
	// Parse results
	for i, resp := range resps[:len(ret)] {
		if ret[i], err = resp.ToInt64(); err != nil {
			return 0, err
		}
	}
	// Default quota logic
	if ret[0] == 0 && c.capping {
		ret[0] = DEFAULT_CAPPING // Use the default quota limit
	}
	// Calculate remaining quota
	if ret[0] > 0 {
		return int(ret[0] - ret[1]), nil // Quota limit minus used amount
	}
	return 1, nil // Always return 1 for unlimited quota
}
```
  15. Design Points (Spend)

      - Atomicity: uses Redis MULTI/EXEC to ensure atomic operations
      - Key Naming: the {user} hash tag in {user}.cap and {user}.YYYYMMDD keeps the same user's keys in the same Redis Cluster slot
      - Expiration Strategy: daily usage keys expire after 7 days to prevent unlimited accumulation
      - Default Quota: uses a default value when the quota is 0 and capping is enabled
  16. Alert Mechanism

```go
func (c *RedisQuotaClient) Alert(ctx context.Context, user string) (cap, amount, alert int, err error) {
	date := time.Now().Format("20060102")
	k1 := fmt.Sprintf("{%s}.cap", user)            // Quota limit
	k2 := fmt.Sprintf("{%s}.%s", user, date)       // Daily usage
	k3 := fmt.Sprintf("{%s}.alert.%s", user, date) // Daily alert count
	ret := [3]int64{0, 0, 0}
	resps, err := c.client.DoMulti(
		ctx,
		c.client.B().Multi().Build(),
		c.client.B().Incrby().Key(k1).Increment(0).Build(),     // Read quota
		c.client.B().Incrby().Key(k2).Increment(0).Build(),     // Read usage
		c.client.B().Incrby().Key(k3).Increment(1).Build(),     // Increment alert count
		c.client.B().Expire().Key(k3).Seconds(86400/2).Build(), // 12-hour expiry
		c.client.B().Exec().Build(),
	)[5].ToArray()
	// Parse and return results
	for i, resp := range resps[:len(ret)] {
		if ret[i], err = resp.ToInt64(); err != nil {
			return 0, 0, 0, err
		}
	}
	if ret[0] == 0 && c.capping {
		ret[0] = DEFAULT_CAPPING
	}
	cap, amount, alert = int(ret[0]), int(ret[1]), int(ret[2])
	return
}
```
  17. Alert Mechanism Features

      - Frequency Control: the alert counter expires in 12 hours to avoid excessive alerts
      - Triple Information: returns quota limit, usage amount, and alert count
  18. Lua Script Implementing the Sliding Window Algorithm

```go
// KEYS[1]: request limit count, KEYS[2]: time window, KEYS[3]: request record set
var RateScript = rueidis.NewLuaScript(`
local capgap = redis.call('MGET', KEYS[1], KEYS[2])
local cnt, now, cap, gap = 0, tonumber(ARGV[1]), tonumber(capgap[1]), tonumber(capgap[2])
if cap == nil or gap == nil then
  return {true, 0, 0, 0} -- No limit set, allow through
end
-- Remove expired request records
redis.call('ZREMRANGEBYSCORE', KEYS[3], 0, now - gap*1000)
-- Count requests in the current time window
cnt = redis.call('ZCARD', KEYS[3])
if cnt < cap then
  -- Under the limit, record the new request
  redis.call('ZADD', KEYS[3], now, now)
  redis.call('EXPIRE', KEYS[3], gap)
  return {true, cnt + 1, cap, gap}
end
-- Over the limit
return {false, cnt, cap, gap}
`)
```
  19. Sliding Window Features

      - Precise Control: uses a Sorted Set to record the exact timestamp of each request
      - Auto Cleanup: ZREMRANGEBYSCORE automatically removes expired requests
      - Flexible Window: the gap parameter controls the time window size
  20. Set Quota Limit

```go
func (c *RedisQuotaClient) SetCap(ctx context.Context, user string, amount int, ts int64) (err error) {
	key := fmt.Sprintf("{%s}.cap", user)
	for _, resp := range c.client.DoMulti(
		ctx,
		c.client.B().Set().Key(key).Value(strconv.Itoa(amount)).Build(), // Set quota
		c.client.B().Expireat().Key(key).Timestamp(ts).Build(),          // Set expiry time
	) {
		if err := resp.Error(); err != nil {
			return err
		}
	}
	return nil
}
```
  21. Get Quota Status

```go
func (c *RedisQuotaClient) GetCap(ctx context.Context, user string) (usage, cap int64, err error) {
	k1 := fmt.Sprintf("{%s}.cap", user)
	k2 := fmt.Sprintf("{%s}.%s", user, time.Now().Format("20060102"))
	ret := [2]int64{0, 0}
	resps, err := c.client.DoMulti(
		ctx,
		c.client.B().Multi().Build(),
		c.client.B().Incrby().Key(k1).Increment(0).Build(), // Read quota limit
		c.client.B().Incrby().Key(k2).Increment(0).Build(), // Read daily usage
		c.client.B().Expire().Key(k2).Seconds(86400*7).Build(),
		c.client.B().Exec().Build(),
	)[4].ToArray()
	// Process results...
	return ret[1], ret[0], nil // Return (usage, quota limit)
}
```
  22. Key Design Principles (Performance Optimization)

      - Batch Operations: uses DoMulti to reduce network round trips
      - Atomicity: all critical operations execute within Redis transactions
      - Expiration Strategy: automatically cleans up expired data to prevent memory leaks
  23. Key Design Principles (Fault Tolerance Design)

      - Default Values: uses environment-variable defaults when configuration doesn't exist
      - Empty User Handling: returns safe default values for empty usernames
      - Error Handling: complete error handling for every Redis operation
  24. Service Integration

      - Azure OpenAI Service (Multi-Region)
      - AWS Bedrock (Claude Series) Integration
      - Google Gemini Integration
      - Azure Cognitive Services Integration
      - Mediatek Internal LLM Services
  25. Cost Calculation and Quota Management

```go
return spendFn{
	Pre: func(user, domain string, req *http.Request, body string) (estimate int, remain int, err error) {
		// Token calculation
		tokens := len(encoder.Encode(body, nil, nil))
		// Special token calculation for models like GPT-4 Vision
		if strings.Contains(model, "gpt-4o") || strings.Contains(model, "vision") {
			var sb streambody
			if err := json.Unmarshal([]byte(body), &sb); err == nil {
				tokens = sb.CountToken(encoder, tokens)
			}
		}
		estimate = tokens * promptrate
		remain, err = qc.Spend(context.Background(), user, estimate)
		return
	},
	Post: func(user string, resp *http.Response, body string, preestimate int) (estimate int) {
		// Calculate the actual cost based on the response
		if resp.Header.Get("content-type") == "text/event-stream" {
			estimate = strings.Count(body, "\ndata: {") * completionrate
		} else {
			var resp AoaiResp
			if err := json.Unmarshal([]byte(body), &resp); err == nil {
				estimate = completionrate*resp.Usage.CompletionTokens + promptrate*resp.Usage.PromptTokens - preestimate
			}
		}
		return
	},
}
```
  26. Metrics

      - ttfbHistogram: measures Time-To-First-Byte (TTFB) for API responses
      - latencyHistogram: tracks complete request duration
      - successRateGauge: monitors the success rate of API calls
      - endpointStatusGauge: tracks endpoint availability (1 = online, 0 = offline)
  27. Structured Log Format

      - Structured JSON logging with key-value pairs
      - Includes HTTP headers, status codes, and performance metrics
      - Tracks API versions and destinations
      - Records estimated costs and priority levels