Error Recovery Guide
This guide helps you understand, diagnose, and resolve errors when using the Pulse SDK. It covers error categories, specific error codes, recovery strategies, and best practices for robust error handling.
Quick Reference
| Error Category | Severity | Typical Recovery | Auto-Retry |
|---|---|---|---|
| Network Errors | Transient | Retry with backoff | ✅ Yes |
| Authentication Errors | Permanent | Fix credentials | ❌ No |
| API Rate Limiting | Transient | Wait and retry | ✅ Yes |
| Configuration Errors | Permanent | Fix configuration | ❌ No |
| API Errors | Mixed | Depends on code | ⚠️ Selective |
Error Categories and Codes
Network Errors
Network errors are typically transient and should be retried with exponential backoff.
Connection Errors
Error Code: NetworkError / ConnectError
HTTP Status: N/A (Connection failed)
Severity: Transient
from pulse.core.client import CoreClient
from pulse.core.exceptions import NetworkError
try:
    client = CoreClient.with_client_credentials()
    result = client.create_embeddings(["test"])
except NetworkError as e:
    # Automatic retry is handled by the SDK; this runs only if the error still propagates
    print(f"Network error: {e}")
Common Causes:
- DNS resolution failure
- Network connectivity issues
- Firewall blocking connections
- Server temporarily unavailable
Recovery Strategies:
1. Automatic Retry (SDK handles this)
   - 3 attempts with exponential backoff
   - Delays: 0.5s, 1s, 2s
2. Manual Retry with longer delays
3. Check Network Configuration
   - Verify internet connectivity
   - Check firewall settings
   - Validate DNS resolution
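The manual retry in strategy 2 could be sketched as below. This is an illustration only: `backoff_delays` and `retry_call` are hypothetical helpers, not part of the Pulse SDK, and the built-in `ConnectionError` stands in for the SDK's `NetworkError`. The default arguments reproduce the 0.5s, 1s, 2s schedule listed above.

```python
import time
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int = 3, base: float = 0.5, factor: float = 2.0) -> List[float]:
    """Return the exponential delay schedule: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(attempts)]


def retry_call(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    delays: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); after each retryable failure, wait for the next delay and try again.

    Makes len(delays) + 1 attempts in total. The final attempt is not guarded,
    so its exception propagates to the caller.
    """
    for delay in delays:
        try:
            return func()
        except retry_on:
            sleep(delay)
    return func()
```

To retry with longer delays than the SDK default, pass a larger base, for example `retry_call(do_request, (ConnectionError,), backoff_delays(3, base=2.0))`, where `do_request` is whatever zero-argument callable wraps your request.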
Timeout Errors
Error Code: TimeoutError / ConnectTimeout / ReadTimeout / WriteTimeout
HTTP Status: N/A (Request timed out)
Severity: Transient
from pulse.core.client import CoreClient
from pulse.core.exceptions import TimeoutError  # note: shadows Python's built-in TimeoutError
try:
    # Large dataset that might time out
    client = CoreClient.with_client_credentials()
    result = client.create_embeddings(large_text_list, fast=False)
except TimeoutError as e:
    print(f"Request timed out after {e.timeout}ms for {e.url}")
    # Consider breaking the input into smaller batches
Recovery Strategies:
1. Increase Timeout
import httpx
# Custom timeout configuration
timeout = httpx.Timeout(
    connect=30.0,  # 30s to establish connection
    read=300.0,    # 5 minutes to read response
    write=30.0,    # 30s to write request
    pool=10.0,     # 10s to get connection from pool
)
client = CoreClient.with_client_credentials(timeout=timeout)
2. Batch Processing
# Break large requests into smaller chunks
def process_in_batches(texts, batch_size=100):
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        try:
            result = client.create_embeddings(batch)
            results.extend(result.embeddings)
        except TimeoutError:
            # Retry with smaller batches
            smaller_batches = [batch[j:j + 50] for j in range(0, len(batch), 50)]
            for small_batch in smaller_batches:
                result = client.create_embeddings(small_batch)
                results.extend(result.embeddings)
    return results
3. Use Fast Mode
# Use fast=True for quicker processing
result = client.create_embeddings(texts, fast=True)
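The batch-processing strategy falls back to a fixed batch size of 50. A variation, sketched here under assumptions, keeps halving a batch until it fits. `embed_with_splitting` and its `embed` callable are hypothetical names, and the built-in `TimeoutError` stands in for the SDK's exception of the same name.

```python
from typing import Callable, List, Sequence


def embed_with_splitting(
    texts: Sequence[str],
    embed: Callable[[Sequence[str]], List],
    min_size: int = 1,
) -> List:
    """Embed texts, halving the batch whenever embed() raises TimeoutError.

    Results are returned in the same order as the input texts.
    """
    if not texts:
        return []
    try:
        return list(embed(texts))
    except TimeoutError:
        if len(texts) <= min_size:
            raise  # cannot split any further; surface the timeout
        mid = len(texts) // 2
        left = embed_with_splitting(texts[:mid], embed, min_size)
        right = embed_with_splitting(texts[mid:], embed, min_size)
        return left + right
```

With the SDK, `embed` might be something like `lambda batch: client.create_embeddings(batch).embeddings`. Halving adapts to whatever batch size the server can handle, at the cost of repeating the timed-out request's wait before each split.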
Authentication Errors
Authentication errors are typically permanent and require fixing credentials or configuration.
Invalid Credentials
Error Code: PulseAPIError with status 401
HTTP Status: 401 Unauthorized
Severity: Permanent
from pulse.core.client import CoreClient
from pulse.core.exceptions import PulseAPIError
try:
    client = CoreClient.with_client_credentials()
    result = client.create_embeddings(["test"])
except PulseAPIError as e:
    if e.status == 401:
        print(f"Authentication failed: {e.message}")
        if e.aws_www_authenticate:
            print(f"AWS hint: {e.aws_www_authenticate}")
        # Fix credentials and retry
    else:
        raise  # Not an authentication problem; don't swallow it
Common Causes:
- Invalid PULSE_CLIENT_ID or PULSE_CLIENT_SECRET
- Expired or revoked credentials
- Incorrect authentication flow
- Wrong environment configuration
Recovery Strategies:
1. Verify Environment Variables
echo $PULSE_CLIENT_ID
echo $PULSE_CLIENT_SECRET
echo $PULSE_BASE_URL
2. Check Credential Validity
from pulse.debug import enable_debug
enable_debug()  # Enable debug logging
client = CoreClient.with_client_credentials()
# Check token status
token_info = client.debug_auth_status()
print(f"Has token: {token_info.has_token}")
print(f"Is valid: {token_info.is_valid}")
print(f"Is expired: {token_info.is_expired}")
3. Test Different Authentication Methods
# Try explicit credentials
from pulse.auth import ClientCredentialsAuth
auth = ClientCredentialsAuth(
    client_id="your_client_id",
    client_secret="your_client_secret",
)
client = CoreClient(auth=auth)
Token Expiry
Error Code: PulseAPIError with status 401 (token expired)
HTTP Status: 401 Unauthorized
Severity: Transient (auto-recoverable)
# SDK automatically handles token refresh
try:
    client = CoreClient.with_client_credentials()
    # Long-running process - token may expire
    for batch in large_dataset:
        result = client.create_embeddings(batch)
        # Token refresh happens automatically if needed
except PulseAPIError as e:
    if e.status == 401 and "expired" in e.message.lower():
        print("Token expired - SDK should auto-refresh")
        # This should rarely happen due to automatic refresh
    else:
        raise
Recovery Strategies:
1. Automatic Refresh (SDK default behavior)
   - SDK refreshes tokens 60 seconds before expiry
   - No manual intervention required
2. Manual Token Refresh
# Force token refresh (uses underscore-prefixed, private attributes that may change between SDK versions)
client._auth._refresh_token()
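The "refresh 60 seconds before expiry" rule can be illustrated with a small check. This is a sketch of the idea, not the SDK's actual implementation; `needs_refresh` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    expires_at: datetime,
    now: Optional[datetime] = None,
    leeway_seconds: float = 60.0,
) -> bool:
    """Return True once the current time is within leeway_seconds of expiry (or past it).

    Both datetimes are expected to be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at - timedelta(seconds=leeway_seconds)
```

Refreshing slightly early means a request sent just before expiry does not arrive at the server with a token that has lapsed in transit.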
API Rate Limiting
Rate limiting errors are transient and should be handled with appropriate backoff strategies.
Rate Limit Exceeded
Error Code: PulseAPIError with status 429
HTTP Status: 429 Too Many Requests
Severity: Transient
import time
from pulse.core.exceptions import PulseAPIError
def make_request_with_rate_limiting(client, texts):
    max_retries = 5
    base_delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.create_embeddings(texts)
        except PulseAPIError as e:
            if e.status == 429:
                # Extract retry-after header if available
                retry_after = e.headers.get('retry-after')
                if retry_after:
                    delay = float(retry_after)
                else:
                    # Exponential backoff
                    delay = base_delay * (2 ** attempt)
                print(f"Rate limited. Waiting {delay}s before retry {attempt + 1}/{max_retries}")
                time.sleep(delay)
            else:
                raise
    raise Exception("Max retries exceeded for rate limiting")
Recovery Strategies:
1. Respect Retry-After Header
if e.status == 429:
    retry_after = e.headers.get('retry-after', '60')
    time.sleep(float(retry_after))
2. Implement Exponential Backoff
import random
def exponential_backoff(attempt, base_delay=1.0, max_delay=300.0):
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = delay * 0.1 * random.random()  # Add up to 10% jitter
    return delay + jitter
3. Reduce Request Rate
import time
# Add delays between requests
for batch in batches:
    result = client.create_embeddings(batch)
    time.sleep(0.1)  # 100ms delay between requests
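The snippets above pass the Retry-After value straight to `float()`. HTTP also allows that header to carry an HTTP-date instead of a number of seconds, in which case `float()` raises `ValueError`. Whether the Pulse API ever sends the date form is not stated here, so treat the following as a defensive sketch; `parse_retry_after` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str],
    now: Optional[datetime] = None,
    default: float = 60.0,
) -> float:
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Accepts the delay-seconds form ("120") and the HTTP-date form
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Falls back to `default` when the
    value is missing or cannot be parsed.
    """
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        # Treat a date without zone information as UTC
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

In the retry loop this would replace the direct conversion, for example `time.sleep(parse_retry_after(e.headers.get('retry-after')))`.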
Configuration Errors
Configuration errors are permanent and require fixing the configuration.
Invalid Environment Configuration
Error Code: ValueError / ConfigurationError
HTTP Status: N/A
Severity: Permanent
from pulse.core.client import CoreClient
try:
    client = CoreClient.with_client_credentials()
except ValueError as e:
    if "Client Secret is required" in str(e):
        print("Missing PULSE_CLIENT_SECRET environment variable")
        print("Set it with: export PULSE_CLIENT_SECRET=your_secret")
    else:
        raise
Common Configuration Issues:
1. Missing Environment Variables
# Required variables
export PULSE_CLIENT_ID=your_client_id
export PULSE_CLIENT_SECRET=your_client_secret
# Optional variables
export PULSE_BASE_URL=https://pulse.researchwiseai.com
export PULSE_AUDIENCE=https://pulse.researchwiseai.com
2. Wrong Environment Settings
# Check current configuration
from pulse.config import BASE_URL, AUDIENCE, CLIENT_ID
print(f"Base URL: {BASE_URL}")
print(f"Audience: {AUDIENCE}")
print(f"Client ID: {CLIENT_ID}")
3. Invalid URLs or Endpoints
# Validate configuration
import httpx
from pulse.config import BASE_URL
try:
    response = httpx.get(BASE_URL + "/health")
    print(f"API endpoint reachable: {response.status_code}")
except Exception as e:
    print(f"API endpoint unreachable: {e}")
API Errors
API errors can be either transient or permanent depending on the specific error code.
Client Errors (4xx)
Error Code: PulseAPIError with status 4xx
HTTP Status: 400-499
Severity: Permanent (usually)
from pulse.core.exceptions import PulseAPIError
try:
    # Invalid request data
    result = client.create_embeddings([])  # Empty list
except PulseAPIError as e:
    if 400 <= e.status < 500:
        print(f"Client error {e.status}: {e.message}")
        # Fix the request and retry
    else:
        raise
Common 4xx Errors:
| Status | Error | Cause | Recovery |
|---|---|---|---|
| 400 | Bad Request | Invalid request format | Fix request data |
| 401 | Unauthorized | Invalid credentials | Fix authentication |
| 403 | Forbidden | Insufficient permissions | Check account permissions |
| 404 | Not Found | Invalid endpoint | Check API documentation |
| 422 | Unprocessable Entity | Invalid data format | Validate input data |
Server Errors (5xx)
Error Code: PulseAPIError with status 5xx
HTTP Status: 500-599
Severity: Transient (usually)
import time
try:
    result = client.create_embeddings(texts)
except PulseAPIError as e:
    if 500 <= e.status < 600:
        print(f"Server error {e.status}: {e.message}")
        # Retry once after a fixed delay
        time.sleep(5)
        result = client.create_embeddings(texts)
    else:
        raise
Common 5xx Errors:
| Status | Error | Cause | Recovery |
|---|---|---|---|
| 500 | Internal Server Error | Server-side issue | Retry after delay |
| 502 | Bad Gateway | Proxy/gateway error | Retry after delay |
| 503 | Service Unavailable | Server overloaded | Retry with longer delay |
| 504 | Gateway Timeout | Upstream timeout | Retry with longer delay |
Error Severity Classification
Transient Errors (Retry Recommended)
These errors are temporary, and a retry is likely to succeed:
- Network connectivity issues (NetworkError, ConnectError)
- Timeout errors (TimeoutError, ReadTimeout, WriteTimeout)
- Rate limiting (429 Too Many Requests)
- Server errors (500, 502, 503, 504)
- Token expiry (401 with expired token)
Permanent Errors (Fix Required)
These errors require fixing the underlying issue:
- Authentication failures (401 with invalid credentials)
- Authorization failures (403 Forbidden)
- Invalid requests (400 Bad Request, 422 Unprocessable Entity)
- Configuration errors (ValueError, missing environment variables)
- Resource not found (404 Not Found)
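The two lists above can be condensed into a lookup by HTTP status. This is a minimal sketch: `Severity`, `TRANSIENT_STATUSES`, and `classify_status` are hypothetical names, and treating every status not listed above as permanent is an assumption made for the example.

```python
from enum import Enum


class Severity(Enum):
    TRANSIENT = "transient"  # retry recommended
    PERMANENT = "permanent"  # fix required


# Statuses listed above as transient
TRANSIENT_STATUSES = frozenset({429, 500, 502, 503, 504})


def classify_status(status: int, token_expired: bool = False) -> Severity:
    """Map an HTTP status to a severity, following the lists above.

    A 401 is transient only when it is caused by an expired token.
    Unlisted statuses default to PERMANENT (an assumption of this sketch).
    """
    if status == 401:
        return Severity.TRANSIENT if token_expired else Severity.PERMANENT
    if status in TRANSIENT_STATUSES:
        return Severity.TRANSIENT
    return Severity.PERMANENT
```

A retry loop can then branch on `classify_status(e.status) is Severity.TRANSIENT` instead of repeating status comparisons in every handler.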
Multi-Error Prioritization
When multiple errors occur, handle them in this priority order:
Priority 1: Configuration Errors
Fix these first, because they prevent any request from working:
import os
import httpx
from pulse.config import BASE_URL
def diagnose_configuration():
    issues = []
    # Check required environment variables
    required_vars = ['PULSE_CLIENT_ID', 'PULSE_CLIENT_SECRET']
    for var in required_vars:
        if not os.getenv(var):
            issues.append(f"Missing {var} environment variable")
    # Check API connectivity
    try:
        response = httpx.get(BASE_URL + "/health", timeout=10)
        if response.status_code != 200:
            issues.append(f"API health check failed: {response.status_code}")
    except Exception as e:
        issues.append(f"Cannot reach API endpoint: {e}")
    return issues
Priority 2: Authentication Errors
Fix authentication before attempting API calls:
def diagnose_authentication():
    try:
        client = CoreClient.with_client_credentials()
        token_info = client.debug_auth_status()
        if not token_info.has_token:
            return "No authentication token available"
        elif not token_info.is_valid:
            return "Authentication token is invalid"
        elif token_info.is_expired:
            return "Authentication token has expired"
        else:
            return None  # Authentication OK
    except Exception as e:
        return f"Authentication error: {e}"
Priority 3: Network and API Errors
Handle these during normal operation:
import time
from pulse.core.exceptions import PulseAPIError, NetworkError, TimeoutError
def handle_api_request(client, texts, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.create_embeddings(texts)
        except PulseAPIError as e:
            if e.status == 429:  # Rate limiting
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            elif 500 <= e.status < 600:  # Server errors
                time.sleep(5)  # Fixed delay for server errors
                continue
            else:
                raise  # Permanent error, don't retry
        except (NetworkError, TimeoutError):
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            raise
    raise Exception("Max retries exceeded")
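One way to enforce the priority order is to run the diagnostics as an ordered list of stages and stop at the first stage that reports a problem. In this sketch `first_failing_stage` is a hypothetical helper, and each check is assumed to be a zero-argument callable that returns a list of issue strings (empty when everything is fine).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Check = Callable[[], List[str]]


def first_failing_stage(
    stages: Sequence[Tuple[str, Check]],
) -> Optional[Tuple[str, List[str]]]:
    """Run checks in priority order and return (stage_name, issues) for the first failure.

    Later stages are skipped, since their results are not meaningful until
    the earlier problem is fixed. Returns None when every stage passes.
    """
    for name, check in stages:
        issues = check()
        if issues:
            return name, issues
    return None
```

With the functions above, the stages might be `("configuration", diagnose_configuration)` followed by an authentication check that wraps `diagnose_authentication` so it returns a list.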
Comprehensive Error Handling Example
Here's a complete example that handles the error categories covered above:
import time
import logging
from typing import List, Any
from pulse.core.client import CoreClient
from pulse.core.exceptions import PulseAPIError, NetworkError, TimeoutError
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class RobustPulseClient:
    def __init__(self, max_retries=3, base_delay=1.0):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.client = None
        self._initialize_client()
    def _initialize_client(self):
        """Initialize client with error handling."""
        try:
            self.client = CoreClient.with_client_credentials()
            logger.info("Client initialized successfully")
        except ValueError as e:
            if "Client Secret is required" in str(e):
                logger.error("Missing PULSE_CLIENT_SECRET environment variable")
                raise ConfigurationError("Set PULSE_CLIENT_SECRET environment variable") from e
            raise
        except Exception as e:
            logger.error(f"Failed to initialize client: {e}")
            raise
    def create_embeddings_robust(self, texts: List[str], **kwargs) -> Any:
        """Create embeddings with comprehensive error handling."""
        # Validate input
        if not texts:
            raise ValueError("Input texts cannot be empty")
        if len(texts) > 1000:
            logger.warning(f"Large batch size ({len(texts)}), consider splitting")
        # Attempt request with retries
        last_exception = None
        for attempt in range(self.max_retries):
            try:
                logger.info(f"Attempt {attempt + 1}/{self.max_retries}")
                result = self.client.create_embeddings(texts, **kwargs)
                logger.info(f"Successfully processed {len(texts)} texts")
                return result
            except PulseAPIError as e:
                last_exception = e
                if e.status == 401:
                    # Authentication error
                    if "expired" in e.message.lower():
                        logger.info("Token expired, attempting refresh")
                        self.client._auth._refresh_token()
                        continue
                    else:
                        logger.error(f"Authentication failed: {e.message}")
                        raise AuthenticationError(f"Invalid credentials: {e.message}") from e
                elif e.status == 429:
                    # Rate limiting
                    retry_after = e.headers.get('retry-after', str(self.base_delay * (2 ** attempt)))
                    delay = float(retry_after)
                    logger.warning(f"Rate limited, waiting {delay}s")
                    time.sleep(delay)
                    continue
                elif 500 <= e.status < 600:
                    # Server error - retry
                    delay = self.base_delay * (2 ** attempt)
                    logger.warning(f"Server error {e.status}, retrying in {delay}s")
                    time.sleep(delay)
                    continue
                else:
                    # Client error - don't retry
                    logger.error(f"Client error {e.status}: {e.message}")
                    raise ClientError(f"Request failed: {e.message}") from e
            except (NetworkError, TimeoutError) as e:
                last_exception = e
                if attempt < self.max_retries - 1:
                    delay = self.base_delay * (2 ** attempt)
                    logger.warning(f"Network/timeout error, retrying in {delay}s: {e}")
                    time.sleep(delay)
                    continue
                else:
                    logger.error(f"Network error after {self.max_retries} attempts: {e}")
                    raise  # Re-raise the original error so a timeout is not relabeled as a network error
            except Exception as e:
                logger.error(f"Unexpected error: {e}")
                raise
        # If we get here, all retries failed
        raise Exception(f"All {self.max_retries} attempts failed. Last error: {last_exception}") from last_exception
# Custom exception classes for better error handling
class ConfigurationError(Exception):
    """Raised when there's a configuration issue."""
    pass
class AuthenticationError(Exception):
    """Raised when authentication fails."""
    pass
class ClientError(Exception):
    """Raised for client-side errors (4xx)."""
    pass
# Usage example
def main():
    try:
        client = RobustPulseClient(max_retries=5, base_delay=1.0)
        texts = ["Hello world", "How are you?", "This is a test"]
        result = client.create_embeddings_robust(texts, fast=True)
        print(f"Successfully created {len(result.embeddings)} embeddings")
    except ConfigurationError as e:
        print(f"Configuration error: {e}")
        print("Please check your environment variables and try again")
    except AuthenticationError as e:
        print(f"Authentication error: {e}")
        print("Please check your credentials and try again")
    except ClientError as e:
        print(f"Client error: {e}")
        print("Please check your request data and try again")
    except Exception as e:
        print(f"Unexpected error: {e}")
        print("Please check the logs for more details")
if __name__ == "__main__":
    main()
Debugging Tools
Use the SDK's built-in debugging tools to diagnose issues:
from pulse.core.client import CoreClient
from pulse.debug import enable_debug, get_debug_stats
# Enable comprehensive debugging
enable_debug(
log_requests=True,
log_responses=True,
log_timing=True,
log_auth_status=True
)
# Make requests and check statistics
client = CoreClient.with_client_credentials()
result = client.create_embeddings(["test"])
# Review debug statistics
stats = get_debug_stats()
print(f"Total requests: {stats.total_requests}")
print(f"Failed requests: {stats.failed_requests}")
print(f"Success rate: {stats.success_rate:.1f}%")
print(f"Average request time: {stats.average_request_time:.3f}s")
Best Practices
1. Always Use Timeouts
import httpx
from pulse.core.client import CoreClient
# Configure appropriate timeouts
timeout = httpx.Timeout(
connect=30.0, # Connection timeout
read=300.0, # Read timeout (5 minutes for large requests)
write=30.0, # Write timeout
pool=10.0 # Pool timeout
)
client = CoreClient.with_client_credentials(timeout=timeout)
2. Implement Circuit Breaker Pattern
from datetime import datetime, timedelta
class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN
    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.recovery_timeout):
                self.state = 'HALF_OPEN'
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise
    def on_success(self):
        self.failure_count = 0
        self.state = 'CLOSED'
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'
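To see the breaker's three states in action without waiting for real time to pass, the sketch below restates the pattern in compact form with an injectable clock. `SimpleBreaker` and `CircuitOpenError` are hypothetical names for this illustration and are not part of the Pulse SDK.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; call skipped")
            # Recovery timeout elapsed: let one trial call through (half-open)
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

Usage might look like `breaker.call(client.create_embeddings, texts)`. While the circuit is open, calls fail immediately with `CircuitOpenError` instead of adding load to a service that is already struggling.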
3. Log Errors Appropriately
import logging
# Configure structured logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
try:
    result = client.create_embeddings(texts)
except PulseAPIError as e:
    logger.error(
        "API error occurred",
        extra={
            'status_code': e.status,
            'error_code': e.code,
            # 'message' is a reserved LogRecord attribute; using it as an extra key raises KeyError
            'error_message': e.message,
            'request_id': e.aws_request_id,
            'url': 'create_embeddings'
        }
    )
except Exception:
    logger.exception("Unexpected error occurred")
4. Monitor Error Rates
from collections import defaultdict
import time
class ErrorMonitor:
    def __init__(self, window_size=300):  # 5 minute window
        self.window_size = window_size
        self.errors = defaultdict(list)
    def _prune(self, error_type):
        # Drop timestamps that have fallen out of the window
        cutoff = time.time() - self.window_size
        self.errors[error_type] = [t for t in self.errors[error_type] if t > cutoff]
    def record_error(self, error_type):
        self.errors[error_type].append(time.time())
        self._prune(error_type)
    def get_error_rate(self, error_type):
        self._prune(error_type)  # Without this, stale errors would inflate the rate
        return len(self.errors[error_type]) / (self.window_size / 60)  # errors per minute
    def should_alert(self, error_type, threshold=5):
        return self.get_error_rate(error_type) > threshold
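The rate arithmetic can be checked in isolation. In this sketch `errors_per_minute` is a hypothetical standalone function that takes the current time as an argument, so the result does not depend on the wall clock.

```python
from typing import Iterable


def errors_per_minute(
    timestamps: Iterable[float],
    now: float,
    window_seconds: float = 300.0,
) -> float:
    """Count the timestamps inside the trailing window and scale to a per-minute rate."""
    cutoff = now - window_seconds
    recent = sum(1 for t in timestamps if t > cutoff)
    return recent / (window_seconds / 60.0)
```

For example, 30 errors inside a 300-second window give 30 / 5 = 6 errors per minute, which exceeds the default alert threshold of 5.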
Whatever the error, check its category first, then apply the recovery strategy that matches whether it is transient or permanent.