Business Background
E-Çözüm Information Technology is a Turkish provider of integration solutions and digital transformation services, serving medium and large enterprises in the finance, manufacturing, and logistics sectors. The company specializes in system integration, critical-infrastructure monitoring, and the digitalization of maintenance processes, supporting numerous organizations on their digital transformation journeys.
Business Challenge
E-Çözüm faced a critical operational challenge in managing and responding to high volumes of system alerts from their diverse client base. The primary issues included:
– Delayed response to critical alerts, especially during off-hours
– Manual processing of hundreds of varied-format email notifications
– Inability to effectively prioritize and filter alerts based on severity
– Need for real-time Turkish language processing and interpretation
– Lack of automated solution suggestions for technical issues
Technical Challenge
The technical team needed to develop a solution that could:
– Process and analyze Turkish-language alerts in real time
– Accurately classify alert severity levels
– Automatically notify on-call personnel for high-priority issues
– Generate contextual solution suggestions
– Scale to handle hundreds of concurrent alerts
– Maintain low latency while ensuring high accuracy
Solution Architecture
E-Çözüm implemented the Callie solution by transitioning from an initial Bedrock-only deployment to a hybrid AI/ML architecture in which GPU-based EC2 instances host customized NLP models for real-time inference. Amazon Bedrock and SageMaker are used selectively for training, fine-tuning, and retrieval-augmented generation (RAG), while production inference runs entirely on dedicated GPU infrastructure for low-latency, high-accuracy processing. The updated architecture includes:
1. Core Components:
• GPU-based EC2 instances hosting custom Turkish NLP models for real-time alert processing, severity classification, and speech synthesis/recognition.
• Analysis/Orchestration Engine acting as the normalization and orchestration layer: it parses incoming email alerts, structures the data, and coordinates calls to the inference models.
• Amazon SageMaker for training and fine-tuning domain-specific models, and for embedding generation jobs.
• Amazon Bedrock for RAG with historical alert data stored in S3 and OpenSearch, enhancing contextual solution suggestions.
• Amazon OpenSearch for indexing, searching, and retrieving historical alerts and resolutions.
• ElastiCache (Redis Cluster) for caching inference results and accelerating repeated queries.
• Relational Database (Multi-AZ) to store alert history, severity scores, and resolution records.
• Amazon SNS for distributing classified alert notifications to multiple subscribers, including IVR systems and administrators.
2. Advanced Features:
• Preprocessing with orchestration: The Analysis Engine ensures that incoming unstructured email alerts are normalized and securely routed to NLP models.
• Dedicated GPU hosting guarantees low-latency inference for NLP, severity classification, and STT/TTS modules.
• Automated notification system: Severity outputs are published to SNS, enabling fan-out to multiple consumers (Genesys/Solveline IVR, SMS, dashboards, email alerts).
• Contextual enrichment via RAG: Bedrock integrates with OpenSearch and historical S3 data to provide solution suggestions aligned with past incidents.
• Security & compliance: WAF (with OWASP Top 10 rules), ACM for SSL, Secrets Manager, and KMS encryption ensure enterprise-grade security.
• Monitoring & observability: CloudWatch, CloudTrail, and log archival provide complete visibility and auditability.
3. Implementation Details:
• Inference pipeline runs entirely on GPU-based EC2 instances, isolated within private subnets.
• Analysis/Orchestration Engine sits behind an internet-facing ALB and acts as the secure entry point for email alerts.
• Severity classification results are pushed to SNS, enabling downstream actions such as IVR calls, SMS, or ticket creation.
• Model lifecycle management is handled in SageMaker, with Bedrock supporting retrieval-augmented scenarios.
• Data handling leverages S3 for training datasets, model artifacts, and log archives, while ECR hosts containerized components for scalable deployments.
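As a rough illustration of the SNS fan-out described above, the snippet below sketches how a subscription filter policy could restrict the IVR subscriber to CRITICAL alerts while unfiltered subscribers (dashboards, email) receive everything. The topic and policy values are illustrative, and `matches_policy` is a simplified local model of SNS attribute filtering, not the AWS API itself.

```python
import json

# Subscription filter policy (illustrative): a subscriber attached with this
# policy only receives messages whose "severity" message attribute is CRITICAL;
# subscribers without a policy receive all messages.
ivr_filter_policy = {"severity": ["CRITICAL"]}

def matches_policy(message_attributes: dict, policy: dict) -> bool:
    """Simplified local model of SNS attribute filtering: every policy key must
    be present in the message attributes with one of the allowed values."""
    return all(
        message_attributes.get(key, {}).get("StringValue") in allowed
        for key, allowed in policy.items()
    )

critical = {"severity": {"DataType": "String", "StringValue": "CRITICAL"}}
info = {"severity": {"DataType": "String", "StringValue": "INFO"}}
print(json.dumps(ivr_filter_policy))                # policy as stored in AWS
print(matches_policy(critical, ivr_filter_policy))  # True  -> IVR is called
print(matches_policy(info, ivr_filter_policy))      # False -> IVR is skipped
```

In the real system the policy would be attached with `sns.subscribe(..., Attributes={"FilterPolicy": json.dumps(ivr_filter_policy)})`, and SNS performs this matching server-side.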
Technical Implementation
The system processes alerts through multiple stages:
1. Email Scanner & Preprocessing
• Incoming alerts from the Email Monitoring Center are routed through an Application Load Balancer.
• The Analysis/Preprocessing Engine parses and normalizes emails into structured JSON.
• Alerts are securely forwarded to GPU-based NLP instances for real-time processing.
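A minimal sketch of this normalization step, assuming alerts arrive as raw RFC 822 email text; the JSON field names are illustrative, not E-Çözüm's actual schema.

```python
import email
import json
from email import policy

def normalize_alert(raw_email: str) -> dict:
    """Parse a raw alert email into the structured JSON the NLP models consume."""
    msg = email.message_from_string(raw_email, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "source": msg["From"],
        "subject": msg["Subject"],
        "received_at": msg["Date"],
        "body": body.get_content().strip() if body else "",
    }

raw = """\
From: monitor@client.example.com
Subject: DISK USAGE CRITICAL - srv-db-01
Date: Mon, 01 Jan 2024 03:12:45 +0300

Disk /var on srv-db-01 is at 97% capacity.
"""
alert = normalize_alert(raw)
print(json.dumps(alert, indent=2, ensure_ascii=False))
```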
2. Severity Classification
• Custom NLP models running on GPU-based EC2 instances classify the severity of each alert.
• Outputs are published to Amazon SNS, which distributes notifications to subscribed endpoints.
• Critical alerts automatically trigger outbound IVR calls via Genesys/Solveline, ensuring immediate response.
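The contract of this stage can be sketched as follows. The keyword matcher is only a stand-in for the GPU-hosted Turkish NLP classifier, and the topic ARN and message-attribute names are hypothetical; in production the resulting payload would be passed to `sns.publish()` via boto3.

```python
import json

def classify_severity(alert_text: str) -> str:
    """Stand-in for the GPU-hosted Turkish NLP classifier (keyword heuristic)."""
    text = alert_text.lower()
    if any(k in text for k in ("critical", "kritik", "down", "failure")):
        return "CRITICAL"
    if any(k in text for k in ("warning", "uyari", "degraded")):
        return "WARNING"
    return "INFO"

def build_sns_message(alert: dict, severity: str) -> dict:
    """Build the keyword arguments for sns.publish(); subscription filter
    policies can then match on the 'severity' message attribute."""
    return {
        "TopicArn": "arn:aws:sns:eu-west-1:123456789012:callie-alerts",  # hypothetical
        "Message": json.dumps({"alert": alert, "severity": severity}, ensure_ascii=False),
        "MessageAttributes": {
            "severity": {"DataType": "String", "StringValue": severity},
        },
    }

severity = classify_severity("Disk /var on srv-db-01 is at 97% capacity - CRITICAL")
msg = build_sns_message({"subject": "DISK USAGE CRITICAL - srv-db-01"}, severity)
# In production: boto3.client("sns").publish(**msg)
```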
3. Analysis Engine Orchestration
• Coordinates severity results with RAG queries via Bedrock and OpenSearch.
• Enriches alerts with contextual solution suggestions drawn from historical resolution databases.
• Caches repeated queries in Redis to minimize latency.
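The caching described here follows the cache-aside pattern, sketched below with a plain dict standing in for the ElastiCache Redis cluster and a stubbed RAG backend; the key scheme and TTL are assumptions.

```python
import hashlib

def suggestion_key(alert_text: str) -> str:
    """Deterministic cache key derived from the alert text."""
    return "suggestion:" + hashlib.sha256(alert_text.encode("utf-8")).hexdigest()

def get_suggestion(alert_text, cache, rag_lookup):
    """Cache-aside: return a cached suggestion if present, otherwise run the
    (expensive) RAG query and store the result for subsequent identical alerts."""
    key = suggestion_key(alert_text)
    cached = cache.get(key)
    if cached is not None:
        return cached
    suggestion = rag_lookup(alert_text)  # Bedrock + OpenSearch in production
    cache[key] = suggestion              # redis-py: cache.set(key, suggestion, ex=3600)
    return suggestion

# Demo with a dict standing in for Redis and a stubbed RAG backend:
rag_calls = []
def fake_rag(text):
    rag_calls.append(text)
    return "Clear temp files and extend the /var volume."

cache = {}
first = get_suggestion("Disk /var at 97% on srv-db-01", cache, fake_rag)
second = get_suggestion("Disk /var at 97% on srv-db-01", cache, fake_rag)  # cache hit
```

Only the first call reaches the RAG backend; the second identical alert is served from the cache, which is what keeps latency low for repeated queries.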
4. Notification & Resolution Tracking
• SNS notifies multiple downstream systems, enabling email, voice, or SMS delivery.
• Logs of all notifications and responses are stored for compliance and audit purposes.
• The relational database maintains complete alert histories and response outcomes.
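A sketch of the resolution-tracking store, using in-memory SQLite in place of the Multi-AZ relational database; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE alert_history (
        id INTEGER PRIMARY KEY,
        subject TEXT NOT NULL,
        severity TEXT NOT NULL,
        notified_at TEXT NOT NULL,
        resolution TEXT
    )
""")

def record_alert(conn, subject, severity, notified_at):
    """Persist a classified alert; returns its id for later resolution updates."""
    cur = conn.execute(
        "INSERT INTO alert_history (subject, severity, notified_at) VALUES (?, ?, ?)",
        (subject, severity, notified_at),
    )
    conn.commit()
    return cur.lastrowid

def record_resolution(conn, alert_id, resolution):
    """Attach the response outcome to a previously recorded alert."""
    conn.execute("UPDATE alert_history SET resolution = ? WHERE id = ?",
                 (resolution, alert_id))
    conn.commit()

alert_id = record_alert(conn, "DISK USAGE CRITICAL - srv-db-01", "CRITICAL",
                        "2024-01-01T03:12:45+03:00")
record_resolution(conn, alert_id, "Extended /var volume; usage at 41%.")
row = conn.execute("SELECT severity, resolution FROM alert_history WHERE id = ?",
                   (alert_id,)).fetchone()
```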
Results and Benefits
The updated architecture delivered measurable improvements in both operational efficiency and business outcomes:
Operational Metrics:
• Latency reduction: Direct inference on GPU-based EC2 instances reduced alert classification latency by over 70% compared to the initial Bedrock-only deployment.
• High reliability: SNS-based fan-out ensured 100% delivery to subscribed endpoints, eliminating missed critical alerts.
• Faster response times: Average notification time decreased from 180 seconds to under 15 seconds (91.7% improvement).
• Improved accuracy: Custom-trained NLP models optimized for Turkish context achieved higher severity classification accuracy than general-purpose foundation models.
Business Impact:
• Enhanced SLA compliance through faster and more reliable incident handling.
• Improved customer satisfaction driven by rapid response and proactive resolution suggestions.
• Reduced support workload by automating parsing, classification, and contextual solution delivery.
• Future extensibility: SNS enables easy integration with new communication channels or incident management platforms.
The Callie system now supports dozens of enterprise clients, managing thousands of notifications monthly. By shifting inference to custom GPU-based NLP models while leveraging Amazon Bedrock and SageMaker for training and contextual enrichment, E-Çözüm achieved the optimal balance between performance, scalability, and flexibility.
This case study highlights how hybrid AI/ML architectures—combining custom model hosting with AWS managed services—can deliver transformative results in mission-critical operations.
