PepsiCo

Architect - AI Observability


Pay: Competitive
Location: Hyderabad, Telangana
Employment type: Full-Time

This job is now closed

  • Job Description

      Req#: 324828
      Overview

      As an Observability Engineer, you will be responsible for designing, implementing, and maintaining observability solutions that provide actionable insights into the health and performance of our key AI platforms, applications, and infrastructure. You will work closely with development, operations, and security teams to ensure comprehensive monitoring, logging, and alerting across our technology stack.


      Responsibilities

      • Design and maintain observability frameworks, tools, and dashboards to monitor system performance, availability, and reliability.
      • Implement metrics collection, logging, and tracing solutions using industry-standard tools such as Prometheus, Grafana, the ELK stack, and Jaeger, and frameworks such as OpenTelemetry.
      • Use AI and ML observability tools such as WhyLabs and UpTrain to ensure comprehensive coverage.
      • Set up and configure alerts to proactively detect and respond to system issues.
      • Work with teams to define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical applications and services.
      • Implement and manage RAG (Retrieval-Augmented Generation) assessment tools.
      • Collaborate with development and operations teams to integrate observability solutions into the CI/CD pipeline.
      • Ensure observability solutions are aligned with security and compliance requirements.
      • Analyze and troubleshoot performance bottlenecks and system issues.
      • Provide actionable insights to improve system performance and reliability.
      • Use tools such as Logs GPT, Giskard, BabyAGI, and AutoGPT to enhance observability.
      • Document observability standards, practices, and procedures.
      • Train and mentor team members on observability tools and best practices.

      Qualifications

      Education and Experience:

      • Bachelor’s or master’s degree in Computer Science, Artificial Intelligence, or a related field.
      • At least 5 years of professional experience in AI and machine learning.
      • Proven experience in developing and deploying generative AI models in a professional setting.
      • Previous experience in a consumer goods company or a related industry is a plus.

      Required Skills and Qualifications:

      • Advanced programming knowledge in Python, with a deep understanding of its libraries and frameworks.
      • Proficiency with observability tools such as Prometheus, Grafana, the ELK stack, Jaeger, WhyLabs, and UpTrain.
      • Strong understanding of distributed systems, microservices architecture, and cloud-native technologies.
      • Experience with containerization and orchestration tools like Docker and Kubernetes (AKS).
      • Expertise in Azure OpenAI and Azure OpenAI Studio.
      • Strong API development skills and experience with integrating AI models into applications.
      • Strong problem-solving and analytical skills.
      • Excellent communication skills to collaborate effectively with cross-functional teams.
      • Understanding of ethical considerations in AI, focusing on transparency and fairness.

      Preferred Skills:

      • Knowledge of microservices architecture and RESTful APIs.
      • Familiarity with DevOps practices and CI/CD pipelines on Azure.
      • Understanding of agile methodologies and ADO (Azure DevOps).


  • About the company

      PepsiCo, Inc. is an American multinational food, snack, and beverage corporation headquartered in Harrison, New York, in the hamlet of Purchase. PepsiCo's business encompasses all aspects of the food and beverage market. It oversees the manufacturing, distribution, and marketing of its products.