This job is now closed
Job Description
- Req#: 324828
Overview
As an Observability Engineer, you will be responsible for designing, implementing, and maintaining observability solutions that provide actionable insights into the health and performance of our key AI platforms, applications, and infrastructure. You will work closely with development, operations, and security teams to ensure comprehensive monitoring, logging, and alerting across our technology stack.
Responsibilities
- Design and maintain observability frameworks, tools, and dashboards to monitor system performance, availability, and reliability.
- Implement metrics collection, logging, and tracing solutions using industry-standard tools such as Prometheus, Grafana, the ELK stack, and Jaeger, and frameworks such as OpenTelemetry.
- Use AI and ML observability tools such as WhyLabs, UpTrain, and others to ensure comprehensive coverage.
- Set up and configure alerts to proactively detect and respond to system issues.
- Work with teams to define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical applications and services.
- Implement and manage Retrieval-Augmented Generation (RAG) assessment tools.
- Collaborate with development and operations teams to integrate observability solutions into the CI/CD pipeline.
- Ensure observability solutions align with security and compliance requirements.
- Analyze and troubleshoot performance bottlenecks and system issues.
- Provide actionable insights to improve system performance and reliability.
- Use tools such as Logs GPT, Giskard, babyagi, and AutoGPT to enhance observability.
- Document observability standards, practices, and procedures.
- Train and mentor team members on observability tools and best practices.
Qualifications
Education and Experience:
- Bachelor’s or master’s degree in Computer Science, Artificial Intelligence, or a related field.
- At least 5 years of professional experience in AI and machine learning.
- Proven experience developing and deploying generative AI models in a professional setting.
- Previous experience in a consumer goods company or a related industry is a plus.
Required Skills and Qualifications:
- Advanced programming knowledge in Python, with a deep understanding of its libraries and frameworks.
- Proficiency with observability tools such as Prometheus, Grafana, the ELK stack, Jaeger, WhyLabs, and UpTrain.
- Strong understanding of distributed systems, microservices architecture, and cloud-native technologies.
- Experience with containerization and orchestration tools such as Docker and Kubernetes (AKS).
- Expertise in Azure OpenAI and Azure OpenAI Studio.
- Strong API development skills and experience integrating AI models into applications.
- Strong problem-solving and analytical skills.
- Excellent communication skills to collaborate effectively with cross-functional teams.
- Understanding of ethical considerations in AI, with a focus on transparency and fairness.
Preferred Skills:
- Knowledge of microservices architecture and RESTful APIs.
- Familiarity with DevOps practices and CI/CD pipelines on Azure.
- Understanding of agile methodologies and Azure DevOps (ADO).
About the company
PepsiCo, Inc. is an American multinational food, snack, and beverage corporation headquartered in Harrison, New York, in the hamlet of Purchase. PepsiCo's business encompasses all aspects of the food and beverage market. It oversees the manufacturing, distribution, and marketing of its products.