About NYSHEX
At NYSHEX, we are modernizing global shipping logistics by creating a digital infrastructure that brings together shippers and carriers through accountability, transparency, and automation. As we continue to scale, we are embracing artificial intelligence to improve platform reliability, accelerate deployment workflows, and respond intelligently to dynamic global conditions. We’re seeking a forward-thinking Senior DevOps Engineer who is passionate about infrastructure as code, system resilience, and leveraging AI to revolutionize DevOps productivity and observability.
Role Summary
As a Senior DevOps Engineer at NYSHEX, you’ll architect and maintain mission-critical AWS infrastructure and CI/CD pipelines while integrating AI-powered automation, observability, and incident response tooling. You’ll work with cross-functional teams to create self-healing infrastructure, implement proactive alerting, and lead machine-learning-supported initiatives around predictive maintenance, cost optimization, and security compliance.
This is an ideal role for someone who excels in cloud automation and container orchestration and who thrives on staying ahead of the curve in applying AI to DevOps workflows.
Key Responsibilities
AI-Augmented Infrastructure Engineering
- Design and maintain scalable, secure AWS environments using Terraform, enriched with AI recommendations (e.g., auto-tuning of EC2 and container scaling via AI cost optimizers).
- Leverage AI-driven tools to detect infrastructure misconfigurations, unused resources, or potential performance bottlenecks proactively.
- Experiment with anomaly-detection ML models for predictive autoscaling and fault detection in Kubernetes clusters.
CI/CD Enhancement with GenAI
- Expand GitHub Actions pipelines with AI-generated workflows, including automated test coverage reports, changelog summaries, and incident tagging.
- Integrate AI into build pipelines to dynamically adjust deployment strategies (e.g., canary vs. blue-green) based on contextual metrics and past performance.
AI-Driven Monitoring & Observability
- Implement and fine-tune monitoring stacks (Datadog, CloudWatch, Prometheus) with AI-assisted alert routing and noise reduction.
- Incorporate LLM-powered incident summaries, root cause analysis, and post-mortem auto-drafting tools into incident management workflows.
- Use AI for real-time log summarization, metric anomaly correlation, and predictive alerting.
Containerization & Orchestration
- Maintain a secure, observable Kubernetes environment, automating workload placement, cost-efficient scaling, and cluster upgrades.
- Use AI to analyze Pod health and propose optimizations for CPU/memory requests, sidecar usage, and interservice communication efficiency.
Security, Compliance & Automation
- Implement infrastructure security best practices including AI-enhanced vulnerability scanning (e.g., DeepSource, Snyk with GenAI recommendations).
- Assist with SOC2 readiness by using AI to generate and track infrastructure policy documentation, compliance drift reports, and audit responses.
Documentation, Mentorship & Knowledge Sharing
- Use GenAI tools to auto-document Terraform modules, CI/CD pipelines, and incident retrospectives.
- Mentor junior engineers on integrating AI-assisted workflows into DevOps practices.
- Lead internal workshops on AI in DevOps, covering infrastructure-as-code generation, AI security scanning, and observability best practices.
AI-Focused Tooling & Environment
Languages & Infrastructure
- Terraform, Bash, Python, Docker, Kubernetes, AWS (EKS, RDS, S3, CloudFront)
- GenAI-integrated IaC tools (e.g., Amazon Q for infrastructure suggestions, ChatGPT for module documentation)
CI/CD & Observability
- GitHub Actions + AI summaries (e.g., PR summaries, AI-labeled deployments)
- Datadog + LLM summarization for logs, CloudWatch Insights with GenAI patterns
- Sentry, Prometheus, Grafana with predictive alerting models
AI Productivity Tools
- GitHub Copilot, Amazon CodeWhisperer, Cursor, ChatGPT CLI
- Tools for auto-generating compliance documentation, pipeline configs, and runbooks
Qualifications
Required
- 7+ years in DevOps, SRE, or infrastructure engineering roles
- 2+ years managing AWS infrastructure with Terraform
- Experience with Kubernetes and CI/CD tooling (preferably GitHub Actions)
- Strong scripting skills in Python and Bash
- Familiarity with AI-enabled DevOps tools (e.g., Copilot for Terraform, log summarizers)
- Experience mentoring engineers and advocating automation best practices
Preferred
- Experience integrating GenAI into infrastructure workflows (e.g., LLM incident summaries)
- Familiarity with anomaly detection in metrics/logs using ML models
- Experience securing and deploying AI endpoints (e.g., OpenAI, Bedrock) in cloud environments
- Prior exposure to policy-as-code (OPA, AWS SCP) with AI-authored rules
What We Offer
- Unlimited PTO and flexible hybrid work culture
- Health & wellness benefits, including mental health and family planning support
- Annual offsites for team bonding and innovation
- Continuous learning opportunities with conferences, AI workshops, and internal demos
- Cutting-edge infrastructure where you can shape the future of AI-integrated DevOps