omkar-rsawant/README.md

👋 Hey, I'm Omkar Sawant

AI Infrastructure Engineer · GPU/AI Infrastructure · Linux Engineer · Ansible Automation Specialist

LinkedIn · Email · Location


🧠 About Me

"Automate everything. Manual is the enemy of scale."

I'm an AI Infrastructure Engineer with 18+ years of hands-on experience designing and operating enterprise-grade, mission-critical compute infrastructure. My focus is GPU/AI infrastructure — KVM/QEMU GPU virtualization, NUMA-aware and low-latency tuning, SR-IOV, NCCL-validated multi-node GPU fabrics, and Kubernetes/OpenShift orchestration — backed by deep Linux, Ansible/Terraform automation, and hybrid cloud experience.

  • 🔧 Currently building AI/GPU infrastructure solutions at Ali Bin Ali Technology Solutions (ABATS)
  • 🚀 Engineered KVM/QEMU/libvirt infrastructure for production GPU/AI workloads (NUMA, hugepages, virtio/SR-IOV)
  • 🔬 Validated multi-node GPU fabric throughput with NCCL benchmarks and NUMA-aware tuning
  • 📦 Built CI/CD-driven, reproducible OS image pipelines for immutable, versioned node rollouts
  • ⚙️ Reduced task completion time by 40% through Ansible-driven automation
  • 💸 Cut infrastructure costs by 30% via strategic VM migration (Hyper-V/VMware → KVM)
  • 📉 Slashed deployment time from 120 min → 20 min using Red Hat Satellite
  • ✅ Maintained 100% SLA compliance across all delivered projects

🛠️ Tech Stack & Expertise

🤖 GPU & AI Infrastructure

NVIDIA · SR-IOV · NUMA · NCCL · Deep Learning

🐧 Linux Administration

RHEL · Oracle Linux · HP-UX · Ubuntu · CentOS

⚙️ Automation & Infrastructure-as-Code

Ansible · Terraform · Red Hat Satellite · GitHub Actions · Bash · Python

☁️ Cloud, Virtualization & Orchestration

Azure · AWS · OCI · Kubernetes · OpenShift · KVM · QEMU · VMware

📊 Monitoring & Observability

Prometheus · Grafana · Splunk · Zabbix · Nagios

🔐 Certifications

OCI Architect · OCI Foundation · Azure Network Engineer · Azure Admin · VMware vSphere · ITIL


📂 Featured Repositories

🔨 What I build and share — GPU/AI infrastructure, Ansible roles, image pipelines, and infrastructure-as-code

🤖 GPU / AI Infrastructure

| Repository | Description | Tech |
| --- | --- | --- |
| 🔬 gpu-nccl-benchmarking | Multi-node GPU fabric deployment & NCCL throughput validation on KVM/QEMU | KVM, NCCL, NUMA |
| kvm-gpu-passthrough | GPU passthrough / vGPU setup with SR-IOV, hugepages & NUMA tuning | KVM, QEMU, libvirt |
| 🧱 k8s-hardened-node-images | Minimal, CIS-hardened Debian/Garden Linux images for Kubernetes nodes | dracut, systemd-boot, CI/CD |
| 📈 gpu-host-observability | Prometheus/Grafana dashboards for GPU host & AI workload monitoring | Prometheus, Grafana |

🔁 Ansible Roles & Playbooks

| Repository | Description | Tech |
| --- | --- | --- |
| 🗂️ ansible-linux-baseline | Hardening & baseline configuration for RHEL/Oracle Linux | Ansible, RHEL |
| 🔄 ansible-patch-management | Automated OS patching with pre/post health checks | Ansible, Satellite |
| 📦 ansible-satellite-lifecycle | Lifecycle workflows using Red Hat Satellite 6.x | Ansible, Satellite |
| 🛡️ ansible-splunk-deploy | Deploy & configure Splunk forwarders, indexers, search heads | Ansible, Splunk |
| ☁️ ansible-azure-infra | Provision and configure Azure resources with Ansible | Ansible, Azure |
| 🖥️ ansible-kvm-migration | Automate P2V and V2V migrations to KVM/RHEV | Ansible, KVM |

🐧 Linux Admin Scripts

| Repository | Description | Tech |
| --- | --- | --- |
| 🔍 linux-health-check | Comprehensive server health audit scripts | Bash, Python |
| 📋 openshift-node-inspector | OpenShift node log & health diagnostics | Bash, OCP |
| 💾 hpux-ignite-automation | HP-UX Ignite backup & restore automation | Bash, HP-UX |
| 🔐 linux-security-audit | CIS benchmark compliance checker for Linux | Bash, Python |

📈 Infrastructure Impact — By the Numbers

┌─────────────────────────────────────────────────────────────────┐
│                  GPU / AI INFRA & AUTOMATION WINS              │
├────────────────────────────────────┬────────────────────────────┤
│  GPU Fabric Throughput             │  NCCL-validated        🔬  │
│  Node Rollout (CI/CD image pipe)   │  Immutable & versioned 📦  │
│  Deployment Time (RH Satellite)    │  120 min  ──►  20 min  🚀  │
│  Task Completion Time (Ansible)    │  Reduced by 40%        ⚡  │
│  Infra Cost (KVM Migration)        │  Saved 30%             💰  │
│  SLA Compliance                    │  100%                  ✅  │
│  Operational Performance Gain      │  +30%                  📈  │
└────────────────────────────────────┴────────────────────────────┘

🔬 Core AI Infrastructure Expertise

# What I build for GPU/AI infrastructure at scale
---
omkar_ai_infra_skills:
  gpu_ai:
    - KVM/QEMU GPU virtualization (passthrough & vGPU)
    - NUMA alignment, hugepages & low-latency tuning
    - SR-IOV & virtio for GPU-intensive workloads
    - NCCL multi-node fabric benchmarking
    - AI / Deep Learning workload provisioning

  orchestration:
    - Kubernetes / OpenShift cluster operations
    - GPU node provisioning & resource management
    - LXC/LXD container management

  automation_iac:
    - Ansible configuration management at scale
    - Terraform infrastructure-as-code
    - Reproducible CI/CD OS image pipelines (cloud-init, dracut, systemd-boot)
    - Zero-touch server provisioning

  observability:
    - Prometheus / Grafana GPU host & workload dashboards
    - RCA (P1/P2), runbooks & security compliance

  integrations:
    - Red Hat Satellite 6.10 / Oracle Linux Manager
    - GitHub Actions
    - Azure / AWS / OCI
    - OpenShift / OCP clusters

🏢 Career Timeline

2007 ──► Embee Software          │  HP-UX & Windows L2 Support
2010 ──► AtoS India              │  Linux/Windows L3 Admin (7.5 yrs)
2018 ──► Vyom Labs               │  Automation & OpenShift Lead
2019 ──► KBC Technologies (QA)   │  Sr. Sysadmin @ Ooredoo Telecom
2020 ──► EBLA Consultancy        │  Sr. Sysadmin & Backup Admin
2022 ──► ABATS (Present)         │  AI Infrastructure Engineer ◄── NOW

📬 Get In Touch

I'm always open to discussions around GPU/AI infrastructure, Linux automation, Ansible best practices, hybrid cloud architecture, or infrastructure optimization.


"Infrastructure should work like clockwork — silent, reliable, and automated."
