Hamza Shaikh

Backend and infrastructure engineer.
I keep production systems observable,
quick to recover, and cheap to run.

In practice that's validator telemetry, Prometheus alerting, CI/CD, and the production cloud underneath them, usually with me as the only person on call.

AWARD · 2026

3rd · Superteam Germany Solana Ideathon

LIVE SYSTEM

SentinelSOL · Solana validator observability

M.SC.

Distributed Systems · RPTU Kaiserslautern

PRODUCTION

Sole DevOps owner · −25% cloud OpEx · live 3+ yrs

Frankfurt, DE · Dammam, KSA · Mumbai, IND
SIGNAL · available for infrastructure roles in Germany, Saudi Arabia, or remote
OBSERVE · Selected Projects
DAEMON

SentinelSOL

live

Out-of-band observability for Solana/Jito validators.

3rd place, Superteam Germany / neosfer Ideathon, Frankfurt 2026. Prize: $250 USD.

Live System · Source Code

Go daemons collect ShredStream latency and vote-credit velocity. PromQL rules score each metric as a Z-score against a 72-hour rolling baseline and alert at 3 sigma. Alertmanager routes the page to Telegram before the validator goes delinquent. No sidecar, no cloud dependency.
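A minimal sketch of the Z-score step, written in Python for brevity rather than the Go and PromQL the project uses. The class name, the `min_samples` warm-up, and the sample-count window are assumptions for illustration, not SentinelSOL's actual rule.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScore:
    """Score each new sample against a rolling window of earlier samples.

    Hypothetical helper: the window is a sample count here, standing in
    for a time-based rolling baseline.
    """

    def __init__(self, window_size: int, threshold: float = 3.0,
                 min_samples: int = 5) -> None:
        self.window = deque(maxlen=window_size)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, value: float):
        """Return (z_score, is_anomaly), then fold value into the baseline."""
        if len(self.window) < self.min_samples:
            # Too little history to trust a deviation yet.
            self.window.append(value)
            return 0.0, False
        mu = mean(self.window)
        sigma = pstdev(self.window)
        # A perfectly flat baseline has no defined Z-score; report 0.
        z = 0.0 if sigma == 0 else (value - mu) / sigma
        self.window.append(value)
        return z, abs(z) >= self.threshold


detector = RollingZScore(window_size=100)
for credits in [10, 11, 9, 10, 11, 9, 10, 10]:
    detector.observe(credits)
z, anomaly = detector.observe(2.0)  # a sharp drop in accrual
print(round(z, 2), anomaly)
```

Scoring the sample before adding it to the window keeps the anomaly from diluting its own baseline.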

SentinelSOL Telemetry
LIVE GRAPH
SentinelSOL telemetry dashboard showing node slot progression, vote credits, ShredStream ingestion latency, and Jito block engine bundle metrics.
Z-score anomaly window · OOB, zero hot-path impact · 5s refresh cadence
Architecture Overview
PIPELINE
Jito-Solana RPC
Go OOB daemon
Prometheus / PromQL
Alertmanager → Telegram
Predictive

ShredStream latency and vote-credit velocity move before delinquency.

Out-of-band

Daemon polls independently, preserving validator hot-path resources.

Revenue-aware

Jito bundle acceptance becomes an operating signal, not an epoch post-mortem.

Incident Replay
NOMINAL · T+00:00
  1. T+00:00
    Vote Credit Velocity · BASELINE

    Per-slot vote-credit accrual tracked against a 72h rolling mean. Nominal.

  2. T+00:06
    Deviation Detected · DETECT

    Accrual rate drifts below the mean; earnings slow before a single vote is missed.

  3. T+00:12
    Threshold Breached · DETECT

    Z-score crosses 3σ against the rolling baseline. Noise filtered out; this is real.

  4. T+01:12
    PromQL Evaluation · DETECT

    Recording rule feeds the alert expression; it stays true across the for: 1m window.

  5. T+01:13
    Alertmanager Triggered · ALERT

    Grouped, deduplicated, routed by severity. One page, no flapping.

  6. T+01:14
    Telegram Notification · ALERT

    Operator paged in-channel with validator identity, the metric, and its current value.

  7. T+02:00
    Operator Investigation · RESPOND

    Check Jito block-engine bundle acceptance and peer set; correlate with the epoch boundary.

  8. T+06:30
    Resolution · RESOLVED

    Peer set reconnected before delinquency. Credit velocity re-stabilises; 0 slots lost.

1m13s detect to page (automated) · ~5m page to resolve (operator) · 0 slots delinquent
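The one-minute gap between the threshold breach (T+00:12) and the PromQL evaluation step (T+01:12) comes from the `for: 1m` hold. A simplified Python model of that hold follows, assuming one rule evaluation per second. The function name and the evaluation interval are hypothetical, and the real logic lives in Prometheus, not in project code.

```python
from typing import Iterable, Optional, Tuple


def first_firing_time(samples: Iterable[Tuple[float, bool]],
                      hold_seconds: float) -> Optional[float]:
    """Return the timestamp at which the alert fires, or None.

    samples: (timestamp_seconds, condition_is_true) pairs in time order,
    one per rule evaluation.
    """
    pending_since = None
    for ts, breached in samples:
        if not breached:
            pending_since = None  # condition cleared: reset the hold timer
            continue
        if pending_since is None:
            pending_since = ts  # first breaching evaluation: go pending
        if ts - pending_since >= hold_seconds:
            return ts  # breached continuously for the whole hold
    return None


# Replay: the Z-score condition turns true at T+00:12 and stays true.
replay = [(t, t >= 12) for t in range(120)]
print(first_firing_time(replay, hold_seconds=60))  # 72, i.e. T+01:12
```

A condition that clears inside the hold resets the timer, which is what keeps a brief spike from paging anyone.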

Tradeoffs

  • Out-of-band monitoring over sidecar injection to avoid validator resource contention
  • 3-sigma threshold tuned empirically against 72-hour rolling baseline
  • Telegram over PagerDuty for cost and latency in a crypto-ops context
Go · Prometheus · PromQL · Alertmanager · Telegram API · Docker Compose
NODE

AutoSRE

active

FastAPI service instrumented with RED metrics and threshold-driven recovery.

Live System · Source Code

RED metrics via the Prometheus client against a 50ms p95 latency budget. Alerting thresholds drive automated recovery for stateless services. FastAPI serves the API, Docker Compose runs the stack, GitHub Actions handles CI/CD.
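As a rough illustration of the budget check, the sketch below computes a nearest-rank p95 over raw latency samples and compares it with the 50ms budget. Both function names are hypothetical. It also assumes raw samples are available; in a Prometheus setup the quantile would normally be estimated from histogram buckets instead.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def breaches_budget(latencies_ms, budget_ms=50.0):
    """True when the p95 of the samples exceeds the latency budget."""
    return p95(latencies_ms) > budget_ms


# 6% of requests are slow: the p95 lands on a slow one.
print(breaches_budget([20.0] * 94 + [80.0] * 6))  # True
# 4% of requests are slow: the p95 stays on a fast one.
print(breaches_budget([20.0] * 96 + [80.0] * 4))  # False
```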

AutoSRE · Architecture
DATA FLOW · Metric
  1. Metric

    RED metrics on every request, exposed through the Prometheus client for scraping and held to a 50ms p95 budget.

  2. Alert

    The p95 budget is breached; the alert fires and Alertmanager groups and deduplicates it by severity.

  3. Automation

    The alert drives a recovery playbook instead of paging a human.

  4. Recovery

    The stateless service restarts under a concurrency cap and a circuit breaker.

  5. Healthy

    Liveness and readiness pass; the service rejoins rotation. No one woken.
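The recovery step can be sketched as a small guard that refuses to restart when too many restarts are already running or when recent attempts keep failing. This is a single-threaded illustration under assumed names (`RecoveryGuard`, `try_restart`) and assumed limits, not AutoSRE's actual playbook.

```python
class RecoveryGuard:
    """Gate automated restarts behind a concurrency cap and a circuit breaker.

    Hypothetical sketch: not thread-safe, and the breaker stays open once
    tripped. A fuller version would add a lock and a cool-down.
    """

    def __init__(self, max_concurrent=1, max_failures=3):
        self.max_concurrent = max_concurrent
        self.max_failures = max_failures
        self.in_flight = 0
        self.consecutive_failures = 0

    @property
    def breaker_open(self):
        return self.consecutive_failures >= self.max_failures

    def try_restart(self, restart, is_healthy):
        """Attempt one restart; return 'recovered', 'failed', or 'skipped'."""
        if self.breaker_open or self.in_flight >= self.max_concurrent:
            return "skipped"  # leave it to a human
        self.in_flight += 1
        try:
            restart()
            healthy = is_healthy()  # stand-in for liveness and readiness
        except Exception:
            healthy = False
        finally:
            self.in_flight -= 1
        if healthy:
            self.consecutive_failures = 0
            return "recovered"
        self.consecutive_failures += 1
        return "failed"


guard = RecoveryGuard(max_failures=2)
print(guard.try_restart(lambda: None, lambda: False))  # failed
print(guard.try_restart(lambda: None, lambda: False))  # failed
print(guard.try_restart(lambda: None, lambda: True))   # skipped
```

Once the breaker is open, the guard stops restarting, so a service that cannot recover is not restarted in a loop.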

50ms p95 latency budget · RED: rate, errors, duration · CI/CD: GitHub Actions gated

Tradeoffs

  • FastAPI over Flask for async support and OpenAPI generation
  • 50ms p95 budget chosen based on downstream service SLOs
  • Automated recovery limited to stateless services to avoid data corruption
Python · FastAPI · Prometheus · Docker · Docker Compose · GitHub Actions
VECTOR

GridCast

shipped

LSTM-based regional temperature forecasting system.

IEEE ICSPCRE 2024 · Paper ID 652 (submitted)

Live System

28 LSTM models across 7 regions × 4 IMD seasons. 70 years of daily max temperature grids (1951–2021). 14-step input window, 7-step forecast. Optuna TPE per-subset hyperparameter search.
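To make the windowing concrete, here is a minimal sketch of how a daily series could be cut into 14-step inputs and 7-step targets, and how 7 regions × 4 seasons gives 28 model slots. The function name, the stride of 1, and the integer region and season indices are assumptions for illustration. The actual GridCast preprocessing may differ.

```python
def make_windows(series, input_steps=14, forecast_steps=7):
    """Split one daily series into (input, target) pairs with a stride of 1."""
    pairs = []
    last_start = len(series) - input_steps - forecast_steps
    for start in range(last_start + 1):
        split = start + input_steps
        pairs.append((series[start:split],
                      series[split:split + forecast_steps]))
    return pairs


# 30 days of placeholder values yield 10 training pairs.
pairs = make_windows(list(range(30)))
print(len(pairs))                          # 10
print(len(pairs[0][0]), len(pairs[0][1]))  # 14 7

# One model slot per (region, season) combination.
model_keys = [(region, season) for region in range(7) for season in range(4)]
print(len(model_keys))                     # 28
```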

GridCast · Architecture
DATA FLOW · Dataset
  1. Dataset

    70 years of IMD daily-max temperature grids, 1951 to 2021, partitioned by region.

  2. Training

    28 LSTM models, one per region and IMD season, tuned with Optuna TPE per subset.

  3. Forecast

    A 14-step input window predicts a 7-day temperature horizon per cell.

  4. Output

    A Flask inference API serves a React and D3 heatwave map.

28 region × season models · 70 yrs daily temp grids · 7-day forecast horizon

Tradeoffs

  • Region × season model partitioning over single global model for interpretability
  • 14-step input window balances temporal context vs. computational cost
  • Optuna TPE over grid search for efficient hyperparameter exploration
Python · TensorFlow · Keras · LSTM · Flask · React.js · Optuna · GeoPandas
TRACE · Experience

Patil Kaki · Shark Tank India B2C Startup

DevOps Intern, sole infrastructure owner

Jun – Aug 2023

Only engineer responsible for the production stack while the company scaled.

  • EC2 to ECS/Fargate migration with zero-downtime cutover
  • GitHub Actions gated deploys with automated rollback on health check failure
  • BullMQ/Redis async queue for order processing
  • CloudWatch + ALB observability stack
  • 25% OpEx reduction through right-sizing and reserved capacity

Stack still in production.

CMP Infotech · Microsoft Partner

Java Intern

Dec 2021 – Jan 2022

Desktop application development in Java.

  • School management system with multi-user access control
  • Java Swing UI with MySQL persistence via JDBC
ARCHIVE · Education

RPTU Kaiserslautern-Landau · Germany

M.Sc. Computer Science, Distributed Systems

Oct 2025 – 2027

Xavier Institute of Engineering · Mumbai

B.E. Information Technology, CGPA 8.69/10

2020 – 2024
IO · Stack & Capabilities

What I reach for, grouped by where it sits in the stack. Listed because I've shipped or operated something with it, not to pad a list.

Infrastructure

Docker · Kubernetes · Terraform · Linux

Observability

Prometheus · PromQL · Grafana · Alertmanager

Cloud

AWS · GCP · Azure · OCI · DigitalOcean

Languages

Go · Python · Bash · C++

Automation

GitHub Actions · CI/CD · NGINX

Storage

Redis · BullMQ · MySQL · SQLite
SIGNAL · Certifications
AWS Certified Cloud Practitioner · 2023
Postman API Fundamentals · Student Expert · 2023
DOSSIER · Curriculum Vitae

PERSONNEL FILE

Hamza Shaikh

Infrastructure and reliability engineer. Sole DevOps owner at a Shark Tank India B2C startup, now doing a distributed systems master's at RPTU. Shipped SentinelSOL, building AutoSRE.

Role: Infrastructure · Reliability · SRE
Focus: Observability, automated recovery, cloud cost
Education: M.Sc. Computer Science · RPTU Kaiserslautern
Base: Kaiserslautern · Dammam · Mumbai
Open to: SRE / Platform / DevOps · Europe & GCC

CAPABILITIES

Sole DevOps ownership of a production stack
Prometheus / PromQL / Alertmanager observability
AWS ECS / Fargate and Docker Compose operations
GitHub Actions CI/CD with automated rollback
Python and Go automation
Education

M.Sc. Computer Science

Oct 2025 – 2027

RPTU Kaiserslautern-Landau · Kaiserslautern, Germany

Major: Distributed Systems · Minor: Software Engineering

B.E. Information Technology

Aug 2020 – Jun 2024

Xavier Institute of Engineering · Mumbai, India

CGPA 8.69 / 10 (German eq. 1.6)

Recognition

3rd place · Superteam Germany / neosfer Solana Ideathon

Frankfurt 2026 · $250 · for SentinelSOL

Certifications

AWS Certified Cloud Practitioner · 2023
Postman API Fundamentals · Student Expert · 2023

Publication

GridCast · IEEE ICSPCRE 2024 (Paper ID 652), B.E. capstone (submitted)

Experience

DevOps Engineering Intern · Sole Infrastructure Owner

Jun – Aug 2023

Patil Kaki · Shark Tank India B2C startup

  • Migrated production from EC2 to ECS/Fargate; that deploy pipeline is still running 3+ years on.
  • GitHub Actions CI/CD gating main with rollback; a BullMQ-on-Redis queue for async jobs.
  • Cut cloud OpEx by 25% and ran incident response on the live stack.

Java Development Intern

Dec 2021 – Jan 2022

CMP Infotech · Microsoft Partner Program

  • Built a school management desktop app in Java Swing on a MySQL/JDBC backend.
ARCHIVE · Field Log

Not in the navigation. If you found this, you went looking. Field notes from the infrastructure and crypto ecosystems I've spent time in.

  1. Jun 2026 · Berlin, DE

    Solana Summit Germany

    SOLANA

    Superteam Germany's flagship. Tracks on validator economics, stablecoin rails, and Solana as infrastructure, the same ecosystem the SentinelSOL ideathon ran under.

  2. Jun 2026 · Berlin, DE

    Berlin Blockchain Week

    ETHEREUM

    Protocol-layer week (Protocol Berg, DappCon). Consensus internals, data availability, and what running a node actually costs an operator.

  3. 2026 · Berlin, DE

    Agentic AI Engineering

    AI

    Agent orchestration and evaluation. Most of the hard problems were reliability ones: retries, tool-call failure handling, observability for non-deterministic systems.

  4. 2025 · Berlin, DE

    ETH Day

    ETHEREUM

    Client diversity and validator tooling. A useful contrast to Solana's single-client reality when reasoning about failure domains.

  5. Sep 2023 · Mumbai, IN

    Solana Hacker House

    SOLANA

    RPC operations and validator sessions with Solana Labs engineers. First real look at the telemetry gaps SentinelSOL later went after.

  6. 2022 · Bengaluru, IN

    Solana Hacker House

    SOLANA

    Hands-on with the program model and the validator architecture: Anchor, accounts, and how a vote actually lands on-chain.

  7. 2022 · Mumbai, IN

    Google DevFest

    BUILD

    GDG cloud-native track. Containers, CI, and platform tooling across the Google developer stack.

  8. 2023 · Online

    Hack This Fall

    BUILD

    Shipped a working build under a 48-hour clock. Good practice for scoping hard and cutting the right corners.

  9. 2023 · Online

    Hack The League

    BUILD

    Team build, tight deadline. Reinforced that the boring parts (deploys, env parity, a health check) decide whether a demo survives.

COLOPHON // interface built as a working operator console. visual language borrowed from the Grid. end of line.

RELAYContact

Open to backend, infrastructure, platform, and SRE roles across Europe and the GCC. Fastest reach: Telegram or email.

Portfolio still in development. Thanks for stopping by!