<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Nakul Shivakumar]]></title><description><![CDATA[An Associate Technical Engineer at Kyndryl || Currently upskilling in DevOps for successful career transition]]></description><link>https://codeclouddevops.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!4g1j!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F407e9005-a2b4-4790-8bca-3a1c5b9ce57c_1280x1280.png</url><title>Nakul Shivakumar</title><link>https://codeclouddevops.substack.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 08 Jun 2026 07:22:58 GMT</lastBuildDate><atom:link href="https://codeclouddevops.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Nakul Shivakumar]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[codeclouddevops@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[codeclouddevops@substack.com]]></itunes:email><itunes:name><![CDATA[Nakul Shivakumar]]></itunes:name></itunes:owner><itunes:author><![CDATA[Nakul Shivakumar]]></itunes:author><googleplay:owner><![CDATA[codeclouddevops@substack.com]]></googleplay:owner><googleplay:email><![CDATA[codeclouddevops@substack.com]]></googleplay:email><googleplay:author><![CDATA[Nakul Shivakumar]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[I built an AI agent that answers DevOps questions across your entire stack — here’s how]]></title><description><![CDATA[Every DevOps engineer knows the feeling.Thanks for reading!]]></description><link>https://codeclouddevops.substack.com/p/i-built-an-ai-agent-that-answers</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/i-built-an-ai-agent-that-answers</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Thu, 23 Apr 2026 06:00:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ed7763db-853e-4d90-b186-44e1b670322a_2924x1328.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div><hr></div><p>Every DevOps engineer knows the feeling.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>3am. PagerDuty fires. You&#8217;re half-awake and someone&#8217;s asking &#8220;is this related to the deploy?&#8221; and you&#8217;re tabbing between Datadog, GitHub, Jira, Confluence, and Slack trying to piece together what happened.</p><p>Twenty minutes later you finally have enough context to actually start debugging.</p><p>That context-gathering is the problem I wanted to solve. Not with another dashboard. Not with another alert aggregator. With an agent you can just <em>ask</em>.</p><div><hr></div><h2>What I built</h2><p>OpsIQ is an open-source AI DevOps intelligence agent. You ask it questions in plain English &#8212; from Slack, a web UI, or your terminal &#8212; and it figures out which tools to query, calls them, and gives you one synthesised answer.</p><pre><code><code>$ opsiq "why is api latency spiking?"

  &#8594; calling github_get_deployments
  &#10003; api-service v2.4.1 &#183; deployed 14:32 UTC

  &#8594; calling observability_get_active_alerts
  &#10003; 2 critical &#183; p99_latency &gt; 2s, error_rate &gt; 5%

  &#8594; calling observability_query_logs
  &#10003; 847 errors &#183; all from /api/checkout endpoint

  api-service v2.4.1 (PR #847) introduced an unbounded
  DB query. Correlates with p99 &gt; 2s since 14:34 UTC.
  Recommend: rollback api-service immediately.

  Sources: github &#183; datadog &#183; observability_query_logs
</code></code></pre><p>That&#8217;s the entire incident triage workflow &#8212; in seconds, from one question.</p><div><hr></div><h2>The architecture decision that matters most</h2><p>The first design question was: how do I make this work for teams who don&#8217;t use Datadog?</p><p>Plenty of teams run Grafana. Others are on New Relic, Prometheus, CloudWatch, or Elastic. If I hardcoded Datadog, I&#8217;d immediately exclude most of the people this could help.</p><p>So I built a <strong>provider pattern</strong> &#8212; an abstract interface that every observability tool implements:</p><pre><code><code>class ObservabilityProvider(ABC):
    @abstractmethod
    def get_active_alerts(self, severity: str = "all") -&gt; list[dict]: ...
    
    @abstractmethod
    def query_logs(self, query: str, hours: int = 1) -&gt; list[dict]: ...
    
    @abstractmethod
    def get_service_health(self, service_name: str = None) -&gt; list[dict]: ...
    
    @abstractmethod
    def query_metric(self, metric_name: str, hours: int = 1) -&gt; list[dict]: ...
</code></code></pre><p>OpsIQ ships with six providers out of the box: <strong>Datadog, Grafana, New Relic, Prometheus, CloudWatch, and Elastic</strong>. The agent never knows which one is active &#8212; it just calls the interface.</p><p>Auto-detection is built in. Set <code>DD_API_KEY</code> in your <code>.env</code> and OpsIQ picks up Datadog. Set <code>GRAFANA_URL</code> and it switches to Grafana. No code changes, no config files &#8212; just credentials.</p><p>Adding a new provider for the community is a four-step contribution:</p><ol><li><p>Create <code>integrations/observability/providers/your_tool.py</code></p></li><li><p>Subclass <code>ObservabilityProvider</code> and implement the four methods</p></li><li><p>Add one line to the registry</p></li><li><p>Open a PR</p></li></ol><p>That&#8217;s the entire surface area. Honeycomb, Dynatrace, Azure Monitor &#8212; anyone can add them.</p><div><hr></div><h2>How the agent actually works</h2><p>The core of OpsIQ is an <strong>agentic loop</strong> built on Claude&#8217;s tool-use API:</p><pre><code><code>while tool_rounds &lt; MAX_TOOL_ROUNDS:
    response = claude.messages.create(
        model="claude-sonnet-4-20250514",
        tools=TOOLS,
        messages=messages,
    )
    
    if response.stop_reason == "tool_use":
        # Claude decided it needs more data
        # Execute the tool calls, append results, loop again
        
    elif response.stop_reason == "end_turn":
        # Claude has enough context &#8212; return the answer
        return final_text, tools_used
</code></code></pre><p>Claude decides which tools to call based on the question. For a latency spike question it might call <code>github_get_deployments</code>, then <code>observability_get_active_alerts</code>, then <code>observability_query_logs</code> &#8212; building up context across multiple rounds before synthesising the answer.</p><p>The system prompt tells Claude to:</p><ul><li><p>Call the minimum tools needed</p></li><li><p>Confirm before taking any write actions (creating tickets, sending Slack messages)</p></li><li><p>Always cite which sources it used</p></li><li><p>Be concise &#8212; DevOps engineers are busy</p></li></ul><div><hr></div><h2>The interfaces</h2><p>Three ways to talk to the agent, all backed by the same core:</p><p><strong>CLI</strong> &#8212; <code>opsiq "your question"</code> or interactive REPL mode with <code>--interactive</code></p><p><strong>Web UI</strong> &#8212; React + Tailwind, dark mode, WebSocket streaming so you see tool calls in real time as they happen</p><p><strong>Slack bot</strong> &#8212; <code>@opsiq why is checkout broken?</code> in any channel. Uses Block Kit for clean formatting. Threaded replies so conversations don&#8217;t pollute your channels.</p><div><hr></div><h2>Getting started</h2><pre><code><code>git clone https://github.com/nshivakumar1/opsiq
cd opsiq
cp .env.example .env   # add ANTHROPIC_API_KEY + your tool credentials
docker-compose up
</code></code></pre><p>Then:</p><pre><code><code>python -m interfaces.cli.main "what deployed to production today?"
</code></code></pre><p>The minimum viable setup is just <code>ANTHROPIC_API_KEY</code> + <code>GITHUB_TOKEN</code> + <code>GITHUB_REPO</code>. Every other integration is optional &#8212; OpsIQ gracefully skips any tool whose credentials aren&#8217;t present.</p><div><hr></div><h2>What&#8217;s next</h2><p>OpsIQ is at v0.1.0. The core is solid &#8212; the agent loop works, all six observability providers are implemented, and the three interfaces are live.</p><p>What I&#8217;m building next:</p><ul><li><p><strong>PagerDuty connector</strong> as a standalone alerting source (separate from observability)</p></li><li><p><strong>Prometheus + Loki</strong> as a combined provider (most self-hosted teams run both)</p></li><li><p><strong>OpsIQ Cloud</strong> &#8212; hosted version so teams can use it without running their own infrastructure</p></li><li><p><strong>Pro connectors</strong> &#8212; Splunk, Dynatrace, Honeycomb behind a license tier</p></li></ul><p>If you&#8217;re running a DevOps team and want to try it, the repo is open and the Docker setup works in under five minutes.</p><p>If you want to contribute &#8212; adding a new provider, a new integration, or improving the agent prompts &#8212; check <code>CONTRIBUTING.md</code>. There are <code>good first issue</code> labels waiting.</p><p><strong>github.com/nshivakumar1/opsiq</strong></p><div><hr></div><p><em>Nakul Shivakumar is a Subject Matter Expert-Team Lead at Kyndryl and builds under the theinfinityloop brand. He writes about DevOps, AI agents, and open source on this Substack.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[I Built a Live F1 Pit Stop Predictor on AWS — And Documented Every Mistake Along the Way]]></title><description><![CDATA[What 3 race weekends, 44 documented failures, and an AI pair programmer taught me about production ML]]></description><link>https://codeclouddevops.substack.com/p/i-built-a-live-f1-pit-stop-predictor</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/i-built-a-live-f1-pit-stop-predictor</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Tue, 07 Apr 2026 04:31:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pKBZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Lights out. Five red lights extinguish one by one.</p><p>Somewhere in the pit wall, engineers are watching tyre degradation curves, tracking gap data, calculating the optimal window. One call made too late loses a podium. One lap of hesitation loses a race.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>I wanted to know if a machine learning model could see what they see.</p><p>So I built one. And ran it live, during actual races, this season.</p><p>This is the full story &#8212; the architecture, the stack, the embarrassing failures, and the unusual way I built the whole thing with an AI pair programmer that never lost context between sessions.</p><div><hr></div><h2>What It Does</h2><p>Every 60 seconds during a live race, the system predicts pit stop probability for all 22 drivers simultaneously &#8212; pulling live telemetry, scoring an XGBoost model on AWS, generating AI commentary, and pushing everything to a frontend that updates in real time.</p><p>Three race weekends live. $0.47 per weekend in AWS costs.</p><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pKBZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pKBZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 424w, https://substackcdn.com/image/fetch/$s_!pKBZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 848w, https://substackcdn.com/image/fetch/$s_!pKBZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!pKBZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pKBZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png" width="1456" height="702" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/193278159?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pKBZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 424w, https://substackcdn.com/image/fetch/$s_!pKBZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 848w, https://substackcdn.com/image/fetch/$s_!pKBZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!pKBZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb889dc5-bad8-4bb3-8a5b-afbed9b63685_2936x1416.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><div><hr></div><h2>The Stack</h2><p>Let me be honest about what this actually runs on, because the architecture decisions matter as much as the model.</p><p><strong>The model</strong> is an XGBoost classifier trained on historical F1 pit stop data. AUC of 0.8854. It takes 11 features per driver: 7 raw inputs plus 4 derived features I had to compute &#8212; tyre age squared, heat degradation interaction (track temp &#215; tyre age), wet stint indicator, and absolute sector delta. Early on I forgot the derived features and sent only 7. The SageMaker endpoint returned HTTP 500 with a silent <code>Feature shape mismatch, expected: 11, got 7</code>. Cost me 40 minutes.</p><p><strong>Inference</strong> runs on AWS SageMaker Serverless &#8212; the model scales to zero between races and cold-starts in ~800ms. Perfect for a system that only needs to be alive 2 hours every other Sunday.</p><p><strong>Data</strong> comes from the OpenF1 API across 7 endpoints. The batching strategy matters here &#8212; more on that in the mistakes section.</p><p><strong>AI commentary</strong> is generated by Groq&#8217;s Llama 3.3 70B after each prediction batch. I originally planned to use Gemini. Google&#8217;s free tier returned <code>limit: 0</code> because my GCP project had billing enabled &#8212; which kills free-tier quotas &#8212; but paid-tier quota wasn&#8217;t configured either. Switched to Groq: free, no card required, 14,400 requests/day. The secret in AWS Secrets Manager is still called <code>f1-mlops/gemini-api-key</code>. I chose not to rename it to avoid touching Terraform. That choice is documented.</p><p><strong>Observability</strong> is New Relic. I started with Grafana and an ELK stack on EC2. Both are retired. New Relic replaced them. Custom <code>F1PitstopPrediction</code> events stream after every batch. CloudWatch Alarms go to Slack via AWS Chatbot. Sentry catches errors across all 5 Lambda functions.</p><p><strong>The data layer</strong> has no database. S3 flat files &#8212; JSON written per enrichment cycle, REST handler sorts by <code>LastModified</code> to find the latest. It works. It won&#8217;t survive past one season without a rethink.</p><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WeWh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WeWh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 424w, https://substackcdn.com/image/fetch/$s_!WeWh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 848w, https://substackcdn.com/image/fetch/$s_!WeWh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 1272w, https://substackcdn.com/image/fetch/$s_!WeWh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WeWh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png" width="724" height="363.49175824175825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:346504,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/193278159?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WeWh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 424w, https://substackcdn.com/image/fetch/$s_!WeWh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 848w, https://substackcdn.com/image/fetch/$s_!WeWh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 1272w, https://substackcdn.com/image/fetch/$s_!WeWh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad99650d-8f03-4a82-bad6-dc0ef0733f63_2930x1472.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><div><hr></div><h2>The 44 Mistakes</h2><p>Here&#8217;s the part I actually want to talk about.</p><p>I kept a file at the root of the repository called <code>CLAUDE.md</code>. Every time something broke in a genuinely non-obvious way, I wrote it down &#8212; what happened, what the wrong assumption was, and exactly what to do differently. By the time I&#8217;m writing this, there are 44 entries.</p><p>Some are embarrassing. All of them were useful. None of them are the kind of thing tutorials warn you about.</p><p>A few that stayed with me:</p><div><hr></div><p><code>wget -q</code><strong> silently downloads HTML error pages.</strong></p><p>I used it to download the Terraform binary in AWS CodeBuild. When the URL 404&#8217;d, wget returned exit code 0, happily saved an HTML error page, and moved on. The subsequent <code>unzip</code> then failed &#8212; on what looked like a completely unrelated error. Switched to <code>curl -fsSL</code>. The <code>-f</code> flag makes it fail loudly on HTTP errors. One character that would have saved an hour.</p><div><hr></div><p><strong>OpenF1 pre-populates the full season calendar.</strong></p><p>Their API returns every session for the year, including races that haven&#8217;t happened yet. I was using <code>sessions[-1]</code> to get the latest session. During the Australian GP in March, it was returning Abu Dhabi in December. Fix: filter to <code>date_start &lt;= datetime.utcnow()</code> before picking the last entry. &#8220;Latest in the dataset&#8221; is not the same as &#8220;latest relative to now.&#8221;</p><div><hr></div><p><code>aws lambda update-function-configuration --environment</code><strong> replaces everything.</strong></p><p>I needed to update one environment variable. Passed <code>Variables={NEW_KEY=value}</code>. That replaced the entire environment map. SageMaker endpoint name, S3 bucket, Sentry DSN, Groq secret name &#8212; all wiped. I found out mid-race when every Lambda invocation started failing silently. The fix is to always read current env vars first, merge, then write back. Or better: use EventBridge target Input JSON for race-day overrides so you never touch the configuration at all.</p><div><hr></div><p><strong>Terraform&#8217;s </strong><code>templatefile()</code><strong> interprets </strong><code>%{</code><strong> as a template directive.</strong></p><p>My ELK setup script had Elasticsearch index patterns like <code>f1-ec2-metrics-%{+YYYY.MM.dd}</code> and Logstash grok patterns like <code>%{DATA:request_id}</code>. Terraform treats <code>%{</code> as the start of a <code>%{if}</code> or <code>%{for}</code> block. Every <code>.tpl</code> file with grok patterns was broken. Fix: escape with <code>%%{</code> &#8212; Terraform outputs a literal <code>%{</code> at runtime. Every single occurrence, regardless of context.</p><div><hr></div><p><strong>A live API key ended up committed to git.</strong></p><p>During a race-day debugging session, I ran an <code>aws secretsmanager put-secret-value</code> command with the New Relic license key inline. Claude Code saved that command to its allowed-commands list in <code>.claude/settings.json</code>. I committed it without checking the diff. GitGuardian flagged it within minutes of the push. Remediation: rotate the key, rewrite the commit with <code>git commit --amend</code>, force push to overwrite remote history. Not my finest hour. Entirely preventable.</p><div><hr></div><p>These aren&#8217;t edge cases. Every one of these hit me in production, under time pressure, with a race running.</p><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ai7_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ai7_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 424w, https://substackcdn.com/image/fetch/$s_!Ai7_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 848w, https://substackcdn.com/image/fetch/$s_!Ai7_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 1272w, https://substackcdn.com/image/fetch/$s_!Ai7_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ai7_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png" width="1456" height="1251" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1251,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:385195,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/193278159?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ai7_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 424w, https://substackcdn.com/image/fetch/$s_!Ai7_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 848w, https://substackcdn.com/image/fetch/$s_!Ai7_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 1272w, https://substackcdn.com/image/fetch/$s_!Ai7_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F149eb99d-f455-46b2-9b84-a53b0da11c54_1720x1478.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p></blockquote><div><hr></div><h2>Building With Claude Code and Persistent Memory</h2><p>I want to talk about how this was actually built day-to-day, because it changed how I think about AI-assisted engineering in a way that wasn&#8217;t what I expected.</p><p>I used Claude Code as my primary engineering collaborator throughout &#8212; not as an autocomplete tool, but as a pair programmer with full system context. The thing that made it work wasn&#8217;t the code generation. It was <code>CLAUDE.md</code>.</p><p>Claude Code reads <code>CLAUDE.md</code> at the start of every session. I treated it as a persistent memory system. Every mistake documented. Every architectural decision explained. Every &#8220;do not do this&#8221; recorded with exactly why. The 44-entry list above lives there.</p><p>Here&#8217;s what that actually changed:</p><p>Without it, every new session starts cold. You re-explain the architecture. You re-describe the problem. You lose 20 minutes before doing any real work. And you risk repeating mistakes you already solved, because the previous session&#8217;s context is gone.</p><p>With it, Claude opens a session already knowing the system. Which Terraform version is correct (1.9.8 &#8212; I invented 1.14.0 early on). That the XGBoost model expects 11 features, not 7. That the CodeStar connection must already be AVAILABLE before Terraform touches it. That <code>wget -q</code> silently fails and <code>curl -fsSL</code> doesn&#8217;t.</p><p>On Japan GP weekend, the enrichment Lambda broke mid-race. Four separate issues: an S3 pagination bug, a New Relic timestamp bug (I was sending seconds, their API requires milliseconds), an OpenF1 rate limit issue, and live position data coming back empty due to a missing lookback window. In one session, without re-explaining anything, all four were diagnosed and fixed.</p><p>That&#8217;s what makes AI-assisted engineering genuinely powerful. Not the speed of code generation. The continuity of context.</p><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EiWO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EiWO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 424w, https://substackcdn.com/image/fetch/$s_!EiWO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 848w, https://substackcdn.com/image/fetch/$s_!EiWO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 1272w, https://substackcdn.com/image/fetch/$s_!EiWO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EiWO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png" width="1456" height="1183" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1183,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:371167,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/193278159?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EiWO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 424w, https://substackcdn.com/image/fetch/$s_!EiWO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 848w, https://substackcdn.com/image/fetch/$s_!EiWO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 1272w, https://substackcdn.com/image/fetch/$s_!EiWO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b09061f-6115-4dc0-b873-4c68f0fd8aa5_1832x1488.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><div><hr></div><h2>The Numbers</h2><p>Three races in production. $0.47 per race weekend. 88.5% AUC. 44 documented mistakes. 22 drivers scored every 60 seconds.</p><p>And three things I&#8217;d do differently:</p><p><strong>Replace the S3 flat-file store.</strong> Sorting JSON files by <code>LastModified</code> works for one season. For historical queries &#8212; &#8220;show me every lap where pitstop probability exceeded 70% this year&#8221; &#8212; you need Athena or DynamoDB.</p><p><strong>Ship Server-Sent Events.</strong> 30-second polling introduces up to 30 seconds of lag. When a pit window opens, that matters. SSE would cut perceived latency to near-real-time with minimal backend changes.</p><p><strong>Deploy the position model.</strong> Win probability is currently computed inline with a heuristic weighting function &#8212; position gap, tyre freshness, team strength, safety car state. A trained Random Forest model (already built, not yet deployed) would be more accurate. I chose the heuristic because the SageMaker endpoint for pitstop prediction was already covering 90% of the value.</p><blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UAAY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UAAY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 424w, https://substackcdn.com/image/fetch/$s_!UAAY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 848w, https://substackcdn.com/image/fetch/$s_!UAAY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 1272w, https://substackcdn.com/image/fetch/$s_!UAAY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UAAY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png" width="1456" height="1048" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:319516,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/193278159?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UAAY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 424w, https://substackcdn.com/image/fetch/$s_!UAAY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 848w, https://substackcdn.com/image/fetch/$s_!UAAY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 1272w, https://substackcdn.com/image/fetch/$s_!UAAY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc922a9e0-e456-4c47-ad6c-80eb7f99657d_2056x1480.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><div><hr></div><h2>Refer to the Repo here &#128071;&#127995;</h2><p>The full repo is open. Terraform, Lambda functions, CI/CD pipeline, model training, all of it:</p><p><strong>&#128279; github.com/nshivakumar1/f1-mlops</strong></p><p>If any of this resonates with what your team is building, I&#8217;d genuinely love to talk.</p><div><hr></div><p><em>If you found this useful, subscribe. More production ML and infrastructure writing on the way.</em></p><p><em>And if you&#8217;ve made any of these same mistakes &#8212; or worse ones &#8212; I&#8217;d love to hear about it in the comments.</em></p><div><hr></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/i-built-a-live-f1-pit-stop-predictor?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/i-built-a-live-f1-pit-stop-predictor?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://codeclouddevops.substack.com/p/i-built-a-live-f1-pit-stop-predictor?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Security and Compliance in DevOps: Let’s Cease to Treat Security as a Last-Minute Fire Drill]]></title><description><![CDATA[A developer&#8217;s guide to not being the reason your company makes the news]]></description><link>https://codeclouddevops.substack.com/p/security-and-compliance-in-devops</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/security-and-compliance-in-devops</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Fri, 10 Oct 2025 19:06:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4g1j!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F407e9005-a2b4-4790-8bca-3a1c5b9ce57c_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Remember that horrible feeling when you&#8217;re ready to deploy and the security team just puts down a 47-page vulnerability report in front of you? Yeah, we&#8217;ve all experienced that. That nauseating feeling when you know you may have to roll back weeks of work because someone discovered a critical vulnerability three hours prior to launch.</p><p>Here&#8217;s the thing: we&#8217;ve been securing backwards. </p><p>We did security at the end of the pipeline for years. Development teams would develop incredible features, ops would get them ready to ship, and security would swoop in like the last boss in a video game. They&#8217;d find most of the time expensive and time-consuming issues that needed to be fixed because, you know, they were baked into the foundation.</p><p>This methodology doesn&#8217;t only cause headaches. It&#8217;s actually broken in a world where we&#8217;re pushing code several times a day.</p><h3>DevSecOps: Because Security Shouldn&#8217;t Be a Surprise Party</h3><p>DevSecOps isn&#8217;t some buzzword that one of us dreamed up in a conference (although it probably was). It&#8217;s a mindset change that says, &#8220;Hey, what if we stopped making security a different department and made it everyone&#8217;s issue?&#8221;</p><p>Before you get flustered, it doesn&#8217;t involve dumping security guides onto developers. It involves embedding security in the process so naturally that secure coding is the norm, not the anomaly.</p><p>**The &#8220;Shift Left&#8221; Philosophy (And Why It&#8217;s Not About Politics)**</p><p>&#8220;Shifting left&#8221; means catching security issues early, during the planning and coding phases. Think about it like this: finding a bug in your code editor is annoying. Finding the same bug in production at 2 AM is a career-defining incident.</p><p>Here&#8217;s what this looks like in practice:</p><p>Your CI/CD pipeline executes security scans automatically. Period. SAST scanners read your code prior to compilation for known vulnerabilities. DAST scanners probe your running app like an ethical hacker. SCA tools scan whether you&#8217;re using that JavaScript library that had a critical vulnerability publicly disclosed last Tuesday.</p><p>You don&#8217;t have to be a PhD-level cybersecurity expert to join in. That&#8217;s where security champions fit in. They&#8217;re team developers who receive some additional security training and become experts your team can call on when security questions arise. They become the security translators, converting &#8220;you have a SQL injection vulnerability in your ORM queries&#8221; into &#8220;here&#8217;s how to use parameterized queries in our codebase.&#8221;</p><p></p><h3>Making Compliance Less Painful</h3><p>Compliance audits were like tax season. Rushing around to look for evidence, digging through records, trying to show you did everything you claimed to do.</p><p>With DevSecOps, compliance is ongoing. Your systems incidentally gather proof that you are complying with SOC 2, HIPAA, or GDPR. Policy-as-code ensures your rules of compliance exist in Git, get reviewed by peers just like code, and enforce themselves automatically. You have a super-amped assistant who logs everything you do.</p><h3>Container Security: Your Containers Aren&#8217;t As Secure As You Think</h3><p>Containers rock. They&#8217;re light, they&#8217;re mobile, and deployment is like magic. But with them came an entirely new class of &#8220;ways that things can go horribly wrong.&#8221;</p><p>The issue? Containers do not have their own kernel to share with the host system. If someone escapes a container, they&#8217;re not in a VM&#8212;they&#8217;re running on your actual machine. That&#8217;s the difference between a prisoner escaping to the prison yard versus the downtown area.</p><h3>Image Security: Trust Issues Are Good Actually</h3><p>Your container images are essentially ZIP files with trust issues. And those trust issues are entirely warranted.</p><p>Begin with base images from trusted sources. That arbitrary image on Docker Hub with 47 downloads? Most likely not the best choice for your production database. Use official images or images from vetted publishers.</p><p>Scan everything. Before an image hits production, run it through a vulnerability scanner. These tools unpack your image layer by layer, checking every package for known vulnerabilities. Found OpenSSL version 1.0.2 from 2018? Time to rebuild that image.</p><p>Pro tip: shrink your images. Use distroless images that hold nothing but your app and its bare essentials. Each additional utility you add is another tool a compromised attacker can use. That bash shell may be handy, but do you need it in production?</p><h3>Runtime Security: When Things Get Real</h3><p>When your containers are up and running, the rules change. You&#8217;re going to need runtime protection.</p><p>Set resource limits on each container. Without them, one rogue container can consume all your CPU or memory. This is important for security since resource exhaustion is an attack vector.</p><p>Network policies are your best friend. They&#8217;re like internal firewalls that restrict what containers can communicate with what other containers. Your frontend doesn&#8217;t have to communicate directly with your database&#8212;so don&#8217;t allow it to. This is known as microsegmentation, and what it means is that if one container is compromised, the attacker can&#8217;t simply walk their way across your entire infrastructure.</p><p>Run containers as non-root users. Make filesystems read-only where possible. Remove Linux capabilities you do not need. These are not paranoid practices&#8212;these are normal practice that makes container breakout attacks much more difficult.</p><h3>Kubernetes Security: Because Orchestration Is Complicated Enough</h3><p>If you&#8217;re running Kubernetes (and let&#8217;s be honest, chances are you are), securing the cluster itself is essential.</p><p>RBAC (Role-Based Access Control) isn&#8217;t optional. Each user and service account must only have the permissions it absolutely requires. Your CI/CD pipeline doesn&#8217;t require cluster-admin. Your developers likely don&#8217;t require deleting all pods in all namespaces (even if they mistakenly believe they do after a terrible deploy).</p><p>Admission controllers are your cluster gatekeepers. They examine each request to create or change resources and can respond with &#8220;nopes&#8221; if something breaks your policies. Want to deny containers running as root? Admission controller. Want to specify that all pods have resource limits? Admission controller.</p><p>Integration tools like kube-bench scan your cluster against security standards. Falco monitors for malicious runtime activity. Use them.</p><h3>Security as Code: Version Control for Everything</h3><p>Here&#8217;s a radical thought: what if we treated security policies like application code? Version control, automated testing, peer review, the whole nine yards.</p><p>This is security as code, and it&#8217;s revolutionary.</p><h4>Infrastructure Security That Doesn&#8217;t Make You Cry</h4><p>Remember the good old days when infrastructure configuration resided in a head or in a 2019-revisioned wiki? Infrastructure as Code (IaC) turned all that on its head. Security as code is doing the same for security policy.</p><p>Your security groups, encryption configurations, and IAM policies can be part of your Terraform modules. Everything&#8217;s versioned. Everything&#8217;s checkable. Everything&#8217;s replicable.</p><p>Policy-as-code systems such as Open Policy Agent allow you to author security policies one time and apply them everywhere&#8212;from admission control in Kubernetes to API authorization to data access. These are code. You test it. You version it. You take it seriously.</p><p>Testing Security (Yes, Really)</p><p>You test your app code, right? Test your security policies as well.</p><p>Create unit tests that ensure your policies behave as you expect. Create integration tests that test your infrastructure is secure before it goes to deploy. This is not some abstract concept&#8212;when your deployment pipeline breaks because your S3 bucket isn&#8217;t encrypted, that&#8217;s a win. You caught an issue before it reached production.</p><p>Secrets Management: End the Password Madness in Git</p><p>We need to have a chat. If you&#8217;re keeping secrets in environment variables within your code, we must intervene.</p><p>Do it properly with a secrets manager: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or something equivalent. Your app retrieves secrets at runtime. The secrets are encrypted at rest. Access is traced. Credentials can be rotated without an app redeployment.</p><p>This is not overtime work to be circumvented. This is the minimum for not getting owned.</p><p>The Culture Thing (Because Tools Alone Won&#8217;t Save You)</p><p>Here&#8217;s the painful reality: you can possess the greatest security tools on earth and still get penetrated if your culture is dysfunctional.</p><p>Security can&#8217;t be the &#8220;no&#8221; team. Security must be the team that enables everyone to move faster securely. When security adds friction, developers work around it. Human nature.</p><p>Make security training routine and hands-on. Not tedious PowerPoints on compliance law&#8212;hands-on sessions where developers get to take advantage of vulnerabilities in a controlled environment and then learn how to remediate. Capture-the-flag exercises. Game-based training. Make it fun.</p><p>Measure things that count. Remediation time for high-priority vulnerabilities. Code coverage percentage by security tests. Security gates achieved on first attempt. Leverage these metrics to optimize processes, not embarrass teams.</p><p>When something goes wrong, do blameless post-mortems. The point is to know what broke in the system so you can improve the system. Blame people and you build a culture where people conceal problems. Fix the system and you build a culture where people bring problems up early.</p><h3>The Bottom Line</h3><p>Baking security into DevOps is not about holding up development. It&#8217;s about creating systems that don&#8217;t collapse the first time someone nefarious glances at them sideways.</p><p>DevSecOps, container security, and security as code are not three distinct programs. They&#8217;re all parts of the same puzzle: making security a natural aspect of how you develop and operate software, not an afterthought.</p><p>The winners today are making decisions between security and speed, but they&#8217;ve realized that with the appropriate practices in place, security begets speed. If you catch your vulnerabilities early in development, you ship with confidence. If your security policy is automated, you don&#8217;t require three-day approval windows. If your containers are secure by default, you don&#8217;t wake up at 3 AM to a breach.</p><p>This is the future of software development. Agility isn&#8217;t the enemy of security&#8212;it&#8217;s what enables sustainable agility.</p><p>Now go ahead and shift left. Your future self (and your security team) will appreciate it.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.com/@codeclouddevops/note/p-175732013&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.com/@codeclouddevops/note/p-175732013"><span>Leave a comment</span></a></p><div class="directMessage button" data-attrs="{&quot;userId&quot;:329424988,&quot;userName&quot;:&quot;Nakul Shivakumar&quot;,&quot;canDm&quot;:null,&quot;dmUpgradeOptions&quot;:null,&quot;isEditorNode&quot;:true}" data-component-name="DirectMessageToDOM"></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[DevOps Monitoring Made Simple: A Beginner's Guide to Watching Your Apps Like a Pro]]></title><description><![CDATA[From blind deployments to bulletproof systems - Master metrics, logs, and traces with real examples!.]]></description><link>https://codeclouddevops.substack.com/p/devops-monitoring-made-simple-a-beginners</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/devops-monitoring-made-simple-a-beginners</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Fri, 13 Jun 2025 21:59:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TBRl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TBRl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TBRl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TBRl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TBRl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TBRl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TBRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg" width="728" height="485.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:596298,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/165893101?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!TBRl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TBRl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TBRl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TBRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd0d0f01-915e-412e-a87c-7001c7b39f4c_3750x2500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Picture this: You just deployed your first web application to production. It's 2 AM, and your phone starts buzzing with angry messages from users saying your app is down. You frantically check your server, but everything <em>looks</em> fine. Sound familiar? Welcome to the world of "blind" deployments &#8211; and exactly why monitoring and observability exist.</p><p>Think of monitoring your applications like being a doctor. You need to check vital signs (metrics), read medical history (logs), and sometimes do detailed scans (tracing) to understand what's really happening inside the patient's body. Let's learn how to be a good "doctor" for your applications.</p><h2>What Exactly Is This "Observability" Thing?</h2><p>Before we dive into tools, let's clear up some confusion. Monitoring and observability are related but different:</p><p><strong>Monitoring</strong> is like having a smoke detector in your house. It tells you when something's wrong (your app is down, response time is slow), but not necessarily why.</p><p><strong>Observability</strong> is like having security cameras, temperature sensors, and motion detectors throughout your house, plus the ability to review footage and understand exactly what happened and why.</p><p>The magic happens through three main approaches, which we call the "three pillars":</p><h2>Pillar 1: Metrics - Your App's Vital Signs</h2><h3>Real-World Example: The E-commerce Panic</h3><p>Let's say you run an online bookstore. Every Black Friday, your site crashes, and you lose thousands in sales. With metrics, you could have seen this coming:</p><ul><li><p><strong>Response time</strong>: Your page load time jumping from 200ms to 5 seconds</p></li><li><p><strong>Error rate</strong>: Suddenly 30% of requests failing instead of the usual 0.1%</p></li><li><p><strong>CPU usage</strong>: Your servers hitting 90% instead of normal 40%</p></li><li><p><strong>Memory usage</strong>: RAM consumption spiking to dangerous levels</p></li></ul><h3>Enter Prometheus: Your Metrics Best Friend</h3><p>Prometheus is like having a really smart assistant who constantly asks your application, "Hey, how are you doing?" and writes down all the answers in a notebook.</p><p>Here's how it works in practice:</p><p><strong>Step 1: Instrument Your App</strong></p><pre><code><code># In your Python Flask app
from prometheus_client import Counter, Histogram
import time

# Count how many books are purchased
books_sold = Counter('books_sold_total', 'Total books sold')

# Track how long each page takes to load
page_load_time = Histogram('page_load_seconds', 'Time spent loading pages')

@app.route('/buy-book')
def buy_book():
    start_time = time.time()
    
    # Your book buying logic here
    process_purchase()
    books_sold.inc()  # Increment the counter
    
    # Record how long this took
    page_load_time.observe(time.time() - start_time)
    
    return "Book purchased!"
</code></code></pre><p><strong>Step 2: Prometheus Collects the Data</strong> Every 15 seconds, Prometheus visits your app's <code>/metrics</code> endpoint and saves all these numbers. It's like taking your temperature every few minutes to track a fever.</p><h3>Grafana: Making Boring Numbers Look Awesome</h3><p>Raw metrics are like having a pile of thermometer readings. Grafana turns them into beautiful charts that actually make sense.</p><p><strong>Real Example Dashboard for Our Bookstore:</strong></p><ul><li><p>A line chart showing "Books sold per hour" (watch the Black Friday spike!)</p></li><li><p>A gauge showing current server CPU usage (red when over 80%)</p></li><li><p>A heatmap showing response times throughout the day</p></li><li><p>An alert panel that turns red when error rates exceed 5%</p></li></ul><p>The beauty is that you can see patterns. Maybe you notice that every day at 3 PM, your response times spike because that's when your batch job runs to update inventory. Now you can fix it before customers complain!</p><h3>Quick Start: Your First Metrics Setup</h3><ol><li><p><strong>Add Prometheus to your app</strong> (5 minutes)</p></li><li><p><strong>Run Prometheus server</strong> (docker run -p 9090:9090 prom/prometheus)</p></li><li><p><strong>Install Grafana</strong> (docker run -p 3000:3000 grafana/grafana)</p></li><li><p><strong>Create your first dashboard</strong> (import template #1860 for a great starter)</p></li></ol><p>Within an hour, you'll have a real-time view of your application's health!</p><h2>Pillar 2: Logging - Your App's Diary</h2><h3>The Mystery of the Midnight Crash</h3><p>Imagine your bookstore app crashes every night at midnight, but only on Tuesdays. Metrics tell you <em>when</em> it crashes, but logs tell you <em>why</em>:</p><pre><code><code>2025-06-14 23:59:45 INFO Starting weekly inventory update
2025-06-14 23:59:58 ERROR Database connection timeout after 30 seconds
2025-06-15 00:00:01 ERROR Failed to acquire lock on inventory table
2025-06-15 00:00:01 FATAL Application shutting down due to critical error
</code></code></pre><p>Aha! The weekly inventory job (which runs Tuesday nights) is causing database locks that crash your app.</p><h3>ELK Stack: The Classic Log Management Trio</h3><p>Think of the ELK stack like a smart filing system for a massive library:</p><p><strong>Elasticsearch</strong> = The filing cabinets (stores all your logs) <strong>Logstash</strong> = The librarian (organizes and files the logs) <strong>Kibana</strong> = The card catalog (helps you find what you need)</p><h3>Real-World Example: The Customer Service Hero</h3><p>Your customer service team gets a call: "I tried to buy a book at 2:30 PM but got an error message!"</p><p>With ELK, you can:</p><ol><li><p>Search Kibana for logs around 2:30 PM</p></li><li><p>Filter by error messages</p></li><li><p>Find the exact error: "Payment gateway timeout for user ID 12345"</p></li><li><p>Trace the issue to a third-party service outage</p></li></ol><p><strong>What this looks like in practice:</strong></p><pre><code><code>{
  "timestamp": "2025-06-14T14:30:15Z",
  "level": "ERROR",
  "service": "payment-service",
  "user_id": "12345",
  "message": "Gateway timeout: Stripe API not responding",
  "request_id": "abc-123-def",
  "response_time": 30000
}
</code></code></pre><h3>Loki: The Simple Alternative</h3><p>If ELK feels like building a rocket ship when you just need a bicycle, try Loki. It's like Prometheus but for logs &#8211; simpler to set up and cheaper to run.</p><p><strong>Why teams love Loki:</strong></p><ul><li><p>Uses the same query language as Prometheus (PromQL)</p></li><li><p>Integrates perfectly with Grafana (you can see logs and metrics together!)</p></li><li><p>Much cheaper to operate (doesn't index everything like Elasticsearch)</p></li></ul><p><strong>Quick example:</strong> Instead of complex Elasticsearch queries, you search logs like this:</p><pre><code><code>{service="bookstore"} |= "ERROR" | json | response_time &gt; 5000
</code></code></pre><p>Translation: "Show me all ERROR logs from the bookstore service where response time was over 5 seconds"</p><h3>Getting Started with Logging</h3><p><strong>Step 1: Structure Your Logs</strong> Instead of: <code>"Something bad happened"</code> Write: <code>{"level": "ERROR", "service": "payment", "error": "timeout", "user_id": "12345"}</code></p><p><strong>Step 2: Choose Your Tool</strong></p><ul><li><p><strong>Small team, simple needs</strong>: Start with Grafana Loki</p></li><li><p><strong>Large team, complex search needs</strong>: Go with ELK stack</p></li><li><p><strong>Don't want to manage anything</strong>: Use a cloud service like AWS CloudWatch</p></li></ul><p><strong>Step 3: Set Up Log Shipping</strong> Use tools like Filebeat or Fluent Bit to automatically send your logs from your servers to your log storage system.</p><h2>Pillar 3: Tracing - Following the Breadcrumbs</h2><h3>The Microservices Mystery</h3><p>Your bookstore has grown! Now you have separate services for:</p><ul><li><p>User authentication</p></li><li><p>Inventory management</p></li><li><p>Payment processing</p></li><li><p>Shipping calculations</p></li><li><p>Email notifications</p></li></ul><p>A customer complains: "I tried to buy a book, but it took 10 seconds!" Your metrics show the overall request was slow, but which service caused the delay?</p><h3>Enter Distributed Tracing</h3><p>Tracing is like giving each customer request a unique tracking number and following it through every service, like tracking a package through the postal system.</p><p><strong>Real Example: The 10-Second Book Purchase</strong></p><pre><code><code>Trace ID: abc-123-xyz
&#9500;&#9472; &#127978; Frontend Service (50ms)
&#9500;&#9472; &#128272; Auth Service (100ms) 
&#9500;&#9472; &#128218; Inventory Service (8.5 seconds) &#8592; Found the culprit!
&#9500;&#9472; &#128179; Payment Service (200ms)
&#9492;&#9472; &#128231; Email Service (150ms)
Total: 9 seconds
</code></code></pre><p>Now you know exactly where to look! The inventory service is taking 8.5 seconds &#8211; probably a slow database query.</p><h3>Jaeger: Your Tracing Superhero</h3><p>Jaeger (pronounced "YAY-ger") is like having a detective that follows every request through your system and draws you a map of where it went.</p><p><strong>Setting up tracing in your bookstore app:</strong></p><pre><code><code>from jaeger_client import Config
import opentracing

# Initialize Jaeger
config = Config(
    config={'sampler': {'type': 'const', 'param': 1}},
    service_name='bookstore-api'
)
tracer = config.initialize_tracer()

@app.route('/buy-book')
def buy_book():
    with tracer.start_span('buy-book-request') as span:
        span.set_tag('user.id', user_id)
        
        # Each service call gets its own span
        with tracer.start_span('check-inventory', child_of=span) as child_span:
            inventory_result = check_inventory(book_id)
            
        with tracer.start_span('process-payment', child_of=span) as child_span:
            payment_result = process_payment(amount)
            
        return "Success!"
</code></code></pre><p><strong>What you see in Jaeger UI:</strong></p><ul><li><p>A timeline showing each service call</p></li><li><p>How long each step took</p></li><li><p>Any errors that occurred</p></li><li><p>The full request flow across multiple services</p></li></ul><h3>When Tracing Saves the Day: Real Stories</h3><p><strong>Story 1: The Phantom Slowdown</strong> A team noticed their app got 2x slower after a deployment, but couldn't figure out why. Tracing revealed that a seemingly innocent change made one service call another service in a loop &#8211; what should have been 1 database call became 100!</p><p><strong>Story 2: The Cascade Failure</strong> During Black Friday, one small service started timing out. Without tracing, they would have never discovered that this caused 5 other services to retry repeatedly, creating a domino effect that brought down the entire system.</p><h2>Putting It All Together: The Complete Picture</h2><p>Here's how our three pillars work together in a real incident:</p><h3>The 3 AM Wake-Up Call</h3><p><strong>3:05 AM</strong>: Your phone buzzes. Grafana alert: "Error rate above 10%"</p><p><strong>3:06 AM</strong>: You check your metrics dashboard:</p><ul><li><p>Error rate: 15% (normally 0.5%)</p></li><li><p>Response time: 3 seconds (normally 200ms)</p></li><li><p>CPU usage: Normal</p></li><li><p>Memory usage: Normal</p></li></ul><p><strong>3:07 AM</strong>: You search your logs in Kibana:</p><pre><code><code>level:ERROR AND timestamp:[now-5m TO now]
</code></code></pre><p>You find: "Database connection pool exhausted"</p><p><strong>3:08 AM</strong>: You open Jaeger to trace a slow request: The trace shows that the inventory service is taking 2.8 seconds per database query &#8211; much slower than usual.</p><p><strong>3:10 AM</strong>: Armed with this information, you quickly restart the database connection pool. Problem solved!</p><p><strong>Total time to resolution: 5 minutes</strong> (instead of hours of guessing)</p><h2>Your Journey Starts Here: Practical First Steps</h2><h3>Week 1: Set Up Basic Metrics</h3><ol><li><p>Add Prometheus client library to your main application</p></li><li><p>Instrument basic metrics: request count, response time, error rate</p></li><li><p>Install Grafana and create your first dashboard</p></li><li><p>Set up one simple alert (like "error rate &gt; 5%")</p></li></ol><h3>Week 2: Centralize Your Logs</h3><ol><li><p>Ensure all your applications log in JSON format</p></li><li><p>Set up either Loki (easier) or ELK stack (more powerful)</p></li><li><p>Create a simple dashboard to view recent errors</p></li><li><p>Practice searching logs for specific user issues</p></li></ol><h3>Week 3: Add Basic Tracing</h3><ol><li><p>Choose Jaeger (more features) or Zipkin (simpler)</p></li><li><p>Add tracing to your main user journeys (login, purchase, etc.)</p></li><li><p>Practice following a request through your system</p></li><li><p>Identify your slowest operations</p></li></ol><h3>Common Beginner Mistakes (And How to Avoid Them)</h3><p><strong>Mistake 1: Trying to monitor everything at once</strong> <em>Fix</em>: Start with the basics &#8211; response time, error rate, and basic logs. Add more later.</p><p><strong>Mistake 2: Creating too many alerts</strong> <em>Fix</em>: Start with just 2-3 critical alerts. Alert fatigue is real!</p><p><strong>Mistake 3: Not standardizing log formats</strong> <em>Fix</em>: Pick JSON logging early and stick to it across all services.</p><p><strong>Mistake 4: Forgetting about storage costs</strong> <em>Fix</em>: Set up log retention policies from day one. You probably don't need logs older than 30 days.</p><h2>Tools Cheat Sheet for Beginners</h2><h3>Just Getting Started</h3><ul><li><p><strong>Metrics</strong>: Prometheus + Grafana</p></li><li><p><strong>Logs</strong>: Grafana Loki</p></li><li><p><strong>Tracing</strong>: Jaeger</p></li><li><p><strong>Total setup time</strong>: 1-2 days</p></li></ul><h3>Growing Team (10+ developers)</h3><ul><li><p><strong>Metrics</strong>: Prometheus + Grafana (maybe add Thanos for long-term storage)</p></li><li><p><strong>Logs</strong>: ELK Stack</p></li><li><p><strong>Tracing</strong>: Jaeger with Elasticsearch storage</p></li><li><p><strong>Total setup time</strong>: 1-2 weeks</p></li></ul><h3>Don't Want to Manage Anything</h3><ul><li><p><strong>All-in-one</strong>: DataDog, New Relic, or Dynatrace</p></li><li><p><strong>Cloud-native</strong>: AWS CloudWatch + X-Ray, Google Cloud Operations</p></li><li><p><strong>Total setup time</strong>: 1-2 hours (but ongoing monthly costs)</p></li></ul><h2>Your Observability Checklist</h2><p>Before you deploy your next application, make sure you can answer these questions:</p><ul><li><p>[ ] Can I see when my app is slow or throwing errors?</p></li><li><p>[ ] Can I search through logs to debug specific user issues?</p></li><li><p>[ ] Can I trace a request through all my services?</p></li><li><p>[ ] Will I know within 5 minutes if something breaks?</p></li><li><p>[ ] Can I understand why something broke without guessing?</p></li></ul><p>If you answered "no" to any of these, you know what to work on next!</p><h2>The Bottom Line</h2><p>Observability isn't about having the fanciest tools or the most complex setup. It's about answering three simple questions when things go wrong:</p><ol><li><p><strong>What is happening?</strong> (Metrics)</p></li><li><p><strong>Where is it happening?</strong> (Logs)</p></li><li><p><strong>Why is it happening?</strong> (Traces)</p></li></ol><p>Start simple, learn as you go, and gradually build up your observability muscles. Your future self (and your users) will thank you when you can fix issues in minutes instead of hours.</p><p>Remember: The best monitoring system is the one you actually use and understand. Start with the basics, get comfortable, then gradually add more sophisticated capabilities as your needs grow.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.com/@codeclouddevops/note/p-165893101&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.com/@codeclouddevops/note/p-165893101"><span>Leave a comment</span></a></p>]]></content:encoded></item><item><title><![CDATA[From Chaos to Symphony: Understanding Container Orchestration and Service Mesh]]></title><description><![CDATA[A beginner-friendly journey through Kubernetes, Helm, and Istio]]></description><link>https://codeclouddevops.substack.com/p/from-chaos-to-symphony-understanding</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/from-chaos-to-symphony-understanding</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Fri, 23 May 2025 06:53:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7b89bf6a-19a4-4608-8248-c187827f5243_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you're running a bustling restaurant. You have dozens of chefs, waiters, and kitchen staff who need to work together seamlessly to serve hundreds of customers every day. Without proper coordination, you'd have chaos&#8212;orders getting lost, ingredients running out, and customers waiting forever for their food.</p><p>This is exactly the problem that container orchestration solves in the software world. Let's dive into how modern DevOps tools turn application chaos into a well-orchestrated symphony.</p><h2>The Container Revolution: Why We Need Orchestration</h2><p>Before we talk about orchestration, let's understand what we're orchestrating. Containers are like those meal prep containers you use to portion out your weekly lunches. They package your application with everything it needs to run&#8212;code, dependencies, configuration&#8212;into a neat, portable box.</p><p>But here's the thing: running one container is easy. Running hundreds or thousands of containers across multiple servers? That's where things get complicated fast. You need something to:</p><ul><li><p>Decide where each container should run</p></li><li><p>Restart containers when they crash</p></li><li><p>Scale up when traffic increases</p></li><li><p>Handle networking between containers</p></li><li><p>Manage updates without downtime</p></li></ul><p>Enter Kubernetes&#8212;the conductor of our container orchestra.</p><h2>Kubernetes: The Master Orchestrator</h2><p>Think of Kubernetes (often called K8s) as the ultimate project manager for your containers. It's like having a super-intelligent assistant who never sleeps, constantly monitoring your applications and making sure everything runs smoothly.</p><h3>Pods: The Basic Building Blocks</h3><p>In Kubernetes, the smallest unit isn't a container&#8212;it's a <strong>pod</strong>. A pod is like a shared apartment where one or more containers live together. They share the same network and storage, kind of like roommates sharing Wi-Fi and a refrigerator.</p><p>Most of the time, you'll have one container per pod (like a studio apartment), but sometimes containers need to be so tightly coupled that they share a pod (like a married couple sharing a one-bedroom).</p><pre><code><code># A simple pod - think of it as a container's home address
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
spec:
  containers:
  - name: my-app
    image: my-app:latest
</code></code></pre><h3>Deployments: The Scaling Solution</h3><p>Here's where Kubernetes gets really smart. A <strong>deployment</strong> is like having a factory manager who ensures you always have the right number of workers (pods) running. If one worker gets sick (pod crashes), the manager immediately hires a replacement.</p><p>Want to scale from 3 to 10 instances of your app? Just tell the deployment, and it handles the rest. Need to update your app? The deployment can do a rolling update, gradually replacing old versions with new ones so your users never notice downtime.</p><pre><code><code># A deployment - your app's production manager
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
spec:
  replicas: 3  # "I want 3 copies running at all times"
  selector:
    matchLabels:
      app: my-app
  template:
    # Pod template goes here
</code></code></pre><h3>Services: The Networking Layer</h3><p>Now, imagine your restaurant has multiple locations, but customers shouldn't need to know which specific location they're calling. They just dial the main number, and the system routes them to an available location.</p><p>That's exactly what a <strong>service</strong> does in Kubernetes. It provides a stable network endpoint for your pods, even as individual pods come and go. Your users always connect to the same address, but behind the scenes, traffic gets distributed to healthy pods.</p><pre><code><code># A service - your app's phone number
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer  # Makes it accessible from outside
</code></code></pre><h2>Helm: The Package Manager</h2><p>Managing all these Kubernetes configurations manually is like trying to assemble IKEA furniture without the instruction manual&#8212;technically possible, but unnecessarily painful.</p><p><strong>Helm</strong> is Kubernetes' package manager, like npm for Node.js or pip for Python. It lets you package your entire application (all the pods, deployments, services, and configurations) into a single "chart" that can be easily installed, upgraded, or removed.</p><p>Think of Helm charts as recipe books. Instead of remembering every ingredient and step to make your grandmother's famous lasagna, you just follow the recipe. Similarly, instead of manually creating dozens of Kubernetes files, you can install a pre-packaged application with a single command:</p><pre><code><code>helm install my-blog wordpress-chart
</code></code></pre><p>Helm also handles upgrades gracefully. Changed your mind about a configuration? Helm can roll back to the previous version faster than you can say "undo."</p><h2>Service Mesh: The Nervous System</h2><p>As your application grows, you start facing new challenges. How do different services communicate securely? How do you monitor traffic between them? How do you gradually roll out new features to only 10% of users?</p><p>This is where <strong>service mesh</strong> comes in. If Kubernetes is the skeleton of your application infrastructure, then service mesh is the nervous system&#8212;it handles all the communication between different parts.</p><h2>Istio: The Communication Expert</h2><p><strong>Istio</strong> is the most popular service mesh solution, and it works like having a personal assistant for every service in your application. Each service gets its own "sidecar proxy" (called Envoy) that handles all incoming and outgoing communication.</p><p>Imagine if every employee in a company had a personal secretary who:</p><ul><li><p>Screened all their calls and emails</p></li><li><p>Kept detailed logs of every interaction</p></li><li><p>Applied company security policies</p></li><li><p>Could reroute communications when needed</p></li></ul><p>That's essentially what Istio does for your services.</p><h3>Traffic Management: The Smart Router</h3><p>One of Istio's superpowers is <strong>traffic management</strong>. It's like having a GPS system that not only finds the best route but can also:</p><ul><li><p><strong>Split traffic</strong>: Send 90% of users to the stable version and 10% to the new beta version</p></li><li><p><strong>Retry requests</strong>: Automatically retry failed requests before giving up</p></li><li><p><strong>Circuit breaking</strong>: Stop sending requests to a failing service to prevent cascade failures</p></li><li><p><strong>Timeouts</strong>: Prevent requests from hanging forever</p></li></ul><pre><code><code># Traffic splitting - like A/B testing on steroids
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app-vs
spec:
  http:
  - match:
    - headers:
        user-type:
          exact: beta
    route:
    - destination:
        host: my-app-v2
  - route:
    - destination:
        host: my-app-v1
      weight: 80
    - destination:
        host: my-app-v2
      weight: 20
</code></code></pre><h3>Networking and Security: The Bouncer</h3><p>Istio also acts like a sophisticated security system. It can:</p><ul><li><p><strong>Encrypt all traffic</strong> between services automatically (mTLS)</p></li><li><p><strong>Implement fine-grained access control</strong> (only the user service can talk to the database service)</p></li><li><p><strong>Monitor and log all communications</strong> for debugging and compliance</p></li></ul><p>Think of it as having a bouncer at every door who checks IDs, maintains a guest list, and keeps detailed logs of who went where and when.</p><h2>Putting It All Together: The Big Picture</h2><p>Let's trace through a real-world scenario to see how all these pieces work together:</p><ol><li><p><strong>Developer pushes code</strong> to the repository</p></li><li><p><strong>CI/CD pipeline builds</strong> a container image</p></li><li><p><strong>Helm chart deploys</strong> the application to Kubernetes</p></li><li><p><strong>Kubernetes deployment</strong> ensures the right number of pods are running</p></li><li><p><strong>Kubernetes service</strong> provides a stable endpoint for the application</p></li><li><p><strong>Istio manages traffic</strong> between different services</p></li><li><p><strong>Istio enforces security policies</strong> and monitors performance</p></li></ol><p>It's like a well-choreographed dance where each component knows its role and performs it flawlessly.</p><h2>Why Should You Care?</h2><p>You might be thinking, "This sounds incredibly complex. Why not just run everything on a single server like the old days?"</p><p>Here's the thing: modern applications need to handle millions of users, process massive amounts of data, and never go down. The old approach is like trying to feed a wedding reception with your home kitchen&#8212;it might work for a small dinner party, but it won't scale.</p><p>These tools give you:</p><ul><li><p><strong>Reliability</strong>: Your app keeps running even when individual components fail</p></li><li><p><strong>Scalability</strong>: Handle traffic spikes without breaking a sweat</p></li><li><p><strong>Security</strong>: Built-in encryption and access controls</p></li><li><p><strong>Observability</strong>: Deep insights into how your application behaves</p></li><li><p><strong>Flexibility</strong>: Easy to update, rollback, and experiment with new features</p></li></ul><h2>The Learning Journey</h2><p>If you're just starting with DevOps, don't try to learn everything at once. Here's a suggested path:</p><ol><li><p><strong>Start with containers</strong> (Docker)</p></li><li><p><strong>Learn basic Kubernetes</strong> (pods, deployments, services)</p></li><li><p><strong>Get comfortable with Helm</strong> for package management</p></li><li><p><strong>Explore service mesh concepts</strong> with Istio</p></li></ol><p>Each step builds on the previous one, like learning to walk before you run.</p><h2>Conclusion</h2><p>Container orchestration and service mesh might seem like buzzwords, but they're solving real problems that every growing application faces. They're the difference between a chaotic kitchen and a Michelin-starred restaurant&#8212;both might serve food, but only one can do it consistently, at scale, with excellence.</p><p>The beauty of these tools is that they handle the complex stuff so you can focus on what matters: building great applications that users love. And in today's world, that's not just a nice-to-have&#8212;it's essential for survival.</p><p><em>Ready to dive deeper? Start by spinning up a local Kubernetes cluster with minikube, deploy a simple app, and watch the magic happen. The future of software deployment is here, and it's more approachable than you might think.</em></p><div><hr></div><p><em>What's your experience with container orchestration? Have you tried Kubernetes or Istio in your projects? Share your thoughts in the comments below&#8212;I'd love to hear about your DevOps journey!</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/from-chaos-to-symphony-understanding?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/from-chaos-to-symphony-understanding?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://codeclouddevops.substack.com/p/from-chaos-to-symphony-understanding?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.com/@codeclouddevops/note/p-164218545&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.com/@codeclouddevops/note/p-164218545"><span>Leave a comment</span></a></p><p></p><div class="install-substack-app-embed install-substack-app-embed-web" data-component-name="InstallSubstackAppToDOM"><img class="install-substack-app-embed-img" src="https://substackcdn.com/image/fetch/$s_!4g1j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F407e9005-a2b4-4790-8bca-3a1c5b9ce57c_1280x1280.png"><div class="install-substack-app-embed-text"><div class="install-substack-app-header">Get more from Nakul Shivakumar in the Substack app</div><div class="install-substack-app-text">Available for iOS and Android</div></div><a href="https://substack.com/app/app-store-redirect?utm_campaign=app-marketing&amp;utm_content=author-post-insert&amp;utm_source=codeclouddevops" target="_blank" class="install-substack-app-embed-link"><button class="install-substack-app-embed-btn button primary">Get the app</button></a></div><p></p>]]></content:encoded></item><item><title><![CDATA[The Human Side of DevOps]]></title><description><![CDATA[How Git, Docker, and Cloud Platforms Make Our Lives Easier]]></description><link>https://codeclouddevops.substack.com/p/the-human-side-of-devops</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/the-human-side-of-devops</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Tue, 13 May 2025 14:15:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Human Factor of DevOps: How Git, Docker, and Cloud Platforms Simplify Our Lives</h2><p>In the constantly changing landscape of software development, three technologies are the pillars of contemporary DevOps: Git, Docker, and cloud platforms. But beneath the technical terminology and intricate architectures is a basic fact - these technologies were developed to address extremely human issues.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4UpN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4UpN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 424w, https://substackcdn.com/image/fetch/$s_!4UpN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 848w, https://substackcdn.com/image/fetch/$s_!4UpN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 1272w, https://substackcdn.com/image/fetch/$s_!4UpN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4UpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png" width="1456" height="1568" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1568,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105842,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/163473604?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4UpN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 424w, https://substackcdn.com/image/fetch/$s_!4UpN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 848w, https://substackcdn.com/image/fetch/$s_!4UpN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 1272w, https://substackcdn.com/image/fetch/$s_!4UpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6449d433-09bf-448b-8984-6c6b8de0856d_1759x1894.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3>Git: The Time Machine for Code</h3><p>Do you recall the times you would email files of code forwards and backwards? Or, to be a good sport, the file name being "final_version_ACTUALLY_FINAL_v2.php"? Git addresses one of the most human problems in software development: our fallible memory and desire for collaboration.</p><h3>The Human Problem</h3><p>Humans do forget things. We are capable of errors. We have to collaborate with one another without over each other's toes. Pre-Git, developers encountered:</p><p>- The "who broke this?" enigma when code spontaneously failed</p><p>- The "which version is current?" mess when several users collaborated</p><p>- The awful realization that you've lost something important and can't recover it</p><h4>The Git Solution</h4><p>The collective memory and collaboration tool is Git. It's as if you have:</p><p>- A time machine that enables you to travel back to any point in the history of your project</p><p>- An elaborate journal that keeps track of who did what, when, and why</p><p>- Parallel universes (branches) where you can safely experiment without breaking what works</p><p>- A system that helps merge these universes together when the experiments succeed</p><h3>What Git Brings to DevOps</h3><p>For DevOps, Git represents the foundation of everything: version control that enables continuous integration and deployment, collaborative workflows, and code accountability.</p><div><hr></div><h2>Docker: Solving the "Works on My Machine" Syndrome</h2><p>Docker solves a profoundly human issue: the problem of expressing advanced environmental requirements and the frustration with inconsistency.</p><h3>The Human Problem</h3><p>One of the most biting words in development have ever been: "Well, it works on my machine!" Prior to Docker, groups wrestled with:</p><p>- Mysterious bugs arising from differences in environments</p><p>- Bringing new members on board to take days or weeks to prepare their environment</p><p>- Production environments that never precisely replicated development environments</p><p>- The "dependency hell" of conflicting library versions</p><h3>The Docker Solution</h3><p>Docker provides us with containers - light, reproducible environments that execute the same everywhere. It's similar to:</p><p>- A portable workspace that acts absolutely the same whether on your laptop or a server</p><p>- A recipe book which exactly describes each ingredient your application requires</p><p>- A shield that keeps your application from being disturbed by other applications</p><h3>What Docker Offers DevOps</h3><p>Docker revolutionizes DevOps by establishing parity between development and production environments, ensuring deployment is predictable and repeatable, and compartmentalizing services for enhanced security and scalability.</p><div><hr></div><h3>Cloud Platforms: Converting Physical Limitations to Virtual Opportunities</h3><p>AWS, Azure, and Google Cloud Platform (GCP) address arguably the most restrictive human problem: our physical limitations and desire for flexibility.</p><h4>The Human Problem</h4><p>Physical infrastructure has ever posed human constraints:</p><p>- The initial cost of buying hardware</p><p>- The delay between requiring resources and having them</p><p>- The excess of over-provisioning "just in case"</p><p>- The 3AM server room crises when something breaks</p><p>- The constraint of geographic reach</p><h4>The Cloud Solution</h4><p>Cloud platforms convert these physical constraints into software issues with virtual solutions:</p><p>- Resources at the touch of a button (or API call)</p><p>- The capacity to scale up or down as required</p><p>- Global availability without having global data centers</p><p>- Managed services to minimize operational complexity</p><p>- Pay-for-what-you-consume economic models</p><h4>What Cloud Platforms Deliver to DevOps?</h4><p>Cloud platforms bring to DevOps the adaptable infrastructure that makes possible automation, scalability, disaster recovery, and global deployment with minimal operational complexity of dealing with physical hardware.</p><div><hr></div><h4>The DevOps Symphony: How These Tools Cooperate</h4><p>These three technologies don't operate in isolation &#8211; they create a harmonious workflow that's more than the sum of its parts.</p><h4>The Development Lifecycle</h4><h5>1. Code Creation &amp; Collaboration (Git)</h5><p>   - Code is written by developers and committed</p><p>   - Features are built concurrently on distinct branches</p><p>   - Code is reviewed and merged into the main branch</p><h5>2. Constant Building &amp; Testing (Docker)</h5><p>- Docker containers provide consistent environments for testing</p><p>   - The same container runs on developer machines and CI/CD pipelines</p><p>   - Tests verify behavior in production-like environments</p><h5>3. Deployment &amp; Scaling (Cloud Platforms)</h5><p>   - Containers are deployed to cloud infrastructure</p><p>   - Auto-scaling adjusts resources based on demand</p><p>   - Managed services handle underlying infrastructure</p><h4>The Human Benefits</h4><p>This unified strategy addresses deep human problems:</p><p>- Less Cognitive Load: Coders concentrate on code, not environment configuration</p><p>- Conflict-Free Collaboration: Teams collaborate without disrupting one another's work</p><p>- Trust in Changes : Small, traceable changes minimize risk and fear</p><p>- Work-Life Balance : Automation eliminates late-night emergency calls</p><p>- Value Focus : Less time struggling with infrastructure leaves more time for value creation</p><h4>The Future: Empowering Humans, Not Replacing Them</h4><p>Despite fears about automation replacing jobs, the true purpose of Git, Docker, and cloud platforms is to enhance human capabilities &#8211; making us more effective, reducing tedious work, and letting us focus on creative problem-solving.</p><p>These tools don't eliminate the need for human insight and creativity; they amplify it by removing friction and repetitive tasks. They free us to do what humans do best: innovate, create, and solve complex problems.</p><div><hr></div><h4><strong>Getting Started: Your DevOps Journey</strong></h4><p>If you're new to your DevOps journey:</p><p>1. <strong>Begin with Git:</strong> Learn the basics of version control and collaborative practices</p><p>2. <strong>Introduce Docker:</strong> Learn to containerize software and see the value of isolation</p><p>3. <strong>Investigate Cloud Platforms:</strong> Begin with one platform (AWS, Azure, or GCP) and familiarize yourself with its key services</p><p>4. <strong>Integrate :</strong> Create CI/CD pipelines that integrate these technologies.</p><p>5 <strong>Nextwork-</strong>  <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;NextWork&quot;,&quot;id&quot;:281575054,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65f000df-c61b-47cd-8e1e-cf6b3292a865_500x500.png&quot;,&quot;uuid&quot;:&quot;290bcace-6249-468c-b3ae-cada5b871427&quot;}" data-component-name="MentionToDOM"></span> showcases step-by step amazing AWS Projects, they also have the <a href="https://learn.nextwork.org/projects/aws-devops-cicd">Nextwork - 7 Days DevOps Challenge Series</a> for applied learning and documentation of the learning, Check out Now!!</p><p>Remember also that DevOps is just as much about collaboration and culture as it is technology. These tools facilitate DevOps practices, but in the end, it is the human behavior in utilizing them that develops into real DevOps culture.</p><h1>Conclusion</h1><p>Git, Docker, and cloud platforms have changed how we develop and deploy software &#8211; not by adding bewildering complexity, but by addressing rather human challenges in beautiful ways. They empower us to develop and deploy more effectively, with certainty, and at any size.</p><p>Ultimately, the greatest technology is the one that recedes into the background, allowing us to concentrate on adding value and not wrestling with tools. That's precisely what these three pillars of contemporary DevOps accomplish for us.</p><p>The next time you commit, build a container, or deploy to the cloud, take a moment to realize how these tools have made what were once painful, error-prone operations into silky-smooth, reliable processes &#8211; making our lives as developers that much better in the process.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/the-human-side-of-devops?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/the-human-side-of-devops?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://codeclouddevops.substack.com/p/the-human-side-of-devops?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p>]]></content:encoded></item><item><title><![CDATA[Infrastructure as Code: A Key Part of Modern DevOps]]></title><description><![CDATA[Automating Infrastructure Management for Faster, More Reliable Software Delivery.]]></description><link>https://codeclouddevops.substack.com/p/infrastructure-as-code-a-key-part</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/infrastructure-as-code-a-key-part</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Wed, 30 Apr 2025 20:25:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pSTw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Infrastructure as Code (IaC) is transforming how companies handle their IT systems and is a crucial part of DevOps today. Instead of setting up servers, networks, and databases by hand, IaC uses code to automate these tasks.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pSTw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pSTw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!pSTw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!pSTw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!pSTw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pSTw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1668025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/162545806?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pSTw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!pSTw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!pSTw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!pSTw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F555ed7ef-4ccb-4b49-9679-7a473fd14a02_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>Understanding Infrastructure as Code</h2><p>IaC approaches infrastructure management like software programming. Rather than manually configuring settings or entering commands, everything is defined in code files. These files can be saved, tested, and automatically set up without repeating manual processes each time.</p><p>Think of it like this: instead of setting up each server individually, you write code that specifies "I need 5 web servers configured like this, 3 database servers set up like that, and load balancing handled in this way." With a single command, the environment is created consistently every time.</p><h2>Why IaC is Important in DevOps</h2><h4> Automation Saves Time</h4><p>Automation is a major advantage. Tasks that once required days or weeks now take only minutes or hours. By encoding the infrastructure setup, teams eliminate repetitive manual work, freeing up time for innovation.</p><h4>Consistency Across Systems</h4><p>IaC resolves the "it works on my machine" problem by ensuring that development, testing, and production environments are identical. When staging matches production, many deployment issues are avoided.</p><h4>Control and Track Infrastructure Changes</h4><p>By saving infrastructure code in systems like Git, teams gain several benefits:</p><p>- A complete history of infrastructure changes</p><p>- The ability to revert to previous versions</p><p>- Team collaboration tools like code reviews for changes</p><p>- Accountability through change logs</p><h4>Accelerated Development and Deployment</h4><p>IaC speeds up the setup process, allowing for more frequent updates and greater confidence in deployments. This aligns with DevOps goals of quick delivery of new features.</p><h4>Fewer Mistakes and Increased Reliability</h4><p>Manual configuration errors are a leading cause of outages and security breaches. By automating setup, IaC reduces these risks and enhances system stability.</p><h4> Cost Savings and Efficient Scaling</h4><p>IaC helps companies optimize resource usage, scale systems easily, and avoid unnecessary setup. This results in cost savings, especially in cloud environments where charges are based on usage.</p><h4>Enhanced Security Measures</h4><p>Security policies can be embedded in infrastructure code, ensuring consistent enforcement. This approach, known as "security as code," helps maintain compliance and reduces vulnerabilities.</p><h4>Two Main Approaches to IaC</h4><p>There are two primary methods for implementing IaC:</p><p>Declarative Approach: Define the desired end state of infrastructure, and the IaC tool determines how to achieve it. Tools like Terraform, AWS CloudFormation, and Azure Resource Manager use this approach.</p><p>Imperative Approach: Provide specific steps to set up the infrastructure. Ansible and Chef are examples of tools that follow this method.</p><h4>Popular IaC Tools</h4><p>- Terraform: Offers multi-cloud support with HCL language.</p><p>- Ansible: Operates without agents, using YAML playbooks.</p><p>- AWS CloudFormation: An AWS service using JSON/YAML templates.</p><p>- Azure Resource Manager: Azure's tool working with JSON templates.</p><p>- Pulumi: Utilizes familiar programming languages for IaC.</p><p>- Chef and Puppet: Use Ruby-based descriptions for configuration management.</p><h4>Best Practices for IaC</h4><p>1. Adopt a Modular Design: Break down infrastructure into reusable parts.</p><p>2. Ensure Idempotency: Running the same code repeatedly should deliver the same outcome.</p><p>3. Use Version Control: Track all changes to infrastructure code.</p><p>4. Implement CI/CD for Infrastructure: Automate testing and deployment of changes.</p><p>5. Monitor for Configuration Drift: Identify when the actual setup diverges from the code.</p><h3>Closing Thoughts</h3><p>For companies aiming for efficient and secure software delivery, Infrastructure as Code is essential. By applying software development principles to infrastructure management, IaC addresses significant DevOps challenges, enabling faster, more consistent, and confident application deployments.</p><p>As digital transformation continues, IaC will remain a critical component of DevOps, integrating with new technologies to enhance automation.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/infrastructure-as-code-a-key-part?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/infrastructure-as-code-a-key-part?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://codeclouddevops.substack.com/p/infrastructure-as-code-a-key-part?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Basic Foundations for DevOps]]></title><description><![CDATA[Lets get the basics right!]]></description><link>https://codeclouddevops.substack.com/p/basic-foundations-for-devops</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/basic-foundations-for-devops</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Tue, 29 Apr 2025 06:40:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xad7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xad7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xad7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 424w, https://substackcdn.com/image/fetch/$s_!Xad7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 848w, https://substackcdn.com/image/fetch/$s_!Xad7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 1272w, https://substackcdn.com/image/fetch/$s_!Xad7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xad7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png" width="728" height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:600,&quot;width&quot;:800,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:44796,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/162278572?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xad7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 424w, https://substackcdn.com/image/fetch/$s_!Xad7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 848w, https://substackcdn.com/image/fetch/$s_!Xad7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 1272w, https://substackcdn.com/image/fetch/$s_!Xad7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68c2596-fcb3-46bf-b390-1e5889da8189_800x600.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>The DevOps Trinity: Linux, Networking, and Programming </h1><p>Beginning a DevOps journey? Then take this as your final briefing. While the DevOps world is vast and ever-evolving, there are three essential pillars at its center: Linux, Networking, and Programming. Mastering these is not only useful; it is the very foundation upon which your DevOps success will be built. Let's see why each one is so important.</p><h2>Linux: The Unsung DevOps Infrastructure Hero</h2><p>Visualize Linux as the behind-the-scenes quiet majority that makes the digital world go round. It's not the flashy desktop you'd see on a home computer, but the firm, sturdy engine that powers the vast majority of what you see today. Consider the following persuasive facts: over 96% of the world's top web servers are powered by Linux. Each and every one of the major cloud providers &#8211; AWS, Google Cloud, and even Microsoft Azure &#8211; rely on its numerous distributions extensively. And the very technologies that power modern DevOps, like Docker and Kubernetes, were developed with Linux as their basis.</p><p>This near-universal availability is not an accident. Linux's open-source model leads to a vast base of developers working on its stability and security all the time. Its modularity allows it to utilize minimal resources, making it ideal for optimized server environments. Linux is utilized in nearly all aspects of your work as a DevOps engineer, from cloud instances to container environments.</p><h2>The Power of the Command Line: Where Automation Takes Flight</h2><p>While graphical user interfaces (GUIs) have their place, the real magic of DevOps automation is revealed in the command line interface (CLI). Mastery here is not a specialized skill; it's a basic requirement for a number of reasons:</p><p>Scale: One cannot envision operating hundreds, and potentially thousands, of servers. Navigating through individual interfaces is just not feasible. Automation via the CLI is the sole method of being able to effectively manage such scale.</p><p>Access: The majority of cloud servers are headless (i.e., they have no graphical user interface). Secure shell (SSH), a command-line tool, will be your initial point of contact.</p><p>Efficiency: CLI experts can accomplish complex tasks with extremely concise commands in less time than would be required using a GUI. Efficiency and speed are essential in high-speed DevOps environments.</p><p>Integration: The large DevOps toolset is designed to work in harmony seamlessly through command-line interfaces and allow you to chain together complex automated workflows.</p><h2>How Linux Knowledge Supports Fundamental DevOps Principles</h2><p>Knowing how Linux functions is a basic mindset to aid you in understanding basic DevOps concepts:</p><p>Process Management: Linux process management is as similar as it gets to the way container technologies such as Docker manage standalone applications. Process signal and state know-how directly translates to managing containers.</p><p>Namespaces and Cgroups: These two basic Linux kernel features are the very building blocks that enable containerization, enabling isolation and resource control for containers.</p><p>File System Understanding: Understanding how to work with the Linux file system, permissions, and management of file operations is essential for working with container images and debugging application deployments.</p><p>Bash Scripting: Bash scripting plays a critical part in automating the routine administrative tasks, from the application deployment to managing the system settings. This is what makes the basic DevOps automation possible.</p><p>Troubleshooting Utilities: Linux provides a great variety of command-line utilities to scan logs, monitor processes, and troubleshoot network issues. These are critical skills to identify and correct issues in your infrastructure in a timely fashion.</p><h2>Your Beginning: A Guided Path for Learning Linux</h2><p>To start your journey into the world of must-know Linux for DevOps, use this focused learning path:</p><p>Set up your environment: Get some hands-on experience by installing a popular Linux distribution like Ubuntu or CentOS on a virtual machine on your local machine. Or, use the free tier offered by cloud providers like AWS to host a Linux instance.</p><p>Master the fundamentals: Don't ignore the basics. Learn to deal with the file system, learn to deal with users and permissions, and how package management systems (such as apt on Ubuntu or yum/dnf on CentOS) operate.</p><p>Get DevOps-specific training: Once you are comfortable with the fundamentals, learn skills that are directly applicable to DevOps, such as use of SSH to remotely connect to servers, how web servers such as Nginx or Apache are installed on Linux, system and app log management, and use of basic monitoring tools.</p><p>Practice with projects: The best way to ground your learning is by putting it into practice. Try to set up a basic LAMP (Linux, Apache, MySQL, PHP) installation, write Bash scripts to automate mundane tasks, or set up basic system monitoring.</p><h1>Networking: The Invisible Backbone of DevOps</h1><p>Beyond the operating system, the ability of systems to communicate is the most critical facet of distributed DevOps environments. Networking is the typically behind-the-scenes infrastructure that makes it possible, and having a solid understanding of its basics is necessary.</p><p>Just like a physical structure is based on an infrastructure of utility and roads, your infrastructure and applications are based on a properly configured digital network. In the cloud, it appears as Virtual Private Clouds (VPCs), your own network domains. Within those VPCs, you will be working with subnets to structure resources, route tables to control traffic flow, and security groups and network ACLs to function as virtual firewalls, regulating incoming and outgoing access.</p><p>Even in on-prem or hybrid environments, the layer that facilitates networking is the one that gets servers, databases, and other pieces of equipment to communicate. When you are automating the deployment of such infrastructure with Infrastructure as Code tools such as Terraform or CloudFormation, one part of your automated deployment is defining and managing network configuration.</p><p>Applications now consist of many microservices, each executing in its own container or instance. Networking gives the "language" through which these different pieces of the puzzle can speak to one another seamlessly. This includes knowledge of IP addresses and ports, the unique keys and entry points of applications. You'll also have to know protocols such as TCP and UDP, the rules for reliable and quick data transfer. Service discovery mechanisms enable applications to find and dynamically connect with one another, similar to an up-to-date electronic phone book. Load balancers are traffic controllers, directing user requests across several instances to ensure application availability and performance. Security in DevOps is not an afterthought; it's woven into the fabric of the entire lifecycle. Networking is top priority here. Firewalls and security groups are your initial defense, controlling network access based on predefined rules. VPNs (Virtual Private Networks) offer secure distant access, and network segmentation separates sensitive parts of your infrastructure in order to limit the blast radius of possible security incidents. Regular network monitoring is also necessary to detect malicious activity and performance bottlenecks.</p><p>Automation, one of the main tenets of DevOps, also reaches network automation. Network configuration and network device and service management are automated through network automation scripts and software. Making network configuration part of your CI/CD pipelines allows you to have automated and uniform deployments on all environments.</p><p>Finally, when something inevitably goes wrong, a good understanding of networking is your diagnostic superpower. Tools like ping (to check reachability), traceroute (to trace network routes), netstat (to inspect network connections), and nslookup/dig (for DNS queries) become critical to diagnose and correct network-related issues.</p><h1>Programming &amp; Scripting: The Language of Automation</h1><p>While Linux builds the environment and networking enables communication, programming and scripting are the tools that you will use to bring automation to life in DevOps. They are not nice-to-haves, but a requirement for living the core principles of automation, consistency, and scalability.</p><p>DevOps, at its core, is about automating tedious, bug-ridden workflows. The solution to making that a reality is to write code:</p><p>Infrastructure as Code (IaC): You can define your whole infrastructure &#8211; networks, servers, databases &#8211; in code through the use of tools like Terraform, CloudFormation, and Pulumi. This enables you to have version control, repeatability, and manageability.</p><p>Configuration Management: Tools like Ansible, Puppet, and Chef use code to enable your servers to always be in the same, desired state and eliminate configuration drift and provide you with consistency in your environment.</p><p>CI/CD Pipelines: Scripting and coding within Jenkins, GitHub Actions, and GitLab CI is critical to the automation of software delivery from code commit through production deployment.</p><p>Without knowing how to program, you'll be stuck with the pre-existing features of these systems, unable to customize them to fit your particular needs as an organization or create bespoke solutions to handle special problems.</p><p>Each organization also has certain custom requirements that cannot be fulfilled by off-the-shelf software. That is where your programming expertise is worth its weight in gold:</p><p>Custom Integrations: You'll typically need to integrate disparate systems and tools that were not built to coexist with one another. Programming allows you to build those bridges.</p><p>Workflow Automation: Businesses have custom processes that require custom-made automation scripts to optimize operations.</p><p>Glue Code: You might need to write small pieces of code to glue different tools together into one unified pipeline.</p><p>Here are some actual DevOps examples where coding comes into play: coding Python scripts to back up databases and check their integrity, coding Bash scripts to scan application logs for issues and alert, coding custom webhook receivers in Go to process deployment events, or coding Node.js applications to collect metrics from different sources.</p><p>Even though you do not need to be a master in every programming language, expertise in a few major ones will be very useful in enhancing your DevOps capabilities:</p><h1>Bash/Shell Scripting: This is the native scripting language for talking to the Linux command line and automating basic system administration.</h1><p>Python: Due to its high flexibility and large library base, it is commonly utilized for building DevOps tools, automation scripts, and APIs interactions.</p><p>YAML/JSON: These are file configuration languages used to serialize data in most DevOps tools, e.g., Infrastructure as Code.</p><p>Go: Being increasingly used to write scalable and efficient DevOps tools due to its concurrency and performance.</p><p>Today's monitoring and observability also rely heavily on coding. You may have to code to gather custom application metrics, automate incident response (self-healing systems), or process large volumes of log data to gain useful insights.</p><p>Your DevOps Learning Roadmap for Programming</p><p>Basic Shell Scripting: Start with learning the fundamentals of Bash scripting for Linux daily tasks automation.</p><p>Version Control: Learn Git in its entirety because it is the foundation of today's development pipelines and essential to code and config management.</p><p>Python Basics: Concentrate on scripting, file operations, and API interactions because Python is utilized most widely in the DevOps environment.</p><p>Infrastructure as Code: Begin experimenting with declarative IaC tools like Terraform or CloudFormation to learn how infrastructures are declared via code.</p><p>CI/CD Configurations: Discover how to write pipeline definitions in scripting or domain-specific languages in tools like Jenkins' scripting or GitHub Actions. The Interconnected Trinity: It should be remembered that Linux, Networking, and Programming are not independent disciplines in DevOps, but are extremely interdependent. Your success in maintaining Linux systems typically relies on networking knowledge. Your capacity to automate infrastructure and deployment processes relies on your programming and scripting skills, typically exercised in a Linux environment and over networks. Learning these three pillars will not only give you the skills that you need to become a successful DevOps professional but also arm you with a solid knowledge of the technology that powers contemporary software delivery and infrastructure management. This solid knowledge will enable you to build, deploy, manage, and debug sophisticated systems without fear and with ease.</p><h2>Conclusion</h2><p>In summary, understanding Linux, networking, and programming is the backbone of a career in DevOps. Linux provides the underlying operating system for modern infrastructure, whereas networking provides uninterrupted and secure intercommunication between decentralized systems. Programming gives the muscle to the automation that powers the efficiency and scalability of DevOps practices. The three areas are not separate but highly interconnected and interdependent upon each other's expertise. Mastery of them allows DevOps experts to develop, deploy, and maintain advanced systems efficiently, fueling innovation and agility in the rapidly evolving technologic</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/basic-foundations-for-devops?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/basic-foundations-for-devops?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://codeclouddevops.substack.com/p/basic-foundations-for-devops?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div class="install-substack-app-embed install-substack-app-embed-web" data-component-name="InstallSubstackAppToDOM"><img class="install-substack-app-embed-img" src="https://substackcdn.com/image/fetch/$s_!4g1j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F407e9005-a2b4-4790-8bca-3a1c5b9ce57c_1280x1280.png"><div class="install-substack-app-embed-text"><div class="install-substack-app-header">Get more from Nakul Shivakumar in the Substack app</div><div class="install-substack-app-text">Available for iOS and Android</div></div><a href="https://substack.com/app/app-store-redirect?utm_campaign=app-marketing&amp;utm_content=author-post-insert&amp;utm_source=codeclouddevops" target="_blank" class="install-substack-app-embed-link"><button class="install-substack-app-embed-btn button primary">Get the app</button></a></div><p></p>]]></content:encoded></item><item><title><![CDATA[Absolute Beginner DevOps Learning Roadmap]]></title><description><![CDATA[This roadmap breaks down each phase into specific learning objectives and suggests potential tools and areas of focus. Remember to prioritize hands-on practice alongside theoretical learning.]]></description><link>https://codeclouddevops.substack.com/p/absolute-beginner-devops-learning</link><guid isPermaLink="false">https://codeclouddevops.substack.com/p/absolute-beginner-devops-learning</guid><dc:creator><![CDATA[Nakul Shivakumar]]></dc:creator><pubDate>Fri, 25 Apr 2025 14:16:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1x2s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p>DevOps has transformed how organisations build, deploy, and maintain software. By breaking down silos between development and operations teams, DevOps practices enable faster, more reliable software delivery. If you're looking to start a career in this high-demand field but don't know where to begin, this roadmap will guide you through your learning journey.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1x2s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1x2s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 424w, https://substackcdn.com/image/fetch/$s_!1x2s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 848w, https://substackcdn.com/image/fetch/$s_!1x2s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 1272w, https://substackcdn.com/image/fetch/$s_!1x2s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1x2s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png" width="1000" height="1100" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1100,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:211843,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://codeclouddevops.substack.com/i/162126124?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1x2s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 424w, https://substackcdn.com/image/fetch/$s_!1x2s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 848w, https://substackcdn.com/image/fetch/$s_!1x2s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 1272w, https://substackcdn.com/image/fetch/$s_!1x2s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F943a6555-cf13-4790-a9ca-faaf439eb041_1000x1100.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>What is DevOps?</h2><p>DevOps is both a cultural philosophy and a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the development lifecycle while delivering features, fixes, and updates frequently and reliably.</p><p>The core principles include:</p><ul><li><p><strong>Collaboration</strong>: Breaking down barriers between development and operations</p></li><li><p><strong>Automation</strong>: Reducing manual effort and human error</p></li><li><p><strong>Continuous Integration/Continuous Delivery (CI/CD)</strong>: Enabling frequent, reliable releases</p></li><li><p><strong>Monitoring and Feedback</strong>: Constantly improving based on real performance data</p></li></ul><h2>The DevOps Learning Journey</h2><h3>Phase 1: Build a Strong Foundation (1-2 months)</h3><p><strong>Learn Linux Fundamentals</strong></p><ul><li><p>Basic command line operations</p></li><li><p>File system navigation</p></li><li><p>User management</p></li><li><p>Process management</p></li><li><p>Package installation</p></li><li><p>Shell scripting basics</p></li></ul><p><strong>Understand Networking Concepts</strong></p><ul><li><p>IP addressing and subnets</p></li><li><p>DNS, HTTP/HTTPS protocols</p></li><li><p>Firewalls and security basics</p></li><li><p>Load balancing concepts</p></li></ul><p><strong>Learn Version Control with Git</strong></p><ul><li><p>Basic commands (clone, add, commit, push, pull)</p></li><li><p>Branching and merging strategies</p></li><li><p>Pull requests and code reviews</p></li><li><p>Using GitHub/GitLab/Bitbucket</p></li></ul><p><strong>Programming/Scripting Language</strong></p><ul><li><p>Learn at least one scripting language:</p><ul><li><p>Python (recommended for beginners)</p></li><li><p>Bash scripting</p></li><li><p>PowerShell (for Windows environments)</p></li></ul></li></ul><h3>Phase 2: Infrastructure as Code &amp; Containerization (2-3 months)</h3><p><strong>Learn Containerization with Docker</strong></p><ul><li><p>Container concepts</p></li><li><p>Creating Dockerfiles</p></li><li><p>Managing images and containers</p></li><li><p>Docker networking and volumes</p></li><li><p>Docker Compose for multi-container applications</p></li></ul><p><strong>Container Orchestration with Kubernetes</strong></p><ul><li><p>Kubernetes architecture</p></li><li><p>Pods, deployments, services</p></li><li><p>ConfigMaps and secrets</p></li><li><p>Basic cluster management</p></li><li><p>Helm charts for package management</p></li></ul><p><strong>Infrastructure as Code</strong></p><ul><li><p>Terraform basics</p></li><li><p>Writing infrastructure as code</p></li><li><p>Managing state</p></li><li><p>Creating reusable modules</p></li><li><p>Provisioning cloud resources</p></li></ul><p><strong>Configuration Management</strong></p><ul><li><p>Ansible basics</p></li><li><p>Writing playbooks</p></li><li><p>Inventory management</p></li><li><p>Roles and variables</p></li><li><p>Idempotent configuration</p></li></ul><h3>Phase 3: CI/CD &amp; Cloud Platforms (2-3 months)</h3><p><strong>Continuous Integration/Continuous Delivery</strong></p><ul><li><p>CI/CD concepts and practices</p></li><li><p>Setting up CI/CD pipelines with:</p><ul><li><p>Jenkins</p></li><li><p>GitHub Actions</p></li><li><p>GitLab CI</p></li><li><p>CircleCI</p></li></ul></li></ul><p><strong>Cloud Platforms (focus on one initially)</strong></p><ul><li><p>AWS:</p><ul><li><p>EC2, S3, RDS</p></li><li><p>VPC, IAM, Security Groups</p></li><li><p>Lambda, ECS, EKS</p></li></ul></li><li><p>Azure:</p><ul><li><p>Virtual Machines, Storage Accounts</p></li><li><p>Virtual Networks, Active Directory</p></li><li><p>AKS, Azure Functions</p></li></ul></li><li><p>Google Cloud:</p><ul><li><p>Compute Engine, Cloud Storage</p></li><li><p>VPC, IAM</p></li><li><p>GKE, Cloud Functions</p></li></ul></li></ul><p><strong>Serverless Architecture</strong></p><ul><li><p>Serverless concepts</p></li><li><p>Function as a Service (FaaS)</p></li><li><p>Event-driven architecture</p></li><li><p>AWS Lambda/Azure Functions/Google Cloud Functions</p></li></ul><h3>Phase 4: Monitoring, Logging &amp; Security (1-2 months)</h3><p><strong>Monitoring and Observability</strong></p><ul><li><p>Prometheus for metrics</p></li><li><p>Grafana for visualization</p></li><li><p>Setting up dashboards</p></li><li><p>Alerting rules</p></li></ul><p><strong>Logging</strong></p><ul><li><p>Centralized logging concepts</p></li><li><p>ELK Stack (Elasticsearch, Logstash, Kibana)</p></li><li><p>Fluentd/Fluent Bit</p></li><li><p>Log analysis techniques</p></li></ul><p><strong>Security</strong></p><ul><li><p>Security best practices</p></li><li><p>Vulnerability scanning</p></li><li><p>Secret management</p></li><li><p>Compliance as code</p></li><li><p>DevSecOps principles</p></li></ul><h3>Phase 5: Advanced Topics &amp; Specialization (Ongoing)</h3><p><strong>Advanced Kubernetes</strong></p><ul><li><p>Custom controllers</p></li><li><p>Operators</p></li><li><p>Service mesh (Istio/Linkerd)</p></li><li><p>Advanced scheduling</p></li></ul><p><strong>GitOps</strong></p><ul><li><p>GitOps principles</p></li><li><p>ArgoCD/Flux</p></li><li><p>Automated deployment</p></li></ul><p><strong>Site Reliability Engineering (SRE)</strong></p><ul><li><p>SRE principles</p></li><li><p>Service Level Objectives (SLOs)</p></li><li><p>Error budgets</p></li><li><p>Chaos engineering</p></li></ul><h2>Practical Project Ideas</h2><p>As you learn, apply your knowledge through practical projects:</p><ol><li><p><strong>Simple Web Application Deployment</strong></p><ul><li><p>Deploy a simple web app with Docker</p></li><li><p>Set up CI/CD for automated builds and deployments</p></li><li><p>Implement infrastructure as code</p></li></ul></li><li><p><strong>Microservices Architecture</strong></p><ul><li><p>Build and deploy multiple microservices</p></li><li><p>Set up Kubernetes for orchestration</p></li><li><p>Implement service discovery and load balancing</p></li></ul></li><li><p><strong>Monitoring Dashboard</strong></p><ul><li><p>Set up monitoring for your applications</p></li><li><p>Create dashboards for key metrics</p></li><li><p>Implement alerting for critical issues</p></li></ul></li><li><p><strong>Disaster Recovery Solution</strong></p><ul><li><p>Design backup strategies</p></li><li><p>Implement automated recovery procedures</p></li><li><p>Test failure scenarios</p></li></ul></li><li><p>Checkout Nextwork for their incredible beginner friendly project on AWS and Multicloud projects completely performed on free-tier of AWS.</p><p>https://learn.nextwork.org/</p></li></ol><h2>Learning Resources</h2><h3>Free Resources</h3><ul><li><p><strong>Documentation</strong>: Official docs for tools like Docker, Kubernetes, Terraform</p></li><li><p><strong>YouTube Channels</strong>: TechWorld with Nana, DevOps Directive, KodeKloud</p></li><li><p><strong>Blogs</strong>: The DevOps Guy, DevOps.com</p></li><li><p><strong>GitHub</strong>: Hands-on practice repositories</p></li><li><p><strong>Nextwork:</strong> Hands-on Project learning on AWS and Multi-cloud Projects. </p></li></ul><h3>Paid Courses and Platforms</h3><ul><li><p><strong>Udemy</strong>: Docker, Kubernetes, and Jenkins courses by Mumshad Mannambeth</p></li><li><p><strong>A Cloud Guru / Linux Academy</strong>: Comprehensive DevOps learning paths</p></li><li><p><strong>Pluralsight</strong>: Deep dives into specific technologies</p></li><li><p><strong>KodeKloud</strong>: Hands-on labs and challenges</p></li></ul><h3>Certifications to Consider</h3><ul><li><p><strong>AWS Certified DevOps Engineer</strong></p></li><li><p><strong>Microsoft Certified: DevOps Engineer Expert</strong></p></li><li><p><strong>Certified Kubernetes Administrator (CKA)</strong></p></li><li><p><strong>Docker Certified Associate</strong></p></li><li><p><strong>HashiCorp Certified: Terraform Associate</strong></p></li></ul><h2>Mindset for Success</h2><p>DevOps is not just about tools; it's about adopting a mindset:</p><ul><li><p><strong>Continuous Learning</strong>: The field evolves rapidly; stay curious and keep learning</p></li><li><p><strong>Automation First</strong>: Always look for opportunities to automate repetitive tasks</p></li><li><p><strong>Systems Thinking</strong>: Understand how components work together in a larger system</p></li><li><p><strong>Problem-Solving</strong>: Develop strong troubleshooting and analytical skills</p></li><li><p><strong>Collaboration</strong>: Work effectively across teams and specializations</p></li></ul><h2>Final Thoughts</h2><p>Remember that DevOps is a journey, not a destination. No one masters all aspects overnight. Start with the fundamentals, build practical projects, and gradually expand your knowledge.</p><p>The most successful DevOps professionals are those who combine technical skills with a genuine passion for improving software delivery processes. Focus on understanding the "why" behind practices rather than just memorising commands.</p><p>As you progress through this roadmap, you'll not only develop valuable technical skills but also a mindset that will serve you well throughout your career in technology.</p><p>Happy learning!</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://substack.com/@codeclouddevops/note/p-162126124&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://substack.com/@codeclouddevops/note/p-162126124"><span>Leave a comment</span></a></p><div class="install-substack-app-embed install-substack-app-embed-web" data-component-name="InstallSubstackAppToDOM"><img class="install-substack-app-embed-img" src="https://substackcdn.com/image/fetch/$s_!jKCZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec48beb-dbc6-486e-9461-54320de6ca39_144x144.png"><div class="install-substack-app-embed-text"><div class="install-substack-app-header">Get more from Nakul Shivakumar in the Substack app</div><div class="install-substack-app-text">Available for iOS and Android</div></div><a href="https://substack.com/app/app-store-redirect?utm_campaign=app-marketing&amp;utm_content=author-post-insert&amp;utm_source=codeclouddevops" target="_blank" class="install-substack-app-embed-link"><button class="install-substack-app-embed-btn button primary">Get the app</button></a></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/absolute-beginner-devops-learning?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://codeclouddevops.substack.com/p/absolute-beginner-devops-learning?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://codeclouddevops.substack.com/p/absolute-beginner-devops-learning?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div>]]></content:encoded></item></channel></rss>