<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[OpsInsights]]></title><description><![CDATA[OpsInsights is a blog where we share ideas and the technical glitches we face in our daily work.]]></description><link>https://opsinsights.dev</link><generator>RSS for Node</generator><lastBuildDate>Mon, 20 Apr 2026 18:59:26 GMT</lastBuildDate><atom:link href="https://opsinsights.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[AWS EKS Capabilities: Fully Managed Argo CD — Setup with Terraform]]></title><description><![CDATA[Hello All,
If you've been running Argo CD on EKS, you know the drill — install the helm chart, manage the controllers, handle upgrades, configure SSO, worry about HA, and repeat for every cluster. It']]></description><link>https://opsinsights.dev/aws-eks-capabilities-fully-managed-argo-cd-setup-with-terraform</link><guid isPermaLink="true">https://opsinsights.dev/aws-eks-capabilities-fully-managed-argo-cd-setup-with-terraform</guid><category><![CDATA[aws, kubernetes, argocd, eks, terraform, gitops]]></category><category><![CDATA[AWS]]></category><category><![CDATA[argocd]]></category><category><![CDATA[gitops]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sat, 04 Apr 2026 17:59:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/613ee4f75cf6c3048a1e368b/1bef7ff4-4979-4b97-82eb-27f713ec3568.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello All,</p>
<p>If you've been running Argo CD on EKS, you know the drill — install the helm chart, manage the controllers, handle upgrades, configure SSO, worry about HA, and repeat for every cluster. It's a lot of operational overhead for a tool that's supposed to simplify your life.</p>
<p>AWS recently introduced <strong>EKS Capabilities</strong>, and one of the first capabilities available is a fully managed Argo CD. This means AWS takes care of running, scaling, and upgrading Argo CD controllers in their own service accounts — outside of your cluster. You don't install anything on your worker nodes.</p>
<blockquote>
<p>The Argo CD software runs in the AWS control plane, not on your worker nodes.</p>
</blockquote>
<p>Sounds interesting? Let's walk through the setup using Terraform.</p>
<hr />
<h2>What Changes with EKS Capability for Argo CD?</h2>
<p>Before this, self-managing Argo CD meant:</p>
<ul>
<li><p>Installing and maintaining Argo CD controllers on your cluster</p>
</li>
<li><p>Configuring SSO separately (Dex, OIDC, etc.)</p>
</li>
<li><p>Managing HA, scaling, and upgrades yourself</p>
</li>
<li><p>Setting up IRSA or cross-account IAM for multi-cluster deployments</p>
</li>
</ul>
<p>With EKS Capabilities, all of that is handled by AWS. The key highlights:</p>
<ul>
<li><p>Argo CD runs in AWS-managed service accounts, not your worker nodes</p>
</li>
<li><p>Native integration with <strong>AWS IAM Identity Center</strong> (formerly AWS SSO) for user management — no more Dex configs</p>
</li>
<li><p>Simplified multi-cluster access using EKS Access Entries</p>
</li>
<li><p>Native integrations with ECR, Secrets Manager, and CodeConnections</p>
</li>
</ul>
<blockquote>
<p>User management is completely taken out of your hands and integrated with AWS Identity Center. If you've been struggling with Argo CD RBAC + SSO, this is a relief.</p>
</blockquote>
<hr />
<h2>The Catch with Terraform</h2>
<p>Though the <a href="https://github.com/terraform-aws-modules/terraform-aws-eks">terraform-aws-modules/eks</a> module supports this capability, it doesn't work completely out of the box for a fresh EKS setup. There are a few manual steps and dependencies you need to wire together.</p>
<p>In this blog, I'll walk through the setup and how to get it running quickly.</p>
<hr />
<h2>Step 1: Create the Argo CD Capability</h2>
<p>Using the EKS capability sub-module from <code>terraform-aws-modules/eks</code>:</p>
<pre><code class="language-hcl">module "argocd" {
  source  = "terraform-aws-modules/eks/aws//modules/capability"
  version = "~&gt; 21.16"

  name         = "${local.cluster_name}-argocd"
  cluster_name = module.eks.cluster_name
  type         = "ARGOCD"

  configuration = {
    argo_cd = {
      aws_idc = {
        idc_instance_arn = "arn:aws:sso:::instance/&lt;YOUR_IDC_INSTANCE_ID&gt;"
      }
      namespace = "argocd"
      rbac_role_mapping = [{
        role = "ADMIN"
        identity = [{
          id   = data.aws_identitystore_group.argocd_admin.group_id
          type = "SSO_GROUP"
        }]
      }]
    }
  }

  iam_policy_statements = {
    ECRRead = {
      actions = [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
      ]
      resources = ["*"]
    }
  }

  tags = local.additional_tags
}
</code></pre>
<p>A few things to note here:</p>
<ul>
<li><p><code>idc_instance_arn</code> — this is your AWS Identity Center instance ARN. Make sure Identity Center is configured in your account before proceeding.</p>
</li>
<li><p><code>rbac_role_mapping</code> — maps your Identity Center groups to Argo CD RBAC roles (ADMIN, VIEWER). This replaces the traditional Argo CD RBAC ConfigMap approach.</p>
</li>
<li><p>The <code>iam_policy_statements</code> block grants ECR read access so Argo CD can pull images and Helm charts from your private registries.</p>
</li>
</ul>
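<p>One gap in the snippet above: <code>data.aws_identitystore_group.argocd_admin</code> is referenced but never defined. A minimal sketch of that lookup (the group display name here is a placeholder — use whichever admin group exists in your Identity Center):</p>
<pre><code class="language-hcl">data "aws_ssoadmin_instances" "this" {}

data "aws_identitystore_group" "argocd_admin" {
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]

  alternate_identifier {
    unique_attribute {
      attribute_path  = "DisplayName"
      attribute_value = "argocd-admins" # placeholder group name
    }
  }
}
</code></pre>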
<hr />
<h2>Step 2: Configure EKS Access Entry for Argo CD</h2>
<p>This is a critical step. The capability needs cluster-level access to manage deployments. We wire this up using EKS Access Entries:</p>
<pre><code class="language-hcl">resource "aws_eks_access_entry" "argocd" {
  cluster_name  = module.eks.cluster_name
  principal_arn = module.argocd.iam_role_arn
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "argocd" {
  cluster_name  = module.eks.cluster_name
  principal_arn = module.argocd.iam_role_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }

  depends_on = [aws_eks_access_entry.argocd]
}
</code></pre>
<blockquote>
<p>The <code>AmazonEKSClusterAdminPolicy</code> provides full cluster-admin access. This is fine for getting started, but for production consider scoping down with custom Kubernetes RBAC bindings.</p>
</blockquote>
<p>If you've read my earlier blog on <a href="https://opsinsights.dev/simplifying-access-entries-in-eks-a-guide">EKS Access Entries</a>, this pattern should feel familiar.</p>
<hr />
<h2>Step 3: Register Your Cluster (Mandatory!)</h2>
<p>Here's where most people get stuck. The Argo CD capability <strong>does not automatically register the local cluster</strong>. You must explicitly register it as a deployment target.</p>
<p>This is done by creating a Kubernetes Secret in the <code>argocd</code> namespace:</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Secret
metadata:
  name: local-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: in-cluster
  server: arn:aws:eks:&lt;REGION&gt;:&lt;ACCOUNT_ID&gt;:cluster/&lt;CLUSTER_NAME&gt;
  project: default
</code></pre>
<p>Points to note:</p>
<ul>
<li><p>Use the <strong>EKS cluster ARN</strong> in the <code>server</code> field, not the Kubernetes API server URL. The managed capability requires ARNs to identify clusters.</p>
</li>
<li><p><code>kubernetes.default.svc</code> is <strong>not supported</strong> here.</p>
</li>
<li><p>This step depends on the access policy association from Step 2 being completed first.</p>
</li>
</ul>
<p>Apply it:</p>
<pre><code class="language-bash">kubectl apply -f local-cluster.yaml
</code></pre>
<p>Once registered, the cluster will show in an <code>Unknown</code> connection state until you create your first application — that's expected behavior.</p>
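<p>If you'd rather keep this registration in Terraform alongside the rest of the stack, a sketch using the <code>kubernetes</code> provider (assuming it's already configured against this cluster) could look like:</p>
<pre><code class="language-hcl">resource "kubernetes_secret_v1" "local_cluster" {
  metadata {
    name      = "local-cluster"
    namespace = "argocd"
    labels = {
      "argocd.argoproj.io/secret-type" = "cluster"
    }
  }

  data = {
    name    = "in-cluster"
    server  = module.eks.cluster_arn # EKS cluster ARN, not the API server URL
    project = "default"
  }

  # The capability can only reach the cluster once the access policy exists
  depends_on = [aws_eks_access_policy_association.argocd]
}
</code></pre>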
<hr />
<h2>Step 4: Deploy Your Applications</h2>
<p>Now you're ready to create Argo CD Applications and Projects. Here's a quick example:</p>
<pre><code class="language-yaml">apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/&lt;YOUR_ORG&gt;/&lt;YOUR_REPO&gt;.git
    targetRevision: HEAD
    path: k8s/
  destination:
    name: in-cluster
    namespace: my-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
</code></pre>
<p>Use <code>destination.name</code> with the cluster name you registered (like <code>in-cluster</code>). The <code>destination.server</code> field also works with EKS cluster ARNs, but using names is cleaner.</p>
<hr />
<h2>Things to Keep in Mind</h2>
<ul>
<li><p><strong>AWS Identity Center is mandatory</strong> — local users are not supported. Make sure IDC is configured with the right groups before setting up the capability.</p>
</li>
<li><p><strong>Cluster registration is not automatic</strong> — don't skip Step 3, or your deployments won't have a target.</p>
</li>
<li><p><strong>Access Entry dependency</strong> — the cluster secret registration depends on the access policy being in place. Terraform <code>depends_on</code> is your friend here.</p>
</li>
<li><p><strong>Production RBAC</strong> — scope down from <code>AmazonEKSClusterAdminPolicy</code> to least-privilege custom roles for production workloads.</p>
</li>
</ul>
<hr />
<p>This is a solid step forward from AWS in reducing the operational burden of running GitOps tooling. No more managing Argo CD upgrades, HA configs, or SSO integrations manually. It just works as part of your EKS cluster lifecycle.</p>
<p>Happy GitOps-ing!</p>
<p>References:</p>
<p><a href="https://docs.aws.amazon.com/eks/latest/userguide/argocd.html">https://docs.aws.amazon.com/eks/latest/userguide/argocd.html</a></p>
<p><a href="https://docs.aws.amazon.com/eks/latest/userguide/argocd-register-clusters.html">https://docs.aws.amazon.com/eks/latest/userguide/argocd-register-clusters.html</a></p>
<p><a href="https://aws.amazon.com/blogs/containers/deep-dive-streamlining-gitops-with-amazon-eks-capability-for-argo-cd/">https://aws.amazon.com/blogs/containers/deep-dive-streamlining-gitops-with-amazon-eks-capability-for-argo-cd/</a></p>
<p><a href="https://github.com/terraform-aws-modules/terraform-aws-eks">https://github.com/terraform-aws-modules/terraform-aws-eks</a></p>
]]></content:encoded></item><item><title><![CDATA[CI/CD for Databricks Pipelines with Databricks Asset Bundles (DAB) and Azure DevOps]]></title><description><![CDATA[Introduction
If you've spent any time managing data pipelines in the real world, you know the pain of deploying changes manually — copy-pasting notebooks, praying nothing breaks in prod, and maintaini]]></description><link>https://opsinsights.dev/ci-cd-for-databricks-pipelines-with-databricks-asset-bundles-dab-and-azure-devops</link><guid isPermaLink="true">https://opsinsights.dev/ci-cd-for-databricks-pipelines-with-databricks-asset-bundles-dab-and-azure-devops</guid><category><![CDATA[Databricks]]></category><category><![CDATA[Databricks asset bundles]]></category><category><![CDATA[cicd]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Fri, 20 Feb 2026 10:25:34 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/613ee4f75cf6c3048a1e368b/c8b1a827-9946-4b08-bd76-742559954b7e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Introduction</strong></h2>
<p>If you've spent any time managing data pipelines in the real world, you know the pain of deploying changes manually — copy-pasting notebooks, praying nothing breaks in prod, and maintaining that one giant script nobody dares to touch. I've been there, and I'll be honest: it's not fun.</p>
<p>At some point, we decided enough was enough. We adopted <strong>Databricks Asset Bundles (DAB)</strong> to bring the same software engineering rigour we apply to application code — version control, code review, automated testing, and environment promotion — to our Databricks workloads.</p>
<p>This post walks through exactly how we built a fully automated CI/CD pipeline using DAB and Azure DevOps, and some of the hard-won lessons we picked up along the way.</p>
<hr />
<h2><strong>What Are Databricks Asset Bundles?</strong></h2>
<p>Databricks Asset Bundles are a YAML-based configuration framework for defining and deploying Databricks resources — <strong>Delta Live Tables (DLT) pipelines</strong>, <strong>jobs/workflows</strong>, <strong>SQL warehouses</strong>, and <strong>permissions</strong> — as code. Think of it as Terraform, but purpose-built for Databricks.</p>
<p>Resources are declared in <code>.yml</code> files and deployed via the <code>databricks</code> CLI. This makes them version-controllable, reviewable, and CI/CD-friendly right out of the box. Once you start using it, going back to manual deployments feels unthinkable.</p>
<hr />
<h2><strong>Our Bundle Structure</strong></h2>
<p>Here's how our project is laid out. The folder structure itself tells a story — environments, jobs, pipelines, and warehouses each have their own space, all orchestrated from a single root bundle file:</p>
<pre><code class="language-plaintext">&lt;your-project&gt;/
├── databricks.yml              # Root bundle definition &amp; targets (dev/test/prod)
├── azure-pipelines.yml         # Azure DevOps CI/CD pipeline
└── resources/
    ├── environments/
    │   ├── dev.yml             # Dev-specific variables (catalog, schema, storage)
    │   └── prod.yml            # Prod-specific variables
    ├── jobs/
    │   ├── wf_&lt;domain_a&gt;.yml   # Workflow: triggers a single DLT pipeline
    │   ├── wf_&lt;domain_b&gt;.yml   # Workflow: orchestrates multiple DLT pipelines
    │   └── wf_&lt;domain_c&gt;.yml   # Workflow: SQL task-based job
    ├── pipelines/
    │   ├── pl_&lt;entity_a&gt;.yml   # DLT: Bronze → Silver → Gold
    │   ├── pl_&lt;entity_b&gt;.yml   # DLT: another domain pipeline
    │   └── ...                 # (one YAML per domain/entity)
    └── sql_warehouses/
        └── cl_sql_xsm_stage.yml  # Serverless SQL warehouse (X-Small)
</code></pre>
<p>One pipeline, one YAML. Clean, predictable, and easy to find what you're looking for.</p>
<hr />
<h2><strong>Multi-Target Configuration</strong></h2>
<p>This is where DAB really earns its keep. Instead of maintaining separate repos or complex branching strategies per environment, we use <strong>targets</strong> — environment-specific configurations all sitting inside a single <code>databricks.yml</code>:</p>
<table style="min-width:100px"><colgroup><col style="min-width:25px"></col><col style="min-width:25px"></col><col style="min-width:25px"></col><col style="min-width:25px"></col></colgroup><tbody><tr><th><p>Target</p></th><th><p>Mode</p></th><th><p>Workspace</p></th><th><p>Use Case</p></th></tr><tr><td><p><code>local-user</code></p></td><td><p>development</p></td><td><p>Dev workspace</p></td><td><p>Individual developer sandboxes</p></td></tr><tr><td><p><code>dev</code></p></td><td><p>development</p></td><td><p>Dev workspace</p></td><td><p>Shared dev environment (CI on feature/dev branch)</p></td></tr><tr><td><p><code>test</code></p></td><td><p>development</p></td><td><p>Test workspace</p></td><td><p>QA / pre-production</p></td></tr><tr><td><p><code>prod</code></p></td><td><p>production</p></td><td><p>Prod workspace</p></td><td><p>Live production</p></td></tr></tbody></table>

<p>Each target overrides variables like the Unity Catalog name, data lake schema, and environment label. The same pipeline YAML works across all environments — zero code changes required when promoting between them.</p>
<pre><code class="language-yaml"># databricks.yml (simplified)
targets:
  dev:
    mode: development
    workspace:
      host: https://&lt;your-dev-workspace&gt;.azuredatabricks.net/
    variables:
      pipeline_uc_catalog:
        default: dev_&lt;your_catalog&gt;

  prod:
    mode: production
    workspace:
      host: https://&lt;your-prod-workspace&gt;.azuredatabricks.net/
    variables:
      pipeline_uc_catalog:
        default: prod_&lt;your_catalog&gt;
</code></pre>
<hr />
<h2><strong>Pipeline Resource Design</strong></h2>
<h3><strong>DLT Pipelines</strong></h3>
<p>Each DLT pipeline follows the <strong>Bronze → Silver → Gold</strong> medallion architecture. Common utilities are shared across all pipelines (think a shared loader module and init file). Pipeline-level configs — catalog names, schema names, storage paths — are injected via variables at deploy time, so there's nothing hardcoded:</p>
<pre><code class="language-yaml"># resources/pipelines/pl_&lt;entity&gt;.yml (simplified)
resources:
  pipelines:
    pipeline_pl_&lt;entity&gt;:
      name: pl_&lt;entity&gt;
      configuration:
        pipeline.uc_catalog: ${variables.pipeline_uc_catalog.default}
        pipeline.uc_schema_brz: ${variables.pipeline_uc_schema_brz.default}
      libraries:
        - file: { path: ../../src/code/bronze/&lt;entity&gt;_bronze_dlt.py }
        - file: { path: ../../src/code/silver/&lt;entity&gt;_silver_dlt.py }
        - file: { path: ../../src/code/gold/fct_&lt;entity&gt;_dlt.sql }
      photon: true
      serverless: true
</code></pre>
<h3><strong>Jobs / Workflows</strong></h3>
<p>Jobs wire DLT pipelines together. We use two patterns depending on the use case:</p>
<ul>
<li><p><strong>Pipeline task jobs</strong> — trigger DLT pipelines by resource reference. Good for orchestrating multiple related pipelines in sequence.</p>
</li>
<li><p><strong>SQL task jobs</strong> — run SQL scripts against a SQL warehouse, with conditional full-refresh logic built in.</p>
</li>
</ul>
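<p>As a rough sketch, a pipeline task job that triggers DLT pipelines by bundle resource reference (all names here are placeholders) looks something like:</p>
<pre><code class="language-yaml"># resources/jobs/wf_&lt;domain&gt;.yml (simplified sketch)
resources:
  jobs:
    job_wf_&lt;domain&gt;:
      name: wf_&lt;domain&gt;
      tasks:
        - task_key: run_pl_&lt;entity_a&gt;
          pipeline_task:
            pipeline_id: ${resources.pipelines.pipeline_pl_&lt;entity_a&gt;.id}
        - task_key: run_pl_&lt;entity_b&gt;
          depends_on:
            - task_key: run_pl_&lt;entity_a&gt;
          pipeline_task:
            pipeline_id: ${resources.pipelines.pipeline_pl_&lt;entity_b&gt;.id}
</code></pre>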
<hr />
<h2><strong>The Azure DevOps CI/CD Pipeline</strong></h2>
<p>Our <code>azure-pipelines.yml</code> defines three stages that execute on every push to any of our active branches:</p>
<pre><code class="language-plaintext">Push to branch
      │
      ▼
┌─────────────┐     ┌───────────────────┐     ┌─────────────────────────┐
│  Stage 1    │────▶│     Stage 2       │────▶│       Stage 3           │
│  Test       │     │  Code Quality     │     │  Deploy to Environment  │
│  (pytest)   │     │  (SonarCloud)     │     │  (DAB deploy)           │
└─────────────┘     └───────────────────┘     └─────────────────────────┘
</code></pre>
<h3><strong>Branch → Environment Mapping</strong></h3>
<table style="min-width:75px"><colgroup><col style="min-width:25px"></col><col style="min-width:25px"></col><col style="min-width:25px"></col></colgroup><tbody><tr><th><p>Branch</p></th><th><p>Environment</p></th><th><p>Approval Required</p></th></tr><tr><td><p><code>&lt;dev-branch&gt;</code></p></td><td><p><code>dev</code></p></td><td><p>None</p></td></tr><tr><td><p><code>&lt;test-branch&gt;</code></p></td><td><p><code>test</code></p></td><td><p>Optional</p></td></tr><tr><td><p><code>&lt;main-branch&gt;</code></p></td><td><p><code>prod</code></p></td><td><p><strong>Manual approval</strong></p></td></tr></tbody></table>

<p>Simple and deliberate. Dev merges are instant; production deployments require a human to say yes.</p>
<h3><strong>Authentication</strong></h3>
<p>We use an <strong>Azure Service Principal</strong> with the <code>AzureCLI@2</code> task to obtain a short-lived Databricks access token at deploy time. No secrets baked into YAML, no long-lived credentials lying around:</p>
<pre><code class="language-bash">DATABRICKS_TOKEN=$(az account get-access-token \
  --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --query "accessToken" -o tsv)
</code></pre>
<p>Short-lived tokens via a service principal are the way to go. It's one less thing to rotate manually.</p>
<h3><strong>Deploy Step</strong></h3>
<pre><code class="language-bash">databricks bundle validate -t $(environment)
databricks bundle deploy -t $(environment) --auto-approve
</code></pre>
<blockquote>
<p><code>--auto-approve</code> is required in non-interactive CI/CD environments to bypass prompts for destructive actions like renaming or deleting resources. Skip it and your pipeline will hang indefinitely — ask me how I know.</p>
</blockquote>
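<p>Putting the pieces together, the deploy stage in <code>azure-pipelines.yml</code> looks roughly like this (the service connection name and <code>environment</code> variable are placeholders for your own setup):</p>
<pre><code class="language-yaml">- stage: Deploy
  jobs:
    - deployment: deploy_bundle
      environment: $(environment) # the prod environment carries the manual approval gate
      strategy:
        runOnce:
          deploy:
            steps:
              - task: AzureCLI@2
                inputs:
                  azureSubscription: "&lt;your-service-connection&gt;"
                  scriptType: bash
                  scriptLocation: inlineScript
                  inlineScript: |
                    export DATABRICKS_TOKEN=$(az account get-access-token \
                      --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
                      --query "accessToken" -o tsv)
                    databricks bundle validate -t $(environment)
                    databricks bundle deploy -t $(environment) --auto-approve
</code></pre>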
<hr />
<h2><strong>Key Lessons Learned</strong></h2>
<p>These are the things I wish someone had told me when we started. Saving them here so you don't have to learn them the hard way:</p>
<ol>
<li><p><strong>Name resources without environment suffixes.</strong> Let DAB targets handle environment context — use <code>pl_orders</code>, not <code>pl_dev_orders</code>. The moment you bake the environment into the name, renames will break your CI/CD and your day.</p>
</li>
<li><p><strong>Use variable references consistently.</strong> Never hardcode catalog names or storage accounts inside pipeline YAMLs. Always use <code>${variables.*}</code> so the environment config files are the single source of truth. It pays off every time you add a new target.</p>
</li>
<li><p><code>--auto-approve</code> <strong>is mandatory in CI/CD.</strong> The Databricks CLI will block on any destructive action when there's no interactive terminal. Always include this flag in your deploy command — you'll forget it at least once, and you'll remember it forever after.</p>
</li>
<li><p><strong>Serverless + Photon = simplicity.</strong> Serverless pipelines remove the need to manage cluster sizing in each pipeline config. Less boilerplate, fewer misconfigurations, faster pipelines. It's an easy win.</p>
</li>
<li><p><strong>Event log per pipeline.</strong> Store pipeline event logs in a dedicated schema. It enables centralised monitoring across all DLT pipelines via a single Unity Catalog query — much better than hunting through individual pipeline UIs.</p>
</li>
</ol>
<hr />
<h2><strong>In the next blog</strong></h2>
<p>In the next post, I'll do a technical deep dive into the pipeline code and how we use Unity Catalog and Delta Live Tables to build out the data platform, covering:</p>
<ul>
<li><p>The Azure DevOps pipeline in detail</p>
</li>
<li><p>Deploying to Databricks using Azure DevOps</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[K8s 1.35 Sneak Peek: What's Coming Next?]]></title><description><![CDATA[Kubernetes never sleeps! Just when we get comfortable, a new release brings features that make our lives easier (and our clusters safer). Here is a quick sneak peek at what is landing in Kubernetes 1.35.
Deprecation of ipvs mode
KEP-5495: Deprecate i...]]></description><link>https://opsinsights.dev/k8s-135-sneak-peek-whats-coming-next</link><guid isPermaLink="true">https://opsinsights.dev/k8s-135-sneak-peek-whats-coming-next</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sat, 06 Dec 2025 18:47:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765046808192/36532900-b586-4f23-a7d5-72336be5347b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Kubernetes never sleeps! Just when we get comfortable, a new release brings features that make our lives easier (and our clusters safer). Here is a quick sneak peek at what is landing in <strong>Kubernetes 1.35</strong>.</p>
<h3 id="heading-deprecation-of-ipvs-mode"><strong>Deprecation of ipvs mode</strong></h3>
<p><em>KEP-5495: Deprecate ipvs mode in kube-proxy</em></p>
<p>Since the <code>iptables</code> scalability issues are being solved using <code>nftables</code>, <code>ipvs</code> mode is essentially redundant. It's time to say goodbye to the complexity of IPVS. If you want the gritty details on why <code>iptables</code> needed a replacement, <a target="_blank" href="https://isovalent.com/blog/post/why-replace-iptables-with-ebpf/">check out this deep dive</a>.</p>
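<p>In practice, dropping IPVS means moving to the <code>nftables</code> backend, which is a one-line change in the kube-proxy configuration:</p>
<pre><code class="lang-yaml">apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: nftables # was: ipvs
</code></pre>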
<h3 id="heading-node-declared-features"><strong>Node Declared Features</strong></h3>
<p><em>KEP-4033: Discover cgroup driver from CRI</em></p>
<p>This solves the "does this node actually support what I need?" problem without manual labeling. Nodes can now self-report specific capabilities (like hardware or plugins) directly to the control plane via the API. This allows the scheduler to automatically place Pods on compatible nodes without you having to manually manage feature labels!</p>
<pre><code class="lang-yaml">status:
  declaredFeatures:
    example.com/gpu: "true" # Node says: "I explicitly support this!"
</code></pre>
<h3 id="heading-in-place-update-of-pod-resources"><strong>In-Place Update of Pod Resources</strong></h3>
<p><em>KEP-1287: In-place Update of Pod Resources</em></p>
<p>This has been in the works for a while, but it's a game changer! Vertical Pod Autoscaling (VPA) can finally resize CPU and Memory limits <em>without restarting the Pod</em>. No more disruptions just to give a hungry container a bit more RAM.</p>
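<p>For example, you can bump a running Pod's limits through the <code>resize</code> subresource (a sketch — pod and container names are placeholders):</p>
<pre><code class="lang-bash">kubectl patch pod my-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"limits":{"memory":"1Gi"}}}]}}'
</code></pre>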
<h3 id="heading-numeric-values-for-taints"><strong>Numeric Values for Taints</strong></h3>
<p><em>KEP-5471: Enable SLA-based Scheduling</em></p>
<p>Taints used to be boolean (it exists or it doesn't). Now, we get math!</p>
<p><strong>Before:</strong> <code>taint key=HighPriority:NoSchedule</code> (You either match "HighPriority" exactly, or you don't).</p>
<p><strong>After:</strong> <code>taint reliability=999:NoSchedule</code> Pod Toleration: <code>operator: Gt, value: 950</code> <em>(The Pod says: "I can only tolerate nodes with reliability &gt; 950" — much more expressive!)</em></p>
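<p>Assuming the syntax proposed in the KEP, the matching Pod toleration would look like:</p>
<pre><code class="lang-yaml">tolerations:
  - key: reliability
    operator: Gt # numeric comparison — tolerate any value greater than 950
    value: "950"
    effect: NoSchedule
</code></pre>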
<h3 id="heading-user-namespaces"><strong>User Namespaces</strong></h3>
<p><em>KEP-127: User Namespaces</em></p>
<p>This is a massive security upgrade for running containers safely.</p>
<p><strong>Before:</strong> Container <code>root</code> (UID 0) == Host <code>root</code> (UID 0). <em>(If they break out of the container, they are root on your server. Scary!)</em></p>
<p><strong>After:</strong> Container <code>root</code> (UID 0) == Host <code>nobody</code> (UID 65534). <em>(They are root inside, but powerless outside. Much better.)</em></p>
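<p>Opting in is a single field on the Pod spec:</p>
<pre><code class="lang-yaml">spec:
  hostUsers: false # run this Pod in its own user namespace
</code></pre>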
<h3 id="heading-oci-images-as-volumes"><strong>OCI Images as Volumes</strong></h3>
<p><em>KEP-4639: OCI Volume Source</em></p>
<p>Stop building massive images just to include data!</p>
<p><strong>Before:</strong> InitContainer runs <code>wget https://...</code> -&gt; saves to <code>emptyDir</code> -&gt; MainContainer reads it. <em>(Slow, complex startup scripts, wasted bandwidth.)</em></p>
<p><strong>After:</strong></p>
<pre><code class="lang-yaml">volumes:
  - name: data
    image:
      reference: "my-data-layer:latest" # Mounts directly!
</code></pre>
<hr />
<p>That's a wrap for the 1.35 highlights!</p>
<p>If you found this useful, feel free to reach out to me on LinkedIn. Let's discuss what feature you are most excited about!</p>
<p><strong>For more detailed Reading:</strong></p>
<p>1.35 Release notes: <a target="_blank" href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md">https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md</a></p>
<p><a target="_blank" href="https://isovalent.com/blog/post/why-replace-iptables-with-ebpf/">https://isovalent.com/blog/post/why-replace-iptables-with-ebpf/</a></p>
]]></content:encoded></item><item><title><![CDATA[Tired of Your LLM Using Outdated Terraform Docs? There's a Fix for That! 😉]]></title><description><![CDATA[Hello !
You ask your favorite AI assistant to whip up some Terraform code for a new module, and it confidently spits out syntax that was deprecated six months ago. 🤦‍♂️ While LLMs are fantastic for boosting productivity, they often rely on their tra...]]></description><link>https://opsinsights.dev/tired-of-your-llm-using-outdated-terraform-docs-theres-a-fix-for-that</link><guid isPermaLink="true">https://opsinsights.dev/tired-of-your-llm-using-outdated-terraform-docs-theres-a-fix-for-that</guid><category><![CDATA[Terraform]]></category><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Wed, 20 Aug 2025 00:48:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1755650721522/6bfca64a-295a-4639-ba7e-20b0eed35fe0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-hello">Hello!</h2>
<p>You ask your favorite AI assistant to whip up some Terraform code for a new module, and it confidently spits out syntax that was deprecated six months ago. 🤦‍♂️ While LLMs are fantastic for boosting productivity, they often rely on their training data, which can be a frozen snapshot in time. This means they miss out on the latest features and best practices.</p>
<p>I've been testing a new tool from HashiCorp that directly tackles this problem: the <code>terraform-mcp-server</code>. It essentially gives your LLM a direct, live line to the official Terraform documentation, ensuring the code it generates is accurate and up-to-date.</p>
<p>For my setup, I use <strong>Podman</strong> on my local machine, but you can use Docker or any container tool you prefer. And my go-to co-programmer for everything these days is <a target="_blank" href="https://cline.bot/">Cline</a>, which integrates with tools like this seamlessly.</p>
<hr />
<h3 id="heading-why-is-the-terraform-mcp-server-a-game-changer">Why is the Terraform MCP Server a Game-Changer?</h3>
<p>The core issue is that LLMs, by default, refer to the data they were trained on. To make them more effective, we can give them "tools" they can use to fetch live information. The <code>terraform-mcp-server</code> acts as this tool, an independent agent with a specific set of capabilities.</p>
<p>Think of it as giving your AI assistant a toolkit specifically for Terraform. Instead of guessing, it can now <em>ask</em> questions and get real-time answers. The tools it gets access to include:</p>
<ul>
<li><p><code>get_latest_module_version</code></p>
</li>
<li><p><code>get_latest_provider_version</code></p>
</li>
<li><p><code>get_module_details</code></p>
</li>
<li><p><code>get_policy_details</code></p>
</li>
<li><p><code>get_provider_details</code></p>
</li>
<li><p><code>search_modules</code></p>
</li>
<li><p><code>search_policies</code></p>
</li>
</ul>
<p>With these capabilities, the LLM can perform a thorough analysis before generating a single line of code. When I tested it on a GCP VPC module, it correctly used all the newest features without any hallucinated or outdated arguments. It was a beautiful thing to see!</p>
<hr />
<h3 id="heading-the-setup-is-super-simple">The Setup is Super Simple</h3>
<p>Getting this up and running is incredibly straightforward.</p>
<ol>
<li><p>First, clone the repository from GitHub:</p>
 <pre><code class="lang-bash"> git clone https://github.com/hashicorp/terraform-mcp-server.git
</code></pre>
</li>
<li><p>Next, build the container image. I'm using <code>podman</code>, but <code>docker build</code> works just the same.</p>
 <pre><code class="lang-bash"> cd terraform-mcp-server
 podman build -t terraform-mcp-server:dev .
</code></pre>
</li>
<li><p>Finally, run the server!</p>
 <pre><code class="lang-bash"> podman run -p 8080:8080 --rm terraform-mcp-server:dev
</code></pre>
<p> <em>Note:</em> if you want the server reachable over the published port, add <code>-e TRANSPORT_MODE=streamable-http -e TRANSPORT_HOST=0.0.0.0</code>; for stdio-based integrations like the Cline config later in this post, neither the env vars nor the port mapping are needed.</p>
</li>
</ol>
<p>That's it! The server is now running and ready to accept requests from your AI tool.</p>
<hr />
<h3 id="heading-integrating-with-an-ai-assistant-like-cline">Integrating with an AI Assistant like Cline</h3>
<p>To get my assistant, Cline, to use this new tool, I just needed to add a simple configuration. This tells Cline how to run the <code>terraform-mcp-server</code> whenever it needs to access Terraform information.</p>
<p>Here is the JSON configuration I used:</p>
<pre><code class="lang-json">{
  "mcpServers": {
    "terraform": {
      "command": "podman",
      "args": [
        "run",
        "-i",
        "--rm",
        "terraform-mcp-server:dev"
      ],
      "autoApprove": [
        "list_tools"
      ]
    }
  }
}
</code></pre>
<p>Here is the generated code, for reference: <a target="_blank" href="https://github.com/jothimanikrish/terraform-ai-gcp-vpc/tree/main/gcp-vpc-module">https://github.com/jothimanikrish/terraform-ai-gcp-vpc/tree/main/gcp-vpc-module</a></p>
<p>This configuration defines a tool named <code>terraform</code>, specifies the <code>podman</code> command to execute it, and automatically approves the initial tool listing.</p>
<p>This <code>terraform-mcp-server</code> is a fantastic solution for anyone using LLMs in their Infrastructure as Code workflow. It bridges the gap between the static knowledge of the model and the dynamic, ever-evolving world of Terraform. No more copy-pasting docs or second-guessing AI-generated code.</p>
<p>Happy Terraforming!</p>
]]></content:encoded></item><item><title><![CDATA[Goodbye to iptables: A Quick Dive into GKE's Dataplane V2]]></title><description><![CDATA[Have you heard of Cilium and its eBPF-based networking magic for Kubernetes? If you're already a fan, then you're going to like what Google has been up to with GKE Dataplane V2.
GKE Dataplane V2 is built directly on top of Cilium, bringing the power ...]]></description><link>https://opsinsights.dev/goodbye-to-iptables-a-quick-dive-into-gkes-dataplane-v2</link><guid isPermaLink="true">https://opsinsights.dev/goodbye-to-iptables-a-quick-dive-into-gkes-dataplane-v2</guid><category><![CDATA[gke]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[GCP]]></category><category><![CDATA[GCP DevOps]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Thu, 17 Jul 2025 07:18:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752736621462/f3a02a42-1e0c-4baa-be15-6f80ee5dd2d9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Have you heard of Cilium and its eBPF-based networking magic for Kubernetes? If you're already a fan, then you're going to like what Google has been up to with GKE Dataplane V2.</p>
<p>GKE Dataplane V2 is built directly on top of Cilium, bringing the power of eBPF to Google's managed Kubernetes service.</p>
<p>Sounds cool, right? Let's dive into a few exciting things that Dataplane V2 brings to the table.</p>
<h2 id="heading-whats-the-big-deal">What's the Big Deal?</h2>
<p>If you've been using GKE for a while, you're probably familiar with the standard networking stack, which used Calico for Network Policy and kube-proxy (with iptables) for service routing. Dataplane V2 changes the game completely.</p>
<h2 id="heading-here-are-the-highlights">Here are the highlights:</h2>
<ul>
<li><p><strong>Default on Autopilot:</strong> To make things easy, Dataplane V2 is enabled by default for all new GKE Autopilot clusters. You might already be using it without knowing!</p>
</li>
<li><p><strong>No More kube-proxy! That's right.</strong> GKE Dataplane V2 completely removes kube-proxy and its complex web of iptables rules. This means service routing is handled far more efficiently by eBPF, leading to better performance and scalability.</p>
</li>
<li><p><strong>Built-in Security:</strong> Security is now a first-class citizen. You don't need a third-party tool like Calico just to enforce Network Policies. You can enable policy enforcement with a single click in the GKE console or a simple flag in your cluster config.</p>
</li>
</ul>
<ul>
<li><p><strong>Network Policy Logging:</strong> Ever wondered if a connection was allowed or denied by a Network Policy? Dataplane V2 has built-in logging for this. You can configure a simple CRD on your cluster to get detailed logs, which is a massive help for debugging and security audits.</p>
</li>
<li><p><strong>Real-time Network Visibility:</strong> Thanks to its eBPF foundation, you get much deeper, real-time visibility into the network traffic flowing between your pods.</p>
</li>
</ul>
<h2 id="heading-the-specs-by-the-numbers"><strong>The Specs:</strong> By the Numbers</h2>
<p>Dataplane V2 isn't just about features; it's built for scale.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Specification</td><td>Limit on Dataplane V2</td></tr>
</thead>
<tbody>
<tr>
<td>Number of nodes per cluster</td><td>7,500</td></tr>
<tr>
<td>Number of Pods per cluster</td><td>200,000</td></tr>
<tr>
<td>Number of Pods behind one Service</td><td>10,000</td></tr>
<tr>
<td>Number of Cluster IP Services</td><td>10,000</td></tr>
<tr>
<td>Number of LoadBalancer Services per cluster</td><td>750</td></tr>
</tbody>
</table>
</div>
<h2 id="heading-things-to-keep-in-mind-the-limitations"><strong>Things to Keep in Mind (The Limitations)</strong></h2>
<p>As with any powerful technology, there are a few things you should be aware of before jumping in:</p>
<ul>
<li><p><strong>Creation Time Only:</strong> Dataplane V2 can only be enabled when you create a new GKE cluster. You can't migrate an existing cluster to it on the fly, so plan accordingly.</p>
</li>
<li><p><strong>eBPF Map Limits:</strong> GKE Dataplane V2 relies on eBPF maps, which are limited to 260,000 endpoints across all services. An "endpoint" here is a single Pod backing a Service.</p>
</li>
<li><p><strong>Missing kube-proxy Features:</strong> Since kube-proxy is gone, you might miss some specific metrics or behaviors you were used to. It's a new paradigm, so some old debugging habits might need to change.</p>
</li>
<li><p><strong>Update Your GKE Version:</strong> To get the most out of Dataplane V2 and all its features without limitations, you'll want to be on a recent GKE version (the docs often recommend 1.31+ for the latest enhancements).</p>
</li>
</ul>
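<p>That 260,000-endpoint ceiling is worth sanity-checking against your own topology. Here's a rough back-of-the-envelope helper (my own sketch, not an official formula) that simply sums the Pods backing each Service:</p>

```python
# Dataplane V2 caps eBPF map entries at 260,000 service endpoints
# cluster-wide, where one endpoint = one Pod backing one Service.
EBPF_ENDPOINT_LIMIT = 260_000

def endpoints_fit(pods_per_service: dict) -> bool:
    """Rough capacity check: total service endpoints vs. the eBPF map limit."""
    total = sum(pods_per_service.values())
    return total <= EBPF_ENDPOINT_LIMIT

# Hypothetical cluster layout: 500 Services averaging 400 backing Pods each.
layout = {f"svc-{i}": 400 for i in range(500)}
print(endpoints_fit(layout))  # 200,000 endpoints -> True, fits under the limit
```
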
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<p>GKE Dataplane V2 is more than just an update; it's a fundamental shift in how Kubernetes networking is handled in GKE. By leveraging Cilium and eBPF, it offers:</p>
<ul>
<li><p>Increased performance and scalability by removing kube-proxy.</p>
</li>
<li><p>Simplified and integrated security with built-in Network Policies.</p>
</li>
<li><p>Better observability with features like policy logging.</p>
</li>
</ul>
<p>It's a powerful and efficient foundation for your modern applications running on GKE.</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes Vertical Pod Autoscaler: A Deep Dive into Right-Sizing Your Applications]]></title><description><![CDATA[Hello All,
Like many of you, I was incredibly excited about the VPA Beta release in Kubernetes 1.33 (which you can read about here). I've since taken the time to understand the current VPA release as of July 2025 in more detail.
Let's dive into VPA i...]]></description><link>https://opsinsights.dev/kubernetes-vertical-pod-autoscaler-a-deep-dive-into-right-sizing-your-applications</link><guid isPermaLink="true">https://opsinsights.dev/kubernetes-vertical-pod-autoscaler-a-deep-dive-into-right-sizing-your-applications</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[VPA]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sun, 29 Jun 2025 13:40:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751204304697/3a82a6ab-de2c-4422-a259-ee3b9b1f76d9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>Hello All,</p>
<p>Like many of you, I was incredibly excited about the <strong>VPA Beta release in Kubernetes 1.33</strong> (which you can read about <a target="_blank" href="https://kubernetes.io/blog/2025/04/23/kubernetes-v1-33-release/#beta-in-place-resource-resize-for-vertical-scaling-of-pods">here</a>). I've since taken the time to understand the current VPA release as of July 2025 in more detail.</p>
<p>Let's dive into VPA in this blog post.</p>
<hr />
<h2 id="heading-the-critical-question-every-devops-engineer-asks">The Critical Question Every DevOps Engineer Asks</h2>
<p>I have critical workloads in production that cannot be scaled horizontally, and the replica count is always set to 1. <strong>Should I use VPA directly to scale them vertically?</strong></p>
<p>The answer is <strong>NO.</strong></p>
<p>Why? Let's first understand VPA in detail, and then I'll share my experience on why you should be cautious.</p>
<hr />
<h2 id="heading-what-is-vertical-pod-autoscaler">What is Vertical Pod Autoscaler?</h2>
<p><strong>Vertical Pod Autoscaler (VPA)</strong> automatically adjusts the CPU and memory reservations for your pods to help "right-size" your applications. Unlike <strong>Horizontal Pod Autoscaler (HPA)</strong>, which scales the number of replicas, VPA scales the resources allocated to existing pods.</p>
<p>Think of it this way: HPA says, "I need more workers," while VPA says, "I need stronger workers."</p>
<hr />
<h2 id="heading-installation">Installation</h2>
<p>Installation is straightforward. Refer to the official guide <a target="_blank" href="https://github.com/kubernetes/autoscaler/tree/9f87b78df0f1d6e142234bb32e8acbd71295585a/vertical-pod-autoscaler">here</a>.</p>
<pre><code class="lang-bash">git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh
</code></pre>
<hr />
<h2 id="heading-vpa-operating-modes">VPA Operating Modes</h2>
<p>VPA can operate in four different modes:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Mode</td><td>Description</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Auto</strong></td><td>Currently, this means <strong>Recreate</strong>. This might change to in-place updates in the future.</td></tr>
<tr>
<td><strong>Recreate</strong></td><td>The VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation.</td></tr>
<tr>
<td><strong>Initial</strong></td><td>The VPA only assigns resource requests on pod creation and never changes them later.</td></tr>
<tr>
<td><strong>Off</strong></td><td>The VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-container-resize-policies">Container Resize Policies</h2>
<p>With Kubernetes 1.33, you can now define how containers should handle resource changes:</p>
<pre><code class="lang-yaml">spec:
  containers:
    - name: my-app
      image: my-app:latest
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired # apply directly to running container
        - resourceName: memory
          restartPolicy: RestartContainer # apply and restart to take effect
</code></pre>
<p>This is a game-changer! Finally, we can control whether a container needs to restart when resources are updated.</p>
<hr />
<h2 id="heading-real-world-example">Real-World Example</h2>
<p>Let me share a practical example. Here's how I set up VPA for a monitoring application:</p>
<pre><code class="lang-yaml">apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: monitoring-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: monitoring-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: monitoring-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
        controlledResources: ["cpu", "memory"]
</code></pre>
<hr />
<h2 id="heading-where-can-vpa-be-used">Where Can VPA Be Used?</h2>
<h3 id="heading-1-analyzing-current-workload-patterns-off-mode">1. Analyzing Current Workload Patterns (Off Mode)</h3>
<p>Start with <strong>Off</strong> mode to understand your application's resource consumption patterns:</p>
<pre><code class="lang-yaml">spec:
  updatePolicy:
    updateMode: "Off"
</code></pre>
<p>This mode is perfect for:</p>
<ul>
<li><p>Understanding resource usage patterns</p>
</li>
<li><p>Getting recommendations without any changes</p>
</li>
<li><p>Planning capacity for new applications</p>
</li>
<li><p>Pairing with Prometheus-compatible metrics for deeper analysis</p>
</li>
</ul>
<hr />
<h2 id="heading-limitations-and-gotchas">Limitations and Gotchas</h2>
<p><strong>Important:</strong> As of Kubernetes 1.33, VPA still has some limitations you need to be aware of:</p>
<h3 id="heading-1-pod-disruption">1. Pod Disruption</h3>
<p>In <strong>Recreate</strong> and <strong>Auto</strong> modes, VPA will terminate and recreate pods. This means downtime for single-replica applications. This is why I said <strong>NO</strong> to using it directly on critical production workloads!</p>
<h3 id="heading-2-compatibility-issues">2. Compatibility Issues</h3>
<p>VPA and HPA cannot target the same metrics (CPU/Memory). You can use them together, but HPA should target custom metrics.</p>
<h3 id="heading-3-resource-limits">3. Resource Limits</h3>
<p>VPA doesn't set resource limits, only requests. You need to set limits manually or use LimitRanges.</p>
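<p>If you manage limits yourself, one simple convention is to preserve each container's original request-to-limit ratio whenever the request changes. A small sketch of that idea — the helper is mine, not part of VPA:</p>

```python
def scaled_limit(original_request: float, original_limit: float,
                 new_request: float) -> float:
    """Derive a new limit that preserves the original request:limit ratio.

    All values must share one unit (e.g. millicores or MiB). This is a
    convention sketch, not behaviour guaranteed by the VPA API.
    """
    ratio = original_limit / original_request
    return new_request * ratio

# A Pod originally requested 200m CPU with a 400m limit (ratio 2.0).
# VPA recommends a 500m request; keep the limit at 2x the request.
print(scaled_limit(200, 400, 500))  # 1000.0
```
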
<hr />
<h2 id="heading-my-production-strategy">My Production Strategy</h2>
<p>Here's how I approach VPA in production environments:</p>
<h3 id="heading-observation-off-mode">Observation (Off Mode)</h3>
<pre><code class="lang-yaml">updateMode: "Off"
</code></pre>
<p>Run it for 2-4 weeks to gather data and understand your workload's patterns, then act on the recommendations.</p>
<hr />
<h2 id="heading-best-practices">Best Practices</h2>
<ol>
<li><p><strong>Always set resource bounds:</strong></p>
<pre><code class="lang-yaml"> minAllowed:
   cpu: 100m
   memory: 128Mi
 maxAllowed:
   cpu: 2
   memory: 4Gi
</code></pre>
</li>
<li><p><strong>Use PodDisruptionBudgets:</strong></p>
<pre><code class="lang-yaml"> apiVersion: policy/v1
 kind: PodDisruptionBudget
 metadata:
   name: my-app-pdb
 spec:
   minAvailable: 1
   selector:
     matchLabels:
       app: my-app
</code></pre>
</li>
<li><p><strong>Monitor VPA recommendations:</strong></p>
<pre><code class="lang-bash"> kubectl describe vpa my-app-vpa
</code></pre>
</li>
</ol>
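<p>If you'd rather consume recommendations programmatically than eyeball <code>kubectl describe</code> output, they live under <code>status.recommendation.containerRecommendations</code> in the VPA object. A minimal sketch, assuming you've already fetched the VPA as JSON (e.g. <code>kubectl get vpa my-app-vpa -o json</code>) and parsed it into a dict:</p>

```python
def recommended_targets(vpa_object: dict) -> dict:
    """Map container name -> recommended 'target' resources from a VPA object.

    Expects the dict form of a VerticalPodAutoscaler, e.g. the parsed
    output of `kubectl get vpa my-app-vpa -o json`.
    """
    recs = (vpa_object.get("status", {})
                      .get("recommendation", {})
                      .get("containerRecommendations", []))
    return {r["containerName"]: r.get("target", {}) for r in recs}

# Trimmed example of what the API returns after running in Off mode.
vpa = {
    "status": {
        "recommendation": {
            "containerRecommendations": [
                {"containerName": "monitoring-app",
                 "target": {"cpu": "250m", "memory": "512Mi"}},
            ]
        }
    }
}
print(recommended_targets(vpa))
```
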
<hr />
<h2 id="heading-the-future-is-bright">The Future is Bright</h2>
<p>With in-place resource updates coming to Kubernetes, VPA will become much more production-ready. The ability to update resources without pod restarts will be a game-changer for critical workloads.</p>
<p>Until then, use VPA wisely:</p>
<ul>
<li><p>Start with <strong>Off</strong> mode for analysis.</p>
</li>
<li><p>Use <strong>Initial</strong> mode for new deployments.</p>
</li>
<li><p>Be cautious with <strong>Auto</strong> mode on critical applications.</p>
</li>
</ul>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>VPA is a powerful tool, but like any powerful tool, it needs to be used with care and understanding. Don't rush into production with <strong>Auto</strong> mode on critical workloads. Take time to understand your application's behavior, set proper bounds, and gradually roll it out.</p>
<p>Remember: <strong>With great power comes great responsibility!</strong></p>
<p>Have you tried VPA in your environment? Share your experiences in the comments below.</p>
<p>Happy scaling! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Building a ChatOps AI Bot with LangChain and LLMs in Slack]]></title><description><![CDATA[Intro
When it comes to managing operations, wouldn't it be great to have a more intuitive, convenient way to execute tasks, delegate work, and empower teams for self-service? That's what ChatOps brings to the table.
It's not just a trend—it's a parad...]]></description><link>https://opsinsights.dev/building-a-chatops-ai-bot-with-langchain-and-llms-in-slack</link><guid isPermaLink="true">https://opsinsights.dev/building-a-chatops-ai-bot-with-langchain-and-llms-in-slack</guid><category><![CDATA[generative ai]]></category><category><![CDATA[slack-bot]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sat, 18 Jan 2025 14:27:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737209744170/5cb138c6-b1f0-4295-adfe-b3e6df285136.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h3 id="heading-intro">Intro</h3>
<p>When it comes to managing operations, wouldn't it be great to have a more intuitive, convenient way to execute tasks, delegate work, and empower teams for self-service? That's what ChatOps brings to the table.</p>
<p>It's not just a trend—it's a paradigm shift in how we interact with our systems. While bots themselves aren't new, we're about to revolutionize how we interact with them, making the experience smarter, more intuitive, and deeply conversational.</p>
<p>Today, I'm going to walk you through how I built a ChatOps bot using <strong>Slack</strong>, <strong>LangChain</strong>, and an <strong>LLM</strong> of my choice, all wrapped in Python.</p>
<h3 id="heading-why-chatops">Why ChatOps?</h3>
<p>ChatOps is all about making operations more accessible and collaborative. By integrating operations directly into Slack (or any chat platform), it becomes:</p>
<ol>
<li><p><strong>Intuitive</strong>: Users interact in natural language rather than remembering complex commands.</p>
</li>
<li><p><strong>Convenient</strong>: Operations happen where your team is already communicating.</p>
</li>
<li><p><strong>Empowering</strong>: Enables <strong>delegation</strong> and <strong>self-service</strong>, minimizing reliance on admins for every minor operation.</p>
</li>
<li><p><strong>Automated</strong>: Offloads repetitive tasks, leaving your team free to focus on higher-level decisions.</p>
</li>
</ol>
<h3 id="heading-chatops-with-llm">ChatOps with LLM</h3>
<p>Traditionally, ChatOps bots like <strong>ErrBot</strong> work by triggering commands—users invoke specific commands directly. For example, typing <code>!status</code> would return the status of a service.</p>
<p>But what if we could make it smarter and more conversational? Instead of explicitly typing exact commands, we can leverage <strong>LLM</strong> to make our bot understand and execute intent, even if the phrasing isn't an exact match. This is where <strong>LangChain</strong> comes in.</p>
<h3 id="heading-langchain">LangChain</h3>
<p>LangChain is a powerful framework that simplifies the integration of <strong>LLMs (large language models)</strong> into real-world applications. It manages components like querying the LLM, parsing responses, and chaining together operations to make it seamless for developers like us. LangChain enables us to easily adapt traditional bots like ErrBot, injecting them with the intelligence of an LLM—while still keeping operations under tight control.</p>
<h3 id="heading-defining-my-chatops-framework">Defining My ChatOps Framework</h3>
<p>For any good operational framework, a few essentials are non-negotiable:</p>
<ol>
<li><p><strong>RBAC (Role-Based Access Control)</strong>: Define <strong>who</strong> can trigger which commands and limit the actions they can perform.</p>
</li>
<li><p><strong>Channel Control</strong>: Specify <strong>where the bot can operate</strong>—not all commands should work in every channel for safety reasons.</p>
</li>
<li><p><strong>Command Rules</strong>: Decide <strong>what tasks</strong> the bot can execute.</p>
</li>
</ol>
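<p>All three guardrails can be expressed as plain data that is checked before any command runs. A minimal sketch — the roles, channels, and commands below are hypothetical placeholders:</p>

```python
# Hypothetical policy table: which roles may run which commands, and where.
POLICY = {
    "status":          {"roles": {"dev", "sre"}, "channels": {"#ops", "#dev"}},
    "approve_release": {"roles": {"sre"},        "channels": {"#ops"}},
}

def is_allowed(command: str, role: str, channel: str) -> bool:
    """RBAC + channel control: both checks must pass for a known command."""
    rule = POLICY.get(command)
    if rule is None:  # command rules: unknown commands never run
        return False
    return role in rule["roles"] and channel in rule["channels"]

print(is_allowed("approve_release", "sre", "#ops"))   # True
print(is_allowed("approve_release", "dev", "#ops"))   # False
```
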
<p>In my case, I’m using <strong>ErrBot</strong> as the underlying framework for simplicity and layering <strong>LangChain</strong> on top of it to add conversational intelligence.</p>
<h3 id="heading-features-my-bot-supports">Features My Bot Supports</h3>
<p>Before introducing LangChain to the mix, my bot already supported the following commands:</p>
<ul>
<li><p><strong>help</strong>: Lists all the available commands.</p>
</li>
<li><p><strong>status</strong>: Returns the current status of a service or environment.</p>
</li>
<li><p><strong>approve_release</strong>: fetches the pending releases and prompts for approval</p>
</li>
</ul>
<p>These commands work perfectly, but they lack the flexibility of natural language interactions. For example:</p>
<ul>
<li>Instead of typing <code>!status</code>, wouldn’t it be nice to type “What’s the system status?” and have the bot figure out the intent?</li>
</ul>
<h3 id="heading-architecture-the-big-idea">Architecture: The Big Idea</h3>
<p>Here’s how I designed the bot after integrating LangChain:</p>
<ol>
<li><p><strong>Intent Classifier</strong><br /> The heart of the system! It takes the user’s prompt, runs it through an LLM (via LangChain), and extracts the intent. For example:</p>
<ul>
<li><p>Input: “Can you check on the system?”</p>
</li>
<li><p>Result: Intent is classified as <code>status</code>.</p>
</li>
</ul>
</li>
</ol>
<p>    The intent classifier strips down the natural language input into actionable commands for ErrBot. This ensures that only valid intents trigger actions, reducing the risk of LLM hallucinations or misfires. It's contextual and scoped, making it both powerful and safe.</p>
<ol start="2">
<li><p><strong>LLM Integration with JSON Responses</strong><br /> The LangChain model responds with structured JSON, making it easy to decode:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"intent"</span>: <span class="hljs-string">"status"</span>,
     <span class="hljs-attr">"parameters"</span>: {}
 }
</code></pre>
<p> By relying on this structured response, we can map intents to specific bot commands confidently.</p>
</li>
<li><p><strong>Mapping Intent to Commands</strong><br /> Once the intent is identified, the bot triggers the appropriate command handler. It’s a simple and modular pipeline that feels scalable and robust: - User types natural language. - LLM determines intent. - The bot executes the mapped command.</p>
</li>
</ol>
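<p>The final mapping step can be as simple as a dispatch table, which is also what keeps the bot deterministic: anything outside the table is refused. A stripped-down sketch (the handlers are illustrative stand-ins for real ErrBot commands):</p>

```python
from typing import Callable, Dict, Optional

def handle_status(params: dict) -> str:
    return "All services healthy."  # stand-in for a real status check

def handle_approve_release(params: dict) -> str:
    return f"Pending releases fetched for {params.get('env', 'prod')}."

# Only intents listed here can ever be executed.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "status": handle_status,
    "approve_release": handle_approve_release,
}

def dispatch(intent_data: Optional[dict], min_confidence: float = 0.6) -> str:
    """Route a classifier result ({intent, confidence, extracted_params})
    to its command handler, refusing anything below the confidence bar."""
    if not intent_data or intent_data.get("confidence", 0) < min_confidence:
        return "Sorry, I couldn't match that to a command."
    handler = HANDLERS.get(intent_data.get("intent"))
    if handler is None:
        return "Sorry, I couldn't match that to a command."
    return handler(intent_data.get("extracted_params") or {})

print(dispatch({"intent": "status", "confidence": 0.9}))  # All services healthy.
```
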
<h3 id="heading-why-this-approach-works">Why This Approach Works</h3>
<p>This architecture minimizes <strong>AI hallucination</strong> by restricting the context in which the LLM operates. The intent classifier ensures the bot doesn’t go rogue—it only triggers predefined, purposeful commands. This guards your ChatOps setup and makes AI-powered automation more predictable.</p>
<h3 id="heading-code-example">Code Example</h3>
<p>Here’s a simplified Python example showing how this would work using LangChain and Slack as the frontend:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional, Dict, Any
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain.schema.runnable <span class="hljs-keyword">import</span> RunnablePassthrough
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> logging

logger = logging.getLogger(__name__)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">IntentClassifier</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, api_key: str, api_base: str, config: dict</span>):</span>
        <span class="hljs-string">"""Initialize the intent classifier with LangChain"""</span>
        self.llm = OpenAI(
            model=<span class="hljs-string">"claude-3-5-sonnet-latest"</span>,
            temperature=<span class="hljs-number">0.3</span>,  <span class="hljs-comment"># Lower temperature for more consistent results</span>
            openai_api_key=api_key,
            openai_api_base=api_base
        )

        <span class="hljs-comment"># Get commands from config</span>
        self.commands = config.get(<span class="hljs-string">"bot"</span>, {}).get(<span class="hljs-string">"commands"</span>, {})

        <span class="hljs-comment"># Build intent descriptions from commands</span>
        intent_descriptions = []
        <span class="hljs-keyword">for</span> cmd_name, cmd_config <span class="hljs-keyword">in</span> self.commands.items():
            <span class="hljs-keyword">if</span> cmd_config.get(<span class="hljs-string">"enabled"</span>, <span class="hljs-literal">True</span>):
                description = cmd_config.get(<span class="hljs-string">"description"</span>, <span class="hljs-string">""</span>)
                aliases = cmd_config.get(<span class="hljs-string">"aliases"</span>, [])
                alias_text = <span class="hljs-string">f" (aliases: <span class="hljs-subst">{<span class="hljs-string">', '</span>.join(aliases)}</span>)"</span> <span class="hljs-keyword">if</span> aliases <span class="hljs-keyword">else</span> <span class="hljs-string">""</span>
                intent_descriptions.append(<span class="hljs-string">f"- <span class="hljs-subst">{cmd_name}</span>: <span class="hljs-subst">{description}</span><span class="hljs-subst">{alias_text}</span>"</span>)

        logger.debug(<span class="hljs-string">f"Loaded <span class="hljs-subst">{len(intent_descriptions)}</span> commands for intent matching"</span>)

        <span class="hljs-comment"># Define the classification prompt template</span>
        template = <span class="hljs-string">"""
        You are a Slack bot designed to classify user messages into predefined intents. 
        Your task is to analyze the user message and match it to the closest predefined intent or respond if no match is found.

        Predefined intents include:
        {commands}

        User message: {message}

        Return a JSON object with:
        - intent: The matched intent name or null if no match
        - confidence: Confidence score between 0 and 1
        - extracted_params: Any parameters extracted from the message (optional)        

        Important rules:
        1. Only return intents from the predefined list above
        2. Consider command aliases when matching intents
        3. Extract any relevant parameters mentioned in the message
        4. Return null if no intent matches with high confidence (below 0.6)
        5. Ensure the response is valid JSON with no additional text
        6. Be generous with confidence scores when the intent is clear
        """</span>

        self.prompt = PromptTemplate(
            input_variables=[<span class="hljs-string">"message"</span>, <span class="hljs-string">"commands"</span>],
            template=template
        )

        self.commands_str = <span class="hljs-string">"\n"</span>.join(intent_descriptions)

        <span class="hljs-comment"># Create a runnable chain</span>
        self.chain = (
            {<span class="hljs-string">"message"</span>: RunnablePassthrough(), <span class="hljs-string">"commands"</span>: <span class="hljs-keyword">lambda</span> _: self.commands_str}
            | self.prompt
            | self.llm
        )

        logger.info(<span class="hljs-string">"Intent classifier initialized successfully"</span>)

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">classify_message</span>(<span class="hljs-params">self, message: str</span>) -&gt; Optional[Dict[str, Any]]:</span>
        <span class="hljs-string">"""Classify a message and return the intent details"""</span>
        <span class="hljs-keyword">try</span>:
            logger.debug(<span class="hljs-string">f"Classifying message: <span class="hljs-subst">{message}</span>"</span>)

            <span class="hljs-comment"># Run the classification chain</span>
            result = <span class="hljs-keyword">await</span> self.chain.ainvoke(message)
            logger.debug(<span class="hljs-string">f"Raw classification result: <span class="hljs-subst">{result}</span>"</span>)

            <span class="hljs-comment"># Parse the JSON response</span>
            intent_data = json.loads(result)

            <span class="hljs-comment"># Validate the response format</span>
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> isinstance(intent_data, dict):
                logger.error(<span class="hljs-string">f"Invalid classification response format: <span class="hljs-subst">{result}</span>"</span>)
                <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

            required_fields = [<span class="hljs-string">"intent"</span>, <span class="hljs-string">"confidence"</span>]
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> all(field <span class="hljs-keyword">in</span> intent_data <span class="hljs-keyword">for</span> field <span class="hljs-keyword">in</span> required_fields):
                logger.error(<span class="hljs-string">f"Missing required fields in classification: <span class="hljs-subst">{result}</span>"</span>)
                <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

            <span class="hljs-comment"># Log the classification result</span>
            intent = intent_data.get(<span class="hljs-string">"intent"</span>)
            confidence = intent_data.get(<span class="hljs-string">"confidence"</span>, <span class="hljs-number">0</span>)
            params = intent_data.get(<span class="hljs-string">"extracted_params"</span>, {})
            logger.info(
                <span class="hljs-string">f"Classification result - Intent: <span class="hljs-subst">{intent}</span>, "</span>
                <span class="hljs-string">f"Confidence: <span class="hljs-subst">{confidence}</span>, Params: <span class="hljs-subst">{params}</span>"</span>
            )

            <span class="hljs-keyword">return</span> intent_data

        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            logger.error(<span class="hljs-string">f"Error classifying message: <span class="hljs-subst">{str(e)}</span>"</span>)
            <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_validate_confidence</span>(<span class="hljs-params">self, confidence: float</span>) -&gt; bool:</span>
        <span class="hljs-string">"""Validate confidence score is between 0 and 1"""</span>
        <span class="hljs-keyword">return</span> isinstance(confidence, (int, float)) <span class="hljs-keyword">and</span> <span class="hljs-number">0</span> &lt;= confidence &lt;= <span class="hljs-number">1</span>
</code></pre>
<p>In this example:</p>
<ol>
<li><p>The LLM determines intent based on my input.</p>
</li>
<li><p>Based on the intent (<code>status</code>, <code>echo</code>), the bot routes the input to the relevant handler.</p>
</li>
<li><p>Only predefined commands are triggered, keeping the system secure and deterministic.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737095279497/747cef76-ee05-4157-8614-c52886245c59.png" alt class="image--center mx-auto" /></p>
<p>Thank you !!!</p>
]]></content:encoded></item><item><title><![CDATA[Simplifying Access Entries in EKS: A Guide]]></title><description><![CDATA[EKS has introduced a new set of controls (Apologies I am late to explain this :p) for authentication and authorization, effectively integrating IAM principles with Kubernetes RBAC—a seamless and robust integration.
Accessing the cluster involves thre...]]></description><link>https://opsinsights.dev/simplifying-access-entries-in-eks-a-guide</link><guid isPermaLink="true">https://opsinsights.dev/simplifying-access-entries-in-eks-a-guide</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Thu, 02 May 2024 15:23:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714663324689/db3e371a-acb5-4d35-a8b8-b55916edf081.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>EKS has introduced a new set of controls (Apologies I am late to explain this :p) for authentication and authorization, effectively integrating IAM principles with Kubernetes RBAC—a seamless and robust integration.</p>
<p>Accessing the cluster involves three types:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Authentication mode</th><th>Where access is managed</th></tr>
</thead>
<tbody>
<tr>
<td><code>ConfigMap</code> only (<code>CONFIG_MAP</code>)</td><td><code>aws-auth</code> <code>ConfigMap</code> (legacy)</td></tr>
<tr>
<td>EKS API and <code>ConfigMap</code> (<code>API_AND_CONFIG_MAP</code>)</td><td>access entries in the EKS API, AWS Command Line Interface, AWS SDKs, AWS CloudFormation, and AWS Management Console, plus the <code>aws-auth</code> <code>ConfigMap</code></td></tr>
<tr>
<td>EKS API only (<code>API</code>)</td><td>access entries in the EKS API, AWS Command Line Interface, AWS SDKs, AWS CloudFormation, and AWS Management Console</td></tr>
</tbody>
</table>
</div><p>Let us first understand how this worked before, using the <code>aws-auth</code> <code>ConfigMap</code> (the <code>CONFIG_MAP</code> mode).</p>
<h3 id="heading-the-aws-auth-configmap-deprecated">The <code>aws-auth</code> ConfigMap <em>(deprecated)</em></h3>
<p>This process involves mapping AWS IAM identities, including users, groups, and roles, to Kubernetes role-based access control (RBAC) for authorization.</p>
<p><strong>Challenges and pain points:</strong></p>
<p>Ideally, this configuration should be managed internally within the cluster. After provisioning the cluster, it is necessary to establish a configuration that facilitates the relationship between IAM and the Kubernetes system.</p>
<p>Although eksctl can be used for this setup, it's important to note that not all clusters are created with eksctl. Thus, alternative methods need to be available for those who do not use this tool.</p>
<h3 id="heading-why-i-love-eks-access-entries-simplicity">Why I love EKS Access Entries? Simplicity!</h3>
<p>You do not need to learn anything new; you simply map existing IAM principals to Kubernetes permissions.</p>
<p>IAM principals can be mapped to any of the predefined EKS access policies. Below is the mapping between EKS access policies and Kubernetes' default RBAC roles:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html#access-policy-permissions-AmazonEKSClusterAdminPolicy">AmazonEKSClusterAdminPolicy</a>: <strong>cluster-admin in kubernetes</strong></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html#access-policy-permissions-AmazonEKSAdminPolicy">AmazonEKSAdminPolicy</a>: <strong>admin</strong></p>
</li>
<li><p>AmazonEKSAdminViewPolicy: (<code>get</code>, <code>list</code>, <code>watch</code> across all APIs and resources)</p>
</li>
<li><p>AmazonEKSEditPolicy: <strong>edit</strong></p>
</li>
<li><p>AmazonEKSViewPolicy: <strong>view</strong></p>
</li>
</ul>
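<p>To see the full, current list of access policies and their ARNs, the EKS API exposes them directly (a standard AWS CLI command; it requires EKS read permissions):</p>

```shell
# Lists managed access policies such as AmazonEKSViewPolicy
# together with their ARNs (arn:aws:eks::aws:cluster-access-policy/...)
aws eks list-access-policies
```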
<p>Excited?</p>
<h3 id="heading-how-to-enable-the-access-entries-api">How to enable the access entries API?</h3>
<p>eksctl:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">accessConfig:</span> 
    <span class="hljs-attr">authenticationMode:</span> <span class="hljs-string">API_AND_CONFIG_MAP</span> <span class="hljs-comment"># or API, or CONFIG_MAP</span>
</code></pre>
<p>Terraform:</p>
<pre><code class="lang-hcl">authentication_mode = "API_AND_CONFIG_MAP"
</code></pre>
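<p>In context, that Terraform flag sits inside the <code>access_config</code> block of the <code>aws_eks_cluster</code> resource. The sketch below assumes AWS provider v5.33 or later; the cluster name, role, and subnets are placeholders:</p>

```hcl
resource "aws_eks_cluster" "demo" {
  name     = "eks-demo-cluster"
  role_arn = aws_iam_role.cluster.arn # placeholder cluster role

  access_config {
    # Valid values: API, API_AND_CONFIG_MAP, CONFIG_MAP
    authentication_mode = "API_AND_CONFIG_MAP"
  }

  vpc_config {
    subnet_ids = var.subnet_ids # placeholder subnet list
  }
}
```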
<p>Cluster administrators now have the capability to grant AWS IAM principals access to Amazon EKS clusters and Kubernetes objects across all supported versions (version 1.23 and later).</p>
<p>Will the ConfigMap method be removed soon? The <code>aws-auth</code> <code>ConfigMap</code> is already marked as deprecated, so plan to move to access entries.</p>
<p>Access Entries:</p>
<p><img src="https://d2908q01vomqb2.cloudfront.net/fe2ef495a1152561572949784c16bf23abb28057/2023/11/16/Workflow.png" alt /></p>
<p><em>Image ref:</em> <a target="_blank" href="https://aws.amazon.com/blogs/containers/a-deep-dive-into-simplified-amazon-eks-access-management-controls/"><em>https://aws.amazon.com/blogs/containers/a-deep-dive-into-simplified-amazon-eks-access-management-controls/</em></a></p>
<p>If you are already using the <code>aws-auth</code> ConfigMap, <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/migrating-access-entries.html">here</a> is the migration guide for an easier access-management life.</p>
<p>Here is an example of Terraform configuration for associating your IAM users and roles with the respective EKS access:</p>
<p>Step 1 - create access entry</p>
<pre><code class="lang-hcl"><span class="hljs-string">resource</span> <span class="hljs-string">"aws_eks_access_entry"</span> <span class="hljs-string">"readonly"</span> {
  <span class="hljs-string">cluster_name</span>      <span class="hljs-string">=</span> <span class="hljs-string">"eks-demo-cluster"</span>
  <span class="hljs-string">principal_arn</span>     <span class="hljs-string">=</span> <span class="hljs-string">aws_iam_role.example.arn</span> <span class="hljs-comment"># IAM principal (role or user) ARN</span>
  <span class="hljs-string">kubernetes_groups</span> <span class="hljs-string">=</span> []
  <span class="hljs-string">type</span>              <span class="hljs-string">=</span> <span class="hljs-string">"STANDARD"</span>
}
</code></pre>
<p>Step 2 - associate policy to it.</p>
<pre><code class="lang-hcl"><span class="hljs-string">resource</span> <span class="hljs-string">"aws_eks_access_policy_association"</span> <span class="hljs-string">"readonly"</span> {
  <span class="hljs-string">cluster_name</span>  <span class="hljs-string">=</span> <span class="hljs-string">"eks-demo-cluster"</span>
  <span class="hljs-string">policy_arn</span>    <span class="hljs-string">=</span> <span class="hljs-string">"arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"</span>
  <span class="hljs-string">principal_arn</span> <span class="hljs-string">=</span> <span class="hljs-string">aws_eks_access_entry.readonly.principal_arn</span> <span class="hljs-comment"># same principal as the access entry</span>

  <span class="hljs-string">access_scope</span> {
    <span class="hljs-string">type</span>       <span class="hljs-string">=</span> <span class="hljs-string">"namespace"</span>
    <span class="hljs-string">namespaces</span> <span class="hljs-string">=</span> [<span class="hljs-string">"example-namespace"</span>]
  }
}
</code></pre>
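<p>After <code>terraform apply</code>, you can verify the entry and its policy association with the AWS CLI (standard <code>aws eks</code> commands; the account ID below is a placeholder):</p>

```shell
aws eks list-access-entries --cluster-name eks-demo-cluster
aws eks list-associated-access-policies \
  --cluster-name eks-demo-cluster \
  --principal-arn arn:aws:iam::111122223333:user/example
```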
<hr />
<p>Thank you!</p>
<p>If you have any questions, feel free to reach out to me on LinkedIn, and let's discuss further.</p>
<p>Detailed References:</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/grant-k8s-access.html">https://docs.aws.amazon.com/eks/latest/userguide/grant-k8s-access.html</a></p>
<p><a target="_blank" href="https://aws.github.io/aws-eks-best-practices/security/docs/iam/#the-aws-auth-configmap-deprecated">https://aws.github.io/aws-eks-best-practices/security/docs/iam/#the-aws-auth-configmap-deprecated</a></p>
]]></content:encoded></item><item><title><![CDATA[Exploring the Newest S3 Bucket Innovations from AWS re:Invent 2023]]></title><description><![CDATA[Amazon S3 is often the starting point for anyone embarking on their AWS cloud journey, and it was my initial experience with AWS as well.
Since its inception in 2006, S3 has undergone significant evolution. From offering a range of bucket classificat...]]></description><link>https://opsinsights.dev/exploring-the-newest-s3-bucket</link><guid isPermaLink="true">https://opsinsights.dev/exploring-the-newest-s3-bucket</guid><category><![CDATA[s3-directory]]></category><category><![CDATA[AWS]]></category><category><![CDATA[S3]]></category><category><![CDATA[reInvent]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sat, 27 Jan 2024 17:34:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706376502082/cf86013c-a1f8-4a65-b0d7-09ca49e2ef22.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>A</strong>mazon S3 is often the starting point for anyone embarking on their AWS cloud journey, and it was my initial experience with AWS as well.</p>
<p>Since its inception in 2006, S3 has undergone significant evolution. From offering a range of storage classes like Standard and Glacier to adding zonal replication options and versatile access policies, the advancements have been noteworthy.</p>
<p>One of the more recent additions is the concept of mount points for S3, which further enhances its functionality. </p>
<blockquote>
<p>You can learn more about this development here: <a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/">[Mountpoint for Amazon S3]</a></p>
</blockquote>
<p>Another exciting feature recently introduced is <strong><em>"Directory Buckets."</em></strong> This addition to the S3 lineup offers even more flexibility and options for managing cloud storage efficiently.</p>
<h2 id="heading-what-is-a-directory-bucket">What is a Directory Bucket?</h2>
<p>S3 has now been categorized into two main types:</p>
<p>1. General Purpose Buckets</p>
<p>2. Directory Buckets</p>
<p>Let's explore what Directory Buckets are all about.</p>
<h3 id="heading-key-advantages">Key advantages:</h3>
<p>This advanced storage class stands out with three key features:</p>
<ul>
<li><p>Single-digit millisecond first-byte latency for compute-intensive and latency-sensitive applications</p></li>
<li><p>Consistent performance that eliminates tail latencies, driving down query times</p></li>
<li><p>Data access speeds up to 10x faster, and request costs up to 50% lower, than S3 Standard</p></li>
</ul>
<p><strong>Points to consider:</strong></p>
<ul>
<li><p>These buckets follow a strict naming convention</p>
<ul>
<li><strong><mark>Base-name</mark></strong>--azid--x-s3</li>
</ul>
</li>
</ul>
<p>More details here: <a target="_blank" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-bucket-naming-rules.html">https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-bucket-naming-rules.html</a></p>
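<p>As a quick illustration of the rule, a name is valid only if it embeds the Availability Zone ID and ends with the <code>--x-s3</code> suffix. Here is a rough check I sketched; it is not an official validator, and the regex simplifies the base-name rules (see the naming-rules page above for the full constraints):</p>

```python
import re

# Directory bucket names look like: <base-name>--<zone-id>--x-s3,
# e.g. my-data--use1-az5--x-s3 (base-name rules simplified here)
PATTERN = re.compile(r"^[a-z0-9][a-z0-9-]{1,43}--[a-z0-9-]+--x-s3$")

def is_directory_bucket_name(name: str) -> bool:
    """Return True if name roughly matches the directory-bucket format."""
    return bool(PATTERN.match(name))

print(is_directory_bucket_name("my-data--use1-az5--x-s3"))  # True
print(is_directory_bucket_name("my-data"))                  # False
```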
<p><img src="https://lh7-eu.googleusercontent.com/JrJFEkmpDiJ3ET22-S87Qi4_Pd_aHCgSNCDOHGc_avZ8OsJWWAIfAEiljVpO7boGkoKQ7dkpzZUekxQEROCzMPuZbH-fD407jeB66AEwj5oJRigz-rAiVwyBpDvd6NqgVP_JZ-upysXC4Gqu5xHAjxE" alt /></p>
<p><strong>Points to Consider during design:</strong></p>
<ul>
<li><p>Directory buckets use single-AZ (One Zone) storage, so an Availability Zone outage can mean data unavailability or loss</p>
</li>
<li><p>Directory buckets store data across multiple devices within a single Availability Zone</p>
</li>
</ul>
<p>Thank you!!!</p>
]]></content:encoded></item><item><title><![CDATA[Elevate Your Kubernetes Skills: Key Insights for Acing CKA and CKS]]></title><description><![CDATA[Achieving certification in the Certified Kubernetes Administrator (CKA) and Certified Kubernetes Security Specialist (CKS).

We're all familiar with the syllabus and structure available on the official pages. In this post, I'll be sharing a fast-trac...]]></description><link>https://opsinsights.dev/key-insights-for-acing-cka-and-cks</link><guid isPermaLink="true">https://opsinsights.dev/key-insights-for-acing-cka-and-cks</guid><category><![CDATA[cka]]></category><category><![CDATA[CKS]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Mon, 22 Jan 2024 19:40:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705952362024/9e47f937-98f8-4328-b7cd-b5136a765e55.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Achieving certification in the Certified Kubernetes Administrator (CKA) and Certified Kubernetes Security Specialist (CKS).</p>
<ul>
<li>We're all familiar with the syllabus and structure available on the official pages. In this post, I'll be sharing a fast-track method to attain the certification, assuming you have around two years of experience in Kubernetes.</li>
</ul>
<h3 id="heading-maximizing-efficiency-in-cks-exam-preparation-emphasis-on-practice-and-speed"><strong>Maximizing Efficiency in CKS Exam Preparation: Emphasis on Practice and Speed</strong></h3>
<p>When preparing for the Certified Kubernetes Security (CKS) or Certified Kubernetes Administrator (CKA) exam, two key elements play a crucial role in your success: practice and speed.</p>
<p>It's essential to familiarize yourself with imperative commands, which can significantly streamline your workflow. Keeping a handy list of these commands can be a game-changer. Here are a few critical commands, among others, to remember:</p>
<ol>
<li><p><strong>Creating Secrets Quickly</strong>: Instead of dealing with the complexities of YAML definitions and base64 encoding, use the <code>create secret</code> command for swift creation. For example:</p>
<pre><code class="lang-yaml"> <span class="hljs-string">kubectl</span> <span class="hljs-string">create</span> <span class="hljs-string">secret</span> <span class="hljs-string">generic</span> <span class="hljs-string">app-creds</span> <span class="hljs-string">--from-literal=username=admin</span> <span class="hljs-string">--from-literal=password=admin</span>
</code></pre>
<p> This command allows you to swiftly set up secrets, a fundamental aspect of Kubernetes security.</p>
</li>
<li><p><strong>Creating a Service Account</strong>: Simplify the creation of service accounts with this command:</p>
<pre><code class="lang-yaml"> <span class="hljs-string">kubectl</span> <span class="hljs-string">create</span> <span class="hljs-string">serviceaccount</span> <span class="hljs-string">readonly-sa</span> <span class="hljs-string">--namespace=dev</span>
</code></pre>
<p> This is an efficient way to manage access within different namespaces.</p>
</li>
<li><p><strong>Defining Roles</strong>: Establish roles easily with:</p>
<pre><code class="lang-yaml"> <span class="hljs-string">kubectl</span> <span class="hljs-string">create</span> <span class="hljs-string">role</span> <span class="hljs-string">dev-role</span> <span class="hljs-string">--namespace=dev</span> <span class="hljs-string">--verb=get</span> <span class="hljs-string">--resource=pods</span>
</code></pre>
<p> This command helps in setting up roles that define what actions are permitted on specific resources.</p>
</li>
<li><p><strong>Setting Up Role Bindings</strong>: To link roles to service accounts, use:</p>
<pre><code class="lang-plaintext"> kubectl create rolebinding dev-rolebinding --namespace=dev --role=dev-role --serviceaccount=dev:readonly-sa
</code></pre>
</li>
</ol>
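<p>One more speed tip that pairs with the commands above: nearly every imperative command accepts <code>--dry-run=client -o yaml</code>, which emits a manifest skeleton you can edit instead of writing YAML from scratch:</p>

```shell
# Generate manifests without creating the objects
kubectl create deployment web --image=nginx --replicas=3 \
  --dry-run=client -o yaml > deployment.yaml
kubectl run tmp --image=busybox --dry-run=client -o yaml > pod.yaml
```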
<p>Other keys to keep handy,</p>
<ul>
<li><p>Creation of Persistent Volumes (PV)</p>
</li>
<li><p>Creation of Persistent Volume Claims (PVC)</p>
</li>
<li><p>Creation and management of Pods</p>
</li>
</ul>
<h3 id="heading-effective-documentation-utilization">Effective Documentation Utilization:</h3>
<p>For instance, consider using the official Kubernetes documentation at <a target="_blank" href="https://kubernetes.io/docs/home/">https://kubernetes.io/docs/home/</a>.</p>
<p>It's crucial to develop proficiency in locating specific information, such as steps for upgrading a cluster.</p>
<p>When searching for "upgrade," ensure that the sources you refer to are from the official Kubernetes documentation (URLs containing '<a target="_blank" href="http://kubernetes.io">kubernetes.io</a>'), rather than relying on discussions or forum pages. This approach guarantees accurate and authoritative information.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705951543160/82128447-4a9b-452d-91c1-a19faa09fd58.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-study-materials-and-resources-utilized-for-exam-preparation">Study Materials and Resources Utilized for Exam Preparation</h3>
<p><strong><em>CKA:</em></strong></p>
<ul>
<li><p>Kodekloud</p>
</li>
<li><p>practices test from <a target="_blank" href="http://killer.sh/">http://killer.sh/</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/kodekloudhub/certified-kubernetes-administrator-course/tree/master/docs">https://github.com/kodekloudhub/certified-kubernetes-administrator-course/tree/master/docs</a></p>
</li>
</ul>
<p><strong><em>CKS:</em></strong></p>
<ul>
<li><p>practices test from <a target="_blank" href="http://killer.sh/">http://killer.sh/</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/kodekloudhub/certified-kubernetes-security-specialist-cks-course/tree/main/docs">https://github.com/kodekloudhub/certified-kubernetes-security-specialist-cks-course/tree/main/docs</a></p>
</li>
<li><p><a target="_blank" href="https://kodekloud.com/courses/certified-kubernetes-security-specialist-cks/">https://kodekloud.com/courses/certified-kubernetes-security-specialist-cks/</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/ramanagali/Interview_Guide/blob/main/CKS_Preparation_Guide.md">https://github.com/ramanagali/Interview_Guide/blob/main/CKS_Preparation_Guide.md</a></p>
</li>
</ul>
<p>Good Luck!</p>
]]></content:encoded></item><item><title><![CDATA[My Journey to the HashiCorp Certified Terraform Associate Exam]]></title><description><![CDATA[Hashicorp Certified Terraform Associate - HCTA
This isn't just another typical blog post rehashing the HCTA exam details. We won't be covering the syllabus or exam structure that you can easily find on Hashicorp's official website.
In this post, I'll...]]></description><link>https://opsinsights.dev/my-journey-to-the-hashicorp-certified-terraform-associate-exam</link><guid isPermaLink="true">https://opsinsights.dev/my-journey-to-the-hashicorp-certified-terraform-associate-exam</guid><category><![CDATA[terraform certification]]></category><category><![CDATA[Certification]]></category><category><![CDATA[Terraform]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sun, 12 Nov 2023 13:30:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1699795527951/50f210e2-8c66-4656-8536-6a507a37a46d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-hashicorp-certified-terraform-associate-hcta">Hashicorp Certified Terraform Associate - HCTA</h3>
<p>This isn't just another typical blog post rehashing the HCTA exam details. We won't be covering the syllabus or exam structure that you can easily find on Hashicorp's official website.</p>
<p><code>In this post, I'll be detailing my journey to successfully passing the HCTA, focusing on the strategies and preparations I undertook.</code></p>
<p><strong>Prerequisites:</strong></p>
<blockquote>
<p>The key is to gain as much practical experience as possible.</p>
</blockquote>
<p>The more challenges you encounter and overcome in your hands-on practice, the smoother your exam experience will be.</p>
<p>Although the exam consists of multiple-choice questions, some of them are designed to test your practical knowledge to such an extent that you're likely to answer correctly only if you have real-world experience with the scenarios presented.</p>
<h3 id="heading-key-preparation-strategies-and-challenges"><strong>Key Preparation Strategies and Challenges</strong></h3>
<blockquote>
<p>Understanding the nuances of Terraform commands and their differences is crucial.</p>
<ul>
<li><p>terraform state (all sub commands)</p>
</li>
<li><p>terraform show</p>
</li>
<li><p>terraform output</p>
</li>
</ul>
</blockquote>
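<p>A quick reference for those subcommands (standard Terraform CLI; the resource address and file name are placeholders — run these against a scratch project, not production state):</p>

```shell
terraform state list                      # addresses tracked in state
terraform state show aws_s3_bucket.demo   # attributes of one resource
terraform state mv SRC DEST               # move/rename an address
terraform state pull                      # stream remote state to stdout
terraform state push local.tfstate        # upload a local state file
terraform show                            # readable view of state/plan
terraform output                          # root module output values
```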
<ul>
<li><p>Some of the tricky questions include - <code>terraform push or terraform state push</code></p>
</li>
<li><p>After moving the state file to another backend, what should you do?</p>
</li>
</ul>
<p><code>terraform init or terraform state push</code></p>
<ul>
<li>These questions cannot be answered from theory and white papers alone; you need hands-on experience.</li>
</ul>
<p><code>Terraform count vs for_each?</code></p>
<ul>
<li>Remote backends and available remote backends</li>
</ul>
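<p>For the <code>count</code> vs <code>for_each</code> question, a side-by-side sketch makes the difference memorable (illustrative resources: <code>count</code> addresses instances by index, <code>for_each</code> by key):</p>

```hcl
variable "user_names" {
  type    = list(string)
  default = ["alice", "bob"]
}

# count: removing "alice" shifts every later index, recreating resources
resource "aws_iam_user" "by_count" {
  count = length(var.user_names)
  name  = var.user_names[count.index]
}

# for_each: each instance is keyed by name, so the set can change safely
resource "aws_iam_user" "by_key" {
  for_each = toset(var.user_names)
  name     = each.value
}
```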
<h3 id="heading-how-to-prepare-effectively"><strong>How to Prepare Effectively?</strong></h3>
<ol>
<li><p><strong>Enroll in a Comprehensive HCTA Exam Preparation Course:</strong> This will provide a structured learning path.</p>
</li>
<li><p><strong>Undertake a Small Project Covering All Terraform Use Cases:</strong> It's vital not just to know but to understand the use of all Terraform commands thoroughly.</p>
</li>
<li><p><strong>Mock Exams:</strong> When you feel ready, test your knowledge with mock exams. There are numerous resources available for this.</p>
</li>
</ol>
<blockquote>
<p>Please remember: Theory + Hands-on + Mock exams = HCTA Certified</p>
</blockquote>
<p><em>Good Luck!</em></p>
<p><code>References:</code></p>
<p><em>Course I used for Preparation:</em> <a target="_blank" href="https://www.udemy.com/course/terraform-beginner-to-advanced/">https://www.udemy.com/course/terraform-beginner-to-advanced/</a></p>
<p><em>Exam practice:</em> <a target="_blank" href="https://www.udemy.com/course/terraform-associate-practice-exam">https://www.udemy.com/course/terraform-associate-practice-exam</a></p>
]]></content:encoded></item><item><title><![CDATA[Scaling Stateful Applications in Kubernetes: EKS + EFS.]]></title><description><![CDATA[Motivation:
Kubernetes is widely recognized for its ability to manage containerized applications at scale. However, when it comes to managing stateful applications, certain considerations must be addressed. This article explores the challenges faced ...]]></description><link>https://opsinsights.dev/scaling-stateful-applications-in-kubernetes-eks-efs</link><guid isPermaLink="true">https://opsinsights.dev/scaling-stateful-applications-in-kubernetes-eks-efs</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[EKS]]></category><category><![CDATA[Amazon EFS]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Tue, 30 May 2023 11:31:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1685446113454/78b233d2-0aac-404c-8bac-4b5c0acabb52.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-motivation">Motivation:</h3>
<p><strong>Kubernetes</strong> is widely recognized for its ability to manage containerized applications at scale. However, when it comes to managing stateful applications, certain considerations must be addressed. This article explores the challenges faced in scaling stateful applications and presents solutions for seamless scalability.</p>
<p><em>Let's</em> consider a scenario where we have an application consisting of two microservices that communicate and require a shared volume to support specific processing operations.</p>
<p>Additionally, the data generated by these services needs to be retained for further analysis.</p>
<h3 id="heading-requirement">Requirement:</h3>
<p>To ensure scalability among worker nodes, it is necessary to implement a solution that meets the following criteria:</p>
<ul>
<li>The pods must have access to a shared persistent volume.</li>
</ul>
<h2 id="heading-create-efs">Create EFS</h2>
<p>We are not going into the details of creating EFS. Assuming we have an existing EFS already.</p>
<p><code>My EFS file system ID: fs-582a0sdat</code></p>
<p>Make sure to create an access point. Access points are application-specific entry points into an EFS file system; they control the root directory (and optionally the POSIX identity) used for access.</p>
<p>My EFS access point path: <code>/tmp/poc-efs</code></p>
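<p>If the access point does not exist yet, it can be created with the AWS CLI (the file system ID and path reuse the example values above; the POSIX user and permissions are illustrative):</p>

```shell
aws efs create-access-point \
  --file-system-id fs-582a0sdat \
  --posix-user Uid=1000,Gid=1000 \
  --root-directory 'Path=/tmp/poc-efs,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=755}'
```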
<h2 id="heading-deploy-efs-driver-in-the-cluster">Deploy EFS driver in the cluster</h2>
<p>Checkout the official document from AWS for EFS driver:</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html">https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html</a></p>
<h3 id="heading-create-storageclass">Create Storageclass</h3>
<p>This defines a StorageClass, which tells Kubernetes which provisioner to use when creating volumes. There are many provisioners: EBS, GCE PD, EFS, etc.</p>
<p><a target="_blank" href="https://kubernetes.io/docs/concepts/storage/storage-classes/#provisioner">https://kubernetes.io/docs/concepts/storage/storage-classes/#provisioner</a></p>
<p>We are using EFS provisioner for our usecase.</p>
<p><code>storageClass.yaml</code></p>
<pre><code class="lang-yaml"><span class="hljs-attr">kind:</span> <span class="hljs-string">StorageClass</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">storage.k8s.io/v1</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">efs-sc</span>
<span class="hljs-attr">provisioner:</span> <span class="hljs-string">efs.csi.aws.com</span>
<span class="hljs-attr">parameters:</span>
  <span class="hljs-attr">provisioningMode:</span> <span class="hljs-string">efs-ap</span>
  <span class="hljs-attr">fileSystemId:</span> <span class="hljs-string">fs-582a0sdat</span>
  <span class="hljs-attr">directoryPerms:</span> <span class="hljs-string">"700"</span>
  <span class="hljs-attr">gidRangeStart:</span> <span class="hljs-string">"1000"</span> <span class="hljs-comment"># optional</span>
  <span class="hljs-attr">gidRangeEnd:</span> <span class="hljs-string">"2000"</span> <span class="hljs-comment"># optional</span>
  <span class="hljs-attr">basePath:</span> <span class="hljs-string">"/dynamic_provisioning"</span> <span class="hljs-comment"># optional</span>
</code></pre>
<p><code>kubectl apply -f storageClass.yaml</code></p>
<p>Verify the installation.</p>
<p><code>kubectl get pods -n kube-system | grep efs-csi-controller</code></p>
<h3 id="heading-create-a-persistent-volume-and-persistent-volume-claim">Create a persistent volume and persistent volume claim.</h3>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PersistentVolume</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">efs-pv1</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">capacity:</span>
    <span class="hljs-attr">storage:</span> <span class="hljs-string">5Gi</span>
  <span class="hljs-attr">volumeMode:</span> <span class="hljs-string">Filesystem</span>
  <span class="hljs-attr">accessModes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">ReadWriteMany</span>
  <span class="hljs-attr">persistentVolumeReclaimPolicy:</span> <span class="hljs-string">Retain</span>
  <span class="hljs-attr">storageClassName:</span> <span class="hljs-string">efs-sc</span>
  <span class="hljs-attr">mountOptions:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">tls</span>
  <span class="hljs-attr">csi:</span>
    <span class="hljs-attr">driver:</span> <span class="hljs-string">efs.csi.aws.com</span>
    <span class="hljs-attr">volumeHandle:</span> <span class="hljs-string">fs-582a0sdat:/tmp/poc-efs</span>

<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">PersistentVolumeClaim</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">efs-claim</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">accessModes:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">ReadWriteMany</span>
  <span class="hljs-attr">storageClassName:</span> <span class="hljs-string">efs-sc</span>
  <span class="hljs-attr">resources:</span>
    <span class="hljs-attr">requests:</span>
      <span class="hljs-attr">storage:</span> <span class="hljs-string">5Gi</span>
</code></pre>
<p>Verify the status</p>
<p><code>kubectl get pv,pvc</code></p>
<h3 id="heading-deploy-application">Deploy Application</h3>
<p>Let's use it in our deployment spec</p>
<p><code>deployment.yaml</code></p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">efs-app</span>
  <span class="hljs-attr">labels:</span>
    <span class="hljs-attr">app:</span> <span class="hljs-string">efs-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>
  <span class="hljs-attr">selector:</span>
    <span class="hljs-attr">matchLabels:</span>
      <span class="hljs-attr">app:</span> <span class="hljs-string">efs-app</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">metadata:</span>
      <span class="hljs-attr">labels:</span>
        <span class="hljs-attr">app:</span> <span class="hljs-string">efs-app</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">myapp</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">centos</span>
          <span class="hljs-attr">command:</span> [<span class="hljs-string">"/bin/sh"</span>]
          <span class="hljs-attr">args:</span> [<span class="hljs-string">"-c"</span>, <span class="hljs-string">"while true; do echo $(date -u) &gt;&gt; /data/out; sleep 5; done"</span>]
          <span class="hljs-attr">volumeMounts:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">persistent-storage</span>
              <span class="hljs-attr">mountPath:</span> <span class="hljs-string">/data</span>
      <span class="hljs-attr">volumes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">persistent-storage</span>
          <span class="hljs-attr">persistentVolumeClaim:</span>
            <span class="hljs-attr">claimName:</span> <span class="hljs-string">efs-claim</span>
</code></pre>
<p><code>kubectl apply -f deployment.yaml</code></p>
<p>Now bash into the pods and the mount path can be accessed across the pods.</p>
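<p>A quick way to confirm the volume is actually shared (pod names will differ in your cluster):</p>

```shell
# Every replica appends timestamps to the same file,
# so each pod should see lines written by the others
for pod in $(kubectl get pods -l app=efs-app -o name); do
  kubectl exec "$pod" -- tail -2 /data/out
done
```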
<p>Thank you, Happy Provisioning!</p>
<p>References:</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html">https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html</a></p>
]]></content:encoded></item><item><title><![CDATA[Mastering Fault Tolerance: Ensuring High Availability for Kubernetes Pods]]></title><description><![CDATA[Are Kubernetes pods highly available by default? - Partially true! :p
Certainly! Let's consider a scenario where you have deployed a cluster (let's stick with EKS for simplicity and my preference) and have deployed node groups across 3 availability z...]]></description><link>https://opsinsights.dev/mastering-fault-tolerance-ensuring-high-availability-for-kubernetes-pods</link><guid isPermaLink="true">https://opsinsights.dev/mastering-fault-tolerance-ensuring-high-availability-for-kubernetes-pods</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[scheduler]]></category><category><![CDATA[EKS]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Mon, 22 May 2023 05:20:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1684732306735/3a0df48b-e8ff-4198-a2b1-fcd3f5ac0a97.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-are-kubernetes-pods-highly-available-by-default-partially-true-p">Are Kubernetes pods highly available by default? - Partially true! :p</h3>
<p>Certainly! Let's consider a scenario where you have deployed a cluster (let's stick with EKS for simplicity and my preference) and have deployed node groups across 3 availability zones, consisting of 6 worker nodes. Now, when scheduling a deployment with 3 replicas, can you guarantee that they will be evenly spread across the nodes?</p>
<blockquote>
<p>The answer is no!</p>
</blockquote>
<p>Based on my exploration, such an even distribution cannot be assumed. However, Kubernetes provides an out-of-the-box solution to this problem; it just requires some preparation and planning during the setup stage.</p>
<p>We can delve into this detail in our blog post.</p>
<h3 id="heading-strategies-for-making-pods-highly-available-in-kubernetes">Strategies for Making Pods Highly Available in Kubernetes</h3>
<p><strong>Node affinity</strong>: Node affinity is similar to nodeSelector, with additional flexibility. There are two types of node affinity.</p>
<p><strong>requiredDuringSchedulingIgnoredDuringExecution</strong>: The scheduler can't schedule the Pod unless the rule is met. This functions like nodeSelector, but with a more expressive/customizable/flexible syntax.**</p>
<p><strong>preferredDuringSchedulingIgnoredDuringExecution:</strong> The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.**</p>
<p>The above will help us to attract pods to respective nodes based on the labels and constraints.</p>
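<p>As a small sketch of the softer of the two rules, preferred node affinity (the <code>disktype</code> label and pod names here are illustrative placeholders):</p>

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-example
spec:
  affinity:
    nodeAffinity:
      # Soft rule: prefer nodes labelled disktype=ssd,
      # but schedule anyway if no such node is available
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
  containers:
    - name: app
      image: nginx
```

Swapping <code>preferred...</code> for <code>requiredDuringSchedulingIgnoredDuringExecution</code> turns this into a hard rule that the scheduler must satisfy.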
<p><mark>To spread the pods across the worker nodes, we use</mark></p>
<h2 id="heading-pod-topology-spread-constraints">Pod Topology Spread Constraints</h2>
<p>TopologySpreadConstraints describes how a group of pods ought to spread across topology domains. The scheduler will schedule pods in a way that abides by the constraints. All topologySpreadConstraints are ANDed.**</p>
<p>Example definition file for the same</p>
<pre><code class="lang-yaml"><span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">example-pod</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-comment"># Configure a topology spread constraint</span>
  <span class="hljs-attr">topologySpreadConstraints:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">maxSkew:</span> <span class="hljs-number">1</span>
      <span class="hljs-attr">topologyKey:</span> <span class="hljs-string">kubernetes.io/hostname</span>
      <span class="hljs-attr">whenUnsatisfiable:</span> <span class="hljs-string">ScheduleAnyway</span>
      <span class="hljs-attr">labelSelector:</span>
        <span class="hljs-attr">matchLabels:</span>
          <span class="hljs-attr">app:</span> <span class="hljs-string">app-name</span>

  <span class="hljs-comment">### other Pod fields go here</span>
</code></pre>
<p><strong>maxSkew</strong> - defines the maximum permitted difference in the number of matching pods between any two topology domains. It defaults to 1, meaning the pod counts across domains may differ by at most one.</p>
<p>For example, in a cluster with three zones, the initial deployment may have 2 pods in zone 1, 2 pods in zone 2, and 1 pod in zone 3 (2/2/1 distribution). If the deployment is scaled up to 6 pods, the distribution across zones would be adjusted to 2 pods in each zone (2/2/2 distribution).</p>
<p><strong>topologyKey</strong> - in simplified terms, it defines how nodes are grouped into domains using a node label. With <a target="_blank" href="http://kubernetes.io/hostname">kubernetes.io/hostname</a> as the key, the scheduler treats every node carrying that label as part of the topology, and each individual node becomes a domain.</p>
<p><strong>labelSelector</strong> - used to select matching pods</p>
<p>The above definition helps us spread the pods across the nodes, making the workload highly available.</p>
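<p>In practice you would attach the constraint to a Deployment's pod template rather than a bare Pod, so every replica is spread. A minimal sketch (the <code>web</code> names and labels are placeholders):</p>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone  # spread across zones
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx
```

With 6 replicas and 3 zones, the scheduler aims for the 2/2/2 distribution described above.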
<p>Other Useful Tools:</p>
<p><a target="_blank" href="https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller">NodeLabeller</a> - Automatically labels a node if it has a GPU.</p>
<p><a target="_blank" href="https://github.com/kubernetes-sigs/descheduler">Descheduler</a> - To effectively utilize resources in the nodes.</p>
<p>Thank you!</p>
<p>**definition from the k8s documentation</p>
<p>References:</p>
<p><a target="_blank" href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/">https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/</a></p>
<p><a target="_blank" href="https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector">https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector</a></p>
<p><a target="_blank" href="https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller">https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller</a></p>
]]></content:encoded></item><item><title><![CDATA[Golang Cheat-sheet: Quick guide to start Golang for python developers.]]></title><description><![CDATA[As a cloud-native python developer in day-to-day infrastructure and Site reliability operations, I always wanted to learn Golang due to its great performance comparably and for its out-of-the-box simplicity in multiprocessing and multithreading.
Chec...]]></description><link>https://opsinsights.dev/golang-cheatsheet</link><guid isPermaLink="true">https://opsinsights.dev/golang-cheatsheet</guid><category><![CDATA[Go Language]]></category><category><![CDATA[cheatsheet]]></category><category><![CDATA[cloud native]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Tue, 21 Feb 2023 10:28:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1676982718882/6713da48-ef9a-4389-a80b-68ca0e37430a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As a cloud-native python developer in day-to-day infrastructure and Site reliability operations, I always wanted to learn <code>Golang</code> due to its great performance comparably and for its out-of-the-box simplicity in multiprocessing and multithreading.</p>
<p><em>Check out this blog post Go concurrency:</em> <a target="_blank" href="https://opsinsights.dev/my-first-go-program">https://opsinsights.dev/my-first-go-program</a></p>
<p>So, while getting started with <code>Golang</code>, I noticed similarities with Python.</p>
<p>Where there were none, I tried to relate and compare with Python, which helped me grasp the concepts and syntax quicker. That is the motivation for this blog post: understanding the similarities with what we already know helps our mind easily adopt new learning.</p>
<p><code>Sharing my findings and understanding while learning GOlang. Part-1</code></p>
<h2 id="heading-a-comprehensive-guide-to-getting-started-with-go-for-python-developers">A comprehensive guide to getting started with GO for python developers.</h2>
<h3 id="heading-highlights-of-golang">Highlights of Golang.</h3>
<blockquote>
<p><em>Go is expressive, concise, clean, and efficient</em>**</p>
</blockquote>
<p>**as per documentation</p>
<ul>
<li><p>Go is a compiled language. Statically typed.</p>
</li>
<li><p>Compiled executables are OS-specific, like a runtime: a binary compiled on Windows runs only on Windows.</p>
</li>
<li><p>Case sensitive</p>
</li>
<li><p>Go is designed as a next-generation C.</p>
</li>
</ul>
<p><em>Sometimes it feels like python, but It's a fast, statically typed, compiled language that feels like a dynamically typed, interpreted language.</em></p>
<p>Concepts that are advanced in Python, like multiprocessing and multithreading, are day-to-day core functionality in Golang.</p>
<h3 id="heading-golang-does-not-support">Golang does not support.</h3>
<ul>
<li><p>Type inheritance</p>
</li>
<li><p>Method or Operator overloading</p>
</li>
<li><p>Structured exception handling (no try/except as in Python)</p>
</li>
<li><p>Implicit numeric conversions</p>
</li>
</ul>
<h1 id="heading-importing-packages">Importing packages.</h1>
<p><strong><mark>.py</mark></strong></p>
<p>Below are some of the ways of importing modules and packages:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> Game.Level.start <span class="hljs-comment"># import a specific module</span>
<span class="hljs-keyword">from</span> Game.Level <span class="hljs-keyword">import</span> * <span class="hljs-comment"># import everything from Game.Level</span>
<span class="hljs-keyword">from</span> Game.Level <span class="hljs-keyword">import</span> start <span class="hljs-comment"># import without the package prefix</span>
<span class="hljs-keyword">import</span> Game.Level <span class="hljs-keyword">as</span> GL <span class="hljs-comment"># alias import</span>
</code></pre>
<p><strong><mark>.go</mark></strong></p>
<pre><code class="lang-go"><span class="hljs-keyword">import</span> <span class="hljs-string">"fmt"</span>
<span class="hljs-keyword">import</span> <span class="hljs-string">"os"</span>

<span class="hljs-comment">// to import multiple packages, we can also enclose them in parentheses as shown below</span>

<span class="hljs-keyword">import</span> ( 
    <span class="hljs-string">"fmt"</span> 
    <span class="hljs-string">"time"</span> 
    <span class="hljs-string">"math"</span> 
)

<span class="hljs-keyword">import</span> mt <span class="hljs-string">"math"</span> 
<span class="hljs-comment">// syntax for alias import </span>

<span class="hljs-keyword">import</span> . <span class="hljs-string">"math"</span>
<span class="hljs-comment">// Dot import  allows to use modules without referencing package names.</span>
</code></pre>
<h1 id="heading-using-variables">Using variables</h1>
<p><strong><mark>.py</mark></strong></p>
<p>Variable declaration is a simple assignment using the <code>=</code> operator.</p>
<pre><code class="lang-python">
variable_1 = <span class="hljs-number">10</span> 
variable_2 = <span class="hljs-string">"hello, team!"</span>
</code></pre>
<p>*variables are case-sensitive.</p>
<p><strong><mark>.go</mark></strong></p>
<pre><code class="lang-go"><span class="hljs-comment">//var variableName dataType = initialValue</span>
<span class="hljs-keyword">var</span> x <span class="hljs-keyword">int</span> = <span class="hljs-number">10</span>
</code></pre>
<p>there will be default initial values for variables if not assigned.</p>
<ul>
<li><p>int and float types: 0</p>
</li>
<li><p>bool type: false</p>
</li>
<li><p>string type: ""</p>
</li>
<li><p>array and struct types: all of their elements are set to their respective zero values pointer, slice, map, channel, and function types: nil</p>
</li>
</ul>
<p><em>For example, if you declare an integer variable x without an initial value, Go will assign it the zero value of int, which is 0:</em></p>
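<p>A quick runnable sketch to see those zero values in action:</p>

```go
package main

import "fmt"

func main() {
	// Declared without initial values; Go assigns each type's zero value.
	var i int
	var f float64
	var b bool
	var s string
	var p *int // pointer, slice, map, channel, and function types default to nil

	fmt.Println(i, f, b, len(s), p == nil) // prints: 0 0 false 0 true
}
```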
<h1 id="heading-error-handling">Error handling</h1>
<p><strong><mark>.py</mark></strong></p>
<p>try-catch block syntax in python.</p>
<pre><code class="lang-python"><span class="hljs-keyword">try</span>: 
    print(x) 
<span class="hljs-keyword">except</span>: 
    print(<span class="hljs-string">"An exception occurred"</span>)
</code></pre>
<p><strong><mark>.go</mark></strong></p>
<p>In Go, errors are represented as values of the built-in <code>error</code> interface, which is defined as:</p>
<pre><code class="lang-go"><span class="hljs-keyword">type</span> error <span class="hljs-keyword">interface</span> { 
    Error() <span class="hljs-keyword">string</span> 
}
</code></pre>
<p>If a function completes successfully, the returned error value is <code>nil</code>; otherwise it returns a non-nil error describing the failure.</p>
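<p>A minimal runnable sketch of the idiom; the <code>divide</code> function is just an illustration, not from the standard library:</p>

```go
package main

import (
	"errors"
	"fmt"
)

// divide returns an error value alongside the result
// instead of raising an exception as Python would.
func divide(a, b float64) (float64, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	// The Go idiom replacing Python's try/except: check the returned error.
	q, err := divide(10, 2)
	if err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Println("result:", q) // prints: result: 5
	}

	if _, err := divide(1, 0); err != nil {
		fmt.Println("error:", err) // prints: error: division by zero
	}
}
```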
<hr />
<p><em>The above example is with the interface, we will learn more in the next blog post.</em></p>
<p>Each of the above sections deserves an individual post; however, this will help you kindle and kick-start your Golang learning.</p>
<hr />
<p><strong>End of Part 1.</strong></p>
<p>Happy learning. Thank you.</p>
<p><strong>References:</strong></p>
<p><a target="_blank" href="https://go.dev/doc/code">https://go.dev/doc/code</a></p>
<p><a target="_blank" href="https://blog.xojo.com/2017/12/06/compilers-101-overview-and-lexer/">https://blog.xojo.com/2017/12/06/compilers-101-overview-and-lexer/</a></p>
]]></content:encoded></item><item><title><![CDATA[Terraform Workspaces with a simple example use case.]]></title><description><![CDATA[What is a terraform workspace?
A TF workspace isolates state information within a workspace. At the time of workspace creation, an isolated TF backend is created. This ensures existing configurations are not disturbed.
Sounds interesting? Let us find...]]></description><link>https://opsinsights.dev/terraform-workspaces-with-a-simple-example-use-case</link><guid isPermaLink="true">https://opsinsights.dev/terraform-workspaces-with-a-simple-example-use-case</guid><category><![CDATA[Terraform]]></category><category><![CDATA[workspace]]></category><category><![CDATA[WeMakeDevs]]></category><category><![CDATA[Terraform workspace]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Tue, 24 Jan 2023 07:26:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1674545003464/3da1d64a-9e29-4da1-8e2e-d4e27bbd6c21.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-what-is-a-terraform-workspace">What is a terraform workspace?</h1>
<p>A TF workspace isolates state information within the workspace. When a workspace is created, a separate state is initialized for it in the backend. This ensures existing configurations are not disturbed.</p>
<h3 id="heading-sounds-interesting-let-us-find-some-interesting-use-in-this-blog-post">Sounds interesting? Let us find some interesting use in this blog post.</h3>
<blockquote>
<p><code>Using terraform workspaces is equivalent to using a container with different volumes.</code></p>
</blockquote>
<p><mark>NVM :p that's a metaphor.</mark></p>
<p>By default, every time you run <code>terraform init</code> you are placed in a workspace named <code>default</code>. So, by design or by choice, we are using terraform workspaces every day.</p>
<p><em>you can quickly switch between workspaces using the terraform workspace cli. (Refer to TF workspace</em> <em>cheat</em> <em>sheet below)</em></p>
<h3 id="heading-definition-as-in-documentation"><em>Definition as in documentation:</em></h3>
<p><code>You can create multiple working directories to maintain multiple instances of a configuration with completely separate state data.</code></p>
<p>As defined, state data is isolated between workspaces, which lets the same code provision multiple environments.</p>
<h3 id="heading-to-understand-better-let-us-try-an-example-use-case">To understand better let us try an example use case:</h3>
<p><em>An identical infrastructure should be provisioned in two different regions, us-east-1(N.Virginia), and eu-west-1(Ireland)</em></p>
<p>Provided the use case, there are several ways to achieve this, however, let us see an example with terraform workspaces.</p>
<p>I have used the local provider for this demo to keep it simple. I have not created any workspace yet; let us check which workspace we are in.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674543302163/b611fe5a-5ff7-4855-baef-47e7635dbc67.png" alt class="image--center mx-auto" /></p>
<p>The terraform code below creates a new txt file containing the text "Workspace Blog post". Also, notice that I used the <code>terraform.workspace</code> variable to dynamically name the txt file.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674543479263/f5455465-902f-463e-a77d-e1f8c3ce89fc.png" alt class="image--center mx-auto" /></p>
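<p>For reference, the configuration in the screenshot looks roughly like this; treat it as a sketch, and note that the resource and file names here are illustrative:</p>

```hcl
# Uses the hashicorp/local provider's local_file resource
resource "local_file" "workspace_demo" {
  # terraform.workspace interpolates the current workspace name
  filename = "${terraform.workspace}-demo.txt"
  content  = "Workspace Blog post"
}
```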
<p>This will help us to identify from which workspace execution this file was created. Let us plan and apply our script. And a file will be created as shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674543599853/04adbfd6-0dc1-47c8-b20b-e3a6e9ccf072.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674543645477/98edef2e-e976-4d8c-a206-9dc68b5f9c69.png" alt class="image--center mx-auto" /></p>
<p>Let us create a new workspace named blog-demo, and make sure we are in the correct workspace.</p>
<p><code>terraform workspace new blog-demo</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674543690703/01ef5ae7-b239-449d-878b-863e9169a0e2.png" alt class="image--center mx-auto" /></p>
<p>Successfully created, and <code>terraform workspace list</code> displays all the available workspaces. Running terraform plan and apply again results as shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674543743769/b4b7fae0-48f8-4adc-9cc8-8c123cbf7375.png" alt class="image--center mx-auto" /></p>
<p>With the same terraform code without any modifications we had two different executions.</p>
<p>Now co-relate this with provisioning multi-region infrastructure with the common code base.</p>
<h3 id="heading-key-takeaways-of-using-terraform-workspace">Key takeaways of using terraform workspace.</h3>
<ul>
<li><p>in a similar approach, we can differentiate between regions based on the workspace names.</p>
</li>
<li><p>terraform execution is faster since it reuses the modules and packages already downloaded instead of downloading them again.</p>
</li>
<li><p>increases code reusability and efficiency.</p>
</li>
</ul>
<p>And more use cases emerge; they grow depending on our design pattern.</p>
<h2 id="heading-workspace-commands-cheat-sheet">Workspace commands cheat sheet.</h2>
<p><em>To list all the existing workspaces.</em></p>
<pre><code class="lang-bash"> terraform workspace list
</code></pre>
<p><em>To select/switch to a new workspace:</em></p>
<pre><code class="lang-bash">terraform workspace select &lt;workspace name&gt;
</code></pre>
<p><em>To create a new workspace:</em></p>
<pre><code class="lang-bash">terraform workspace new &lt;workspace name&gt;
</code></pre>
<p><em>To delete a workspace</em></p>
<pre><code class="lang-bash">terraform workspace delete  &lt;workspace name&gt;
</code></pre>
<p><em>To force-delete a workspace, ignoring a non-empty state:</em></p>
<pre><code class="lang-bash">terraform workspace delete -force &lt;workspace name&gt;
</code></pre>
<p><em>To display the current active workspace:</em></p>
<pre><code class="lang-bash">terraform workspace show
</code></pre>
<p>Thank you, Peace!</p>
<p><strong>References:</strong></p>
<p>Feel free to go through these articles if you prefer a more detailed understanding.</p>
<p><a target="_blank" href="https://spacelift.io/blog/terraform-workspaces">https://spacelift.io/blog/terraform-workspaces</a></p>
<p><a target="_blank" href="https://developer.hashicorp.com/terraform/language/state/workspaces">https://developer.hashicorp.com/terraform/language/state/workspaces</a></p>
<p><a target="_blank" href="https://registry.terraform.io/providers/hashicorp/local/latest/docs/resources/file">https://registry.terraform.io/providers/hashicorp/local/latest/docs/resources/file</a></p>
]]></content:encoded></item><item><title><![CDATA[ARGO workflows Vs AWS Step functions.]]></title><description><![CDATA[This blog quickly shows the comparison between AWS Step functions and ARGO workflow.

AWS STEP Functions
AWS Step Functions is a serverless orchestration service that lets you integrate with AWS Lambda functions and other AWS services for any specifi...]]></description><link>https://opsinsights.dev/argo-workflows-vs-aws-step-functions</link><guid isPermaLink="true">https://opsinsights.dev/argo-workflows-vs-aws-step-functions</guid><category><![CDATA[ArgoCD]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[AWS]]></category><category><![CDATA[stepfunction]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Tue, 13 Dec 2022 15:12:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1670944095889/f_m80XiUu.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>This blog quickly shows the comparison between AWS Step functions and ARGO workflow.</p>
</blockquote>
<h2 id="heading-aws-step-functions">AWS STEP Functions</h2>
<p>AWS Step Functions is a serverless orchestration service that lets you integrate with AWS Lambda functions and other AWS services for any specific use cases.</p>
<p>Step functions help to execute stateful and stateless jobs in a sequential way to achieve a solution.</p>
<p>Assume a 10-STEP operational workflow; each step does a specific operation that is interrelated with the other jobs, e.g. the success of STEP A triggers STEP B.</p>
<p>Each step can run in a different executor,</p>
<p>STEP-1: Python, STEP-2: bash, etc.</p>
<p>And above is a single example and there can be n different use cases for such, like sequential Jobs, parallel jobs and at scale etc.</p>
<p>AWS Step Functions are AWS-native. Yes, we can communicate with any of the other AWS services via API/Cloud SDK.</p>
<p><em>One of the practical AWS Step function use cases.</em></p>
<ol>
<li><p>Restore a database from the backup.</p>
</li>
<li><p>Sanitize the sensitive data.</p>
</li>
<li><p>update the connection strings in the secrets manager.</p>
</li>
</ol>
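<p>In Step Functions' Amazon States Language, that three-step flow could be sketched as follows; the state names and Lambda ARNs are placeholders, not from the original demo:</p>

```json
{
  "Comment": "Illustrative restore pipeline",
  "StartAt": "RestoreDatabase",
  "States": {
    "RestoreDatabase": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:restore-db",
      "Next": "SanitizeData"
    },
    "SanitizeData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:sanitize-data",
      "Next": "UpdateSecrets"
    },
    "UpdateSecrets": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:update-secrets",
      "End": true
    }
  }
}
```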
<h2 id="heading-argo-workflows">ARGO Workflows.</h2>
<p>ARGO workflow is an open-source container-native workflow engine for orchestrating jobs on Kubernetes.</p>
<p>container native - Yes, bring out your imagination of use cases here to leverage the power of ARGO-Workflow at scale.</p>
<p>Some of the most common use cases are,</p>
<ul>
<li><p>CICD pipelines</p>
</li>
<li><p>analytics operation</p>
</li>
<li><p>high volume data processing</p>
</li>
<li><p>sequential, parallel and more</p>
</li>
<li><p>directed-acyclic graph (DAG)</p>
</li>
</ul>
<p><mark>How is it different from AWS Step Functions?</mark></p>
<p>AWS Step Functions are AWS-native, whereas ARGO Workflows are cloud-agnostic.</p>
<ul>
<li><p>Can be provisioned using cloud formation templates and AWS CLI. (submitting step functions via CLI)</p>
</li>
<li><p>job submission can be in JSON/YAML.</p>
</li>
<li><p>support GUI, API, CLI and through supported AWS services. (Pls refer to the aws doc linked at the bottom.)</p>
</li>
<li><p>support to create lambda functions at each stage.</p>
</li>
</ul>
<p>ARGO Workflows can be used across cloud providers.</p>
<ul>
<li><p>Workflows can be templated/bootstrapped.</p>
</li>
<li><p>ARGO jobs can be submitted using GUI and CLI.</p>
</li>
<li><p>Workflow jobs are written in YAML syntax</p>
</li>
</ul>
<h3 id="heading-pricing">Pricing</h3>
<p><strong>AWS STEP Functions.</strong></p>
<p>It is easy: pay as you go. However, if we decode it, below are the price points to consider while using Step Functions.</p>
<ul>
<li><p>State transition charges + based on resource usage(CPU RAM utilized for each step)</p>
</li>
<li><p>Lambda backend charges, EC2, ECS, ECR(if required)</p>
</li>
</ul>
<p><strong>ARGO</strong></p>
<p>Cloud hosting charges for the Kubernetes environment (it can be part of an existing cluster), or independent hosting on any VM.</p>
<blockquote>
<p>Who knows maybe ARGO can come up with their native cloud-hosted environment in the future, and I am counting on it.</p>
</blockquote>
<p>We will discuss ARGO workflow with an example in detail in our upcoming blog post</p>
<p>Thank you!</p>
<p>References:</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/development-options.html">https://docs.aws.amazon.com/step-functions/latest/dg/development-options.html</a></p>
<p><a target="_blank" href="https://argoproj.github.io/workflows/#:~:text=What%20is%20Argo%20Workflows%3F,the%20workflow%20is%20a%20container">https://argoproj.github.io/workflows/#:~:text=What%20is%20Argo%20Workflows%3F,the%20workflow%20is%20a%20container</a>.</p>
<p><a target="_blank" href="https://aws.amazon.com/step-functions/pricing/">https://aws.amazon.com/step-functions/pricing/</a></p>
]]></content:encoded></item><item><title><![CDATA[My first GO! program. Letsss G0!!!]]></title><description><![CDATA[Before getting started with GO, WHY GO? Do I hate python, it's a BIG NO.

 There are some perks, ease of adoption, and responsibilities as we consider microservice architecture mainly in INFRA design and automation.


Merits of GO alongside Python.
G...]]></description><link>https://opsinsights.dev/my-first-go-program</link><guid isPermaLink="true">https://opsinsights.dev/my-first-go-program</guid><category><![CDATA[Go Language]]></category><category><![CDATA[concurrency]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Mon, 04 Jul 2022 12:32:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1656937721046/vC-xylt7C.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before getting started with <code>GO</code>, WHY <code>GO</code>? Do I hate python, it's a BIG NO.

 There are some perks: ease of adoption, and the responsibilities we take on with microservice architecture, mainly in INFRA design and automation.
</p>

<h2 id="heading-merits-of-go-alongside-python">Merits of GO alongside Python.</h2>
<p>GO solves a few major problems:</p>
<ul>
<li>Ease of programming as an interpreted language like Python.</li>
<li>With efficiency and safety as a static language like C++.</li>
<li>Competitive multi-core computing as built-in support</li>
</ul>
<h3 id="heading-concurrency-this-blog-will-focus-only-concurrency-speciality-of-go">Concurrency. This blog will focus only on the concurrency speciality of GO.</h3>
<ul>
<li><p>Concurrency in other languages comes as extended functionality unlike in GO we have built-in support also known as <code>goroutines</code>. Goroutines are deeply integrated with Go's runtime, this runtime engine takes care of managing the threads (blocking &amp; unblocking)</p>
</li>
<li><p>The runtime and the logic of a goroutine work together. Goroutines can communicate with one another and synchronize their execution. </p>
</li>
<li>In Go, one of the synchronization elements is called a channel. Channel helps to share data between goroutines</li>
</ul>
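<p>A tiny runnable sketch of a channel shuttling results between goroutines; the <code>square</code> function is illustrative:</p>

```go
package main

import "fmt"

// square sends its result back on a channel instead of sharing memory.
func square(n int, out chan<- int) {
	out <- n * n
}

func main() {
	out := make(chan int)
	for i := 1; i <= 3; i++ {
		go square(i, out) // one goroutine per input
	}
	sum := 0
	for i := 0; i < 3; i++ {
		sum += <-out // receiving blocks until some goroutine sends
	}
	fmt.Println(sum) // prints: 14  (1 + 4 + 9)
}
```

Because the receive blocks, <code>main</code> naturally waits for all three results; no explicit synchronization primitive is needed here.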
<p>To learn more about concurrency in python: <a target="_blank" href="https://dev.to/kcdchennai/optimising-python-workloads-for-kubernetes-1d6c">https://dev.to/kcdchennai/optimising-python-workloads-for-kubernetes-1d6c</a></p>
<p>To run a function as a goroutine, call that function prefixed with the go statement. </p>
<pre><code><span class="hljs-selector-tag">sum</span>()     <span class="hljs-comment">// A normal function call that executes sum synchronously and waits for it to complete</span>
<span class="hljs-selector-tag">go</span> <span class="hljs-selector-tag">sum</span>()  <span class="hljs-comment">// A goroutine that executes sum asynchronously and doesn't wait for it to complete</span>
</code></pre><h3 id="heading-lets-the-learn-this-concurrency-in-go-using-an-example">Lets the learn this concurrency in GO using an example.</h3>
<p>We use the open-source API endpoint <a target="_blank" href="http://worldtimeapi.org/pages/examples">http://worldtimeapi.org/pages/examples</a>. This API returns timezone information based on various params; here we use it to find the timezone for an IP.</p>
<pre><code><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"io"</span>
    <span class="hljs-string">"log"</span>
    <span class="hljs-string">"net/http"</span>
    <span class="hljs-string">"os"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">request</span><span class="hljs-params">(url <span class="hljs-keyword">string</span>)</span></span> {
    res, err := http.Get(url)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-built_in">panic</span>(err)
    }

    <span class="hljs-keyword">defer</span> res.Body.Close()
    b, err := io.ReadAll(res.Body)
    fmt.Println(<span class="hljs-keyword">string</span>(b))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> <span class="hljs-built_in">len</span>(os.Args) &lt; <span class="hljs-number">2</span> {
        log.Fatalln(<span class="hljs-string">"Usage: go run main.go &lt;url1&gt; &lt;url2&gt; ... &lt;urln&gt;"</span>)
    }
    <span class="hljs-keyword">for</span> _, url := <span class="hljs-keyword">range</span> os.Args[<span class="hljs-number">1</span>:] {
        request(<span class="hljs-string">"http://"</span> + url)
    }
}
</code></pre><p>Below is the response to my above excerpt. I have passed 4 args to the above function. </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656928799890/EYbLIS3EX.png" alt="Screenshot 2022-07-04 at 3.28.22 PM.png" /></p>
<p>What is happening in the above program is <em>Sequential execution</em></p>
<p>We sent a request to the first argument and when it completes, a response comes in. Then the program returns to the for loop and sends another request to the next argument, and the process continues.</p>
<p>Let's do concurrency using <code>goroutines</code>  now</p>
<h2 id="heading-goroutines">GOroutines</h2>
<p>GOroutines are managed by the GO runtime scheduler, which takes care of managing the threads accessing the CPU.</p>
<blockquote>
<p> asynchronous, powerful</p>
</blockquote>
<p>As we already know, to make a fn into a goroutine, add the <strong>go</strong> prefix before the fn invocation. Updating the code below</p>
<pre><code><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"io"</span>
    <span class="hljs-string">"log"</span>
    <span class="hljs-string">"net/http"</span>
    <span class="hljs-string">"os"</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">request</span><span class="hljs-params">(url <span class="hljs-keyword">string</span>)</span></span> {
    res, err := http.Get(url)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-built_in">panic</span>(err)
    }

    <span class="hljs-keyword">defer</span> res.Body.Close()
    b, err := io.ReadAll(res.Body)
    fmt.Println(<span class="hljs-keyword">string</span>(b))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> <span class="hljs-built_in">len</span>(os.Args) &lt; <span class="hljs-number">2</span> {
        log.Fatalln(<span class="hljs-string">"Usage: go run main.go &lt;url1&gt; &lt;url2&gt; ... &lt;urln&gt;"</span>)
    }
    <span class="hljs-keyword">for</span> _, url := <span class="hljs-keyword">range</span> os.Args[<span class="hljs-number">1</span>:] {
        <span class="hljs-keyword">go</span> request(<span class="hljs-string">"http://"</span> + url)
    }
}
</code></pre><p>The above program produces the output below, completing in 0.843 seconds.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656935655029/AWsYRmE2x.png" alt="Screenshot 2022-07-04 at 5.18.52 PM.png" /></p>
<p>:| why?
Yes, we started the goroutines and the fn was triggered 4 times, but <code>main</code> exited before any of them finished, so no output was printed. To actually wait for the results, let us add a <strong>WaitGroup</strong></p>
<h3 id="heading-waitgroup">WaitGroup</h3>
<p>WaitGroup is included in the Golang sync package. It includes features that allow it to block and wait for any number of goroutines to complete their execution.</p>
<p>Code with added wait:</p>
<pre><code><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"io"</span>
    <span class="hljs-string">"log"</span>
    <span class="hljs-string">"net/http"</span>
    <span class="hljs-string">"os"</span>
    <span class="hljs-string">"sync"</span>
)

<span class="hljs-keyword">var</span> wg sync.WaitGroup

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">request</span><span class="hljs-params">(url <span class="hljs-keyword">string</span>)</span></span> {
    <span class="hljs-keyword">defer</span> wg.Done()
    res, err := http.Get(url)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-built_in">panic</span>(err)
    }

    <span class="hljs-keyword">defer</span> res.Body.Close()
    b, err := io.ReadAll(res.Body)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-built_in">panic</span>(err)
    }
    fmt.Println(<span class="hljs-keyword">string</span>(b))
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> <span class="hljs-built_in">len</span>(os.Args) &lt; <span class="hljs-number">2</span> {
        log.Fatalln(<span class="hljs-string">"Usage: go run main.go &lt;url1&gt; &lt;url2&gt; ... &lt;urln&gt;"</span>)
    }
    <span class="hljs-keyword">for</span> _, url := <span class="hljs-keyword">range</span> os.Args[<span class="hljs-number">1</span>:] {
        <span class="hljs-keyword">go</span> request(<span class="hljs-string">"http://"</span> + url)
        wg.Add(<span class="hljs-number">1</span>)
    }
    wg.Wait()
}
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1656937219518/ybf2JUfzG.png" alt="Screenshot 2022-07-04 at 5.47.11 PM.png" /></p>
<p>Executing the same program with 8 URLs as arguments completed in about 0.853s, roughly 4x faster than the sequential approach.</p>
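<p>To make the comparison reproducible, here is a self-contained sketch. The local test server, its artificial 50ms delay, and the request count are assumptions made for illustration, not the exact setup from the screenshots above:</p>
<pre><code>package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "sync"
    "time"
)

// fetch GETs a URL and discards the body.
func fetch(url string) {
    res, err := http.Get(url)
    if err != nil {
        panic(err)
    }
    defer res.Body.Close()
    io.Copy(io.Discard, res.Body)
}

// timeFetches fetches url n times, sequentially or concurrently,
// and returns the elapsed wall-clock time.
func timeFetches(url string, n int, concurrent bool) time.Duration {
    start := time.Now()
    if concurrent {
        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            wg.Add(1) // Add before launching, so Wait cannot return early
            go func() {
                defer wg.Done()
                fetch(url)
            }()
        }
        wg.Wait()
    } else {
        for i := 0; i < n; i++ {
            fetch(url)
        }
    }
    return time.Since(start)
}

func main() {
    // A local server that sleeps 50ms per request stands in for real URLs.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(50 * time.Millisecond)
        fmt.Fprintln(w, "ok")
    }))
    defer srv.Close()

    fmt.Println("sequential:", timeFetches(srv.URL, 8, false))
    fmt.Println("concurrent:", timeFetches(srv.URL, 8, true))
}
</code></pre>
<p>Since every request sleeps 50ms on the server, the sequential loop takes roughly eight times as long as the concurrent run in this sketch.</p>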
<p>Hope you enjoyed this exploration. We will dig into more Go topics in upcoming blog posts.</p>
<p>Peace!</p>
]]></content:encoded></item><item><title><![CDATA[Container Runtime Interface (CRI), Docker deprecation & Dockershim]]></title><description><![CDATA[#kcdchennai #kubernetes #docker #devops
Author: Jothimani Radhakrishnan (Lifion by ADP). A Software Product Engineer, Cloud enthusiast | Blogger | DevOps | SRE | Python Developer. I usually automate my day-to-day stuff and Blog my experience on chall...]]></description><link>https://opsinsights.dev/container-runtime-interface-cri-docker-deprecation-and-dockershim</link><guid isPermaLink="true">https://opsinsights.dev/container-runtime-interface-cri-docker-deprecation-and-dockershim</guid><category><![CDATA[Devops]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Thu, 27 Jan 2022 15:12:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1643295484218/if-u1ndx6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>#kcdchennai #kubernetes #docker #devops</p>
<p>Author: Jothimani Radhakrishnan (Lifion by ADP). A Software Product Engineer, Cloud enthusiast | Blogger | DevOps | SRE | Python Developer. I usually automate my day-to-day stuff and Blog my experience on challenging items.</p>
<h1 id="heading-intro">Intro</h1>
<p>Hey Docker lovers &lt;3, this is not going to be a happy story for you, and it wasn't for me either. Like many of you, I loved using Docker. When I learned about this change (the dockershim removal), it was a heartbreaking moment.</p>
<p>Before we get to dockershim, let us cover some background.</p>
<p>The kubelet manages each worker node on behalf of the control plane. It ensures that the containers specified for a pod are up and running.</p>
<p>To know more about the kubelet: <a target="_blank" href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/">https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/</a></p>
<h2 id="heading-container-runtime">Container runtime:</h2>
<p>A container runtime is the software responsible for running containers. To create a pod, the kubelet needs a container runtime. For a long time, Kubernetes used Docker as its default container runtime.</p>
<p>This created a tight coupling: whenever Docker released updates or upgrades, it could break Kubernetes.</p>
<p>There are several container runtimes available:</p>
<ul>
<li>containerd</li>
<li>CRI-O</li>
<li>Docker</li>
<li>rkt (Rocket)</li>
<li>LXD</li>
<li>OpenVZ</li>
<li>Windows Server Containers</li>
</ul>
<p>Okay! Coming back to the context of this blog: Docker is being deprecated as the default Kubernetes runtime, and containerd is taking its place.</p>
<h2 id="heading-what-is-dockershim">What is Dockershim?</h2>
<p>Docker was the default engine in k8s before the Container Runtime Interface (CRI) existed. After introducing the CRI, Kubernetes created an adapter component called dockershim, since Docker itself does not implement the CRI.</p>
<p>The dockershim adapter allows the kubelet to interact with Docker as if Docker were a CRI-compatible runtime.</p>
<p>Img src: Kubernetes documentation</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643296206947/kzu7JzJ2c.png" alt="cri-containerd.png" /></p>
<blockquote>
<p>Switching to containerd as the container runtime eliminates the middleman.</p>
</blockquote>
<p>Check the following points to make sure your environment is not affected by this change. You can continue to use Docker to build images; using Docker for builds is not considered a dependency.</p>
<p>All of the pointers below should be reviewed and updated for your new CRI runtime.</p>
<ol>
<li>Update any Docker command operations that run inside pods. For example, listing running containers on a worker node with <code>docker ps</code> will not work after the deprecation.</li>
<li>Check for private registry or image mirror settings in the Docker configuration file (such as <code>/etc/docker/daemon.json</code>).</li>
<li>Update any scripts that SSH into worker nodes and perform Docker CRUD operations.</li>
<li>Update any third-party tools that use Docker, including telemetry and monitoring agents.</li>
<li>Update any alerts that are configured on Docker-specific errors.</li>
<li>Update any automation or bootstrap scripts based on Docker commands.</li>
</ol>
<p>This list is not exhaustive and will vary based on your usage.</p>
<p>To know more, see the deprecation FAQ: <a target="_blank" href="https://kubernetes.io/blog/2020/12/02/dockershim-faq/">https://kubernetes.io/blog/2020/12/02/dockershim-faq/</a></p>
<p>Thank you,</p>
<p><em>Happy containerd! :p
</em></p>
<p>Reference:
https://developer.ibm.com/blogs/kube-cri-overview/
https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/
https://kubernetes.io/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim/</p>
]]></content:encoded></item><item><title><![CDATA[AWS - Karpenter - Kubernetes cluster auto-scaler]]></title><description><![CDATA[AWS recently announced  Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler - ReInvent-2021 
Before getting into the discussion of Karpenter, let's discuss k8s native cluster auto-scaler.
According to the documentation below is ...]]></description><link>https://opsinsights.dev/aws-karpenter-kubernetes-cluster-auto-scaler</link><guid isPermaLink="true">https://opsinsights.dev/aws-karpenter-kubernetes-cluster-auto-scaler</guid><category><![CDATA[AWS]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Cloud Computing]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Wed, 08 Dec 2021 05:09:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1638940317525/8BOgMz6ms.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS recently announced  <a target="_blank" href="https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler/">Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler - ReInvent-2021</a> </p>
<p>Before getting into the discussion of Karpenter, let's discuss k8s native cluster auto-scaler.</p>
<p>According to the documentation below is the definition:</p>
<blockquote>
<p>Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:</p>
</blockquote>
<ul>
<li>there are pods that failed to run in the cluster due to insufficient resources.</li>
<li>there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.</li>
</ul>
<p>And if we want to adapt the CA to various use cases, we have to create a separate node group for each of them, as explained below.</p>
<h2 id="heading-traditional-way">Traditional way:</h2>
<p>Create autoscaling groups (ASGs): a different node group for each type of need.</p>
<p>Example:</p>
<ul>
<li>an ASG node group for GPU workloads</li>
<li>an ASG node group for general use, split by instance family, etc.</li>
</ul>
<p>This adds maintenance overhead and operational cost. :( </p>
<p>Why do we need a cloud-native cluster autoscaler?</p>
<p>A cloud-native CA (cluster autoscaler) exploits the full capability of the cloud provider's own tools and technologies, which helps use resources effectively and efficiently in all aspects.</p>
<p>This sets up a debate: Kubernetes-native vs AWS-native. </p>
<h2 id="heading-karpenter">Karpenter:</h2>
<p>Karpenter solves this problem by making effective provisioning and scheduling decisions:</p>
<blockquote>
<p>What does the pod need? Where is the pod best fit-in? What can I do to best fit that pod in a node?</p>
</blockquote>
<p>It provides complete control over EC2 instances.</p>
<pre><code>cat &lt;&lt;EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Requirements that constrain the parameters of provisioned nodes.
  # Operators { In, NotIn } are supported to enable including or excluding values
  requirements:
    - key: node.kubernetes.io/instance-type # If not included, all instance types are considered
      operator: In
      values: ["m5.large", "m5.2xlarge"]
    - key: "topology.kubernetes.io/zone" # If not included, all zones are considered
      operator: In
      values: ["us-east-1a", "us-east-1b"]
    - key: "kubernetes.io/arch" # If not included, all architectures are considered
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot", "on-demand"]
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-eks-karpenter-demo
  ttlSecondsAfterEmpty: 30
EOF
</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1638940018225/_jLyiJGRJ.png" alt="karpenter-overview.png" /></p>
<p>From the provisioner config above, we can see that we can control instance type, availability zone, architecture, and capacity type (on-demand vs spot).</p>
<h2 id="heading-highlights">Highlights:</h2>
<ul>
<li><p>Faster Scheduling - Karpenter directly manages the nodes.</p>
</li>
<li><p>No node group provisioning - Karpenter provisions instances directly, without node groups, and schedules pods onto those instances based on the configuration.</p>
</li>
<li><p>More cost-effective than the Kubernetes-native cluster autoscaler.</p>
</li>
</ul>
<blockquote>
<p>Karpenter currently supports only AWS, but anyone can contribute support for other cloud providers.</p>
</blockquote>
<p><em>A wonderful, self-explanatory video from Justin Garrison about Karpenter:</em>
 <a target="_blank" href="https://www.youtube.com/watch?v=3QsVRHVdOnM&amp;ab_channel=JustinGarrison">https://www.youtube.com/watch?v=3QsVRHVdOnM&amp;ab_channel=JustinGarrison</a></p>
<p>--- End-of-Blog ---</p>
<p>Reference:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md</p>
]]></content:encoded></item><item><title><![CDATA[My take on Elasticsearch as Primary Database?]]></title><description><![CDATA[ElasticSearch:
Elasticsearch(ES) is a distributed, full-text document store, search engine which is based on Apache Lucene as a core library. ES was originally designed for rich text-based search with advanced features to support complex queries, fil...]]></description><link>https://opsinsights.dev/elasticsearch-as-primary-database</link><guid isPermaLink="true">https://opsinsights.dev/elasticsearch-as-primary-database</guid><category><![CDATA[elasticsearch]]></category><category><![CDATA[database]]></category><dc:creator><![CDATA[Jothimani Radhakrishnan]]></dc:creator><pubDate>Sun, 21 Nov 2021 11:08:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1637479887631/XWFMDZrkL.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-elasticsearch">ElasticSearch:</h1>
<p>Elasticsearch (ES) is a distributed, full-text document store and search engine built on Apache Lucene as its core library. ES was originally designed for rich text-based search, with advanced features supporting complex queries, filters, and analyzers across multiple languages. ES indexes and stores data so that it can be retrieved and searched in near real time. </p>
<p><em>Distributed: ES stores data across multiple nodes; data can be retrieved from any node at any time.</em></p>
<p><strong>Highlights of the Lucene core:</strong></p>
<ul>
<li>It can return search responses quickly.</li>
<li>Based on the inverted index data structure, which maps content (such as words or numbers) to its locations in documents.</li>
</ul>
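<p>To make the inverted index concrete, here is a toy sketch in Go (an illustration only, not how Lucene actually implements it): each term maps to the list of document IDs that contain it, so a term lookup is a single map access.</p>
<pre><code>package main

import (
    "fmt"
    "strings"
)

// buildInvertedIndex maps each lowercased term to the IDs of the
// documents that contain it.
func buildInvertedIndex(docs map[int]string) map[string][]int {
    inv := make(map[string][]int)
    for id, text := range docs {
        seen := make(map[string]bool)
        for _, term := range strings.Fields(strings.ToLower(text)) {
            if !seen[term] {
                inv[term] = append(inv[term], id)
                seen[term] = true
            }
        }
    }
    return inv
}

func main() {
    docs := map[int]string{
        1: "Elasticsearch is a distributed search engine",
        2: "Lucene is the core library",
    }
    inv := buildInvertedIndex(docs)
    fmt.Println(len(inv["is"]), "documents contain the term 'is'")
}
</code></pre>
<p>Real inverted indexes also store term positions and frequencies, which is what enables phrase queries and relevance scoring.</p>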
<h2 id="heading-possibilities-on-using-es-as-primary-database">Possibilities on using ES as primary database:</h2>
<p>It depends on a lot of factors, such as: </p>
<blockquote>
<p>(One ongoing WIP from the elastic team is that they are constantly working towards improving the resiliency)</p>
</blockquote>
<ul>
<li>size of each document</li>
<li>number of concurrent read requests per second</li>
<li>writes per second</li>
</ul>
<p><code>ES always does best on read requests; as the name suggests, it is an indexed data store.</code></p>
<p>If we need to increase write throughput (aka the indexing rate), it can be tuned in several ways; <code>refresh_interval</code> and <code>flush_threshold_size</code> are some of the key parameters to consider when improving writes.</p>
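<p>As a minimal sketch of such tuning (the host, index name, and interval below are placeholder assumptions, not values from this post), raising <code>refresh_interval</code> is a plain settings call against the index's <code>_settings</code> endpoint:</p>
<pre><code>package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// buildRefreshRequest prepares a PUT request that raises an index's
// refresh_interval, trading search freshness for indexing throughput.
func buildRefreshRequest(baseURL, index, interval string) (*http.Request, error) {
    body := fmt.Sprintf(`{"index": {"refresh_interval": %q}}`, interval)
    req, err := http.NewRequest(http.MethodPut,
        fmt.Sprintf("%s/%s/_settings", baseURL, index),
        bytes.NewBufferString(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    return req, nil
}

func main() {
    // "app-logs" and localhost:9200 are placeholder values.
    req, err := buildRefreshRequest("http://localhost:9200", "app-logs", "30s")
    if err != nil {
        panic(err)
    }
    fmt.Println(req.Method, req.URL.String())
}
</code></pre>
<p>The request is only built and printed here; sending it with <code>http.DefaultClient.Do(req)</code> against a live cluster would apply the change.</p>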
<h2 id="heading-not-impossible-however-is-it-advisable-to-go-my-take">Not impossible, however, is it advisable to go? My take;</h2>
<p>Below are the pointers out of my experience in using ES as a database in production.</p>
<ul>
<li><p>It is not preferred to use ES as the primary database when writes per second (WPS) dominate reads per second (RPS).</p>
</li>
<li><p>ES is very helpful when a primary use case is reading and visualizing data with complex query combinations.</p>
</li>
<li><p>It works best for structured data sets with few nested items.</p>
</li>
</ul>
<p>To mitigate the write concerns in Elasticsearch, add another layer in front of ES; a Redis or Kafka buffer can queue the incoming data and avoid data loss.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1637481708640/k7CBlb6j3.png" alt="blog-image.png" /></p>
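<p>The buffering idea can be sketched with a Go channel standing in for the Redis/Kafka layer (a toy model under assumed sizes, not a production queue): producers enqueue documents, and a drain step groups them into batches the way a bulk indexer would before calling the ES <code>_bulk</code> API.</p>
<pre><code>package main

import (
    "fmt"
    "sync"
)

// drainBatches empties the buffer into batches of up to batchSize,
// the way a bulk indexer would before issuing _bulk requests to ES.
func drainBatches(buffer chan string, batchSize int) [][]string {
    var batches [][]string
    batch := make([]string, 0, batchSize)
    for doc := range buffer {
        batch = append(batch, doc)
        if len(batch) == batchSize {
            batches = append(batches, batch)
            batch = make([]string, 0, batchSize)
        }
    }
    if len(batch) > 0 {
        batches = append(batches, batch)
    }
    return batches
}

func main() {
    buffer := make(chan string, 100) // stands in for the Redis/Kafka layer

    var wg sync.WaitGroup
    for p := 0; p < 3; p++ { // three producers, e.g. frontend applications
        wg.Add(1)
        go func(p int) {
            defer wg.Done()
            for i := 0; i < 4; i++ {
                buffer <- fmt.Sprintf("doc-%d-%d", p, i)
            }
        }(p)
    }
    wg.Wait()
    close(buffer)

    batches := drainBatches(buffer, 5)
    fmt.Printf("buffered 12 documents into %d batches\n", len(batches))
}
</code></pre>
<p>With a real broker, the channel becomes a topic or list, and the drain loop becomes a consumer that retries failed bulk requests instead of losing data.</p>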
<p>The main question: is it worth adding more complexity to the architecture? It brings maintenance work and other ad-hoc items related to the new services. </p>
<p>Use cases that employ this architecture include: </p>
<ul>
<li>Real-time log storage systems</li>
<li>Data-capture systems with continuous ingestion of data from frontend applications</li>
</ul>
<p>And if your primary goal is to store JSON data with the best performance in both writes and reads (considering all the points above in this section), go ahead with a mainstream NoSQL database like MongoDB.</p>
<p><em>Invention and discovery have always been part of human nature.</em></p>
<blockquote>
<p>I always love the idea of extending the capabilities of any such machine, with a pinch of salt and pepper, to adapt it to the complex use cases of our daily work. However, this should happen without defeating the nature and purpose for which the tool was built, since there is often a straightforward alternative for any such use case.</p>
</blockquote>
<p><em>EOB (End-of-Blog)
</em></p>
<p>Reference:</p>
<p>https://medium.com/@merrinkurian/elasticsearch-as-the-primary-database-5e41b2a0189d
https://cloud.netapp.com/blog/cvo-blg-elasticsearch-vs-mongodb-6-key-differences
https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-indexing-performance/</p>
]]></content:encoded></item></channel></rss>