> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
It depends on where the source of dynamism is coming from, but yes you can do this in Terraform. You get the instances with data.aws_instances, feed it into aws_route53_record with a for_each, and you're done. Maybe you need to play around with putting them into different modules because of issues with dynamic state identifiers, but it's not remotely the most complicated Terraform I've come across.
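A rough sketch of that shape (illustrative only; it assumes the instances are tagged Role=worker and that a data.aws_route53_zone.main is already defined elsewhere):

    data "aws_instances" "workers" {
      instance_tags = {
        Role = "worker"
      }
    }

    locals {
      # ids and private_ips are returned in corresponding order
      instance_ips = zipmap(data.aws_instances.workers.ids, data.aws_instances.workers.private_ips)
    }

    resource "aws_route53_record" "per_instance" {
      for_each = local.instance_ips
      zone_id  = data.aws_route53_zone.main.zone_id
      name     = each.key # one record per instance ID
      type     = "A"
      ttl      = 300
      records  = [each.value]
    }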
That's a separate question from whether or not it's a good idea. Terraform is a one-shot CLI tool, not a daemon, and it doesn't provide auto-reconciliation on its own (albeit there are daemons like Terraform Enterprise / TerraKube that will run Terraform on a schedule for you and thus provide auto-reconciliation). Stuff like DNS records for Kubernetes ingress is much better handled by external-dns, which itself is statically present in a Kubernetes cluster and therefore might be more properly installed with Terraform.
K8S is at a point now where I'd probably try to configure whatever I can inside the cluster as an operator or controller.
There are going to be situations where that isn't practical, but the ability to describe all the pieces of your infra as a CRD is quite nice and it takes some pain out of having things split between terraform/pulumi/cdk and yaml.
At that point, you're just running your own little cloud instead of piggybacking on someone else's. Just need a dry-run pipeline so you can review changes before applying them to the cluster.
Sure, but the Kubernetes cluster itself, plus its foundational extra controllers (e.g. FluxCD) are basically static and therefore should be configured in Terraform.
That’s only true if you go with an architecture that involves doing so in terraform. A common pattern I implement is an initial management cluster bootstrap that runs Argo; after that it’s possible to manage everything, including cluster components of “child” clusters, using Argo. You can use either the Cluster API provider or Crossplane for that, or one of the cloud-specific ones like ACK.
One single imperative helm install command to start the whole train rolling then after that it’s all IaC
This is similar to what I do. Terraform for anything that can't be in K8s. Create EKS cluster and bootstrap it with Argo. Then everything else is blissfully not in Terraform.
https://registry.terraform.io/providers/hashicorp/random/lat... is also very useful for this sort of thing, in case you want a persistent random value per resource- shuffle, id, pet, and password are all super handy.
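For instance, a tiny sketch (names are made up) that gives each of a set of services a stable random suffix:

    resource "random_pet" "suffix" {
      for_each = toset(["api", "worker", "cache"])
      length   = 2
    }

    # e.g. bucket names like "myco-api-wanted-mole"; the pet name stays stable
    # across applies until the resource is tainted or destroyed
    resource "aws_s3_bucket" "per_service" {
      for_each = random_pet.suffix
      bucket   = "myco-${each.key}-${each.value.id}"
    }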
I think a majority of the rants about Terraform I read are written from the perspective of someone managing inherently ephemeral infrastructure - things that are easily disposed of and reprovisioned quickly. The author of such a critique is likely managing an application stack on top of an account that someone else has provided them, a platform team maybe. CDK probably works for you in this case.
Now, if you belong to that platform team and have to manage the state of tens of thousands of "pet" resources that you can't just nuke and recreate using the CDK (because some other team depends on their availability), then Terraform is the best thing since sliced bread; it manages state, drift, and the declarative nature of the DSL is desirable.
Horses for courses.
> Horses for courses.
I think, along with YMMV, these are the two most important things to keep in mind. With the plethora of technologies and similar tools out there, we generally read the tin superficially, skip the manual, and declare "This is bollocks!".
Every tool is targeted towards a specific use and thrives in specific scenarios. Calling a tool bad at something it wasn't designed for is akin to getting angry at your mug because it doesn't work as well upside down [0].
For me Terraform's biggest strength is also its biggest source of pain: it can integrate all sorts of technologies under one relatively vendor-agnostic umbrella and enforce a standard workflow across a huge amount of change. However, that means any bug in any provider is sort of Terraform's fault, if only in the developer's mind.
I think I've commented this elsewhere, but using Cue [1] is also great for this purpose, with no extra infrastructure. E.g. you define a Cue Template [2], which seems analogous to Yoke/ATC's CRDs, and then your definitions just include the data.
Here's an example of Vaultwarden running on my K8s cluster:
deployment: bitwarden: {
spec: {
template: {
spec: {
containers: [{
image: "vaultwarden/server:1.32.7"
env: [{
name: "ROCKET_PORT"
value: "8080"
}, {
name: "ADMIN_TOKEN"
valueFrom: secretKeyRef: {
name: "bitwarden-secrets"
key: "ADMIN_TOKEN"
}
}]
volumeMounts: [{
name: "data"
mountPath: "/data"
subPath: "bitwarden"
}]
ports: [{
containerPort: 8080
name: "web"
}]
}]
volumes: [{
name: "data"
persistentVolumeClaim: claimName: "local-pvc"
}]
}
}
}
}
And simpler services are, well, even simpler:

deployment: myapp: spec: template: spec: containers: [{
ports: [{
containerPort: 8080
name: "web"
}]
}]
And with Cue, you get strongly typed values for everything, and can add tighter constraints as well. This expands to the relevant YAML resources (Services, Deployments, etc), which then get applied to the cluster. The nice thing of this approach is that the cluster doesn't need to know anything about how you manage your resources.

Hill I will die on: Terraform being less expressive than a real language is a feature, not a drawback.
CDK/Pulumi/Yoke is optimised for being easy to write, but code should be optimised to be easy to READ.
Sure, cdk/pulumi/yoke lets you write the most clever and succinct construction you can compose in your favourite language... however, whoever comes across your clever code next will probably want to hit you, especially if it's not a dev from your immediate team, and especially if you have succumbed to blurring the lines between your IaC code and your app code.
If they instead come across some bog-standard terraform that maybe has a bunch of copy-paste and is a bit more verbose... Who cares? Its function will be obvious, there is no mental overhead needed.
On the flipside Helm templating is an absolute abomination and i would probably take anything over needing to immerse myself in that filth, maybe Yoke is worth a look after all. But the REAL answer is a real config language, still.
> code should be optimised to be easy to READ
You say that as if it’s impossible to write clear code. As soon as you have any form of multiple resources (e.g. create x of y) I’ll take the real programming language over terraform.
> As soon as you have any form of multiple resources
terraform handles this with for_each. need 10 EBS volumes on 10 EC2 instances? for_each and link instance id of the each value. done. theres a bunch of stuff i now don’t have to worry about (does the instance actually exist yet? other validation edge cases?)
https://developer.hashicorp.com/terraform/language/meta-argu...
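roughly (illustrative sketch, var.ami_id assumed):

    resource "aws_instance" "app" {
      for_each      = toset(["a", "b", "c"])
      ami           = var.ami_id
      instance_type = "t3.micro"
    }

    resource "aws_ebs_volume" "data" {
      for_each          = aws_instance.app
      availability_zone = each.value.availability_zone
      size              = 100
    }

    # terraform's dependency graph guarantees the instance and volume
    # exist before the attachment is attempted
    resource "aws_volume_attachment" "data" {
      for_each    = aws_instance.app
      device_name = "/dev/sdf"
      volume_id   = aws_ebs_volume.data[each.key].id
      instance_id = each.value.id
    }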
> You say that as if it’s impossible to write clear code.
not the parent, but i feel their usage of the word “code” was in error. i don’t care about how, i care about what.
the HCL is purely a definition/description of what the infrastructure looks like. what resources will be created? that is all it is. i want that. to define the infra and move on. i don’t want low level control of every minutia to do with infrastructure. i want to read a config file and just know what resources will exist in the account. wanna know every resource that exists? `terraform state list` … job done. no reading code required.
HCL/terraform is to define my cloud resources, not to control them or their creation. if i want control, then i need to whip out some go/python.
that’s my vibe on CDK libraries/platform APIs versus terraform.
You can understand every single terraform codebase using nothing other than the terraform documentation itself. All abstractions are provided by the language itself.
Clear isn't really the word I would call it, more that the real work being done is exposed and always visible.
As the Go proverb goes: "clear is better than clever". https://go-proverbs.github.io/
> whoever comes across your clever code next will probably want to hit you, especially if it's not a dev from your immediate team, and especially if you have succumbed to blurring the lines between your IaC code and your app code.
If you want to maximize the number of people who have a chance of understanding what is happening, python is your huckleberry. They are going to want to hit the guy who wrote everything in a bizarre language called HCL that nobody outside of infra has ever seen or heard of.
> If they instead come across some bog-standard terraform that maybe has a bunch of copy-paste and is a bit more verbose... Who cares? Its function will be obvious, there is no mental overhead needed.
"bog standard" is doing a lot of heavy lifting here. You can write simple python or esoteric python and you can write simple terraform or esoteric terraform.
> Yoke is a project that takes this basic idea to the next level. With Yoke, you write your infrastructure definitions in Go or Rust, compile it to WebAssembly, and then you take input and output Kubernetes manifests that get applied to the cluster.
This just puts me in mind of https://howfuckedismydatabase.com/nosql/
I ditched Terraform years ago and just interact with the raw cloud provider SDKs now. It's much easier to long-term evolve actual code and deal with weird edge cases that come up when you're not beholden to the straitjacket that is configuration masquerading as code.
Oh yea, and we can write tests for all that provisioning logic too.
I went through the same evolution, even built a PaaS for AWS, but I kept going and now just deploy my own stuff to VMs with Swarm via one command in Rove. It's great. And yes I know kubernetes I use it at work. It's an unnecessary waste of time.
> Swarm
docker swarm is so simple and easy compared to the utter behemoth that is k8s, and basically is all you need for CRUD webapps 80-90% of the time. add an RDS instance and you’re set.
i will always pick swarm in a small company* whenever possible until k8s or ECS makes sense because something has changed and it’s needed.
dont start with complexity.
* - bigger companies have different needs.
People have really been sleeping on Swarm. I sometimes even see people trying to recreate Swarm features with Compose. Wish more devs knew about it.
How are you handling creating multiple resources in parallel? or rolling back changes after an unsuccessful run?
Not OP, but for rolling back we just… revert the change to the setup_k8s_stuff.py script !
In practice it’s a module that integrates with quite a large number of things in the monolith because that’s one of the advantages of Infrastructure as Actual Code: symbols and enums and functions that have meaningful semantics in your business logic are frequently useful in your infrastructure logic too. The Apples API runs on the Apples tier, the Oranges API runs on the Oranges tier, etc. etc.
People call me old fashioned (“it’s not the 1990s any more”) but when I deploy something it’s a brand new set of instances to which traffic gets migrated. We don’t modify in place with anything clever and I imagine reverting changes in a mutable environment is indeed quite hard to get right (and what you are hinting at?)
> I imagine reverting changes in a mutable environment is indeed quite hard to get right (and what you are hinting at?)
I guess you're not managing any databases then? Because you can't just treat those immutably, you have to manage the database in-place.
One thing that annoys me is the inconsistency between mutable "data" resources and everything else.
Something that would be nice would be the rough equivalent of the deployment slots used in Azure App Service, but for everything else too. So you could provision a "whole new resource" and then atomically switch traffic over to it.
You can express this in Terraform, it's just a little more contrived. You release your changes as Terraform modules (a module in and of itself doesn't do anything, it's like a library/package), then your Terraform workspace instantiates both a "blue" module and a "green" module, at different versions, with DNS / load balancing resources depending on both modules and switching between either blue or green.
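A minimal sketch of that pattern (the registry source and the lb_dns_name output are made-up placeholders):

    module "blue" {
      source  = "app.terraform.io/acme/app/aws" # placeholder module source
      version = "1.4.0"
    }

    module "green" {
      source  = "app.terraform.io/acme/app/aws"
      version = "1.5.0"
    }

    # flip var.active between "blue" and "green" to switch traffic
    resource "aws_route53_record" "app" {
      zone_id = var.zone_id
      name    = "app.example.com"
      type    = "CNAME"
      ttl     = 60
      records = [var.active == "blue" ? module.blue.lb_dns_name : module.green.lb_dns_name]
    }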
> revert the change to the setup_k8s_stuff.py script
What about resources that were created by the code you reverted?
I’ve been thinking about this for a long time. But doesn’t it bring a host of other issues? For example, I need to update instance RAM from 4 to 8 GB, but how do I know if the instance exists or should be created? I need to make a small change, how do I know what parts of my scripts to run?
> For example, I need to update instance RAM from 4 to 8 Gb but how do I know if the instance exists or should be created?
let front_id = if instance_exists("front_balancer") {
    fetch_instance("front_balancer").id
} else {
    create_new_instance("front_balancer", front_balancer_opts).id
}
Or however else you would manage that sort of thing in your favorite programming language.

> I need to make a small change, how do I know what parts of my scripts to run?
Either just re-run the parts you know you've changed (manually or based on git diffs), or even better, make the entire thing idempotent and you won't have to care; re-run the entire program after each change and it'll automagically work.
> Either just re-run the parts you know you've changed (manually or based on git diffs)
This is exactly the sort of thing Terraform is designed to avoid because it can obviously get quite messy. Agreed that making things idempotent solves that problem, but it's not always obvious/easy how to do so.
You write code to do these things? If there's a requirement for you to be able to do such a thing make it a feature, implement it with tests and voila, no different than any other feature or bug you work on is it?
Here are the things that TF does that you are probably not going to get around to in a comprehensive way-
- State tracking, especially all of the tedious per cloud resource details
- Parallelism- TF defaults to 10 threads at a time. You won't notice this when you write a demo to deploy one thing, but it really matters as you accrete more things.
- Dependency tracking- hand in hand with the parallelism, but this is what makes it possible. It is tedious, resource by resource blood sweat and tears stuff, and enabled by the inexpressive nature of HCL
Plus, you know, all of the work that has already been done by other people to wrap a million quirky APIs in a uniform way.
Terraform added tests somewhat recently: https://developer.hashicorp.com/terraform/language/tests
I agree that the SDK is better for many use cases. I do like terraform for static resources like aws vpc, networking, s3 buckets, etc.
And eventually, you end up with your own in-house Terraform.
I'm quite happy with CDK[0].
My experience is only with the main, CloudFormation-based version of CDK for AWS. There is also CDK for Terraform, which supports any resource that Terraform supports, although some of what I'm about to say is not applicable to that version.
What I like about CDK, is that you can write real code, and it supports a wide range of languages, although typescript is the best experience.
Provided that you don't use any of the `fromLookup` type functions, you can run and test the code without needing any actual credentials to your cloud provider.
CDK essentially compiles your code into a cloudformation template; you can run the build without credentials, then deploy the built cloudformation template separately.
You don't need to worry about your terraform server crashing halfway through a deployment, because cloudformation runs the actual deployment.
My main problem with CDK is that it only outputs a CloudFormation stack. I can sign up for a new cloud account, spin up a k8s cluster, deploy everything to it, and restore the database snapshot faster than CF will finish a job that's stuck on UPDATE_CLEANUP_IN_PROGRESS.
Of course there's also cdk8s, but I'll probably go with Pulumi instead if I need that. Right now I'm happy with helmfile, though not so much with helm itself. So I'll definitely be giving Yoke a look.
In your experience how often have you had template builds succeed but then fail at apply time? This kind of issue is what I find most frustrating about IaC today, your 'code' 'compiling' means nothing because all of the validations are serverside, and sometimes you won't find out something's wrong until Terraform is already half done applying. I want to be able to declare my infrastructure, be able to fully validate it offline, and have it work first try when I apply it.
I find Pulumi very nice here because it persists state after every successful resource creation. If it breaks somewhere in the middle, the next run will just pick up where it left off last time.
CDK… well, CDK doesn’t get in an invalid state often either, but that’s because it spends 30m rolling back every time something goes wrong.
I've had less such issues with CDK, versus raw cloudformation, or terraform, but it can still happen.
Slightly related, in everything I see that allows adding secrets or env vars as code, they seem to prefer a list of objects instead of a key value pair for these. Does anyone know why this is? I know in some cases you can add additional values, but this seems easily solved by dynamically determining what the value is.
I'd much rather write:

    env:
      key: value

instead of:

    env:
    - name: key
      value: value
Who are these ops people that want to write golang and rust? It seems like a tiny niche. If you're that comfortable writing golang or rust then why not just become a developer?
I'm a lifelong ops person, since 2000, and I use Ansible or Terraform daily.
I often wanted to learn golang better but I just never had the motivation. I'm a mean Python scripter, I can write a system integration in hours, but there's something about compiled languages I just never could get into.
I'm saying this only because the whole point of yoke is to define your infrastructure in Golang, so that you can add in the missing pieces with Golang and are free to use Golang for anything other than the pre-defined infrastructure providers in Yoke. So you're now a Golang developer. Congrats.
A big benefit is the compiler catches bugs so you don’t have to wait around for your Python program to crash at runtime. Also, if the type system is more “legit” then you can skip a ton of defensive parsing of inputs.
Could be more about developers who know Golang and Rust wanting to deploy their apps (no need to pigeonhole anyone into just dev or just ops)
I don't really see a distinction between developer and ops person in this context. The whole point of all of these tools is to make infrastructure into code. Go isn't the choice I would have made but it's fine.
A lot of important devops tools like Kubernetes and Grafana are written in golang, and it’s often handy to be able to import their code to use in your own code to automate those things.
But again, you're now a developer.
And I'm asking who are these developers using IaC tooling? It seems to me like it was made for ops.
All power to you if you take on both roles, but that's a good way to get burned out. I'm a devops person so the devs can focus on just code, and I can focus on making the best and safest infrastructure for them to run their code in.
I feel like the distinction between the two is fairly contrived these days. I'm an SRE, and we're constantly building tooling to help us better manage infrastructure, improve reliability, improve DX, etc. On the flip side, we also push a lot of the responsibility for infrastructure management to our devs: we maintain the modules and IaC pipelines, and the developers hook up the building blocks they need. It can actually help avoid burnout because our team doesn't become a bottleneck for infrastructure provisioning.
Say what you want about IaC in Go or other programming languages, but it can definitely help strengthen the whole "developers own their infrastructure" idea, since they don't have to learn an additional language syntax.
Those developers are working on “Internal Development Platforms” and building their own abstractions on top of tools like Kubernetes and Grafana to simplify things for developers. This page explains it pretty well: https://internaldeveloperplatform.org/what-is-an-internal-de...
I've written a bunch of k8s operators in go (and rust more recently). That's what basically everyone working with k8s does once you reach a certain level of complexity.
I don't really understand, in fact, why you'd use yoke instead of just writing an operator with kubebuilder or kube-rs.
Supposedly it's a package manager as well, so if there's a package, you'd be able to use it without writing any code.
I'm very much onboard with the 'hey, code is useful' idea. Code is most useful when you want to build abstractions around infrastructure.
You can do declarative code most of the time but bust out function calls and control flow and even more heavy weight abstractions when needed. And you get all that nice typechecking that speeds things up.
This is basically why I joined Pulumi and why I joined Earthly before that. ( CI needs to move beyond YAML as well).
Where I disagree is that Pulumi ( or CDK or CDKTF) running the language runtime of the language you've decided to use is a problem.
Looks promising but it starts with a (justified) rant about terraform and then goes into how to replace Helm.
I am confused. Can yoke be used to create and manage infrastructure or just k8s resources?
Indeed. This isn't really a replacement for terraform, unless you are only using terraform to manage k8s resources. Which probably isn't most people who are currently using Terraform.
Author here. It's mainly for k8s resources; but if you install operators like external-dns or something like crossplane into your cluster, you can manage infra too.
I've considered dropping terraform (openTofu) for our k8s resources since k8s is stateful anyway.
But that would complicate synchronization with resources outside of k8s, like tailscale, DNS, managed databases, cloud storage (S3 compatible) - and even mapping k8s ingress to load_balancer and external DNS.
So far I feel that everything in terraform is the most simple and reasonable solution - mostly because everything can be handled by a single tool and language.
> into your cluster
I guess the point is: what if you don't have a cluster.
And also: what manages the Kubernetes cluster lifecycle in the cloud provider, or on bare metal?
There is life before (and beyond) Kubernetes.
What alternative to terraform would one use to set up the whole cluster before provisioning any resources?
I currently have a custom script that is a mix between terraform and ansible that sets up a proxmox cluster, then a k3s cluster and a few haproxys with keepalived on top. Granted, maybe not the most standard setup.
Do you have a complex Ansible setup? For the few bespoke VMs I need, I've been able to get away with cloud init so far - but they're explicitly set up to be reasonable to nuke and recreate - if they had more personality and needed to be more managed as pets - I would probably need to reach for something like Ansible - or see if I could build images (vm or Docker).
But then with images I'm on the hook for patching... Not simply auto-patching via apt...
ok, that makes sense. A better Helm would be nice. timoni.sh is getting better and better, but Cue is a big hurdle.
Unfortunately, I'm not a big fan of the yaml-hell that crossplane is either.
But as a Terraform replacement systeminit.com is still the strongest looking contender.
> A better Helm would be nice.
Consider CDK8s (Typescript or Go) or Jsonnet. We evaluated Cue and the two aforementioned options and ended up with CDK8s using Typescript and it's incredibly powerful.
Hm... CDK8s just helps herding k8s yaml, nothing else?
There's nothing like terraform plan/apply?
I mean - some help wrangling yaml is welcome - but I already get (some) help from terraform with the k8s provider there...
Do you check in the generated yaml in git, or just the typescript code?
It’s just a dunk on terraform to promote yet another K8s provisioning thing.
> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
It's possible a few ways. I prefer modules, and this LLM answer describes an older way with count and for_each.
It's always possible that this incantation of the problem space has a gotcha that needs a workaround, but I doubt it would be a blocker.
https://www.perplexity.ai/search/if-you-really-do-think-that...
>> Wait, there's something here that I'm not getting. Why are you compiling the code to WebAssembly instead of just running it directly on the server?
> Well, everything's a tradeoff. Let's imagine a world where you run the code on the server directly.
> If you're using a language like Python, you need to have the Python runtime and any dependencies installed. This means you have to incur the famous wrath of pip (pip hell is a real place and you will go there without notice). If you're using a language like Go, you need to have either the Go compiler toolchain installed or prebuild binaries for every permutation of CPU architecture and OS that you want to run your infrastructure on. This doesn't scale well.
> One of the main advantages of using WebAssembly here is that you can compile your code once and then run it anywhere that has a WebAssembly runtime, such as with the yoke CLI or with Air Traffic Controller.
At this point, why not use a proper runtime like JVM or .Net?
Then one can also easily use reasonable languages like C#, Java or Kotlin as well.
> At this point, why not use a proper runtime like JVM or .Net?
Because then you are forced to only use managed languages?
Ahh, good point.
I guess Rust (and maybe other unmanaged languages) can be compiled to WebAssembly?
https://logandark.net/calc is C++ compiled to WebAssembly using Emscripten. Back from I think 2018.
These days Rust is practically the poster child of compiling to WebAssembly because it's so easy. Most WASM content I see is actually about Rust.
> This is not code. This is configuration.
I don't think those two things are mutually exclusive.
IMO hcl is absolutely code. As is html, and css, json, and yaml.
It isn't a full programming language, and I often wish it was, but I wouldn't say it isn't code.
JSON and YAML are file formats for data. Is XML code? Is SVG code? Is a GIF code? Is a BMP code?
> If you're using a language like Go, you need to have either the Go compiler toolchain installed or prebuild binaries for every permutation of CPU architecture and OS that you want to run your infrastructure on. This doesn't scale well.
This is exactly the approach that Terraform takes. Both Terraform and its providers are written in Go, which is a great language for this purpose because of GoReleaser and the ease of compiling to different architectures and OSes. It scales just fine.
Did the author talk to any senior Terraform practitioners before building this?
Hi. I think the article was just using that as an example of how IaC tools use configuration languages instead of code. Yoke is not a terraform replacement, and does not mention terraform anywhere in its documentation.
It does sit at the same level as helm & timoni. It just takes a code-based approach to managing your cluster (which in turn can manage your infra but that wasn't the larger point).
Speaking of IAC- I have an existing GCP project with some basic infra (service accounts, cloud run jobs, cloud build scripts, and databases) what is the best tool to _import_ all of this into IAC. The only real tool I’ve found is terraformer. I have no dog in the race regarding tooling e.g if my output is Pulumi, terraform, or just straight YAML. I’m just looking to “codify” it.
Any suggestions from experience?
Just go with plain Terraform.
You can check the docs for the GCP provider to see if the resources you want to manage are "importable" into the Terraform state file; they usually are and you'll see a section at the bottom of each resources documentation page showing you how to do this. e.g. https://registry.terraform.io/providers/hashicorp/google/lat...
Your process will be -
1. Write TF configuration approximating what you think is deployed
2. Import all your resources into the state file
3. Run a `terraform plan ...` to show what Terraform wants to change about your resources (including creating any you missed or changing/recreating any your config doesn't match)
4. Correct your TF configuration to reflect the differences from 3.
5. Goto 3; repeat until you get a "No changes" plan, or until the only remaining changes are ones you actually want TF to make (add tags, for example)
6. run `terraform apply`
and optionally...
7. set up your CI/automation to run `terraform plan` regularly and report the "drift" via some means - stuff that has been changed about your resources outside of Terraform management.
I put a lot of stock in this last step, because small, incremental change is the cornerstone of platform management. If you want to make a change and come to find there's a huge amount of other stuff you have to correct as well, your change isn't small any more.
You don't need to write all the tf upfront for existing resources.
Use `import` resources in a .tf file (I like to just call it imports.tf) and run `terraform plan -generate-config-out=imported.tf`
That will dump the tf resources - often requires a little adjustment to the generated script, but it's a huge time saver
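For example (a sketch; the resource address and ID are placeholders, and the exact ID format for each resource type is listed on its provider docs page):

    # imports.tf
    import {
      to = google_storage_bucket.assets
      id = "my-project/my-assets-bucket"
    }

    # then: terraform plan -generate-config-out=imported.tf
    # writes a google_storage_bucket "assets" block you can tidy up and commit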
I used this instead of terraformer. Can agree that it’s a huge timesaver.
This feels like a Helm replacement and not a Terraform replacement in any real way. Which is fine but confusing when the post begins with Terraform.
Terraform can create a K8S cluster. This requires a K8S cluster to work. Etc.
This seems like a great approach that sits between using the sdk directly and a dsl/yaml. My experience has been that most of the people configuring these systems don’t know how to code, and configuration languages are their gateway. Most never venture past configuration, which is why yaml is so widely used and it's difficult to get any traction outside of it. I think terraform adopted some of the patterns that have been around for a long time (remember the chef vs puppet discussion from a decade ago) and it massively helped with adoption. Cue seems a step up from terraform (you can use cue vet for type checking, even if CRDs are not yet supported all the way) but tracking seems to be low as it’s hard for non-programmers to grasp. Maybe Claude will help move all the people that don’t want to manage these systems with code to something even simpler than yaml and open the door for real infra as code for the rest.
> My experience has been that most of the people configuring these systems don’t know how to code, and configuration languages is their gateway
I don't really disagree but this is such a pessimistic, NIH-syndrome viewpoint. Feel free to look at the code for any of the major Terraform providers. There's a lot of production-hardened, battle-tested Go code that's dealing with the idiosyncrasies of the different cloud APIs. They are an incredibly deep abstraction. Terraform also implicitly builds a DAG to run operations in the right order. Comparing writing HCL to writing straight Go code with the AWS SDK, the HCL code has something like an order of magnitude fewer lines of code. It absolutely makes sense to use Terraform / HCL instead of writing straight Go code.
Yeah, don’t really understand the sentiment here. I’ve been programming for 20 years and actively use Terraform and CUE at work. I actually write a lot of Go code for our platform, but I’ve never once thought it’d be a good idea to just start calling APIs directly.
But doesn't the codeless "infrastructure as code" kind of smell like cargo cult practices? i mean there might be places where having your infrastructure defined as data is a really good thing, but at least in my work i keep hitting roadblocks where i really wish i was writing actual logic in a modern scripting language rather than trying to make data look like code and code look like data, which is what a lot of devops tutorials seem to be teaching.
> traction seems to be low when referring to cue. Autocorrect issue
From the website:
> New tools like CUE, jsonnette, PKL, and others have emerged to address some of the short comings of raw YAML configuration and templating. Inspiring new K8s package managers such as timoni. However it is yoke’s stance that these tools will always fall short of the safety, flexibility and power of building your packages from code.
The never-ending debate continues between configuration languages and traditional languages. I don't know if the industry will ever standardize in this area.
It will once traditional languages are good enough
I disagree - I think there are fundamental tradeoffs between declarative and imperative ways of configuration so they'll never "converge" completely.
I am a huge fan of each camp stealing ideas from the other and thus "converging" closer.
I also like the two-step "write imperative code to generate declarative config" approach some systems take - Terraform/Pulumi do this. At the cost of having two steps, you get to write your for loops AND get a declarative "state" you can diff easily with the previous state.
Configuration languages are just structured data, and programming languages also need to be able to express complex literal data.
JSON is already explicitly designed as a ~subset of JavaScript. An equivalent of a JSON written as a JavaScript literal is easier to read than JSON. The only problem is that we often don't want to use an interpreter to parse data.
I was 100% for infra as code as it gives devs more freedom to get what they need. Then the startup went from 50 to 100 to 1000 and people just needed to get stuff done and usually the exact same thing over and over. So we migrated to a custom DSL which is much easier to standardize, lint, review and read. I think when you don't know what you need code is better for flexibility, when the domain is sorted, DSL.
I feel that writing out infrastructure templates through a "proper programming language" (for the lack of a better term) comes with some sharp tradeoffs that many don't recognize.
A big feature of most IaC tools is that they are relatively logic-less and can therefore be easily understood at a glance, which makes it easier to reason about what resources will be created. Introducing logic diminishes that ability, and debugging issues in such templates becomes a nightmare. A large company I used to work for had a system just like that, and while I thankfully never had to work with said system, hearing statements like "you can debug your templates with pry[1]" touted as a feature is something I hope never to hear again.
> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
You’re wrong. You can do that with Terraform.
You can also provision stuff that isn’t just k8s.
People are on here arguing that terraform is code, actually. Sure it is, but it's not a good general purpose programming language. It is a config language that grew some features from general purpose languages. It's clearly more pleasant to write that config in a real language
If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
Great take.
Except it's not, because their example is trivially easy and common in Terraform.
The challenge isn't just defining multiple DNS records—it’s doing so dynamically based on an unknown number of instances at plan time. Terraform struggles with truly dynamic resource creation because it relies on a static graph. You can use count or for_each, but those require knowing the instances in advance within the Terraform configuration. If your instances are created dynamically outside Terraform (e.g., via auto-scaling groups), you hit limitations.
You can work around this by using external data sources or separate workflows (e.g., running Terraform after instances are created), but that just proves the point: Terraform isn’t fully "code" in the sense of having true loops and dynamic logic like a real programming language.
If you think this is trivially easy, show me how you'd do it without resorting to hacks like running terraform apply twice.
You can't do it if you have the instances created with auto-scaling groups of course. But nobody would think you could, that's runtime not infra.
With Terraform you are supposed to have multiple coupled state files, with a tree structure of references, so that eg the state file containing the DNS can reference the previously applied state file that created the instances
You are supposed to run terraform apply in a sequence that respects the dependency graph. Terragrunt makes this trivial.
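For reference, the cross-stack wiring is usually done with the terraform_remote_state data source (the bucket/key names below are placeholders):

    data "terraform_remote_state" "instances" {
      backend = "s3"
      config = {
        bucket = "my-tf-state"
        key    = "instances/terraform.tfstate"
        region = "us-east-1"
      }
    }

    # the DNS layer can then consume outputs published by the instances layer,
    # e.g. data.terraform_remote_state.instances.outputs.instance_ips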
At that point, you’re conceding the exact limitation I was pointing out. Terraform can't handle truly dynamic infrastructure changes within a single plan because its execution model is declarative, not imperative. Saying "nobody would think you could" just acknowledges that Terraform lacks the flexibility of real code—because if it were actual code, you'd be able to handle this inline rather than orchestrating multiple runs with external tools.
Yes, you can manage this with separate state files and a structured apply sequence (e.g., using Terragrunt), but that’s just adding more scaffolding to work around Terraform’s inability to express dynamic logic. That’s infrastructure orchestration, not infrastructure as code.
The fact that we have to rely on external tools or multi-step workflows to accomplish something that would be trivial in a real programming language just reinforces the point: Terraform isn’t really "code" in the traditional sense.
The point is terraform can do it, and it does it well. Just because you don't want to use Terraform properly doesn't mean it's bad at what it does.
Using "a real programming language" to do the infra still has to solve the same issues faced by terraform. Using a programming language to define infra doesn't solve the auto scaling DNS issue, for example, you'll be using lambdas to create the dns either way. It also doesn't inherently solve coordination of the resource deployment, you still need to organise your code into modules and ensure the order of execution.
If you think Terraform is the problem here you're blaming the tool for a failure of process and understanding
The core issue isn't whether Terraform can do it—it's how it does it. You're describing a workflow that requires multiple state files, external tools like Terragrunt, and sequential terraform apply runs to work around the fact that Terraform itself lacks imperative, runtime-driven logic. That’s not "using Terraform properly"—that’s compensating for its limitations.
And sure, using a general-purpose language doesn’t magically eliminate coordination problems, but it does give you far more control. If I were using something like Pulumi or CDK, I wouldn’t need to hack around Terraform’s static graph by splitting state files and manually sequencing deployments. I could express logic directly in code—dynamically querying instance IDs, handling autoscaling changes, and updating DNS records inline, without requiring an entirely separate execution step.
So no, this isn't a "failure of process and understanding"—it's just recognizing that Terraform’s declarative model is great for static infrastructure but falls short when dealing with truly dynamic scenarios. If you think its workflow is fine, that’s cool—but don’t pretend it doesn’t have real limitations just because you've built processes to work around them.
Yes, if you use your tools poorly, it will turn out badly.
I run infra that has dynamic scaling provisioned via Terraform. Guess what, we auto-scale from hundreds to thousands of boxes a day dynamically managed through TF and we have had no issues, since we bothered to figure out how to do it properly.
I don’t doubt that you’ve made Terraform work for your needs. The point isn't that Terraform can't be used for dynamic infrastructure—it's that doing so requires workarounds like pre-split state files, sequential applies, or additional tooling like Terragrunt. That’s not the same as having a truly dynamic system where resources can be created and modified in response to real-time conditions without external orchestration.
Terraform works well for many cases, but the fact that you had to “figure out how to do it properly” kind of proves the point—it’s not inherently designed for dynamically changing infrastructure within a single apply cycle. If it were, you wouldn’t need external coordination to handle something as simple as "create DNS records for all instances, even if the number changes at runtime."
At the end of the day, Terraform is great for declaring infrastructure, but it lacks the flexibility of programming infrastructure. If you’re happy with the trade-offs, great—but let’s not pretend those trade-offs don’t exist.
> This is not code. This is configuration.
It is all configuration (data), my friend. At the end of the day, the internet and the services which connect it are all based off hardcoded values somewhere.
How do you think Google enumerates all their datacenters? You bet ya there is a magic config file you have to update to add new datacenters and from there the automation kicks off the infrastructure based upon the contents of the config file, the configuration is still there.
In a previous project, I decided to write a Go program to template k8s YAML from "higher level" definitions of resources. It worked, but the mapping code ended up being more complex than I would have liked, and I think it was ultimately difficult to maintain.
Lesson for me was to think about infrastructure in terms of infrastructure, i.e. treat it as its own domain.
Why is someone talking about infrastructure as code one moment, and then kubernetes manifests the next. The two are not the same. You can’t replace what Terraform or Pulumi does with kubernetes manifests.
IaC doesn't mean your infrastructure config has to be code, it means you manage your infrastructure config like code. As in version control, PRs, tests, that sort of thing. How much of it is Turing complete is up to you.
You absolutely can.
I'm managing a few dozen cloud accounts on azure and aws with nothing but crossplane and a very small terraform script that bootstraps the control plane account.
No, Nix is "infrastructure as code, but actually".
The downside is that now you have to code in Nix.
Except that this is not really infrastructure code; it's Kubernetes code. Infrastructure is what runs your Kubernetes clusters!
or you could just compile pulumi js to webassembly?
Is this a yoke?
_sorry_
Lol! There are some parts that read like one:
"... write your infrastructure definitions in Go or Rust, compile it to WebAssembly, and then you take input and output Kubernetes manifests that get applied to the cluster."
> This is not code. This is configuration.
FWIW we've been working on letting you declare data in YSH, a new Unix shell.
So you can arbitrarily interleave code and data, with the same syntax. The config dialect is called "Hay" - Hay Ain't YAML.
Here's a demo based on this example: https://github.com/oils-for-unix/blog-code/blob/main/hay/iac...
It looks almost the same as HCL (although I think this was convergent evolution, since I've actually never used Terraform):
# this is YSH code!
echo 'hello world'
Data aws_route53_zone cetacean_club {
name = 'cetacean.club.'
}
Resource aws_route53_record A {
zone_id = data.aws_route53_zone.cetacean_club.zone_id
name = "ingressd.$[data.aws_route53_zone.cetacean_club.name]"
type = 'A'
ttl = "300"
}
And then the stdout of this config "program" is here - https://github.com/oils-for-unix/blog-code/blob/main/hay/out...

It can be serialized to JSON, or post-processed and then serialized
---
Then I show you can wrap Resource in a for loop, as well as parameterize it with a "proc" (procedure).
make-resource (12)
make-resource (34)
if (true) {
make-resource (500)
}
This is all still in progress, and can use feedback, e.g. on Github. (This demo runs, but it relies on a recent bug fix.)

The idea is not really to make something like Terraform, but rather to make a language with metaprogramming powerful enough to make your own "dialects", like Terraform.
---
I wrote a doc about Hay almost 3 years ago - Hay - Custom Languages for Unix Systems - https://oils.pub/release/0.27.0/doc/hay.html
Comments - https://lobste.rs/s/phqsxk/hay_ain_t_yaml_custom_languages_f...
At that time, Oils was a slow Python prototype, but now it's fast C++! So it's getting there
The idea of Oils is shell+Python+JSON+YAML, squished together in the same language. So this works by reflection and function calls, not generating text ("Unix sludge"). No Go templates generating YAML, etc.
Where is YSH? Is it built into the oil shell? The YSH link in https://www.oilshell.org/cross-ref.html#YSH is broken.