> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
It depends on where the source of dynamism is coming from, but yes you can do this in Terraform. You get the instances with data.aws_instances, feed it into aws_route53_record with a for_each, and you're done. Maybe you need to play around with putting them into different modules because of issues with dynamic state identifiers, but it's not remotely the most complicated Terraform I've come across.
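A rough sketch of that shape (illustrative only; it assumes the instances are tagged Role=worker and that a data.aws_route53_zone.main is already defined elsewhere):

    data "aws_instances" "workers" {
      instance_tags = {
        Role = "worker"
      }
    }

    locals {
      # ids and private_ips are returned in corresponding order
      instance_ips = zipmap(data.aws_instances.workers.ids, data.aws_instances.workers.private_ips)
    }

    resource "aws_route53_record" "per_instance" {
      for_each = local.instance_ips
      zone_id  = data.aws_route53_zone.main.zone_id
      name     = each.key # one record per instance ID
      type     = "A"
      ttl      = 300
      records  = [each.value]
    }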
That's a separate question from whether or not it's a good idea. Terraform is a one-shot CLI tool, not a daemon, and it doesn't provide auto-reconciliation on its own (albeit there are daemons like Terraform Enterprise / TerraKube that will run Terraform on a schedule for you and thus provide auto-reconciliation). Stuff like DNS records for Kubernetes ingress is much better handled by external-dns, which itself is statically present in a Kubernetes cluster and therefore might be more properly installed with Terraform.
K8S is at a point now where I'd probably try to configure whatever I can inside the cluster as an operator or controller.
There are going to be situations where that isn't practical, but the ability to describe all the pieces of your infra as a CRD is quite nice and it takes some pain out of having things split between terraform/pulumi/cdk and yaml.
At that point, you're just running your own little cloud instead of piggybacking on someone else's. Just need a dry-run pipeline so you can review changes before applying them to the cluster.
Sure, but the Kubernetes cluster itself, plus its foundational extra controllers (e.g. FluxCD) are basically static and therefore should be configured in Terraform.
That’s only true if you go with an architecture that involves doing so in terraform. A common pattern I implement is an initial management cluster bootstrap that runs Argo; after that it’s possible to manage everything, including cluster components of “child” clusters, using Argo. You can use either the Cluster API provider or Crossplane for that, or one of the cloud-specific ones like ACK.
One single imperative helm install command to start the whole train rolling then after that it’s all IaC
This is similar to what I do. Terraform for anything that can't be in K8s. Create EKS cluster and bootstrap it with Argo. Then everything else is blissfully not in Terraform.
https://registry.terraform.io/providers/hashicorp/random/lat... is also very useful for this sort of thing, in case you want a persistent random value per resource- shuffle, id, pet, and password are all super handy.
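For instance, a tiny sketch (names are made up) that gives each of a set of services a stable random suffix:

    resource "random_pet" "suffix" {
      for_each = toset(["api", "worker", "cache"])
      length   = 2
    }

    # e.g. bucket names like "myco-api-wanted-mole"; the pet name stays stable
    # across applies until the resource is tainted or destroyed
    resource "aws_s3_bucket" "per_service" {
      for_each = random_pet.suffix
      bucket   = "myco-${each.key}-${each.value.id}"
    }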
I think a majority of the rants about Terraform I read are written from the perspective of someone managing inherently ephemeral infrastructure - things that are easily disposed of and reprovisioned quickly. The author of such a critique is likely managing an application stack on top of an account that someone else has provided them, a platform team maybe. CDK probably works for you in this case.
Now, if you belong to that platform team and have to manage the state of tens of thousands of "pet" resources that you can't just nuke and recreate using the CDK (because some other team depends on their availability), then Terraform is the best thing since sliced bread; it manages state, drift, and the declarative nature of the DSL is desirable.
Horses for courses.
> Horses for courses.
I think, along with YMMV, these are the two most important things to keep in mind. With the plethora of technologies and similar tools out there, we generally read the tin superficially, skip the manual, and declare "This is bollocks!".
Every tool is targeted towards a specific use and thrives in specific scenarios. Calling a tool bad at something it wasn't designed for is akin to getting angry at your mug because it doesn't work as well upside down [0].
For me Terraform's biggest strength is also its biggest source of pain: it can integrate all sorts of technologies under one relatively vendor-agnostic umbrella and enforce a standard workflow across a huge amount of change. However, that means any bug in any provider is sort of Terraform's fault, if only in the developer's mind.
I think I've commented this elsewhere, but using Cue [1] is also great for this purpose, with no extra infrastructure. E.g. you define a Cue Template [2], which seems analogous to Yoke/ATC's CRDs, and then your definitions just include the data.
Here's an example of Vaultwarden running on my K8s cluster:
deployment: bitwarden: {
spec: {
template: {
spec: {
containers: [{
image: "vaultwarden/server:1.32.7"
env: [{
name: "ROCKET_PORT"
value: "8080"
}, {
name: "ADMIN_TOKEN"
valueFrom: secretKeyRef: {
name: "bitwarden-secrets"
key: "ADMIN_TOKEN"
}
}]
volumeMounts: [{
name: "data"
mountPath: "/data"
subPath: "bitwarden"
}]
ports: [{
containerPort: 8080
name: "web"
}]
}]
volumes: [{
name: "data"
persistentVolumeClaim: claimName: "local-pvc"
}]
}
}
}
}
And simpler services are, well, even simpler:

deployment: myapp: spec: template: spec: containers: [{
ports: [{
containerPort: 8080
name: "web"
}]
}]
And with Cue, you get strongly typed values for everything, and can add tighter constraints as well. This expands to the relevant YAML resources (Services, Deployments, etc), which then get applied to the cluster. The nice thing of this approach is that the cluster doesn't need to know anything about how you manage your resources.

Hill I will die on: Terraform being less expressive than a real language is a feature, not a drawback.
CDK/Pulumi/Yoke is optimised for being easy to write, but code should be optimised to be easy to READ.
Sure, cdk/pulumi/yoke lets you write the most clever and succinct construction you can compose in your favourite language... however, whoever comes across your clever code next will probably want to hit you, especially if it's not a dev from your immediate team, and especially if you have succumbed to blurring the lines between your IaC code and your app code.
If they instead come across some bog-standard terraform that maybe has a bunch of copy-paste and is a bit more verbose... Who cares? Its function will be obvious, there is no mental overhead needed.
On the flipside Helm templating is an absolute abomination and i would probably take anything over needing to immerse myself in that filth, maybe Yoke is worth a look after all. But the REAL answer is a real config language, still.
> code should be optimised to be easy to READ
You say that as if it’s impossible to write clear code. As soon as you have any form of multiple resources (e.g. create x of y) I’ll take the real programming language over terraform.
> As soon as you have any form of multiple resources
terraform handles this with for_each. need 10 EBS volumes on 10 EC2 instances? for_each and link instance id of the each value. done. theres a bunch of stuff i now don’t have to worry about (does the instance actually exist yet? other validation edge cases?)
https://developer.hashicorp.com/terraform/language/meta-argu...
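roughly (illustrative sketch, var.ami_id assumed):

    resource "aws_instance" "app" {
      for_each      = toset(["a", "b", "c"])
      ami           = var.ami_id
      instance_type = "t3.micro"
    }

    resource "aws_ebs_volume" "data" {
      for_each          = aws_instance.app
      availability_zone = each.value.availability_zone
      size              = 100
    }

    # terraform's dependency graph guarantees the instance and volume
    # exist before the attachment is attempted
    resource "aws_volume_attachment" "data" {
      for_each    = aws_instance.app
      device_name = "/dev/sdf"
      volume_id   = aws_ebs_volume.data[each.key].id
      instance_id = each.value.id
    }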
> You say that as if it’s impossible to write clear code.
not the parent, but i feel their usage of the word “code” was in error. i don’t care about how, i care about what.
the HCL is purely a definition/description of what the infrastructure looks like. what resources will be created? that is all it is. i want that. to define the infra and move on. i don’t want low level control of every minutia to do with infrastructure. i want to read a config file and just know what resources will exist in the account. wanna know every resource that exists? `terraform state list` … job done. no reading code required.
HCL/terraform is to define my cloud resources, not to control them or their creation. if i want control, then i need to whip out some go/python.
that’s my vibe on CDK libraries/platform APIs versus terraform.
You can understand every single terraform codebase using nothing other than the terraform documentation itself. All abstractions are provided by the language itself.
Clear isn't really the word I would call it, more that the real work being done is exposed and always visible.
As the Go proverb goes: "clear is better than clever". https://go-proverbs.github.io/
> whoever comes across your clever code next will probably want to hit you, especially if it's not a dev from your immediate team, and especially if you have succumbed to blurring the lines between your IaC code and your app code.
If you want to maximize the number of people who have a chance of understanding what is happening, python is your huckleberry. They are going to want to hit the guy who wrote everything in a bizarre language called HCL that nobody outside of infra has ever seen or heard of.
> If they instead come across some bog-standard terraform that maybe has a bunch of copy-paste and is a bit more verbose... Who cares? Its function will be obvious, there is no mental overhead needed.
"bog standard" is doing a lot of heavy lifting here. You can write simple python or esoteric python and you can write simple terraform or esoteric terraform.
> Yoke is a project that takes this basic idea to the next level. With Yoke, you write your infrastructure definitions in Go or Rust, compile it to WebAssembly, and then you take input and output Kubernetes manifests that get applied to the cluster.
This just puts me in mind of https://howfuckedismydatabase.com/nosql/
I ditched Terraform years ago and just interact with the raw cloud provider SDKs now. It's much easier to long-term evolve actual code and deal with weird edge cases that come up when you're not beholden to the straitjacket that is configuration masquerading as code.
Oh yea, and we can write tests for all that provisioning logic too.
I went through the same evolution, even built a PaaS for AWS, but I kept going and now just deploy my own stuff to VMs with Swarm via one command in Rove. It's great. And yes I know kubernetes I use it at work. It's an unnecessary waste of time.
> Swarm
docker swarm is so simple and easy compared to the utter behemoth that is k8s, and basically is all you need for CRUD webapps 80-90% of the time. add an RDS instance and you’re set.
i will always pick swarm in a small company* whenever possible until k8s or ECS makes sense because something has changed and it’s needed.
dont start with complexity.
* - bigger companies have different needs.
People have really been sleeping on Swarm. I sometimes even see people trying to recreate Swarm features with Compose. Wish more devs knew about it.
How are you handling creating multiple resources in parallel? or rolling back changes after an unsuccessful run?
Not OP, but for rolling back we just… revert the change to the setup_k8s_stuff.py script !
In practice it’s a module that integrates with quite a large number of things in the monolith because that’s one of the advantages of Infrastructure as Actual Code: symbols and enums and functions that have meaningful semantics in your business logic are frequently useful in your infrastructure logic too. The Apples API runs on the Apples tier, the Oranges API runs on the Oranges tier, etc. etc.
People call me old fashioned (“it’s not the 1990s any more”) but when I deploy something it’s a brand new set of instances to which traffic gets migrated. We don’t modify in place with anything clever and I imagine reverting changes in a mutable environment is indeed quite hard to get right (and what you are hinting at?)
> I imagine reverting changes in a mutable environment is indeed quite hard to get right (and what you are hinting at?)
I guess you're not managing any databases then? Because you can't just treat those immutably, you have to manage the database in-place.
One thing that annoys me is the inconsistency between mutable "data" resources and everything else.
Something that would be nice would be the rough equivalent of the deployment slots used in Azure App Service, but for everything else too. So you could provision a "whole new resource" and then atomically switch traffic over to it.
You can express this in Terraform, it's just a little more contrived. You release your changes as Terraform modules (a module in and of itself doesn't do anything, it's like a library/package), then your Terraform workspace instantiates both a "blue" module and a "green" module, at different versions, with DNS / load balancing resources depending on both modules and switching between either blue or green.
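A minimal sketch of that pattern (the registry source and the lb_dns_name output are made-up placeholders):

    module "blue" {
      source  = "app.terraform.io/acme/app/aws" # placeholder module source
      version = "1.4.0"
    }

    module "green" {
      source  = "app.terraform.io/acme/app/aws"
      version = "1.5.0"
    }

    # flip var.active between "blue" and "green" to switch traffic
    resource "aws_route53_record" "app" {
      zone_id = var.zone_id
      name    = "app.example.com"
      type    = "CNAME"
      ttl     = 60
      records = [var.active == "blue" ? module.blue.lb_dns_name : module.green.lb_dns_name]
    }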
> revert the change to the setup_k8s_stuff.py script
What about resources that were created by the code you reverted?
I’ve been thinking about this for a long time. But doesn’t it bring a host of other issues? For example, I need to update instance RAM from 4 to 8 GB, but how do I know if the instance exists or should be created? I need to make a small change, how do I know what parts of my scripts to run?
> For example, I need to update instance RAM from 4 to 8 Gb but how do I know if the instance exists or should be created?
let front_id = if instance_exists("front_balancer") {
    fetch_instance("front_balancer").id
} else {
    create_new_instance("front_balancer", front_balancer_opts).id
}
Or however else you would manage that sort of thing in your favorite programming language.

> I need to make a small change, how do I know what parts of my scripts to run?
Either just re-run the parts you know you've changed (manually or based on git diffs), or even better, make the entire thing idempotent and you won't have to care; re-run the entire program after each change and it'll automagically work.
> Either just re-run the parts you know you've changed (manually or based on git diffs)
This is exactly the sort of thing Terraform is designed to avoid because it can obviously get quite messy. Agreed that making things idempotent solves that problem, but it's not always obvious/easy how to do so.
You write code to do these things? If there's a requirement for you to be able to do such a thing make it a feature, implement it with tests and voila, no different than any other feature or bug you work on is it?
Here are the things that TF does that you are probably not going to get around to in a comprehensive way-
- State tracking, especially all of the tedious per cloud resource details
- Parallelism- TF defaults to 10 threads at a time. You won't notice this when you write a demo to deploy one thing, but it really matters as you accrete more things.
- Dependency tracking- hand in hand with the parallelism, but this is what makes it possible. It is tedious, resource by resource blood sweat and tears stuff, and enabled by the inexpressive nature of HCL
Plus, you know, all of the work that has already been done by other people to wrap a million quirky APIs in a uniform way.
Terraform added tests somewhat recently: https://developer.hashicorp.com/terraform/language/tests
I agree that the SDK is better for many use cases. I do like terraform for static resources like aws vpc, networking, s3 buckets, etc.
And eventually, you end up with your own in-house Terraform.
I'm quite happy with CDK[0].
My experience is only with the main, CloudFormation-based version of CDK for AWS. There is also CDK for Terraform, which supports any resource that Terraform supports, although some of what I'm about to say is not applicable to that version.
What I like about CDK, is that you can write real code, and it supports a wide range of languages, although typescript is the best experience.
Provided that you don't use any of the `fromLookup` type functions, you can run and test the code without needing any actual credentials to your cloud provider.
CDK essentially compiles your code into a cloudformation template; you can run the build without credentials, then deploy the built cloudformation template separately.
You don't need to worry about your terraform server crashing halfway through a deployment, because cloudformation runs the actual deployment.
My main problem with CDK is that it only outputs a CloudFormation stack. I can sign up for a new cloud account, spin up a k8s cluster, deploy everything to it, and restore the database snapshot faster than CF will finish a job that's stuck on UPDATE_CLEANUP_IN_PROGRESS.
Of course there's also cdk8s, but I'll probably go with Pulumi instead if I need that. Right now I'm happy with helmfile, though not so much with helm itself. So I'll definitely be giving Yoke a look.
In your experience how often have you had template builds succeed but then fail at apply time? This kind of issue is what I find most frustrating about IaC today, your 'code' 'compiling' means nothing because all of the validations are serverside, and sometimes you won't find out something's wrong until Terraform is already half done applying. I want to be able to declare my infrastructure, be able to fully validate it offline, and have it work first try when I apply it.
I find Pulumi very nice here because it persists state after every successful resource creation. If it breaks somewhere in the middle, the next run will just pick up where it left off last time.
CDK… well, CDK doesn’t get in an invalid state often either, but that’s because it spends 30m rolling back every time something goes wrong.
I've had less such issues with CDK, versus raw cloudformation, or terraform, but it can still happen.
Slightly related, in everything I see that allows adding secrets or env vars as code, they seem to prefer a list of objects instead of a key value pair for these. Does anyone know why this is? I know in some cases you can add additional values, but this seems easily solved by dynamically determining what the value is.
I'd much rather write:

    env:
      key: value

instead of:

    env:
    - name: key
      value: value
Who are these ops people that want to write golang and rust? It seems like a tiny niche. If you're that comfortable writing golang or rust then why not just become a developer?
I'm a lifelong ops person, since 2000, and I use Ansible or Terraform daily.
I often wanted to learn golang better but I just never had the motivation. I'm a mean Python scripter, I can write a system integration in hours, but there's something about compiled languages I just never could get into.
I'm saying this only because the whole point of yoke is to define your infrastructure in Golang, so that you can add in the missing pieces with Golang and are free to use Golang for anything other than the pre-defined infrastructure providers in Yoke. So you're now a Golang developer. Congrats.
A big benefit is the compiler catches bugs so you don’t have to wait around for your Python program to crash at runtime. Also, if the type system is more “legit” then you can skip a ton of defensive parsing of inputs.
Could be more about developers who know Golang and Rust wanting to deploy their apps (no need to pigeonhole anyone into just dev or just ops)
I don't really see a distinction between developer and ops person in this context. The whole point of all of these tools is to make infrastructure into code. Go isn't the choice I would have made but it's fine.
A lot of important devops tools like Kubernetes and Grafana are written in golang, and it’s often handy to be able to import their code to use in your own code to automate those things.
But again, you're now a developer.
And I'm asking who are these developers using IaC tooling? It seems to me like it was made for ops.
All power to you if you take on both roles, but that's a good way to get burned out. I'm a devops person so the devs can focus on just code, and I can focus on making the best and safest infrastructure for them to run their code in.
I feel like the distinction between the two is fairly contrived these days. I'm an SRE, and we're constantly building tooling to help us better manage infrastructure, improve reliability, improve DX, etc. On the flip side, we also push a lot of the responsibility for infrastructure management to our devs: we maintain the modules and IaC pipelines, and the developers hook up the building blocks they need. It can actually help avoid burnout because our team doesn't become a bottleneck for infrastructure provisioning.
Say what you want about IaC in Go or other programming languages, but it can definitely help strengthen the whole "developers own their infrastructure" idea, since they don't have to learn an additional language syntax.
Those developers are working on “Internal Development Platforms” and building their own abstractions on top of tools like Kubernetes and Grafana to simplify things for developers. This page explains it pretty well: https://internaldeveloperplatform.org/what-is-an-internal-de...
I've written a bunch of k8s operators in go (and rust more recently). That's what basically everyone working with k8s does once you reach a certain level of complexity.
I don't really understand, in fact, why you'd use yoke instead of just writing an operator with kubebuilder or kube-rs.
Supposedly it's a package manager as well, so if there's a package, you'd be able to use it without writing any code.
I'm very much onboard with the 'hey, code is useful' idea. Code is most useful when you want to build abstractions around infrastructure.
You can do declarative code most of the time but bust out function calls and control flow and even more heavy weight abstractions when needed. And you get all that nice typechecking that speeds things up.
This is basically why I joined Pulumi and why I joined Earthly before that. ( CI needs to move beyond YAML as well).
Where I disagree is that Pulumi ( or CDK or CDKTF) running the language runtime of the language you've decided to use is a problem.
Looks promising but it starts with a (justified) rant about terraform and then goes into how to replace Helm.
I am confused. Can yoke be used to create and manage infrastructure or just k8s resources?
Indeed. This isn't really a replacement for terraform, unless you are only using terraform to manage k8s resources. Which probably isn't most people who are currently using Terraform.
Author here. It's mainly for k8s resources; but if you install operators like external-dns or something like crossplane into your cluster, you can manage infra too.
I've considered dropping terraform (openTofu) for our k8s resources since k8s is stateful anyway.
But that would complicate synchronization with resources outside of k8s, like tailscale, DNS, managed databases, cloud storage (S3 compatible) - and even mapping k8s ingress to load_balancer and external DNS.
So far I feel that everything in terraform is the most simple and reasonable solution - mostly because everything can be handled by a single tool and language.
> into your cluster
I guess the point is: what if you don't have a cluster.
And also: what manages the Kubernetes cluster lifecycle in the cloud provider, or on bare metal?
There is life before (and beyond) Kubernetes.
What alternative to terraform would one use to set up the whole cluster before provisioning any resources?
I currently have a custom script that is a mix between terraform and ansible that sets up a proxmox cluster, then a k3s cluster and a few haproxys with keepalived on top. Granted, maybe not the most standard setup.
Do you have a complex Ansible setup? For the few bespoke VMs I need, I've been able to get away with cloud init so far - but they're explicitly set up to be reasonable to nuke and recreate - if they had more personality and needed to be more managed as pets - I would probably need to reach for something like Ansible - or see if I could build images (vm or Docker).
But then with images I'm on the hook for patching... Not simply auto-patching via apt...
ok, that makes sense. A better Helm would be nice. timoni.sh is getting better and better, but Cue is a big hurdle.
Unfortunately, I'm not a big fan of the yaml-hell that crossplane is either.
But as a Terraform replacement systeminit.com is still the strongest looking contender.
> A better Helm would be nice.
Consider CDK8s (Typescript or Go) or Jsonnet. We evaluated Cue and the two aforementioned options and ended up with CDK8s using Typescript and it's incredibly powerful.
Hm... CDK8s just helps herding k8s yaml, nothing else?
There's nothing like terraform plan/apply?
I mean - some help wrangling yaml is welcome - but I already get (some) help from terraform with the k8s provider there...
Do you check in the generated yaml in git, or just the typescript code?
It’s just a dunk on terraform to promote yet another K8s provisioning thing.
> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
It's possible a few ways. I prefer modules, and this LLM answer describes an older way with count and for_each.
It's always possible that this incantation of the problem space has a gotcha that needs a workaround, but I doubt it would be a blocker.
https://www.perplexity.ai/search/if-you-really-do-think-that...
>> Wait, there's something here that I'm not getting. Why are you compiling the code to WebAssembly instead of just running it directly on the server?
> Well, everything's a tradeoff. Let's imagine a world where you run the code on the server directly.
> If you're using a language like Python, you need to have the Python runtime and any dependencies installed. This means you have to incur the famous wrath of pip (pip hell is a real place and you will go there without notice). If you're using a language like Go, you need to have either the Go compiler toolchain installed or prebuild binaries for every permutation of CPU architecture and OS that you want to run your infrastructure on. This doesn't scale well.
> One of the main advantages of using WebAssembly here is that you can compile your code once and then run it anywhere that has a WebAssembly runtime, such as with the yoke CLI or with Air Traffic Controller.
At this point, why not use a proper runtime like JVM or .Net?
Then one can also easily use reasonable languages like C#, Java or Kotlin as well.
> At this point, why not use a proper runtime like JVM or .Net?
Because then you are forced to only use managed languages?
Ahh, good point.
I guess Rust (and maybe other unmanaged languages) can be compiled to WebAssembly?
https://logandark.net/calc is C++ compiled to WebAssembly using Emscripten. Back from I think 2018.
These days Rust is practically the poster child of compiling to WebAssembly because it's so easy. Most WASM content I see is actually about Rust.
> This is not code. This is configuration.
I don't think those two things are mutually exclusive.
IMO hcl is absolutely code. As is html, and css, json, and yaml.
It isn't a full programming language, and I often wish it was, but I wouldn't say it isn't code.
JSON and YAML are file formats for data. Is XML code? Is SVG code? Is a GIF code? Is a BMP code?
> If you're using a language like Go, you need to have either the Go compiler toolchain installed or prebuild binaries for every permutation of CPU architecture and OS that you want to run your infrastructure on. This doesn't scale well.
This is exactly the approach that Terraform takes. Both Terraform and its providers are written in Go, which is a great language for this purpose because of GoReleaser and the ease of compiling to different architectures and OSes. It scales just fine.
Did the author talk to any senior Terraform practitioners before building this?
Hi. I think the article was just using that as an example of how IaC tools use configuration languages instead of code. Yoke is not a terraform replacement, and does not mention terraform anywhere in its documentation.
It does sit at the same level as helm & timoni. It just takes a code-based approach to managing your cluster (which in turn can manage your infra but that wasn't the larger point).
Speaking of IAC- I have an existing GCP project with some basic infra (service accounts, cloud run jobs, cloud build scripts, and databases) what is the best tool to _import_ all of this into IAC. The only real tool I’ve found is terraformer. I have no dog in the race regarding tooling e.g if my output is Pulumi, terraform, or just straight YAML. I’m just looking to “codify” it.
Any suggestions from experience?
Just go with plain Terraform.
You can check the docs for the GCP provider to see if the resources you want to manage are "importable" into the Terraform state file; they usually are and you'll see a section at the bottom of each resources documentation page showing you how to do this. e.g. https://registry.terraform.io/providers/hashicorp/google/lat...
Your process will be -
1. Write TF configuration approximating what you think is deployed
2. Import all your resources into the state file
3. Run a `terraform plan ...` to show what Terraform wants to change about your resources (including creating any you missed or changing/recreating any your config doesn't match)
4. Correct your TF configuration to reflect the differences from 3.
5. Goto 3; repeat until you get a "No changes" plan, or until the only remaining changes are ones you actually want TF to make (add tags, for example)
6. run `terraform apply`
and optionally...
7. set up your CI/automation to run `terraform plan` regularly and report the "drift" via some means - stuff that has been changed about your resources outside of Terraform management.
I put a lot of stock in this last step, because small, incremental change is the cornerstone of platform management. If you want to make a change and come to find there's a huge amount of other stuff you have to correct as well, your change isn't small any more.
You don't need to write all the tf upfront for existing resources.
Use `import` resources in a .tf file (I like to just call it imports.tf) and run `terraform plan -generate-config-out=imported.tf`
That will dump the tf resources - often requires a little adjustment to the generated script, but it's a huge time saver
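For example (a sketch; the resource address and ID are placeholders, and the exact ID format for each resource type is listed on its provider docs page):

    # imports.tf
    import {
      to = google_storage_bucket.assets
      id = "my-project/my-assets-bucket"
    }

    # then: terraform plan -generate-config-out=imported.tf
    # writes a google_storage_bucket "assets" block you can tidy up and commit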
I used this instead of terraformer. Can agree that it’s a huge timesaver.
This feels like a Helm replacement and not a Terraform replacement in any real way. Which is fine but confusing when the post begins with Terraform.
Terraform can create a K8S cluster. This requires a K8S cluster to work. Etc.
This seems like a great approach that sits between using the sdk directly and a dsl/yaml. My experience has been that most of the people configuring these systems don’t know how to code, and configuration languages are their gateway. Most never venture past configuration, which is why yaml is so widely used and it's difficult to get any traction outside of it. I think terraform adopted some of the patterns that have been around for a long time (remember the chef vs puppet discussion from a decade ago) and it massively helped with adoption. Cue seems a step up from terraform (you can use cue vet for type checking, even if CRDs are not yet supported all the way) but tracking seems to be low as it’s hard for non-programmers to grasp. Maybe Claude will help move all the people that don’t want to manage these systems with code to something even simpler than yaml and open the door for real infra as code for the rest.
> My experience has been that most of the people configuring these systems don’t know how to code, and configuration languages is their gateway
I don't really disagree but this is such a pessimistic, NIH-syndrome viewpoint. Feel free to look at the code for any of the major Terraform providers. There's a lot of production-hardened, battle-tested Go code that's dealing with the idiosyncrasies of the different cloud APIs. They are an incredibly deep abstraction. Terraform also implicitly builds a DAG to run operations in the right order. Comparing writing HCL to writing straight Go code with the AWS SDK, the HCL code has something like an order of magnitude fewer lines of code. It absolutely makes sense to use Terraform / HCL instead of writing straight Go code.
Yeah, don’t really understand the sentiment here. I’ve been programming for 20 years and actively use Terraform and CUE at work. I actually write a lot of Go code for our platform, but I’ve never once thought it’d be a good idea to just start calling APIs directly.
But doesn't the codeless "infrastructure as code" kind of smell like cargo cult practices? i mean there might be places where having your infrastructure defined as data is a really good thing, but at least in my work i keep hitting roadblocks where i really wish i was writing actual logic in a modern scripting language rather than trying to make data look like code and code look like data, which is what a lot of devops tutorials seem to be teaching.
> traction seems to be low when referring to cue. Autocorrect issue
From the website:
> New tools like CUE, jsonnette, PKL, and others have emerged to address some of the short comings of raw YAML configuration and templating. Inspiring new K8s package managers such as timoni. However it is yoke’s stance that these tools will always fall short of the safety, flexibility and power of building your packages from code.
The never-ending debate continues between configuration languages and traditional languages. I don't know if the industry will ever standardize in this area.
It will once traditional languages are good enough
I disagree - I think there are fundamental tradeoffs between declarative and imperative ways of configuration so they'll never "converge" completely.
I am a huge fan of each camp stealing ideas from the other and thus "converging" closer.
I also like the two-step "write imperative code to generate declarative config" approach some systems take - Terraform/Pulumi do this. At the cost of having two steps, you get to write your for loops AND get a declarative "state" you can diff easily with the previous state.
Configuration languages are just structured data, and programming languages also need to be able to express complex literal data.
JSON is already explicitly designed as a ~subset of JavaScript. An equivalent of a JSON written as a JavaScript literal is easier to read than JSON. The only problem is that we often don't want to use an interpreter to parse data.
I was 100% for infra as code as it gives devs more freedom to get what they need. Then the startup went from 50 to 100 to 1000 and people just needed to get stuff done and usually the exact same thing over and over. So we migrated to a custom DSL which is much easier to standardize, lint, review and read. I think when you don't know what you need code is better for flexibility, when the domain is sorted, DSL.
I feel that writing out infrastructure templates through a "proper programming language" (for the lack of a better term) comes with some sharp tradeoffs that many don't recognize.
A big feature of most IaC tools is that they are relatively logic-less and can therefore be easily understood at a glance, which makes it easier to reason about what resources will be created. Introducing logic diminishes that ability, and debugging issues in such templates becomes a nightmare. A large company I used to work for had a system just like that, and while I thankfully never had to work with said system, hearing statements like "you can debug your templates with pry[1]" touted as a feature is something I hope never to hear again.
> If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
You’re wrong. You can do that with Terraform.
You can also provision stuff that isn’t just k8s.
People are on here arguing that terraform is code, actually. Sure it is, but it's not a good general purpose programming language. It is a config language that grew some features from general purpose languages. It's clearly more pleasant to write that config in a real language
If you really do think that Terraform is code, then go try and make multiple DNS records for each random instance ID based on a dynamic number of instances. Correct me if I'm wrong, but I don't think you can do that in Terraform.
Great take.
Except it's not, because their example is trivially easy and common in Terraform.
The challenge isn't just defining multiple DNS records—it’s doing so dynamically based on an unknown number of instances at plan time. Terraform struggles with truly dynamic resource creation because it relies on a static graph. You can use count or for_each, but those require knowing the instances in advance within the Terraform configuration. If your instances are created dynamically outside Terraform (e.g., via auto-scaling groups), you hit limitations.
You can work around this by using external data sources or separate workflows (e.g., running Terraform after instances are created), but that just proves the point: Terraform isn’t fully "code" in the sense of having true loops and dynamic logic like a real programming language.
If you think this is trivially easy, show me how you'd do it without resorting to hacks like running terraform apply twice.
You can't do it if you have the instances created with auto-scaling groups of course. But nobody would think you could, that's runtime not infra.
With Terraform you are supposed to have multiple coupled state files, with a tree structure of references, so that eg the state file containing the DNS can reference the previously applied state file that created the instances
You are supposed to run terraform apply in a sequence that respects the dependency graph. Terragrunt makes this trivial.
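For reference, the cross-stack wiring is usually done with the terraform_remote_state data source (the bucket/key names below are placeholders):

    data "terraform_remote_state" "instances" {
      backend = "s3"
      config = {
        bucket = "my-tf-state"
        key    = "instances/terraform.tfstate"
        region = "us-east-1"
      }
    }

    # the DNS layer can then consume outputs published by the instances layer,
    # e.g. data.terraform_remote_state.instances.outputs.instance_ips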
At that point, you’re conceding the exact limitation I was pointing out. Terraform can't handle truly dynamic infrastructure changes within a single plan because its execution model is declarative, not imperative. Saying "nobody would think you could" just acknowledges that Terraform lacks the flexibility of real code—because if it were actual code, you'd be able to handle this inline rather than orchestrating multiple runs with external tools.
Yes, you can manage this with separate state files and a structured apply sequence (e.g., using Terragrunt), but that’s just adding more scaffolding to work around Terraform’s inability to express dynamic logic. That’s infrastructure orchestration, not infrastructure as code.
The fact that we have to rely on external tools or multi-step workflows to accomplish something that would be trivial in a real programming language just reinforces the point: Terraform isn’t really "code" in the traditional sense.
The point is terraform can do it, and it does it well. Just because you don't want to use Terraform properly doesn't mean it's bad at what it does.
Using "a real programming language" to do the infra still has to solve the same issues faced by terraform. Using a programming language to define infra doesn't solve the auto scaling DNS issue, for example, you'll be using lambdas to create the dns either way. It also doesn't inherently solve coordination of the resource deployment, you still need to organise your code into modules and ensure the order of execution.
If you think Terraform is the problem here you're blaming the tool for a failure of process and understanding
The core issue isn't whether Terraform can do it—it's how it does it. You're describing a workflow that requires multiple state files, external tools like Terragrunt, and sequential terraform apply runs to work around the fact that Terraform itself lacks imperative, runtime-driven logic. That’s not "using Terraform properly"—that’s compensating for its limitations.
And sure, using a general-purpose language doesn’t magically eliminate coordination problems, but it does give you far more control. If I were using something like Pulumi or CDK, I wouldn’t need to hack around Terraform’s static graph by splitting state files and manually sequencing deployments. I could express logic directly in code—dynamically querying instance IDs, handling autoscaling changes, and updating DNS records inline, without requiring an entirely separate execution step.
So no, this isn't a "failure of process and understanding"—it's just recognizing that Terraform’s declarative model is great for static infrastructure but falls short when dealing with truly dynamic scenarios. If you think its workflow is fine, that’s cool—but don’t pretend it doesn’t have real limitations just because you've built processes to work around them.
Yes, if you use your tools poorly, it will turn out badly.
I run infra that has dynamic scaling provisioned via Terraform. Guess what, we auto-scale from hundreds to thousands of boxes a day dynamically managed through TF and we have had no issues, since we bothered to figure out how to do it properly.
I don’t doubt that you’ve made Terraform work for your needs. The point isn't that Terraform can't be used for dynamic infrastructure—it's that doing so requires workarounds like pre-split state files, sequential applies, or additional tooling like Terragrunt. That’s not the same as having a truly dynamic system where resources can be created and modified in response to real-time conditions without external orchestration.
Terraform works well for many cases, but the fact that you had to “figure out how to do it properly” kind of proves the point—it’s not inherently designed for dynamically changing infrastructure within a single apply cycle. If it were, you wouldn’t need external coordination to handle something as simple as "create DNS records for all instances, even if the number changes at runtime."
At the end of the day, Terraform is great for declaring infrastructure, but it lacks the flexibility of programming infrastructure. If you’re happy with the trade-offs, great—but let’s not pretend those trade-offs don’t exist.
> This is not code. This is configuration.
It is all configuration (data), my friend. At the end of the day, the internet and the services which connect it are all based off hardcoded values somewhere.
How do you think Google enumerates all their datacenters? You bet ya there is a magic config file you have to update to add new datacenters and from there the automation kicks off the infrastructure based upon the contents of the config file, the configuration is still there.
In a previous project, I decided to write a Go program to template k8s YAML from "higher level" definitions of resources. It worked, but the mapping code ended up being more complex than I would have liked, and I think it was ultimately difficult to maintain.
Lesson for me was to think about infrastructure in terms of infrastructure, i.e. treat it as its own domain.
Why is someone talking about infrastructure as code one moment, and then kubernetes manifests the next. The two are not the same. You can’t replace what Terraform or Pulumi does with kubernetes manifests.
IaC doesn't mean your infrastructure config has to be code, it means you manage your infrastructure config like code. As in version control, PRs, tests, that sort of thing. How much of it is Turing complete is up to you.
You absolutely can.
I'm managing a few dozen cloud accounts on azure and aws with nothing but crossplane and a very small terraform script that bootstraps the control plane account.
No, Nix is "infrastructure as code, but actually".
The downside is that now you have to code in Nix.
Except that this is not really infrastructure code; it's Kubernetes code. Infrastructure is what runs your Kubernetes clusters!
or you could just compile pulumi js to webassembly?
Is this a yoke?
_sorry_
Lol! There are some parts that read like one:
"... write your infrastructure definitions in Go or Rust, compile it to WebAssembly, and then you take input and output Kubernetes manifests that get applied to the cluster."
> This is not code. This is configuration.
FWIW we've been working on letting you declare data in YSH, a new Unix shell.
So you can arbitrarily interleave code and data, with the same syntax. The config dialect is called "Hay" - Hay Ain't YAML.
Here's a demo based on this example: https://github.com/oils-for-unix/blog-code/blob/main/hay/iac...
It looks almost the same as HCL (although I think this was convergent evolution, since I've actually never used Terraform):
# this is YSH code!
echo 'hello world'
Data aws_route53_zone cetacean_club {
name = 'cetacean.club.'
}
Resource aws_route53_record A {
zone_id = data.aws_route53_zone.cetacean_club.zone_id
name = "ingressd.$[data.aws_route53_zone.cetacean_club.name]"
type = 'A'
ttl = "300"
}
And then the stdout of this config "program" is here - https://github.com/oils-for-unix/blog-code/blob/main/hay/out...

It can be serialized to JSON, or post-processed and then serialized
---
Then I show you can wrap Resource in a for loop, as well as parameterize it with a "proc" (procedure).
make-resource (12)
make-resource (34)
if (true) {
make-resource (500)
}
This is all still in progress, and can use feedback, e.g. on Github. (This demo runs, but it relies on a recent bug fix.)

The idea is not really to make something like Terraform, but rather to make a language with metaprogramming powerful enough to make your own "dialects", like Terraform.
---
I wrote a doc about Hay almost 3 years ago - Hay - Custom Languages for Unix Systems - https://oils.pub/release/0.27.0/doc/hay.html
Comments - https://lobste.rs/s/phqsxk/hay_ain_t_yaml_custom_languages_f...
At that time, Oils was a slow Python prototype, but now it's fast C++! So it's getting there
The idea of Oils is shell+Python+JSON+YAML, squished together in the same language. So this works by reflection and function calls, not generating text ("Unix sludge"). No Go templates generating YAML, etc.
Where is YSH? Is it built into the oil shell? The YSH link in https://www.oilshell.org/cross-ref.html#YSH is broken.