Personally, I wouldn't consider buying physical servers for quite a long time, and that's coming from someone with a fair amount of hands-on hardware experience. You could ship me a box of parts and it would serve me just as well as an assembled server; it's the time, maintenance, and cost that come with owning the hardware that put me off.
On the other hand, there is also a growth tipping point where cloud stops being cost effective and you need to switch to physical servers, or at least where doing so would save money.
That's quite a while out, though, save for some niche cases.
One interesting lane might be to combine the two and offer a one-stop service for getting both cloud servers and hardware. In particular, you could offer guidance (automated or otherwise) about when it makes the most sense to use cloud vs. on-prem, and at what scale.
Scope could also include things like colo vs. self-hosting, which data centers to use, where to buy land to build new data centers, region and CDN considerations, etc.
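Not that I've built such a tool, but the core of that "tipping point" guidance is just a break-even estimate: amortize server capex against the monthly gap between cloud spend and colo + ops cost. Rough Python sketch; every number below is an invented placeholder:

    # Toy break-even estimate: months until owning hardware beats renting
    # equivalent cloud capacity. All figures are invented placeholders.
    def months_to_break_even(server_capex, colo_monthly, ops_monthly, cloud_monthly_equiv):
        monthly_saving = cloud_monthly_equiv - (colo_monthly + ops_monthly)
        if monthly_saving <= 0:
            return None  # cloud stays cheaper at this scale
        return server_capex / monthly_saving

    # e.g. a ~$12k server vs ~$900/mo of comparable cloud instances
    print(months_to_break_even(12_000, colo_monthly=250, ops_monthly=300,
                               cloud_monthly_equiv=900))  # ~34 months

A real tool would also have to fold in refresh cycles, utilization, and the staff time mentioned above, but the shape of the answer is the same.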
My wife used to be involved with data center provisioning for what the NRO called "agency unique equipment." This was a facility and a contract that did centralized infrastructure procurement, installation, and tier 1-2 support for any NRO workloads that needed to use hardware not available in Amazon's classified GovCloud enclaves.
The full workflow: the customer generated and provided the initial build spec; her program (called "Palinode") validated the spec, made the purchase, received the equipment from the vendor, assembled it in test racks at a contractor facility, performed a physical inspection and validation of the build, then shipped it to the production data center, installed it, and performed final validation on-site, usually with a customer witness. For that, they used a suite of custom tools matching the functionality your linked toolkit seems to provide, plus management and alerting tools, all developed in-house.
I'd say the biggest pain points were not really something any third-party or software provider could alleviate: the locked-down firmware and the NDAs with the network appliance vendors. They experienced so many bizarre appliance failure modes that couldn't be debugged without chip-level log access, which required vendor reps to come on-site just to read the logs. And there was a major, fire-drill-level issue with defective memory that they couldn't even disclose to their customers because of the vendor NDA, and it took a really long time to address. I was honestly pretty surprised by that; the vendor is an enormous publicly traded company that you'd think would be legally required to disclose product defects in components that could end up powering safety-critical devices, but for whatever reason that wasn't the case, and remediation was seriously slowed down by all the secrecy.
The only thing I can see changing this is competition, but it largely doesn't exist at the chip level or for specialized hardware. Many components and appliances have only two or three vendors, and the compliance hurdles for hosting classified data, combined with patent law, make the industry nearly undisruptable.
We've been looking at the logistics of co-location. Cloud prices, even at DigitalOcean and Linode, are pricey!
The problem is finding anyone with experience doing this. It's a skillset that's sorta been lost.
I was just looking at your repo... Please correct me if I'm wrong, but can't you do the same thing with Ansible or an equivalent?
Automation built on top of ipmitool? I could see some people liking this.
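For what it's worth, here's roughly what "automation on top of ipmitool" tends to look like; a minimal sketch that polls power state and the tail of the SEL across a fleet (BMC addresses and credentials are placeholders, and this isn't from the linked repo):

    # Poll chassis power state and recent SEL entries over IPMI lanplus.
    # Host list and credentials are placeholders.
    import subprocess

    HOSTS = ["10.0.0.11", "10.0.0.12"]

    def ipmi(host, *args):
        cmd = ["ipmitool", "-I", "lanplus", "-H", host,
               "-U", "admin", "-P", "secret", *args]
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    for host in HOSTS:
        print(host, ipmi(host, "chassis", "power", "status").strip())
        for line in ipmi(host, "sel", "list").splitlines()[-5:]:
            print("   ", line)

The interesting part is usually the inventory and alerting layer built around calls like these, not the calls themselves.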
Good customer development! Steve and Eric would be proud.
Two times already. A lot of infrastructure is hidden from developers by default. My motivation is that I have sometimes been confused when speaking to DevOps.
Dunno if this is what you are looking for... but I've specced hundreds of high-end servers (all HP) at very large banks, on-prem and in colo, across 4 geos.
- often you are limited to "off the shelf" approved subsets of hardware + RHEL configurations that have been tested by a centralised team. These teams rarely have a clue about what they are doing, are highly inflexible, and their specs are very out of date.
- with colo there has been more flexibility, as the hardware is bought for the bank by a 3rd party
- a very large number of NICs is needed for bonding, plus separate physical networks for client connectivity / internal connectivity / exchange connectivity
- purchases are driven by business requirements from whatever trading area
- and by that area's own technology team
- then a massive multilayer procurement sign-off process that can take 6+ months
- the available hardware might change or be revised during this process, requiring a reset if the appetite exists
- usually not bought direct from the manufacturer but from a "pre-approved" vendor who has passed the bank's procurement onboarding procedures
- servers all have iLO for management, failure handling, etc. - this usually requires interaction with an official systems team via the internal bank ticketing/helpdesk system
- in colo you open tickets with e.g. Equinix to receive the hardware and have it put in your cage or whatever, pending a weekend install by the bank or a 3rd-party team who hook the servers up to switches as per the design agreed with network + systems and the business tech team
- often when live there are 2-3 different teams monitoring the hosts 24/7 using various tools
prob left out a load of stuff.