
Principal Operations Engineer, Hardware - Data Center Operations

Fluidstack

Fully Remote
📍 Remote - Worldwide
🇿🇦 SA Friendly: 0.7/1.0

🇿🇦 Hirezar Summary for South African Applicants

This fully remote, full-time position at Fluidstack is open to applicants from South Africa. The estimated monthly salary is R231,250 – R385,417. Because the position is remote, you can work from anywhere in South Africa, whether you're based in Johannesburg, Cape Town, Durban, or a smaller town.

Job Description

About Fluidstack

We exist to make humanity more free. For most of human history, you farmed or you starved. Technology gave people more time for the things they wanted to do, instead of things they had to do. Powerful AI will be the biggest lever for human choice we've ever built - but only if models are aligned with what humanity actually wants. There are groups building AI who don't share these goals. Whoever deploys frontier compute infrastructure fastest will decide whether AI expands human freedom or shrinks it.

We're singularly focused on delivering 10 to 100s of GWs of compute faster than anyone else, rethinking every layer of the stack. We acquire power, design and build data centers, and operate them - with teams spanning hardware and software. Speed and scale are our key differentiators. Come be a part of building civilization-scale infrastructure for AI.

We hire people who care deeply about this problem space. If that is you, please apply!

About the Role

We are seeking a Principal Operations Engineer, Hardware to serve as the most senior technical authority for the operational hardware fleet across our hyperscale AI data center portfolio. AI infrastructure lives and dies on the reliability of the compute itself - this role exists to ensure that the GPU systems, servers, and supporting hardware we deploy at scale are operated, maintained, and continuously improved at the standard the workload demands.

You will operate as the technical arm of senior operations leadership in the field - leading site assessments and operational audits, driving the technical readiness of teams ahead of site activation, reviewing hardware platforms and integration designs from an operational lens, and feeding operational learnings back into the hardware engineering, deployment, and supply chain organizations as we shift toward a productized, repeatable build model. You will be a force multiplier across our site hardware leads, deployment teams, and reliability engineers, and the connective tissue between hardware operations, hardware engineering, network, facilities, and customer-facing teams.

The ideal candidate has spent a career operating hardware at scale - in hyperscale data centers, large HPC environments, or comparable 24/7 infrastructure - and is equally comfortable diagnosing a stubborn boot failure on the floor, leading a fleet-wide root cause investigation, and pushing back on a vendor over a flawed RMA process. Formal engineering credentials are valued but not required - practical depth, judgment under pressure, the ability to teach, and the discipline to keep critical infrastructure running through change are what define this role.

Responsibilities
* 10+ years of hands-on experience operating mission-critical hardware infrastructure, with at least 5 years as the senior technical voice on a site, campus, or fleet.
* Data center operations experience strongly preferred; hyperscale, large HPC, cloud, or other mission-critical compute infrastructure experience considered.
* Deep working command of GPU systems, server platforms, storage infrastructure, firmware lifecycle management, and hardware diagnostics - earned in the field, not from a textbook.
* Demonstrated ability to author, approve, and execute high-risk MOPs and change records in live production environments.
* A track record of leading root cause analysis on significant hardware events and driving corrective actions to closure.
* A track record of holding OEMs, ODMs, service vendors, and deployment partners accountable - you know how to enforce a standard without burning the relationship.
* Strong written communication: operational health assessments, RCAs, procedure reviews, and design review feedback are second nature.
* Comfort operating as the senior technical voice across operations, hardware engineering, network, facilities, supply chain, and customer-facing teams.

Tips for South African Applicants

⏰

Timezone Advantage

South Africa (SAST, UTC+2) overlaps well with European business hours and has a few hours of overlap with the US East Coast. Mention your timezone flexibility in your application.

💰

Salary in Context

At R231,250/month, this role pays well above the average South African remote salary. The USD equivalent ($12,500/mo) benefits from the favourable exchange rate.

📋

Application Tips

Tailor your CV to international standards: use a clean format, highlight remote work experience, and include your English proficiency. Many SA applicants succeed by emphasising their strong work ethic and cultural adaptability.

🔌

Load Shedding Preparedness

If you're applying for a remote role, having a backup power solution (UPS, inverter, or generator) and mobile data as a backup internet connection shows employers you're prepared for South Africa's infrastructure challenges.

About Fluidstack

Fluidstack is an AI compute infrastructure company that hires remote workers from South Africa.