Understanding the Big Data DevOps Shortage

Big data skills shortages continue to grab headlines, as they have for years. A look beneath those headlines reveals important context. Things have changed significantly in the age of the cloud. Commercial platforms like Cloudera and Hortonworks have evolved and there are a variety of -aaSes in the cloud: PaaS (platform as a service), HaaS (Spark/Hadoop as a Service) or BDaaS (Big Data as a Service). But the move to the cloud has not simplified the skills issue. Now, a new set of skills is required for big data success in the cloud – and some of them are harder to find than others.

Deploying a big data platform in the cloud today takes much more than some Hadoop training and an AWS account. Platforms must meet enterprise production requirements, pass security and compliance certifications and be configured for different types of workloads. What’s not always apparent is the ongoing administration required after the platform is launched. Things change a lot in the Hadoop ecosystem and the cloud – so teams will need specific skill sets and coordination across activities. You can’t just have a great big data team – you need a killer Data DevOps team.

At a minimum, here are the roles required for big data and cloud. Note that this doesn’t include data scientists or analysts who will use the platform, just the team to build and run it.

  1. Cloud DevOps to administer cloud accounts and resources, and manage the cloud infrastructure.
  2. Hadoop Platform Administrator to provision and tune Hadoop/Spark nodes, along with the attached data stores and centralized object store required to deliver workload performance.
  3. Cloud Security Architect to administer security controls such as encryption, key-management, identities and role-based access control, as well as establish and ensure compliance controls.
  4. Data Management Lead to manage and administer data ingestion, data governance and logging as well as manage user access from a variety of data engineering, machine learning and SQL tools.
  5. Data Production Ops to cover first- and second-line alerting, support, root-cause analysis and upgrade/patching/validation issues. This is also a catch-all capability for operational tasks like sprint tracking, billing, SLA monitoring, and management.

Depending on the size of your company, these roles will need to be filled by full-time employees or possibly a mix of full-time and part-time employees and/or contractors. If that sounds expensive, well, that’s because it can be. I recently researched these roles across different job sites and found a huge delta in the salary ranges and availability. While your mileage may vary, this team will run you between $629,000 and $1.2M annually. Let’s hope your ROI can support that.

These are summarized results from research in May; you’ll find links to run your own current search or localise it for your region and industry.
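If you want to rerun the numbers for your own region, the arithmetic behind a team-cost range is simple: add up the low and high ends of each role's salary range, weighted by how much of a full-time position the role needs. Here is a minimal Python sketch of that calculation. The `Role` class, the `team_cost_range` function and the `fte` field are illustrative names, and every salary figure is a round placeholder, not a number from the research above; swap in the ranges you find in your own search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One role on the team, with a placeholder annual salary range in USD."""
    name: str
    low: int          # low end of the salary range (placeholder value)
    high: int         # high end of the salary range (placeholder value)
    fte: float = 1.0  # 1.0 = full-time, 0.5 = half-time, and so on


def team_cost_range(roles):
    """Return (low, high) annual cost for the team, weighted by FTE."""
    low = sum(role.low * role.fte for role in roles)
    high = sum(role.high * role.fte for role in roles)
    return low, high


def format_range(low, high):
    """Format a cost range as a readable string."""
    return f"${low:,.0f} to ${high:,.0f}"


# The five roles from the list above. All salary figures are identical
# round placeholders, NOT researched data; replace them with local numbers.
ROLES = [
    Role("Cloud DevOps", 100_000, 200_000),
    Role("Hadoop Platform Administrator", 100_000, 200_000),
    Role("Cloud Security Architect", 100_000, 200_000),
    Role("Data Management Lead", 100_000, 200_000, fte=0.5),  # part-time, as an assumption
    Role("Data Production Ops", 100_000, 200_000),
]
```

Calling `format_range(*team_cost_range(ROLES))` gives the annual range for the placeholder team. Changing an `fte` value shows how much a part-time or contract arrangement moves the total.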

That’s why, for every big data success story, there are sad tales of projects that can’t get off the ground because the company just can’t find the right team. Or they find a team, but they can’t get a platform working to enterprise production standards. Or they find a great team, launch a great platform – and then lose their key team members, who are now worth much more with a successful platform launch on their resume. This is a problem for results and for the bottom line, an issue recently explored in this blog on the DevOps Drag, which finds that 70-80% of big data project costs are often related to operations.

Here’s the good news. Fully-managed cloud services for big data and analytics can offload a lot of the challenges associated with getting big data platforms into production. There is a growing variety of services labelled “fully-managed” and “managed services,” so it’s important to understand exactly what’s included in the service, and to what extent big data DevOps is built in, along with security and other enterprise features.

But don’t let today’s big data skills shortage headlines scare you. At least part of the problem has been solved with new options that help you start fast, and run lean – without the six-figure annual price tag.

Article by Hannah Smalltree. Hannah is on the leadership team at Cazena, which offers enterprise Big Data as a Service. She’s worked for several data software companies and spent over a decade as a technology journalist, interviewing companies about their data and analytics programs.