You're Not Hiring a Unicorn Data Professional: You're Writing Comedy Gold
A Reality Check on Data Engineering Job Specs.
If you’ve ever browsed LinkedIn job listings for open data engineering positions and felt like you were reading a list of responsibilities for an entire data team, let me tell you: you’re not alone.
Somewhere between “Junior” and “Machine Learning Expert with 5 years of cloud experience”
The industry seems to have gone wild, and it’s time we called it out.
This post is for:
Aspiring and current data engineers confused about what they should be doing at different stages of their career.
Hiring managers and recruiters who need to understand what a good data engineer looks like and what an unreasonable ask sounds like.
Teams trying to plug five headcount gaps with one person, which is nothing more than a pipe dream.
The Role of a Data Engineer: the REAL definition
A data engineer builds and maintains the systems that move, transform and store data. The goal? Ensuring that data is reliable, accessible and usable for analysis, business operations and the machine learning work that data science teams run.
Key responsibilities
→ Designing and maintaining data pipelines.
→ Building data models for reporting and analytics.
→ Managing data storage systems (databases, data lakes, warehouses, lakehouses).
→ Ensuring data quality and governance.
→ Collaborating with data consumers (analysts and scientists).
That’s already a lot to deal with. But the kicker is, these responsibilities scale with seniority. A junior shouldn’t be expected to architect a distributed real-time data platform solo. Yet look at the influx of LinkedIn postings and you’d think that’s the status quo.
Here’s a Sensible Career Ladder for Data Engineers.
This should, in theory, help everyone understand exactly what they are looking for when it comes to new hires, and when to cull a ridiculous job advert:
Junior Data Engineer (0-2 years)
Focused on: Learning, supporting, basic development skills
Core Skills:
Writing simple SQL queries
Building/debugging existing pipelines
Understanding data structures and basic modelling
Learning version control and CI/CD basics
Should NOT be:
Owning infrastructure
Running stakeholder meetings or explaining business context
Designing systems from the ground up
Mid-Level Data Engineer (2-4 years)
Focused on: Building, collaborating, optimising.
Core Skills:
Data modelling
Designing moderate-complexity pipelines
Optimising queries and scripts
Working with cloud data tools like Azure Data Factory, AWS Glue, etc.
Can start looking at:
Mentoring juniors
Contributing to architectural discussions
Still should not be:
Owning ML pipelines
Gathering requirements from stakeholders or product teams
Senior Data Engineer (5-7 years)
Focused on: Leading, scaling, automating
Core Skills:
Owning major systems end-to-end
Leading infrastructure design
Enforcing best practices
Enabling the data platform for analysts, ML Engineers, etc.
Principal Data Engineers / Data Architects (7+ years)
Focused on: Strategy, governance, architecture
Core Skills:
Technical leadership across the data team
Driving long-term data platform strategy
Advocating for best practices org-wide
Partnering with management on data vision and platform capabilities.
The Problem with Job Specs Today
Here are real bullet points from a job advert hiring for a “junior to mid-level” data engineering position:
“3+ years in distributed systems”
“3+ years in data modelling”
“3+ years in CI/CD and version control”
“Nice to have: ML, BI and stakeholder management”
NICE TO HAVE? You’ve got to be kidding, right? You’re actively seeking a unicorn data professional across 3 different subteams.
Time to face facts: a junior is not an ML Engineer, a mid-level is not a DevOps expert, and a senior is not your part-time BI team.
I don’t care what these maniacs are saying: no single engineer, no matter the seniority, should do the work of more than ONE member of a data team.
Unless you want to pay them the salaries of five people.
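If you want a quick sanity check for an advert like the one above, here is a minimal Python sketch. It assumes the experience bands from the ladder in this post and a simple “N+ years” wording in each requirement. The names LADDER, years_requested and flag_requirements are made up for illustration, and this is a rough filter, not a real screening tool.

```python
import re

# Experience bands (min_years, max_years) taken from the ladder in this post.
# None means there is no upper bound. These bands are this post's opinion,
# not an industry standard.
LADDER = {
    "junior": (0, 2),
    "mid": (2, 4),
    "senior": (5, 7),
    "principal": (7, None),
}


def years_requested(requirement):
    """Return the number from an 'N+ years' style requirement, or None."""
    match = re.search(r"(\d+)\s*\+?\s*years?", requirement, re.IGNORECASE)
    return int(match.group(1)) if match else None


def flag_requirements(level, requirements):
    """Return requirements that ask for more years than the level's upper bound."""
    _, upper = LADDER[level]
    flagged = []
    for req in requirements:
        years = years_requested(req)
        if upper is not None and years is not None and years > upper:
            flagged.append(req)
    return flagged


spec = [
    "3+ years in distributed systems",
    "3+ years in data modelling",
    "3+ years in CI/CD and version control",
]

# Check the advert against the junior band it claims to target.
print(flag_requirements("junior", spec))
```

Run against the junior band, every one of those “3+ years” bullets gets flagged. Run against the mid-level band, none do, which tells you what the advert is really asking for.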
My Advice for Data Engineers
Don’t get roped into being an organisation’s one-man data army because they don’t want to pay for a fully fledged data-driven initiative.
If you’re early in your career and reading crazy job specs, just remember that’s not what you should be doing. More than likely it wasn’t even a tech person who wrote the spec. It was either GPT or some HR operative who doesn’t have a clue what is actually required.
Three things you should definitely do:
Push back in interviews - don’t let these hiring managers push crazy responsibilities on you. The wage will never equal the load, but the stress will outweigh your package.
Ask them about team size, division of responsibilities and expectations. That way you get a real understanding of how they operate and whether it’s a red-flag listing.
Don’t let the “we’re a startup” line guilt you into doing the work of an entire team for 20% of the cost.
My Advice to the Hiring Managers and Recruiters
Two words: PUSH BACK
If you see a job spec that’s an obvious case of “throw a net into the sea and hope there’s a big fish in with the junk from the sea floor”:
→ Ask who wrote it. Did the tech lead write it? Did anyone from the data team even have a say in what’s written down for this position? Most of the time it’s GPT or someone slapping it together from what they “know” or have seen on other postings.
→ Clarify the actual need. Are you trying to hire a data engineer or an entire data team?
→ Educate the stakeholders and explain how unrealistic expectations lead to poor hires, hefty burnout and major turnover. I’ve seen it myself.
Additionally, educate yourselves. Yes, recruitment is a nice £££ business, but face facts: you need to know what you’re hiring for and what responsibilities a person should have. You shouldn’t be pushing people into a stressful role because you’ll get a nice payday from it.
Wrapping this up, I’d like to point out that data engineering is a mega discipline. Treat it like one. Build cross-functional data teams, set some realistic expectations and pay for talent based on role and scope. Most of all:
Respect the ladder.
Otherwise, the only pipelines that will get built are the ones leading your best engineers out of the door.
I’ve left all SEO business out of this because I want people to understand this isn’t about driving traffic to my publication. It’s to make the point that we are going down the wrong path and it needs to be corrected.
If you’re unsure whether your job spec makes sense, or you want a second pair of eyes to scope your data estate and recommend what kind of data professional you actually need, I’m happy to help. Free of charge. No sales pitches, just honest guidance from someone who’s seen it go wrong but has also seen exactly how having a robust team in place leads to data-driven success.