What skills matter most to data engineers in 2025?
The old-guard favored technical prowess while the new-school needs to know the business, so what skills should you focus on? I share some success stories that accelerated my journey to Senior Data Engineer.
Read time: 7 minutes
If you look at job postings for data engineering roles, you’ll see a plethora of tools, technologies and frameworks listed. No one can know all of them, so I think it’s worth focusing on three foundational sets of skills: databases (both SQL and NoSQL), cloud systems and distributed compute, and communication (soft) skills. The first two will get you established as a junior data engineer, and the third will set you up to ascend the career ladder.
In this article we will discuss the evolution of the data engineer role and I will share some experiences which have helped accelerate my career progression.
The evolution of data engineering
There’s been significant change in the technologies used by data engineers over the last two decades. Similarly, the profile and expectations of data engineers have drifted over the last five or so years. Let’s discuss what’s changed and how it affects us now!
The old-guard
When I think of the old-guard I think of three things: technical mastery, a back-end focus, and working in a silo.
The old-guard had technical mastery: they understood technologies like Hadoop and Spark very deeply and were proficient in multiple programming languages such as Java and Scala. They had a deep understanding of data modelling and data architecture, but applied it to build an engineers’ platform that was used exclusively by engineers to serve data to downstream consumers.
These old-school data engineers worked almost exclusively on the back end, where they maintained and optimized data pipelines, storage solutions and distributed systems. They were often very siloed, working independently and interacting more with other IT staff than with analysts and business stakeholders.
I experienced this when I started working in data almost a decade ago. My responsibilities were entirely technical; I wasn’t required to have much domain knowledge, and so I didn’t. My focus was on architecture, maintaining Linux servers, managing software dependencies and writing code to ingest and transform data.
The new-school
Modern-day data engineers are still expected to make technical work their core contribution, but with the rise of low-code (and no-code) solutions, the technical bar seems lower than it was a decade ago. To progress in your career as a data engineer, you need to complement your technical skills with business acumen, focus more on end-to-end (full-stack) development, and have strong communication skills to enable collaboration with analysts and business stakeholders.
As a data engineer, you need to understand enough of the business context of the data you work with to deliver data products that directly drive decisions and outcomes. This also lets us as engineers be more involved in the end-to-end process, from data quality and data governance through to data strategy, and even creating data products that business teams can easily consume.
I have worked with a lot of established engineers, and I often see them struggle to communicate with less technical colleagues. Translating technical jargon into business terms, helping align technical teams and business units, and being involved in requirements capture all require effective communication. This isn’t a skill that comes easily to most technically minded people… we have to be intentional about developing it!
Fortunately, during my time working in academia (where I built simulations of planetary cores and ran machine learning on the data), I developed a passion for science communication. I put a lot of effort into developing my technical communication by presenting at conferences and running outreach events. These soft skills are completely transferable and are what helped me get promoted twice within a year.
What skills should you focus on in 2025?
The technical fundamentals are as important as ever: you need to be proficient with both SQL and NoSQL databases, as well as Python, so that you can write efficient ETL (extract-transform-load) pipelines and handle job orchestration. Becoming proficient with SQL and Python is where I advise anyone in data to start.
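To make the ETL idea concrete, here is a minimal sketch in Python that extracts rows from a CSV export, transforms them (dropping incomplete records and casting types) and loads them into a SQL table. The sample data, column names and the use of an in-memory SQLite database are illustrative assumptions; a production pipeline would read from real sources and be scheduled by an orchestrator.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this could be a file, an API response or a cloud bucket object.
RAW_CSV = """order_id,customer,amount_gbp,order_date
1,alice,20.50,2025-01-03
2,bob,,2025-01-04
3,alice,9.50,2025-01-10
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with no amount and cast each column to a proper type."""
    cleaned = []
    for row in rows:
        if not row["amount_gbp"]:
            continue  # skip incomplete records rather than loading NULLs
        cleaned.append(
            (int(row["order_id"]), row["customer"], float(row["amount_gbp"]), row["order_date"])
        )
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_gbp REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, SUM(amount_gbp) FROM orders GROUP BY customer").fetchall())
# [('alice', 30.0)]
```

Keeping extract, transform and load as separate functions makes each step easy to test on its own and to swap out, for example replacing SQLite with a cloud warehouse.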
With so many companies now in the cloud, it’s vital to understand the core concepts and to get hands-on experience with one of the big three (AWS, Azure, GCP). The one that makes most sense will depend on your geographical location and industry; AWS is still the most popular, but I really enjoy working with Microsoft Azure. The cloud gives us easy access to scalable compute and storage, so distributed processing (Spark, typically via PySpark) and platforms like Databricks, Snowflake and BigQuery are prominent. I encourage you to learn the fundamentals and focus on building experience with one tool of each type; if you do this well, the skills are transferable and you can learn another tool fairly quickly!
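Distributed processing engines like Spark follow a split-apply-combine pattern: data is divided into partitions, each partition is processed independently, and the partial results are merged. The pure-Python sketch below illustrates that idea on a single machine with a hypothetical event log; it is not Spark code and it runs sequentially.

```python
from collections import Counter
from functools import reduce

# Hypothetical event log; in Spark this would be a DataFrame partitioned across a cluster.
events = ["login", "click", "click", "logout", "click", "login", "purchase"]


def partition(data: list, n: int) -> list[list]:
    """Split the data into n chunks, standing in for how a cluster spreads data across executors."""
    return [data[i::n] for i in range(n)]


def count_partition(chunk: list) -> Counter:
    """Map step: each worker aggregates its own chunk independently of the others."""
    return Counter(chunk)


def merge(a: Counter, b: Counter) -> Counter:
    """Reduce step: combine two partial results into one."""
    return a + b


# Run sequentially here for clarity; a distributed engine would run the map step in parallel.
partials = [count_partition(chunk) for chunk in partition(events, n=3)]
totals = reduce(merge, partials, Counter())
print(totals.most_common(1))  # [('click', 3)]
```

In PySpark, the equivalent aggregation would typically be written as df.groupBy("event").count(), with Spark handling the partitioning and the shuffle between steps. The merge step needs to be associative and commutative so that partial results can be combined in any order, which is what lets the work scale out across machines.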
Soft skills round out what I consider to be the fundamental pillars of data engineering. Data engineers need to integrate technical work with business processes, and doing so requires working with cross-functional teams, including software engineers, data scientists, business analysts, etc. Being able to communicate with both technical and less technical colleagues is key to having impact. To go a step further, engineers need to be able to lead projects and take ownership of their delivery.
With these foundational skills in hand, you can build a data pipeline and curate datasets. Next, focus on developing the enabling skills needed to productionize your pipelines: how to write code tests, model data, perform data quality checks, use APIs, etc. In many instances, this is what will help you progress beyond a junior role. This is where the wacky stuff begins: you can either specialize in a specific tech stack or enrich your profile by developing a secondary specialty, such as data visualization by learning a dashboarding tool (e.g. Power BI or Superset) or machine learning by applying supervised learning.
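Data quality checks and code tests go hand in hand. Here is a minimal sketch, assuming rows arrive as Python dictionaries: the hypothetical check_quality function below checks completeness (required columns are non-empty) and uniqueness (no duplicate keys), and the test functions are written in pytest style so a test runner can pick them up.

```python
from collections import Counter


def check_quality(rows: list[dict], required: list[str], unique_key: str) -> list[str]:
    """Run simple data quality checks and return a list of failures (empty means all passed)."""
    failures = []
    # Completeness: every required column must be present and non-empty.
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) in (None, ""):
                failures.append(f"row {i}: missing {col}")
    # Uniqueness: the key column must not contain duplicate values.
    key_counts = Counter(row.get(unique_key) for row in rows)
    for key, count in key_counts.items():
        if count > 1:
            failures.append(f"duplicate {unique_key}: {key} ({count} rows)")
    return failures


# Tests in pytest style: plain functions with bare asserts.
def test_clean_data_passes():
    rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    assert check_quality(rows, ["id", "name"], "id") == []


def test_detects_missing_and_duplicate_values():
    rows = [{"id": 1, "name": ""}, {"id": 1, "name": "b"}]
    assert check_quality(rows, ["id", "name"], "id") == [
        "row 0: missing name",
        "duplicate id: 1 (2 rows)",
    ]


if __name__ == "__main__":
    test_clean_data_passes()
    test_detects_missing_and_duplicate_values()
    print("all data quality tests passed")
```

Returning a list of failures rather than raising on the first problem means a single run reports everything that is wrong with a dataset, which is more useful when giving feedback to a data supplier.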
Some success stories
I want to end by sharing a few experiences that helped me increase my impact by building relationships with business users and integrating with business processes. I hope these can help you identify opportunities to raise your profile and get recognition for your work.
To have your work be impactful and add value, you need to be able to build meaningful relationships with business users; you should take the initiative and reach out offering support for the things that cause them the most trouble!
For me, what started as impromptu calls with our stakeholders organically grew into spending ~40% of my time embedded with their team. I got to work with business users on real problems, understand their evolving needs and align the data engineering team’s efforts with them. I achieved this by asking about their pain points and identifying what they struggled with; by building capability and automation, I helped their team with their analysis.
As data engineers, the product we deliver is data. A trap that many fall into is thinking that their value increases if they produce more data. This is not true! Data is only valuable if it is being used, and in most cases is only used if aligned to a business problem or process.
There is more data than ever before, so it’s more important than ever to build data products (which can include curated datasets) that closely align with business KPIs and critical workflows. Instead of just delivering a dataset or table, deliver the data alongside aggregated values or visualizations aligned to the KPI of interest; this builds confidence in the newly produced data.
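As an illustration of delivering data aligned to a KPI, the sketch below aggregates a hypothetical order-level table into monthly revenue per region, then derives month-over-month growth. The table, column names and the choice of KPI are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical curated table: one row per order.
orders = [
    {"month": "2025-01", "region": "north", "revenue": 120.0},
    {"month": "2025-01", "region": "south", "revenue": 80.0},
    {"month": "2025-02", "region": "north", "revenue": 150.0},
    {"month": "2025-02", "region": "north", "revenue": 30.0},
]


def monthly_revenue(rows: list[dict]) -> dict:
    """Aggregate row-level orders into total revenue per (month, region)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["month"], row["region"])] += row["revenue"]
    # Sort by key so months appear in chronological order ("YYYY-MM" strings sort correctly).
    return dict(sorted(totals.items()))


def month_over_month(kpi: dict, region: str) -> dict:
    """Fractional revenue change versus the previous month present in the data for one region."""
    series = [(month, value) for (month, r), value in kpi.items() if r == region]
    # Assumes the previous month's revenue is non-zero.
    return {month: (value - prev) / prev for (_, prev), (month, value) in zip(series, series[1:])}


kpi = monthly_revenue(orders)
print(kpi)
# {('2025-01', 'north'): 120.0, ('2025-01', 'south'): 80.0, ('2025-02', 'north'): 180.0}
print(month_over_month(kpi, "north"))  # {'2025-02': 0.5}
```

Shipping a summary like this alongside the detailed table lets stakeholders check the numbers against figures they already trust.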
Remember, your job is to deliver value, not to be wedded to specific tools or ideologies. While some engineers gasp at the thought of notebooks, they can be useful. When I created a data validation toolkit, Jupyter notebooks were a good format because they have a low barrier to entry; this reduced an analyst’s time to evaluate a new dataset from days to hours.
As a data team grows, it reaches a tipping point where a lack of process hinders its ability to deliver value in proportion to its size. As a data engineer working alongside your management team, you can (and probably should) meaningfully contribute to creating and updating processes.
I have taken ownership of various processes which helped to optimize our data team’s delivery:
Data Acceptance Criteria: Before we established proper expectations for the data we commissioned, our team spent more time fixing data than ingesting and exploiting it. Drawing on my knowledge of data quality, data formats and data governance, we created a set of criteria outlining the expected file formats, data tables and data documentation. This reduced the effort of evaluating new datasets from days to hours.
Coding Standards: When our data team was starting out, everyone wrote code in their own way; collaboration was difficult and code reviews were almost impossible. I put together the initial versions of a Standard Operating Procedures document that outlined a coding standard. This greatly improved code quality, collaboration and speed of delivery. Since I left the team, it has been kept as a living document and has helped them continue to deliver efficiently.
Software Approval Processes: Once we had established our own data platform, we were assessing software manually to approve it for use. I recognized that software approval is really an ETL pipeline in disguise and built an automated pipeline using open-source software, reducing the approval time from a day to minutes. I will write up a technical deep dive on this in a future article.
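The idea that software approval is an ETL pipeline in disguise can be sketched in a few lines of Python. This is not the actual pipeline from that project. It is a hypothetical illustration in which pending requests are extracted, checked against an assumed licence allowlist and block list, and the resulting decisions are loaded into a register. The package names, licences and rules are all made up.

```python
from dataclasses import dataclass

# Hypothetical approval policy; a real one would come from your organisation's governance rules.
ALLOWED_LICENSES = {"MIT", "BSD-3-Clause", "Apache-2.0"}
BLOCKED_RELEASES = {"pkg-b==0.1.0"}  # stand-in for a vulnerability feed


@dataclass
class Request:
    name: str
    version: str
    license: str


def extract() -> list[Request]:
    """Extract: pending requests (hard-coded here; in practice from a ticket queue or package registry)."""
    return [
        Request("pkg-a", "1.2.0", "MIT"),
        Request("pkg-b", "0.1.0", "MIT"),
        Request("pkg-c", "2.0.0", "GPL-3.0"),
    ]


def transform(requests: list[Request]) -> list[tuple[str, str, str]]:
    """Transform: apply the policy rules and produce a decision plus a reason for each request."""
    decisions = []
    for req in requests:
        pinned = f"{req.name}=={req.version}"
        if pinned in BLOCKED_RELEASES:
            decisions.append((pinned, "rejected", "release is on the block list"))
        elif req.license not in ALLOWED_LICENSES:
            # Anything the rules can't decide is routed to a human rather than auto-rejected.
            decisions.append((pinned, "manual review", f"license {req.license} not on allowlist"))
        else:
            decisions.append((pinned, "approved", "passed automated checks"))
    return decisions


def load(decisions: list[tuple[str, str, str]], register: dict) -> None:
    """Load: record each decision in an approvals register (a dict here; a database table in practice)."""
    for pinned, status, reason in decisions:
        register[pinned] = (status, reason)


register: dict = {}
load(transform(extract()), register)
for pinned, (status, reason) in register.items():
    print(f"{pinned}: {status} ({reason})")
```

Routing anything the rules cannot decide to a human keeps the automation conservative: the pipeline only speeds up the clear-cut cases.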
Any processes that were nice to have but not on the critical path were deprioritized on the task board.
This article was really just me sharing my thoughts on where I would focus my efforts if I were a junior or mid-level data engineer. Do leave a comment below if you found this useful or if you disagree with me! I would love to hear how you see the skills profile of the modern data engineer.