By Nicola Wright
Is your business in the process of building a big data team?
Here’s a rundown of the people you need in your big data team, and the responsibilities involved in these common big data roles.
It’s no secret that companies are currently obsessing over how to derive value from the data they collect from customers and their operations.
Data is the most lucrative resource an organization has at its fingertips, and it’s a resource that multiplies substantially every second you’re doing business. Between the dawn of civilization and 2003, we created around five exabytes of information. Today, we produce that every 48 hours.
In fact, today it’s estimated that we produce 2.5 quintillion bytes of data per day – that’s a lot of data.
Collating and analyzing this ocean of information is crucial to making informed, proactive decisions and sustaining performance over the long term.
But, despite the growing stable of big data and business intelligence products hitting the market to help companies get to grips with their data, getting the most out of it still requires a human touch.
The first step in assembling your own big data team is understanding the roles and responsibilities associated with each of today’s big data positions, and determining the skills you need in your business.
Depending on the size of your business, and what you want to achieve with your big data strategy, you may not need a full fleet of data mavens in-house.
Big data talent is a serious concern. As more businesses wise up to the value data professionals can bring, competition for people with these skills is hotter than ever. Roles such as data scientist, data developer, and data engineer are growing as fast as any in the job market. In fact, according to the World Economic Forum, data analysts, data scientists, and big data specialists all rank within the top five jobs expected to be most in demand by 2025.
Let’s take a look at each role and find out how they could fit into your organization.
Businesses that want to start using big data techniques will find a wealth of options on the Amazon Web Services platform. In fact, more organizations host their data lakes and analytics on AWS than on any other cloud platform.
AWS offers a heap of cloud products and services to help its customers develop, secure, and run big data apps.
With no infrastructure to maintain, users can get right to work analyzing their data, scaling their resources up and down easily as data sets wax and wane.
New features are added to the vendor’s stable of data management and analytical tools all the time, giving users access to the latest big data and machine learning techniques on a secure and stable platform.
There’s Amazon S3 for secure, scalable object storage, Amazon Glacier for long-term backup and archiving, and AWS Glue for data cataloging, to name but a few.
When it comes to analyzing data stored on the AWS Cloud, users have a huge range of options depending on their needs.
Amazon EMR is designed for big data processing, while for warehousing and querying all types of data, there’s Amazon Redshift and Redshift Spectrum.
Then there are Amazon Athena and Amazon Elasticsearch Service, analytical tools that give users the power to monitor and scrutinize data in real time, among many other things.
An overview of the AWS data architecture can be found below:
Big data isn’t called big data for nothing.
The amount of data amassed by businesses, their partners, and their customers simply by existing is enormous. Not all of the data you collect will be useful; in a lot of cases, it won’t even be complete, accurate, or relevant.
Nothing poisons the well like shabby data. Work with bad data and you’ll get poor results, so you’ll need a data hygienist to sort, sift, and scrub your data so you’re only spending analytical resources on data that might yield useful insights.
Even the data that is relevant could throw a spanner in the works, especially if you’re rolling together data from different sources. You might have different data sets that record dates in different formats, for example.
In the world of data, there are many different “languages”. Not every source will record and store data in the same way, so it’s vital to get all of your data ducks in a row and make sure all data is comparable before you start looking for trends.
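Date formats are a good example of the kind of clean-up a data hygienist does day to day. The sketch below shows one way it might look in Python using only the standard library. It's illustrative, not a standard: the field names (`customer_id`, `order_date`), the list of accepted date formats, and the rule that two records are duplicates if they share a customer and a date are all hypothetical assumptions.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical formats used by different source systems. Order matters for
# ambiguous strings like "03/04/2021": the first matching format wins, so this
# sketch reads them as US month/day.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y"]


def normalize_date(raw: str) -> Optional[date]:
    """Parse a date string in any known format; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_records(records: list[dict]) -> list[dict]:
    """Drop incomplete or duplicate records and store dates as ISO 8601 strings."""
    seen = set()
    cleaned = []
    for rec in records:
        customer = rec.get("customer_id")
        parsed = normalize_date(rec.get("order_date") or "")
        if not customer or parsed is None:
            continue  # incomplete or unparseable: not worth analyzing
        # Hypothetical duplicate rule: same customer on the same day.
        key = (customer, parsed)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"customer_id": customer, "order_date": parsed.isoformat()})
    return cleaned


records = [
    {"customer_id": "C1", "order_date": "2021-03-04"},
    {"customer_id": "C1", "order_date": "03/04/2021"},  # same day, US format
    {"customer_id": "C2", "order_date": "4 March 2021"},
    {"customer_id": "", "order_date": "2021-03-05"},     # missing customer
    {"customer_id": "C3", "order_date": "not a date"},   # unparseable date
]
print(clean_records(records))
# [{'customer_id': 'C1', 'order_date': '2021-03-04'}, {'customer_id': 'C2', 'order_date': '2021-03-04'}]
```

The ordering of `DATE_FORMATS` is a real judgment call: a string like "03/04/2021" could mean 3 April or 4 March, so in practice a hygienist needs to know which convention each source uses rather than relying on a guess.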
Maintaining high data hygiene standards starts at the capture stage and involves every team member who touches the data at any point in its lifecycle. Even so, a dedicated data hygienist may be brought in on a contract basis, or during a data migration, to get things up to scratch.
In organizations that don’t have a full-time or permanent data hygienist, it’ll often fall to the likes of Data Administrators, Data Managers, and Database Officers to maintain a healthy data lake.
To handle data efficiently, you need to house and organize it in a way that makes it accessible.
Without a well-architected data management framework, your data will be all but unusable. A good framework gives your data scientists, engineers, and analysts a tidy, sensibly arranged library instead of a mountainous pile of books to rake through.
Data Architects use data-oriented programming languages to create relational databases and other data storage repositories.
They’ll visualize and design the best management model for a company’s data, ensuring that data is organized in a rational way so it can be queried logically and quickly.
Data Architects will have years of experience in areas like data modeling, data warehousing, database management, and ETL processes.
Desirable skills for the role often include MySQL, Microsoft SQL Server, and NoSQL databases, as well as Excel, SPSS, and programming languages such as Python, Java, C/C++, and Perl. Knowledge of data modeling tools like ERwin, Enterprise Architect, and Visio is also a plus.
Once the Data Architect has presented their vision for the cloud palace in which your data will be stored, the Data Engineer steps in to build it.
These specialist professionals use programming languages to construct and maintain the proposed framework and enable data to be searched and retrieved efficiently.
It’s super technical work that involves not only building the data warehouse, but constantly revisiting and improving it to ensure maximum efficiency. A Data Engineer will also create and document processes, outlining how other data professionals in the team will harvest, authenticate, and model the information.
Before big data truly took off, Data Architect and Data Engineer were often combined in a single role, with data pros both designing and building the systems.
In the past few years, given the increasing popularity and complexity of analytical solutions—and the sheer quantity of data we’re amassing—Data Engineer has emerged as a standalone position.
Your Data Engineer should have a solid background in data warehousing, plus experience with big data languages such as Python, R, SQL, and Scala; with SQL and NoSQL databases; and with the AWS Cloud.
A good understanding of big data platforms like Hadoop, Spark, and Kafka, and of visualization tools like Tableau, will also come in handy.
Once your data is properly stored and organized, it’s ready to be analyzed. That’s where your (surprise!) Data Analyst comes in.
Sometimes called a Business Analyst or Business Intelligence (BI) Analyst, these data wizards will delve into your data lake to uncover unique patterns and relationships that’ll help you make more informed decisions.
The role involves a combination of technical skills, programming knowledge, and statistical experience that help analysts ensure their conclusions are valid.
They’ll be able to surface useful insights from massive quantities of data, identify practical actions relevant to operational needs, and present their conclusions to a wide range of people across their business in a way that’s easy to understand and digest.
Remember, data and knowledge are two different things. As Carly Fiorina, former CEO of Hewlett-Packard, said: “The goal is to turn data into information, and information into insight.”
To get value from your data, you need someone who can process it and present it in a way that makes sense.
That’s why visualization is such a big part of this position; being able to share results with others in a way that’s clear and tangible is a real skill. A successful Analyst knows how to bring data to life, and showcase and communicate it in an impactful way.
Having a little creative flair (for when a pie chart just isn’t going to cut it), and knowledge of tools like Microsoft Excel, PowerPoint, Tableau, and Amazon QuickSight, is a bonus for Analysts.
They should also have all the standard data professional skills, such as familiarity with SQL, R, and Python, as well as a real knack for reporting.
Much like the Data Analyst, a Data Scientist will decode, interpret, and present insight from complex data to deliver real business value. What makes the Data Scientist different, however, is that they’re able to use machine learning and advanced programming to automate this analysis.
Data Analysts spot trends and patterns in data, but a Data Scientist can build predictive models, and create machine learning algorithms that continuously learn from data to produce accurate forecasts.
For example, your Data Scientist will be able to create algorithms that can spot trends, and train these algorithms to predict customer behavior, helping a business get ahead of the curve.
A Data Scientist should have a great head for statistics and critical thinking, a strong grasp of languages like Python, R, SAS, SQL, and Scala, and be able to wrangle and visualize both structured and unstructured data.
Given the speed at which big data is advancing, a sustainable data team needs to cultivate a culture of open-mindedness and continuous learning. To truly innovate, a data professional must be able to look beyond what came before and be prepared to adapt to what comes next.
Stereotypes would have us believe that scientists and technologists are recluses who lack communication skills, but that won’t fly for anyone on a big data team.
Big data pros, especially analysts and scientists, need to communicate effectively with people who won’t always “speak the same language,” be excellent storytellers, and be able to use visual communications to maximize impact.
One company’s “data analyst” could be another’s “data scientist” or “data visualizer”. In the big data world, there is no standard definition for job titles, so when you’re looking for talent, don’t limit your search by job title.
Also, don’t rule out a candidate because they don’t have the right bits of paper. Just because a candidate doesn’t hold a relevant degree doesn’t necessarily make them less capable. It’s important to delve deeper into their experience, look at what projects they’ve worked on, and what kind of potential a candidate might offer given a little direction.