In a technical field like data science, terminology and jargon get thrown around a lot. Based on my reading and on my work as an intern in a Data Intelligence department during my first-year summer break, I have tried to make some sense of it. As always, your feedback and suggestions are welcome.
The five main components of data science can be classified as follows:
- Data Collection
- Data Cleaning
- Data Mining
- Data Analysis
- Data Visualization
To make clearer what each of these components means and does, here is a brief overview of each.
DATA COLLECTION
This step is the collection of raw data, both structured and unstructured. It is also sometimes referred to as data extraction.
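To make the structured/unstructured distinction concrete, here is a minimal sketch using only Python's standard library. The CSV text, the column names and the free-text note are all made up for illustration; in practice the data would come from files, databases or APIs.

```python
import csv
import io

# Hypothetical structured source: a CSV export, inlined so the sketch runs as-is.
csv_export = """agent,customers_visited,sales
A,12,3400
B,8,2100
"""

# Hypothetical unstructured source: a free-text note with no fixed schema.
note = "Agent A said the client asked about bulk pricing."

# Structured data parses straight into rows with named fields.
structured_rows = list(csv.DictReader(io.StringIO(csv_export)))

# Unstructured data is simply stored as raw text at this stage.
unstructured_docs = [note]

print(structured_rows)
print(unstructured_docs)
```

Note that every value read from the CSV is still a string at this point; turning those strings into proper numbers belongs to the cleaning step.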
DATA CLEANING
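This step involves taking the raw data and preparing it in a usable format, which includes looking for duplicate records and any errors that may be embedded in the data. It is sometimes loosely lumped together with data warehousing, although warehousing strictly refers to storing the prepared data in a central repository.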
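As a rough sketch of what cleaning can look like, the snippet below drops exact duplicate records and discards rows whose numeric fields cannot be parsed. The records, the field names and the `clean` helper are hypothetical, and real pipelines usually repair or flag bad rows rather than silently dropping them.

```python
# Hypothetical raw records, with every value still stored as a string.
raw_rows = [
    {"agent": "A", "customers_visited": "12", "sales": "3400"},
    {"agent": "A", "customers_visited": "12", "sales": "3400"},   # exact duplicate
    {"agent": "B", "customers_visited": "n/a", "sales": "1500"},  # unparseable value
    {"agent": "C", "customers_visited": "8", "sales": "2100"},
]


def clean(rows):
    """Remove exact duplicates and rows whose numbers cannot be parsed."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip a record we have already kept or rejected
        seen.add(key)
        try:
            cleaned.append({
                "agent": row["agent"],
                "customers_visited": int(row["customers_visited"]),
                "sales": float(row["sales"]),
            })
        except ValueError:
            continue  # skip rows with malformed numeric fields
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```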
DATA MINING
The data, now in a usable format, is examined further to see how useful it is for analyzing or answering a business question. For example: does the data help in determining the sales outcome based on the number of customers visited by a sales agent?
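One simple way to probe the sales question is to check how strongly the two columns move together. The sketch below computes the Pearson correlation coefficient by hand; the numbers are invented, and a correlation check is just one possible first look, not the only thing data mining covers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical figures: customers visited per agent and the sales they closed.
visits = [5, 8, 12, 15, 20]
sales = [1100, 1900, 2500, 3300, 4200]

r = pearson(visits, sales)
print(f"correlation between visits and sales: {r:.3f}")
```

A value close to 1 suggests that the number of visits is worth carrying forward into the analysis step; a value near 0 suggests the data may not help answer this particular question.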
DATA ANALYSIS
As the name suggests, the data is used for various analyses, including predictive analytics, where regression models and machine learning techniques are used to determine trends or patterns in the data.
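To give a feel for the regression part, here is a minimal sketch of a straight-line fit using ordinary least squares, written out by hand. The data and the `fit_line` helper are hypothetical; real work would normally rely on a statistics or machine learning library and would validate the model before trusting its predictions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical figures: customers visited per agent and the sales they closed.
visits = [5, 8, 12, 15, 20]
sales = [1100, 1900, 2500, 3300, 4200]

slope, intercept = fit_line(visits, sales)

# Use the fitted trend to estimate sales for an agent who makes 10 visits.
predicted = slope * 10 + intercept
print(f"estimated sales for 10 visits: {predicted:.0f}")
```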
DATA VISUALIZATION
This step involves communicating the outcome of the preceding analysis to management in easily readable formats (reports, graphs, various charts, etc.). It's pretty much a part of Business Intelligence.
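As a toy illustration of turning numbers into something readable at a glance, the sketch below prints a plain-text bar chart. The `text_bar_chart` helper and the figures are made up; in practice this would be done with a plotting library or a BI tool, but the idea is the same.

```python
def text_bar_chart(values, width=30):
    """Render a dict of label -> number as horizontal bars of '#' characters."""
    peak = max(values.values())
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<8} {bar} {value}")
    return "\n".join(lines)


# Hypothetical sales totals per agent.
sales_by_agent = {"Agent A": 3400, "Agent B": 1700, "Agent C": 2100}

print(text_bar_chart(sales_by_agent))
```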