Although the terms data science and data analytics are often used interchangeably, they differ in scope. Data science is an umbrella term covering a wide range of fields, whereas data analytics is more focused and can be considered a subset of data science. To understand data science thoroughly, let us first look at the phases of the data analytics lifecycle.
Data analytics involves six main phases carried out in a cycle: data discovery, data preparation, model planning, model building, communication of results, and operationalization. The phases are followed one after another to complete one cycle. Notably, the process is iterative, and the team can move both forward and backward between phases.
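The forward and backward movement between phases can be pictured as a small state machine. The sketch below is only an illustration: the `Phase` enum, the `step` helper, and the sample walk are hypothetical names made up for this example, and the rule that the last phase wraps around to the first is an assumption about what "cycle" means here.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six lifecycle phases, in the order listed above."""
    DISCOVERY = 1
    PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATION = 5
    OPERATIONALIZATION = 6


def step(current: Phase, forward: bool = True) -> Phase:
    """Move one phase forward or backward."""
    if forward:
        # Assumption: finishing the last phase starts a new cycle at discovery.
        return Phase(current % len(Phase) + 1)
    # There is nothing before discovery, so stepping back from it stays put.
    return Phase(max(current - 1, 1))


# Hypothetical walk: the team reaches model planning, decides the data
# needs more work, steps back to preparation, and then continues.
path = [Phase.DISCOVERY]
for forward in (True, True, False, True, True):
    path.append(step(path[-1], forward))

print([p.name for p in path])
```

In a real project the decision to step back would come from the team's review of each phase, not from a fixed list of moves as in this walk.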
The data analytics lifecycle provides a framework for carrying out each phase well, from the start of a project to its completion. This framework was built by a large team of data scientists with much care and experimentation. The key stakeholders in data science projects are business analysts, data engineers, database administrators, project managers, executive project sponsors, and data scientists.
Let us now briefly discuss the six phases of the data analytics lifecycle followed in a data science project:
In the first phase, data discovery, the stakeholders perform the following tasks: examine business trends, study cases of similar data analytics projects, and study the domain of the business. The entire team assesses the in-house resources and infrastructure, the total time involved, and the technology requirements. Once these assessments are complete, the stakeholders formulate initial hypotheses for resolving the business challenges in light of the current market scenario.
In the second phase, data preparation, data is transformed from the legacy system into a form suitable for analytics, using a sandbox platform. A sandbox is a scalable platform commonly used by data scientists for data preprocessing. It provides high-performance CPUs, high-capacity storage, and high I/O capacity. The IBM Netezza 1000 is one such data sandbox platform, used for handling data marts. The stakeholders in this phase are mostly occupied with preprocessing the data to obtain preliminary results on a standard sandbox platform.
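To make the preprocessing step concrete, here is a minimal sketch of cleaning records exported from a legacy system. Everything in it is assumed for illustration: the column names, the sample rows, and the `prepare` helper are hypothetical, and dropping incomplete rows is just one possible policy. It uses only the Python standard library, not any particular sandbox platform.

```python
import csv
import io

# Hypothetical export from a legacy system: inconsistent spacing and
# casing, a missing value, and numbers stored as text.
LEGACY_EXPORT = """customer_id,region,monthly_spend
 101 ,north,250.50
102,NORTH ,
103,south,99
"""


def prepare(raw_text: str) -> list[dict]:
    """Return cleaned rows, dropping records that have no spend value."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        spend = row["monthly_spend"].strip()
        if not spend:
            continue  # skip incomplete records rather than guess a value
        cleaned.append({
            "customer_id": int(row["customer_id"].strip()),
            "region": row["region"].strip().lower(),
            "monthly_spend": float(spend),
        })
    return cleaned


rows = prepare(LEGACY_EXPORT)
print(rows)
```

Whether to drop, flag, or fill in incomplete records depends on the project; the team would normally settle that during this phase.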
The third phase of the lifecycle is model planning, in which the data analytics team plans the methods to be adopted and the workflow to be followed during the next phase, model building. At this stage, the division of work is decided so that each team member's workload is clearly defined. The data prepared in the previous phase is explored further to understand the features and their relationships, and feature selection is performed to choose the inputs to the model.
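One simple way to explore how features relate to a target, and to select among them, is to rank them by correlation. The sketch below assumes Pearson correlation and a cutoff of 0.5, neither of which comes from the lifecycle itself; the feature names, the toy numbers, and the `pearson` helper are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: two candidate features and the target to predict.
features = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "office_temp": [21.0, 19.0, 22.0, 20.0, 21.0],
}
sales = [2.1, 3.9, 6.2, 8.1, 9.8]

THRESHOLD = 0.5  # assumed cutoff, chosen only for this example

# Score every feature against the target, then keep the strongly related ones.
scores = {name: pearson(values, sales) for name, values in features.items()}
selected = [name for name, r in scores.items() if abs(r) >= THRESHOLD]

print(selected)
```

Correlation only captures linear relationships, so a team would usually combine a check like this with domain knowledge before settling on the inputs.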
The next phase of the lifecycle is model building, in which the team develops datasets for training, testing, and production purposes. The model is then executed according to the plan made in the previous phase. The environment needed to run the model is also decided and prepared, so that a more robust environment can be set up if required.
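Developing separate training and testing datasets can be sketched as a shuffled split. This is a minimal illustration, not a prescribed method: the `train_test_split` helper, the 80/20 ratio, and the fixed seed are assumptions, and the records are placeholder integers. It does not cover the production dataset also mentioned above.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for the prepared data.
records = list(range(10))
train, test = train_test_split(records)

print(len(train), len(test))
```

Keeping the test records out of training is what lets the team judge how the model behaves on data it has not seen.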
Phase five of the lifecycle, communication of results, checks the results of the project to determine whether it is a success or a failure. The entire team, along with the stakeholders, scrutinizes the results to draw inferences from the key findings and summarize the work done. The business value is also quantified, and a detailed narrative of the key findings is prepared and discussed among the stakeholders.
In phase six, operationalization, the team prepares a final report along with the briefings, source code, and related documents. This last phase also involves running a pilot project to implement the model and test it in a real-time environment. Because data analytics helps build models that lead to better decision-making, it in turn adds value for individuals, customers, business sectors, and other organizations.
Across these six phases, the stakeholders who can be involved in planning, implementation, and decision-making are data analysts, business intelligence analysts, database administrators, data engineers, executive project sponsors, project managers, and data scientists. All of them are closely involved in the planning and completion of the project, keeping in mind the crucial factors for its success.
Hope this was helpful.