In the late 1980s, data warehouse technology, generally categorized as online analytical processing (OLAP), was introduced by relational database management system (RDBMS) companies to support business decisions and business intelligence. It was originally designed to archive large amounts of data out of production databases and to keep those databases lean for good performance. In data warehousing, multiple copies of data are located on multiple database servers referred to as data marts, which can be independent or enterprise-wide. From there, data is extracted and loaded into analytical data marts, where data analysts create their algorithms and run their jobs; one data mart may serve statistical analysts and another business users. While data warehousing has not failed to create business value through detailed reporting based on
complex statistical modeling [1,2], it is challenging to continuously move data over the network, and yielding results takes a long time [3]. Furthermore, there are limits on the volume of data the system can store. In addition, data is now generated continuously, which makes it difficult to process big data with this technology. Big data has recently garnered much attention in government, industry, science, engineering, healthcare and medicine, finance and, most prominently, business [4]. Data generated in these areas are characterized by high volume, an inability to fit the relational database model, and rapid generation, capture, and processing [1]. Therefore, a major challenge facing organizations, industries, and other business sectors is how to design appropriate techniques to handle and process this large volume of data to ensure effective and efficient decision-making. Recently, big data and business analytics approaches have been developed and
implemented to analyze the large volumes of data generated by different business organizations. Every business now needs faster insight into growing volumes of transactional data, and analyzing data in real time helps organizations understand the past and foresee the future. This is the beauty of streaming analytics: knowing what occurred (descriptive), understanding why it happened (diagnostic), looking ahead to what might take place (predictive) and, ultimately, determining how to influence future occurrences (prescriptive). These four analytics flavors, explained later in this article, have huge business benefits but are progressively more difficult to implement and use. The big data opportunity is not only about achieving high efficiency in business operations; there are also important opportunities for economic growth and for improving society's standard of living. Big data analytics can improve business and industrial outputs in various ways, including improved healthcare delivery, a higher standard of education, stronger national security, and better governance [5,6]. In addition, it has the potential to assist policy-makers to gain insight in
enacting policies that provide a safe environment for investors, help waste managers identify the types of waste most generated in a particular locality, and offer insight into the sharing of waste collection resources. Moreover, education monitoring agencies can deploy big data and business analytics approaches to evaluate teacher performance and improve work attitudes. Furthermore, mobile network location data can be used in traffic management to prevent traffic jams in big cities or better plan public transport systems. The goal of this study is to conduct a comprehensive investigation into big data and business analytics methods for improved business decision making, covering technological approaches, applications, and open research challenges. The study also draws attention to the tremendous benefits big data has brought to companies in developed countries and how these can be replicated by indigenous business organizations. Moreover, the study discusses various challenges facing big data analytics, with a focus on data security, management, characteristics, regulation, and compliance. Research on and implementation of big data analytics have been conducted by researchers and industries for over a decade, owing to the vital applications of big data in areas such as healthcare, business decision-making, educational development, network optimization, travel estimation, and financial
services. Consequently, quite a number of studies and reviews on big data analytics, its implementations, and related technologies have been published in recent times. Sing et al. [7] reviewed hardware and software parameters for effective big data analytics development. Additionally, Hashem et al. [4] presented a taxonomy and the intersection of cloud computing and big data analytics. However, these studies focused on big data in cloud computing and on software and hardware parameters such as data availability, scalability, and data size for implementing big data analytics; they did not discuss the important big data analytics tools and their strengths and weaknesses. Recently, reviews of big data analytics, open source tools for big data implementation, and iterative clustering algorithms for big data analysis were
presented in [8,9,10]. Tsai et al. [8] outlined big data analytics approaches in terms of data mining and knowledge discovery, primarily discussing data mining algorithms that can be extended for big data analytics; nonetheless, the challenges, applications, current tools, and data sources for big data analytics were not comprehensively discussed. Lanset et al. [9] presented open source tools for big data analytics with their advantages and drawbacks, but the review is narrowed to tools only, and other criteria for effective big data implementation were not sufficiently covered. A closely related survey was presented recently by Mohammedi et al. [11], which discussed big data technologies, applications, and open source tools for big data analytics. However, our study differs from their review in several ways. First, the present review provides a broader view by focusing on recent trends in big data and business analytics
development. Second, we discuss platforms and open source tools with their strengths and weaknesses. Third, this study presents big data success factors for analytics teams, their major functions, and the challenges of implementing analytics in organizations. Fourth, the current study presents recent data sources and applications for big data and business analytics. Finally, the review outlines and discusses open research directions in big data and analytics. The review is a timely
exploration of big data and business analytics. The major differences between recent reviews and the current study are presented in Table 1. The remainder of this paper is organized as follows: Section 2 discusses recent developments in big data technologies.
Section 3 presents big data analytics platforms while Section 4 explores the success factors and challenges of big data implementation. Section 5 outlines the main applications and data sources for big data and business
analytics. Section 6 summarizes the study and explores open research directions. Figure 1 outlines the structure of the paper.

2. Recent Developments in Big Data Technologies

Big data emerged for business with the development of social
media and weblogs. This has directed basic analytics and business intelligence (BI) activity toward new data sources and offers deep, real-time analytics with operational integration. The volume of data generated in the digital world grows exponentially and has become difficult to manage using data warehouse technology. The massive amounts of raw data generated by various sources, which require big data technology for analysis, have been reported in a number of studies
recently [12,13]. For instance, Wal-Mart processes more than a million customer transactions hourly and stores 2.5 petabytes of customer data
[14,15]. Similarly, the Library of Congress collects 235 terabytes of new data per year and stores 60 petabytes of data. Over 5.5 billion mobile phones were in use in 2014, each creating about one terabyte of call record data yearly. International Data Corporation (IDC), a premier global market intelligence firm, reported that the digital universe, which stood at 4.4 ZB in 2013, will grow to 44 ZB by 2020 [16]. In addition, a study by McKinsey reveals that around 30 billion pieces of content are uploaded to Facebook, while the value of big data for the healthcare industry is about $300 billion [12]. These
growths are driven by technological change and by internal and external activities in electronic commerce (e-commerce), business operations, manufacturing, and healthcare systems. Moreover, recent developments in in-memory databases have increased database performance, while the Internet of Things (IoT) and cloud computing facilities make persistent large-scale data collection, storage, and transformation achievable. The surge in data volume is driven
by a number of technologies, which include:

Distributed computing: Large-scale distributed computing systems, based on open source technology, provide direct access and long-term storage for petabytes of data while delivering extreme performance.

Flash memory: Flash memory in solid-state drives is becoming universal in computers. It delivers random-access speeds of less than 0.1 milliseconds, unlike disk access times of 3 to 12 milliseconds, and there is a high possibility that future big data solutions will use much more flash memory to improve data access times [17].

Mobile devices: These represent computers everywhere; they create much of big data and equally receive outputs from big data solutions.

Cloud computing: This created an entirely new economy of computing by moving storage, databases, and services into the cloud, and it offers convenient access for rapidly deploying big data solutions.

Data analytics: This is a multistage approach that includes collecting, preparing, processing, analyzing, and visualizing large-scale data to produce actionable insight for business intelligence.

In-memory applications: These are significantly increasing database performance
[12]. A huge percentage of the data for big data analytics is unstructured data derived from various sources and applications such as text files, weblogs, social media posts, emails, photos, audio, and video. Big data systems are designed to handle and manage such unstructured data using key-value pairs. The concept of big data has been defined by Will Dailey and by Gartner [17,18]. Dailey [17] defined big data as “a supercomputing environment engineered to parallel process compute jobs across massive amounts of distributed data for the purpose of analysis.” He viewed big data as a global data fabric in action and the centerpiece of the entire biosphere of modern computing; the global data fabric idea shows how big data creates strong connections among institutions and enables them to work as a team. Gartner [18], on the other hand, defined big data as high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making. Big data analytics has been actively implemented in various areas to support effective business decision making. For example, a solution can be developed to tie a customer's or merchant's bank verification number (BVN) and subscriber identification module (SIM) registration details to a unique digital identity. The solution would use this unique digital identification number (ID) and stream mobile payment transaction data from mobile devices into a big data repository. The collected data are continuously monitored, and standard machine learning techniques can be applied to discover occurrences of fraudulent or false payment alerts from a customer to a merchant. Such an occurrence would trigger a warning alert that could be shared with the mobile operators and the merchant's bank, possibly even before the merchant releases his product.
At the mobile operator's end, the SIM registration record and Global Positioning System (GPS) technology can be used to build the offender's crime chart and alert the police for an arrest. At the back end, an intelligent agent model running in the bank's application would trigger a warning alert advising the merchant to ignore the transaction request. In the big data repository, all of this data can then be mapped to other data, such as network failure logs, failed payment transactions, technology awareness data, and wrong debit records. These can undergo further analysis to understand users' experience and ascertain the root cause of the low acceptance of mobile money by merchants across the country. The information could then be used to develop an intelligent business model and enable policies that will build merchants' and customers' trust in mobile money payments. This, in general, will help actualize the government initiative of a cashless society [18]. Big data are characterized by various vectors, as outlined by Gartner and shown in Figure 2: volume, variety, velocity, veracity, and value. Volume concerns the size of the datasets generated through various applications and sources, which grow from megabytes to petabytes. Variety refers to the heterogeneous nature of the data that constitute big data, including textual data, social media data, traffic information, health-related data, and other multimodal data. Velocity refers to the speed and dynamic nature of the data collection process and how these data are generated in real time. Veracity depicts the reliability of data sources and whether they can be trusted. Finally, value refers to the insight and hidden worth that can be discovered from large datasets [4]. These vectors make it challenging for traditional data warehouse technology to handle data volumes of hundreds of terabytes [5,13].
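As a toy illustration of the monitoring step in the mobile-payment scenario above, a claimed payment can be cross-checked against the transactions actually recorded for a customer's digital identity. All field names and records here are hypothetical, and a production system would use streaming infrastructure and machine learning rather than exact matching; this sketch only shows the flagging idea.

```python
# Illustrative sketch only: flag payment alerts that have no matching
# transaction recorded for the claimed digital identity. Field names
# ("customer_id", "amount") and thresholds are hypothetical.

def flag_false_alerts(claimed_payments, recorded_transactions):
    """Return claimed payments with no matching recorded transaction."""
    recorded = {(t["customer_id"], t["amount"]) for t in recorded_transactions}
    return [p for p in claimed_payments
            if (p["customer_id"], p["amount"]) not in recorded]

recorded = [{"customer_id": "BVN-001", "amount": 5000}]
claims = [
    {"customer_id": "BVN-001", "amount": 5000},   # genuine payment
    {"customer_id": "BVN-002", "amount": 12000},  # no matching record
]
suspicious = flag_false_alerts(claims, recorded)
print(suspicious)  # the BVN-002 claim would trigger a warning alert
```

In the scenario described above, each flagged claim would be the trigger for the warning alert shared with the mobile operator and the merchant's bank.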
Furthermore, big data is not a fixed quantity, is not the same for all companies, and does not automatically mean better data. There is no threshold amount of data beyond which data becomes "big"; the size of big data varies from organization to organization. Bigger data is not necessarily better data, but some data is almost always better than no data [19,20,21]. Big data analytics provides a host of great new tools, including business analytics for visualizing and manipulating data insights, making it easy to present data as charts, graphs, and 3D models. Big data analytics is therefore a collection of tools and techniques aimed at handling the large volumes of unstructured data that are beyond the capability of traditional database systems. Big data analytics solutions help organizations see changes in their business and innovate in real time [22]. Different companies have different use cases and, obviously, different data; a solution that works for one company may be ineffective or completely wrong for another. While it is valuable to benchmark others, it is necessary to understand the motivations that drive their technology choices and the analytics they use to capture the true sensitivities of their businesses. Replicate solutions where it makes sense, but above all understand your own business drivers for applying big data. Recent analyses show that big data giants like Google, Facebook, and Twitter have used big data analytics effectively. Google indexes the entire internet for rapid searches and was reported to process 24 petabytes of data per day as far back as 2009. It offers cloud storage (Google Drive) and a big data solution in Google BigQuery. Moreover, Google performs machine learning and analytics on massive datasets (think reverse image search and voice recognition). With this rapid growth, it remains the world's leading search engine.
On the other hand, Facebook and Twitter each store information on over a billion users, with hundreds of millions of shares, likes, tweets, and image posts a day that must be tracked. They use machine learning tools and algorithms to recommend friends and display trending topics. Estimated 2014 revenues were $12.5 billion for Facebook and $1.4 billion for Twitter. Other businesses that have successfully implemented big data analytics frameworks include Wal-Mart and American Express. Wal-Mart uses big data and machine learning to improve product searches and recommendations; the adoption saw its purchase completion rate increase by 10-15 percent. American Express analyzes its big data to predict customer churn and can identify 24% of the Australian accounts that will close within four months. Macy's adjusts product pricing in real time for millions of items [23,24]. Banca Carige implemented the IBM® DB2® Analytics Accelerator on a new IBM zEnterprise® EC12, enabling rapid query response times and giving over 1,000 business users fast access to vital insights. The positive results that various business organizations have derived from big data analytics have driven the development of numerous tools to aid organizational big data analysis. These tools are discussed later in this paper, with their strengths and weaknesses outlined, to aid organizations' choice of tools for their data analysis. Analytics involves the use of statistical techniques (measures of central tendency, graphs, and so on), information system software (data mining, sorting routines), and operations research methodologies (linear programming) to explore, visualize, discover, and communicate patterns or trends in data [25]. For example, weather measurements collected by meteorological agencies can be analyzed and used to predict weather patterns. Furthermore, analysis of business data can hold the key to the development of successful new products.
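The descriptive and predictive flavors of analytics mentioned above can be illustrated on a toy dataset of daily temperature readings. The figures are made up, and a naive least-squares trend line stands in for a real forecasting model; this is only a sketch of the two analytics steps, not an endorsed method.

```python
import statistics

# Hypothetical week of daily temperature readings from a
# meteorological station (illustrative values only).
temps = [21.0, 22.5, 23.1, 24.0, 24.8, 25.5, 26.2]

# Descriptive analytics: what happened (measures of central tendency).
mean_temp = statistics.mean(temps)      # average over the week
median_temp = statistics.median(temps)  # middle reading

# Predictive analytics: a naive least-squares trend over day
# indices 0..6, extrapolated to forecast day 7.
n = len(temps)
x_mean = (n - 1) / 2
slope = sum((i - x_mean) * (t - mean_temp) for i, t in enumerate(temps)) \
        / sum((i - x_mean) ** 2 for i in range(n))
forecast = mean_temp + slope * (n - x_mean)

print(round(mean_temp, 2), round(forecast, 1))  # 23.87 27.2
```

Diagnostic and prescriptive analytics would then ask, respectively, why the trend is rising and what action to take about it, which typically requires richer data than a single series.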
The analytics process in a big data world shows how to tap into the powerful tools of data analytics to create strategic advantage and identify new business opportunities. It has wide applications, including credit risk assessment, marketing, and fraud detection. There are many types of analytics approaches, which can be categorized as:
These analytics approaches can be used independently or in combination by an organization to provide information for decision making. For instance, Marist College in the USA implemented an open source analytics platform from Pentaho to identify students who may be at risk of dropping a class and to intervene in time to help them complete the course successfully. The process works by aggregating basic student data, such as GPAs, SAT scores, addresses, and other demographic data, and combining this information with course-specific data, such as how often students submit assignments and engage with instructors through online forums. The information is analyzed through predictive modeling and data mining, and the outcome presents an accurate picture of who is likely to drop a particular class. To assist the students at risk, prescriptive modeling is applied to give instructors insight into how to prevent the foreseen outcomes using an approach that engages the class as a whole [27]. Analytics, business analytics (BA), and business intelligence (BI) are often used interchangeably in the business literature, and all convert data into useful information. However, they differ in the purpose and methodologies used for each of the descriptive, diagnostic, predictive, and prescriptive analytics types, and analytics can involve any one of these four processes. For clarity, definitions of these terms are presented below:
In the past, business analytics and business intelligence were used on structured DBMS-based content to report and understand what happened in the past [28]. With the growth of big data, they can be used alongside big data analytics techniques to extract actionable insight from data using analytical processes and tools. Their implementation is seen in structured data analytics, text analytics, web analytics, network analytics, and mobile analytics [29,30]. Moreover, the volume and velocity of big data present an opportunity to use big data and analytical tools to predict the future and make new discoveries [31]. Business demand for business analytics and business intelligence has been demonstrated in a number of recent studies [32,33]. Moreover, successful business intelligence and analytics applications have been reported across a broad range of industries, from healthcare and airlines to major IT and telecommunication firms [34]. Most of the successes recorded by organizations deploying big data analytics are in developed countries, which is perhaps why comparable successes have not yet been seen for businesses in developing countries. International Data Corporation (IDC) showed in 2011 that business analytics was the second-highest Information Technology (IT) priority for large enterprises that year [35]. An online survey of 930 businesses across the globe in various industries, conducted by [29], provides insight into the current state of business analytics in today's organizations. The findings highlight the fact that most organizations still rely on traditional technology and depend on spreadsheets for business analytics. There is moderate growth in the use of business analytics within companies; nonetheless, it is narrowly used within departments or business units and is not integrated across the organization.
For some organizations, analytics are used as part of the decision process at varying levels. In addition, organizations are in search of analytics that will primarily help in reducing costs, improving the bottom line, and managing risks. Meanwhile, concerns about data accuracy, consistency, and even access remain a challenge in the adoption and use of business analytics. Many organizations lack the skills to implement analytics, and some businesses that have attempted it lack the knowledge to apply the results. Companies that have built an "analytics culture" are reaping the benefits of their analytics investments. Therefore, bridging the knowledge gap so that organizations can apply big data and business analytics is vital for effective decision making and business success. To help bridge this gap, this paper also discusses the various teams in a big data analytics framework in Section 4. These teams include business experts, big data analysts, big data architects, and Hadoop operators and engineers.

3. Big Data Analytics Platforms

Big data technology consists of a large number of open source software components, mostly Apache projects [36], available for constructing a big data platform. This software is designed to work in distributed and cloud computing environments. However, common problems faced by computer scientists in designing efficient and effective big data computing platforms include: how to move large volumes of transactional data across the backplane; how to move large volumes of static data across the network; how to process large volumes of data very fast; how to ensure even job scheduling and fair usage of resources; how to keep errors from interrupting other jobs; and how to coordinate and optimize resources. Earlier solutions addressed these problems at the hardware level, which significantly increased cost. More recently, Hadoop was designed as an open source framework to handle big data analytics through a batch processing approach.
It was designed on principles that include less dependency on expensive high-end hardware platforms and infrastructure, parallel processing to reduce computing time, moving the computation to the data rather than moving data from disk to a central application, embracing failure, and building applications that depend less on the underlying infrastructure by exploiting Hadoop's flexibility. These design principles help reduce cost, optimize the platform, speed up processing, and achieve efficiency [17]. In this section of the paper, the Hadoop ecosystem that enables the implementation of big data and business analytics is explained. We outline the structure, components, and tools that provide effective and efficient processing of big data.

3.1. The Hadoop Ecosystem

Hadoop is an open source project led by Apache. It was originally designed to handle massive amounts of data rapidly, efficiently, and inexpensively, and it handles both structured and unstructured data. Moreover, Hadoop stores the bits and bytes of data on commodity hardware. In addition, Hadoop is user-friendly and scales massively: there can be 10, 20, 30, 40, or more racks filled with data nodes in a single supercomputing platform. It has the intelligence to run both the distributed file system and the parallel processing work. Hadoop consists of three parts: the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), and Common [17], as shown in Figure 4. HDFS is the storage layer, responsible for creating a distributed repository, while YARN is the data refinery layer, a processing level for scheduling parallel compute jobs. This structure abstracts away the complexities of distributed computing. YARN provides resource management and job scheduling in the Hadoop distributed processing platform.
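As a concrete illustration of the parallel compute jobs that YARN schedules, the classic MapReduce word count can be simulated in a single Python process. This is only a sketch of the programming model, not Hadoop code: on a real cluster the framework distributes the map and reduce phases across data nodes and performs the shuffle over the network.

```python
from collections import defaultdict

# Single-process simulation of the MapReduce programming model.

def map_phase(lines):
    """Emit (word, 1) pairs, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "data tools"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

The appeal of the model is that the programmer writes only the map and reduce functions; the platform handles distribution, scheduling, and failure recovery.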
Moreover, Hadoop YARN extends the capability of Hadoop to support a variety of applications, removing the earlier limitation of Hadoop to running only MapReduce applications. Consequently, YARN enables Apache Hadoop to provide services such as interactive querying, data streaming, and real-time analytics [4]. With Hadoop, writing a MapReduce job is easy for the programmer, who does not have to determine data locations or sizes or the number of parallel compute jobs. The primary components of a Hadoop cluster are the master server, the switches, the racks, and the data servers, as shown in Figure 5; the data servers are commonly called data workers, data nodes, or just nodes. The master server is responsible for managing and coordinating the entire Hadoop cluster (the data nodes). It performs health checks and takes corrective action when required, maps the location of all the data, directs all data movement, schedules and reschedules all compute jobs, and handles errors (including the loss of a data node and the rescheduling of failed compute jobs). The data server is responsible for data storage and processing and provides resources such as CPU and memory; in addition, it reports health-check status and ongoing job progress during data processing. Another important component of the Hadoop ecosystem is Common. Common is made up of utilities and tools for various operations such as codec compression, error detection, input/output utilities, and authorization of proxy users. Furthermore, Common is responsible for data and user authentication, service-level authorization, and the configuration of rack awareness [9]. Generally, Hadoop is configured on rack-based servers. On top of each rack, a network switch is configured for intra-rack communication, and another network switch handles communication between the rack switches and the client that runs the Hadoop client-related software.
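The master server's bookkeeping of where data lives can be sketched with a toy block-placement routine. This is a simplified illustration, not HDFS's actual algorithm: the block size follows HDFS's configurable 128 MB default, the node names are hypothetical, and real HDFS placement also takes rack awareness into account.

```python
# Illustrative sketch of the idea behind distributed block storage:
# split a file into fixed-size blocks and assign each block to several
# data nodes so replicas survive a node failure.

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, HDFS's configurable default

def place_blocks(file_size, nodes, replication=3):
    """Map each block index to the nodes that will hold a replica."""
    n_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    placement = {}
    for b in range(n_blocks):
        # Round-robin placement for illustration; real HDFS also
        # considers rack topology when choosing replica locations.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]
plan = place_blocks(file_size=300 * 1024 * 1024, nodes=nodes)
print(plan)  # a 300 MB file becomes 3 blocks, each with 3 replicas
```

With a plan like this, losing one data node never loses data, and the master can reschedule a failed compute job on another node holding the same block.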
Hadoop uses HDFS for holding files. HDFS is responsible for breaking large files into smaller blocks (128 MB by default, configurable), placing them on different slave nodes, and replicating them to provide high availability.

3.2. Common Big Data Analytics Tools

Traditionally, data and business analytics are performed using an integrated suite of machine learning and data mining algorithms [4]. These tools provide mechanisms to analyze small- to large-scale data for the business decision-making process. The machine learning algorithms and tools for data analytics can be broadly categorized into:
However, the huge volume of big data has rendered the traditional data analysis approach ineffective for processing the amounts of data generated in the current cyber-physical and mobile connected world. Therefore, various big data tools have recently been proposed and implemented for efficient data generation, transmission, processing, storage, and analysis [10]. Big data analytics tools and approaches are shown in Figure 6. These tools are continuously updated, and many new tools are introduced on a regular basis. There is usually some meaning behind the names given to the software projects, but no criteria are defined for the namespace, so there is no systematic connection between names. For example, Flume is named after the water race used in some sawmills to bring logs to the mill, and Pig was named for the fact that pigs eat anything. A few of the most common tools used in a Hadoop cluster are depicted in Figure 7. Other popular tools can be found from distribution companies such as Cloudera, Hortonworks, IBM, or MapR.
Apart from the above-listed tools, other tools for big data storage, processing, and management include Apache Cassandra [49], NoSQL stores, ZooKeeper [50], Avro [51], Chukwa [52], and the Python [53], R [54], and Scala [55] programming languages. These tools' key functions, features, strengths, and weaknesses are summarized in Table 2.

3.3. Functional Layers of Big Data Architecture

The open source strategy has created a large number of software projects that give users many options for the big data analytics process. Users are often faced with the challenge of choice, and the wrong choice of a software solution may lead to business failure. Big data solution selection should be driven by the use case, product reputation, team expertise, and knowledge of current trends. Selecting a solution requires knowledge of the functional areas (the big data stack) of a big data solution and mapping the selected software onto the correct functional layer [1]. A Hadoop cluster is constructed by combining any number of the software tools described in Section 3.2. The big data stack, or architecture, can be viewed as a layered model consisting of a number of functional areas for big data analytics: infrastructure, data repository, data refinery, data factory, data fusion, and business value. Each functional area has tools that allow the organization to leverage Hadoop's power of combining multiple data sources to gain new knowledge. These layers are explained below.
4. Success Factors and Challenges

This section discusses the factors that are vital for achieving success in big data and business analytics implementation, including having expert business teams, analytics teams, Hadoop operators, and so on. Furthermore, the section outlines the various challenges hampering the implementation of big data analytics by business organizations.

4.1. Success Factors

Good implementation of a big data solution to provide data analysis for marketing and business processes requires the following design considerations [23]:
Moreover, big data and business analytics teams play a vital role in the success of a big data analytics project. The size and complexity of big data technology require highly motivated team members who are smart and determined. In addition, a successful implementation of a big data solution requires good team members with the right mission statement. Because of the complexity of the supercomputing platform, all levels of the team must have in-depth mastery of the big data ecosystem [57]. Organizations must invest the time and money to develop their own expertise in the big data analytics team and computing infrastructure. The big data team members and their roles are briefly described below: (a) The business team: It comprises
(b) The analytics team
(c) The Big Data architects team
(d) The Big Data Hadoop operators team: The real frontline troops in managing and operating a Hadoop cluster are:
In addition, the big data Hadoop operator team must have in-depth knowledge and experience working with the supercomputing platform. Beyond the aforementioned considerations, the organization should also consider switching to a global architecture, where a supercomputing platform is operated as a single entity and its components are tightly coupled together, rather than the usual enterprise architecture, where each layer of the stack has its own components and clearly defined boundaries. Furthermore, a single team should be responsible for both the development and the operation of the supercomputing platform. The management, platform engineers, software developers, and operators should work as a single team, frequently in a single location. This proximity helps the team build working knowledge, and this working attitude is the secret of big data giants such as Yahoo, Google, and Facebook. Moreover, mastery of the big data ecosystem by the team is another key criterion. Big data is built on the principles of supercomputing, and the complexity of this platform demands sophisticated knowledge of the big data ecosystem at all levels of the team. The operators, the engineers, the architects, and the business managers must be well versed in the big data ecosystem. Everyone involved in the project must be a generalist with a solid understanding of how a supercomputing platform works, including the stakeholders and executive management. Organizations must invest the time and money in developing their own expertise in their supercomputing platform [1,12].

4.2. Challenges of Big Data and Business Analytics

Though big data can be useful, like any resource it poses potential implementation challenges if it is not properly handled. It is important to understand that big data does not equal good data; it may come with some element of imperfection as a consequence of an imperfect world.
The maxim of garbage in, garbage out still holds for data visualization and data analytics. It is necessary to comprehend the level of imperfection in the collected data in order to calibrate and understand the meaning of the outputs with respect to the cleanliness or quality of the data. Some of these challenges are outlined below:

(1) A false sense of security: Big data gives a false sense of security, because having a huge amount of data does not necessarily mean the result must be true. Big data may not capture the true information needed to answer a particular question about the business problem, and sometimes small data can be adequate or even better. On occasion, you may not have the data to support your questions, which should spur either data collection or data acquisition efforts to answer them. However, collected data should not be used out of sample to answer the wrong business problems [58].

(2) May waste resources: Unnecessary use of big data ties up computer resources, so money and time should not be committed to big data if it is not needed. Big data should be deployed only when it is necessary, to avoid wasting computer resources. It is wasteful for an organization to spend time computing big data when small data can actually answer its questions. This was the case with the Google flu trend project, in which Google attempted to predict flu outbreaks by measuring millions of Google search terms related to the flu (flu shots, flu symptoms, etc.). However, the proposed big data analytics approach for flu prediction failed because it overestimated results. The failure was due to the wrong choice of measurement, as people who search for flu may not actually have the flu. Such errors can be avoided when the source of the data and its degree of cleanliness or quality are understood from a modeling or analytics perspective.
Understand the assumptions in your models and verify your data to identify and eliminate bad data, outliers, etc. If you already have a big data system in place, do not be afraid to capture data that you think may be irrelevant, and verify model results over time.

(3) Physical challenges of big data: Big data poses challenges beyond volume, velocity, and variety; it equally questions fundamental beliefs about the relationship between data and knowledge. Big data challenges current IT architecture, networks, servers, and software. Considering various reports and Cisco estimates on the exponential growth of business data by 2014, the yearly doubling of Internet traffic leaves experts with significant questions about how these data will be collected and analyzed: will every collected datum be analyzed and stored? How do we determine what should be stored, and for what duration? Will there be enough physical space for storage? The volume of data on internal networks will hugely exceed most networks' capacity for data transmission, making migration to higher-bandwidth networks a requirement. Data center infrastructure to support big data, as well as storage challenges for online and archival data, are all problems to deal with. Even if the cost of hardware and software is made affordable, finding the people and time to enable these changes while keeping the current enterprise fully operational is another challenge.

(4) Management challenges: Regarding management challenges, issues of security, privacy and civil liberties, regulatory requirements, and compliance come into play. The path between the good and the bad in every technology is determined by the people who use it and how they use it. Just as there are many amazing benefits of big data, so there are many possible bad and criminal uses for it; big data can be very destructive in the wrong hands. The original architecture of big data was not built with enough emphasis on security.
Some users take advantage of this design oversight and introduce management-related challenges, including security, privacy and civil liberties, regulatory challenges, and compliance.
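The advice above on verifying data and eliminating bad records and outliers can be illustrated with a small sketch. This is not a method from this paper: it uses the modified z-score, a common robust heuristic based on the median absolute deviation, and the threshold of 3.5 is a conventional choice rather than a universal rule.

```python
import statistics

def remove_outliers(values, threshold=3.5):
    """Drop points whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD), which is more robust to a
    single extreme value than the mean/standard deviation on small samples.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # All points sit at the median; nothing can be flagged reliably.
        return list(values)
    # 0.6745 scales the MAD so it is comparable to a standard deviation.
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]
```

For example, `remove_outliers([10, 11, 9, 10, 500])` keeps the four typical readings and drops the 500. A sanity check of this kind is one concrete way to calibrate the "cleanliness or quality" of collected data before it feeds an analytics model.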
5. Applications of Big Data and Business Analytics

Various areas of business and industry have benefited from big data analytics technologies. These areas generate huge amounts of data that require big data analytics for effective and efficient decision-making. Application areas include healthcare, telecommunications, network optimization, travel estimation, retail, the financial industries, and energy consumption [4,56], to mention but a few. The application areas are explained below, while Table 3 outlines the key data sources and features in these areas.
These applications, alongside their key data sources and features, are summarized in Table 3 below.

6. Summary and Open Research Directions

The potential benefits of big data are tremendous. To the business class, technology is merely a means of keeping the company close to its customers. Enterprises that have embarked on big data projects have experienced massive growth in business, as revealed in this work. Big data has successfully helped organizations achieve cost reductions and faster, better decisions, and even provide new offerings for customers, as discussed in this paper. Hadoop and the cloud-based analytics used in big data have contributed immensely to a substantial reduction in the cost of the technology compared to traditional architectures (data warehouses and marts in particular). However, big data is not a replacement for the data warehouse; it augments it. Rather than processing and storing vast quantities of new data in a data warehouse, for example, companies are using Hadoop clusters for that purpose and moving data to enterprise warehouses as needed for production analytical applications. Analytics has always helped to improve decision-making, but big data has leveraged the speed of Hadoop and in-memory analytics to generate faster and better decisions. For example, the health insurance giant United Healthcare is using natural language processing tools from SAS to better understand customer satisfaction and when to intervene to improve it. The most interesting use of big data analytics is to create new products and services for customers, as seen in the case of Banca Carige and other organizations mentioned in this work. Big data is characterized by volume, variety, and velocity. Understanding customer demand requires an excellent grasp and analysis of business data; this is the key to the development of successful new products and services. Big data also has its limitations, however: it is not equal to a good result.
Big data encourages the collection and analysis of everything, but there is some level of imperfection in the collected data which, if not properly cleaned, can yield bad results that lead to wrong business decisions. Data generated via big data is sometimes used by the wrong hands to perpetrate crime, and unnecessary use of the technology can waste computer resources. Therefore, organizations should be familiar with the various analytics offerings from distribution companies, clearly understand their business requirements, and apply a matching solution that fits their business environment in order to avoid wasting resources. In addition, a big data solution should be used only when it is needed. The technology is complex and runs on a supercomputing platform, which has resulted in new roles for supercomputing experts. Organizations should bear in mind that it takes time to build complex, sophisticated, and intense technology skills; they should therefore invest in their teams to achieve good results. Big data has a wide range of applications, namely business, government, health, education, finance, and so on. Finally, every organization should keep in mind that determining relevant data is key to delivering value from massive amounts of data. Even though this paper has discussed various big data and business analytics approaches that can be deployed for enhanced business acceleration and development, there are open research directions that have constantly appeared in the recent literature. These open research directions cut across the analysis of heterogeneous data, data privacy and security, unified frameworks for data cleaning, and deep learning techniques for big data processing.
I.A.A. and H.F.N. conducted the research, drafted and proof-read the article. All authors were involved in the review of the manuscript. The authors received no external funding for this research. We are grateful to the anonymous reviewers for their insightful comments and suggestions. The authors
declare no conflict of interest.

Figure 1. Structure of the review paper.
Figure 2. The Gartner's Vector model.
Figure 3. Business analytics process.
Figure 4. Functional view of Hadoop.
Figure 5. The primary component of Hadoop cluster.
Figure 6. Overview of big data and business analytics in Hadoop.
Figure 7. Common tools used in a Hadoop cluster.
Table 1. Recent review of big data analytics.
Table 2. Key features of big data analytics tools, strengths, and weaknesses.
Table 3. Applications and key data sources for big data and business analytics.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).