Big Data & Hadoop: Transforming
the Data Architecture of Peter Mayer
Data storage and management
technologies have surged in popularity in recent years. Businesses want to
learn the best ways to store, maintain, and capitalize on the copious
amounts of data that their products, services, and consumers generate. Even so,
organizing and measuring data has proven difficult
despite present-day technological innovations. The term “big data” has emerged
to describe this challenge, and Apache Hadoop (“Hadoop”) uses a set of algorithms to process
large volumes of data across clusters of machines (Kelly, 2014).
This report serves as documentation
of research conducted on the benefits and barriers of Apache Hadoop as well as
a proposal to the management of Peter Mayer Advertising to implement the Apache
Hadoop platform to restructure the advertising agency’s big data.
Apache Hadoop is an open-source
computing platform that stores and processes data. It includes a storage system
known as Hadoop distributed file system (HDFS). HDFS is capable of storing large
amounts of data, growing incrementally, and surviving failure of major parts of
storage infrastructure (Dowling). Hadoop leverages a cluster—a set of loosely
or tightly connected computers—to run MapReduce programs. The MapReduce
programming model comprises two steps: the “Map” procedure organizes
information, and the “Reduce” procedure assembles the intermediate results into
a final result, or summary operation (Dowling). Each cluster node has its own
local file system and CPU on which to run MapReduce programs. Data is broken into pieces,
stored across the local file systems of the nodes, and then replicated. Together,
these local file systems form the Hadoop Distributed File System (HDFS) (Lay, 2010).
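The Map and Reduce steps described above can be illustrated with a minimal pure-Python sketch of the classic word-count pattern. This simulates the model in a single process; a real Hadoop job would distribute the same steps across cluster nodes:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final summary result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big tools", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # "big" appears three times across the two documents
```

The shuffle step here mirrors what Hadoop performs automatically between the Map and Reduce phases: routing all values for the same key to the same reducer.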
Hadoop has developed into a
decision support system for researchers, data scientists, information
architects, and others who must analyze information in fields that generate
copious amounts of data (e.g., education, finance, public relations, and
advertising). The shifting nature of data itself has also spiked interest in
new approaches to data collection, especially technologies that can
parse information from social media and mobile devices.
Benefits & Barriers of Implementing Hadoop at Peter Mayer
Below is an examination of the
benefits and barriers of implementing Hadoop to restructure the big data of
Peter Mayer Advertising to increase the agency’s network security, overall
profit, and consumer satisfaction.
Barriers of Implementing Hadoop at Peter Mayer
Leveraging the value of data can be
extremely complicated. If the data at Peter Mayer is misidentified, then
determining the best uses of the agency’s big data could prove
ambiguous and difficult to articulate.
Qualified professionals who are
able to work on new technologies and interpret data are limited. “Big data” is
a relatively new concept; consequently, there is a shortage of experts who can
interpret the information of a business like Peter Mayer and determine
what the agency’s needs are.
Data Access and Connectivity
Many institutions, businesses, and
organizations lack the
correct technologies and software to manage and aggregate their data. While
some organizations are working to provide lasting solutions, this
is a problem that could hinder the implementation of Hadoop at Peter Mayer.
Changing Technical Patterns
Technical patterns in the
data industry are constantly changing. Innovative employees, business partners,
and leaders are needed to develop the right information technology
infrastructure for a specific entity (business, institution, corporation, etc.),
one that can keep pace with the industry’s changing landscape.
Leveraging big data requires
cross-functionality across fields such as engineering, IT, and finance, and the
ability to determine where the owners of a business observe data fragmentation
in the organization. Coordinating all functions of the business could prove
difficult.
Lack of Data Protection
Lastly, data protection is a roadblock that
constantly hinders organizations from taking full advantage of their data,
especially given the frequency with which data breaches occur.
Benefits of Implementing Hadoop at Peter Mayer
Hadoop’s data architecture helps
streamline massive amounts of big data. Three of the most popular types of big
data that are collected are clickstream data, sentiment data, and server log
data. While similar, each of these types of big data can provide very different
value to Peter Mayer. Below is a proposal to the management of Peter Mayer
about the aforementioned types of big data and how implementing Hadoop can make
each of them more valuable (Kelly, 2014).
Clickstream Data
Clickstream data is used to
understand how website visitors research and consider purchasing products. With
clickstream analysis, Peter Mayer can optimize its websites and promotional
content to improve the likelihood that visitors and customers will learn about
the performance of the advertising the agency produces. A record of these behavior
patterns would assist the digital marketing team at Peter Mayer with judging the
effectiveness of different types of advertisements—with the confidence that
their results are statistically meaningful and reproducible (“ClickStream Data”).
Hadoop Makes Clickstream Data More Valuable
Tools like Omniture and Google
Analytics already help digital teams at advertising businesses like Peter Mayer
analyze clickstreams. However, Hadoop adds three key benefits to the
clickstream data of Peter Mayer. First, Hadoop can join clickstream data with
other data sources like CRM (customer relationship management) data, customer
demographic data, and information about its advertising campaigns. This additional
data can give Peter Mayer a more comprehensive evaluation of how the
information can be used, as opposed to an isolated analysis of clickstream data alone
(“Analyzing Click Stream Data Using Hadoop”).
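The join described above—enriching clickstream events with CRM records—can be sketched in a few lines of Python. The field names and sample values here are purely illustrative assumptions, not Peter Mayer’s actual schema; in Hadoop this would typically be expressed as a Hive JOIN over HDFS tables:

```python
# Hypothetical clickstream events keyed by an assumed customer_id field.
clickstream = [
    {"customer_id": "c1", "page": "/campaign/spring", "seconds_on_page": 42},
    {"customer_id": "c2", "page": "/campaign/spring", "seconds_on_page": 5},
]

# Hypothetical CRM profiles for the same customers.
crm = {
    "c1": {"segment": "returning", "region": "Gulf South"},
    "c2": {"segment": "new", "region": "Midwest"},
}

# Inner join: attach each event to its matching CRM profile.
enriched = [
    {**event, **crm[event["customer_id"]]}
    for event in clickstream
    if event["customer_id"] in crm
]
for row in enriched:
    print(row["segment"], row["page"], row["seconds_on_page"])
```

Once joined, questions like “do returning customers spend longer on campaign pages than new ones?” become simple aggregations over the enriched records.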
Secondly, Hadoop scales so easily that
years of data can be stored without incremental costs, allowing institutions to
perform temporal, year-over-year analyses of their clickstream data. Peter
Mayer Advertising could save years of data and discover deeper patterns in
the clickstream that its competitors have missed.
Finally, Hadoop makes website
analysis easier. Without Hadoop, clickstream data is very difficult to process
and structure. With Hadoop, even less experienced business analysts and data scientists
can use Apache Hive or Apache Pig scripts to organize clickstream data. Hadoop
makes storing and refining the data easy, so that analysts of varying
experience can focus on the discovery of data patterns (“ClickStream Data”).
Sentiment Data
Sentiment data is unstructured data
on opinions, emotions, and attitudes that is contained in sources like social
media posts, blogs, online product reviews and customer support interactions.
Businesses use sentiment analysis to understand how the public feels about a specific
topic and to track how those opinions change over time. Peter Mayer can analyze
sentiment about their advertisements to gauge public reaction to the work it
produces for its clients.
Hadoop Stores Sentiment Data
With Hadoop, social media posts can
be loaded into the Hadoop Distributed File System (HDFS) using Apache Flume
for real-time streaming. Apache Pig and Apache Mahout organize the unstructured
data and score sentiment with advanced machine learning methodologies.
Sentiment analysis quantifies subjective
views expressed by consumers and target audiences on social media. Researchers need
big data to do this reliably. Words and phrases are assigned a polarity score
of positive, neutral or negative. By scoring and aggregating millions of
interactions, analysts can judge candid sentiment at scale, in real time. After scoring sentiment amongst a target
audience, Peter Mayer can combine this social media data with other sources of
data. Clickstream data can be used in conjunction with sentiment data to
attribute previously anonymous sentiment to a particular customer type or
segment of customers. The results from this combination of data
can be visualized and used to improve the advertisements that Peter Mayer makes
for its target customers and segments (McKenna, 2015).
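The polarity-scoring approach described above can be sketched with a toy lexicon. This is a deliberately simplified assumption: a production pipeline (e.g., models trained with Apache Mahout) would learn word scores from labeled data rather than use a fixed word list:

```python
# Toy polarity lexicon; scores are illustrative assumptions.
LEXICON = {"love": 1, "great": 1, "boring": -1, "hate": -1}

def score(post):
    """Sum the polarity of known words; >0 positive, <0 negative, 0 neutral."""
    return sum(LEXICON.get(word, 0) for word in post.lower().split())

posts = [
    "love the new spring campaign great spot",
    "that ad was boring",
    "saw the billboard downtown",
]

# Label each post, then aggregate: counts of these labels across millions
# of posts are what lets analysts judge sentiment at scale.
labels = [
    "positive" if score(p) > 0 else "negative" if score(p) < 0 else "neutral"
    for p in posts
]
print(labels)
```

Aggregating these labels over time (e.g., daily counts per campaign) is what turns individual opinions into the trend lines a digital team can act on.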
Server Log Data
Large businesses like Peter
Mayer usually build, manage, and protect their own information networks. Server logs
are computer-generated records that report data on the status and operations of
these networks. The volume of server logs is typically massive, and most
individual entries are insignificant on their own.
However, two of the most common
use cases for server log data are investigating network security breaches and network
compliance audits. In both of these cases, server log information is vital for
both rapid, efficient problem resolution and long-term resource planning.
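As a small illustration of the security use case, the sketch below scans server log lines for repeated failed logins from a single address—a common signal worth flagging for review. The log format and field names are invented for the example; real server log formats vary by platform, and at Peter Mayer’s scale this kind of scan would run as a Hadoop job over logs stored in HDFS:

```python
import re
from collections import Counter

# Illustrative log lines in an assumed format; real formats vary by server.
log_lines = [
    "2024-03-01 09:14:02 FAILED_LOGIN user=admin ip=203.0.113.7",
    "2024-03-01 09:14:05 FAILED_LOGIN user=admin ip=203.0.113.7",
    "2024-03-01 09:14:09 FAILED_LOGIN user=admin ip=203.0.113.7",
    "2024-03-01 09:15:11 LOGIN_OK user=jdoe ip=198.51.100.4",
]

# Count failed logins per source IP address.
pattern = re.compile(r"FAILED_LOGIN user=\S+ ip=(\S+)")
failures = Counter(
    m.group(1) for line in log_lines if (m := pattern.search(line))
)

# Flag any address with three or more failures for security review.
suspicious = [ip for ip, count in failures.items() if count >= 3]
print(suspicious)  # ['203.0.113.7']
```

The same counting pattern, distributed via MapReduce, is what makes server log analysis tractable when the logs run to terabytes.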
Hadoop Helps You Protect Network Security
High-profile data breaches happen
frequently. Enterprises and government agencies invest vast sums on antivirus
and firewall software to protect their networks from malware and outside
attacks, and those solutions usually work. When security fails, Hadoop helps businesses
like Peter Mayer understand and repair their vulnerabilities quickly, and it facilitates
root cause analysis to create lasting protection (“Security Think Tank”).
This report contains three main
topics: information about Hadoop, the benefits of implementing Hadoop at Peter
Mayer, and the barriers to implementing Hadoop at Peter Mayer. It also serves as documentation of research
and a thorough proposal to the management of Peter Mayer Advertising to
implement Hadoop to restructure the agency’s big data. While some barriers
could make the implementation of Hadoop at Peter Mayer somewhat difficult,
the pros of implementing Hadoop outweigh the cons, and Peter Mayer stands
to see a significant return on its investment in this technology.