Monday, 13 June 2016

Everything you ever wanted or needed to know about Big Data #BigData

Big Data is a phrase that gets bandied about quite a bit in the media, the board room – and everywhere in between. It’s been used, overused and used incorrectly so many times that it’s become difficult to know what it really means. Is it a tool? Is it a technology? Is it just a buzzword used by data scientists to scare us? Is it really going to change the world? Or ruin it?
This post is all about demystifying the mess that has become Big Data, and more importantly demonstrating how you can use it to improve your bottom line.

What Is Big Data?

First of all, what is Big Data? In its purest form, the term describes a volume of structured and unstructured data so large that it is difficult to process using traditional techniques. So Big Data is just what it sounds like: a whole lot of data.
The concept of Big Data is relatively new. It reflects both the growing amount and the varied types of data now being collected. Proponents of Big Data often call this the “datification” of the world: as more of the world’s information moves online and is digitized, analysts can start to use it as data. Social media, online books, music, videos and a growing number of sensors have all contributed to the astounding increase in the amount of data available for analysis.
Almost everything you do online is now stored and tracked as data. Reading a book on your Kindle generates data about what you’re reading, when you read it, how fast you read it and so on. Similarly, listening to music generates data about what you’re listening to, when, how often and in what order. Your smartphone is constantly uploading data about where you are, how fast you’re moving and what apps you’re using.
It’s also important to keep in mind that Big Data isn’t just about the amount of data we’re generating; it’s also about all the different types of data (text, video, search logs, sensor logs, customer transactions, etc.). In fact, Big Data is usually described by four characteristics known in the industry as the 4 V’s:
  • Volume – the increasing amount of data that is generated every second
  • Velocity – the speed at which data is being generated
  • Variety – the different types of data being generated
  • Veracity – the messiness of data, i.e. its inconsistent and often unreliable quality
Given the incredible amount, speed, variety and messiness of the data we now generate and store, it’s no surprise that it quickly became unmanageable with traditional storage and analysis methods. This is where the term Big Data becomes confusing, because it is also often used to refer to the new technologies, tools and processes that have sprung up to handle this vast amount of data.

Glossary of Big Data Terms

Inevitably, much of the confusion around Big Data comes from the many terms (new to many readers) that have sprung up around it. Here is a quick rundown of the most popular ones:
  • Algorithm – a set of step-by-step rules or mathematical formulas that software follows to analyze data
  • Amazon Web Services (AWS) – collection of cloud computing services that help businesses carry out large-scale computing operations without needing the storage or processing power in-house
  • Cloud (computing) – running software on remote servers rather than locally
  • Data Scientist – an expert in extracting insights and analysis from data
  • Hadoop – an open-source collection of programs for storing, retrieving and analyzing very large data sets across clusters of computers
  • Internet of Things (IoT) – refers to objects (like sensors) that collect, analyze and transmit their own data (often without human input)
  • Predictive Analytics – using analytics to predict trends or future events
  • Structured v Unstructured data – structured data is anything that can be organized in a table so that it relates to other data in the same table. Unstructured data is everything that can’t.
  • Web scraping – the process of automating the collection and structuring of data from web sites (usually through writing code)
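Hadoop’s best-known processing model, MapReduce, splits a job into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group into a result. The sketch below imitates the three phases in plain Python on a single machine, purely to show the idea: it does not use Hadoop or any of its APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word.strip(".,!?;:"), 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        if key:  # skip tokens that consisted only of punctuation
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    # On a real cluster each document would be mapped on a different machine;
    # here the phases simply run one after another in a single process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["Big data is big.", "Data is everywhere!"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The point of the split is that the map and reduce steps only ever see one document or one key at a time, which is what lets a framework spread them across many machines.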

Why Has It Become So Popular?

Big Data’s recent popularity is due in large part to advances in technology and infrastructure that make it possible to process, store and analyze so much data. Computing power has increased considerably in the past five years while dropping in price, making it accessible to small and midsize companies. In the same vein, the infrastructure and tools for large-scale data analysis have become more powerful, less expensive and easier to use.
As the technology has become more powerful and less expensive, numerous companies have emerged to build products and services that help businesses take advantage of all Big Data has to offer. According to Inc., in 2012 the Big Data industry was worth $3.2 billion and growing quickly. They went on to say that “Total [Big Data] industry revenue is expected to reach nearly $17 billion by 2015, growing about seven times faster than the overall IT market”. For more on the size and projected growth of the Big Data industry, check out this Forbes article.
Businesses have also started taking notice of the Big Data trend. In a recent survey, “Eighty-seven percent of enterprises believe big data analytics will redefine the competitive landscape of their industries within the next three years.”

Why Should Businesses Care?

Businesses have always used data to gain insights through analysis. The emergence of Big Data means that they can now do this on an even greater scale, taking more and more factors into account. By analyzing larger volumes of more varied data, businesses can derive new insights with greater accuracy. This directly contributes to better performance and decision making within an organization.
Big Data is fast becoming a crucial way for companies to outperform their peers. Good data analysis can highlight new growth opportunities, identify and even predict market trends, be used for competitor analysis, generate new leads and much more. Learning to use this data effectively will give businesses greater transparency into their operations, better predictions, faster sales and bigger profits.
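To make “predict market trends” a little more concrete, here is a minimal sketch of the simplest form of predictive analytics: fit a straight line to a series of past values by ordinary least squares, then extend the line forward. Real forecasting usually needs richer models (seasonality, external factors, uncertainty estimates); the function names and the monthly sales figures are hypothetical, for illustration only.

```python
def fit_linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares fit of values[t] ≈ intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_t = (n - 1) / 2            # mean of the time indices 0..n-1
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return intercept, slope


def forecast(values: list[float], periods_ahead: int = 1) -> float:
    """Extend the fitted straight line beyond the last observation."""
    intercept, slope = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


if __name__ == "__main__":
    # Hypothetical monthly sales figures, for illustration only.
    monthly_sales = [100.0, 110.0, 120.0, 130.0]
    print(forecast(monthly_sales))  # 140.0
```

A straight line is a deliberately naive choice: it is easy to explain and check by hand, but it will miss any curve or seasonal pattern in the data.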

Best Big Data Tools

Taking advantage of all that Big Data has to offer can seem like a daunting task, but there are a number of tools (both free and paid) that can help businesses to collect, store, analyze and derive insight from Big Data. Here are just a few…

OpenRefine
OpenRefine is a data cleaning software that allows you to pre-process your data for analysis. This is especially useful if you are analyzing unstructured data or combining multiple data sets into one for analysis.
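A typical cleaning job is spotting values that mean the same thing but are written differently (“Acme Inc.”, “acme inc”, “Inc. ACME”). OpenRefine’s clustering documentation describes a “fingerprint” key-collision method for this: reduce each value to a normalized key and group values whose keys match. The Python below is a simplified re-creation of that idea for illustration; it is not OpenRefine’s code or API, and `fingerprint` and `cluster` are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a messy string to a normalized key (simplified fingerprint method)."""
    # Strip accents: decompose characters, then drop the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase and remove punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_only.strip().lower())
    # Sort the unique tokens so that word order and repeats no longer matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    names = ["Acme Inc.", "acme inc", "Inc. ACME", "Globex"]
    print(cluster(names))  # [['Acme Inc.', 'acme inc', 'Inc. ACME']]
```

The clusters are only candidates: a person still decides which spelling to keep, which is also how OpenRefine presents them.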

WolframAlpha
WolframAlpha provides detailed answers to technical queries and performs very complex calculations. For business users, it presents information in charts and graphs, and is excellent for high-level pricing history, commodity information and topic overviews.

Another tool allows you to turn the unstructured data displayed on web pages into structured tables of data that can be accessed over an API.
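Turning a web page into a structured table is a small-scale version of web scraping. The sketch below uses only Python’s standard-library `html.parser` to pull the rows of an HTML table into a list of records, treating the first row as column names. It is not the API of any tool mentioned here; `TableExtractor` and `html_table_to_records` are hypothetical names, and it assumes a reasonably simple table without merged cells.

```python
from html.parser import HTMLParser
from typing import Optional


class TableExtractor(HTMLParser):
    """Collects the text of each <td>/<th> cell, one list of cells per <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def _finish_cell(self) -> None:
        # Close the open cell, if any, and add its text to the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _finish_row(self) -> None:
        # Close the open row, if any, and store it.
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()  # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_records(html: str) -> list[dict[str, str]]:
    """Treats the first row as column names and the remaining rows as records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    page = ("<table><tr><th>Product</th><th>Price</th></tr>"
            "<tr><td>Widget</td><td>9.99</td></tr></table>")
    print(html_table_to_records(page))  # [{'Product': 'Widget', 'Price': '9.99'}]
```

In practice the HTML would first be downloaded, for example with `urllib.request.urlopen`, and it is worth checking a site’s terms of use before scraping it.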

Tableau
Tableau is a visualization tool that makes it easy to look at your data in new ways. In the analytics process, Tableau’s visuals allow you to quickly investigate a hypothesis, sanity check your instincts or build a compelling infographic to convince your audience with.

Google Fusion Tables

Google Fusion Tables is a versatile tool for data analysis, large data set visualization and mapping.

Best Additional Resources (blog posts, case studies, books, videos, etc.)

If you’re interested in learning more about Big Data and how you can use it, here are a few of our favorite resources:


  • No Free Hunch (Kaggle) – Kaggle hosts a number of predictive modeling competitions. Its competition and data science blog covers all things related to the sport of data science.
  • SmartData Collective – SmartData Collective is an online community moderated by Social Media Today that provides information on the latest trends in business intelligence and data management.
  • FlowingData – FlowingData explores the ways in which data scientists, designers, and statisticians use analysis, visualization, and exploration to understand data and ourselves.
  • KDnuggets – KDnuggets is a comprehensive resource for anyone with a vested interest in the data science community.
  • Data Elixir – Data Elixir is a great roundup of data news from across the web; you can get a weekly digest sent straight to your inbox.

Online Courses/Learning Resources

  • DataCamp – DataCamp is a resource for learning data analysis and R interactively.
  • School of Data – School of Data offers a variety of courses designed for everyone, from the data science newbie to the professional seeking inspiration.
  • Udemy – Udemy is the world’s largest destination for online courses, with many courses in the data science field.
  • W3Schools – W3Schools offers great online tutorials for learning basic coding and data analysis skills.


  • The Data Science Revolution – an expert panel that considers the future of data science and the ethics of data analytics and enhanced predictive power.
  • Turning Big Data into Big Analytics – focuses on the opportunity businesses have when they handle their data correctly, and serves as a case study for data science professionals.


Looking Ahead

No one can really predict what the future holds for Big Data. The rapid development of new technologies, especially in machine learning, will undoubtedly overturn any predictions we try to make. What is certain is that Big Data is here to stay. The amount of data we produce is only going to increase, and by analyzing it we can learn, and eventually predict, some pretty cool things. Very soon, Big Data will touch and transform every industry and every part of your daily life.

Wrapping Up

Whether or not you believe the hype that Big Data will change the world, the fact remains that learning how to use the recent influx of data effectively can help you make better, more informed decisions. The thing to take away from Big Data isn’t its size; it’s the variety. You don’t necessarily need to analyze a lot of data to get accurate insights; you just need to make sure you are analyzing the right data. To really take advantage of this data revolution, you need to start thinking about new and varied data sources that can give you a more well-rounded picture of your customers, market and competitors. With today’s Big Data technologies, almost anything can be used as data, giving you unparalleled access to market factors.
What’s your take on the future of Big Data? Leave a comment for us below!
