This microbook is a summary/original review based on the book: Big Data: Does Size Matter?
You’ve probably heard the term big data before. If you, like so many others, have no idea what it means or what it is used for, Harkness’ “Big Data” will help you understand. You might not be aware of it, but big data has become a constant aspect of our lives and has found its application in areas ranging from hiring decisions to democratic processes. So, get ready to learn more about it!
Big data is used to collect information, analyze it, and even predict the future. The term describes large datasets that are combined in new ways, for example to profile people’s decisions. Big data can take a variety of forms and is amassed extremely fast, often in real time.
Because big datasets are so large, many people assume they are complete. That’s why many companies and government organizations turn to analyzing data instead of conducting surveys when they want to gain insights. This marks a shift in how we gather information in the modern world. “Data is often compared to oil, as the raw material that will power the next industrial revolution.”
To define big data, Harkness uses the acronym Big DATA: it’s big, it has many dimensions, it’s automatic, it’s timely, and it uses AI (artificial intelligence).
Especially in science, big data is often used for analysis. For example, in the field of neuroscience, big data is used for brain scans. Neuroscientists often employ AI algorithms to spot patterns and sort brain scans accordingly. Functional MRI now allows us to see changes in brain scans in real time.
Professor Van Horn, a neuroscientist, uses the information available to look for rare genetic variants associated with Parkinson’s disease or autism. To do so, he links genetic databases to his personal database of brain images.
Big data is especially useful in the field of medicine, as it can help identify risk factors for illnesses such as cardiovascular disease or cancer. Big datasets can include anything from people’s postal codes to meteorological records, and they allow users to link findings in new ways.
The NHS (National Health Service) in the U.K., for example, collects and centralizes large amounts of medical data. This allows researchers and medical professionals to use statistical techniques to discover important links between patient behavior, the environment, and health outcomes. As a result, public health organizations can identify small factors that could potentially improve the longevity of a population.
Even though using medical data for medical research seems very appealing, there is a downside. Medical data could also be collected to assess your eligibility for health insurance. In fact, some companies in the U.S. already encourage their employees to wear Fitbits to track their fitness.
Big data has many applications in the business world, from monitoring energy consumption to tracking shopping habits. It is also used to predict the economic future.
Customer intelligence companies such as Dunnhumby USA use existing data to predict food trends. For 2015, they predicted people would be interested in consuming “natural sweeteners” and food that was “responsibly produced,” as well as products produced in small batches and to certain religious standards. Dunnhumby also believed that fermented foods would be more popular.
Dunnhumby has access to 770 million shoppers around the world: they can collect the data from all these people through club card schemes, for example. You may swipe these when checking out, meaning you earn points to spend in store – while they earn your data to predict future shopping patterns.
Club cards also allow for profiling of customers. A purchase of diapers, for example, means that you will likely buy baby food, and later, school uniforms. Stores can then send you targeted offers. These days, the club card scheme has been taken one step further. Information such as bank card data, cell phone locations, and social media posts is easy to gather and can help predict future behavior. It should be noted, however, that although big data is good at showing the big picture by recognizing patterns across populations, it cannot possibly predict how individuals decide to live their lives.
Another field in which big data is used is democratic elections. YouGov, a global data company, predicted in 2015 how every U.K. constituency would vote, ahead of the elections.
In the 2010 U.K. General Election, the Conservative Party used a database purchased from a credit reference agency to target possible voters. With this information, they sent out leaflets about childcare and schools to parents, and leaflets on pensions and crime to retired voters. Political campaigners now know more about you than you may wish to tell them.
The last two examples may have made you feel a bit uncomfortable. Surely, it can’t be right that anyone can access personal data that you do not wish to share. Privacy is one of the biggest concerns with the use of big data.
Of course, transparency isn’t always a bad thing. Members of Parliament in the U.K. allow you to see exactly what they do in parliamentary sessions, and what expenses they claim. Transparency also finds application in other areas, such as the prevention of terrorism. Transparency, however, is not the same as accountability.
When an individual shares data, it needs to be in a form that can be input easily into a computer. Permission is required to allow others access to the information, as well as the ability to use and share it. It also needs to be made traceable, so it can be connected back to you in order to verify its authenticity. Finally, it must not infringe on anyone else’s privacy. The issue of privacy is more complex than simply removing people’s names, however.
Even when your data has been anonymized, it is becoming increasingly easy to link it back to you. In January 2015, the Massachusetts Institute of Technology showed how easy it was to link datasets and trace “untraceable data” back to an individual, starting with anonymized credit card records (using the date and time of transaction) and linking that to social media posts. In other words, such data is only pseudo-anonymized.
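To make the idea of linking datasets concrete, here is a minimal Python sketch. It is not the method used in the MIT study. All records, names, and the 30-minute matching window are invented for illustration, and matching on the shop as well as the time is an added assumption. It only shows how a pseudonymous card ID can be tied to a name once enough transactions line up with public posts.

```python
from datetime import datetime

# Hypothetical "anonymized" card records: no names, only a pseudonymous ID.
card_records = [
    {"card_id": "u1", "shop": "bakery", "time": datetime(2015, 1, 5, 8, 30)},
    {"card_id": "u1", "shop": "bookshop", "time": datetime(2015, 1, 6, 17, 45)},
    {"card_id": "u2", "shop": "bakery", "time": datetime(2015, 1, 5, 12, 10)},
    {"card_id": "u2", "shop": "cinema", "time": datetime(2015, 1, 7, 20, 0)},
]

# Hypothetical public posts that mention a place and carry a timestamp.
public_posts = [
    {"name": "Alice", "place": "bakery", "time": datetime(2015, 1, 5, 8, 40)},
    {"name": "Alice", "place": "bookshop", "time": datetime(2015, 1, 6, 18, 0)},
    {"name": "Bob", "place": "cinema", "time": datetime(2015, 1, 7, 20, 15)},
]


def link(card_records, public_posts, window_minutes=30):
    """Count, for each (card_id, name) pair, the transactions that match
    a post at the same place within the time window (an assumed rule)."""
    matches = {}
    for rec in card_records:
        for post in public_posts:
            same_place = rec["shop"] == post["place"]
            gap_seconds = abs((rec["time"] - post["time"]).total_seconds())
            if same_place and gap_seconds <= window_minutes * 60:
                key = (rec["card_id"], post["name"])
                matches[key] = matches.get(key, 0) + 1
    return matches


def reidentify(card_records, public_posts, min_matches=2):
    """Attach a name to a card ID only when several transactions line up."""
    matches = link(card_records, public_posts)
    return {card: name for (card, name), n in matches.items() if n >= min_matches}


print(reidentify(card_records, public_posts))
```

In this toy data, two of the first card's transactions line up with posts by the same person, which is enough to attach a name to the "anonymous" ID. The second card has only one matching post, so it stays unlinked under the assumed threshold.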
A rise in big data is linked to a loss of privacy. We should all be asking ourselves to what degree surveillance and data collection practices are reasonable. Governments need to be transparent about the amount of data they collect, unlike in 2015, when it was revealed that the U.K. had been using its secret service to routinely intercept phone calls for the previous 10 years.
One of the big pitfalls of big data is to believe that it is neutral. It’s just data and numbers, after all, so it must be neutral, right? Let’s not forget that it was humans who created the algorithms analyzing the data. Humans, who often are guided by subconscious bias. On top of that, the interpretation of algorithms always depends on what they’re being used for, and what kind of data they’ve been fed. Take crime monitoring as an example.
Graeme Tiffany, a community philosopher in Leeds, is worried about a new development called “at-risk-ism”. That is, crime-monitoring units around the world analyze big data and use it to predict who is most likely to commit a future crime.
In Chicago, for example, big data perpetuates racism. Algorithms define the people most likely to commit crimes by analyzing and putting together data such as past criminal records, police records of friends and acquaintances, and whether any of their close contacts have been a victim of crime. Consider this for a second – having a criminal record doesn’t necessarily mean you will commit future criminal acts. It is much more likely that unemployed black kids in Chicago’s West Side are arrested for smoking weed than law students at the University.
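The point that an algorithm inherits the bias of the data it is fed can be shown with a small Python sketch. The numbers below are invented and do not come from the book or from any real police data. The sketch assumes two groups with the same underlying offence rate, one of which is policed ten times more heavily, and a deliberately naive "risk score" built from arrest records alone.

```python
# Hypothetical numbers: both groups offend at the same rate,
# but one neighbourhood is policed far more heavily than the other.
groups = {
    "west_side": {"population": 1000, "offence_rate": 0.10, "chance_caught": 0.30},
    "campus": {"population": 1000, "offence_rate": 0.10, "chance_caught": 0.03},
}


def expected_arrests(group):
    """Arrests depend on behaviour AND on how closely a group is watched."""
    return group["population"] * group["offence_rate"] * group["chance_caught"]


def naive_risk_score(group):
    """A 'risk' figure learned from arrest records alone: arrests per resident."""
    return expected_arrests(group) / group["population"]


scores = {name: naive_risk_score(group) for name, group in groups.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Although behaviour is identical in both groups by construction, the score for the heavily policed group comes out about ten times higher. The score measures where the police look, not who is more likely to offend.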
Besides its uses in analyzing criminal records, big data is often used in hiring decisions. Algorithms not only analyze what applications people have been submitting, but they may also look at social media posts and other publicly available data. So if you have poor health or live in a poor neighborhood, your chances of getting a job are reduced.
Looking at big data might mislead you to think that by collecting large amounts of data, we can gain insights into a person. But let’s not forget, quantity isn’t quality. Harkness expresses this sentiment nicely when she states, “Looking at my bones when I’m dead won’t tell you anything of value about who I am, and neither will downloading my data while I’m alive.”
Of course, you might see gathering data about yourself as a way to gain control over your life, for example by monitoring how many steps you walk or how many calories you burn. And this can help you to lead a healthier lifestyle by motivating you to change your behavior. But the autonomous self cannot be quantified. That is what differentiates us from artificial intelligence.
AI’s limit lies in ascribing value to the data it accumulates. While it might be able to correctly read a hand gesture, the idea that there is a true and a false about it is a very narrow, and quantified, view of the world.
Big data can give valuable insights into patterns about how a population or a large group of people will act. It won’t give you any insights about the individual, however.
Even though big data allows us to link datasets to gain insights for medical research, or even to predict the future, it certainly has its pitfalls. It reveals patterns across populations, not how any one individual will act. It can also mislead us into thinking we know everything, and that we can trust big data as a neutral informant.
“Big Data” is an enjoyable, albeit frightening read. In many examples, Harkness draws a vivid picture of how big data works and how it is already present in all of our lives.
Next time you post something on social media, think twice – this information is always traceable back to you.
Timandra Harkness is a writer, comedian, and broadcaster and has been presenting on scientific, mathematical and statistical subjects since the last days of the 20th century. She has written about travel for the Sunday Times, motoring for the Telegraph, science and technology for WIRED, BBC Focus Magazine and Men's Health Magazine, and is a regular voice on BBC...