Is Big Data Cause or Consequence?

In our world, everything is connected. Data is readily available. Humans spend their days producing digital breadcrumbs. And all this data can be collected and analysed in ways never before imagined.

When discussing big data, we often worry about the difficulty of managing the data being produced. The difficulty lies in the data being heterogeneous, arriving fast and in high volumes. This led to the appearance of the ‘3V’ paradigm (Variety, Velocity, and Volume of data).

But this corresponds to two views of our society: one technical, the other social.

The growth in data, the growth in the capacity to analyse it, and the growth of systems like Echelon that gather communications in real time are inevitable. These are infrastructural problems: the view of the world as a technological system, a system that wants to extract more and more information and process it in a sequence of marginal gains to find and tell (usually the verb is sell) us the secret patterns we didn’t know about ourselves.

The socio-cultural factors show a picture of a changing society. People produce more data because they want convenience: the convenience of pressing a button on the fridge to place an automatic order for milk at the local store of choice; of cars that will be autonomous and safer because they talk to each other; of sharing photos with family. And in all these processes some (or much) metadata will be shared. The Internet of Things allows anyone to take a simple electric toy, add a microcontroller, and repurpose it as a connected device. We are all happily contributing to the digitisation of everything.

The data produced has always been a problem for those who want to understand it. For them, knowledge is paramount. They can be medieval monks copying ancient manuscripts, technologically challenged when trying to copy an entire library. Or they can be IBM deploying Apache Spark as the next big hammer after Hadoop. There is always a limit to the amount of data society can process, and always some limit to the amount of information that can be extracted from that data. There is always a limit, and although the term Big Data is new, the problem isn’t.

The digitisation of the world we know is a social phenomenon that grows with each new device sold. The simplification of life that digitisation brings produces streams of data and metadata that some consider excessive. Security can be a problem, but changes will happen, not because technology is imposed by big companies, but because society needs them to accommodate our new needs. We want to be connected and functional in the modern world.

This discussion leads me to think that we are dealing with two views of Big Data. One treats Big Data as the technological challenge of gathering, processing, and acting upon the world, where data is increasingly produced and promoted as fuel for the advances of the future, and where we will need to be able to understand those streams of data. The other looks at it as a sociological phenomenon, in which the growing digitisation of the planet is producing change at an ever faster pace and Big Data is just a side effect of that social change. So, is Big Data cause or consequence?

This seems to be a chicken-and-egg problem. Is Big Data the analysis of more and more data, and with that analysis the cause of new knowledge that transforms society? Or are the digitisation of the world and the social changes we observe producing a drastic change in the output of our digital signatures, of which Big Data is merely a by-product?

The truth is that the two phenomena may be interconnected in a positive feedback loop. The more we see behavioural changes in the population, the more data will be produced and the more the technicalities of Big Data will be publicised. The more knowledge the Big Data guardians have about the world, the more they will act on it, asking for ever more data. If you can imagine what a society becomes when it reads an encyclopaedia, imagine what it can produce when Watson receives enough information to decide the next fashion trend.

The present is the inventor of the future. And the question comes down to individual participatory choices (at least in democratic societies). You cannot escape the future, but you can choose how much you want to engage with it. Progress (as a transformative force) is upon us every day. But progress is a societal concept, not a technical one. And how much you engage will always be relative to a moving baseline, not to a fixed reference point. The digitisation of the world is making Earth a better place (even if it will soon be too hot to live on).

Today’s Big Data challenges are the same as those of the past. How society changes and how technology evolves are like the two faces of a Möbius strip: interconnected, always affecting each other in eternal motion, always on an eternal continuum, like present and future.

Your Grandfather’s Oldsmobile—NOT! Self-driving cars around the corner

Your Grandfather’s Oldsmobile—NOT! – BLOG@UBIQUITY.

The self-driving car is coming to our streets. It might not arrive as soon as some predicted, but it will come in incremental steps. Technology has this feature: you dream about radical changes and then they appear slowly, one step at a time, and when you look back you realise that reality is bigger than your original dream. So let’s keep dreaming about self-driving cars, planes, bicycles, or balloons. Someone will implement all the good pieces to get there and beyond.

The Hassle of Writing Data-Driven Blog Posts

I have often written posts based on the analysis of whatever data is available. That was the case, for example, with the first fight against #pl118, and with the recent analysis of the state of Portuguese cinema.

The great advantage of writing from data is that it becomes much harder to simply opine on a subject; we have to stick much more closely to what the data actually shows. This has often forced me to revise preconceived ideas I held about certain topics.

For example, in the case of cinema in Portugal, I naturally assumed that no films were being made, that the crisis was hitting the sector like never before, that the whole thing was a swamp. The truth is that after searching for data to support that analysis, I couldn’t find any. The data show that Portuguese cinema produced almost 450 films over the last three years, counting fiction, documentaries, shorts, and so on.

So the data slapped me in the face and forced me to think about this differently. Is there, after all, a working film industry that systematically complains of a permanent state of crisis? Is that a mannerism of subsidy dependence? Are my data robust? Is the methodology sound?

On the one hand, the methodology may have some flaws; the data may not be the most recent or the most precise. I am relying on surveys carried out by others, so I don’t know the methodology they adopted. There are plenty of aspects that can be questioned in a post of this kind. But until someone shows up with new data that I can use and whose results are reproducible, it is all just talk. Data is only good until better-quality data appears.

These data-driven articles always end up being a double-edged sword. On the one hand, they help clarify the facts and claims heard all the time from the public megaphones; on the other, they raise more questions than the explanations they offer. They are never definitive; they merely try to think out loud a little. They are a kind of public brainstorming. Nothing more.

Is Big Data Killing Theory?

Big Data and number crunching

In other words, we no longer need to speculate and hypothesise; we simply need to let machines lead us to the patterns, trends, and relationships in social, economic, political, and environmental relationships.

The Guardian ran an opinion story in their Datablog about how big data could end theory. I think that the reduction of big data to simply an engineering problem of accommodating more data, processing it in real time, and monitoring the services on which the analysis runs is not really near, nor will it ever come to be.

We will always need theoretical scientists, but more importantly, we need philosophers. The idea that these big data analyses can be automated and the results applied without further explanation is terrifying and Orwellian. Science, in its publish-or-perish race to the top, needs to bring some thinking back to itself, and big data will need some big thinking about what it is producing in order to really understand and explain what society is. If it ended up being just computer output, we would all be in great danger: governments would make bad decisions based on ignorance, and evil corporations would game the big data analysis for profit at society’s expense.

Big Data is a great field to work in right now, and it will revolutionise our understanding of the society we live in, but it won’t go far without someone able to interpret and analyse its outputs, even if they are the most accurate ever produced. Society is not what we made; society is what we make.

How Big is Big in Big Data Analysis?

I’ve said that Big Data analysis needs to become mainstream and reach small(er) industries, since until now “big data” has been applied to massive volumes of data. With Hadoop becoming more popular (and easier to use, too), new opportunities in data mining and data analytics in general will certainly emerge. But will it be big data analysis?
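To give a sense of how low the barrier has become, here is a minimal sketch of my own (not from any post referenced here) of the classic word count written for Hadoop Streaming, which lets plain Python scripts act as mapper and reducer on a cluster; the script names and any paths are illustrative.

    #!/usr/bin/env python
    # mapper.py -- read raw text from stdin, emit "word<TAB>1" for every token.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word.lower(), 1))

    #!/usr/bin/env python
    # reducer.py -- Hadoop Streaming sorts mapper output by key, so all counts
    # for a given word arrive consecutively; we just sum them per word.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Both scripts are submitted with the Hadoop Streaming jar shipped with the distribution (roughly: hadoop jar hadoop-streaming*.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input <hdfs input> -output <hdfs output>). The point is not the example itself but that it needs no Java and no special framework code.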

One of the defining characteristics of big data is that sometimes you need more than just labelling it BIG. Forbes has a blog post that raises a few questions about the data. The interesting thing about those four questions is that they can easily be summed up in two points:

1st) Big Data is by nature complex, with intricate structures and relations among its components, in such a way that this entanglement requires long computations to grasp its inner aspects.

2nd) Big Data is only big data if the time those computations take is a critical aspect for the industry trying to deal with it; if that’s not the case, the author claims, it’s just a matter of “large data” analysis. In many contexts this means real-time or quasi-real-time processing.

The author goes on to state that, by these criteria, very few industries really process Big Data. I tend to disagree a bit here: it’s my belief that the two points above are so correlated that you will in fact find “Big Data” at many scales, because the space of exploration is not volume or time alone but a time-volume space, and therefore examples of big data will turn up at different scales. In any case, I fully agree that these two points are the questions one must ask to see whether our data fits the “big data” label.
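To make that time-volume idea concrete, here is a tiny, purely illustrative sketch of my own (the function and thresholds are hypothetical, not from the Forbes post): a workload is treated as “big data” whenever the estimated processing time outruns the window in which the answer is still useful, regardless of the absolute volume.

    # Hypothetical rule of thumb for the time-volume view: data is "big" for you
    # when computing the answer takes longer than the answer stays useful.

    def is_big_data(volume_gb, throughput_gb_per_hour, decision_window_hours):
        """Return True if the estimated processing time exceeds the decision window."""
        processing_hours = volume_gb / throughput_gb_per_hour
        return processing_hours > decision_window_hours

    # A 50 GB clickstream feeding a pricing decision due within the hour can be
    # "big data", while 5 TB of archive logs crunched overnight need not be.
    print(is_big_data(volume_gb=50, throughput_gb_per_hour=20, decision_window_hours=1))      # True
    print(is_big_data(volume_gb=5000, throughput_gb_per_hour=1000, decision_window_hours=12)) # False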