The four filters

The extensive growth of sources and content, the relentless rise of fake news producers, the proliferation of social channels whose metrics are far from disclosed, and the spread of messaging systems: all these factors make analysts’ lives harder day after day.

Today 66% of organizations do not know how to handle their existing data, let alone collect new data in a way that truly contributes to the business as we know it.

Big data, or rather data of any size, can be really difficult to collect and to translate into something

There is a clear need for data that is ready to use without never-ending setup operations and that can answer questions in seconds.

“Data is an enterprise asset, which cuts across products, services, and organizational units of a company. This makes data hard to manage and data initiatives difficult to organize. The big data mindset is driven by experimentation, discovery, agility, and a “data first” approach, characterized by analytical sandboxes, centers of excellence, and big data labs. This mindset often runs counter to, or can complement, traditional hypothesis-driven approaches to data management.”

Randy Bean, Forbes, 2016/11/08


The process of adopting meaningful big data (or data of any size) can therefore be rather complex, like every process that affects corporate culture and workflow.

Looking at this flow and then thinking about one’s own organization, it’s easy to spot the critical areas.

The definition of Stages 1 and 2 is really critical. It is entangled with business priorities, since it cannot follow a first-come, first-served model, and it helps the problem owner choose the right data to achieve his or her goals.

I would probably add several checkpoints for process validation, so as not to reach the end of the entire trip and find out that something has changed in between.

The more accurate the setup activity in modeling a monitoring system, the better the experience for the client and for the staff working on the platform.

Beyond the filtering modules currently available in the best-performing platforms, we suggest focusing the filtering activity on four indexes, with the aim of delivering a content ranking tailored to each client. A website, a Facebook page, or a Twitter account could be a threat to one brand and neutral, or even positive, to others.

Ranking the repository of sources is therefore a strategic activity, one that requires periodic revision to keep the dataset up to date.

The suggested indexes are:

  • Productivity
  • Polarisation
  • Virality
  • Audience

a) Productivity

Each contributor either follows a posting strategy, in the case of professional authors, or posts compulsively, when the activity is driven by a political or news agenda.

The general rule is that a consistent stream of posts and updates keeps the audience more loyal to the author.

If the author belongs to conspiracy or boycott groups, or to certain political parties, then we have to consider the filter bubbles he or she belongs to. These bubbles sustain a high level of loyalty no matter how many posts are made each day or week.

b) Polarisation

While the algorithmic approach to measuring sentiment is still rather questionable, on a limited amount of text the measurement can be done manually. This task may prove vital when introducing a sentiment by source or, better, a polarization index. Why polarization? While some sources are 100% negative in their attitude towards certain brands or industries, others can have a mixed approach that depends on a single contributor or on an extremely sensitive topic.

The latter is true for news outlets, the former for some Facebook pages, blogs, and Twitter accounts.

Building a polarization index by source helps in building a knowledge system that can be expanded in a consistent and unique way.
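As a rough illustration of a polarization index by source, the sketch below aggregates manually assigned sentiment labels. The label scale (-1, 0, 1), the function name `polarization_by_source`, and the two output measures are our own assumptions for the example, not an established formula.

```python
from collections import defaultdict
from statistics import mean


def polarization_by_source(labelled_posts):
    """Aggregate manual sentiment labels into a per-source summary.

    labelled_posts: iterable of (source, label) pairs, where label is
    -1 (negative), 0 (neutral) or 1 (positive). This scale is an assumption.
    Returns {source: (mean_sentiment, share_of_negative_posts)}.
    """
    by_source = defaultdict(list)
    for source, label in labelled_posts:
        if label not in (-1, 0, 1):
            raise ValueError(f"unexpected label {label!r} for {source!r}")
        by_source[source].append(label)
    return {
        source: (mean(labels), labels.count(-1) / len(labels))
        for source, labels in by_source.items()
    }


# Hypothetical hand-labelled sample: one fully negative page, one mixed outlet.
sample = [
    ("boycott_page", -1), ("boycott_page", -1), ("boycott_page", -1),
    ("news_outlet", -1), ("news_outlet", 0), ("news_outlet", 1), ("news_outlet", 1),
]
index = polarization_by_source(sample)
```

A source whose share of negative posts is close to 1 is polarized by attitude, while a mean near zero with a spread of labels points to the mixed approach typical of news outlets.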

c) Virality

How fast does a piece of news spread across the web? No one knows a rule for that, nor can it be predicted easily, though it is possible to build a model based on previous posts that helps forecast a reasonable virality index by source. Within a virality index, we recommend including the lifecycle of the news distributed by a source as a variable. This information can drive the appropriate reaction by a brand.
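One very simple way to derive such an index from previous posts is sketched below. It assumes that, for each past post of a source, we know the total number of shares and the lifecycle in hours. The function name `virality_index` and the choice of medians are assumptions made for the example, not a validated forecasting model.

```python
from statistics import median


def virality_index(past_posts):
    """Summarise how fast and for how long a source's posts used to spread.

    past_posts: list of (shares, lifecycle_hours) pairs for one source.
    Returns (median shares per hour, median lifecycle in hours).
    """
    if not past_posts:
        raise ValueError("need at least one past post")
    if any(hours <= 0 for _, hours in past_posts):
        raise ValueError("lifecycle_hours must be positive")
    rates = [shares / hours for shares, hours in past_posts]
    lifecycles = [hours for _, hours in past_posts]
    # Medians are used so that a single exceptional post does not dominate.
    return median(rates), median(lifecycles)


# Hypothetical history for one source: (shares, hours the story stayed alive).
history = [(1200, 24), (300, 6), (4800, 48)]
speed, lifecycle = virality_index(history)
```

The first value suggests how quickly a brand may need to react to this source, the second how long the story is likely to stay alive.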

d) Audience

Related to each of the indexes above, the definition of the audience is no longer a simple quantitative item but rather a qualitative one. While the size of the audience has an impact on potential reach, it is its loyalty to the author that has an impact on virality. Larger audiences can work better when the goal is awareness or memorability, though they tend to have a high level of dispersion and a low level of conversion. On social media channels, larger audiences are often inefficient.

What to do with these indexes?

The most important task is the creation of a single index that helps define the true relevance of a contribution.
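A single index can be sketched as a weighted average of the four indexes, with weights chosen per client. The normalisation of each index to the 0..1 range, the function name `relevance`, and the sample weights are all assumptions for illustration.

```python
INDEXES = ("productivity", "polarisation", "virality", "audience")


def relevance(scores, weights):
    """Combine the four indexes into one relevance score.

    scores:  {index_name: value normalised to 0..1} for one source.
    weights: {index_name: non-negative weight}, tailored to the client.
    Returns the weighted average of the four scores.
    """
    missing = [n for n in INDEXES if n not in scores or n not in weights]
    if missing:
        raise KeyError(f"missing indexes: {missing}")
    total = sum(weights[n] for n in INDEXES)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(scores[n] * weights[n] for n in INDEXES) / total


# Hypothetical source, scored for a client that cares mostly about polarisation.
scores = {"productivity": 0.5, "polarisation": 1.0, "virality": 0.25, "audience": 0.75}
weights = {"productivity": 1, "polarisation": 2, "virality": 1, "audience": 0}
score = relevance(scores, weights)
```

Keeping the weights separate from the scores means the same source can be ranked differently for each client without recomputing the underlying indexes.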

Charting the four indexes is the first step to visualize the impact and to identify at a glance the areas of weakness and strength.
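Even a plain-text chart can give that first glance. The sketch below assumes the four indexes are already normalised to the 0..1 range. The function name `bar_chart` and the sample values are hypothetical.

```python
def bar_chart(scores, width=20):
    """Render {index_name: value in 0..1} as horizontal text bars."""
    lines = []
    for name, value in scores.items():
        clamped = min(max(value, 0.0), 1.0)  # keep the bar inside the chart
        filled = round(clamped * width)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{name:<13}{bar} {value:.2f}")
    return "\n".join(lines)


# Hypothetical scores for one source.
print(bar_chart({
    "productivity": 0.50,
    "polarisation": 1.00,
    "virality": 0.25,
    "audience": 0.75,
}))
```

Short bars point to areas of weakness and long bars to areas of strength. A radar chart in a plotting library would serve the same purpose.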


There is a long list of tasks that can be completed with the output of the indexes: setting up innovative alert systems and multilayer priority definitions, just to name a few.

Overall, it is a matter of delivering rich insights to clients that go beyond standard information, building a shared knowledge system.