Project financed by NCN (National Science Center)

Advanced modeling methods for viral processes


Each day, billions of instant messages, comments, articles, blog posts, emails, tweets and other forms of communication are exchanged in reciprocal social interactions. Research on information diffusion has become very productive, with applications such as maximising the influence and virality of a rumour or improving routing algorithms.

We also note that models of information diffusion are based on classical epidemiological models, e.g. SIR. In recent months, these models have been used extensively to model the spread of COVID. Hence, understanding and modelling viral processes has become a key research direction. Models of viral processes usually assume that the process is stochastic, as in the famous SIR model. While this model may be correct for the case for which it was created, i.e. describing the spread of disease, we showed in our previous work that it does not apply to the spread of information. First, it does not take into account that information becomes less up-to-date over time, so people share it less actively. Second, information spreads through multiple channels.

If we want to study information dissemination on the Twitter network, we need to take into account other channels as well, e.g. mass media. In particular, neglecting these effects causes the SIR model to overestimate the likelihood that information will become viral, i.e. reach almost the whole network. Our work (HT 2016) explains the experimentally observed cascade sizes by incorporating two effects:

  • exponential decay of the probability that a rumour is spread further;
  • the multi-source nature of the process, which can be attributed to the fact that information spreads outside the Twitter network.
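
As a rough illustration of the two effects listed above, the sketch below simulates a cascade on a small random graph. It is not the model from (HT 2016): the graph generator, the use of hop count as a stand-in for the age of the information, the choice to place all sources at step 0, and every name and parameter value (`random_graph`, `cascade_size`, `p0`, `decay`, `n_sources`) are assumptions made only for this example.

```python
import math
import random


def random_graph(n, avg_degree, rng):
    """Undirected random graph as adjacency lists (illustrative stand-in for a social network)."""
    p = avg_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def cascade_size(adj, p0, decay, n_sources, rng):
    """Run one cascade and return how many nodes ended up informed.

    p0: probability that a contact shares the item at step 0 (assumed value)
    decay: rate of exponential decay of that probability per step
    n_sources: number of independent seed nodes, a crude proxy for
        information entering the network from outside (e.g. mass media)
    """
    informed = set(rng.sample(range(len(adj)), n_sources))
    frontier = list(informed)
    step = 0
    while frontier:
        # Share probability shrinks as the information gets older.
        p = p0 * math.exp(-decay * step)
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in informed and rng.random() < p:
                    informed.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
        step += 1
    return len(informed)


def mean_cascade_size(adj, p0, decay, n_sources, runs, seed=0):
    """Average cascade size over several independent runs."""
    rng = random.Random(seed)
    total = sum(cascade_size(adj, p0, decay, n_sources, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    graph = random_graph(300, 6, random.Random(1))
    for decay in (0.0, 0.5):
        for n_sources in (1, 5):
            size = mean_cascade_size(graph, 0.3, decay, n_sources, runs=200)
            print(f"decay={decay} sources={n_sources} mean size={size:.1f}")
```

Setting `decay` to 0 and `n_sources` to 1 gives a plain SIR-style cascade, so the same function can be used to compare cascade sizes with and without the two effects.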

Another possible explanation can be found in our work (WWW 2017), where we provide the only known theoretical model that explains why the distribution of cascade sizes follows a power law.

That paper introduces the concept of a direction of information spread, i.e. from high-degree and high-trust nodes. The motivation for this assumption is that people are more likely to share information coming from high-degree nodes.

Taken together, these results suggest that we are still far from understanding the mechanism of information spread in social networks well, even though viral processes in social networks can be traced very accurately. This lack of understanding means that we are unable to correctly assess the risks related to very rare events. In particular, to our knowledge, our paper (HT 2016) is the only case where a metric that correctly accounts for rare events is used. This raises the question of whether rare events, such as the pandemic spread of a disease, are correctly described in epidemiological applications of such models.

Another line of research on viral processes is the prediction of how popular a given piece of information will become. It should be noted that these models are built using a completely different approach from the one taken in our work. The typical approach is to build a regression model that predicts the further evolution of the process from its observed characteristics. However, such models have limited effectiveness because they implicitly assume that the process is deterministic, whereas it is known to be stochastic and its evolution is not predetermined. The challenge is therefore to develop models that predict all possible continuations of the process, described as a distribution. In particular, only this type of approach can lead to statistically valid estimates of the chance that a process reaches the whole network. There are, however, further important challenges here, which form the tasks of this project:

  • We aim to pinpoint the mechanisms responsible for the decay of the probability that information is shared further.
  • We will stochastically model the risk that a viral process spreads to the entire network, or to a large fraction of it.
  • We will work on models for predicting the evolution of a specific rumour.
  • We will study how to infer the parameters and means of transmission of a viral process from indirect observations.
  • We will check whether it is possible to detect the nature of the viral element, i.e. real-world event, fake news, or viral article.
  • We will apply the methodology developed in this project to model COVID epidemics. In particular, our work will shed light on how to correctly describe the risks of this process.
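
The difference between a point prediction and a predicted distribution, discussed above, can be made concrete with a small Monte Carlo sketch. It uses a simple branching process as a stand-in for a viral process; this is an assumption chosen for brevity, not the model developed in the project, and the names and values (`cascade_total`, `simulate_sizes`, `tail_probability`, `q`, `contacts`, `cap`) are hypothetical.

```python
import random


def cascade_total(q, contacts, cap, rng):
    """Total size of one branching-process cascade, capped at `cap`.

    Each active node exposes `contacts` new nodes, and each exposed node
    shares the item independently with probability `q` (assumed values).
    """
    active, total = 1, 1
    while active and total < cap:
        children = sum(1 for _ in range(active * contacts) if rng.random() < q)
        total += children
        active = children
    return min(total, cap)


def simulate_sizes(q, contacts, cap, runs, seed=0):
    """Sample the distribution of final cascade sizes by repeated simulation."""
    rng = random.Random(seed)
    return [cascade_total(q, contacts, cap, rng) for _ in range(runs)]


def tail_probability(sizes, threshold):
    """Empirical probability that a cascade reaches at least `threshold` nodes."""
    return sum(1 for s in sizes if s >= threshold) / len(sizes)


if __name__ == "__main__":
    sizes = simulate_sizes(q=0.09, contacts=10, cap=10_000, runs=2_000)
    print("mean size:", sum(sizes) / len(sizes))
    for threshold in (10, 100, 1_000):
        print(f"P(size >= {threshold}) ~ {tail_probability(sizes, threshold):.4f}")
```

A regression model would report something like the mean size only, whereas the sampled distribution also gives an estimate of how likely very large cascades are. Estimating the probability of very rare events reliably would need many more runs or a dedicated rare-event technique, which this sketch does not attempt.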

Modelling stochastic processes is our specialty. We do world-class research in this field.

Get in touch

MIM Solutions can help develop AI in your company. Please contact us and we’ll talk you through it.