The method consists of collecting data from the Facebook API and calibrating them against official statistics. Although both the method and the data have certain limitations, they can give a very detailed insight into emigration trends over time. In fact, the main advantage of the presented methodology is not the ability to estimate the absolute number of migrants in individual countries but the ability to follow changes in migrant numbers over time and to provide insights about countries where traditional demographic data are either not collected or of poor quality.
Users of social media platforms are not representative of society at large. This is particularly true for age: the distribution of social network users skews toward younger age groups. About 45% of Serbia’s residents use one of the Facebook technologies. Facebook users are concentrated in younger generations, with the largest share aged 25-34. Since the minimum age requirement for a Facebook account is 13, the data report no users below this age.
Data from official sources were used to validate the Facebook estimates and to develop the model. We used multiple sources to assemble the dataset on the number of Serbian migrants and their demographic characteristics. For countries in the European Union, migrant data (the foreign-born migrant dataset) are available through Eurostat (but not for Germany, France, the UK, Spain, or the Netherlands). Statistics on the number of Serbian migrants in the United States were collected through the American Community Survey. For the other countries, we used data from the United Nations Department of Economic and Social Affairs (UNDESA) and the UN Demographic Statistics Database.
Although the Facebook numbers differ from the official sources, the two are highly correlated, with a correlation coefficient of 0.977. To correct the age bias in the data more systematically and to account for different Facebook usage patterns across countries, we estimated the parameters of the following linear regression model:
The regression coefficients indicate that Facebook data underestimate the number of migrants in older age groups. Additionally, the magnitude of coefficient β2 suggests that the model is sensitive to the Facebook penetration rate. In countries with high Facebook usage, the calibrated number of migrants will therefore be close to the raw Facebook figure: little correction is needed because most of the population is already on Facebook. Conversely, in countries with a low usage rate, the projected number of migrants will be several times larger than the raw figure, to account for people who are not on Facebook.
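The effect of the penetration rate on the correction can be illustrated with a minimal Python sketch. It is not the model estimated in this study: the log-linear form, the variable names, the helper functions `fit_calibration` and `predict_migrants`, the coefficient values, and the synthetic data are all assumptions made for illustration. Only the role of β2 as the coefficient on the penetration rate and the upward correction for older age groups follow the description above.

```python
import numpy as np


def fit_calibration(fb_counts, official_counts, penetration, older_group):
    """Ordinary least squares fit of an ASSUMED log-linear calibration model:

        log(official) = b0 + b1*log(facebook) + b2*penetration + b3*older_group
    """
    X = np.column_stack([
        np.ones(len(fb_counts)),   # intercept (b0)
        np.log(fb_counts),         # raw Facebook estimate, log scale (b1)
        penetration,               # Facebook penetration rate in [0, 1] (b2)
        older_group,               # 1 for an older age group, else 0 (b3)
    ])
    y = np.log(official_counts)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict_migrants(beta, fb_count, penetration, older_group):
    """Calibrated migrant count implied by the fitted coefficients."""
    x = np.array([1.0, np.log(fb_count), penetration, older_group])
    return float(np.exp(x @ beta))


# Synthetic, noise-free data generated from HYPOTHETICAL coefficients.
# None of these numbers come from the study.
rng = np.random.default_rng(0)
n = 40
fb_counts = rng.integers(500, 50_000, size=n).astype(float)
penetration = rng.uniform(0.2, 0.9, size=n)
older_group = rng.integers(0, 2, size=n).astype(float)

true_beta = np.array([1.2, 1.0, -1.5, 0.4])  # made-up values for illustration
X_true = np.column_stack([np.ones(n), np.log(fb_counts), penetration, older_group])
official_counts = np.exp(X_true @ true_beta)

beta = fit_calibration(fb_counts, official_counts, penetration, older_group)

# Same raw Facebook count, two different penetration rates.
low = predict_migrants(beta, 10_000, penetration=0.2, older_group=0)
high = predict_migrants(beta, 10_000, penetration=0.8, older_group=0)
print(f"low penetration:  {low:,.0f}")
print(f"high penetration: {high:,.0f}")
```

With these made-up coefficients, a raw count of 10,000 stays essentially unchanged at a penetration rate of 0.8 and is scaled up by a factor of about 2.5 at a rate of 0.2. This is the qualitative pattern described above, not an estimate for any real country.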
The results indicate that Facebook is a good source of migration and demographic data. For Serbia and other Balkan countries, this type of analysis could be of great assistance, since countries in the region exhibit high annual out-migration that decennial censuses cannot capture. An inexpensive and quick data collection process allows us to repeat the analysis every couple of months, giving a very detailed insight into emigration trends over time. In fact, the main advantage of the presented methodology is not the ability to estimate the absolute number of migrants in individual countries but the ability to follow changes in migrant numbers over time. More broadly, social networks could be used to investigate migration phenomena related to wars, economic hardship, or political instability. This project is only the first step toward analyzing the migration patterns of Serbian migrants as Europe gradually opens in the post-COVID environment.
Our method has some limitations. First, Facebook data do not provide information about the number of years that people have spent in the country, one of the key variables in traditional demographic research. Second, the accuracy of Facebook’s classification of expats is unknown.