GovLab Blog GovLab Digest

Beyond demographics: How search engine data can enhance the understanding of determinants of suicide in India and inform prevention

Paper by Daniela Paolotti; Elad Yom-Tov; Natalia Adler; Ciro Cattuto; Kyriaki Kalimeri; Michele Tizzoni; Stefaan Verhulst; and Andrew Young in the Journal of Medical Internet Research: “India is home to 20% of the world’s suicide deaths. In India, and around the world, young people are especially at risk of suicide. While statistics regarding suicide in India are distressingly high, data and cultural issues likely contribute to a widespread underreporting of the problem. Social stigma and only recent de-criminalization of suicide are but two factors hampering official agencies’ collection and reporting of suicide rates.

As the product of a data collaborative – the cross-sector exchange of data to create new public value – this paper leverages private-sector search engine data toward gaining a fuller, more accurate picture of the suicide issue among young people in India. By combining official statistics on suicide with data generated through search queries, this paper seeks to: 1) add an additional layer of information to more accurately represent the magnitude of the problem; 2) determine whether search query data can serve as an effective proxy for factors contributing to suicide that are not represented in traditional datasets; and 3) consider how data collaboratives built on search query data could inform future suicide prevention efforts in India and beyond.
We combined official statistics on demographic information with data generated through search queries from Bing to predict suicide rates per state in India as reported by the National Crime Records Bureau of India. We have extracted English language queries on five topics (“suicide”, “depression”, “hanging”, “pesticide”, “poison”). For each query, we recorded the time and date of the query, the state in India from which the user made the query, and the text of the query. We have then collected data on demographic information at state level in India, including: Urbanization, Growth Rate, Sex Ratio, Internet Penetration, Population. We have modeled the suicide rate per state as a function of the queries on each of the 5 topics considered as linear independent variables. We also built a second model by integrating the demographic information on Urbanization, Growth Rate, Sex Ratio, Internet Penetration and Population, all considered as additional linear independent variables in the model.
Results of the first model fit (R²) when predicting the suicide rates from the fraction of queries in each of the 5 topics, as well as the fraction of all suicide methods, show a correlation of about 0.5. The correlation increases significantly with the removal of even 3 outliers, and improves slightly when 5 outliers are removed. In all cases, statistically significant correlation is reached, but the best correlation is obtained for suicide methods (hanging, pesticide, and poison), and only to a lesser extent for depression. Results for the second model fit using both query data and demographic data show that for all categories, if no outliers are removed, demographic data predict suicide rates better than query data. However, when 3 outliers are removed, query data about pesticides or poisons improves the model over using demographic data.

Conclusions: Internet search data has been shown in previous work to serve as a proxy for many health-related behaviors, enabling the measurement of rates of different conditions ranging from influenza to suicide. In this work, we used both search data and demographics to predict suicide rates. In this way, search data serves as a proxy for unmeasured (hidden) factors corresponding to suicide rates. Moreover, our procedure for outlier rejection serves to single out states where the suicide rates have substantially different correlations with both demographic factors and query rates….(More)”.
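
For readers who want to see the shape of the analysis described in the abstract above, here is a minimal sketch in Python. It is not the authors' code. It assumes ordinary least squares with an intercept, and it assumes that "outliers" are the states with the largest absolute residuals from a first fit; the abstract does not spell out either detail. All numbers are synthetic placeholders generated from a fixed seed, not real search or suicide data, and every name (`fit_linear`, `fit_without_outliers`, `queries`, `demographics`) is hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficients, the in-sample R^2, and the residuals.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot, resid


def fit_without_outliers(X, y, n_outliers):
    """Fit once, drop the n rows with the largest absolute residual, refit.

    ASSUMPTION: this residual-based rule is only one plausible reading of
    "removal of outliers"; the paper's actual procedure may differ.
    """
    _, _, resid = fit_linear(X, y)
    keep = np.sort(np.argsort(np.abs(resid))[: len(y) - n_outliers])
    coef, r2, _ = fit_linear(X[keep], y[keep])
    return coef, r2, keep


# SYNTHETIC placeholder data: 30 hypothetical states.
rng = np.random.default_rng(0)
n_states = 30
# Fraction of queries per state in each of the 5 topics
# (suicide, depression, hanging, pesticide, poison).
queries = rng.random((n_states, 5))
# Urbanization, growth rate, sex ratio, internet penetration, population
# (all rescaled to [0, 1] here purely for illustration).
demographics = rng.random((n_states, 5))
suicide_rate = (
    10.0 + 8.0 * queries[:, 3] + 5.0 * demographics[:, 0]
    + rng.normal(0.0, 1.5, n_states)
)

# Model 1: query fractions only.
_, r2_queries, _ = fit_linear(queries, suicide_rate)

# Model 2: query fractions plus demographic variables.
combined = np.column_stack([queries, demographics])
_, r2_combined, _ = fit_linear(combined, suicide_rate)

# Model 2 again after dropping the 3 worst-fitting states.
_, r2_trimmed, kept = fit_without_outliers(combined, suicide_rate, 3)

print(f"R^2, queries only:           {r2_queries:.3f}")
print(f"R^2, queries + demographics: {r2_combined:.3f}")
print(f"R^2, 3 outliers removed:     {r2_trimmed:.3f}")
```

Because the second model nests the first, its in-sample R² can never be lower. A fair comparison of query data against demographic data would therefore need held-out states or an adjusted fit statistic, which this sketch leaves out.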


Elections won’t save our democracy. But ‘crowdlaw’ could.

RiskMap is a paradigmatic example of collective intelligence, which in this day and age means using the Internet to connect groups of people so they can share knowledge….

Yet despite the emergence of hundreds of collective intelligence platforms like RiskMap — including my new favorite, Penguin Watch, where people count the number of penguins in a picture to help scientists measure changes in their population — our political institutions seem to be getting stupider. That need not be. Collective intelligence isn’t just a tool for improving disaster response or enhancing scientific study. It can be used to improve governance too.

More than a hundred local city councils and parliaments at both the regional and national level, from Iceland to Ireland to India, are turning to “crowdlaw,” a form of crowdsourcing that uses novel collective intelligence platforms and processes to help governments engage with citizens. Crowdlaw is based on the simple but powerful idea that parliaments, governments and public institutions work better when they leverage new technologies to tap into diverse sources of information, judgments and expertise at each stage of the law and policymaking cycle. This helps improve the quality as well as the legitimacy of the resulting laws and policies….

Despite these proliferating examples, however, the success of collective intelligence platforms has been mixed. Many projects remain in the pilot phase, failing to expand. When Spain’s Podemos was still an upstart political party, for example, it successfully engaged its supporters in drafting an online party platform but saw less success embracing these crowdsourcing practices once in government. And the Decide Madrid platform, to which 400,000 people have signed up to propose policy to the city council, has resulted in only two new policies but not a single new law….(More)”.


Open Data Demand: Toward an Open Data Demand Assessment and Segmentation Methodology

Report by Stefaan Verhulst and Andrew Young: “Across the world, significant time and resources are being invested in making government data accessible to all with the broad goal of improving people’s lives. Evidence of open data’s impact—on improving governance, empowering citizens, creating economic opportunity, and solving public problems—is emerging and is largely encouraging.
Yet much of the potential value of open data remains untapped, in part because we often do not understand who is using open data or, more importantly, who is not using open data but could benefit from the insights it may generate. By identifying, prioritizing, segmenting, and engaging with the actual and future demand for open data in a systemic and systematic way, practitioners can ensure that open data is more targeted.
We know that we cannot simply focus on releasing open data, nor can we build a portal without understanding its possible uses and demand. Yet, we often do just that. Understanding and meeting the demand for open data can increase overall impact and return on investment of public funds.

The GovLab, in partnership with the Inter-American Development Bank, and with the support of the French Development Agency, developed the Open Data Demand and Assessment Methodology (Beta) to provide open data policymakers and practitioners with an approach for identifying, segmenting, and engaging with demand. This process specifically seeks to empower data champions within public agencies who want to improve their data’s ability to improve people’s lives….(More)”.


Where and when AI and CI meet: exploring the intersection of artificial and collective intelligence toward the goal of innovating how we govern

Stefaan Verhulst in the Journal AI and Society: “This paper seeks to explore the intersection of Artificial Intelligence (AI) and Collective Intelligence (CI), within the context of innovating how we govern. It starts from the premise that advances in technology provide policy makers with two important new assets: data and connected people. The application of AI and CI allows them to leverage these assets toward solving public problems. Yet both AI and CI have serious challenges that may limit their value within a governance context, including biases embedded in datasets and algorithms, undermining trust in AI; and high transaction costs to manage people’s engagement limiting CI to scale.

The main argument of this paper is that some of the challenges of AI and CI can in fact be addressed through greater interaction of CI and AI. In particular, the paper argues for:

  • Augmented Collective Intelligence where AI may enable CI to scale;

  • Human-Driven Artificial Intelligence where CI may humanize AI.

Several real-world examples are provided throughout the paper to illustrate emerging trends toward both types of intelligence, and their applications to solve public problems or make policy decisions differently….(More)”.


The Social Dynamics of Open Data

New (Open Access) Book by Francois van Schalkwyk, Stefaan Verhulst, Gustavo Magalhaes, Juan Pane & Johanna Walker (eds): “The Social Dynamics of Open Data is a collection of peer reviewed papers presented at the 2nd Open Data Research Symposium (ODRS) held in Madrid, Spain, on 5 October 2016.
Research is critical to developing a more rigorous and fine-combed analysis not only of why open data is valuable, but how it is valuable and under what specific conditions. The objective of the Open Data Research Symposium and the subsequent collection of chapters published here is to build such a stronger evidence base. This base is essential to understanding what open data’s impacts have been to date, and how positive impacts can be enabled and amplified.
Consequently, common to the majority of chapters in this collection is the attempt by the authors to draw on existing scientific theories, and to apply them to open data to better explain the socially embedded dynamics that account for open data’s successes and failures in contributing to a more equitable and just society….(More)”


Blockchange: Blockchain Technologies for Social Change

The GovLab is pleased to announce the launch of a new initiative seeking to assess the potential of blockchain technologies for social change, or “Blockchange.” With support from the Rockefeller Foundation, we will seek to:

  1. Map Blockchange practices as they relate to providing trusted digital identity, including an assessment of enabling challenges and risks;
  2. Identify and assess Blockchange demand, capturing needs, challenges and opportunities; and
  3. Develop a set of evidence-based design principles that can guide the further development and use of blockchain technologies for social change.

The accompanying website for the project will serve as an initial platform to solicit input and share our findings with technologists, policy makers and other interested parties.
The new site also aims to become a dynamic hub for the use of blockchain technologies for social change, featuring a repository of Blockchange initiatives from around the globe and useful resources for those working in the space, like selected readings focused on using blockchain for identity and blockchain for transforming governance.
“We’re excited to explore whether blockchain technologies for social change can move the needle to become a transformative force in managing identity, and go even further to help solve other social challenges from development to governance,” said Stefaan G. Verhulst, co-founder and chief research and development officer of The GovLab, and lead investigator of the project. “With the new platform, we can translate learnings into impactful solutions.”
If you’re a researcher and would like to collaborate on Blockchange, please contact us here. To learn more about The GovLab, please visit


Smarter Crowdsourcing Against Corruption

“Introducing the Smarter Crowdsourcing Against Corruption Initiative,” reposted from The Smarter Crowdsourcing Blog by Dinorah Cantu and Beth Simone Noveck
The GovLab: “To identify and implement innovative approaches for fighting corruption, we at The Governance Lab (GovLab) are partnering with Mexico’s Secretaría de la Función Pública (Secretariat of the Civil Service) and the Inter-American Development Bank.
One in every three times a Mexican citizen interacts with government, a bribe is paid. The real cost of such a problem goes beyond the billions of diverted taxpayer pesos. It also hinders the delivery of essential government services, harms public safety, and reduces public trust in government. In a recent survey, corruption was named as the second most relevant problem in Mexico, behind only crime and ahead of unemployment.

In 2016, the challenge of corruption spurred an unprecedented legal reform process, driven by civil society. The passage of the National Anti-Corruption System calls for reforms across the federal government. The new legal framework –which has been widely heralded– creates, for example, a specialized court on corruption crimes and it expands and improves the ethics obligations of public servants. Although the Sistema Nacional Anticorrupción (National Anti-Corruption System) has propelled Mexico to global leadership in the reach and strength of its anti-corruption laws, most of the battle still lies ahead, as government agencies, the judiciary, and civil society put this law into practice…

We are applying the Smarter Crowdsourcing methodology to the pressing challenge of corruption in an effort to help Mexico rapidly identify practical reform strategies that have worked elsewhere. The goal is to harness the momentum created by the passage of the National Anti-Corruption System to go beyond legal principles and, in addition, to implement new practices…

The method we employ marries the agility and diversity of crowdsourcing (also called “open innovation”) with curation to target those with relevant know-how and bring them together in a format designed to produce effective and implementable outcomes.

This more targeted form of crowdsourcing, which quickly matches the demand for expertise to the supply of it, is what we call “smarter crowdsourcing.”…

This model has five phases, as outlined below (figure 1):

First, we break a big problem down into a set of specific, core challenges that need to be addressed.

Second, we work with our government partners to conduct background research on each challenge and ensure that we understand its root causes and, particularly, how those manifest themselves in each context.

Third, we solicit the participation of leading experts to address these core challenges. We both put out an open call for volunteers and hand-select the list of guests who can contribute most to helping governments to identify practical solutions.

Next, we hold online conferences on each challenge to identify potential innovative approaches to solving them.

Finally, in order to enable implementation of what is learned during the conferences, we complement the online dialogues with research and write up detailed implementation roadmaps in order that our partners can put the best ideas into action quickly….(More)”


How social media data can improve people’s lives – if used responsibly

Stefaan Verhulst (et al) in the Conversation: “In January 2015, heavy rains triggered unprecedented floods in Malawi. Over the next five weeks, the floods displaced more than 230,000 people and damaged over 64,000 hectares of land.
Almost half the country was labelled a “disaster zone” by Malawi’s government. And as the humanitarian crisis unfolded, relief agencies such as the Red Cross were faced with the daunting task of allocating aid and resources to places that were virtually unrecorded by the country’s mapping data, and thus rendered almost invisible.
Humanitarian workers struggled to navigate in many of the most affected areas, and one result was that aid did not necessarily reach those most in need.
To prevent similar knowledge gaps in the future, researchers, volunteers and humanitarian workers in Malawi and elsewhere, have turned to an unlikely partner: Facebook.
In 2016, as part of its “Missing Maps” project, the Red Cross accessed Facebook’s rich population density data to find and map people who were critically vulnerable to natural disasters and health emergencies, but remained unrecorded in existing maps.
During local Mapping Parties, volunteers in Malawi used Facebook’s satellite and population data, in addition to other satellite imagery, to trace roads, houses, and water points across Malawi’s communities.
Two years later, Missing Maps in collaboration with Facebook has identified more than 2 million people in Malawi, allowing aid and relief organisations to better plan projects in Malawi’s disaster prone areas.
Disasters kill nearly 100,000 and affect or displace 200 million people annually. As climate change is expected to increase the frequency and severity of disasters in the near future, leveraging social media data, crowd-sourcing and other means will only become more important.

The potential of data collaboratives

The Malawi partnership is just one manifestation of the concept of data collaboratives. We have defined this as a new form of collaboration beyond the public-private partnership model, in which participants from different sectors  —  including private companies, research institutions, and government agencies  —  can exchange data to help solve public problems.
While such collaboratives are emerging in a number of sectors and areas, the Malawi case is an example of a particular kind of collaborative. It’s what we might call a social media data collaborative.
While much attention has been paid to the impact of social media on politics, much value can be generated from social media data for governing as well, but only when done responsibly….
All of these initiatives are promising, but it is not yet clear that they add up to a comprehensive data responsibility framework or decision tree enabling new ways of working. Such a framework could provide data stewards the means to assess the public value of social media data as well as the risks and harms of sharing it. It could also suggest ways to adequately mitigate this risk.
What’s more, it might help achieve the necessary balance between the benefits and risks of sharing, and ensure that the vast amounts of data being generated by the public every second are ultimately used for the greater good.
More specifically, a generally accepted responsibility framework can help accelerate the emergence of new, innovative data collaboratives, and maximise their potential….(More)”.


Smarter Health: Boosting Analytical Capacity in Healthcare

Report by Beth Noveck, Stefaan Verhulst, Andrew Young, Maria Hermosilla, Anirudh Dinesh and Juliet McMurren: “Public institutions such as the National Health Service in England increasingly want—and are expected—to base their actions on nationally agreed standards rather than anecdote. The collection and analysis of data, when done responsibly and in a trusted manner, has the potential to improve treatment and drive towards value, both social and economic, in healthcare.
However, the goal of using data to improve the NHS and social care is hampered by a “talent gap” – a lack of personnel with data analytical skills – that stands in the way of uncovering the rich insights expected to reside in the NHS’ own data. The NHS is not unique among public and even private sector institutions who are struggling to identify, hire and retain people with data science skills, and, above all, with the ability to apply new technological tactics to advancing the institution’s priorities….

Informed by both a literature review and analysis as well as over fifty interviews with NHS and other experts, this paper offers a multiplicity of proposed recommendations for meeting the data analytic talent gap and achieving greater institutional readiness without full-time hiring. …
These recommendations fall into four categories:

  • Using new technology to coordinate distributed talent already present in the NHS, including project marketplaces.
  • Using new technology, such as talent banks and skill finders, to find talent hiding in plain sight—namely those with the relevant skills but who are not classed as analysts—and match them to projects.
  • Using expert networks to connect with empirical expertise outside the NHS.
  • Creating cost effective incentives to bring talent in from outside, including prize-backed challenges and foundation-funded fellowships…(More)”

Data Justice Network

Press Release: “The Governance Lab (The GovLab) at the NYU Tandon School of Engineering has launched the Data Justice Network. The website fosters peer-to-peer learning among criminal justice practitioners and policymakers and helps officials get fast and comprehensive answers to their questions about how to make better use of data to reduce incarceration and crime.
Built by the GovLab with support from the Laura and John Arnold Foundation (LJAF) and in collaboration with The Justice Management Institute, the website was designed for practitioners by practitioners to ensure that the platform is both useful and simple. …
“Criminal justice data are collected by multiple agencies, stored in different formats, and maintained in various systems,” LJAF Vice President of Criminal Justice Matt Alsdorf explained. “The lack of data coordination makes it difficult for jurisdictions to analyze information and evaluate the effect of their local criminal justice policies. We are pleased to support the Data Justice Network and believe that it can help to address this issue and make it easier for communities to use data and predictive analytics to safely reduce their jail populations.”
On the website, criminal justice practitioners can search for colleagues with relevant experience, ask and answer questions, and track their own knowledge of innovative ways of using data at every stage of the criminal justice process….
With Data Justice Network, participants can easily get help and advice from those with experience to explore issues such as

  • How can data be used to create a better post-arrest diversion process for the mentally ill and reduce time spent in jail?
  • How can better data be collected about the number of mentally ill or substance abusers in county jails?
  • What is the best way to develop algorithms to predict super-utilizers of the criminal justice system?…(More)”