GovLab Blog

White House launches Data-Driven Justice Initiative in partnership with NYU's The GovLab

Reposted from NYU Tandon School of Engineering
The School’s GovLab Is Helping Stakeholders Address Recidivism, Bail, Juveniles, and More
With more than 2 million people incarcerated in the nation’s prisons – a marked increase over previous decades – there is bipartisan support for improving public safety while reducing incarceration and making the criminal justice system fairer.
A key tool in the movement for criminal justice reform is the ability to use data to understand where problems lie and to develop and test new solutions. The Governance Lab (GovLab) at the NYU Tandon School of Engineering is working closely with the White House to encourage responsible data collection, sharing, and usage among criminal justice, mental health and other administrative agencies, thus maximizing the opportunity for evidence-based policymaking and reform while safeguarding civil liberties.
In support of the White House Data-Driven Justice Initiative, the GovLab is running the Data Driven Criminal Justice Innovation Projects program, the first in a series of training and coaching programs designed to help criminal justice practitioners working on data-related innovation projects take their work closer to implementation and scale, in order to improve people’s lives.
In June, the GovLab launched a first-of-its-kind 10-week program working with 20 teams from 16 cities to give them the help they need to take a data-driven reform project from idea to implementation. Participants are primarily mid-career professionals who hold management and leadership roles in federal, state and local government agencies as well as nonprofits and who are in a position to make change happen in their communities.
Funded by the Laura and John Arnold Foundation, participants’ projects are aimed at:

  • Mitigating the impact of bail
  • Reducing recidivism
  • Developing better programs for those with mental health and substance use disorders
  • Identifying super-utilizers, or the frequently incarcerated
  • Engaging in more effective planning and coordination among criminal justice and other social service agencies

“The people we have the honor to work with are the true heroes – the boots-on-the-ground reformers working day to day to reduce jail populations, ensure that the mentally ill receive treatment instead of jail time, and help decision-makers get a better picture of who is in the system and why,” said Beth Simone Noveck, the Jerry M. Hultin Professor at NYU Tandon and Director of the GovLab.
The Data-Driven Criminal Justice Projects Coaching Program provides participants with skills-based coaching and expert mentoring. “Peer-to-peer learning is one of the most exciting aspects of the coaching programs,” says GovLab Legal Fellow Ana Tovar. “By using technology to bring together people facing common challenges, they learn as much from helping one another as they do from the dozens of international experts we get to advise them.”
The coaching program is part of the GovLab’s broader training efforts, known as the GovLab Academy, aimed at helping public and civic innovators become more effective, and part of NYU Tandon’s overarching mission to place technology in service to society.
‎“Digital data are being generated at an unprecedented rate,” Dean Katepalli Sreenivasan said. “And at Tandon we are finding ways to harness data to make our cities more livable, our families healthier, and our planet greener. The White House has recognized that data can also be used in the drive to make our justice system more equitable and efficient, and we are honored that the GovLab has been called upon for its expertise.”
The NYU Tandon School of Engineering dates to 1854, when the NYU School of Civil Engineering and Architecture as well as the Brooklyn Collegiate and Polytechnic Institute (widely known as Brooklyn Poly) were founded. Their successor institutions merged in January 2014 to create a comprehensive school of education and research in engineering and applied sciences, rooted in a tradition of invention, innovation and entrepreneurship. In addition to programs at its main campus in downtown Brooklyn, it is closely connected to engineering programs in NYU Abu Dhabi and NYU Shanghai, and it operates business incubators in downtown Manhattan and Brooklyn. For more information, visit
The GovLab is an action-research center with a mission to improve people’s lives by changing the way we govern. We aim to accomplish this by leveraging advances in technology to enable more open, collaborative, effective, and legitimate ways to make better decisions and solve public problems. For more information, visit

GovLab Blog

Creating Simpler, More Effective Government Services for Everyone

The GovLab, Pivotal and the 21st Century Democracy and Technology Meetup recently hosted a talk by Jen Pahlka, founder and executive director of Code for America and former U.S. Deputy Chief Technology Officer, on deploying simpler, more effective government services for everyone. Pahlka shared insights gained from her time leading efforts to improve government services from the outside, through the Code for America fellowship program, and from the inside, through efforts like the U.S. Digital Services Playbook. The assembled audience comprised software engineers, policy researchers, design professionals and others looking for new pathways for improving government services and their impacts on citizens’ lives.
Although there is seemingly no shortage of interest in deploying technology in government to improve services, Pahlka argued that for technology to have its intended effects on the public good, government must recognize that technology “is not something you buy, it’s something you do.” In other words, leveraging government technology is a process to design, implement, manage and iterate upon, not a commodity to purchase.   
With that in mind, Pahlka shared a number of key lessons learned across a number of Code for America initiatives – from a crowdsourced map for identifying blight in New Orleans to an effort to reduce bench warrants in Atlanta to a more innovative food stamp program in California – as well as her time in the U.S. government.

Key Takeaways

Move from Apps to Ops

  • Although there is no shortage of civic apps that seek to improve public life by working around government, Pahlka emphasized the potential of lightweight technology interventions that work with government to the benefit of all parties involved.  
  • In order for this potential to take hold, though, there needs to be a recognition that technology in government isn’t about the tech itself, “it’s about our ability to govern.” Many of the most transformative uses of technology in government, she argued, are directly tied to behind-the-scenes government processes, improving their sustainability.

Build for What People Need

  • While the impact of technology in government can often be tied to an intervention’s potential to actually improve government operation, Pahlka also highlighted the importance of understanding users, i.e., people, and their needs – especially at this current moment of low public trust in government.
  • This focus on user needs is meant to be front and center in the U.S. Digital Services Playbook, which draws on lessons learned within and outside government to improve the effectiveness of government digital services. The first of the Playbook’s 13 “plays” is: “Understand what people need.” While seemingly self-evident, this articulation (and its clear prioritization) is instructive. The Playbook notes that, “The needs of people — not constraints of government structures or silos — should inform technical and design decisions.” The checklist that accompanies this guidance features items like: a) “Early in the project, spend time with current and prospective users of the service;” and b) “Create a prioritized list of tasks the user is trying to accomplish, also known as ‘user stories.’”
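The “prioritized list of user stories” idea can be illustrated with a minimal sketch. The story fields, priorities, and backlog contents below are hypothetical examples, not drawn from the Playbook itself:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class UserStory:
    # Lower number = higher priority; only `priority` participates in sorting.
    priority: int
    task: str = field(compare=False)
    user: str = field(compare=False)

# Hypothetical backlog for a benefits-application service.
backlog = sorted([
    UserStory(2, "check application status without calling an office", "applicant"),
    UserStory(1, "submit an application from a mobile phone", "applicant"),
    UserStory(3, "flag incomplete applications for follow-up", "caseworker"),
])

# Sorting puts the most urgent user needs first.
for story in backlog:
    print(f"P{story.priority}: as a {story.user}, I want to {story.task}")
```

Keeping the story text in the user’s own words, as the Playbook suggests, is what keeps design decisions anchored to needs rather than to organizational silos.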

Always Be Iterative and Agile

  • Throughout her talk, Pahlka pointed to examples where the first or second plan of action for addressing a given problem was abandoned in favor of a more targeted, simpler, and often cheaper solution. In some cases, this was due to a tendency for government to view procurement, and investing large sums of money in traditional government technology vendors, as the only means for solving problems.
  • Pahlka pointed to examples, like BlightStatus in New Orleans, where relationship-building and direct (i.e., in-person) engagement successfully uncovered implementable solutions to problems, belying the belief that expensive, monolithic tech products are always the best way forward.

Don’t Focus Exclusively on the Most Visible Problems

  • Most of the issues Pahlka discussed touched on the lives of lower-income citizens and vulnerable populations. After telling these stories – from issues with accessing food stamps to arrests following an inability to pay simple traffic tickets – she pointed out that the bungled rollout of HealthCare.gov was daily front-page news, helping to lead the charge toward improved agility and usability for that platform. The type of government dysfunction experienced by the poor, however, rarely raises the kind of public ire, or even the public recognition, that can force the iteration of broken government services. Her point drove home the fact that in order to “create simpler, more effective government services for everyone,” we cannot focus only on the visible problems experienced by the middle and upper classes.

For more information on innovating government service delivery, check out the GovLab Digest, Open Governance Research Exchange (OGRX) and GovLab Academy.

GovLab Blog Ideas Lunch

Crowdsourcing a Meeting of Minds: Designing the Future of Work

As far as labor revolutions go, crowdsourcing may not seem like such a groundbreaking concept. But according to Michael Bernstein, assistant professor of computer science at Stanford University, crowdsourcing and computation have the potential to revolutionize the way we work and share skills. Bernstein visited the GovLab this month as part of our Ideas Lunch series to share his research on how expert crowdsourcing can be used to achieve complex and sophisticated projects.
Computers are already having a profound influence on our employment. Researchers estimate that in the future, 20 percent of our workforce could work online, representing approximately 45 million workers. This staggering number shows that computers are more than just another tool to improve office productivity. Rather, as Bernstein revealed, computers are becoming vast, powerful networks that connect us with others, best seen in apps like Uber.
According to Bernstein, there is great potential locked away in these computerized networks to radically transform how work is performed. Traditionally, crowdsourcing has been used to complete menial, micro-tasks, seen in projects like Amazon’s Mechanical Turk which primarily uses crowdsourced labor for image labeling, data collection and other non-expert tasks. For Bernstein, such an approach neglects the potential of crowdsourcing to achieve complex, interdependent goals by curating crowds of experts.
With fellow researchers at Stanford University, Bernstein investigated whether “flash-teams” of crowdsourced experts could achieve ambitious results, like designing a hi-fi prototype of an app or making a short animation in just one day. By recruiting workers through the website Upwork and creating a web platform, Foundry, to manage workflows, Bernstein and his team found that flash-teams were able to achieve goals significantly faster than self-managed teams, with almost 50 percent fewer work hours expended.
Nevertheless, Bernstein pointed out that these flash-teams are limited in what they can do. Flash-teams need pre-defined workflows so that tasks can be delegated and guided, and only small teams can be involved on a single project. For larger, more complex projects, where workflows may evolve or be undefined, flash-teams are unable to deliver sufficient results.
Furthermore, there are considerable ethical challenges to such crowdsourced forms of labor. Research by Bernstein’s colleagues into collective action by crowd workers found that “the technical infrastructure [of crowdsourcing] actively disempowers workers”, and that new forms of computationally-empowered labor collectives are therefore needed to meet the needs of this distinct workforce. But experiments in delivering such a model to connect and spur advocacy among crowd workers, specifically through the web platform Dynamo — where workers could propose ideas, vote on these ideas, and then discuss and mobilize action — revealed some of the shortcomings of computerized crowdsourcing. In particular, though the web is adept at gathering a vast array of people quickly, it is just as easy for people to quickly disperse if they lose interest in the cause or encounter an obstacle. Trying to coordinate collective labor actions therefore becomes more difficult than simply providing a space for workers to share and discuss ideas online.
Bernstein’s research into the challenges and benefits of expert crowdsourcing continues to make exciting discoveries. For instance, a current project seeking to crowdsource research participants from across the world suggests that crowdsourcing can even help solve open-ended, messy and large-scale problems. There remains a vast array of untapped possibilities for computerized crowdsourcing to bring workers together to tackle complex and multifaceted problems.

Key Takeaways

  • There are four features afforded by computational crowdsourcing that make crowdsourced flash-teams effective:
    1. Modularity of crowdsourcing means that team structures can be replicated and scaled across projects;
    2. Elastic workflows allow tasks and team members to grow and shrink dynamically depending on the evolving needs of the project;
    3. Pipelining allows incomplete results to be passed down the timeline to subsequent workers on a project, so the entire system adapts to missed deadlines or unexpected changes; and
    4. Creation by request means that synthetic teams can be assembled instantly for the project proposed, with tasks translated into time-dependent ‘strips’ of action divided among team members.
  • Self-managed teams often don’t work: they are inefficient and poorly coordinated, leading to frustration among team members.
  • Computational crowdsourcing provides ‘light scaffolding’ that shepherds workers through tasks and lets schedules and files be shared among team members throughout the workflow.
  • Flash-teams (mean completion time: 13 hr 2 min) are significantly faster than self-managed teams (mean completion time: 23 hr 47 min), p = 0.05.
  • If crowdsourced flash-teams are a new form of work collective, there is also a need for new forms of worker organization to counterbalance them.
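The “pipelining” and “strips” ideas above can be sketched in a few lines. The roles, strip durations, and overlap fraction below are illustrative assumptions, not details of the actual Foundry platform:

```python
from collections import namedtuple

# Each role works in a timed "strip"; its output (even if partial) is
# handed downstream so later roles can start before upstream work is done.
Strip = namedtuple("Strip", "role hours deliverable")

workflow = [
    Strip("designer",  3, "wireframes (draft ok)"),
    Strip("developer", 6, "working prototype"),
    Strip("tester",    2, "usability report"),
]

def schedule(strips, overlap=0.5):
    """Start each strip after a fraction of its predecessor has elapsed,
    so partial deliverables flow downstream instead of blocking the
    pipeline."""
    t, plan = 0.0, []
    for s in strips:
        plan.append((s.role, round(t, 1), round(t + s.hours, 1)))
        t += s.hours * overlap  # next role starts mid-strip
    return plan

plan = schedule(workflow)
# Overlapping strips finish sooner than a strictly sequential hand-off would.
```

The elapsed time here (7.5 hours, when the developer strip ends) beats the 11 hours a strictly sequential schedule would take, which is the same mechanism behind the flash-teams’ roughly 45 percent reduction in completion time noted above.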

All this offers a glimpse of what the future of work might look like, and, according to Bernstein, we can expect crowdsourcing to achieve more complex and interdependent goals, to better advocate for pro-social outcomes, and to solve open-ended challenges.

About Michael Bernstein

Michael Bernstein is an Assistant Professor of Computer Science at Stanford University and member of the Human-Computer Interaction group. His research focuses on the design of crowdsourcing and social computing systems. This work has received five Best Paper awards and eleven honorable mentions at premier venues in human-computer interaction and social computing. Michael has been recognized as a Robert N. Noyce Family Faculty Scholar, and awarded the Sloan Fellowship, NSF CAREER award and the George M. Sprowls Award. He holds a bachelor’s degree in Symbolic Systems from Stanford University, and a master’s and Ph.D. in Computer Science from MIT.

GovLab Blog

Call for Questions: Help Shape the Open Data Research Agenda

The 2016 Open Data Research Symposium (#ODRS16) is crowdsourcing questions that, if answered, could radically increase our understanding of open data.

On October 5th, researchers from around the world will come together at the 2nd Open Data Research Symposium (ODRS), a pre-event to the International Open Data Conference in Madrid, Spain. Similar to last year’s event, ODRS 16 will offer open data researchers an opportunity to reflect critically on the findings of their completed research and seek to create cohesion within the research community regarding the potential impacts of open data.
Although the ODRS Call for Abstracts closed at the end of May, an invitation is extended to all members of the open data community to help shape the event’s program and highlight areas of particular importance to the field through our active Call for Questions.

Please share the questions that researchers should be asking to increase our understanding of open data.

You can submit your questions from now until July 1st using this form on the ODRS 16 website, or by using #ODRS16 on Twitter.
The questions posed in the lead up to the event will be used to:

  • Help identify which submitted abstracts explore topics of particular interest to the open data research community;
  • Craft the ODRS program to ensure sessions are targeted at the most pressing questions;
  • Build a collaborative research agenda for the field; and
  • Inform specific research efforts and synergize collaborations during the ODRS.

The ODRS Program Committee looks forward to learning from you all and hopes to see you in Madrid for ODRS 16!

BigData GovLab Academy GovLab Blog

Data Driven Criminal Justice Projects Coaching Program Launches This Week: Online Mentoring to Support Public Interest Entrepreneurs

With a dramatic rise in the number of people in prison, there is bipartisan support – and a major White House push – for improving public safety while reducing incarceration and making our criminal justice system fairer. A key tool in the movement for criminal justice reform is the ability to use data to understand where problems lie and to develop and test new solutions. But a recent survey by the GovLab showed that many of those working in criminal justice, health, mental health and related agencies – though expert in their own fields – often lack the computational skills needed to pursue data-driven reform efforts.
That’s why this week forty-five people from twelve states and the District of Columbia (Alabama, Arizona, California, Illinois, Iowa, Kentucky, Minnesota, New York, Ohio, Pennsylvania, South Dakota, Virginia, and Washington, DC) and sixteen cities will meet online for the first session of a first-of-its-kind coaching program designed to help those working on data-related criminal justice innovation projects take their work closer to implementation and scale.

Aimed at practitioners who share a common desire to make greater use of data to understand past performance, improve day-to-day operations, and develop innovative enhancements to the criminal justice system, the Data Driven Criminal Justice Projects Coaching Program – supported by a grant from the Laura and John Arnold Foundation and organized by The GovLab – aims to support the work of public entrepreneurs trying to improve people’s lives. The twenty participating teams and individuals are among those fighting to improve the system who need help integrating new technology into hidebound bureaucracies or developing approaches for sharing data responsibly. Beth Simone Noveck, NYU Professor and Director of the Governance Lab, will coordinate the program.
Over the next ten weeks, the GovLab, an action research institute based at New York University, will provide skill-based coaching and expert mentoring to support those trying to build a better recidivism risk profile; to develop a process for matching the supply of crisis psychiatric beds to demand, reducing the number of mentally ill people going to jail; or simply to count how many juveniles in their jurisdiction are sent to juvenile hall versus diverted to other programs.
Projects fall into one of five categories. People are working on strategies for sharing administrative data between agencies and using that data to: 1) mitigate the impact of bail, 2) reduce recidivism, 3) develop better programs for those with mental health and substance use disorders, 4) identify super-utilizers, and 5) engage in more effective planning and coordination among criminal justice and other social service agencies.
Building on the GovLab’s experience with online learning, every project team receives rigorous coaching and personalized feedback designed to help them define the problem they are trying to solve. Up-front diagnosis of impediments to implementation allows the coaches to make introductions to appropriate experts. Finally, frequent opportunities to present their work are intended to help these public entrepreneurs – passionate and innovative people who wish to take advantage of new technology like big data to do good in the world – to advance their projects.
The participants, who will meet in large and small groups over the course of the summer to workshop their projects, are primarily mid-career professionals who hold management and leadership roles in federal, state and local government agencies as well as nonprofits and who are in a position to make change happen in their communities. Success will be measured, not by the number of people in the program, but by the eventual impact on the lives these admirable public servants are trying to improve.

Data and its uses for Governance Data Collaboratives GovLab Blog Selected Readings

The GovLab Selected Readings on Data Collaboratives (Updated and Expanded)

By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young
As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. In this edition, we explore the literature on Data Collaboratives. To suggest additional readings on this or any other topic, please email
The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges — from addressing climate change to public health to job creation to improving the lives of children — require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.
Selected Reading List (in alphabetical order)

Annotated Selected Readings List (in alphabetical order)
Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014.

  • This white paper, produced by “a group of activists, researchers and data experts,” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action”; and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

* Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process. 
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g. a higher level of trust indicates a higher probability of success).

Ballivian A, Hoffman W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from:

  • This World Bank report provides a background document on forming public-private partnerships for data with the private sector in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed for “build[ing] the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from:

  • The Chatham House report provides an overview on public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity in adapting technical solutions from other sectors for public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from:

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies into micro-target outreach and driving uptake of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

* Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014.

  • This paper proposes a decision model for data sharing of public and private data based on literature review and three case studies in the logistics sector.
  • The authors identify five categories of the barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access.  
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.
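The five barrier categories and their candidate interventions lend themselves to a simple lookup sketch. The function and key names below are our own, and the intervention strings loosely paraphrase the readings above; this is an illustration of the decision-model idea, not the paper’s actual model:

```python
# Map each diagnosed data-sharing barrier to candidate interventions.
# Keys and strings paraphrase the Eckartz et al. summary; structure is
# illustrative.
INTERVENTIONS = {
    "ownership":    ["build trust through stakeholder involvement",
                     "secure support from higher management"],
    "privacy":      ["filter or aggregate sensitive fields",
                     "identity-based access control"],
    "economic":     ["share only with a few trusted organizations",
                     "yield-management safeguards against financial loss"],
    "data_quality": ["add complementary data sources",
                     "improve metadata"],
    "technical":    ["publish in structured formats",
                     "adopt widely agreed data standards"],
}

def recommend(barriers):
    """Return the candidate interventions for each diagnosed barrier."""
    unknown = [b for b in barriers if b not in INTERVENTIONS]
    if unknown:
        raise ValueError(f"unrecognized barrier(s): {unknown}")
    return {b: INTERVENTIONS[b] for b in barriers}

# Example diagnosis: a logistics collaborative blocked on privacy and
# technical grounds.
plan = recommend(["privacy", "technical"])
```

A real decision model would weigh interventions against context (sector, data sensitivity, number of partners); the lookup above only captures the barrier-to-intervention mapping at the heart of the paper’s framing.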

* Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013.

  • This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that biomedical databases have many potential uses, with likely beneficiaries including “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise because:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

* Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4, 2014. 499-504.

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

* Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.    
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
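Differential privacy, which Mann points to as a middle ground between “complete privacy or complete disclosure,” can be illustrated with the classic Laplace mechanism applied to a counting query. This is a minimal sketch under standard textbook assumptions, not Bloomberg’s implementation; the function names and the example query are hypothetical:

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via inverse transform sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon):
    """Epsilon-differentially-private count. A counting query has
    sensitivity 1 (adding or removing one record changes the count by
    at most 1), so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller values of epsilon add more noise and give stronger privacy guarantees; the data holder tunes this trade-off rather than choosing between releasing everything or nothing.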

* Pastor Escuredo, D., Morales-Guzmán, A., et al. “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014.

  • This report describes the use of mobile data to understand the impact of disasters and improve disaster management. The study, set in the Mexican state of Tabasco following the 2009 floods, was conducted by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, the Technical University of Madrid (UPM), the Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, a division of the major Latin American telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results of the data demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”

* Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. 

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges for private firms to enter open data or participate in data collaboratives with the academic research community that could be addressed through more involvement from boundary organizations:
    • First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • Second involves the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

* Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensor and geospatial data) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for a national statistical office (NSO); transfer of datasets from private entities to NSOs; transfer of data to a third-party provider that manages both NSO and private-entity data; and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation); suggests that deliberate, highly structured approaches to such partnerships require enforceable contracts; and emphasizes both the trade-off between data specificity and accessibility and the importance of pricing mechanisms that reflect the capacity and capability of national statistical offices.
  • Case studies referenced in the paper include:
    • In-house analysis of call detail records by a mobile network operator (MNO Telefonica);
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

* Stuart, Elizabeth, Samman, Emma, Avis, William, and Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015.

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions proposed in the report focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.

* Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • They focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for low- and middle-income countries (LMICs).
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

* van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC Public Health 14, no. 1 (2014): 1144.

  • The authors of this report provide a “systematic literature review of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

* Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17.

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: the Yelp Dataset Challenge, the Digital Ecologies Research Partnership, the BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership, and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

* Verhulst, Stefaan and Sangokoya, David. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium, 2015.

  • The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

* Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties;
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of Data Collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria as well as their experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights gleaned from its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

* Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Paper presented at the Procedia Computer Science. 2015.

  • The paper presents a research study conducted on the basis of the mobile call records shared with researchers in the framework of the Data for Development Challenge by the mobile operator Orange.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that the use of their population, mobility, and simulation models provide more accurate simulation details in comparison to high-level analytical predictions and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard to reach locations.

* Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016.

  • This research developed a monitoring framework to assess the effects of open (private) data, using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework to describe what the program is about, description of the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

* World Economic Forum. “Data-driven development: pathways for progress.” Geneva: World Economic Forum, 2015.

  • This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.

Open Governance Research Exchange (OGRX) Announces New Partners and New Publications

Launched in April, the Open Governance Research Exchange (OGRX) seeks to improve the evidence base on how to change the way we solve problems and make decisions by curating and making accessible a diversity of findings on innovations in governance. We are delighted to share some recent updates:
CONTENT: The site has since expanded substantially with dozens of new publications added and shared across topic areas like behavioral science and nudges, citizen engagement and crowdsourcing, and open data.
Those interested in sharing their own research can do so using this submission form.
PARTNERS: We are also pleased to announce new partnerships with five organizations working at the forefront of governance innovation: Centre for Public Impact, Open Evidence, Digital Commons Lab, Open Data for Development and the Swedish Law and Informatics Research Institute. They join existing organizations collaborating on building OGRX into a common resource including the Centre for Innovation at Leiden University, Arizona State University’s Center for Policy Informatics, MacArthur Foundation Research Network on Opening Governance and Research Consortium on the Impact of Open Government.
Benefits of partnership on OGRX include dedicated organization research pages, as well as a voice in the continued development of the platform, with the aim of collaboratively making this resource as useful as possible for the field.
If you are interested in joining this effort to build a research and evidence base for means of public problem-solving, please reach out to OGRX editor-in-chief Andrew Young to explore partnership opportunities.