From ~30 expert interviews and online submissions, we broke down this large problem into ~90 mini-challenges for democratizing data. Each ‘node’ to the right is one of those challenges.
We made a conscious effort to keep these challenges at a consistent level of resolution, focused on goals rather than specific solutions. We used targeted follow-up interviews to identify gaps in the list, and we continue to solicit new submissions from ‘the crowd’. You can help us identify those gaps by participating.
THIS IS THE FIRST CROWD-MAPPED NETWORK STRUCTURE OF A COMPLEX PROBLEM!
Each node was identified by at least one expert as a critical challenge for democratizing data, and each link is at least one human brain saying, “If this problem improves, it has a strong direct influence on this other problem.”
We called on a community of experts to identify strong relationships among all possible pairs of challenges. Using an online tool, the MAPPR, they evaluated each challenge: if it improves, does it improve others, or make them worse?
For this first phase, about 40 people participated, resulting in more than 6,550 link votes that created a “Full” web of 3,842 links, or a “Consensus” web of 1,786 links. We analyzed both webs to identify results that are robust to this uncertainty in the crowd-defined network. In each, ~93% of the links were ‘positive’ relationships, meaning that solving one problem helps solve another (rather than making it worse).
HIGHLY CONNECTED HUBS…
have broad direct reach to almost the entire network. They are like ‘JFK’ or ‘Chicago O’Hare’ airports with ‘direct flights’ to or from the vast majority of other challenges.
MARKETPLACE FOR DATA ANALYTICS AND SERVICES
developing a vibrant marketplace for data analytics and services has many direct positive influences on other challenges…
…and these marketplaces directly benefit from improvement in many other areas. These hubs may be ‘auto-catalytic’ challenges that are naturally improved by current market forces.
AVERAGE PATH LENGTH is the average shortest path, or number of ‘hops’, from a node to every other node in the network.
CONNECTIVITY is the total number of incoming and outgoing links.
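Both node metrics can be computed from a plain adjacency list. The Python sketch below uses breadth-first search to count hops on a hypothetical four-challenge toy network (not the real ~90-node web). It follows outgoing links only and averages over the nodes a challenge can actually reach; how unreachable nodes were handled is not stated in the source, so that choice is an assumption.

```python
from collections import deque

# Adjacency list: each challenge maps to the challenges it directly influences.
Graph = dict[str, list[str]]


def shortest_hops(graph: Graph, source: str) -> dict[str, int]:
    """Breadth-first search: fewest hops from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def average_path_length(graph: Graph, source: str) -> float:
    """Mean shortest-path length from source to the other nodes it can reach."""
    hops = shortest_hops(graph, source)
    others = [h for node, h in hops.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def connectivity(graph: Graph, node: str) -> int:
    """Total number of outgoing plus incoming links of node."""
    outgoing = len(graph.get(node, []))
    incoming = sum(node in targets for targets in graph.values())
    return outgoing + incoming


# Hypothetical four-challenge toy network.
toy = {
    "marketplace": ["literacy", "access", "anonymize"],
    "literacy": ["marketplace"],
    "access": ["literacy"],
    "anonymize": ["marketplace", "access"],
}
print(average_path_length(toy, "marketplace"))  # 1.0
print(connectivity(toy, "marketplace"))         # 5
```

A hub like "marketplace" here reaches every other node in one hop, which is what a short average path length and high connectivity capture.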
HIGHLY ASYMMETRIC HUBS
are challenges that, if solved, have a strong direct influence on many problems, but are weakly influenced by few.
The yellow and orange nodes rank among the top ~25% of challenges by asymmetry. The yellow nodes were identical regardless of whether we included every single link vote (the ‘Full’ network) or removed over 50% of the links (the ‘Consensus’ network). The orange nodes were less robust to this uncertainty in network structure.
Tools to Anonymize Personal Data
the ability to anonymize personal data has a positive direct influence on many other challenges.
Consider a citizen journalist witnessing government violence in the street; if they have easy access to tools to anonymize their personal identity, they are more likely to document and share that event, and it’s more likely to catalyze an entire community of citizen reporters while protecting them from the threat of surveillance, imprisonment, or death.
…but this challenge only weakly benefits from improvement in a few other challenges.
One reason these types of privacy tools may be 'under-nourished' is that there are no clear short-term market incentives for helping people anonymize their personal data, even though addressing this challenge could, in the long run, catalyze new economies around data sharing.
TOP ASYMMETRIC HUBS ARE POTENTIAL ‘UNDER-NOURISHED CATALYSTS’
These are challenges that, if solved, could help solve many others, but are only weakly helped by comparatively few.
We only included the yellow nodes that were the same in both the ‘Full’ and ‘Consensus’ networks. In other words, these results were robust to removing over half of the links, namely those with few votes per view.
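The robustness check can be sketched as a set comparison: rank challenges by a score such as asymmetric centrality in each network, keep the top ~25%, and treat challenges in both top sets as 'yellow' and those in only one as 'orange'. This is one plausible reading, not the authors' code; the scores are made up, and rounding the cutoff up is an assumption.

```python
import math


def top_fraction(scores: dict[str, float], fraction: float = 0.25) -> set[str]:
    """Names of the highest-scoring nodes, keeping the top `fraction` (cutoff rounded up)."""
    k = math.ceil(len(scores) * fraction)
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return set(ranked[:k])


def robust_hubs(full: dict[str, float], consensus: dict[str, float], fraction: float = 0.25) -> set[str]:
    """'Yellow' nodes: in the top fraction of both the Full and Consensus networks."""
    return top_fraction(full, fraction) & top_fraction(consensus, fraction)


def less_robust_hubs(full: dict[str, float], consensus: dict[str, float], fraction: float = 0.25) -> set[str]:
    """'Orange' nodes: in the top fraction of only one of the two networks."""
    return top_fraction(full, fraction) ^ top_fraction(consensus, fraction)


# Hypothetical asymmetry scores for eight challenges in each network.
full = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5, "e": 0.1, "f": 0.0, "g": -1.0, "h": -2.0}
consensus = {"a": 2.5, "b": 0.2, "c": 1.9, "d": 0.4, "e": 0.1, "f": 0.0, "g": -0.5, "h": -1.5}

print(sorted(robust_hubs(full, consensus)))       # ['a']
print(sorted(less_robust_hubs(full, consensus)))  # ['b', 'c']
```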
ASYMMETRIC CENTRALITY is the relative difference in the median strength of a node's outgoing vs. incoming links: Log (median strength of outgoing links / median strength of incoming links).
Link thickness reflects link ‘strength’, defined here as the number of votes for a ‘strong direct link’ divided by the number of people who viewed that node pair. (e.g., if 10 people looked at a node pair and 5 drew a link, strength = 0.5)
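A minimal Python sketch of the two link metrics, assuming per-pair vote and view counts are already tabulated; the function names are hypothetical, not the MAPPR tool's. The source does not state a log base, so this uses the natural log (the base only rescales scores and does not change the ranking). The ratio is oriented outgoing over incoming so that the strongly outgoing hubs described above score positively; inverting it only flips the sign.

```python
import math
from statistics import median


def link_strength(votes: int, views: int) -> float:
    """Votes for a 'strong direct link' divided by the number of people who viewed the pair."""
    if views <= 0:
        raise ValueError("node pair was never viewed")
    return votes / views


def asymmetry(out_strengths: list[float], in_strengths: list[float]) -> float:
    """Log ratio of median outgoing to median incoming link strength.

    Positive values mean the node's outgoing links are stronger than its incoming links.
    statistics.median raises StatisticsError if a node has no links in one direction.
    """
    return math.log(median(out_strengths) / median(in_strengths))


# Worked example from the text: 10 viewers, 5 drew a link.
print(link_strength(5, 10))  # 0.5

# Hypothetical node with three strong outgoing and two weaker incoming links.
print(round(asymmetry([0.5, 0.6, 0.7], [0.2, 0.3]), 3))  # 0.875
```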
Tools to Anonymize Personal Data
For example, the ability to anonymize our personal data has a positive direct influence on many other challenges, like the amount of personal data people voluntarily share and the ability to catalyze a critical mass of community engagement. If many people having asthma attacks voluntarily shared those events with the public, the broad patterns could help identify geographic outbreaks related to air quality. More people would participate if they knew their personal identity was stripped from each event, so they would not risk being denied health care coverage.
TOP ASYMMETRIC HUBS… SELF-ORGANIZED
We used a force-directed layout to cluster those challenges with high asymmetric centrality into a few groups that are closely connected and conceptually similar.
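The source does not say which force-directed algorithm or tool produced the layout. As a rough illustration only, here is a minimal Fruchterman-Reingold-style sketch in Python: every pair of nodes repels, linked nodes attract, and a cooling 'temperature' caps how far nodes move per step, so densely interlinked challenges settle near each other. The iteration count, starting temperature, and cooling rate are arbitrary assumptions, and link direction is ignored.

```python
import math
import random


def force_directed_layout(
    nodes: list[str],
    edges: list[tuple[str, str]],
    iterations: int = 200,
    seed: int = 0,
) -> dict[str, tuple[float, float]]:
    """Fruchterman-Reingold-style layout: linked nodes attract, all nodes repel."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between linked nodes
    temperature = 0.1                # maximum step size, reduced every iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: force k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along links (direction ignored): force d^2 / k.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temperature *= 0.98  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Two hypothetical groups of tightly linked challenges with no links between them.
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
layout = force_directed_layout(nodes, edges)
```

The layout itself only positions nodes; groups are then read off visually (or with a separate clustering step), which matches the subjective grouping described in the text.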
Note that these clusters are similar regardless of whether we consider the ‘Full’ or ‘Consensus’ networks. The only difference is in the 2 orange challenges we subjectively classified under ‘Platform Openness’ because they were conceptually distinct from the rest.
From the original ~90 challenges identified by experts, 4 potential grand challenge areas emerge. These were defined by a community, and they can evolve with more input.
WE ARE DATA
How can #WeTheData benefit from (and avoid being harmed by) the explosion of data we generate every day?
We used an ecological network approach, developed by Vibrant Data Labs, to make sense of this messy problem and identify Grand Challenges for catalyzing positive change.
“A problem well defined is a problem half-solved.”
– John Dewey
“We're mapping collective understanding of the problem and using the network structure to spark creative solutions where they're most needed.”
– Eric Berlow, Ph.D., Ecologist | Complexity Scientist | Founder, Vibrant Data Labs
Human Health and Wellness
Civil and Political Rights
Science, Education, and Human Knowledge
ability to accrue personal value from offering personal data
flexible allocation of costs
ability to be locally relevant
ease with which micro-entrepreneurs can enter the market
ability to protect against malicious uses
increased efficiency and effectiveness of public services
visibility of small success stories and examples of value from data
direct utility of open data to those providing it
direct utility of tools and platforms to those using them
ability to detect/self-correct unintended consequences of Vdat
transparency and accountability of large institutions
ability of everyday people to monetize dormant skills and assets
development of a marketplace for data, analytics, and other data services
transparency and accountability of information providers
degree of platform openness (copy and modify)
reputation system that engenders trust among participants
system of rewards for participation that are sensitive to context
degree to which the participants have a shared problem
ease of editing / adding to a dataset
ability of tools to be highly customizable
proportion of the population motivated by social rewards
ability to convert data into action
ability to collaboratively co-create
ability to collaboratively analyze and share insights about data
ability to make informed decisions based on data
ability to collaboratively improve platforms
formation of communities of shared interest or action around data
ability to catalyze a critical mass of community engagement
proportion of the population that is functionally data literate
ability to easily manipulate data granularity
reduction in cost of computation
computational power of small devices
ability to use data to make predictions
proportion of public that can critically evaluate conclusions drawn from data
ability to easily find patterns across multiple data streams
ability to intuitively explore and answer questions with data
proportion of the public using data to inform daily life decisions
ability to fact check online information
ease of access to novel statistical methods of pattern discovery
ability to see broad trends and place ourselves in context
ability to check and validate data quality
total number of people offering data viz and analysis tools/services
availability of automatic language translation tools
degree to which UI design enables participation by diverse groups
degree to which underlying technologies and data are invisible
ease to users of managing personal data access permissions
legal/policy framework for personal digital rights management
legal framework for accessing/sharing copyright protected data
proportion of data from large institutions that is accessible
concentration of data and access to data in few corporations
tools to anonymize sensitive data
ability to control data access permissions
proportion of population with access to information infrastructure
wireless connectivity and bandwidth
accessibility of stored open data
ability to integrate info from multiple sources (e.g. spatial data in real time)
tools for seeing our own 'personal data exhaust'
ease of discovering datasets
number of people with easy access to data
proportion of real time data accessible by mobile phone
proportion of online info and media protected from censorship
ability to see under the hood of tools to avoid oversimplification
inter-connectivity of mobile apps
incentives for large institutions to open up data for social good
degree to which large institutions don't have anything to hide
ease with which everyday people can share data
ease of adding to / modifying metadata
more coherent data standards
reduction in cost of networking and storage
data self description
ability to automate interoperability of different datasets
accuracy and reliability of real time info with respect to purpose
ability to collaboratively clean and filter data
ability to share, edit, and update public data
ability and ease of cleaning of data
level of clarity, simplicity, and utility of data created by sensors
availability of suitable sensors
amount of personal data voluntarily shared
ability of people and institutions to be real time sensor networks
total number of people contributing data
ease with which everyday people can collect/ create data
Reaching people across geographical, cultural, and linguistic boundaries requires translation. For example, a large proportion of basic information on the web is not accessible to Arabic speakers. Sharing and leveraging open data will be limited without tools that enable translation. Beyond direct linguistic translation, this may include cultural differences, such as the units used in data.
For disparate data sets to be able to “talk to each other”, to be used in the same analysis, and be used to drive new discovery, they must be combinable. That requires standardization. Data standards have traditionally been very difficult to implement because people are always coming up with new data-types. Open, flexible standards are essential to enable truly Vibrant data.
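To illustrate why a shared standard matters, here is a minimal Python sketch that maps two hypothetical air-quality feeds, with different field names and different units, onto one common record type before combining them. The feeds, field names, place name, and the `Reading` schema are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical shared standard: place name plus PM2.5 in micrograms per cubic metre."""
    place: str
    pm25_ug_m3: float


def from_feed_a(row: dict) -> Reading:
    # Feed A already reports micrograms per cubic metre, under its own field names.
    return Reading(place=row["city"], pm25_ug_m3=float(row["pm25"]))


def from_feed_b(row: dict) -> Reading:
    # Feed B reports milligrams per cubic metre, so convert (1 mg = 1000 ug).
    return Reading(place=row["location"], pm25_ug_m3=float(row["pm25_mg_m3"]) * 1000)


def combine(feed_a: list[dict], feed_b: list[dict]) -> dict[str, list[float]]:
    """Group readings from both feeds by place once they share one schema."""
    merged: dict[str, list[float]] = {}
    for reading in [*map(from_feed_a, feed_a), *map(from_feed_b, feed_b)]:
        merged.setdefault(reading.place, []).append(reading.pm25_ug_m3)
    return merged


feed_a = [{"city": "Springfield", "pm25": 12}]
feed_b = [{"location": "Springfield", "pm25_mg_m3": 0.015}]
print(combine(feed_a, feed_b))
```

Each new data type still needs its own adapter, which is one way to see why standards are hard to keep up with as new data types appear.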
As basic performance capabilities - computing power, networking, even the power of analytics tools - increase, the world will see a shift in emphasis from simple performance to a better design of that performance to fit the needs of real people. UI design is an important element in making the power of technology fit with the ways of learning, the language, the visual skills and other characteristics of real people.
Refers to the proportion of people who understand enough about data and analysis to derive real value from it. Finding patterns and meaning in data will be simplified by new analysis and visualization tools. The proportion of people who are “data literate” will drive the rate of development of tools that make it easier for people without specialized training to analyze data for meaning.
Some may want to explore all that is possible with a given tool, while others may want to just use the basics. Design plays a key role in enabling novices to immediately understand the use of a given technology or data set, while enabling deeper explorations over time. By artfully hiding or exposing details and functionality, designers will enable more people to share and use data in ways that suit them.
Tools that enable the manipulation of data granularity enable people with different skills and interests to look at common data sets in their own way. Sometimes, it is important to dig into the details. Sometimes, it is more appropriate to just have a “bird's eye view”. The more our data analysis tools can accommodate both ways of looking (and those in between) the more likely that more people will find value in them.
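As a small illustration of changing granularity, here is a minimal Python sketch assuming hypothetical dated measurements: `roll_up` (a hypothetical helper) averages them per day, month, or year, so one data set can serve both the detailed view and the bird's-eye view. Averaging is an illustrative choice; sums or counts would work the same way.

```python
from collections import defaultdict
from datetime import date

# strftime patterns for each zoom level (an illustrative choice).
LEVELS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def roll_up(readings: list[tuple[date, float]], level: str) -> dict[str, float]:
    """Average the readings within each day, month, or year."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for day, value in readings:
        buckets[day.strftime(LEVELS[level])].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


readings = [(date(2013, 1, 1), 10.0), (date(2013, 1, 2), 20.0), (date(2013, 2, 1), 30.0)]
print(roll_up(readings, "month"))  # {'2013-01': 15.0, '2013-02': 30.0}
print(roll_up(readings, "year"))   # {'2013': 20.0}
```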
Another issue is the importance of putting flexible tools for assigning data access permissions in the hands of people, not large institutions (see “ability to control data access permissions”). These tools must require minimal effort and management overhead from individuals and be largely automated; otherwise they will be too complicated to be useful.
Many sensors today gather data in ways that are either proprietary or idiosyncratic. We need new methods for sensor data to be made more easily accessible and understandable from the very moment it is first created. This may be through improvements of devices themselves or through methods that automatically format and make available data by way of post-processing.
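One way to do this by post-processing is to wrap each raw reading in a self-describing record that states what was measured, in which unit, and when. In this Python sketch the comma-separated payload format, the `UNITS` lookup table, and the sensor ID are all hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical lookup from a sensor's proprietary code to (quantity, unit).
UNITS = {"RAD": ("radiation_dose_rate", "uSv/h")}


def describe(raw: str, sensor_id: str) -> str:
    """Turn a raw '<code>,<value>,<unix seconds>' payload into self-describing JSON."""
    code, value, timestamp = raw.split(",")
    quantity, unit = UNITS[code]
    record = {
        "sensor_id": sensor_id,
        "quantity": quantity,
        "value": float(value),
        "unit": unit,
        "time": datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoformat(),
    }
    return json.dumps(record)


print(describe("RAD,0.42,1367366400", "citizen-sensor-17"))
```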
Openness here is defined as free to copy and change. Open source has been a major enabler for many organizations to collaborate, share, and leverage expertise. It enables innovation and local cultural relevance as people adapt tools and technologies to local needs. Similarly, by allowing the open sharing of data, and (appropriate) modification of information such as meta-tags, the utility of a database can be improved tremendously.
People will not be willing to openly share their personal data unless they have assurances that their privacy will be protected. Current privacy agreements - implemented by corporations and institutions - do little to assure people of the protection of their privacy. Legal and policy frameworks that return control over privacy to individuals, along with technology tools to help individuals manage those protections, will be vital for data vibrancy.
Tight legal restrictions limit people’s ability to work with and build on the data of others. Too often, such restrictions favor powerful enterprises, who “lock down” access. Without any restrictions, however, people may not be willing to share personal data for fear of violations of privacy. Legal frameworks require a balance between openness and individual protections, without those protections becoming a tool for abuse by powerful interests.
While data is much more easily shared, laws, institutional power, access to technology and other resources can create barriers to openness and transparency. Even in highly democratic societies, large firms and institutions have the means to gather information about individuals (for advertising, or surveillance). Real data vibrancy will only occur when this one-sided approach to data gathering and possession gives way to much broader circulation of data.
People who remotely collaborate need tools to assess the trustworthiness of others they share data with. Reputation systems have been developing online, and many firms are working on generalized reputation scores that apply across domains. But beyond simple scores, people will need other tools, including assurances about recourse in the event that trust is breached, in order to feel safe sharing information broadly.
Large, consumer-facing companies guard the consumer data they collect. On the one hand, this means that corporations might protect those data from abuse. On the other hand, this hoarding of data reduces the chances for people to discover unexpected new meaning and value.
Metadata is information about data - descriptions of context, characterizations, labels, all ways that help people know what a data set or element is, and how it might be used. Metadata is a key component to make data sets more compatible and useful across boundaries. Therefore, we need open systems that allow metadata to be modified appropriately.
As data access becomes more open, people will need the means of setting personal data permissions. Current data permissions systems (e.g., those provided by online services or corporations with regards to personal data) are not nearly flexible or powerful enough, and generally favor the corporation as “owner” of the data. This will obviously have to change for people to willingly share and circulate their data in a more open way.
Most smartphones have many sensors on them: GPS, accelerometers, cameras, microphones, etc. A more vibrant ecology of data creation will depend on new types of affordable mobile sensors that collect data in new ways. For instance, after the 2011 earthquake in Japan, small mobile radiation sensors deployed by citizens were critical to mapping radiation plumes around the Fukushima power plant.
Since the late 1960s computing performance has continuously doubled in power every 18 months, while the cost per unit of computing has dropped precipitously. This is a factor that will continue to fuel the growth of open and vibrant data exchange, as more and more people can ultimately afford to access computing power - and thus, digital technology.
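Taking the stated 18-month doubling rate at face value, capacity after t years grows by a factor of 2^(t / 1.5). A back-of-the-envelope Python sketch (the function name is hypothetical):

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Multiplicative growth after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


print(growth_factor(3))   # 4.0
print(growth_factor(15))  # 1024.0
```

At that rate, 15 years means ten doublings, roughly a thousandfold increase.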
Network access in more remote - and usually poorer - parts of the world remains an acute issue. As with the cost of computing, digital storage technologies continue to decline in price and increase in power. Solid state storage devices promise to provide a more robust option for small devices, while new, open standards for servers have created a proliferation of storage “in the cloud”.
Since the late 1960s computing performance has continuously doubled in power every 18 months, while the cost per unit of computing has dropped precipitously. This is a factor that will continue to fuel the growth of open and vibrant data exchange, as more and more small devices become increasingly powerful computers. Right now, an average smartphone has more computing power within it than was used in the entire Apollo moon landing.
For people to openly exchange data and build on each other's insights, a very basic ingredient is simply access to technology and digital data. “Cloud” based storage, smart phones, wireless networking technologies and many other factors have made this a reality for an increasing number of people, but remote and poorer regions of the world still lack that basic access. As access increases, so does vibrancy.
For vdat to become real, wireless connectivity needs to be extended to the underserved people of the world. During the Arab Spring uprisings, governments cut off carrier services to prevent people from organizing. New tools, like peer-to-peer connectivity between mobile devices, could be one way of ensuring connectivity for all.
THE ABILITY TO COPY, MODIFY, CUSTOMIZE AND IMPROVE WHILE PREVENTING ABUSE.
Openness is key to accelerating transparency and innovation in the world. Like passing the baton to the next runner in a relay, each developer accessing an open platform will race to create a new interface – they’ll edit and modify. By customizing for their own needs, they’ll improve the platform for others. Information is vibrant power when it is open to being creatively re-mixed. For the value of Openness to be fully realized, we also need to solve the challenges of Access, Trust, and Literacy.
THE ABILITY TO INTUITIVELY ANSWER QUESTIONS WITH DATA AND CRITICALLY INTERPRET THE ANSWERS.
Empowering non-experts to easily mine data for meaningful value is critical to enabling broad participation in this new data economy. Even if we have more Access, and Openness with Digital Trust leads to more sharing, it can only produce value in our everyday lives if non-experts can read, interpret, and think critically about the information within. Teachers, designers, artists, technologists, sociologists - we are calling on every field to nurture Functional Data Literacy.
THE TECHNICAL ABILITY FOR THE UNDERSERVED TO ACCESS, NETWORK AND STORE.
Access is the foundation. Key to democratizing our data ecosystem is ensuring the underserved have access to more powerful, affordable, connected mobile devices, sensors, and storage which allow them not just to consume information, but to participate in producing real value in their lives. While technological access is a necessary foundation, it is not sufficient. Trust, Literacy, and Openness are critical for this technical Access to create value.
THE ABILITY TO CONTROL OUR PERSONAL DATA ‘EXHAUST’ AND BUILD REPUTATION AND ACCOUNTABILITY.
In digital we do not trust. We are not in control of, nor have a right to, much of our own data. This collective distrust, from widespread tracking of personal data, is stifling the rise of a new economic ecosystem of sharing. Trust and accountability will not only prevent civil rights abuses, but will catalyze the new personal data economy. For this potential to be fully realized, we also need Literacy, Openness, and broader Access.