#WETHEDATA For the People, By the People!

4 grand challenges

THE GRAND CHALLENGES

With every click, tweet, and text we collectively generate a torrent of data that is becoming one of the most abundant and valuable resources on the planet - but for whom? We asked a crowd of experts, "How can we make our data work for us and not against?" These Grand Challenge areas emerged from a network analysis of that collective input. If solved together, we believe they will catalyze a new data economy – for the people, by the people. Join us in answering the challenges and adding your voice to the global conversation: We The Data!

More on the Research
Consensus Results
Excludes links where (#votes/#views) <20%.
Full Results
Includes all links no matter how few votes.

From ~30 expert interviews and online submissions we broke down this large problem into ~90 mini-challenges for democratizing data. Each ‘node’ to the right is one of those challenges.

We made a conscious effort to keep these challenges at a consistent level of resolution focused on goals rather than specific solutions. We used targeted follow-up interviews to identify gaps in the list, and continue to solicit new submissions from ‘the crowd’. You can help us identify those gaps by participating.

THIS IS THE FIRST CROWD-MAPPED NETWORK STRUCTURE OF A COMPLEX PROBLEM!

Each node was identified by at least one expert as a critical challenge for democratizing data, and each link is at least one human brain saying, “If this problem improves, it has a strong direct influence on this other problem.”

We called on a community of experts to identify strong relationships among all possible pairs of challenges. Using an online tool, the MAPPR, they evaluated each challenge in turn: if it improves, does it improve others, or make them worse?

For this first phase, about 40 people participated, casting >6,550 link votes that produced a “Full” web of 3,842 links or a “Consensus” web of 1,786 links. We analyzed both webs to identify results that are robust to this uncertainty in the crowd-defined network. In each, ~93% of the links were ‘positive’ relationships, meaning that solving one problem helps solve another (rather than making it worse).

HIGHLY CONNECTED HUBS…

have broad direct reach to almost the entire network. They are like ‘JFK’ or ‘Chicago O’Hare’ airports with ‘direct flights’ to or from the vast majority of other challenges.

FOR EXAMPLE,

MARKETPLACE FOR DATA ANALYTICS AND SERVICES

developing a vibrant marketplace for data analytics and services has many direct positive influences on other challenges…

FOR EXAMPLE,

MARKETPLACE FOR DATA ANALYTICS AND SERVICES

…and these marketplaces directly benefit from improvement in many other areas. These hubs may be ‘auto-catalytic’ challenges that are naturally improved by current market forces.

AVERAGE PATH LENGTH is the average shortest path, or number of ‘hops’, from that node to every other node in the network.

CONNECTIVITY is the total number of incoming and outgoing links.
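The two measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: links are treated as directed (source, target) pairs, nodes that cannot be reached are left out of the average, and the toy network and function names are hypothetical rather than taken from the actual analysis.

```python
from collections import deque


def connectivity(links, node):
    """Total number of incoming plus outgoing links touching `node`."""
    return sum(1 for src, dst in links if src == node or dst == node)


def average_path_length(links, node):
    """Mean number of 'hops' on the shortest path from `node` to each node it can reach.

    Assumption: unreachable nodes are ignored; returns infinity if nothing is reachable.
    """
    neighbors = {}
    for src, dst in links:
        neighbors.setdefault(src, []).append(dst)

    # Breadth-first search gives the fewest hops to every reachable node.
    hops = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)

    reached = [d for n, d in hops.items() if n != node]
    return sum(reached) / len(reached) if reached else float("inf")


# Hypothetical toy network, not the real challenge data.
toy_links = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "A")]
print(connectivity(toy_links, "A"))
print(average_path_length(toy_links, "A"))
```

A hub in this sense would show high connectivity together with a short average path length.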

HIGHLY ASYMMETRIC HUBS

are challenges that, if solved, have a strong direct influence on many problems, but are weakly influenced by few.

The yellow and orange nodes are among the top ~25%. The yellow nodes were identical independent of whether we included every single link vote (the ‘Full’ network), or whether we removed over 50% of the links for the ‘Consensus’ network. The orange nodes were less robust to this uncertainty in network structure.

FOR EXAMPLE,

Tools to Anonymize Personal Data

the ability to anonymize personal data has a positive direct influence on many other challenges.

Consider a citizen journalist witnessing government violence in the street; if they have easy access to tools to anonymize their personal identity, they are more likely to document and share that event, and it’s more likely to catalyze an entire community of citizen reporters while protecting them from the threat of surveillance, imprisonment, or death.

FOR EXAMPLE,

Tools to Anonymize Personal Data

…but this challenge only weakly benefits from improvement in a few other challenges.

One reason why these types of privacy tools may be 'under-nourished' is that there are no clear short-term market incentives for helping people anonymize their personal data, even though addressing this challenge could, in the long run, catalyze new economies around data sharing.

TOP ASYMMETRIC HUBS ARE POTENTIAL ‘UNDER-NOURISHED CATALYSTS’

These are challenges that, if solved, could help solve many others, but that have comparatively few things weakly helping them.

We only included the yellow nodes that were the same in both the ‘full’ and ‘consensus’ networks. In other words, these results were robust to removing over half the links that had few votes per views.

STRENGTH ASYMMETRY

is the relative difference in median strength of all outgoing vs incoming links – (Log (median strength incoming links / median strength outgoing links))

Link thickness reflects link ‘strength’, defined here as the number of votes for a ‘strong direct link’ divided by the number of people who viewed that node pair. (e.g., if 10 people looked at a node pair and 5 drew a link, strength = 0.5)
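As a rough illustration of the strength definition above, together with the 20% votes-per-views cutoff stated for the Consensus results, here is a minimal Python sketch. The function names, the tally layout, and the toy numbers are assumptions made for this example, not the project's actual code or data.

```python
def link_strength(votes, views):
    """Votes for a 'strong direct link' divided by the number of people who viewed the pair."""
    return votes / views


def consensus_links(tallies, threshold=0.20):
    """Keep only links whose strength is at least `threshold`.

    `tallies` maps a (source, target) pair to a (votes, views) tuple.
    The 0.20 default mirrors the 'excludes links where votes/views < 20%' rule.
    """
    kept = {}
    for pair, (votes, views) in tallies.items():
        if views == 0:
            continue  # nobody viewed this pair, so no strength can be computed
        strength = link_strength(votes, views)
        if strength >= threshold:
            kept[pair] = strength
    return kept


# Hypothetical tallies: (votes, views) for each directed node pair.
toy_tallies = {
    ("A", "B"): (5, 10),  # the worked example above: strength 0.5
    ("B", "C"): (1, 10),  # strength 0.1, dropped from the consensus web
    ("C", "A"): (2, 10),  # strength 0.2, kept (not below 20%)
}
print(consensus_links(toy_tallies))
```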

LINK ASYMMETRY

is the relative difference in outgoing vs incoming links – (Log (# outgoing links / # incoming links))
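The link asymmetry formula can be sketched as follows. The natural logarithm is an assumption (the text does not state a base), as is returning None when a node has no incoming or no outgoing links; the toy links and function name are hypothetical.

```python
import math


def link_asymmetry(links, node):
    """Log of (# outgoing links / # incoming links) for `node`.

    Positive values mean the node influences more challenges than influence it.
    Returns None when either count is zero, since the log ratio is undefined.
    """
    outgoing = sum(1 for src, _ in links if src == node)
    incoming = sum(1 for _, dst in links if dst == node)
    if outgoing == 0 or incoming == 0:
        return None
    return math.log(outgoing / incoming)


# Hypothetical toy network: "T" points at four challenges, only one points back.
toy_asym_links = [("T", "A"), ("T", "B"), ("T", "C"), ("T", "D"), ("A", "T")]
print(link_asymmetry(toy_asym_links, "T"))
```

In this toy network "T" has a positive score, the pattern described above for asymmetric hubs.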

Tools to Anonymize Personal Data
The ability to anonymize our personal data has a positive direct influence on many other challenges – like the number of people who will voluntarily share personal data, and the ability to catalyze a critical mass of community engagement. For example, if many people having asthma attacks voluntarily share those events with the public, the broad patterns could help identify geographic outbreaks related to air quality conditions. More people would participate if they knew their personal identity was stripped from the event so they would not be at risk of being denied health care coverage.

TOP ASYMMETRIC HUBS… SELF ORGANIZED

We used a force-directed layout to cluster those challenges with high asymmetric centrality into a few groups that are closely connected and conceptually similar.

Note that these clusters are similar regardless of whether we consider the ‘Full’ or ‘Consensus’ networks. The only difference is in the 2 orange challenges we subjectively classified under ‘Platform Openness’ because they were conceptually distinct from the rest.

From the original ~90 challenges identified by experts, 4 potential grand challenge areas emerge. These were defined by a community, and they can evolve with more input.

WE ARE DATA
How can #WeTheData benefit from (and avoid being harmed by) the explosion of data we generate every day?

We used an ecological network approach, developed by Vibrant Data Labs, to make sense of this messy problem and identify Grand Challenges for catalyzing positive change.


“A problem well defined is a problem half-solved.”
– John Dewey

“THIS IS THE FIRST PROBLEM NETWORK DRAWN BY A COMMUNITY OF HUMAN BRAINS!”

“We're mapping collective understanding of the problem and using the network structure to spark creative solutions where they're most needed.”


– Eric Berlow, Ph.D., Ecologist | Complexity Scientist | Founder, Vibrant Data Labs

Economic Opportunity

Human Health and Wellness

Civil and Political Rights

Environmental Sustainability

Science, Education, and Human Knowledge

ability to accrue personal value from offering personal data

flexible allocation of costs

ability to be locally relevant

ease with which micro-entrepreneurs can enter the market

ability to protect against malicious uses

increased efficiency and effectiveness of public services

visibility of small success stories and examples of value from data

direct utility of open data to those providing it

personal accountability

direct utility of tools and platforms to those using them

ability to detect/self-correct unintended consequences of Vdat

transparency and accountability of large institutions

ability of everyday people to monetize dormant skills and assets

development of a marketplace for data, analytics, and other data services

transparency and accountability of information providers

degree of platform openness (copy and modify)

reputation system that engenders trust among participants

system of rewards for participation that are sensitive to context

degree to which the participants have a shared problem

ease of editing / adding to a dataset

ability of tools to be highly customizable

proportion of the population motivated by social rewards

ability to convert data into action

ability to collaboratively co-create

ability to collaboratively analyze and share insights about data

ability to make informed decisions based on data

ability to collaboratively improve platforms

formation of communities of shared interest or action around data

ability to catalyze a critical mass of community engagement

proportion of the population that is functionally data literate

ability to easily manipulate data granularity

reduction in cost of computation

computational power of small devices

ability to use data to make predictions

proportion of public that can critically evaluate conclusions drawn from data

ability to easily find patterns across multiple data streams

ability to intuitively explore and answer questions with data

proportion of the public using data to inform daily life decisions

ability to fact check online information

ease of access to novel statistical methods of pattern discovery

ability to see broad trends and place ourselves in context

ability to check and validate data quality

total number of people offering data viz and analysis tools/services

availability of automatic language translation tools

degree to which UI design enables participation by diverse groups

degree to which underlying technologies and data are invisible

ease to users of managing personal data access permissions

legal/policy framework for personal digital rights management

legal framework for accessing/sharing copyright protected data

proportion of data from large institutions that is accessible

concentration of data and access to data in few corporations

tools to anonymize sensitive data

ability to control data access permissions

proportion of population with access to information infrastructure

wireless connectivity and bandwidth

accessibility of stored open data

ability to integrate info from multiple sources (e.g. spatial data in real time)

tools for seeing our own 'personal data exhaust'

ease of discovering datasets

number of people with easy access to data

proportion of real time data accessible by mobile phone

proportion of online info and media protected from censorship

ability to see under the hood of tools to avoid oversimplification

inter-connectivity of mobile apps

incentives for large institutions to open up data for social good

degree to which large institutions don't have anything to hide

ease with which everyday people can share data

ease of adding to / modifying metadata

more coherent data standards

reduction in cost of networking and storage

data self description

ability to automate interoperability of different datasets

accuracy and reliability of real time info with respect to purpose

ability to collaboratively clean and filter data

ability to share, edit, and update public data

ability and ease of cleaning of data

level of clarity, simplicity, and utility of data created by sensors

availability of suitable sensors

amount of personal data voluntarily shared

ability of people and institutions to be real time sensor networks

total number of people contributing data

ease with which everyday people can collect/ create data

Higher Goals

Value Derived

Action & Collaboration

Analysis

Access & Circulation

Organization

Creation

How we nourish higher goals like broader economic opportunity, civil and political rights, human health, etc

How we derive value from data

How we turn that meaning into action and catalyze communities

How we analyze and discover meaning in data

How easy they are to access and share, and how we control access permissions

How they are stored and organized

How data are created

~90 challenges and >3,500 links (Full) or >1,700 links (Consensus) identified by the community!

availability of automatic language translation tools

Reaching people across geographical, cultural and linguistic boundaries requires translation. For example, a large proportion of basic information on the web is not accessible to Arabic speakers. Sharing and leveraging open data will be limited if there are no tools that enable translation. Beyond the direct and obvious linguistic translation, this may include cultural differences, such as which units are used for data.

more coherent data standards

For disparate data sets to be able to “talk to each other”, to be used in the same analysis, and be used to drive new discovery, they must be combinable. That requires standardization. Data standards have traditionally been very difficult to implement because people are always coming up with new data-types. Open, flexible standards are essential to enable truly Vibrant data.

degree to which UI design enables participation by diverse groups

As basic performance capabilities - computing power, networking, even the power of analytics tools - increase, the world will see a shift in emphasis from simple performance to a better design of that performance to fit the needs of real people. UI design is an important element in making the power of technology fit with the ways of learning, the language, the visual skills and other characteristics of real people.

proportion of the population that is functionally data literate

Refers to the proportion of people who understand enough about data and analysis to derive real value from it. Finding patterns and meaning in data will be simplified with the development of new analysis and visualization tools. The proportion of people who are “data literate” will drive the rate of development of tools that make it easier for people without a lot of specialized training to analyze data for meaning.

degree to which underlying technologies and data are invisible

Some may want to explore all that is possible with a given tool, while others may want to just use the basics. Design plays a key role in enabling novices to immediately understand the use of a given technology or data set, while enabling deeper explorations over time. By artfully hiding or exposing details and functionality, designers will enable more people to share and use data in ways that suit them.

ability to easily manipulate data granularity

Tools that enable the manipulation of data granularity enable people with different skills and interests to look at common data sets in their own way. Sometimes, it is important to dig into the details. Sometimes, it is more appropriate to just have a “bird's eye view”. The more our data analysis tools can accommodate both ways of looking (and those in between) the more likely that more people will find value in them.

ease to users of managing personal data access permissions

A related issue is the importance of putting flexible tools for assigning data access permissions in the hands of people, not large institutions (see “ability to control data access permissions”). These tools must involve a minimum of effort and management overhead on the part of individuals and be largely automated; otherwise they will be too complicated to be useful.

level of clarity, simplicity, and utility of data created by sensors

Many sensors today gather data in ways that are either proprietary or idiosyncratic. We need new methods for sensor data to be made more easily accessible and understandable from the very moment it is first created. This may be through improvements of devices themselves or through methods that automatically format and make available data by way of post-processing.

degree of platform openness (copy and modify)

Openness here is defined as free to copy and change. Open source has been a major enabler for many organizations to collaborate, share and leverage expertise. It enables innovation and local cultural relevancy as people adapt tools and technologies to local needs. Similarly, by allowing the open sharing of data, and (appropriate) modification of information such as meta-tags, the utility of a database can be improved tremendously.

legal/policy framework for personal digital rights management

People will not be willing to openly share their personal data unless they have assurances that their privacy will be protected. Current privacy agreements - implemented by corporations and institutions - do little to assure people of the protection of their privacy. Legal and policy frameworks that return control over privacy to individuals, along with technology tools to help individuals manage those protections, will be vital for data vibrancy.

legal framework for accessing/ sharing copyright protected data

Tight legal restrictions limit people's ability to work with and build on the data of others. Too often, such restrictions favor powerful enterprises, who “lock down” access. Without any restrictions, however, people may not be willing to share personal data for fear of violations of privacy. Legal frameworks require a balance between openness and individual protections, without those protections becoming a tool for abuse by powerful interests.

proportion of data from large institutions that is accessible

While data is much more easily shared, laws, institutional power, access to technology and other resources can create barriers to openness and transparency. Even in highly democratic societies, large firms and institutions have the means to gather information about individuals (for advertising, or surveillance). Real data vibrancy will only occur when this one-sided approach to data gathering and possession gives way to much broader circulation of data.

reputation system that engenders trust among participants

People who remotely collaborate need tools to assess the trustworthiness of others they share data with. Reputation systems have been developing online, and many firms are working on generalized reputation scores that apply across domains. But beyond simple scores, people will need other tools, including assurances about recourse in the event that trust is breached, in order to feel safe sharing information broadly.

concentration of data and access to data in few corporations

Large, consumer-facing companies guard consumer data they collect. On the one hand, it means that corporations might protect those data from abuse. On the other hand, this hoarding of data reduces the chances for people to discover unexpected new meaning and value.

tools to anonymize sensitive data

Tight legal restrictions limit people's ability to work with and build on the data of others. Too often, such restrictions favor powerful enterprises, who “lock down” access. Without any restrictions, however, people may not be willing to share personal data for fear of violations of privacy. Legal frameworks require a balance between openness and individual protections, without those protections becoming a tool for abuse by powerful interests.

ease of adding to / modifying metadata

Metadata is information about data - descriptions of context, characterizations, labels, all ways that help people know what a data set or element is, and how it might be used. Metadata is a key component to make data sets more compatible and useful across boundaries. Therefore, we need open systems that allow metadata to be modified appropriately.

ability to control data access permissions

As data access becomes more open, people will need the means of setting personal data permissions. Current data permissions systems (e.g., those provided by online services or corporations with regards to personal data) are not nearly flexible or powerful enough, and generally favor the corporation as “owner” of the data. This will obviously have to change for people to willingly share and circulate their data in a more open way.

availability of suitable sensors

Most smart phones have many sensors on them - GPS, accelerometers, cameras, microphones, etc. A more vibrant ecology of data creation will depend on new types of affordable mobile sensors to collect data in new ways. For instance, after the 2011 earthquake in Japan, small mobile radiation sensors deployed by citizens were critical to mapping radiation plumes around the Fukushima power plant.

reduction in cost of computation

Since the late 1960s computing performance has continuously doubled in power every 18 months, while the cost per unit of computing has dropped precipitously. This is a factor that will continue to fuel the growth of open and vibrant data exchange, as more and more people can ultimately afford to access computing power - and thus, digital technology.

reduction in cost of networking and storage

Network access in more remote - and usually poorer - parts of the world remains an acute issue. As with the cost of computing, digital storage technologies continue to decline in price and increase in power. Solid state storage devices promise to provide a more robust option for small devices, while new, open standards for servers have created a proliferation of storage “in the cloud”.

computational power of small devices

Since the late 1960s computing performance has continuously doubled in power every 18 months, while the cost per unit of computing has dropped precipitously. This is a factor that will continue to fuel the growth of open and vibrant data exchange, as more and more small devices become increasingly powerful computers. Right now, an average smartphone has more computing power within it than was used in the entire Apollo moon landing.

proportion of population with access to information infrastructure

For people to openly exchange data and build on each other's insights, a very basic ingredient is simply access to technology and digital data. “Cloud” based storage, smart phones, wireless networking technologies and many other factors have made this a reality for an increasing number of people, but remote and poorer regions of the world still lack that basic access. As access increases, so does vibrancy.

wireless connectivity and bandwidth

For vdat to become real, wireless connectivity needs to be extended to the underserved people of the world. Amid the Arab Spring uprisings, governments cut off carrier services to prevent people from organizing. New tools, like peer-to-peer connectivity of mobile devices, could be one way of ensuring connectivity for all.

Platform Openness

the ability to copy, modify, customize, and improve while preventing abuse

Data Literacy

the ability to intuitively answer questions with data and critically interpret the answers

Digital Access

the technical ability for the underserved to access, network and store

Digital Trust

the ability to control our personal data 'exhaust' and build systems of reputation and accountability

from 2 nodes

Platform Openness

THE ABILITY TO COPY, MODIFY, CUSTOMIZE AND IMPROVE WHILE PREVENTING ABUSE.

Open is key to accelerating transparency and innovation in the world. Like passing the baton to the next runner in a relay, each developer accessing an open platform will race to create a new interface – they’ll edit and modify. By customizing for their own needs, they’ll improve the platform for others. Information is vibrant power when it is open to being creatively re-mixed. For the value of Openness to be fully realized, we also need to solve the challenges of Access, Trust, and Literacy.

from 7 nodes

Data Literacy

THE ABILITY TO INTUITIVELY ANSWER QUESTIONS WITH DATA AND CRITICALLY INTERPRET THE ANSWERS.

Empowering non-experts to easily mine data for meaningful value is critical to enabling broad participation in this new data economy. Even if we have more Access, and Openness with Digital Trust leads to more sharing, data can only produce value in our everyday lives if non-experts can read, interpret, and think critically about the information within. Teachers, designers, artists, technologists, sociologists - we are calling on every field to nurture Functional Data Literacy.

from 6 nodes

Digital Access

THE TECHNICAL ABILITY FOR THE UNDERSERVED TO ACCESS, NETWORK AND STORE.

Access is the foundation. Key to democratizing our data ecosystem is ensuring the underserved have access to more powerful, affordable, connected mobile devices, sensors, and storage which allow them not just to consume information, but to participate in producing real value in their lives. While technological access is a necessary foundation, it is not sufficient. Trust, Literacy, and Openness are critical for this technical Access to create value.

from 8 nodes

Digital Trust

THE ABILITY TO CONTROL OUR PERSONAL DATA ‘EXHAUST’ AND BUILD REPUTATION AND ACCOUNTABILITY.

In digital we do not trust. We are not in control of, nor have a right to, much of our own data. This collective distrust, from widespread tracking of personal data, is stifling the rise of a new economic ecosystem of sharing. Trust and accountability will not only prevent civil rights abuses, but will catalyze the new personal data economy. For this potential to be fully realized, we also need Literacy, Openness, and broader Access.

sparked by
in collaboration with >>
  • Vibrant Data Labs: Research, Analysis
  • Brainvise: Creative Direction, Design, Development
  • Intel Labs: Research Partnership
  • Quid: Visualization Platform
  • The Story Studio: Creative Advisory
  • The Gathering Think Tank: Social Architecture