Friday, December 1, 2023

'More Data Is Public. Why That Matters to Governments and Corporations'

The average human generates at least enough data to fill a 350-page book every second. Smartphones have increased the volume and pace of data creation to an almost incomprehensible scale. The International Data Corporation estimates that by 2025, 463 exabytes of data will be produced every day. For scale, an estimated 5 exabytes would be enough to store every word ever spoken by humans, so a single day's output would exceed that total more than 90 times over.

The other element is that much of this data is public. It includes YouTube videos, home purchases, streaming radio, crop harvests, government budgets, academic publications, music composition, corporate reports, news, Yelp reviews, comments on online forums, public databases, trademark applications, flight tracker information, and so on.

This has raised appropriate concerns about privacy. But if this ever-growing ocean of data can be collected in substantial amounts, in a timely and ethical fashion, and can be analyzed correctly, it has tremendous value and ability to do good. It can help the government improve national security and better represent and serve its citizens. It can help businesses make smarter decisions about capital allocations and the marketplace to deliver better products and services to consumers.

The explosion of data, and the increasingly sophisticated tools to gather and understand that data, have combined to create a revolution of publicly available information.

Publicly available information, or PAI, can be defined by what it is not. It is not legally protected personal health and medical information, personal identification data, or material non-public information, such as a company’s trade secrets. These should be off-limits to ethical data collection.

In theory, PAI can be found by anybody. In reality, the sheer amount of PAI means that it can only be usefully gathered and analyzed by organizations that have invested in technology that allows them to gather the data at scale; infrastructure that enables them to store and access it en masse; and data engineering that enables them to take unstructured data and make it digestible and understandable to users.

PAI can reveal hidden patterns in the behavior of consumers or combatants. An analysis of corporate hiring notices may reveal a strategic direction a company is about to take, giving insight to investors and competitors. Blips in supply chains can inform military intelligence about another country’s military intentions. Increasingly, governments and corporations want to find the gold inside PAI, for both national security and commercial reasons.

If an organization decides it wants to leverage the enormous power of PAI, it’s critical to be a good citizen of the internet. Only PAI that has been aggregated and anonymized should be collected and analyzed. Data that can reveal individuals’ identities, such as biometrics or even advertising information, should not be used. Organizations should follow the data compliance practices of the regulatory bodies in that sector. Responsible PAI firms should do no harm. Reckless firms that engage in common-crawl discovery of PAI can disrupt the websites they’re looking at. That can result in problems ranging from annoying – slowing down an e-commerce website – to much worse, such as disabling a medical or health site. And most importantly, organizations first need to understand exactly what they want to find in PAI. A responsible firm will be their guide.
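To make the "do no harm" point concrete, here is a minimal sketch of what polite collection could look like in practice: check a site's robots.txt rules before requesting a page, and space out requests to the same host so the site is not slowed down. This is an illustration under stated assumptions, not a description of any particular firm's method. The crawler name `example-pai-bot`, the five-second default pause, and the `PoliteScheduler` helper are all hypothetical; only `urllib.robotparser` and `urllib.parse` come from Python's standard library.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, used only for illustration.
USER_AGENT = "example-pai-bot"
# Assumed conservative pause between requests to one host when the site sets none.
DEFAULT_DELAY_SECONDS = 5.0


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteScheduler:
    """Tracks, per host, whether a URL may be fetched and how long to wait first."""

    def __init__(self, rules_by_host, clock=time.monotonic):
        self.rules_by_host = rules_by_host  # host name -> parsed robots.txt rules
        self.clock = clock                  # injectable so the logic can be tested
        self.next_allowed = {}              # host name -> earliest next request time

    def seconds_to_wait(self, url):
        """Return None if the URL should not be fetched, else the pause in seconds."""
        host = urlparse(url).netloc
        rules = self.rules_by_host.get(host)
        # No rules loaded for this host, or the site disallows the path: skip it.
        if rules is None or not rules.can_fetch(USER_AGENT, url):
            return None
        now = self.clock()
        wait = max(0.0, self.next_allowed.get(host, now) - now)
        # Honor the site's own Crawl-delay if it declares one.
        delay = rules.crawl_delay(USER_AGENT) or DEFAULT_DELAY_SECONDS
        self.next_allowed[host] = now + wait + delay
        return wait
```

A collector built this way would call `seconds_to_wait` before every request, sleep for the returned interval, and skip any URL for which it returns `None`. Treating a host with no loaded rules as off-limits is a deliberately cautious design choice in this sketch.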

What are some real-world examples of how smart collection and use of publicly available data can help organizations?

In the government sector, PAI is mission-critical to helping agencies tackle the toughest problems of national security. Consider Taiwan. The country produces more than 60 percent of the world’s semiconductors and more than 90 percent of the most advanced ones. Any conflict involving Taiwan would create catastrophic disruptions to the global supply chain in both the commercial and defense spheres. Fortunately, there is copious data in the public domain that can spotlight the Taiwanese chip industry’s critical vulnerabilities ahead of time.

These include identifying choke points for raw materials and components needed to make semiconductors, uncovering ownership stakes by hostile countries in chip-adjacent companies, predicting workforce disruptions, assessing cyberattack potential, and understanding the susceptibility of logistics corridors in contested space. On the positive side, PAI can suggest de-risking options – what do the patent and regulatory environments in other friendly countries look like, and how hard would it be to expand chipmaking there? 

In the commercial sector, imagine an owner of multiple apartment and office buildings. They want to get new tenants and retain existing ones, but can do so only if customers are happy. It can be hard for a big company to understand the customer experience at a granular level; they can only hire so many customer service agents. But it is not difficult for someone to have their voice heard in the public domain today. In addition to any direct feedback tenants give, they will post their opinions on Google reviews, Reddit and its subreddits, YouTube, and elsewhere. All of this information is publicly available, and can help the building owner identify and fix problems in their buildings.

Data generation will continue to grow exponentially, and much of it will continue to be in the public domain. The understanding and effective use of PAI by the government and companies is a net positive for citizens and customers, as long as personal information and PAI never intersect. PAI can help everyone make smarter decisions.

Brian O’Keefe is the CEO of Vertical Knowledge, a data products, insights and intelligence company.

https://www.realclearpolicy.com/articles/2023/12/01/more_data_is_public_why_that_matters_to_governments_and_corporations_996161.html
