Humans are producing more information right now than we did at any point in history. We’re up to our eyeballs in data — data from the web, from sensors, from mobile devices and satellites, all growing exponentially. The amount of data created over the next three years will be more than the data created over the past three decades. In the next five years, these technologies will produce more than three times the information that they did in the previous five.
What does this flood of data mean for the people whose job it is to make information accessible? How can we leverage emerging practices of journalism to derive business intelligence?
The ever-rising sea of data presents both a dramatic challenge and, up to this point, a largely untapped opportunity for the information industry. The field’s long-standing, mostly rigid methods do not have the capacity to interpret or contextualize the enormous amount of data our society is poised to produce. We need faster and more sophisticated information gathering if we don’t want to drown in big data.
Determining what is deemed “newsworthy” will evolve into a process focused on finding and analyzing statistical outliers with clarity, while articulating meaning by providing the human context behind the data. News becomes information when it demonstrates clear utility for decision making.
The concept of “drowning in data” sounds abstract, but there are real stakes involved in ensuring that the methods of journalism adapt to the reality of today’s information landscape. In the next decade, society will face extraordinary challenges — from the increasing probability of global pandemics to crumbling infrastructure and the effects of climate change. The disruptions to come call for an urgent reassessment of how information is sourced, because good information plays a vital role in creating a shared reality — a common frame of reference among the public, which in turn enables individuals, governments and organizations to work towards solutions from a place of shared “ground truth.”
To meet the demands of the next decade, we must move towards a new type of journalism that is more scientific and analytical, a journalism that contextualizes raw data by continuously tracking multiple sources and by processing and vetting it in real-time. This is computational journalism.
The sector that currently comes closest to this notion is financial information: stock prices and economic indices guide coverage and provide organizations and the world with a “ground truth” about the markets. But this has not always been the case. A number of companies, such as Dow Jones and Bloomberg, were originally built with the goal of collecting financial data, normalizing it, and providing it to individuals and organizations. Since their inception, these institutions have evolved into cornerstones of the global economic infrastructure.
But what if we could do something similar for other sectors, to help drive growth beyond just capital: for example, to spur progress across the life sciences, energy, infrastructure, and supply chain industries? What if we could define a “ground truth” in real time that illustrates how our healthcare system operates, or provides an assessment of an organization’s environmental footprint, or conveys the state of local infrastructure? What if we could track and contextualize significant “state changes” about the health of people, places, and the planet with the same rigor and analytical detail as we do the financial markets?
As companies and governments increasingly move their operations online, they leave data trails that, if properly mined, can provide unique insights into how these entities work. Massive troves of data across sectors hold vast potential for shaping how organizations function — *all* that is needed is a systematic approach to consolidating and contextualizing that data to provide actionable insights.
It is a short step from where we are today to a future defined by new systems that allow us to detect and contextualize data from a diverse array of sources at the same continuous rate at which we track stock prices. However, challenges remain: for example, such systems require a scale that traditional newsrooms do not currently support, because they must monitor multiple data sources simultaneously and at a pace beyond manual capabilities.
At the same time, these systems need the transparency that tech companies have failed to offer. In the same way that journalists ask questions of human sources, they should be able to ask questions of their algorithms to try to understand their inner workings. This will allow for the creation of reliable algorithms and help avoid the pitfalls of purely autonomous systems, such as machine bias. By applying journalistic principles in the development of technology, we can mitigate unwanted AI biases and ensure fairness.
Editorial algorithms like the ones that we are developing at Applied XL are needed to make sense of enormous quantities of data while maintaining high journalistic standards.
At Applied XL, we are creating information systems that will work 24/7 to monitor billions of data points. Our intelligent systems are guided by principles defined by the computational journalists who program them and are vetted by experts in specific domains, combining the high standards of a newsroom with the scale of Silicon Valley platforms.
Our first swarm of editorial algorithms is currently monitoring clinical trials data to identify newsworthy events and state changes, in many instances before those insights appear in press releases or news coverage.
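To make the idea of a “state change” concrete, here is a minimal sketch in Python. It assumes two snapshots of a dataset, each mapping a record identifier to a status string, and reports every record whose status differs between them. The function name, the trial identifiers, and the status values are all invented for illustration; this is not a description of how Applied XL's systems are actually built.

```python
from typing import Dict, List, Optional, Tuple

# One change: (record id, status before, status after).
# None means the record was absent from that snapshot.
Change = Tuple[str, Optional[str], Optional[str]]


def detect_state_changes(previous: Dict[str, str], current: Dict[str, str]) -> List[Change]:
    """Compare two snapshots and list every record whose status changed."""
    changes: List[Change] = []
    # Look at every id seen in either snapshot, in a stable order.
    for record_id in sorted(previous.keys() | current.keys()):
        before = previous.get(record_id)
        after = current.get(record_id)
        if before != after:
            changes.append((record_id, before, after))
    return changes


# Hypothetical snapshots; the ids and statuses are made up.
yesterday = {"TRIAL-001": "Recruiting", "TRIAL-002": "Active"}
today = {"TRIAL-001": "Suspended", "TRIAL-002": "Active", "TRIAL-003": "Recruiting"}

changes = detect_state_changes(yesterday, today)
for record_id, before, after in changes:
    print(f"{record_id}: {before} -> {after}")
```

A diff like this only surfaces candidates. Deciding which changes matter, and why, remains an editorial judgment.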
Given the expanding accessibility of artificial intelligence and the proliferation of cloud computing, we are now confident that we can scale the collection of raw data automatically and reliably at a speed previously unimaginable, making it possible to keep up with the explosion of data. This does not mean that AI will take over for human journalists, but it will undoubtedly transform their roles: the journalists of the near future will be information officers and algorithmic editors.
Naturally, the prospect of such a paradigm shift is bound to set off alarm bells in some quarters of the media. If everything is reduced to numbers, won’t we lose sight of the human context? To answer this question we need to first understand the origin and evolution of journalism. In its early days, journalism was designed to provide context in a period of information scarcity — wherein a relatively limited amount of readily available information could be manually sourced.
Information scarcity made it feasible for reporters to rely on human connections for intel, which they acquired by conducting interviews, requesting documents from their contacts, and attending press conferences. Eventually news organizations had to become more standardized to keep pace with the growing complexity of the information landscape, developing new practices such as “beats,” the inverted pyramid, and planning-based news cycles. These techniques and processes have been adapted time and again with the advent of new technologies, including the internet. But we have reached an inflection point where human interaction alone, even if optimized, is no longer enough to provide journalists, and their consumers, with the knowledge they need. Processes that were once useful are quickly becoming obsolete in an environment of information abundance, which we often refer to as “data vomit.”
It’s true that numbers, on their own, are insufficient to tell a full story. It’s for this reason that journalists and other knowledge experts play an integral part as the arbiters of algorithmic truth.
In fact, journalism has always been about data. It’s just that today, the times have outstripped the industry’s methods. And from a purely data science perspective, journalism is the detection and documentation of statistically significant events. With editorial algorithms, journalists as well as experts are still the ones who decide what weights, parameters and transparency principles to apply to their machine learning models.
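As a rough illustration of what “detecting statistically significant events” can look like in practice, the Python sketch below scores a new observation against a historical baseline using a z-score, then compares it with a threshold. The threshold stands in for the kind of parameter an editor or domain expert would set. The function names, the default threshold of 3.0, and the sample numbers are assumptions made for this example, and a z-score is only one of many possible tests.

```python
import math
from statistics import mean, stdev
from typing import List


def z_score(baseline: List[float], observation: float) -> float:
    """Number of sample standard deviations between observation and the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    diff = observation - mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / spread


def is_newsworthy(baseline: List[float], observation: float, threshold: float = 3.0) -> bool:
    """Flag the observation when it deviates from the baseline by at least `threshold` deviations.

    `threshold` is the editorial choice: lower values surface more events,
    higher values surface only the most extreme ones.
    """
    return abs(z_score(baseline, observation)) >= threshold


# Made-up weekly counts of some event, for illustration only.
weekly_counts = [10, 12, 11, 9, 13, 10, 11]

print(is_newsworthy(weekly_counts, 25))  # far above the usual range
print(is_newsworthy(weekly_counts, 12))  # within the usual range
```

The statistics flag the anomaly; the human context behind it still has to come from reporting.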
Most information companies, especially legacy ones, have consolidated their processes over the decades in such a rigid way that it has become difficult to introduce new technologies that would update existing workflows. They may be able to improve components of their systems, but they are very unlikely to overhaul their entire workflow. The reality is that it’s too costly for them to re-engineer their systems and retrain their staff.
The specialized B2B information industry has seen immense growth in the last several years as companies have sought to gain an edge. But the professionals these companies serve report being overwhelmed by the sheer volume of information, the lack of reliable real-time context for this data, and the complexity of navigating it. For the information industry to thrive in the next decade, we need entirely new approaches that can serve a new generation of consumers and decision makers for whom speed and accuracy are essential.
Our first vertical will focus on life sciences, an industry with specific information needs that existing private and public data sources do not fully meet. For example, many existing data providers cannot deliver the kind of real-time context life science organizations require to make decisions on clinical trial development, competitive positioning and commercialization. Applied XL will address this need by mining data on clinical trials, pharmaceutical industry regulation, and healthcare policy, and by automating the generation of alerts and reports. We will initially work with select partners, including professionals working in strategy functions at biotech and pharma companies, and will open our data platform to life science experts, including scientists and journalists, enabling them to access data and calibrate machine-driven insights. Following the launch of our life science product, Applied XL will tailor its technology to meet the evolving needs of a range of other industries.
Applied XL aims to use editorial algorithms to track the health of people, places, and the planet.