TT02 Usable Augmented Data

In an era when data quality dictates business success, augmenting data to make it more usable and complete will be critical as good and bad data continue to mix. The development of technologies that generate and refine data will expand the possibilities of data usage.

The Increasing Value of Data

In today's society, several enormous digital platform companies bring innovative technologies and services to market in rapid succession to sustain their growth. This is creating a virtual monopoly on data, and nations and other corporations are waging data battles by, for example, restricting the export of data beyond national borders. These fights stem from the fact that data has become an indispensable business driver and a primary source of national and corporate economic growth, which is why data is sometimes referred to as the new oil. Email, social networking services, voice assistants, electronic commerce websites and other conveniences that we take for granted continuously generate new data that is used to analyze consumer preferences and improve AI performance. In addition, once data gives rise to a new service, information from that service is used to produce yet another new service, creating a virtuous cycle.

Even after huge amounts of data are secured, no new value can be created without the technologies and ideas to utilize that data. Recent advancements in AI technology are propelling data utilization to a new level. However, alongside its potential benefits, this progress also presents new challenges.

The Data Bias Challenge

Data bias is causing societal problems and raising concerns in both data analysis and AI learning. For example, the accuracy of some AI-based face recognition programs varies significantly with gender and race. Because of this, using these AI programs for security purposes often results in unjust differential treatment, whereby a specific group is asked to go through additional security checks. Although adding learning data enables AI to perform more accurately, it may also be necessary to organize the learning data uniformly across groups to control for subject-dependent variations, even if doing so lowers overall accuracy.
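As a rough illustration of the two ideas in the preceding paragraph, the sketch below measures accuracy separately for each group and then downsamples a training set so that every group is equally represented. It is a minimal sketch under stated assumptions, not the method of any particular face recognition product: the function names, group labels and sample data are all hypothetical, and downsampling is only one of several ways to balance data.

```python
import random
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for records of (group, predicted, actual)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def balance_by_group(samples, group_of, seed=0):
    """Downsample every group to the size of the smallest group."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    buckets = defaultdict(list)
    for sample in samples:
        buckets[group_of(sample)].append(sample)
    smallest = min(len(bucket) for bucket in buckets.values())
    balanced = []
    for group in sorted(buckets):
        balanced.extend(rng.sample(buckets[group], smallest))
    return balanced


# Hypothetical evaluation results: (group, predicted label, actual label)
results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]
per_group = accuracy_by_group(results)
gap = max(per_group.values()) - min(per_group.values())

# Hypothetical training set in which group_a is overrepresented (6 vs. 2 samples)
training = [("img%d" % i, "group_a") for i in range(6)]
training += [("img%d" % i, "group_b") for i in range(6, 8)]
balanced = balance_by_group(training, group_of=lambda sample: sample[1])
```

A large value of gap signals that the model serves one group worse than another. Balancing discards data from the larger group, which is why it can lower overall accuracy while narrowing the gap between groups.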

Data-based bias is causing numerous other issues. For example, AI that predicts recidivism is controversial because it produces predictions that are biased by race. This situation arises not from bias within the learning data, but from absorbing historical data and applying it, without question, to predict the future. AI makes decisions based on the data it has learned as if a causal relationship existed, even though only a correlation actually exists. For instance, the fact that repeat offenses were more frequent in the past for a certain race does not mean that race is the cause of repeat offenses.

What is required is fair and secure AI that individuals and society can use. To attain this, organizations must check for bias in both learning data and processing logic. In 2018, multiple companies announced tools to detect AI bias. In the future, companies must resolve any problematic bias in order to gain social trust and achieve sustainable growth.

Verification of Authenticity

With the launch of deepfake software, which uses a synthesis technology that can replace one human face in a video with another, it has become relatively easy for anyone to create fraudulent videos. Consequently, fake videos with inserted faces of celebrities and election candidates are now in global circulation, causing social issues such as the infringement of image rights and human rights and the manipulation of public opinion. Voice synthesis technology has also evolved: only one minute of voice data is now sufficient to create virtual speech of any length. Combining fake video with synthesized voice will make it harder to tell what is authentic. Adding to the confusion and the potential for fraud, AI can also create plausible text. Once a person writes an initial sentence, AI can produce subsequent sentences in a similar style on any topic.

To address these and other problems, the development of technologies to detect fraudulent content has become vital. For example, a service has been launched that detects, in real time, synthesized images and videos posted on social networks. In addition, the U.S. Defense Advanced Research Projects Agency (DARPA) has established a media forensics program to develop sophisticated tools to detect fake videos, images and voice. This detection program focuses on physiological characteristics of humans that AI has difficulty replicating, looking for signs such as unnatural eye and head movements. However, exposing a flaw in fake data will likely motivate the development of yet another technology to overcome it. In fact, technology has already emerged that elaborately imitates blinking and head movements after a face is replaced within an image.

The battle between synthesis technologies that further improve realism and technologies that detect fakery is expected to continue for the foreseeable future. To make matters more confusing, about 60% of articles on a social networking service are shared without the article itself actually being opened [1], so information spreads on the strength of an attention-grabbing title alone. As a result, both the advancement of technology that eliminates fake data while refining real data and the improvement of user literacy are critical.

[1] Social Clicks: What and Who Gets Read on Twitter?

A New Potential for Synthetic Data

While AI-based data generation technology increases fake content, it also creates the potential for enhanced data utilization. One such advancement is a technology called generative adversarial networks (GANs), which generate synthetic images and videos that are difficult to distinguish from the real thing. Technology has also emerged to control specific characteristics and styles of the generated objects. For example, when generating a human face, the technology can specify the angle of the face, the hair style, eye and nose characteristics and skin color. Using data generated in this way as learning data can be expected to improve AI accuracy and reduce bias.
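For readers who want the underlying idea in symbols, the standard GAN training objective is a two-player game between a generator G, which turns random noise into synthetic samples, and a discriminator D, which tries to tell real samples from generated ones. The notation below is not from this article: x denotes a real sample, z denotes random noise, and p_data and p_z denote their respective distributions.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Attribute control, such as specifying the angle of a face, is commonly modeled by conditioning both networks on an attribute vector. That is a general formulation and not a claim about the specific technology described above.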

Another factor encouraging the use of synthetic data is the advancement of a technology called domain adaptation. This method applies knowledge acquired from the learning data of one domain to another domain. By drawing on knowledge acquired in a virtual CG space, this technology has the potential to solve real-world problems for which sufficient learning data is unavailable. For instance, robot arms and self-driving cars can already be partially controlled through simulation-based learning in a virtual CG space. Thus, advances in technology that synthesizes realistic data, combined with learning methods that adapt to the real world, will expand the applicability of synthetic data in the years ahead.
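To make the idea of carrying knowledge from a simulated domain to a real one more concrete, the sketch below shows one of the simplest forms of domain adaptation: rescaling a feature measured on simulated data so that its mean and standard deviation match those of a small amount of real data. It is a minimal sketch under stated assumptions and not the method used by the robot arm or self-driving systems mentioned above. The function name align_to_target and the brightness values are hypothetical.

```python
from statistics import mean, pstdev


def align_to_target(source, target):
    """Shift and scale source values so their mean and standard
    deviation match those of the target values."""
    src_mean, src_std = mean(source), pstdev(source)
    tgt_mean, tgt_std = mean(target), pstdev(target)
    if src_std == 0:
        # A constant source feature has no spread to rescale,
        # so it is simply moved to the target mean.
        return [tgt_mean for _ in source]
    scale = tgt_std / src_std
    return [(x - src_mean) * scale + tgt_mean for x in source]


# Hypothetical image brightness readings:
# plentiful simulated (CG) images vs. only a few real camera images.
simulated = [0.80, 0.85, 0.90, 0.95, 1.00]
real = [0.40, 0.50, 0.60]

adapted = align_to_target(simulated, real)
```

After alignment, the simulated values keep their relative order but occupy the same range of statistics as the real ones, so a model trained on them sees inputs that look statistically closer to what it will meet in the real world. Practical methods align far richer representations, but the principle is the same.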

Simulation learning allows AI to acquire knowledge in a short time through high-speed, parallel processing, and prepares AI to adapt to extraordinary situations. For instance, simulation learning for self-driving cars lets AI prepare for blizzards, dense fog and other natural phenomena, as well as for situations too risky to replicate, such as a pedestrian or motorcycle suddenly appearing in the road. In the future, the ability to process and generate data that improves AI accuracy will likely become a key source of competitive advantage. Today, as it becomes difficult to differentiate between real and fake data, society requires a means to eliminate counterfeit content while still benefiting from valuable synthetic data.
