What is active learning?
Active learning (AL) is a machine learning technique for labelling data efficiently by prioritizing the samples that the current model finds difficult to judge. Typical use cases are “when you want to apply a trained model to new data” and “when the model is not working well and you want to add data and retrain”. In the latter case, the added data often has to be newly labelled, which increases cost. AL keeps this cost down by selecting for labelling only the data on which the existing AI model is unsure of its judgement.
AL has several strategies for selecting data. One family is based on uncertainty: it assumes that the samples the current model finds hard to judge are the ones most worth learning from. Max Probability is a typical example. It looks at the class with the highest predicted probability, and the lower that probability is, the more valuable the sample is considered to be. Another strategy is Query-by-Committee, which prepares multiple models and treats the samples on which their predictions disagree as more valuable. How best to design this data selection strategy is an active subject of research.
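The two strategies above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular library's API: the function names (`least_confidence`, `vote_entropy`, `top_k`), the probabilities, and the committee votes are all made up for the example, and vote entropy is just one common way to measure committee disagreement.

```python
import math
from collections import Counter

def least_confidence(probs):
    """Max Probability score: 1 minus the highest class probability.
    A low top probability gives a high score (more valuable to label)."""
    return 1.0 - max(probs)

def vote_entropy(votes):
    """Query-by-Committee score: entropy of the labels predicted by the
    committee members. Unanimous votes give 0; more disagreement gives more."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total)
                for c in Counter(votes).values())

def top_k(scores, k):
    """Indices of the k highest-scoring samples."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical class probabilities from one model for four unlabeled samples
pool_probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.70, 0.20, 0.10],
    [0.55, 0.44, 0.01],
]
uncertainty = [least_confidence(p) for p in pool_probs]
selected = top_k(uncertainty, 2)  # the two samples to send for labelling

# Hypothetical votes from a committee of three models for three samples
committee_votes = [
    ["lesion", "lesion", "lesion"],
    ["lesion", "normal", "lesion"],
    ["lesion", "normal", "cyst"],
]
disagreement = [vote_entropy(v) for v in committee_votes]
most_disputed = top_k(disagreement, 1)
```

Either score can feed the same `top_k` selection step, so in this sketch switching strategies only means switching the scoring function.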
Benefits and initiatives of utilizing active learning in the medical field
AL is also expected to be useful in the medical field. NTT DATA carries out research and development of AI-based diagnosis support systems for medical images such as CT and MRI, and a major challenge in that development is preparing enough data for the AI to learn from. Medical images accumulate through daily clinical work, but it is difficult to extract accurate coordinates of the lesion area from the interpretation report, so a doctor's help is needed to create new labels for the AI. Depending on the disease, labelling a single patient can take 20 minutes or more. In addition, because the work requires a doctor's cooperation, it carries a considerable cost, which is a burden on development.
We verified AL in the medical field using lesion detection on kidney CT images as the theme. Starting from a small amount of training data, we compared which improves accuracy faster: (1) adding data selected by AL or (2) adding data at random. As a result, (1) AL achieved high accuracy on difficult cases with a small number of images. Averaged over all cases, the accuracy that (2) random addition reached with 4000 images was reached by (1) AL with about 2000 images, showing that AL can make data selection more efficient.
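The comparison procedure described above (start from a small labelled set, then repeatedly add a batch chosen either by AL or at random and retrain) can be sketched as a generic loop. The sketch below is not the kidney CT experiment: it assumes a toy one-dimensional threshold classifier, and every name in it (`selection_loop`, `fit`, `uncertainty`, `oracle`, `TRUE_BOUNDARY`) is hypothetical. The `oracle` function stands in for the doctor who supplies labels.

```python
import random

def selection_loop(labeled, pool, fit, uncertainty, oracle,
                   batch_size, rounds, strategy, seed=0):
    """Grow the labeled set batch by batch; return (final model, labeled)."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(pool)  # work on copies
    for _ in range(rounds):
        if not pool:
            break
        model = fit(labeled)  # retrain on everything labeled so far
        k = min(batch_size, len(pool))
        if strategy == "al":
            # (1) AL: take the samples the current model is least sure about
            batch = sorted(pool, key=lambda x: uncertainty(model, x),
                           reverse=True)[:k]
        else:
            # (2) Random: take samples uniformly at random
            batch = rng.sample(pool, k)
        for x in batch:
            pool.remove(x)
            labeled.append((x, oracle(x)))  # ask the oracle for the label
    return fit(labeled), labeled

# --- Toy stand-ins, assumed purely for illustration ---
TRUE_BOUNDARY = 0.5

def oracle(x):
    """Ground-truth label: 1 at or above the true boundary, else 0."""
    return 1 if x >= TRUE_BOUNDARY else 0

def fit(labeled):
    """Model = threshold halfway between the largest labeled negative
    and the smallest labeled positive."""
    neg = max(x for x, y in labeled if y == 0)
    pos = min(x for x, y in labeled if y == 1)
    return (neg + pos) / 2

def uncertainty(threshold, x):
    """Samples closer to the threshold are less certain (higher score)."""
    return -abs(x - threshold)

seed_set = [(0.0, 0), (1.0, 1)]          # small initial training data
pool = [i / 100 for i in range(1, 100)]  # unlabeled candidates

al_model, al_labeled = selection_loop(
    seed_set, pool, fit, uncertainty, oracle,
    batch_size=2, rounds=4, strategy="al")
rnd_model, rnd_labeled = selection_loop(
    seed_set, pool, fit, uncertainty, oracle,
    batch_size=2, rounds=4, strategy="random")

print("AL threshold error:    ", abs(al_model - TRUE_BOUNDARY))
print("Random threshold error:", abs(rnd_model - TRUE_BOUNDARY))
```

Both runs spend the same labelling budget, so the only difference between them is how each batch is chosen. In a real study, `fit` and `uncertainty` would be replaced by the actual detection model and its scoring, and accuracy would be measured on a held-out test set after each round.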
The amount of available data is expected to keep increasing, and it will be necessary to select the important data from it to build AI models. Under such circumstances, if AL allows AI to learn from less data, the hurdle for introducing an AI system will be lowered. To contribute to the expansion of AI system development, NTT DATA will continue its development work with the aim of applying AL in a variety of fields.