1. Outline 

The 4th industrial revolution is the next-generation industrial revolution driven by the convergence of information and communication technologies (ICT). It is regarded as one of the most important industrial eras since the Industrial Revolution of the 18th century.

1st industrial revolution: steam power-based mechanization revolution
2nd industrial revolution: electric energy-based mass production
3rd industrial revolution: computer and internet-based knowledge information revolution
4th industrial revolution: big data, AI, and IoT-based hyper-connected revolution

The 4th industrial revolution starts with data, which is why data is called its foundation. Accordingly, governments, institutions, and companies are actively opening data and selling it so that it can be put to use.


  1. Data Annotation

To make use of data, the entire process of collection, processing and analysis, and utilization must be organically connected. Here, data labeling and cleaning refer to the series of processes that organize, standardize, and integrate collected data.
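As a rough illustration of what "organize, standardize, and integrate" can mean in practice, the following minimal sketch merges records from two sources into one consistent list. The field names (`id`, `label`), the helper names, and the rule of keeping the first record per id are assumptions made for this example only, not a description of any particular cleaning tool.

```python
def standardize(record):
    """Normalize one raw record: string id, trimmed lower-case label."""
    return {
        "id": str(record["id"]).strip(),
        "label": record["label"].strip().lower(),
    }


def integrate(*sources):
    """Merge several sources into one list, keeping the first record per id."""
    merged = {}
    for source in sources:
        for raw in source:
            rec = standardize(raw)
            # setdefault keeps the earlier record when an id appears twice
            merged.setdefault(rec["id"], rec)
    # Organize the result in a stable order
    return sorted(merged.values(), key=lambda r: r["id"])


# Two hypothetical sources with inconsistent formatting and one duplicate
source_a = [{"id": 1, "label": " Car "}, {"id": 2, "label": "BUS"}]
source_b = [{"id": "2", "label": "bus"}, {"id": "3", "label": "Truck"}]

cleaned = integrate(source_a, source_b)
print(cleaned)
```

Real collections usually need more rules than this (unit conversion, missing values, conflicting labels), but the shape of the work is the same: bring every record into one format, then combine.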

Data analysis is like cooking rice, as we do every morning. To cook and eat rice, you must first obtain it from a store or market. Otherwise, you must sow rice seeds, transplant the seedlings, and go through several further steps to obtain rice. It is easy to put into words, but growing rice requires expertise and hard work, and without this process we cannot eat rice. Furthermore, delicious rice requires good seeds (raw data collection) and steady management such as water control (data cleaning). Finally, the rice reaches us through the harvest (data labeling).

In December 2019, the South Korean government announced the ‘Artificial Intelligence (AI) National Strategy’, setting out the vision and tasks for becoming a top AI country. Alongside fully opening public data held by public institutions, it plans to actively build public data to promote the use of AI in autonomous driving and smart cities. As a result of this policy, more data has been collected and produced than before, and the amount of unstructured data (images, videos, voice, etc.) has increased.

Unstructured data refers to data that is not predefined in a formal way, and it requires various kinds of pre-processing before it can be used as information. For example, to recognize a license plate, only the plate region is cropped from the picture of the car; to recognize a face, hats or other objects covering it are removed and only the face region is extracted so that the desired function works properly. The operation of extracting the objects to be analyzed from photos, videos, and voice recordings is called data cleaning, and the operation of tagging the extracted information is called annotation. Artificial intelligence itself is meant to run on computers without human intervention, but building an automated AI algorithm requires humans to annotate large amounts of unstructured data.
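The license plate example above can be sketched in a few lines. This is a toy illustration under stated assumptions: the image is a small grid of numbers rather than a real photo, and the `Annotation` class, the `crop` helper, and the `license_plate` label are hypothetical names introduced here, not part of any specific annotation tool.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    label: str   # tag assigned by a human annotator
    box: tuple   # (left, top, right, bottom) in pixel coordinates


def crop(image, box):
    """Return the sub-image inside box; image is a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x6 grayscale "photo"; 9 marks the pixels of the hypothetical plate
photo = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]

# Annotation: a person marks where the object is and what it is
plate = Annotation(label="license_plate", box=(1, 2, 5, 3))

# Cleaning: keep only the region to be analyzed
plate_pixels = crop(photo, plate.box)
print(plate.label, plate_pixels)
```

The bounding box and the label are exactly the part a human supplies; the cropping itself is mechanical once that annotation exists.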

In July 2020, the government launched the ‘data dam’ project, the flagship task of the ‘digital new deal’. Among the core projects that make up the data dam, the most important is the ‘AI learning data construction project’. A total of 22,000 jobs are expected to be created, most of them devoted to processing and cleaning unstructured data.

