PIAAC2ESCO – An AI-driven classification of the PIAAC Background questionnaire onto the ESCO Skills Pillar (2021-2022)

Mercorio, Fabio (Università degli Studi di Milano-Bicocca)


Changelog (Archive version):

1.0 - 2023-02-03
* first import routine of the study
* compared to the information provided by the source, the names of the dataset variables have been modified according to the UniData archiving standard



Changelog (Source release):

1.0 - 2022-12-01
* first release



UniData provides two kinds of data access:

  • open data
    (all users can use the data - an expense allowance could be required)
  • restricted data
    (the data use is limited - see below)

Data Kind:

Time Dimensions:

other documentation

The study concerns the design and use of Artificial Intelligence algorithms for the analysis of labour market information. The result is a new mapping of professional skills that links the skills measured by the OECD survey on adult skills, conducted as part of PIAAC (Programme for the International Assessment of Adult Competencies), to the ESCO classification of professional skills in Europe.


In particular, PIAAC2ESCO provides a characterisation of the PIAAC background questionnaire on the basis of the ESCO Skills Pillar. In practice, it associates a list of ESCO skills (v1.0.8) with each question of the PIAAC background questionnaire (version 2010), based on their similarity. The linkage is produced by an AI framework that combines several steps: computing embeddings, selecting the best embedding, taxonomy alignment and expert validation.


Sections F to I of the PIAAC background questionnaire are used. The questions relevant to the analysis (73 out of 84) are selected from these sections, and for each of them the best matches among the skills in the ESCO Skills Pillar (13,600 items) are extracted. The validated dataset covers 21 PIAAC questions and the ESCO skills mapped to them, which are enriched with alternative labels.
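The similarity-based matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the source does not state which similarity measure or embedding model is used, so cosine similarity is assumed here, and the vectors, the skill labels and the function names (`cosine`, `best_matches`) are invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_matches(question_vec, skill_vecs, top_k=2):
    """Return the top_k (label, score) pairs, ranked by similarity to the question."""
    scored = [(label, cosine(question_vec, vec)) for label, vec in skill_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors: assumptions, not real PIAAC or ESCO embeddings.
question = [0.9, 0.1, 0.0]  # hypothetical embedding of one questionnaire item
skills = {
    "skill A": [1.0, 0.0, 0.0],
    "skill B": [0.0, 1.0, 0.0],
    "skill C": [0.7, 0.7, 0.0],
}

matches = best_matches(question, skills, top_k=2)
for label, score in matches:
    print(f"{label}: {score:.3f}")
```

In a setting like the one described, each candidate list produced this way would still go through taxonomy alignment and expert validation before entering the final dataset.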

Data are available for free (registration required).


Topic Classification:

  • LABOUR AND EMPLOYMENT - employment
  • LABOUR AND EMPLOYMENT - working conditions
  • SCIENCE AND TECHNOLOGY - information technology
    Geographical Unit: not applicable

    Analysis Unit: other

    Universe: No reference universe

    Sample Procedure: The data are not sample-based

    Weight: No weight used

    Collection Mode: other

    Collection Size: UniData supplies: 1 dataset in SPSS format, 1 dataset in CSV format, 1 set of methodological notes in PDF format (eng), 1 codebook in PDF format (eng) (4 files)


    Methodological Notes (pdf):
    Codebook (pdf):
    DDI Documentation:


    1. Guo, Y., Langer, C., Mercorio, F., Trentini, F. (2022). Skills Mismatch, Automation, and Training: Evidence from 17 European Countries Using Survey Data and Online Job Ads. EconPol Forum 23 (5), 11-15. CESifo, Munich.

    Data Use Restriction:
    Data are released under the Creative Commons Attribution 4.0 Licence.
    Source Contact: Francesco Trentini - Università degli Studi di Milano-Bicocca

    Mercorio, Fabio. (2021-2022) PIAAC2ESCO – An AI-driven classification of the PIAAC Background questionnaire onto the ESCO Skills Pillar. Mezzanzanica, Mario [Producer]. Pallucchini, Filippo [Producer]. Trentini, Francesco [Producer]. Guo, Yuchen [Producer]. Langer, Christina [Producer]. UniData - Bicocca Data Archive, Milan. Study Number SI399. Data file version 1.0 doi:10.20366/unimib/unidata/SI399-1.0

    Deposit Requirement:
    The user is obliged to cite all data and documents disseminated by UniData and used in their own publications, using the citation shown above. The user is also obliged to send UniData the bibliographic references of the publications in which the requested data and documents are used.

    Neither the depositor nor UniData bear any responsibility for the analysis or interpretation of the data produced by the user.