

A resume parser (CV parser) is used within human resource software and on recruitment websites, job boards, and candidate application portals to simplify and accelerate the application process. It does so by extracting and classifying thousands of attributes about the candidate. 


A resume parser also provides the foundation for semantic search over candidate data. The parser identifies the different kinds of information within a resume or CV and tags each data point (for example, Work Experience, Educational Background, Skills, and Personal Details of the candidate).


Resume Parsing on Skillate has two essential steps:

1. Text (Content) Extraction

2. Information Extraction


Resume parser workflow


Text (Content) Extraction


In the Content Extraction step, the resume content is extracted from the uploaded file, which can be a raw document in any of several formats (.pdf, .doc, .docx, etc.).
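As a rough illustration of this step, the sketch below pulls plain text out of a .docx file using only the Python standard library and picks the extractor by file extension. It is not Skillate's implementation: the names `extract_docx_text`, `extract_text`, and `EXTRACTORS` are hypothetical, and .pdf and .doc files would need their own dedicated extractors, which are not shown here.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source) -> str:
    """Return the paragraph text of a .docx file (path or file-like object)."""
    # A .docx file is a ZIP archive; the body lives in word/document.xml.
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # The visible text of a paragraph is split across <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)


# Hypothetical registry: file extension -> extractor function.
# Only .docx is implemented in this sketch.
EXTRACTORS = {".docx": extract_docx_text}


def extract_text(path: str) -> str:
    """Choose an extractor by file extension; fail loudly if none is registered."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor registered for {suffix!r}")
    return EXTRACTORS[suffix](path)
```

Keeping the extractors in a registry means a new format can be supported by adding one entry, without touching the code that calls `extract_text`.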


Information Extraction

 

The next step in resume parsing is extracting structured information from unstructured or semi-structured machine-readable documents.


A typical resume is a collection of information that includes the Work Experience, Educational Background, Skills, and Personal Details of a candidate. This information can be laid out on a resume in various ways: tables, multiple lines, sections, etc.


Deep learning algorithms in NLP (Natural Language Processing) help extract information from the content of a resume. Skillate has trained a custom deep learning NER (Named Entity Recognition) model, based on Google’s BERT language model, on over 100,000 resumes.


Understanding NER


Named Entity Recognition (NER) fetches the information from the extracted content. NER locates the named entities in unstructured text and classifies them into predefined categories such as person names, organizations, and locations. These categories are part of Skillate’s customized NER model.


Consider the two statements below:
‘2000–2008: Professor at IIT Kanpur’
‘B.Tech in Computer Science from IIT Kanpur’

Here, IIT Kanpur is treated as a Professional Organization (the employer) in the former statement and as an Educational Institute in the latter. The two meanings of IIT Kanpur can be told apart by observing the context.

1. The first statement contains Professor, which is a Job Title. So, IIT Kanpur is tagged as a Professional Organization.

2. The second statement mentions a degree and a major. So, IIT Kanpur is tagged as an Educational Institute.
 
The snippet below is from our NER model results. It shows how the model recognizes and differentiates the meanings of the phrase “IIT Kanpur” in different contexts. Each word has a corresponding label.
 

Labels used in the snippet:

  TIT: Designation

  COM: Professional Organization

  INS: Educational Institute

  DEG: Degree

  OTH: Other
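To make the per-word labelling concrete, here is a minimal sketch of how token-level labels can be merged into entities. The token/label pairs are written by hand for the two example statements and are not real output from Skillate's model; `group_entities` is a hypothetical helper. The label list above has no label for a major, so this sketch simply marks "Computer Science" as OTH.

```python
from itertools import groupby

# Hand-written (token, label) pairs for illustration only; NOT real model output.
work = [
    ("2000–2008:", "OTH"), ("Professor", "TIT"), ("at", "OTH"),
    ("IIT", "COM"), ("Kanpur", "COM"),
]
study = [
    ("B.Tech", "DEG"), ("in", "OTH"), ("Computer", "OTH"), ("Science", "OTH"),
    ("from", "OTH"), ("IIT", "INS"), ("Kanpur", "INS"),
]


def group_entities(tagged):
    """Merge consecutive tokens that share a non-OTH label into one entity."""
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != "OTH":
            entities.append((" ".join(token for token, _ in run), label))
    return entities


print(group_entities(work))   # [('Professor', 'TIT'), ('IIT Kanpur', 'COM')]
print(group_entities(study))  # [('B.Tech', 'DEG'), ('IIT Kanpur', 'INS')]
```

The same phrase, "IIT Kanpur", ends up as a COM entity in one statement and an INS entity in the other, purely because of the labels assigned to its tokens in context.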


Relying solely on the NER model will not yield high accuracy in all cases. Hence, Skillate has created post-processing algorithms using NLP for sanitizing the information extracted from the resumes.
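This article does not describe Skillate's post-processing algorithms, so the following is only an assumed example of what a sanitizing step could look like: it collapses whitespace, trims stray separators, and drops case-insensitive duplicates from a list of extracted values. The function name `sanitize_values` and the choice of separators are hypothetical.

```python
import re


def sanitize_values(values):
    """Clean raw extracted strings and drop case-insensitive duplicates."""
    cleaned, seen = [], set()
    for raw in values:
        # Collapse runs of whitespace, then trim stray separators at the edges.
        text = re.sub(r"\s+", " ", raw).strip(" ,;:|-\t")
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


print(sanitize_values(["  Python ", "python,", "Machine   Learning", "|", "C++"]))
# ['Python', 'Machine Learning', 'C++']
```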


Limitations of Resume Parser

Even with all the advancements and research in deep learning and other NLP technologies, no AI model achieves 100% accuracy. Improving the accuracy of the models is a continuous process, because it depends on the size of the training data and the time required for training.


Below are some of the cases where Skillate’s parsing accuracy is not at its best:

  1. Complex resumes with multiple vertical/horizontal sections and multiple tables.

  2. Inconsistent patterns, tabs, or whitespace.

  3. Resumes with images, diagrams, artwork, etc.

  4. Resumes created from screenshots, scanned copies, photographs, etc.

  5. Wrong information or format furnished by the candidate.


Skillate keeps testing and updating these algorithms to improve the overall parsing quality.



Skillate’s Structured Resume Format

The resume parser returns the parsed resume in a structured format.


The structured resume in Skillate captures the following fields:


  1. Name

    1. First name

    2. Middle name

    3. Last name

  2. Emails

  3. Phone Number

  4. Total Work Experience (Years)

  5. Education Detail(s)

    1. Institute

    2. Degree

    3. Major

    4. Start year & month (Skillate counts months from 0 to 11)

    5. End year & month

    6. Whether current institute or not

    7. Description / summary

    8. Grades

  6. Experience Detail(s)

    1. Company

    2. Job Title

    3. Years of Experience

    4. Start year & month

    5. End year & month

    6. Industry

    7. Whether the current company or not [BOOLEAN]

    8. Description / summary

    9. Location

  7. Skills

    1. Core skills

    2. All skills

    3. Functional & Behavioral skills

  8. Current Company

  9. Current Location

  10. Current Job Title

  11. Latest Institute

  12. Latest Degree

  13. Latest Major

  14. Highest Degree

  15. Highest Major

  16. Profile / Social links (e.g., LinkedIn / GitHub profile, website link, etc.) [LIST]

  17. Overall summary / description
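The field list above can be pictured as a nested record. The sketch below models a small subset of it with Python dataclasses. The class names, attribute names, and sample values are made up for illustration and are not Skillate's actual schema or API. The only detail taken from the list is that months run from 0 to 11; reading 0 as January is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Period:
    year: int
    month: int  # zero-based (0 to 11); assumed here to mean 0 = January

    def calendar_month(self) -> int:
        """Convert the zero-based month to the usual 1-12 numbering."""
        if not 0 <= self.month <= 11:
            raise ValueError(f"month must be in 0..11, got {self.month}")
        return self.month + 1


@dataclass
class Experience:
    company: str
    job_title: str
    start: Optional[Period] = None
    end: Optional[Period] = None
    is_current: bool = False


@dataclass
class Resume:
    first_name: str
    last_name: str
    emails: List[str] = field(default_factory=list)
    experiences: List[Experience] = field(default_factory=list)
    core_skills: List[str] = field(default_factory=list)
    social_links: List[str] = field(default_factory=list)


# Made-up sample record, not real parser output.
resume = Resume(
    first_name="Jane",
    last_name="Doe",
    emails=["jane@example.com"],
    experiences=[
        Experience("IIT Kanpur", "Professor", Period(2000, 0), Period(2008, 11))
    ],
    core_skills=["Teaching"],
)
print(json.dumps(asdict(resume), indent=2))
```

Anyone consuming such a record would need to add 1 to the month before displaying it, which is what `calendar_month` does in this sketch.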



Learn more about Skillate AI: Deduplication of Candidate Resumes | Matching Engine