TABLE OF CONTENTS
- Text (Content) Extraction
- Information Extraction
- Understanding NER
- Limitations of Resume Parser
- Skillate’s Structured Resume Format
A resume parser (CV parser) is used within human resource software and on recruitment websites, job boards, and candidate application portals to simplify and accelerate the application process. It does so by extracting and classifying thousands of attributes about the candidate.
A resume parser also provides the foundation for a semantic search of candidate data. The parser identifies different kinds of information within a resume or CV and tags each data point (for example, Work Experience, Educational Background, Skills, and Personal Details of the candidate).
Resume Parsing on Skillate has two essential steps:
1. Text (Content) Extraction
2. Information Extraction
Text (Content) Extraction
In the Content Extraction step, the resume content is extracted from the uploaded files, which can be raw documents in various formats (.pdf, .doc, .docx, etc.).
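As a rough illustration of content extraction for one of these formats, the sketch below pulls plain text out of a .docx file using only the Python standard library. It relies on the fact that a .docx file is a ZIP archive whose main body lives in word/document.xml. The function name extract_docx_text is hypothetical, and this is not a description of Skillate's own extractor. Formats such as .pdf or .doc need different tooling.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path_or_file):
    """Return the plain text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration; accepts a path or a file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph stores its text in one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Calling extract_docx_text("resume.docx") would return the document text with one paragraph per line. Headers, footers, and text boxes are stored elsewhere in the archive and are ignored by this sketch.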
Information Extraction
The next step in resume parsing is extracting structured information from unstructured or semi-structured machine-readable documents.
A typical resume is a collection of information that includes the Work Experience, Educational Background, Skills, and Personal Details of a candidate. This information can be presented on a resume in various layouts: tables, multiple lines, sections, etc.
Deep learning algorithms in NLP (Natural Language Processing) help extract information from the content of a resume. Skillate has trained a custom deep learning NER (Named Entity Recognition) model, based on Google’s BERT language model, on over 100,000 resumes.
Understanding NER
Named Entity Recognition (NER) helps fetch the information from the extracted content. NER locates and classifies the named entities in unstructured text into predefined categories such as person names, organizations, and locations. These categories are part of Skillate’s customized NER model (AI).
Consider the two statements below:
‘2000–2008: Professor at IIT Kanpur’
‘B.Tech in Computer Science from IIT Kanpur’
Here, IIT Kanpur is treated as a Professional Organization (an employer) in the former statement and as an Educational Institute in the latter. We can differentiate between the two meanings of IIT Kanpur by observing the context:
1. The first statement contains Professor, which is a Job Title. So, IIT Kanpur is tagged as a Professional Organization.
2. The second statement mentions a degree and a major. So, IIT Kanpur is tagged as an Educational Institute.
The snippet below is from our NER model results. It shows how the model can recognize and differentiate the meanings of the phrase “IIT Kanpur” in various contexts. Each word has a corresponding label:
TIT: Designation
COM: Professional Organization
INS: Educational Institute
DEG: Degree
OTH: Other
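To make the per-word labelling concrete, here is a minimal Python sketch that merges consecutive words sharing a label into entities. The token/label pairs are hand-written for illustration and are not actual output from Skillate's model. The function name group_entities is hypothetical, and tagging "Computer Science" as OTH is an assumption made only because the legend lists no label for a major.

```python
from itertools import groupby

# Label codes taken from the legend above.
LABEL_NAMES = {
    "TIT": "Designation",
    "COM": "Professional Organization",
    "INS": "Educational Institute",
    "DEG": "Degree",
    "OTH": "Other",
}


def group_entities(tagged_tokens):
    """Merge consecutive tokens with the same label into (phrase, label name) pairs."""
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label == "OTH":
            continue  # tokens labelled OTH are not part of any entity
        phrase = " ".join(token for token, _ in run)
        entities.append((phrase, LABEL_NAMES[label]))
    return entities


# Hand-labelled examples (assumed labels, not real model output).
employment = [("2000–2008:", "OTH"), ("Professor", "TIT"), ("at", "OTH"),
              ("IIT", "COM"), ("Kanpur", "COM")]
education = [("B.Tech", "DEG"), ("in", "OTH"), ("Computer", "OTH"),
             ("Science", "OTH"), ("from", "OTH"), ("IIT", "INS"), ("Kanpur", "INS")]

print(group_entities(employment))
print(group_entities(education))
```

With these assumed labels, "IIT Kanpur" comes out as a Professional Organization in the first statement and as an Educational Institute in the second, which is the distinction the context is meant to supply.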
Relying solely on the NER model will not yield high accuracy in all cases. Hence, Skillate has created post-processing algorithms using NLP for sanitizing the information extracted from the resumes.
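The source does not describe what Skillate's post-processing does, so the following Python sketch only illustrates the general idea of sanitizing extracted values with simple rules. The function names sanitize_emails and sanitize_phone, the email pattern, and the seven-digit minimum are all assumptions made for this example.

```python
import re

# Deliberately simple email pattern; an assumption for this sketch, not a full validator.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def sanitize_emails(candidates):
    """Trim, lowercase, validate, and de-duplicate extracted email strings."""
    cleaned = []
    for raw in candidates:
        # Remove surrounding whitespace and stray punctuation picked up during extraction.
        email = raw.strip().strip(".,;:<>()").lower()
        if EMAIL_RE.match(email) and email not in cleaned:
            cleaned.append(email)
    return cleaned


def sanitize_phone(raw):
    """Keep digits and a leading '+'; return None when fewer than 7 digits remain."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) < 7:
        return None
    prefix = "+" if raw.strip().startswith("+") else ""
    return prefix + digits
```

For example, sanitize_emails([" Jane.Doe@Example.com,", "not-an-email"]) keeps only the cleaned address, and sanitize_phone("+91 98765-43210") collapses the number to digits with its leading plus sign.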
Limitations of Resume Parser
Even with all the advancements and research in deep learning and other NLP technologies, achieving 100% accuracy in AI is impossible. Improving the accuracy of the models is a continuous process that depends on the size of the training data and the time available for training.
Below are some of the cases where Skillate’s parsing accuracy is not at its best:
- Complex resumes with multiple vertical/horizontal sections and multiple tables.
- Resumes with inconsistent patterns, tabs, or whitespace.
- Resumes with images, diagrams, artwork, etc.
- Resumes created from screenshots, scanned copies, photographs, etc.
- Wrong information or format furnished by the candidate.
Skillate keeps testing and updating these algorithms to improve the overall parsing quality.
Skillate’s Structured Resume Format
The resume parser returns each parsed resume in a structured format. The structured resume generated by the Skillate resume parser captures the following fields:
- Name
  - First name
  - Middle name
  - Last name
- Emails
- Phone Number
- Total Working Experience (Years)
- Education Detail(s)
  - Institute
  - Degree
  - Major
  - Start year & month (Skillate counts months from 0 to 11)
  - End year & month
  - Whether current institute or not
  - Description / summary
  - Grades
- Experience Detail(s)
  - Company
  - Job Title
  - Years of Experience
  - Start year & month
  - End year & month
  - Industry
  - Whether the current company or not [BOOLEAN]
  - Description / summary
  - Location
- Skills
  - Core skills
  - All skills
  - Functional & Behavioral skills
- Current Company
- Current Location
- Current Job Title
- Latest Institute
- Latest Degree
- Latest Major
- Highest Degree
- Highest Major
- Profile / Social links (e.g., LinkedIn / GitHub profile, website link, etc.) [LIST]
- Overall summary / description
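Because months are counted from 0 to 11, a consumer of the structured resume has to shift the month index before displaying it. The Python sketch below models one experience entry and shows that conversion, plus a duration calculation. The class and attribute names are hypothetical (the source lists the fields but not the actual key names), and the assumption that January is month 0 follows from the 0 to 11 numbering.

```python
from dataclasses import dataclass
from typing import Optional

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


@dataclass
class ExperienceDetail:
    # Hypothetical attribute names; the real keys are not given in the source.
    company: str
    job_title: str
    start_year: int
    start_month: int                 # 0-based: 0 = January, 11 = December
    end_year: Optional[int] = None
    end_month: Optional[int] = None  # 0-based, same convention
    is_current: bool = False


def month_label(year, month_index):
    """Format a year and a 0-based month index as, for example, 'Jan 2000'."""
    if not 0 <= month_index <= 11:
        raise ValueError("month index must be between 0 and 11")
    return f"{MONTHS[month_index]} {year}"


def duration_months(exp):
    """Whole months from the start month to the end month of a finished job."""
    if exp.end_year is None or exp.end_month is None:
        raise ValueError("end year and month are required")
    return (exp.end_year - exp.start_year) * 12 + (exp.end_month - exp.start_month)


exp = ExperienceDetail("IIT Kanpur", "Professor", 2000, 0, 2008, 11)
print(month_label(exp.start_year, exp.start_month), "to",
      month_label(exp.end_year, exp.end_month))
print(round(duration_months(exp) / 12, 1), "years")
```

Keeping the 0-based index in the record and converting only at display time avoids mixing the two conventions in the same data.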
Learn more about Skillate AI: Deduplication of Candidate Resumes | Matching Engine