Abstract
This article presents a practical case study of building a GenAI-powered system for matching CVs to job descriptions and ranking candidates fairly, based on insights from the Kaggle/Google GenAI Intensive Course. We detail the end-to-end pipeline, discuss prompt engineering, structured outputs, fairness evaluation, and lessons learned, with a focus on real-world HR applications at Parser.
Sample Case: Matching CVs to job descriptions and ranking candidates fairly
Over the past few years, we’ve seen Generative AI move from hype to reality. But reality brings complexity: building real-world GenAI systems isn’t just about creative prompts — it’s about reliability, structure, fairness, and real-world impact.
Earlier this month, I completed the 5-Day Generative AI Intensive Course, a collaboration between Kaggle and Google. This experience not only gave me the chance to work hands-on with Kaggle Notebooks and the Gemini model APIs on a real-world-like project, but also helped me truly grasp many of the abstract concepts I had been studying — like agents and MLOps — by putting them into practical use. While the course focused on Gemini and Kaggle for rapid prototyping and seamless Google Cloud integration, I also considered Anthropic’s Claude (noted for its safety features), OpenAI’s GPT-4 (industry-leading performance and Azure integration), and AWS Bedrock (enterprise scalability and multi-model support). I ultimately chose Gemini for its native integration with Kaggle Notebooks and Google Cloud, which accelerated our experimentation and deployment.
The experience was far from just academic — it demanded hours of reading research papers, listening to podcasts, and experimenting with predefined notebooks. Over five intensive days, we covered a lot: starting with building prompts, handling retries and errors, producing structured outputs (like enums and JSON), diving into embeddings and semantic search, and learning to leverage key libraries like Keras, scikit-learn, and Pandas, and frameworks like LangGraph. We explored model evaluation, agent orchestration, fine-tuning strategies for real-world applications, and capped it all off with an eye-opening session on MLOps — guided by the excellent Agent Starter Pack.
By the end of the program, I was able to bring everything together in a practical and meaningful way — applying GenAI principles to solve a real challenge we faced at Parser: matching CVs to job descriptions and ranking candidates fairly. (You can explore the full working prototype notebook here: CVs to JD Matcher & Ranker – Kaggle Notebook.)
What follows is a breakdown of what I learned, how I applied it, and how these ideas can help us all build smarter, more responsible GenAI systems.
Defining the Problem: CVs and JDs Don’t Speak the Same Language
The challenge I tackled was clear: How can we intelligently match and rank candidate CVs to job descriptions, when both are written in wildly different formats, styles, and levels of detail?
Traditional keyword matching breaks easily. We needed something smarter — a system that could understand context, evaluate skills holistically, and output structured, auditable results.
Adding structure to the GenAI outputs made the system a strong fit for the task. Some variability remained — the outputs were not fully deterministic — but they were consistent and explainable.
Following are some of the most relevant learnings and how I applied them to the CVs-to-JDs matching & ranking problem:
1. Structured Outputs: Making AI Results Machine-Readable
One of my biggest realizations early on was: If you want GenAI to feed into real systems, you must structure its output.
Instead of asking the model for a “summary,” I prompted it to extract CVs into consistent JSON objects (see response_mime_type="application/json" in the next code snippet):
import json
from google import genai
from google.genai import types

def standardize_cv(model, cv_text):
    # 'model' is a google-genai client instance, e.g. genai.Client(api_key=...)
    prompt = f"""Extract the following fields from the CV and return only a JSON object:
    - Personal Information
    - Education
    - Certifications
    - Work Experience
    - Skills
    - Languages

    CV Text:
    {cv_text}
    """
    response = model.models.generate_content(
        model='gemini-2.0-flash',
        config=types.GenerateContentConfig(
            temperature=0.0,
            response_mime_type="application/json"),
        contents=prompt
    )
    return json.loads(response.text)
Example output from the standardize_cv function:
{
    "Personal Information": {
        "Name": "Jane Doe",
        "Email": "jane@example.com"
    },
    "Education": [
        {"Degree": "BSc Computer Science", "Institution": "Parser University"}
    ],
    "Certifications": ["AWS Certified Solutions Architect"],
    "Work Experience": [
        {"Role": "Software Engineer", "Company": "TechCorp"}
    ],
    "Skills": ["Python", "Cloud", "ML"],
    "Languages": ["English", "Spanish"]
}
With structured CVs, analysis became possible — and reliable.
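As a small illustration of what that unlocks, here is a hypothetical helper (the function and variable names are mine, following the JSON schema above) that filters standardized CVs by a required skill:
def filter_by_skill(cv_data_list, required_skill):
    # Hypothetical helper: keep only candidates whose standardized CV
    # lists the required skill (field names follow the schema above).
    return [
        cv for cv in cv_data_list
        if required_skill.lower() in (s.lower() for s in cv.get("Skills", []))
    ]

python_candidates = filter_by_skill(cv_data_list, "Python")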
2. Few-Shot Learning: Teaching the Model by Example
Another powerful insight: Don’t just tell the model what you want — show it.
To generate detailed job descriptions from a short brief, I fed the model three real examples of full JDs (the approach extends to many more) and asked it to mimic their structure and depth. Those JDs were passed through the few_shots_example_jds parameter:
def create_job_description_with_few_shot(model, short_description, few_shots_example_jds):
    prompt = f"""
    You are an expert HR professional tasked with creating a detailed job description based on a short initial description.
    Learn from the following example job descriptions to understand the typical format, sections, and level of detail expected.
    Use these examples as a guide to create a comprehensive job description.

    Short Job Description: {short_description}
    """
    # Append each example JD so the model can mimic structure and depth.
    prompt += "\n\nExample Job Descriptions:\n"
    for i, jd in enumerate(few_shots_example_jds):
        prompt += f"Example {i+1}:\n{jd}\n"
    prompt += "\n\nBased on these examples and the short description, create a comprehensive job description."
    response = model.models.generate_content(
        model='gemini-2.0-flash',
        config=types.GenerateContentConfig(temperature=0.0),
        contents=prompt
    )
    return response.text
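A minimal usage sketch (the client setup and example file names here are placeholders, not from the original notebook):
client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
jd_examples = [open(path).read() for path in
               ["jd_example_1.txt", "jd_example_2.txt", "jd_example_3.txt"]]
jd_text = create_job_description_with_few_shot(
    client, "Senior Python engineer with cloud experience", jd_examples)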
The result was a much richer, human-quality job description that matched company tone and role expectations — without hallucinations or generic templates.
3. Multi-Step Workflows: From Prompt to Pipeline
Building a GenAI system isn’t about magic — it’s about creating a well-orchestrated sequence of steps.
Here’s a quick pseudocode view of the main pipeline:
# Step 1: Create detailed JD
jd_text = create_job_description_with_few_shot(model, short_jd_input, jd_examples)
# Step 2: Standardize CVs
cv_data_list = [standardize_cv(model, cv) for cv in raw_cvs]
# Step 3: Analyze each CV against JD
analyses = [analyze_cv_vs_jd(model, jd_text, cv_data, evaluation_criteria) for cv_data in cv_data_list]
# Step 4: Rank candidates based on total weighted score
ranked_cvs = rank_cvs(analyses)
# Step 5: Evaluate fairness and bias
fairness_report = evaluate_fairness_and_bias(model, jd_text, analyses)
# Step 6: Generate interview questions for top candidate
top_candidate_cv = select_top_cv(ranked_cvs)
interview_questions = generate_interview_questions(model, jd_text, top_candidate_cv)
As part of the evaluation, we scored the CVs using weighted categories:
evaluation_criteria = {
    "formal_education": {"weight": 1},
    "certifications": {"weight": 2},
    "programming_languages": {"weight": 3},
    "cloud_experience": {"weight": 2},
    "similar_experience": {"weight": 3},
    "job_stability": {"weight": 1},
    "english_proficiency": {"weight": 3}
}
These categories balance technical skills (programming, cloud), formal qualifications (education, certifications), and soft indicators (job stability, language proficiency). The higher weights prioritize technical fit and communication skills — both critical for engineering roles.
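The notebook’s analyze_cv_vs_jd and rank_cvs functions are not reproduced in full here; the sketch below shows one plausible shape, assuming the model is asked to return a 0–10 score per criterion as JSON (the prompt wording and score scale are my assumptions):
def analyze_cv_vs_jd(model, jd_text, cv_data, evaluation_criteria):
    # Ask for one 0-10 score per criterion, as JSON, so scoring stays auditable.
    prompt = f"""Score this candidate against the job description.
    Return a JSON object mapping each of these criteria to an integer
    score from 0 to 10: {list(evaluation_criteria)}

    Job Description:
    {jd_text}

    Candidate CV (JSON):
    {json.dumps(cv_data, indent=4)}
    """
    response = model.models.generate_content(
        model='gemini-2.0-flash',
        config=types.GenerateContentConfig(
            temperature=0.0,
            response_mime_type="application/json"),
        contents=prompt
    )
    scores = json.loads(response.text)
    total = sum(scores[c] * evaluation_criteria[c]["weight"]
                for c in evaluation_criteria)
    return {"cv": cv_data, "scores": scores, "total_weighted_score": total}

def rank_cvs(analyses):
    # Highest weighted total first.
    return sorted(analyses, key=lambda a: a["total_weighted_score"], reverse=True)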
The following high-level diagram represents a simplified view of the overall process presented earlier.
Figure 1. The diagram illustrates the flow from raw CVs and job descriptions through the GenAI-powered standardization, analysis, ranking, fairness evaluation, and interview question generation modules.
4. Interview Question Generation: Targeting Candidate Strengths
Once top candidates were identified, the system generated personalized interview questions. Example output:
Question: Can you walk us through a Python-based project where you integrated cloud services like AWS or Azure?
Expected Answer: The candidate should explain the project context, mention specific services (e.g., S3, EC2, Lambda), describe their role, and discuss challenges and learnings.
This allowed interviewers to validate real experience and dive deeper into the areas that matter.
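The generation step itself is a single targeted prompt. A minimal sketch, assuming the same client and inputs as above (the notebook’s exact prompt may differ):
def generate_interview_questions(model, jd_text, top_candidate_cv):
    # Questions target the candidate's strongest claimed skills, each paired
    # with guidance on what a solid answer should cover.
    prompt = f"""Generate 5 interview questions tailored to this candidate's
    strongest claimed skills for the role below. For each question, add an
    "Expected Answer" describing what a strong response should cover.

    Job Description:
    {jd_text}

    Candidate CV (JSON):
    {json.dumps(top_candidate_cv, indent=4)}
    """
    response = model.models.generate_content(
        model='gemini-2.0-flash',
        config=types.GenerateContentConfig(temperature=0.0),
        contents=prompt
    )
    return response.text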
5. Responsible AI: Evaluating Fairness and Bias
One critical realization: even a technically strong system needs fairness checks.
I built a step where the model critiques its own scoring:
def evaluate_fairness_and_bias(model, jd, cv_analyses):
    # Feed the model its own analyses and ask it to critique them,
    # constrained to the provided information only.
    prompt = f"""
    Identify any fairness or bias risks in the following CV analyses. Only use the information provided; no external assumptions.

    Job Description:
    {jd}

    CV Analyses:
    {json.dumps(cv_analyses, indent=4)}
    """
    response = model.models.generate_content(
        model='gemini-2.0-flash',
        config=types.GenerateContentConfig(temperature=0.0),
        contents=prompt
    )
    return response.text
The model flagged risks like overemphasis on certifications and native language bias — helping surface issues recruiters should watch for.
What Worked Well — and Where the Technology Still Struggles
Worked well:
- Clear, explicit prompting
- Few-shot learning with examples
- Structured outputs via JSON mode
- Multi-step pipelines for reliability
Challenges:
- Stochastic outputs: Some variability even at temperature=0 (averaging results helped).
- Parsing errors: Occasional JSON format drift required retries (see the retry sketch after this list).
- Scalability: For large-scale use, a vector search + RAG architecture would be essential.
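A simple retry wrapper covered most of that drift. A minimal sketch, assuming the failure surfaces as json.JSONDecodeError (the notebook’s actual error handling may differ):
import time

def generate_json_with_retries(model, prompt, max_retries=3):
    # Re-issue the request when the response is not valid JSON;
    # drift is rare at temperature=0 but does happen.
    for attempt in range(max_retries):
        response = model.models.generate_content(
            model='gemini-2.0-flash',
            config=types.GenerateContentConfig(
                temperature=0.0,
                response_mime_type="application/json"),
            contents=prompt
        )
        try:
            return json.loads(response.text)
        except json.JSONDecodeError:
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise ValueError(f"No valid JSON after {max_retries} attempts")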
The Art of the Possible
Imagine scaling this further:
Architectural:
- Transforming each major functionality to an individual agent orchestrated through LangGraph.
- Creating a standalone app that stores the CVs extracted JSON to a searchable DB.
- Complementing the current scoring computation with similarity search between the CV and JD embeddings (a minimal sketch follows the lists below).
Functional:
- Dynamic JD-to-CV matching on career portals.
- AI-generated personalized feedback to applicants.
- Real-time bias alerts during hiring.
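For that last architectural idea, here is a minimal embedding-similarity sketch, assuming the google-genai embedding endpoint and the text-embedding-004 model (both assumptions, not part of the original notebook):
import numpy as np

def embed(client, text):
    # Embed text with a Gemini embedding model (model name is an assumption).
    result = client.models.embed_content(model="text-embedding-004", contents=text)
    return np.array(result.embeddings[0].values)

def similarity_scores(client, jd_text, cv_texts):
    # Cosine similarity between the JD embedding and each CV embedding.
    jd_vec = embed(client, jd_text)
    scores = []
    for cv in cv_texts:
        cv_vec = embed(client, cv)
        scores.append(float(jd_vec @ cv_vec /
                            (np.linalg.norm(jd_vec) * np.linalg.norm(cv_vec))))
    return scores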
We have the building blocks. What’s needed now is reliable, auditable, ethical GenAI engineering.
Final Thoughts
This capstone project taught me that Generative AI isn’t just about prompts — it’s about full system design.
Precision matters. Structure matters. Fairness matters.
And most importantly: GenAI systems still need humans in the loop for sensitive cases — questioning, curating, improving.
I’m excited to bring these lessons into future projects — and to keep pushing for smarter, more responsible AI systems.
References
- Google. Whitepaper: “Foundational Large Language Models & Text Generation.” Internet Archive.
- Google Cloud. “Gemini API Documentation.” https://cloud.google.com/ai/gemini/docs
- LangChain. “LangGraph: Orchestrating LLM Agents.” https://www.langchain.com/langgraph
- Google Cloud. “Google’s reference Agent Starter Pack.” https://goo.gle/agent-starter-pack
- Full Kaggle Notebook prototype for the CVs to JD Matcher & Ranker: https://www.kaggle.com/code/martinmiceliparser/cvs-to-jd-matching-ranker
- Parser Digital website. https://parserdigital.com