After trying to train a model from scratch to predict salary from job descriptions, I decided to try doing it with an LLM instead.
I used this data: here, taking the title and description fields to predict normalized_salary. The descriptions were scrubbed to remove any mention of salary amounts.
Using GPT-4o, the average error was around 29K.
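The evaluation loop can be sketched roughly like this. The prompt wording, the `parse_salary` helper, and the metric choice (mean absolute error) are my assumptions for illustration, not the exact code behind the numbers above:

```python
# Hypothetical sketch of the evaluation setup: build a prompt from a job
# posting, parse the model's numeric reply, and score predictions with
# mean absolute error. The actual LLM call is omitted.

import re

def build_prompt(title, description):
    # Ask for a bare number so the reply is easy to parse.
    return (
        f"Job title: {title}\n"
        f"Description: {description}\n"
        "Reply with only your best estimate of the annual salary, "
        "as a plain number."
    )

def parse_salary(reply):
    # Pull the first number out of the model's free-text reply.
    match = re.search(r"\d[\d,]*", reply)
    return float(match.group().replace(",", "")) if match else 0.0

def mean_absolute_error(predictions, targets):
    # Average absolute gap between predicted and true salaries.
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)
```

For example, `parse_salary("$85,000")` gives `85000.0`, and the "average error" figures in this post correspond to `mean_absolute_error` over the held-out postings.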
With Llama-3.2-1B, a much smaller model (1B parameters, vs. approximately 200B for GPT-4o), the average error was around 74K!
After fine-tuning Llama-3.2-1B, I got the average error down to around 20K, significantly lower than even the much larger GPT-4o!
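For fine-tuning, each posting becomes a supervised prompt/completion pair. This is a minimal sketch of how such training examples might be serialized to JSONL; the field names and prompt template are assumptions, not the post's exact pipeline:

```python
# Hypothetical sketch: turn one scrubbed job posting into a JSONL
# training record for supervised fine-tuning, where the target
# completion is the true normalized salary.

import json

def to_training_example(title, description, normalized_salary):
    return {
        # Same shape of prompt the model would see at inference time.
        "prompt": f"Job title: {title}\nDescription: {description}\nSalary:",
        # Leading space so the completion tokenizes cleanly after the colon.
        "completion": f" {normalized_salary}",
    }

# One line of the resulting JSONL training file.
line = json.dumps(to_training_example("Data Engineer", "Build pipelines", 120000))
```

Writing one such line per posting yields a dataset the 1B model can be fine-tuned on with any standard causal-LM training script.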