Trend-Detection

Detecting Trends in Job Advertisements.

Authors

Khalil Mrini, Kshitij Sharma, Pierre Dillenbourg

Abstract

We present an automatic method for trend detection in job ads. From a job-posting website, we collect job ads from 16 countries, in 8 languages and 6 job domains. We pre-process them by removing stop words, lemmatising, and performing cross-domain filtering. We then enrich the vocabulary by forming n-grams and restrict it by filtering on named-entity and part-of-speech tags. We split the job ads into two time periods for comparison: the first half of 2016 and the first half of 2017. A trending word is defined as a word with a higher TF-IDF weight in 2017 than in 2016. The results show a close correlation between the position of a word in its text and its trendiness, regardless of country, language or job domain.
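The trending-word definition above can be illustrated with a minimal sketch. It assumes each job ad is already a list of pre-processed tokens, and it aggregates a word's weight as its mean TF-IDF over all ads of a period with a smoothed IDF. Both choices, as well as the names `mean_tfidf` and `trending_words`, are assumptions made for illustration and are not necessarily what `TrendDetectionPipeline.py` does.

```python
import math
from collections import Counter


def mean_tfidf(docs):
    """Return {word: mean TF-IDF weight} over a list of tokenised ads."""
    docs = [doc for doc in docs if doc]  # ignore empty ads
    n_docs = len(docs)
    # Document frequency: number of ads that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    totals = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1
            totals[word] += tf * idf
    return {word: total / n_docs for word, total in totals.items()}


def trending_words(docs_before, docs_after):
    """Return (word, delta) pairs with a higher weight in the later period."""
    before = mean_tfidf(docs_before)
    after = mean_tfidf(docs_after)
    vocabulary = set(before) | set(after)
    deltas = {w: after.get(w, 0.0) - before.get(w, 0.0) for w in vocabulary}
    # Keep only positive deltas, largest first.
    return sorted(
        ((w, d) for w, d in deltas.items() if d > 0),
        key=lambda pair: -pair[1],
    )


# Toy data, invented for illustration only.
ads_2016 = [["python", "developer"], ["java", "developer"]]
ads_2017 = [["python", "developer"], ["python", "engineer"]]
trends = trending_words(ads_2016, ads_2017)
```

On this toy input, `engineer` and `python` come out as trending, because their mean weight is higher in the second period, while `developer` and `java` do not.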

Coding Format

Language: Python 3.

Packages Used: nltk, numpy, matplotlib, scipy, polyglot, pandas, pytrends, bs4, requests, urllib, pattern3, pymystem3.

Python Files Description

The files are described hereafter in the order they should be used:

AdzunaJobAdRetriever.py: Generates JSON files of Adzuna job ads, one per page, in the Raw Data folder
AdzunaJobDescriptionFetcher.py: Fetches the full descriptions, when available, from the original website and writes them to the Raw Text folder
TrendDetectionPipeline.py: Performs all of the trend detection, with the help of the following files:
- TreeTagger.py: Implements a tree tagger class for pre-processing
- SequenceMining.py: Implements the Generalised Sequential Pattern (GSP) Algorithm
TimeSeries.py: Gives counts of the number of job ads collected over time
TrendPositions.py: Computes Trend Positions in the pre-processed text
GoogleTrends.py: Computes the energy of a trending word in Google Trends for comparison
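As a rough illustration of what a word's position in its text can mean, the sketch below computes the mean relative position of each word over a set of tokenised ads, where 0.0 is the first token and 1.0 the last. The function name `mean_relative_positions` and this normalisation are assumptions for illustration; the definition used in `TrendPositions.py` may differ.

```python
from collections import defaultdict


def mean_relative_positions(docs):
    """Return {word: mean relative position in [0, 1]} over tokenised ads."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        # Avoid division by zero for single-token ads.
        last_index = max(len(doc) - 1, 1)
        for index, word in enumerate(doc):
            sums[word] += index / last_index
            counts[word] += 1
    return {word: sums[word] / counts[word] for word in sums}


# Toy data, invented for illustration only.
ads = [["python", "developer", "wanted"], ["python", "engineer"]]
positions = mean_relative_positions(ads)
```

Comparing these positions for trending and non-trending words is one simple way to look for the position effect described in the abstract.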

chili-epfl/Trend-Detection
