Quick Guide: 3 Resume Projects For Data Science

You will not only learn Machine Learning by implementing these projects, but also beef your resume up. Project number one will show your understanding of NLP techniques. It’s gonna be a fake job posting prediction model. Unfortunately, there are plenty of fake job postings on the internet.
Some companies post them just to gauge general interest in particular positions, and in the worst case, some are outright scams that get people to do work for free by claiming that certain tasks must be completed to reach the next step of the recruiting process.
Ughhh… It sucks a lot, but there are people on the margins of our society who don’t care about others at all. Let’s make their lives harder and build a model that distinguishes such postings from valid ones. We will use a dataset from the University of the Aegean that consists of about 18,000 job descriptions, of which about 800 are fake.
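A quick bit of arithmetic shows why those numbers matter. This is only a sketch using the rough counts quoted above (the exact dataset figures may differ): a model that labels every posting as real already looks accurate while catching no fakes at all.

```python
# Approximate counts quoted in the text; the exact dataset numbers may differ.
total_postings = 18_000
fake_postings = 800
real_postings = total_postings - fake_postings

# A "model" that labels every single posting as real.
accuracy = real_postings / total_postings
fake_recall = 0 / fake_postings  # it never catches a single fake

print(f"accuracy: {accuracy:.1%}")           # accuracy: 95.6%
print(f"recall on fakes: {fake_recall:.0%}")  # recall on fakes: 0%
```

That is why metrics such as recall or F1 on the fake class are usually more telling here than plain accuracy.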
It’s a typical classification problem for Natural Language Processing and the possibilities are endless here. I’d personally use one of the gradient boosting libraries like CatBoost or XGBoost because that’s my favorite pick when it comes to classification.
For the purpose of this video, though, I recommend using a different set of algorithms. Knowledge of the NLTK library will strengthen the NLP side of your resume, and implementing Support Vector Machines will demonstrate your understanding of text classification problems. This dataset is very valuable because real and fake job postings are imbalanced, which means you can show a prospective employer how you handle such a situation.
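Here is a minimal sketch of what such a text classifier could look like, assuming scikit-learn is used for the SVM part. The toy postings and labels are made up for illustration and are not taken from the real dataset. NLTK tokenization or stemming could be plugged in through the vectorizer’s `tokenizer` argument; it is left out here to keep the sketch free of corpus downloads. Setting `class_weight="balanced"` is one simple way to react to the imbalance, not necessarily the best one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the job postings; 1 = fake, 0 = real (all made up).
texts = [
    "Software engineer, full-time, health insurance, on-site interviews",
    "Accountant needed, salary range listed, apply via company website",
    "Marketing manager, five years of experience, office in Berlin",
    "Data analyst, SQL and Python required, permanent contract",
    "Nurse, night shifts, hospital position, license required",
    "Warehouse operator, forklift certificate, paid overtime",
    "Earn money fast from home, no experience, send bank details now",
    "Complete unpaid test task first, wire a registration fee to start",
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

# TF-IDF turns each posting into a sparse vector of word and bigram weights;
# the linear SVM then learns a separating hyperplane in that space.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    # "balanced" reweights classes inversely to their frequency,
    # so the rare fake class is not drowned out by the real one.
    LinearSVC(class_weight="balanced"),
)
model.fit(texts, labels)

new_posting = ["Send bank details and a fee to earn money from home"]
predictions = model.predict(new_posting)
print(predictions)
```

On the real data you would hold out a test set and report precision and recall for the fake class.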
It’s a very common scenario in the real world. I’ve attached an article about 8 tactics for dealing with imbalanced datasets in the description. The second project is about computer vision. It will help you grow and it’s genuinely useful, too. I propose using a dataset of sign language letters.
It will allow you to tackle a multiclass classification problem, as there are 24 possible classes. Thanks to the carefully prepared images, you’ll be able to play with it directly on your laptop. No strong GPUs needed. This dataset also isn’t too popular, which makes it perfect for your resume. Computer vision problems are typically solved with deep learning, and for me there isn’t a better tool for it than Keras.
Keras is designed to make deep learning a walk in the park. Another wise choice would be to use Convolutional Neural Networks. If you’re not familiar with this term, I’m including a fantastic, short video about what CNNs are in the description, along with other useful links. Once you’ve implemented a basic model that is able to recognize particular classes, it’s the perfect time to add an API to it.
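A basic CNN of the kind mentioned above could be sketched in Keras roughly like this. Several things are assumptions on my side: the images are taken to be 28x28 grayscale, the labels are taken to be integers remapped to 0 through 23, and the layer sizes are arbitrary starting points rather than tuned values. The function name `build_sign_cnn` is made up for this example.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_sign_cnn(num_classes=24, input_shape=(28, 28, 1)):
    """Small CNN for letter images; shapes and sizes are assumptions."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Two convolution + pooling stages extract local image features.
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        # Dropout is a cheap guard against overfitting on a small dataset.
        layers.Dropout(0.3),
        # One softmax output per letter class.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels
        metrics=["accuracy"],
    )
    return model


model = build_sign_cnn()
model.summary()
```

Training would then be a call like `model.fit(x_train, y_train, epochs=..., validation_split=...)` on pixel values scaled to the 0 to 1 range.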
Ask a friend who’s a mobile developer to implement a simple app with your model in the backend. Just imagine a recruiter asking you about the projects you made and you handing them a phone with your app recognizing sign language. Tangible proof of your project is priceless. The position is yours in most cases.
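To give an idea of what “model in the backend” can mean, here is a minimal sketch of a prediction endpoint built only on the Python standard library. Everything named here is hypothetical: the `/predict` path, the JSON shape, the `LETTERS` label set, and especially `predict_letter`, which is a placeholder where a call to the trained model would go. A real project would more likely use a web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed label set with 24 entries; adjust to whatever the dataset uses.
LETTERS = "ABCDEFGHIKLMNOPQRSTUVWXY"


def predict_letter(pixels):
    """Hypothetical stand-in for the trained model."""
    # Placeholder logic only: a real service would call model.predict here.
    return LETTERS[int(sum(pixels)) % len(LETTERS)]


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            letter = predict_letter(payload["pixels"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Bad JSON payload")
            return
        body = json.dumps({"letter": letter}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8000):
    """Create (but do not start) the HTTP server on localhost."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)

# To serve requests: make_server().serve_forever()
```

The mobile app would then POST the image pixels as JSON to `/predict` and show the returned letter.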
So we’ve got Natural Language Processing and Computer Vision projects done. Now, it’s time to demonstrate your skills in tabular data problems. Previous models solved classification problems, so this time I’m proposing something for regression. There’s only one king for regression problems and it’s the gradient boosting algorithm. I propose solving this problem with the XGBoost library, as it’s one of the most recognizable libraries in data science.
As it happens, I’ve prepared a video where I explain a step-by-step approach to tackling a regression problem with XGBoost. You should be able to see this video recommendation in the upper right corner, somewhere here. I used a dataset of FIFA players to predict the ranking of the players, based on several characteristics.
It demonstrates how you can do feature engineering and solve a problem with very high accuracy. Feel free to use my code as a guide to your boosted solution. In the end, it’s very important to not only implement interesting projects, but also to present them in a polished way. GitHub is a perfect place to store your projects. Just remember to include descriptions and focus on the README.md file, as it’s your showcase to the employer. I’ll keep my fingers crossed!