5 CORE Data Science Skills You Should Master


Welcome everyone, let’s get straight to the topic. If you take your Data Science career seriously, you should be proficient in all 5 of these skills, without missing a single one. Without the first skill, you wouldn’t even be able to tackle the other Data Science processes. This one is crucial because everything begins with data.

I call this skill: data manipulation.

First of all, most of the time you will have to create your own dataset. Data in the real world doesn’t look like a Kaggle dataset. It’s not structured, and many rows are missing information.

It often contains mistyped values. Generally, it doesn’t look the way you want it to. At this stage, the Pandas and NumPy libraries come to the rescue. They make data manipulation so much easier. If you’re not sure where to start learning Data Science, you’ve got the answer: Pandas and NumPy.
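To make this concrete, here is a minimal sketch of that kind of cleanup with Pandas and NumPy. The records, the column names, the 0 to 120 age range, and the `clean` helper are all made up for illustration; real data will need its own rules.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records with the problems described above:
# missing values, inconsistent spelling, and an obviously mistyped number.
raw = pd.DataFrame(
    {
        "city": ["Berlin", "berlin ", "Paris", None, "Paris"],
        "age": [34, np.nan, 29, 41, 290],  # 290 is a typing error
        "income": ["52000", "48000", "n/a", "61000", "58000"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Normalize text: strip whitespace and unify capitalization.
    out["city"] = out["city"].str.strip().str.title()
    # Convert strings to numbers; unparseable entries become NaN.
    out["income"] = pd.to_numeric(out["income"], errors="coerce")
    # Treat implausible ages as missing instead of trusting them.
    out.loc[~out["age"].between(0, 120), "age"] = np.nan
    # Drop rows without a city, then fill numeric gaps with the column median.
    out = out.dropna(subset=["city"]).reset_index(drop=True)
    for col in ["age", "income"]:
        out[col] = out[col].fillna(out[col].median())
    return out


cleaned = clean(raw)
print(cleaned)
```

Filling gaps with the median is only one option. Depending on the problem, dropping the rows or keeping the values as missing may be the better choice.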

Once your data is in the right format, you need to transform it so that the model you choose can digest it. Digging deeper is beyond the scope of this article, so let’s jump straight to the second skill.

The next skill that’s essential for your Data Science career is building the model itself.

You should be familiar with the different options and know how to implement them. In the beginning, it’s perfectly fine to just learn how to use the available libraries. You will be shocked by how easy it is to build a model nowadays. Once you can apply your skills in practice in the easiest possible scenario, it will be a good moment to dig a little deeper.

You could then start looking at other algorithms that are exceptionally useful in specific situations. Whether it’s better to focus on just one area of Machine Learning or to cover a wider range depends on how much free time you have. If you’d like to start learning these skills, I strongly suggest taking a closer look at the XGBoost library for regression and classification problems, Keras for problems that require Deep Learning techniques, such as Computer Vision, and one NLP library, for example NLTK.

Knowing the basics of these 3 libraries will give you a really solid foundation for a future career.

The third skill is training your model.

By training your model, I don’t just mean starting the training itself, because most of the time that’s just 1 line of code. We could sum it up in one sentence: it’s the skill of training your model to solve the problem in the most acceptable way within particular constraints of money and time. It sounds simple, but in reality it’s a really complicated problem.

First of all, you should be able to tell which hardware is optimal for particular algorithms: which are CPU-optimized and which require GPUs to work efficiently. Another important factor is memory and storage, as some algorithms need loads of RAM, while others need, for example, very fast drives.

Once you know which hardware to use for a particular problem, you should start thinking about the appropriate hyperparameters, which control how the model behaves during training. It’s a good rule of thumb to start by changing them manually to get a better feel for how different changes affect the model.

Just be aware that in the future you’ll run automatic hyperparameter tuning, and your role will be just to tell the algorithm the ranges within which it should look for the ideal parameters.
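Here is one possible sketch of what “informing the algorithm of ranges” can look like. It uses scikit-learn’s `RandomizedSearchCV` with a `GradientBoostingClassifier` as a stand-in, since no specific tuning tool is named here. The dataset is synthetic and the ranges are assumptions picked only for the example.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, used only so the example runs on its own.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Ranges to search; the tuner samples candidate values from these.
param_ranges = {
    "n_estimators": randint(20, 100),      # integers in [20, 100)
    "max_depth": randint(2, 5),            # integers in [2, 5)
    "learning_rate": uniform(0.01, 0.29),  # floats in [0.01, 0.30]
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_ranges,
    n_iter=5,            # number of sampled configurations (kept small here)
    cv=3,                # 3-fold cross-validation per configuration
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The time-and-money constraint shows up directly in `n_iter` and `cv`: every extra configuration or fold means more training runs.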

The next skill is testing your model.

It’s all about understanding the business problem and what is affected by particular changes in your model. This is where the confusion matrix comes in, which neatly represents how a particular model performs. You should be able to distinguish the 4 possible results, which are true positive, true negative, false positive, and false negative, and then judge which matters most for the business.

Just imagine that you build a model responsible for picking out terrorists in a crowd. 99% accuracy might look great on paper, but it would actually be a horrible result when trying to catch such a rare event. Marking everyone as not a terrorist could give even better accuracy, and it would be completely useless.

That’s why the most important thing is minimizing the cases where a terrorist wasn’t recognized (false negatives), while still keeping the number of normal people recognized as terrorists (false positives) as low as possible. At first glance it might be confusing, but once you work through a couple of examples like this one, everything will be clear. Interpreting test results appropriately and reacting by changing particular parameters in the right way just requires lots of practice.
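The accuracy trap is easy to reproduce with made-up numbers. The sketch below assumes a crowd of 1,000 people with 10 actual positives and compares a “model” that flags nobody with a hypothetical detector that catches 8 of them at the cost of 30 false alarms. The helper names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, recall, and precision derived from the four counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Made-up data: 1,000 people, 10 of whom are actual positives.
y_true = [1] * 10 + [0] * 990

# Model A flags nobody.
always_negative = [0] * 1000

# Model B catches 8 of the 10 positives but raises 30 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

print("always negative:", summarize(y_true, always_negative))
print("detector:       ", summarize(y_true, detector))
```

Model A reaches 99% accuracy with a recall of 0, while Model B has lower accuracy (96.8%) but a recall of 80%. Which trade-off is acceptable is the business decision described above.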

The next skill is deploying your model, that is, bringing it to life.

You deploy a model so that it can talk to other applications: a web app, a mobile app, a desktop app, anything. It can be a strong distinguishing feature for anyone looking for a job, as I’ve noticed that most courses focus only on building a model and there isn’t much information about deploying your model to the world.

Fortunately, it’s not hard at all and you’ve got many options here. You could use Flask, one of the simplest libraries for building APIs, but there’s also Hug, which is actively developed and even easier! I mentioned Hug in my previous article, so if you haven’t seen it, feel free to check it out.

Many people feel some kind of resistance to learning how to build APIs, but if you spend literally 2-3 days reading and trying it out, you will be perfectly fine for a start. Smash that like button if you like it and don’t hesitate to dislike if you don’t, as I appreciate all feedback.

Let me know in the comments what your plan is for developing new skills in Data Science.
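For a feel of how small such an API can be, here is a minimal Flask sketch. The `/predict` route, the JSON shape, and the placeholder `predict` function are assumptions for the example; in practice you would load your trained model and call it instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Placeholder for a trained model: returns 1 if the feature sum is positive."""
    return int(sum(features) > 0)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    # Reject anything that is not a non-empty list of numbers.
    valid = (
        isinstance(features, list)
        and len(features) > 0
        and all(isinstance(x, (int, float)) for x in features)
    )
    if not valid:
        return jsonify(error="expected a non-empty list of numbers under 'features'"), 400
    return jsonify(prediction=predict(features))
```

Saved as, say, `serve.py` (a hypothetical file name), this can be started locally with `flask --app serve run`, and any other application can then send a POST request with a body like `{"features": [1.0, 2.0]}`.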

