Strategies for Learning Data Science in 2020


We're going to cover how to become a data scientist in 2020. If you're just starting out or are interested in the field, you may be wondering what path you should take. For example, do you need a computer science degree to become a data scientist, or could you also make the transition from a non-technical background?

Speaking from my own experience, I am not a computer scientist. As you may recall from my very first episode, my first undergraduate degree is in biological science, and because of my interest in computers and data analytics I have been self-studying to learn the concepts needed for doing data science. If you think about it, data science is a multidisciplinary field that encompasses several disciplines, such as informatics, computer science (of course), statistics, science, mathematics, data visualization and, most importantly, problem-solving.

If you go to LinkedIn and search for data scientists, you will notice that they hold a variety of first degrees, so their backgrounds are quite diverse. Many are not computer science graduates. If you have a computer science degree you may have an advantage, but if you don't, that's okay. A passion for learning about the field and for applying data science to whatever field you are in is, I think, what it takes to become a data scientist.

There is quite a lot of big data in the field of biology. I made the transition to data science by applying it to big data in biology and the biomedical sciences in order to discover new drugs, to understand the mechanism of action of drugs, and to create diagnostic tools that can assist clinicians and health professionals in diagnosing patients for a particular disease of interest.

The second question you may be wondering about is how much time it takes to become a data scientist. If you have a computer science degree, it will not take so long, because you already have the fundamentals and the technical background, and you already know how to program. That will make your transition much quicker. For example, if you are a web developer who is going to learn R or Python, you will be in a better position to learn either or both languages than a non-technical person coming from, say, biology.

For a biology major, learning R or Python might take longer; it really depends on the background of the individual. However, I believe that the mindset and the passion to learn are all that is necessary. I read somewhere that if you spend, say, 10,000 hours on something, you will be able to master any skill or body of knowledge. If you spend about two hours a day learning the concepts of programming and data science, I believe that within a year or two you will be able to learn enough to become a data scientist, provided that you also practice.

The next question you may have is whether you need to learn how to program to become a data scientist. The answer is yes and no. Before I learned how to program, I used a program called WEKA. It is point-and-click graphical user interface software that allowed me to analyze the data I had compiled during the course of my PhD study.

Over time I began to notice that analyzing data through the point-and-click interface was not very efficient and required a lot of manual work. I had to physically use the mouse to click through the program, import the data, specify the input parameters, initiate the training of the model, collect the results, put them into Excel and combine them. All of this was quite tedious.

I remember that during my PhD study I used maybe 40 to 50 computers at the same time. On each computer I would run some simulations, and then I would manually collect the data from each of them, pool it together and analyze it in Microsoft Excel. So is it possible to analyze data without programming? Yes, and that way you might become a data analyst, a bioinformatician if you are in the field of biology, or even a junior data scientist. However, knowing how to program, even just a little, will greatly speed up your workflow. So yes, I would recommend learning how to program. If I could turn back the clock to when I was doing my PhD, about 15 years ago, I would indeed want to learn how to program, because that would have greatly increased the speed at which I carried out the project.
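To give a rough idea of how a little programming replaces that manual collection step, here is a minimal Python sketch using only the standard library. It assumes that each computer wrote its results to a CSV file named like `machine_01.csv` with `run` and `accuracy` columns; the file names, the columns and the numbers in the demo are all made up for illustration and are not from my actual PhD project.

```python
import csv
import tempfile
from pathlib import Path


def combine_results(results_dir, output_file):
    """Pool every machine_*.csv file in results_dir into one CSV file.

    A "source" column records which file each row came from.
    Returns the number of data rows written.
    """
    rows_written = 0
    writer = None
    with open(output_file, "w", newline="") as out:
        for csv_path in sorted(Path(results_dir).glob("machine_*.csv")):
            with open(csv_path, newline="") as src:
                for row in csv.DictReader(src):
                    row["source"] = csv_path.stem
                    if writer is None:
                        # Create the writer once the column names are known.
                        writer = csv.DictWriter(out, fieldnames=list(row))
                        writer.writeheader()
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


def demo():
    """Create three small hypothetical result files, then pool them."""
    with tempfile.TemporaryDirectory() as tmp:
        results_dir = Path(tmp)
        for machine in range(1, 4):
            path = results_dir / f"machine_{machine:02d}.csv"
            with open(path, "w", newline="") as f:
                w = csv.writer(f)
                w.writerow(["run", "accuracy"])
                # Made-up numbers, only there to have something to pool.
                w.writerow([1, f"{0.80 + machine / 100:.2f}"])
                w.writerow([2, f"{0.85 + machine / 100:.2f}"])
        combined = results_dir / "combined.csv"
        combine_results(results_dir, combined)
        with open(combined, newline="") as f:
            return list(csv.DictReader(f))
```

Calling `demo()` returns the pooled rows as a list of dictionaries. With 40 or 50 files instead of three, the same function would do in one call what used to take a lot of clicking and copying into Excel.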

Now the very important question is which programming language you should learn. If you've been googling or watching videos on YouTube, you may have come across two languages that are very popular for data science: R and Python. There is a debate about whether to learn R or Python, but it really depends on your own personal preference or on your mentor. Personally, I learned Python first, though not because of anything particular about the language itself.

Rather, the decision to learn Python came from the fact that one of my colleagues already knew how to use Python. We were working together on a research project in which he coded in Python, and at the time I had no programming experience. He recommended a Python book, and I studied the source code of the work that we published together.

We generated an artificial data set for bacterial identification and published that work in the EXCLI Journal. Later I had a master's student, and over the course of his master's degree I learned R as a new language for doing data science, and also to help tweak his code. So the choice of language really depends on your own personal preference. If you have a mentor who knows R, go ahead and learn R; if you have a mentor who knows Python, go ahead and choose Python.

If you have no mentor, then follow this channel and I can be your virtual mentor: you can post your questions in the comments section down below.

The next step to becoming a data scientist is to become familiar with the standard libraries of Python or the standard packages and modules of R. You have to know which R or Python packages and libraries are available for wrangling your data, pre-processing it and creating your prediction models.

For example, to work with data frames in Python you would use pandas, so you have to learn things such as how to merge different data frames together. In R it is dplyr, along with R's built-in data frame functions. To create graphs in Python you would use matplotlib and seaborn; in R you would use the base plot function or ggplot2. To do machine learning or build prediction models in Python you would use scikit-learn, or Keras or TensorFlow for your deep learning models. In R you would use caret and rattle, as well as TensorFlow and Keras. One package that I particularly like is called Shiny.
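As a small illustration of the data frame merging mentioned above, here is a minimal sketch that assumes pandas is installed. The two tables, their column names and their values are hypothetical and exist only to show what a merge on a shared key looks like.

```python
import pandas as pd

# Hypothetical table of compounds and one of their properties.
compounds = pd.DataFrame({
    "compound_id": ["C1", "C2", "C3"],
    "molecular_weight": [180.2, 151.2, 206.3],
})

# Hypothetical table of assay results for some of the compounds.
activity = pd.DataFrame({
    "compound_id": ["C1", "C2", "C4"],
    "active": [True, False, True],
})

# Inner merge: keep only compounds that appear in both tables.
merged = compounds.merge(activity, on="compound_id", how="inner")

# Left merge: keep every compound and leave the result missing where
# there is no matching row in the activity table.
left_merged = compounds.merge(activity, on="compound_id", how="left")

print(merged)
print(left_merged)
```

The inner merge keeps only C1 and C2, which appear in both tables, while the left merge keeps all three compounds and marks the activity of C3 as missing. Which kind of merge is appropriate depends on whether rows without a match should be dropped or kept.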

Shiny is an R package that allows me to create data-driven web applications, and we have published several papers in collaboration with one of our colleagues at the university. The most important point about becoming a data scientist is that you have to persevere and try hard. This journey won't be easy, but you have to put in the effort and, most importantly of all, you have to code and do data science projects. So on this channel we're going to have a series of data science projects in R.


