Data Science: Queries

  • by
Data Science: Queries

Data Science: Queries – How do we extract information from tabular data? The answer, is a query! A query is computer representation of a question we want answer using a table of data. They allow us to ask questions of the data and return answers as results.

Queries are created using programming languages. More specifically, we use a special type of programming language called a query language. The most popular query language is Structured Query Language (or SQL for short).

However, you can also perform queries using other programming languages like Python and R. For example, let’s say we want to answer the question:

“What was Bill’s average body temperature over the past 5 years”. We could write this question in the form of a SQL query. The SQL query allows us to express that we want to: select the average temperature – from the Vital Signs table where the patient’s name is “Bill” and the date is greater than or equal to January 1, 2015 (which we’re assuming is five years ago).

I’ve intentionally kept this query simple for those of you who have never seen a SQL query before. However, if you’ve had some experience with SQL, you may see several ways we could improve this query.

When we execute this query: First, the database will scan all of the records in the Vital Signs table Next, it will filter out anyone who is not our patient named Bill. Then, it will filter out any row that is older than January 1, 2015.

Next, it will select just the Temperature column from the Vital Signs table. Finally, it will compute the average of the remaining temperature values. The computer will then return this value as a result.

As we can see, the result of executing this query is 37 degrees Celsius. Essentially, this is how we get from a question to an answer, using a query. We start with a question in our natural language. We construct a SQL query to express that question to the computer.

We execute the query which returns a result. And then we express that result in the form of an answer to the question. Queries are the primary tool used for extracting information from tabular data in data science.

Share this post ...

Leave a Reply

Your email address will not be published. Required fields are marked *