《Ten Advanced SQL Concepts You Should Know for Data Science Interviews》

Original link: https://towardsdatascience.com/ten-advanced-sql-concepts-you-should-know-for-data-science-interviews-4d7015ec74b0

As the volume of data continues to grow, so does the demand for qualified data professionals. Specifically, there is a growing demand for SQL-fluent professionals, not just at the junior level.

So Nathan Rosidi, the founder of Stratascratch, and I put together what I think are the 10 most important and relevant intermediate to advanced SQL concepts.

If you have ever wanted to query the result of another query, that is where common table expressions (CTEs) come in: a CTE essentially creates a temporary, named result set.

Using CTEs is a great way to modularize and break down your code, in the same way that you would break an article into several paragraphs.

Consider a query that uses subqueries in the WHERE clause.

This may not seem difficult to understand, but what if the query contained many subqueries, or subqueries nested inside subqueries? This is where CTEs come into play.

With CTEs, it becomes clear that the WHERE clause is filtering for names in Toronto. CTEs are useful whenever you can break your code into smaller chunks, and they also let you assign a descriptive name (e.g. toronto_ppl and avg_female_salary) to each CTE.
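As a hedged illustration of the idea, here is a minimal sketch run through Python's built-in sqlite3 module. The CTE names toronto_ppl and avg_female_salary come from the text; the population and salaries tables, their columns, and the sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and rows, for illustration only
    CREATE TABLE population (name TEXT, country TEXT, city TEXT);
    CREATE TABLE salaries (name TEXT, gender TEXT, salary REAL);
    INSERT INTO population VALUES
        ('Ana', 'Canada', 'Toronto'),
        ('Ben', 'Canada', 'Toronto'),
        ('Cleo', 'Canada', 'Ottawa');
    INSERT INTO salaries VALUES
        ('Ana', 'Female', 90000),
        ('Ben', 'Male', 60000),
        ('Cleo', 'Female', 70000);
""")

query = """
WITH toronto_ppl AS (
    -- Everyone living in Toronto
    SELECT DISTINCT name
    FROM population
    WHERE country = 'Canada' AND city = 'Toronto'
),
avg_female_salary AS (
    -- A single-row CTE holding the average female salary
    SELECT AVG(salary) AS avg_salary
    FROM salaries
    WHERE gender = 'Female'
)
SELECT name, salary
FROM salaries
WHERE name IN (SELECT name FROM toronto_ppl)
  AND salary >= (SELECT avg_salary FROM avg_female_salary)
ORDER BY name;
"""
cte_rows = conn.execute(query).fetchall()
print(cte_rows)
```

Each CTE can be read and checked on its own before the final SELECT combines them, which is the main readability gain over nested subqueries.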

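CTEs also let you use more advanced techniques, such as recursive queries.

A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are particularly useful for querying hierarchical data such as organization charts, file systems, or a graph of links between web pages.

A recursive CTE has 3 parts: an anchor member, which is the initial query that returns the base result; a recursive member, which is a query that references the CTE itself and is combined with the anchor member using UNION ALL; and a termination condition that stops the recursion.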

Here is an example of a recursive CTE that gets the manager ID of each employee ID:
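A minimal sketch of such a query, run through Python's built-in sqlite3 module. The staff_members table, the depth column, and the sample rows are assumptions made up for illustration, not taken from the original listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical org chart: employee 1 is the top-level manager
    CREATE TABLE staff_members (id INTEGER PRIMARY KEY, manager_id INTEGER);
    INSERT INTO staff_members VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

query = """
WITH RECURSIVE org_structure(id, manager_id, depth) AS (
    -- Anchor member: employees with no manager
    SELECT id, manager_id, 0
    FROM staff_members
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: employees whose manager is already in the CTE.
    -- Recursion ends when this SELECT produces no new rows.
    SELECT sm.id, sm.manager_id, os.depth + 1
    FROM staff_members AS sm
    JOIN org_structure AS os ON os.id = sm.manager_id
)
SELECT id, manager_id, depth
FROM org_structure
ORDER BY id;
"""
org_rows = conn.execute(query).fetchall()
for row in org_rows:
    print(row)
```

The extra depth column is not required; it is included here only to make each level of the hierarchy visible in the output.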

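Temporary functions are worth reading up on, and knowing how to write them is important for a few reasons: they let you break code into smaller chunks, they keep queries clean, and they let you reuse logic instead of repeating it, much like functions in Python.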

Consider the following example:

Instead, you can take advantage of temporary functions to capture case clauses.

With temporary functions, the query itself is simpler and more readable, and you can reuse the seniority function!
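The idea can be sketched as follows. SQLite has no CREATE TEMPORARY FUNCTION statement, so this example swaps in a connection-scoped user-defined function registered with sqlite3's create_function; it is a stand-in for a SQL temporary function, not the same feature. The get_seniority name, the tenure thresholds, and the employees table are all hypothetical.

```python
import sqlite3


def get_seniority(tenure):
    """Map years of tenure to a label (thresholds are assumptions)."""
    if tenure < 1:
        return "analyst"
    if tenure <= 3:
        return "associate"
    if tenure <= 5:
        return "senior"
    return "vp"


conn = sqlite3.connect(":memory:")
# The function exists only for the lifetime of this connection,
# which is what makes it behave like a temporary function.
conn.create_function("get_seniority", 1, get_seniority)

conn.executescript("""
    -- Hypothetical sample data
    CREATE TABLE employees (name TEXT, tenure INTEGER);
    INSERT INTO employees VALUES ('Ana', 0), ('Ben', 2), ('Cleo', 4), ('Dev', 8);
""")

seniority_rows = conn.execute("""
    SELECT name, get_seniority(tenure) AS seniority
    FROM employees
    ORDER BY name
""").fetchall()
print(seniority_rows)
```

The query no longer carries the CASE logic inline, and any other query on the same connection can call get_seniority as well.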

You will most likely see many questions that require CASE WHEN statements, simply because it is such a versatile concept. It lets you write complex conditional statements when you want to assign a value or class based on other variables.

Less well known, it also allows you to pivot data. For example, if you have a month column and you want to create a separate column for each month, you can use CASE WHEN statements to pivot the data.

Example problem: Write an SQL query to reformat a table so that there is one revenue column per month.
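One possible approach, sketched with Python's built-in sqlite3 module. The department table, its columns, and the rows are assumptions for illustration, and only three months are shown to keep the example short; the same pattern extends to all twelve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical long-format table: one row per department per month
    CREATE TABLE department (id INTEGER, revenue INTEGER, month TEXT);
    INSERT INTO department VALUES
        (1, 8000, 'Jan'),
        (2, 9000, 'Jan'),
        (3, 10000, 'Feb'),
        (1, 7000, 'Feb'),
        (1, 6000, 'Mar');
""")

query = """
SELECT
    id,
    -- Each CASE keeps revenue for one month and yields NULL otherwise;
    -- SUM then collapses the rows of a department into a single value.
    SUM(CASE WHEN month = 'Jan' THEN revenue END) AS jan_revenue,
    SUM(CASE WHEN month = 'Feb' THEN revenue END) AS feb_revenue,
    SUM(CASE WHEN month = 'Mar' THEN revenue END) AS mar_revenue
FROM department
GROUP BY id
ORDER BY id;
"""
pivot_rows = conn.execute(query).fetchall()
for row in pivot_rows:
    print(row)
```

Months with no data for a department come back as NULL (None in Python) rather than 0, because SUM over only NULL inputs returns NULL.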

EXCEPT and NOT IN operate almost identically. They are both used to compare rows between two queries/tables. That said, there are subtle differences between the two that you should know.

First, EXCEPT filters out duplicates and returns distinct rows, unlike NOT IN.

Second, EXCEPT requires the same number of columns in both queries/tables, whereas NOT IN compares a single column against each query/table.
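The duplicate-handling difference can be seen in a small sketch using Python's built-in sqlite3 module. The tables a and b and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; note the duplicate value 1 in table a
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (3);
    INSERT INTO b VALUES (3);
""")

# EXCEPT is a set operation: duplicates are removed from the result.
except_rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# NOT IN is a row filter: every qualifying row of a is kept, duplicates included.
not_in_rows = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
).fetchall()

print(except_rows)
print(not_in_rows)
```

Both queries drop the value 3, but only EXCEPT collapses the two rows containing 1 into one.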

A self join joins a SQL table to itself. You might think this serves no purpose, but you would be surprised at how common it is. In many real-life situations, data is stored in one large table rather than many smaller tables. In such cases, a self join may be required to solve certain problems.

Let’s look at an example.

Example problem: Given an employee table, write a SQL query that finds the employees who earn more than their managers. In the example data, Joe is the only employee who earns more than his manager.
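A minimal sketch of a self join for this problem, using Python's built-in sqlite3 module. The column names and salary figures are assumptions chosen so that Joe is the only match, as the problem statement describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: manager_id points at another row of the same table
    CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Joe', 70000, 3),
        (2, 'Henry', 80000, 4),
        (3, 'Sam', 60000, NULL),
        (4, 'Max', 90000, NULL);
""")

query = """
SELECT e.name
FROM employee AS e              -- the employee's own row
JOIN employee AS m              -- the same table, aliased as the manager's row
  ON e.manager_id = m.id
WHERE e.salary > m.salary
ORDER BY e.name;
"""
self_join_rows = conn.execute(query).fetchall()
print(self_join_rows)
```

The two aliases are what make the self join work: the table appears twice in the query, once in each role.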

Ranking rows and values is a very common application. Companies often use rankings, for example, to rank customers by number of purchases, products by units sold, or countries by sales.

In SQL, there are several ways you can assign “ranks” to rows, which we’ll explore with an example. Consider a query that applies ROW_NUMBER(), RANK(), and DENSE_RANK() to the same data, and its result.

ROW_NUMBER() returns a unique number for each row, starting at 1. When there are ties (for example, Bob vs Carrie), ROW_NUMBER() assigns the numbers arbitrarily if no second ordering criterion is defined.

RANK() also numbers rows starting at 1, except that tied rows are assigned the same number. A gap then follows each group of tied ranks.

DENSE_RANK() is similar to RANK(), except that there are no gaps after tied ranks. Note that with DENSE_RANK(), Daniel is ranked 3rd, as opposed to 4th with RANK().
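The three functions can be compared side by side in a short sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The student_grades table and the GPA values are assumptions; the names Bob, Carrie, and Daniel follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical grades: Bob and Carrie are tied
    CREATE TABLE student_grades (name TEXT, gpa REAL);
    INSERT INTO student_grades VALUES
        ('Alice', 3.9), ('Bob', 3.7), ('Carrie', 3.7), ('Daniel', 3.5);
""")

query = """
SELECT
    name,
    gpa,
    ROW_NUMBER() OVER (ORDER BY gpa DESC) AS row_num,    -- always unique
    RANK()       OVER (ORDER BY gpa DESC) AS rnk,        -- ties share a rank, gap follows
    DENSE_RANK() OVER (ORDER BY gpa DESC) AS dense_rnk   -- ties share a rank, no gap
FROM student_grades
ORDER BY gpa DESC, name;
"""
rank_rows = conn.execute(query).fetchall()
for row in rank_rows:
    print(row)
```

Bob and Carrie receive row numbers 2 and 3 in an unspecified order; adding a second ORDER BY key inside the OVER clause would make that assignment deterministic.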

Another common application is to compare values from different time periods. For example, what is the delta between this month’s and last month’s sales? Or between this month and the same month last year?

When you compare values from different periods to calculate deltas, LEAD() and LAG() come into play.

Here are some examples:
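A minimal sketch of a month-over-month comparison, using Python's built-in sqlite3 module (SQLite 3.25 or later). The monthly_sales table and its figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical monthly totals
    CREATE TABLE monthly_sales (month TEXT, sales INTEGER);
    INSERT INTO monthly_sales VALUES
        ('2021-01', 100), ('2021-02', 120), ('2021-03', 90);
""")

query = """
SELECT
    month,
    sales,
    -- LAG looks one row back: this month minus last month
    sales - LAG(sales, 1) OVER (ORDER BY month) AS delta_vs_prev,
    -- LEAD looks one row ahead: next month's sales
    LEAD(sales, 1) OVER (ORDER BY month) AS next_month_sales
FROM monthly_sales
ORDER BY month;
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

The first row has no previous month and the last row has no next month, so those values are NULL. With one row per month, changing the offset to LAG(sales, 12) would compare each month with the same month a year earlier.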

If you already know ROW_NUMBER() and LAG()/LEAD(), running totals will probably not surprise you. If you don’t, this is probably one of the most useful window functions, especially when you want to visualize growth!

Using a window function with SUM(), we can calculate a running total. See the following example:
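A hedged sketch of a running total, using Python's built-in sqlite3 module (SQLite 3.25 or later). The monthly_sales table and its values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical monthly totals
    CREATE TABLE monthly_sales (month TEXT, sales INTEGER);
    INSERT INTO monthly_sales VALUES
        ('2021-01', 100), ('2021-02', 120), ('2021-03', 90);
""")

query = """
SELECT
    month,
    sales,
    -- With ORDER BY in the window, SUM accumulates from the first row
    -- up to and including the current row.
    SUM(sales) OVER (ORDER BY month) AS running_total
FROM monthly_sales
ORDER BY month;
"""
running_rows = conn.execute(query).fetchall()
for row in running_rows:
    print(row)
```

Unlike a GROUP BY aggregate, the window version keeps every input row and simply adds the cumulative column next to it.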

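You should definitely expect some sort of SQL problem involving datetime data. For example, you may need to group data by month or convert a value from the DD-MM-YYYY format to just the month.

Some of the functions you should know are EXTRACT, DATEDIFF, DATE_ADD, DATE_SUB, and DATE_TRUNC; their exact names and availability vary by SQL dialect.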

Example problem: Given a weather table, write a SQL query to find the IDs of all dates with a higher temperature than the previous day (yesterday).
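One way to approach this, sketched with Python's built-in sqlite3 module. SQLite has no DATEDIFF, so this example uses SQLite's date() function with a '-1 day' modifier to find the previous day; in a dialect such as MySQL the join condition would typically be written with DATEDIFF instead. The weather table layout and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical daily readings, dates stored as ISO-8601 text
    CREATE TABLE weather (id INTEGER, record_date TEXT, temperature INTEGER);
    INSERT INTO weather VALUES
        (1, '2015-01-01', 10),
        (2, '2015-01-02', 25),
        (3, '2015-01-03', 20),
        (4, '2015-01-04', 30);
""")

query = """
SELECT cur.id
FROM weather AS cur
JOIN weather AS prev
  -- Match each row with the row dated exactly one day earlier
  ON prev.record_date = date(cur.record_date, '-1 day')
WHERE cur.temperature > prev.temperature
ORDER BY cur.id;
"""
hotter_ids = [row[0] for row in conn.execute(query)]
print(hotter_ids)
```

Joining on the calendar date rather than on id - 1 keeps the query correct when days are missing or ids are not consecutive.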

That’s it! I hope this helps you in your interview preparation. I am sure that if you know these 10 concepts inside out, you will do great on most SQL problems out there.
