Hey you all!
It’s Josep here again one more week! 👋🏻
This time, we’re taking a deep dive into SQL execution order, but with a twist—we’re adding JOINs to the mix!
You may already know that SQL doesn’t execute queries like we write them. But when multiple tables come into play, things get a bit more tricky. 🤔
Stick with me for just 6 minutes—I promise it’ll be time well spent! 🚀
The most important news of the week?
I’m currently in Madrid, teaching Data Visualization classes—and let me tell you, there are some exciting changes and new projects coming soon. Stay tuned!
Now that we’ve caught up, let's dive into the important stuff 👨🏻💻.
Before we dive into today’s topic, I highly recommend checking out my earlier SQL-focused issues:
📌 Why Understanding Execution Order Matters?
When writing SQL queries, what you see isn’t always what happens first. SQL processes queries in a specific order, and understanding this can make or break your queries—especially when JOINs enter the scene.
In my previous databites issue, we broke down SQL’s execution order.
But what happens when multiple tables are involved?
That’s what we’ll explore today.
SQL Execution Order - Recap
SQL isn’t read top-to-bottom as we write it—it has its own execution order!
Here’s how a basic query is processed:
1️⃣ FROM – Identify the data source 📦
2️⃣ WHERE – Filter rows 🔍
3️⃣ GROUP BY – Group similar data 📊
4️⃣ HAVING – Filter groups ✂️
5️⃣ SELECT – Pick the columns 🎨
6️⃣ ORDER BY – Sort the results 📚
7️⃣ LIMIT – Keep only a certain number of rows 🔢
Simple enough, right? But JOINs introduce a twist! 🌀
But before we start…
Here’s a Cheatsheet of the Week! 🎁👇🏻
SQL Execution Order with JOINs - Step by Step
#1 FROM - Defining the Base Table(s)
This is where SQL starts everything. It determines which table(s) we’re working with.
➡️ If we’re using JOINs, SQL processes them here before doing any filtering!
FROM employees AS e
LEFT JOIN employee_company_data AS ed
ON e.ID = ed.ID
At this point, SQL combines the tables based on the JOIN condition. The type of JOIN (INNER, LEFT, RIGHT, FULL) affects which rows stay in the result set.
#2 WHERE - Filtering Rows 🔍
Now that we have a combined dataset, SQL filters individual rows before aggregation.
➡️ If you filter before the JOIN, you might accidentally remove rows you actually need.
WHERE ed.Experience_years > 5
Think of WHERE as a detective’s magnifying glass—it zooms in on what really matters. 🔍
#3 GROUP BY - Aggregating Data 📊
Once SQL has the right rows, it groups similar data into categories.
➡️ This is especially useful when calculating averages, totals, or counts per category.
GROUP BY e.Field
Example: Want to find the average salary per department? GROUP BY makes it happen!
#4 HAVING - Filtering Groups ✂️
HAVING is like WHERE’s big sibling, but for grouped data.
➡️ It filters after GROUP BY, meaning it only works with aggregated values.
HAVING COUNT(e.ID) >= 5
This ensures that only departments with an average salary greater than 60K are included.
#5 SELECT - Picking the Columns 🎨
Now SQL decides what to show in the final output.
➡️ You can only select columns that are in GROUP BY or are aggregated values.
SELECT
Field,
AVG(ed.Salary) AS Avg_salary,
COUNT(e.ID) AS num_answers
Think of SELECT as an artist picking the right colors for a painting. 🎨
#6 ORDER BY - Sorting the Output 📚
Now that we have our final dataset, SQL sorts the results based on our preferences.
ORDER BY Avg_salary DESC
This arranges departments from highest to lowest average salary.
#7 LIMIT - Keeping it Concise 🔢
If we only want to see a few results, we limit the number of rows returned.
LIMIT 1
Like picking the Top 5 songs of the year, this keeps things focused and efficient! 🎵
💡 Example Query: Top Earning Fields
Let’s put everything together in a real-world scenario.
🛠 Goal:
Find the average salary and number of employees in each field for employees with more than 5 years of experience.
SELECT
e.Field,
AVG(ed.Salary) AS Avg_salary,
COUNT(e.ID) As num_individuals
FROM employees AS e
LEFT JOIN emplyee_company_data AS ed
ON e.ID = ed.ID
WHERE ed.Experience_years > 5
GROUP BY e.Field
HAVING COUNT(e.ID) > 2
ORDER BY Avg_salary DESC
LIMIT 5
✅ What this query does:
🔹 Joins the employees
table with employee_company_data
using a LEFT JOIN.
🔹 Filters employees with more than 5 years of experience.
🔹 Groups data by Field to calculate averages.
🔹 Filters out fields with fewer than 3 employees using HAVING.
🔹 Sorts fields by highest average salary.
🔹 Limits results to the top 5 fields.
And voilà!
⚠️ Common Mistakes & How to Avoid Them
🔴 Forgetting that JOINs happen before filtering
→ Ensure you understand how JOINs change your dataset before applying WHERE filters.
🔴 Using WHERE instead of HAVING for grouped data
→ WHERE works on raw data, HAVING works on aggregated data.
🔴 Selecting columns not in GROUP BY
→ If you use GROUP BY, ensure all non-aggregated columns are included!
🎯 Key Takeaways
SQL executes queries in a specific order—understanding this helps write better queries.
JOINs happen first, before WHERE filters out rows.
GROUP BY & HAVING work together to aggregate and filter grouped data.
ORDER BY & LIMIT finalize the query, ensuring a clean, sorted output.
Still with me? 🧐
As fellow data enthusiast, I’m sure you’d be eager to help me shape some impactful KPIs and take this newsletter to the next level!
So here’s how you can help:
👉🏻 I want this newsletter to be truly valuable for you, so please share your feedback!
Additionally, you can let me know any preference for future content or any idea you think might add more value to you!
Before you go, tap the 💚 button at the bottom of this email to show your support—it really helps and means a lot!
My latest articles 📝
Unlocking the Power of Geospatial Data 🌍 in DataBites.
Fuzzy Logic in AI: Principles, Applications, and Python Implementation Guide in DataCamp.
AI Project Cycle Explained: From Problem Scoping to Real-World Impact in DataCamp.
My Favorite Resources! 👨🏻💻
I strongly recommend following AIgents roadmaps if you want to become a full Data Scientist, ML Engineer, or Data Engineer,
You can follow the Data Engineer Zoomcap on GitHub to become a fully proficient Data Engineer.
Want to get more of my content? 🙋🏻♂️
Reach me on:
LinkedIn, X (Twitter), or Threads to get daily posts about Data Science.
My Medium Blog to learn more about Data Science, Machine Learning, and AI.
Just email me at rfeers@gmail.com for any inquiries or to ask for help! 🤓
Remember now that DataBites has an official X (Twitter) account and LinkedIn page. Follow us there to stay updated and help spread the word! 🙌🏻