If you’re reading this, you probably suspect it already: data science is a fascinating field… and also overwhelming.
With so many languages, tools, and possible paths, it’s hard to know where to start.
That’s why one of the questions I get most often is:
How do you become a data scientist?
This article is my attempt to answer it clearly.
I won’t promise magic shortcuts, but I will offer a realistic, step-by-step roadmap to understand what a data scientist does, what skills you need, and how you could start, even from zero.
So let’s begin with the first question you should have in mind…
What is a data scientist, really?
A data scientist is someone who turns data into decisions. That can take many forms:
Automating processes with machine learning models
Cleaning and exploring data with SQL and Python
Communicating findings through visualizations
Contributing to products that move the business
And here’s the important part: not everyone does everything.
Some profiles are more technical, others are more analytical, and others act as a bridge between teams.
There’s room for different talents and paths.
Why become a data scientist?
It’s a role with impact, growing demand, and a work environment where you never stop learning.
Having a university degree can help (computer science, math, statistics, engineering), but it’s not essential.
Many professionals arrive via non-traditional routes, combining…
Curiosity
Self-learning
Personal projects
Online training
The key isn’t where you come from, it’s what you can do.
Knowing languages like Python, R, or SQL can open many doors.
And if you want to demonstrate your skills, recognized certifications can help.
So, now that you’re convinced… let’s look at the skills you need to become a data scientist.
The skills you need
Data science blends technical know-how with human skills. Both matter.
Technical skills
Python and R: your base languages for analysis, visualization, and modeling
Statistics and mathematics: to understand what the data is really saying
SQL and NoSQL: to access, combine, and prepare information
Data visualization: because what isn’t seen isn’t understood
Machine learning: predictive models and decision automation
Deep learning and NLP: to work with text, images, or large volumes of unstructured data
Big Data: when the data doesn’t fit on your laptop
Cloud computing: because today we work in distributed environments
Human skills
Clear communication: explaining your findings is as important as finding them
Data storytelling: giving context and narrative to what you discover
Critical thinking: question, validate, don’t take anything for granted
Business sense: connect your analysis to real decisions
Problem-solving: with creativity and method
Teamwork: projects are almost always a team effort
Where to start?
Here’s a roadmap in 8 steps you can adapt to your context:
1) Learn to code
If you want to work with data, you need to program. You don’t have to be a software engineer, but you should know how to manipulate data with code. The three essential languages:
SQL: to access data
Python: to transform, analyze, and model it
R: a powerful alternative for statistics and visualization
Practical tip: start with SQL. It’s widely used, stable, and has a friendly learning curve. One resource I like is SQLShortReads, which has an excellent introduction.
I also published an introductory SQL course in my DataBites newsletter, and I’ll release the practical part in the coming weeks. Take a look if you’re interested.
Then move to Python. It lets you analyze, build models, and automate tasks. Kaggle’s Python intro course is a great starting point. You can also practice directly in Google Colab or Kaggle Notebooks without installing anything.
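To make the SQL-then-Python path concrete, here is a minimal sketch that runs a SQL query from Python using the built-in sqlite3 module. The `orders` table, its columns, and its rows are made up for illustration. The point is only to show where SQL ends and Python begins.

```python
import sqlite3

# In-memory database: nothing to install and nothing written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ana", 35.5), ("luis", 12.0), ("marta", 50.0)],
)

# SQL answers the question "how much has each customer spent?"
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

# Python takes over from here: rows is a plain list of tuples.
totals = {customer: total for customer, total in rows}
for customer, total in rows:
    print(f"{customer}: {total:.2f}")

conn.close()
```

The same script runs unchanged in Google Colab or a Kaggle Notebook, since sqlite3 ships with Python.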
2) Learn to wrangle, visualize, and communicate data
One of the first real challenges is facing “real” data: messy, incomplete, poorly structured.
The process of cleaning, transforming, and preparing this data is called data wrangling, and it’s one of the most useful skills from the start.
To begin: Kaggle has solid beginner courses on data cleaning, working with tables in pandas, and data visualization. I recommend starting there.
Recommended tools
In Python: pandas, matplotlib, seaborn, plotly
No-code: Power BI, Tableau (you can get started with beginner courses in both)
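As a rough illustration of what data wrangling looks like in pandas, the sketch below cleans a tiny table and then aggregates it. The data is invented for the example, and the cleaning rules (drop rows without a city, treat exact repeats as duplicates) are assumptions; real datasets need their own rules.

```python
import pandas as pd

# Hypothetical messy data: inconsistent casing, stray spaces,
# a duplicated row, a missing value, and numbers stored as text.
raw = pd.DataFrame(
    {
        "city": [" Madrid", "madrid", "Barcelona ", "Barcelona ", None],
        "sales": ["100", "250", "80", "80", "40"],
    }
)

clean = (
    raw.dropna(subset=["city"])  # drop rows with no city
    .assign(
        city=lambda df: df["city"].str.strip().str.title(),  # normalize text
        sales=lambda df: pd.to_numeric(df["sales"]),  # text -> numbers
    )
    .drop_duplicates()  # remove exact repeats
    .reset_index(drop=True)
)

# Aggregate once the data is tidy.
sales_by_city = clean.groupby("city")["sales"].sum()
print(sales_by_city)
```

From here, `sales_by_city.plot(kind="bar")` would give a first chart, assuming matplotlib is installed.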
Personal tip: it’s not just about finding insights; it’s about communicating them well. Explaining your findings clearly and visually is a huge advantage. Invest time in data storytelling and in communicating with non-technical audiences.
Trust me, it makes all the difference.
3) Strengthen your foundations in math, statistics, and machine learning
You don’t need a PhD in stats or math to become a data scientist, but you do need a solid base to understand the models you’ll use and avoid treating them as a black box.
Key topics to master (or at least understand):
Probability, distributions, inference
Linear algebra: vectors and matrices
Calculus: derivatives and optimization
ML: regression, classification, overfitting, cross-validation
Useful resources:
StatQuest: the perfect channel to understand stats and math clearly
Khan Academy: great for reinforcing math from the basics
Intro ML courses like fast.ai
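Overfitting and cross-validation are easier to grasp with numbers in front of you. The sketch below is written by hand with the standard library to show the mechanics; in practice you would reach for a library such as scikit-learn. The data is synthetic (a straight line plus small bounded noise), and the function names are my own for this illustration. It compares a simple linear fit with a 1-nearest-neighbour model that memorizes its training data.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def fit_nearest(xs, ys):
    """1-nearest-neighbour: memorizes the training data, so it overfits."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def cross_val_mse(fit, xs, ys, k=5):
    """Average validation error over k interleaved folds."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in valid], [ys[i] for i in valid]))
    return fmean(scores)


# Synthetic data (an assumption for illustration): a line plus bounded noise.
rng = random.Random(0)
xs = list(range(20))
ys = [5 * x + rng.uniform(-0.5, 0.5) for x in xs]

results = {}
for name, fit in [("linear", fit_line), ("1-nearest", fit_nearest)]:
    train_error = mse(fit(xs, ys), xs, ys)
    cv_error = cross_val_mse(fit, xs, ys)
    results[name] = (train_error, cv_error)
    print(f"{name:>9}: train MSE={train_error:.2f}  cross-val MSE={cv_error:.2f}")
```

The nearest-neighbour model scores a perfect zero on the data it was trained on, yet does clearly worse than the linear fit on held-out folds. That gap between training error and validation error is what overfitting looks like, and cross-validation is how you catch it.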
4) Understand how databases work
Data rarely arrives in a perfect CSV. Most of the time it lives in complex systems or relational databases.
📚 To practice:
Learn PostgreSQL or MySQL with YouTube tutorials
Install PostgreSQL + pgAdmin to set up a local environment
Learn to connect everything with Python using SQLAlchemy or psycopg2
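For the last item, here is a minimal sketch of querying a database from Python with SQLAlchemy (assuming version 1.4 or newer). It uses an in-memory SQLite database so it runs without a server. The `users` table and its rows are invented, and the PostgreSQL URL in the comment uses placeholder credentials.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite keeps the sketch self-contained. For a local PostgreSQL
# server the URL would look like (placeholder credentials, psycopg2 driver):
#   postgresql+psycopg2://user:password@localhost:5432/mydb
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a connection and commits when the block succeeds.
with engine.begin() as conn:
    # Hypothetical table and rows, invented for this example.
    conn.execute(
        text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")
    )
    conn.execute(
        text("INSERT INTO users (name, plan) VALUES (:name, :plan)"),
        [
            {"name": "ana", "plan": "pro"},
            {"name": "luis", "plan": "free"},
            {"name": "marta", "plan": "pro"},
        ],
    )
    # Bound parameters (:plan) keep values out of the SQL string.
    result = conn.execute(
        text("SELECT name FROM users WHERE plan = :plan ORDER BY name"),
        {"plan": "pro"},
    )
    pro_users = [row[0] for row in result]

print(pro_users)
```

Only the connection URL changes when you point this at the PostgreSQL instance you set up locally; the query code stays the same.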
The next natural step is the cloud.
5) Get familiar with Big Data and the cloud
As data grows, tools change. Many companies work in the cloud and process data in distributed systems on AWS, Google Cloud, or Azure. It’s worth getting to know storage and analytics services such as Amazon S3 or BigQuery, and concepts like data lakes.
Personally, I enjoy working with Google Cloud. The free Google Cloud Skills Boost is solid, and creators like TheCloudGirl make very accessible content.
6) Practice, build projects, and connect with others
This is, honestly, the most important part. No matter how much theory you read, if you don’t practice, you won’t learn. Build your portfolio with personal projects, analysis challenges, well-explained notebooks, and mini-apps with Streamlit or dashboards in Power BI or Tableau.
You can learn the earlier steps directly through practice and turn them into projects you can show.
If you’re out of ideas… the whole internet is there for inspiration.
Some useful GitHub repos:
Once you start building, you need a place to document, share, and keep things organized.
That place is GitHub.
But first, learn the basics of Git, the version control tool: save changes, collaborate without stepping on each other’s toes, and roll back when you break something (it happens more than you think).
Recommended resources:
GitHub Skills: free, guided mini-courses from GitHub
7) Get an internship or your first job
Once you have some basics and a few projects, start applying for internships or junior roles. You don’t need to know everything. You do need to show you can learn.
Prepare well:
A clear, structured portfolio (GitHub, Notion, Medium)
A concise LinkedIn profile oriented to what you want
Practice interview-style exercises: SQL, business reasoning, exploratory data analysis (EDA)
Key advice: how you communicate is worth as much as (or more than) your code.
8) Connect with the community
Data science moves fast. The best way to keep up is to be close to others who are learning, building, and sharing.
Follow people you trust, join Discord or LinkedIn communities, go to events. And if you can, share what you’re learning: writing, teaching, or explaining will help you lock in what you’ve learned.
A final note
If you’ve made it this far, you’ve already taken the first step: understanding that this isn’t about knowing everything but about moving forward bit by bit.
With patience, curiosity, and consistency.
No one starts out knowing.
But we all start in the same place: by taking the first step.
Are you in?
Hope to see you in the community soon!
Sincerely,
— Josep