Getting Started with R and RStudio: A Beginner’s Guide to Data Visualization
If you are starting your journey in data science, you’ll often hear about R and RStudio. R is a programming language designed for statistical computing and data visualization, while RStudio is a user-friendly IDE (Integrated Development Environment) that makes working with R much easier.
In this guide, we’ll go through:
R basics
Popular visualization packages
Using the inbuilt plot() function
Plotting with ggplot2
Extending ggplot2 with GGally
And yes—we’ll look at code snippets you can try on your own! 🚀
Setting up R and RStudio
Download R → Install from CRAN (Comprehensive R Archive Network).
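Download RStudio → Install the free RStudio Desktop IDE from Posit (the company formerly named RStudio). Install R first, because RStudio runs on top of it.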
Open RStudio → You’ll see the Console, Script editor, Environment, and Plots panes.
Now you’re ready to code in R! 🎉
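The examples below use several add-on packages. lattice normally ships with R, but ggplot2, plotly, leaflet, GGally, cowplot, and corrplot must be installed once from CRAN. Here is a minimal sketch, assuming an internet connection. The package list and the cloud.r-project.org mirror are choices made for this guide, not requirements:

```r
# Packages used in this guide (lattice usually ships with R already)
needed <- c("ggplot2", "plotly", "lattice", "leaflet", "GGally",
            "cowplot", "corrplot")

# Keep only the packages that cannot be loaded yet
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to_install <- needed[!installed]

# Install the missing ones once; later sessions only need library()
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

You only need to install a package once. After that, call library() at the start of each session.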
R Basics
Here are some quick basics in R:
# Arithmetic
2 + 3 # 5
10 / 2 # 5
# Variables
x <- 5
y <- 10
x + y # 15
# Vectors
numbers <- c(1, 2, 3, 4, 5)
mean(numbers) # Average = 3
R is designed with statistics and visualization in mind, so plotting data is one of its strongest features.
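Most plotting functions in this guide work with whole vectors and with data frames, which are tables whose columns are vectors. The short sketch below shows both. cars_small is a made-up example, while mtcars and iris are datasets that ship with R:

```r
numbers <- c(1, 2, 3, 4, 5)

# Arithmetic is vectorised: it applies to every element at once
doubled <- numbers * 2          # 2 4 6 8 10

# Logical indexing keeps only the elements that meet a condition
big <- numbers[numbers > 2]     # 3 4 5

# A data frame: a table whose columns are equal-length vectors
# (cars_small is hypothetical example data)
cars_small <- data.frame(
  model  = c("A", "B", "C"),
  weight = c(2.5, 3.2, 3.8),
  mpg    = c(30, 24, 19)
)
cars_small$mpg          # access one column with $
mean(cars_small$mpg)    # 24.33333

# Peek at the built-in datasets used later in this guide
head(mtcars, 3)
str(iris)
```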
Popular R Visualization Packages
R has a rich ecosystem of visualization libraries. Let’s look at the most popular ones:
1. ggplot2 (Most popular for advanced plots)
Based on the Grammar of Graphics.
Highly customizable.
Example:
library(ggplot2)
data(mpg) # Example dataset bundled with ggplot2
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point() +
theme_minimal() +
labs(title = "Engine Size vs Highway Mileage")
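Because ggplot2 follows the Grammar of Graphics, a plot is an object that you build up layer by layer with +, then print or save. A short sketch using the same mpg data. The file name and size are only examples:

```r
library(ggplot2)

# Start with the data and the aesthetic mappings (x and y)
p <- ggplot(mpg, aes(x = displ, y = hwy))

# Each `+` adds a layer or setting to the same plot object
p <- p + geom_point(aes(color = class))
p <- p + labs(title = "Engine Size vs Highway Mileage",
              x = "Engine displacement (L)", y = "Highway mileage (mpg)")

# Printing the object draws it in the Plots pane
print(p)

# Save the plot to a file (example file name; width and height in inches)
ggsave("engine_vs_mileage.pdf", plot = p, width = 6, height = 4)
```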
2. plotly (Interactive plots)
Creates interactive charts.
Great for dashboards.
library(plotly)
fig <- plot_ly(data = iris, x = ~Sepal.Length, y = ~Petal.Length,
type = "scatter", mode = "markers", color = ~Species)
fig
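plotly can also turn an existing ggplot2 chart into an interactive one with ggplotly(), so you can reuse code you already wrote. Here is a minimal sketch with the iris data:

```r
library(ggplot2)
library(plotly)

# A static ggplot2 scatter plot of the iris data
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

# Convert it into an interactive plotly widget (hover, zoom, pan)
fig <- ggplotly(p)
fig
```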
3. lattice (Multi-panel plots)
Useful for conditioning plots (one panel per category of a grouping variable).
library(lattice)
# factor(cyl) gives one labelled panel per cylinder count (4, 6, 8)
xyplot(mpg ~ wt | factor(cyl), data = mtcars,
main = "MPG vs Weight by Cylinders",
xlab = "Car Weight (1000 lbs)", ylab = "Miles per Gallon")
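Other lattice functions use the same formula-plus-conditioning syntax. A short sketch with mtcars, where am is 0 for automatic and 1 for manual transmissions. Lattice plots are drawn when printed, so print() is needed inside scripts, loops, and functions:

```r
library(lattice)

# Box plots: mileage for each number of cylinders
bp <- bwplot(mpg ~ factor(cyl), data = mtcars,
             xlab = "Cylinders", ylab = "Miles per Gallon")
print(bp)

# Histograms: one panel per transmission type
hp <- histogram(~ mpg | factor(am, labels = c("automatic", "manual")),
                data = mtcars, xlab = "Miles per Gallon")
print(hp)
```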
4. leaflet (Maps in R 🌍)
Builds interactive web maps (zoom, pan, popups) using the Leaflet JavaScript library.
library(leaflet)
leaflet() %>%
addTiles() %>%
addMarkers(lng = 77.5946, lat = 12.9716, popup = "Bangalore")
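To place many points at once, you can pass a data frame to leaflet() and refer to its columns with ~ formulas. A sketch, assuming a small hypothetical table of approximate city-centre coordinates:

```r
library(leaflet)

# Hypothetical example data: approximate city-centre coordinates
cities <- data.frame(
  name = c("Bangalore", "Mumbai", "Delhi"),
  lat  = c(12.9716, 19.0760, 28.6139),
  lng  = c(77.5946, 72.8777, 77.2090)
)

# ~column formulas are looked up in the data frame passed to leaflet()
m <- leaflet(cities) %>%
  addTiles() %>%                          # default OpenStreetMap tiles
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~name, radius = 6)
m
```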
5. Others worth exploring
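cowplot → Arranges several ggplot2 plots into one figure and adds clean, publication-style themes.
highcharter → Interactive charts built on the Highcharts JavaScript library.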
corrplot → Correlation matrix visualization.
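Here is a brief taste of two of these packages. cowplot's plot_grid() places ggplot2 plots side by side, and corrplot() draws a correlation matrix. This is only a sketch. The choice of plots and the method/type settings are illustrative:

```r
library(ggplot2)
library(cowplot)
library(corrplot)

# cowplot: two ggplot2 plots side by side, labelled A and B
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()
combined <- plot_grid(p1, p2, labels = c("A", "B"))
print(combined)

# corrplot: correlations between all 11 mtcars columns (upper triangle only)
M <- cor(mtcars)
corrplot(M, method = "circle", type = "upper")
```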
Using the Inbuilt plot() Function
Base R comes with a handy plot() function, so you can chart data without installing any packages:
plot(mtcars$wt, mtcars$mpg,
main = "Car Weight vs MPG",
xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon",
col = "blue", pch = 19)
👉 Additional Features:
pch → point shape
col → colors
type → "l" for lines, "p" for points, "b" for both
abline() → add straight lines, such as a regression line
# Adding a regression line
model <- lm(mpg ~ wt, data = mtcars)
abline(model, col = "red", lwd = 2)
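The type argument and a few other base functions cover most quick checks. A short sketch with mtcars follows. Because type = "l" or "b" connects points in the order given, the data are sorted by weight first. The colors and titles are arbitrary choices:

```r
# Sort by weight so the connecting line runs left to right
ord <- order(mtcars$wt)
plot(mtcars$wt[ord], mtcars$mpg[ord], type = "b", pch = 19, col = "darkgreen",
     main = "MPG by Weight (points joined in weight order)",
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")

# Histogram of mileage; hist() also returns the bin counts
h <- hist(mtcars$mpg, col = "lightgray",
          main = "Distribution of MPG", xlab = "Miles per Gallon")

# Box plots of mileage grouped by number of cylinders
b <- boxplot(mpg ~ cyl, data = mtcars, col = "lightblue",
             xlab = "Cylinders", ylab = "Miles per Gallon")
```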
Plotting with ggplot2
ggplot2 is the powerhouse of visualization in R.
Example: Scatter Plot with a Linear Trend Line
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3) +
# method = "lm" fits a straight line; the fixed red color gives one overall line, not one per class
geom_smooth(method = "lm", se = FALSE, color = "red") +
theme_classic() +
labs(title = "Displacement vs Highway Mileage",
x = "Engine Displacement (L)", y = "Highway Mileage (mpg)")
👉 Extras:
geom_histogram() for histograms
geom_boxplot() for boxplots
facet_wrap(~class) for multi-panel plots
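Each of these extras is one more layer on the same kind of ggplot() call. A short sketch using mpg. The binwidth of 2 is an arbitrary choice:

```r
library(ggplot2)

# Histogram: distribution of highway mileage (bins 2 mpg wide)
h <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Box plot: highway mileage for each vehicle class
b <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()

# Facets: the same scatter plot split into one panel per class
f <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class)

print(h)
print(b)
print(f)
```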
Going Further: GGally Extension
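GGally extends ggplot2 with ready-made composite plots, such as pair plots (ggpairs()), correlation heatmaps (ggcorr()), and parallel coordinate plots (ggparcoord()).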
Example: Pairwise Scatter Plots (Great for EDA 🔍)
library(GGally)
# Pair plot of iris dataset
ggpairs(iris, aes(color = Species))
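This generates a matrix of plots showing how each pair of variables relates. For numeric pairs, it shows scatter plots below the diagonal, density curves on the diagonal, and correlation coefficients above it. This is super helpful when you explore a dataset for the first time.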
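On wider datasets, ggpairs() can become slow and crowded, so it often helps to pick columns explicitly with its columns argument. For a compact overview, GGally's ggcorr() draws a correlation heatmap instead. A short sketch. The column selection and label = TRUE are illustrative choices:

```r
library(GGally)

# Pair plot of the four numeric iris columns only (column 5 is Species)
pm <- ggpairs(iris, columns = 1:4, mapping = aes(color = Species))
print(pm)

# Correlation heatmap of all mtcars columns, with coefficients printed
cm <- ggcorr(mtcars, label = TRUE)
print(cm)
```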
Wrapping Up
R + RStudio = Perfect combo for beginners and professionals in data science.
Use base R plots (plot()) for quick visual checks.
Switch to ggplot2 for polished, publication-quality visuals.
Explore plotly, lattice, leaflet for interactivity, multi-panels, and maps.
Extend with GGally for advanced exploratory analysis.
📌 Visualization is a core skill in data science, and R provides some of the best tools available.
👉 Start with small datasets (like iris, mtcars, or mpg) and keep experimenting.