R for Music Research
Meet Alex
Getting started with R and RStudio
- Write your code in an R script to be able to save it
- Run code in an R script using Command+Return on Mac, Ctrl+Return on Windows/Linux, or by pressing the Run button
- Use install.packages() to download and install a library package
- Use library() to load the downloaded package in your environment
- Use help(), help.search(), and the ? and ?? help operators to look up documentation on commands and packages
Creating a directory structure
- The term ‘directory’ in R has the same meaning as the common term ‘folder’
- Use getwd() to check your current working directory
- Establish a new working directory with setwd()
- Create new directories using dir.create("nameofnewdirectoryhere") or through the navigation pane
- Use list.files() to view all files in your working directory
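As a minimal sketch of these steps, the code below builds a small project layout inside R's temporary folder so that nothing else on your computer is changed. The folder names (music_project, data, figures) are placeholders chosen for illustration.

```r
# Remember where we started so we can return afterwards
old_wd <- getwd()

# Create a project folder inside R's temporary directory (placeholder name)
project_dir <- file.path(tempdir(), "music_project")
dir.create(project_dir, showWarnings = FALSE)

# Make the project folder the working directory
setwd(project_dir)

# Create two sub-folders, then list what the working directory contains
dir.create("data", showWarnings = FALSE)
dir.create("figures", showWarnings = FALSE)
contents <- list.files()
print(contents)

# Go back to the original working directory
setwd(old_wd)
```

In your own project you would replace tempdir() with a real location, such as a folder in your Documents.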
Reading survey data in R
- Use read.csv() to import a csv data file in R
- Use read_excel() from the readxl package to import an Excel data file in R
- Use the assignment operator <- to give a name to your data set
- Specify how to deal with missing values using the na.strings argument in read.csv() when importing a csv file
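The sketch below shows read.csv() together with na.strings. So that it runs on its own, it first writes a tiny csv file to a temporary location. The column names and values are made up for illustration. read_excel() is not shown because it needs the readxl package and an Excel file on disk.

```r
# Write a small example csv file to a temporary location (made-up data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "participant,instrument,practice_hours",
  "1,piano,5",
  "2,violin,N/A",
  "3,,2"
), csv_path)

# Import the file and name the result with <-
# Empty cells and the text "N/A" are both treated as missing values (NA)
survey <- read.csv(csv_path, na.strings = c("", "N/A"))
print(survey)
```

Without the na.strings argument, the "N/A" entry would be read as text and the whole practice_hours column would no longer be numeric.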
Inspecting your data in R
- Inspect the dimensions of a data frame using dim()
- Find out the number of rows and columns of a data frame using nrow() and ncol()
- Use colnames() and rownames() to display column and row names respectively
- Use head() to view the first six rows of a data frame
- Look at data in a specific column by using the $ operator
- Use [] to subset data from a data frame
- Use str() to inspect the internal structure of the data frame
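The inspection functions above can be tried on a small data frame. The one below is invented for illustration. Your own data would normally come from read.csv().

```r
# A small made-up survey data frame
survey <- data.frame(
  participant = 1:8,
  instrument = c("piano", "violin", "piano", "voice",
                 "guitar", "violin", "piano", "voice"),
  practice_hours = c(5, 3, NA, 2, 7, 4, 6, 1)
)

dim(survey)        # number of rows, then number of columns
nrow(survey)       # number of rows only
ncol(survey)       # number of columns only
colnames(survey)   # column names
rownames(survey)   # row names (row numbers by default)
head(survey)       # first six rows

# One column, selected by name
instruments <- survey$instrument

# Subsetting with [rows, columns]
second_row <- survey[2, ]
pianists <- survey[survey$instrument == "piano", ]

str(survey)        # type and first values of each column
```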
Cleaning your data
- Use the select() function from the dplyr package to remove unnecessary data columns
- Use the filter() function to omit rows based on a specific condition
- Use is.na() to identify missing values in the data
- Use na.omit() to exclude rows with any missing data
- Use write.csv() to save the cleaned data as a new data file
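A possible cleaning workflow is sketched below. It assumes the dplyr package is already installed. The data frame, its column names, and the age cut-off are invented for illustration.

```r
library(dplyr)

# A small made-up survey data frame
survey <- data.frame(
  participant = 1:5,
  comments = c("none", "none", "late", "none", "none"),
  age = c(25, 17, 31, NA, 42),
  practice_hours = c(5, 3, 2, 4, NA)
)

# Remove a column that is not needed for the analysis
survey_small <- select(survey, -comments)

# Keep adult participants only; rows where age is NA are dropped as well
adults <- filter(survey_small, age >= 18)

# Count the missing values in each column
missing_per_column <- colSums(is.na(survey_small))
print(missing_per_column)

# Exclude every row that has at least one missing value
survey_clean <- na.omit(survey_small)

# Save the cleaned data as a new csv file (here in a temporary folder)
out_path <- file.path(tempdir(), "survey_clean.csv")
write.csv(survey_clean, out_path, row.names = FALSE)
```

Setting row.names = FALSE stops write.csv() from adding an extra column of row numbers to the saved file.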
Analysing survey data
- Use mean() and sd() to calculate the mean and standard deviation of a variable
- Use min() to identify the minimum value of a particular variable
- Use max() to identify the maximum value of a particular variable
- Include the na.rm = TRUE argument in functions when possible for the calculation to ignore missing values in the data
- Use filter() to subset your data by a specific variable
- Use the t.test() function with the syntax t.test(DependentVariable ~ IndependentVariable, data) to test whether the means of two groups are statistically different
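The summary functions and the t-test can be sketched with base R alone. The groups and scores below are invented for illustration and are not results from any real survey.

```r
# Made-up scores for two groups, with one missing value
survey <- data.frame(
  group = rep(c("musician", "non_musician"), each = 5),
  score = c(82, 90, 77, NA, 88, 70, 65, 74, 68, 72)
)

# Without na.rm = TRUE the result is NA because one score is missing
mean_with_na <- mean(survey$score)

# With na.rm = TRUE the missing value is ignored
mean_score <- mean(survey$score, na.rm = TRUE)
sd_score <- sd(survey$score, na.rm = TRUE)
min_score <- min(survey$score, na.rm = TRUE)
max_score <- max(survey$score, na.rm = TRUE)

# Compare the mean score of the two groups
# Formula: dependent variable ~ independent (grouping) variable
result <- t.test(score ~ group, data = survey)
print(result)
```

The printed result includes the t statistic, the degrees of freedom, the p-value, and the mean of each group.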
Visualising survey data with ggplot2
- A ggplot has 3 main components: data, aesthetics, and geom
- A ggplot may be customised by adding layers of elements
- Use geom_point() to create a scatterplot, geom_boxplot() for a boxplot, and geom_bar() for bar graphs
- Use facet_wrap(~variable) to create separate plots simultaneously based on the unique values of a variable
- Give your plot a title with ggtitle('title here') and label your axes with ylab() and xlab()
- Save your plot with ggsave()
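The sketch below puts these pieces together. It assumes the ggplot2 package is already installed. The data, labels, and file name are invented for illustration.

```r
library(ggplot2)

# Made-up data: weekly practice hours and a test score for two groups
survey <- data.frame(
  group = rep(c("musician", "non_musician"), each = 5),
  practice_hours = c(6, 8, 5, 7, 9, 1, 2, 0, 3, 1),
  score = c(82, 90, 77, 85, 88, 70, 65, 74, 68, 72)
)

# Data and aesthetics go in ggplot(); geoms and labels are added as layers
p <- ggplot(data = survey, aes(x = practice_hours, y = score)) +
  geom_point() +                        # scatterplot
  facet_wrap(~group) +                  # one panel per group
  ggtitle("Score by weekly practice") + # plot title
  xlab("Practice hours per week") +     # x-axis label
  ylab("Test score")                    # y-axis label

# Save the plot (here as a PDF in a temporary folder; size is in inches)
plot_path <- file.path(tempdir(), "score_by_practice.pdf")
ggsave(plot_path, plot = p, width = 6, height = 4)
```

Swapping geom_point() for geom_boxplot() with aes(x = group, y = score) would give a boxplot of the same data.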
A second case study for music research
- An advantage of using R to work with data is that the same code can be run for different data sets of different sizes, subject to the data sets being in similar formats
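One way to illustrate this point is to wrap the analysis in a function and run it on two data sets of different sizes. The function name summarise_scores and both data frames are hypothetical. The only requirement is that each data set has a column called score.

```r
# A hypothetical helper that summarises the score column of any data frame
summarise_scores <- function(data) {
  c(
    n = nrow(data),
    mean = mean(data$score, na.rm = TRUE),
    sd = sd(data$score, na.rm = TRUE)
  )
}

# Two made-up data sets of different sizes but the same format
small_study <- data.frame(score = c(70, 80, 90))
large_study <- data.frame(score = c(60, 65, NA, 75, 80, 85, 90))

# The same code runs unchanged on both
small_summary <- summarise_scores(small_study)
large_summary <- summarise_scores(large_study)
print(small_summary)
print(large_summary)
```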
Some best practices when writing code in R
- Keep your files organised in your working directory
- Consider your working directory and what is required to reproduce your code (e.g., packages)
- Be consistent in your naming conventions
- Use # to add comments to your code