Content from Introduction
Last updated on 2024-02-29
Overview
Questions
- What is the scope and purpose of the lesson?
Objectives
- Understand the scope and purpose of the lesson.
Introduction
This is a lesson created via The Carpentries Workbench. It is written in Pandoc-flavored Markdown for static files and R Markdown for dynamic files that can render code into output. Please refer to the Introduction to The Carpentries Workbench for full documentation.
It is a lesson introducing command-line basics and the fundamentals of web data for music researchers.
Key Points
- This lesson introduces the basics of working on the command line alongside some fundamentals of working with web data.
- It is in a “pre-alpha” stage and is a work in progress.
Content from Navigating the Filesystem
Last updated on 2023-11-28
Overview
Questions
- How can I move around on my computer?
- How can I see what files and directories I have?
- How can I specify the location of a file or directory on my computer?
Objectives
- Understand the filesystem tree and explain how to navigate between directories
- Explain the function of the commands ls, pwd and cd
- Demonstrate how to view and move around inside a file hierarchy
- Explain the similarities and differences between a file and a directory.
- Translate an absolute path into a relative path and vice versa.
- Construct absolute and relative paths that identify specific files and directories.
- Use options and arguments to change the behaviour of a shell command.
- Demonstrate the use of tab completion and explain its advantages.
The part of the operating system responsible for managing files and directories is called the file system. It organizes our data into files, which hold information, and directories (also called ‘folders’), which hold files or other directories.
Several commands are frequently used to create, inspect, rename, and delete files and directories. To start exploring them, we’ll go to our open shell window.
First, let’s find out where we are by running a command called pwd (which stands for ‘print working directory’).
Directories are like places: at any time while we are using the shell, we are in exactly one place called our current working directory. Commands mostly read and write files in the current working directory, i.e. ‘here’, so knowing where you are before running a command is important. pwd shows you where you are:
OUTPUT
/Users/nelle
Here, the computer’s response is /Users/nelle, which is Nelle’s home directory:
Home Directory Variation
The home directory path will look different on different operating systems. On Mac, it is /Users/nelle; on Linux, /home/nelle; and on Windows, it will be similar to C:\Documents and Settings\nelle or C:\Users\nelle. (Note that it may look slightly different for different versions of Windows.) In future examples, we’ve used Mac output as the default; Linux and Windows output may differ slightly but should be generally similar.
We will also assume that your pwd command returns your user’s home directory. If pwd returns something different, you may need to navigate there using cd, or some commands in this lesson will not work as written. See Exploring Other Directories for more details on the cd command.
To understand what a ‘home directory’ is, let’s have a look at how the file system as a whole is organized. For the sake of this example, we’ll be illustrating the filesystem on our scientist Nelle’s computer. After this illustration, you’ll be learning commands to explore your own filesystem, which will be constructed in a similar way, but will not be exactly identical.
On Nelle’s computer, the filesystem looks like this:
The filesystem looks like an upside down tree. The topmost directory is the root directory that holds everything else. We refer to it using a slash character, /, on its own; this character is the leading slash in /Users/nelle.
Inside that directory are several other directories: bin (which is where some built-in programs are stored), data (for miscellaneous data files), Users (where users’ personal directories are located), tmp (for temporary files that don’t need to be stored long-term), and so on.
We know that our current working directory /Users/nelle is stored inside /Users because /Users is the first part of its name. Similarly, we know that /Users is stored inside the root directory / because its name begins with /.
Slashes
Notice that there are two meanings for the / character. When it appears at the front of a file or directory name, it refers to the root directory. When it appears inside a path, it’s just a separator.
Underneath /Users, we find one directory for each user with an account on Nelle’s machine, including her colleagues imhotep and larry.
The user imhotep’s files are stored in /Users/imhotep, user larry’s in /Users/larry, and Nelle’s in /Users/nelle. Nelle is the user in our examples here; therefore, we get /Users/nelle as our home directory. Typically, when you open a new command prompt, you will be in your home directory to start.
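The layout described above can be rebuilt in a throwaway directory to try out pwd and cd without touching real files. This is a minimal sketch: the temporary location created with mktemp is an assumption made for illustration, so the paths printed will begin with that temporary directory rather than with /. The special name .. refers to the parent of the current directory.

```shell
# Build a small copy of the tree described above inside a temporary directory.
# (Using mktemp for the location is an assumption made for this sketch.)
root=$(mktemp -d)
mkdir -p "$root/Users/nelle" "$root/Users/imhotep" "$root/Users/larry"

# Move using a path spelled out in full from the top of the sketch tree.
cd "$root/Users/nelle"
pwd

# Move using a relative path: up one level to Users, then down into larry.
cd ../larry
pwd
```

The first cd would work from anywhere, because its path is spelled out in full; the second only works because of where we were standing when we ran it.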
Now let’s learn the command that will let us see the contents of our own filesystem. We can see what’s in our home directory by running ls:
OUTPUT
Applications Documents Library Music Public
Desktop Downloads Movies Pictures
(Again, your results may be slightly different depending on your operating system and how you have customized your filesystem.)
ls prints the names of the files and directories in the current directory. We can make its output more comprehensible by using the -F option, which tells ls to classify the output by adding a marker to file and directory names to indicate what they are:
- a trailing / indicates that this is a directory
- @ indicates a link
- * indicates an executable
Depending on your shell’s default settings, the shell might also use colors to indicate whether each entry is a file or directory.
OUTPUT
Applications/ Documents/ Library/ Music/ Public/
Desktop/ Downloads/ Movies/ Pictures/
Here, we can see that the home directory contains only sub-directories. Any names in the output that don’t have a classification symbol are files in the current working directory.
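To see each marker in action, the sketch below creates one entry of each kind in a throwaway directory and lists it with ls -F. The names scores, notes.txt, play.sh, and latest are hypothetical, as is the temporary location.

```shell
# Work in a temporary directory so nothing real is touched (an assumption
# made for this sketch).
demo=$(mktemp -d)
cd "$demo"

mkdir scores              # a directory: listed with a trailing /
touch notes.txt           # a plain file: listed with no marker
touch play.sh
chmod +x play.sh          # an executable file: listed with a trailing *
ln -s notes.txt latest    # a symbolic link: listed with a trailing @

ls -F
```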
Clearing your terminal
If your screen gets too cluttered, you can clear your terminal using the clear command. You can still access previous commands using ↑ and ↓ to move line-by-line, or by scrolling in your terminal.
Content from Creating files and directories
Last updated on 2024-02-28
Overview
Questions
- How do I create, delete, copy, and move files and directories on my computer?
- How can I identify the location of files on my computer?
Objectives
- Create a directory hierarchy that matches a given diagram
- Create empty files in the filesystem at a given location
- Delete specified files and/or directories
- Copy and move files and/or folders from and to a specified location
Working with files and folders
As well as navigating directories, we can interact with files on the command line: we can read them, open them, run them, and even edit them. In fact, there’s really no limit to what we can do in the shell, but even experienced shell users still switch to graphical user interfaces (GUIs) for many tasks, such as editing formatted text documents (Word or OpenOffice), browsing the web, converting sound files from one format to another, etc. But if we wanted to convert hundreds of music tracks, say, then we could automate that conversion work by using shell commands.
Before getting started, we will use ls to list the contents of our current directory. Running ls periodically to see what is there is a useful way to orient yourself.
Copying and moving files into subdirectories
Start with the following filesystem hierarchy:
/Users/jamie/data
- chords.json
Assume your username is jamie. Reorder the following commands
cp recombined/chords.json ../chords-backup.json
cd ~/data
mv chords.json recombined/
mkdir recombined
so that when you execute the following commands, they return the output as shown:
cd ~/data
mkdir recombined
mv chords.json recombined/
cp recombined/chords.json ../chords-backup.json
Try another way
Can you think of another way to achieve the same effect?
There are lots.
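One of them, sketched in a throwaway directory rather than in a real home directory (the sandbox location is an assumption made for illustration): make the backup copy first, then create the subdirectory and move the original into it.

```shell
# Recreate the starting hierarchy (a data directory holding chords.json)
# inside a temporary sandbox instead of /Users/jamie.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/data"
touch "$sandbox/data/chords.json"

cd "$sandbox/data"
cp chords.json ../chords-backup.json   # back up one level above data
mkdir recombined                       # the target directory must exist before mv
mv chords.json recombined/             # move the original into it

# Check the result: the backup sits beside data, the original inside recombined.
ls -F "$sandbox" "$sandbox/data/recombined"
```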
Using history
Use the history command to see a list of all the commands you’ve entered during the current session. You can also use Ctrl + r to do a reverse lookup. Press Ctrl + r, then start typing any part of the command you’re looking for. The past command will autocomplete. Press Enter to run the command again, or press the arrow keys to start editing the command. If multiple past commands contain the text you input, you can press Ctrl + r repeatedly to cycle through them. If you can’t find what you’re looking for in the reverse lookup, use Ctrl + c to return to the prompt.
If you want to save your history, perhaps to extract some commands from which to build a script later on, you can do that with history > history.txt. This will write all history to a text file called history.txt that you can later edit.
To recall a command from history, enter history and note the command number, e.g. 2045. Then enter !2045 to execute that command again.
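Once the history has been saved, the file can be filtered like any other text file. The sketch below is one possible approach, not part of this episode: it assumes a saved history.txt with the made-up contents shown, and it uses grep, which has not been introduced here, to keep only the lines that mention mv.

```shell
# Work in a temporary directory (an assumption made for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the file produced by: history > history.txt
# The numbers and commands below are invented for illustration.
cat > history.txt <<'EOF'
 2043  cd ~/data
 2044  mkdir recombined
 2045  mv chords.json recombined/
 2046  ls -F
EOF

# Print only the lines containing "mv ", e.g. as raw material for a script.
grep 'mv ' history.txt
```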
Key Points
- cp copies data from one location (a source) to another (a target)
- cp takes its source(s) and target as arguments
- mkdir can be used to create directories
- mv can be used to move data from one location to another, and is similar to copying followed by deletion
- cp and mv modify your files, and can lead to data loss
Content from Introduction to Web technologies
Last updated on 2024-02-29
Overview
Questions
- How do browsers retrieve and display websites from remote servers?
- How and where are the structure and content of a web page located in code?
- How do I specify the location of a website or other resource on a remote server?
Objectives
- To know that HTTP is used to request remote resources by browsers and by other applications
- To understand that browsers make multiple HTTP requests when loading most modern websites
- To understand the important components of a URL
- To introduce the browser’s developer tools as a resource for researchers
What happens when we use a web browser?
The majority of users access web resources by using browsers, such as Google Chrome, Mozilla Firefox, and Apple’s Safari, to view and interact with websites. Remote computers deliver the content of websites, via the internet, to the user’s computer following a request from the browser. In this course, we don’t have time to describe the details of how this content is delivered. However, it is important to know the following terminology:
- server: a computer connected to the internet that provides content on request
- client: a computer connected to the internet that requests and receives content
Usually, the user deliberately makes these requests from the browser either by directly typing in the location of the content that is being requested into the browser’s navigation bar, or, more frequently, by clicking on hyperlinks, or links. The browser then uses a set of rules (shared by the client and the server) to initiate the request, and handle the response.
A set of rules regulating the communication between two parties is called a protocol. The protocol most commonly used on the web has been specified and revised over the several decades of its existence and is called HTTP, which stands for HyperText Transfer Protocol. HTTP provides a set of formal guidelines about how Web content is published and made available on a server, and how a client should make and handle requests for this content. It is a relatively straightforward protocol, using a set of “verbs” that specify the actions that the client wishes to communicate to the server. One such verb is GET, which is why this appears in the diagram above.
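To make the idea of a verb concrete, the sketch below prints (but does not send) the text of a minimal HTTP/1.1 GET request. The host rte.ie and the path / are chosen here only for illustration; a real browser sends several additional header lines.

```shell
host="rte.ie"   # illustrative host, not prescribed by the lesson
path="/"        # the resource being requested on that host

# The first line names the verb, the path, and the protocol version.
request_line="GET $path HTTP/1.1"

# Each line ends with a carriage return and a newline; a blank line ends the request.
printf '%s\r\nHost: %s\r\n\r\n' "$request_line" "$host"
```

Nothing here touches the network: the point is only that a request is a short piece of structured text that begins with the verb.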
The set of rules or patterns that specify the form of the data that is passed between two parties is called a file format. You are likely familiar with a selection of file formats already. For example, the difference between a .doc file and a .pdf file of the same final-year essay is primarily a difference of format. One file format that is commonly requested by and delivered to web users is HyperText Markup Language, or HTML (.html file extension).
We will look more closely at the HTML format in a subsequent episode. For now, you can think of the HTML format as providing a way to specify (a) the content of a document (i.e. what text, images, and other media, if any, it should consist of) alongside (b) the logical structure of the document (i.e. any hierarchical aspects that structure these contents relative to each other, such as a title, headings, and subheadings, and/or other part-whole relationships).
What is a URL?
A URL can be thought of as a path to a resource on a remote server, which is nothing more than the location of data on someone else’s computer. A “resource” may be a web page, a media file, an entire site, a folder, or even a more complicated interactive experience. The following are examples of URLs:
- http://rte.ie
- https://www.eamonnbell.com/blog/2020/11/15/ams-broken-light-2020/
- https://duckduckgo.com/?q=discogs.com&ia=web
- https://www.discogs.com/sell/item/1969451180
URLs have some obvious similarities with paths. For example, the forward slash character (/) is used as a separator. This helps keep web resources organised, but can also help researchers, as we will see in later episodes. However, there is no guarantee that the structure of the URL corresponds directly to the layout of particular directories on the remote server.
There are also some additional elements to URLs that distinguish them from paths. Let’s have a look at the anatomy of a URL:
- scheme: tells what protocol should be used to interact with the remote resource
- domain name: points us to the location of the remote server in human-readable form
- port: an integer that is used to further specify where to look on the remote server
- path: where the remote resource is located within the organisational structure of the server
- parameters: a list of key-value pairs, separated by the & symbol, which modify the resource returned in some way
- anchor: this optional part of a URL can be used to link directly to some component of the file being requested, typically a header or subheader of an HTML document
Challenge 1: Dissecting a URL
Study the following URL
and post your answer to the following questions:
- What is the scheme?
- How many domains are in the URL?
- Is there a port specified in this URL?
- What is the value associated with the second key in URL parameters?
- https (Secure HTTP/TLS)
- One: duckduckgo.com. (discogs.com is in the URL parameters; arbitrary strings may appear here, potentially even including whole URLs)
- There is not. However, different schemes have default ports associated with them which will be used by applications when no port is specified explicitly.
- web
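A dissection like this can also be sketched with the shell’s built-in text trimming, using one of the example URLs listed earlier in this episode. This is a rough illustration rather than a full URL parser: it assumes the URL has a scheme, a path, and parameters, and no explicit port or anchor. The default port numbers used (80 for http, 443 for https) are the standard ones.

```shell
url='https://duckduckgo.com/?q=discogs.com&ia=web'

scheme=${url%%://*}                # everything before ://
rest=${url#*://}                   # everything after ://
host=${rest%%/*}                   # up to the first /
path_and_query=/${rest#*/}         # from the first / onwards
path=${path_and_query%%[?]*}       # before the ?
query=${path_and_query#*[?]}       # after the ?

# No port is written in this URL, so fall back on the scheme's default.
case $scheme in
  http)  port=80 ;;
  https) port=443 ;;
  *)     port=unknown ;;
esac

echo "scheme: $scheme"
echo "host:   $host"
echo "port:   $port (default for the scheme)"
echo "path:   $path"

# Parameters are separated by &; print one key=value pair per line.
printf '%s\n' "$query" | tr '&' '\n'

# Value associated with the second key.
second=$(printf '%s\n' "$query" | tr '&' '\n' | sed -n '2p')
value=${second#*=}
echo "second value: $value"
```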
Content vs. styling: HTML and friends
Web browsers are little more than glorified document fetchers and viewers for the web. The main reason that browsers do not seem quite like this is that these documents have become highly interactive, and that web designers have moved to a mode of designing websites and web applications so that they increasingly resemble the kinds of graphical applications that run on your desktop computer (with icons, menus, toolbars, animations/transitions, etc.). The earliest websites were much less interactive. Yet, no matter how complicated or simple the design of a website, every single piece of content in a web page, starting with textual content and the logical structure of the page itself, must be fetched by the browser. We have already learned that the format used to encode this content is called HTML, and is associated with the file extension .html.
Next, we consider the overall look and feel or “style” of the website. Aspects of styling may include the size, color, and font choices for the text but also other considerations such as the relative or absolute layout of key page elements and their size. This information is conventionally transferred from the server to the browser as a separate file or set of files, in a format called Cascading Style Sheets (or, CSS), and is associated with the .css file extension.
In order for the browser to present the web resource as intended, another HTTP request is required to retrieve the appropriate CSS file(s). Since HTTP makes use of URLs to locate and retrieve resources, there will be a URL associated with each of these files.
The web is a multimedia platform, and one of the earliest media types to be supported by browsers was the image. As you may know, images may be stored in a variety of formats (e.g. GIF, JPEG, WebP), and there is therefore a variety of extensions associated with them (i.e. .gif, .jpg or .jpeg, .webp). Again, for each image, a new HTTP request is typically required. Hence, each image will likely be associated with its own URL. You may notice that the URLs for resources, such as images, do not necessarily contain the same domain name as the domain name of the site you are visiting.
Increasingly, designers are keen to ensure that websites and related resources are interactive and dynamic, and this requires the transfer of content in yet another format: JavaScript. JavaScript is a flexible, general-purpose programming language that is executed (more or less) entirely within the browser and allows web developers to create extremely rich, interactive modifications to the document content on the fly. It is commonly stored in files with the .js extension, which are requested by the browser, again using HTTP.
Once the assorted files have been requested by the client, which in this case is the web browser, they are assembled and interpreted to provide the total experience of the page. The specific details of each of these file formats are not relevant to this lesson; the key takeaway is that most web pages in fact decompose into multiple parts, each of which is associated with a single HTTP request. Understanding this fact allows us to begin to pick apart web resources into their constituent parts, some of which are more or less usable for research purposes.
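As a rough illustration of how one page fans out into several requests, the sketch below writes a tiny, hypothetical HTML file and lists the other files it refers to. The file names are invented, and matching href and src attributes with grep is a simplification, not how a browser actually reads HTML.

```shell
# Temporary location for the made-up page (an assumption for this sketch).
workdir=$(mktemp -d)

# A made-up page that refers to one stylesheet, one script, and one image.
cat > "$workdir/index.html" <<'EOF'
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <script src="app.js"></script>
  </head>
  <body>
    <img src="cover.jpg">
  </body>
</html>
EOF

# Each match is a resource the browser would fetch with its own HTTP request,
# in addition to the request for index.html itself.
grep -o -E '(href|src)="[^"]*"' "$workdir/index.html"
```

Three references plus the document itself makes four requests for even this very small page.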
Developer Tools and the complexity of the modern web
To understand precisely which HTTP requests a browser makes, and how many, in the course of requesting a single web page, we can use the built-in Developer Tools of most modern browsers to inspect all the network activity for a single page load.
Most modern browsers include a set of tools called “developer tools”, which are used by the programmers who create websites and other interactive experiences to debug and assess the performance of their creations, among other things. However, they are an extraordinarily useful asset for researchers, and are a useful way to get started thinking about what’s “under the hood” of the web. These are almost certainly available in the browser you’ve already installed.
Developer tools in different browsers
Different browsers (and different operating systems) expose developer tools in different ways. Here is a quick guide. Some of the details of what follows will vary depending on your specific browser. To follow along exactly, it is recommended to use Google Chrome.
Microsoft Edge
- Click on the three-dot menu in the top right corner.
- Hover over More Tools.
- Click on Developer Tools.
Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.
Firefox
- Click on the three-line menu in the top right corner.
- Click on Web Developer.
- Click on Toggle Tools.
Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.
Google Chrome
- Click on the three-dot menu in the top right corner.
- Hover over More Tools.
- Click on Developer Tools.
Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.
Safari (on Mac)
- Click on Safari in the top left corner of the screen.
- Click on Preferences.
- Go to the Advanced tab.
- Check the box at the bottom that says Show Develop menu in menu bar.
- Close the Preferences window. The Develop menu will now appear in the menu bar.
- Click on Develop in the menu bar.
- Click on Show Web Inspector.
Or simply use the shortcut Cmd+Option+I on your keyboard.
To inspect the network activity for a page load:
- First, navigate to a site of your choice
- Then, open Developer Tools using the three-dot menu in the top right (Windows/Linux) or under the menu View > Developer… (macOS), or use the shortcut Ctrl+Shift+I (Windows/Linux) or Cmd+Option+I (macOS)
- A new pane will pop open; navigate to the “Network” pane.
- Ensure that network activity is being recorded by clicking the red “record” button in the top left of the pane
- Refresh the page (shortcut: F5)
- Once the page is finished loading, disable recording
Something like the image shown here will result:
The colored bars at the top of the screen show each individual HTTP request graphically (the “waterfall”); the length of each bar gives the time taken for that request to be fulfilled. The different colors indicate the status of the request over time. Notice also the statistics at the end of the file list: the total number of requests made, the total amount of data transferred, and the total load time for the site.
To dig into a particular request, select it from the list and double click on it. It will open a new pane as below. Look at the very first request in the list, the request for the document itself: index.html. From this we’ll see a lot more information about the request, including the HTTP status code associated with the response, and the IP address of the responding server. We also see the request method, the HTTP “verb” that was used to fetch the resource. Almost all browser-initiated requests will use the GET verb, but others are available (POST is another verb and is used to submit form data, and sometimes to call APIs; see later episodes).
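The status code shown in this pane is a three-digit number whose first digit indicates the general outcome of the request. A small sketch of that grouping follows; the function name describe_status is hypothetical.

```shell
# Map an HTTP status code to its general class, based on the first digit.
describe_status() {
  case $1 in
    1??) echo "informational" ;;
    2??) echo "success" ;;
    3??) echo "redirection" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "not a standard HTTP status code" ;;
  esac
}

describe_status 200   # the usual code for a page that loaded normally
describe_status 404   # the code returned when a resource is not found
```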
Under the Timing pane, you’ll see a more fine-grained look at the time that each request took and what the colors stand for.
Challenge 2: The feel of the web
- Pick a website that you regularly consult for research purposes; any website will do
- Visit any page on this website
- Open the developer tools in your browser and navigate to the Network tab (the name may vary). Press the refresh button (or F5)
Post your answer to the following questions:
- The URL of the page or resource on the site that you visited
- A URL pointing to a CSS file (ending .css) used by this page
- A URL pointing to a JavaScript file (ending .js) used by this page
- A URL pointing to an image file (many extensions possible) used by this page
- A URL pointing to some other file (many extensions) used by this page
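When collecting answers, the URLs copied from the Network tab can be sorted roughly by file extension. The sketch below is one way to do that; the function name classify and the example.org URLs are hypothetical, and an extension is only a hint about what a server actually returns.

```shell
# Guess the kind of resource from the extension at the end of a URL.
classify() {
  # Ignore any parameters after a ? before looking at the extension.
  case "${1%%[?]*}" in
    *.css)                     echo "stylesheet" ;;
    *.js)                      echo "script" ;;
    *.gif|*.jpg|*.jpeg|*.webp) echo "image" ;;
    *.html)                    echo "document" ;;
    *)                         echo "other" ;;
  esac
}

classify "https://example.org/static/site.css"
classify "https://example.org/img/cover.jpg?width=300"
classify "https://example.org/fonts/serif.woff2"
```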
Absolute vs. relative URLs
Just like with paths, there are absolute URLs and relative URLs. This distinction is not relevant for the rest of this lesson, but if you would like to learn more about this, please consult the MDN Web Docs page, “What is a URL?”.
Key Points
- Web servers provide remote resources to clients, most commonly browsers, using the HTTP protocol
- URLs are the “addresses” of the web, and they specify the location of a remote resource for the purposes of retrieval
- Most websites today consist of resources of a variety of file formats, and each remote resource usually demands its own HTTP request and has its own URL associated with it
- We can inspect the torrent of HTTP requests that websites require by using the Developer Tools built into most modern browsers