Introduction to the Command Line and Web Data Fundamentals for Music Researchers: All in One View

Last updated on 2024-02-29

Overview

Questions

What is the scope and purpose of the lesson?

Objectives

Understand the scope and purpose of the lesson.

Introduction

This is a lesson created via The Carpentries Workbench. It is written in Pandoc-flavored Markdown for static files and R Markdown for dynamic files that can render code into output. Please refer to the Introduction to The Carpentries Workbench for full documentation.

It is a lesson introducing command-line basics and the fundamentals of web data for music researchers.

Key Points

This lesson introduces the basics of working on the command line alongside some fundamentals of working with web data
It is at a “pre-alpha” stage and is a work in progress.

Content from Navigating the Filesystem

Last updated on 2023-11-28

Overview

Questions

How can I move around on my computer?
How can I see what files and directories I have?
How can I specify the location of a file or directory on my computer?

Objectives

Understand the filesystem tree and explain how to navigate between directories
Explain the function of the commands ls, pwd and cd
Demonstrate how to view and move around inside a file hierarchy
Explain the similarities and differences between a file and a directory.
Translate an absolute path into a relative path and vice versa.
Construct absolute and relative paths that identify specific files and directories.
Use options and arguments to change the behaviour of a shell command.
Demonstrate the use of tab completion and explain its advantages.

The part of the operating system responsible for managing files and directories is called the file system. It organizes our data into files, which hold information, and directories (also called ‘folders’), which hold files or other directories.

Several commands are frequently used to create, inspect, rename, and delete files and directories. To start exploring them, we’ll go to our open shell window.

First, let’s find out where we are by running a command called pwd (which stands for ‘print working directory’). Directories are like places — at any time while we are using the shell, we are in exactly one place called our current working directory. Commands mostly read and write files in the current working directory, i.e. ‘here’, so knowing where you are before running a command is important. pwd shows you where you are:

BASH

$ pwd

OUTPUT

/Users/nelle

Here, the computer’s response is /Users/nelle, which is Nelle’s home directory.

Home Directory Variation

The home directory path will look different on different operating systems. On Mac, it is /Users/nelle, on Linux, /home/nelle, and on Windows, it will be similar to C:\Documents and Settings\nelle or C:\Users\nelle. (Note that it may look slightly different for different versions of Windows.) In future examples, we’ve used Mac output as the default - Linux and Windows output may differ slightly but should be generally similar.

We will also assume that your pwd command returns your user’s home directory. If pwd returns something different, you may need to navigate there using cd or some commands in this lesson will not work as written. See Exploring Other Directories for more details on the cd command.

To understand what a ‘home directory’ is, let’s have a look at how the file system as a whole is organized. For the sake of this example, we’ll be illustrating the filesystem on our scientist Nelle’s computer. After this illustration, you’ll be learning commands to explore your own filesystem, which will be constructed in a similar way, but not be exactly identical.

On Nelle’s computer, the filesystem looks like this:

The file system is made up of a root directory that contains sub-directories titled bin, data, Users, and tmp

The filesystem looks like an upside down tree. The topmost directory is the root directory that holds everything else. We refer to it using a slash character, /, on its own; this character is the leading slash in /Users/nelle.

Inside that directory are several other directories: bin (which is where some built-in programs are stored), data (for miscellaneous data files), Users (where users’ personal directories are located), tmp (for temporary files that don’t need to be stored long-term), and so on.

We know that our current working directory /Users/nelle is stored inside /Users because /Users is the first part of its name. Similarly, we know that /Users is stored inside the root directory / because its name begins with /.

Slashes

Notice that there are two meanings for the / character. When it appears at the front of a file or directory name, it refers to the root directory. When it appears inside a path, it’s just a separator.
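To see both roles of the slash in practice, the sketch below takes a path apart with the standard dirname and basename utilities. The path /Users/nelle is the one from the example above; it does not need to exist on your machine, because these tools only work on the text of the path.

```shell
path=/Users/nelle

# The last component of the path (everything after the final separator)
name=$(basename "$path")
echo "$name"

# The directory that contains it (everything before the final separator)
parent=$(dirname "$path")
echo "$parent"

# Repeating the step takes us up to the root directory, written as a lone slash
root=$(dirname "$parent")
echo "$root"
```

This prints nelle, /Users and / in turn: the inner slash acts as a separator, while the leading slash on its own names the root directory.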

Underneath /Users, we find one directory for each user with an account on Nelle’s machine: Nelle herself and her colleagues imhotep and larry.

Like other directories, home directories are sub-directories underneath "/Users" like "/Users/imhotep", "/Users/larry" or "/Users/nelle"

The user imhotep’s files are stored in /Users/imhotep, user larry’s in /Users/larry, and Nelle’s in /Users/nelle. Nelle is the user in our examples here; therefore, we get /Users/nelle as our home directory. Typically, when you open a new command prompt, you will be in your home directory to start.

Now let’s learn the command that will let us see the contents of our own filesystem. We can see what’s in our home directory by running ls:

BASH

$ ls

OUTPUT

Applications Documents    Library      Music        Public
Desktop      Downloads    Movies       Pictures

(Again, your results may be slightly different depending on your operating system and how you have customized your filesystem.)

ls prints the names of the files and directories in the current directory. We can make its output more comprehensible by using the -F option which tells ls to classify the output by adding a marker to file and directory names to indicate what they are:

a trailing / indicates that this is a directory
@ indicates a link
* indicates an executable

Depending on your shell’s default settings, the shell might also use colors to indicate whether each entry is a file or directory.

BASH

$ ls -F

OUTPUT

Applications/ Documents/    Library/      Music/        Public/
Desktop/      Downloads/    Movies/       Pictures/

Here, we can see that the home directory contains only sub-directories. Any names in the output that don’t have a classification symbol are files in the current working directory.
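To see all three markers at once, you could build a small scratch directory and list it. This is only a sketch: the file names are made up for illustration, and mktemp -d is assumed to be available for creating a throwaway directory (it is on most Linux and macOS systems).

```shell
# Create a throwaway directory so nothing in your home directory is touched
demo=$(mktemp -d)
cd "$demo"

mkdir scores                   # a directory
touch notes.txt                # a regular file
touch convert.sh
chmod +x convert.sh            # an executable file
ln -s notes.txt latest-notes   # a link pointing at notes.txt

# -F appends / to the directory, * to the executable and @ to the link;
# notes.txt is a plain file, so it gets no marker
listing=$(ls -F)
echo "$listing"
```

The order of the entries may differ between systems, but each name should carry the marker described above.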

Clearing your terminal

If your screen gets too cluttered, you can clear your terminal using the clear command. You can still access previous commands using ↑ and ↓ to move line-by-line, or by scrolling in your terminal.

Content from Creating files and directories

Last updated on 2024-02-28

Overview

Questions

How do I create, delete, copy, and move files and directories on my computer?
How can I identify the location of files on my computer?

Objectives

Create a directory hierarchy that matches a given diagram
Create empty files in the filesystem at a given location
Delete specified files and/or directories
Copy and move files and/or folders from and to a specified location

Working with files and folders

As well as navigating directories, we can interact with files on the command line: we can read them, open them, run them, and even edit them. In fact, there’s really no limit to what we can do in the shell, but even experienced shell users still switch to graphical user interfaces (GUIs) for many tasks, such as editing formatted text documents (Word or OpenOffice), browsing the web, or converting a sound file from one format to another. But if we wanted to convert hundreds of music tracks, say, then we could automate that conversion work by using shell commands.

Before getting started, we will use ls to list the contents of our current directory. Running ls periodically is a useful way to orient yourself and see what you have to work with.
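As a rough illustration of that kind of automation, the sketch below loops over every .wav file in a directory and prints the conversion command that would be run for each one. It assumes the ffmpeg tool as the converter, which is not part of this lesson and may not be installed on your machine. The file names are invented, and echo is used so that the commands are only displayed, not executed.

```shell
# Work in a throwaway directory containing three empty stand-in "tracks"
demo=$(mktemp -d)
cd "$demo"
touch track01.wav track02.wav track03.wav

# For each .wav file, build the name of the matching .mp3 file
# and print (rather than run) the command that would convert it
for track in *.wav
do
    target="${track%.wav}.mp3"
    echo ffmpeg -i "$track" "$target"
done > commands.txt

cat commands.txt
```

The same loop works unchanged whether the directory holds three tracks or three hundred, which is where the shell starts to save time over a graphical tool.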

BASH

$ ls

Copying and moving files into subdirectories

Start with the following filesystem hierarchy:

/Users/jamie/data
    - chords.json

Assume your username is jamie. Reorder the following commands

cp recombined/chords.json ../chords-backup.json
cd ~/data
mv chords.json recombined/
mkdir recombined

so that when you execute the following commands, they return the output as shown:

BASH

$ ls
recombined

$ ls recombined
chords.json

Show me the solution

cd ~/data
mkdir recombined
mv chords.json recombined/
cp recombined/chords.json ../chords-backup.json

Try another way

Can you think of another way to achieve the same effect?

Show me the solution

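There are lots: the same end state can be reached with several different sequences of commands, as long as each command only refers to files and directories that already exist when it runs.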
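One possible alternative is sketched below: make the backup copy first, then create the directory and move the original into it. To keep the example safe to run, it recreates the exercise’s starting point inside a throwaway directory (created with mktemp -d, assumed to be available) rather than in a real /Users/jamie.

```shell
# Recreate the starting hierarchy in a scratch location
scratch=$(mktemp -d)
mkdir "$scratch/data"
touch "$scratch/data/chords.json"

cd "$scratch/data"
cp chords.json ../chords-backup.json   # back up first, into the parent directory
mkdir recombined                       # then create the new directory
mv chords.json recombined/             # and move the original into it

ls              # shows: recombined
ls recombined   # shows: chords.json
```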

Using `history`

Use the history command to see a list of all the commands you’ve entered during the current session.

You can also use Ctrl + r to do a reverse lookup. Press Ctrl + r, then start typing any part of the command you’re looking for. The most recent matching command will autocomplete. Press Enter to run the command again, or press the arrow keys to start editing it. If multiple past commands contain the text you typed, you can press Ctrl + r repeatedly to cycle through them. If you can’t find what you’re looking for in the reverse lookup, use Ctrl + c to return to the prompt.

If you want to save your history, perhaps to extract some commands from which to build a script later on, you can do that with history > history.txt. This writes your history to a text file called history.txt that you can edit later.

To recall a command from history, enter history and note the number of the command you want, e.g. 2045. Then enter !2045 to execute that command again.
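As an illustration of turning saved history into the start of a script, the sketch below strips the leading command numbers from a history file. Because history only records commands in an interactive session, the example writes a small stand-in history.txt by hand; its contents and the output file name commands.sh are invented for illustration.

```shell
demo=$(mktemp -d)
cd "$demo"

# A stand-in for the file that "history > history.txt" would produce
cat > history.txt <<'EOF'
 2043  cd ~/data
 2044  mkdir recombined
 2045  mv chords.json recombined/
EOF

# Remove the leading spaces and command numbers, keeping only the commands
sed 's/^ *[0-9]* *//' history.txt > commands.sh

cat commands.sh
```

The resulting file contains just the three commands, one per line, ready to be edited further.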

Key Points

cp copies data from one location (a source) to another (a target)
cp takes its source(s) and target as arguments
mkdir can be used to create directories
mv can be used to move data from one location to another, and is similar to copying followed by deletion
cp and mv can overwrite existing files without warning, which can lead to data loss
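The risk in the last point comes from the fact that, by default, cp and mv replace an existing target without asking. The sketch below demonstrates this in a throwaway directory and then shows one cautious pattern: checking whether the target exists before copying. The file names are invented for illustration.

```shell
demo=$(mktemp -d)
cd "$demo"

echo "original take" > take1.txt
echo "new take" > take2.txt

# cp silently replaces the contents of take1.txt with those of take2.txt
cp take2.txt take1.txt
cat take1.txt

# A more cautious pattern: only copy if the target does not exist yet
if [ -e take1.txt ]
then
    echo "take1.txt already exists; not overwriting"
else
    cp take2.txt take1.txt
fi
```

Both cp and mv also accept an -i option, which asks for confirmation before overwriting an existing file.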

Content from Introduction to Web technologies

Last updated on 2024-02-29

Overview

Questions

How do browsers retrieve and display websites from remote servers?
How and where are the structure and content of a web page located in code?
How do I specify the location of a website or other resource on a remote server?

Objectives

To know that HTTP is used to request remote resources by browsers and by other applications
To understand that browsers make multiple HTTP requests when loading most modern websites
To understand the important components of a URL
To introduce the browser’s developer tools as a resource for researchers

What happens when we use a web browser?

The majority of users access web resources by using browsers, such as Google Chrome, Mozilla Firefox, and Apple’s Safari, to view and interact with websites. Remote computers deliver the content of websites, via the internet, to the user’s computer following a request from the browser. In this course, we don’t have time to describe the details of how this content is delivered. However, it is important to know the following terminology:

server: a computer connected to the internet that provides content on request
client: a computer connected to the internet that requests and receives content

Usually, the user deliberately makes these requests from the browser either by directly typing in the location of the content that is being requested into the browser’s navigation bar, or, more frequently, by clicking on hyperlinks, or links. The browser then uses a set of rules (shared by the client and the server) to initiate the request, and handle the response.

An illustration of what happens when a user “visits” a web page using a browser

A set of rules regulating the communication between two parties is called a protocol. The protocol most commonly used on the web is called HTTP, which stands for HyperText Transfer Protocol, and it has been specified and refined over several decades. HTTP provides a set of formal guidelines about how Web content is published and made available on a server, and how a client should make and handle requests for this content. It is a relatively straightforward protocol, using a set of “verbs” that specify the actions that the client wishes to communicate to the server. One such verb is GET, which is why this appears in the diagram above.

The set of rules or patterns that specify the form of the data passed between two parties is called a file format. You are likely familiar with a selection of file formats already. For example, the difference between a .doc file and a .pdf file of the same final-year essay is primarily a difference of format. One file format that is commonly requested by and delivered to web users is HyperText Markup Language, or HTML (.html file extension).

We will look more closely at the HTML format in a subsequent episode. For now, you can think of the HTML format as providing a way to specify (a) the content of a document (i.e. what text, images, and other media, if any, it should consist of) alongside (b) the logical structure of the document (i.e. any hierarchical aspects that structure these contents relative to each other, such as a title, headings, and subheadings, and/or other part-whole relationships).
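To make the idea of a “verb” more concrete, the sketch below prints the text of a minimal GET request as a client might send it under HTTP/1.1. The host example.org and the path /index.html are placeholders, and nothing is actually sent over the network; real browsers add many more header lines.

```shell
host=example.org
path=/index.html
verb=GET

# An HTTP/1.1 request starts with a request line: verb, path, protocol version
request_line="$verb $path HTTP/1.1"

# It is followed by header lines; the Host header names the server being asked
host_header="Host: $host"

# Each line ends with \r\n, and a blank line marks the end of the headers
printf '%s\r\n%s\r\n\r\n' "$request_line" "$host_header"
```

If the curl tool is installed, running it with the -v option and a URL shows the real request and response headers exchanged for that URL.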

What is a URL?

A URL can be thought of as a path to a resource on a remote server, which is nothing more than the location of data on someone else’s computer. A “resource” may be a web page, a media file, an entire site, a folder, or even a more complicated interactive experience. The following are examples of URLs:

URLs have some obvious similarities with paths. For example, the forward slash character (/) is used as a separator. This helps keep web resources organised, but can also help researchers, as we will see in later episodes. However, there is no guarantee that the structure of the URL corresponds directly to the layout of particular directories on the remote server.

There are also some additional elements to URLs that distinguish them from paths. Let’s have a look at the anatomy of a URL:

A sample URL with its components highlighted: scheme, domain name, port, path, parameters, and anchor

scheme: tells what protocol should be used to interact with the remote resource
domain name: points us to the location of the remote server in human-readable form
port: an integer that is used to further specify where to look on the remote server
path: where the remote resource is located within the organisational structure of the server
parameters: a list of key-value pairs, separated by the & symbol, which modify the resource returned in some way
anchor: this optional part of a URL can be used to link directly to some component of the file being requested, typically a header or subheader of an HTML document
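The components listed above can be pulled out of a URL with the shell’s built-in pattern removal, as in the sketch below. The URL is made up for illustration, and the approach assumes that every component is present; it is not a general-purpose URL parser.

```shell
# A made-up URL that contains every component described above
url='https://example.org:443/music/albums?artist=nina&year=1965#tracklist'

scheme=${url%%://*}        # text before "://"
rest=${url#*://}           # everything after "://"

anchor=${rest#*\#}         # text after "#"
rest=${rest%%\#*}          # drop the anchor

params=${rest#*\?}         # text after "?"
rest=${rest%%\?*}          # drop the parameters

hostport=${rest%%/*}       # text before the first "/"
path=/${rest#*/}           # the first "/" and everything after it
domain=${hostport%%:*}     # text before ":"
port=${hostport#*:}        # text after ":"

printf 'scheme: %s\n' "$scheme"
printf 'domain: %s\n' "$domain"
printf 'port:   %s\n' "$port"
printf 'path:   %s\n' "$path"
printf 'anchor: %s\n' "$anchor"

# Parameters are key-value pairs separated by "&": print one per line
printf '%s\n' "$params" | tr '&' '\n'
```

For this URL the sketch reports the scheme https, the domain example.org, the port 443, the path /music/albums and the anchor tracklist, followed by the two parameters artist=nina and year=1965.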

Challenge 1: Dissecting a URL

Study the following URL

https://duckduckgo.com/?q=discogs.com&ia=web

and post your answer to the following questions:

What is the scheme?
How many domains are in the URL?
Is there a port specified in this URL?
What is the value associated with the second key in URL parameters?

Show me the solution

https (Secure HTTP/TLS)
One: duckduckgo.com. (discogs.com is in the URL parameters; arbitrary strings may appear here - potentially even including whole URLs)
There is not. However, different schemes have default ports associated with them which will be used by applications when no port is specified explicitly.
web
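The idea of default ports in the third answer can be sketched as a small lookup. The function name default_port is made up for this illustration; the port numbers shown (80 for http, 443 for https) are the standard defaults for those schemes.

```shell
# Print the port an application would assume when the URL does not name one
default_port() {
    case "$1" in
        http)  echo 80 ;;
        https) echo 443 ;;
        *)     echo unknown ;;
    esac
}

echo "https uses port $(default_port https) when none is given"
echo "http uses port $(default_port http) when none is given"
```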

Content vs. styling: HTML and friends

Web browsers are little more than glorified document fetchers and viewers for the web. The main reason that browsers do not seem quite like this is that these documents have become highly interactive, and that web designers have moved to a mode of designing websites and web applications so that they increasingly resemble the kinds of graphical applications that run on your desktop computer (with icons, menus, toolbars, animations/transitions, etc.). The earliest websites were much less interactive. Yet, no matter how complicated or simple the design of a website, every single piece of content in a web page, starting with textual content and the logical structure of the page itself, must be fetched by the browser. We have already learned that the format used to encode this content is called HTML, and is associated with the file extension .html.

Next, we consider the overall look and feel or “style” of the website. Aspects of styling may include the size, color, and font choices for the text but also other considerations such as the relative or absolute layout of key page elements and their size. This information is conventionally transferred from the server to the browser as a separate file or set of files, in a format called Cascading Style Sheets (or, CSS), and is associated with the .css file extension. In order for the browser to present the web resource as intended, another HTTP request is required to retrieve the appropriate CSS file(s). Since HTTP makes use of URLs to locate and retrieve resources, there will be a URL associated with each of these files.

The web is a multimedia platform, and one of the earliest media types to be supported by browsers was the image. As you may know, images may be stored in a variety of formats (e.g. GIF, JPEG, WebP), and there is therefore a variety of extensions associated with them (i.e. .gif, .jpg or .jpeg, .webp). Again, for each image, a new HTTP request is typically required. Hence, each image will likely be associated with its own URL. You may notice that the URLs for resources, such as images, do not necessarily contain the same domain name as the domain name of the site you are visiting.

Increasingly, designers are keen to ensure that websites and related resources are interactive and dynamic, and this requires the transfer of content in yet another format: JavaScript. JavaScript is a flexible, general-purpose programming language that is executed (more or less) entirely within the browser and allows web developers to make extremely rich, interactive modifications to the document content on the fly. It is commonly stored in files with the .js extension, which are requested by the browser, again using HTTP.

Once the assorted files have been requested by the client, which in this case is the web browser, they are assembled and interpreted to provide the total experience of the page. The specific details of each of these file formats are not relevant to this lesson; the key takeaway is that most web pages in fact decompose into multiple parts, each of which is associated with a single HTTP request. Understanding this fact allows us to begin to pick apart web resources into their constituent parts, some of which are more or less usable for research purposes.
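The way a page decomposes into separately requested parts can be sketched by listing the resources that a small HTML document refers to. The HTML below is an invented example, and the grep pattern is a deliberately simple assumption that only looks for href and src attributes in double quotes; real pages are far messier.

```shell
# Write a tiny, made-up HTML page to a temporary file
page=$(mktemp)
cat > "$page" <<'EOF'
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <script src="player.js"></script>
  </head>
  <body>
    <h1>Album of the week</h1>
    <img src="cover.jpg" alt="Album cover">
  </body>
</html>
EOF

# Pull out the value of every href="..." or src="..." attribute
resources=$(grep -oE '(href|src)="[^"]*"' "$page" | cut -d'"' -f2)
echo "$resources"

# Count them: each one needs its own HTTP request, on top of the page itself
count=$(printf '%s\n' "$resources" | grep -c .)
echo "Requests needed: 1 for the HTML document plus $count for linked resources"
```

Here the browser would need four requests in total: one for the HTML document and one each for the stylesheet, the script and the image.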

Developer Tools and the complexity of the modern web

To understand precisely which HTTP requests, and how many, a browser makes in the course of loading a single web page, we can use the built-in Developer Tools of most modern browsers to inspect all network activity for a single page load event.

Most modern browsers include a set of tools called “developer tools”, which are used by the programmers who create websites and other interactive experiences to debug and assess the performance of their creations, among other things. However, they are also an extraordinarily useful asset for researchers, and a good way to start thinking about what’s “under the hood” of the web. They are almost certainly available in the browser you’ve already installed.

Developer tools in different browsers

Different browsers (and different operating systems) expose developer tools in different ways. Here is a quick guide. Some of the details of what follows will vary depending on your specific browser. To follow along exactly, it is recommended to use Google Chrome.

Microsoft Edge

Click on the three-dot menu in the top right corner.
Hover over More Tools.
Click on Developer Tools.

Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.

Firefox

Click on the three-line menu in the top right corner.
Click on Web Developer.
Click on Toggle Tools.

Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.

Google Chrome

Click on the three-dot menu in the top right corner.
Hover over More Tools.
Click on Developer Tools.

Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.

Safari (on Mac)

Click on Safari in the top left corner of the screen.
Click on Preferences.
Go to the Advanced tab.
Check the box at the bottom that says Show Develop menu in menu bar.
Close the Preferences window. The Develop menu will now appear in the menu bar.
Click on Develop in the menu bar.
Click on Show Web Inspector.

Or simply use the shortcut Cmd+Option+I on your keyboard.

To inspect the network activity for a page load:

First, navigate to a site of your choice
Then, open Developer Tools using the three dots menu in the top right (Windows/Linux) or under the menu View > Developer… (macOS), or with the shortcut Ctrl+Shift+I (Windows/Linux) or Cmd+Option+I (macOS)
A new pane will pop open; navigate to the “Network” pane.
Ensure that network activity is being recorded by clicking the red “record” button in the top left of the pane
Refresh the page (shortcut: F5)
Once the page is finished loading, disable recording

Something like the image shown here will result:

The colored bars at the top of the screen show each individual HTTP request graphically (the “waterfall”), with the time taken for each request to be fulfilled given by the length of the bar. The different colors indicate the status of the request over time. Notice also the statistics at the end of the file list: the total number of requests made, the total amount of data transferred, and the total load time for the site.

To dig into a particular request, select it from the list and double click on it. It will open a new pane as below. Look at the very first request in the list, the request for the document itself: index.html. Here we’ll see a lot more information about the request, including the HTTP status code associated with the response and the IP address of the responding server. We also see the request method, the HTTP “verb” that was used to fetch the resource. Almost all browser-initiated requests will use the GET verb, but others are available (POST is another verb and is used to submit form data, and sometimes to call APIs; see later episodes).

Under the Timing pane, you’ll see a more fine-grained look at the time that each request took and what the colors stand for.

Challenge 2: The feel of the web

Pick a website that you regularly consult for research purposes; any website will do
Visit any page on this website
Open the developer tools in your browser and navigate to the Network tab (the name may vary). Press the refresh button (or F5)

Post your answer to the following questions:

The URL of the page or resource on the site that you visited
A URL pointing to a CSS file (ending .css) used by this page
A URL pointing to a JavaScript file (ending .js) used by this page
A URL pointing to an image file (many extensions possible) used by this page
A URL pointing to some other file (many extensions) used by this page

Absolute vs. relative URLs

Just like with paths, there are absolute URLs and relative URLs. This distinction is not relevant for the rest of this lesson, but if you would like to learn more about it, please consult the MDN Web Docs page, “What is a URL?”.

Key Points

Web servers provide remote resources to clients, most commonly browsers, using the HTTP protocol
URLs are the “addresses” of the web, and they specify the location of a remote resource for the purposes of retrieval
Most websites today consist of resources of a variety of file formats, and each remote resource usually demands its own HTTP request and has its own URL associated with it
We can inspect the torrent of HTTP requests that websites require by using the Developer Tools built into most modern browsers
