Introduction to Web technologies

Last updated on 2024-02-29 | Edit this page

Estimated time: 30 minutes

Overview

Questions

How do browsers retrieve and display websites from remote servers?
How and where are the structure and content of a web page located in code?
How do I specify the location of a website or other resource on a remote server?

Objectives

To know that HTTP is used to request remote resources by browsers and by other applications
To understand that browsers make multiple HTTP requests when loading most modern websites
To understand the important components of a URL
To introduce the browser’s developer tools as a resource for researchers

Instructor Note

Learners should be familiar with paths and file extensions. This episode asks learners to use a web browser.

What happens when we use a web browser?

The majority of users access web resources by using browsers, such as Google Chrome, Mozilla Firefox, and Apple’s Safari, to view and interact with websites. Remote computers deliver the content of websites, via the internet, to the users’s computer following a request from the browser. In this course, we don’t have time to describe the details of how this content is delivered. However, It is important to know the following terminology:

server: a computer connected to the internet that provides content on request
client: a computer connected to the internet that requests and receives content

Usually, the user deliberately makes these requests from the browser either by directly typing in the location of the content that is being requested into the browser’s navigation bar, or, more frequently, by clicking on hyperlinks, or links. The browser then uses a set of rules (shared by the client and the server) to initiate the request, and handle the response.

An illustration of what happens when a user “visits” a web page using a browser

A set of rules regulating the communication between two parties is called a protocol. The most commonly used set of rules has been specified over the last several decades of its existence and is called HTTP, which stands for HyperText Transfer Protocol. HTTP provides a set of formal guidelines about how Web content is published and made available on a server, and how a client should make and handle requests for this content. It is a relatively straightforward protocol, using a set of “verbs” that specify the actions that the client wishes to communicate to the server. One such verb is GET, which is why this appears in the diagram above.

The set of rules or patterns that specify the form the data that is passed between two parties is called a file format. You are likely familiar with a selection of file formats already. For example, the difference between a .doc file and a .pdf file of the same final-year essay is primarily a difference of format. One file format that is commonly requested by and delivered to web users is HyperText Markup Language, or HTML (.html file extension).

We will look more closely at the HTML format in a subsequent episode. For now, you can think of the HTML format as providing a way to specify (a) the content of a document (i.e. what text, images, and other media - if any - should it consist of) alongside (b) the logical structure of the document (i.e. any hierarchical aspects that structure these contents relative to each other, such as a title, headings, and subheadings, and/or other part-whole relationships).

What is a URL?

A URL can be thought of as a path to a resource on a remote server, which is nothing more than the location of data on someone else’s computer. A “resource” may be a web page, a media file, an entire site, a folder, or even a more complicated interactive experience. The following are examples of URLs:

URLs have some obvious similarities with paths. For example, the forward slash character (/) is used as a separator. This helps keep web resources organised, but can also help researchers, as we will see in later episodes. Howeer, there is no guarantee that the structure of the URL corresponds directly to the layout of particular directories on the remote server.

There are also some additional elements to URLs that distinguish them from paths. Let’s have a look at the anatomy of a URL:

A sample URL with its components highlighted: scheme, domain name, port, path, parameters, and anchor

scheme: tells what protocol should be used to interact with the remote resource
domain name: points us to the location of the remote server in human-readable form
port: an integer that is used to further specify where to look on the remote server
path: where the remote resource is located within the organisational structure of the server
parameters: a list of key-value pairs, separated by the & symbol which modify the resource returned in some way
anchor: this optional part of a URL can be used to link directly to some component of the file being requested, typically a header or subheader of a HTML document

Challenge 1: Dissecting a URL

Study the following URL

https://duckduckgo.com/?q=discogs.com&ia=web

and post your answer to the following questions:

What is the scheme?
How many domains are in the URL?
Is there a port specified in this URL?
What is the value associated with the second key in URL parameters?

Show me the solution

https (Secure HTTP/TLS)
One: duckduckgo.com. (discogs.com is in the URL parameters; arbitrary strings may appear here - potentially even including whole URLs)
There is not. However, different schemes have default ports associated with them which will be used by applications when no port is specified explicitly.
web

Content vs. styling: HTML and friends

Web browsers are little more than glorified document fetchers and viewers for the web. The main reason that browsers do not seem quite like this is because these documents have become highly interactive, and because web designers have moved to a mode of designing websites and web applications so that they increasingly resemble the kinds of graphical applications that run on your desktop computer (with icons, menus, toolbars, animations/transitions, etc.) The earliest websites were much less interactive. Yet, no matter how complicated or simple the design of a website, every single piece of content in a web page - starting with textual content and the logical strucutre of the page itself - must be fetched by the browser. We have already learned that format used to encode this content is called HTML, and is associated with the file extension .html.

Next, we consider the overall look and feel or “style” of the website. Aspects of styling may include the size, color, and font choices for the text but also other considerations such as the relative or absolute layout of key page elements and their size. This information is conventionally transferred from the server to the browser as a separate file or set of files, in a format called Cascading Style Sheets (or, CSS), and is associated with the .css file extension. In order for the browser to present the web resource as intended, another HTTP request is required to retrieve the appropriate CSS file(s). Since HTTP makes use of URLs to locate and retrieve resources, there will be a URL associated with each of these files.

The web is a multimedia platform, and one of the earliest media types to be supported by browsers was the image. As you may know, images may be stored in a variety of formats (e.g. GIF, JPEG, WebP), and there is therefore a variety of extensions associated with them (i.e. .gif, .jpg or .jpeg, .webp). Again, for each image, a new HTTP request is typically required. Hence, each image will likely be associated with its own URL. You may notice that the URLs for resources, such as images, do not necessarily contain the same domain name as the domain name of the site you are visiting.

Increasingly, designers are keen to ensure that websites and related resources are interactive and dynamic, and this requires the transfer of content in yet another format: Javascript. Javascript is a flexible, general-purpose programming language that is executed (more-or-less) entirely contained within the browser and allows web developers to create extremely rich, interactive modifications to the document content on the fly. It is commonly stored in files with the .js extension, which are requested by the browser, again using HTTP.

Once the assorted files have been requested by the client, which in this case is the web browser, they are assembled and interpreted to provide the total experience of the page. The specific details of each of thes file formats are not relevant to this lesson; the key takeaway is that most web pages in fact decompose into multiple parts, each of which is associated with a single HTTP request. Understanding this fact allows us to begin to pick apart web resources into their consituent parts, some of which are more or less usable for research purposes.

Developer Tools and the complexity of the modern web

To understand precisely what and how many HTTP requests that a browser makes in the course of requesting a single web page, we can use the built-in Developer Tools function of most modern borwsers, to inspect the all network activity for a single page load event.

Most modern browsers include a set of tools called “developer tools”, which areused by the programmers who create websites and other interactive experiences to debug and assess the performance of their creations, among other things. However, they are an extraordinarily useful asset for researchers, and are a useful way to get started thinking about what’s “under the hood” of the web. These are almost certainly available in the browser you’ve already installed.

Developer tools in different browsers

Different browsers (and different operating systems) expose developer tools in different ways. Here is a quick guide. Some of the details of what follows will vary depending on your specific browser. To follow along exactly, it is recommended to use Google Chrome.

Microsoft Edge

Click on the three-dot menu in the top right corner.
Hover over More Tools.
Click on Developer Tools.

Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.

Firefox

Click on the three-line menu in the top right corner.
Click on Web Developer.
Click on Toggle Tools.

Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.

Google Chrome

Click on the three-dot menu in the top right corner.
Hover over More Tools.
Click on Developer Tools.

Or simply use the shortcut F12 or Ctrl+Shift+I on your keyboard.

Safari (on Mac)

Click on Safari in the top left corner of the screen.
Click on Preferences.
Go to the Advanced tab.
Check the box at the bottom that says Show Develop menu in menu bar.
Close the Preferences window. The Develop menu will now appear in the menu bar.
Click on Develop in the menu bar.
Click on Show Web Inspector.

Or simply use the shortcut Cmd+Option+I on your keyboard.

To do this:

First, navigate to a site of your choice
Then, open Developer Tools using the three dots menu in the top right (Windows/Linux) or under the menu View > Developer… (or Ctrl+Shift+I/Cmd+Shift+I)
A new pane will pop open; navigate to the “Network” pane.
Ensure that network activity is being recorded by clicking the red “record” button in the top left of the pane
Refresh the page (shortcut: F5)
Once the page is finished loading, disable recording

Something like the image shown here will result:

The colored bars at the top of the screen show each individual HTTP request graphically (the “waterfall”), with the duration taken for each request to be fulfilled is given by the length of the bar. The different colors indicate what the status of the request is over time. Notice also the statisics at the end of the file list: the total number of requests made, the total amount of data transferred, and the total load time for the site.

To dig into a particular request, select it from the list and double click on it. It will open a new pane as below. Look at the very first request in the list, the request for the document itself: index.html. From this we’ll see a a lot more information about the request, including the HTTP status code associated with the response, and the IP address of the responding server. We also see the request method, the HTTP “verb” that was used to fetch the resource. Almost all browser-initated requests will use the GET verb, but others are available (POST is another verb and is used to submit form data, and sometimes to call APIs — see later episodes).

Under the Timing pane, you’ll see a more fine-grained look at the time that each request took and what the colors stand for.

Challenge 2: The feel of the web

Pick a website that you regularly consult for research purposes; any website will do
Visit any page on this website
Open the developer tools in your browser and navigate to the Network tab (the name may vary). Press the refresh button (or F5)

Post your answer to the following questions:

The URL of the page or resource on the site that you visited
A URL pointing to a CSS file (ending .css) used by this page
A URL pointing to a Javascript file (ending .js) used by this page
A URL pointing to an image file (many extensions possible) used by this page
A URL pointing to some other file (many extensions) used by this page

Absolute vs. relative URLs

Just like with paths, there are absolute URLs and relative URLs. This distinction is not relevant for the rest of this lesson, but if you would like to learn more about this, please consult the mdn web docs page, “What is a URL?”.

Key Points

Web servers provide remote resources to clients, most commonly browsers, using the HTTP protocol
URLs are the “addresses” of the web, and they specify the location of a remote resource for the purposes of retrieval
Most websites today consist of resources of a variety of file formats, and each remote resource usually demands its own HTTP request and has its own URL associated with it
We can inspect the torrent of HTTP requests that websites require by using most modern browser’s Developer Tools