How Pages are Identified
Browsers obviously need to be able to find particular pages. What makes this possible is that each page has a unique identifier called a URL, which stands for Uniform Resource Locator. In fact, every resource on the Internet has its own unique URL. The URL is the "address" of the page on the Internet, specifying:
- what type of resource it is (whether it's a Web page, a file in an archive, etc.)
- the Internet address of the server it's located on
- its location on that server.
For example, the URL for this Web page is:
| http: | // | www.spub.co.uk | / | wpg/text | / | a12.html |
folders + filename = pathname
http: indicates that it's a Web page, located on the server www.spub.co.uk (server names are preceded by double slashes), and its pathname on that server is /wpg/text/a12.html. A pathname is a combination of the filename and the names of the directories or folders in which the file is stored: here the file a12.html is stored in the directory wpg/text/ - single slashes separate the elements of the pathname.
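The breakdown above can be reproduced with a few lines of Python. This is only an illustrative sketch using the standard library's URL parser; the variable names (scheme, server, pathname and so on) are my own labels for the parts described in the text, not official terminology.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.spub.co.uk/wpg/text/a12.html"
parts = urlsplit(url)

scheme = parts.scheme    # the resource type / protocol, before the colon
server = parts.netloc    # the server name, found after the double slashes
pathname = parts.path    # folders + filename

# Split the pathname at its last single slash into folders and filename.
directory, filename = posixpath.split(pathname)

print(scheme, server, directory, filename)
```

Splitting at the last slash mirrors the "folders + filename = pathname" rule: everything before it names the directories, and what follows is the file itself.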
Note that pathnames are case sensitive: if you type the URL above with WPG instead of wpg you will get an error message. Server names, on the other hand, are not case sensitive, and wWw.SPub.co.UK would work just as well.
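The difference between the two rules can be shown with a small comparison function. This is a minimal sketch, assuming the behaviour described above (server name compared without regard to case, pathname compared exactly); same_page is a hypothetical helper written for this illustration, and a real server may be more forgiving about pathnames.

```python
from urllib.parse import urlsplit


def same_page(url_a: str, url_b: str) -> bool:
    """Return True if two URLs name the same page under the rules in the text."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.scheme == b.scheme          # urlsplit already lowercases the scheme
        and a.hostname == b.hostname  # .hostname is returned in lowercase
        and a.port == b.port
        and a.path == b.path          # pathname: exact, case-sensitive match
    )


print(same_page("http://wWw.SPub.co.UK/wpg/text/a12.html",
                "http://www.spub.co.uk/wpg/text/a12.html"))
print(same_page("http://www.spub.co.uk/WPG/text/a12.html",
                "http://www.spub.co.uk/wpg/text/a12.html"))
```

The first pair differs only in the case of the server name, so the function treats the two URLs as the same page; the second pair differs in the case of the pathname, so it does not.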
You will often see URLs that don't give a file name, and end with a slash, or which simply give the name of the server. Whenever the Web server hasn't been asked to send a specific page, it automatically sends the default or home page for the relevant directory.
So, for example, you can access the main Web page for this book by going to http://www.spub.co.uk/wpg/text/ - the Web server knows to send the page index.html if no other page in the directory /wpg/text/ is specified. If the URL contains no pathname at all, the server will send back to the browser the default page for the entire server, so, for example, Microsoft's home page can be specified simply as http://www.microsoft.com/ without a pathname.
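The default-page rule can be sketched roughly as follows. This is a simplified illustration, not how any particular Web server is implemented: requested_file is a hypothetical helper, and the default page name index.html is an assumption taken from the example above (servers can be configured to use other names).

```python
from urllib.parse import urlsplit

DEFAULT_PAGE = "index.html"  # assumption: the name varies from server to server


def requested_file(url: str) -> str:
    """Return the pathname of the file a server would send for this URL."""
    path = urlsplit(url).path or "/"  # no pathname at all means the server root
    if path.endswith("/"):            # a directory was requested, not a page
        path += DEFAULT_PAGE
    return path


print(requested_file("http://www.spub.co.uk/wpg/text/"))
print(requested_file("http://www.microsoft.com/"))
```

A URL that already names a specific page passes through unchanged; only URLs ending in a slash, or giving just the server name, have the default page filled in.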
 For a more detailed (and technically more precise) definition, see my Introduction to URLs.
 "http" stands for "HyperText Transfer Protocol", the communications standard which underlies the transmission of Web pages between client and server.
 The message will be "Page not found". You will get a message of this sort if a page has changed its URL, but more often it will be down to a typing error. However, some servers can automatically detect URLs that are in the wrong case and correct them.