Web Publishing for Genealogy

Introduction


How Pages are Identified

Browsers need to be able to find particular pages. What makes this possible is that each page has a unique identifier called a URL, which stands for Uniform Resource Locator. In fact, every resource on the Internet has its own unique URL. The URL is the "address" of the page on the Internet, specifying:

  • what type of resource it is (whether it's a Web page, a file in an archive, etc.)
  • the Internet address of the server it's located on
  • its location on that server.[1]

For example, the URL for this Web page is:

http://www.spub.co.uk/wpg/text/a13.html

  type:      http:
  server:    //www.spub.co.uk
  folders:   /wpg/text/
  filename:  a13.html

  folders + filename = pathname

The http: indicates that it's a Web page,[2] located on the server www.spub.co.uk (server names are preceded by double slashes), and its pathname on that server is /wpg/text/a13.html. A pathname is a combination of the filename and the names of the directories or folders in which the file is stored: here the file a13.html is stored in the directory wpg/text/ - single slashes separate the elements of the pathname.
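As a rough illustration, the same breakdown can be reproduced with the urllib.parse module from Python's standard library. This is only a sketch: the helper name split_url and the labels in the result are made up for the example and are not part of any standard.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Break a URL into the parts described in the text (hypothetical helper)."""
    parts = urlsplit(url)
    # Everything up to the last slash is the folders; the rest is the filename.
    folders, _, filename = parts.path.rpartition("/")
    return {
        "type": parts.scheme,       # the part before the colon, e.g. "http"
        "server": parts.netloc,     # the part after the double slashes
        "folders": folders + "/",   # directories, separated by single slashes
        "filename": filename,
        "pathname": parts.path,     # folders + filename
    }

info = split_url("http://www.spub.co.uk/wpg/text/a13.html")
for name, value in info.items():
    print(f"{name:9} {value}")
```

Running it prints one line per component, with the pathname shown as the folders followed by the filename.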

Note that pathnames are case sensitive: if you type the URL above with WPG instead of wpg you will get an error message.[3] Server names, on the other hand, are not case sensitive, and wWw.SPub.co.UK would work just as well.
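The rule can be sketched in a few lines of Python. The function same_page is a hypothetical helper written for this example; it models only the rule as stated here (server name compared without regard to case, pathname compared exactly) and does not reflect how any particular server behaves.

```python
from urllib.parse import urlsplit

def same_page(url_a, url_b):
    """Compare two URLs, ignoring case in the server name only (illustrative)."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.scheme.lower() == b.scheme.lower()
        and a.netloc.lower() == b.netloc.lower()  # server name: case ignored
        and a.path == b.path                      # pathname: case matters
    )

original = "http://www.spub.co.uk/wpg/text/a13.html"
print(same_page(original, "http://wWw.SPub.co.UK/wpg/text/a13.html"))  # True
print(same_page(original, "http://www.spub.co.uk/WPG/text/a13.html"))  # False
```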

You will often see URLs that don't give a file name, and end with a slash, or which simply give the name of the server. Whenever the Web server hasn't been asked to send a specific page, it automatically sends the default or home page for the relevant directory.

So, for example, you can access the main Web page for this book by going to http://www.spub.co.uk/wpg/text/ - the Web server knows to send the page index.html if no other page in the directory /wpg/text/ is specified. If the URL contains no pathname at all, the server will send back to the browser the default page for the entire server, so, for example, Microsoft's home page can be specified simply as http://www.microsoft.com/ without a pathname.
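A minimal sketch of that fallback in Python is given below. The name requested_file is invented for the example, and index.html is assumed as the default page because that is the name used on this server; other servers may be configured with a different default.

```python
from urllib.parse import urlsplit

DEFAULT_PAGE = "index.html"  # assumption: the default page name varies by server

def requested_file(url):
    """Return the pathname a server would look up for this URL (illustrative)."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        # No file was named, so fall back to the directory's default page.
        return (path or "/") + DEFAULT_PAGE
    return path

print(requested_file("http://www.spub.co.uk/wpg/text/"))          # /wpg/text/index.html
print(requested_file("http://www.spub.co.uk/wpg/text/a13.html"))  # /wpg/text/a13.html
print(requested_file("http://www.microsoft.com/"))                # /index.html
```

A URL that names only the server is treated the same way as one ending in a slash: the lookup falls back to the default page at the top level.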


[1] For a more detailed (and technically more precise) definition, see my Introduction to URLs.

[2] "http" stands for "HyperText Transfer Protocol", the communications standard which underlies the transmission of Web pages between client and server.

[3] The message will be "Page not found". You will get a message of this sort if a page has changed its URL, but more often it will be down to typing errors. However, some servers can automatically detect URLs that are in the wrong case and correct for it.
