Advanced Queries use operators and expression syntax to construct queries. The rules for defining words and phrases, capitalization and wildcards are, however, the same as for Simple Queries.
Compare, for example, a one-word Simple Query, say plato, with the same word submitted as an Advanced Query, but with no ranking specified. More specifically, this latter query has plato in the search field and nothing in the ranking field. Each of the two queries produces "about 20000" documents, but the ranking is different in each case.
The explanation for the difference in ranking is rather complex, but briefly, AltaVista implements Simple Queries as Advanced Queries. More specifically, a Simple Query gets transformed into a boolean expression together with a set of words to rank the results.
In the example above, AltaVista implements the Simple Query consisting of the single word plato as an Advanced Query with nothing in the search field, but with plato in the ranking field. Recall that the Advanced Query in this example had plato in the search field and nothing in the ranking field. In other words, the two queries were not actually identical, hence the different rankings.
If you submit a different Advanced Query, this time with plato in both the search field and the ranking field, the ranking of the matched documents will be identical to that produced by the Simple Query for plato.
To sum up, all three of the following queries produce the same matches and in the same ranking order.
Type of Query    fields         Query word
============================================================
Simple           search only    plato
------------------------------------------------------------
Advanced         search         ---
                 ranking        plato
------------------------------------------------------------
Advanced         search         plato
                 ranking        plato
------------------------------------------------------------
The following query will give you the same matches as for the queries above, but in no particular ranking order.
Type of Query    fields         Query word
============================================================
Advanced         search         plato
                 ranking        ----
------------------------------------------------------------

Search field     (gold near silver) and platinum
Ranking field
Result           2000 documents found and listed in no particular order.
Each of the 2000 documents found contains the word gold located close to the word silver and, in addition, the word platinum somewhere in the same document. If you now choose platinum to rank the search results, the query will produce the same 2000 documents, as you might expect, but ranked so that those with the highest scores for platinum are placed at the head of the resulting list.

Search field     (gold near silver) and platinum
Ranking field    platinum
Result           2000 documents found, ranked so that those with high scores for platinum are listed first.
You might want to proceed further. On the assumption that documents containing matches for these metals also contain references to other metals, you might want to check for occurrences of another. But notice what happens now to the search results.
Search field     (gold near silver) and platinum
Ranking field    palladium
Result           200 documents found

In this case, the Advanced Query has not simply taken the 2000 documents from the earlier search and reranked them so that those with matches for palladium are listed first. Instead, a second level of filtering has been applied to the search result: the 1800 documents that do not contain matches for palladium have been discarded. In other words, when the ranking field is not empty, documents that contain none of the words in the ranking field are discarded.
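The interaction between the search field and the ranking field can be sketched in a few lines of Python. This is only an illustrative model, not AltaVista's implementation: the function name run_advanced_query, the tiny document collection, and the score (a simple count of ranking words) are all assumptions made for the example, and the near operator is approximated by a plain "and".

```python
def run_advanced_query(docs, search=None, ranking=()):
    """Toy model of an Advanced Query (hypothetical, for illustration only).

    docs    -- dict mapping a document name to its list of words
    search  -- predicate over a word list; None models an empty search field
    ranking -- words typed in the ranking field (may be empty)
    """
    # Step 1: the search field selects the candidate documents.
    matched = [name for name, words in docs.items()
               if search is None or search(words)]

    # Step 2: an empty ranking field means no filtering and no ordering.
    if not ranking:
        return matched

    # Assumed score: number of occurrences of the ranking words.
    def score(name):
        return sum(docs[name].count(word) for word in ranking)

    # Step 3: discard documents containing none of the ranking words,
    # then list the highest scores first.
    kept = [name for name in matched if score(name) > 0]
    return sorted(kept, key=score, reverse=True)


docs = {
    "two_platinum": "gold silver platinum platinum".split(),
    "with_palladium": "gold silver platinum palladium".split(),
    "no_platinum": "gold silver copper".split(),
    "plato_twice": "plato wrote plato".split(),
    "plato_once": "plato".split(),
}

def metals(words):
    # Stand-in for "(gold near silver) and platinum"; proximity is ignored.
    return "gold" in words and "silver" in words and "platinum" in words

def has_plato(words):
    return "plato" in words

unranked = run_advanced_query(docs, metals)
by_platinum = run_advanced_query(docs, metals, ["platinum"])
by_palladium = run_advanced_query(docs, metals, ["palladium"])  # filtered too

simple_like = run_advanced_query(docs, None, ["plato"])
both_fields = run_advanced_query(docs, has_plato, ["plato"])
```

In this model, ranking by palladium returns fewer documents than ranking by platinum, which mirrors the second level of filtering described above. The last two calls mirror the equivalence between a Simple Query and an Advanced Query with the same word in both fields.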
It is possible to restrict searches to certain portions of documents by using the following syntax. The keyword (link, title, image,...) should be in lower-case, and immediately followed by a colon.
AltaVista treats every page on the Web and every article of Usenet news as a sequence of words. A word in this context means any string of letters and digits delimited either by punctuation and other non-alphabetic characters (for example, &, %, $, /, #, _, ~), or by white space (spaces, tabs, line ends, start of document, end of document). To be a word, a string of alphanumerics does not have to be spelled correctly or be found in any dictionary. All that is required is that someone typed it as a single word in a Web page or Usenet news article. Thus, the following are words if they appear delimited in a document: HAL5000, Gorbachevnik, 602e21, www, http, EasierSaidThanDone, etc. The following are all considered to be two words because the internal punctuation separates them: don't, digital.com, x-y, AT&T, 3.14159, U.S., All'sFairInLoveAndWar.
Only the words in a document are significant to AltaVista. AltaVista does not index punctuation or white space, so you can use AltaVista to look only for words and phrases, not punctuation.
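The word rule above can be approximated with a short regular expression. The following Python sketch is an assumption about how such a splitter could look, not AltaVista's actual indexer; the name split_words is hypothetical. It treats any run of letters and digits as a word and everything else, including the underscore, as a delimiter.

```python
import re

# One or more alphanumeric characters. "[^\W_]" means "a word character
# that is not an underscore", so "_" acts as a delimiter, as in the text.
WORD_PATTERN = re.compile(r"[^\W_]+")

def split_words(text):
    """Return the words of text in order, dropping punctuation and white space."""
    return WORD_PATTERN.findall(text)

print(split_words("HAL5000 met EasierSaidThanDone"))
print(split_words("AT&T"))
print(split_words("digital.com"))
print(split_words("3.14159"))
```

Under this rule HAL5000 stays a single word, while AT&T, digital.com and 3.14159 each split into two words at the internal punctuation.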
A phrase is a string of words that are adjacent in a document, although they may be separated by any amount of white space or punctuation. They do not have to be grammatical in any human language--they just have to occur in a document as an adjacent sequence of words. Some examples:
Since the punctuation and white space are insignificant to AltaVista (except that they delimit words), the phrases above are indistinguishable from the following variants:
There are two conventions for typing a phrase in a query. The best way, leading to the least ambiguity, is to type the phrase as "a sequence of words separated by spaces and surrounded by double quotes". However, as an alternative, you may type the words of the phrase with punctuation (and no white space) between each pair of words. For example, these are all equivalent as queries:
The first is the one we generally recommend. Be aware that the punctuation characters & | ! and ~ have meaning in Advanced queries, and * indicates the *-notation used in both Simple and Advanced queries.
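Because only the sequence of words matters, phrase matching can be pictured as looking for one word list inside another. The Python sketch below is a simplified illustration under stated assumptions: the names split_words and contains_phrase are hypothetical, words are compared exactly (the capitalization rules are not modeled), and the special meaning of & | ! ~ and * in queries is ignored.

```python
import re

def split_words(text):
    # Runs of letters and digits are words; everything else is a delimiter.
    return re.findall(r"[^\W_]+", text)

def contains_phrase(document, phrase):
    """True if the words of phrase occur adjacently, in order, in document."""
    doc_words = split_words(document)
    target = split_words(phrase)
    size = len(target)
    if size == 0:
        return False
    # Slide a window of the phrase's length across the document's words.
    return any(doc_words[start:start + size] == target
               for start in range(len(doc_words) - size + 1))

page = "Visit digital.com for details"
print(contains_phrase(page, "digital com"))   # quoted-phrase style
print(contains_phrase(page, "digital-com"))   # punctuation style
```

Both calls give the same answer, since the punctuation between the two words only serves to delimit them.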
Capital letters are considered distinct from lower-case letters. When a word is found in a Web page or a news article, its case is preserved when it is stored in the index.
When you enter a word in a query, therefore, it is always safe, and generally recommended, to type it all in lower-case, because lower-case letters indicate a case-insensitive match. If you type any capital letters, you force an exact case match on the entire word.
Thus, the word turkey in a query will match any of turkey, Turkey, tUrKeY or TURKEY occurring in a document. But the capitalized word Turkey in a query will match only Turkey in the document, and not any of the other capitalization variants.
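The capitalization rule can be written as a small predicate. This is a minimal Python sketch of the rule as stated above; the function name case_matches is hypothetical.

```python
def case_matches(query_word, document_word):
    """Apply the capitalization rule to one query word and one indexed word."""
    if any(ch.isupper() for ch in query_word):
        # Any capital letter in the query forces an exact case match.
        return query_word == document_word
    # An all-lower-case query word matches every capitalization variant.
    return query_word == document_word.lower()

variants = ["turkey", "Turkey", "tUrKeY", "TURKEY"]
print([v for v in variants if case_matches("turkey", v)])
print([v for v in variants if case_matches("Turkey", v)])
```

The first line lists all four variants; the second lists only Turkey.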
Accents are treated in the same way as capitalization. An accented word used in a query forces an exact match on the entire word. For example, if you use éléphant in a query, you will match only the French spelling for the pachyderm. However, if you do not care to enter accents in the search window (something which is browser, platform, and keyboard-dependent), you can always safely omit the accents, thereby matching both the French and English spellings.
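The accent rule can be sketched in the same style. The Python example below is an illustration, not AltaVista's code: the names strip_accents and accent_matches are hypothetical, accents are removed by Unicode decomposition, and case handling is deliberately left out so that only the accent behaviour is shown.

```python
import unicodedata

def strip_accents(word):
    """Remove combining marks, e.g. turn 'éléphant' into 'elephant'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_matches(query_word, document_word):
    """Apply the accent rule to one query word and one indexed word."""
    # Normalize first so that equivalent encodings compare equal.
    query_word = unicodedata.normalize("NFC", query_word)
    document_word = unicodedata.normalize("NFC", document_word)
    if strip_accents(query_word) != query_word:
        # The query contains an accent: only the exact spelling matches.
        return query_word == document_word
    # No accents in the query: accents in the document are ignored.
    return query_word == strip_accents(document_word)

spellings = ["elephant", "éléphant"]
print([s for s in spellings if accent_matches("elephant", s)])
print([s for s in spellings if accent_matches("éléphant", s)])
```

The unaccented query matches both spellings, while the accented query matches only the French one.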
To search for occurrences of any of a group of words with a similar pattern, AltaVista provides the *-notation. For example, you might want to search for matches of sing, singer, singers, singing. In this case, place the *-notation at the end of the word whose inflections you want to include in the search: sing*. But, a word of warning. AltaVista will also match words lexically unrelated to your query word. So the query sing* will also find matches for singe, single, singular, and for foreign words such as French singulier.
The *-notation cannot be used without restriction. To make such queries computationally feasible, AltaVista requires that the * be used only after at least three letters. The * will match from zero to five additional letters, in lower-case only. Capital letters and digits will therefore not be matched.
The *-notation can sometimes be useful for finding variant spellings: for example, cantalo* will find matches for cantaloup, cantaloupe, cantalope, and their plurals. But take care how you construct the query word. For example, if you want to find matches for both color and colour, a query of the form col*r is not the most efficient. This query will also find matches for collector and atomic collider. In this case, it is more efficient to submit the query colo*r, which will find matches for both color and colour.
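One way to picture the *-notation is as a translation into a regular expression. The Python sketch below follows the restrictions stated above (at least three characters before the *, and zero to five additional lower-case letters). It is an assumption-laden illustration: the names wildcard_to_regex and wildcard_matches are hypothetical, only a single * per word is handled, and the literal parts of the pattern are compared exactly rather than under the capitalization rules.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a query word such as 'colo*r' into a compiled regex."""
    prefix, star, suffix = pattern.partition("*")
    if not star or "*" in suffix:
        raise ValueError("this sketch expects exactly one * in the word")
    if len(prefix) < 3:
        raise ValueError("the * must follow at least three letters")
    # The * stands for zero to five lower-case letters, nothing else.
    return re.compile(re.escape(prefix) + "[a-z]{0,5}" + re.escape(suffix))

def wildcard_matches(pattern, word):
    return wildcard_to_regex(pattern).fullmatch(word) is not None

candidates = ["color", "colour", "collector", "collider"]
print([w for w in candidates if wildcard_matches("col*r", w)])
print([w for w in candidates if wildcard_matches("colo*r", w)])
```

The first query accepts all four candidates, including the unwanted collector and collider, while the second accepts only color and colour.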
Finally, if your search using the *-notation finds too many matches, AltaVista will ignore the query. The query inte*, for example, produces the result,
Ignored inte*: 4292323
No documents match this query
In the absence of any other information, AltaVista will index all words in your document (except for comments), and will use the first few words of the document as a short abstract.
It is, however, possible for you to control how your page is indexed by using the META tag to specify both additional keywords to index and a short description. Let's suppose your page contains:
<META name="description" content="We specialize in grooming pink poodles.">
<META name="keywords" content="pet grooming, Palo Alto, dog">
AltaVista will then do two things:
AltaVista will index the description and keywords up to a limit of 1,024 characters.
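To check what a page exposes through these tags, the description and keywords can be read back with Python's standard html.parser module. This is a sketch for page authors, not a description of AltaVista's indexer: the class name MetaExtractor and the function extract_meta are hypothetical, and applying the 1,024-character limit to each tag separately is an assumption, since the text does not say whether the limit is per tag or combined.

```python
from html.parser import HTMLParser

INDEX_LIMIT = 1024  # limit quoted in the text; per-tag use is an assumption

class MetaExtractor(HTMLParser):
    """Collect the content of the description and keywords META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in ("description", "keywords") and content is not None:
            self.meta[name] = content[:INDEX_LIMIT]

def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta

page = (
    '<META name="description" content="We specialize in grooming pink poodles.">\n'
    '<META name="keywords" content="pet grooming, Palo Alto, dog">\n'
)
print(extract_meta(page))
```

Running the sketch on the example page returns a dictionary with the description sentence and the keyword list, each truncated to the limit if necessary.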