California Law Help Page Index


CALIFORNIA LAW

DETAILED TIPS ABOUT SEARCHING
  • Stop words
  • Natural Language Search
  • Literal Strings
  • Boolean Operators
  • Fielded Search
  • Date and Numeric Ranges
  • Right Truncation (Wildcards)
  • Grouping Search Terms
  • Relevance Ranking
  • Proximity Relationships
  • Special Characters
  • Limited Hits

    California Law Help

    CALIFORNIA LAW

    California Law, including the 29 subject area Codes, the Constitution and the Statutes, is updated as new laws are enacted. These laws can be displayed or searched.

    To display the Table of Contents for a Code, select the Code and click on Search. From within the Table of Contents, you may select a range of sections to display by clicking on the underlined section numbers.

    To search by keyword(s), select one Code, several Codes, or ALL for all Codes, enter the keyword(s) in the box provided, and click on Search. Searching on multiple keywords will retrieve documents containing all of the entered keywords.

    Documents returned by a search are listed in order of relevance ranking, with the highest-scoring document first. How the score is derived is described under Relevance Ranking in the detailed tips.

    Each retrieved document appears as an underlined title consisting of the name of the Code and a range of sections, corresponding to the groupings made in the Table of Contents. The word(s) specified in the search occur within that range of sections and are highlighted. Click on the underlined title to display all code sections within the range.

    To display code sections, click on the underlined title. To display information about the California Constitution or California Statutes, click on the underlined titles.


    DETAILED TIPS ABOUT SEARCHING

    Stop Words

    Words so common that they occur in almost every document are called stop words. Stop words cannot be used for searching because they occur too frequently to distinguish one document from another, so they are ignored during a search. If a query relies on a stop word, the results may not be what you expect.

    Natural Language Search

    The server can be queried using natural language questions. The server does not understand the question; rather, it takes the words and phrases in the question and finds documents that contain them. "Tell me about portable computers." is an example of a natural language question. In this example, the WAIS server would search for documents containing the words 'portable' and 'computers'. The other words, 'tell', 'me', and 'about', are stop words: words so common that they occur in almost every document, so they are not used for searching.
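
    As a rough sketch of the idea, the following Python reduces a question to its search words. The STOP_WORDS set and the search_terms function are hypothetical names made up for this illustration; the server's actual stop word list is not given on this page.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"tell", "me", "about", "the", "a", "an", "of", "for"}

def search_terms(question):
    """Lowercase the question, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOP_WORDS]

print(search_terms("Tell me about portable computers."))
# ['portable', 'computers']
```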

    Literal Strings

    A similar but more specific kind of query asks to find documents that contain one or more exact phrases by enclosing them in double quotation marks. This is known as a literal. For example, the query

      "search engine capabilities"

    returns only documents that contain this exact phrase. The WAIS search engine performs a literal search exactly as if you had used the boolean operator ADJ. Thus the above example would yield the same results as

      search ADJ engine ADJ capabilities

    For this reason, it is best to stick to noun phrases when using literals; if your literal phrase includes stop words, the stop words will be ignored.
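
    The equivalence between a literal and a chain of ADJ operators can be sketched in Python as follows. The literal_to_adj function and the STOP_WORDS set are assumptions made for this illustration, not part of the search engine.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "for"}

def literal_to_adj(literal):
    """Rewrite a quoted phrase as the ADJ query it is treated as."""
    words = re.findall(r"[A-Za-z0-9]+", literal)
    kept = [w for w in words if w.lower() not in STOP_WORDS]
    return " ADJ ".join(kept)

print(literal_to_adj('"search engine capabilities"'))
# search ADJ engine ADJ capabilities
print(literal_to_adj('"capabilities of the search engine"'))
# capabilities ADJ search ADJ engine
```

    The second call shows why noun phrases work best: the stop words "of" and "the" drop out of the literal.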

    Boolean Operators

    The boolean operators AND, OR, NOT, and ADJ help establish logical relationships between concepts expressed in natural language. These operators are especially useful for narrowing a search.

    AND, &&

    The AND operator is helpful in restricting a search when a particular pair or larger group of terms is known. For instance, when searching for documents on the weather in Boston, a question such as "weather AND Boston" would return only those documents that contain both the word "weather" and the word "Boston". You can use more than one AND in a query, e.g. "weather AND Boston AND November". Note that the C-like double ampersand (&&) may be used instead of spelling out the word AND.

    OR, ||

    The OR operator is often used to join two different phrases of a Boolean search. A question such as "hurricane OR tornado" would search for all documents containing either the word "hurricane", or the word "tornado", or both. You can also use more than one OR in a query. A natural language question is much like having an implicit OR between the words, except that the search engine does more work in a natural language query to determine the relevance of words and their relationships in a phrase. Note that the C-like double vertical bars (||) may be used instead of spelling out the word OR.

    NOT

    NOT is a binary operator; that is, it has to come between two words or parenthesized clauses. NOT is used to reject any documents that contain certain words. The question "basketball NOT college" would find all documents that contain the word "basketball" and do not also contain the word "college". Note, however, that this question would also eliminate articles on professional players that mention their alma maters. In other words, be careful not to limit your search too much with the NOT operator; make sure that you know what you're throwing away.

    Don't be afraid to use NOT! One good search strategy is to search for a broadly occurring term and get lots of documents you don't want, and then to use NOT to filter out the bad documents. For example, if you're trying to cook okra, you might search for "cooking AND okra" and find nothing; but if you search for "cooking", you find lots of articles on cooking meats and pastas. You then can search for "cooking NOT meat NOT pasta", and you might find more interesting articles that eventually lead you to your goal. Another handy trick is to use NOT to "break the 40 barrier". Typical WAIS clients only display 40 documents, but if you use NOT wisely, you can flush out the documents you don't like in those 40 and progressively refine your search, adding better and better documents to the 40 that you see.
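
    One way to picture AND, OR, and NOT is as set operations on the documents that contain each word. The sketch below uses a made-up three-document collection; DOCS and docs_with are hypothetical names, and this is not how the server is implemented.

```python
# Toy collection (made-up data): document id -> words in the document.
DOCS = {
    1: {"cooking", "meat"},
    2: {"cooking", "pasta"},
    3: {"cooking", "vegetables", "gumbo"},
}

def docs_with(word):
    """Return the ids of all documents that contain the word."""
    return {doc_id for doc_id, words in DOCS.items() if word in words}

both = docs_with("cooking") & docs_with("okra")    # cooking AND okra
either = docs_with("meat") | docs_with("pasta")    # meat OR pasta
# cooking NOT meat NOT pasta
filtered = docs_with("cooking") - docs_with("meat") - docs_with("pasta")

print(both)      # set()
print(either)    # {1, 2}
print(filtered)  # {3}
```

    Here "cooking AND okra" finds nothing, while filtering with NOT leaves the one document that might lead somewhere useful.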

    ADJ

    The adjacent operator, ADJ, is used to ensure that one word is followed by another in the returned document, with no other words in between. For example, "cordless ADJ telephone" returns only documents containing "cordless telephone" and ignores documents that contain only one of the words or that contain both but not adjacent to one another. ADJ nonetheless works when stop words come between the two words; for example, the query above will also find occurrences of "cordless for telephone". Note that the ADJ operator yields the same results as a literal query. Also note that ADJ, unlike AND and OR, is not commutative: "telephone ADJ cordless" does not work the same as "cordless ADJ telephone".
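
    The behavior of ADJ, including its treatment of stop words and its dependence on word order, can be sketched as follows. The adj_match function and the STOP_WORDS set are hypothetical and serve only as an illustration.

```python
import re

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"for", "the", "a", "an", "of"}

def adj_match(first, second, text):
    """True if `first` is directly followed by `second`, ignoring stop words."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOP_WORDS]
    return any(a == first and b == second for a, b in zip(words, words[1:]))

print(adj_match("cordless", "telephone", "A cordless telephone"))       # True
print(adj_match("cordless", "telephone", "cordless for telephone"))     # True
print(adj_match("cordless", "telephone", "cordless digital telephone")) # False
print(adj_match("telephone", "cordless", "A cordless telephone"))       # False
```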

    Mixing Natural Language, Literals, And Booleans

    The ability to mix natural language, literals, and boolean operators is unique to the WAISserver search engine. Combining natural language and boolean operators enables end users to better target their searches. For example, suppose you were looking for documents specifically on portable laptop computers that are not made by Tosuji Corporation. The question could then be "Tell me about portable laptop computers NOT Tosuji."

    Fielded Search

    For data sets whose documents have special data fields, selected portions of the documents can be tagged by the WAIS parser as fields. A client can then ask a WAIS server to limit its search to those documents containing a user-specified value of a particular field. This is called a fielded search.

    The mail-or-email parse format is an example of a parse format in which fields are tagged. For this parse format, the WAIS parser detects the "to" and "cc" fields, the "from" and "sender" fields, the "subject" field, and the "date" field. An example of a question using natural language, a boolean operator, and fielded search is: "company picnic AND from=barbara". The WAIS server would then find email messages about a company picnic that Barbara sent.
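
    A fielded search can be pictured as an ordinary word search combined with a test on one tagged field. The sketch below uses made-up messages; MESSAGES, words_in, and fielded_search are hypothetical names, and the natural language part is treated as a simple "any word matches" test.

```python
import re

# Made-up email messages with tagged fields, for illustration only.
MESSAGES = [
    {"from": "barbara", "subject": "company picnic",
     "body": "The company picnic is on Friday."},
    {"from": "ben", "subject": "company picnic",
     "body": "Who is bringing food to the picnic?"},
    {"from": "barbara", "subject": "budget",
     "body": "Budget figures attached."},
]

def words_in(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def fielded_search(messages, question, field, value):
    """Return indexes of messages that match the question AND field=value."""
    wanted = words_in(question)
    hits = []
    for index, msg in enumerate(messages):
        text_words = words_in(msg["subject"] + " " + msg["body"])
        if msg[field] == value and wanted & text_words:
            hits.append(index)
    return hits

# company picnic AND from=barbara
print(fielded_search(MESSAGES, "company picnic", "from", "barbara"))  # [0]
```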

    Date and Numeric Ranges

    For a date or numeric field, a range may be specified using the syntax

      field-name    comparison-operator    value

    where comparison-operator may be one of > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), or = (equal to).

    Currently, dates with the following formats are supported:

                  m-d-yy    m-d-yyyy    mm-dd-yy
                  m/d/yy    mm/dd/yy    m.d.yy
                  today     yesterday

    Only positive integers are supported for numeric fields. If the comparison operator is =, then the range may be specified using the word TO, as in

               date = 4/15/93 TO 4/14/94

    Both ends of the range are inclusively specified.
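
    The following sketch shows one way the listed date formats and an inclusive TO range could be interpreted. The parse_date and in_range functions are hypothetical, and reading a two-digit year as 19yy is an assumption based on the 4/15/93 example, not documented behavior.

```python
import re
from datetime import date, timedelta

def parse_date(text, today=None):
    """Parse m-d-yy, m/d/yy, m.d.yy, m-d-yyyy, 'today', or 'yesterday'."""
    today = today or date.today()
    text = text.strip().lower()
    if text == "today":
        return today
    if text == "yesterday":
        return today - timedelta(days=1)
    month, day, year = (int(part) for part in re.split(r"[-/.]", text))
    if year < 100:
        year += 1900  # assumption: two-digit years mean 19yy
    return date(year, month, day)

def in_range(value, start, end):
    """True if value lies between start and end, both ends included."""
    return parse_date(start) <= parse_date(value) <= parse_date(end)

# date = 4/15/93 TO 4/14/94
print(in_range("4/15/93", "4/15/93", "4/14/94"))  # True (start is included)
print(in_range("4/14/94", "4/15/93", "4/14/94"))  # True (end is included)
print(in_range("4/15/94", "4/15/93", "4/14/94"))  # False
```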

    Right Truncation (Wildcards)

    A user can specify right truncation by ending a word with the asterisk (*) wild card character. This tells the search engine to search on words whose first several characters match the base characters before the *. For example, you might use right truncation in a question such as geo*, which may retrieve documents containing the words: geographer, geography, geologist, geometry, geometrical, etc.
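
    Right truncation amounts to a prefix match. A minimal sketch, assuming a small made-up word list (truncation_matches is a hypothetical name):

```python
def truncation_matches(pattern, vocabulary):
    """Return the words matched by a pattern that may end in '*'."""
    if not pattern.endswith("*"):
        return [w for w in vocabulary if w == pattern]
    prefix = pattern[:-1]
    return [w for w in vocabulary if w.startswith(prefix)]

words = ["general", "geographer", "geography", "geologist",
         "geometry", "geometrical", "ego"]

print(truncation_matches("geo*", words))
# ['geographer', 'geography', 'geologist', 'geometry', 'geometrical']
```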

    Grouping Search Terms

    A user can group search terms and phrases together using parentheses.

    For example, if you wish to search for information about snowstorms, tornadoes, or hurricanes in New York City, you might search for "(snowstorms OR tornadoes OR hurricanes) AND (New ADJ York ADJ City)". You can also nest your parentheses; for example, "from = ( (ben ADJ wais) OR (brewster ADJ think) )" searches for messages from either ben@wais.com or brewster@think.com. When you're using several boolean operators, you should always group them to make clear how the operators are to be applied.
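
    To see why grouping matters, the sketch below treats each search term as a set of matching document ids and evaluates two different groupings of the same terms. The document ids are made up, and Python's set operators stand in for OR (|) and AND (&); this says nothing about the order in which the server applies ungrouped operators.

```python
# Made-up document ids matching each term, for illustration only.
SNOWSTORMS = {1, 2}
TORNADOES = {3}
HURRICANES = {4, 5}
NEW_YORK_CITY = {2, 5, 6}

# (snowstorms OR tornadoes OR hurricanes) AND (New ADJ York ADJ City)
grouped = (SNOWSTORMS | TORNADOES | HURRICANES) & NEW_YORK_CITY

# snowstorms OR tornadoes OR (hurricanes AND (New ADJ York ADJ City))
regrouped = SNOWSTORMS | TORNADOES | (HURRICANES & NEW_YORK_CITY)

print(grouped)    # {2, 5}
print(regrouped)  # {1, 2, 3, 5}
```

    The same four terms give different results under the two groupings, which is why explicit parentheses are the safer choice.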

    Relevance Ranking

    When documents are returned after a search has completed, the order in which the documents appear on the screen is based on what is known as relevance ranking. Each document is scored based on its relevance to a user's question, where the most relevant document has the highest score, or rank -- 1000 being the highest, 1 being the lowest. Documents are returned to the screen in order of the document with the highest score listed first and the document receiving the lowest score appearing last.

    A document receives a higher score if the words in the question are in the headline, if the words appear many times, or if phrases occur as they do in the question. A document's score is derived using techniques such as word weighting, term weighting, proximity relationships, and word density. These scoring techniques are outlined below.

    Proximity Relationships

    Proximity relationship scoring specifies that if the words in a natural language question are located close together in a document, they are given a higher weight than those found further apart. The idea behind a proximity relationship is that if a document contains a phrase similar to one in the user's question, that document is more likely to be relevant.
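
    The server's actual scoring formula is not given on this page. Purely as an illustration of the idea, the sketch below scores a document higher the closer together two query words appear; min_distance and proximity_score are hypothetical names, and the 1/distance formula is an assumption.

```python
import re

def min_distance(text, first, second):
    """Smallest number of word positions between the two words, or None."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)

def proximity_score(text, first, second):
    """Illustrative score: 1/distance, or 0.0 if either word is missing."""
    distance = min_distance(text, first, second)
    if distance is None:
        return 0.0
    return 1.0 / max(distance, 1)

close = "Portable computers are light."
far = "Portable radios and desktop computers."
print(proximity_score(close, "portable", "computers"))  # 1.0
print(proximity_score(far, "portable", "computers"))    # 0.25
```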

    Special Characters

    The WAIS server was originally designed to be as general as possible and, in this spirit, it ignores all characters in a document that are not either an alphabetical letter or a number. In fact, non-alphanumeric characters usually separate words for the parser; for example, "F.Y.I." parses out to three words. This rule also applies to queries used to search a directory of servers.
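
    The word-splitting rule can be approximated with a regular expression that keeps only runs of letters and digits. This is a sketch of the rule as described, not the parser itself; parse_words is a hypothetical name.

```python
import re

def parse_words(text):
    """Split text into words, treating non-alphanumerics as separators."""
    return re.findall(r"[A-Za-z0-9]+", text)

print(parse_words("F.Y.I."))             # ['F', 'Y', 'I']
print(parse_words("ben@wais.com"))       # ['ben', 'wais', 'com']
```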

    Limited Hits

    The number of hits is the number of times a bill is found in the database. For every version of a bill (introduced, amended, etc.), an occurrence of the bill is stored in the database and is counted as one hit. The most current version of a bill is kept for display. The database also uses one additional hit per search for housekeeping purposes. Therefore, the number of hits does not correspond one to one with the number of documents displayed.

    For example, suppose we search for the word "DOG" and the search finds "DOG" in two bills, Bill_1 and Bill_2.
    Bill_1 has four versions: one introduced, two amended, and one enrolled. Bill_2 has two versions: one introduced and one amended.
    Each version is considered a hit, and one more hit is used for housekeeping. In this example there are seven hits.

        Bill_1   four (versions)
        Bill_2   two (versions)
        plus one (housekeeping)
        For a total of seven hits.
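
    The same arithmetic as a short Python sketch (total_hits is a hypothetical helper written for this illustration):

```python
def total_hits(versions_per_bill, housekeeping=1):
    """Hits = one per bill version, plus the housekeeping hit."""
    return sum(versions_per_bill.values()) + housekeeping

versions = {"Bill_1": 4, "Bill_2": 2}
print(total_hits(versions))  # 7 hits
print(len(versions))         # 2 documents displayed
```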

    We have found that the default setting of 50 hits provides, on average, 8-12 documents for display.
    For best performance, it is highly recommended that the default setting be used.