April 14, 2006

General Input Tests for Strings


Here are some routine tests to try for a simple string field.

To use these values, you first need to know the minimum and maximum number of characters the field allows.

Then decide which of the following are relevant for your input field and use them.

If you are using an automated test tool, a test script can easily run through these values exhaustively or pick among them at random:
  • Nothing (leave the field untouched)
  • Empty field (clear any defaults)
  • More than the maximum number of characters
  • Much more than the maximum number of characters
  • Any valid string
  • A single leading space
  • Many leading spaces
  • A single trailing space
  • Many trailing spaces
  • Leading and trailing spaces
  • A single embedded space
  • Many embedded spaces
  • Nonprinting character (e.g., Ctrl+char)
  • Operating system filename reserved characters (e.g., "*.:")
  • Language-specific reserved characters
  • Upper ASCII characters (codes 128-254, a.k.a. ANSI)
  • Character 255 (0xFF), which is often misread as end of file (EOF) when stored in a signed char in C
  • Uppercase characters
  • Lowercase characters
  • Mixed case characters
  • Modifiers (e.g., Ctrl, Alt, Shift-Ctrl, and so on)
  • Function keys (F2, F3, F5, and so on)
  • Characters special to sprintf (format specifiers such as %s or %d)
  • Keyboard special characters (~!@...)
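The string-valued items in this checklist are easy to generate from the field's length limits. The Python sketch below is one way to do it; the function names, the filler letters, and the particular sample characters are assumptions for illustration, not part of any test tool. "Nothing", modifier keys, and function keys involve UI actions rather than string values, so they are left to the automation tool itself.

```python
from __future__ import annotations

import random
import string


def string_field_test_values(min_len: int, max_len: int) -> dict[str, str]:
    """Return named test inputs for a field limited to min_len..max_len chars.

    Only the string-valued checklist items are generated. Space-padded
    variants may exceed max_len, which is itself worth testing.
    """
    n = max(min_len, 1)  # length of the "any valid string" case
    letters = (string.ascii_lowercase * (n // 26 + 1))[:n]
    return {
        "empty": "",
        "over_max": "a" * (max_len + 1),
        "much_over_max": "a" * (max_len * 10 + 1),
        "valid": letters,
        "leading_space": " " + letters,
        "many_leading_spaces": " " * 10 + letters,
        "trailing_space": letters + " ",
        "many_trailing_spaces": letters + " " * 10,
        "leading_and_trailing": " " + letters + " ",
        "embedded_space": letters[:1] + " " + letters[1:],
        "many_embedded_spaces": letters[:1] + " " * 10 + letters[1:],
        "nonprinting": letters + "\x01",  # Ctrl+A
        "filename_reserved": "*.:",
        "upper_ascii": "".join(chr(c) for c in range(128, 255)),  # 128-254
        "char_255": "\xff",
        "uppercase": letters.upper(),
        "lowercase": letters.lower(),
        # Alternate lower/upper, e.g. "abc" -> "aBc"
        "mixed_case": "".join(c.upper() if i % 2 else c for i, c in enumerate(letters)),
        "sprintf": "%s%d%n",
        "keyboard_special": "~!@#$%^&*()_+`-={}|[]\\:\";'<>?,./",
    }


def pick_random(values: dict[str, str], k: int, seed: int | None = None) -> list[str]:
    """Choose k test-case names at random; a fixed seed makes a run repeatable."""
    rng = random.Random(seed)
    return rng.sample(sorted(values), k)
```

For an exhaustive run, iterate over `sorted(string_field_test_values(3, 20))`; for a random run, `pick_random(values, 5, seed=42)` draws five case names. Fixing the seed means a failing random run can be reproduced exactly.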

Character Input Testing

The following is a list of characters that are, or may be, special to some portion of the processing chain for many web applications.

A robust input and display testing strategy will verify that these characters:
  • are rendered correctly on input
  • are rendered correctly after being persisted and retrieved from the persistent store, if the application persists them
  • work correctly when placed in different parts of an input string (beginning, middle, end)
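The persistence and position checks above can be scripted per character. This Python sketch assumes a four-letter filler string and takes `store` and `load` as hypothetical callables supplied by your own harness (for example, saving a record and reading it back); it is not tied to any particular framework.

```python
from __future__ import annotations

from typing import Callable, Hashable


def positional_variants(ch: str, filler: str = "abcd") -> dict[str, str]:
    """Place ch at the beginning, middle, and end of a filler string."""
    mid = len(filler) // 2
    return {
        "beginning": ch + filler,
        "middle": filler[:mid] + ch + filler[mid:],
        "end": filler + ch,
    }


def survives_round_trip(
    value: str,
    store: Callable[[str], Hashable],
    load: Callable[[Hashable], str],
) -> bool:
    """True if value comes back unchanged after being persisted and retrieved.

    store and load are hypothetical hooks into the application's persistent
    store: store returns a key, load fetches the saved value by that key.
    """
    return load(store(value)) == value


def failing_cases(
    chars: str,
    store: Callable[[str], Hashable],
    load: Callable[[Hashable], str],
) -> list[str]:
    """Return every (character, position) variant that does not round-trip."""
    failures = []
    for ch in chars:
        for position, value in positional_variants(ch).items():
            if not survives_round_trip(value, store, load):
                failures.append(f"{ch!r} at {position}")
    return failures
```

With `store=lambda v: v` and `load=lambda k: k`, every variant round-trips. A `load` that calls `.strip()` would make `failing_cases(" ", ...)` report `' ' at beginning` and `' ' at end`, which is the kind of silent trimming these tests are meant to catch.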
Note that some of the following lists overlap; the intent is for each list to be complete in and of itself.
  • Characters that are special to file system paths   
    • . \ / : ;
  • Characters that are illegal in file system paths     
    • unknown, TBD
  • Characters that are special to XML                   
    • < > & " '
  • Characters that are illegal in XML
    • The following BNF rule, from the XML 1.0 specification, describes the valid characters:
      • Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    • Any character outside these ranges (for example #xC) may not appear anywhere in an XML document. We may still need to handle such values correctly in some places: even if users cannot enter them, other services sometimes return them to us. For example, some SNMP object discoveries return strings containing #x0.
  • Characters that are special in JavaScript
    • TBD; includes " ' \ and escape sequences beginning with \, such as \n and \u.
  • Characters that are illegal in JavaScript
    • TBD
  • Characters that are special in URLs
    • See RFC 2396 (since superseded by RFC 3986), sections 2.2-2.4. The following characters are reserved:
      • reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
  • Characters that are illegal in URLs
    • See RFC 2396, sections 2.2-2.4. The following characters (the "unreserved" rule) may appear in URLs without escaping:
      • unreserved = alphanum | mark
      • mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
      • alphanum = A-Z a-z 0-9
    • Any character not in this set (for example 16-bit characters, 8-bit non-US-ASCII characters, and punctuation marks not in this list) must be escaped as a percent sign followed by two hex digits (e.g., %20 for a space) to be included in a URL. The exception is a reserved character used in its official capacity, such as "/" separating path segments.
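    • The following rules list characters that are explicitly disallowed. Unlike the rule in the previous item, this set is NOT exhaustive:
      • control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
      • space = <US-ASCII coded character 20 hexadecimal>
      • delims = "<" | ">" | "#" | "%" | <">
      • unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"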
    • 8-bit characters (e.g., European characters from ISO-8859-1, -2, -3), such as a with umlaut (ä), e with accent (é), o with slash (ø)
    • 16-bit characters, such as Korean or Japanese characters
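The "illegal in file system paths" entry above is still open. As a starting point, the sketch below assumes the commonly documented rules: Windows file names may not contain < > : " / \ | ? * or control characters 0-31, while POSIX file names may not contain / or NUL. It checks a single path component only; Windows reserved device names such as CON or NUL are a separate case and are not handled here.

```python
from __future__ import annotations

# Assumed reserved-character sets for one path component (not a full path).
WINDOWS_RESERVED = set('<>:"/\\|?*') | {chr(c) for c in range(32)}
POSIX_RESERVED = {"/", "\0"}


def illegal_filename_chars(name: str, windows: bool = True) -> list[str]:
    """Return the sorted, de-duplicated characters in name that the target OS rejects."""
    reserved = WINDOWS_RESERVED if windows else POSIX_RESERVED
    return sorted({c for c in name if c in reserved})
```

Because the POSIX set is so small, a name that passes on Linux can still fail on Windows; running each test string against both sets shows which platform a failure belongs to.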
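The XML Char production translates directly into a range check. The Python sketch below uses it to find characters that cannot appear in an XML document at all, and uses the standard library's `xml.sax.saxutils.escape` (with extra entities for the two quote characters) for the five characters that are merely special. Replacing illegal characters with U+FFFD is an assumed policy for values received from other services; rejecting or logging them may suit your application better.

```python
from __future__ import annotations

from xml.sax.saxutils import escape


def is_xml_char(c: str) -> bool:
    """True if the single character c matches the XML 1.0 Char production."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_xml_chars(s: str) -> list[str]:
    """Characters in s that may not appear anywhere in an XML document."""
    return [c for c in s if not is_xml_char(c)]


def replace_illegal_xml_chars(s: str, replacement: str = "\ufffd") -> str:
    """Swap illegal characters (e.g. the #x0 some SNMP strings contain) for a placeholder."""
    return "".join(c if is_xml_char(c) else replacement for c in s)


def escape_xml(s: str) -> str:
    """Escape the five XML-special characters < > & " ' for text or attribute values."""
    return escape(s, {'"': "&quot;", "'": "&apos;"})
```

`escape` replaces `&` before anything else, so the `&quot;` and `&apos;` it inserts are not escaped a second time.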
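The RFC 2396 unreserved set can be expressed as the `safe` argument to Python's `urllib.parse.quote`, which always leaves ASCII letters, digits, and `_.-~` unescaped. The sketch below assumes the value is a single URL component (a path segment or query value), so reserved characters are treated as data and escaped too, and that non-ASCII text is encoded as UTF-8, which is `quote`'s default.

```python
from urllib.parse import quote

# RFC 2396 "mark" characters; with ASCII letters and digits they form the
# "unreserved" set that may appear in a URL without escaping.
RFC2396_MARK = "-_.!~*'()"


def is_rfc2396_unreserved(c: str) -> bool:
    """True if the single character c may appear in a URL unescaped."""
    return (c.isascii() and c.isalnum()) or c in RFC2396_MARK


def escape_url_component(s: str) -> str:
    """Percent-encode every character outside the RFC 2396 unreserved set.

    Non-ASCII characters are encoded as UTF-8 first, so each byte becomes
    one %XX escape (e.g. "ä" -> "%C3%A4").
    """
    return quote(s, safe=RFC2396_MARK)
```

Every character from the delims and unwise rules comes out as a %XX escape, as do 8-bit and 16-bit characters, so running those lists through `escape_url_component` is a quick way to build URL test inputs.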