Functions that Operate on Texts
- XPATH_VALUE(text, xpath)
- Returns an output text corresponding to the input text, in which those nodes that match the xpath pattern and are to be marked are marked in the output text.
- MARK_UNION(text1, text2, ...)
- Returns a new text, the smallest that includes each input text. Its set of marks is the union of the
marks occurring in all input texts. All input texts must reference the same external
document.
- MARK_INTERSECT(text1, text2, ...)
- Returns a new text, the smallest one specified within the input texts that is contained within all input
texts. Its set of marks is the intersection of the marks occurring in all input texts. All
input texts must reference the same external document.
- MARK_EXCEPT(text1, text2)
- Returns a new text identical to text1, except that all marks contained within text2
are absent from the new text.
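The mark handling of the three functions above can be pictured with ordinary set operations. The following is a minimal sketch in Python, not the server's implementation: it models only the set of marks (as hypothetical node numbers) and ignores how the root of the resulting text is chosen.

```python
def mark_union(*mark_sets):
    """Marks that occur in at least one input text."""
    return set().union(*mark_sets)


def mark_intersect(*mark_sets):
    """Marks that occur in every input text (at least one input is required)."""
    return set(mark_sets[0]).intersection(*mark_sets[1:])


def mark_except(marks1, marks2):
    """Marks of text1 that are not marks of text2."""
    return set(marks1) - set(marks2)


# Hypothetical node numbers standing in for marks.
a = {3, 7, 12}
b = {7, 12, 20}
union = mark_union(a, b)          # {3, 7, 12, 20}
common = mark_intersect(a, b)     # {7, 12}
remaining = mark_except(a, b)     # {3}
```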
- COUNT_MARKS(text)
- This function has been deprecated, since its functionality is now supported by the more general text_to_integer() class
of functions.
- KEEP_MARKS(text, integer1 [, integer2])
- Returns a new text identical to the input text but containing only the indicated subset of its
list of marks. Integer1 is the offset within the list of the first mark to keep (with
the first mark occurring at offset 1). Integer2, if present, specifies the number of marks within
the list to keep. If absent, all marks occurring at or after offset integer1 are kept.
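The 1-based offset and optional count behave much like a list slice. A minimal Python sketch follows, assuming the marks are held in an ordered list and that integer1 is at least 1 (the description above does not say what happens otherwise); the function name and the sample values are illustrative only.

```python
def keep_marks(marks, first, count=None):
    """Keep `count` marks starting at 1-based offset `first`.

    If `count` is omitted, every mark at or after `first` is kept.
    Assumes first >= 1; behaviour for smaller values is not documented.
    """
    start = first - 1  # convert the 1-based offset to a 0-based index
    if count is None:
        return marks[start:]
    return marks[start:start + count]


marks = [10, 14, 22, 31, 40]        # hypothetical marks, in list order
tail = keep_marks(marks, 2)         # [14, 22, 31, 40]
window = keep_marks(marks, 2, 2)    # [14, 22]
```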
- SUBTREE(text [, integer])
- Returns a new text rooted at the nth subtree identified by the marks in the input text. The desired
subtree is specified by the integer, which defaults to 1. The first subtree is rooted at the earliest mark
in the list of marks, the second subtree is rooted at the first mark not occurring under this first mark, and
so on. The marks in the resulting text are exactly those marks of the input text that occur under the
specified subtree.
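The selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the actual algorithm: each mark is modelled as a hypothetical (node, span) pair, a mark is taken to lie under a root when its node range node..node+span falls inside the root's range, the root mark itself is counted as lying under its own subtree, and an out-of-range integer yields None.

```python
def is_under(mark, root):
    """True if mark's node range lies within root's node range."""
    node, span = mark
    r_node, r_span = root
    return r_node <= node and node + span <= r_node + r_span


def subtree_roots(marks):
    """Marks that root successive subtrees, in document order."""
    roots = []
    # Sort by node number; at equal nodes, put the wider mark first.
    for mark in sorted(marks, key=lambda m: (m[0], -m[1])):
        if roots and is_under(mark, roots[-1]):
            continue  # lies under the previous root, so it roots nothing new
        roots.append(mark)
    return roots


def subtree(marks, n=1):
    """Return (root, kept_marks) for the nth subtree, or None if out of range."""
    roots = subtree_roots(marks)
    if n < 1 or n > len(roots):
        return None
    root = roots[n - 1]
    return root, [m for m in marks if is_under(m, root)]


marks = [(2, 5), (3, 1), (6, 0), (9, 2), (10, 0)]
second = subtree(marks, 2)   # ((9, 2), [(9, 2), (10, 0)])
```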
- MARK_TO_TEXT(text [, integer])
- Returns a new text rooted at the nth mark within the input text. The integer defaults to 1 if not specified.
Returns NULL if the requested mark is less than 1 or greater than the number of marks currently in the text.
- TEXT_TO_STRING(text, method [, tagid])
- Casts the input text as a string. The method qualifies how the text is to be converted
into a string. If not specified, tagid defaults to "Marked". The method may be any of:
- default
- Equivalent to omitting the string. Some default method is chosen, consistent with the method employed
when text is explicitly cast as a string.
- clear
- Removes all markup and comments from the input text. Replaces such markup by whitespace.
- nodes
- Returns an XML string documenting the node numbers and offsets used within the text.
- offsets
- Returns an XML string documenting the text offsets and lengths within the external text.
- raw
- Returns the raw text spanned by the input text.
- content
- Returns the raw text after removing any outer tags.
- tagged
- Returns an XML string containing the original text, with all marked subtexts
enclosed in <tagid> .. </tagid> tags.
- keep
- Similar to tagged but returns only those parts of the tagged text tagged with <tagid> .. </tagid> tags.
- omit
- Similar to tagged but returns only those parts of the tagged text not tagged with <tagid> .. </tagid> tags.
- markup
- Similar to raw but returns only those parts of the tagged text which constitute markup.
- cmarkup
- As per markup but also preserves comments.
- rootname
- Returns the string representing the name of the root node in the text. This will be a FILE name, a
DOCTYPE name, an ELEMENT name, ATTRIBUTE name, or word.
- roottype
- Returns a string defining the type of the root element. This is one of:
- FILE
- DOCUMENT
- BODY
- PI
- PI WORD
- COMMENT
- COMMENT WORD
- ELEMENT
- ATTRIBUTE
- ATTRIBUTE VALUE
- WORD
- start
- Returns the raw text from this node up to but not including the start of the next node.
- prefix
- Returns the namespace prefix associated with this element or attribute node else the empty string.
- localname
- Returns the local part of the name associated with this element or attribute else the empty string.
- uri
- Returns the namespace uri associated with this node, or the empty string if none.
- name
- Returns the original name of this element or attribute node else the empty string.
- attributes
- Returns the attributes in the text associated with this element else the empty string.
- baseattrs
- Returns all attributes in the text associated with this element except 'xmlns' attributes, else the
empty string.
- namespaces
- Returns as XML text the set of namespaces which are applicable to this node and its descendants.
- value
- Returns the value associated with this attribute node else the empty string.
- TEXT_TO_INTEGER(text [, string])
- Returns some property of the root of the input text as a 64-bit integer value. An optional method
qualifies what value is desired. This may be any of:
- default
- Equivalent to specifying 'node'.
- node
- The node number of the root of the text is returned.
- span
- The number of nodes spanned by the root of the text is returned. This is 0 if the root of the text is
also a leaf.
- tonode
- The last node number spanned by this text. This is equivalent to node+span.
- truespan
- The number of nodes actually spanned by the root node of the text. Note that the span associated with
the root node of a text may differ from the true number of nodes spanned by this root node.
- offset
- The physical character offset within the text is returned.
- length
- The number of characters in the text spanned by the specified span.
- truelength
- The number of characters in the text spanned by the true span of the root
node.
- end
- The physical character offset beyond the end of the text is returned.
- parent
- The node number of the parent of the root node is returned. Node number 1 represents the entire text. Node number 0 is returned for
nodes that have no parent.
- marks
- Returns the number of marks in the text.
- children
- Returns the number of children beneath the root.
- child
- Returns the sibling number of the root with respect to the other children of its parent. 0 is the leftmost
sibling. For a parent with 'n' children, sibling number 'n-1' is the rightmost sibling under this
parent.
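Several of these properties are related arithmetically. The sketch below restates those relationships in Python. The class and field names are hypothetical, and only tonode = node + span is stated explicitly above; treating end as offset + length is an assumption drawn from the wording of the two descriptions.

```python
from dataclasses import dataclass


@dataclass
class RootInfo:
    """Hypothetical record of the integer properties of a text's root."""
    node: int     # 'node': node number of the root
    span: int     # 'span': number of nodes spanned (0 for a leaf)
    offset: int   # 'offset': physical character offset
    length: int   # 'length': number of characters spanned

    @property
    def tonode(self):
        # Documented above: tonode is equivalent to node + span.
        return self.node + self.span

    @property
    def is_leaf(self):
        # Documented above: span is 0 when the root is also a leaf.
        return self.span == 0

    @property
    def end(self):
        # Assumption: the offset just beyond the text is offset + length.
        return self.offset + self.length


root = RootInfo(node=5, span=3, offset=120, length=48)
last_node = root.tonode   # 8
after_end = root.end      # 168
```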
- TEXT_TO_DOUBLE(text [, string])
- Returns some property of the input text as a double-precision floating-point value. An optional method
qualifies what value is desired. This may be any of:
- default
- Equivalent to specifying 'sum'.
- sum
- The sum of the marks when interpreted as numeric values.
- max
- The maximum value of any mark when the text spanned by this mark is interpreted as a numeric value.
- min
- The minimum value associated with any mark in the text.
- avg
- The average value associated with any mark in the text.
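The four methods amount to simple aggregates over the marks. Here is a minimal Python sketch, assuming the text spanned by each mark is already available as a string and that every mark parses as a number; how the server treats non-numeric or missing marks is not documented here, so that case is left out.

```python
def text_to_double(mark_texts, method="default"):
    """Aggregate the numeric values of the marked subtexts.

    `mark_texts` is a hypothetical list holding the text spanned by each mark.
    Assumes at least one mark and that every mark parses as a number.
    """
    values = [float(t) for t in mark_texts]
    if method in ("default", "sum"):
        return sum(values)
    if method == "max":
        return max(values)
    if method == "min":
        return min(values)
    if method == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unknown method: {method}")


prices = ["1.5", "2", "4"]
total = text_to_double(prices)            # 7.5
average = text_to_double(prices, "avg")   # 2.5
```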
- TEXT_ROOT(text [, integer1 [, integer2]])
- Returns the input text rooted at the indicated node integer1 (or at the start of the document
if absent) and, if integer2 is present, having span integer2. A negative node is adjusted upwards to 1, and the
span is decremented accordingly. A node greater than the maximum node is adjusted downwards.
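The adjustment can be worked through numerically. The Python sketch below is one reading of the rule, not a statement of the server's behaviour: it assumes any node below 1 is raised to 1, that the span shrinks by the same amount so the last node covered stays the same, that the span never drops below 0, and that a node beyond the maximum is lowered to the maximum. The function and parameter names are illustrative.

```python
def adjust_root(node, span, maxnode):
    """Clamp a requested (node, span) pair into the valid node range."""
    if node < 1:
        # Raising the node to 1 keeps node + span fixed, so span shrinks.
        span = max(span - (1 - node), 0)   # assumption: never below 0
        node = 1
    if node > maxnode:
        node = maxnode                     # assumption: clamp to the last node
    return node, span


# A request for node -2 with span 10 covers nodes -2..8,
# so after adjustment it covers nodes 1..8.
adjusted = adjust_root(-2, 10, maxnode=100)   # (1, 7)
```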
- ADD_MARKS(text, integer1, integer2, ...)
- Adds the specified node/span pairs as marks to the input text, provided that these marks are subsumed by
the root of the input text. Otherwise the marks are ignored.
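A minimal Python sketch of the filtering step follows. It assumes that "subsumed by the root" means the mark's node range node..node+span lies inside the root's range, and it represents the root and the marks as hypothetical (node, span) tuples.

```python
def add_marks(root, marks, *pairs):
    """Return marks plus every (node, span) pair that lies within root.

    `pairs` is a flat sequence: node1, span1, node2, span2, ...
    Pairs that are not subsumed by the root are silently ignored.
    """
    if len(pairs) % 2:
        raise ValueError("node/span arguments must come in pairs")
    r_node, r_span = root
    result = list(marks)
    for node, span in zip(pairs[0::2], pairs[1::2]):
        if r_node <= node and node + span <= r_node + r_span:
            result.append((node, span))
    return result


# Root covers nodes 10..30; the pair (40, 2) falls outside and is dropped.
marks = add_marks((10, 20), [], 12, 3, 40, 2, 28, 2)   # [(12, 3), (28, 2)]
```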
- TEXT_SECTION(text, integer1, integer2)
- Returns as a string the section of text beginning at offset integer1 and having length
integer2.
- GRAMMAR_TO_STRING(text, string)
- Casts the grammar associated with the input text as a string. An optional method qualifies how the
grammar is to be converted into a string. This may be any of:
- default
- Equivalent to omitting the string. Some default method is chosen, consistent with the method employed.
- type
- Returns a string documenting the type of the grammar employed within this input text.
- content
- Returns a string documenting the grammar associated with the text.
- dtdroot
- The element name which defines the root of the DTD.
- indexname
- The name of the index file supporting access to this text.
- encoding
- The character encoding used on the text. This is one of:
- ASCII
- UNICODE FFFE (low endian)
- UNICODE FEFF (high endian)
- endian
- The endian architecture that the index was built on. This is one of:
- LOW
- HIGH
- DAMAGED (Index is corrupt)
- sourcetype
- The type of input used to construct the index. This is currently one of:
- UNKNOWN
- FILE
- STRING
- NETWORK
- locale
- The GUID of the class (supporting the interface IQsm2Locale) used to encode the text into words,
numerics, etc.
- sourcename
- The name of the input source from which the index was constructed if known.
- internal
- Returns YES if the text is stored internally within the index. Returns NO if it is stored separately.
- GRAMMAR_TO_INTEGER(text [, string])
- Returns some property of the grammar as a 64-bit integer value. An optional method qualifies what value
is desired. This may be any of:
- default
- Equivalent to specifying 'sot'.
- sot
- Returns the start of text character offset.
- eot
- Returns the number of characters in the text.
- maxnode
- Returns the maximum node number used within the original text.
- version
- The version of the indexer used to index the text.
- elements
- The number of elements in the grammar.
- attributes
- The number of attributes in the grammar.
- blksize
- The blocking factor used to index the text and grammar.
- indexsize
- The number of bytes in the index.
- nodesperblock
- The number of node descriptions per block.
- filetime
- The filetime when the index was created.
Cast Functions
- CAST(text as string)
- This operation uses the default TEXT_TO_STRING method to convert the text to a string.
- CAST(text as integer)
- This operation attempts to cast the contents of the text as an integer. A null is returned rather than
an exception if this is not possible. If a run time exception is desired cast the text to a string and then
the string to an integer.
- CAST(text as real)
- This operation attempts to cast the contents of the text as a real. A null is returned rather than an
exception if this is not possible. If a run time exception is desired cast the text to a string and then the
string to a real.
- CAST(text as date)
- This operation attempts to cast the contents of the text as a date. A null is returned rather than an
exception if this is not possible. If a run time exception is desired cast the text to a string and then the
string to a date.
- CAST(text as time)
- This operation attempts to cast the contents of the text as a time. A null is returned rather than an
exception if this is not possible. If a run time exception is desired cast the text to a string and then the
string to a time.
- CAST(text as timestamp)
- This operation attempts to cast the contents of the text as a timestamp. A null is returned rather than an
exception if this is not possible. If a run time exception is desired cast the text to a string and then the
string to a timestamp.
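The difference between the two casting routes can be illustrated in Python. This is only an analogy for the behaviour described above: None stands in for null, a ValueError stands in for the run time exception, and both function names are hypothetical.

```python
def cast_text_as_integer(content):
    """Lenient route: like CAST(text as integer), yields None on failure."""
    try:
        return int(content.strip())
    except ValueError:
        return None


def cast_string_as_integer(value):
    """Strict route: like casting a string to an integer, raises on failure."""
    return int(value.strip())


ok = cast_text_as_integer("42")        # 42
missing = cast_text_as_integer("n/a")  # None, no exception raised
# cast_string_as_integer("n/a") would raise ValueError instead.
```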
Functions that Operate on Strings
- TRANSLATE_STRING(string1, string2)
- The XML string contained in string1 is translated into a new string according to the
rules specified in string2. This new string is returned.
- PLAN_TO_STRING(string, query)
- The plan produced by the query is converted into a string, according to the method named in string.
Note that string is specified before the query since this simplifies parsing of the construct. The method may be
any of:
- default
- Some default method is chosen.
- all
- Returns a string documenting the entire plan.
- tables
- Returns a string documenting the tables used within the plan.
- DOCUMENT(string1 [, string2])
- Casts the input string1 as an object of type text, using the text indexer. String2, if present,
provides runtime instructions to the indexer, as would be provided on the command line. If the
file() function wraps string1, then the string is taken to be the name of a file
or URL whose contents are to be cast as a text.
- STRING_TO_STRING(string1, string2)
- Converts string1 according to the method specified in string2. This may be one of:
- XML
- Encode the special XML markup characters.
Specifically maps (1) '>' to '&gt;' (2) '<' to '&lt;' (3) '&' to '&amp;' (4) '#'
to its character reference (5) " to '&quot;' and (6) ' to '&pos;'
- CGI
- Encode the special CGI markup characters. This is the appropriate translation to apply to XML encoded
data.
Specifically maps (1) whitespace (cr, nl, blank, tab) to '+' (2) '%' to '%25' (3) '+' to '%2B' (4) '&' to
'%26' and (5) '?' to '%3F'.
For safety (although these symbols are not expected to be seen in the input string) also maps " to '%22',
' to '%27', '<' to '%3C' and '>' to '%3E'.
- URI
- First encodes the string according to XML, then encodes the resulting string using CGI.
This is the appropriate CGI translation to perform when
embedding plain text in a CGI command line.
- Unquote
- Remove any leading/trailing single or double quote from the input string.
- Noblanklines
- Removes all the blank lines from the input string.
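The CGI mapping listed above is fully enumerated, so it can be sketched as a single-pass character table. The Python below is an illustration, not the server's code; it assumes each whitespace character becomes its own '+' (runs are not collapsed) and that every other character passes through unchanged.

```python
# Character table taken from the CGI method description above.
CGI_MAP = {
    "\r": "+", "\n": "+", " ": "+", "\t": "+",   # whitespace
    "%": "%25", "+": "%2B", "&": "%26", "?": "%3F",
    '"': "%22", "'": "%27", "<": "%3C", ">": "%3E",
}


def cgi_encode(value):
    """Encode one character at a time so that a '%' or '+' produced by
    the mapping is never encoded a second time."""
    return "".join(CGI_MAP.get(ch, ch) for ch in value)


encoded = cgi_encode("a b+c?d&e%")   # "a+b%2Bc%3Fd%26e%25"
```

Mapping character by character avoids the ordering problem that chained replacements would have, where the '%' introduced by one substitution could be rewritten by a later one.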
- XSLT(string1, string2)
- Performs the desired XSLT transformation on the XML string1, using the XSLT rule file
string2, and returns the result. If the file() function wraps either
input string, then that string is presented to XSLT as the name of a file to be opened, rather than as
a string to be processed. Providing XSLT with filenames avoids the need to construct memory-resident
images of the file contents in huge strings. For this function to work,
MSXML4.0 (available as a free download from Microsoft) must first be installed.
Other Functions
- FILE_TO_TEXT(string)
- Assumes that the filename provided in the input string is a text index constructed using
tokenize, and returns the internal representation of this text value. This function
will return null (for security reasons) if the input string fails to have a file extension described in the
value of the registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Dealers Choice Software\TextServer\FileTypes\Text.
(See also TEXT_FILES.)
- FILE_TO_STRING(string)
- Assumes that the file named in the input string itself contains an ASCII or Unicode string.
Returns as its result the value of this internal file string. This function will fail if the string exceeds
the maximum string length allowed. This function will return null (for security reasons) if the input string
fails to have a file extension described in the value of the registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Dealers Choice Software\TextServer\FileTypes\String.
(See also STRING_FILES.)
- FILE(string)
- This function merely returns a copy of its input. The resulting string has a special attribute set,
indicating that where possible it is to be treated as a filename. At present this special
attribute is used only by the xslt() and document() functions.