CXString

alpha version, October 2000

CXString instances represent generic strings. The API is (or will be, since the XSdk is at an extremely early stage) rich, supporting many standard operations. Unicode is meant to be fully supported. Currently the support is only partial, but no API changes are scheduled: only the implementation does not yet work properly in some technical details, such as Unicode composed characters.

Types and constants

There are some string-related types defined:


typedef unsigned XStringEncoding;

The XStringEncoding values represent string encodings, ie. the possible ways a string might be represented. The CXString internal representation is never an issue for a programmer; the concrete encodings are used only for importing and exporting strings.

Some encodings are used so often that it was worth allocating named constants for them. They are XPlainAsciiStringEncoding, XUnicodeStringEncoding, XISOLatin2StringEncoding, and XWindoze1250StringEncoding. The names are hopefully descriptive enough.

Note that CXString supports many more encodings than just these four; see the availableStringEncodings method below.

The Unicode encoding is implemented fairly completely: both byte orders are supported when importing, and the Unicode mark word (0xfeff) is written out when exporting (exported Unicode is always little-endian). Unicode composite characters, however, are not handled well yet: in particular, a Unicode string containing 'E' and a non-spacing acute has a length of 2, and is considered different from a Unicode string containing the single composed character 'E with acute'. This will be fixed in time.
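The composite-character limitation can be reproduced outside the XSdk. The sketch below is Python, not XSdk code, and serves only as an illustration. Unicode normalization (NFC) is shown as one common way to make the two forms compare equal; that is an assumption, not necessarily how CXString will address it.

```python
import unicodedata

decomposed = "E\u0301"  # 'E' followed by the non-spacing (combining) acute
composed = "\u00c9"     # the single composed character 'E with acute'

# Code-unit lengths of the two spellings of the same visible character.
lengths = (len(decomposed), len(composed))

# A plain comparison treats the two strings as different.
equal_raw = decomposed == composed

# After canonical composition (NFC) the two strings compare equal.
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed
```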


typedef TUint16 unichar;

The Unicode character type is so generic that it is worth using a type name without the X prefix.


Creation

There are a number of ways to create a string object. In a sense, the string constant, the @"..." primitive, should be considered one of them (see the preprocessor for more details). It makes a static string with the given contents; the following two lines are thus more or less equivalent:

CXString *s1=@"xyz";
CXString *s2=[CXString staticStringWithStaticEString:_L("xyz")];


+CXString *string;

Returns a newly created empty string.


+CXString *stringWith(const TDesC &des,...);

Returns a newly created string whose contents are built from the given Epoc descriptor and all the arguments that follow. The descriptor can contain a printf-like format. The recognized format characters are the same as in the Epoc Format method, plus one more: for a "%@" in the format string, the corresponding argument can be any object, and its description is written out. Even the nil value is acceptable (written out as "*nil*").


+CXString *stringWithList(const TDesC &des,VA_LIST argList);

Like the above one, but uses the VA_LIST argument list variable.


+CXString *stringWith(CXString *str,...);

Like the above stringWith method, but uses the CXString instance for the format string.


+CXString *stringWithList(CXString *str,VA_LIST argList);

Like the above one, but uses the VA_LIST argument list variable.


+CXString *stringWithEString(const TDesC &des);

Returns a newly created string whose contents are copied from the given Epoc descriptor. Shared buffers are not supported yet (but will be in the near future).

This creator does not interpret format characters.


+CXString *stringWithStaticEString(const TDesC &des);

Just like the one above, but presumes the Epoc descriptor is a static one (ie. defined with the _L or _LIT Epoc macros). Therefore, the string does not need to copy its contents.

This creator does not interpret format characters.


+CXString *staticStringWithStaticEString(const TDesC &des);

Like the one above, but the string itself is static (ie. not autoreleased--see the beStatic method of CXObject class).

This creator does not interpret format characters.


+CXString *stringWithCString(const char *cp);

Returns a newly created string whose contents are copied from the given plain (zero-terminated) C string. Shared buffers are not supported yet (but will be in the near future).

This creator does not interpret format characters.


+CXString *stringWithString(CXString *str);

Returns a newly created string whose contents are copied from the given string. Shared buffers are not supported yet (but will be in the near future).

This creator does not interpret format characters.


+CXString *stringWithData(CXData *data,XStringEncoding stringEncoding=0);

Returns a newly created string made by interpreting the contents of the data object as a string in the given encoding. Unicode can be in little- or big-endian form, provided the mark word 0xfeff is present; without it, the data is always considered little-endian. An encoding of zero means "the system default encoding"; see the defaultCStringEncoding method below.

This creator does not interpret format characters.
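The byte-order rule described for Unicode data can be sketched as follows. This is Python, not XSdk code; the function name decode_unicode is hypothetical, and 16-bit code units are assumed (matching the TUint16 unichar type).

```python
def decode_unicode(data):
    """Decode 16-bit Unicode bytes, honoring an optional 0xfeff mark word."""
    if data[:2] == b"\xfe\xff":
        # Mark word stored high byte first: big-endian data follows.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # Mark word stored low byte first: little-endian data follows.
        return data[2:].decode("utf-16-le")
    # No mark word: the data is always treated as little-endian.
    return data.decode("utf-16-le")
```

All three inputs b"\xfe\xff\x00A", b"\xff\xfeA\x00" and b"A\x00" decode to the same one-character string.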


Initializers

-id initWith(const TDesC &des,...);

<<description forthcoming>>

-id initWithList(const TDesC &des,VA_LIST argList);

<<description forthcoming>>

-id initWith(CXString *str,...);

<<description forthcoming>>

-id initWithList(CXString *str,VA_LIST argList);

<<description forthcoming>>

-id initWithEString(const TDesC &des); // designated

<<description forthcoming>>

-id initWithString(CXString *str);

<<description forthcoming>>

-id initWithData(CXData *data,XStringEncoding stringEncoding);

<<description forthcoming>>

-id initWithData(void *data,int length,XStringEncoding stringEncoding);

<<description forthcoming>>

Generic services

-TDesC *eString;

Returns a pointer to an Epoc descriptor that represents the string value. In 8-bit Epoc builds it uses the system default encoding (see the defaultCStringEncoding method below); in UNICODE builds it uses little-endian Unicode with the mark word.


-const unsigned char *cString;

Returns a plain C string representation of the string value. The system default encoding (see the defaultCStringEncoding method below) is used.

The zero terminator is automatically added.


-unichar characterAtIndex(int index);

Returns the character at the index given. This is the preferred way to access the string contents. Raises an exception for improper indices.


-int length;

Returns the string length in characters, not in bytes (a Unicode character might be represented by quite a number of bytes, for example).


-CXString *stringByLeftTrim(CXString *trimDelimiters=nil);
-CXString *stringByRightTrim(CXString *trimDelimiters=nil);
-CXString *stringByTrim(CXString *trimDelimiters=nil);

These three methods make a new string with the unneeded whitespace removed from its ends (the left end, the right end, or both).

If the argument is given, it contains all the characters to be trimmed. Otherwise, the default ones are used.

+CXString *trimDelimiters;

Returns the current set of default trim delimiters--by default, they are the space, tab, and line delimiters CR and LF.

+CXString setTrimDelimiters(CXString *delims);

Sets new default trim delimiters, to be used in case the argument in the stringByLeftTrim, stringByRightTrim, or stringByTrim method was nil.
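The trimming rules can be sketched as follows. This is Python, not XSdk code; the name string_by_trim and the left/right flags are hypothetical, standing in for the three separate methods. The default delimiter set follows the description above: space, tab, CR and LF.

```python
DEFAULT_TRIM_DELIMITERS = " \t\r\n"  # space, tab, CR, LF

def string_by_trim(text, trim_delimiters=None, left=True, right=True):
    """Return text with delimiter characters removed from the chosen ends."""
    # A missing argument (nil in the XSdk) selects the default delimiters.
    if trim_delimiters is None:
        trim_delimiters = DEFAULT_TRIM_DELIMITERS
    if left:
        text = text.lstrip(trim_delimiters)
    if right:
        text = text.rstrip(trim_delimiters)
    return text
```

With the defaults, string_by_trim("  \tabc\r\n") gives "abc"; passing "-" as the delimiter set trims dashes instead of whitespace.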


-CXString *stringByAppending(CXString *str,...);

Returns a newly created string with the contents made by appending the string given (format characters are interpreted) to the string whose method was called.


-CXString *substringFromIndex(int index);

Returns a newly created string with the substring of the string whose method was called, starting with the character at the index given. Raises an exception for an index outside of the string.


-CXString *substringToIndex(int index);

Returns a newly created string with the substring of the string whose method was called, ending with the character just before the index given. Raises an exception for an index outside of the string.


-CXString *substringWithRange(XRange range);

Returns a newly created string with the substring of the string whose method was called, specified by the range given. Raises an exception for a range outside of the string.


-CXArray *componentsSeparatedByString(CXString *str);

An extremely handy convenience method: it returns a newly created array of strings, which are parts of the string whose method was called, separated by the string given:

CXString *s=@"aa and bb and cc";
CXArray *a=[s componentsSeparatedByString:@" and "];

will make an array containing strings "aa", "bb", "cc".

Note there is a counterpart for this method in the CXArray class.

-CXArray *componentsSeparatedByString(CXString *str,CXString *quot,CXString *esc=nil);

A more comprehensive version of the previous method. The delimiter str is taken into account only when it is not inside a pair of quotation marks quot. If esc is non-nil, a quotation mark is itself taken into account only if it is preceded by zero or an even number of escs. With quot set to @"\"" and esc set to @"\\", this emulates the standard C interpretation of quoted strings.
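The quoting rule can be sketched as a single left-to-right scan. This is Python, not XSdk code, and the name split_quoted is hypothetical. It assumes that quotation marks and escapes are kept verbatim in the resulting components, which the description above does not specify.

```python
def split_quoted(text, delim, quot, esc=None):
    """Split text on delim, ignoring delimiters inside quot ... quot pairs."""
    if not delim or not quot:
        raise ValueError("delimiter and quotation mark must be non-empty")
    parts, current = [], []
    in_quotes = False
    esc_run = 0  # number of consecutive escapes just before position i
    i = 0
    while i < len(text):
        if esc and text.startswith(esc, i):
            current.append(esc)
            esc_run += 1
            i += len(esc)
            continue
        if text.startswith(quot, i):
            # A quote counts only after zero or an even number of escapes.
            if esc_run % 2 == 0:
                in_quotes = not in_quotes
            current.append(quot)
            i += len(quot)
        elif not in_quotes and text.startswith(delim, i):
            parts.append("".join(current))
            current = []
            i += len(delim)
        else:
            current.append(text[i])
            i += 1
        esc_run = 0
    parts.append("".join(current))
    return parts
```

For example, splitting a,"b,c",d on "," with the double quote as quot yields three components, the middle one being "b,c" including its quotes.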


-int intValue;

Interprets the string contents as an integer, as one would expect; in addition, it ignores triad delimiters (space and comma).


-double doubleValue;

Tries to interpret the string contents as a decimal number with mantissa and exponent. The format is [-][0-9]*[.[0-9]*][e|E[0-9]*]. Triad delimiters (space and comma) are ignored in the integral part of the mantissa.
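The number format above can be sketched with a regular expression. This is Python, not XSdk code; the name double_value is hypothetical, and returning None for text that does not match the format is an assumption, since the behavior on failure is not documented here.

```python
import re

# [-] integral part (digits, spaces, commas) [.fraction] [e|E exponent]
_NUMBER = re.compile(r"(-?)([0-9 ,]*)(\.[0-9]*)?([eE][0-9]*)?")

def double_value(text):
    """Parse text in the documented format, ignoring triad delimiters."""
    match = _NUMBER.fullmatch(text)
    if match is None:
        return None  # assumption: the real failure behavior is not specified
    sign, integral, fraction, exponent = match.groups()
    # Triad delimiters are dropped from the integral part only.
    integral = integral.replace(" ", "").replace(",", "") or "0"
    fraction = (fraction or ".")[1:] or "0"
    exponent = (exponent or "e")[1:] or "0"
    return float(f"{sign}{integral}.{fraction}e{exponent}")
```

Under these assumptions "1,234,567.5" parses to 1234567.5 and "-1 000e3" to -1000000.0.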


Comparisons

-int compare(CXString *string);
-int compareFold(CXString *string);

<<description forthcoming>>


-BOOL hasPrefix(CXString *string);

Checks if there is the prefix given.


-BOOL hasSuffix(CXString *string);

Checks if there is the suffix given. Note that it is a generic service, not bound to file names: hence, should you want to know if a string has a "txt" extension, you have to check for a ".txt" suffix.


-XRange rangeOfString(CXString *string);

Tries to find the substring given; if successful, returns its range inside the string whose method was called. If not found, returns a range of location -1 and length 0.

Currently there is a small bug (blame the Epoc!); see the end of XPreprocessor.html.

-XRange rangeOfString(CXString *string,XRange range);

Just like the above one, but the substring is searched for in the range given only.
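The return convention of the two search methods can be sketched as follows. This is Python, not XSdk code; the name range_of_string is hypothetical, ranges are modeled as (location, length) tuples, and the returned location is assumed to be relative to the start of the whole string rather than to the search range.

```python
def range_of_string(text, needle, search_range=None):
    """Return (location, length) of needle in text, or (-1, 0) if absent."""
    if search_range is None:
        start, length = 0, len(text)
    else:
        start, length = search_range
    # The search is confined to the characters inside the given range.
    location = text.find(needle, start, start + length)
    if location < 0:
        return (-1, 0)
    return (location, len(needle))
```

Searching "aa and bb" for "bb" returns (7, 2); restricting the search to the range (0, 5) returns (-1, 0).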


Path management

There are a number of new CXString services that allow easy, flexible and portable programming of algorithms related to file names and file paths. If you use just these services, the file names will be portable even between systems which use different path delimiters:


-BOOL isAbsolutePath;

Returns YES in case the receiver contains an absolute path.


-CXString *lastPathComponent;

Returns the last non-empty path component, or an empty string in case there is none.


-CXString *pathExtension;

Returns the extension (the text after the last dot) of the last component of the path.


-CXString *stringByAppendingPathComponent(CXString *name);

Returns a newly made string, containing the receiver's path with the new component added to the end. The proper path delimiter is used automatically.


-CXString *stringByAppendingPathExtension(CXString *ext);

Returns a newly made string, containing the receiver's path with the new extension added to the end. The dot is added automatically.


-CXString *stringByAppendingPathExtensionIfNone(CXString *ext);

Like the previous one, but adds the new extension only in case there was none. A path which already has an extension is returned unchanged.


-CXString *stringByDeletingLastPathComponent;

Returns a newly made string, containing the receiver's path with the last non-empty component removed. The path delimiter is removed as well.


-CXString *stringByDeletingPathExtension;

Returns a newly made string, containing the receiver's path without an extension.
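A few of the path rules above can be sketched as follows. This is Python, not XSdk code; the function names are hypothetical, and the backslash default for the path delimiter is an assumption made for the example only (the real services pick the proper delimiter automatically).

```python
def last_path_component(path, sep="\\"):
    """Return the last non-empty component, or an empty string if none."""
    components = [part for part in path.split(sep) if part]
    return components[-1] if components else ""

def path_extension(path, sep="\\"):
    """Return the text after the last dot of the last path component."""
    name = last_path_component(path, sep)
    dot = name.rfind(".")
    return name[dot + 1:] if dot >= 0 else ""

def append_path_extension_if_none(path, ext, sep="\\"):
    """Add ext (with a dot) only when the path has no extension yet."""
    return path if path_extension(path, sep) else path + "." + ext
```

Note that a dot in a directory name does not count: for "C:\a.b\notes" the extension is empty, so the last function appends the new one.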


-CXString *stringByEpocParse(CXString *fn2,CXString *fn3=nil);

Returns a newly made string, prepared by the specifications of the Epoc "parse"; the receiver is the first mandatory path, while the two optional paths are supplied from the strings fn2 and fn3.


Unicode and other encodings

-BOOL canBeConvertedToEncoding(XStringEncoding stringEncoding);

Checks whether the string whose method was called can be converted to the given encoding without loss of information. This is generally more efficient than trying the conversion with the method below and discarding the resulting data if it succeeds.


-CXData *dataUsingEncoding(XStringEncoding stringEncoding,BOOL allowLossy=NO,BOOL appending=NO);

Converts the contents of the string whose method was called to the encoding given. If the allowLossy argument is YES, the conversion succeeds even if some characters must be discarded (for they can not be expressed in the target encoding). If so, each such character will be represented by "_". In case the allowLossy argument is NO and some characters can not be expressed in the target encoding, the method fails, returning nil.

As above, 0 can be used as the encoding number for the system default encoding.

The appending argument is valid for the Unicode target encoding only: if it is set to YES, the mark word 0xfeff will not be generated. In case it contains NO, the mark word will be generated at the start of the resulting data. The generated Unicode is always little-endian.
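The interplay of the allowLossy and appending arguments can be sketched as follows. This is Python, not XSdk code; the name data_using_encoding is hypothetical, and Python codec names such as "ascii", "cp1250" and "utf-16" stand in for XStringEncoding numbers.

```python
def data_using_encoding(text, encoding, allow_lossy=False, appending=False):
    """Encode text; return None when a lossless conversion is impossible."""
    if encoding == "utf-16":
        body = text.encode("utf-16-le")  # generated Unicode is little-endian
        # The mark word 0xfeff (bytes FF FE) is omitted when appending.
        return body if appending else b"\xff\xfe" + body
    result = bytearray()
    for character in text:
        try:
            result += character.encode(encoding)
        except UnicodeEncodeError:
            if not allow_lossy:
                return None  # strict conversion fails as a whole
            result += b"_"   # each inexpressible character becomes "_"
    return bytes(result)
```

For the string "Čau", the strict conversion to plain ASCII fails, the lossy one yields "_au", and the Windows-1250 conversion succeeds without loss.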


-CXData *data;

Just a convenience shortcut for dataUsingEncoding(0).


-XStringEncoding fastestEncoding;

<<description forthcoming--currently not implemented>>

-XStringEncoding smallestEncoding;

<<description forthcoming--currently not implemented>>


+XStringEncoding *availableStringEncodings;

Returns a list of all the encodings supported by the current version of CXString. The encodings are given as a zero-terminated C array of XStringEncoding (ie. unsigned) values. See below for an example of how to use it.


+XStringEncoding defaultCStringEncoding;

Returns the number of the system default encoding. It depends on the particular system and configuration; in the current Epoc, which does not support Unicode and uses X.soft's RAM Czech localization, it is always XWindoze1250StringEncoding. Do not presume that, though; use this method, so that your programs stay portable in the future.


+CXString *localizedNameOfStringEncoding(XStringEncoding stringEncoding);

As its name suggests, this method returns a localized name of the encoding given. The localization support is currently unfinished; you just have to presume the proper language is set system-wide.

Using this method and the availableStringEncodings one, you can easily list all the available encodings:

XStringEncoding *encs=[CXString availableStringEncodings];
while (*encs) // the array is zero terminated
  printf("%s\n",[[CXString localizedNameOfStringEncoding:*encs++] cString]);

There is a more convenient way to get all the available string encodings:


+CXArray *availableEncodingLocalizesNamesAndNumbers;

This method returns an array of arrays. Each nested array contains just two objects: the first is the localized name of an encoding, the second is the encoding number (converted to an object using the XNUM2OBJ macro).


Copyright © 1999-2000 X.soft, all rights reserved