Strings
July 27, 2010
String handling in Wirbel
Strings are Wirbel's most important datatype. A string is a sequence of characters. String literals are always written in double quotes:
name = "Mathias"
A character is a datatype of its own and is written in single quotes:
x = 'M'
Operators
Strings can be concatenated with +. It is also possible to add a character to a string:
name = "Mathias" + ' ' + "Kettner"
With += a string can be changed in place:
name += ", Munich"
Strings can be compared with the usual operators ==, !=, <, <=, > and >=:
if name == "Mathias":
    print("It's me!")
The special operator =~ checks whether a string matches a POSIX extended regular expression. The match astring =~ apattern succeeds if apattern matches any part of astring:
name = "Henry 1st Human"
if name =~ "[0-9]+":
    print("Name contains a number")
For an exact match of the whole string, add the anchors ^ and $, which stand for the beginning and the end of the string:
number = "4711"
if number =~ "^[0-9]+$":
    print("Yes, it's a number")
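To illustrate the partial-match rule, here is a minimal Python model (not Wirbel code), assuming that =~ behaves like an unanchored regular-expression search. The helper name matches is hypothetical. Python's re module is not a POSIX extended regular expression engine; for simple patterns like these it gives the same yes/no answer, but the two differ in details such as backreferences and which branch of an alternation is preferred.

```python
import re


def matches(astring: str, apattern: str) -> bool:
    """Hypothetical model of Wirbel's `astring =~ apattern`.

    Returns True if apattern matches any part of astring.
    """
    # re.search looks for a match anywhere in the string, so an
    # unanchored pattern succeeds on a partial match.
    return re.search(apattern, astring) is not None


print(matches("Henry 1st Human", "[0-9]+"))  # True: contains the digit "1"
print(matches("4711", "^[0-9]+$"))           # True: the whole string is digits
print(matches("47a11", "^[0-9]+$"))          # False: ^ and $ force a full match
```

Without the anchors, "47a11" would match as well, because "47" alone is a matching part of the string.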
String Functions
Many functions are available for working with strings. Here are a few examples of the most important ones.
Information
len() returns the length of a string. It is used in method notation, i.e. appended to the string with a dot:
"Hallo".len() # --> 5
("ab" + "cd").len() # --> 4
The special function empty() returns true if the string is empty:
if name.empty():
    print("You have no name?")
Substrings
You can extract characters and substrings from a string using subscripts and slices. Positive indices count from the beginning and start at 0. Note that a subscript extracts a character, while a slice extracts a string:
name = "Mathias"
name[0] # subscript returns a character --> 'M'
name[0:2] # part from 0 up to but not including 2 --> "Ma"
name[2:5] # --> "thi"
Leaving out one or both bounds means "from the start" or "to the end", respectively:
name[3:] # --> "hias"
name[:2] # --> "Ma"
name[:] # --> "Mathias"
Negative indices are relative to the end of the string. The index -1 denotes the last character:
name[-1] # --> 's'
name[-2] # --> 'a'
name[-2:] # --> "as"
name[:-1] # --> "Mathia"
Matching
The method find() looks for a substring and returns the index of its first occurrence, or -1 if it is not found:
"Haystack".find("stack") # --> 3
"Haystack".find("needle") # --> -1
If you are looking for a particular string at the beginning or at the end, startswith() and endswith() come in handy:
"hello.w".endswith(".w") # --> true
"/usr/bin/wic".startswith("/usr/") # --> true
Transforming
Strings can be transformed in various ways. One is searching and replacing text. The method replace() modifies the string in place:
name = "Mathias"
name.replace("a", "??")
print(name) # --> "M??thi??s"
replaced() does the same but, unlike replace(), returns the result as a copy. The original string object is not changed:
name = "Mathias"
print(name.replaced("a", "??")) # --> "M??thi??s"
print(name) # --> "Mathias"
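The difference between replace() and replaced() is the usual one between mutating an object and returning a modified copy. Python's own strings are immutable, so this sketch models a Wirbel string as a hypothetical MutableString class wrapping a bytearray. It illustrates the two calling conventions only and is not how Wirbel is implemented.

```python
class MutableString:
    """Hypothetical model of a Wirbel string as a mutable byte buffer."""

    def __init__(self, text: str) -> None:
        # latin-1 maps each character to exactly one byte.
        self.data = bytearray(text, "latin-1")

    def replace(self, old: str, new: str) -> None:
        # In place: overwrite this object's own buffer with the result.
        self.data[:] = self.data.replace(old.encode("latin-1"),
                                         new.encode("latin-1"))

    def replaced(self, old: str, new: str) -> "MutableString":
        # Copy: work on a new object and leave this one untouched.
        result = MutableString(str(self))
        result.replace(old, new)
        return result

    def __str__(self) -> str:
        return self.data.decode("latin-1")


name = MutableString("Mathias")
print(name.replaced("a", "??"))  # M??thi??s
print(name)                      # Mathias
name.replace("a", "??")
print(name)                      # M??thi??s
```

Building replaced() on top of replace() keeps the replacement logic in one place; the price is one extra copy of the buffer.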
Stripping whitespace from the beginning or end of a string is very useful, for example when processing configuration files:
name = " something \n"
print(name.lstrip()) # --> "something \n"
print(name.rstrip()) # --> " something"
print(name.strip()) # --> "something"
Splitting and Joining
The functions split() and join() are useful when parsing files. split() splits a string at a separator and returns a list of substrings:
"one,two,three".split(",") # --> ["one", "two", "three"]
If you leave out the separator, split() splits at runs of whitespace:
" a word and anotherone\r\n".split() # --> ["a", "word", "and", "anotherone"]
join() does exactly the opposite: it joins a list of strings together. It is called as a method on the separator string:
"-".join(["one", "two", "three"]) # --> "one-two-three"
Strings versus Characters
In some cases you might wonder whether to use a single character 'x' or a string "x" of length one. The former is translated to a C char, which can be handled directly by single CPU instructions and takes only one byte of storage. The latter is stored in a dynamically allocated buffer, which needs considerably more memory and CPU cycles.
Internal representation
Strings are represented internally as arrays of bytes. Each character takes exactly one byte. Unicode strings are not yet supported, but you can use UTF-8 to encode them. If you do, be aware that a single Unicode character may map to several bytes in the Wirbel string and thus to several Wirbel characters. Functions like len() do not take UTF-8 into account but simply count the bytes. Wirbel characters map directly to the C datatype char.
Wirbel strings are not zero-terminated and can therefore contain 0-bytes. They are completely binary safe and can be used as byte buffers. Operating system calls (e.g. opening a file), however, usually expect zero-terminated strings and do not allow 0-bytes. When such a call is made, everything after the first 0-byte is silently ignored.
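Python's bytes type is a convenient stand-in for illustrating this byte-oriented model, since Python bytes, like Wirbel strings, are binary safe. The sketch shows how UTF-8 text takes more bytes than characters (so a byte-counting len() would report 7 for "Müller"), and what a zero-terminated view of a buffer with an embedded 0-byte looks like. The truncation is simulated with split() rather than performed by an actual system call.

```python
text = "Müller"                # 6 characters
raw = text.encode("utf-8")     # the bytes a Wirbel string would store
print(len(text), len(raw))     # 6 7  ('ü' takes two bytes in UTF-8)

buf = b"abc\x00def"            # binary data with an embedded 0-byte
print(len(buf))                # 7  (the 0-byte is ordinary data)

# What a C function expecting a zero-terminated string would see:
c_view = buf.split(b"\x00", 1)[0]
print(c_view)                  # b'abc'
```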