tsunami

Language Theory

UggBoots
2008-02-13 19:07 UTC

Introduction & Presumptions

As far as the learning aspect goes, I've taken a few weeks of set theory which has taught me a bit about sets and languages, but nothing about regular expressions, though the textbook does provide a reference I may consult for that later.

Here are the tools necessary to understand the fundamentals of languages, and later regular languages and expressions. They're pretty straight-forward, and I don't think there'll be any trouble with them.
  • A symbol for our use is essentially what you know as a character. Even though pretty much anything can be a symbol, it's easiest to think of them as characters, since that's how we'll be using them primarily.
  • An alphabet is then a set of symbols. For example, A = {a,b,c,...,z} would be an alphabet containing all of the lower-case letters of the English alphabet.
  • A string is an ordered, duplicable set (sequence) of symbols, and is generally written simplified as, e.g. S = "dog". In this example, S is merely the list < d, o, g > (which doesn't give any more detail than writing it like a C-string.)
  • Lambda (λ) functions as our empty string, and it basically functions as if nothing were there. For example, the strings "dλoλλλgλ" and "dog" are equivalent.
  • The closure of an alphabet (specifically the Kleene-closure, or closure under string concatenation) is the set of all strings (including λ) that are concatenations of different quantities of each symbol in the alphabet in different orders. We will denote this by A* where A is the alphabet we're taking the closure of. For example, if A = {0,1} then A* = {λ, 0, 1, 00, 01, 10, 11, 000, ...}. As you can probably see, the closure of any alphabet other than the empty alphabet is an infinite set.
  • Finally, a language over an alphabet is simply any subset of the closure of that alphabet. Referring to the above A = {0, 1} and A*, we can see that one possible language for A is L = {00, 01}. We'll see how we can describe and generate these languages later.

Generating/Describing Languages
Todo: Inductively Defined
Todo: Grammars