A Gentle Introduction to First-order Logic

Submitted by Eus
on February 6, 2010 - 4:03am

Source:
(1) Knowledge Representation and Reasoning by Ronald Brachman and Hector Levesque
(2) Artificial Intelligence: A Modern Approach (2nd Edition) by Stuart Russell and Peter Norvig

II.2. Syntax of FOL

Logical symbols:
1. punctuation: ( ) .
2. connective: ~, ^, v, =, and quatifiers (V, 3)
3. variable

Nonlogical symbols:
1. function: start with lower-case letter
        functions with arity 0 denoted a, b or c are called constants
        functions with nonzero arity are denoted f, g or h
2. predicate: start with upper-case letter
        predicates with arity 0 are called PROPOSITIONAL SYMBOLS.

The stuff that I learned during the high school is a subset of FOL called propositional subset: no terms, no quantifiers and only PROPOSITIONAL SYMBOLS are used.

A FOL sentence is any formula without free variables. FOL sentences are used to express knowledge.

Expressions in FOL:
1. Terms
        variables
        function applications to terms (e.g., f (t1, ..., tn))
2. Formulas
        predicate applications to terms (e.g., P (t1, ..., tn))
        t1 = t2 where t1 and t2 are terms
        ~a, (a ^ b), (a v b), Vx.a and 3x.a where a and b are formulas and x is a variable

Atomic formulas or atoms are formulas that don't contain any other formula (i.e., predicate applications and t1 = t2 where t1 and t2 are terms).

II.2. Semantic of FOL
Semantic is defined as:
a function of the interpretation of the predicate and function symbols (i.e., nonlogical symbols).

(!) Here the tuple symbol is < > (e.g., <3, 4> for a 2D coordinate instead of (3, 4)).

II.3.1. Interpretation \mathcal{I} is a pair <D, I> where
        D is any nonempty set of objects: the the domain of interpretation
        I is the interpretation mapping from nonlogical symbols to:
                1. functions
                        I[f] an element of [D x ... x D -> D]
                        Eg., I[bestFriend] is to map a person to his/her best friend.
                2. relations (i.e., predicates) over D
                        I[P] is a subset of D x ... x D
                        or
                        I[P] an element of the set of characteristic functions [D x ... x D -> {0, 1}]
                                When P is of arity zero, I[P] is simply a mapping to either 0 or 1 (i.e., F or T).
                                        This is how it works in the high school.
                        Eg., I[OlderThan] is to say that two objects in D satisfy OlderThan relationship.
So, interpretation/semantic in FOL doesn't deal with what "best friend" or "older than" is. It simply:
        1. maps some objects in D to another object in D, or
        2. shows that some objects in D has the specified relationship.

(!) While D is subject to change, I is fixed as state above.

(!) We use a relation when a particular thing can be related with another thing (e.g., DaughterOf) or a particular thing is a member of a particular group or is having a particular property (e.g., Female, Happy) while a function is used when a particular thing wants to be mapped to another thing (e.g., biologicalMother, biologicalParents, gender). Since in FOL all functions are taken to be total, if not all elements in the domain can be mapped to another element (e.g., not all person has a daughter), you have to use a relation instead.

GOAL: a clear specification of the meaning of sentences as a function of the interpretation of the predicate and function symbols.

Only 2 points need to be considered because other aspects of the world do not matter:
1. There are object in the world.
2. For any predicate P, some of the objects will satisfy P while the other objects will not as determined by the interpretation of P. For any function f, it will map to a particular object depending on the interpretation function.

II.3.2. Denotation
Any variable-free term of FOL denotes elements in D.
That implies that variables need to be assigned with elements in D (i.e., variables assignment over D). A variable assignment is formally specified as \mu[x] where \mu is a variable assignment and x is a variable producing some element in D.

The denotation of term t written ||t||_{\mathcal{I}, \mu} is defined as follows:
1. If x is a variable, ||x||_{\mathcal{I}, \mu} = \mu[x]
2. If t1, ..., tn are terms and f is a function symbol of arity n,
   ||f (t1, ..., tn)||_{\mathcal{I}, \mu} = F (d1, ..., dn) where F = I[f] and di = ||ti||_{\mathcal{I}, \mu}

II.3.3 Satisfaction and Models
Knowing interpretation \mathcal{I} and denotation ||t||_{\mathcal{I}, \mu}, the truthfulness of a FOL sentence can be determined on the light of the given interpretation.

A formula \alpha is satisfied in \mathcal{I} (i.e., \mathcal{I}, \mu |= \alpha [(!) remember that with the use of \mu, any formula \alpha will be transformed to a FOL sentence since a FOL sentence is a formula without free variables]) according to the seven rules.

(!) A sentence is satisfiable if it is true in _some_ \mathcal{I}.

If \alpha is already a sentence, no variable assignment is needed. Therefore, writing \mathcal{I} |= \alpha suffices.
Moreover, in the case of propositional subset of FOL where only propositional symbols are allowed (i.e., predicates with arity 0), it is convenient to write I[\alpha] = 1 to say I |= \alpha or I[\alpha] = 0 to say I |/= \alpha.

\mathcal{I} is the logical model of S where S is a set of sentences iff \mathcal{I} |= S that means that all sentences in S are true in \mathcal{I}.

II.4 The Pragmatics
II.4.1 Logical Consequence
Gist: There are connections among FOL sentences that do not depend on the semantics.

Let S be the set of sentences and \alpha is a sentence.
S logically entails \alpha (i.e., \alpha is the logical consequence of S), which is written as S |= \alpha, iff for every interpretation \mathcal{I}, if \mathcal{I} |= S then \mathcal{I} |= \alpha (i.e., every model of S satisfies \alpha).

IOW, there is no interpretation \mathcal{I} where \mathcal{I} |= S U {~\alpha} (i.e., S U {~\alpha} is unsatisfiable. IOW, S ^ ~\alpha is unsatisfiable. This is called proof by contradiction).

If \alpha is the logical consequence of the empty set (i.e., |= \alpha), \alpha is valid (remember |= \alpha means that \mathcal{I} |= {} and \mathcal{I} |= \alpha).
IOW, \alpha is valid iff for every interpretation \mathcal{I}, it is the case that \mathcal{I} |= \alpha (e.g., P v ~P).
IOW, \alpha is valid iff the set {~\alpha} is unsatisfiable (i.e., unsatisfiable means that there is no interpretation \mathcal{I} that can make ~\alpha true. IOW, \alpha is true in all interpretations while ~\alpha is false is all interpretations).
Valid sentences are also known as tautology.

Therefore, if S = {\alpha_1, ..., \alpha_n}, S |= \alpha iff [(\alpha_1 ^ ... ^ \alpha_n) -> \alpha] is valid (i.e., |= [(\alpha_1 ^ ... ^ \alpha_n) -> \alpha]). This is called the deduction theorem. Well, since a -> b is logically equivalent to ~a v b, saying that ~a v b must be valid is equivalent to proving that the opposite (i.e., a ^ ~b) is unsatisfiable just like what we see above.

(!) CHECKPOINT: You understand the meaning of "entailment", "validity" and "satisfiability".

Example of using the above knowledge:
P: Formalize "Norma Jeane Baker is a daughter of Marilyn Monroe's parents" and "Norma Jeane is not a sister of Marilyn" together with the needed background knowledge on relationships as first-order sentences. Provide a semantical proof that "Norma Jeane is Marilyn" is an entailment thereof.
A:
Firstly, we formalize the given sentences:
1. Daughter (NJB, parents (M))
2. ~ Sister (NJB, M)

Secondly, we give the definitions of the predicates:
3. Vx Vy . Daughter (x, y) \equiv Female (x) ^ parents (x) = y ^ x != y
4. Vx Vy . Sister (x, y) \equiv Female (x) ^ parents (x) = parents (y) ^ x != y

Then we have S = {1, 2, 3, 4}.

Third, we give the interpretations of the functions and predicates:
I[parents] \subseteq D -> D
I[Female] \subseteq D
I[Daughter] \subseteq D x D
I[Sister] \subseteq D x D

Fourth, we give the denotations of the terms according to a particular interpretation \mathcal{I}:
Let \mathcal{I} be an interpretation such that \mathcal{I} |= S,
||NJB||_{\mathcal{I}} = I[NJB] = a,
||M||_{\mathcal{I}} = I[M] = b, and
||parents (M)||_{\mathcal{I}} = I[parents] (I[M]) = I[parents] (b) = c

Finally, use the interpretations and denotations to show that S |= (NJB = M).
IOW, show that \mathcal{I} |= S:
\mathcal{I} |= Vx Vy . Daughter (x, y) has <d, d'> \in I[Daughter] iff d \in I[Female] and
                                                               d' = I[parents] (d) and
                                                               d != d'
\mathcal{I} |= Vx Vy . Sister (x, y) has <d, d'> \in I[Sister] iff d \in I[Female] and
                                                           I[parents] (d) = I[parents] (d') and
                                                           d != d'
\mathcal{I} |= Daughter (NJB, parents (M)) has <a, c> \in I[Daughter] iff a \in I[Female] and
                                                                          c = I[parents] (a) and
                                                                          a != c
\mathcal{I} |= ~ Sister (NJB, M) has <a, b> \notin I[Sister] iff a \notin I[Female] or
                                                                 I[parents] (a) != I [parents] (b) or
                                                                 a = b

a \notin I[Female] is not possible because \mathcal{I} |= Daughter (NJB, parents (M)) requires that a \in I[Female].
I[parents] (a) != I [parents] (b) is not possible because \mathcal{I} |= Daughter (NJB, parents (M)) requires that I[parents] (b) = c = I[parents] (a).
a = b is possible.

Therefore, \mathcal{I} |= (NJB = M) meaning that S |= (NJB = M).

(!) It is very crucial to understand the use of "Let \mathcal{I} be an interpretation such that \mathcal{I} |= S" or usually "Let \mathcal{I} be an interpretation such that \mathcal{I} |= KB" where KB stands for Knowledge Base. Imagine an interpretation \mathcal{I} as defining a world containing objects as well as the grouping and mapping of the objects. Each \mathcal{I} differs in the objects contained in the defined domain (i.e., world) as well as the grouping and mapping of the objects (e.g., D={bob, sarah, ellen} where I[Sister] = {<bob, sarah>, <bob, ellen>, <bob, bob>}). Therefore, there are some \mathcal{I} that satisfies S (\mathcal{I} |= S) and there are some \mathcal{I} that does not satisfy S (\mathcal{I} |/= S).