
Inductive First Order Learner. Amrendra Singh Yadav, PhD Research Scholar, Bennett University, Gr. Noida. Email: [email protected]


Abstract: It is shown that the system compares favorably with commonly used symbolic learning methods that use heuristic rather than exhaustive methods to guide their search of the rule space. First-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this article, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions with greater predictive accuracy. Other first-order learning systems could benefit from similar specialization. This article also describes FOIL, a system that learns Horn clauses from data expressed as relations [2]. FOIL is based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism. The system has been applied successfully to several tasks from the machine learning literature.

Introduction: My main goal in writing this was simply to experiment with machine learning through inductive logic programming. One of the most expressive and readable representations for learned hypotheses is a set of production rules (if-then rules). Rules can be derived from other representations (e.g., decision trees) or they can be learned directly; here we focus on the direct method. An important property of rule-learning algorithms is that they can learn sets of first-order rules, which have much more representational power than the propositional rules that can be derived from decision trees [1]. Learning first-order rules can also be viewed as automatically inducing PROLOG programs from examples. Suppose we want a computer to learn to determine whether a molecule is "sticky" (stickiness is a property I invented) based on its atomic structure. One approach would be to create a training set of molecules represented as first-order ground facts.

Idea:
  • Iteratively build rules that cover some positive examples but no negative examples. Once a rule has been found, remove the positive examples it covers and continue.
  • To build a rule: add literals to the body until no negative examples are covered; if a literal introduces new variables, extend the example tuples with all possible constants. (The FOIL algorithm below makes this loop precise.)

Propositional versus First-Order Logic: Propositional logic does not include variables and therefore cannot express general relations between the values of attributes. Example 1: In propositional logic, you can write:
IF (Father1=Bob) ^ (Name2=Bob) ^ (Female1=True) THEN Daughter1,2=True. This rule applies only to a specific family! Example 2: In first-order logic, you can write:
IF Father(y,x) ^ Female(y) THEN Daughter(x,y). This rule (which cannot be written in propositional logic) applies to any family!

History: The FOIL algorithm (Quinlan and Cameron-Jones, 1993) takes as input a set of binary data R (separated by spaces) and produces a set of classification association rules (CARs). The resulting classifier, as generated by the LUCS-KDD FOIL algorithm described here, comprises a linked list of rules ordered according to their Laplace accuracy (Clark and Boswell, 1991).

Learning sets of first-order rules, FOIL (Quinlan, 1990): FOIL is similar to propositional rule learning, except for the following:
  • FOIL learns first-order rules and must therefore handle variables in the rule preconditions.
  • FOIL uses a special performance measure, FOIL-GAIN, which takes the different variable bindings into account.
  • FOIL seeks only rules that predict when the target literal is True.
  • FOIL performs a simple hill-climbing search rather than a beam search.
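To make the contrast concrete, here is a minimal Python sketch (invented facts and names, not taken from any of the cited systems) that applies the first-order rule above to ground facts from two different families. The reading father(c, p) = "p is the father of c" is an assumption chosen to match the propositional example.

```python
from itertools import permutations

# Invented ground facts for two families (a minimal sketch, not a real dataset).
# Assumed reading: ("father", c, p) means "p is the father of c", matching the
# propositional example, where Father1 = Bob is person 1's father.
facts = {
    ("father", "ann", "bob"),     # bob is the father of ann
    ("female", "ann"),
    ("father", "carol", "dave"),  # dave is the father of carol
    ("female", "carol"),
}

constants = {c for fact in facts for c in fact[1:]}

def daughter_conclusions():
    """All (x, y) bindings for which the body Father(y,x) ^ Female(y) holds."""
    derived = set()
    for x, y in permutations(constants, 2):
        if ("father", y, x) in facts and ("female", y) in facts:
            # Following the rule's variable pattern literally, the head is
            # Daughter(x, y): here x is bound to the father and y to the child.
            derived.add(("daughter", x, y))
    return derived

print(daughter_conclusions())
# {('daughter', 'bob', 'ann'), ('daughter', 'dave', 'carol')}: the single
# first-order rule covers both families, whereas the propositional rule
# above only matches the specific name "Bob".
```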
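The FOIL-GAIN measure mentioned above is commonly stated as Gain(L, R) = t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0 and n0 are the positive and negative bindings covered by rule R before literal L is added, p1 and n1 are the corresponding counts after L is added, and t is the number of positive bindings of R that remain covered. The helper below is a small sketch of that formula with invented numbers; it is not taken from the LUCS-KDD implementation.

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """FOIL gain for adding literal L to rule R.

    p0, n0 : positive / negative bindings covered by R before adding L
    p1, n1 : positive / negative bindings covered by R after adding L
    t      : positive bindings of R that are still covered after adding L
    """
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Invented numbers: a literal that keeps 6 of 8 positive bindings and cuts
# the negatives from 10 to 1 gets a clearly positive gain (about 5.69).
print(foil_gain(p0=8, n0=10, p1=6, n1=1, t=6))
```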

Inductive logic programming (ILP) is a growing subfield of machine learning that studies the induction of Prolog programs from examples in the presence of background knowledge (Muggleton, 1992; Lavrac and Dzeroski, 1994). Because of the expressiveness of first-order logic, ILP methods can learn relational and recursive concepts that cannot be represented in the attribute/value representations assumed by most machine learning algorithms [3]. ILP methods have succeeded in inducing small sorting and list-manipulation programs (Shapiro, 1983; Sammut and Banerji, 1986; Muggleton and Buntine, 1988; Quinlan and Cameron-Jones, 1993) and have produced encouraging results on significant applications such as predicting protein secondary structure (Muggleton, King and Sternberg, 1992) and automating the construction of natural-language parsers (Zelle and Mooney, 1994b). However, current ILP techniques make important assumptions that limit their applicability. Three common assumptions are: 1. Background knowledge is provided extensionally, as a set of ground literals. 2. Explicit negative examples of the target predicate are available. 3. The target program is expressed in "pure" Prolog, where the order of the clauses is irrelevant and procedural operators such as cut (!) are forbidden. The currently best-known and most successful ILP systems [6], Golem (Muggleton and Feng, 1990) and FOIL (Quinlan, 1990), both make these three assumptions, but each of these assumptions imposes important limitations, since:

  1. An adequate extensional representation of background knowledge is often infinite or intractably large.
  2. Explicit negative examples are often unavailable, and an adequate set of negative examples computed under a closed-world assumption is infinite or intractably large.
  3. The concise representation of many concepts requires the use of clause ordering and/or cuts (Bergadano, Gunetti and Trinchero, 1993).

Example: CIGOL (Muggleton and Buntine, 1988), a system that constructs clauses by inverse resolution.

This idea can also be applied to first-order resolution, giving first-order multi-step inverse resolution. Example:

  1. Friends & Smokers

Algorithm: The FOIL algorithm is as follows (a minimal Python sketch of this loop is given at the end of this page):

    Input:  a list of examples
    Output: a rule in first-order predicate logic

    FOIL(Examples)
        Let Pos be the positive examples
        Let Pred be the predicate to be learned
        Until Pos is empty do:
            Let Neg be the negative examples
            Set Body to empty
            Call LearnClauseBody
            Add Pred ← Body to the rule
            Remove from Pos all examples which satisfy Body

    Procedure LearnClauseBody
        Until Neg is empty do:
            Choose a literal L
            Conjoin L to Body
            Remove from Neg the examples that do not satisfy L

Applications: medical diagnosis, VAX computer configuration, hydrocarbon separation system configuration, and configuration of fire-protection equipment in buildings.

Discussion: We consider the relevance of a first-order learner to the past-tense problem, and the difficulties previous ILP systems have had with functional tasks whose best representation is rules with exceptions. Our results clearly demonstrate that an ILP system outperforms both the decision-tree and neural-network systems previously applied to the past-tense task [4]. This is important because there have been very few results showing that a first-order learner performs much better than propositional learners applied to the best feature-based encoding of a problem. The LUCS-KDD implementation of FOIL described here was used by the LUCS-KDD research group to contrast and compare a variety of CARM (classification association rule mining) algorithms and techniques. The software is available for free, but the author would appreciate appropriate acknowledgement [5]. The following reference format for referring to the software is suggested:

Resources:

Clark, P. and Boswell, R. (1991). Rule Induction with CN2: Some Recent Improvements. Proc. European Working Session on Learning (EWSL'91), Porto, Portugal, pp. 151-163.

Quinlan, J. R. and Cameron-Jones, R. M. (1993). FOIL: A Midterm Report. Proc. ECML, Vienna, Austria, pp. 3-20.

Yin, X. and Han, J. (2003). CPAR: Classification based on Predictive Association Rules. Proc. SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, pp. 331-335.

http://www.cxc.liv.ac.uk/~frans/KDD/Software/FOIL_PRM_CPAR/foil.html, Department of Computer Science, The University of Liverpool, UK.

Let Var be the largest number of distinct variables in any clause of rule R, excluding the last conjunct. Let MaxP be the number of predicates with the largest arity, MaxA. Then an approximation of the number of nodes generated to learn R is NodesSearched ≤ 2 * MaxP * (Var + MaxA - 1)^MaxA, as shown in Pazzani and Kibler (1992). For instance, with MaxP = 5 predicates of maximum arity MaxA = 2 and Var = 3 variables, the bound is 2 * 5 * (3 + 2 - 1)^2 = 160 nodes.

Pazzani, M. and Kibler, D. (1992). The Utility of Knowledge in Inductive Learning. Machine Learning, 9(1).
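As a concrete illustration of the covering loop in the Algorithm section above, the following is a minimal Python sketch. It is illustrative only and makes several simplifying assumptions: examples are plain attribute dictionaries rather than variable bindings, candidate literals are attribute=value tests, and literals are scored by a crude coverage measure instead of the full binding-based FOIL-GAIN.

```python
# Minimal sketch of the FOIL covering loop from the Algorithm section above.
# Simplifying assumptions (mine, not part of the original description):
# examples are attribute dictionaries, candidate literals are attribute=value
# tests, and literals are scored by a simple coverage measure.

def covers(body, example):
    """A body (list of attribute=value tests) covers an example if every test holds."""
    return all(example.get(attr) == val for attr, val in body)

def learn_clause_body(pos, neg, candidate_literals):
    """Inner loop: add literals until no negative example is covered."""
    body = []
    while neg:
        def score(lit):
            p = sum(covers(body + [lit], e) for e in pos)
            n = sum(covers(body + [lit], e) for e in neg)
            return p - n  # crude stand-in for FOIL-GAIN
        literal = max(candidate_literals, key=score)
        body.append(literal)
        new_neg = [e for e in neg if covers(body, e)]
        if len(new_neg) == len(neg):  # no progress: stop (real FOIL uses gain/backtracking)
            body.pop()
            break
        neg = new_neg
        pos = [e for e in pos if covers(body, e)]
    return body

def foil(pos, neg, candidate_literals):
    """Outer loop: learn clause bodies until every positive example is covered."""
    rules = []
    while pos:
        body = learn_clause_body(list(pos), list(neg), candidate_literals)
        remaining = [e for e in pos if not covers(body, e)]
        if not body or len(remaining) == len(pos):  # rule covers nothing new: give up
            break
        rules.append(body)
        pos = remaining
    return rules

# Toy usage with invented data: learn when the target "sticky" is True.
pos = [{"bonds": "double", "ring": True}, {"bonds": "double", "ring": False}]
neg = [{"bonds": "single", "ring": True}]
literals = [("bonds", "double"), ("bonds", "single"), ("ring", True), ("ring", False)]
print(foil(pos, neg, literals))  # [[('bonds', 'double')]]
```

On real relational data the inner loop would enumerate literals over predicates and variables and score them with FOIL-GAIN, but the control flow is the same as in the pseudocode above.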