Extracting Rules from Neural Networks with Partial Interpretations


  • Cosimo Persia University of Bergen
  • Ana Ozaki University of Bergen




Keywords: Exact Learning, Explainable AI, Horn logic


We investigate the problem of extracting rules, expressed in Horn logic, from neural network models.
Our work is based on the exact learning model, in which a learner interacts with a teacher (here, the neural network model) via queries in order to learn an abstract target concept, which in our case is a set of Horn rules. We formulate queries using partial interpretations, which can be understood as representations of a world in which the truth values of some propositions are unknown. We employ Angluin’s algorithm for learning Horn rules via queries and evaluate our strategy empirically.
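As a rough illustration of the query interface described above (not the authors' implementation), the exact-learning loop can be sketched as follows. The teacher is any black-box Boolean classifier; here a hand-coded stand-in plays the role of a trained network, and an equivalence query is approximated by sampling, a common workaround when the teacher cannot be inspected directly. For simplicity the sketch uses total interpretations rather than the partial interpretations of the paper, and all names (`blackbox_model`, `membership_query`, `equivalence_query`) are illustrative assumptions.

```python
import random

# Variables of the propositional signature; a frozenset of true variables
# encodes a (total) interpretation. All names here are illustrative.
VARS = ("a", "b", "c")

def blackbox_model(interp):
    """Stand-in for a trained neural network: accepts exactly the models
    of the Horn rule a -> b (rejects interpretations with a but not b)."""
    return not ("a" in interp and "b" not in interp)

def satisfies(interp, rules):
    """Check a hypothesis given as Horn rules (antecedent frozenset, consequent var)."""
    return all(not ante <= interp or cons in interp for ante, cons in rules)

def membership_query(teacher, interp):
    """Membership query: does the teacher accept this interpretation?"""
    return teacher(frozenset(interp))

def equivalence_query(teacher, hypothesis, samples=1000, seed=0):
    """Approximate equivalence query: sample random interpretations and
    return (True, None) or (False, counterexample) on the first mismatch."""
    rng = random.Random(seed)
    for _ in range(samples):
        interp = frozenset(v for v in VARS if rng.random() < 0.5)
        if teacher(interp) != satisfies(interp, hypothesis):
            return False, interp
    return True, None
```

Starting from the empty hypothesis (which accepts everything), an equivalence query returns a counterexample such as an interpretation containing `a` but not `b`; once the hypothesis contains the rule `a -> b`, the sampled equivalence check passes.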


V. Alekseev. On the number of intersection semilattices [in Russian]. Diskretnaya Mat., 1:129–136, 1989.

D. Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1988. ISSN 0885-6125. doi: 10.1023/A:1022821128753.

Q. V. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In ICML. icml.cc / Omnipress, 2012. doi: 10.48550/arXiv.1112.6209.

A. Meurer et al. SymPy: symbolic computing in Python. PeerJ Comput. Sci., 3:e103, 2017. doi: 10.7717/peerj-cs.103.

J. Montiel et al. Scikit-multiflow: A multi-output streaming framework. Journal of Machine Learning Research, 19(72):1–5, 2018. doi: 10.48550/arXiv.1807.04662.

G. Burosch, J. Demetrovics, G. Katona, D. Kleitman, and A. Sapozhenko. On the number of closure operations. Pages 91–105. János Bolyai Mathematical Society, Budapest, 1993.

M. Campbell, A. J. Hoane Jr., and F. Hsu. Deep Blue. Artif. Intell., 134(1-2):57–83, 2002. doi: 10.1016/S0004-3702(01)00129-1.

F. Chollet et al. Keras, 2015. URL https://github.com/fchollet/keras.

L. De Raedt. Logical settings for concept-learning. Artificial Intelligence, 95(1):187–201, 1997. ISSN 0004-3702. doi: 10.1016/S0004-3702(97)00041-6.

P. M. Domingos and G. Hulten. Mining high-speed data streams. In KDD, pages 71–80. ACM, 2000. doi: 10.1145/347090.347107.

D. Ferrucci. Introduction to “This is Watson”. IBM Journal of Research and Development, 56:1:1–1:15, May 2012. doi: 10.1147/JRD.2012.2184356.

M. Frazier and L. Pitt. Learning from entailment: An application to propositional Horn sentences. In ICML, 1993. doi: 10.1007/3-540-49730-7_11.

S. Hölldobler, S. Möhle, and A. Tigunova. Lessons learned from AlphaGo. In YSIP2, pages 92–101. CEUR-WS.org, 2017.

A. Horn. On sentences which are true of direct unions of algebras. The Journal of Symbolic Logic, 16(1):14–21, 1951. ISSN 00224812. doi: 10.2307/2268661.

T. Okudono et al. Weighted automata extraction from recurrent neural networks via regression on state spaces. In AAAI, pages 5306–5314. AAAI Press, 2020. doi: 10.1609/aaai.v34i04.5977.

M. S. Santos et al. A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. Journal of Biomedical Informatics, 58:49–59, 2015. ISSN 1532-0464. doi: 10.1016/j.jbi.2015.09.012.

A. Shih, A. Darwiche, and A. Choi. Verifying binarized neural networks by Angluin-style learning. In SAT, 2019. doi: 10.1007/978-3-030-24258-9_25.

L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984. ISSN 0001-0782. doi: 10.1145/1968.1972.

V. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. 1971. doi: 10.1007/978-3-319-21852-6_3.

G. Weiss, Y. Goldberg, and E. Yahav. Extracting automata from recurrent neural networks using queries and counterexamples. In ICML, volume 80, pages 5244–5253. PMLR, 2018. doi: 10.48550/arXiv.1711.09576.

Y. Zhang, P. Tino, A. Leonardis, and K. Tang. A survey on neural network interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence, 5(5):726–742, 2021. doi: 10.1109/TETCI.2021.3100641.