In this project we test the capabilities of rule-based and statistical NLP in a modern business context.
In the early days, many language-processing systems were designed by hand-coding a set of rules such as by writing grammars or devising heuristic rules for stemming.
Since the so-called "statistical revolution" in the late 1980s and mid-1990s, much natural language processing research has relied heavily on machine learning. The machine-learning paradigm calls instead for using statistical inference to automatically learn such rules through the analysis of large corpora (the plural form of corpus, is a set of documents, possibly with human or computer annotations) of typical real-world examples.
Many different classes of machine-learning algorithms have been applied to natural-language-processing tasks. These algorithms take as input a large set of "features" that are generated from the input data. Some of the earliest-used algorithms, such as decision trees, produced systems of hard if-then rules similar to the systems of handwritten rules that were then common. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to each input feature. Such models have the advantage that they can express the relative certainty of many different possible answers rather than only one, producing more reliable results when such a model is included as a component of a larger system.
Systems based on machine-learning algorithms have many advantages over hand-produced rules:
The programming language used in this project is Python. Knowledge of other programming languages is a plus.
Source: https://en.wikipedia.org/wiki/Natural_language_processing#Rule-based_vs....