Document Classification

Many business use cases rely on document classification: spam detection, sentiment detection, classifying emails received by customer support, detecting legal and regulatory anomalies in text, and so on. Although automated document and text classification reduces the amount of manual work, it still leaves an error margin that requires human validation.

What we want to achieve is transparent machine reasoning that explains the results of classification. We want the machine to explain its choices to us. The Hierarchical Attention Network makes this possible. We can capture attention at the word level or the sentence level. Once the machine classifies a document, it gives an attention weight per sentence, and the sentences with the highest weights are “responsible” for the choice. As a by-product of this approach, we get a summarization of the document for free.
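To make the idea of a per-sentence attention weight concrete, here is a minimal NumPy sketch of sentence-level attention pooling in the style usually described for HAN: each sentence vector is projected through a tanh layer, compared with a context vector, and the resulting scores are normalized with a softmax. The function name `attention_pool`, the dimensions, and the random weights are assumptions made for illustration. In a real model the weights are learned, and this is not the classifier used in this article.

```python
import numpy as np

def attention_pool(hidden, W, b, context):
    """Return (attention weights, document vector) for a set of sentence vectors.

    hidden:  (n_sentences, d) sentence representations, e.g. encoder outputs
    W, b:    projection to the attention space, shapes (d, a) and (a,)
    context: (a,) context vector that scores how informative a sentence is
    """
    u = np.tanh(hidden @ W + b)        # (n_sentences, a) hidden representation
    scores = u @ context               # (n_sentences,) unnormalized scores
    scores = scores - scores.max()     # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over sentences
    doc_vector = weights @ hidden      # (d,) weighted sum of sentence vectors
    return weights, doc_vector

# Hypothetical sizes and random parameters, only to exercise the function.
rng = np.random.default_rng(0)
n_sentences, d, a = 6, 8, 4
hidden = rng.normal(size=(n_sentences, d))
W = rng.normal(size=(d, a))
b = np.zeros(a)
context = rng.normal(size=a)

weights, doc_vector = attention_pool(hidden, W, b, context)
print(weights)        # one weight per sentence, summing to 1
print(weights.sum())
```

The weights are the quantity that can be inspected after classification: a sentence with a large weight contributed more to the document vector that the classifier sees. The same pooling can be applied one level down, over the words of a sentence.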

The Model of Hierarchical Attention Networks (HAN)


We have built a classifier that predicts the genre of a book summary or movie plot. As a training set we used a 25 MB corpus containing 8000 movie plots and book summaries extracted from Wikipedia. Here are some examples of movie plots that were not included in the training set:

Jurassic Park:

Attention score per sentence:

[[2.6538652e-01 8.7205047e-04 3.2566871e-02 1.4174080e-02 1.3849878e-01 8.5900359e-02 8.9636356e-02 5.3633921e-02 3.2533205e-03 2.3259513e-02 1.1045787e-02 1.5430967e-02 5.1559955e-02 2.0149840e-02 1.2313045e-02 4.1443533e-03 6.0154960e-02 1.0449971e-02 3.3974717e-04 2.7475390e-04 6.9490466e-03 1.0953726e-03 3.4041560e-04 8.2153376e-05 7.6774456e-02 1.1693974e-02 8.0809314e-03 1.0653714e-03 8.7114697e-04 1.8717493e-06]]

Top 5 sentences sorted by score, highest first: [ 0 4 6 5 24]

Those sentences are highlighted in the picture above; the darker the color, the higher the attention score of the sentence.

The overall predictions (category/score):

predictions ['125', '115', '159'] [0.381755, 0.12484848, 0.1170649]

So class 125 leads with a score of 0.381755.

Class explanation:

125: Apocalyptic and post-apocalyptic fiction, Dystopia, Utopian fiction
115: Humour, comedy
159: Science Fiction, Time travel, Human extinction, Science Fantasy
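The ranking of sentences and classes for this example can be reproduced from the numbers printed above. The sketch below copies the attention scores and prediction scores from this article; the variable names and the `class_names` dictionary are assumptions for illustration, not the original pipeline.

```python
import numpy as np

# Attention scores per sentence, copied from the Jurassic Park output above.
attention = np.array([[
    2.6538652e-01, 8.7205047e-04, 3.2566871e-02, 1.4174080e-02, 1.3849878e-01,
    8.5900359e-02, 8.9636356e-02, 5.3633921e-02, 3.2533205e-03, 2.3259513e-02,
    1.1045787e-02, 1.5430967e-02, 5.1559955e-02, 2.0149840e-02, 1.2313045e-02,
    4.1443533e-03, 6.0154960e-02, 1.0449971e-02, 3.3974717e-04, 2.7475390e-04,
    6.9490466e-03, 1.0953726e-03, 3.4041560e-04, 8.2153376e-05, 7.6774456e-02,
    1.1693974e-02, 8.0809314e-03, 1.0653714e-03, 8.7114697e-04, 1.8717493e-06,
]])

# Sort sentence indices by score, highest first, and keep the first five.
top5 = np.argsort(attention[0])[::-1][:5].tolist()
print(top5)  # [0, 4, 6, 5, 24]

# Predicted class ids and scores, copied from the output above.
class_ids = ['125', '115', '159']
scores = [0.381755, 0.12484848, 0.1170649]

# Hypothetical lookup table built from the class explanation above.
class_names = {
    '125': 'Apocalyptic and post-apocalyptic fiction, Dystopia, Utopian fiction',
    '115': 'Humour, comedy',
    '159': 'Science Fiction, Time travel, Human extinction, Science Fantasy',
}

# Pick the (class id, score) pair with the highest score.
best = max(zip(class_ids, scores), key=lambda pair: pair[1])
print(best[0], best[1], class_names[best[0]])
```

The indices in `top5` are zero-based positions of sentences in the plot text, so index 0 is the first sentence.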

One more example, for Pirates of the Caribbean:

The overall predictions (category/score):

['35', '85', '16'] [0.79274225, 0.06110028, 0.031343408]

Class explanation:

35: Adventure, Lost World, Naval Adventure
85: Children's literature, Youth
16: Mystery