Automatic Entity Extraction and Link Analysis for Adverse Media Scanning

Abstract

Banks are required by law to report accounts that may be linked to illegal activities. This process includes checking published lists of suspected and confirmed persons and institutions, as well as scanning media outlets for related articles and checking the results against the bank's account databases. The media scanning portion is currently performed almost entirely by hand, at best by using Google Alerts and then reading every returned article for possible offending entities. We are creating a system to automate much of this work. We employ Natural Language Processing, Machine Learning, and other Data Mining methods to automatically extract names and other identifying information from the returned articles. We then search for further articles on these individuals, and for information related to the original article, to find both further identifying information and additional entities related to the criminal activities. This process is repeated to gather the available information on a given topic. Finally, the accumulated information is presented to the user in a graphical environment for analysis, allowing the user to take the necessary actions to flag current accounts and watch for new accounts related to these entities. The system is intended to greatly reduce manual effort while increasing the number of entities found, improving compliance with regulations and reducing the banks' potential liabilities. The topic presents interesting research challenges, primarily related to identifying unique entities and their relevant information, as well as the links between entities.
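
As an illustration only, the extract-search-repeat loop described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the system's implementation: the in-memory ARTICLES corpus, the regular-expression name matcher (a crude stand-in for the NLP and Machine Learning extraction), the co-occurrence rule for linking entities, and the names extract_entities, search_articles, and build_link_graph are all hypothetical.

```python
import re
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy corpus standing in for a media search service.
ARTICLES = {
    "a1": "John Smith was charged with fraud alongside Maria Lopez.",
    "a2": "Maria Lopez transferred funds to Peter Novak, prosecutors said.",
    "a3": "Peter Novak denied any wrongdoing.",
    "a4": "Local bakery wins award, owner Anna Berg says.",
}

# Assumption: a name is two adjacent capitalized words. A real system
# would use a trained named-entity recognizer instead of this pattern.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def extract_entities(text):
    """Return the set of candidate person names found in the text."""
    return set(NAME_PATTERN.findall(text))


def search_articles(query):
    """Return ids of articles whose text contains the query string."""
    return [aid for aid, text in ARTICLES.items() if query in text]


def build_link_graph(seed_query, max_rounds=3):
    """Expand from a seed query, linking entities that share an article."""
    graph = defaultdict(set)
    seen_articles = set()
    seen_queries = {seed_query}
    frontier = deque([(seed_query, 0)])
    while frontier:
        query, depth = frontier.popleft()
        for aid in search_articles(query):
            if aid in seen_articles:
                continue  # each article is processed only once
            seen_articles.add(aid)
            entities = extract_entities(ARTICLES[aid])
            # Link every pair of entities mentioned in the same article.
            for a, b in combinations(sorted(entities), 2):
                graph[a].add(b)
                graph[b].add(a)
            for entity in entities:
                graph[entity]  # make sure isolated entities become nodes
                # Newly found entities become queries for the next round.
                if entity not in seen_queries and depth + 1 <= max_rounds:
                    seen_queries.add(entity)
                    frontier.append((entity, depth + 1))
    return dict(graph)


if __name__ == "__main__":
    for entity, linked in sorted(build_link_graph("fraud").items()):
        print(entity, "->", sorted(linked))
```

Starting from the seed query "fraud", the sketch reaches Peter Novak only through Maria Lopez, and never visits the unrelated bakery article. The max_rounds cap is one simple way to bound the repetition. The harder problems named in the abstract, such as deciding whether two mentions refer to the same unique entity, are deliberately left out.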