[Machine Learning] Reading the Outlier Detection Literature: Overviews and Surveys
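On the recommendation of my research-training advisor, I chose to set out on this unfamiliar road: NLP, or more specifically, outlier detection (OD) in documents.
A few days ago I watched the full recording of the award ceremony for Alibaba DAMO Academy's Qingcheng Award (青橙奖), and it seems, maybe, probably, to have sparked a tiny bit of interest in research in me. I also watched a video on how undergraduate, master's, and doctoral study differ. It pictures all of humanity's known knowledge as lying inside a circle of fixed radius: an undergraduate explores one direction inside the circle, a master's student can reach the boundary in that direction, and a doctoral student works to push the boundary in that direction outward into a small bump.
If one did not have to scramble to make a living, then spending ten years, or a lifetime, on solving one problem, exploring an unknown field, and widening the boundary of human knowledge would surely be something worth doing.
Thanks to my advisor for the guidance and for carefully compiling a list of papers I need to study. This series of posts records what I learn, think, and reflect on while reading the OD literature.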
1. A Survey of Outlier Detection Methodologies.
1.1 Article
Paper Title | Venue | Year | Author | Materials |
---|---|---|---|---|
A survey of outlier detection methodologies | ARTIF INTELL REV | 2004 | Victoria J. Hodge, Jim Austin | [PDF] |
1.2 Aim
- In this paper, we introduce a survey of contemporary techniques for outlier detection.
- We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
1.3 Conclusion
Definition of OD:
- An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.
- An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.
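Three fundamental approaches to the problem of outlier detection:
Determine the outliers with no prior knowledge of the data. (unsupervised clustering)
We do not know in advance which points in the data set are abnormal and which are normal. A series of algorithms is needed either to find the boundary between abnormal and normal data (accommodation) or to repeatedly remove the point that deviates most from the rest of the data set (diagnosis), and in this way separate abnormal data from normal data.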
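To make the diagnosis idea concrete, here is a minimal sketch in plain Python. It assumes one-dimensional numeric data and uses a z-score style deviation measure; the function name `prune_outliers` and the default `threshold` of 3 standard deviations are my own illustrative choices, not something taken from the survey.

```python
import statistics

def prune_outliers(values, threshold=3.0):
    """Repeatedly remove the point that deviates most from the others.

    `threshold` is a hypothetical cut-off, measured in standard
    deviations of the remaining points.
    """
    inliers = list(values)
    outliers = []
    # stdev of "the rest" needs at least two points, so keep three or more.
    while len(inliers) > 2:
        best_idx, best_score = None, 0.0
        for i, v in enumerate(inliers):
            rest = inliers[:i] + inliers[i + 1:]
            mu = statistics.mean(rest)
            sigma = statistics.stdev(rest)
            if sigma == 0:
                # All other points are identical: any different value is extreme.
                score = float("inf") if v != mu else 0.0
            else:
                score = abs(v - mu) / sigma
            if score > best_score:
                best_idx, best_score = i, score
        # Stop once even the most deviating point is within the threshold.
        if best_idx is None or best_score <= threshold:
            break
        outliers.append(inliers.pop(best_idx))
    return inliers, outliers

data = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 25.0]
inliers, outliers = prune_outliers(data)
print("inliers:", inliers)
print("outliers:", outliers)
```

Each point is scored against the statistics of the other points rather than the whole set, so that a single extreme value does not inflate the standard deviation and hide itself.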
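Model both normality and abnormality. (supervised classification)
The data have already been labelled as abnormal or normal in advance. When a new point arrives, it is classified as normal if it lies closer to the normal values, and as abnormal otherwise.
However, if a new point differs substantially from both the known normal and the known abnormal values (for example, an error that has never occurred before), the classification breaks down.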
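The supervised setting and its weakness can be illustrated with a tiny nearest-neighbour classifier. This is only a sketch under my own assumptions: the toy 2-D points, the labels, and the helper `nearest_label` are hypothetical, and 1-nearest-neighbour is just one of many possible classifiers.

```python
import math

def nearest_label(point, labelled):
    """1-nearest-neighbour: return the label of the closest training example."""
    closest = min(labelled, key=lambda item: math.dist(point, item[0]))
    return closest[1]

# Toy training set in which both classes are labelled in advance.
train = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.9), "normal"),
    ((0.9, 1.1), "normal"),
    ((5.0, 5.0), "abnormal"),
    ((5.2, 4.9), "abnormal"),
]

print(nearest_label((1.1, 1.0), train))  # lies inside the normal cluster
print(nearest_label((4.8, 5.1), train))  # lies inside the abnormal cluster

# A kind of point never seen before, far from both clusters,
# is still forced into one of the two known classes.
novel = (-2.0, -3.0)
print(nearest_label(novel, train))
```

The last call returns "normal" simply because the normal cluster happens to be nearer, even though the point resembles neither class. That is the failure mode described above.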
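Model only normality or in a very few cases model abnormality. (semi-supervised recognition or detection)
Given that all the normal values are known but the abnormal ones are not, and that abnormal examples are scarce and valuable, decide whether a new point is abnormal or normal. (This matches many real-world situations.)
The goal is to define a boundary around the normal data.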
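As a rough sketch of "defining a boundary around the normal data", the snippet below fits a model on normal examples only and flags anything outside that boundary. It assumes one-dimensional data and a boundary of `k` standard deviations around the mean; the class name `NormalityModel` and the value `k=3.0` are hypothetical choices for illustration, not the survey's method.

```python
import statistics

class NormalityModel:
    """Model built from normal examples only; no abnormal data is required."""

    def __init__(self, normal_values, k=3.0):
        self.mu = statistics.mean(normal_values)
        self.sigma = statistics.stdev(normal_values)
        self.k = k  # assumed boundary width, in standard deviations

    def is_outlier(self, x):
        # Anything outside mu +/- k * sigma falls outside the learned boundary.
        return abs(x - self.mu) > self.k * self.sigma

normal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
model = NormalityModel(normal)

for x in (10.3, 12.0, 7.0):
    print(x, "outlier" if model.is_outlier(x) else "normal")
```

Unlike the supervised case, a completely new kind of error is still caught here, because the only question asked is whether a point falls within the region occupied by normal data.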
1.4 Background
1.5 Key result
1.6 Methods
2. Anomalous Instance Detection in Deep Learning: A Survey
2.1 Article
Paper Title | Venue | Year | Author | Materials |
---|---|---|---|---|
Anomalous Instance Detection in Deep Learning: A Survey | Preprint | 2020 | Saikiran Bulusu, Bhavya Kailkhura, Bo Li, Pramod K. Varshney, Dawn Song | [PDF] |
2.2 Aim
- This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL based applications.
- Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to different categories in which research has been done on this topic.