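On the recommendation of my research-training supervisor, I have chosen to set out on an unfamiliar road: NLP, or more specifically, outlier detection (OD) in documents.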

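A few days ago I watched the full recording of the award ceremony for Alibaba DAMO Academy's 青橙奖 (Young Fellow Award), and it seemingly, possibly, maybe sparked a tiny bit of interest in research. I also watched a video about how undergraduate, master's, and doctoral study differ. It imagines all known human knowledge as lying inside a circle of fixed radius: an undergraduate explores one direction inside the circle, a master's student gets to reach the boundary in that direction, and a PhD student works to push the boundary in that direction outward into a small bump.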

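If I did not have to scramble to make a living, would it not be worthwhile to spend ten years, or a whole life, solving one problem, exploring one unknown field, and pushing out the boundary of human knowledge?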

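Thanks to my supervisor for the guidance and for carefully putting together a list of papers I need to study. This series of posts records what I learn, think, and reflect on while reading the outlier detection (OD) literature.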

1. A Survey of Outlier Detection Methodologies.

1.1 Article

Paper Title | Venue | Year | Author | Materials
A survey of outlier detection methodologies | ARTIF INTELL REV | 2004 | Victoria J. Hodge, Jim Austin | [PDF]

1.2 Aim

  • In this paper, we introduce a survey of contemporary techniques for outlier detection.
  • We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

1.3 Conclusion

  • Definition of OD:

    • An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.
    • An observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data.
  • three fundamental approaches to the problem of outlier detection:

    1. Determine the outliers with no prior knowledge of the data (unsupervised clustering).

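      We do not know in advance which points in the dataset are abnormal and which are normal. A series of algorithms is needed to either find the boundary between abnormal and normal data (accommodation) or repeatedly remove the point that deviates most from the rest of the dataset (diagnosis), and so separate abnormal data from normal data.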

    2. Model both normality and abnormality (supervised classification).

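      The data has already been labeled as normal or abnormal. When a new data point arrives, it is classified as normal if it is closer to the normal values and as abnormal otherwise.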

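      However, if the new data point differs greatly from both the known normal and the known abnormal values (for example, an error of a kind never seen before), the classification becomes unreliable.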

    3. Model only normality or, in a very few cases, model abnormality (semi-supervised recognition or detection).

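      The normal values are known but the abnormal ones are not, and examples of abnormal behavior are scarce and valuable; the task is to decide whether a new data point is abnormal or normal. (This matches many real-world situations.)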

      The goal is to define a boundary around the normal data.
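
To make the three approaches above concrete for myself, here is a minimal sketch on one-dimensional data. It is not the survey's method: the function names, the z-score rule, the thresholds, and the nearest-mean classifier are all my own illustrative assumptions, and it only uses Python's standard `statistics` module.

```python
import statistics


def diagnose_outliers(values, threshold=2.5):
    """Type 1 (unsupervised, 'diagnosis' style, illustrative only).

    Repeatedly remove the point farthest from the mean while its
    z-score is above `threshold`. No labels are used.
    """
    remaining = list(values)
    outliers = []
    while len(remaining) > 2:
        mean = statistics.mean(remaining)
        stdev = statistics.stdev(remaining)
        if stdev == 0:
            break
        worst = max(remaining, key=lambda v: abs(v - mean))
        if abs(worst - mean) / stdev <= threshold:
            break
        remaining.remove(worst)
        outliers.append(worst)
    return outliers, remaining


def nearest_mean_label(x, normal, abnormal):
    """Type 2 (supervised, illustrative only).

    Both classes are labeled in advance; a new point gets the label
    of the class whose mean is closer.
    """
    d_normal = abs(x - statistics.mean(normal))
    d_abnormal = abs(x - statistics.mean(abnormal))
    return "normal" if d_normal <= d_abnormal else "abnormal"


def fit_normal_boundary(normal, k=3.0):
    """Type 3 (semi-supervised, illustrative only).

    Learn a boundary from normal data alone: mean +/- k standard deviations.
    """
    mean = statistics.mean(normal)
    stdev = statistics.stdev(normal)
    return mean - k * stdev, mean + k * stdev


def is_novel(x, boundary):
    """Flag a point as abnormal if it falls outside the learned boundary."""
    low, high = boundary
    return not (low <= x <= high)


if __name__ == "__main__":
    data = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 25.0]
    outliers, normal = diagnose_outliers(data)
    print("type 1 outliers:", outliers)

    abnormal = [24.0, 25.0, 26.0]  # hypothetical labeled abnormal samples
    # -40.0 is unlike both classes, yet type 2 must pick one of them.
    print("type 2 label for -40.0:", nearest_mean_label(-40.0, normal, abnormal))

    boundary = fit_normal_boundary(normal)
    print("type 3 flags -40.0:", is_novel(-40.0, boundary))
```

Two things stood out when I tried this. First, the supervised version has to assign an unseen kind of error to one of its two known classes, which is the weakness noted under approach 2, while the boundary learned from normal data alone still flags it. Second, the default threshold of 2.5 is deliberate: with a sample standard deviation, a z-score in a sample of n points cannot exceed (n - 1) / sqrt(n), about 2.67 for nine points, so the usual cutoff of 3 would never fire on data this small.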

1.4 Background

1.5 Key result

1.6 Methods

2. Anomalous Instance Detection in Deep Learning: A Survey

2.1 Article

Paper Title | Venue | Year | Author | Materials
Anomalous Instance Detection in Deep Learning: A Survey | Preprint | 2020 | Saikiran Bulusu, Bhavya Kailkhura, Bo Li, Pramod K. Varshney, Dawn Song | [PDF]

2.2 Aim

  • This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL based applications.
  • Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to different categories in which research has been done on this topic.

2.3 Content

2.4 Background

2.5 Key result

2.6 Methods