Introduction to BERT

Shin-ichi Asakawa (Tokyo Woman's Christian University) asakawa@ieee.org

14/Jun/2020

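Summary of attention in psychology

Dichotomies

Bottom-up and top-down
What and where (ventral and dorsal pathways)
Attention directed to features, objects, and locations
Exogenous and endogenous attention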

Cognitive psychology

Filter theory (Broadbent 1958), attenuation theory (Treisman 1969)
Feature integration theory (Treisman and Gelade 1980; Treisman 1988)
Guided Search 2.0 (Wolfe 1994)
Target/distractor similarity (Duncan and Humphreys 1989, 1992)
Searchlight (spotlight) hypothesis (Crick 1984), zoom lens model (Eriksen and St. James 1986)
Winner-take-all circuit (Koch and Ullman 1985) = softmax
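
The link between winner-take-all and softmax in the last item can be illustrated with a minimal sketch. The `temperature` parameter and the saliency values are assumptions made for illustration and are not part of the Koch and Ullman circuit. The sketch only shows that softmax output approaches a one-hot, winner-take-all choice as the temperature shrinks.

```python
import math

def softmax(scores, temperature=1.0):
    """Return softmax probabilities of `scores` at the given temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

saliency = [1.0, 2.0, 3.0]        # hypothetical saliency values
soft = softmax(saliency, 1.0)     # graded competition between locations
hard = softmax(saliency, 0.01)    # low temperature: nearly winner-take-all
winner = max(range(len(saliency)), key=lambda i: hard[i])
```

At temperature 1 every location keeps some probability mass, while at a low temperature almost all mass goes to the most salient location.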

Computational models (implementation)

(Milanese et al. 1994)
(Itti, Koch, and Niebur 1998)
(Borji and Itti 2013) SOTA

Review articles

(Itti and Koch 2001)
(Knudsen 2007)
(Petersen and Posner 2012)
(Kimura, Yonetani, and Hirayama 2013)
(Itti and Borji 2015) Oxford Handbook of attention

Deep learning

Machine translation (Bahdanau, Cho, and Bengio 2015; Luong et al. 2015)
Image captioning (Vinyals et al. 2015)
Attention (Wang and Shen 2018)

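Revisiting the old to learn the new

Split brain in callosotomy patients (Sperry 1961)
Hemispatial neglect (Heilman and Valenstein 1979)
Impaired disengagement of attention in patients with parietal lobe damage (Posner 1980)
Dichotic listening experiments, cocktail party effect (Broadbent 1958; Treisman 1964)
Feature integration theory (Treisman and Gelade 1980; Treisman 1988)
Computational model: searchlight (spotlight) hypothesis (Crick 1984)
Public release of models and datasets, competitions (Itti and Koch 2001; Itti and Borji 2014)
DeepGaze II (Kümmerer et al. 2017)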

Split brain

From (Sperry 1968) Fig. 5

Hemispatial neglect

From (Bloom and Lazerson 1988) Fig. 17-6

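Posner and Cohen

From (Posner 1980) Fig. 1, Fig. 6: results for a patient (R.S.) with right parietal damage. Circles: target presented in the left visual field; triangles: target presented in the right visual field. White dotted lines: invalid cues; black solid lines: valid cues. The horizontal axis is the ISI; the vertical axis is the median reaction time.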

Feature integration theory (FIT)

From (Treisman and Souther 1985) Fig. 9

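Search asymmetry

From (Treisman 1988) Fig. 3

In the right panel of the figure above, the horizontal axis is the number of simultaneously presented stimuli and the vertical axis is the reaction time. Reaction times differ depending on whether the target is the stimulus that has the line-segment feature (Q) or the one that lacks it (O). In the condition drawn with dotted lines, where the target lacks the feature, reaction time increases as the number of simultaneously presented stimuli grows. In contrast, when the target has the feature, reaction time stays flat regardless of the number of simultaneously presented stimuli. Similar experimental results are shown below.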
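
The two reaction-time patterns can be sketched with a toy model. It assumes the usual reading of feature integration theory: a target that has the feature pops out in parallel, while a target that lacks it requires a serial self-terminating search. The function name `expected_rt` and the values `base_ms` and `per_item_ms` are illustrative assumptions, not data from Treisman's experiments.

```python
def expected_rt(set_size, serial, base_ms=400.0, per_item_ms=50.0):
    """Expected reaction time (ms) for a target-present display.

    serial=False: parallel pop-out, independent of set size.
    serial=True: serial self-terminating search, which inspects
    (set_size + 1) / 2 items on average before reaching the target.
    base_ms and per_item_ms are illustrative values, not fitted data.
    """
    if not serial:
        return base_ms
    return base_ms + per_item_ms * (set_size + 1) / 2

set_sizes = [1, 6, 12]
feature_present = [expected_rt(n, serial=False) for n in set_sizes]  # Q among O
feature_absent = [expected_rt(n, serial=True) for n in set_sizes]    # O among Q
```

Under these assumptions `feature_present` is flat across set sizes and `feature_absent` grows linearly, which is the qualitative asymmetry described above.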

Spotlight metaphor

From (Koch and Ullman 1985) Fig. 5

Spotlight metaphor (Crick 1984)

Attention can be likened to a spotlight that enhances the efficiency of detection of events within its beam. Unlike when acuity is involved, the effect of the beam is not related to the fovea. When the fovea is unilluminated by attention, its ability to lead to detection is diminished, as would be the case with any other area of the visual system. Posner p.172
(Summerfield et al. 2006) also influenced AI research
Attention mechanisms select which information to read out from a network's internal memory
Machine translation (Bahdanau, Cho, and Bengio 2015), NTM (Graves et al. 2016)
Content-addressable memory (Hopfield 1982)
BERT (Devlin et al. 2018)
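
Reading from memory by content, as in the items above, can be sketched as follows. This is a minimal illustration, not the exact mechanism of any of the cited models: the function name `attention_read` and the toy keys and values are assumptions, and the learned projections and scaling used in Transformer-style models such as BERT are omitted.

```python
import math

def attention_read(query, keys, values):
    """Read from memory by content.

    Softmax over query-key dot products gives the attention weights;
    the result is the weighted average of the stored values.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, read

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical memory addresses
values = [[10.0], [20.0], [30.0]]             # hypothetical memory contents
weights, read = attention_read([0.0, 5.0], keys, values)
```

The query matches the second and third keys equally well, so the read-out is close to the average of their values and the first slot is almost ignored.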

Inhibition of Return (IOR)

From http://www.scholarpedia.org/article/Inhibition_of_return

The superior colliculus (SC) has been implicated as the neural substrate for IOR through four converging, but indirect, lines of evidence.

IOR is abnormal in patients with midbrain degeneration due to progressive supranuclear palsy (PSP).
It is preserved in patients with hemianopia, a condition in which only extrageniculate pathways are available to process visual information.
It is present in newborn infants, in whom the geniculostriate pathways are not yet developed.
It is generated asymmetrically in temporal and nasal visual fields, suggesting retinotectal mediation.

Guided Search 2.0

The Guided Search model (Wolfe 1994) was the first to represent top-down attention explicitly

From (Wolfe 1994) Fig. 2
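
The core idea of Guided Search can be sketched as a weighted sum of bottom-up and top-down activations, with attention visiting locations in order of decreasing activation. The function names, the weights `w_bu` and `w_td`, and the activation values are hypothetical; the noise and the separate feature maps of the full model are left out.

```python
def activation_map(bottom_up, top_down, w_bu=1.0, w_td=1.0):
    """Combine bottom-up and top-down activations per location (weighted sum)."""
    return [w_bu * b + w_td * t for b, t in zip(bottom_up, top_down)]

def visit_order(activation):
    """Locations sorted by decreasing activation: the order attention visits them."""
    return sorted(range(len(activation)), key=lambda i: -activation[i])

bottom_up = [0.2, 0.9, 0.4, 0.1]   # hypothetical local feature contrast
top_down = [0.8, 0.0, 0.7, 0.1]    # hypothetical match to the target's features
order_bu_only = visit_order(activation_map(bottom_up, top_down, w_td=0.0))
order_guided = visit_order(activation_map(bottom_up, top_down))
```

Without top-down guidance the salient distractor at location 1 is visited first; with guidance, the locations that match the target's features move to the front of the queue.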

Overview diagram of earlier models, from the review article by Itti and Borji (2015)

From (Itti and Borji 2015) Fig. 2

Friston’s attention

From (Friston et al. 2014) Fig. 1

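Superior colliculus (SC)

From (Olshausen, Anderson, and Van Essen 1993) Fig. 10a

The primate visual system perceives the outside world through gaze shifts accompanied by attention
Rather than processing all inputs in parallel, visual attention shifts between locations and objects (Koch and Ullman 1985; Moore and Zirnsak 2017; Posner and Petersen 1990)
Prioritization and selection of information (Olshausen, Anderson, and Van Essen 1993; Salinas and Abbott 1997)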

Rhythmic phenomena

From (Fiebelkorn, Saalmann, and Kastner 2013) Fig. 1 and Fig. 2a

Rhythmic phenomena (2)

From (Buschman and Kastner 2015) Fig. 3b

Rhythmic phenomena (3)

From (Buschman and Kastner 2015) Fig. 3a, Fig. 6

DeepGaze II

From (Kümmerer et al. 2017) Fig. 2

DeepGaze II (2)

From (Kümmerer et al. 2017) Fig. 2

The rightmost bar, which outperforms DeepGaze II, shows human eye-movement data

DeepGaze II (3)

From (Kümmerer et al. 2017) Fig. 3

IG: information gain, IGE: information gain explained, AUC: area under the ROC curve, sAUC: shuffled AUC, NSS: normalized scanpath saliency
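
Of these metrics, NSS is the simplest to compute: the saliency map is z-scored and then averaged over the fixated pixels. The sketch below assumes a saliency map stored as a nested list and fixations given as (row, column) pairs; the function name `nss` and the 2x2 map are illustrative only.

```python
import math

def nss(saliency, fixations):
    """Normalized scanpath saliency: mean z-scored saliency at fixated pixels.

    saliency: 2D list of floats; fixations: list of (row, col) tuples.
    The population standard deviation (divide by N) is used for the z-score.
    """
    flat = [v for row in saliency for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    return sum((saliency[r][c] - mean) / std for r, c in fixations) / len(fixations)

saliency_map = [[0.0, 0.0], [0.0, 4.0]]    # hypothetical 2x2 saliency map
score_hit = nss(saliency_map, [(1, 1)])    # fixation on the salient pixel
score_miss = nss(saliency_map, [(0, 0)])   # fixation elsewhere
```

A positive score means fixations fall on regions the model rates above average, a score near zero corresponds to chance, and a negative score means fixations land on regions rated below average.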

DeepGaze III

From (Kümmerer, Wallis, and Bethge 2019) Fig. 1

Helmholtz machine

(Dayan et al. 1995; Hinton et al. 1995)

Helmholtz machine

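\[ \begin{aligned} \log p(d\vert\theta) &= -\sum_\alpha Q_\alpha E_\alpha-\sum_\alpha Q_\alpha\log Q_\alpha + \sum_\alpha Q_\alpha\log\left(\frac{Q_\alpha}{P_\alpha}\right)\\ &= -F(d;\theta,Q)+\sum_\alpha Q_\alpha\log\left(\frac{Q_\alpha}{P_\alpha}\right) \end{aligned} \]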

\[ q_j^{(l)}\left(\phi,\mathbf{s}^{(l-1)}\right)=\sigma\left(\sum_i s_i^{(l-1)}\phi_{i,j}^{(l-1,l)}\right) \]

\[ Q_\alpha(\phi,d)=\prod_{l>1}\prod_j\left[q_j^{(l)}\left(\phi,\mathbf{s}^{(l-1)}\right)\right]^{s_j^{(l)}} \left[1-q_j^{(l)}\left(\phi,\mathbf{s}^{(l-1)}\right)\right]^{1-s_j^{(l)}} \]

\[ p_j^{(l)}\left(\theta,\mathbf{s}^{(l+1)}\right)=\sigma\left(\sum_k s_k^{(l+1)}\theta_{k,j}^{(l+1,l)}\right) \]

\[ p(\alpha\vert \theta)=\prod_{l>1}\prod_j\left[p_j^{(l)}\left(\theta,\mathbf{s}^{(l+1)}\right)\right]^{s_j^{(l)}} \left[1-p_j^{(l)}\left(\theta,\mathbf{s}^{(l+1)}\right)\right]^{1-s_j^{(l)}} \]

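Model: Helmholtz machine

From (Kawato, Hayakawa, and Inui 1993) Fig. 1

From (Hinton et al. 1995) Fig. 1

Higher layers sample information from lower layers \(\rightarrow\) recognition
Lower layers receive information from higher layers \(\rightarrow\) reconstruction

Recognition by bottom-up processing and generation by top-down processing (of a belief about how the input should look) are repeated \(n\) times (\(n=2,\ldots,4\)) \(\rightarrow\)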

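Formulation

For an assumed interpretation \(\alpha\) and an input image \(d\), the description length is simply the description cost of all hidden-layer units plus that of the input given \(\alpha\):

\[ \begin{aligned} C(\alpha,d)&=C(\alpha)+C(d\vert\alpha)\\ &=\sum_{\ell\in L}\sum_{j\in\ell} C(s_j^\alpha)+\sum_i C(s_i^d\vert\alpha) \end{aligned} \]

The connection weights are updated using the expression above: \[ \Delta w_{kj}=\epsilon s_k^\alpha \left(s_j^\alpha-p_j^\alpha\right). \]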
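
The weight update can be sketched for a single unit \(j\) driven by binary units \(k\) in the layer above. This is a minimal illustration that assumes a logistic unit without a bias term; the function names, the learning rate `epsilon`, and the toy states are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, parent_states):
    """Probability p_j that unit j is on, given the states s_k above it."""
    return sigmoid(sum(w * s for w, s in zip(weights, parent_states)))

def delta_rule_step(weights, parent_states, s_j, epsilon=0.1):
    """One update: w_kj += epsilon * s_k * (s_j - p_j)."""
    p_j = predict(weights, parent_states)
    return [w + epsilon * s_k * (s_j - p_j)
            for w, s_k in zip(weights, parent_states)]

parents = [1, 0, 1]          # hypothetical binary states s_k in the layer above
weights = [0.0, 0.0, 0.0]    # weights w_kj into unit j
before = predict(weights, parents)
for _ in range(100):
    weights = delta_rule_step(weights, parents, s_j=1)
after = predict(weights, parents)
```

Repeated updates move the predicted probability toward the observed state, and the weight from the inactive unit is left unchanged because its \(s_k\) is zero.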

\[ C(d) = \sum_\alpha Q(\alpha\vert d) C(\alpha, d) - \left[-\sum_\alpha Q(\alpha\vert d) \log Q(\alpha\vert d)\right]. \]

\[ p\left(\alpha\vert d\right)=\frac{e^{-C(\alpha,d)}}{\sum_\beta e^{-C(\beta,d)}} \]
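
The two equations above are connected: choosing \(Q(\alpha\vert d)\) equal to the Boltzmann posterior minimizes \(C(d)\), and the minimum equals \(-\log\sum_\beta e^{-C(\beta,d)}\). The sketch below checks this numerically for three hypothetical costs; the function names and cost values are assumptions, and costs are measured in nats.

```python
import math

def posterior(costs):
    """Boltzmann posterior over explanations: exp(-C) normalized."""
    low = min(costs)  # shift by the minimum cost for numerical stability
    exps = [math.exp(-(c - low)) for c in costs]
    total = sum(exps)
    return [e / total for e in exps]

def description_length(q, costs):
    """Expected cost under q minus the entropy of q, i.e. C(d)."""
    expected = sum(qa * c for qa, c in zip(q, costs))
    entropy = -sum(qa * math.log(qa) for qa in q if qa > 0)
    return expected - entropy

costs = [1.0, 2.0, 4.0]                  # hypothetical C(alpha, d) values
q_opt = posterior(costs)
c_opt = description_length(q_opt, costs)
c_uniform = description_length([1 / 3] * 3, costs)
lower_bound = -math.log(sum(math.exp(-c) for c in costs))
```

Here `c_opt` matches `lower_bound` up to rounding, while the uniform distribution gives a longer description length.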

\[ \Delta s_{j,t+1}=\epsilon s_{j,t}^\gamma(s_{j,t}^\gamma-q_{j,t}^\gamma) \]

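This is repeated several times until a good overall representation is obtained, that is, until the activity of the lower layers can be reconstructed

Worked example (2): sampling eye movements

References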

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. “Neural Machine Translation by Jointly Learning to Align and Translate.” In Proceedings of the International Conference on Learning Representations (ICLR), edited by Yoshua Bengio and Yann LeCun. San Diego, CA, USA. http://arxiv.org/abs/1409.0473.

Bichot, Narcisse P., Matthew T. Heard, Ellen M. DeGennaro, and Robert Desimone. 2015. “A Source for Feature-Based Attention in the Prefrontal Cortex.” Neuron 88 (November): 832–44.

Bloom, Floyd E., and Arlyne Lazerson. 1988. Brain, Mind, and Behavior. 2nd ed. New York, NY: Freeman.

Borji, Ali, and Laurent Itti. 2013. “State-of-the-Art in Visual Attention Modeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1): 185–207.

Broadbent, Donald E. 1958. Perception and Communication. Oxford, UK: Pergamon.

Buschman, Timothy J., and Sabine Kastner. 2015. “From Behavior to Neural Dynamics: An Integrated Theory of Attention.” Neuron 88 (October): 127–44.

Crick, Francis. 1984. “Function of the Thalamic Reticular Complex: The Searchlight Hypothesis.” Proceedings of the National Academy of Sciences 81 (July): 4586–90.

Dayan, Peter, Geoffrey E. Hinton, Radford M. Neal, and Richard S. Zemel. 1995. “The Helmholtz Machine.” Neural Computation 7: 889–904.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” arXiv Preprint.

Duncan, John, and Glyn W. Humphreys. 1989. “Visual Search and Stimulus Similarity.” Psychological Review 96 (3): 433–58.

Eriksen, Charles W., and James D. St. James. 1986. “Visual Attention Within and Around the Field of Focal Attention: A Zoom Lens Model.” Perception & Psychophysics 40 (4): 225–40.

Fiebelkorn, Ian C., Yuri B. Saalmann, and Sabine Kastner. 2013. “Rhythmic Sampling Within and Between Objects Despite Sustained Attention at a Cued Location.” Current Biology 23 (December): 2553–8.

Friston, Karl J, Klaas Enno Stephan, Read Montague, and Raymond J Dolan. 2014. “Computational Psychiatry: The Brain as a Phantastic Organ.” The Lancet Psychiatry 1: 148–58.

Graves, Alex, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, et al. 2016. “Hybrid Computing Using a Neural Network with Dynamic External Memory.” Nature 538: 471–76. https://doi.org/10.1038/nature20101.

Heilman, Kenneth M., and Edward Valenstein. 1979. “Mechanisms Underlying Hemispatial Neglect.” Annals of Neurology 5 (2): 166–70.

Hinton, Geoffrey E., Peter Dayan, Brendan J. Frey, and Radford M. Neal. 1995. “The "Wake-Sleep" Algorithm for Unsupervised Neural Networks.” Science 268 (5214): 1158–61.

Hopfield, John Joseph. 1982. “Neural Networks and Physical Systems with Emergent Collective Computational Abilities.” Proceedings of the National Academy of Sciences 79: 2554–8.

Itti, Laurent, and Ali Borji. 2014. “Computational Models: Bottom-up and Top-down Aspects.” In The Oxford Handbook of Attention, edited by Anna C. Nobre and Sabine Kastner, 1122–58. Oxford University Press.

———. 2015. “Computational Models of Attention.” ArXiv Preprint.

Itti, Laurent, and Christof Koch. 2001. “Computational Modelling of Visual Attention.” Nature Reviews Neuroscience 2: 1–11.

Itti, Laurent, Christof Koch, and Ernst Niebur. 1998. “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis.” IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (11): 1254–9.

Kawato, Mitsuo, Hideki Hayakawa, and Toshio Inui. 1993. “A Forward-Inverse Optics Model of Reciprocal Connections Between Visual Cortical Areas.” Network: Computation in Neural Systems 4 (4): 415–22.

Kimura, Akisato, Ryo Yonetani, and Takatsugu Hirayama. 2013. “Computational Models of Human Visual Attention and Their Implementations: A Survey.” IEICE Transactions on Information and Systems E96-D (3): 562–78.

Knudsen, Eric I. 2007. “Fundamental Components of Attention.” Annual Review of Neuroscience 30: 57–78.

Koch, Christof, and Simon Ullman. 1985. “Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry.” Human Neurobiology 4: 219–27.

Krauzlis, Richard J., Lee P. Lovejoy, and Alexandre Zénon. 2013. “Superior Colliculus and Visual Spatial Attention.” Annual Review of Neuroscience 36: 165–82.

Kümmerer, Matthias, Thomas S. A. Wallis, and Matthias Bethge. 2019. “DeepGaze III: Using Deep Learning to Probe Interactions Between Scene Content and Scanpath History in Fixation Selection.” In Proceedings of Cognitive Computational Neuroscience, 542–45. Berlin, Germany. https://doi.org/10.32470/CCN.2019.1235-0.

Kümmerer, Matthias, Thomas S. A. Wallis, Leon A. Gatys, and Matthias Bethge. 2017. “Understanding Low- and High-Level Contributions to Fixation Prediction.” In The IEEE International Conference on Computer Vision (ICCV), 4789–98. Venice, Italy.

Milanese, Ruggero, Harry Wechsler, Sylvia Gill, Jean-Marc Bost, and Thierry Pun. 1994. “Integration of Bottom-up and Top-down Cues for Visual Attention Using Non-Linear Relaxation.” In The Proceedings of CVPR, IEEE – Institute of Electrical and Electronics Engineers, 781–85. Dallas, Texas, USA: Computer Vision and Pattern Recognition (CVPR).

Miller, Earl K., and Jonathan D. Cohen. 2001. “An Integrative Theory of Prefrontal Cortex Function.” Annual Review of Neuroscience 24: 167–202.

Monosov, Ilya E., and Kirk G. Thompson. 2009. “Frontal Eye Field Activity Enhances Object Identification During Covert Visual Search.” Journal of Neurophysiology 102 (October): 3656–72.

Moore, Tirin, and Marc Zirnsak. 2017. “Neural Mechanisms of Selective Visual Attention.” Annual Review of Psychology 68 (January): 47–72. https://doi.org/10.1146/annurev-psych-122414-033400.

Olshausen, Bruno A., Charles H. Anderson, and David C. Van Essen. 1993. “A Neurobiological Model of Visual Attention and Invariant Pattern Recognition Based on Dynamic Routing of Information.” The Journal of Neuroscience 13 (11): 4700–4719.

Petersen, Steven E., and Michael I. Posner. 2012. “The Attention System of the Human Brain: 20 Years After.” Annual Review of Neuroscience 35: 73–89.

Posner, Michael I., and Steven E. Petersen. 1990. “The Attention System of the Human Brain.” Annual Review of Neuroscience 13: 25–42.

Posner, Michael I. 1980. “Orienting of Attention.” Quarterly Journal of Experimental Psychology 32: 3–25.

Salinas, Emilio, and L. F. Abbott. 1997. “Invariant Visual Responses from Attentional Gain Fields.” Journal of Neurophysiology 77: 3267–72. https://doi.org/10.1152/jn.1997.77.6.3267.

Sperry, Roger W. 1961. “Cerebral Organization and Behavior.” Science 133: 1749–57.

Sperry, Roger W. 1968. “Hemisphere Deconnection and Unity in Conscious Awareness.” American Psychologist 23: 723–33.

Summerfield, Jennifer J., Jöran Lepsien, Darren R. Gitelman, M. Marsel Mesulam, and Anna C. Nobre. 2006. “Orienting Attention Based on Long-Term Memory Experience.” Neuron 49: 905–16. https://doi.org/10.1016/j.neuron.2006.01.021.

Treisman, Anne. 1964. “Selective Attention in Man.” British Medical Bulletin 20: 12–16.

———. 1988. “Features and Objects: The Fourteenth Bartlett Memorial Lecture.” The Quarterly Journal of Experimental Psychology 40A: 201–37.

Treisman, Anne M. 1969. “Strategies and Models of Selective Attention.” Psychological Review 76 (3): 282–99.

Treisman, Anne, and Garry Gelade. 1980. “A Feature Integration Theory of Attention.” Cognitive Psychology 12: 97–136.

Treisman, Anne, and J. Souther. 1985. “Search Asymmetry: A Diagnostic for Preattentive Processing of Separable Features.” Journal of Experimental Psychology: General 114 (3): 285–310.

Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. “Show and Tell: A Neural Image Caption Generator.” In Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA. http://arxiv.org/abs/1411.4555v1.

Wang, Wenguan, and Jianbin Shen. 2018. “Deep Visual Attention Prediction.” IEEE Transactions on Image Processing 27 (5): 2368–78.

Wardak, Claire, Etienne Olivier, and Jean-René Duhamel. 2004. “A Deficit in Covert Attention After Parietal Cortex Inactivation in the Monkey.” Neuron 42 (May): 501–8.

Wolfe, Jeremy M. 1994. “Guided Search 2.0: A Revised Model of Visual Search.” Psychonomic Bulletin & Review 1 (2): 202–38.
