Convolutional Block Attention Module


We propose the Convolutional Block Attention Module (CBAM), a simple yet effective attention module that can be integrated with any feed-forward convolutional neural network. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial; the attention maps are then multiplied with the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architecture seamlessly with negligible overhead, and it is end-to-end trainable along with the base CNN. We validate CBAM through extensive experiments on the ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performance with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
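As a rough illustration of the sequential channel-then-spatial refinement described above, here is a minimal NumPy sketch. The shared-MLP weights `w1`/`w2` are placeholder assumptions, and the 7x7 convolution CBAM applies in its spatial branch is replaced here by a simple sum of the pooled maps; this is a sketch of the idea, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: (C, H, W). A shared MLP (w1, w2) is applied to both the
    # average-pooled and max-pooled channel descriptors.
    avg = x.mean(axis=(1, 2))                    # (C,)
    mx = x.max(axis=(1, 2))                      # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # MLP with ReLU bottleneck
    return sigmoid(mlp(avg) + mlp(mx))           # (C,)

def spatial_attention(x):
    # Pool along the channel axis; a 7x7 conv would normally follow,
    # replaced here by a plain sum of the two pooled maps.
    avg = x.mean(axis=0)                         # (H, W)
    mx = x.max(axis=0)                           # (H, W)
    return sigmoid(avg + mx)                     # (H, W)

def cbam(x, w1, w2):
    mc = channel_attention(x, w1, w2)            # channel attention first
    x = x * mc[:, None, None]                    # broadcast over H, W
    ms = spatial_attention(x)                    # then spatial attention
    return x * ms[None, :, :]
```

Both attention maps lie in (0, 1), so the module rescales rather than replaces the input features.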

BAM: Bottleneck Attention Module


Recent advances in deep neural networks have been driven by architecture search over depth, width, and cardinality. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named the Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural network. Our module infers an attention map along two separate pathways, channel and spatial. We place the module at each bottleneck of a model, where downsampling of feature maps occurs. The module constructs a hierarchical attention at the bottlenecks with a small number of extra parameters, and it is trainable in an end-to-end manner jointly with any feed-forward model. We validate BAM through extensive experiments on the CIFAR-100, ImageNet-1K, VOC 2007, and MS COCO benchmarks. Our experiments show consistent improvements in classification and detection performance with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available.

Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning


Learning-based color enhancement approaches typically learn to map input images to retouched images. Most existing methods require expensive pairs of input and retouched images, or produce results in a non-interpretable way. In this paper, we present a deep reinforcement learning (DRL) based method for color enhancement that explicitly models the step-wise nature of the human retouching process. We cast color enhancement as a Markov Decision Process whose actions are global color adjustment operations, and train an agent to learn the optimal sequence of enhancement actions. In addition, we present a ‘distort-and-recover’ training scheme which requires only high-quality reference images for training, instead of input-retouched image pairs: given high-quality reference images, we distort the images’ color distributions to form distorted-reference pairs for training. Through extensive experiments, we show that our method produces decent enhancement results and that our DRL approach is more suitable for the ‘distort-and-recover’ training scheme than previous supervised approaches. Authors: Jongchan Park (Lunit), Joon-Young Lee (Adobe Research), Donggeun Yoo (Lunit), and In So Kweon (KAIST)
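The ‘distort-and-recover’ pair construction can be sketched as follows. The two global adjustment operations and their parameter ranges are illustrative assumptions, not the paper's exact action set; the point is that only a high-quality reference image is needed to create a training pair.

```python
import numpy as np

# Hypothetical global color adjustment "actions" (a small subset for illustration).
def adjust_brightness(img, delta):
    return np.clip(img + delta, 0.0, 1.0)

def adjust_contrast(img, factor):
    return np.clip((img - 0.5) * factor + 0.5, 0.0, 1.0)

def make_training_pair(reference, rng, n_steps=3):
    """Distort a high-quality reference image with a few random global
    color operations; the (distorted, reference) pair serves as training
    data in place of a real input-retouched pair."""
    img = reference.copy()
    for _ in range(n_steps):
        if rng.random() < 0.5:
            img = adjust_brightness(img, rng.uniform(-0.2, 0.2))
        else:
            img = adjust_contrast(img, rng.uniform(0.7, 1.3))
    return img, reference
```

The agent is then trained to recover the reference from the distorted input by choosing a sequence of inverse adjustments.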

A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer


TNM stage is the major determinant of breast cancer prognosis and treatment. An essential part of TNM stage classification is whether the cancer has metastasized to the regional lymph nodes (N-stage). Pathologic N-stage (pN-stage) classification is commonly performed by pathologists detecting metastases in histological slides. However, this diagnostic procedure is prone to misinterpretation and normally requires extensive time from pathologists because of the sheer volume of data that needs a thorough review. Automated detection of lymph node metastasis and pN-stage prediction therefore has great potential to reduce pathologists' workload. Recent advances in convolutional neural networks (CNNs) have brought significant improvements in histological slide analysis, but accuracy remains limited by the difficulty of handling gigapixel images. In this paper, we propose a robust and effective method for metastasis detection and pN-stage classification in breast cancer from multiple gigapixel pathology images. pN-stage is predicted by combining a patch-level CNN-based metastasis detector with a slide-level lymph node classifier. The proposed framework achieves a state-of-the-art quadratic weighted kappa score of 0.9203 on the Camelyon17 dataset, outperforming the previous winning method of the Camelyon17 challenge.

Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks


Data is one of the most important factors in machine learning. However, even when high-quality data exists, access to it may be restricted; for example, access to medical data from outside an institution is strictly limited due to privacy issues. In this case, a model must be trained sequentially, using only the data accessible at each stage. In this work, we propose a new method for preserving learned knowledge by modeling the high-level feature space and the output space to be mutually informative, and by constraining feature vectors to lie in the modeled space during training. The proposed method is easy to implement, as it amounts to simply adding a reconstruction loss to the objective function. We evaluate the proposed method on CIFAR-10/100 and a chest X-ray dataset, and show benefits in terms of knowledge preservation compared to previous approaches.
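The "simply adding a reconstruction loss" idea can be sketched as a combined objective. The cross-entropy task loss, the MSE reconstruction term, and the weight `lam` below are generic illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def total_loss(logits, labels, features, reconstructed, lam=0.1):
    """Task loss plus a reconstruction loss tying the feature space to
    the output space (the weighting lam is a hypothetical choice)."""
    # Numerically stable softmax cross-entropy as the task loss.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))
    # Reconstruction term: penalize features drifting from the modeled space.
    return ce + lam * mse(features, reconstructed)
```

During later training stages, the reconstruction term discourages the features learned earlier from being overwritten.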

Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks


Real-world image recognition is often challenged by the variability of visual styles, including object textures, lighting conditions, filter effects, etc. Although these variations have been deemed to be handled implicitly by more training data and deeper networks, recent advances in image style transfer suggest that it is also possible to manipulate the style information explicitly. Extending this idea to general visual recognition problems, we present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles out of images. Considering that certain style features play an essential role in discriminative tasks, BIN learns to selectively normalize only the disturbing styles while preserving the useful ones. The proposed normalization module is easily incorporated into existing network architectures such as Residual Networks, and surprisingly improves recognition performance in various scenarios. Furthermore, experiments verify that BIN effectively adapts to completely different tasks, such as object classification and style transfer, by controlling the trade-off between preserving and removing style variations.
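The selective normalization can be sketched as a per-channel gated interpolation between batch normalization (which keeps style) and instance normalization (which removes it). This is a minimal NumPy sketch under that reading; the learnable gate `rho` and the affine parameters are simplified to plain arrays/scalars.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (N, C, H, W); normalize over batch and spatial dims, per channel.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    # Normalize each sample's channels over their own spatial dims only,
    # which wipes out per-image style statistics.
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def batch_instance_norm(x, rho, gamma=1.0, beta=0.0):
    # rho in [0, 1] per channel: a learned gate interpolating BN and IN.
    rho = rho.reshape(1, -1, 1, 1)
    return gamma * (rho * batch_norm(x) + (1 - rho) * instance_norm(x)) + beta
```

Channels whose style is useful drive `rho` toward 1 (pure BN); channels whose style disturbs the task drive it toward 0 (pure IN).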

Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study


We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients’ age, breast density, and equipment, 1,238 cases each were chosen as the validation and test sets, and the remainder were used for training. The core algorithm of DIB-MG is a deep convolutional neural network, a deep learning algorithm specialized for images. Each sample (case) is an exam composed of four view images (RCC, RMLO, LCC, and LMLO). For each case in the training set, the cancer probability inferred by DIB-MG is compared with the per-case ground-truth label, and the model parameters of DIB-MG are updated based on the error between the prediction and the ground truth. At an operating point (threshold) of 0.5, sensitivity was 75.6% and 76.1% when specificity was 90.2% and 88.5%, and AUC was 0.903 and 0.906 for the validation and test sets, respectively. This research shows the potential of DIB-MG as a screening tool for breast cancer.

Accurate Lung Segmentation via Network-Wise Training of Convolutional Networks


We introduce an accurate lung segmentation model for chest radiographs based on deep convolutional neural networks. Our model uses atrous convolutional layers to increase the field-of-view of filters efficiently. To improve segmentation performance further, we also propose a multi-stage training strategy, network-wise training, in which the current-stage network is fed with both the input images and the outputs of the previous-stage network. We show that this strategy reduces falsely predicted labels and produces smooth boundaries of lung fields. We evaluate the proposed model on a common benchmark dataset, JSRT, and achieve state-of-the-art segmentation performance with far fewer model parameters.

A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology


The tumor proliferation score is an important biomarker indicative of breast cancer patients' prognosis. In this paper, we present a unified framework to predict tumor proliferation scores from whole slide images in breast histopathology. The proposed system offers a fully automated solution for predicting both a molecular-data-based and a mitosis-counting-based tumor proliferation score. The framework integrates three modules, each fine-tuned to maximize overall performance: an image processing component for handling whole slide images, a deep-learning-based mitosis detection network, and a proliferation score prediction module. We achieved a quadratic weighted Cohen's kappa of 0.567 in mitosis-counting-based score prediction and an F1-score of 0.652 in mitosis detection. On Spearman's correlation coefficient, which evaluates prediction of the molecular-data-based score, the system obtained 0.6171. Our system won first place in all three tasks of the Tumor Proliferation Assessment Challenge at MICCAI 2016, outperforming all other approaches.

Transferring Knowledge to Smaller Network with Class-Distance Loss


Training a small-capacity network that can perform as well as a larger-capacity network is an important problem for real-life applications that require fast inference and a small memory footprint. Previous approaches that transfer knowledge from a bigger network to a smaller one show little benefit when applied to state-of-the-art convolutional neural network architectures such as Residual Networks trained with batch normalization. We propose a class-distance loss that helps the teacher network form a densely clustered vector space, making it easier for the student network to learn from it. We show that a small network with half the size of the original network, trained with the proposed strategy, can perform close to the original network on the CIFAR-10 dataset.
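One plausible reading of such a loss — penalizing intra-class spread in the teacher's feature space so that classes form dense clusters — can be sketched as follows. The exact formulation here (mean squared distance to the class centroid) is an illustrative assumption, not necessarily the paper's definition.

```python
import numpy as np

def class_distance_loss(features, labels):
    """Average, over classes, of the mean squared distance from each
    feature vector to its class centroid (a hypothetical formulation).
    Small values mean densely clustered classes."""
    classes = np.unique(labels)
    loss = 0.0
    for c in classes:
        cls = features[labels == c]
        centroid = cls.mean(axis=0)
        loss += float(np.mean(np.sum((cls - centroid) ** 2, axis=1)))
    return loss / len(classes)
```

Added to the teacher's training objective, a term like this pulls same-class features together without changing the classification target.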

@inproceedings{kim2017transferring, title={Transferring Knowledge to Smaller Network with Class-Distance Loss}, author={Seungwook Kim and Hyo-Eun Kim}, booktitle={International Conference on Learning Representations (ICLR) Workshop}, year={2017} }

Semantic Noise Modeling for Better Representation Learning


Latent representations learned by multi-layered neural networks via hierarchical feature abstraction underlie the recent success of deep learning. Under the deep learning framework, generalization performance depends highly on the learned latent representation, which is obtained from an appropriate training scenario with a task-specific objective on a designed network model. In this work, we propose a novel latent space modeling method to learn better latent representations. We design a neural network model based on the assumption that a good base representation can be attained by maximizing the total correlation between the input, latent, and output variables. On top of the base model, we introduce a semantic noise modeling method which enables class-conditional perturbation in the latent space to enhance the representational power of the learned latent features. During training, latent vector representations can be stochastically perturbed by a modeled class-conditional additive noise while maintaining their original semantic features, which implicitly brings the effect of semantic augmentation in the latent space. The proposed model can be easily trained by back-propagation with common gradient-based optimization algorithms. Experimental results show that the proposed method achieves performance gains over various previous approaches. We also provide empirical analyses of the proposed class-conditional perturbation process, including t-SNE visualization.
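The class-conditional perturbation can be sketched as adding noise whose distribution depends on the class label. The fixed per-class scales in `class_sigma` are an illustrative assumption — in the paper the noise is itself modeled rather than hand-set.

```python
import numpy as np

def perturb_latent(z, labels, class_sigma, rng):
    """Stochastically perturb latent vectors z (N, D) with class-conditional
    additive Gaussian noise; class_sigma[c] is class c's noise scale."""
    scale = class_sigma[labels][:, None]          # (N, 1), broadcast over dims
    return z + rng.standard_normal(z.shape) * scale
```

Because the noise is conditioned on the class, the perturbed vectors act as semantically consistent augmentations of the latent space.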

@article{DBLP:journals/corr/KimHC16, author = {Hyo{-}Eun Kim and Sangheum Hwang and Kyunghyun Cho}, title = {Semantic Noise Modeling for Better Representation Learning}, journal = {CoRR}, volume = {abs/1611.01268}, year = {2016} }

Self-Transfer Learning for Fully Weakly Supervised Object Localization

Recent advances in deep learning have achieved remarkable performance in various challenging computer vision tasks. Especially in object localization, deep convolutional neural networks outperform traditional approaches by extracting data/task-driven features instead of hand-crafted features. Although location information of regions-of-interest (ROIs) gives a good prior for object localization, it requires heavy annotation effort from human resources. Thus, weakly supervised frameworks for object localization have been introduced, where "weakly" means that the framework uses only image-level labeled datasets to train a network. With the help of transfer learning, which adopts the weight parameters of a pre-trained network, weakly supervised learning frameworks for object localization perform well because the pre-trained network already has well-trained class-specific features. However, those approaches cannot be used in applications which have no pre-trained networks or well-localized large-scale images. Medical image analysis is representative of such applications, because it is impossible to obtain such pre-trained networks. In this work, we present a "fully" weakly supervised framework for object localization ("semi"-weakly is the counterpart which uses pre-trained filters for weakly supervised localization) named self-transfer learning (STL), which jointly optimizes the classification and localization networks. By controlling the supervision level of the localization network, STL helps the localization network focus on correct ROIs without any type of prior. We evaluate the proposed STL framework on two medical image datasets, chest X-rays and mammograms, and achieve significantly better localization performance compared to previous weakly supervised approaches.

@ARTICLE{2016arXiv160201625H, author = {{Hwang}, S. and {Kim}, H.-E.}, title = "{Self-Transfer Learning for Fully Weakly Supervised Object Localization}", keywords = {Computer Science - Computer Vision and Pattern Recognition}, year = 2016 }

Pixel-level Domain Transfer


We present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level, and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake discriminator of Generative Adversarial Nets, and also introduce a novel domain discriminator to make the generated image relevant to the input image. We verify our model on the challenging task of generating a piece of clothing from an input image of a dressed person. We present a high-quality clothing dataset containing the two domains, and demonstrate decent results.

@article{DBLP:journals/corr/YooKPPK16, author = {Donggeun Yoo and Namil Kim and Sunggyun Park and Anthony S. Paek and In{-}So Kweon}, title = {Pixel-Level Domain Transfer}, year = {2016} }



Mitosis counting is time-consuming, labor-intensive work, and it frequently shows inter-observer variability. Although the deep convolutional neural network, currently the most accurate image classification algorithm, has been used for detecting mitosis, it has only been tested on public datasets and has never been applied to routine histologic slide images. Recently, smartphone cameras with adaptors for the microscope have been tried for easier image acquisition, significantly lowering the barrier to applying computer algorithms to histologic image analysis. Histologic slides of 70 invasive ductal carcinomas of the breast were selected, and 1,761 high-power-field histologic images (400x) were acquired using a smartphone application together with a microscope adaptor manufactured by us. Mitoses were annotated blindly by four pathologists; concordance among three or more pathologists was regarded as ground truth. 2,004 mitotic cells and 801,600 non-mitotic cells from 60 cases were divided into 10 sets, and the algorithm was trained sequentially using fine-tuning. After training, images from the remaining ten patients were tested for concordance of detection with the pathologists. During training, sensitivity for mitosis detection ranged between 75% and 83%, and specificity increased to 97% as the algorithm was trained with more images. The trained algorithm identified 189 mitoses in 748 images from the 10 test cases, showing 79% sensitivity and 96% specificity for detecting mitosis compared to the pathologists. The detected mitoses were displayed in the application within 14 seconds on average. The proposed deep convolutional neural network-based mitosis detection system showed remarkable sensitivity and specificity, and its performance improved as more images were used for training. Along with the smartphone application and the adaptor we manufactured, it assists pathologists in identifying mitoses, reducing time and labor costs while providing a more objective diagnosis.

A Novel Approach for Tuberculosis Screening Based on Deep Convolutional Neural Networks

We propose an automatic tuberculosis (TB) screening system based on a deep convolutional neural network (CNN). Since a CNN by itself extracts the most discriminative features for the target objective from the given data, the proposed system does not require manually designed features for TB screening. We also show that transfer learning from the lower convolutional layers of pre-trained networks resolves the difficulties of handling high-resolution medical images and of training a huge number of parameters with a limited number of images. Experiments are conducted on three real field datasets, the KIT, MC, and Shenzhen sets, and the results show that the proposed system achieves high screening performance in terms of AUC and accuracy.

@inproceedings{doi:10.1117/12.2216198, author = {Hwang, Sangheum and Kim, Hyo-Eun and Jeong, Jihoon and Kim, Hee-Jin}, title = {A novel approach for tuberculosis screening based on deep convolutional neural networks}, booktitle = {Proc. SPIE}, volume = {9785}, pages = {97852W-97852W-8}, year = {2016} }

Scale-Invariant Feature Learning using Deconvolutional Neural Networks for Weakly-Supervised Semantic Segmentation

A weakly-supervised semantic segmentation framework with a tied deconvolutional neural network is presented. Each deconvolution layer in the framework consists of unpooling and deconvolution operations. 'Unpooling' upsamples the input feature map based on unpooling switches defined by the corresponding convolution layer's pooling operation. 'Deconvolution' convolves the unpooled input features using convolutional weights tied to the corresponding convolution layer's convolution operation. The unpooling-deconvolution combination helps to eliminate less discriminative features in the feature extraction stage, since output features of the deconvolution layer are reconstructed from the most discriminative unpooled features rather than from the raw ones. This reduces false positives in the pixel-level inference stage. The feature maps restored from all the deconvolution layers constitute a rich, discriminative feature set spanning different abstraction levels, and are stacked to be selectively used for generating class-specific activation maps. Under weak supervision (image-level labels), the proposed framework shows promising results on lesion segmentation in medical images (chest X-rays) and achieves state-of-the-art performance on the PASCAL VOC segmentation dataset under the same experimental conditions.
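The unpooling-with-switches mechanism can be sketched in NumPy for a single 2D feature map. This is a toy illustration of the general operation (real layers work on batched, multi-channel tensors), not the framework's implementation.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records 'switch' locations, i.e. the
    argmax position inside each pooling window."""
    H, W = x.shape
    pooled = np.zeros((H // k, W // k))
    switches = np.zeros_like(x, dtype=bool)
    for i in range(0, H, k):
        for j in range(0, W, k):
            win = x[i:i + k, j:j + k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            pooled[i // k, j // k] = win[r, c]
            switches[i + r, j + c] = True
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Place each pooled value back at its recorded switch location;
    every other position stays zero, so only the most discriminative
    activations survive the upsampling."""
    up = np.zeros(switches.shape)
    H, W = switches.shape
    for i in range(0, H, k):
        for j in range(0, W, k):
            win = switches[i:i + k, j:j + k]
            up[i:i + k, j:j + k][win] = pooled[i // k, j // k]
    return up
```

A tied deconvolution would then convolve the sparse unpooled map with the (transposed) weights of the matching convolution layer.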

@ARTICLE{2016arXiv160204984K, author = {{Kim}, H.-E. and {Hwang}, S.}, title = "{Scale-Invariant Feature Learning using Deconvolutional Neural Networks for Weakly-Supervised Semantic Segmentation}", keywords = {Computer Science - Computer Vision and Pattern Recognition}, year = 2016 }

AttentionNet: Aggregating Weak Directions for Accurate Object Detection

We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast object detection as an iterative classification problem, the form most suitable for a CNN. AttentionNet provides quantized weak directions pointing toward a target object, and the ensemble of iterative predictions from AttentionNet converges to an accurate object bounding box. Since AttentionNet is a unified network for object detection, it detects objects without any separate models for object proposal or post-hoc bounding-box regression. We evaluate AttentionNet on a human detection task and achieve state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 with only an 8-layered architecture.
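The iterative aggregation of weak directions can be sketched as a loop that nudges the two box corners by quantized steps until the predictor signals a stop. `predict` here is a hypothetical stand-in for AttentionNet's classification outputs, and the step size is an illustrative choice.

```python
import numpy as np

def refine_box(box, predict, step=4, max_iter=50):
    """Iteratively move the top-left and bottom-right corners of `box`
    (x1, y1, x2, y2) by quantized weak directions until `predict`
    signals a stop or the iteration budget runs out."""
    box = np.asarray(box, dtype=float).copy()
    for _ in range(max_iter):
        d_tl, d_br, stop = predict(box)   # each direction in {-1, 0, 1}^2
        if stop:
            break
        box[:2] += step * np.asarray(d_tl)
        box[2:] += step * np.asarray(d_br)
    return box
```

With a predictor that always points toward the true object, the many small quantized moves accumulate into an accurate final box.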

@inproceedings{attentionNet, title={AttentionNet: Aggregating Weak Directions for Accurate Object Detection}, author={Donggeun Yoo and Sunggyun Park and Joon-Young Lee and Anthony S. Paek and In So Kweon}, booktitle={International Conference on Computer Vision (ICCV)}, year={2015} }