Introduction
Reviewing AI conference papers is labor-intensive. Reviewers must judge both how relevant a paper is to the conference's topics and the quality of the paper itself. We want to help them assess the relevance and quality of each submission.
Reviewers at top computer science conferences face thousands of submissions. The task I want to complete is to predict relevance and quality from the abstract alone, without reading the whole paper. Relevance means whether the paper's topic matches the conference it was submitted to; quality means whether the paper reaches the level of a top conference. I will use a model combining word2vec embeddings, an LSTM, and a fully connected layer to complete this task.
Data Description
I will use dblp data for the relevance task. The dataset, called the Citation Network Dataset, includes all papers indexed by the dblp database, giving the conference name and abstract of each paper. From the conference name we can determine the topic class via CSrankings, and any conference counted by CSrankings is treated as a top conference.
| Conference | Count | Avg. abstract length | Label |
| --- | --- | --- | --- |
| WWW | 5688 | 152.8 | IR |
| SIGIR | 4415 | 150.1 | IR |
| NIPS | 8241 | 152.8 | ML&DM |
| KDD | 5062 | 166.3 | ML&DM |
| ICML | 4778 | 135.4 | ML&DM |
| ICCV | 5795 | 147.8 | CV |
| ECCV | 4057 | 147.9 | CV |
| CVPR | 12503 | 151.3 | CV |
| EMNLP | 2762 | 117.6 | NLP |
| ACL | 6342 | 106.7 | NLP |
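The conference-to-label mapping above can be applied to dblp records with a small lookup table. This is a minimal sketch; the `venue` field name is an assumption about the record layout, not confirmed by the dataset documentation.

```python
# Map each conference to its topic label, mirroring the table above.
CONFERENCE_TO_LABEL = {
    "WWW": "IR", "SIGIR": "IR",
    "NIPS": "ML&DM", "KDD": "ML&DM", "ICML": "ML&DM",
    "ICCV": "CV", "ECCV": "CV", "CVPR": "CV",
    "EMNLP": "NLP", "ACL": "NLP",
}

def label_paper(record):
    """Return the topic label for a dblp record (the 'venue' key is an
    assumed field name), or None if the venue is not one of the ten
    conferences used here."""
    return CONFERENCE_TO_LABEL.get(record.get("venue"))
```

Papers from venues outside these ten simply fall out of the relevance task.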
I will use the PeerRead dataset for the quality task. I use only part of it: the arXiv papers on NLP topics, with their abstract text and accept status. The dataset comes with a pre-divided train/test split. The training set has 6197 papers, of which 1654 are accepted and 4543 are not. The test set has 334 papers, of which 81 are accepted and 253 are not. I use only the abstract text and accept status. Because the dataset was built for exactly this task, the data is clean and needs little preprocessing.
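Loading the split can be sketched as below; the `abstract` and `accepted` JSON field names are assumptions about the PeerRead file layout and may need adjusting to the actual files.

```python
import json
import pathlib

def load_split(split_dir):
    """Load (abstract, accept-status) pairs from one PeerRead split
    directory of per-paper JSON files. The 'abstract' and 'accepted'
    keys are assumed field names."""
    pairs = []
    for path in sorted(pathlib.Path(split_dir).glob("*.json")):
        with open(path) as f:
            paper = json.load(f)
        pairs.append((paper["abstract"], int(paper["accepted"])))
    return pairs
```

Applied to the split above, `load_split` would return 6197 pairs for train and 334 for test.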
Methods Description
A baseline is easy to establish, and I use Naive Bayes. However, Naive Bayes treats a text as a bag of words and ignores word order, which loses information. I therefore designed the following model.
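The bag-of-words Naive Bayes baseline can be sketched with scikit-learn; the pipeline below is illustrative, not the exact configuration used in the experiments, and the tiny abstracts are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Bag-of-words counts fed to multinomial Naive Bayes.
baseline = make_pipeline(CountVectorizer(), MultinomialNB())

# Placeholder abstracts standing in for the real training data.
texts = [
    "convolutional networks for image recognition",
    "image segmentation with deep networks",
    "neural machine translation of sentences",
    "parsing sentences with recurrent models",
]
labels = ["CV", "CV", "NLP", "NLP"]
baseline.fit(texts, labels)
```

Because the vectorizer discards word order, two abstracts with the same words in different orders get identical features, which is exactly the limitation the LSTM model addresses.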
First, we use a pretrained word2vec corpus to transform each word into a vector. This technique provides a vector $w_k$ for each word based on its context within a chosen window. The sequence of word embeddings for an abstract is then fed into an LSTM network, and the resulting representation is passed to a fully connected network for classification.
Specific parameters: batch size 64, learning rate 0.01, word embedding size 50, with pretrained GloVe word2vec-style vectors.
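The architecture and hyperparameters above can be sketched in PyTorch. The embedding, LSTM, and fully connected sizes other than the 50-dimensional embeddings are illustrative assumptions, as is the choice of SGD; in practice the embedding layer would be initialized from the pretrained GloVe vectors rather than learned from scratch.

```python
import torch
import torch.nn as nn

class AbstractClassifier(nn.Module):
    """Embedding -> LSTM -> fully connected classifier, a sketch of the
    model described above. hidden_dim and n_classes are assumptions."""
    def __init__(self, vocab_size, emb_dim=50, hidden_dim=128, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids):            # (batch, seq_len)
        emb = self.embed(token_ids)          # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(emb)         # final hidden state
        return self.fc(h_n[-1])              # (batch, n_classes)

model = AbstractClassifier(vocab_size=10000)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr from above
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch of 64 "abstracts".
tokens = torch.randint(0, 10000, (64, 30))
targets = torch.randint(0, 4, (64,))
loss = criterion(model(tokens), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Using the last hidden state of the LSTM as the abstract representation is one common choice; pooling over all time steps would be an alternative.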
Evaluation and Results Description
I ran the proposed model and the baseline (Naive Bayes) on both tasks, using accuracy as the performance metric. To confirm that the models (including the baseline) are meaningful, I also report a random model and a simple rule-based model. The proposed model was tuned only briefly, yet it outperforms the baseline on both tasks.
Relevance task:

| Model | Accuracy |
| --- | --- |
| Random | 0.251 |
| Predict "CV" for every paper | 0.379 |
| Naive Bayes | 0.827 |
| Proposed model | 0.877 |
Quality task:

| Model | Accuracy |
| --- | --- |
| Random | 0.521 |
| Predict "not accept" for every paper | 0.680 |
| Naive Bayes | 0.703 |
| Proposed model | 0.787 |
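All numbers above are plain accuracy, which can be sketched in a few lines:

```python
def accuracy(predictions, golds):
    """Fraction of predictions that match the gold labels."""
    assert len(predictions) == len(golds)
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)
```

The random and rule-based rows serve as sanity floors: any learned model should clearly beat both before its accuracy is worth comparing.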
What’s Next
I ran out of time to try n-gram features with logistic regression as another baseline. Furthermore, I am fascinated by the work of Huang (2018), which produces a heat map of the important parts of a paper; I want to incorporate this idea into an attention mechanism.
Reference
Jia-Bin Huang. 2018. Deep paper gestalt. arXiv preprint arXiv:1812.08775.
Rie Johnson and Tong Zhang. 2017. Deep pyramid convolutional neural networks for text categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pages 562–570.
Dongyeop Kang, Waleed Ammar, Bhavana Dalvi, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, and Roy Schwartz. 2018. A dataset of peer reviews (PeerRead): Collection, insights and NLP applications. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). https://doi.org/10.18653/v1/n18-1149.
Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.