Talk title: Question-based Text Summarization
Date and time: Friday, December 29, 15:00
Venue: B403
Speaker: Xiaohua Tony Hu
Affiliation: Drexel University
Speaker bio: Xiaohua Tony Hu (Ph.D., 1995) is a full professor at the College of Computing and Informatics, Drexel University, and a lecture professor at Central China Normal University. He also serves as the founding Co-Director of the NSF Center (I/U CRC) on Visual and Decision Informatics (NSF CVDI), Chair of the IEEE Computer Society Bioinformatics and Biomedicine Steering Committee, and Chair of the IEEE Computer Society Big Data Steering Committee. Tony is a scientist, teacher, and entrepreneur. He joined Drexel University in 2002 and founded the International Journal of Data Mining and Bioinformatics (SCI indexed) in 2006. Earlier, he worked as a research scientist at world-leading R&D centers such as Nortel Research Center and Verizon Lab (the former GTE Labs). In 2001, he founded DMW Software in Silicon Valley, California. He has extensive experience and expertise in converting original ideas into research prototypes and eventually into commercial products; many of his research ideas have been integrated into commercial products and applications in data mining, fraud detection, and database marketing.
Tony’s current research interests are in big data, data/text/web mining, bioinformatics, information retrieval and information extraction, social network analysis, and healthcare informatics. He has published more than 270 peer-reviewed research papers in various journals, conferences, and books, including various IEEE/ACM Transactions (IEEE/ACM TCBB, IEEE TFS, IEEE TDKE, IEEE TITB, IEEE SMC, IEEE Computer, IEEE NanoBioScience, IEEE Intelligent Systems), JIS, KAIS, CI, DKE, IJBRA, SIG KDD, IEEE ICDM, IEEE ICDE, SIGIR, ACM CIKM, IEEE BIBE, IEEE CICBC, etc., and has co-edited 20 books/proceedings. He has received several prestigious awards, including the 2005 National Science Foundation (NSF) CAREER Award, the best paper award at the 2007 International Conference on Artificial Intelligence, the best paper award at the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, the 2010 IEEE Granular Computing Outstanding Contribution Award, the 2007 IEEE Bioinformatics and Bioengineering Outstanding Contribution Award, the 2006 IEEE Granular Computing Outstanding Service Award, and the 2001 IEEE Data Mining Outstanding Service Award. He has also served as a program co-chair/conference co-chair of 14 international conferences/workshops and as a program committee member of more than 80 international conferences in the above areas. He is the founding editor-in-chief of the International Journal of Data Mining and Bioinformatics (SCI indexed) and of the International Journal of Granular Computing, Rough Sets and Intelligent Systems, and an associate editor/editorial board member of four international journals (KAIS, IJDWM, IJSOI and JCIB). His research projects are funded by the National Science Foundation (NSF), the US Dept. of Education, the PA Dept. of Health, and the Natural Science Foundation of China (NSFC). He has obtained more than US$9.5 million in research grants in the past 12 years as PI or Co-PI (PI of 9 NSF grants and PI of 1 IMLS grant in the last 10 years).
He has graduated 23 Ph.D. students from 2006 to 2017 and is currently supervising 8 Ph.D. students.
Abstract: In the modern information age, finding the right information at the right time is an art (and a science). However, the abundance of information makes it difficult for people to digest it and make informed choices. In this research project, we use text summarization to help people quickly capture the main idea of a piece of information before they read the details. In contrast with existing works, which mainly use declarative sentences to summarize a text document, we aim to use a few questions as a summary. In this way, people know what questions a given text document can address, and they may read further if they have similar questions in mind. A question-based summary needs to satisfy three goals: relevancy, answerability, and diversity.
Relevancy measures whether a few questions can cover the main points discussed in a text document; answerability measures whether answers to the questions are included in the text document; and diversity measures whether the questions carry redundant information.
To achieve the three goals, we design a two-stage approach that consists of question selection and question diversification. The question selection component aims to find a set of candidate questions that are relevant to a text document, where the document in turn can be treated as the answer to those questions. Specifically, we explore two lines of approaches developed for traditional text summarization tasks, extractive and abstractive, to achieve the goals of relevancy and answerability, respectively. The question diversification component is designed to re-rank the questions with the goal of rewarding diversity in the final question-based summary. Evaluation on product review summarization tasks for two product categories shows that the proposed approach is effective at discovering meaningful questions that are representative of individual reviews. This research work opens up a new direction at the intersection of information retrieval and natural language processing.
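For readers unfamiliar with the select-then-diversify pattern, the following is a minimal illustrative sketch in Python, not the speaker's method. The abstract does not specify the models used, so this sketch substitutes simple stand-ins: Jaccard word overlap as a proxy for relevancy, and a greedy re-ranking in the style of maximal marginal relevance (MMR) for diversity. Answerability is not modeled. All function names, the `trade_off` parameter, and the sample review are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set (a crude stand-in for real NLP)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_questions(document, candidates, top_n=5):
    """Stage 1 (assumed proxy): rank candidates by word overlap with the document."""
    doc = tokens(document)
    scored = [(jaccard(tokens(q), doc), q) for q in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def diversify(scored, k=2, trade_off=0.5):
    """Stage 2 (assumed proxy): greedy MMR-style re-ranking.

    Each step picks the question with the best balance of relevance and
    dissimilarity to the questions already chosen.
    """
    remaining = list(scored)
    chosen = []
    while remaining and len(chosen) < k:
        def mmr(item):
            relevance, question = item
            redundancy = max(
                (jaccard(tokens(question), tokens(c)) for c in chosen),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=mmr)
        remaining.remove(best)
        chosen.append(best[1])
    return chosen


# Hypothetical product review and candidate questions, for illustration only.
review = ("The battery lasts about two days on a single charge, "
          "and the camera takes sharp photos.")
candidates = [
    "How long does the battery last on a single charge?",
    "How long does the battery last on a charge?",
    "Does the camera take sharp photos?",
    "Is it waterproof?",
]

summary = diversify(select_questions(review, candidates, top_n=3), k=2)
print(summary)
```

In this toy example, ranking by overlap alone would place the two near-identical battery questions first; the re-ranking step demotes the second one in favor of the camera question, which is the kind of redundancy reduction the diversification stage is meant to reward.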
Invited by: Prof. Zhiyong Peng (彭智勇)