Large distributed health data networks (DHDNs) that leverage electronic health records (EHRs) (e.g., eMerge, pSCANNER, PEDSnet, and PCORnet) have drawn substantial interest in clinical research in the United States, particularly for precision medicine. They eliminate the need to create, maintain, and secure access to central data repositories, minimize the need to disclose protected health information outside the data-owning entity, and mitigate many security, proprietary, legal, and privacy concerns. Missing data are ubiquitous in DHDNs and present analytical challenges. Existing methods for handling missing data require pooling patient-level data into a centralized repository, and hence sharing such data across institutions or sites. This approach, however, may not be appropriate or practical because of institutional policies, the cost of moving enormous amounts of data, and, most importantly, security and privacy concerns. In particular, a large body of research has demonstrated that, given some background information about an individual, such as data from EHRs, an adversary can learn sensitive information about that individual from de-identified data. In this talk, I will first describe the issue of missing data in distributed health data networks under two settings: 1) horizontally partitioned data, where different data custodians such as hospitals and healthcare service providers hold the same type of data for different sets of patients; and 2) vertically partitioned data, where different data custodians such as hospitals, insurance companies, and sequencing centers hold different pieces of patient information (i.e., data from the same patient are distributed across different institutions). I will then present our work on developing privacy-preserving methods and tools for handling missing data in DHDNs that do not require pooling patient-level data into a centralized repository.
Lastly, I will present the conceptual architecture of our software toolbox for handling missing data in DHDNs, which includes separate modules for communication, storage, and algorithms.
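To make the two partitioning schemes in the abstract concrete, the following is a minimal Python sketch. It is not the speaker's method or toolbox: the records, the site and custodian names, and every function name are hypothetical and invented for illustration. It also shows one simple way a network-wide mean of a partially missing variable could be computed from site-level counts and sums, so that no patient-level row leaves a site. Sharing aggregates alone should not be read as a privacy guarantee.

```python
# Toy patient records; every value here is made up for illustration.
# None marks a missing measurement.
records = [
    {"id": 1, "site": "A", "age": 54, "bmi": 27.1, "claims": 3},
    {"id": 2, "site": "A", "age": 61, "bmi": None, "claims": 1},
    {"id": 3, "site": "B", "age": 47, "bmi": 31.4, "claims": None},
    {"id": 4, "site": "B", "age": 70, "bmi": 24.9, "claims": 5},
]


def horizontal_partition(rows):
    """Same variables, different patients: split rows by site."""
    parts = {}
    for row in rows:
        parts.setdefault(row["site"], []).append(row)
    return parts


def vertical_partition(rows, columns_by_custodian):
    """Same patients, different variables: split columns by custodian."""
    return {
        custodian: [{"id": r["id"], **{c: r[c] for c in cols}} for r in rows]
        for custodian, cols in columns_by_custodian.items()
    }


def local_summary(rows, var):
    """Site-level aggregate: count and sum of the observed values only."""
    observed = [r[var] for r in rows if r[var] is not None]
    return len(observed), sum(observed)


def network_mean(parts, var):
    """Combine site-level aggregates into a network-wide observed mean."""
    summaries = [local_summary(rows, var) for rows in parts.values()]
    n = sum(count for count, _ in summaries)
    total = sum(subtotal for _, subtotal in summaries)
    return total / n


h = horizontal_partition(records)
v = vertical_partition(
    records, {"hospital": ["age", "bmi"], "insurer": ["claims"]}
)
bmi_mean = network_mean(h, "bmi")
```

In the horizontal case each site holds complete rows for its own patients, so a missing value can be summarized locally. In the vertical case the hospital and the insurer hold different columns for the same patient ids, so any method that relates `bmi` to `claims` needs some form of coordination between custodians.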
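Qi Long is currently a Professor in the Department of Biostatistics, Epidemiology and Informatics at the School of Medicine of the University of Pennsylvania in the United States, and Director of the Biostatistics Core at the Abramson Cancer Center. Professor Long's current research focuses on statistical and data science methods related to precision medicine and on the development of corresponding software tools, including the analysis of big biomedical data (-omics data, electronic health records data, and mobile health data) and information security, missing data, causal inference, and Bayesian methods. Professor Long has published more than 100 academic papers and abstracts, including first-author and corresponding-author papers in leading international journals such as the Journal of the American Statistical Association, Annals of Applied Statistics, Biometrics, Biostatistics, Cancer Research, and the American Journal of Pathology. Professor Long has led multiple research grants funded by the US National Institutes of Health, the Patient-Centered Outcomes Research Institute, and the National Science Foundation, and serves on the editorial boards of several leading international journals. Honors include Elected Fellow of the American Statistical Association and Elected Member of the International Statistical Institute, among others.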
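Professor Long received a bachelor's degree from the University of Science and Technology of China in 1998 and a doctoral degree from the University of Michigan in the United States in 2005. Professor Long previously served in the Department of Biostatistics and Bioinformatics at Emory University in the United States as Rollins Endowed Assistant Professor (2005-2011), Associate Professor (2011-2016), and Director of Research (2015-2016).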