A doctoral position is open in the Probability and Statistics team of IECL, Nancy, France.
More details about the subject can be found just below.
*Title :* Modeling and inference of the persistence of information on social networks
*Keywords :* Social networks, topic modeling, multivariate long-range dependence
*Context :* Social networks, and media in general, generate a huge quantity of information, which may differ according to location (countries, areas, cities, ...) and time period. A natural question is to identify which main topics are persistent in a corpus of documents such as tweets, websites or scientific papers. The aim of the project is to take into account the specificities of the data, such as similarities between different regions or countries, as well as the time stamp of each document. This question has already been addressed in several papers (see e.g. [1]), and several models have been proposed to summarize the temporal evolution (see e.g. [2]).
*Challenges :* We aim to complement these works by studying spatio-temporal persistence in textual data. Using dynamic topic modeling [3], we can model the evolution of the content of a corpus in real time. Our goal will be to identify which topics are persistent in a corpus, taking into account both spatial and temporal information. The simulation and inference parts will be designed using Monte Carlo methods [6,7], whereas persistence will be measured using multivariate long-range dependence [4].
*Bibliography*
- [1] S. Asur, B. A. Huberman, G. Szabo, C. Wang. Trends in social media: persistence and decay. In ICWSM (2011).
- [2] Y. Wang, E. Agichtein, M. Benzi. TM-LDA: efficient online modeling of latent topic transitions in social media. Proc. of the 18th ACM SIGKDD. ACM (2012).
- [3] D. Blei, J. D. Lafferty. Dynamic topic models. Proceedings of the 23rd International Conference on Machine Learning. ACM (2006).
- [4] S. Kechagias, V. Pipiras. Definitions and representations of multivariate long-range dependent time series. JTSA 36.1 1-25 (2015).
- [5] M. Li, X. Wang, K. Gao, S. Zhang. A survey on information diffusion in online social networks: Models and methods. Information 8, no. 4: 118 (2017).
- [6] G. Winkler. Image analysis, random fields and MCMC methods. Springer (2003).
- [7] R. S. Stoica, A. Philippe, P. Gregori, J. Mateu. ABC Shadow algorithm: a tool for statistical analysis of spatial patterns. Stat. Comp., 27(5): 1225-1238 (2017).
*Duration:* 3 years (full-time position).
Starting date: October 2019
*Supervisors:* This thesis will be co-supervised by M. Clausel and R. Stoica, both full professors at IECL (Nancy, France):
M. Clausel : https://sites.google.com/site/marianneclausel/
R. Stoica : https://sites.google.com/site/radustefanstoica/
Contacts : marianne.clausel@univ-lorraine.fr, radu-stefan.stoica@univ-lorraine.fr
*Working Environment:* The PhD candidate will work in the Probability and Statistics team of the IECL lab, a leading institution in mathematics in France. The lab is located in Nancy, France. This subject is part of the OLKI project (http://lue.univ-lorraine.fr/fr/open-language-and-knowledge-citizens-olki) of the Lorraine Université d'Excellence program.
*Location :* Nancy, the capital of Lorraine in France, with excellent train connections to Luxembourg (1h30) and Paris (1h30).