Chaos 22, 013111 (2012); http://dx.doi.org/10.1063/1.3675621 (25 pages)
Using time-delayed mutual information to discover and interpret temporal correlation structure in complex populations
(Received 15 October 2010; accepted 8 December 2011; published online 24 January 2012)
© 2012 American Institute of Physics
Article Outline
- INTRODUCTION
- A reader’s guide: The outline of this paper
- MOTIVATING EXAMPLES
- INFORMATION THEORY BACKGROUND
- Average TDMI
- Aggregate TDMI
- TDMI-SPECIFIC ESTIMATOR BIASES
- Sample size dependent estimator bias effects
- Fixed point bias estimate for average and aggregate populations
- Non-estimator bias: How the TDMI calculation can act as a population filter
- Methods for assessing δt bin compositions
- POPULATION-BASED DEVIATIONS FROM THE INDIVIDUAL TDMI ESTIMATES
- Heterogeneity-based deviations from the individual: Average TDMI case
- Entropy of the averaged population
- Heterogeneity-based deviations from the individual: Aggregate TDMI case
- Entropy of the aggregated population
- HOW TO INTERPRET THE TDMI FOR A POPULATION, OR, TDMI-BASED METHODS FOR INTERPRETING POPULATION DIVERSITY
- Support dependent, graph independent, effects on the population TDMI
- Graph dependent, support independent, effects on the population TDMI
- Support dependent, graph-based effects on the population TDMI
- NON-TDMI-BASED METHODS FOR INTERPRETING POPULATION DIVERSITY
- Homogeneity in measurement composition
- Homogeneity in measurement distribution supports
- Homogeneity in the distribution of the graphs of the measurement PDFs
- ASSEMBLING THE PIECES: AN EXPLICIT PRESCRIPTION FOR TDMI ANALYSIS AND INTERPRETATION FOR A POPULATION OF TIME SERIES FOR A FIXED TIME SEPARATION δt
- Step one: Determining the computability of (δt)
- Step two (A in Fig. ): Interpreting δI(δt) or (δt)
- Step three (B in Fig. ): Assessing population representation
- QUANTITATIVE EXAMPLES FOR TDMI INTERPRETATION AND POPULATION HOMOGENEITY EVALUATION
- Simulated data examples: The quadratic map and the Gauss map
- TDMI-based analysis of the simulated data
- Non-TDMI-based analysis of the simulated data
- Quantifying small sample-size effects
- Real data examples: Glucose values for 100 densely sampled individuals versus 20,000 random individuals
- TDMI-based analysis for data set 7, the well measured population
- Non-TDMI-based analysis for data set 7, the well measured population
- TDMI-based analysis for data set 8, the random (less well measured) population
- Non-TDMI-based analysis for data set 8, the random (less well measured) population
- Analysis of the TDMI under variation of δt
- DISCUSSION AND COMMENTS
- Specific results of the interpretative framework relative to real data
- Using categorical billing code data to help verify the TDMI analysis
- How our method addresses nonstationarity
- Comments regarding the connection between the supports and the normalizations of the distributions
- Future directions regarding the use of this technique
- Some remaining statistical problems
- SUMMARY
- It may seem odd to normalize indices, but this just keeps the domain between zero and one.
- To see the variation in the PDF estimates due to small sample sizes, observe the PDF estimates for different sets of uniform random numbers with small cardinality.
- Note, the L1 difference is not technically a distance function or a metric because it does not satisfy the triangle inequality.
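The note above on small sample sizes can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes a plain histogram estimator with 10 bins on [0, 1], and the function names, trial count, and sample sizes are illustrative choices. It measures how far histogram PDF estimates of uniform random numbers stray from the true density of 1 for a small and a large sample.

```python
import numpy as np


def histogram_pdf(samples, bins=10):
    """Histogram PDF estimate on [0, 1]; the bin heights integrate to one."""
    density, _ = np.histogram(samples, bins=bins, range=(0.0, 1.0), density=True)
    return density


def pdf_spread(n_samples, n_trials=200, bins=10, seed=0):
    """Mean absolute deviation of the estimate from the true uniform density.

    The true density of uniform numbers on [0, 1] is 1 in every bin, so the
    deviation is |estimate - 1| averaged over bins and over repeated draws.
    """
    rng = np.random.default_rng(seed)
    errors = [
        np.abs(histogram_pdf(rng.uniform(size=n_samples), bins) - 1.0).mean()
        for _ in range(n_trials)
    ]
    return float(np.mean(errors))


if __name__ == "__main__":
    # Small cardinality (assumed: 20 points) versus a large sample.
    print("spread with 20 points:    ", pdf_spread(20))
    print("spread with 20,000 points:", pdf_spread(20000))
```

With few points per bin, the estimated PDF fluctuates strongly from one draw to the next, which is the variation the note asks the reader to observe.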
Figures
The average PDF and the PDF of the aggregate for three collections of Gaussian random numbers whose distributions have means 0, 2, and 4, respectively.
FIG. 1
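The distinction drawn in the caption above, between averaging individual PDFs and taking the PDF of the pooled data, can be sketched as follows. This is an illustration under stated assumptions rather than the computation behind the figure: the unit standard deviation, the common bin edges, the sample sizes, and the function names are all hypothetical; only the means 0, 2, and 4 come from the caption.

```python
import numpy as np

MEANS = (0.0, 2.0, 4.0)               # means given in the caption
EDGES = np.linspace(-8.0, 12.0, 81)   # assumed common bin edges (width 0.25)
WIDTH = EDGES[1] - EDGES[0]


def density(samples):
    """Histogram PDF estimate on the shared bin edges."""
    d, _ = np.histogram(samples, bins=EDGES, density=True)
    return d


def average_and_aggregate(sizes, seed=0):
    """Return (average of the individual PDFs, PDF of the pooled samples).

    `sizes` gives the assumed number of samples drawn for each collection;
    a unit standard deviation is assumed for every collection.
    """
    rng = np.random.default_rng(seed)
    groups = [rng.normal(loc=m, scale=1.0, size=n) for m, n in zip(MEANS, sizes)]
    average_pdf = np.mean([density(g) for g in groups], axis=0)
    aggregate_pdf = density(np.concatenate(groups))
    return average_pdf, aggregate_pdf


def l1_difference(p, q):
    """Integrated absolute difference between two binned PDF estimates."""
    return float(np.sum(np.abs(p - q)) * WIDTH)


if __name__ == "__main__":
    for sizes in [(1000, 1000, 1000), (5000, 500, 50)]:
        avg, agg = average_and_aggregate(sizes)
        print(sizes, "L1 difference:", l1_difference(avg, agg))
```

With equal sample sizes and shared bins the two estimates coincide, because averaging weights each collection equally and so does pooling. When the collections contribute unequal numbers of points, pooling weights each collection by its size, and the aggregate PDF drifts toward the most heavily sampled one.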
(δt)) is greater than bias.
FIG. 2
The average and aggregate TDMI with δt bins of 6 h over a period of a few days for D7 and D8; the bias estimates can be found in Tables 7 and 8. With respect to (a), note the following: for δt ≤ 6 h, δI > 0, and for δt > 6 h, δI ≈ 0; the KDE and histogram estimates are extremely similar; the diurnal (daily) periodic variation in the correlation of glucose is clearly evident in both the average and the aggregate TDMI. With respect to (b), note the following: for all δt, δI is consistent and likely zero within bias; the KDE and histogram estimates differ greatly, implying the presence of small sample size effects in the average TDMI calculation; the diurnal (daily) periodic variation in the correlation of glucose is clearly evident in both the average and the aggregate TDMI in all but the KDE estimated TDMI average.
FIG. 6