The preliminary schedule for the Joint Workshop and Summer School can
be consulted here. Please treat this program as preliminary during
March and the beginning of April.
You can find a brief list of introductory concepts, with links to
useful Wikipedia articles, here.
Models: Specification, complexity and choice (David Hogg)
What is a model? What freedoms does a model have and how can we
capture that? Are qualitatively different models comparable? What is
the difference between a likelihood and a probability for a model or
for model parameters? How do we decide among models that are
qualitatively similar but quantitatively different? How do we decide
among models that are qualitatively different? The most important
content will be conveyed through a lab session in which participants
pair-code solutions to some model selection problems.
Table of contents
- Lecture 0: (to be provided in advance as links or bibliography if needed)
- Lecture 1: Model specification and likelihood formulation
- Lecture 2: Model complexity and choice
- Lecture 3: (pair-coding) Model selection workshop
- Lecture 4: (pair-coding) workshop continued
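As a rough illustration of the kind of question raised above (deciding between a simpler and a more flexible model), the sketch below compares a constant model with a straight line on synthetic data. It is not taken from the course material: the toy data, the assumption of Gaussian noise with known sigma, and the use of the Bayesian Information Criterion (BIC) as an approximate selection rule are all assumptions made for this example.

```python
import math
import random

def log_likelihood(y, model, sigma):
    """Gaussian log-likelihood of data y given model predictions and known noise sigma."""
    chi2 = sum(((yi - mi) / sigma) ** 2 for yi, mi in zip(y, model))
    return -0.5 * chi2 - len(y) * math.log(sigma * math.sqrt(2 * math.pi))

def fit_constant(x, y):
    """Maximum-likelihood constant model: the sample mean of y."""
    mean = sum(y) / len(y)
    return [mean] * len(y)

def fit_line(x, y):
    """Maximum-likelihood straight line (ordinary least squares, equal errors)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * xi for xi in x]

def bic(log_like, n_params, n_data):
    """BIC = k ln(n) - 2 ln(L_max); lower values are preferred."""
    return n_params * math.log(n_data) - 2 * log_like

# Synthetic data (hypothetical): a straight line plus Gaussian noise.
rng = random.Random(42)
sigma = 1.0
x = [i / 10 for i in range(50)]
y = [2.0 + 0.8 * xi + rng.gauss(0, sigma) for xi in x]

bic_const = bic(log_likelihood(y, fit_constant(x, y), sigma), 1, len(y))
bic_line = bic(log_likelihood(y, fit_line(x, y), sigma), 2, len(y))
print(f"BIC constant: {bic_const:.1f}, BIC line: {bic_line:.1f}")
```

The extra parameter of the line is penalised by the ln(n) term, so the line is preferred only when it improves the likelihood by more than that penalty. Other criteria (cross-validation, full Bayesian evidence) can give different answers.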
Knowledge Discovery and Data Mining (Giuseppe Longo)
Feature selection: filter approach, wrapper approach, PCA, Diffusion
Maps. Supervised classification: the curse of dimensionality,
bias-variance trade-off, the kernel trick, support vector machines,
cross-validation, evaluation of classifiers. Unsupervised
classification: taxonomy, evaluation measures.
Table of contents
- Lecture 0: (to be provided in advance as links or bibliography if needed)
- Lecture 1: What is data mining?
- Lecture 2: Feature selection and dimensionality reduction
- Lecture 3: Classification tasks and supervised methods
- Lecture 4: Clustering methods
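Cross-validation and the evaluation of classifiers, both listed in the abstract above, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the toy two-class data, the choice of a 1-nearest-neighbour classifier, and all function names are hypothetical and are not taken from the lectures.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour_predict(train, point):
    """Return the label of the training example closest to point."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda example: dist2(example[0], point))
    return label

def cross_validated_accuracy(data, k, rng):
    """Fraction of examples classified correctly when each fold is held out in turn."""
    correct = 0
    for fold in k_fold_indices(len(data), k, rng):
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        correct += sum(
            nearest_neighbour_predict(train, data[i][0]) == data[i][1] for i in fold
        )
    return correct / len(data)

# Toy data (hypothetical): two well-separated Gaussian blobs in two dimensions.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), 1) for _ in range(40)]

accuracy = cross_validated_accuracy(data, k=5, rng=rng)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

Each example is used for testing exactly once and never appears in the training set that classifies it, which is what makes the accuracy an estimate of performance on unseen data.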
Statistical Image Analysis (Robert Lupton)
The source detection problem, source modelling, catalogue
cross-correlations, combination of images...
Table of contents
- Lecture 0: (to be provided in advance as links or bibliography if needed)
- Lecture 1: The Sampling Theorem and Image Resampling
- Lecture 2: Object Detection and Measurement as Statistical Estimation
- Lecture 3: Workshop: object detection and measurement
- Lecture 4: (workshop continued, if needed)
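The sampling theorem named in Lecture 1 can be illustrated with a one-dimensional toy case. The sketch below is not from the lecture; the frequencies and sampling rate are arbitrary values chosen for the example. It shows that a sinusoid above the Nyquist frequency produces exactly the same samples as a lower-frequency alias, which is why undersampled data cannot be resampled without ambiguity.

```python
import math

def sample(freq_hz, rate_hz, n):
    """n samples of cos(2*pi*f*t) taken at t = 0, 1/rate, 2/rate, ..."""
    return [math.cos(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

rate = 10.0                     # sampling rate in Hz, so the Nyquist frequency is 5 Hz
high = sample(7.0, rate, 20)    # 7 Hz is above Nyquist: undersampled
alias = sample(3.0, rate, 20)   # |7 - 10| = 3 Hz is below Nyquist

# The two sets of samples agree to floating-point precision.
max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(f"largest difference between the two sample sets: {max_diff:.2e}")
```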
Technical aspects of the analysis of petabyte-size databases (Matthew Graham)
It would take over 33 years to watch a 1 PB MP3 movie, yet within the
decade data sets of this size will be as everyday a feature of
astronomical life as astro-ph or APOD. This section will cover the
practical aspects of handling petascale (and larger) data sets and
streams including new computational approaches needed to work with
them from an astronomer's perspective.
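The 33-year figure can be checked with simple arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes) and a constant bit rate, neither of which is stated in the text, and works backwards from the quoted duration to the average bit rate it implies.

```python
# Back-of-envelope check: what playback rate makes 1 PB last about 33 years?
PETABYTE_BYTES = 10**15                  # decimal petabyte (assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def implied_rate_mbit_per_s(size_bytes, years):
    """Average bit rate (Mbit/s) needed to play size_bytes in the given time."""
    seconds = years * SECONDS_PER_YEAR
    return size_bytes * 8 / seconds / 1e6

def playback_years(size_bytes, rate_mbit_per_s):
    """Playback time in years for a stream with the given average bit rate."""
    return size_bytes * 8 / (rate_mbit_per_s * 1e6) / SECONDS_PER_YEAR

rate = implied_rate_mbit_per_s(PETABYTE_BYTES, 33)
print(f"implied average rate: {rate:.1f} Mbit/s")
```

Under these assumptions the implied rate is a few Mbit/s, which is in the range of compressed video rather than audio.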
Table of contents
- Lecture 0: (to be provided in advance as links or bibliography if needed)
  - How big is a petabyte?
  - Big data sets en route: astronomy, other sciences
- Lecture 1: How to store a petabyte
  - What do you store?
  - Cost and performance of storage
  - Databases: relational vs non-relational, indexing
- Lecture 2: How to work with a petabyte
  - Distribution
  - Divide and conquer: MapReduce, Hadoop (how to sort 1 PB)
  - Putting things together: PIG
- Lecture 3: How to analyze a petabyte
  - Random access
  - Characterizing data
  - Streaming statistics
- Ideas for pair-coding examples (to be discussed with SOC / other lecturers).
  - Coding up a simple analysis routine using Hadoop
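The divide-and-conquer and streaming-statistics items in the outline above can be illustrated without a cluster. The sketch below is a pure-Python stand-in and does not use any Hadoop API: the chunking, the function names `map_chunk` and `merge`, and the toy data are all hypothetical. Each chunk is summarised independently (the "map" step), and the summaries are combined with a standard pairwise update for the mean and variance (the "reduce" step), so no step needs the whole data set in memory.

```python
from functools import reduce

def map_chunk(chunk):
    """Map step: summarise one non-empty chunk as (count, mean, M2),
    where M2 is the sum of squared deviations from the chunk mean."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((v - mean) ** 2 for v in chunk)
    return n, mean, m2

def merge(a, b):
    """Reduce step: combine two summaries without revisiting the data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

# Toy data split into blocks, standing in for data distributed over many nodes.
data = [float(i % 17) for i in range(1000)]
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

n, mean, m2 = reduce(merge, map(map_chunk, chunks))
variance = m2 / n  # population variance
print(f"n = {n}, mean = {mean:.3f}, variance = {variance:.3f}")
```

Because `merge` is associative, the summaries can be combined in any order or in a tree, which is the property a MapReduce-style framework relies on.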
Time series analysis (Suzanne Aigrain)
This section will cover common tools for exploring and characterising
time series and ensembles thereof. The first two lectures are devoted
to time-domain and frequency-domain techniques respectively, and cover
some frequently used exploratory methods. Particular attention will be
devoted to the treatment of stochastic processes and mixtures of
stochastic and periodic processes.
Table of contents
- Lecture 0: (to be provided in advance as links or bibliography if needed)
  - stationarity, autocorrelation function, (discrete) Fourier transform, window function
  - properties of the Gaussian distribution
- Lecture 1: Time-domain analysis
  - autocorrelation techniques
  - common time-domain filters
  - stochastic processes: ARIMA models, Gaussian processes
- Lecture 2: Frequency analysis
  - noise properties in the frequency domain
  - periodic signal detection
  - time-frequency analysis, wavelet transforms
- Lecture 3: Ensembles of time series
  - principal component analysis in the time and frequency domains
  - classification and clustering
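Periodic signal detection with the discrete Fourier transform, listed in the outline above, can be sketched as follows. This is an illustration only, not the method taught in the lectures: it assumes evenly sampled data, white Gaussian noise, and a hypothetical period of 20 samples, and it uses a direct O(N^2) transform to stay within the standard library.

```python
import cmath
import math
import random

def periodogram(x):
    """Power |X_k|^2 / N of the mean-subtracted series at frequencies
    k / N cycles per sample, for k = 0 .. N // 2."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred)
        )
        power.append(abs(coeff) ** 2 / n)
    return power

# Synthetic series (hypothetical): a unit-amplitude sinusoid plus unit-variance noise.
rng = random.Random(1)
n = 200
true_period = 20  # in samples
series = [math.sin(2 * math.pi * t / true_period) + rng.gauss(0, 1) for t in range(n)]

power = periodogram(series)
k_peak = max(range(1, len(power)), key=power.__getitem__)  # skip the zero-frequency bin
estimated_period = n / k_peak
print(f"strongest peak at k = {k_peak}, period = {estimated_period:.1f} samples")
```

For unevenly sampled data, which is common in astronomy, the plain periodogram above no longer applies directly and a different estimator is needed.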