I am a Professor in the ECE department of The University of Texas at Austin. I received a PhD in EECS from The Massachusetts Institute of Technology, in the Laboratory for Information and Decision Systems (LIDS), and an AB in Mathematics from Harvard University. I received the NSF CAREER award in 2011.
My current research focuses on decision-making in large-scale complex systems, with an emphasis on learning and computation. Specifically, I am interested in robust and adaptable optimization, high dimensional statistics and machine learning, and applications to large-scale networks, including social networks, wireless networks, transportation networks, and energy networks. I have also worked on applications of machine learning and optimization to computer-aided design.
Hoffmann, Jessica, Matt Jordan, and Constantine Caramanis. “Quarantines as a Targeted Immunization Strategy.” Preprint, 2021.
@article{hoffmann2020quarantines,
title = {Quarantines as a Targeted Immunization Strategy},
author = {Hoffmann, Jessica and Jordan, Matt and Caramanis, Constantine},
journal = {preprint},
year = {2021},
group = {misc},
arxiv = {https://arxiv.org/pdf/2008.08262.pdf}
}
In the context of the recent COVID-19 outbreak, quarantine has been used to "flatten the curve" and slow the spread of the disease. In this paper, we show that this is not the only benefit of quarantine for the mitigation of an SIR epidemic spreading on a graph. Indeed, human contact networks exhibit a power-law structure, which means immunizing nodes at random is extremely ineffective at slowing the epidemic, while immunizing high-degree nodes can efficiently guarantee herd immunity. We theoretically prove that if quarantines are declared at the right moment, high-degree nodes are disproportionately in the Removed state, which is a form of targeted immunization. Even if quarantines are declared too early, subsequent waves of infection spread more slowly than the first wave. This leads us to propose an opening and closing strategy that aims to immunize the graph while infecting the minimum number of individuals, guaranteeing that the population is robust to future infections. To the best of our knowledge, this is the only strategy that guarantees herd immunity without requiring vaccines. We extensively verify our results on simulated and real-life networks.
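The mechanism described above lends itself to a quick simulation. The sketch below is our own illustrative code, not the paper's: the graph model, infection rate beta, recovery rate gamma, and seed count are all assumed parameters. It runs an SIR epidemic on a toy preferential-attachment graph and compares the mean degree of Removed nodes against never-infected nodes.

```python
import random
from collections import defaultdict

def preferential_attachment_graph(n, m=2, rng=random):
    """Toy power-law graph: each new node links to m existing nodes,
    chosen roughly proportionally to current degree (Barabasi-Albert style)."""
    adj = defaultdict(set)
    repeated = []  # nodes listed with multiplicity proportional to degree
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated) if repeated else rng.randrange(v))
        for u in chosen:
            adj[v].add(u)
            adj[u].add(v)
            repeated.extend([u, v])
    return adj

def run_sir(adj, beta=0.1, gamma=0.05, seeds=5, rng=random):
    """Synchronous SIR dynamics, run until no Infected nodes remain."""
    state = {v: "S" for v in adj}
    for v in rng.sample(sorted(adj), seeds):
        state[v] = "I"
    while any(s == "I" for s in state.values()):
        nxt = dict(state)
        for v, s in state.items():
            if s != "I":
                continue
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    nxt[u] = "I"
            if rng.random() < gamma:
                nxt[v] = "R"  # Removed: immune to reinfection
        state = nxt
    return state

def mean_degree(adj, nodes):
    return sum(len(adj[v]) for v in nodes) / max(len(nodes), 1)

rng = random.Random(0)
adj = preferential_attachment_graph(2000, rng=rng)
state = run_sir(adj, rng=rng)
removed = [v for v in adj if state[v] == "R"]
untouched = [v for v in adj if state[v] == "S"]
# High-degree hubs tend to be infected (hence Removed) disproportionately
# often, so the epidemic itself acts as a partial targeted immunization.
print(len(removed), mean_degree(adj, removed), mean_degree(adj, untouched))
```

The comparison of the two mean degrees illustrates the targeted-immunization effect that the paper formalizes.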
Katiyar, Ashish, Soumya Basu, Vatsal Shah, and Constantine Caramanis. “Robust Estimation of Tree Structured Markov Random Fields.” Preprint, 2021.
@article{katiyarRobustMRF2021,
title = {Robust Estimation of Tree Structured Markov Random Fields},
author = {Katiyar, Ashish and Basu, Soumya and Shah, Vatsal and Caramanis, Constantine},
journal = {preprint},
year = {2021},
group = {misc},
arxiv = {https://arxiv.org/pdf/2102.08554.pdf}
}
We study the problem of learning tree-structured Markov random fields (MRFs) on discrete random variables with common support when the observations are corrupted by unknown noise. As the presence of noise in the observations obfuscates the original tree structure, the extent of recoverability of tree-structured MRFs under noisy observations is brought into question.
We show that in a general noise model, the underlying tree structure can be recovered only up to an equivalence class where each of the leaf nodes is indistinguishable from its parent and siblings, forming a leaf cluster. As the indistinguishability arises due to contrived noise models, we study the natural k-ary symmetric channel noise model where the value of each node is changed to a uniform value in the support with an unequal and unknown probability. Here, the answer becomes much more nuanced. We show that with a support size of 2, and the binary symmetric channel noise model, the leaf clusters remain indistinguishable. From support size 3 and up, the recoverability of a leaf cluster is dictated by the joint probability mass function of the nodes within it. We provide a precise characterization of recoverability by deriving a necessary and sufficient condition for the recoverability of a leaf cluster. We provide an algorithm that recovers the tree if this condition is satisfied, and recovers the tree up to the leaf clusters failing this condition.
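To make the noise model concrete, here is a minimal sketch (our own illustrative code; the support size, edge strength, and noise level are assumed values) of the k-ary symmetric channel: each observed symbol is independently replaced, with some probability, by a uniform draw from the support, which dilutes the observable dependence along a tree edge.

```python
import random

def kary_symmetric_channel(x, q, k, rng):
    """With probability q, replace symbol x by a uniform draw from {0,...,k-1}
    (possibly the same symbol); otherwise pass x through unchanged."""
    return rng.randrange(k) if rng.random() < q else x

rng = random.Random(1)
k, n = 3, 50_000
parent = [rng.randrange(k) for _ in range(n)]
# A strong tree edge: the child copies the parent 90% of the time.
child = [p if rng.random() < 0.9 else rng.randrange(k) for p in parent]
noisy = [kary_symmetric_channel(c, q=0.5, k=k, rng=rng) for c in child]
agree_clean = sum(p == c for p, c in zip(parent, child)) / n
agree_noisy = sum(p == c for p, c in zip(parent, noisy)) / n
```

Under heavy noise the empirical parent-child agreement shrinks toward the uniform baseline of 1/k, which hints at why noisy leaves become hard to tell apart from their parents and siblings.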
Zhuo, Jiacheng, Jeongyeol Kwon, Nhat Ho, and Constantine Caramanis. “On the Computational and Statistical Complexity of over-Parameterized Matrix Sensing.” Preprint, 2021.
@article{zhuoSlowMatrix2021,
title = {On the computational and statistical complexity of over-parameterized matrix sensing},
author = {Zhuo, Jiacheng and Kwon, Jeongyeol and Ho, Nhat and Caramanis, Constantine},
journal = {preprint},
year = {2021},
group = {misc},
arxiv = {https://arxiv.org/pdf/2102.02756.pdf}
}
We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the true rank is unknown and over-specified, which we refer to as over-parameterized matrix sensing. If the ground truth signal X* ∈ ℝ^{d×d} is of rank r, but we try to recover it using FF^⊤ where F ∈ ℝ^{d×k} and k > r, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global optima. By decomposing the factorized matrix F into separate column spaces to capture the effect of the extra ranks, we show that ||F_t F_t^⊤ − X*||_F^2 converges to a statistical error of Õ(kdσ²/n) after Õ((σ_r/σ)√(n/d)) iterations, where F_t is the output of FGD after t iterations, σ² is the variance of the observation noise, σ_r is the r-th largest eigenvalue of X*, and n is the number of samples. Our results, therefore, offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem.
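As a minimal numerical sketch of this setting (our own illustrative code, not the paper's: Gaussian sensing matrices, a noiseless rank-1 ground truth, and an over-specified factor rank k > r, with step size and iteration count chosen ad hoc):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k, n = 8, 1, 3, 400               # true rank r, over-specified rank k > r
u = rng.standard_normal((d, r))
u /= np.linalg.norm(u)
X_star = u @ u.T                         # rank-r ground truth X*
A = rng.standard_normal((n, d, d))       # Gaussian sensing matrices
y = np.einsum("nij,ij->n", A, X_star)    # noiseless measurements <A_i, X*>

def fgd(F, eta=0.05, iters=500):
    """Factorized gradient descent on (1/2n) * sum_i (<A_i, F F^T> - y_i)^2."""
    for _ in range(iters):
        res = np.einsum("nij,ij->n", A, F @ F.T) - y
        G = np.einsum("n,nij->ij", res, A) / n
        F = F - eta * ((G + G.T) @ F)    # gradient w.r.t. the factor F
    return F

F0 = 0.01 * rng.standard_normal((d, k))  # small random initialization
F = fgd(F0)
err0 = np.linalg.norm(F0 @ F0.T - X_star)
err = np.linalg.norm(F @ F.T - X_star)
```

The extra columns of F slow the dynamics down, which is the computational cost the paper quantifies, but the reconstruction error still drops far below its initial value.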
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “RL for Latent MDPs: Regret Guarantees and a Lower Bound.” Advances in Neural Information Processing Systems (NeurIPS), 2021.
@article{kwon2021rl,
title = {RL for Latent MDPs: Regret Guarantees and a Lower Bound},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2102.00321.pdf}
}
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of M possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. We first show that a general instance of LMDPs requires at least Ω((SA)^M) episodes to even approximate the optimal policy. Then, we consider sufficient assumptions under which learning good policies requires only a polynomial number of episodes. We show that the key link is a notion of separation between the MDP system dynamics. With sufficient separation, we provide an efficient algorithm with a local guarantee, i.e., a sublinear regret guarantee when we are given a good initialization. Finally, if we are given standard statistical sufficiency assumptions common in the Predictive State Representation (PSR) literature (e.g., Boots et al.) and a reachability assumption, we show that the need for initialization can be removed.
———. “Reinforcement Learning in Reward-Mixing MDPs.” Advances in Neural Information Processing Systems (NeurIPS), 2021.
@article{kwon2021rlb,
title = {Reinforcement Learning in Reward-Mixing MDPs},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021},
group = {proceedings},
arxiv = {}
}
Learning a near-optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP). There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model is not revealed to the agent. Hence, the latent state space, for which the dynamics are Markovian, is not given to the agent. We study the problem of learning a near-optimal policy for two reward-mixing MDPs. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality. Indeed, with no further assumptions, even for two switching reward models, the problem requires several new ideas beyond existing algorithmic and analysis techniques for efficient exploration. We provide the first polynomial-time algorithm that finds an ε-optimal policy after exploring Õ(poly(H, ε^{-1}) · S²A²) episodes, where H is the time horizon and S, A are the number of states and actions, respectively. This is the first efficient algorithm that does not require any assumptions in partially observed environments where the observation space is smaller than the latent state space.
Papadigenopoulos, Orestis, and Constantine Caramanis. “Recurrent Submodular Welfare and Matroid Blocking Bandits.” Advances in Neural Information Processing Systems (NeurIPS), 2021.
@article{papadig2021recurrent,
title = {Recurrent Submodular Welfare and Matroid Blocking Bandits},
author = {Papadigenopoulos, Orestis and Caramanis, Constantine},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2102.00321.pdf}
}
A recent line of research focuses on the study of the stochastic multi-armed bandits problem (MAB), in the case where temporal correlations of specific structure are imposed between the player’s actions and the reward distributions of the arms (Kleinberg and Immorlica [FOCS18], Basu et al. [NIPS19]). As opposed to the standard MAB setting, where the optimal solution in hindsight can be trivially characterized, these correlations lead to (sub-)optimal solutions that exhibit interesting dynamical patterns – a phenomenon that yields new challenges both from an algorithmic as well as a learning perspective. In this work, we extend the above direction to a combinatorial bandit setting and study a variant of stochastic MAB, where arms are subject to matroid constraints and each arm becomes unavailable (blocked) for a fixed number of rounds after each play. A natural common generalization of the state-of-the-art for blocking bandits, and that for matroid bandits, yields a (1−1/e)-approximation for partition matroids, yet it only guarantees a 1/2-approximation for general matroids. In this paper we develop new algorithmic ideas that allow us to obtain a polynomial-time (1−1/e)-approximation algorithm (asymptotically and in expectation) for any matroid, and thus allow us to control the (1−1/e)-approximate regret. A key ingredient is the technique of correlated (interleaved) scheduling. Along the way, we discover an interesting connection to a variant of Submodular Welfare Maximization, for which we provide (asymptotically) matching upper and lower approximability bounds.
Kwon, Jeongyeol, Nhat Ho, and Constantine Caramanis. “On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression.” In International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2021.
@inproceedings{kwon2020minimax,
title = {On the minimax optimality of the {EM} algorithm for learning two-component mixed linear regression},
author = {Kwon, Jeongyeol and Ho, Nhat and Caramanis, Constantine},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {},
year = {2021},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2006.02601.pdf}
}
We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long-standing question that many recent results have attempted to tackle: we completely characterize the convergence behavior of EM, and show that the EM algorithm achieves minimax optimal sample complexity under all SNR regimes. In particular, when the SNR is sufficiently large, the EM updates converge to the true parameter θ* at the standard parametric convergence rate O((d/n)^{1/2}) after O(log(n/d)) iterations. In the regime where the SNR is above O((d/n)^{1/4}) and below some constant, the EM iterates converge to an O(SNR^{-1}(d/n)^{1/2}) neighborhood of the true parameter, when the number of iterations is of the order O(SNR^{-2} log(n/d)). In the low-SNR regime where the SNR is below O((d/n)^{1/4}), we show that EM converges to an O((d/n)^{1/4}) neighborhood of the true parameters, after O((n/d)^{1/2}) iterations. Notably, these results are achieved under mild conditions of either random initialization or an efficiently computable local initialization. By providing tight convergence guarantees of the EM algorithm in middle-to-low SNR regimes, we fill the remaining gap in the literature, and significantly, reveal that in low SNR, EM changes rate, matching the n^{-1/4} rate of the MLE, a behavior that previous work had been unable to show.
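For the symmetric two-component model y = s · ⟨x, θ*⟩ + noise with a hidden sign s ∈ {±1}, the EM update has a well-known closed form: an E-step with tanh weights followed by a weighted least-squares M-step. The sketch below is our own illustrative code (dimensions, noise level, and initialization are assumed), run in a moderate-SNR regime.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 5, 20_000, 0.5
theta_star = np.ones(d) / np.sqrt(d)     # unit-norm true parameter (SNR = 2)
X = rng.standard_normal((n, d))
s = rng.choice([-1.0, 1.0], size=n)      # hidden mixture label
y = s * (X @ theta_star) + sigma * rng.standard_normal(n)

def em_step(theta):
    """E-step: posterior mean of the sign, w_i = tanh(y_i <x_i, theta> / sigma^2).
    M-step: weighted least squares against the re-signed responses w_i * y_i."""
    w = np.tanh(y * (X @ theta) / sigma**2)
    return np.linalg.solve(X.T @ X, X.T @ (w * y))

theta = 0.5 * theta_star + 0.1 * rng.standard_normal(d)  # rough local init
for _ in range(20):
    theta = em_step(theta)
# theta* is identifiable only up to sign, so report the closer of +/- theta*.
err = min(np.linalg.norm(theta - theta_star), np.linalg.norm(theta + theta_star))
```

In this moderate-SNR regime the iterates settle into a small neighborhood of ±θ* within a few iterations, consistent with the fast-convergence regime described above.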
Basu, Soumya, Orestis Papadigenopoulos, Constantine Caramanis, and Sanjay Shakkottai. “Contextual Blocking Bandits.” In International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2021.
@inproceedings{basu2020contextual,
title = {Contextual Blocking Bandits},
author = {Basu, Soumya and Papadigenopoulos, Orestis and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {},
year = {2021},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2003.03426.pdf}
}
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms’ mean rewards. However, playing an arm blocks it (across all contexts) for a fixed number of future time steps. The above contextual setting captures important scenarios such as recommendation systems or ad placement with diverse users.
This problem has been recently studied [DSSX18] in the full-information setting (i.e., assuming knowledge of the mean context-dependent arm rewards), where competitive ratio bounds have been derived.
We focus on the bandit setting, where these means are initially unknown; we propose a UCB-based variant of the full-information algorithm that guarantees an O(log T) regret w.r.t. an α-optimal strategy in T time steps, matching the Ω(log T) regret lower bound in this setting. Due to the time correlations caused by blocking, existing techniques for upper bounding regret fail. To prove our regret bounds, we introduce the novel concepts of delayed exploitation and opportunistic sub-sampling, and combine them with ideas from combinatorial bandits and coupling of non-stationary Markov chains.
Atsidakou, Alexia, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, and Sanjay Shakkottai. “Combinatorial Blocking Bandits with Stochastic Delays.” In International Conference on Machine Learning (ICML). PMLR, 2021.
@inproceedings{atsidakou2021combinatorial,
title = {Combinatorial Blocking Bandits with Stochastic Delays},
author = {Atsidakou, Alexia and Papadigenopoulos, Orestis and Basu, Soumya and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2021},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2105.10625.pdf}
}
Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable) model is that of blocking bandits, where an arm becomes unavailable for a deterministic number of rounds after each play. In this work, we extend the above model in two directions: (i) we consider the general combinatorial setting where more than one arm can be played at each round, subject to feasibility constraints; (ii) we allow the blocking time of each arm to be stochastic. We first study the computational/unconditional hardness of the above setting and identify the necessary conditions for the problem to become tractable (even in an approximate sense). Based on these conditions, we provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the maximum expected reward feasible subset among the available (non-blocked) arms. When the arms’ expected rewards are unknown, we adapt the above heuristic into a UCB-based bandit algorithm, for which we provide sublinear (approximate) regret guarantees, matching the theoretical lower bounds in the limiting case of absence of delays.
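The greedy heuristic analyzed above is easy to state in code. The sketch below is our own illustrative version: a cardinality constraint stands in for the matroid, rewards are counted in expectation, and blocking delays are drawn uniformly at random. In each round it plays the m best arms among those currently available.

```python
import random

def greedy_blocking(mu, max_delay, horizon, m, rng):
    """Full-information greedy for blocking bandits with stochastic delays:
    each round, play the m highest-mean available arms (a cardinality
    constraint, the simplest matroid); a played arm a then re-blocks for a
    random number of rounds, here uniform on {1, ..., max_delay[a]}."""
    K = len(mu)
    free_at = [0] * K        # first round at which each arm is available
    total = 0.0
    for t in range(horizon):
        avail = [a for a in range(K) if free_at[a] <= t]
        for a in sorted(avail, key=lambda a: mu[a], reverse=True)[:m]:
            total += mu[a]   # accounting in expectation
            free_at[a] = t + 1 + rng.randint(1, max_delay[a])
    return total

rng = random.Random(0)
reward = greedy_blocking(mu=[0.9, 0.8, 0.5, 0.2], max_delay=[3, 3, 2, 1],
                         horizon=1000, m=2, rng=rng)
```

Replacing the top-m rule with a matroid-greedy selection over an arbitrary matroid oracle gives the general form of the heuristic; the UCB variant substitutes optimistic estimates for the unknown means mu.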
Caramanis, Constantine, Paul Duetting, Matthew Faw, Federico Fusco, Philip Lazos, Stefano Leonardi, Orestis Papadigenopoulos, Emmanouil Pountourakis, and Rebecca Reiffenhauser. “Single Sample Prophet Inequalities via Greedy-Ordered Selection.” Symposium on Discrete Algorithms (SODA), 2021.
@inproceedings{caramanis2021single,
title = {Single Sample Prophet Inequalities via Greedy-Ordered Selection},
author = {Caramanis, Constantine and Duetting, Paul and Faw, Matthew and Fusco, Federico and Lazos, Philip and Leonardi, Stefano and Papadigenopoulos, Orestis and Pountourakis, Emmanouil and Reiffenhauser, Rebecca},
booktitle = {Symposium on Discrete Algorithms (SODA)},
year = {2021},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2103.13089.pdf}
}
We study Single-Sample Prophet Inequalities (SSPIs), i.e., prophet inequalities where only a single sample from each prior distribution is available. Besides a direct, and optimal, SSPI for the basic single choice problem [Rubinstein et al., ITCS’20], most existing SSPI results were obtained via an elegant, but inherently lossy reduction to Order Oblivious Secretary (OOS) policies [Azar et al., SODA’14]. Motivated by this discrepancy, we develop an intuitive and versatile greedy-based technique that yields SSPIs directly rather than through the reduction to OOS. Our results can be seen as generalizing and unifying a number of existing results in the area of prophet and secretary problems. Our algorithms significantly improve on the competitive guarantees for a number of interesting scenarios (including general matching with edge arrivals, bipartite matching with vertex arrivals and certain matroids), as well as capture new settings (e.g., budget additive valuations). Complementing our algorithmic results, we also consider mechanism design variants. Finally, we analyze the power and limitations of different SSPI approaches by providing a partial converse to the reduction from SSPI to OOS given by [Azar et al., SODA’14].
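For the basic single-choice problem, the direct SSPI mentioned above is a one-threshold rule: observe a single sample from each distribution, set the threshold to the largest sample, and accept the first arriving value that clears it. A small Monte Carlo sketch (our own illustrative code, using i.i.d. uniform distributions; the 1/2 guarantee of Rubinstein et al. holds for general independent distributions):

```python
import random

def sspi_single_choice(samples, values):
    """Single-sample threshold rule: accept the first value that weakly
    exceeds the largest observed sample (cf. Rubinstein et al., ITCS'20)."""
    threshold = max(samples)
    for i, v in enumerate(values):
        if v >= threshold:
            return i, v
    return None, 0.0  # no value cleared the threshold

rng = random.Random(0)
n, trials = 10, 2000
alg_total = prophet_total = 0.0
for _ in range(trials):
    samples = [rng.random() for _ in range(n)]  # one sample per distribution
    values = [rng.random() for _ in range(n)]   # the actual arriving values
    _, v = sspi_single_choice(samples, values)
    alg_total += v
    prophet_total += max(values)                # prophet's hindsight optimum
competitive = alg_total / prophet_total
```

The empirical ratio against the prophet's hindsight optimum comes out close to (and often above) the 1/2 guarantee in this uniform setting.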