I'm Tim Kaler.

I'm a PhD student at MIT in the EECS department and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). I'm currently working within the Supertech research group studying methods for engineering high performance multicore algorithms that have nice properties - like serial semantics.

My email address is: tfk @ mit.edu. Feel free to email me as long as you're human (bots, be honest!).

Stuff

I've worked as a freelancer and contributor within the vBulletin community designing modifications and new applications for internet forums.

If you have played Monopoly and are also interested in learning a bit about the way money influences politics then you may like the infographic Monopoly: Congressional Edition which visualizes the money spent by various industries/groups for lobbying.

Do you like chess, lasers, and computers named Joshua? Then you might enjoy watching some game playing programs in action on the Leiserchess Scrimmage Server. Students in MIT's performance engineering course (6.172) have used the scrimmage server to challenge other team's game playing programs to friendly games of Leiserchess as a part of the course's final project.

Papers

Cilkstat: A Low-Overhead Sampling-Based Scalability Profiler

*Tim Kaler, Rumen Hristov, John Holliman*

*Submitted to IPDPS'16*

Introduces Cilkstat --- a sampling-based scalability profiler for fork-join parallel programs written using the Cilk Plus linguistic extension to C and C++. Existing scalability profilers such as Cilkview and Cilkprof determine the parallelism of Cilk programs through either binary instrumentation or dynamic link-time instrumentation. These methods are highly accurate, but can incur a significant overhead which is unacceptable in production systems or long-running benchmarks. An alternative to instrumenting the binary is to use a sampling-based profiler. Existing sampling profilers are presently used to determine hotspots in serial programs. These profilers work by periodically interrupting a program to collect a sample containing a snapshot of the callstack at the point of interruption. Sampling profilers have a number of practical advantages: 1) their overhead is typically less than 5 percent allowing them to run continuously in live systems to detect rare performance bugs; 2) their low-overheads have minimal impact on the performance characteristics of the profiled program, and; 3) the instrumentation (and its associated overheads) can often be toggled dynamically without recompiling or restarting the program. We compared Cilkstat to the two existing scalability analyzers Cilkview and Cilkprof. Over a set of 21 application benchmarks (geometric mean), Cilkview incurred an instrumentation overhead of 4.64, Cilkprof incurred an overhead of 1.51, and Cilkstat incurred an instrumentation overhead of just 1.03. Cilkstat is able to achieve very low overheads while still producing meaningful parallelism estimates. On a set of 21 benchmarks, Cilkstat produced parallelism estimates that were on average within a factor of 1.73 of Cilkprof's estimates. For comparison, Cilkview and Cilkprof which measure instructions and cycle counts respectively produce parallelism estimates that differ by a factor of 1.52 on average --- and have significantly higher profiling overheads.

Polylogarithmic Fully Retroactive Priority Queues

*Erik D. Demaine, Tim Kaler, Quanquan Liu, Aaron Sidford, Adam Yedidia*

*Proceedings of the 14th Algorithms and Data Structures Symposium (WADS), 2015*

Since the introduction of retroactive data structures at SODA 2004 [1], a major open question has been the difference between partial retroactivity (where updates can be made in the past) and full retroactivity (where queries can also be made in the past). In particular, for priority queues, partial retroactivity is possible in O(log m) time per operation on a m-operation timeline, but the best fully retroactive priority queue costs O(\sqrt{m} log m) time per operation. We address both of these eleven-year-old open problems by giving a general logarithmic-overhead transformation from partial to full retroactivity, called "hierarchical checkpointing," provided that the given data structure is "time-fusible" (multiple structures with disjoint timespans can be fused into a timeline supporting queries of the present). As an application, we construct a fully retroactive priority queue which can insert an element, delete the minimum element, and find the minimum element, at any point in time, in O(log^2 m) amortized time per update and O(log^3 m log log m) time per query, using O(m log m) space. Our data structure also supports the operation of determining the time that an element is deleted in O(log^2 m) time

[1] Demaine, E.D., Iacono, J., Langerman, S.: Retroactive data structures. ACM Transactions on Algorithms 3 (2007) 13 Originally in SODA 2003.

Executing Dynamic Data-Graph Computations Deterministically Using Chromatic Scheduling

*Tim Kaler, William Hasenplaugh, Tao B. Schardl, Charles E. Leiserson*

*26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014*

Introduces Prism, a chromatic-scheduling algorithm
for dynamic data-graph computations. Prism vertex-colors the graph
so that vertices of the same color can be updated without creating
data races. During execution, Prism uses a multibag data
structure to maintain sets of active vertices of the same color. We
prove theoretical bounds on the work and span of Prism and
compare it empirically to existing nondeterministic lock-based schedulers
to demonstrate that it performs well both in theory and practice.
We then describe an extension to PRISM called PRISM-R that executes dynamic
data-graph computations deterministically even when updates modify global
variables with associative operations. PRISM-R satisfies the same theoretical
bounds as PRISM, but its implementation is more involved, incorporating a
multivector data structure to maintain an ordered set of vertices partitioned
by color. (Code,
Input)

Ordering Heuristics For Parallel Graph Coloring

*William Hasenplaugh, Tim Kaler, Tao B. Schardl, Charles E. Leiserson*

*26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014*

Introduces the largest-log-degree-first (LLF) and smallest-log-degree-last
(SLL) ordering heuristics for parallel greedy graph-coloring algorithms, which
are inspired by the largest-degree-first (LF) and smallest-degree-last (SL)
serial heuristics, respectively. We show that the SLL and LLF heuristics are
less vulnerable to adversarial inputs that limit parallelism while still
maintaining coloring qualities that are of similar quality, in practice, to
those produced by SL and LF. The paper also extends prior analysis of the JP
message passing coloring algorithm to handle arbitrary degree graphs and
develops an efficient multicore implementation that, on 12 cores, achieves a
geometric mean speedup of 6.99 with LLF and 5.37 with SLL.

Cache Efficient Bloom Filters

*Tim Kaler*

Describes dynamic blocked bloom filters -- a cache efficient bloom filter
based on a composition of two existing bloom filter variants: scalable bloom
filters and blocked bloom filters. The paper also serves as a survey of these
two bloom filter variants and provides an empirical comparison of their space
usage and false positive rates with the standard bloom filter.

Thesis Proposal - Optimizing Deterministic Parallel Graph Computation

*Tim Kaler, Charles Leiserson*

An MEng thesis proposal describing preliminary work improving Graphlab, a parallel framework
for machine learning, by making it deterministic and 2x faster for sufficiently large
benchmarks.

Spatial Data Structures - Performance Comparison

*Tim Kaler, Oscar Moll*

An informal survey and performance comparison of simple spatial datastructures
supporting 2-D range queries. It also proposes a simple criteria for choosing
the best data structure to perform a given rectangular query which could be
used by a query optimizer.

Code in the Air: Simplifying Sensing on Smartphones

*Tim Kaler, John Patrick Lynch, Timothy Peng, Lenin Ravindranath,
Arvind Thiagarajan, Hari Balakrishnan, and Sam Madden*

Demo Abstract, Sensys 2010, November 2010