My diverse research background shares a common theme: applying computational and data-intensive approaches to accelerate scientific discovery. I have a strong interest in applying service-oriented architectures to deliver scientific capabilities. This approach allows production capabilities to reach wide scientific audiences without requiring users to download, install, or operate complex research software. However, it also creates new challenges, for instance the need to make efficient use, in terms of both performance and cost, of heterogeneous (and potentially large-scale) cyberinfrastructure. To address these needs I am actively investigating autonomic computing methods for automatically provisioning infrastructure and scheduling scientific workloads over dynamically provisioned infrastructure, as well as methods to federate resource providers using economic techniques.

Science as a service

Making scientific data and software available via internet-accessible services is fast becoming a standard model for disseminating scientific capabilities. As part of the Globus project I have contributed to the design and development of a number of scientific services, including Globus Nexus, a scalable platform service that provides identity, profile, and group management capabilities; Globus Catalog, a flexible data cataloging service for referencing, managing, describing, and querying large amounts of distributed data; Globus data publication, which supports user-oriented publication of large scientific datasets and self-service management of collections; and the Globus Galaxies platform, which delivers scalable, cloud-based data management and analytics capabilities as a platform.

  • K. Chard, S. Tuecke, and I. Foster, "Efficient and secure transfer, synchronization, and sharing of big data," IEEE Cloud Computing, vol. 1, no. 3, pp. 46-55, 2014.
  • K. Chard, M. Lidman, B. McCollam, J. Bryan, R. Ananthakrishnan, S. Tuecke, and I. Foster, "Globus Nexus: A platform-as-a-service provider of research identity, profile, and group management," Future Generation Computer Systems, vol. PP, 2015.
  • K. Chard, J. Pruyne, B. Blaiszik, R. Ananthakrishnan, S. Tuecke, and I. Foster, "Globus data publication as a service: Lowering barriers to reproducible science," in Proceedings of the 11th IEEE International Conference on e-Science (e-Science), 2015, pp. 401-410.
  • R. Madduri, K. Chard, R. Chard, L. Lacinski, A. Rodriguez, D. Sulakhe, D. Kelly, U. Dave, and I. Foster, "The Globus Galaxies platform: delivering science gateways as a service," Concurrency and Computation: Practice and Experience, vol. 27, no. 16, pp. 4344-4360, 2015.

Autonomic computing and cost-aware cloud provisioning

There are significant technical challenges associated with scaling analyses efficiently and cost-effectively across on-demand cyberinfrastructure. We are researching autonomic computing approaches for provisioning cloud infrastructure and scheduling workloads. Our approach leverages three core components: a provisioning service, a profiling service, and several cost-aware scheduling algorithms. Our stand-alone autonomic cloud provisioning service is designed to dynamically provision cloud infrastructure for executing high-throughput computing workloads (e.g., using HTCondor and Apache Spark). The service makes decisions based on projected execution time and cost, real-time economic information, and current and projected workload. Our profiling service is designed to automate the creation of tool profiles: concise descriptions of the performance and CPU, memory, network, and disk requirements of a supplied tool under different cloud environments and scenarios. Building upon this provisioning model, we have developed several cost-aware and deadline-constrained scheduling algorithms to efficiently allocate workload over provisioned instances. We have deployed these approaches within the context of the Globus Galaxies project (the platform that underlies Globus Genomics) and have observed significant cost reductions and performance improvements.

  • R. Chard, K. Chard, K. Bubendorfer, L. Lacinski, R. Madduri, and I. Foster, "Cost-aware cloud provisioning," in Proceedings of the 11th IEEE International Conference on e-Science (e-Science), 2015, pp. 136-144.
  • R. Madduri, K. Chard, R. Chard, L. Lacinski, A. Rodriguez, D. Sulakhe, D. Kelly, U. Dave, and I. Foster, "The Globus Galaxies platform: delivering science gateways as a service," Concurrency and Computation: Practice and Experience, vol. 27, no. 16, pp. 4344-4360, 2015.
  • V. Arabnejad, K. Bubendorfer, B. Ng, and K. Chard, "A deadline constrained critical path heuristic for cost-effectively scheduling workflows," Accepted to the 8th IEEE/ACM International Conference on Utility and Cloud Computing, 2015.
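The provisioning decision described above can be illustrated with a minimal sketch. The instance types, rates, and prices below are hypothetical stand-ins for the tool profiles and real-time pricing the provisioning and profiling services would supply; the actual algorithms are considerably more sophisticated.

```python
# Minimal sketch of a cost-aware provisioning decision (hypothetical data).
# Given per-instance-type profiles (tasks/hour) and prices ($/hour), pick
# the cheapest instance configuration that drains a workload by a deadline.

from dataclasses import dataclass
import math

@dataclass
class InstanceType:
    name: str
    tasks_per_hour: float  # from a tool profile (assumed value)
    price_per_hour: float  # e.g., a current spot price (assumed value)

def cheapest_plan(instance_types, num_tasks, deadline_hours):
    """Return (instance_type, count, cost) minimizing cost under the deadline."""
    best = None
    for itype in instance_types:
        # Instances needed so the workload finishes before the deadline.
        count = math.ceil(num_tasks / (itype.tasks_per_hour * deadline_hours))
        hours = math.ceil(num_tasks / (itype.tasks_per_hour * count))
        cost = count * hours * itype.price_per_hour
        if best is None or cost < best[2]:
            best = (itype, count, cost)
    return best

types = [
    InstanceType("small", tasks_per_hour=10, price_per_hour=0.05),
    InstanceType("large", tasks_per_hour=45, price_per_hour=0.20),
]
plan = cheapest_plan(types, num_tasks=900, deadline_hours=3)
print(plan[0].name, plan[1], round(plan[2], 2))
```

In practice the service would repeat this decision continuously as prices and projected workload change, rather than planning once up front.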

Federated and economic resource allocation

Scientists are faced with an increasingly diverse range of options with respect to cyberinfrastructure. However, the task of federating (and even comparing) these providers is challenging. My Ph.D. dissertation focused on the application of economic principles and algorithms to federate cloud providers and other large-scale computing systems (e.g., grids and clusters). While many researchers have proposed approaches for federating cloud providers, in all cases two important questions arise: who should own and operate the management infrastructure, and how can allocations be performed securely? To address these questions, we proposed a unique co-operative model (analogous to a grocery "co-op") in which service providers collectively manage the federation by hosting various management services. To alleviate the potential for subversion, we applied cryptographic secure auctions to enable economic allocations to be conducted on potentially untrusted hosts. To assess these approaches we developed the DRIVE meta-scheduler: a service-based economic meta-scheduler that creates an open market over a federated pool of disparate providers. We have also developed a series of high performance resource utilization strategies that are designed to overcome the performance limitations of using economic allocation techniques.

  • K. Chard and K. Bubendorfer, "High performance resource allocation strategies for computational economies," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, pp. 72-84, 2013.
  • K. Chard, K. Bubendorfer, and P. Komisarczuk, "High occupancy resource allocation for grid and cloud systems, a study with DRIVE," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC), 2010.
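The economic allocation at the heart of this model can be sketched as a plain sealed-bid second-price (reverse Vickrey) auction. This illustrates only the allocation rule, not the cryptographic secure-auction protocol, which additionally keeps bids hidden from the untrusted host running the computation; the bid values are invented for illustration.

```python
# Sealed-bid second-price (Vickrey) auction, reverse form: providers bid
# the price they ask to run a job, the lowest bid wins, and the winner is
# paid the second-lowest bid (which incentivizes truthful bidding).

def reverse_vickrey(bids):
    """bids: {provider: asking price}. Returns (winner, payment)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1])
    winner = ranked[0][0]
    # With a single bidder, the winner is simply paid its own bid.
    payment = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, payment

winner, payment = reverse_vickrey(
    {"providerA": 0.30, "providerB": 0.25, "providerC": 0.40}
)
print(winner, payment)
```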

Social cloud computing

The edges of a social network digitally encode a level of "trust" between individuals based on the real-world relationships that exist outside of the network. We have developed models for exploiting these relationships as a foundation for incentivized and trustworthy resource sharing, a model we have termed "social cloud computing." A social cloud leverages a cloud-like model for exposing access to resources, relying heavily on the principles of virtualization (and sandboxing) to share capabilities securely between users. Towards this vision, we have developed and evaluated a social compute cloud, a social storage cloud, and a social content delivery network. Our work has been featured in IEEE Spectrum and the Cloud Times.

  • K. Chard, K. Bubendorfer, S. Caton, and O. Rana, "Social cloud computing: A vision for socially motivated resource sharing," IEEE Transactions on Services Computing, vol. 5, no. 4, pp. 551-563, 2012.
  • K. Chard, S. Caton, O. Rana, and K. Bubendorfer, "Social cloud: Cloud computing in social networks," in Proceedings of the 3rd IEEE International Conference on Cloud Computing (CLOUD), 2010.
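A minimal sketch of socially informed resource selection, the core idea above, might look as follows. The graph, trust weights, and capacities are hypothetical; the real systems derive relationships from actual social-network APIs and enforce sharing through virtualization.

```python
# Sketch: prefer resource providers to whom the requester has a
# social-network edge, treating edge weight as a proxy for trust.

social_edges = {  # requester -> {provider: trust weight} (invented data)
    "alice": {"bob": 0.9, "carol": 0.6},
}
available = {"bob": 2, "carol": 4, "dave": 8}  # provider -> free VM slots

def pick_provider(requester, needed_slots):
    """Return the most-trusted provider with enough capacity, or None."""
    trusted = social_edges.get(requester, {})
    # Rank by trust first, then capacity; strangers (trust 0.0) come last.
    ranked = sorted(available, key=lambda p: (-trusted.get(p, 0.0), -available[p]))
    for p in ranked:
        if available[p] >= needed_slots:
            return p
    return None

print(pick_provider("alice", 2))  # bob: enough slots and highest trust
```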

Information extraction and analytics

There is a wealth of valuable information locked within unstructured scientific data (e.g., headers, figures, and network graphs). If harnessed, this information can offer new scientific insights. Towards this goal, I have investigated methods for extracting knowledge from unstructured medical notes and publications, as well as approaches for representing and analyzing institutional knowledge.

Smntx is a distributed, service-based architecture designed to improve the accessibility, scalability, and flexibility of medical natural language processing (NLP) applications. Rather than directly providing NLP capabilities, Smntx leverages existing NLP engines and coordinates distributed access to these tools. Smntx stores and indexes coded results such that data mining and analysis can be performed in real time through a lightweight REST API.
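The value of storing and indexing coded results can be sketched with a toy inverted index: once an NLP engine has emitted concept annotations, concept queries resolve instantly without re-running the engine. The document IDs and annotations below are invented; the concept codes stand in for the kind of coded output (e.g., UMLS CUIs) a medical NLP engine produces.

```python
# Toy inverted index over coded NLP results: each annotation maps a
# concept code to the documents and character spans where it occurs,
# so concept queries never touch the (slow) NLP engine again.

from collections import defaultdict

index = defaultdict(list)  # concept code -> [(doc_id, start, end)]

def add_annotation(doc_id, concept, start, end):
    index[concept].append((doc_id, start, end))

def docs_with_concept(concept):
    """All documents mentioning the concept, for real-time analysis."""
    return sorted({doc_id for doc_id, _, _ in index[concept]})

# Coded results as an NLP engine might emit them (illustrative values).
add_annotation("note-1", "C-DIABETES", 10, 27)
add_annotation("note-2", "C-DIABETES", 5, 22)
add_annotation("note-2", "C-HYPERTENSION", 40, 52)

print(docs_with_concept("C-DIABETES"))
```

In Smntx this index sits behind a REST API, so clients query coded concepts over HTTP rather than in process.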

ChiDB is a structured database of materials properties mined from published literature. To automatically populate this database we have developed methods to extract information (e.g., scientific facts, methodologies, and discoveries) from published literature. This is a challenging task, as information is represented in unstructured formats (e.g., free text, equations, figures, and tables). Our approach automatically extracts information from publications, exploits expert crowds to review extractions, and makes curated values available via a web service. It leverages machine learning approaches to classify and rank publications and the items (e.g., figures, tables) within them.
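The classify-and-rank step can be caricatured with a deliberately simple keyword scorer (not the production models, which are learned): score each abstract by occurrences of domain terms so that papers likely to contain extractable property values reach expert reviewers first. The term list and abstracts are invented for illustration.

```python
# Toy relevance ranking for extraction triage: papers whose abstracts
# mention more domain terms are reviewed first. The real pipeline uses
# trained classifiers; this sketch only shows the ranking idea.

DOMAIN_TERMS = {"polymer", "flory", "interaction", "parameter", "chi"}

def score(text):
    words = text.lower().split()
    return sum(words.count(term) for term in DOMAIN_TERMS)

def rank(papers):
    """papers: {title: abstract}; returns titles, most relevant first."""
    return sorted(papers, key=lambda title: score(papers[title]), reverse=True)

papers = {
    "A": "We measure the flory interaction parameter chi of a polymer blend",
    "B": "A survey of distributed scheduling systems",
}
print(rank(papers))  # ['A', 'B']
```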

  • K. Chard, M. Russell, Y. Lussier, E. Mendonca, and J. Silverstein, "Scalability and cost of a cloud-based approach to medical NLP," in Proceedings of the 24th International Symposium on Computer-Based Medical Systems (CBMS), 2011, pp. 1-6.
  • K. Chard, M. Russell, Y. Lussier, E. Mendonca, and J. Silverstein, "A cloud-based approach to medical NLP," in Proceedings of the AMIA Annual Symposium, 2011, pp. 207-216.