# Virtual Machines (VM)

Description  

In a typical cluster computing environment, the OS, configuration, etc. are relatively homogeneous across the infrastructure, and root/administrator access on each host is limited to system administrators. Allowing different operating systems or giving users privileged access creates a fragmented infrastructure. Users often need the computational benefits of an entire “server” but either lack the funds for an entire physical server or do not need one. Virtual Instances provide an avenue for users who want to use operating systems and configurations different from those typically used in the underlying physical infrastructure. They also enable users to own an entire “server” without having to pay the cost of a full machine.  

 

Key Features and Benefits   

Virtual Instances allow administrators to partition a single physical server into multiple, isolated “virtual” servers, each with its own independent kernel, operating system, and virtual devices. This also provides robust security isolation that does not exist when users share the same kernel and memory space within the same operating system. The resources available to a given virtual instance (e.g. CPU, RAM, etc.) can easily be scaled up and down as needed. Virtual instances can easily be placed on different or multiple networks as required. They can be “live-migrated”, that is, moved between hypervisors without downtime, which enables administrators to perform maintenance on the underlying hardware and its operating system or configuration as necessary.  

Definitions:  

Virtual Instance (abbreviated ‘VM’): an emulation of a computer system at the device level. This includes virtual CPUs (VCPU), memory, network devices, storage, etc.  

Hypervisor/Host: Physical server where virtual instances live. Each virtual instance consumes some portion of the host’s physical resources (e.g. CPU, memory, network, etc.), so many virtual instances can be hosted on the same hypervisor. This is advantageous when the VMs’ resources are used in bursts rather than consistently.  
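To make the idea of many VMs sharing one host concrete, the sketch below computes a vCPU oversubscription ratio (allocated vCPUs divided by physical cores). The host size and VM list are hypothetical values chosen for illustration and are not taken from any real FASRC hypervisor.

```python
def oversubscription_ratio(vcpus_per_vm, physical_cores):
    """Return allocated vCPUs divided by the host's physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


# Hypothetical host with 32 physical cores hosting ten VMs.
vm_vcpus = [8, 8, 8, 8, 4, 4, 4, 4, 8, 8]
ratio = oversubscription_ratio(vm_vcpus, physical_cores=32)
print(f"vCPU oversubscription ratio: {ratio:.1f}:1")
```

A ratio above 1 only works well when the VMs are not all busy at the same time, which is the bursty usage pattern described above.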

Load balanced: Having a service deployed across multiple VMs so that more than a single instance shares the load (e.g. CPU, network, memory, disk IO) from the end users. This protects against overwhelming a single VM or a single hypervisor.  
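As a rough illustration of sharing load across instances, the sketch below assigns incoming requests to backend VMs in round-robin order. Round-robin is only one possible policy and is assumed here for simplicity; the function name, request labels, and VM names are hypothetical.

```python
from itertools import cycle


def assign_round_robin(requests, backends):
    """Assign each request to a backend VM in strict rotation."""
    if not backends:
        raise ValueError("at least one backend VM is required")
    rotation = cycle(backends)
    return {request: next(rotation) for request in requests}


# Hypothetical requests spread over two hypothetical VMs.
assignments = assign_round_robin(
    ["req-1", "req-2", "req-3", "req-4"], ["vm-a", "vm-b"]
)
for request, backend in assignments.items():
    print(request, "->", backend)
```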

High-availability: Having a service deployed across multiple servers (and perhaps multiple availability zones) provides fault tolerance that mitigates different levels of outages or planned maintenance events.  

Availability Zone: Isolated location where Virtual Instances can be deployed. Availability zones are used to provide better locality to services (e.g. ensuring that the compute and storage are in the same data center). Using multiple availability zones provides a heightened level of fault tolerance that can help mitigate issues at the data center level.  

Provider:  

  • External – Virtual Instance hosted on publicly available cloud providers (e.g. Amazon Web Services, Microsoft Azure, Google Cloud Platform, etc.). This is beneficial for leveraging services not available locally or for bursting out to capacities that are unreasonable to manage locally.  
  • Private/On-Premise – Virtual Instance hosted on local infrastructure (e.g. OpenStack, OpenNebula, VMware, etc.). This is beneficial if you have large amounts of data already hosted locally.  

Image: A virtual collection of a kernel, operating system, and configuration.  

Snapshot: Point-in-time copy of a Virtual Instance’s root and/or data disk.  

Block storage/volumes: Block-level storage device that can be dynamically attached to a Virtual Instance. This allows for the scaling of project or research data separate from the OS, tools, and software.  

Software Dependencies: Inside a virtual instance, many software components make up the stack for a given service. Each of these can depend on the software below it. For example, the top layer may be a WordPress website, which relies on the Apache/HTTP service and SSL security for HTTPS, all of which rely on the Operating System. If SSL needs to be updated for security reasons, the Apache service may need to be updated as well, which might break some functionality in WordPress. Normally the underlying services, such as the Operating System and core services of the VM, are managed by the RC group, so coordination with the researchers needs to be maintained in order to keep the end-user-facing service working. 
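The WordPress example can be sketched as a small dependency graph that reports which components may be affected when a lower layer is updated. The graph below is a simplified assumption based on the example in the text, not an inventory of a real VM, and the function name is hypothetical.

```python
# Each component maps to the components it directly depends on.
# Simplified, assumed layering of the WordPress example.
DEPENDS_ON = {
    "wordpress": ["apache"],
    "apache": ["ssl", "os"],
    "ssl": ["os"],
    "os": [],
}


def affected_by(updated, depends_on=DEPENDS_ON):
    """Return every component that directly or indirectly depends on `updated`."""
    affected = set()
    frontier = [updated]
    while frontier:
        current = frontier.pop()
        for component, deps in depends_on.items():
            if current in deps and component not in affected:
                affected.add(component)
                frontier.append(component)
    return affected


print(sorted(affected_by("ssl")))
```

Running a check like this before an update gives a list of services to retest and of people to notify.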

Security: Serious consideration should be given to how a virtual instance is secured, especially if the services it hosts are on a public network.  

  • Public: Any public-facing service that allows access to underlying data (storage or databases) should have a certificate with encryption and verified user accounts. Only the ports that are needed to provide a service should be open, i.e. all other ports should be closed via a network firewall.  
  • Private: Any VM on a private network could have more freedom to have services open to other logical networks on campus. VMs also allow for a development environment in which root access is granted; in this case the VM should be restricted from other logical networks via a network firewall. 
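The rule that only needed ports stay open can be expressed as an allow-list check. This is a minimal sketch: the allowed set (HTTPS only) and the list of listening ports are hypothetical, and a real deployment would enforce the result in the network firewall rather than in Python.

```python
# Hypothetical allow-list for a public web service: HTTPS only.
ALLOWED_PORTS = {443}


def ports_to_close(listening_ports, allowed=ALLOWED_PORTS):
    """Return the listening ports that are not on the allow-list, sorted."""
    return sorted(set(listening_ports) - set(allowed))


# Hypothetical scan result for a VM.
print(ports_to_close([22, 80, 443, 3306]))
```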

Reproducibility: VMs provide the most reproducible environment, as they contain the full stack of the Operating System, Libraries, and Services needed to run. The VM image can be stored long term and reused years later for quality assurance / quality control. 

Management: Management of a VM is normally done in partnership with the researcher. It is important that this partnership is established at the beginning so that roles and responsibilities are clearly defined. Each tier of management represents a different level of time commitment on the part of the RC group. For example, with a “bare” VM, the RC group provisions the virtual infrastructure (cores, memory, disk, and network) and the researcher is given a machine on which they maintain the OS image, all Libraries and Services, data, and content. This requires the least amount of time from the RC group and the most from the researcher. With a fully managed VM (e.g. for a website), the RC group provisions the infrastructure, the OS, Libraries, and Services all the way up to a web hosting platform (e.g. WordPress), and the researcher is responsible for the administration inside WordPress, the website design, and content. This requires the most amount of time from the RC group and the least from the researcher. 

 

Service Expectations and Limits:  

Research virtual instance environments are not intended to be an enterprise service and are generally operated on a more conservative cost basis. Typically, virtual instances are set up as single instances without the complexity of high-availability, load-balanced, or multi-availability-zone deployments. Many factors affect the performance of virtual instances, including the network, the underlying filesystem, and oversubscription of CPUs and memory. Additionally, end users may not be best positioned to determine the level of resources needed at the time of their request. Due to the nature of the research being conducted, requests regularly require the full amount of resources available.  

Max Total Allowable Size for any Single VM: 8 VCPU Cores/16GB RAM/200GB Disk  

Max Combined RAM and VCPU Per Lab for multiple VMs: 16 VCPU Cores/64GB RAM/200GB Disk  
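The limits above can be checked mechanically before a request is submitted. The sketch below is illustrative only: the class and function names are hypothetical, and because the text does not say how the 200GB disk figure applies across a lab's VMs, the lab-wide check covers only vCPUs and RAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSize:
    vcpus: int
    ram_gb: int
    disk_gb: int


# Limits copied from the text above.
MAX_SINGLE_VM = VMSize(vcpus=8, ram_gb=16, disk_gb=200)
MAX_LAB_VCPUS = 16
MAX_LAB_RAM_GB = 64


def check_request(new_vm, existing_vms=()):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if new_vm.vcpus > MAX_SINGLE_VM.vcpus:
        problems.append("single-VM vCPU limit exceeded")
    if new_vm.ram_gb > MAX_SINGLE_VM.ram_gb:
        problems.append("single-VM RAM limit exceeded")
    if new_vm.disk_gb > MAX_SINGLE_VM.disk_gb:
        problems.append("single-VM disk limit exceeded")
    # Lab-wide totals include the lab's existing VMs plus the new one.
    all_vms = list(existing_vms) + [new_vm]
    if sum(vm.vcpus for vm in all_vms) > MAX_LAB_VCPUS:
        problems.append("lab-wide vCPU limit exceeded")
    if sum(vm.ram_gb for vm in all_vms) > MAX_LAB_RAM_GB:
        problems.append("lab-wide RAM limit exceeded")
    return problems


# Hypothetical lab that already runs two VMs and asks for a third.
existing = [VMSize(8, 16, 100), VMSize(4, 8, 50)]
print(check_request(VMSize(8, 16, 100), existing))
```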

At FASRC, availability, uptime, and backup schedules are provided on a best-effort basis by staff who do not work rotating 24/7/365 shifts.  

 

Available to:  

Available to all users with an FASRC account.  

 

Service manager and Owner:   

Service Manager:  TBD, Associate Director of Systems Software Group  

Service Owner: Scott Yockel, Director of FAS Research Computing  

Submit all VM requests via email to rchelp@rc.fas.harvard.edu  

 

Offerings (Tiers of Service)  

Tier 1:  

  • Administered By: RC Admins  
  • Root Access: No  
  • Network Storage Mounts: Yes  
  • Features: Fully provisioned and monitored by Admins. Admins are also responsible for service availability and proper functioning of the base VM. No root access allowed.  
  • Cost: (TBD) 

Tier 2:  

  • Administered By: Admins + User  
  • Root Access: Limited  
  • Network Storage Mounts: Yes  
  • Features: Admins provision a “base” VM; the end user has limited privileged access (example: sudo access to restart the webserver).  
  • Cost: (TBD)  

Tier 3:  

  • Administered By: User  
  • Root Access: Yes  
  • Network Storage Mounts: No  
  • Features: Admins provision a “bare” VM and hand it over to requester. End-user is responsible for all software installation, configuration, and administration of services.  
  • Cost: (TBD)