William Wordsworth may have “wandered lonely as a cloud” in his much-loved poem Daffodils, but he would be hard pressed to find a lonely cloud today, even in a more specialized area such as pharmaceutical R&D. A simple Google search for cloud and pharma throws up 104M hits, while a more refined search for cloud computing in pharmaceutical R&D gives a more modest 2.9M hits.
This is clearly an active area. The purported benefits (e.g. faster research, lower costs) already seem to be outweighing the perceived weaknesses (e.g. concerns about compliance and about protecting intellectual property), as pharma companies move many of their computing efforts from self-maintained on-premises servers to commercial cloud platforms and publish compelling case studies.
As a computational chemist or a data scientist/cheminformatician used to searching databases and applying computational analysis, visualization, prediction, and SAR techniques, how much do you really need (or want) to care about where and how your tools are housed, maintained, and accessed? Well – if you expect the tools to be available 24x7, always running the latest version in a high performance and secure environment, and with dedicated support, then perhaps a better understanding of the cloud might help.
The aim of this white paper is to help you better understand how the cloud is applied in pharma R&D, including the terminology used, the various server arrangements that are available, and the pros and cons of each.
Why move to the cloud?
The initial justification for moving to the cloud was to reduce IT spending on technology and infrastructure. This rationale is still valid but is now augmented with increased business value through enhanced innovation, improved analytics, scalability, automation, and resilience. McKinsey lists specific high-level corporate benefits:
- Cost- and space-efficiency
- Speed of set up and operation
- Unlimited storage and scalability
- Security, reliability, backup, and recovery
- Reduced administrative burdens
- Optimal hardware and software investments
- Environmental friendliness
- Flexible work practices
- Increased automation
- Data control
Real-world examples of this increased business value include two Oracle-cited cloud-based pharma success stories: (1) a synthetic vaccine development project accelerated from 90 days to 5 days; and (2) a cut in the time taken to virtually screen 1 million compounds from 24 hours in-house to 7 minutes 25 seconds in the cloud.
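To put those two figures on a common scale, the short Python sketch below converts each before/after pair into a speed-up factor. It uses only the durations quoted above; the function and variable names are illustrative and do not come from the cited case studies.

```python
from datetime import timedelta


def speedup(before: timedelta, after: timedelta) -> float:
    """Return how many times shorter the 'after' duration is than 'before'."""
    return before / after  # dividing two timedeltas yields a float


# Durations as quoted in the two case studies above.
vaccine = speedup(timedelta(days=90), timedelta(days=5))
screening = speedup(timedelta(hours=24), timedelta(minutes=7, seconds=25))

print(f"Vaccine project: about {vaccine:.0f}x faster")
print(f"Virtual screening: about {screening:.0f}x faster")
```

On these numbers the vaccine project is 18 times faster and the virtual screen roughly 194 times faster, which suggests the screening gain comes largely from running many calculations in parallel rather than from faster individual machines.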
Cloud for a Computational Chemist
The high-level cloud-derived benefits listed above can apply across most functions in an organization, and they can certainly give you confidence that your company or institution is taking advantage of the latest and most beneficial technological advances. But as an individual practitioner of computational chemistry or cheminformatics, how does the cloud help you answer your pressing day-to-day R&D questions faster, and with greater certainty and accuracy?
Whether it’s designing, creating, and filtering a large-scale virtual compound library for high throughput virtual screening, or predicting novel pharmacologically important properties across a huge collection of potential lead compounds for SAR analysis, you will be most effective and successful if you can be confident that your applications are:
- Available 24x7, with minimal downtime.
- Up to date, always running the most recent release, with all upgrades, point releases, patches, and customizations applied behind the scenes.
- Scalable and high performance, always running rapidly with predictable speeds, irrespective of the volume, velocity, and variability of the incoming data streams, and the number of simultaneous users.
- Secure, with robust access controls that protect valuable intellectual property and ensure that users can access only the data to which they are entitled.
- Supported, with available, knowledgeable, dedicated support staff on hand to offer advice and help solve issues.
All of these features and characteristics are readily available when data and applications are hosted and run in a robust cloud environment. In-house/on-premises corporate and academic IT infrastructures and support staff, on the other hand, will be hard pressed to match these levels of operational excellence consistently.
One of the major factors that slowed the uptake of cloud deployments in the biopharma sector was security. Organizations were very concerned about protecting their intellectual property – often their chemical registry files, aka “the corporate crown jewels” – on servers outside the corporate firewall and possibly in non-approved geographic locations. That concern has largely been mitigated through stringent security controls and hybrid mixes of public and private cloud systems, but other potential disadvantages have been identified and need to be addressed:
- Technical issues
- Need for an always-on internet connection
- Vendor dependence and lock-in
- Lack of control
- Bandwidth limitations
- Lack of portability of legacy applications
The sections below on server and tenancy arrangements discuss how many of these concerns can be, and are being, addressed.
Cloud issues for a Computational Chemist
If an organization has successfully addressed most of the high-level concerns listed above, what else might a computational chemist have to worry about? The last item – lack of portability of legacy applications – may be an issue if the applications are not well documented, or if the original developers (and their deep understanding) have departed. Depending on the migration path taken, the application may take a long time to move to the cloud, and its performance and scalability may be sub-optimal.
There are three main ways to migrate custom legacy apps to the cloud:
- Lift and Shift
This is the quickest and easiest route as it requires no code or architecture changes. The app is simply rehosted “as is” on cloud-based infrastructure. But this speed and simplicity have downsides, as the legacy app may have scalability and manageability issues in the new environment, and it won’t be able to take advantage of newer computing features such as APIs and microservices.
The second route is an incremental step beyond Lift and Shift. It involves making modest changes to the application architecture, such as adjusting the way the app interacts with its database to take advantage of cloud services, while leaving the client side unaltered. This approach provides more benefits than Lift and Shift but takes more time and expense.
The third route is the most complete: rearchitecting the legacy app to take advantage of cloud technologies. It necessitates significant code changes, so it is the most costly and lengthy option, but the result is a modernized app that will be fully scalable and supportable going forward.
Another issue, possibly related to migration, might be that site- or individual-specific customizations may not be readily available in the cloud instance of the app.
Cloud Terminology
This is a very brief primer to help understand the terminology used when discussing the cloud.
- On Premises
On-premises is the typical pre-cloud setup: software is installed, runs, and is maintained on computers on the premises and within the firewall of the person or organization using the software.
- The Cloud
Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user.
- Private Cloud
This is typically hosted in an organization’s own data center and is only available to that organization. It has the on-premises disadvantages of high start-up costs, ongoing maintenance, and uncertain capacity.
- Public Cloud (now outdated)
The cloud provider manages the infrastructure, and multiple customers share the provider’s hardware, paying on a per-use basis. One disadvantage is that this setup often lacks network isolation between the various customers.
- Virtual Private Cloud
This is a service from a public cloud provider that creates a private-cloud-like environment on public cloud infrastructure. This has the same flexibility as a public cloud, but with the added advantage of enhanced security. This approach combines the public cloud’s resource availability, scalability, flexibility, and cost-effectiveness with the security and control of a private cloud and is typically less expensive to build and simpler to manage than an on-premises private cloud.
- Hybrid Cloud
This approach merges a private cloud and a virtual private cloud into a single, flexible infrastructure, and organizations can choose the optimal cloud environment for each application or workload. This is a common deployment model in large organizations.
Server Arrangements and Terminology – Pizza Analogy
This is a brief primer to decode the various *aaS acronyms, and to explain the degrees of client involvement with a cloud provider, using the so-called Pizza Analogy developed by Albert Barron in 2014 to illustrate who is responsible for what and how much control they have.
- Traditional On Premises
This is like making a pizza from scratch and eating it at home: you are responsible for everything and have complete control.
- IaaS – Infrastructure as a Service
This corresponds to “take and bake” where a vendor provides the pizza ingredients, and you do the cooking and eating with your own equipment (oven, table, chairs).
- PaaS – Platform as a Service
This is analogous to pizza delivery. The vendor provides a cooked pizza for you to eat at home.
- SaaS – Software as a Service
This is eating out at a restaurant – everything is provided, from oven to chairs to napkins.
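The analogy can also be read as a small responsibility table. The Python sketch below is one way to encode it, assuming a simplification of the pizza into three layers (ingredients, cooking, dining) taken from the descriptions above; the layer names and the `responsibilities` helper are illustrative and are not part of Barron's original diagram.

```python
# Three layers drawn from the descriptions above. Grouping the analogy into
# exactly these three layers is a simplification made for this sketch.
LAYERS = ("ingredients", "cooking", "dining")

# For each arrangement, the layers that the vendor provides.
VENDOR_PROVIDES = {
    "On Premises": set(),
    "IaaS": {"ingredients"},
    "PaaS": {"ingredients", "cooking"},
    "SaaS": {"ingredients", "cooking", "dining"},
}


def responsibilities(model: str) -> dict[str, str]:
    """Map each layer to whoever is responsible for it: 'vendor' or 'you'."""
    vendor = VENDOR_PROVIDES[model]
    return {layer: ("vendor" if layer in vendor else "you") for layer in LAYERS}


for model in VENDOR_PROVIDES:
    print(f"{model}: {responsibilities(model)}")
```

Reading down the output, each step from On Premises to SaaS hands one more layer to the vendor, which is the trade between control and convenience that the analogy is meant to convey.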
Tenancy Arrangements and Terminology – Housing Analogy
In cloud-speak, a tenant is a core part of any SaaS application, and consists of a logical grouping of users, data, and permissions, typically a company or organization.
There are two cloud tenancy arrangements:
- Single tenant: tenants are siloed and isolated via dedicated infrastructure
- Multi-tenant: tenants use pooled infrastructure resources and are isolated via policies and controls
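The difference between the two kinds of isolation can be shown with a toy example. The Python sketch below is a deliberately minimal illustration, not a description of any real SaaS product: the `SharedStore` class, the tenant names `acme` and `globex`, and the use of SMILES strings as records are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Pooled (multi-tenant) storage: every record is tagged with its owner."""
    _rows: list[tuple[str, str]] = field(default_factory=list)

    def add(self, tenant: str, compound: str) -> None:
        self._rows.append((tenant, compound))

    def query(self, tenant: str) -> list[str]:
        # Isolation by policy: a caller only ever sees its own tenant's rows,
        # even though all rows live in the same underlying store.
        return [compound for owner, compound in self._rows if owner == tenant]


# Multi-tenant: one shared store, tenants separated by the query policy.
shared = SharedStore()
shared.add("acme", "CCO")          # ethanol, owned by hypothetical tenant "acme"
shared.add("globex", "c1ccccc1")   # benzene, owned by hypothetical tenant "globex"

# Single tenant: one dedicated store per tenant, separated by infrastructure.
dedicated = {"acme": ["CCO"], "globex": ["c1ccccc1"]}

print(shared.query("acme"))
print(dedicated["acme"])
```

Both arrangements return the same data to `acme`. In the shared store, security depends on the policy in `query` being applied correctly on every access, whereas in the dedicated arrangement the tenants' data never sits in the same place.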
Each approach has its own strengths and weaknesses, and these need to be balanced against business objectives and drivers.
Potential benefits of single tenant include:
- Security: A single customer’s data is isolated on dedicated, secure hardware and is accessed by a limited number of people.
- Dependability: With an entire environment dedicated to one client, resources are abundant and available anytime.
- Customizability: Control over the entire environment allows for customization and added functionality.
Potential drawbacks of single tenant:
- Maintenance: Single tenant means more tasks and regular maintenance to keep things running smoothly and efficiently.
- Setup/Management: Single tenant environments can be time consuming to set up and manage.
- Cost: Single tenant can provide more resources, but at a premium price to pay for the entire environment.
Potential benefits of multi-tenant:
- Affordability: Multiple customers share the cost of the environment, and the SaaS vendor typically passes those savings on in the price of the software.
- Standard Integrations: Limited customizability forces standard integration patterns, allowing more integrations in the long run.
- Simpler Updates/Changes: Changes, updates, and maintenance are rolled out to all tenants at once, reducing operational costs.
- Rapid Provisioning and Deployment: Scale up and onboarding are quicker and simpler.
Potential drawbacks of multi-tenant:
- Limited Customizability: Custom changes to the database aren’t typically an option.
- Security: Other tenants share the same database, and this broader access reduces control over security.
- Slower Updates/Changes: If another integrated SaaS product updates its system, there may be issues with connecting apps.
Conclusion: I’m a Computational Chemist - How Should I Cloud?
In the same way that successful drug design seeks to balance often competing molecular properties (e.g. activity vs. specificity, dosage vs. toxicity, structure vs. synthesizability), selecting the optimal cloud environment and tenancy arrangement requires reaching an acceptable compromise among the myriad benefits and concerns outlined above. Some of these parameters may be of more concern at a corporate level (e.g. infrastructure and staff costs, security), while others will be more crucial for computational chemists (e.g. performance, scalability, customizability).
If your organization has the capacity to invest time, infrastructure, and staff, and if maintaining complete control and customizability is of paramount importance, then a single tenant private cloud would be an optimal choice.
If your organization values rapid deployment and efficient budget management, and is open to trading off some degree of control and customization for these benefits, then a multi-tenant virtual private cloud could serve as an excellent solution.
These two options might be viewed as two ends of a spectrum of possible arrangements, each with its own various degrees of expense, control, security, flexibility, and customizability. In practice, many larger pharma organizations are choosing the hybrid cloud approach as an intermediate position. This merged combination of a private cloud and a virtual private cloud can provide the optimal cloud setting for particular applications, giving each sufficient compute power to deal with the anticipated data volume, and the flexibility and scalability to handle unexpected increases.
For more information on Chemaxon’s approach to the cloud, please get in touch with us through the link below.