

This is a requirements document on grid AAA/Security. It is based directly upon Shawn Mullen et al.'s (2004) document "Grid Authentication, Authorization and Accounting Requirements" (at https://forge.gridforum.org/projects/saaa-rg/document/Draft_5_of_Requirements_Doc/en/1) and will, we hope, form a thread in taking the Mullen et al. work forward. Most of the requirement text is from that document. Our version with extended notes, showing where we disagree with the Mullen et al. document and where we have varied from it, is at RequirementsDocFull. The following, therefore, is a 'clean' version (without the explanatory comments).


REAL TITLE: The requirements for an ideal grid

We welcome comments on this document from anyone. To comment, go to the Comments page (you may wish to see our explanatory notes first). You could open the Comments page in another tab or window; please refer to the text in question. N.B. You will first have to click on the Login link above and create a user for yourself (Create Profile). Please do!

This document

The sections of this document are:

  1. Abstract
  2. Grid use models

  3. Site Authentication Requirements

    1. Terminology and definitions
    2. Identity
    3. Assurance
    4. Lifetimes
  4. Site Authorization Requirements

    1. Terminology
    2. Authorization Process
    3. Authorization Attributes
    4. Policies
    5. Transparency
    6. Operations
    7. Authorization for Replicated Data
  5. Site Accounting and Audit Requirements

    1. Introduction
    2. Terminology
    3. Requirements Gathering
    4. Grid Auditable Data
    5. Requirements Gathering for Grid Resource Accounting
    6. Existing Standards and Practices

Abstract

The purpose of this document is to continue the exercise begun by Shawn Mullen and colleagues to "collect and codify the requirements of existing grid resource sites with respect to the acceptance of grid credentials for access to their services" (see https://forge.gridforum.org/projects/saaa-rg/document/Draft_5_of_Requirements_Doc/en/1), and to extend it to the consideration of devolved authentication. The Mullen et al. document (rightly) assumed the predominance of end entity identification via X.509 user certificates. The aim of this alternative document is to take a step back and to imagine the requirements where the use of certificates in this way is not assumed, though neither is it precluded.

Eventually, this document could possibly develop into an informational GGF document which grid application and library coders could use as a reference guide. However, more immediately, it may give rise to suggestions for future development work in GGF working groups.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 (Bradner 1997).

Two versions of this document exist. This version is presented 'clean' with no annotations. Please see RequirementsDocFull for an indication of where our text varies from Mullen et al.'s original, and for justifications for those changes.

Grid use models

Introduction

The use models of grids are written about elsewhere by this project. However, for clarity, they are reproduced here in summary form.

  1. Dedicated primary grid resource (e.g. compute cluster, data cluster)
  2. Voluntary secondary resource, actively monitored by resource owner. Resource owner deliberately makes resource un/available and may choose whether or not to run grid jobs on an individual basis.
  3. Voluntary secondary resource operated blindly by resource owner, possibly with dedicated, secure, ring-fenced sandpit within the system that defers to end-user activity.
  4. A no-trust, no-accounting grid (subset of the above). Each node has a secure sandpit and the owner allows anything to go on there. All users are authorised to use it.

This document, however, makes recommendations only for the first two entries in the above list (Dedicated primary grid resource and Voluntary secondary resource, actively monitored by resource owner), as these are the technologies currently in use. Some of the technology required for the latter two entries has yet to be established.

This document also mentions the Customer-ServiceProvider (CSP) model of grid use. This is also defined elsewhere but may be summarised as follows: a service provider takes on the responsibility of authenticating the user, and other grid nodes trust that service provider (acting as a grid end entity) to make the global authentication and authorization decisions correctly and legally. The grid user, in the traditional sense, is therefore the service provider (which may run jobs on other grid nodes), and the customer is someone for whom the work is being done.

Therefore the list of users (also documented elsewhere) may be summarised as:

  1. Service End-User (data, SEUD) Typically, no computing expertise. Relies upon the Service Provider (SP) in a Customer-ServiceProvider relationship. The SEUD is agnostic as to whether a grid is being used. The SEUD does not need to be ‘known’ by any grid access management (AM) service (as the grid trusts and accounts the SP not the user). SP may need to authenticate, authorize and account for the user.

  2. Service End-User (executables, SEUX). Some understanding of code creation. Very similar to SEUD, but runs either executable code or scripts via SPs (and as such, is a special subset of the Service End Users that may warrant further security measures).
    • N.B. Where the above two groups can be aggregated, we refer to them as SEUs.
  3. Power user agnostic of grid resource node (PUA). Typically, develops programs and data but does not care where processing takes place.
  4. Power user requiring specific grid resource nodes (PUS). Typically, as PUA but may have more platform etc. dependent expertise and some sysadmin expertise. Some grid node owners may wish to have a relationship with the PUS that involves direct authentication and authorization (and accounting).
  5. Power user developing a service (PUDS). Typically, as PUA/PUS but developing expertise like SP. As for PUS or PUA, but moving into arrangements like SP (see below). May need to begin interacting with and accounting for SEUs in an experimental manner.
  6. ServiceProvider (SP) to SEUs. Typically, as PUA/PUS but has expertise in authorization and possibly identity management. SP may be trusted to provide services only to those supposedly authorised to use the grid. SP may need to identify (authenticate) SEUs but should need to recognise status (for authorization). SP will need strong authentication between it and the primary grid resource. Accounting may be required between the primary grid resources and the SP and between the SP and the SEU (although this latter requirement may not need to be met using grid middleware).

  7. Infrastructure sysadmin (GRID-SYS). Typically, a user carrying out system administration of grid nodes with possibly infrastructure delivery and security expertise. A GRID-SYS may need to authenticate directly to particular grid resource nodes. However, in theory, it is possible that s/he may authenticate elsewhere and the node computer may trust that external authentication point (or identity provider).

(Chapter 1) Site Authentication Requirements

Terminology and definitions

"User secrets" refers to values intended to be known only by the user, known by the user and an authentication infrastructure, or known only to an authentication infrastructure and employed on the user's behalf after the user has authenticated with some other secret(s).

To sidestep such questions as whether "a day" means eight hours or 24 hours and just how long a month is, we will deal in seconds but not quibble over implementation variances at the 10% or 20% level.

Credentials are assumed to have lifetimes which bound their period of validity. "Long-lived" credentials have lifetimes of 1,000,000 seconds (1 megasecond or 1 Ms) or more. "Short-lived" credentials have lifetimes of 100,000 seconds (0.1 Ms) or less. Lifetimes between those limits are "intermediate." The terms long-lived and short-lived may also be applied to the secrets employed by a user to acquire credentials, although the only short-lived user secrets known to be commonly employed are one-time (or "single-use") authenticators.

(Conversions: 0.1 Ms is a bit more than a day; 1 Ms is a bit less than 2 weeks.)
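The lifetime classes defined above can be expressed as a small helper. This is only an illustrative sketch: the function and constant names are our own assumptions, and the thresholds are applied strictly, whereas the text tolerates implementation variances of 10% or 20%.

```python
# Thresholds taken from the definitions above; the names are hypothetical.
LONG_LIVED_MIN_S = 1_000_000   # 1 Ms or more is "long-lived"
SHORT_LIVED_MAX_S = 100_000    # 0.1 Ms or less is "short-lived"
SECONDS_PER_DAY = 86_400


def classify_lifetime(seconds: float) -> str:
    """Classify a credential lifetime given in seconds."""
    if seconds <= SHORT_LIVED_MAX_S:
        return "short-lived"
    if seconds >= LONG_LIVED_MIN_S:
        return "long-lived"
    return "intermediate"


# The conversions quoted in the text, in days.
short_days = SHORT_LIVED_MAX_S / SECONDS_PER_DAY  # roughly 1.16 days
long_days = LONG_LIVED_MIN_S / SECONDS_PER_DAY    # roughly 11.6 days

print(classify_lifetime(12 * 3600))  # a 12-hour credential
```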

If a credential's lifetime can be extended by the user, using no more proof of identity than the credential itself, this is considered "renewal" of the credential, while if the process of extending the lifetime requires measures equivalent to those employed in its initial acquisition, we consider the result a new credential.

We specifically do not consider "post-dated" credentials -- those with lifetimes that begin at some point later than the time of the authentication act. Neither do we consider the relative strengths of cryptographic protocols, algorithms, and key lengths. We assume they are always designed, selected and implemented appropriately.

Identity

Sites will make authorization decisions on an aggregate basis: on Virtual Organization (VO) membership or group membership. However, at times it may be necessary to set access rights at the granularity of a single user. Sites therefore may need to reserve the right, and preserve the ability, to set authorization at this level. Incident handling requires the ability to identify the legitimate owner of credentials presented during transactions under investigation. However, this may be done in concert with a trusted partner (i.e. the site could accept a pseudonym with good provenance but later need to know who/what is the actual owner of the credentials presented, and could co-operate with that partner to determine the identity of the end entity in question).

Accordingly, every set of authentication credentials should be traceable to the identity of an individual, because this provides stronger security by way of auditability, revocation, and problem determination. However, in special cases there may be occasion to forfeit these benefits in order to provide temporary and generic identities.

For example, an Internet cafe could provide temporary (very limited lifetime) credentials authorizing use of grid resources based solely on the fact that access was purchased. Such an identity may be a pseudonym such as "Customer 24." However, please note that it may be better to achieve a solution where a Customer-ServiceProvider model (see Grid use models above) is used in such a situation.

Where credentials are generated pseudonymously or temporarily by an identity provider (home organisation) or service provider and passed to a grid node or nodes, a service level agreement must be in place between that provider and the grid community (or node(s)) that will ensure the rapid revocation of those credentials when demanded. This may be invoked by a service provider or grid node that detects a security problem associated with those credentials. It may be advantageous that this occurs rapidly and automatically, possibly to be reviewed later by a human administrator.

Other, similar identity indirections are expected:

Secure anonymous communications may still be allowable, and appropriate, for functions that do not require user authentication.

For example, in the case above of cafe access to Grid resources, the user may still require a secure conversation because the resulting data may have some proprietary value.

Assurance

An authentication system may provide multiple methods for a user to perform their initial authentication, and these methods may differ in their convenience, resistance to attack, and risks of exposure of secrets. Even when an implementation offers its users only one method, it may not be clear to relying parties which method it is. Since some inverse correlation does exist between convenience and strength of authentication, there may be inducements to allow and employ multiple levels of authentication if sites make some class of services available through weaker but less burdensome authentication methods.

A numbered scale of assurance level should exist and a value should be passed from the identity provider (home organisation) to the grid node or nodes for short term credentials, or this value should be kept permanently (implicitly or explicitly) within longer term credentials. This assurance 'grading' needs to be of a value and format recognised by an international standard or (more likely in the shorter term) be based upon a system agreed upon by collaborators within a particular grid.

The system for grading the assurance level of each authentication assertion is beyond the scope of this document. However, this assertion should be made and transmitted. If practicable, the method used to perform authentication should be deducible from credentials, but this requirement is secondary to the requirement of the transmission of assurance level.

Levels of authentication strength

This is beyond the scope of this document. The concept of "authentication strength" is too close to the concept of "assurance level". The reliability or trustworthiness of an authentication event or authentication token is based equally upon the technology used (and short/long term credentials and encryption etc.) and the policies used to initially authenticate, maintain data, and renew and revoke credentials.

Mode of storage

We recognize the following modes of storage of users' long-term secrets (whether used directly to authenticate to a grid resource or to a proxy), each with its own set of vulnerabilities:

What you know

What you have

Ranking of storage modes

It is not possible to give a strict ranking of storage modes discussed in section 4.3.3 relative to safety without asking and answering a number of questions about the details of the secrets, their storage, and their registration as the users' authentication information. Also, users may perform unsafe actions (knowingly or unknowingly) which place their secrets at much greater risk of disclosure.

Deducible Authn strength

There are a number of cases where processes running on a machine need to authenticate to other processes. Automated processes may have to act as authenticated clients and users may wish to have automatic software ("cron jobs") that require automatic authentication. All of these should be somehow restricted such that theft of credentials from an individual machine does not easily permit their reuse elsewhere. In either case, secrets will be of the "stored" class and must be considered to be stored in cleartext form, regardless of any measures which obfuscate them.

Authenticated identities of automated client processes should include identification of the machine which is intended to have access to the authentication secret.

Authentication methods based on stored secrets should indicate the machine from which they were used. If they do not, then this information must be available in auditable records.

Lifetimes

All forms of digital credential in common use are subject to possible theft and misuse. The probability of such an event is monotonically nondecreasing with time. The countermeasures against eventual credential theft are expiration and revocation. Neither measure alone is sufficient to prevent all misuse, nor is the combination of the two.

Two types of digital credential should be highlighted here to take into account the roles and behaviour of proxy parties. In many use-cases it has been found to be necessary to generate proxy credentials. These may differ in digital nature and in assurance level from the original user digital credential. They are typically shorter lived than the original digital credentials, but there may be exceptions to this generalisation. Thus, we hereby draw out the two concepts of user credential and proxy credential.

The lifetime of authentication secrets is a separate parameter from the lifetime of credentials.

(Chapter 2) Site Authorization Requirements

Terminology

Terminology used in this document strives to be consistent with that used in the Authorization Frameworks working group.

"User" is a synonym for end entity and for subject used in the more general framework document. We preserve the use of "user" since it is more widely used within the site operations community.

"Groups" refer to groups of end entities which are accorded equivalent rights for purposes of obtaining a particular set of privileges.

"Role" refers to the set of attributes an end entity is presenting with a particular request for obtaining or asserting a privilege.

"Provenance" refers to information about the history of a request or of any type of assertion. Examples include: the identity of the original requester; and the identity of the entity that is making the assertion.

Authorization Process

A VO must manage information regarding users that can be used for authorization decisions. This information may be made available to relevant trusted third parties. Thus, a typical authorization process may have several steps (for example, and not necessarily in this order: user authorization, VO authorization, site authorization, resource authorization) with various implementations. Users and VO managers must be able to rely on consistent interpretation of their policies.
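As a rough illustration of a multi-step authorization process, the sketch below runs a request through a sequence of named checks and reports the first step that refuses it. Everything here (the `Request` fields, the check names, the example VO and site names) is a hypothetical assumption for illustration and not part of any grid middleware.

```python
from typing import Callable, Iterable, NamedTuple, Tuple


class Request(NamedTuple):
    subject: str
    vo: str
    site: str
    resource: str


Check = Callable[[Request], bool]


def authorize(request: Request,
              checks: Iterable[Tuple[str, Check]]) -> Tuple[bool, str]:
    """Apply each check in turn; return (False, step name) on first refusal."""
    for name, check in checks:
        if not check(request):
            return False, name
    return True, ""


# Hypothetical checks; real ones would consult user, VO, site and
# resource policy rather than hard-coded values.
CHECKS = [
    ("user", lambda r: r.subject != ""),
    ("vo", lambda r: r.vo in {"physics-vo"}),
    ("site", lambda r: r.site in {"site-a", "site-b"}),
    ("resource", lambda r: r.resource.startswith("cluster/")),
]

print(authorize(Request("alice", "physics-vo", "site-a", "cluster/1"), CHECKS))
```

The order of the list is only one possible ordering; the steps could equally be arranged differently, as the text notes.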

The Virtual Organization must be able to decide user membership policy and allow sites to set user authorization policy. However, it is likely that a degree of co-operation between the VO and the site will be desirable in setting the site's authorization policy in most cases.

The authorization method must be application independent.

Mutual authorization may be required.

An application or end entity may need assurances that the resource is authorized to run a specific job. The distributed program or grid job in and of itself may be of value. The results may be of value and need protection from dubious resources.

A grid job may need to specify that it is only run on systems with security level B operating systems, or systems not directly connected to the Internet, or some other operations requirement. This is more relevant in the OGSA model where service factories may incorporate more resources to handle service request loads.

Maintain Provenance

The authorization mechanism must preserve the Subject Identity of the user who originated the request. Where relevant, this could be via a trusted third party that has already taken part in the authorization mechanism. In this case, the grid ServiceProvider is trusted to have carried out the authorization check on the customer and to be acting on his/her/its behalf.

Provide for method of grouping users

It may be possible to assign a user to a group. The authorization of resource access could be managed by managing permissions of the group.

Authorization Level Dependent on Authentication Strength

The authorization for access to a resource at a particular level may depend on the strength of the authentication. The level of authentication or, more likely, the level of assurance (agreed upon by the grid community, see Section 4.3), must be included with the credential information presented to all resource managers.
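One way to picture this requirement is a resource manager that compares the assurance level carried with a credential against a minimum level per action. The numeric scale, the action names and the `Credential` type below are assumptions made for illustration; the actual grading system is out of scope for this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject: str
    assurance_level: int  # hypothetical scale: higher means stronger assurance


# Hypothetical per-action minimum assurance levels set by a resource manager.
MIN_ASSURANCE = {"read": 1, "submit_job": 2, "administer": 3}


def is_permitted(cred: Credential, action: str) -> bool:
    """Permit the action only if the credential's assurance level is sufficient."""
    required = MIN_ASSURANCE.get(action)
    if required is None:
        return False  # unknown actions are denied by default
    return cred.assurance_level >= required


cafe_user = Credential(subject="Customer 24", assurance_level=1)
print(is_permitted(cafe_user, "read"), is_permitted(cafe_user, "submit_job"))
```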

Call-outs

Call-outs prior to access to resources may be provided as a form of authorization control for use by the virtual organization, the site(s) and each resource provider.

Revocation

There must be the ability to quickly revoke a particular remote authorized service that may be operated under dubious procedures. The timescale for this revocation should be of order 0.1 Ms.

For example, if a remote processing resource steals computation results, it should be removed from the directory of processing resources. This is difficult in the context of current Grid technology because of the open resource registration process and aggressive discovery algorithms. Similar directory services on the Internet have a history of exploitation.

Authorization Attributes

Attribute Authorities

In expected grid operations, authorization attributes are managed by authorization authority servers run by VOs, by sites or by other authoritative entities. These authorization attributes may contain specific authorization privileges, indicating to sites that the user should be authorized to act in a particular role, or may contain statements of membership in a particular group within the VO.

Numbers of Attributes

Users or end entities may have any number of roles within a given Virtual Organization. Whereas VOs may choose to structure themselves and express recommended authorization policy in an arbitrary form, resource providers need appropriate mechanisms to enforce that policy in the local authorization infrastructure. Therefore, the user attributes should be stored in a standard form, and the recommended policy should be expressed from the VO authorization authority server to the site in a manner agreed between the VO and the site.

Users or end entities may be members of any number of Virtual Organizations.

Currency of Membership

Assertions of membership in roles and groups within a VO must be able to be validated by relying parties. Validation of such assertions should not succeed more than 0.1Ms after an authority removes the subject's membership.
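A relying party could meet this bound by accepting a membership assertion only if the authority confirmed it within the last 0.1 Ms. Since the last confirmation necessarily precedes any removal, the assertion then stops validating no later than 0.1 Ms after the removal. The sketch below assumes a hypothetical `confirmed_at` timestamp on each assertion; it is one possible approach, not a prescribed mechanism.

```python
import time
from dataclasses import dataclass
from typing import Optional

MAX_STALENESS_S = 100_000  # 0.1 Ms, from the requirement above


@dataclass(frozen=True)
class MembershipAssertion:
    subject: str
    group: str
    confirmed_at: float  # epoch seconds of the authority's last confirmation


def is_current(assertion: MembershipAssertion,
               now: Optional[float] = None) -> bool:
    """Accept the assertion only if it was confirmed within the last 0.1 Ms."""
    now = time.time() if now is None else now
    age = now - assertion.confirmed_at
    return 0 <= age <= MAX_STALENESS_S


fresh = MembershipAssertion("alice", "analysis", confirmed_at=time.time())
print(is_current(fresh))
```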

Resource Administrators Authorize by Groups and Roles

VO attributes describing the roles and groups must follow a published standard, agreed upon at least within the domain of the VO. This consistency gives the Authorizer or Resource Administrator a manageable and trusted view of the membership pool. The administrator must be able to trust the currency of the roles and groups. This removes the need for the Authorizer to have an understanding of each member. The Authorizer needs only to understand the groups and roles within this assigned membership pool.

User Selection of roles

A user must be able to select and de-select VOs and roles for a specific access. This is analogous to the substitute user or 'su' command on UNIX systems, which allows an entity to change its current role briefly for a critical section before returning to a role and access privilege that is less vulnerable or less potentially dangerous.

In addition, a user should be able to individually define the set of privileges to be used with a specific service request. This allows for least privilege access tailored to the requested service and increases system security.
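The two requirements above (selecting roles, then narrowing privileges for a single request) might be sketched as follows. The role names, the privilege names and the `ROLE_PRIVILEGES` table are purely hypothetical.

```python
from typing import Iterable, Optional, Set

# Hypothetical mapping from role to the privileges it carries.
ROLE_PRIVILEGES = {
    "analyst": {"read"},
    "production": {"read", "submit_job"},
    "admin": {"read", "submit_job", "delete"},
}


def effective_privileges(held_roles: Iterable[str],
                         selected_roles: Iterable[str],
                         requested: Optional[Iterable[str]] = None) -> Set[str]:
    """Privileges of the selected roles, optionally narrowed to those requested."""
    active = set(selected_roles)
    if not active <= set(held_roles):
        raise PermissionError("cannot select a role that is not held")
    granted: Set[str] = set()
    for role in active:
        granted |= ROLE_PRIVILEGES.get(role, set())
    if requested is not None:
        granted &= set(requested)  # least privilege for this request
    return granted


# A user holding two roles activates only one, and asks only for "read".
print(effective_privileges({"analyst", "admin"}, {"admin"}, {"read"}))
```

Intersecting with the requested set means a request can only ever narrow what the selected roles allow, never widen it.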

Policies

Authorization decision criteria

The owner of a resource or data should be able to allow or deny the authorization of an end entity to carry out an action using any of the following criteria:

Precedence rules for applying authorization decision criteria must be clearly stated.

Source of authorization also a decision criterion

It may be desirable for a resource manager to be able to disable access based on the source of the authorization attributes presented in case of compromise of a particular remote attribute authority.

Combinations

The authorization method should allow any combination of the above authorization requirements, including any combination of VOs and roles (see requirement 5.3.2). Nevertheless, this is still a business decision to be taken between the resource owner and the VO/Attribute Authority.

Authorization may be based on Operation criteria

It should be possible to base authorization on any of the following, in addition to the authorization requirements of section 5.4.1.

Granularity of Authorization

Depending on the application scenario, the granularity requirement for authorization decisions varies from fine grain (e.g. based on individual subject, requested action, privilege restrictions, and assets involved) to coarser-grained authorization on the basis of groups or even sites. Support for role based access control mechanisms is specifically requested for future collaborative environments but may also be desirable for other grid systems.

Collections

There should be no restrictions on the degree/level of granularity of authorization. In particular, there should be no hard-coded limits on how the granularity is set. This should include, for example, allowing authorization to a hierarchy of directories, individual directories, or individual files. Supporting a high level of granularity may become burdensome for the resource, so it is left to the resource to set a practical level of granularity by collecting objects into manageable sets.

Catalog by user

It must be possible to determine the list of resources to which an end entity has access and what actions that entity is allowed to carry out as a member of the VO(s) and role(s) set for the current session. The burden of creating this list is on the end entity. It is left to the end entity to know, look up or discover the resource and to query for access permissions. This relieves the resource from having to know how to report to the end entities. This also averts a security vulnerability similar to the historical NIS (Network Information Services) hack in which the complete access lists being pushed to slave servers were intercepted and exploited. It is recommended that resources reveal access permissions only to the authenticated entities that hold these permissions and to administrative entities (see also next paragraph).

Catalog by role

It must be possible to determine if a role or group has access to a resource. This access information is necessary to accurately stage and schedule jobs. This access information is sensitive because it could be used to exploit the Grid's security. For example, a hacker who knows that Bob has access to the targeted resource can turn his or her attention to Bob or to Bob's home computer.

Therefore, the following access levels are needed: A resource's access information must be accessible in its complete form to the administrator of that resource and to security personnel for security audit and forensic purposes. Authenticated users may have information about all accesses they are allowed on that resource using the asserted identity and authorizations. Others must have access to authorization data only in the form

Authorization control points

Authorization Policy Change Control

Policy coherency tools needed

Authorization policies may change over time. Mechanisms to manage policy specification across the administrative domain of the resource, site, VO, application manager, and user should be provided.

Timely updates of policy needed

A time delay between publication of a policy change and implementation or enforcement is to be expected. There should be prompt implementation of policy change. The resource manager will implement the policy change and log compliance. The resource manager will define a prompt and reasonable time delay appropriate for the resource. Policy changes may require verification and validation before deployment.

Suspension of privileges should not delete policy

Sites and virtual organizations should have the ability to suspend resource authorization for a particular grid identity without actually deleting the authorization and therefore possibly losing tracking information.
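A minimal sketch of suspension that keeps the underlying authorization record (and its history) intact is shown below. The `Grant` and `AuthorizationTable` types are hypothetical and stand in for whatever store a site or VO actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Grant:
    identity: str
    resource: str
    suspended: bool = False
    history: List[str] = field(default_factory=list)  # tracking information


class AuthorizationTable:
    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Grant] = {}

    def grant(self, identity: str, resource: str) -> None:
        self._grants[(identity, resource)] = Grant(
            identity, resource, history=["granted"])

    def suspend(self, identity: str, resource: str, reason: str) -> None:
        g = self._grants[(identity, resource)]
        g.suspended = True  # the record itself is kept, not deleted
        g.history.append(f"suspended: {reason}")

    def reinstate(self, identity: str, resource: str) -> None:
        g = self._grants[(identity, resource)]
        g.suspended = False
        g.history.append("reinstated")

    def is_authorized(self, identity: str, resource: str) -> bool:
        g = self._grants.get((identity, resource))
        return g is not None and not g.suspended

    def history(self, identity: str, resource: str) -> List[str]:
        return list(self._grants[(identity, resource)].history)


table = AuthorizationTable()
table.grant("alice", "cluster/1")
table.suspend("alice", "cluster/1", "incident under investigation")
print(table.is_authorized("alice", "cluster/1"), table.history("alice", "cluster/1"))
```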

Transparency

Directory of user's roles

VOs should provide a method for obtaining membership and role/group information for a given user. Examples of this might include extended attributes within the user's proxy certificate and SAML attribute assertions containing agreed user attributes that are related to roles or privileges.

Transparency of Authorization information and policy

Certain groups or roles may require additional authorization before membership information is released (so as to not leak information about which accounts are privileged).

Protection of Authorization-informing attributes

Alterations of the information should only be possible through secure, authenticated access paths using procedures such that the sites are willing to trust the role / membership information returned. This requirement may involve a detailed description of how virtual organizations maintain and protect this data (similar, perhaps, to a Certificate Policy / Certification Practices Statement for Certificate Authorities).

(Current proxy certificate specifications ensure that proxy and delegation operations never require private keys to be sent across the network. It is important to state clearly to developers that all future protocols regarding proxy certificates must continue this practice. If it is necessary to send a passphrase or password across the network, it needs to be encrypted at a strength equivalent to the strength of the key.)

Dynamic Revocation of authorization

There is a dynamic nature to authorized access in that it may depend on the resource load, quality of service, or time of day. If authorization changes during access, an error code should be propagated back to the application or the application should query for the authorization deny qualifier.

Standard Error Codes

Consistency and transparency for the application are aided by the use of standardized error codes for authorization denials. The error information should not provide more information than necessary, lest it create a security risk. An error return code may be accompanied by a log entry number to assist the resource administrator in locating the denial instance. For example, a user may call a helpdesk to report access problems, giving the error code and log entry number. The resource administrator can use this log entry number to look up the detailed information.
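The pattern described above, a terse standard code for the caller with the detail kept only in the resource's log, might look like the following sketch. The code values, the in-memory log and the function names are illustrative assumptions; this document does not define a set of standard codes.

```python
import itertools
from enum import Enum
from typing import Dict, Tuple


class DenialCode(Enum):
    # Hypothetical codes, for illustration only.
    NOT_AUTHORIZED = 1
    POLICY_CHANGED = 2
    RESOURCE_UNAVAILABLE = 3


_log_ids = itertools.count(1)
_audit_log: Dict[int, Tuple[DenialCode, str]] = {}


def deny(code: DenialCode, detail: str) -> dict:
    """Record the full detail locally; return only the code and a log entry number."""
    entry = next(_log_ids)
    _audit_log[entry] = (code, detail)
    return {"error": code.name, "log_entry": entry}


def lookup(entry: int) -> Tuple[DenialCode, str]:
    """Administrator-side lookup of the detail behind a reported denial."""
    return _audit_log[entry]


response = deny(DenialCode.NOT_AUTHORIZED, "subject lacks role 'production'")
print(response)                       # what the application sees
print(lookup(response["log_entry"]))  # what the administrator can retrieve
```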

Role Confirmation

Trust Model

It must be possible for the resource to confirm that a user has the VO membership(s) they claim. This is done through the trust model with the authority vetting the identity of the user. With respect to mechanisms for this via X.509 digital certificates, this is described in the "CA-based Trust Model for Grid Authentication and Identity Delegation" from the GGF Grid Certificate Policy Working Group.

Timeliness

It must be possible for the resource to confirm the user's claimed role(s) or group membership at the time access to a resource is requested. For example, in the Globus environment, resources assign these groups via the grid-mapfile.

Privacy

It must not be possible for unauthorized users to produce a list of members of a VO, or the list of VOs to which a user belongs. Authorized VO administrators may have access to the full list of members.

Operations

Logging

Logs documenting the resource access decisions, policies, policy changes, and resource implementation of policies should be kept. The virtual organization, site(s) and resource managers should log such events and retain these logs for 10Ms (approximately 4 months). The logs should be protected to ensure privacy and integrity.

Logs should be frequently archived on a machine different from the one on which they were generated.

When archived, the logs should be digitally signed by the archive server.

Revocation

It must be possible for an authorized VO administrator to revoke all of a user's authorizations based on VO membership by removing the user from the VO.

It must be possible for an authorized VO administrator to revoke a user's assertion of privileges by removing the user's ability to claim a given role, a number of roles, or other attributes issued by an authority.

Revocation Timeliness

Authorization revocation should be done in a time frame consistent with the authentication revocation of 0.1Ms.

Fault Tolerance

Grids should gracefully survive partitioning so that local services can continue to operate if a resource is disconnected or during a DoS attack. This may require redundant or distributed Authorization Services.

Providing credentials to service

The authenticated identity (where needed) and authorization-supporting attributes that a user presents should be made available to the execution environment by something like a gatekeeper or job manager. In other words, the gatekeeper may have passed a request based on the presented credentials, but if this results in delegation of the request (e.g. running a job), the authorization and/or identity credentials should be made available to the final execution environment via some standard mechanism.

Authorization for Replicated Data

Dependency on unreplicated authorization service.

If files are replicated, authorization for access to this replicated data should not depend on the availability of a single source of authorization. Simply put, the source site and its authorization server should be able to go down without affecting access to the replicated data at other sites. Otherwise the service is not truly replicated.

Consistent authorization on all replicas.

The authorization requirements on data access should be consistently applied for all replicas of the same data.

(Chapter 3) Site Accounting and Audit Requirements

Accounting and Audit Requirements Introduction

Accounting has historically had close ties to Authentication and Authorization because of the certainty with which they need to identify the entity to be associated with the accounting data. This is particularly important in the areas of security audits, intrusion detection, and computer and network forensics.

Accounting also has importance beyond accurate billing. IT management use accounting for controlling and managing operational costs. Accounting links to other IT disciplines such as capacity planning, service level management, and performance management.

Terminology

Grid Resource Accounting

Grid resource accounting is accounting in the more traditional sense: it accounts for resource usage and billing.

Grid Auditing

Grid Auditing focuses on accounting as a security component, and on the need for a seamless relationship between accounting and the authentication and authorization components of the Grid. Simply put, with a small addition to existing accounting data, an audit mechanism could greatly enhance Grid security.

Monitoring

The term "monitoring" refers, in the accounting and audit context, to the recording of transaction data. It is synonymous with "logging" in this document and does not imply timely human oversight.

Requirements Gathering

Requirements Gathering for Grid Accounting

Requirements for Grid accounting focus on how the monitoring and metering of authentication and authorization support security auditing. This information binds an end entity to the resource for the time and duration of access. The consumers of this information are Grid administrators, the helpdesk, intrusion detection and computer forensics.

Requirements Gathering for Grid Resource Accounting

It is important to understand how the audit data will be used. This will help define the accounting data gathered and the data flow. It is the goal of this document to describe the requirements of Grid accounting and audit components which satisfy a broad range of instances and usage. This chapter will also identify other current Grid working groups and accounting standards that are addressing these needs.

Non-Goals

This chapter will consider the consumers of the accounting data and their requirements, but will not analyze the consumers or make recommendations on how consumers should process the accounting data. It is not the goal of this chapter to reproduce or reinvent past accounting standards or duplicate current Grid accounting work.

Grid Auditable Data

Grid auditing examines accounting requirements from a security perspective: audit logs, intrusion detection, and forensics. These requirements are not disjoint from those of mainstream accounting concerned with billing and metering, but in this section the requirements are described from the security perspective.

Grid Accounting must log - or be able to construct via trusted authorities following security problems - the following data per resource access.

See also section 4.2.1 for further relevant details of these pseudonymity issues.

Resource Identification (RID)

The resource must be identified. The resource identity can be layered or accumulative or onion fashioned. This identification may be any or all of the following and more:

The RID should be descriptive of the state of the resource. For example, if the resource is a file, the exact content of the file at the time of access would be an optimal piece of information for a forensic analysis. This type of metadata is difficult and expensive to maintain, and usually requires replay logs for the most accurate view of the data at and during the time of access. Nonetheless, the more accurate the accounting description of the resource, the more options are open for damage assessment and recovery.

End Entity Identification (EEID)

The EEID accurately describes the end entity to the resource. Commonly this will be a GSI proxy certificate, which can be traced back to a credential from some trusted source of identity. This may also be some sort of pseudonymous identifier with good provenance, as long as it can be traced - according to the security policy in place - back to the real identity of the end entity. There are a number of requirements related to the handling of the EEID.

EEID logging

Information tying the EEID to the processes executed on its behalf should be kept as part of the Grid auditable monitor data.

This data should not be recorded locally but should be reported to a remote central system.

When there are explicit privacy or confidentiality requirements, a specific central system should be used and these data should be encrypted in transit and should not be available for query to any but an explicitly authorized entity.

The provenance of the process or job must extend to the true origin.

If a process inherits credentials beyond the subset of its current credentials, an alarm should be triggered.
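The last requirement can be read as a set comparison: a child process should hold no credential that its parent did not hold. The sketch below works under that interpretation, which is an assumption; credentials are represented as plain strings and the alarm is a hypothetical callback.

```python
from typing import AbstractSet, Callable, FrozenSet


def excess_credentials(parent: AbstractSet[str],
                       child: AbstractSet[str]) -> FrozenSet[str]:
    """Credentials held by the child that the parent did not hold."""
    return frozenset(child) - frozenset(parent)


def check_inheritance(parent: AbstractSet[str],
                      child: AbstractSet[str],
                      alarm: Callable[[FrozenSet[str]], None]) -> bool:
    """Trigger the alarm if the child exceeds the parent's credentials.

    Returns True when the child's credentials are a subset of the parent's.
    """
    excess = excess_credentials(parent, child)
    if excess:
        alarm(excess)
    return not excess


raised = []
check_inheritance({"vo:member"}, {"vo:member", "vo:admin"}, raised.append)
print(raised)  # the alarm received the unexpected credential
```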

An illustrative example specific to the Globus toolkit may help clarify the reasoning for these requirements.

When intrusion detection at the file system level is triggered, it identifies the PID (process ID) of the offender. Via the system process table, the associated UID (user ID) and PPID (parent process ID) can easily be identified. When a Grid job is submitted and runs on a Grid resource, the parent process runs under the UID mapped to the certificate in /etc/grid-security/grid-mapfile during the authorization process. Many certificates may be mapped to the same UID. This masks the audit trail needed to link the offending process back to the EEID.
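The many-to-one mapping can be made concrete with a small sketch that inverts a grid-mapfile, listing the certificate subjects behind each local account. It assumes the common grid-mapfile layout of a quoted subject followed by a comma-separated list of local accounts; the sample entries and the function names are hypothetical.

```python
import shlex
from collections import defaultdict
from typing import Dict, List

# Hypothetical sample entries; a real file lives at /etc/grid-security/grid-mapfile.
SAMPLE_GRID_MAPFILE = '''
# subject                             local account
"/C=US/O=Example/OU=Grid/CN=alice" griduser
"/C=US/O=Example/OU=Grid/CN=bob" griduser
"/C=US/O=Example/OU=Grid/CN=carol" carol
'''


def accounts_to_subjects(text: str) -> Dict[str, List[str]]:
    """Invert the mapfile: local account -> certificate subjects mapped to it."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, accounts = shlex.split(line)  # shlex honours the quoted subject
        for account in accounts.split(","):
            mapping[account].append(subject)
    return dict(mapping)


def ambiguous_accounts(mapping: Dict[str, List[str]]) -> List[str]:
    """Accounts for which the UID alone cannot identify a single EEID."""
    return sorted(acct for acct, subjects in mapping.items() if len(subjects) > 1)


print(ambiguous_accounts(accounts_to_subjects(SAMPLE_GRID_MAPFILE)))
```

For any account reported as ambiguous, the process table alone cannot say which certificate holder started a given process.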

The two crucial pieces of information are the PID of the process running on the Grid resource and the EEID responsible for initiating this process. Both the PID and the EEID are known but not necessarily recorded consistently or together. The globus-gatekeeper will log the EEID at authentication time in the syslogd data.

For example,

  Feb 14 09:31:32 ipsec GRAM gatekeeper[29452]:
  Authenticated globus user:
  /C=US/O=IBM/OU=GridLPP/OU=austin.ibm.com/CN=shawnm

In this example, the EEID can easily be traced via the CA and RA back to a single user. The disconnect occurs with the recording of the PID of the actual process that is run on behalf of the EEID on the Grid resource. The PID is returned to the initiator in the form of a JobID.

For example,

 % globus-job-submit ipsec /bin/ls ls /tmp
 https://ipsec.austin.ibm.com:62960/27126/1045236692/

The middle number is the PID of the 'ls' command run on the Grid resource ipsec.austin.ibm.com. The JobID, which contains the PID, and the EEID should be sent as part of the Grid auditable monitor data. This data should not be recorded locally, because a local record gives an intruder the means to cover their tracks. All Grid data should be reported to a remote central system. The provenance of the process or job must extend to the true origin. The GSI model allows for the propagation of jobs and the inheritance of security credentials. Simply put, as a job propagates from Grid resource to Grid resource, the EEID must remain consistent or any transition of identity must be logged.
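The pairing of EEID and PID described above could be assembled along the following lines. This is a sketch only: it assumes the gatekeeper log entry shown earlier is a single syslog record (wrapped there for display), that the PID is the first path segment of the JobID, and the function and field names are hypothetical.

```python
import re
from typing import Dict, Optional, Union
from urllib.parse import urlsplit

# The two samples are the ones quoted in this section.
SAMPLE_LOG = ("Feb 14 09:31:32 ipsec GRAM gatekeeper[29452]: "
              "Authenticated globus user: "
              "/C=US/O=IBM/OU=GridLPP/OU=austin.ibm.com/CN=shawnm")
SAMPLE_JOB_ID = "https://ipsec.austin.ibm.com:62960/27126/1045236692/"

LOG_PATTERN = re.compile(
    r"gatekeeper\[\d+\]:\s+Authenticated globus user:\s+(?P<eeid>/.*\S)\s*$"
)


def eeid_from_log(line: str) -> Optional[str]:
    """Extract the authenticated subject from a gatekeeper log record."""
    match = LOG_PATTERN.search(line)
    return match.group("eeid") if match else None


def pid_from_job_id(job_id: str) -> int:
    """Extract the PID, assumed to be the first path segment of the JobID."""
    segments = [s for s in urlsplit(job_id).path.split("/") if s]
    return int(segments[0])


def audit_record(log_line: str, job_id: str) -> Dict[str, Union[str, int, None]]:
    """Bind the EEID to the PID in one record for remote reporting."""
    return {
        "eeid": eeid_from_log(log_line),
        "host": urlsplit(job_id).hostname,
        "pid": pid_from_job_id(job_id),
        "job_id": job_id,
    }


print(audit_record(SAMPLE_LOG, SAMPLE_JOB_ID))
```

A record of this kind, sent to the remote central system, keeps the two crucial pieces of information together.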

The numbering is wrong in the original Mullen et al document. We have inserted a blank paragraph to keep our numbering schemes in synch.

BLANK

End of dummy inserted paragraph

Authentication and Authorization

Knowing the provenance of a job should allow the audit trail to quickly discern the authentication and authorization used to gain access to the Grid.

Again, consider the example of the Globus Toolkit.

The EEID or proxy certificate is logged by the gatekeeper on the Grid resource. This logs the authorization process. The actual authentication took place on the provenance node with grid-proxy-init when the passphrase was entered and the proxy certificate created. The authentication process should be logged. Currently it is not possible to distinguish between a valid authentication via grid-proxy-init and the stealing of the proxy certificate out of /tmp.

This is analogous to the "su" command (substitute user) which is logged by syslog and in sulog. When the grid-proxy-init command is issued the user is taking on the identity of a particular Grid user. This information should be part of the Grid auditable data.

Action, Time and Duration

This section will have some intermingling of the accounting requirements as they relate to security and to resource management. This is done to illustrate that the same accounting data is used for two very different purposes.

The attempted action of the process running on the Grid resource should be part of the Grid accounting data.

The action of the process may be attempted but unsuccessful or denied. As an example, consider failed su attempts or failed logins. Action attempts are critical for behavior-based components of Intrusion Detection Systems (IDS).

Alternatively, failed actions may be a consequence of a resource shortage or outage. This is useful to track for diagnostics or dynamic resource management. For example, in the Open Grid Services Architecture (OGSA) model this information could be used to create an additional service factory.

The time and duration of the process running on the Grid resource should be part of the Grid accounting data.

The time and duration are critical to computer forensics, as they allow for the creation of a timeline of activity. Action, time and duration are important to intrusion detection, On Demand or dynamic services, and autonomic or self-healing services.
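A record carrying action, time and duration might look like the sketch below, which also shows how such records yield a timeline and a list of failed attempts. The AuditEvent fields and the helper names are hypothetical and are not taken from any accounting standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEvent:
    """One attempted action by an end entity on a Grid resource."""
    eeid: str
    action: str
    succeeded: bool
    started: datetime
    duration: timedelta

    @property
    def ended(self) -> datetime:
        return self.started + self.duration


def timeline(events: Iterable[AuditEvent]) -> List[AuditEvent]:
    """Order events by start time, as a forensic timeline would."""
    return sorted(events, key=lambda e: e.started)


def failed_attempts(events: Iterable[AuditEvent], eeid: str) -> List[AuditEvent]:
    """Denied or unsuccessful actions by one end entity, in time order."""
    return [e for e in timeline(events) if e.eeid == eeid and not e.succeeded]


t0 = datetime(2004, 2, 14, 9, 0, tzinfo=timezone.utc)
events = [
    AuditEvent("/CN=alice", "read /data/a", True, t0 + timedelta(minutes=5),
               timedelta(seconds=2)),
    AuditEvent("/CN=alice", "write /data/a", False, t0, timedelta(seconds=1)),
]
print([e.action for e in timeline(events)])
```

The same records serve both purposes named in this section: the failed attempts feed intrusion detection, while start times and durations feed resource management.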

Grid Accounting and Audit Data Conclusion

In a Grid environment it is important to monitor a causally connected sequence of events. It is important to be able to traverse this sequence of events from authentication to action taken on the remote resource. The proper accounting data can enable intrusion detection, the detection of malicious behavior and provide security audit trails.

Requirements Gathering for Grid Resource Accounting

Grid accounting is closely affiliated with security, but the more traditional computer accounting belongs more in the GGF Scheduling and Resource Management (SRM) Area. Specifically, the Resource Usage Service Working Group (RUS-WG) is relevant. It is not viewed as an abdication of responsibility to leave this section to other GGF working groups. It is viewed as an efficient means of coordination between different GGF groups.

Existing standards and practices

Accounting Institutes

We have not been able to find any standards from computing or IT accounting relating to traditional financial accounting or from other standards bodies such as OASIS or Liberty Alliance. In the IETF, work has been done in this area (but not necessarily relating to Grid) in the following set of IETF RFCs.

Of the RFCs, the reviewing author (Mullen et al.) found the RADIUS Accounting standard to be the most interesting, since the nature of securely logging onto a network via RADIUS is similar to the nature of securely logging onto a Grid. There is considerable work in this standard that may be leveraged in implementing a Grid Accounting standard.

ESPGRIDwiki: RequirementsDoc (last edited 2013-05-17 16:26:46 by localhost)