#acl MarkNorman:admin All:read


~+'''What is a VO? - towards a definition'''+~
-------
This page has now been frozen.  However, if you have comments, please email them to me (MarkNorman) and I shall incorporate them on this or the [[VODefinitionComments]] page.

Thanks,
-- MarkNorman <<DateTime(2006-04-24T17:43:41Z)>>

------

~+Contents+~

[[#prelim|Preliminary note]]

 [[#noresources|Why we shouldn't include 'resources' in a VO]]

 [[#primauthz|The concept of VO is of primary importance to authorisation]]

 [[#newterm|We may have to invent a new term]]

[[#General|General Expectations of a VO]]

 [[#1stPrinc|First principles]]

 [[#Userperspec|What is a VO from a user's perspective?]]

 [[#ServicePerspec|What is a VO from a service developer's (or owner's) perspective?]]

 [[#RPsPerspec|What is a VO from a resource provider's perspective?]]

  [[#ResVirtAbs|Where resource virtualization is not present]]
 
  [[#ResVirtExists|In the future, where resource virtualization exists]]
 
 [[#Respons|Responsibilities diagram]]

[[#Adef|A definition?]]

 [[#GenNotes|General notes and justifications]]
 
 [[#Definition|First attempt at a definition]]

 [[#PrevWork|Previous Work]]

 [[#comments|Comments from readers]]

[[#FinalWord|Final word: why we shouldn't include resources]]

----

<<Anchor(prelim)>>
= Preliminary note =

<<Anchor(noresources)>>
== Why we shouldn't include 'resources' in a VO ==
"VO" means different things to different people (which is part of the reason to begin this 'defining' work).  In the text that I (-- MarkNorman <<DateTime(2006-02-21T12:53:52Z)>>) have begun for this wiki activity, below, I have tried to capture what the non-grid community believes VOs to be (in common parlance).  A lot of 'grid people' also use this term in this way.  However, in some well-established (and good) work the term has become established to mean a combination of people and shared resources (machines).  This latter conflation makes thinking about authorisation very difficult.

Some definitions (see [[#GenNotes|General notes and justifications]], below) assume that the "groups" in the VO will ''usually'' be sharing resources for some activity.
 For example, a set of bioinformaticians at the University of Oxford may be working closely with a group at Harvard and they wish to share their computational resources, services and/or applications.

However, we should not be tied to this definition.  The word "usually" isn't helpful!  A VO can exist where one (group of) person(s) is not contributing any hardware.
 For example, the biologist who has joined the International Ecology Society (in the 'hypothetical' below) does not have resources to contribute. She is a member of the society and therefore benefits in terms of grid computing usage.
 
 Example 2: Consider the two groups of bioinformaticians (as above) but where they have no grid resources of their own.  They may form an ad hoc group (based on some international funding over a short term) and some grid resources around the world may be made available.

<<Anchor(primauthz)>>
== The concept of VO is of primary importance to authorisation ==
In a secure system, authorisation belongs to the resource (or service) owners.  When a job comes to them, they need to know that it comes from a user who is a member of a VO.  They ''don't'' need to worry about:
 * does my machine belong to this same VO?
 * is this user contributing anything to the VO?
Those questions are too complex to be answered as part of an authorisation decision.  The resource (or service) needs just to answer the question: ''if I allow users of status B from VO 1 to use my facilities, does this user have status B from VO 1?''
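
As an illustration only, the single question the resource has to answer can be sketched in a few lines of Python.  The names `VOAssertion` and `ResourcePolicy`, and the example users, are hypothetical and do not describe any existing middleware.  Note that the check never asks whether the resource belongs to the VO, or what the user contributes to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VOAssertion:
    """What a VO states about one user (hypothetical structure)."""
    user_id: str
    vo: str
    status: str


@dataclass
class ResourcePolicy:
    """Local policy of the resource owner: the (VO, status) pairs that may use the resource."""
    allowed: set = field(default_factory=set)

    def permits(self, assertion: VOAssertion) -> bool:
        # The only question asked: does this user hold an allowed status in an allowed VO?
        return (assertion.vo, assertion.status) in self.allowed


policy = ResourcePolicy(allowed={("VO 1", "B")})
print(policy.permits(VOAssertion("alice", "VO 1", "B")))  # True
print(policy.permits(VOAssertion("bob", "VO 1", "A")))    # False
```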


<<Anchor(newterm)>>
== We may have to invent a new term ==
We also have the term "Request-based VOs" (see PreviousVoWork).  This is a ''bit'' closer to what the rest of this page is attempting to achieve, but it still conflates resources and participants to a certain degree.  The "VO" that I have tried to define is the kind of thing that people are using VOMS servers for at the moment, I believe, and clearly separates the resource providers from the users (which you ''need'' to do if you're thinking about authorisation).

If "VO" is too widely accepted as depicting collaborating groups of researchers who are also sharing grid (computing) resources, then we may need a separate term to use when we are considering the action of authorising users (and possibly other entities).  Collaborating groups with resources to share sound more like "grids" or "grid collaborations" and they may have complex policies that will impact authorisation policies.  "VOs" sound more like ad hoc groups of people who may have been granted the use of some grid resources.  This latter definition works for both collaborations with ''and without'' resource sharing, and is useful for both cases.  So we need a term for that concept.  Many people think that the term should be "VO" (but many people think that entities within a VO must be sharing resources).

So, please read the following, bearing in mind that we may eventually have to pick another term for this type of VO!

Thanks for reading, and hopefully, contributing!

Mark -- MarkNorman <<DateTime(2006-02-21T12:53:52Z)>>.


<<Anchor(General)>>
= General Expectations of a VO =

<<Anchor(1stPrinc)>>
== First principles ==
(N.B. "authN" = Authentication; "authZ" = Authorisation)
  1. AuthZ is performed at the resource or service.  It isn't the direct responsibility of the VO to do this (although this is usually the effective outcome - controversial point, but please keep reading!).
  1. The VO houses and maintains attributes about users.  These are given, on demand or 'up front', to the resource/service where the access decision is made.
  1. AuthN is not normally performed at the VOs (this would arguably make them "O's").
  1. Groups of resources are not VOs.  They might be "grids" or they may have another collective name (e.g. a campus grid is not a VO: it is a set of resources used by a sub-set of people on and off the campus).  The set of users (and other entities) may be a VO.
  1. In the examples that follow, the terms "grid" and "grids" denote groups of resources that are typically collaborating in some way.  Typically, they would share the same middleware or key protocols. However, a single resource could theoretically belong to many grids.
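
Principles 1 and 2 can be sketched as follows.  This is an illustration under stated assumptions: the classes `AttributeVO` and `Resource`, and the attribute name `role`, are invented for the example.  The VO only stores and releases attributes; the access decision stays at the resource, whether the attributes arrive 'up front' with the request or are fetched on demand.

```python
from typing import Dict, Optional


class AttributeVO:
    """Hypothetical VO: holds attributes about users, makes no access decisions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._attributes: Dict[str, Dict[str, str]] = {}

    def add_member(self, user_id: str, **attributes: str) -> None:
        self._attributes[user_id] = dict(attributes)

    def attributes_for(self, user_id: str) -> Optional[Dict[str, str]]:
        # Returns a copy of the member's attributes, or None for non-members.
        attrs = self._attributes.get(user_id)
        return dict(attrs) if attrs is not None else None


class Resource:
    """Hypothetical resource or service: the authZ decision is made here."""

    def __init__(self, vo: AttributeVO, required_role: str) -> None:
        self.vo = vo
        self.required_role = required_role

    def authorise(self, user_id: str,
                  pushed: Optional[Dict[str, str]] = None) -> bool:
        # 'Up front': attributes arrive with the request (a real system would
        # have to verify that the VO issued them; that step is omitted here).
        # 'On demand': the resource asks the VO itself.
        attrs = pushed if pushed is not None else self.vo.attributes_for(user_id)
        return attrs is not None and attrs.get("role") == self.required_role


vo = AttributeVO("IES")
vo.add_member("biologist", role="member")
service = Resource(vo, required_role="member")
print(service.authorise("biologist"))                      # on demand: True
print(service.authorise("biologist", {"role": "member"}))  # up front: True
print(service.authorise("stranger"))                       # not a member: False
```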

<<Anchor(Userperspec)>> 
== What is a VO from a user's perspective? ==
An (end) user (i.e. someone who does not run a service or own a grid node) should know (or find out) that for him to use a particular grid service, or collection of services, he must join a particular VO.

 For example, a biologist wants to run her data through the grid-based extinction-rate algorithm service.  She finds out that this is provided by computer scientists working at the International Ecological Society.  She joins the IES.  When she attempts to use the Extinction Rate Grid Application,  the underlying service finds out that she is a member of the IES and allows her to proceed.

<<Anchor(ServicePerspec)>>
== What is a VO from a service developer's (or service owner's) perspective? ==
A service may be provided and maintained within a grid by someone who decides to (or is mandated to) serve users within a particular community.  The VO represents that 'community'.

 For example, a developer has been funded by the IES to provide a service for all authenticated members of higher education institutions throughout the world as well as all members of the IES.  The service developer is partly responsible for ensuring that the service cannot be used by people outside these communities.

<<Anchor(RPsPerspec)>>
== What is a VO from a resource provider's perspective? ==
A resource provider may own machines upon which grid services are run.  The resource provider may have a personal preference (or one which comes from his organisation) as to which services are run on his resource.

 For example, the resource provider may wish to exclude all biological services in favour of providing resource for text-mining services.  Therefore, he does not have to worry about biological VOs.

Where the resource provider allows services, he may wish to account for the use of his resource.  He could do this in one of three ways:
 1. Count the cycles used by particular services or applications and bill the owners/maintainers of those services/apps (and leave them to charge their users if they wish to do so).
 1. Identify every user and bill them directly for cycles used.
 1. (As a combination of the above), identify every user and look them up in the VOs, so as to exclude members of those VOs to which he wishes to provide his resource free of charge.  Bill the remaining users.
 (Clearly #1 in the above list is the easiest, logically, but mechanisms need to exist to enable this scenario.)
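
The third option might look something like the sketch below.  The usage records, the VO membership table, the VO names and the flat price are all made up for illustration; how a provider would actually obtain membership information from a VO is left open.

```python
from collections import defaultdict

# Hypothetical data: CPU-hours per job and VO memberships per user.
usage = [("alice", 10.0), ("bob", 4.0), ("alice", 2.5), ("carol", 7.0)]
vo_membership = {"alice": {"IES"}, "bob": {"TextMining"}, "carol": set()}
free_vos = {"IES"}          # VOs whose members are served free of charge
price_per_cpu_hour = 0.5    # assumed flat rate


def bills(usage, vo_membership, free_vos, price):
    """Return the amount owed per user, skipping members of any free VO."""
    totals = defaultdict(float)
    for user, cpu_hours in usage:
        if vo_membership.get(user, set()) & free_vos:
            continue  # member of a VO that is not charged
        totals[user] += cpu_hours * price
    return dict(totals)


print(bills(usage, vo_membership, free_vos, price_per_cpu_hour))
# {'bob': 2.0, 'carol': 3.5}
```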

Those resource providers who are not deliberately joining a diverse grid may wish to restrict the use of their resources to only some services and only some VOs.  In this case, the resource provider has the same concerns and
 * restricts the services  and/or
 * restricts the VOs
that can use the resource.
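
A provider-side check combining both kinds of restriction could be sketched like this.  The function name `may_use` and the service and VO names are hypothetical; `None` is used here to mean "no restriction of this kind".

```python
from typing import Optional, Set


def may_use(service: str, user_vos: Set[str],
            allowed_services: Optional[Set[str]] = None,
            allowed_vos: Optional[Set[str]] = None) -> bool:
    """Return True if the request passes both (optional) restrictions."""
    service_ok = allowed_services is None or service in allowed_services
    vo_ok = allowed_vos is None or bool(user_vos & allowed_vos)
    return service_ok and vo_ok


# A provider that only hosts text-mining services, for users of any VO:
print(may_use("text-mining", {"IES"}, allowed_services={"text-mining"}))      # True
print(may_use("extinction-rate", {"IES"}, allowed_services={"text-mining"}))  # False
```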

<<Anchor(ResVirtAbs)>>
=== Where resource virtualization is not present ===
In grids where the end user is fully aware of using a particular grid node, the node owner may be considered to have a similar interest to that of the service developer in the above examples.  The node owner is directly concerned with users and what they do on her machine.

<<Anchor(ResVirtExists)>>
=== (In the future, or...) Where resource virtualization exists ===
End users will not have a direct relationship with resource owners.  AuthN, AuthZ (access control) and accounting will ''have'' to be performed at either the application or service levels (whichever is appropriate).  Alternatively, resource brokers may have to exclude users in certain VOs from accessing certain resources.  Resource owners may wish to bill service providers and/or application providers (which may be synonymous with VOs) for the CPU time when those services or applications are active at their resources.

<<Anchor(Respons)>>
== Responsibilities diagram ==
Note that I have used the concept of "billing" below as an example of accounting.  I think that this is a useful concept in bringing VOs into focus.  Even if we cannot envisage actually charging for the use of a service or resource, it provides a useful example of metering and quota-filling (which are far easier to envisage) as well as sophisticated authZ.


{{{
Architecture
Level:          Resource          Service/application         User

Authentication  AuthN of          AuthN of user (may be       AuthN'ed by
                service           devolved, but proof         service or
                                  needed)                     resource or
                                                              3rd party
                                                              trusted by
                                                              serv/resource


Authorization   AuthZ of          AuthZ of user               AuthZ at
                service           <*VO lookup*>               service
                                     -- OR --
                AuthZ of                                      AuthZ at
                all users   --------------------------------> every
              <*VO lookup*> --------------------------------> resource
                                     -- OR --
                AuthZ
                devolved to                                   AuthZ at
                res. broker                                   res. broker
 
Accounting/     Bills       ----->      Bills [1,2]  -------> Is billed [1]
Billing         service/app             User                  by service/app.
                or VO                                         or VO
                                     -- OR --
                Bills user  --------------------------------> Is billed by
                directly    --------------------------------> (possibly many)
                                                              resource
                                                              provider(s)
                                                             
                                                             
[1] Optional: Some services will not bill users, as they may be funded
directly (without accounting) by the VO.

[2] Service/app may have to provide usage info to VO so that VO can bill
users accurately.

VO possible        1. Provide info. to service/app for AuthZ decision.
responsibilities   2. Provide info. to resource    for AuthZ decision.
                   3. Provide info. to res. broker for AuthZ decision.
                   4. Hold a repository of usage statistics for individual
                      users.
                   5. Hold a mapping of identifiers from IdPs (e.g. DNs) to
                      VO's own user identifier.
                      
VO mandatory       1. Provide user info. to service/apps and/or resources
responsibilities      and/or resource brokers for AuthZ decisions.
                   

}}}
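
Possible responsibilities 4 and 5 in the diagram (a repository of usage statistics and a mapping from IdP identifiers to the VO's own identifier) can be sketched together.  Everything here is an assumption made for illustration: the class `VORegistry`, the `vo-user-N` identifier format and the example DNs are invented.

```python
class VORegistry:
    """Hypothetical VO-side store: identity mapping plus per-user usage totals."""

    def __init__(self) -> None:
        self._vo_id_by_dn = {}   # IdP identifier (e.g. a DN) -> VO's own identifier
        self._cpu_hours = {}     # VO identifier -> accumulated CPU-hours
        self._next_number = 1

    def register(self, dn: str) -> str:
        """Create (or return) the VO identifier for an IdP-issued identifier."""
        if dn not in self._vo_id_by_dn:
            self._vo_id_by_dn[dn] = f"vo-user-{self._next_number}"
            self._next_number += 1
        return self._vo_id_by_dn[dn]

    def link(self, known_dn: str, additional_dn: str) -> None:
        """Attach a second IdP identifier to an already registered member."""
        self._vo_id_by_dn[additional_dn] = self._vo_id_by_dn[known_dn]

    def record_usage(self, dn: str, cpu_hours: float) -> None:
        """Store usage reported by a service/app (see note [2] in the diagram)."""
        vo_id = self._vo_id_by_dn[dn]
        self._cpu_hours[vo_id] = self._cpu_hours.get(vo_id, 0.0) + cpu_hours

    def usage_for(self, vo_id: str) -> float:
        return self._cpu_hours.get(vo_id, 0.0)


registry = VORegistry()
member = registry.register("/C=UK/O=ExampleCA/CN=a biologist")
registry.link("/C=UK/O=ExampleCA/CN=a biologist", "/C=US/O=OtherCA/CN=a biologist")
registry.record_usage("/C=UK/O=ExampleCA/CN=a biologist", 3.0)
registry.record_usage("/C=US/O=OtherCA/CN=a biologist", 1.5)
print(member, registry.usage_for(member))  # vo-user-1 4.5
```

Usage reported under either DN accumulates against the same VO identifier, which is what would let the VO bill (or meter) a user accurately across several identity providers.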


<<Anchor(Adef)>>
= A definition? =

<<Anchor(GenNotes)>>
== General notes and justifications ==

Foster et al. (2001) suggested, "A Virtual Organization is a collection of individuals and institutions that is defined according to a set of resource sharing rules".[A]  This definition seems too narrow and possibly alien to the real world.  The reasons for saying this are:
 * "institutions" clouds the issue.  In the last few years, VOs have been thought of as most likely to be subsets of users from ''within institutions'' grouped with other subsets of users at other institutions. Perhaps the definition would be better with the word institution absent altogether or included as "and/or institutions" if necessary.
 * "a set of resource sharing rules" is again too narrow.  Users can belong to a VO (such as the International Ecological Society) and resources (or services) choose to be available to the VO's users.
 * the definition also implies that the VO owns the resources (or at least drives the access policies of the resources), which may or may not be the case (in the real world examples that have arisen recently).
 
Although it cited the above Foster et al. definition, the community developing the Virtual Organization Membership Service (VOMS) software in 2004 discussed VOs in terms of users, rather than institutions or resources.[B] Alfieri et al. noted that "VOs generally share resources", but they clearly will not ''always'' share resources and certainly there will be VOs in existence that do not own any resources at all.  Later in the same document, Alfieri et al. stress that, under the VO concept, "VOs administer users, grant them permissions and establish agreements with resource providers (RPs). RPs, in turn, enforce local authorization."  This seems far closer to a realistic definition of a VO.  Alfieri et al. state that "the owner of a resource (i.e. the RP) should be able to enforce local user authorization based on various user characteristics such as his membership in a VO, roles he can have or his identity."

[A] I. Foster, C. Kesselman and S. Tuecke, The Anatomy of the Grid, International Journal of High Performance Computing Applications, 15, 3, 2001

[B] R. Alfieri, R. Cecchini, V. Ciaschini, F. Spataro, L. dell'Agnello, A. Frohner and K. Lörentey, From gridmap-file to VOMS: managing Authorization in a Grid environment, Future Generation Computer Systems, 2005, http://grid-auth.infn.it/docs/voms-FGCS.pdf, April, 2004.

=== Characteristics of a VO ===
The following are therefore characteristics of a VO, expressed in terms that attempt to avoid over-restriction or promoting concepts such as "usually" or "generally" too highly. VOs
 * represent groups of users which may cross administrative boundaries
 * should imply a ''definable membership'' in that users can join and leave such groups
 * (with respect to grids) represent communities for which access to grid resources or services or applications may be granted or denied
 * may contain varying statuses of user and attributes about those users
 * are definable in themselves as lists of members
 * do not normally provide identity, but instead rely on externally trusted parties for identity establishment (authentication).

<<Anchor(Definition)>>
== First attempt at a definition ==
A VO is definable as a list of identified users that represents a real-world group of people [*] that has a clear membership.  The VO is not usually the primary point for the establishment or assertion of identity and may be relied upon by grid resources, services and applications to provide information for authorisation decisions.  At its simplest, a VO contains a list of members and their unique identifiers.  At its most complex, a VO may contain different status levels of members and many attributes about the members as well as accounting information regarding members' use of grid resources, services or applications.

END OF DEFINITION
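
Read as a data structure, the definition above might be sketched as follows.  This is one possible reading, not part of the definition itself: the class and field names are hypothetical, and the optional fields correspond to the "most complex" case.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    unique_id: str                                   # identity is established elsewhere (authN)
    status: str = "member"                           # optional status level
    attributes: dict = field(default_factory=dict)   # optional attributes about the member
    usage: list = field(default_factory=list)        # optional accounting records


@dataclass
class VO:
    """At its simplest: a named list of members with unique identifiers."""
    name: str
    members: dict = field(default_factory=dict)

    def join(self, member: Member) -> None:
        self.members[member.unique_id] = member

    def leave(self, unique_id: str) -> None:
        self.members.pop(unique_id, None)

    def is_member(self, unique_id: str) -> bool:
        return unique_id in self.members


ies = VO("International Ecological Society")
ies.join(Member("biologist-001"))
ies.join(Member("developer-007", status="staff", attributes={"team": "services"}))
ies.leave("biologist-001")
print(ies.is_member("biologist-001"), ies.is_member("developer-007"))  # False True
```

Nothing in the structure refers to resources: membership can be created, changed and queried without knowing which machines, if any, are made available to the VO.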

=== Questions ===
[*] I was very tempted to put "and other entities" here, but maybe we don't need those words?  It is hard to imagine a 'machine', for example, being given a membership of a VO so that any user or service running on that machine can use other machines that have been made available to the VO.  Surely, it's the end users who are the important factors.  But maybe someone needs to come up with a good example!!

<<Anchor(PrevWork)>>
== Previous Work ==
I have created a page of links etc. to PreviousVoWork.  This should help us see how definitions have evolved and how they have been used in the past few years.

See especially Nate Klingenstein's wiki thoughts on VOs at https://authdev.it.ohio-state.edu/twiki/bin/view/Main/VirtualOrganizations

<<Anchor(comments)>>
== Comments from readers ==
Readers of this page (who don't want to put their comments in line) have already left comments at [[VODefinitionComments]].  Please add your thoughts!

<<Anchor(FinalWord)>>
= Final word: why we shouldn't include resources =
Some email correspondents still were not happy about leaving resources out entirely.  See the [[http://www.federation.org.au/pipermail/shibgrid-bof/2006-March/thread.html|"A VO definition" thread]] in the Shib-Grid BOF reading list.

I still think that we ''have'' to separate the idea of resource sharing from the VO.  Otherwise we are mixing up legal and administrative requirements with technical requirements, and developers and implementers will not be able to produce something that meets the main definition.

Even a VO that implies that some resources are being shared still has to provide the technical means of deciding which user can access each resource.  It is this decision that should define the VO, not the political or financial reasons to share resources in the first place.

The following graphic summarises my point of view.  If we consider a VO to be a kind of "Independent Role Provider", then that functionality can support resources needing to make authorisation decisions.  If we consider a VO to be a 'resource sharing partnership' then we still have to come up with the functionality for the authorisation decision-making in some other way.  As the graphic depicts, we go around the garden again and still end up with a role-provider server.  So either
 * we have to leave the term VO to mean this ill-defined political 'thing' and use a new term (such as "Independent Role Provider" (IndRoP?)), or
 * we use VO to mean IndRoP (as a lot of people do already) and the "VO" in VOMS can still mean "Virtual Organisation"


{{http://users.ox.ac.uk/~markn/wikifiles/VOdef1.png}}

 ~-Graphic of the need to have a role provider service (often also known as a VO)-~