Differences between revisions 38 and 39
Revision 38 as of 2007-04-03 09:17:24
Size: 16944
Editor: MarkNorman
Comment:
Revision 39 as of 2013-05-17 16:26:47
Size: 16946
Editor: localhost
Comment: converted to 1.6 markup

The text previously held on this page has been collated into a paper for the UK e-Science All Hands Meeting 2006: "Types of grid users and the Customer-Service Provider relationship: a future picture of grid use" (PDF, attachment AllHandsPapers2006/AllHands06TypesUsersFinal.pdf).

There may be some useful stuff in the notes that did not quite make it into the paper. See UseCasesPaperNotes just to check!

The rest of this page contains what we consider to be interesting thoughts arising from brainstorming and similar sessions, but which did not find their way into a paper.


Models of grids and grid resources

The following is a (non-exhaustive) list of types of grid resource and models of grid upon which grid computing may be possible. N.B. All may be possible on the same grid, and examples from section 2.3 may be applicable to all.

  a. Dedicated primary grid service (e.g. compute cluster, data cluster)
  b. Voluntary secondary resource, actively monitored by resource owner. Resource owner deliberately makes resource un/available and may choose whether or not to run grid jobs on an individual basis.
  c. Voluntary secondary resource operated blindly by resource owner, possibly with dedicated, secure, ring-fenced sandpit within the system that defers to end-user activity.
  d. A no-trust, no-accounting grid (subset of c., above). Each node has a secure sandpit and the owner allows anything to go on there. All users are authorised to use it.

Notes/examples:

SETI@home and climateprediction.net should be examples of b. above as they could theoretically be managed by the resource owner and be actively selected. However, as most workstation users completely trust the programs, they may be behaving more like c., except that the processing is not ring-fenced and secure.

No further detail is attempted here as this document attempts to be neutral in terms of architectures and technology.

Privacy and confidentiality

Two further dimensions run alongside each of the use-cases above: the first is the need for privacy/anonymity and the second is the confidentiality of the data and/or algorithms.

Privacy

In any of the use-cases listed in section 2.3, the identity of the end-user may need to be protected. Grid nodes and services may care what the end-user is, but may not care who the end-user is. Clearly this is easier to achieve if a trusted third party (e.g. a SP) is submitting the grid job(s).
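
As a minimal sketch of the "what, not who" idea, assume a trusted SP that shares a secret key with the grid node. The key, the function names `issue_assertion` and `node_accepts`, and the attribute names are all hypothetical and do not come from any real grid middleware. The SP vouches for the end-user's attributes while passing on only an opaque pseudonym:

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical secret shared between the SP and the grid node (assumption).
SHARED_KEY = b"example-key-agreed-out-of-band"


def issue_assertion(real_identity, attributes, key=SHARED_KEY):
    """SP side: swap the identity for a random pseudonym and sign the claims."""
    pseudonym = secrets.token_hex(8)
    payload = json.dumps(
        {"subject": pseudonym, "attributes": attributes}, sort_keys=True
    )
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # The SP keeps this mapping to itself (e.g. for billing); it is never sent.
    private_mapping = {pseudonym: real_identity}
    return {"payload": payload, "signature": signature}, private_mapping


def node_accepts(assertion, required_attribute, key=SHARED_KEY):
    """Node side: verify the SP's signature, then check what the user is."""
    expected = hmac.new(
        key, assertion["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, assertion["signature"]):
        return False
    claims = json.loads(assertion["payload"])
    return required_attribute in claims["attributes"]


assertion, sp_private_mapping = issue_assertion(
    "alice@example.ac.uk", ["uk-academic", "holds-degree"]
)
```

In this sketch the node can decide on the basis of `uk-academic` without ever seeing the real identity; a real deployment would more likely use public-key signatures than a shared key.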

Confidentiality of data and/or algorithms

Again in any of the use-cases listed in section 2.3, the data and/or algorithms being processed may either be sensitive (e.g. medical records) or confidential (e.g. of commercial importance). Users may need either contractual guarantees that data or algorithms cannot be stolen or observed by an 'unauthorised' entity, or for this to be technically unfeasible.

A categorisation of example end users

In our papers for the All Hands meeting (see AllHandsPapers2006) we gave examples of the 7 main types of users. Please see those main types (and others) at UserCategoryExampleActivities. This is where we have attempted to take some real world (existing) examples for each user category.

Appendix one: Some example use-cases (end to end stories)

This section contains some story-line cases that illustrate the generalised use-cases contained in section 2.3. The examples are (near) trivial and are intended only to feed the discussion of the broad use-case definitions within section 2. They do not attempt to encompass the broader issues or the many types of users; section 2 attempts to do this. Abbreviations used in this section are introduced within section 2.

    a) A humanities researcher (SEU) submits a text document containing metadata and a set of video data to a grid SP and asks for a very complex multi-factor analysis involving the text and the video data.
    • The SP needs to know that the user has the correct privileges to use the service and must find out that he or she is a member of the UK academic community and already holds a degree. The SP also needs to know to which organisation (department and institution) the user belongs in order to bill (charge financially) that organisation. The processing requires the use of three grid nodes. The SP submits the job and auditing/tracking metadata so that the grid nodes may bill the SP. Periodically the grid nodes bill the SP and the SP has its own charging mechanism for billing the humanities researcher.
    b) The BBC Weather Unit in London (SEU) registers to receive hourly weather data output from the profit-making UK Meteorological Grid Service. The UKMGS has a data cluster and compute cluster of its own, but regularly has to purchase processing power from nodes on the UK e-Science Grid. It also demands specific output from several grid-enabled satellites and government-run weather stations. This is done in an automated but unpredictable way (e.g. for a particular combination of temperatures and pressures, the UKMGS jobs may ask for radar data for regions around the British Isles that cannot be predicted in advance).
    • Each grid node 'called upon' by the UKMGS jobs charges the UKMGS for the processor or instrument time used. The UKMGS puts its output data on secure web sites for the BBC Weather Unit to pick up. The BBC pays a standard fee for the service, but occasionally will pay more for specific requests for 'unusual' data, such as "What will the weather be like for the England game on Tuesday in the World Cup finals in Munich?"
    b)i The UKMGS is very protective of the algorithms that it has produced for predicting the weather. It needs to be able to run its jobs and to have a guarantee that the owner of (or other users of) the external grid node will not steal the data or the algorithms. Ideally, the UKMGS would like this to be technically unfeasible.

    c) The Smalltown Medical Center receives an unexpected patient when the President of the USA visits town. The President supplies a sample of blood, which the Smalltown doctors need to scan for a variety of pathogens, toxins and other markers that could be indicative of his symptoms. The analysis of the blood and the cross-checking with other data held in secure databases concerning the patient is very processor-intensive. Therefore the Medical Center (SEU) uses a grid service provider (SP) to process the job. This SP must be able to run jobs which can query the secure database so that only positive results are reported to the Smalltown doctors (i.e. the Smalltown doctors must not be able to know which diseases the patient has suffered from in the past, unless they are clearly relevant to the analysis as has been indicated by the algorithm of the grid job). The SP may, or may not, own the secure database. Whilst the data are being moved around the grid, the privacy/anonymity of the patient must be guaranteed as well as the confidentiality of all data that have proved irrelevant to the final findings. [1]

    d) A biologist researcher needs some very processor-demanding work to be performed for a statistical analysis of a very large data set that s/he has collected. His/her IT support specialist is able to write a program to perform the work but must submit this program and data to the grid for the job to be performed.

    • The IT specialist already has the access credentials to be able to run jobs on the grid, but has to guarantee to a grid AM service or auditor that the researcher is also privileged to benefit from such work. The IT specialist submits the job to the grid and does not care upon which grid node the processing takes place (acting as a PUA). The job is completed and the specialist picks up the results and passes them on to the researcher. The grid node (or resource broker) may demand payment for the use of the resource, but the biologist is part of a community that should receive such services without charge. This is expressed to the grid node or resource broker.
    d)i The biologist thinks that s/he may be physically attacked by people who morally oppose the nature of his/her work and wishes to remain anonymous and untraceable.
    d)ii The biologist believes that s/he is close to finding a cure for HIV and does not wish for the Nobel Prize to go to anyone else, should they see some of his/her data or see the way s/he is interrogating it. Therefore, s/he needs the data and algorithms to remain a secret from other grid users.

    e) A theologian needs to carry out a very complex textual analysis of a great number of published versions of the Bible. The question is too complex for any of the available text mining services that are currently resident on the grid and so the researcher has to have a developer design a program to carry out the analysis.
    • The developer (acting as a PUS) knows of a data cluster that already contains copies of these versions of the Bible and so writes a program or 'job' which needs to specifically access this data cluster. At some point the developer has to prove that the theologian is privileged to make use of these grid resources. The job is run and the theologian’s university department is billed for the use of the grid resources.
    e)i The theologian is aware that his/her research is highly controversial and therefore wishes for his/her identity to remain secret. S/he may still need a mechanism for ensuring that the SP is paid for its services.
    e)ii The developer is thinking of making money from his/her algorithms and wishes for them to remain secret.
    f) Later, the PUS developer (from example e) realises that there are many researchers who need similar jobs carried out. Therefore s/he develops a web interface, which includes a billing mechanism, for researchers (SEUs) to use to choose texts and cross-referencing queries.
    • The developer becomes a private company and pays for grid membership so that s/he can use grid processing power and databases when required (but will be billed for this access as and when it occurs). During the development of the service, the (now PUDS) developer invites humanities researchers to use the service for free/gratis in order to test it. Nevertheless, in his/her contract with the grid consortium, the PUDS developer has had to agree that only genuine UK academic researchers are able to use the grid resources for free/gratis, but that private individuals and 'for-profit' organisations should be billed. The PUDS developer decides to avoid the problem by only serving the UK academic community during the testing phase, but has the difficulty of checking the end-users’ statuses whenever a test is made.
    g) A highly technical programmer has permissions, as an academic, to use the grid. S/he writes some code and submits it to the grid to produce some computer-generated imagery (CGI) output. Once s/he receives that output, s/he is able to process it on his/her desktop machine and then re-submit it to the grid for further processing. Sometimes s/he is able to start a job running for which s/he is unconcerned whether it takes the usual two hours or three days. By prioritising his/her jobs (or by deliberately choosing the places to run them) s/he is able to use the different parts of the available grid to the greatest efficiency. (In this way s/he may be behaving as a PUA or PUS: if a mechanism were available to use slower grid resources when the priority is low, then s/he would be happy to use this and to remain a PUA. Otherwise, s/he may deliberately run high and low priority jobs in specific places, depending upon demand).
    h) A satellite orbiting the earth has a grid-enabled sensor attached for use with grid research. A system administrator (GRID-SYS) has to connect to the hardware controlling the sensor to perform a firmware upgrade. This has to be done remotely and there are five individuals on the planet who are trusted to perform this task.
    • A GRID-SYS authenticates (somewhere), connects to the device with system administrator privileges and carries out the task. This task needs to be carried out periodically and the five individuals change.
    i) The same sensor on the same satellite is regularly switched to detect light of a different wavelength for the collaborative group of researchers across the world that uses the data. This switch must be performed manually and this can be done by about one hundred of these researchers' IT grid support staff (all GRID-SYSs).
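
The two-level billing chain in use-case a) (grid nodes bill the SP, and the SP bills the end-user's organisation) and the "without charge" community in use-case d) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `ServiceProvider`, the `markup` factor, the `exempt` flag and all organisation and node names are hypothetical, and the sketch assumes that an exemption, once expressed to the nodes, means nobody is billed for that job.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ServiceProvider:
    """Hypothetical SP that is billed by grid nodes and bills its customers."""

    markup: float = 1.2  # assumed SP margin on top of the node charges
    owed_to_nodes: defaultdict = field(default_factory=lambda: defaultdict(float))
    invoices: defaultdict = field(default_factory=lambda: defaultdict(float))

    def record_job(self, organisation, node_charges, exempt=False):
        """Record one job; node_charges maps node name -> cost charged to the SP."""
        if exempt:
            # Assumption: the exemption is passed on to the nodes, so no one pays.
            return 0.0
        for node, cost in node_charges.items():
            self.owed_to_nodes[node] += cost
        total = sum(node_charges.values())
        self.invoices[organisation] += total * self.markup
        return total


sp = ServiceProvider()
# Use-case a): three grid nodes process the humanities researcher's job.
sp.record_job(
    "Example University, Humanities Dept",
    {"node-1": 10.0, "node-2": 5.0, "node-3": 5.0},
)
# Use-case d): the biologist's community receives the service without charge.
sp.record_job("Example University, Biology Dept", {"node-1": 8.0}, exempt=True)
```

The SP needs the organisation only for its own invoices; the nodes see just the SP, which is also what makes the anonymity variants (d)i, e)i) workable alongside billing.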

Appendix two

What is a grid?

Other people's definitions

"An environment in which individual users can access computers, databases and experimental facilities simply and transparently, without having to consider where those facilities are located." [RealityGrid, Engineering & Physical Sciences Research Council, UK 2001] http://www.realitygrid.org/information.html

"A means of network computing that harnesses the unused processing cycles of numerous computers, to solve intensive problems that are often too large for a single computer to handle, such as in life sciences or climate modeling." http://www.consultingtimes.com/glossary.html

After admitting that there is a short answer and a very long answer, the GridCafé web pages at CERN (http://gridcafe.web.cern.ch/gridcafe/whatisgrid/whatis.html) say that:

"The short answer is that, whereas the Web is a service for sharing information over the Internet, the Grid is a service for sharing computer power and data storage capacity over the Internet. The Grid goes well beyond simple communication between computers, and aims ultimately to turn the global network of computers into one vast computational resource."

Wikipedia (http://en.wikipedia.org/wiki/Grid_computing on 29 March 2005) described grid computing thus:

"Grid computing offers a model for solving massive computational problems by making use of the unused resources (CPU cycles and / or disk storage) of large numbers of disparate, often desktop, computers treated as a virtual cluster embedded in a distributed telecommunications infrastructure."

The same article later asserted:

"Grid computing involves sharing heterogenous resources (based on different platforms, hardware/software architectures, and computer languages), located in different places belonging to different administrative domains over a network using open standards. In short, it involves virtualizing computing resources."

Ian Foster (with Carl Kesselman) updated his previous definitions of a grid in 2004. It should be noted that Foster has also come up with checklists and other, more lengthy text to explain what a grid is. Foster and Kesselman stated:

"We define a Grid as a system that coordinates distributed resources using standard, open, general-purpose protocols and interfaces to deliver nontrivial qualities of service."

Our definitions

For the purposes of this document, we take much of the spirit encompassed in Foster and Kesselman's definition, but find the phrases "standard, open" and "nontrivial qualities of service" laudable but not necessarily defining terms for a grid. We therefore define a grid as:

A set of networked computers and/or other devices, including remote instrumentation, that have been made available so that their operation can be shared. The sharing of these resources must be via an agreed set of protocols.

Foster and Kesselman's "system" is an object because it is identifiable by the agreed set of protocols. Any grid system which the ESP-GRID project produces will use "standard, open, general-purpose protocols", but it is possible that other grids may use proprietary code and standards, as long as all components of the grid use the same protocols. For resources that are geographically remote and non-contiguous in network terms, it is the common protocols (or possibly middleware) that convey the essence of being a grid.
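
To make the definition concrete, here is a minimal sketch that treats a set of shared resources as a grid only when they all have at least one sharing protocol in common, whether that protocol is open or proprietary. The `Resource` type, the function names and the protocol labels are illustrative assumptions, not part of any real grid software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    protocols: frozenset  # sharing protocols this resource speaks (labels are made up)


def common_protocols(resources):
    """Return the protocols shared by every resource (empty if there are none)."""
    if not resources:
        return frozenset()
    shared = resources[0].protocols
    for resource in resources[1:]:
        shared = shared & resource.protocols
    return shared


def forms_grid(resources):
    """True if all the resources agree on at least one sharing protocol."""
    return bool(common_protocols(resources))


cluster = Resource("compute-cluster", frozenset({"proto-A", "proto-B"}))
instrument = Resource("remote-instrument", frozenset({"proto-A"}))
outsider = Resource("unrelated-machine", frozenset({"proto-C"}))
```

Here `cluster` and `instrument` form a grid through `proto-A`, whereas adding `outsider` breaks the common agreement; nothing in the check depends on where the resources are located.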

N.B. For the purposes of the ESP-GRID project, we must also assume that the 'generic grid' is of a mixed economy – i.e. that commercial, academic and non-profit use may co-exist within the same grid. This means that we must consider grids where detailed accounting must be possible. However, this does not need to affect the definition of "a grid".

  1. In this use-case, the data and computer systems involved have to be protected under HIPAA (the Health Insurance Portability and Accountability Act of the USA). This is an example of an influence (in this case legislative), external to the grid and to the users, that puts constraints on the security and possibly the access management of the grid and/or grid nodes involved.

ESPGRIDwiki: UseCasesPaper (last edited 2013-05-17 16:26:47 by localhost)