Developer Evaluation

In February 2006 Alun Edwards (ESP-GRID) asked the developers from the BRIDGES and DyVOSE projects to answer a few brief questions as part of the evaluation work (see EvaluationPages).

Contact details were confirmed at http://www.nesc.ac.uk/nesc/team.html, which also has biographies.

We asked the developers to consider specifically the Shibbolizing of the BRIDGES web portal and the DyVOSE work, and the myriad steps that had to be completed to make this work (PERMIS and so on), and to identify for us:

  1. what did you find difficult?
  2. what makes Shibboleth a good solution for accessing a service like Bridges or the DyVOSE data?
  3. what issues can you see in a real-world production of this with 100s of users, maybe a commercial data provider, issues for the future etc.?
  4. what scalability issues can you identify?
  5. how could the access control you've implemented be subverted by e.g. a bad person, or by an expert trying to get round the system for their own convenience, or by a careless user?

To add some real-world flavour we used the following 'contrived' scenario:

To the developer:

Scenario: Imagine you have met, by chance in the corridor, a manager of a faculty resource who knows of your experience and naively thinks you are the person who can just Shibb their target: "This afternoon, if you've time?"

Developers' notes

What did the developers find difficult?

The operating system

The operating system (OS) was a difficult problem: it was hard to find a compatible platform that did not need everything built from source, which the developers could never get to work. Fedora Core 4 did the trick, though, and the developers would strongly recommend a flexible attitude to the choice of OS, since quite a few dependent packages need to be installed.

The portal container

Shibbolizing the portal container was also difficult. Servlet containers are used to host the portal, while Shibboleth must be installed in the Apache web server as a module, so requests destined for the portal container have to be forced through the Apache Shibboleth module first; Shibbolizing the container on its own proved impossible. Fortunately, mod_jk (the Apache Tomcat Connector, an Apache module that forwards requests to Tomcat over the AJP protocol) is designed to connect Apache and Tomcat and provides exactly this functionality. Tomcat was therefore chosen as the portal container, and the developers regard Tomcat as necessary if they want to Shibbolize portals that require a servlet container in future.

Finding the perfect setup that PERMIS required

Finding the exact setup that PERMIS required to run correctly was difficult. The final result was Fedora Core 1, Java 1.4.2 and compatible versions of the PERMIS team's own tools working together.

Boundaries between PERMIS and Globus

It was complicated to work out where PERMIS stopped and Globus started: error messages would sometimes refer to the PERMIS policy, sometimes to the PKI authentication methods and sometimes to the Globus service implementation, and working out where they came from could be very tricky. The supporting documentation was also sparse and the source code difficult to navigate. The developers addressed this by writing a step-by-step HOWTO describing what had been done to get it working; they now use this instead of the PERMIS documentation (see http://www.nesc.ac.uk/hub/projects/etf/).

Find an established federation

The input that an established federation can provide proved invaluable in getting Shibbed in a short period of time (days rather than weeks). In BRIDGES' case the SDSS federation normally responded to queries within 24 hours. Another advantage is that the Identity Providers (IdPs) and Service Providers (SPs) can be configured and tested separately before being joined together.

What makes Shibboleth a good solution for accessing a service like Bridges or the DyVOSE data?

What issues can the developers see in a real-world production of this with 100s of users, maybe a commercial data provider, issues for the future etc.?

What scalability issues can the developers identify?

How could the access control the developers have implemented be subverted by e.g. a bad person, or by an expert trying to get round the system for their own convenience, or by a careless user?

Responses Received

We are extremely grateful to the following for responding so promptly to our scenario:

The results can be seen below.

Results

Responses are grouped by question; under each question, each developer's name is followed by their answer.

1. what did you find difficult?

John Watt

Finding a compatible platform that didn't need everything built from source, which I could never get to work. Fedora Core 4 did the trick, though; I would heartily recommend a flexible attitude to the OS, considering there are quite a few dependent packages that need to be installed. The input that an established federation can provide (in our case the SDSS federation) proved invaluable in getting Shibbed in a short period of time (days rather than weeks; the federation normally responded to queries within 24 hours). Plus, the IdPs and SPs can be configured and tested separately before joining them together.

Micha Bayer

Can't really comment, as I was not involved in the Shib-specific functionality.

Jipu Jiang

As John said, the operating system is a difficult problem; it needs no further explanation. It is also very difficult to force requests destined for the portal container to go through Shibboleth first. In other words, Shibbolising the portal container is a difficult point. Servlet containers are used to host the portal, while Shibboleth must be installed in the Apache server as a module, so Shibbolising the portal container is impossible unless we force the requests through the Apache Shibboleth module first. Fortunately, mod_jk is designed to connect Apache and Tomcat, and provides this functionality. So Tomcat was finally chosen as the portal container, and Tomcat is necessary if we want to Shibbolise portals that require a servlet container.

Anthony Stell

As John said, finding the exact setup that PERMIS required to run correctly (the final result was Fedora Core 1, Java 1.4.2 and compatible versions of their own tools working together). Understanding where PERMIS stopped and Globus started was complicated too: various error messages would come back, sometimes referring to the PERMIS policy, sometimes to the PKI authentication methods and sometimes to the Globus service implementation. Working out where they were coming from could be very tricky. The supporting documentation was also pretty sparse and the source code difficult to navigate. I addressed this by writing a step-by-step HOWTO going through what I had done to get it working. We now use this instead of the PERMIS docs. :o) (It can all be found at http://www.nesc.ac.uk/hub/projects/etf)

2. what makes Shibboleth a good solution for accessing a service like Bridges or the DyVOSE data?

John Watt

It is very easy to tailor attribute release/acceptance once Shibb is running. Adding other projects to our infrastructure was not difficult. We interfaced Shibb with GridSphere successfully, and bypassed the GridSphere login to use the mod_auth_ldap authentication. The SSO capabilities of Shibb are desirable in our Grid context.

Micha Bayer

It looks quite promising to me, especially for an academia-type environment where we would want to, say, give access to an application for anyone in Scotland as part of a Scottish grid. We would then not have to worry about managing our own user base, but could instead have arrangements with all the other Scottish universities. This obviously relies on us being able to trace user activity and user origin/details, for example because the NGS (National Grid Service), as an end resource, dictates this to us under the existing agreement. So as long as we can extract a user's DN programmatically from within the portal (can we?), it would be a good solution for us. That way offending users could be tracked and hopefully dealt with at their home institution.

Jipu Jiang

Flexibility, expandability and good usability (here I don't mean it is easy to install; I mean the framework is easy to use) are the main benefits in my opinion. Shibboleth greatly reduces the administrative burden of authentication and authorization. System administrators can easily and freely adjust user information on the IdP side without worrying about the SP side; conversely, the SP can adjust its own policy without worrying about the IdP. In this way the administrative boundary is very clear: the IdP is only responsible for the user information and the information release policy, while the SP is only responsible for the authorization policy of the resource. Also, adding or removing an organization from the federation is relatively easy, since there are no complex connections between an organization and the framework as a whole. This gives it good expandability.
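The division of labour Jipu describes, with the release policy at the IdP and the authorization policy at the SP, can be illustrated with a small Python model. This is a sketch under stated assumptions, not Shibboleth's configuration: the SP identifier, the choice of `eduPersonAffiliation` as the released attribute and the single-valued attribute representation are hypothetical simplifications (real Shibboleth release and acceptance policies are written in XML).

```python
from typing import Dict, Set

# Hypothetical IdP-side release policy: which attributes each SP may receive.
# Real Shibboleth IdPs express this in XML configuration, not Python.
RELEASE_POLICY: Dict[str, Set[str]] = {
    "https://bridges.example.org/shibboleth": {"eduPersonAffiliation"},
}

# Hypothetical SP-side authorization policy, maintained independently of the IdP.
ALLOWED_AFFILIATIONS: Set[str] = {"staff", "faculty"}


def release_attributes(user_attrs: Dict[str, str], sp_id: str) -> Dict[str, str]:
    """IdP side: forward only the attributes the release policy allows for this SP."""
    allowed = RELEASE_POLICY.get(sp_id, set())
    return {name: value for name, value in user_attrs.items() if name in allowed}


def authorize(released: Dict[str, str]) -> bool:
    """SP side: decide access using only what the IdP chose to release."""
    return released.get("eduPersonAffiliation") in ALLOWED_AFFILIATIONS


# Example: the mail address stays at the IdP; the SP still gets enough to decide.
user = {"eduPersonAffiliation": "staff", "mail": "a.user@example.ac.uk"}
released = release_attributes(user, "https://bridges.example.org/shibboleth")
print(released)             # {'eduPersonAffiliation': 'staff'}
print(authorize(released))  # True
```

Either policy can change without touching the other, which is the clear administrative boundary described in the answer above.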

Anthony Stell

From my limited knowledge, the technology paradigm allows distributed access to resources and provides a nice, flexible way of transferring attributes between participants in a VO. This is a good way of limiting and exposing access to resources as and when they're needed in a VO.

(However, I'm suspicious: I think the devil will be in the implementation, and it may end up being hamstrung into using only specific attributes, with the result that the flexibility will turn out to be less than it seems.)

3. what issues can you see in a real-world production of this with 100s of users, maybe a commercial data provider, issues for the future etc.?

John Watt

See scalability below.

Micha Bayer

As in my previous point; can't think of anything else just now.

Jipu Jiang

See scalability below.

Anthony Stell

Um, where do I start? :o)

PERMIS is a nice idea as a policy engine that interprets SAML, but the implementation is really bad. It requires a huge supporting infrastructure of technologies that can themselves be flaky (such as requiring specific LDAP versions). As far as I know, Shibboleth is similar: it requires very specific versions of supporting technology, which makes it quite inflexible (but as I've said, I'm not an expert in this, so I don't know for sure).

PERMIS also provides an incredibly limited interface. The way it has been implemented means that a service method is run and, if it is PERMIS-protected, it will either run as normal or bomb out completely with an Authorization Exception. There is no handle, such as a parameter saying "PERMIT/DENY", that the developer can grab on to and control the flow accordingly. This led to a bit of a monumental hack in our implementation of the BRIDGES code, which I will expand upon in question 5.

PERMIS also claims to be authentication-agnostic, and *technically* this is not a lie, but in practice it requires a local LDAP server back-end that matches the DNs used in the authentication tokens (certificates)... so it isn't really.

Based on my PERMIS experience, if I were a manager of a commercial concern, I would run a country mile from implementing this. As I've said, the supporting infrastructure required is complex and I'm not convinced that the time invested in making it work was worth it.

Plus, I'm sure there are holes that I'm not aware of => it's worse than no security: it's the illusion of security.
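One way to recover the missing PERMIT/DENY handle is to wrap the protected call and translate the denial exception into an explicit decision value. The Python sketch below is illustrative only: PERMIS itself is Java, and `AuthorizationError`, `check_then_call` and `protected_blast_search` are hypothetical names, not part of any PERMIS or Globus API. It only reports the decision after the call has been attempted; it does not let the caller ask for a decision in advance.

```python
from enum import Enum
from typing import Any, Callable, Tuple


class Decision(Enum):
    PERMIT = "PERMIT"
    DENY = "DENY"


class AuthorizationError(Exception):
    """Hypothetical stand-in for the exception a PERMIS-protected call raises on denial."""


def check_then_call(call: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Decision, Any]:
    """Run a protected call and report PERMIT/DENY instead of letting the denial escape."""
    try:
        return Decision.PERMIT, call(*args, **kwargs)
    except AuthorizationError:
        # Only authorization failures become DENY; other errors still propagate.
        return Decision.DENY, None


def protected_blast_search(user_role: str, query: str) -> str:
    """Hypothetical protected service used for illustration only."""
    if user_role != "researcher":
        raise AuthorizationError(f"role {user_role!r} not permitted")
    return f"results for {query}"


decision, result = check_then_call(protected_blast_search, "guest", "ACGT")
print(decision.value)  # DENY
```

Catching only the authorization exception, rather than every exception, keeps genuine service failures from being reported to the user as access denials.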

4. what scalability issues can you identify?

Micha Bayer

A system where you have no real estimate of your potential user numbers is hard to build. This is really what it boils down to when you open things to big institutions like other universities. You would ideally want to strike a balance and devise a system which handles roughly the load you expect, but with larger numbers of potential users the load becomes harder to estimate. I guess building extensibility into the system from the start would be the answer here.

Jipu Jiang

But it does have a problem in expanding. The user roles specified by the IdP and those required by the SP may not be the same, so to access a specific Grid service either the IdP has to change the user's role to fit the SP's specification or, the other way around, the SP has to change its policy to fit the IdP. To solve this, the best way is to negotiate the relationship between roles, or to unify the different roles used by different organizations.
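Negotiating the relationship between roles can be as simple as an agreed mapping table held at the SP. In this Python sketch, the institution names, role names and the `ROLE_MAP`/`map_role` identifiers are hypothetical; any real mapping would have to be agreed between the organizations concerned.

```python
from typing import Dict, Optional, Tuple

# Hypothetical negotiated mapping: (issuing IdP, IdP role) -> SP role.
ROLE_MAP: Dict[Tuple[str, str], str] = {
    ("glasgow.example.ac.uk", "PostgradResearcher"): "researcher",
    ("ed.example.ac.uk", "ResearchStudent"): "researcher",
    ("glasgow.example.ac.uk", "Lecturer"): "supervisor",
}


def map_role(idp: str, idp_role: str) -> Optional[str]:
    """Translate an IdP-asserted role into the SP's vocabulary; None if nothing was agreed."""
    return ROLE_MAP.get((idp, idp_role))


print(map_role("ed.example.ac.uk", "ResearchStudent"))  # researcher
print(map_role("ed.example.ac.uk", "Lecturer"))         # None
```

Keying the table by issuing IdP stops two institutions that happen to use the same role name from being conflated, and an unmapped role yields `None`, which the caller should treat as a denial. Neither side has to rename its own roles.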

Anthony Stell

Though I don't know for sure, I can almost guarantee that PERMIS hasn't been tested for scalability in anger (more than about 10 users). I predict that when this happens, performance will slow down chronically on any given system. The code just seems to be quite bloated, and this will be a problem when combined with Globus services, which are themselves not incredibly efficient.

The LDAP server back-end that PERMIS uses would also have to be managed automatically. I'm not aware of any tools out there that do this just now, and I think this would have to be a manual process anyway, because in any security process human verification of credentials is essential at some point. I reckon an automated management tool of this kind could be built, but I'm not sure how it would fit into the whole infrastructure.

John Watt

We are adopting the PERMIS role-based access control system for authorization; by adopting a privilege management infrastructure (PMI) we hope to avoid the scalability problems associated with access control lists. DyVOSE is investigating dynamic delegation, which will eventually allow separate institutions to form a VO without rewriting their local security policies (we demonstrate a simplified version of this between Glasgow and Edinburgh). NeSC Glasgow is now starting to look at performance aspects of portal technology, which will feed into any investigation of high-load Shibb use now that our user base is increasing.
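The scalability argument for role-based access control over access control lists can be seen in a toy comparison: under an ACL the SP's policy needs an entry for every user, while under RBAC it grows only with the number of roles. This Python sketch uses hypothetical user names, roles and resources, and is not PERMIS's policy language.

```python
from typing import Dict, Set

# ACL style: the SP lists every user and their resources, so the policy grows with the user base.
acl: Dict[str, Set[str]] = {
    "alice@glasgow.example.ac.uk": {"blast"},
    "bob@ed.example.ac.uk": {"blast"},
}

# RBAC style: permissions attach to roles; the IdP asserts each user's role as an attribute.
role_permissions: Dict[str, Set[str]] = {
    "researcher": {"blast"},
}


def acl_allows(user: str, resource: str) -> bool:
    return resource in acl.get(user, set())


def rbac_allows(role: str, resource: str) -> bool:
    return resource in role_permissions.get(role, set())


# A new user unknown to the SP: the ACL refuses until someone edits it,
# while RBAC admits her as soon as her IdP asserts the "researcher" role.
print(acl_allows("carol@stir.example.ac.uk", "blast"))  # False
print(rbac_allows("researcher", "blast"))               # True
```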

5. how could the access control you've implemented be subverted by e.g. a bad person, or by an expert trying to get round the system for their own convenience, or by a careless user?

John Watt

Unless you tailor your firewall properly, it is quite easy for someone who knows the system is there simply to reference the portal's URL:port directly, gaining access to the Shibbed portal and bypassing authentication. If a user on a public computer doesn't log out of a session that he has asked the IdP to remember, someone else may be able to reuse his session to gain access to other federation sites, which would be the same problem as if his password had been stolen.

Micha Bayer

The others are in a better position to answer this, as again it is really Shib-specific.

Jipu Jiang

Currently, we force user requests to come through the Apache Shibboleth module first for authentication; the request is then passed on to the portal container. A malicious user can go directly to the portal container if they know its address and port. This weakness must be prevented by using a firewall to block that port. For example, our portal container is located at xxx.xxx.xxx.xxx:8080, and we force user requests to go to xxx.xxx.xxx.xxx:80, so that requests no longer reach the container through port 8080 but through port 80. In this case we have to block port 8080 on the server and leave only port 80 open. Another possibility is that users manage to discover the URI of the actual Grid service. This is unlikely, because the servlet container and the associated dynamic page technology are designed to prevent it from the beginning. Furthermore, even if they knew the URI of the Grid service, they would still need the Grid service's JAR file as a library to invoke the service.
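Once the firewall rule is in place, it is worth checking from a machine outside the firewall that the container port really is closed. A minimal Python sketch using only the standard `socket` module follows; `port_is_reachable` is a hypothetical helper name, and a `False` result only shows that the port could not be reached from wherever the check ran.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and name-resolution failures are all OSError subclasses.
        return False
```

For example, run from outside the site, `port_is_reachable("portal.example.ac.uk", 8080)` should return `False` while the same call for port 80 returns `True` (the host name is a placeholder).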

Anthony Stell

This is similar to John's answer. One of the major requirements to come out of the BRIDGES project was the need for usability. Biologists were simply not going to use our applications if they had to do complicated things like getting UK e-Science certificates for themselves.

Unfortunately, PERMIS requires PKI certificates to get the username (see, not really authn-agnostic :o). So we had to use a test program to use PERMIS as a look-up system, rather than integrating it completely with the BLAST service, and have the service call the URI from a specified string (this is the hack that I referred to). I think this means that if anyone knows the PERMIS service URI specifically, it could be hacked, though I've closed the firewall on the PERMIS server, so it should be OK now. But it's something that could easily be missed during implementation.

As you can probably tell, I've got a fair number of bad things to say about the PERMIS side of things! :o) However, I do feel the idea is sound; it's just the implementation that brings it down.

Oluwafemi Ajayi's response

Step by step

1. A user connects to the portal and is redirected to the SDSS WAYF ("Where Are You From") host for single sign-on (SSO).
Shibboleth protects resources hosted on Apache, but our portal is hosted on Tomcat. We use mod_jk to bridge Apache and Tomcat, so that Shibboleth can indirectly protect Tomcat services. Because mod_jk allows Apache to wrap Tomcat services, it will be difficult for a hacker to access a Tomcat service directly. In my experience, there does not seem to be a performance issue with using mod_jk as a wrapper/bridge.

2. The user is redirected to the origin site (their home institution) for sign-on.

3. The user provides authentication information through their IdP (authentication site), and a SAML authentication assertion (identity handle) is returned to the portal (service provider).
In reality, the SDSS stores a cookie in the user's browser to identify the user's IdP for single sign-on purposes. The identity handle is also stored as a cookie in the user's browser for subsequent access to other service providers in the federation. In my own view, the use of cookies poses security risks such as identity theft.
If a user turns off cookies in their browser, SSO cannot be achieved. Also, if a user switches browser, say from IE to Firefox, they will have to re-authenticate.
We implemented our identity provider using LDAP to provide authentication services for local users.

4. The service provider requests the user's attributes from the IdP using the identity handle.

5. Ideally, the user's attribute certificates are returned as SAML responses to the service provider.
Our IdP releases attributes using the LDAP server we set up; the identity handle is used to query the LDAP server for user attributes.
Because of the way GridSphere (the portal technology) handles data in headers, attributes are returned to the portal as "short" strings. This is currently not secured, but we are working on a better solution such as SAML assertion compression/hashing.

6. The portal uses the attributes to interact with Grid services, e.g. the BLAST service.
From our point of view there are no performance issues or further security risks from this point on.

7. The security questions really are: What do you do with the attributes once they have been exchanged, on the service provider's side? What type of PMI (privilege management infrastructure) does the service provider have? Can the PMI be driven by attributes such as roles or DNs?

8. A non-cookie solution would make the current Shibboleth implementation more secure. The SSO of Shibboleth makes it a good authentication solution for services like BLAST, but it is not enough for authorisation: a service like BLAST would have to be designed to do authorisation using attributes. At the moment we are pushing attributes to service providers, but a pull model is better in some scenarios (i.e. requesting attributes only when needed).

9. At the moment (based on the push model) the scalability issues lie at the service provider, but with a pull model, services would have serious scalability issues.

10. Security and SSO would be greatly improved if Shibboleth worked with the OS rather than with browsers.
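The push and pull models in points 8 and 9 differ mainly in when the SP contacts the IdP. The Python sketch below contrasts them; `PushSession`, `PullSession` and the `query_idp` callback are hypothetical, and the callback stands in for a real SAML attribute query, which the sketch does not implement.

```python
from typing import Callable, Dict, Optional

AttributeQuery = Callable[[str], Dict[str, str]]


class PushSession:
    """Push model: attributes arrive once, at login, and travel with the session."""

    def __init__(self, handle: str, attributes: Dict[str, str]) -> None:
        self.handle = handle
        self.attributes = dict(attributes)

    def attribute(self, name: str) -> Optional[str]:
        return self.attributes.get(name)


class PullSession:
    """Pull model: the SP asks the IdP for attributes only when a service first needs them."""

    def __init__(self, handle: str, query_idp: AttributeQuery) -> None:
        self.handle = handle
        self._query_idp = query_idp
        self._cache: Optional[Dict[str, str]] = None
        self.queries = 0  # counts round trips to the IdP

    def attribute(self, name: str) -> Optional[str]:
        if self._cache is None:
            self.queries += 1
            self._cache = self._query_idp(self.handle)
        return self._cache.get(name)


def fake_attribute_authority(handle: str) -> Dict[str, str]:
    # Hypothetical stand-in for a SAML attribute query to the IdP.
    return {"role": "researcher"}


pull = PullSession("handle-123", fake_attribute_authority)
pull.attribute("role")
pull.attribute("role")
print(pull.queries)  # 1: the second lookup is served from the cache
```

The query counter shows where the extra load of the pull model comes from: each session that needs attributes adds a round trip to the IdP, which is the kind of cost point 9 warns about.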