Constructed vs. Collaborative Catalogs

After our recent meeting in Zuni, T.J. Ferguson outlined a number of different types of information that would be needed to make the database useful to archaeologists. These included better provenience for the objects: the site of origin and the more detailed location within the excavation site, even down to the feature number. T.J. also requested relationships between objects, in the sense of their being found in the same feature, grave or layer. He also thought that any information about date (absolute or relative) would be necessary to make the database useful to archaeologists (T.J. Ferguson, pers. comm.).

Being a trained archaeologist myself, though now a sociologist of technology and information scientist, I understand the importance of these types of information for archaeologists working with collections data. Of course, the fuller context is important to anyone using the information, especially Zunis, to avoid misconceptions or to recognise mistakes. However, I think that T.J.’s request highlights a fundamental misunderstanding of what we are doing with Collaborative Catalogs — that is the difference between a Constructed Catalog and a Collaborative Catalog.

It is not my intention, of course, to single T.J. out on this, as I do not think that he “misunderstands” the system in this sense. His points are perfectly valid in respect to the usual process of constructing a shared catalog. However, his comments brought home to me, very clearly, how easily a confusion may arise between the usual approach of constructing shared catalogs, and our very different approach of creating emergent, collaborative catalogs. So it is not that T.J.’s requests are “wrong”, but that they show how we, as a project, have failed to clearly define what we are doing differently. So, I hope that no one minds, especially T.J., if I use this blog to have a go at making things clearer.

T.J.’s points about including important information about archaeological context are valid generally, of course. The more contextual information available, the better for all who wish to use that information. However, the inclusion of common fields, a common ontology, for the shared information points to a model of constructed, shared databases. This model of sharing information has been the norm for many decades, especially when sharing information between associated disciplines and institutions, such as archaeologists and museums. In this model, two or more interest or specialist groups get together and thrash out a common data model which will include all the information that the different partners need. This should include categories of information that are needed but may not yet be available. It may even include information that could be seen to be needed in the future. The mode requires a single common ontology, or information categorisation, that accommodates all the participating partners’ needs, as far as is possible. After such a model is worked out, to everyone’s satisfaction, it is implemented as a single platform for all access and use.

Today, this mode is often realised as a portal (see Why we are not interested in Portals and Why we are not interested in Portals – 2), the best known for our purposes here being the Reciprocal Research Network (RRN). The RRN has followed this mode to the letter, working with the First Nations in British Columbia to work out a common information model, then creating a shared portal for all to work with and share. Again, it is not my intention to single out the RRN, but, being an innovative and highly commendable project, and one that is setting high standards for such work, it offers an excellent example of this mode of constructing shared catalogs.

The problem with this mode is, and has always been, that it is highly circumscribed, reductive and commensurating. It is circumscribed in the sense that it will necessarily limit the parties who can participate in its construction. Collections are of interest to a vast number, and diversity, of interest and expert groups. Each constructed, shared catalog can only accommodate a small sample of these interests. It is reductive in the sense that even though it is circumscribed to a single or small number of interest groups, it has to create a single model to accommodate both the collecting institution and the interest group(s). It also has to reduce the diverse needs that always exist within an interest group to a single model, thus being at least doubly reductive. Finally, it is commensurating in the sense that it assumes that the meaningful identity of the object, the information that is used to describe the object, can be commensurate across different interests, intentions and uses.

Creating Collaborative Catalogs (CCC) is a project that explicitly seeks to overcome these limitations of a constructed, shared catalog for the benefit of all the groups interested in using collections for research and study. Therefore, what we are explicitly not doing is creating a constructed, shared catalog of any sort.

So what is it that we are doing? First of all, to overcome the practical problem of providing information for all of the possible interests around a collection, CCC has devised a whole new model of working with collections information. Drawing on developments in social computing (Boast, Bravo and Srinivasan, 2007), PuSH technologies (PuSH Technology) and emergent systems theory (Turnbull, 2007), Collaborative Catalogs seeks to overcome the problems of centralised databases by distributing both the information and its systemisation to the interest groups. In this mode, rather than sitting down and working out a common, and much reduced, information model, the collections data is given to those individuals or groups who need to use it, and it is they who do the systemisation locally.

So, to go back to T.J. Ferguson’s request, CCC would say, first, that collecting institutions should provide all information that they currently have, but that, second, rather than asking the collecting institutions to be the information hub (which is often beyond their mandate or resources to accommodate), we will PuSH the information to archaeologists and ask them to extract or add this information in a way that systematises and/or extends it for archaeologists. Then, we ask, but do not demand, that the archaeologists offer up this locally systematised and extended information publicly for others to use, including the original collecting institution (if they so desire). Of course, this is not a task just for archaeologists. The CCC makes it possible for all to acquire the information and to systematise and/or extend it as is appropriate for them and their knowledge community.
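
To make that workflow a little more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the record shape, the class name `LocalRecord`, the method `share_back` and the sample values are all hypothetical and are not part of the CCC system. The idea it shows is that the institution's fields arrive untouched, the archaeologists add their own fields in a local layer, and sharing the additions back is optional.

```python
from dataclasses import dataclass, field


@dataclass
class LocalRecord:
    """A pushed museum record plus a locally controlled extension layer."""
    institution_data: dict                      # as received; not modified locally
    local_extensions: dict = field(default_factory=dict)
    shared: bool = False                        # sharing back is asked, not demanded

    def extend(self, **fields):
        # Local systematisation: add the fields this community needs.
        self.local_extensions.update(fields)

    def share_back(self):
        # Offer only the local additions, keyed to the original object id.
        self.shared = True
        return {"object_id": self.institution_data["id"],
                "additions": dict(self.local_extensions)}


# Hypothetical record and values, for illustration only.
record = LocalRecord({"id": "obj-001", "description": "bowl"})
record.extend(site="(site name)", feature_number="F12", relative_date="(phase)")
update = record.share_back()
```

The institution's own data is never rewritten here; what travels back is a separate set of additions that the institution can take up or not, as it sees fit.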

In creating a distributed Collaborative Catalog, in this way, we not only put the information directly in the hands of the expert and interest communities, but we send the information into that community, into its local systems, into the technical and research contexts that can best systematise and extend it for their own needs and uses. This overcomes the three problems with constructed, shared databases in that CCC does not:

  • circumscribe the communities that can make use of the information, but diversify them;
  • reduce the information set needed by these communities, but accommodate their extension within many local contexts;
  • commensurate the information and its use, but allow information and use to emerge from a diversity of interest communities.

Such an approach will not only, we believe, engender much more extensive use of collections and their information, but it will create an environment that is more in keeping with contemporary research environments: environments where many new ways of contextualising and interpreting collections emerge from a diversity of communities and are shared as local solutions to shared problems for others to extend and develop further. In this way CCC is both a radical departure from conventional data sharing practices and much more consistent with developing modes of transdisciplinary research in the sciences (Cameron and Mengler, 2009; Wickson, Carew and Russell, 2006; Biagioli, 2009). Like these emerging transdisciplinary research modes, CCC also seeks to vastly extend the expert communities that participate in problem solutions to include source communities and other public stakeholders.

So to finally come back around to T.J.’s request, the answer is yes, but these categories should emerge through an ongoing dialog and co-development by you within your own local archaeological information systems, those of the collecting institutions and the other expert communities. Only in this way, we argue, can we actually meet the need for local expert systemisation on the one hand, and broad transdisciplinary information sharing on the other.

ZUNI PuSH – App or API?

I was just answering some questions from Beatrice, our developer at Trento, about the ZUNI PuSH system that they are fast completing. We were talking about the filtering feature and how complex or simple it should be. Naturally, I said it should be simple, mostly because I think simple search features are more powerful. Simplicity accommodates a vastly larger diversity of queries, and their intentions, than do complex search systems (contrary to theory). However, I made a point to Beatrice which got me thinking. I said that we “should keep in mind that this process provides the subscriber with a filtered feed for them to process. That means that we do not have to, and even probably should not, provide” a complex filter. The point of the system, I went on briefly to say, is that the subscriber can do what they like with the data as they are PuSHed (sent) the filtered feed data. In this way, what we are doing is “much more like an API than an APP”, I said.

So this is what got me thinking. We have been treating this system as though it were an App (a Web Application), which it partially is. However, it is an App that allows you to work with feed data as though it were an API (an Application Programming Interface). What APIs do is to allow an App (App_1) to communicate with another App or system, or many other Apps or systems, so that data from these Apps or systems can be used, modified and reapplied in App_1. APIs also allow for data to be passed from App_1 back to the other Apps or systems. APIs are everywhere on the web, but APIs act in the background, behind the front end of an App that the user engages with, so users don’t usually know they are there.

The ZUNI PuSH system, however, is a bit of a hybrid. Though it is a front end App for people to publish and/or subscribe and filter feeds — in this way nothing unusual, it actually sends the subscriber the feed data from the filtered feed like an API for the subscriber to process. In this way, the ZUNI PuSH system is like an API.

This may seem like a “technicality”, and it is in one sense, but it is actually critical to understanding what it is we are trying to do. The fact that it is both an App and an API shows how this approach is fundamentally different from other access systems such as readers, portals or catalogues. These traditional access systems see information as something that is accessed and referred to. The ZUNI PuSH system sees information more as it is used in mashups — as something that is used, transformed and recontextualised by the user. The ZUNI PuSH system sees information as a resource, not as a product, and this is a critical difference with far reaching implications.
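
A small sketch may help show what I mean by "like an API". This is not the ZUNI PuSH code; the function names, the feed fields and the sample entries are all made up for illustration. The hub side applies only a deliberately simple keyword filter, and everything interesting happens in a function supplied by the subscriber.

```python
def simple_filter(entries, keyword):
    """Deliberately simple hub-side filter: case-insensitive keyword match."""
    kw = keyword.lower()
    return [e for e in entries if kw in e["description"].lower()]


def push(entries, subscriber_callback):
    """Send the filtered entries to the subscriber, who decides what to do."""
    return subscriber_callback(entries)


def local_mashup(entries):
    # Subscriber-side processing: regroup the data by a locally meaningful key.
    grouped = {}
    for e in entries:
        grouped.setdefault(e["material"], []).append(e["id"])
    return grouped


# Invented feed entries, for illustration only.
feed = [
    {"id": "a1", "description": "Painted bowl", "material": "ceramic"},
    {"id": "a2", "description": "Woven apron", "material": "textile"},
    {"id": "a3", "description": "Stew bowl", "material": "ceramic"},
]
result = push(simple_filter(feed, "bowl"), local_mashup)
```

The filter stays trivial because the subscriber, not the system, decides how the data is regrouped and reused. A different subscriber could pass a completely different function to `push` and get a completely different result from the same feed.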

What is wrong with PubSubHubBub.

I have been having a great set of Skype meetings with our team in Trento (Fabio, Marcos and Beatrice) about the Publish and Subscribe system. We are making excellent progress and have the system fleshed out already. However, from the beginning we have all been aware of some limitations of the usual publish and subscribe models for both RSS/ATOM and PubSubHubBub. We are working with Trento because of three limitations in particular: size of feeds, lack of security and the need to filter output. Today, in anticipation of our meeting, Beatrice posted her list of PubSubHubBub limitations, which I think is the best to date from anywhere. With her permission, I am posting it here. By the by, for these reasons, we are not using the Google Code PubSubHubBub implementation, but will be using Apache ActiveMQ (http://activemq.apache.org/getting-started.html).

PubSubHubbub limitations

by Beatrice Valeri (xinecs87@gmail.com), Trento University Computer Science, Italy

PubSubHubbub is a protocol for pushing updates of Atom feeds. It is not useful for the UCLA Push project because it is designed for very simple things and it doesn’t cover many of the requirements.

1.   When a new subscriber arrives, they should also receive all messages written before their arrival. This is not completely supported by PubSubHubbub. If the subscriber subscribes before the feed is published, then, after the feed is published, the subscriber receives all messages written before publishing and all the following updates. Once the feed is published, new subscribers receive only the updates.
2.   There is no security on the hub. Anyone can subscribe and publish.
3.   Subscribers have no way to subscribe to only some messages from a feed. Filtering has to be done by the subscriber after messages are received.
4.   PubSubHubbub is not able to manage feeds that are already big at the moment of publishing. When a feed is published, the hub reads it completely and parses it. If the feed is too big, the hub is not able to parse it. Feeds have to be broken into pieces and each piece has to be published.
5.   The subscriber has to know the feed URL in order to subscribe to it. This is not a real pub-sub system, since the subscriber has to be aware of publishers and has to subscribe again if a new interesting feed is published.
6.   With PubSubHubbub, a feed can be published only if there is already a subscriber waiting for it. This is not what we want.
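
As a way of reading the list above as requirements, here is a toy in-memory hub in Python. It is a sketch only: it is not PubSubHubbub, not ActiveMQ, and not the system being built at Trento, and the class, the token scheme and the sample messages are all assumptions made for illustration. It covers points 1 (late subscribers get earlier messages), 2 (a minimal access check), 3 (filtering at the hub) and 6 (publishing with no subscriber waiting).

```python
class Hub:
    """Toy in-memory hub: keeps history, checks a token, filters per subscriber."""

    def __init__(self, allowed_tokens):
        self.allowed_tokens = set(allowed_tokens)
        self.history = []          # every message ever published (point 1)
        self.subscribers = []      # (callback, predicate) pairs

    def _check(self, token):
        if token not in self.allowed_tokens:   # point 2: no anonymous access
            raise PermissionError("unknown token")

    def subscribe(self, token, callback, predicate=lambda m: True):
        self._check(token)
        self.subscribers.append((callback, predicate))
        # Replay earlier messages so that late subscribers miss nothing.
        for message in self.history:
            if predicate(message):
                callback(message)

    def publish(self, token, message):
        # Point 6: publishing works even when nobody has subscribed yet.
        self._check(token)
        self.history.append(message)
        for callback, predicate in self.subscribers:
            if predicate(message):             # point 3: hub-side filtering
                callback(message)


# Hypothetical tokens and messages.
hub = Hub(allowed_tokens={"museum-key", "zuni-key"})
hub.publish("museum-key", {"type": "bowl", "id": 1})
hub.publish("museum-key", {"type": "apron", "id": 2})

received = []
hub.subscribe("zuni-key", received.append, lambda m: m["type"] == "bowl")
hub.publish("museum-key", {"type": "bowl", "id": 3})
```

Points 4 and 5 (very large feeds and discovery of feeds) are left out; keeping the whole history in a list would itself not scale, so a real system would need persistent storage and paging.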

Why we are not interested in Portals – 2

From my last post, I have worked up a model of a Portal implementation so we can now compare our model with that of a Portal. Of course, there is a lot more functionality in your average Portal than I have modeled, but the point remains as this model fits the fundamental structure of the vast majority of Portals.



If we look at the Portal model we see that all of the information flow is from the User to the Portal. There is no information moving back to the User, except what they are looking at on the Portal. The control of the presentation, organization and interface is with the Portal and stays with the Portal. Even the comments do not move back to the core Institutional database, but remain within the Portal. As the Portal is almost always under the control of the Institution anyway, this means that information not only moves from the Users to the Institution (no change there), but also the control of the way the information is managed, presented, accessed and ordered remains with the Institution as well (again, no change there).

If we look at our model, there are three fundamental differences. First, the information that goes to the hub is not organized, presented or ordered there, but simply PuSHed from there to the Subscribers’ servers. It is at the Subscribers’ servers that the information is organized, presented and ordered many times over, and in the local context. There is also the key difference that comments are not simply attached to a Portal instance, but are available to return into the Institution as part of the object’s primary record. Most fundamental of all, though, is that our model is reversible. Any Subscriber in our model can become a Publisher, and any Publisher a Subscriber. Knowledge developed through the local use of Institutional information can be PuSHed back to the Institution to enhance their documentation of the object. A Portal cannot be reversed as it is not a distribution system, but a broadcast system.
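
The reversibility can be sketched in a few lines of Python. Again, this is an illustration under my own assumptions rather than our actual Object Model: the `Node` class and the sample records are hypothetical. The only point is that there is one kind of participant, and the roles of Publisher and Subscriber are just directions of a link.

```python
class Node:
    """Any participant: it can publish to others and subscribe to others."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []   # nodes that receive what this node publishes
        self.inbox = []         # records received, to be organised locally

    def subscribe_to(self, publisher):
        publisher.subscribers.append(self)

    def publish(self, record):
        for node in self.subscribers:
            node.inbox.append({"from": self.name, **record})


museum = Node("museum")
community = Node("community")

# Forward direction: the institution publishes, the community subscribes.
community.subscribe_to(museum)
museum.publish({"object_id": "obj-001", "note": "catalog entry"})

# Reverse direction: the same two nodes swap roles.
museum.subscribe_to(community)
community.publish({"object_id": "obj-001", "note": "local knowledge"})
```

A Portal would need a second, different mechanism to move anything back to the Institution; here the return path is the same mechanism pointed the other way.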

Why we are not interested in Portals.

I thought we would lay down the gauntlet now, even though we are a bit of a way off demonstrating our work. I have uploaded two models for our systems. The first is a User Model, which is a demonstrative diagram for a general audience that shows what our system is intended to do. The second is an Object Model, using UML, which is more for developers. The two more or less depict the same system, though. I should emphasize, for those developers reading this, that the Object Model is not a full Class Model, but more of a “conceptual” Object Model.

What I want to show, however, is not simply what we intend to do, but to also explain why this is different from a Web Portal. In a recent discussion between the IT Officer for Anthropology at the American Museum of Natural History (New York), Jim and I, it became clear that what we are doing could easily be confused with a Portal. Or, worse in my mind, that it could be assumed that there would be little difference between what we are doing and a Portal. In fact, I think that what we are doing is fundamentally different, and even opposed, to what Portals do. Here is why.

If you will pardon me drawing a definition from Wikipedia, a Web Portal is “a web site that functions as a point of access to information on the World Wide Web. A portal presents information from diverse sources in a unified way.” (Wikipedia, emphasis added). Wikipedia goes on to say that a Portal “provide[s] a way for enterprises to provide a consistent look and feel with access control and procedures for multiple applications and databases, which otherwise would have been different entities altogether.” This is the key difference between what we are trying to do and what Web Portals are trying to do. Whereas a Web Portal takes a diverse set of resources, centralizes them and gives them a single “enterprise” identity, what we are trying to do is the opposite. We are trying to take a diverse set of resources and distribute them as filtered sets to diverse expert communities, so that these filtered sets of resources can be localized and used in completely different ways.

The difference between a Web Portal and our approach is not simply superficial, but goes right down to our understanding of what Knowledge is. Where the assumption about knowledge in a Web Portal, and most “knowledge systems,” is that knowledge is an accumulated resource, a set of commodities that gain their power as knowledge through their packaging or their organizing, we accept a different, less colonial, view of knowledge. We see knowledge not as a set of prescriptively ordered and presented resources, but as a personal, local and community achievement. Knowledge, for us, is something you do, and do skillfully, not something you acquire, proffer or stockpile.

So the difference may seem subtle, even trivial, but is in fact fundamental. A Web Portal seeks to share information resources between individuals and communities through a unified, prescribed and centralized system: an enterprise system much like a museum or archive. However, what we are trying to do is to share information resources between individuals and communities by distributing those resources into the diverse local systems so that they can be directly used to build local knowledge. While a Web Portal is, by its very nature, a system that creates a unified identity for information, and information use, through its “enterprise” identity, our system seeks to fundamentally undermine this universalizing and commodifying approach to knowledge by radically replacing unity and centralization with diversity and localization.

More notes from meeting 11-30-10 – system details & focus group ideas

This is a continuation of my sketchy notes from our mini-meeting at Zuni on November 30, 2010. We spent much of the afternoon in a discussion about sharing protocols and other details of the Zuni local system, and the protocols that will drive the links between the different local systems. Lastly, we had a discussion of the focus groups for evaluating the system.

We brainstormed about user profiles, and the kinds of information that might work to identify different users and drive the protocols of access. After coming up with a lengthy list, it seemed like a pretty invasive & complex amount of information to gather from users — might discourage people from signing up.

- Probably only 1-2% of objects need that level of careful restriction
– just mark it ‘unavailable without assistance’
— can only access it at AAMHC with help
– for right now, just set sensitive things aside

we can run a permission set out of FileMaker
another idea – restricting access to local IP addresses

- Robin also emphasized that we need a word that doesn’t subjugate the new stuff to the existing catalog
- we discussed different kinds of updates that might emerge, and whether we should distinguish between different updates.
Categories we brainstormed: (which could also emerge from the actual work)
- correctives / corrections
- events
- new / additional information
- research
- relations / genealogies
- disputes / discussions

- idea – part of the system evaluation could be from the self-ID of additions – does a pattern emerge?
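
A quick sketch of how that pattern-spotting might be done, in Python. The category labels follow the brainstormed list above, but the storage format, the "other" bucket and the sample updates are invented for illustration only.

```python
from collections import Counter

# Labels taken from the brainstormed list; how they are stored is an assumption.
CATEGORIES = {"correction", "event", "new information",
              "research", "relation", "dispute"}


def tally(updates):
    """Count updates by the category the contributor chose for themselves."""
    counts = Counter()
    for u in updates:
        label = u["category"] if u["category"] in CATEGORIES else "other"
        counts[label] += 1
    return counts


# Invented sample updates, for illustration only.
sample = [
    {"object_id": "obj-001", "category": "correction"},
    {"object_id": "obj-002", "category": "relation"},
    {"object_id": "obj-003", "category": "correction"},
    {"object_id": "obj-004", "category": "story"},
]
counts = tally(sample)
```

Labels that fall outside the list end up in "other", which is one place to look for categories emerging from the actual work.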

- impact could be just access, or it could be making flexible systems, or user-generated content

- how do we roll out the system?
– events in the community
– getting adults on board
– permanent kiosks at IHS, etc

Photos / videos / audio – what if young people don’t know the protocols about taking photos or video?

Focus groups – guiding questions
themes of what we’re interested in
- experience of the system
- access to patrimony
- community dialogue

question ideas (for anonymous q’s in system)
How easy is this system to use? (answered on a scale of easy to hard)
Tell us about it…
How much do you feel you’ve learned from the system?

Notes from meeting at Zuni 11-30-10

What follows are my sketchy notes from our mini-meeting at Zuni on Nov. 30th, with Ramesh, Robin, Daf Harries, Jim, Octavious, Curtis and myself. We also had a couple of guests – Cynthia Chavez-Lamar from the School of Advanced Research in Santa Fe, and Miranda Velarde-Lewis, a Zuni grad student in museum studies at UW.

- Ramesh noted that as far as how he sees it, it’s really important for us to look at how the system inspires a social motivation for contributing media & mashups — what can we put in place to make sure that happens?
- a lot is happening in terms of collaborative catalogs with unofficial partners (like School of Advanced Research) — not necessarily online
– for example, at SAR, the whole staff is really engaged – coming back to check and proofread what was added to the catalog.
- Jim recently spoke at AAA (Am. Anthro. Association meetings) – he says that he raised a lot of eyebrows when he said “we are the source community for your museums, so you are satellite of our museums – extensions of our community”
- Cynthia told a story about two aprons in different collections (one from the Zuni Day School collection, the other from DAM) – wouldn’t it be nice to point one to another
- Jim said that when we say it’s about power, we want the system to mirror the way knowledge is organized within the community
– we might be taking risks, but if we don’t do it, someone else will (and they might make stuff up, or be wrong)
- not just ‘setting the record straight’ but having a continued, ongoing sovereignty over the collection – governance
– Zuni decide what gets shared back, what goes back
– under your name, under your control
- The question was raised – How do we deal with technocracy? — don’t create one
– setting it up so we don’t have to say ‘you can only work with us if you have this kind of system’
– doesn’t require local systems to do anything a particular way
— museums have different needs & audiences
– our goal is just the in-between links
- Most of what exists elsewhere are portals
– when source community information comes in, it’s placed subordinate to core museum info and how it is classified
- How they incorporate information coming back is up to the museum, its culture, etc.
– we would hope they change it at a deep level
- Reciprocal Research Network – it’s perceived as inclusive, but really they are placing shared info on the side
- no protocols, no sense of what’s appropriate or not
- the museum is always the editorial filter – we say that’s not their place to decide what voice comes through
- don’t create a system that’s so rigid that there’s no room for innovation & creativity
– local uses – re-empowerment

Moving on to the technical side of things — there are two parts
1) each endpoint is being treated as a ‘local system’ – governed locally
– the Zuni prototype exists – the FileMaker Pro DB
2) open-source sharing system – local controls and formats stay local

- the key insight here – any institution sharing information is publishing a database of its own info
– not necessarily the same thing as the whole catalog – some bits are left out
- CouchDB is a mechanism by which you can synchronize two published databases

[two slides from Daf's presentation that describe how the CouchDB system will be used to link all our local systems]

- using CouchDB is better than what we were thinking previously (pubsubhubbub) — in that, all kinds of formatting and coordination would be required — data going into a central hub
– the other problem – requires a hub somewhere external, outside of the control of local systems
* what we build is going to be the piece in between (say) FileMaker and CouchDB *
- we’re using CouchDB (which exists already) and building the in-between bit
– key thing – your CouchDB database is still inside of your control
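
For anyone curious what "synchronizing two published databases" might look like in practice, here is a minimal Python sketch that builds the JSON body for CouchDB's `POST /_replicate` endpoint. The URLs, database names and the filter name are hypothetical, nothing is sent over the network, and this is not the in-between piece we plan to build. The optional filter is one possible way of leaving some bits out of what gets replicated.

```python
import json


def replication_request(source, target, filter_name=None, continuous=True):
    """Build the JSON body for CouchDB's POST /_replicate endpoint."""
    body = {"source": source, "target": target, "continuous": continuous}
    if filter_name:
        # Names a filter function in a design document on the source,
        # written as "designdoc/filtername".
        body["filter"] = filter_name
    return body


# Hypothetical URLs and names; this only prepares the request body.
body = replication_request(
    source="http://localhost:5984/zuni_published",
    target="http://partner.example.org:5984/zuni_copy",
    filter_name="sharing/public_only",
)
payload = json.dumps(body, indent=2)
```

The source database stays on the local server throughout; only what the local system chooses to publish is copied to the partner.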

- idea to build in indicators to send flags about updated records – fixing mistakes or making changes
ex: user might check a box saying “Notify museum about this change – they should know about this”

* indicator to be added – indicating similar or related objects at other partner museums
- Open question: how the partner museums are going to subscribe to each other, not just to Zuni DB
– we haven’t really discussed this yet
- Cynthia said that it would be very useful if when AAMHC updates (say) a stew bowl, other institutions with similar objects are notified
- some institutions (say, SAR) might want ALL updates, while others might not
- instead of a portal pulling it all together, we’re creating something more like a web of collections
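
One possible shape for the "who gets notified" question, sketched in Python. The partner names, object types and the idea of storing a preference as either "all" or a set of object types are assumptions for illustration; we have not decided any of this.

```python
def recipients(update, subscriptions):
    """Return the partners who should hear about this update.

    `subscriptions` maps a partner name to either "all" or a set of
    object types that partner holds and cares about.
    """
    out = []
    for partner, wanted in subscriptions.items():
        if wanted == "all" or update["object_type"] in wanted:
            out.append(partner)
    return sorted(out)


# Partner names and object types below are placeholders.
subscriptions = {
    "partner_a": "all",
    "partner_b": {"stew bowl", "apron"},
    "partner_c": {"jar"},
}
update = {"object_type": "stew bowl", "notify_museum": True}
notified = recipients(update, subscriptions)
```

Matching on a single object type is the crudest possible notion of "similar"; what counts as a related object is exactly the kind of thing that would need to come from the partners themselves.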

- this may not change the ways that libraries, museums, archives think about sharing everything – “knowledge is free”
– but it might make them more uncomfortable about the idea
- answering the question “How do we know what they say is any good?”
– range of expertise – different kinds

Subtle difference – really powerful thing – not just about sending messages out and receiving them
- it’s about the data actually coming here to Zuni, to be modified / reused / critiqued
- will open eyes of museums – looking at items in an entirely different way

- the advantage of digital here – data – is that data can move around much more easily
– creating a collection for comparison that doesn’t exist in the physical world
- as we expand the partnerships further, the good news is that the threshold for entry is pretty low and the relationship isn’t automatically two-way
– we can stipulate that cultural advisors have to visit before museum can subscribe to AAMHC
- building a series of linking interfaces
– for Argus, mySQL, Oracle, FileMaker, etc
- but there’s still the issue of how institutions handle updates within their data structure

* we will need to write up the parameters for participation for new partners *
– “here’s what you need to decide internally for your data structure & updates”
Thinking hard about categories and what is interesting – what we’ll want to do with the info

Later in the day, we talked over many details & decisions that we need to make to implement the system, which I’ll cover in another post.