This post demonstrates how to add further structure to data after the initial items have been (uniquely) identified and committed to your persistent store. The core idea here is that once you have items uniquely identified, you can overlay a structure (or any number of structures) upon them as desired.
These structural overlays can also be made to interact as much (or as little) as necessary to address the question currently under consideration.
For example, consider four cell lines (taken from a brochure on the Charles River labs web site):
| Cell Line | Species | Organ | ID |
|---|---|---|---|
| SW780 | Human | Bladder | CL-1 |
| Hep3B | Human | Liver | CL-2 |
| B16 | Mouse | Skin | CL-3 |
| Madison109 | Murine | Lung | CL-4 |
(Note: not every relationship needs to be listed explicitly in the parent table.)
This gives us the following structure:
Upon this we can overlay a set of relationships showing the source organ:

Or the source species:

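A minimal sketch of the overlay idea in Python, assuming the cell lines are held in memory as a dictionary keyed by their IDs (the `overlay` helper and this in-memory layout are illustrative assumptions, not part of any particular store). Each overlay is simply a separate index from an attribute value to IDs, so any number of them can sit on top of the same base records:

```python
from collections import defaultdict

# Base records: each cell line is uniquely identified by its ID (from the table above).
cell_lines = {
    "CL-1": {"name": "SW780", "species": "Human", "organ": "Bladder"},
    "CL-2": {"name": "Hep3B", "species": "Human", "organ": "Liver"},
    "CL-3": {"name": "B16", "species": "Mouse", "organ": "Skin"},
    "CL-4": {"name": "Madison109", "species": "Murine", "organ": "Lung"},
}

def overlay(records, attribute):
    """Build one structural overlay: attribute value -> sorted list of IDs.

    The overlay stores only IDs, so it never copies or modifies the base records.
    """
    groups = defaultdict(list)
    for cell_id, record in records.items():
        groups[record[attribute]].append(cell_id)
    return {value: sorted(ids) for value, ids in groups.items()}

by_organ = overlay(cell_lines, "organ")
by_species = overlay(cell_lines, "species")

print(by_organ["Liver"])    # ['CL-2']
print(by_species["Human"])  # ['CL-1', 'CL-2']
```

Because the overlays hold only IDs, adding another one requires no change to the base records.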
Now, let's say we add a new cell line:
| Cell Line | Species | Organ | ID |
|---|---|---|---|
| SW780-1 | Human | Bladder | CL-5 |
This gives us:

We may later realize that CL-5 was derived from CL-1. Rather than altering the cell line table, we can simply store this information in a separate parent-child relationship table:
| Parent | Child | Relationship |
|---|---|---|
| CL-1 | CL-5 | "derived" |
(Note: "root" cell lines are those that never appear in the Child column, which includes those that do not appear in this table at all.)
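Finding the roots is then a set difference. This sketch assumes the relationship table has been loaded as a list of `(parent, child, relationship)` tuples; `root_cell_lines` is a hypothetical helper name:

```python
# In-memory copy of the cell line IDs and the parent-child table above.
cell_line_ids = {"CL-1", "CL-2", "CL-3", "CL-4", "CL-5"}
relationships = [
    # (parent, child, relationship)
    ("CL-1", "CL-5", "derived"),
]

def root_cell_lines(ids, rels):
    """Return IDs that never appear in the Child column.

    Lines absent from the relationship table entirely are roots too,
    because they are never a child.
    """
    children = {child for _parent, child, _rel in rels}
    return sorted(ids - children)

print(root_cell_lines(cell_line_ids, relationships))
# ['CL-1', 'CL-2', 'CL-3', 'CL-4']
```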

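This approach generalizes readily. By recording which table each endpoint lives in, one relationship table can link rows across tables, and the result need not be a strict tree (here CL-5 has two fusion parents):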
| Parent | Parent Table | Child | Child Table | Relationship |
|---|---|---|---|---|
| CL-1 | Cell_Line | CL-5 | Cell_Line | fusion-parent |
| CL-2 | Cell_Line | CL-5 | Cell_Line | fusion-parent |
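One way to see why this is no longer a strict tree: index the rows by child and look for any child with more than one parent. The sketch below assumes the edge table is loaded as tuples and qualifies each endpoint with its table name; `parents_of` is a hypothetical helper:

```python
from collections import defaultdict

# Rows from the table above: (parent, parent_table, child, child_table, relationship).
# Each endpoint is qualified by its table, so one edge table can link rows from different tables.
edges = [
    ("CL-1", "Cell_Line", "CL-5", "Cell_Line", "fusion-parent"),
    ("CL-2", "Cell_Line", "CL-5", "Cell_Line", "fusion-parent"),
]

def parents_of(edge_rows):
    """Map each (table, id) child to the set of (table, id, relationship) parents."""
    result = defaultdict(set)
    for parent, parent_table, child, child_table, rel in edge_rows:
        result[(child_table, child)].add((parent_table, parent, rel))
    return result

parents = parents_of(edges)
fused = parents[("Cell_Line", "CL-5")]
print(sorted(fused))
# [('Cell_Line', 'CL-1', 'fusion-parent'), ('Cell_Line', 'CL-2', 'fusion-parent')]

# A child with more than one parent means the structure is a more general graph, not a strict tree.
is_strict_tree = all(len(ps) <= 1 for ps in parents.values())
print(is_strict_tree)  # False
```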
Naturally, the richer the relationship, the more likely you are to move to a table designed specifically to capture that information, such as this mixture table:
| Mixture Component | Mixture Component Table | Mixture | Mixture Table | Amt |
|---|---|---|---|---|
| C-1 | Compound | C-9 | Compound | 0.1 |
| R-2 | Reagent | C-9 | Compound | 0.1 |
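The dedicated table is still an edge table; it simply carries an extra attribute on each edge. A sketch assuming an in-memory list of rows, using the C-1 and R-2 rows from the table above (the `MixtureComponent` class and `components_of` helper are hypothetical, and the source table gives no units for Amt):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MixtureComponent:
    """One row of a dedicated mixture table: an edge that carries an amount."""
    component: str
    component_table: str
    mixture: str
    mixture_table: str
    amt: float  # units are not given in the source table

rows = [
    MixtureComponent("C-1", "Compound", "C-9", "Compound", 0.1),
    MixtureComponent("R-2", "Reagent", "C-9", "Compound", 0.1),
]

def components_of(mixture_id, table):
    """List (component_table, component, amt) for one mixture, in row order."""
    return [(r.component_table, r.component, r.amt) for r in table if r.mixture == mixture_id]

print(components_of("C-9", rows))
# [('Compound', 'C-1', 0.1), ('Reagent', 'R-2', 0.1)]
```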
As these structures build up, it becomes easy to interrogate the information about our available cell lines.
Query: What mammalian cell lines do we have? Procedure: Traverse from the mammalian node and collect all cell line instances reached.

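A sketch of this traversal as a breadth-first search. The post does not spell out the taxonomy, so the `Mammal` node and its edges below are assumptions for illustration; the "Mouse" and "Murine" species labels are kept as separate nodes because the cell line table uses both:

```python
from collections import deque

# Hypothetical overlay edges: the "Mammal" node points to species nodes, and
# species nodes point to the cell line IDs from the tables above.
graph = {
    "Mammal": ["Human", "Mouse", "Murine"],
    "Human": ["CL-1", "CL-2", "CL-5"],
    "Mouse": ["CL-3"],
    "Murine": ["CL-4"],
}
cell_line_ids = {"CL-1", "CL-2", "CL-3", "CL-4", "CL-5"}

def collect_cell_lines(start, edges, instances):
    """Breadth-first traversal from start, collecting every reachable cell line ID."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        node = queue.popleft()
        if node in instances:
            found.add(node)
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(found)

print(collect_cell_lines("Mammal", graph, cell_line_ids))
# ['CL-1', 'CL-2', 'CL-3', 'CL-4', 'CL-5']
```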
Query: What cell lines are derived from CL-1? Procedure: Find the cell lines derived from CL-1, then the cell lines derived from those (recursively), and collect all cell line instances.

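The recursive procedure amounts to collecting every node reachable from CL-1 along "derived" edges. A minimal sketch, assuming the parent-child table is loaded as tuples (`derived_from` is a hypothetical helper); an explicit stack stands in for recursion:

```python
from collections import defaultdict

# (parent, child, relationship) rows, as in the parent-child table above.
relationships = [
    ("CL-1", "CL-5", "derived"),
]

def derived_from(root, rels, relationship="derived"):
    """Return every cell line derived from root, directly or transitively.

    Uses an iterative depth-first search with a visited set, so it also
    terminates if the relationship graph happens to contain a cycle.
    """
    children = defaultdict(list)
    for parent, child, rel in rels:
        if rel == relationship:
            children[parent].append(child)
    found = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return sorted(found)

print(derived_from("CL-1", relationships))  # ['CL-5']
```

In a relational store the same reachability query is often written as a recursive common table expression.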
The overall pattern is straightforward and can be processed with standard graph algorithms such as breadth-first or depth-first traversal.
See also: Considerations in developing a middle distance ontology

