The fields of Semantics and Ontologies have consumed my life for the past several years. Why?
They are enormous topics with applicability in every field, not just formal logic or pure academics. They are fundamental to the way we interpret and reason about the world around us, and the amount of information about the world we have to absorb is staggering. Our mobile devices, our activity on the internet, and the increasing digitization of our records have all contributed to the exponential growth in data. Sorting out all this data is an increasingly difficult task. And it's not just the volume of data that is growing, but also the way it evolves from its initial capture state to how we use it.
Fundamentally, when analyzing data, you must determine three things:
1) What is the nature of the data? Is it:
- user records
- patient records
- news articles
- individual tweets
- sensor data from weather stations
All must be treated differently.
2) Are there any patterns in the data?
3) How does the data relate to itself and to other information sources?
- What does the data tell us and what can we learn from it?
- Can the correlation of patient symptoms tell us something about the causation of their condition?
- How accurate are weather predictions based on past records?
Question (1) deals with questions such as (in the weather domain) "What is snow?", "What is temperature?", and maybe even "What is low temperature?". Humans have no problem answering these questions, but machines run into several issues, mainly because "Snow" is just a 4-character label. This label is not enough to encapsulate all that is "Snow". The technology behind Linked Data begins to reveal this information. At the very least, it provides a point of reference, as a URI, for an intended meaning of "Snow". This particular DBpedia link provides metadata associated with the concept "Snow". Now anyone, in any language, can reference Snow and mean the same thing.
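The idea of labels pointing at a shared URI can be sketched in a few lines. This is a toy illustration, not the DBpedia API; the `labels` table and `meaning_of` helper are hypothetical names for the purpose of the example.

```python
# A minimal sketch: language-specific labels resolve to one shared URI,
# so "Schnee" and "Snow" can be shown to mean the same concept.
SNOW_URI = "http://dbpedia.org/resource/Snow"  # a stable point of reference

labels = {
    "en": "Snow",
    "de": "Schnee",
    "fr": "Neige",
}

def meaning_of(label, lang):
    """Resolve a language-specific label to its shared URI, if known."""
    return SNOW_URI if labels.get(lang) == label else None

print(meaning_of("Schnee", "de") == meaning_of("Snow", "en"))  # → True
```

The label is just characters; the URI is what the labels have in common.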
Question (2) is addressed by data-mining algorithms, which reveal predictive patterns. For example, the relation between snow and low temperatures is straightforward: observing snow lets us predict the current temperature, namely that it is low. To a computer algorithm, "Snow" is just a label made of the four characters "S", "n", "o", and "w". A data-mining algorithm might see that whenever the column "condition" is "Snow", the column "temperature" has a value < 0, and derive the rule:
If "condition" == Snow then "temperature < 0"
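A toy version of that derivation can be written directly against tabular rows. The sample rows below are made up for illustration; a real miner would score confidence and support rather than require the pattern to hold on every row.

```python
# A toy "data miner": scan rows, and if every row with condition == "Snow"
# also has temperature < 0, emit the rule from the text above.
rows = [
    {"condition": "Snow",  "temperature": -5},
    {"condition": "Rain",  "temperature":  4},
    {"condition": "Snow",  "temperature": -1},
    {"condition": "Clear", "temperature": 10},
]

snow_rows = [r for r in rows if r["condition"] == "Snow"]
if snow_rows and all(r["temperature"] < 0 for r in snow_rows):
    print('If "condition" == Snow then "temperature" < 0')
```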
Question (3) deals with how concepts behind the data relate to each other. Currently this type of knowledge is known in computer systems as “Business Logic”. It is the part of software that interprets input and determines what to produce as output. Often this is described as the process between the data and the user interface. Software frameworks that incorporate this distinction are called Model-View-Controller frameworks. Model is the data, Views are the interfaces, and a Controller is the business logic.
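The Model-View-Controller split can be sketched in a few lines. This is a deliberately minimal illustration, not any particular framework's API; the class and method names are hypothetical.

```python
# A minimal MVC sketch: the Model holds the data, the View renders it,
# and the Controller carries the business logic between them.
class Model:
    def __init__(self):
        self.readings = []                  # the data

class View:
    @staticmethod
    def render(reading):
        return "condition=%s temperature=%d" % reading  # the interface

class Controller:                           # the business logic
    def __init__(self, model, view):
        self.model, self.view = model, view

    def record(self, condition, temperature):
        self.model.readings.append((condition, temperature))
        return self.view.render(self.model.readings[-1])

app = Controller(Model(), View())
print(app.record("Snow", -3))               # → condition=Snow temperature=-3
```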
Business logic is a set of concrete steps that must occur whenever certain conditions are true. Anyone who is implementing such systems must have very precise specifications, or understand what these concepts represent and how they relate to each other. A system like DBpedia and the appropriate URI will tell you what a concept is, and provides a good amount of metadata. To understand how this concept is related to the rest of the system, it helps to know how it fits into the greater scheme of things.
A) It may exist as a node in a taxonomy, a hierarchical class structure such as:
… and so on.
B) It may describe a process such as:
C) It may also describe business triggers:
If a website visitor matches Ad X's criteria, display Ad X.
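The trigger in (C) can be sketched as a simple rule over a visitor profile. The `interests` field and the criteria format are hypothetical, not from any real ad platform.

```python
# A toy business trigger: show every ad whose criteria the visitor matches.
def matching_ads(visitor, ads):
    """Return the names of ads whose criteria are a subset of the
    visitor's interests."""
    return [ad["name"] for ad in ads
            if ad["criteria"] <= visitor["interests"]]  # subset test

visitor = {"interests": {"cars", "weather"}}
ads = [
    {"name": "Ad X", "criteria": {"cars"}},
    {"name": "Ad Y", "criteria": {"boats"}},
]
print(matching_ads(visitor, ads))  # → ['Ad X']
```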
Once a developer understands what data models will be interacting with a system, the proper functionality may be built around them. It simply is not enough to program a system that "given A, produces B". In today's ever-evolving world of applications, APIs, and NoSQL data models, it is not sufficient to develop a system from a fixed set of requirements. The application development cycle must be more dynamic. Agile Software Development is a great way to build applications in a fast-paced, quickly adapting environment. One method of specification gathering is not to receive a set of requirements from a middleman, but to sit face-to-face with the person who determines the requirements, knows the use cases, and knows who the users are. In other words, it's the person who knows the motivation behind the system and what problem the system is supposed to solve. While tying the models to functionality is an in-depth topic on its own, having a better understanding of the data beyond columns and relational tables is important in understanding the problem domain.
Interpreting the Data
The two components, data and business logic, are currently described by different languages: for example, SQL for the data and Ruby for the business logic. But is it possible to use a single language for both? Such a language would require enough expressivity to accomplish this. It would need the proper semantics to describe a Client, a Bank Account, and the types of interaction one can have with the other, such as Withdrawals, Deposits, Overdrafts, etc.
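In one general-purpose language, that description might look like the sketch below. The class and method names are illustrative, not any real banking framework's API; the point is that the data (a client's balance) and the logic (overdraft rules) live in one description.

```python
# A single-language sketch of both the data (Client, BankAccount) and the
# business logic (deposits, withdrawals, overdraft checks).
class BankAccount:
    def __init__(self, client, balance=0):
        self.client, self.balance = client, balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:                   # the overdraft rule
            raise ValueError("overdraft: insufficient funds")
        self.balance -= amount

account = BankAccount("Alice", balance=100)
account.deposit(50)
account.withdraw(120)
print(account.balance)  # → 30
```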
Ontologies provide this type of functionality. To analyze a data model, we need to recognize individual elements of the model itself.
The field of Mereology deals with elements that are “parts of a whole”:
An engine is part of a car.
The field of Set Theory deals with class membership:
This object is a car…. This car is a Honda.
What I've done is express some information about our observation "Car", namely that it has an engine and that it is a Honda. My ability to express this successfully depends on the audience understanding what I'm saying. Understanding in this case means two things:
- The audience understands what the individual words are.
- The audience understands what the individual words mean.
Now that we know some things about a “Car”, we can reason about it to infer more information. For example:
The owner of a Car owns the engine inside that car.
We can also infer that:
The owner of this Car owns a Honda.
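Both inferences can be sketched as a toy reasoner: ownership propagates over the part-of relation (mereology), and class membership (set theory) lets us conclude the owner owns "a Honda". The facts and names below are made up for illustration.

```python
# A toy reasoner over the two relations discussed above.
part_of = {"engine-1": "car-1"}      # an engine is part of a car (mereology)
instance_of = {"car-1": "Honda"}     # this car is a Honda (set membership)
owns = {"alice": {"car-1"}}

def owned_by(person):
    """Direct possessions plus everything that is part of them."""
    direct = set(owns.get(person, set()))
    return direct | {part for part, whole in part_of.items() if whole in direct}

print("engine-1" in owned_by("alice"))  # the car's owner owns its engine → True
print(instance_of["car-1"])             # so Alice owns a Honda → Honda
```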
See my previous post on the semantics and syntax of these examples.
I've just used a particular syntax to express these relations. Notice that this syntax doesn't say anything about the semantics of these relations; it's just a way to express these statements. To give terms like "Car" meaning, we assign a reference (such as a URI) and related concepts (with an ontology). From there we can begin developing the types of interactions a "Driver" can have with a "Car".
Driver can Drive a Car.
Driver can Steer a Car.
Car has a Brake Pedal.
Driver can Press a Brake Pedal.
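These relation statements can be stored as triples and reused across a class hierarchy: a relation declared once for "Vehicle" then applies to anything classed under it. The class and relation names below are illustrative, not drawn from a real upper ontology.

```python
# A sketch of storing "common sense" relations once and reusing them.
subclass_of = {
    "Car": "Vehicle", "Truck": "Vehicle", "Motorcycle": "Vehicle",
}
relations = {("Driver", "Drives", "Vehicle"), ("Driver", "Steers", "Vehicle")}

def applies(subject, relation, cls):
    """Walk up the class hierarchy looking for the relation."""
    while cls is not None:
        if (subject, relation, cls) in relations:
            return True
        cls = subclass_of.get(cls)
    return False

print(applies("Driver", "Drives", "Truck"))   # → True, inherited from Vehicle
print(applies("Driver", "Drives", "Snow"))    # → False
```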
In a broader sense, we call these interactions relations. These types of "common sense" relations can be stored in an upper ontology. Someone developing a "Car" or "Bank" application can use these "Driver and Car" relations to better understand the application's domain, and reuse existing relations. If need be, the relation "Drives" can be applied to trucks and other vehicles. The term "Drives" is related to "Rides" and "Flies", so many of the relations (such as "Steers") can be applied to "Motorcycles" and "Airplanes".
Semantics of Business Logic
Some critics will say that a software package and a database are far more efficient than reasoning with a set of ontological objects and relations. These criticisms are valid, but improvements in semantic technologies have made it possible to adopt semantics into modelling not just data, but also the processes embedded in business logic. A recent post by Fiona McNeil titled "The semantic web is here. Is your organization ready?" addresses this type of adoption of semantic technologies.
A reasoner that infers output based on ontology relations and individuals (representing the data) can be optimized for a set of queries, a technique called query rewriting. Such concepts have been thoroughly described in the SPARQL community. This is similar to the optimization that occurs during the compilation of a program's source code: any good compiler will run an optimizer that makes the code more efficient. The number of relations handled efficiently by any one system has been increasing in recent years, with a list of triple stores compiled by the W3C. Consider that Facebook compiles the entire facebook.com application, written in PHP, into a single binary for runtime efficiency gains.
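The flavor of query rewriting can be shown with a toy example: a query over a general class is rewritten, using the ontology's subclass relations, into queries the store can answer directly. This is a sketch only, not how a real SPARQL engine implements it; the class names are made up.

```python
# A toy query rewriter: expand a query over "Vehicle" into the class
# plus all of its subclasses, then match stored items against that set.
subclasses = {"Vehicle": ["Car", "Truck"], "Car": ["Honda"]}

def rewrite(cls):
    """Expand a class query into the class plus all of its subclasses."""
    expanded = [cls]
    for sub in subclasses.get(cls, []):
        expanded += rewrite(sub)
    return expanded

data = [("h-1", "Honda"), ("t-1", "Truck"), ("b-1", "Boat")]
wanted = set(rewrite("Vehicle"))
print([item for item, cls in data if cls in wanted])  # → ['h-1', 't-1']
```

A query for "Vehicle" finds the Honda even though no row is labelled "Vehicle", because the rewrite folded the ontology's hierarchy into the query.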
The dynamic nature of Web 3.0's applications and data requires a more agile set of tools. In the same way that IDEs have made development easier and more efficient, semantics will allow engineers to model data and business logic more quickly as well.