This article is the second in a series on how to build an end-to-end machine learning pipeline for a chatbot AI that relies on a user-defined knowledge base. You can read the description of the use case and the other articles in the series here.
In this article you will read about the dashboard backend, the development details and how to deploy the package to Google Cloud Run, a highly scalable service for containerized applications.
You can explore the code at my GitHub repository.
Disclaimer: I’m still a beginner at Golang and the software explained in this article is not complete and contains flaws that need to be solved to produce a real production application. If you are interested in extending the functionalities of this code or if you have any constructive criticism or feedback, please get in touch!
The technological stack and the design pattern
The backend is developed in Go, an open source programming language developed by Google.
The code is structured as detailed by Uncle Bob’s Clean Architecture design pattern.
Uncle Bob’s Clean Architecture is based on the separation of concerns by the division of software into four main layers. They are described below from the inner to the outer circles:
- Entities: encapsulate the enterprise business rules. The entities can be structures, functions or objects with methods. Entities are independent of the application: they implement concepts that can be shared with other applications in the enterprise.
- Use cases: encapsulate the application specific business rules. Use cases manage data conversion and data flow to and from the entities.
- Interfaces: the layer that converts the data from the use cases and entities layer to the format required by some external source such as the database or the web.
- Infrastructure: the layer that is composed of frameworks, specific tools and drivers such as the database.
The separation of concerns is enforced by the dependency rule: the dependencies of the modules must only point inwards. This means that the inner circles must know nothing about the outer circles: no module, variable, structure, framework, database or interface of an outer circle can be referenced by an inner circle. One question arises: why all this overhead? Because this architecture allows us to decouple the business logic from the tools and the core business concepts from the technologies, ending up with a system with a clear separation of concerns.
With clean architecture and the dependency rule, if everything is respected, the software will be truly testable both with and without UI, database, web server or other external elements, and it will be independent of frameworks, UI, databases and external actors. In addition, this architecture makes it easy to scale and update the functionalities of the application. In general, introducing new functionalities or modifying existing ones shouldn’t require restructuring the whole software or changing the implementation of, for example, object B, if B is not the target of the intervention. Clean architecture and the dependency rule allow us to build software that goes in this direction: decoupled software can be tested and updated quickly and effectively.
The following implementation takes inspiration both from Uncle Bob’s article and from Manuel Kissingler’s great blog post. Here you will read about my implementation of the backend for the bert-faqclass project and about its deployment on Google Cloud Run.
But first things first: let’s begin with the entities.
The entities capture the domain concepts. The bert-faqclass project is built around two entity (domain) concepts that are at the root of the use case: the FAQs, because they are the elements that the AI has to learn, and the keywords, because they are context specific and, generally, unknown to the user.
The entity structures defined in the domain module are shown in Code snippet 1:
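As a rough illustration, here is a minimal sketch of what such entity structs might look like. Only the names Faq and Keyword come from the text; every field name below is a hypothetical choice made for the example.

```go
package main

import "fmt"

// Faq is a domain entity: one question the chatbot has to learn.
// All field names are hypothetical.
type Faq struct {
	ID       string
	Question string
	Answer   string
	Examples []string // alternative phrasings of the same question
}

// Keyword is a domain entity: a context-specific term and its explanation.
// All field names are hypothetical.
type Keyword struct {
	ID          string
	DisplayText string
	Description string
}

func main() {
	faq := Faq{ID: "faq-1", Question: "How do I reset my password?", Answer: "Use the reset link."}
	kw := Keyword{ID: "kw-1", DisplayText: "SSO", Description: "Single sign-on"}
	fmt.Printf("%s -> %s\n", faq.Question, faq.Answer)
	fmt.Printf("%s: %s\n", kw.DisplayText, kw.Description)
}
```

Note that the structs carry no database tags and no web-related details: anything of that kind would belong to an outer layer.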
In addition to the structs, there are operations that can be executed with the FAQs and the keywords, like adding or deleting a new FAQ. The interfaces of the operations are defined in the domain layer and will be implemented by the outer layers. The interfaces defined in the domain layer are shown in Code snippet 2:
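To make the idea concrete, here is a minimal sketch of a repository interface of this kind, together with a throwaway in-memory implementation. The method set (Store, Delete, FindAll) and the memoryFaqRepo type are assumptions made for the example; a KeywordsRepository would presumably follow the same shape.

```go
package main

import (
	"errors"
	"fmt"
)

// Faq stands in for the domain entity (fields are hypothetical).
type Faq struct {
	ID       string
	Question string
}

// FaqRepository lists the operations the domain expects from FAQ storage.
// It says nothing about databases; the method set is an assumption.
type FaqRepository interface {
	Store(faq Faq) error
	Delete(id string) error
	FindAll() ([]Faq, error)
}

// memoryFaqRepo is a hypothetical in-memory implementation, used here only
// to show that any type with these methods satisfies the interface.
type memoryFaqRepo struct{ faqs map[string]Faq }

func (r *memoryFaqRepo) Store(faq Faq) error {
	if faq.ID == "" {
		return errors.New("faq id is required")
	}
	r.faqs[faq.ID] = faq
	return nil
}

func (r *memoryFaqRepo) Delete(id string) error {
	if _, ok := r.faqs[id]; !ok {
		return errors.New("faq not found")
	}
	delete(r.faqs, id)
	return nil
}

func (r *memoryFaqRepo) FindAll() ([]Faq, error) {
	out := make([]Faq, 0, len(r.faqs))
	for _, f := range r.faqs {
		out = append(out, f)
	}
	return out, nil
}

func main() {
	var repo FaqRepository = &memoryFaqRepo{faqs: map[string]Faq{}}
	if err := repo.Store(Faq{ID: "faq-1", Question: "What is Cloud Run?"}); err != nil {
		fmt.Println("store failed:", err)
	}
	all, _ := repo.FindAll()
	fmt.Println(len(all))
}
```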
FaqRepository and KeywordsRepository are interfaces: they contain no reference to the technology or the algorithms of their implementations, as those belong to the interfaces layer that links the application to the database. This is how the dependency rule affects the software: an abstract interface does not refer to anything in the outer layers. Its implementation is defined by one of the outer layers, which in this case is the interfaces layer, and is injected into the code that uses it.
Taking inspiration from Manuel Kissingler’s blog post, let’s clarify the three “wheres” of each interface: where it is used, where it is declared and where it is implemented. The FaqRepository and KeywordsRepository interfaces are declared in the entities layer, used by the use cases layer and implemented in the interfaces layer.
The Use Cases
Use cases implement the application logic and must depend only on the functions, objects and structures of the use cases layer and of the domain layer. In addition, since we want to decouple each of the layers from the others, use cases modules redefine objects defined in inner modules to make them compliant with use cases needs. Code snippet 3 shows use cases structures:
Here the use cases Faq and Keyword structs are identical to those of the domain layer. One question naturally arises: why should we define another struct, identical to the one in the inner layer, which the dependency rule would allow us to import “legally”? Well, because we want decoupled software! Even though this example does not require a struct different from that of the domain layer, in general it’s good practice to decouple data. Our application could evolve and require new properties that serve the application logic but not the business logic shared by all the business applications: the domain layer is application independent, while use cases are application specific.
The use cases package also defines the Logger interface. Here we declare only the functions that we expect the logger to have; the implementation is the responsibility of one of the outer layers. This way the application is agnostic with respect to the technology, the drivers, the frameworks and the implementation of the logger. We could easily replace it by creating a new logger with these methods: no code is involved in the “switch” other than the logger itself.
Two other structs that point to the domain layer are also defined here: KnowledgeBaseInteractor and KeywordsInteractor. These structs wrap the logger (in this case, the same object for both) and the repository handlers, whose interfaces are defined in the domain layer and implemented in the interfaces layer.
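The relationship between the Logger interface and the interactor structs can be sketched as follows. This is only an illustration under assumptions: the Logger methods, the struct fields, the Count method and the two helper types (recordingLogger, staticRepo) are hypothetical and are not taken from the project.

```go
package main

import "fmt"

// Logger declares only what the use cases need from a logger.
// The method names are assumptions.
type Logger interface {
	Info(msg string)
	Error(msg string)
}

// KeywordsRepository stands in for the interface declared in the domain layer.
type KeywordsRepository interface {
	FindAll() ([]string, error)
}

// KeywordsInteractor wraps the repository and the logger; both are injected.
type KeywordsInteractor struct {
	Repository KeywordsRepository
	Logger     Logger
}

// Count is a hypothetical use case that reads through the injected parts.
func (i *KeywordsInteractor) Count() (int, error) {
	kws, err := i.Repository.FindAll()
	if err != nil {
		i.Logger.Error("cannot read keywords: " + err.Error())
		return 0, err
	}
	i.Logger.Info(fmt.Sprintf("found %d keywords", len(kws)))
	return len(kws), nil
}

// recordingLogger keeps log lines in memory. Replacing it with another
// Logger requires no change to KeywordsInteractor.
type recordingLogger struct{ lines []string }

func (l *recordingLogger) Info(msg string)  { l.lines = append(l.lines, "INFO "+msg) }
func (l *recordingLogger) Error(msg string) { l.lines = append(l.lines, "ERROR "+msg) }

// staticRepo is a fixed-content repository used only for the demo.
type staticRepo struct{ keywords []string }

func (r staticRepo) FindAll() ([]string, error) { return r.keywords, nil }

func main() {
	logger := &recordingLogger{}
	interactor := KeywordsInteractor{
		Repository: staticRepo{keywords: []string{"SSO", "VPN"}},
		Logger:     logger,
	}
	n, _ := interactor.Count()
	fmt.Println(n, logger.lines)
}
```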
Now let’s analyse how the use cases layer works with the FaqRepository, which is defined in the domain layer as an interface. The use case functions are shown in Code snippet 4:
As stated by Uncle Bob, the role of use cases is just to develop the application logic and to orchestrate data flow from interfaces to the domain, and vice-versa. In Code snippet 4 it can easily be noticed that the use cases layer just acts as a mediator between domain and interfaces layer: it moves things, orchestrates the logic and changes data formats.
The use cases layer also manages data flow for keywords. As an example, the use case function Keywords(), responsible for returning all the keywords, is shown in Code snippet 5:
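A minimal sketch of such a function, assuming hypothetical field names and a hypothetical Log method on the logger, could look like this. The only job of Keywords() is to ask the repository for the domain entities and convert them into the use cases layer’s own struct.

```go
package main

import (
	"errors"
	"fmt"
)

// domainKeyword stands in for the Keyword entity of the domain layer.
type domainKeyword struct {
	ID          string
	DisplayText string
}

// Keyword is the use cases layer's own copy of the entity.
type Keyword struct {
	ID          string
	DisplayText string
}

type KeywordsRepository interface {
	FindAll() ([]domainKeyword, error)
}

type Logger interface{ Log(msg string) }

type KeywordsInteractor struct {
	KeywordsRepository KeywordsRepository
	Logger             Logger
}

// Keywords returns all the keywords, converting each domain entity into
// the use cases representation.
func (i *KeywordsInteractor) Keywords() ([]Keyword, error) {
	stored, err := i.KeywordsRepository.FindAll()
	if err != nil {
		i.Logger.Log("keywords lookup failed: " + err.Error())
		return nil, err
	}
	out := make([]Keyword, 0, len(stored))
	for _, k := range stored {
		out = append(out, Keyword{ID: k.ID, DisplayText: k.DisplayText})
	}
	return out, nil
}

// errStorage, stubRepo and nopLogger are hypothetical helpers for the demo.
var errStorage = errors.New("storage unavailable")

type stubRepo struct {
	items []domainKeyword
	err   error
}

func (s stubRepo) FindAll() ([]domainKeyword, error) { return s.items, s.err }

type nopLogger struct{}

func (nopLogger) Log(string) {}

func main() {
	interactor := KeywordsInteractor{
		KeywordsRepository: stubRepo{items: []domainKeyword{{ID: "kw-1", DisplayText: "SSO"}}},
		Logger:             nopLogger{},
	}
	kws, err := interactor.Keywords()
	fmt.Println(kws, err)
}
```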
Interfaces are a set of adapters that convert data from the format used by the use cases and domain layers to the format required by some external agent, such as a database or the web. All the logic for managing the external agent’s data must be restricted to this layer: no code in the inner circles should know anything at all about the circles surrounding them. Indeed, if in the future we want to change the database from NoSQL to SQL, by respecting the dependency rule we will only have to update the interfaces and the drivers. The use cases and the domain code will need no updates, because they know nothing about the database and have no dependencies on the outer circles.
The interfaces for the logger are also defined in this layer: the implementation doesn’t belong here because the technology and the drivers belong to the infrastructure layer. The definition of the interfaces is shown in Code snippet 6.
The backend must be accessible from the web, so the interfaces layer also contains the code that manages the interactions with the web. The web service modules for the keywords and for the knowledge base are more or less the same: they orchestrate web requests, make use of the use cases APIs and define the answers to web requests. Code snippet 7 shows the types defined for the web service in the interfaces layer:
Here you can notice how injection is used to handle dependencies: KnowledgeBaseInteractor and KeywordsInteractor are interfaces, and the concrete interactors defined in the use cases layer are injected into the web service handler. One important benefit of applying the dependency rule is that, for example, KnowledgeBaseInteractor can easily be mocked in unit tests, which makes it possible to test the behaviour of the web service handler in isolation.
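The mocking benefit can be sketched with the standard net/http/httptest package. Everything below is illustrative: the ShowKeywords method, the JSON field names, the fakeInteractor type and the serve helper are assumptions, not the project’s actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Keyword is the data returned by the use cases (JSON names are hypothetical).
type Keyword struct {
	ID          string `json:"id"`
	DisplayText string `json:"displayText"`
}

// KeywordsInteractor is the contract the web service needs from the use cases.
type KeywordsInteractor interface {
	Keywords() ([]Keyword, error)
}

// WebserviceHandler receives its interactor by injection.
type WebserviceHandler struct {
	KeywordsInteractor KeywordsInteractor
}

// ShowKeywords is a hypothetical handler that returns all keywords as JSON.
func (h WebserviceHandler) ShowKeywords(w http.ResponseWriter, r *http.Request) {
	kws, err := h.KeywordsInteractor.Keywords()
	if err != nil {
		http.Error(w, "cannot load keywords", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(kws)
}

// fakeInteractor is a test double: no use case, repository or database involved.
type fakeInteractor struct {
	keywords []Keyword
	err      error
}

func (f fakeInteractor) Keywords() ([]Keyword, error) { return f.keywords, f.err }

// serve runs a handler against an in-memory request and records the response.
func serve(h http.HandlerFunc, method, target string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(method, target, nil))
	return rec
}

func main() {
	ok := WebserviceHandler{KeywordsInteractor: fakeInteractor{
		keywords: []Keyword{{ID: "kw-1", DisplayText: "SSO"}},
	}}
	rec := serve(ok.ShowKeywords, http.MethodGet, "/keywords")
	fmt.Println(rec.Code, rec.Body.String())

	failing := WebserviceHandler{KeywordsInteractor: fakeInteractor{err: errors.New("boom")}}
	fmt.Println(serve(failing.ShowKeywords, http.MethodGet, "/keywords").Code)
}
```

Because the handler depends only on the interface, the failing path can be exercised just by injecting a fake that returns an error.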
Code snippet 8 shows the web service functions to manage keywords APIs:
Each web service method implements some very basic logic: it logs the request, checks that the query parameters are in the expected format, sets the CORS header, calls the use case functions and finally builds the answer. Indeed, these methods don’t do much, and this is the point! Interfaces, if they are implemented as stated by Clean Architecture principles, just transport and translate data between layers, so that the inner layers never see the REST request itself.
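The five steps listed above can be sketched in a single handler. This is a hedged example: the DeleteKeyword method, the id query parameter, the wildcard CORS origin and the use of the standard log package are all assumptions made to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// KeywordsInteractor is the contract the handler needs from the use cases.
// The Delete method is a hypothetical example.
type KeywordsInteractor interface {
	Delete(id string) error
}

type WebserviceHandler struct {
	KeywordsInteractor KeywordsInteractor
}

func (h WebserviceHandler) DeleteKeyword(w http.ResponseWriter, r *http.Request) {
	// 1. Log the request.
	log.Printf("%s %s", r.Method, r.URL.Path)

	// 2. Check that the query parameters are in the expected format.
	id := r.URL.Query().Get("id")
	if id == "" {
		http.Error(w, "missing id parameter", http.StatusBadRequest)
		return
	}

	// 3. Set the CORS header (the wildcard origin is an assumption).
	w.Header().Set("Access-Control-Allow-Origin", "*")

	// 4. Call the use case; the inner layers never see the HTTP request.
	if err := h.KeywordsInteractor.Delete(id); err != nil {
		http.Error(w, "cannot delete keyword", http.StatusInternalServerError)
		return
	}

	// 5. Build the answer.
	w.WriteHeader(http.StatusNoContent)
}

// fakeInteractor records the ids it was asked to delete.
type fakeInteractor struct{ deleted []string }

func (f *fakeInteractor) Delete(id string) error {
	f.deleted = append(f.deleted, id)
	return nil
}

// serve runs a handler against an in-memory request and records the response.
func serve(h http.HandlerFunc, method, target string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(method, target, nil))
	return rec
}

func main() {
	h := WebserviceHandler{KeywordsInteractor: &fakeInteractor{}}
	fmt.Println(serve(h.DeleteKeyword, http.MethodDelete, "/keyword?id=kw-1").Code)
	fmt.Println(serve(h.DeleteKeyword, http.MethodDelete, "/keyword").Code)
}
```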
Now that we have a working business domain, the orchestration of the use cases and an accessible entry point for web requests, we only need to implement the algorithms that make the software capable of saving data somewhere, which, in this case, is a database. This is done by creating the concrete implementations of the abstract repository interfaces of our domain and use cases layers. The implementation of the repository is indeed a matter for the interfaces layer, because it mediates between the low-level point of view of the outer infrastructure layer and the high-level, business-oriented point of view of the inner layers.
Once again we need to make sure that we don’t violate the dependency rule: the implementation of the repositories will depend on the infrastructure software that connects our application to Google’s Firestore NoSQL database. It’s not that the repository software is unaware that it’s using a NoSQL database. Rather, it implements only the high-level logic, translates data between the infrastructure and use cases layers, and ignores all the low-level operations required to “speak” with the database, such as instantiating the client, handling timeouts and so on. Therefore, our repository will take care of all the high-level operations and will be kept in the dark about all those nasty infrastructural details. Once again, the repository code for the knowledge base and the keywords is more or less the same so, to avoid repeating boringly identical code, Code snippet 9 shows only the implementation of the knowledge base repositories.
Up to now we have defined only the implementation of the repositories, but we don’t yet have a handler at our disposal. No worries, Code snippet 10 comes to help: all the handlers required by the software are defined there.
Finally, we are ready to discuss the outermost layer, the infrastructure layer. As explained in the chapter “The technological stack and the design pattern”, this layer contains the low-level software of our application, which often depends on specific drivers and clients. Our application requires this layer for two reasons: the logger depends on an external package called logrus, and the database connection requires low-level logic.
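The division of labour described here can be sketched as follows. The DbHandler interface, its GetAll method, the collection name and the document keys are all hypothetical; the point is only that the repository translates raw documents into entities and never touches a database client directly.

```go
package main

import (
	"errors"
	"fmt"
)

// Faq stands in for the domain entity (fields are hypothetical).
type Faq struct {
	ID       string
	Question string
	Answer   string
}

// DbHandler is what the repository expects from the infrastructure layer.
// The method and its signature are assumptions.
type DbHandler interface {
	GetAll(collection string) ([]map[string]interface{}, error)
}

// DbFaqRepo implements the high-level repository logic on top of a handler.
type DbFaqRepo struct{ handler DbHandler }

func NewDbFaqRepo(h DbHandler) *DbFaqRepo { return &DbFaqRepo{handler: h} }

// FindAll translates raw documents into domain entities. Client creation,
// timeouts and retries are left entirely to the handler.
func (r *DbFaqRepo) FindAll() ([]Faq, error) {
	docs, err := r.handler.GetAll("faqs")
	if err != nil {
		return nil, err
	}
	faqs := make([]Faq, 0, len(docs))
	for _, d := range docs {
		id, ok := d["id"].(string)
		if !ok {
			return nil, errors.New("document without a string id")
		}
		question, _ := d["question"].(string)
		answer, _ := d["answer"].(string)
		faqs = append(faqs, Faq{ID: id, Question: question, Answer: answer})
	}
	return faqs, nil
}

// stubHandler returns canned documents, standing in for the real database.
type stubHandler struct{ docs []map[string]interface{} }

func (s stubHandler) GetAll(string) ([]map[string]interface{}, error) { return s.docs, nil }

func main() {
	repo := NewDbFaqRepo(stubHandler{docs: []map[string]interface{}{
		{"id": "faq-1", "question": "What is Cloud Run?", "answer": "A container platform."},
	}})
	faqs, err := repo.FindAll()
	fmt.Println(faqs, err)
}
```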
Let’s start with Code snippet 11, which implements the logger:
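To show the shape of such an implementation without pulling in an external dependency, the sketch below swaps logrus for the standard library log package. The Logger methods, the StdLogger type and the output format are assumptions; a logrus-based version would wrap a logrus logger in the same way.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
)

// Logger mirrors the interface declared by the inner layers.
// The method names are assumptions.
type Logger interface {
	Info(msg string)
	Error(msg string)
}

// StdLogger is an infrastructure-level implementation. It uses the standard
// library logger here; the application code would not notice a swap to logrus.
type StdLogger struct{ out *log.Logger }

func NewStdLogger(w io.Writer) *StdLogger {
	// No prefix and no flags, so each line contains only our own fields.
	return &StdLogger{out: log.New(w, "", 0)}
}

func (l *StdLogger) Info(msg string)  { l.out.Println("level=info msg=" + msg) }
func (l *StdLogger) Error(msg string) { l.out.Println("level=error msg=" + msg) }

// newBufferedLogger returns a logger that writes to memory, handy for tests.
func newBufferedLogger() (*StdLogger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return NewStdLogger(buf), buf
}

func main() {
	var logger Logger
	std, buf := newBufferedLogger()
	logger = std
	logger.Info("backend started")
	logger.Error("database unreachable")
	fmt.Print(buf.String())
}
```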
The low-level infrastructural code required to manage Firestore connections is shown in Code snippet 12. Here, there is no implementation of application or business logic: the data received and returned by this implementation is in a very basic format, a map[string]interface{}. This is because the responsibility for changing the data format belongs to the interfaces layer, one circle further in.
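The contract of such a handler can be illustrated with an in-memory stand-in. This is not Firestore code: the MemoryHandler type and its Set and Get methods are hypothetical, and a real handler would wrap the Firestore client behind similar methods. What matters is that documents travel as plain map[string]interface{} values.

```go
package main

import (
	"errors"
	"fmt"
)

// MemoryHandler is an in-memory stand-in for a database handler.
// Documents are kept as collection -> id -> raw document.
type MemoryHandler struct {
	collections map[string]map[string]map[string]interface{}
}

func NewMemoryHandler() *MemoryHandler {
	return &MemoryHandler{collections: map[string]map[string]map[string]interface{}{}}
}

// Set stores a raw document under collection/id, creating the collection
// on first use.
func (h *MemoryHandler) Set(collection, id string, doc map[string]interface{}) {
	if h.collections[collection] == nil {
		h.collections[collection] = map[string]map[string]interface{}{}
	}
	h.collections[collection][id] = doc
}

// Get returns the raw document; no conversion to entities happens here.
func (h *MemoryHandler) Get(collection, id string) (map[string]interface{}, error) {
	doc, ok := h.collections[collection][id]
	if !ok {
		return nil, errors.New("document not found")
	}
	return doc, nil
}

func main() {
	h := NewMemoryHandler()
	h.Set("keywords", "kw-1", map[string]interface{}{"displayText": "SSO"})
	doc, err := h.Get("keywords", "kw-1")
	fmt.Println(doc, err)
}
```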
Putting it all together
Now that we have all the pieces, it’s just a matter of putting things together. We have to deal with some construction work because of our use of the dependency rule. The db handlers must be injected into the repositories, the repositories into the interactors and, in turn, the interactors into the web service handler. Finally, the HTTP server must be started. Code snippet 13 shows these steps.
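The construction work can be sketched end to end in a few lines. All type names, the /keywords route and the in-memory handler are hypothetical stand-ins; the sketch only shows the order of injection, from the outermost layer to the web service.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Infrastructure: an in-memory stand-in for the database handler.
type memoryHandler struct{ docs []map[string]interface{} }

func (h memoryHandler) GetAll(string) ([]map[string]interface{}, error) { return h.docs, nil }

// Interfaces: the repository, built on top of a handler.
type dbHandler interface {
	GetAll(collection string) ([]map[string]interface{}, error)
}

type keywordsRepo struct{ handler dbHandler }

func (r keywordsRepo) FindAll() ([]string, error) {
	docs, err := r.handler.GetAll("keywords")
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(docs))
	for _, d := range docs {
		if text, ok := d["displayText"].(string); ok {
			out = append(out, text)
		}
	}
	return out, nil
}

// Use cases: the interactor, built on top of a repository.
type keywordsRepository interface{ FindAll() ([]string, error) }

type keywordsInteractor struct{ repo keywordsRepository }

func (i keywordsInteractor) Keywords() ([]string, error) { return i.repo.FindAll() }

// Interfaces: the web service, built on top of an interactor.
type keywordsProvider interface{ Keywords() ([]string, error) }

type webserviceHandler struct{ interactor keywordsProvider }

func (h webserviceHandler) showKeywords(w http.ResponseWriter, r *http.Request) {
	kws, err := h.interactor.Keywords()
	if err != nil {
		http.Error(w, "cannot load keywords", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(kws)
}

// newServer wires the layers: handler -> repository -> interactor -> web service.
func newServer() http.Handler {
	handler := memoryHandler{docs: []map[string]interface{}{
		{"displayText": "SSO"}, {"displayText": "VPN"},
	}}
	repo := keywordsRepo{handler: handler}
	interactor := keywordsInteractor{repo: repo}
	ws := webserviceHandler{interactor: interactor}

	mux := http.NewServeMux()
	mux.HandleFunc("/keywords", ws.showKeywords)
	return mux
}

// serve sends one in-memory request to the wired server.
func serve(h http.Handler, method, target string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec
}

func main() {
	// A real entry point would end with http.ListenAndServe instead.
	fmt.Print(serve(newServer(), http.MethodGet, "/keywords").Body.String())
}
```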
The backend is deployed using Google Cloud Run. In order to use this service, it is necessary to build a container and the technology used is Docker. The Dockerfile is shown in Code snippet 14:
The Docker image is built in two stages, a build stage and a production stage. Given the definition of the Dockerfile, Code snippet 15 is responsible for building the image and pushing it to Google Container Registry, a repository of container images associated with the project.
Once it is pushed to Google Container Registry we are ready to run the deployment with Google Cloud Run. The script is shown in Code snippet 16:
We have seen how to build a backend for a very simple application in Golang, where the code architecture is built according to Clean Architecture principles and follows the dependency rule. We have also examined how to build a container for the web application, how to push it to Google Container Registry, and how to deploy it with Google Cloud Run.
If you are interested in the description of the use case and the other articles in this series, click here.