Fetchers
Overview
A fetch request can be triggered manually (via a console script), on a schedule (for example, a cron job), or semi-automatically when a new entity is created for a category via the frontend.
Fetch phase
Data is retrieved from one or more sources and stored in a temporary location.
A fetcher can retrieve a whole database dump, batches of data, or specific entities.
While every source will likely require a different fetcher, several sources are likely to share the same kind of interface. For example, many could provide data as JSON, CSV, or SQL dumps. The parsers for these formats will be generic so they can be reused across fetchers.
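The idea of generic parsers shared across fetchers can be sketched as follows. This is a minimal illustration, not the project's implementation: the names `parse_json`, `parse_csv`, `PARSERS`, and `fetch_records` are hypothetical, and only JSON and CSV are covered.

```python
import csv
import io
import json


def parse_json(text):
    """Parse a JSON document into a list of records (dicts)."""
    data = json.loads(text)
    # Normalise a single top-level object into a one-element list.
    return data if isinstance(data, list) else [data]


def parse_csv(text):
    """Parse CSV text with a header row into a list of records (dicts)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical registry: format name -> generic parser.
PARSERS = {"json": parse_json, "csv": parse_csv}


def fetch_records(raw_text, fmt):
    """A source-specific fetcher obtains raw_text; the parsing is shared."""
    return PARSERS[fmt](raw_text)
```

Under these assumptions, a new fetcher only has to retrieve the raw text and name its format; adding support for another format means registering one more parser.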
The retrieved data is then transformed into a temporary format that the other steps can use directly. This temporary format will most likely be a serialized Lua object or, eventually, a REM structure.
Each fetcher uses one or more data transformation templates to provide just the information requested. This is very useful as a first filtering step, because the retrieved data often contains unnecessary information. The templates also translate property and column names, convert data types, and so on.
No semantic processing or semantic filtering is performed during this step.
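A data transformation template of the kind described above could look like the following sketch. The template layout (source field mapped to a target name and a converter) and the names `TEMPLATE` and `apply_template` are assumptions made for illustration; the actual template format is not specified here.

```python
# Hypothetical template: source field -> (target field, type converter).
TEMPLATE = {
    "Name": ("name", str.strip),
    "Born": ("birth_year", int),
}


def apply_template(record, template):
    """Keep only the templated fields, renaming and converting each value."""
    out = {}
    for source_key, (target_key, convert) in template.items():
        if source_key in record:
            out[target_key] = convert(record[source_key])
    return out
```

Fields that the template does not mention are dropped, which is the purely structural filtering this phase performs; nothing in the sketch inspects the meaning of the values.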
Processing phase
Data processing templates are used to process information whose structure is already known. For example, many sources already provide data as properties or tags, which can be converted directly to hypertags.
The rest of the content is assumed to be unstructured and needs to be processed further.
Relevant keywords are first found using the most suitable algorithm. The immediate context (adjectives and objects) in the same sentence is then associated with each keyword.
If a keyword is found in the data processing templates, it is treated as a tag and its immediate context is used as the tag's content.
Any keyword that is not found in the templates is inserted into a special tag group to denote tentative parsing and data uncertainty.
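The split between recognised tags and tentative keywords can be sketched as below. Keyword extraction itself is left out, so the function takes already-extracted (keyword, context) pairs as input. The names `classify_keywords` and `TENTATIVE_GROUP`, and the dictionary layout of the result, are hypothetical.

```python
TENTATIVE_GROUP = "tentative"  # hypothetical name for the special tag group


def classify_keywords(keyword_contexts, known_tags):
    """Split (keyword, context) pairs into known tags and tentative ones.

    known_tags stands in for the keywords listed in the data processing
    templates; anything else lands in the tentative group.
    """
    tags = {}
    tentative = {}
    for keyword, context in keyword_contexts:
        target = tags if keyword in known_tags else tentative
        target.setdefault(keyword, []).append(context)
    return {"tags": tags, TENTATIVE_GROUP: tentative}
```

Keeping unrecognised keywords in their own group, rather than discarding them, preserves the uncertain data for later review.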
Insertion phase
The processed content is inserted into the corresponding entity or entities.
Each target entity is created if it does not already exist, and the new content is inserted or updated as necessary.
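The create-if-missing, then insert-or-update behaviour is a standard upsert. A minimal sketch, assuming an in-memory dictionary as the entity store (the real storage layer is not described here, and `upsert_entity` is a hypothetical name):

```python
def upsert_entity(store, entity_id, content):
    """Create the entity if missing, then insert or update its content.

    Returns True if the entity was newly created, False otherwise.
    """
    created = entity_id not in store
    entity = store.setdefault(entity_id, {})
    # Existing keys are overwritten (update); new keys are added (insert).
    entity.update(content)
    return created
```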
Content which involves other entities and/or categories is then propagated.
Lists of specific tags or keywords could potentially be used to quickly find the categories that need to be involved in the insertion of the new content.
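One possible shape for such lists is an inverted index from tag to categories. The sketch below assumes each category declares the tags it cares about; `build_tag_index` and `categories_for` are hypothetical names, and the category and tag values in the usage lines are invented for illustration.

```python
from collections import defaultdict


def build_tag_index(category_tags):
    """Invert {category: [tags]} into {tag: {categories}}."""
    index = defaultdict(set)
    for category, tags in category_tags.items():
        for tag in tags:
            index[tag].add(category)
    return index


def categories_for(content_tags, index):
    """Return every category that lists at least one of the content's tags."""
    found = set()
    for tag in content_tags:
        found |= index.get(tag, set())
    return found


index = build_tag_index({"films": ["director", "year"], "books": ["author", "year"]})
affected = categories_for(["year"], index)  # both categories list "year"
```

With an index like this, finding the affected categories costs one lookup per tag instead of a scan over every category.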