Pyramis Developer Documentation

Pyramis is designed keeping two goals in mind:

Abstractions must completely capture all aspects of multi-tier system specifications

The 3GPP specifications for 5G are used as a source to extract general-purpose networking constructs. These inform the choice of Pyramis syntax. The working assumption is that a Domain-Specific Language that can specify the procedures listed in the 3GPP specification can specify a wide range of multi-tier systems.

Pyramis workflow must be extensible and reusable for a wide variety of multi-tier systems

At present, we have demonstrated support for variation in multi-tier systems only at the level of differences in L-7 protocol. This is enabled by a general-purpose header-file parser that generates a set of base types for the system. The working assumption is that a given set of base types (along with encoders and decoders) completely specifies the L-7 protocol of any multi-tier system.

🛠️ From Specification to Executable NF

Pyramis keywords can represent key aspects of most multi-tier systems. However, to compile to a working implementation, certain constraints have to be imposed on the inputs and outputs.

⚒ Well-Defined L-7 protocol library: Pyramis supports multi-tier systems using the NGAP and HTTP L-7 protocols out of the box. However, custom application-layer protocols must meet certain requirements:
  • Valid messages for custom protocols must be implemented as complete C/C++ structs. These files may be stored in a utils directory in your root folder.
  • HTTP messages must represent and access their payload strings as attributes of nlohmann::json objects. We provide an HTTP library for this purpose.
  • All char arrays are interpreted as C++ std::vector<char>. Strings, if any, must be null-terminated.
  • The header-file library must be fully contained in a /utils directory.
⚒ Notion of procedure key: The NF must generate a unique procedure key for each instance of a supported procedure.
  • A procedure may be simple (login request-response) or complex (SMF session establishment).
  • Complexity arises due to the requirement of demultiplexing messages received at a single interface to the correct message handler.

The notion of key and its supporting fd_to_key_map and key_to_fd_map are implementation-specific constructs that enable this message demultiplexing.

  • The procedure key is used by the NF application to maintain a synchronous message-processing flow despite asynchronous message ingress at an NF.
  • Your UDF File must always contain a keygen function, defined via //@@keygen
⚒ Platform-file + Processing-file architecture: A platform file triggers kernel networking actions, and a processing file performs user-level message-processing actions.
  • In the current implementation, a C++ user-level processing file is generated from the Pyramis specification.
  • In the current implementation, a multithreaded, asynchronous epoll-based platform.cpp file is generated that declares an entry point into the user-level processing code.
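
The procedure-key demultiplexing described above can be illustrated with a minimal Python sketch. This is not the generated C++ code: the class `ProcedureTable`, its method names, the dict-shaped messages, and the `ue_id` field are all hypothetical, and the exact roles of `fd_to_key_map` and `key_to_fd_map` are an assumption (here, one map remembers which socket started a procedure and the other remembers which procedure a peer socket is serving).

```python
from typing import Callable, Dict, Tuple


class ProcedureTable:
    """Maps sockets to procedure instances and back (assumed semantics)."""

    def __init__(self, keygen: Callable[[dict], str]) -> None:
        self.keygen = keygen
        self.key_to_fd_map: Dict[str, int] = {}  # key -> fd that started the procedure
        self.fd_to_key_map: Dict[int, str] = {}  # peer fd -> key it is serving

    def begin(self, origin_fd: int, message: dict) -> str:
        """Generate a key for a new procedure instance and remember its origin."""
        key = self.keygen(message)
        self.key_to_fd_map[key] = origin_fd
        return key

    def expect_reply(self, key: str, peer_fd: int) -> None:
        """Record that the next message on peer_fd belongs to this procedure."""
        self.fd_to_key_map[peer_fd] = key

    def resolve(self, peer_fd: int) -> Tuple[str, int]:
        """Demultiplex a reply: return its key and the fd that is waiting for it."""
        key = self.fd_to_key_map.pop(peer_fd)
        return key, self.key_to_fd_map[key]


def keygen(message: dict) -> str:
    # Hypothetical rule: one procedure instance per UE identifier.
    return f"ue-{message['ue_id']}"
```

With two procedures in flight, replies can arrive in any order and still be routed back to the socket that started each procedure, which is the synchronous-flow property the key is meant to provide.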

🛠️ NF architecture under Pyramis

I describe the NF architecture using the 3GPP 5G AMF as an example node, without loss of generality.

On successful translation of a Pyramis node specification, two key files are generated: AMF_linking.cpp and AMF_platform.cpp. These two files implement the processing-platform split.

⚒ Design

AMF_platform.cpp performs core networking functions to implement an NF that can act in a multithreaded and asynchronous manner, as both a Server and a Client.

In this multithreaded view, on initialisation, multiple nfvInstance threads monitor their local epoll file descriptor, whose watch list contains a single listen socket bound to a globally known port. Each NF instance thread runs its own epoll_wait() loop. When epoll_wait() detects an event at the shared listen socket, multiple threads may be woken up and there is a race to accept() the incoming connection. On accept() by a single thread, the newly created data socket is added to a thread-local map called the active_socket_map.

Another key criterion is supporting systems that implement multi-node or chained procedures. Such procedures require imposing a sequential order on asynchronous message receipts and sends. In systems with short connections, it becomes necessary to record active sockets and sockets that need to be closed.

⚒ Implementation

To achieve these goals, the platform file maintains the thread-local active_socket_map of custom Socket structs. A Socket contains attributes that describe the socket such as its file descriptor, port number, socket type, peer IP address, and whether the connection is short or long. Furthermore, each NF instance thread has a single epoll file descriptor that detects events at active sockets. On detection of an event at any socket, a callback is triggered based on the type of Socket that encountered the event.

For example, on event detection at a data Socket, the platform file passes a buffer representing the event read at the kernel socket to the processing file via the callbacks defined in the platform file for decoding, IE interpretation, UE context generation, request/response message generation, and finally triggering a send_data() to a peer NF, in whatever manner was described by the Pyramis specification.
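
As a rough illustration of this event loop, here is a single-threaded Python sketch. It swaps in Python's `selectors` module (which uses epoll on Linux) for the raw epoll calls of the generated platform.cpp, and the names `NfInstance`, `poll_once`, `kind`, `short_lived` and `on_data` are assumptions made for the example rather than names taken from the Pyramis sources.

```python
import selectors
import socket
from dataclasses import dataclass
from typing import Callable, Dict

Callback = Callable[[bytes], bytes]


@dataclass
class Socket:
    conn: socket.socket
    fd: int
    port: int
    kind: str          # "listen" or "data"
    peer_ip: str
    short_lived: bool  # close the connection after one reply?
    on_data: Callback  # callback that lives in the processing file


class NfInstance:
    """One NF instance thread: a selector plus its thread-local socket map."""

    def __init__(self, listen_sock: socket.socket, on_data: Callback,
                 short_lived: bool = True) -> None:
        self.selector = selectors.DefaultSelector()
        self.active_socket_map: Dict[int, Socket] = {}
        self.short_lived = short_lived
        listen_sock.setblocking(False)
        self._track(listen_sock, "listen", "", on_data)

    def _track(self, conn: socket.socket, kind: str, peer_ip: str,
               on_data: Callback) -> None:
        entry = Socket(conn, conn.fileno(), conn.getsockname()[1], kind,
                       peer_ip, self.short_lived, on_data)
        self.active_socket_map[entry.fd] = entry
        self.selector.register(conn, selectors.EVENT_READ)

    def _untrack(self, entry: Socket) -> None:
        self.selector.unregister(entry.conn)
        del self.active_socket_map[entry.fd]
        entry.conn.close()

    def poll_once(self, timeout: float = 1.0) -> None:
        for key, _events in self.selector.select(timeout):
            entry = self.active_socket_map[key.fd]
            if entry.kind == "listen":
                try:
                    conn, (peer_ip, _peer_port) = entry.conn.accept()
                except BlockingIOError:
                    continue  # another thread won the accept() race
                # New data sockets inherit the callback of the listen socket.
                self._track(conn, "data", peer_ip, entry.on_data)
            else:
                data = entry.conn.recv(4096)
                if data:
                    entry.conn.sendall(entry.on_data(data))
                if not data or entry.short_lived:
                    self._untrack(entry)
```

In the multithreaded design, each thread would own one such instance and call `poll_once()` in a loop; the `BlockingIOError` branch is where a thread that lost the accept() race simply moves on.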

A note on the platform file callbacks

On server initialisation, callbacks that are specified in the interface file are registered with the sockets bound to the globally known port associated with that interface. During the running of the server, callback functions bound to the initial port are registered with newly created sockets as well. These callback functions are specified as EVENTs in the Pyramis specification and translated to C++ by the compiler.

Therefore, in the two-file NF architecture, the callbacks are defined in the processing file, but are triggered by the platform file only on receipt of incoming message data.

🛠️ Pyramis-to-C++ Compiler

The Pyramis grammar is functionally a subset of the Python grammar. This allows a major convenience during compilation to C++: the compiler does not require a custom lexer or parser. Instead, after pre-processing a Pyramis file to generate an equivalent Python file, we can generate an AST intermediate representation using Python’s ast.parse().

Once the AST is created, the compiler recursively visits each node to parse identifier information, create and delete scopes, update symbol tables, and eventually generate an intermediate representation suitable for conversion to C++ code.
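
The pre-process-then-parse idea can be sketched in a few lines of Python. The surface syntax in `HYPOTHETICAL_SPEC`, the argument shapes of `DECODE` and `ENCODE`, and the single rewrite rule in `preprocess` are invented for illustration and are not the actual Pyramis syntax or pre-processor; only `ast.parse()` is taken from the text above.

```python
import ast
import re

# Invented surface syntax: an EVENT header followed by keyword actions.
HYPOTHETICAL_SPEC = """\
EVENT handle_request(message):
    DECODE(message, request)
    ENCODE(request, reply)
"""


def preprocess(spec: str) -> str:
    """Rewrite the assumed EVENT header into a Python def so ast.parse accepts it."""
    return re.sub(r"^EVENT\s+", "def ", spec, flags=re.MULTILINE)


tree = ast.parse(preprocess(HYPOTHETICAL_SPEC))
event = tree.body[0]  # an ast.FunctionDef node
# Each action is an expression statement wrapping a call to a keyword name.
actions = [stmt.value.func.id for stmt in event.body]
```

Because the pre-processed text is valid Python, the standard library does all the lexing and parsing, and the compiler only has to walk the resulting tree.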

⚒ Pyramis Compiler Driver

The compiler driver orchestrates the entire compilation process, right from parsing command-line options to generating C++ code. Its major functions are listed below.

Initialisation: __init__.py
---------------
1. Parse command-line, set global compiler configurations.
2. Parse C++ protocol headers, UDF File and Interface File.
3. Pre-process Pyramis Specification to Python.
4. Create AST, begin AST walk.

AST Walk: graph.py
--------
Recursively visit each node
1. Maintain scopes and update symbol tables
2. Infer and assign types to identifier.
3. Incrementally generate an IR of parsed Pyramis EVENTs, python.Actions and python.Maps.
4. Report semantic errors

Code Generation: python.py
---------------
Generate C++ files from IR
1. Remove redundant Map accesses.
2. Generate timer_expiry_context_t
3. Emit translated EVENT definitions to processing file i.e. linking.cpp.
4. Emit Map definitions to contexts.h.
5. Emit event declarations to linking.h
6. Emit networking code to platform.cpp and platform.h and generate Makefile.

⚒ Types in Pyramis

Any reasonable networked system implementation defines and depends on its L-7, i.e. application-layer, protocol. For example, web communications occur over the HTTP L-7 protocol, and certain 3GPP NF-NF communications are required to use either the NGAP or PFCP protocols.

At its core, an L-7 application protocol is specified by its state machine, message types, and encoder-decoder pairs, all defined and distributed via protocol libraries. C++ protocol libraries provide C++ structs and classes in header files to define message types, and define encoders and decoders for each valid message type in the protocol. The state machine of a protocol is maintained by the application itself, and is a function of the underlying protocol library.

Likewise, an NF implementation and its Pyramis specification must necessarily depend on external protocol libraries. Pyramis enables specification of these type constraints via the CREATE_MESSAGE, ENCODE, DECODE and UDF keywords.

From the above discussion, it is clear that a valid Pyramis specification of each node must be associated with a set of base types that arise from its assumed L-7 protocol.

⚒ The python.Type API

The Pyramis compiler must work extensively with message-types defined in the protocol library of the NF. Therefore, it implements a recursive python.Type data structure with an associated internal API to simplify certain operations. python.Type is designed to completely capture the recursive nature of nested struct definitions.

// class python.Type represents a recursive C++ struct.
class Type {
    public:
        ident,       // top-level name of the type
        thing,       // array or simple type?
        indirection, // count of pointer indirections
        subs         // map of attributes of this type to their python.Types

        // Defines rules for equivalent types and returns true
        // if two equivalent types are compared.
        equals()

        // If a sub-attribute has a type whose `thing` matches the requested
        // thing, return the list of attributes encountered on the path
        // to that sub-attribute.
        //
        // This is useful if we want to confirm a path to a nested array.
        path_to()

        // If a type contains attr, return
        // its type.
        get_typeof()

    private:
        // Returns True if a given nested ASN type
        // has a particular string as an attribute,
        // at any nesting level, else False.
        _contains()
}
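
A minimal Python sketch of such a recursive type is given below. The attribute names follow the outline above, but the method signatures and semantics (structural equality for `equals`, depth-first search for `path_to` and `get_typeof`) are assumptions, and the struct names `Message_t` and `Item_t` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Type:
    ident: str                 # top-level name of the type
    thing: str = "simple"      # "simple" or "array"
    indirection: int = 0       # count of pointer indirections
    subs: Dict[str, "Type"] = field(default_factory=dict)

    def equals(self, other: "Type") -> bool:
        # Assumed rule: two types are equivalent if they match structurally.
        return (self.ident == other.ident
                and self.thing == other.thing
                and self.indirection == other.indirection
                and self.subs.keys() == other.subs.keys()
                and all(sub.equals(other.subs[name])
                        for name, sub in self.subs.items()))

    def path_to(self, thing: str) -> Optional[List[str]]:
        # Depth-first search for the first sub-attribute with a matching `thing`.
        for name, sub in self.subs.items():
            if sub.thing == thing:
                return [name]
            tail = sub.path_to(thing)
            if tail is not None:
                return [name] + tail
        return None

    def get_typeof(self, attr: str) -> Optional["Type"]:
        # Return the type of `attr` at any nesting level, or None if absent.
        for name, sub in self.subs.items():
            if name == attr:
                return sub
            found = sub.get_typeof(attr)
            if found is not None:
                return found
        return None

    def _contains(self, attr: str) -> bool:
        return self.get_typeof(attr) is not None


# Hypothetical nested message type with an array two levels down.
item = Type("Item_t", subs={"id": Type("long"),
                            "value": Type("uint8_t", thing="array")})
message = Type("Message_t", subs={"procedureCode": Type("long"), "item": item})
```

For this example, `message.path_to("array")` walks through `item` to reach `value`, which is the kind of nested-array check the outline mentions.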

Note on Creating python.Types

Recall that python.Types are built to represent recursive C++ message-type structs, defined in the protocol header files. To give the compiler access to these structs, they are parsed to dicts during compiler initialisation via a custom C++ header-file parser in pyramis/pyramis/utils.py.

The C++ header-file parser performs the crucial function of creating a set of base types for the NF being implemented. For every header file in the protocol header library, the parser isolates struct definitions, serializes them into a .json file, and finally deserializes the .json file to a nested dictionary. In essence, the C++ header-file parser takes a set of header files and extracts each struct/union/enum definition encountered in the system.

The work of resolving inter-file struct dependencies, i.e. nested struct definitions, takes place on demand via the CREATE_MESSAGE keyword during the AST Walk. This step uses the parsed structs to generate the appropriate recursive python.Type and assigns it to the specified identifier.
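
On-demand resolution of nested struct definitions might look like the following Python sketch. The flat dictionary shape in `PARSED_STRUCTS` (struct name mapped to attribute name mapped to type name) is an assumption about what the header parser produces, and the struct names and the `resolve` helper are hypothetical.

```python
from typing import Dict, Tuple, Union

Resolved = Union[str, Dict[str, "Resolved"]]

# Assumed parser output: each struct is stored flat, referring to others by name.
PARSED_STRUCTS: Dict[str, Dict[str, str]] = {
    "Message_t": {"procedureCode": "long", "item": "Item_t"},
    "Item_t": {"id": "long", "value": "Payload_t"},
    "Payload_t": {"size": "int"},
}


def resolve(type_name: str, structs: Dict[str, Dict[str, str]],
            _seen: Tuple[str, ...] = ()) -> Resolved:
    """Expand a struct name into a nested dict, following references on demand."""
    if type_name not in structs:
        return type_name  # a primitive type: nothing more to expand
    if type_name in _seen:
        raise ValueError(f"recursive struct definition: {type_name}")
    return {attr: resolve(sub, structs, _seen + (type_name,))
            for attr, sub in structs[type_name].items()}
```

Only the structs reachable from the requested message type are expanded, so unrelated definitions in the header library cost nothing until a specification actually uses them.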

⚒ Type Assignment and Inference

In the Pyramis Compiler, identifiers are represented as python.Variable objects. Depending on the progress of the AST Walk, an identifier may be typed or untyped. An identifier is considered typed if its python.Variable has been assigned a concrete python.Type.

CREATE_MESSAGE, ENCODE, DECODE and UDF are the only Pyramis keywords that are allowed to directly assign concrete types to an identifier. All other actions must obtain their types indirectly by an inference procedure. Often the compiler encounters only typed identifiers at an action, implying that a concrete type was assigned at some point before the current action. On several occasions, however, an identifier is assigned a concrete type only after its first use.

As a uniform solution to this problem, the Pyramis compiler creates and maintains a hierarchy of Scopes.

Note on Pyramis IR and ModuleVisitor

The AST Walk is implemented by a custom ModuleVisitor subclass of the ast.NodeVisitor class. The ModuleVisitor performs a depth-first traversal of the ast.Nodes in the AST of the pre-processed Pyramis specification, dynamically dispatching the handler function linked to each ast.Node type. Each handler function performs core functions related to IR creation and type inference.
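
The dispatch mechanism comes from Python's `ast.NodeVisitor`, which calls `visit_<NodeType>` for each node it reaches. The sketch below is a deliberately reduced stand-in for the real ModuleVisitor: the class name `SketchVisitor`, the `events` dictionary, and the pre-processed source text are assumptions, and it records only action names rather than building the full IR.

```python
import ast
from typing import Dict, List, Optional

# Assumed pre-processed (Python-form) specification.
SOURCE = """
def on_request(message):
    DECODE(message, request)
    if request:
        CALL(on_reply, request)
"""


class SketchVisitor(ast.NodeVisitor):
    """Collects, per event, the keyword actions found in a depth-first walk."""

    def __init__(self) -> None:
        self.events: Dict[str, List[str]] = {}
        self._current: Optional[List[str]] = None

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # An EVENT definition opens a new list of actions.
        self._current = self.events.setdefault(node.name, [])
        self.generic_visit(node)  # descend into the event body
        self._current = None

    def visit_Call(self, node: ast.Call) -> None:
        # A keyword action appears as a call to a bare name.
        if self._current is not None and isinstance(node.func, ast.Name):
            self._current.append(node.func.id)
        self.generic_visit(node)


visitor = SketchVisitor()
visitor.visit(ast.parse(SOURCE))
```

Node types without a dedicated handler, such as the `if` statement here, fall through to `generic_visit`, so nested actions are still reached.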

The Pyramis IR is designed to enable easy generation of C++ code from the allowed Pyramis keywords. With this goal in mind, the fundamental constructs of a Pyramis specification are defined as python.Event for EVENTs, python.Actions for Pyramis keywords, and python.Maps for map accesses. Each of these constructs also depends on its own variables being typed, hence the IR defines python.Variable and a recursive python.Type.

With this in mind, a Pyramis processing file can be parsed into a series of python.Events, each containing a series of python.Actions. Both of these contain sets of python.Variables representing the arguments passed to the keyword actions. The primary objective of the AST walk is to generate this complete Pyramis IR.

A complete Pyramis IR is one in which every variable is typed. To achieve this target, more constructs are required such as Scopes and a mechanism for type inference. Once the generated IR is validated, it is used to directly emit C++ code based on certain code-generation rules.

📋 Scopes and Type Inference

Pyramis scopes are of three kinds: MODULE, EVENT and BLOCK, corresponding to module-level, EVENT-level and IF/LOOP-level. The ModuleVisitor drives the creation of new scopes and the addition of new python.Variables to the corresponding symbol tables.

In a simplistic interpreter design for a purely statically typed language, a temporary stack of scopes starting at every EVENT would be sufficient to assign types to identifiers, as each identifier would have to be declared before use. For example, in C++ projects, calling a function without first declaring its typed signature is disallowed and leads to a compilation error. Pyramis EVENTs, on the other hand, are not given concrete types in the specification, so a CALL to an EVENT (either from the same EVENT or another one) would similarly fail unless a typed signature had been generated beforehand. Declaration before use cannot be required in Pyramis, as assigning explicit types defeats the purpose of a simple DSL syntax.

Since EVENT definitions and CALLs are coupled together, a mechanism is required that allows the appropriate variables and their types to be shared across EVENTs. The mechanism used by Pyramis is to maintain a persistent parent-pointer tree of scopes. In this setup, we develop a mechanism for inferring types for identifiers irrespective of the order in which they are assigned concrete types:

// The ModuleVisitor can store references to newly created (untyped) EVENT
// variables in its own local scope, and store the python.Event
// in a global collection of events with references to its python.Variables.
//
// Similarly, each python.Action, i.e. CALL, is stored in a global collection of
// calls, with references to its own python.Variables.
//
// See source graph.py for full details.
When a CALL is encountered:
    if the event was defined previously:
        // type inference across events
        its variables would be referenced by an old scope
        and by the old python.Event stored in the global
        events map.
        if the variable is typed:
            copy the reference to the python.Type to the corresponding
            variable of the CALL that is being processed.
            ...etc
        ...etc
    ...etc

When an EVENT is encountered:
    if the event was CALLed earlier:
        assign CALL variable types to the EVENT.
        if the event was typed earlier:
            // we have succeeded
        if not:
            // untyped variable will be added to scope to be
            // resolved later.
            ...etc
        ...etc
    ...etc
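
The outline above can be approximated by the following Python sketch of a parent-pointer scope tree with order-independent type propagation. It is a simplification under stated assumptions: types are plain strings rather than python.Types, and the names `Scope.declare`, `Scope.lookup`, `EventTypes` and `propagate` are hypothetical and do not come from graph.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Variable:
    name: str
    type: Optional[str] = None  # None means "not yet typed"


@dataclass
class Scope:
    kind: str  # "MODULE", "EVENT" or "BLOCK"
    parent: Optional["Scope"] = None
    symbols: Dict[str, Variable] = field(default_factory=dict)

    def declare(self, name: str) -> Variable:
        return self.symbols.setdefault(name, Variable(name))

    def lookup(self, name: str) -> Optional[Variable]:
        # Follow parent pointers from the innermost scope outwards.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None


class EventTypes:
    """Global collections of EVENT parameters and CALL arguments."""

    def __init__(self) -> None:
        self.events: Dict[str, List[Variable]] = {}
        self.calls: Dict[str, List[List[Variable]]] = {}

    def on_event(self, name: str, params: List[Variable]) -> None:
        self.events[name] = params
        self.propagate()

    def on_call(self, name: str, args: List[Variable]) -> None:
        self.calls.setdefault(name, []).append(args)
        self.propagate()

    def propagate(self) -> None:
        # Copy known types between matching parameters and arguments
        # until nothing changes, whichever side was typed first.
        changed = True
        while changed:
            changed = False
            for name, params in self.events.items():
                for args in self.calls.get(name, []):
                    for param, arg in zip(params, args):
                        if param.type and arg.type and param.type != arg.type:
                            raise TypeError(f"type conflict in {name}: {param.name}")
                        known = param.type or arg.type
                        if known and (param.type is None or arg.type is None):
                            param.type = arg.type = known
                            changed = True
```

Because the scopes and the global collections persist for the whole walk, a variable that is still untyped when its CALL or EVENT is first seen can be resolved later by re-running `propagate()` once a concrete type arrives.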