Concepts

FeatureFrame (prev: Table)

ralf’s abstractions are centered around tables. A table is defined by a schema, a set of parent tables, and an operator that transforms the parent tables into the table’s values. Sources are a special type of table that needs neither parent tables nor an operator, because they represent raw data.
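To make the table model concrete, here is a minimal, self-contained sketch in plain Python. The `Table` class, the `source` helper, and the push-based `update` method are hypothetical illustrations of the schema / parent tables / operator relationship, not ralf’s actual API. In ralf, operators run as Ray actors rather than as in-process functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Table:
    """Hypothetical stand-in for a ralf table: a schema, parent tables, and an operator."""
    schema: Dict[str, type]
    parents: List["Table"] = field(default_factory=list)
    operator: Optional[Callable[[Row], Row]] = None
    rows: List[Row] = field(default_factory=list)
    children: List["Table"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Register this table with its parents so updates can flow downstream.
        for parent in self.parents:
            parent.children.append(self)

    def update(self, row: Row) -> None:
        # Store the new row, then push it through each child's operator.
        self.rows.append(row)
        for child in self.children:
            if child.operator is not None:
                child.update(child.operator(row))


def source(schema: Dict[str, type]) -> Table:
    # A source is simply a table with no parents and no operator.
    return Table(schema=schema)


clicks = source({"user": str, "count": int})
doubled = Table(
    schema={"user": str, "count": int},
    parents=[clicks],
    operator=lambda r: {"user": r["user"], "count": r["count"] * 2},
)
clicks.update({"user": "a", "count": 3})
print(doubled.rows)  # [{'user': 'a', 'count': 6}]
```

Updates here are pushed eagerly from parent to child; that is a simplification chosen for readability, not a claim about how ralf schedules updates.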

Source

ralf ingests data through sources, which can be either streaming sources (e.g. Kafka topics) or static sources (e.g. a DB table or a CSV file). Sources are a special type of table that has no parent tables, but they can otherwise be treated the same way as tables, for example by making them queryable or by syncing their values to an external connector.

KafkaSource

You can ingest streaming data sources using a KafkaSource:

table = ralf.from_kafka(topic="topic")

The table schema will be auto-generated from the Kafka records.

HTTPSource (todo: add listener?)

TODO

CSVSource

You can ingest static data sources using a CSVSource:

table = ralf.from_csv(filename="file.csv")

The table schema will be auto-generated from the CSV columns.
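The inference rules ralf uses are not described here, so the following is only a sketch of one plausible approach, built on Python’s standard `csv` module: column names come from the header row, and each column gets the narrowest of `int`, `float`, or `str` that all of its non-empty values parse as. The `load_csv` and `_column_type` helpers are hypothetical and are not part of ralf’s `from_csv`.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def _column_type(values: List[str]) -> type:
    # Pick the narrowest type that every non-empty value in the column parses as.
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)
        except ValueError:
            continue
        if values:
            return candidate
    return str


def load_csv(f: TextIO) -> Tuple[Dict[str, type], List[Dict[str, object]]]:
    """Hypothetical helper: infer a column -> type schema, then convert each row."""
    reader = csv.DictReader(f)
    raw_rows = list(reader)
    columns = reader.fieldnames or []
    schema = {c: _column_type([r[c] for r in raw_rows if r[c]]) for c in columns}
    # Empty (or missing) cells become None instead of failing the conversion.
    rows = [{c: (schema[c](r[c]) if r[c] else None) for c in columns} for r in raw_rows]
    return schema, rows


schema, rows = load_csv(io.StringIO("user,clicks,score\na,3,0.5\nb,7,1.25\n"))
print(schema)   # {'user': <class 'str'>, 'clicks': <class 'int'>, 'score': <class 'float'>}
print(rows[1])  # {'user': 'b', 'clicks': 7, 'score': 1.25}
```

Scanning every row before fixing a type avoids mis-typing a column whose first value happens to look like an integer.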

PostgresSource

TODO

Custom Sources

A custom source can be defined by subclassing the Source class:

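from typing import List

# Source, Schema, and Record are ralf classes; import them from your ralf installation.

class MyCustomSource(Source):

    def __init__(self, schema: Schema):
        super().__init__(schema)
        # TODO: initialization code

    def next(self) -> List[Record]:
        # TODO: return the next batch of new records
        raise NotImplementedError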

ralf will call the next() method to pull new data.
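The pull model can be illustrated with a self-contained sketch. `Record`, `Source`, `ListSource`, and `pull_all` below are hypothetical stand-ins, not ralf’s classes. The convention that an empty batch means "exhausted" is also an assumption made for this finite example; a real streaming source would more likely return an empty batch when nothing new has arrived and simply be polled again.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for ralf's Record: a key plus feature values."""
    key: str
    values: Dict[str, Any]


class Source:
    """Hypothetical stand-in for ralf's Source base class."""

    def __init__(self, schema: Dict[str, type]):
        self.schema = schema

    def next(self) -> List[Record]:
        raise NotImplementedError


class ListSource(Source):
    """Serves records from an in-memory list, batch_size records at a time."""

    def __init__(self, schema: Dict[str, type], records: List[Record], batch_size: int = 2):
        super().__init__(schema)
        self._records = records
        self._batch_size = batch_size
        self._offset = 0

    def next(self) -> List[Record]:
        # Return the next slice; an empty list signals that the list is exhausted.
        batch = self._records[self._offset : self._offset + self._batch_size]
        self._offset += len(batch)
        return batch


def pull_all(source: Source) -> List[List[Record]]:
    # Mimics a pull-based runtime: keep calling next() until a batch comes back empty.
    batches = []
    while batch := source.next():
        batches.append(batch)
    return batches


src = ListSource({"clicks": int}, [Record(str(i), {"clicks": i}) for i in range(5)])
print([len(b) for b in pull_all(src)])  # [2, 2, 1]
```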

Transform (prev: Operator)

Operators define how parent tables are transformed into new tables. Each operator is implemented as a Ray actor (TODO: link) and defines how its table’s values are updated when its parent tables are updated.

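# Operator, Schema, and Record are ralf classes; import them from your ralf installation.

class MyOperator(Operator):

    def __init__(self, schema: Schema):
        super().__init__(schema)
        # TODO: initialization code

    def on_record(self, record: Record) -> Record:
        # TODO: transformation code; return the transformed record
        return Record(...)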
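As a filled-in illustration of on_record, the sketch below derives a click-through-rate feature from per-key counts. `Record`, `Operator`, and `ClickThroughRate` are hypothetical stand-ins written so the example runs on its own: the operator here is a plain Python class rather than a Ray actor, and the field names `clicks`, `impressions`, and `ctr` are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Record:
    """Hypothetical stand-in for ralf's Record."""
    key: str
    values: Dict[str, Any]


class Operator:
    """Hypothetical stand-in for ralf's Operator base class (not a Ray actor here)."""

    def __init__(self, schema: Dict[str, type]):
        self.schema = schema

    def on_record(self, record: Record) -> Record:
        raise NotImplementedError


class ClickThroughRate(Operator):
    """Derives a ctr feature from per-key click and impression counts."""

    def __init__(self):
        super().__init__({"ctr": float})

    def on_record(self, record: Record) -> Record:
        clicks = record.values["clicks"]
        impressions = record.values["impressions"]
        # Guard against division by zero for keys with no impressions yet.
        ctr = clicks / impressions if impressions else 0.0
        return Record(record.key, {"ctr": ctr})


print(ClickThroughRate().on_record(Record("user1", {"clicks": 3, "impressions": 12})))
# Record(key='user1', values={'ctr': 0.25})
```

Keeping on_record a pure function of its input record makes the operator easy to test in isolation before it is deployed.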

ralf supports a number of built-in operators, such as:

  • .window(slide_size: int, window_size: int)

  • .groupby(key, value)
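The semantics of these built-ins are not spelled out above. The sketch below shows one plausible reading, written as standalone functions rather than ralf methods. It assumes that `window` is count-based (emit the latest `window_size` records each time `slide_size` new records arrive, once the window is full) and that `groupby(key, value)` names the columns to group by and to collect. Both interpretations are assumptions.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def window(stream: Iterable[T], slide_size: int, window_size: int) -> Iterator[List[T]]:
    """Count-based sliding window: every slide_size items, emit the latest window_size items."""
    buf: Deque[T] = deque(maxlen=window_size)
    since_emit = 0
    for item in stream:
        buf.append(item)
        since_emit += 1
        # Only emit full windows, and only once slide_size new items have arrived.
        if len(buf) == window_size and since_emit >= slide_size:
            yield list(buf)
            since_emit = 0


def groupby(records: Iterable[Dict[str, Any]], key: str, value: str) -> Dict[Any, List[Any]]:
    """Collect record[value] under record[key]."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return dict(groups)


print(list(window(range(1, 7), slide_size=2, window_size=3)))  # [[1, 2, 3], [3, 4, 5]]
rows = [{"user": "a", "clicks": 1}, {"user": "b", "clicks": 4}, {"user": "a", "clicks": 2}]
print(groupby(rows, key="user", value="clicks"))  # {'a': [1, 2], 'b': [4]}
```

A time-based window would replace the item counter with timestamps, but the structure (a bounded buffer plus an emit condition) stays the same.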

Connectors

Feature rows are stored in ralf’s internal tables, but they can also be synced to external stores through state connectors. You can then query those external stores directly instead of going through ralf’s client API.
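The following sketch illustrates the sync-then-query-directly idea using Python’s built-in `sqlite3` module. `SQLiteConnector` and its `sync` method are hypothetical and do not describe ralf’s connector interface. The connector keeps only the latest row per key (an upsert), so any SQL client can then read current feature values without going through ralf.

```python
import sqlite3
from typing import Any, Dict

_SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


class SQLiteConnector:
    """Hypothetical connector that mirrors the latest feature row per key into SQLite."""

    def __init__(self, conn: sqlite3.Connection, table: str, schema: Dict[str, type]):
        self.conn = conn
        self.table = table
        self.columns = list(schema)
        # Table and column names are interpolated, so they must come from trusted code.
        cols = ", ".join(f"{name} {_SQL_TYPES[t]}" for name, t in schema.items())
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (entity_key TEXT PRIMARY KEY, {cols})")

    def sync(self, key: str, row: Dict[str, Any]) -> None:
        # Upsert: a newer row for the same key replaces the older one.
        placeholders = ", ".join("?" for _ in range(len(self.columns) + 1))
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (entity_key, {', '.join(self.columns)}) "
            f"VALUES ({placeholders})",
            [key] + [row[c] for c in self.columns],
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
features = SQLiteConnector(conn, "user_features", {"clicks": int, "ctr": float})
features.sync("a", {"clicks": 3, "ctr": 0.25})
features.sync("a", {"clicks": 4, "ctr": 0.5})
# Query the external store directly with plain SQL.
print(conn.execute("SELECT clicks, ctr FROM user_features WHERE entity_key = ?", ("a",)).fetchone())
# (4, 0.5)
```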

Redis

TODO

SQLite

TODO

Custom Connectors

TODO