Distributed Queue – DQ provides durable sequenced delivery of millions of records per second to distributed servers in near real-time

DQ is an essential component in large scale web or eCommerce sites that need high scalability,  ultra high availability and super fresh data.

DQ is available for use by corporate clients who purchase consulting services to adapt it for use in their environment.  Call phone-number-for-bayes-analytic to discuss using DQ in your company.    DQ is  the fastest, safe, low cost and low maintenance way to incrementally move data between transactional master databases and  Analysis, metadata, product information and search servers where you need unlimited horizontal scalability.

We are willing to help you retrofit your existing architecture to use DQ and solve your scalability and availability problems.  Contact phone-number-for-bayes-analytic for more information.  

DQ also supports large scale reliable data delivery across flaky WAN environments.    In a LAN environment DQ delivers over 150K record inserts per second and sustained read loads over 1 million records per second from moderate hardware.    The DQ interface is standard HTTP and it supports dynamic creation of new queues and unlimited record types.

DQ Introduction

You can think of DQ as a cross between Active MQ and a distributed RSS server but DQ is designed specifically for durable in order message delivery of millions of records per minute to hundreds of distributed servers where we want each of the remote services to consume the entire queue in near real-time.

DQ is specifically designed to support large scale replication across flaky WAN environments.  Messages are never targeted to a specific client and new client services can be added without changing the server configuration.

Initial tests have demonstrated that DQ delivers over 200K record inserts per second and sustained read loads over 1 million records per second from moderate hardware.  These tests were through the HTTP interface.   DQ reached this performance on a medium speed laptop and would do twice as well decent server hardware.

DQ Features

  • Fast Incremental delivery of data to distributed servers.
  • Guaranteed sequenced delivery.
  • Fast durable queuing,
  • Ability to propagate data between reference master servers and distributed service nodes at ultra-high speed
  • Uses standard HTTP protocol.
  • DQ provides guaranteed message ordering
  • Support for Many readers
  • Replay from any point of time.
  • Very small footprint,
  • Built in distributed replication to support nearly infinite read scalability
  • Ad hoc creation of new queues
  • Dynamic support for new record types.
  • DQ is designed for record sizes between 80 bytes and 200K could handle larger records including binary.
  • Ability to add distributed service consumer nodes without changing DQ server configuration.
  • Reliable large scale replication across flaky WAN environments.

Transparent scalability to billions of reads per second

By using multiple layers of DQ servers via the standard DQ replicator it is inexpensive to scale the DQ for data delivery to support billions of records per second but that would also require careful network configuration.

Scaling for higher write loads is a little more challenging but the easiest way is to run multiple DQ servers which are dedicated to supporting different queues.

 Underlying file basics

DQ manages its own storage using a proprietary file protocol which easily handles terabytes of data spanning multiple file systems.   A feature of the underlying storage system allows the entire server to run in less than 50 Meg when fully loaded with live readers.

The underlying storage files are designed to facilitate low cost incremental backup using rsync or similar backup systems but most of the time backup is via DQ replicators.

 DQ user in a ML (Machine Learning system).

I use DQ to deliver bar data between edge servers which obtain trade data to Machine learning (ML) analysis nodes.  Some of these nodes consume the data directly while others use the inbound data to create local reference databases.

I wanted DQ to be fast enough to allow the Bayes Analytic back-test system to use DQ as their sole source of data which provided a guarantee that the local back test could not cheat and look into the future.   I wanted DQ to be fast enough to play through a full 4 year back test on 1 minute bars which in our environment covered 25 million bars and about 2.61 gigabytes of data in less than 10 minutes.  DQ and the associated the DQ client exceed this goal.

DQ used in a distributed WAN environment

I am also using DQ to manage trade signal delivery to remote customer data centers and to propagate bar data between servers.

For the remote signal data centers we use our standard DQ replicator to push a subset of the data from our master DQ server to the remote customer data centers using https.  The customers provide their own DQ client which reads the local DQ and makes their own local trading decisions.

 Optimization opportunities

DQ is fast enough that it fully saturated a 1 Gig connection before we saturated the server CPU.    There are still some optimization opportunities which will likely allow me to increase insert performance by another 200% and the read performance by up to another 300% after that we would need to port to hand crafted C to obtain another 300% improvement.    At the current time DQ is fast enough for our needs and already exceeds our 1 Gig network links so there may not be a viable ROI for further speed improvements.

Contact us

Leave a Reply