https://github.com/coralblocks/CoralRing/blob/1168b047e0183c...
Am I missing something here or does the BlockingRingConsumer not actually block? And worse doesn't it just return garbage if poll is called without first checking availableToPoll?
The example sure looks like it... https://github.com/coralblocks/CoralRing/blob/main/src/main/...
Which if so isn't this like 1/4th a library for doing IPC? It doesn't seem to do much itself
availableToPoll() is the method that blocks, because it can return 0 on an empty ring. When that happens you have to block, by for example busy spinning around availableToPoll() until it returns something to poll.
You are never supposed to call poll() without first calling availableToPoll(). How can you poll something if you don't know whether there is something available to poll? This is very different from a ConcurrentLinkedQueue, where you can call poll() on an empty queue and get a null to indicate that the queue is empty. Also, because the ring is a circular queue, you have to know what you can safely poll before polling. That's all done through availableToPoll().
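For readers unfamiliar with the pattern, here is a minimal self-contained sketch of a single-producer/single-consumer ring that follows the same check-then-drain protocol. It is illustrative only: the class and method names are hypothetical stand-ins, not CoralRing's actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal SPSC ring illustrating the availableToPoll()/poll() protocol.
// Hypothetical sketch, not CoralRing's real code.
final class SpscRing {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to poll
    private final AtomicLong tail = new AtomicLong(); // next slot to offer

    SpscRing(int capacityPowerOfTwo) {
        slots = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Producer side: returns false when the ring is full.
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) return false; // full
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // publish
        return true;
    }

    // Consumer side: how many entries can be polled safely right now.
    long availableToPoll() {
        return tail.get() - head.get();
    }

    // Must only be called after availableToPoll() returned > 0.
    long poll() {
        long h = head.get();
        long v = slots[(int) (h & mask)];
        head.set(h + 1);
        return v;
    }
}

class RingDemo {
    public static void main(String[] args) {
        SpscRing ring = new SpscRing(8);
        for (long i = 0; i < 5; i++) ring.offer(i);
        long sum = 0;
        long avail = ring.availableToPoll();            // check first...
        for (long i = 0; i < avail; i++) sum += ring.poll(); // ...then drain the batch
        System.out.println(sum); // prints 10 (0+1+2+3+4)
    }
}
```

Note how one availableToPoll() call covers a whole batch of poll() calls, which is the batching point made in this thread.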
NOTE: By garbage here I don't think you are talking about GC garbage.
> How can you poll something if you don't know if there is something available to poll
counterexample: you can call ::poll(2) even if there isn't anything available. It will either block or return as desired. That's literally what I would expect a poll function to do: check if something is available; c.f. busy-polling.
Maybe it is a language difference?
Not a language difference, but a design difference. By calling availableToPoll() before polling you can poll a batch of messages without blocking or locking on each message. This design decision allows the ring to perform batching naturally. And batching is very important for performance. java.util.concurrent.ConcurrentLinkedQueue does not support batching.
I don't think anyone is questioning the strategy, just the naming.
Polling, as a term, already generally means asking if there is anything ready. The library might instead have had a poll() to indicate readiness and an unsafeTakeReadyBatch()/releaseBatch() or similar to handle the low level receipt primitives. Or you could even have had a checkForReady() and left poll() to implement a generic version of your busy waiting example, polling with a given sleep and timeout until items were ready, as a convenience for the user.
It's fine as is, of course, as you document expected usage quite well.
This change is now released. Thanks knome and gpderetta for your suggestions and clarifications.
availableToPoll() is now availableToFetch()
poll() is now fetch()
peek() is now fetch(false)
Note that fetch() == fetch(true)
Oh, I see now. I thought he was talking about a computer language but he was actually talking about the English language. My bad.
So are you saying that POLL was a bad choice of name because polling means "check if there is something available and if there is get it"?
What would be a better name? How about availableToFetch and fetch? Any other better idea?
I don't like remove because you are not actually removing the object from the ring.
poll doesn't necessarily imply removing.
It is clear to me now that poll only removes if there is something to remove
A much better name would have been FETCH and not POLL
We'll be changing everything from poll to fetch in the next version => availableToFetch() and fetch()
While we are bikeshedding, what about peek() (returns the element, but doesn't dequeue), pop() (dequeue and return the element), try_peek(), try_pop() for the polling variants.
> availableToPoll() is the method that blocks, because it can return 0 on an empty ring.
Which means it doesn't block...
Blocking means the call itself doesn't return until there's data available. This is a completely non-blocking API. Which is fine, it's just very wrongly named in that case.
But that also means I don't really know why there's both "non-blocking" and "blocking" variants at that point if blocking isn't an option at all in the first place.
You are going to block/wait yourself when availableToPoll() (now changed to availableToFetch() for clarity) returns 0. When that happens you can block/wait by busy spinning or by using a wait strategy. The blocking term means that the producer has to block/wait when the ring is full and the consumer has to block/wait when the ring is empty.
For the non-blocking ring, the producer never blocks on a full ring. It overwrites the circular ring. The consumer can still block on an empty ring.
So to make it clear:
Blocking ring => producer and consumer can block/wait
Non-Blocking ring => producer never blocks and consumer can still block on an empty ring. Consumer can also fall behind too much and disconnect.
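To make the "you block/wait yourself" part concrete, the busy-spin wait can be sketched as below, using an AtomicLong as a stand-in for the ring's availability counter. Thread.onSpinWait() is a real JDK 9+ hint; everything else here is illustrative, not CoralRing's API.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SpinWaitDemo {
    // Stand-in for the ring: a producer thread bumps this when data arrives.
    static final AtomicLong available = new AtomicLong(0);

    // "Blocking" by busy spinning: loop until something is available.
    static long waitForData() {
        long n;
        while ((n = available.get()) == 0) {
            Thread.onSpinWait(); // JDK 9+ hint to the CPU that we are spinning
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            try { Thread.sleep(10); } catch (InterruptedException ignored) {}
            available.set(3); // simulate three messages arriving
        });
        producer.start();
        long n = waitForData(); // spins until the producer publishes
        producer.join();
        System.out.println(n); // prints 3
    }
}
```

A wait strategy generalizes this loop: instead of spinning forever, it can back off to yielding or parking after some number of iterations.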
It looks like these are more lower level libraries so clients are supposed to block, but they are responsible for doing the blocking themselves.
And given these are designed for "ultra-low-latency" systems, I don't think that's a big problem because the best blocking strategy is probably just to spin.
It would be nice if the docs were nicer though, considering this is a paid product...
> It looks like these are more lower level libraries so clients are supposed to block, but they are responsible for doing the blocking themselves.
You busy spin when blocking (fastest) or you can use a WaitStrategy from https://www.github.com/coralblocks/CoralQueue. You can see an example here: https://github.com/coralblocks/CoralQueue/blob/main/src/main...
> And given these are designed for "ultra-low-latency" systems, I don't think that's a big problem because the best blocking strategy is probably just to spin.
If you have an isolated and dedicated CPU core for your thread, that's correct. Busy-spinning is the fastest/best strategy.
> It would be nice if the docs were nicer though, considering this is a paid product...
CoralRing and CoralQueue are open-source and free on GitHub. There is also a lot of documentation and explanation on the GitHub README.md page (front page). The code also has a lot of comments.
Oof, I saw "CoralQueue" on the product list on your website and I just immediately assumed it was a paid product. But I didn't even think of looking at the LICENSE files, my bad!
Regarding the docs, I was only looking at CoralRing, and it looks like CoralQueue has some additional documentation that applies to CoralRing as well. After reading through them everything makes a lot more sense.
Weird to see this here! I've used CoralBlocks in the low-latency trading domain previously. Highly recommend. The API is kind, they're very responsive, and the latency is exceptional (and comes with all the basics like thread pinning built-in for convenience)
How does it compare to LMAX Disruptor if you have any experience with both?
They're both similar in design, the main difference is Coral Queue can be used for IPC between JVMs, using a mmap'd file.
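The mmap mechanism itself is plain JDK: both processes map the same file and treat it as shared memory. Below is a minimal single-process sketch of the round trip (file name and layout are made up; a real IPC ring adds sequence counters and memory ordering on top of this):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Write a long into a memory-mapped file and read it back.
    // Two JVMs mapping the same path would see each other's writes.
    static long roundTrip(long value) {
        try {
            Path file = Files.createTempFile("ring", ".dat");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer shared = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                shared.putLong(0, value); // "producer" JVM writes at offset 0
                return shared.getLong(0); // "consumer" JVM reads the same offset
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L)); // prints 42
    }
}
```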
See https://github.com/real-logic/aeron (also from the creator of the disruptor)
> CoralBlocks in the low-latency trading domain previously.
Yeah, the modern JVM is a true miracle and you can be 5x as productive (and safe!) compared to C/C++
Do you have any recommendations for a low-latency work queue (within a JVM)?
I want to spawn millions of microsecond-scale tasks per second onto worker cores..
I am on a massive-cache CPU so memory latency hasn't reared its ugly head yet
EDIT: not LMAX please...
I think you can take a look at the DiamondQueue, which is a Demultiplexer and a Multiplexer combined so that thread A dispatches a bunch of tasks to a fixed set of worker threads (W1, W2, W3, etc.) and then these worker threads use a Multiplexer to deliver the results to another thread B. Thread A and Thread B can also be the same thread. There is an example here => https://www.coralblocks.com/index.php/the-diamond-queue-demu...
The DiamondQueue should be soon available for free at the CoralQueue project on GitHub.
You may find https://jctools.github.io/JCTools/ interesting
>any recommendations for a low latency work queue (with in a jvm)?
I toyed around with the ring buffer pattern a decade ago, creating a unicast one (using CAS on entries, and eventually a logarithmic scan for the next readable entry, not to brute-force-scan them all), but I'm not sure that its latency is much better than that of a regular ThreadPoolExecutor (the throughput could be better, though).
Latency also depends on whether it spins or blocks when waiting for a slot to read or write.
If you want to give it a try: https://github.com/jeffhain/jodk/blob/master/src/net/jodk/th...
You were interested in the diamond queue, so just wanted to say that it is now released in our GitHub => https://github.com/coralblocks/CoralQueue?tab=readme-ov-file...
Given the documentation says that this is supposedly to be between JVMs, how do they handle the serialize/deserialize?
They punt on the actual serialization format: https://www.coralblocks.com/index.php/inter-process-communic...
In most applications like this you'll see direct byte manipulation to byte buffers because you want to pull as much performance as possible.
There are fast serialization formats like SBE that people leverage for this as well.
> Given the documentation says that this is supposedly to be between JVMs, how do they handle the serialize/deserialize?
Your transfer object needs to implement MemorySerializable. Below two examples from CoralRing's GitHub:
https://github.com/coralblocks/CoralRing/blob/main/src/main/...
https://github.com/coralblocks/CoralRing/blob/main/src/main/...
The second one effectively allows you to send anything you want (as bytes) through the ring, making CoralRing message agnostic.
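CoralRing's real MemorySerializable interface works over the ring's memory directly; the self-contained sketch below uses a hypothetical ByteBuffer-based analogue to show the idea of a fixed-size, garbage-free transfer object. All names here are made up for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical analogue of a fixed-size transfer object interface.
// Not CoralRing's actual MemorySerializable; a sketch of the idea.
interface BufferSerializable {
    int writeTo(ByteBuffer buf);   // serialize into the ring's memory
    int readFrom(ByteBuffer buf);  // deserialize out of the ring's memory
}

final class PriceUpdate implements BufferSerializable {
    long instrumentId;
    long priceMicros;

    @Override public int writeTo(ByteBuffer buf) {
        buf.putLong(instrumentId);
        buf.putLong(priceMicros);
        return 16; // fixed size: two longs
    }

    @Override public int readFrom(ByteBuffer buf) {
        instrumentId = buf.getLong();
        priceMicros = buf.getLong();
        return 16;
    }
}

class SerializeDemo {
    // Serialize into one "ring slot" and deserialize into a second
    // pre-allocated instance; no garbage is created per message.
    static long roundTripPrice(long id, long price) {
        ByteBuffer slot = ByteBuffer.allocateDirect(16); // one ring slot
        PriceUpdate out = new PriceUpdate();
        out.instrumentId = id;
        out.priceMicros = price;
        out.writeTo(slot);

        slot.flip();
        PriceUpdate in = new PriceUpdate(); // in real use, pre-allocated and reused
        in.readFrom(slot);
        return in.priceMicros;
    }

    public static void main(String[] args) {
        System.out.println(roundTripPrice(1001L, 99_500_000L)); // prints 99500000
    }
}
```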
Related: https://github.com/pcdv/jocket
Drop-in replacement for java.net.Socket using shared memory (and optionally, futex for notification)
It's sad how comfortable the world is for spawning requests versus when you are on the receiving side.
I don’t get it. How is this advantageous as it’s limited to one machine? Why wouldn’t you just have one jvm running multiple threads? What is the point of having multiple jvm processes interacting through this ring? Can someone enlighten me?
A few potential reasons for this design coming to mind:
- Resource allocation; you might want to give just specific amount of memory, CPU, network I/O to specific modules of a system, which is not really feasible within a single JVM
- Resource isolation; e.g. a memory leak in one module of the system will affect just that specific JVM instance but not others (similar to why browsers run tabs in multiple processes);
- Upgrades; you can put a new version of one module of the system into place without impacting the others; while the JVM does support this via dynamic classloading (as e.g. used in OSGi or Layrry, https://github.com/moditect/layrry), this becomes complex quickly, you can create classloader leaks, etc.
- Security; you might have (3rd-party) modules you want to keep isolated from the memory, data, config, etc. of other modules; in particular with the removal of the security manager, OS-enforced process isolation is the way to go
Also software design. You can split JVMs into those that have to follow strict parameters (e.g. no allocations) and those that follow more traditional Java patterns.
Yeah, I think the HFT guys use CPU pinning a lot: 1 process - 1 CPU, so you'd need multiple processes to take advantage of a multicore server.
Usually it is 1 thread - 1 CPU. There might be other reasons (address space separation has its own advantages - and disadvantages) to have distinct processes of course.
The JVM does garbage collection, which can stop all threads at safepoints while GC occurs.
Those stops can be enough to ruin your low latency requirements in the high percentiles. A common strategy is to divide workloads between jvms so that you meet the requirement.
CoralRing and CoralQueue (available on GitHub) are completely garbage-free. You can send billions of messages without ever creating garbage for the GC, so no GC overhead. This is paramount for real-time ultra-low-latency systems developed in Java. You can read more about it here => https://www.coralblocks.com/index.php/java-development-witho...
Interesting. But how do you ensure a worker that picks up a task does not pause on gc as well?
CoralRing does not produce garbage, but it cannot control what other parts of your application choose to do. It will hand to your application a message without producing any garbage, now if you go ahead and produce garbage yourself then there is nothing CoralRing can do about that. Ultra-low-latency applications in Java are designed so that nothing in the critical path produces garbage.
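The standard way a Java critical path avoids allocation is to pre-allocate everything up front and reuse it. A sketch of a trivial object pool showing the idiom (names are illustrative, not CoralRing's API):

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Trivial object pool: allocate everything up front, reuse forever,
// so the steady-state critical path creates zero garbage.
final class Pool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();

    Pool(int size, Supplier<T> factory) {
        for (int i = 0; i < size; i++) free.push(factory.get()); // all allocation happens here
    }

    T borrow() { return free.pop(); }    // no allocation on the hot path
    void release(T obj) { free.push(obj); }
}

class PoolDemo {
    static final class Msg { long value; }

    // Process n messages using pooled instances: no 'new' inside the loop.
    static long useAndCount(Pool<Msg> pool, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Msg m = pool.borrow();   // reused instance
            m.value = i;
            sum += m.value;
            pool.release(m);
        }
        return sum;
    }

    public static void main(String[] args) {
        Pool<Msg> pool = new Pool<>(4, Msg::new);
        System.out.println(useAndCount(pool, 10)); // prints 45
    }
}
```

The same reasoning applies to collections: a pre-sized, reusable structure replaces anything like java.util.HashMap that allocates per operation.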
Maybe they run a small heap with a zero-pause JVM like Zing, as pause-less GC generally has lower throughput than normal GC.
Java doesn't have real pause-less GC.
Well, "pause too short to matter" just doesn't have the same ring to it.
One millisecond is not a short pause.
Modern low-latency GCs never reach 1ms on all the workloads I've put them through. Mind you I don't GC terabytes of RAM so who knows what happens there.
I can throw some guesses: 1) apps deployed in separate Docker containers due to organization's tech team separation, 2) apps that require security/performance isolation among tenants, 3) isolation layer around memory-leaky and bug-prone third-party library code.
Does anyone have suggestions for something like this, but in Golang?
This is fascinating. I have no idea what something like this would be used for though... what are the use cases?
Basically if you want to schedule workers on a separate JVM, but don't want to pay the latency cost of something like a DB-backed queue or a library with some FFI component.
Super cool, nice work!
Our Kafka isn't reliable enough. I need to write data to disk before flushing it to Kafka. Can I use this lib to write data to disk and then consume it inside the same JVM? I need the data to live through restarts.
Can you explain what your issue is with Kafka? What makes it not reliable enough?
Kafka is good. The problem is we don’t have a dedicated person to manage it so sometimes we have kafka outages
So basically, you want to build a write-ahead-log before writing data to Kafka, and I think you're underestimating the effort to implement a WAL.
If you don't have a person that can manage Kafka, you almost definitely don't have the person to maintain a WAL.
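For a sense of scale, even a toy WAL must append, fsync, and replay. The sketch below shows just that cycle; checksums, rotation, and partial-write recovery are omitted, and those are exactly the hard parts the comment is warning about. All names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class ToyWal {
    // Append one length-prefixed record and force it to disk.
    static void append(Path log, byte[] record) {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
            buf.putInt(record.length).put(record).flip();
            while (buf.hasRemaining()) ch.write(buf);
            ch.force(true); // fsync: the record survives a crash after this returns
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replay all records, e.g. to re-send them to Kafka after a restart.
    static List<byte[]> replay(Path log) {
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.READ)) {
            ByteBuffer all = ByteBuffer.allocate((int) ch.size());
            while (all.hasRemaining() && ch.read(all) != -1) { }
            all.flip();
            List<byte[]> records = new ArrayList<>();
            while (all.remaining() >= 4) {
                byte[] rec = new byte[all.getInt()];
                all.get(rec);
                records.add(rec);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Append two records, replay them, verify the round trip.
    static boolean selfTest() throws IOException {
        Path log = Files.createTempFile("wal", ".log");
        try {
            append(log, new byte[]{1, 2});
            append(log, new byte[]{3});
            List<byte[]> recs = replay(log);
            return recs.size() == 2 && recs.get(0).length == 2 && recs.get(1)[0] == 3;
        } finally {
            Files.deleteIfExists(log);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(selfTest()); // prints true
    }
}
```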
Just a suggestion: you could look at automq (a message queue that has rewritten the storage layer of Kafka), which has done a lot of work on automatic recovery and self-balancing; maybe it will help you. If you don't need it, feel free to ignore this comment.
There's a whole conference in Vegas next week that wants to sell you something.
You could, but it's not purpose-built for that. You'd probably be happier using some other memory-mapped file format for that.
Could you please suggest java library for this if you know one?
crazy idea, but SQLite
Write your data to a MongoDB database. Then use the Kafka Mongo connector to pull data from MongoDB to Kafka.
ultra-low-latency and java in the same sentence... masochists?
Not at all. The constraints for ultra low latency are the same whether you use C++ or Java: No blocking IO, no dynamic memory allocation. Using the right libraries such as this one, Java can do a surprisingly decent job while keeping the code more accessible to a layman developer than if it was C++. You still need C++ for applications where maximum mechanical sympathy is required (cache alignment guarantees, etc.) but if you have low latency / medium volume Java often does the trick.
Well said and I couldn't agree more. There are top market makers and banks using Java for a fact. And other C++ firms as well. Some of them are considering or have considered the move to Java. Some have already done this move. Some will never do. I'm certain that it is trivial for a C++ programmer to code in Java. The opposite of course is not true. The whole point of Java as a language is to be higher-level than C++. I don't know if people have realized, but with GraalVM it is now possible to compile Java code entirely to native code ahead-of-time, like it is done with C++. Even before GraalVM there was already the -Xcomp option to force JIT compilation in the very first pass. However that does not necessarily mean that AOT is always preferable over JIT. It is not. Runtime profiling information so you can determine the critical path (hot spots) is amazing for some optimizations, such as aggressive inlining.
> I'm certain that it is trivial for a C++ programmer to code in Java.
as someone that knows both, I've made this assumption before too
it has turned out to be not true on almost all occasions
That's fair! I would assume that was due to your comfort zone and muscle memory. Not because you found Java to be harder than C++. Usually a higher level language is easier to grasp, handle and manage than a lower level one.
I think their complexity is very different
C++ the language is exceptionally complicated, but the ecosystem is relatively simple
Java is the opposite, the language itself is simple enough but the ecosystem is humongous (spring/J2EE, maven/gradle, n^2 logging frameworks/adapters, application servers, anything involving classloaders/annotation processing/dynamic bytecode manipulation, ...)
syntactically they look similar, but other than that there's not much in the way of transferable skills between the two
Totally agree! Because of the zero-garbage and no-GC requirement we don't even use the JDK. We use Java as a syntax language and write everything from scratch, even the data structures (java.util.HashMap produces garbage). So the bloated Java ecosystem does not affect us too much.
yeah we're the same, our java looks very much like C
unfortunately all the data from outside our strange world still needs to be brought in and cleaned up :)
I do it better: I free myself from as much of the toxic SDKs as I can (the C++ and Java SDKs are filthy toxic).
Assembly, or in the worst case plain and simple C99+.
dynamic allocation is fine as long as it is under control, as it's just a pointer bump
what you never want is to trigger the GC at a bad time
(you can do cache alignment too relatively easily)
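In Java the common cache-alignment idiom is manual field padding (the alternative, @Contended, needs an internal JVM flag). A sketch of the padding idiom used to keep two hot counters on separate 64-byte cache lines; note that the JVM does not strictly guarantee field layout, so this is a best-effort technique:

```java
// Manual cache-line padding: keep two counters written by different
// threads on (hopefully) different 64-byte cache lines to avoid
// false sharing. Field order is not strictly guaranteed by the JVM.
final class PaddedCounters {
    volatile long producerCount;
    // 7 longs of padding = 56 bytes, pushing consumerCount toward another line
    long p1, p2, p3, p4, p5, p6, p7;
    volatile long consumerCount;
}

class PaddingDemo {
    public static void main(String[] args) {
        PaddedCounters c = new PaddedCounters();
        c.producerCount = 10;   // written by the producer thread
        c.consumerCount = 4;    // written by the consumer thread
        System.out.println(c.producerCount - c.consumerCount); // prints 6
    }
}
```

Without the padding, both counters can land on one cache line, and each write by one core invalidates the line in the other core's cache.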
Doesn't dynamic allocation expose you to memory fragmentation or other non-deterministic phenomena?
With semantic and fine-grained control of your allocations, and with a memory paging system on top of that, getting fragmentation is tough...
LOL you have a point. But this is subjective. Someone could argue that a highly complex C++ system is masochism. The reality is: a lot of successful market makers, prop trading firms, banks, etc. use Java for ultra-low-latency financial systems. And of course some successful companies use C++ as well.
Numbers aren't subjective. High-performance is a measurable claim and shouldn't be entangled with language-complexity matters.
Totally agree! My understanding because of the use of the word "masochist" was that it was possible but difficult. Not that one language was better/worse than the other, in terms of performance.