Shinobu started as a small hobby project, turned into a university project, and finally turned back into a hobby project. That's more or less the whole story. It began because I needed to send large files to friends, and there weren't really any good options for doing so. This was back in 2016, and practically the fastest option was netcat, which works just fine of course, but I wanted something a bit better, something actually made for this purpose.
Now hold on, you might say, you could just upload these to Drive or something! Yes, I could, but that actually costs money for big files, and it's also a lot slower than sending directly (200 Mbit/s+ download and upload speeds are fairly common around here). With that, the project started and a very simple application was made. It used a simple protocol that first sent a description of the files in the transfer to the other side, then the files and folders themselves. There was nothing all too interesting about it, but soon after I had to choose a topic for my thesis. At that time I had also picked up an interest in cryptography, so I decided to make my own performance-oriented file transfer protocol along with a secure channel.
Now hold on again, you might say, why not use TLS?
A very good question indeed; the only answer I can give is that it was a really good opportunity to learn about practical cryptography. At least I'm not rolling my own algorithms, and the whole thing is fairly simple (it actually borrows a lot of ideas from TLS and SSH), so it's most probably reasonably secure. At least on paper; the implementation may well have bugs.
Anyway, all these parts are detailed a bit more in the subpages, so I'm going to leave the introduction here. Most of it is still very much under construction, as is the library, so don't expect actual documentation here. Most of the code is documented, though, so the library should be somewhat usable if you really need something from it.
The secure channel originally started as an experiment in building one, and the main goal was to keep it simple.
For the design concept I took some ideas from both TLS (1.3) and SSH. The channel has completely separate keys and state for incoming and outgoing communication.
All packets have a unique implicit sequence number. Because the sequence numbers are never sent on the wire, the secure channel requires the underlying transport to be reliable and to guarantee ordering.
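As a rough illustration (the names and framing here are hypothetical, not Shinobu's actual API), an implicit sequence number can be kept as a local counter on each side that is mixed into every packet's MAC but never transmitted:

```python
import hashlib
import hmac

class AuthenticatedChannel:
    """Toy one-direction channel with an implicit sequence number.
    (Hypothetical sketch, not Shinobu's actual implementation.)"""

    def __init__(self, key: bytes):
        self.key = key
        self.seq = 0  # never sent on the wire; both ends count independently

    def _mac(self, seq: int, payload: bytes) -> bytes:
        # the counter is mixed into the MAC, so a reordered, replayed, or
        # dropped packet makes verification fail on the receiving side
        msg = seq.to_bytes(8, "big") + payload
        return hmac.new(self.key, msg, hashlib.sha256).digest()

    def seal(self, payload: bytes) -> bytes:
        tag = self._mac(self.seq, payload)
        self.seq += 1
        return payload + tag

    def open(self, packet: bytes) -> bytes:
        payload, tag = packet[:-32], packet[-32:]
        if not hmac.compare_digest(tag, self._mac(self.seq, payload)):
            raise ValueError("MAC mismatch: lost, reordered, or tampered packet")
        self.seq += 1
        return payload
```

Because the counter is implicit, the two ends desynchronize permanently on any loss or reordering, which is exactly why the transport must be reliable and ordered.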
The stages of the protocol are as follows:
- Sides agree on a protocol version.
- Each side selects a certificate of the other side that it can support. Past messages, along with all further ones, are authenticated using the certificate's algorithm.
- Sides agree on a symmetric cipher and a key exchange algorithm, then derive the symmetric keys from the exchange.
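The first stage can be sketched as a simple negotiation. The set-based API and version numbers below are purely illustrative; the real protocol exchanges this over the wire:

```python
# Toy sketch of the version negotiation stage (stage one above).

def negotiate_version(ours: set[int], theirs: set[int]) -> int:
    """Pick the highest protocol version both sides support."""
    common = ours & theirs
    if not common:
        raise ValueError("no common protocol version")
    return max(common)

print(negotiate_version({4, 5, 6}, {3, 4, 5}))  # prints 5
```

The cipher and key exchange selection in the later stages follows the same shape: intersect the supported sets, then pick by a fixed preference order.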
The certificate used is not really a certificate per se, but a simple key container. The model is a lot like SSH, where there is no PKI and the user has to trust the key manually. However, a bare key alone did not provide the functionality I needed, so I created a simple (self-signed) certificate format. I could be using X.509, and it's possible to write extensions to do so (even with PKI!), but I generally find X.509 far too complicated.
Supported authentication algorithms are ECDSA (P-521), RSA-PSS (4096-bit), and Ed25519. The first two are implemented using Bouncy Castle, while the last (now the default) uses NSec / libsodium.
Key exchange can be classic finite-field Diffie-Hellman, ECDH over P-521, or X25519. As before, the first two use Bouncy Castle while the last one uses NSec.
All the key exchanges are ephemeral, providing forward secrecy.
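To illustrate the ephemeral idea with the simplest variant, here is textbook finite-field Diffie-Hellman. The parameters are a deliberately tiny toy group and completely insecure; a real deployment uses a large MODP group or X25519, as described above:

```python
import secrets

# Textbook finite-field Diffie-Hellman with a toy group (p = 23, g = 5).
# These parameters are for illustration only and offer no security.
P, G = 23, 5

def dh_keypair():
    # a fresh (ephemeral) private key is generated for every session,
    # which is what gives the channel forward secrecy
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()

# both sides arrive at the same shared secret from the other's public value
assert pow(b_pub, a_priv, P) == pow(a_pub, b_priv, P)
```

Since the private exponents are discarded after the session, a later compromise of a long-term signing key cannot recover past session keys.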
Symmetric algorithms are: AES-CBC + HMAC-SHA256 (encrypt-then-MAC, provided by the .NET library), AES-GCM (available from both .NET and NSec), ChaCha20Poly1305 (NSec).
If encryption is not needed, standalone HMAC-SHA256 is also available. The keys from the key exchange stage are processed using HKDF.
Additionally, the user may supply a password for some exchanges, which is processed using PBKDF2 and then folded into HKDF as additional salt.
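A sketch of how this derivation might fit together, using only the Python standard library (the info labels, salt, and iteration count are illustrative, not Shinobu's actual parameters):

```python
import hashlib
import hmac

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a PRK."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): derive `length` bytes labeled by `info`."""
    okm, block = b"", b""
    for i in range((length + 31) // 32):
        block = hmac.new(prk, block + info + bytes([i + 1]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

# placeholder shared secret from the key exchange stage
shared_secret = b"\x01" * 32

# optional user password, stretched with PBKDF2 and folded in as the salt
password_salt = hashlib.pbkdf2_hmac("sha256", b"hunter2", b"fixed-salt", 100_000)

prk = hkdf_extract(password_salt, shared_secret)
send_key = hkdf_expand(prk, b"client->server", 32)  # separate keys per direction
recv_key = hkdf_expand(prk, b"server->client", 32)
```

Using distinct `info` labels per direction yields the completely separate incoming and outgoing keys mentioned earlier, from a single exchanged secret.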
The certificates are stored using a modular storage system that allows encrypted and plain data to be stored in XML format; additionally, the entire file is authenticated. The keys are derived from a user password processed with either PBKDF2 or Argon2, both of which are fully customizable.
One of the major goals of Shinobu was to offer fast transfers. Compared to WebRTC-based online solutions, which seem to cap out at around 60 Mbit/s (at least for me),
there is nothing preventing Shinobu from running at multi-gigabit speeds. Processing is done in a parallel pipeline for both incoming and outgoing traffic:
Read data -> Sequence data -> Apply the cipher, serialize to binary -> Send over network ->
-> Receive from network -> Sequence and deserialize -> Apply the cipher -> Write data
All the ciphers currently implemented can run in parallel, but single-threaded ciphers are also supported (although they would cap performance).
The pipeline itself is implemented as an abstraction over the System.Threading.Tasks.Dataflow (TPL Dataflow) library, which takes care of the brunt of the parallelization and threading.
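The outgoing half of the pipeline can be sketched like this; `executor.map` preserves submission order, which stands in for the sequencing stage (the chunk size and the placeholder "cipher" are obviously not the real ones):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of the outgoing pipeline:
# read -> sequence -> apply cipher (in parallel) -> send.

def read_chunks(data: bytes, size: int = 4):
    """Stand-in for reading fixed-size chunks from disk."""
    for off in range(0, len(data), size):
        yield data[off:off + size]

def apply_cipher(chunk: bytes) -> bytes:
    """Placeholder transform standing in for encrypt + serialize."""
    return hashlib.sha256(chunk).digest()

def send(packet: bytes):
    wire.append(packet)  # stand-in for the network socket

wire = []
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() runs the transform on worker threads but yields results in
    # submission order, keeping the implicit sequence numbers consistent
    for packet in pool.map(apply_cipher, read_chunks(b"example payload")):
        send(packet)
```

The key property is that parallelism happens inside the cipher stage while the stages before and after it still see packets strictly in order.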
A major performance bottleneck was the allocation of large buffers for the data read from disk and for the encrypted data. To alleviate this problem, System.Buffers.ArrayPool is used,
along with a higher-level abstraction that allows the dynamic use of both pooled and non-pooled arrays. The switch to pooling eliminated not only the GC pressure,
but also the allocation time itself, which resulted in a massive jump in performance. I should mention at this point that even the original version had no problem with gigabit speeds,
so these optimizations are more of a fun project (like much of this entire thing) than really necessary.
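The pooling idea can be sketched as a minimal rent/return pool in the spirit of `ArrayPool<T>` (the names and the exact-size bucketing policy are illustrative, not how the .NET pool actually buckets):

```python
# Minimal sketch of a buffer pool: renting reuses a previously returned
# buffer instead of allocating a new one, avoiding GC pressure.

class BufferPool:
    def __init__(self):
        # buffers keyed by size, ready for reuse
        self._free: dict[int, list[bytearray]] = {}

    def rent(self, size: int) -> bytearray:
        bucket = self._free.get(size)
        if bucket:
            return bucket.pop()      # pool hit: no new allocation
        return bytearray(size)       # pool miss: allocate once

    def return_buffer(self, buf: bytearray) -> None:
        self._free.setdefault(len(buf), []).append(buf)

pool = BufferPool()
first = pool.rent(4096)
pool.return_buffer(first)
second = pool.rent(4096)
assert first is second  # the same buffer object was handed back out
```

The caveat, as with the real ArrayPool, is that rented buffers may contain stale data and must be returned exactly once; the higher-level abstraction mentioned above is what hides that bookkeeping.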
Another big jump was achieved just by switching from .NET Framework to .NET Core. Not only is the cryptography a lot faster under Core; it seems that everything is.
After some more minor optimizations and refactors, I ended up with something rather fast. Comparisons are provided in the following table. All tests were done on a Ryzen 2600X;
the speeds represent both an outgoing and an incoming operation, in other words a loopback. The real performance of a single one-way transfer will be higher. All speeds are in MiB/s.
| Cipher | Framework | Framework + Array Pool | Core | Core + Array Pool | Current |
|---|---|---|---|---|---|
| AES-CBC + HMAC-SHA256 | 150 (50%) | 248 (70%) | 488 (60%) | 1789 (100%) | 2638 (100%) |
| AES-GCM | - | - | 668 (60%) | 2955 (100%) | 5632 (100%) |
Note: Ryzen-series CPUs can process SHA-256 a lot faster than Intel Skylake can. On Core with ArrayPool, the Ryzen system is capable of 2832 MiB/s, while a comparable six-core 8600K can only do 657 MiB/s.
The command line is the basic way to use the application. It is more or less feature-complete and can be used relatively easily; for more information, use the built-in help by running the application without arguments. The CLI itself can be used on any platform, but you might need to compile the Argon2 library yourself (only the Windows version is shipped with the application).
Let me preface this: the GUI is far from complete. Nevertheless, it can be used if you don't want to bother with the command line. I myself have switched over to it, as it's just easier.
The GUI is implemented in WPF using the High-Level API of the base class library and a separate View Model project.
The window uses an entirely custom style that supports light and dark themes on all platforms (where WPF can be used).
I also carried out an experiment with Avalonia, which turned out to be very successful. I'm not completely sure if I want to continue developing the WPF implementation,
or focus entirely on Avalonia. For those unfamiliar, Avalonia is very much like WPF except it actually works on Linux as well.
I plan on continuing work on this project whenever I have the motivation to do so. There are some things I would still like to do, such as improving the WPF / Avalonia interfaces. Using SCTP together with STUN-based NAT traversal is also in the works. Currently I'm in the middle of reworking how packets are handled in the pipeline. I expect substantial performance improvements from the reduced copy operations, though it does reduce the flexibility of the code a bit. This change also bumps the protocol to V6 and will break compatibility.