What's new in FixPix and AI image magic in general...
FixPix Desktop has been released! (check it out here https://fixpix.net/download.php) At its core, FixPix Desktop is a platform for running the latest and greatest AI on your local images. It was designed to be easy to use for anyone who wants to see the power of image processing in general, and AI image processing in particular. Its modular architecture makes it possible to add external plug-in components (FixPix Desktop plugins), each containing its own set of features. It was also designed to make it easy for any developer who uses Python and has something to contribute (or sell) in the image processing field to create additional plugins.
If you follow AI news, you've no doubt already heard about ChatGPT, the new and jaw-dropping chatbot from OpenAI (of DALL-E 2 fame). You might ask why I'm bringing up an AI from the field of natural language processing and generation in a blog about AI for image processing and generation?
"Hey, have you heard about the double-slit experiment? It's this crazy physics thing that's supposed to show how particles can act like waves.
"So here's how it works: you shine a light through a wall with two slits in it, and then you measure what happens on the other side. And you know what you see? A bunch of frickin' waves! I mean, seriously, what the heck is going on here?
"I mean, I get it if you shine a light through water, you get waves. But this is a solid wall we're talking about! And yet, when you measure it, the light is acting like it's a bunch of waves.
"It's like, what kind of magic is this? Is the wall secretly made of Jell-O or something? I mean, I don't even know what to believe anymore.
"But the craziest part is when you cover up one of the slits. Suddenly, the waves disappear and the particles start acting like... well, particles. It's like they're like, 'Oh, you want us to act normal? Fine, we'll act normal.'
"I mean, I don't know about you, but I'm starting to feel like I'm living in the Matrix or something. It's like, what other crazy stuff is out there that we don't even know about?
"But hey, I guess that's the beauty of science, right? It's always full of surprises. Who knows what other weird stuff we're going to discover?
"Anyway, that's the double-slit experiment for you. It's like, one minute you think you know how the world works, and the next minute you're questioning everything you thought you knew.
"But hey, it's all part of the journey, right? The journey to understanding the mysteries of the universe. It's a trip, man."
I've read some responses that nitpick about ChatGPT giving inaccurate or wrong replies, or not being especially creative. The above sketch would probably fall into that category, but I believe that completely misses the point of what is happening here. It's like watching the Wright brothers' first flight of 120 feet and complaining that it isn't interesting because you can't fly overseas on vacation with this technology.
No - I believe this marks the beginning of a point where everything changes, with far more sweeping implications (technological, economic, social, ethical and philosophical) than the ability to fly had on the world.
When you live in times of exponential technological growth and creativity, there's no telling what tomorrow will bring...
Also, what does this question have to do with image processing and AI?
Before getting into that, I'd like to introduce Stable Diffusion, the open source image generation AI that rivals large commercial closed source players in the quality of the images it creates. (In fact, I've already written about Stable Diffusion before but managed to lose the post somewhere...)
If you haven't heard of Stable Diffusion yet, a quick internet search will return plenty of headlines, articles, tutorials, blogs etc. about this amazing piece of technology. You'll quickly realize the impact this software has had on the open source image generation and AI fields, the future impact it will have on creating art, and the obvious ethical concerns about its potential misuse by bad actors.
Entire books can be written on the subject, but I'd like to focus here on something quite profound which goes deeper than any particular technology, regardless of how amazing it might be.
For starters, let's look at some simple examples of what Stable Diffusion (like other state-of-the-art text-to-image generators) can do. Basically, you give it a descriptive text, and within seconds (each of the examples below was produced in less than 10 seconds) it produces an image that captures that description (how this magic is performed is beyond the scope of this post, but you can easily look it up).
Example 1: Two Huskies playing chess by Monet
Example 2 (using the same text): Two Huskies playing chess by Monet
Example 3: A raccoon playing the violin by Da Vinci
The above are just three simple examples. The number of possible images Stable Diffusion can create is limited only by your imagination in describing what you want to see in a picture, and, as you've seen, giving the same description twice will create two different pictures that both convey the essence of the description.
The possibilities are endless.
Or are they?
Let's get back to basics to reflect on this question.
Any computer image is basically just a rectangular grid of pixels, where each pixel is a small rectangle painted in a single color.
Below is an example of an image composed of 90 pixels (9 rows, each containing 10 pixels), where each pixel can take only one of two colors: black or white.
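To make the idea of a pixel grid concrete, here is a minimal Python sketch that stores a black-and-white image as rows of 1s (black) and 0s (white). The smiley pattern is a hypothetical stand-in of my own, not the exact picture from this post; any 10x9 pattern of '#' and '.' characters works the same way.

```python
# A hypothetical 10x9 black-and-white smiley: '#' = black pixel, '.' = white pixel.
SMILEY = [
    "..........",
    "..##..##..",
    "..##..##..",
    "..........",
    "..........",
    ".#......#.",
    "..#....#..",
    "...####...",
    "..........",
]


def to_pixels(rows):
    """Convert rows of '#'/'.' characters into a grid of 1 (black) / 0 (white)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


grid = to_pixels(SMILEY)
height = len(grid)     # number of rows
width = len(grid[0])   # number of pixels in each row
print(f"{width}x{height} grid = {width * height} pixels")

# Render the grid back as text so the pattern is visible.
for row in grid:
    print("".join("#" if pixel else "." for pixel in row))
```

The picture is nothing more than this table of numbers; changing any single entry gives a different picture.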
You probably recognize that the above particular combination of black pixels in the 10 x 9 grid of pixels is a smiley. We could also draw other recognizable patterns on this 10 x 9 grid, for example:
I'm sure you could get creative and draw many more images that I haven't included here, given the freedom to place black pixels wherever you'd like on this 10x9 grid, but could you be endlessly creative?
To find the answer, let's take this to the extreme. Suppose that instead of a 10x9 grid like the ones above, you only had a 1x1 grid, so there's only one pixel you can paint. How creative can you get? Well, you're free to draw either one black pixel or one white pixel, so in total you can only produce two "pictures" (if you can call one pixel a picture...)
So basically your creative freedom given only one pixel to paint boils down to two boring (and small) pictures:
That's it. Two is the limit to the number of pictures you can generate if you only have one pixel to work with (which can take on either a black or a white color).
Since that seems obvious, let's see how many pictures we can generate given a grid of two by two pixels (4 pixels in total). We can calculate this in a similar way to how we worked out the number of pictures we can create with 1 pixel above:
So above is a 2x2 grid of pixels (numbered 1 to 4), and each pixel can take one of two colors, namely black and white. How many combinations of pictures can we create with this grid?
There are two possibilities to paint pixel 1 (black or white, just like the two possibilities when we only had one pixel to work with)
For each of the two possibilities to paint pixel 1, there are two possibilities to paint pixel 2, so we have 2x2=4 possibilities to paint combinations of pixel 1 and pixel 2
For each of the 4 possibilities to paint pixel 1 and 2, there are two possibilities to paint pixel 3, so we have 2x2x2=8 possibilities to paint combinations of pixel 1, pixel 2 and pixel 3
For each of the 8 possibilities to paint pixel 1, pixel 2 and pixel 3, there are two possibilities to paint pixel 4, so that gives us 2x2x2x2=16 possibilities to paint combinations of pixel 1, pixel 2, pixel 3 and pixel 4.
So, given 4 pixels, we are limited to creating a total of 16 different pictures. The following illustrates all the possible "pictures" that can be generated given 4 pixels.
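The counting argument above can be checked by brute force. This minimal Python sketch lists every black/white combination of 4 pixels with `itertools.product`, printing each as two rows of two (assuming pixels 1 and 2 form the top row and pixels 3 and 4 the bottom row), and then repeats the step-by-step doubling.

```python
from itertools import product

BLACK, WHITE = "#", "."

# Every picture is one choice of color for each of the 4 pixels,
# so product(..., repeat=4) yields all possible combinations.
pictures = list(product([BLACK, WHITE], repeat=4))
print(len(pictures))  # → 16

# Print each picture as a 2x2 block: pixels 1-2 on top, pixels 3-4 below.
for p in pictures:
    print(p[0] + p[1])
    print(p[2] + p[3])
    print()

# The same total, built up one pixel at a time: each new pixel doubles the count.
count = 1
for pixel in range(1, 5):
    count *= 2
    print(f"pixels 1..{pixel}: {count} combinations")
```

Changing `repeat=4` to `repeat=1` gives back the two one-pixel "pictures" from earlier.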
However, every pixel we add to the grid multiplies the previous number of possible combinations by two, so the number of ways to paint the pixels grows very fast as the grid gets larger. In fact, this is exactly what exponential growth means.
To summarize, the number of possible pictures can be calculated by multiplying 2 by itself as many times as there are pixels in the grid. As we saw above, on a grid of 4 pixels, the number of possible images is 2x2x2x2, also known as 2 to the power of 4, or 2^4 = 16.
Just to show how fast the number of possible images grows with the number of pixels in the image, let's get back to our simple 10x9 grid on which we drew some simple recognizable shapes. That grid has 90 pixels, and while the number of possible pictures we could generate with 4 pixels was 2^4 = 16, the number of possible pictures we can generate with 90 pixels is 2^90, which turns out to be 1,237,940,039,285,380,274,899,124,224.
Or if you prefer it in words, with a 10x9 grid on which you can only draw white or black pixels, you are limited to generating one octillion two hundred thirty-seven septillion nine hundred forty sextillion thirty-nine quintillion two hundred eighty-five quadrillion three hundred eighty trillion two hundred seventy-four billion eight hundred ninety-nine million one hundred twenty-four thousand two hundred twenty-four different images.
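Python's integers have arbitrary precision, so the 10x9 figure can be computed exactly rather than taken on trust. A minimal check:

```python
pixels = 10 * 9         # a 10x9 grid has 90 pixels
count = 2 ** pixels     # two color choices (black or white) per pixel

print(f"{count:,}")     # → 1,237,940,039,285,380,274,899,124,224
print(len(str(count)))  # → 28
```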
Note that if you randomly choose one of these images, it will most likely just look like unrecognizable noise and won't resemble any shape we are familiar with. Only a tiny fraction of these images bear any resemblance to something we'd call a meaningful picture (like the samples shown above).
We can't do much if we are limited to working with 90 pixels. Let's make things a bit more interesting.
Below is a picture using a 200x200 grid of black and white pixels which I'm sure you'll recognize:
How many different pictures can we make using the above 200x200 grid?
We saw that the number of different pictures that can be created using black or white pixels on a 2 by 2 grid is 16. We then saw that the number of different pictures that can be generated using black or white pixels on a 10 by 9 grid is already a huge number (with 28 digits).
The number of different pictures that can be created with the above 200x200 grid (2^40,000) is a number with 12,042 digits!
If we increase our creative freedom and enable full color on the above 200x200 grid, then instead of being limited to black or white, we could choose one of 16,777,216 different colors for each and every pixel in the grid. That would increase the total number of different pictures we could generate to 16,777,216^40,000, a number with 288,989 digits!
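The digit counts above can be reproduced with a short Python sketch. It computes the exact number of pictures as colors ** pixels and counts its decimal digits without calling `str()`, because recent Python versions refuse by default to convert integers with more than 4,300 digits to strings (see `sys.set_int_max_str_digits`). The helper names `num_pictures` and `num_digits` are just illustrative.

```python
import math


def num_pictures(width: int, height: int, colors: int) -> int:
    """Exact number of distinct pictures: one of `colors` choices per pixel."""
    return colors ** (width * height)


def num_digits(n: int) -> int:
    """Exact number of decimal digits of a positive integer, without str()."""
    # Estimate from the bit length, then correct with exact integer comparisons.
    d = int(n.bit_length() * math.log10(2)) + 1
    while 10 ** d <= n:                  # estimate was too small
        d += 1
    while d > 1 and 10 ** (d - 1) > n:   # estimate was too large
        d -= 1
    return d


for w, h, c, label in [
    (2, 2, 2, "2x2 black/white"),
    (10, 9, 2, "10x9 black/white"),
    (200, 200, 2, "200x200 black/white"),
    (200, 200, 16_777_216, "200x200 full color"),
]:
    print(f"{label}: {num_digits(num_pictures(w, h, c)):,} digits")
```

Since 16,777,216 = 2^24 (the familiar 24-bit RGB palette), the full-color count is the same number as 2^960,000, which is why it dwarfs even the black-and-white 200x200 count.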
So the number of different images one can create, even for small 200x200 images, is so large that we can't even begin to wrap our limited human minds around its sheer scale. Even the number of atoms in the entire observable universe becomes completely insignificant at these scales.
But the takeaway here is that as large as this number is, it is not infinite. A number with 288,989 digits is just as close to infinity as a number with 1 digit.
Our minds are not equipped to grasp the infinite. In fact, the very purpose of the mind is to make things finite, so that we can discern between them and make sense of the relationships between different finite things. Still, it's an interesting exercise to try to grasp what is included in the collection of 16,777,216^40,000 pictures that makes up all the possible full-color pictures on a grid of 200x200 pixels.
So let's imagine you had the super ability to instantly travel to any time you wanted, from the big bang 13.8 billion years ago to this day.
Let's also imagine you had the super ability to instantly travel to any place you wanted.
You would also be able to take a camera with you.
Also, suppose you are a person who likes to take pictures of everything, and by that I really mean everything - everything, everywhere and at every point in time.
Every event that has ever happened would be captured on your camera: You'd take all the possible pictures of everything that has ever existed. You'd take these pictures in all zoom levels and from every possible angle.
You'd take pictures showing how every object in the universe formed and evolved - galaxies, stars, planets, life on planets.
Specifically, you'd also take pictures of everything and everyone that existed on earth at any point in time.
So, apart from pictures of every non-living thing and every living thing in the history of our planet (we'd finally know exactly what all the dinosaurs and other prehistoric animals looked like), the entire biography of every human who ever lived would be there in pictures, taken from every possible distance, at every possible angle, for every instant of their existence. You could take millions of different pictures for each second in the life of each and every person who has ever lived. Nothing would escape your camera.
Can you begin to imagine how many pictures that adds up to?
But all those pictures are in fact just a tiny, tiny fraction of the possible pictures on a 200x200 grid, because they only show events in the history of the universe (and Earth) that actually happened - whether it's Bach writing his first piece of music, or the first stages of building the Great Pyramid of Giza.
For every event above that actually happened (and that you so graciously photographed for posterity), we can generate an almost countless number of pictures of events that never happened - for example, that time Napoleon Bonaparte (died 1821) met Charlie Chaplin (born 1889) at Admiral Edward Russell's Punch Party (held in 1694). So the collection of 16,777,216^40,000 pictures would also contain a vastly larger collection of pictures of every possible imagined event involving every possible imagined thing.
And for every single picture of an imagined event, you'd find a mind-bogglingly huge number of pictures that just look like a random collection of pixels.
So, no - no image generation software can create an infinite number of pictures, because the number of pictures (especially meaningful pictures) that can be generated is always finite. But that finite number is so vast that it's beyond our minds' ability to grasp, and it contains every visual thing we could ever experience or imagine.
After months of work, I'm happy to announce the coming release of the FixPix image-processing application, which will have several advantages:
The first release will be for the Windows platform, followed by Mac OS/X and Linux.
For a quick glance at what it's about, check out the help file below:
There's a short article by Tim Bryce from back in 2006 titled "Parkinson's Law in IT". It's worth reading. In a nutshell, it says that "As computer hardware capacity increases, software becomes more bloated." I'd expand on that and say that, in general, we tend to waste the things we take for granted. This can lead to not making the most of what we have, and to not being grateful for what exists while focusing on what is missing instead.
I could go deeper into how to balance the natural striving for more and better with being happy with what already exists, but this post is not going down that rabbit hole.
What I will bring up is the fact that the latest and greatest neural networks (i.e., those that can achieve the amazing feats that Google, OpenAI and others have shown) still require huge resources in terms of GPU memory and power. Experimenting on your home PC with the open source reimplementation of Google's Imagen network requires a strong graphics card with around 16GB of memory. It does include a smaller neural network that you can use on graphics cards with 8GB of memory, but the generated images, though impressive (compared to what we believed were the limits of what a computer could do just a few years ago), cannot be compared in any way with the results that the commercial proprietary networks achieve.
There is a cynical saying that every problem can be solved if you throw enough money at it. When there's enough money, the companies that can afford it will usually opt for the (relatively) simplest way to solve an AI task: larger and larger networks, trained on many powerful graphics processors and fed incredible amounts of data, producing very smart, but perhaps also very bloated, deep neural networks.
Of course, the question is how much of the size of the resulting neural network is just bloat from opting for brute-force training (by those who have the resources to apply such force), and how much of it is a theoretical necessity, dictated by the amount of knowledge/information required to build an internal representation capable of the jaw-dropping results these networks accomplish. This is an active field of research, and hopefully further research will show that, while not everyone can buy a spaceship and fly to Mars, everyone will be able to download a lightweight neural network to their own PC, or even phone, and create amazing things with it that today we can only do by paying companies that have the resources to train and host these networks.
A big part of the AI revolution is due to the success of deep neural networks (if the concept is unfamiliar, there are many good, simple introductions on the Web, for example here), and neural networks are also the driving force behind the rapidly evolving field of AI art and image generation.
In a previous post, I mentioned two amazing text-to-image technologies developed by Google and OpenAI, namely Imagen and DALL-E 2, but there are more technologies and companies in this area.
In the open source space:
In the proprietary space:
The above list is by no means comprehensive - there are dozens more (most of them built using the latest open source technologies), and I expect that, with the freedom of information powered by open source, the number of commercial and open source solutions will continue to grow exponentially.
Note: The original post that was meant to be here took three hours to write and then was accidentally deleted 🤦. This new post won't try to recover the full scope of the original, because there is a lot of other work to do here at FixPix, but hopefully it retains the main points of that magnificent post that is now in blog heaven...
The Internet began as a distributed network. This means that, unlike in a centralized system, no one person or entity owned or controlled the Internet. Everyone had the freedom to add their own knowledge and to share their own ideas or resources with anyone else on the network. Having no central control over what people did on the Internet did not make it devolve into chaos; instead, it evolved into one of the most amazing phenomena in the history of mankind and enabled an exponential growth of freely accessible knowledge.
Of course, the soaring popularity of the Internet was quickly leveraged by commercial entities, and it also enabled business models built on centralized services, giving companies control over who gets to do what with the services they offer.
Software is created using programming languages. There are many programming languages software developers can choose from, but computers don't actually understand programming languages; they can only run machine code. Programming languages are made to be understood by humans, and programs written in them are known as "source code". This source code is translated into machine code, which your computer can then run.
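As a rough analogy (Word itself is written in C++, not Python, and Python translates source into bytecode for its own virtual machine rather than into native machine code), this minimal sketch shows one line of human-readable source being turned into a form intended for the machine rather than for people:

```python
import dis

# Human-readable source code, held here as an ordinary string.
source = "total = price * quantity"

# compile() translates the source into a code object; co_code holds the
# raw bytecode, a sequence of bytes that is not meant for human eyes.
code = compile(source, "<example>", "exec")
print(code.co_code)

# dis prints the same bytecode as a list of low-level instructions.
dis.dis(code)
```

The exact bytes and instruction names vary between Python versions, which is itself a reminder that the translated form is an implementation detail that people aren't expected to read or edit.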
For example, the famous Microsoft Word application is written in a programming language called C++. However, if you look inside the Word application on your computer, you won't find the C++ code it was written in, just illegible machine code. This is because Microsoft does not publish the source code for Word and only provides you with the machine code that lets you run it.
This makes sense, as Microsoft invested a lot of money and time in developing Word, and giving away the source code would mean anyone could instantly create their own version of the software, add to it and sell it.
And yet, over the years, various "open source" movements and initiatives gained popularity. Open source means developers publish the source code of the applications they develop. This gives other developers the ability to both fix and extend the original application, and indeed that is what happened.
The popularity of open source has soared, and today hundreds of thousands of open source projects exist, many of them as good as, if not better than, their closed source counterparts.
We at FixPix are aware of the pros and cons of centralized vs distributed systems and open vs. closed source. While freedom comes with a price (mainly its exploitation by bad actors) we firmly believe that freedom is what drives abundance and flourishing. As such we strive to provide more freedom to anyone who wants to use the software:
FixPix is currently a SaaS (Software as a Service) product. This is a popular business model on the Internet where you basically rent the software instead of buying it. This can be a recurring payment or a pay-per-use model, but either way, continued use requires continued payment. A different model is one where you purchase the software with a one-time payment, download it to your computer and then use it as much as you like. Soon we will be releasing a standalone version of the software that you can purchase: you pay once for the features and use them as much as you like afterwards. Buying the software has other advantages too, such as using the power of your own computer for faster processing times, and being able to use the software even without an Internet connection. There is another exciting freedom-related feature that will be in the standalone version of the product, but that will be left for a later post...
A lot of software development work went into making FixPix a reality. However, the variety of cutting-edge AI technologies that FixPix makes accessible is made possible by the contributions of developers who have made their technologies open source. To show our great appreciation and respect for these developers, FixPix will credit the authors of the open source technologies it leverages (even if their open source license does not require it). We'll also strive to make our own contribution to the open source movement, and this blog will point out amazing open source projects.
Get ready for a celebration of choices on how you want to use FixPix with our soon to be released downloadable software.
The popularity of open source enables us to quickly add more exciting features (and tell you who authored them).
Always backup your blog posts...
This marks the beginning of history for FixPix 😃
FixPix was designed from the start to be a place where everyone can learn, hands-on, what amazing things can be done with images using the latest AI technologies. Until very recently, things like OpenAI's DALL-E 2 and Google's Imagen would have been considered pure science fiction. Seriously, who would have believed that in their lifetime they could give a computer an elaborate textual description of a picture and, within seconds, get an image that a graphic artist might need hours to create?
But these two amazing examples are just some of the jaw-dropping things AI can do today, and it's anyone's guess what it will be able to do tomorrow, along with the technological, social, ethical and even philosophical implications...
The AI landscape in general and image AI in particular is evolving at an incredible rate, and seems to only be accelerating. This blog will contain updates on the hottest news in the field, explanations about various technologies and new features in FixPix that will enable you to easily run these technologies and be amazed at what they can do.