The Dynamics of Artificial Minds: Movement, Creativity, and Open-Ended Innovation

Colours School - Institut Pascal, June 2025


Denise Lanzieri

slides at denise-lanzieri-csl.github.io/colours_school/

Sony Computer Science Laboratories

  • Paris

    1996

  • Rome

    2021

CSL Map
  • Tokyo

    1988

  • Kyoto

    2020

Sony CSL is a framework to make the wildest ideas come true, for the future of humanity and the planet

Hiroaki Kitano, Sony CTO, President and CEO, Sony CSL

Sony CSL Logo

Research Lines



Infosphere Icon

Infosphere

Tackling the challenge of redesigning Information Technologies to make information more accessible and social dialogue more transparent, understandable, and healthy.

Sustainable Cities Icon

Sustainable Cities

Aiming to provide new tools for understanding and monitoring urban environments in order to make them more sustainable.

Augmented Creativity Icon

Augmented Creativity

Studying the ability of AI to understand the complexity of open-ended systems, to support creativity, and to help people find original, brilliant, and innovative solutions.

Illustration by Fernando Cobelo

Augmented Creativity Team

    Our mission is twofold:

    • Theoretical side: Use AI to discover original solutions to complex problems—fostering true creativity at the foundations.

    • Applicative side: Apply AI to augment human creativity in real-world contexts—empowering creators and innovators.

    In this presentation, we will explore three exciting activities:

    1. S+T+ARTS Projects: Our role in a European initiative at the intersection of science, technology, and the arts.

    2. Movement Models: Developing AI models inspired by Large Language Models to learn and generate human movement.

    3. CodeFest: A multi-week coding challenge series leveraging patent data to solve real-world problems.

S+T+ARTS Projects

S+T+ARTS


    Science, Technology and Arts for the main sustainability challenges

  • S+T+ARTS AIR

    • Michail Rybakov $\to$ Measuring Personal Space in Cities: Developed the KI/s (Kinesphere Infringement per Second) metric to quantify crowd density effects on personal space using GPS data, real-world observations, and simulations.

    • Filippo Gregoretti $\to$ Moral Values & Toxicity in Art: Exploring the transformation of moral values and toxicity into emotional audiovisual experiences generated by an emotional AI.

  • S+T+ARTS Buen-TEK: Explores how indigenous knowledge and advanced technologies can work together to solve problems like climate change and pollution, and help create a stronger, more sustainable future.

Movement model

(Large) Movement Model




  • Analogy with LLMs: words $\Longrightarrow$ postures, phrases $\Longrightarrow$ movements

  • Posture quantization

  • Anticipation, prediction and comprehension of human movements

  • Smart Mirror: supporting performers' creativity by learning their individual creative style and suggesting new solutions on the fly

  • Roadmap: (diagram)

    Part I: Data Collection Pipeline





    $\Longrightarrow$ Customized workshop to bootstrap our LMM

    $\Longrightarrow$ Approaches based on deep learning to estimate human pose $\to$ sVision: Designed to identify and track human body poses in real-time.
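    sVision itself is an in-house system, but the kind of representation such a pose estimator produces can be sketched with a hypothetical normalization step: the detected 2D keypoints are centered on the pelvis and scaled by torso length so postures become translation- and scale-invariant (the joint indices below are assumed purely for illustration):

```python
import numpy as np

def normalize_pose(keypoints):
    """Center a pose on the hip midpoint and scale by torso length.

    keypoints: array of shape (n_joints, 2) with 2D (x, y) coordinates.
    Joint indices are hypothetical: 0 = left hip, 1 = right hip, 2 = neck.
    """
    kp = np.asarray(keypoints, dtype=float)
    hip_center = (kp[0] + kp[1]) / 2.0          # pelvis midpoint
    torso = np.linalg.norm(kp[2] - hip_center)  # neck-to-pelvis distance
    return (kp - hip_center) / torso            # translation/scale invariant

# Example: a toy 3-joint skeleton
pose = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
norm = normalize_pose(pose)
```

    After this step, the same posture recorded at different distances from the camera maps to the same vector, which is what the downstream quantization assumes.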

    Roadmap: (diagram)

    Part II: Construction of the AI-based model





    $\Longrightarrow$ Selecting an appropriate deep neural network architecture to generate dance movements
    
    								from tensorflow.keras import activations
    								from tensorflow.keras.layers import Dense, LSTM
    								from tensorflow.keras.models import Sequential
    								from tensorflow.keras.optimizers import Adam

    								# Training hyperparameters
    								loss = 'mean_squared_error'
    								batch_size = 8
    								epochs = 2000
    								lr = 0.01
    								# Number of units (neurons) in each LSTM layer
    								n_units = 1000

    								# predictors_train and moves_vocab_size come from the data
    								# pipeline: sequences of quantized postures, encoded over
    								# the posture vocabulary.
    								model = Sequential()
    								# First LSTM layer: consumes (timesteps, vocab_size) inputs
    								# and returns the full sequence for the next recurrent layer
    								model.add(LSTM(units=n_units,
    								    input_shape=(predictors_train.shape[1], moves_vocab_size),
    								    return_sequences=True, activation=activations.tanh))
    								# Second LSTM layer with the same parameters
    								model.add(LSTM(units=n_units, return_sequences=True,
    								    activation=activations.tanh))
    								# Third LSTM layer: returns only the last output of the sequence
    								model.add(LSTM(n_units, activation=activations.tanh))
    								# Dense output layer over the posture vocabulary,
    								# with a sigmoid activation function
    								model.add(Dense(moves_vocab_size, activation='sigmoid'))
    								# Compile with the hyperparameters defined above
    								model.compile(loss=loss, optimizer=Adam(learning_rate=lr))

    Deterministic Baseline

    In my initial exploration and model-building phases, I adopted an LSTM architecture:
    • The LSTM is designed to process:

      • Each frame’s human position
      • Each frame’s acoustic features

      $\Longrightarrow$ Outputs: pose sequences

    • I am also testing other networks, including Transformers, to evaluate their performance and accuracy

    Posture distribution at time T+1

    Vector Quantization (VQ) with Self-Organizing Maps (SOM):

    Probabilistic model
    1. The model takes as input movements from videos, captured as a sequence of body positions over time

    2. The model maps each data point to the node whose weight vector is closest to the input vector

    3. A finite sequence of winning neurons then represents the body movement

    4. $\Longrightarrow$ The SOM can be visualized as a "vocabulary" of human positions, where each neuron corresponds to a specific posture
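    The quantization steps above can be sketched as follows, assuming a toy SOM whose node weights are already trained (here random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained SOM: 16 nodes, each holding a codebook posture
# vector of dimension 6 (e.g. 3 joints x 2 coordinates).
som_nodes = rng.normal(size=(16, 6))

def winning_neuron(posture, nodes):
    """Map a posture to the index of the closest SOM node (best matching unit)."""
    dists = np.linalg.norm(nodes - posture, axis=1)
    return int(np.argmin(dists))

# A movement = sequence of postures -> sequence of winning-neuron indices,
# i.e. a "sentence" over the posture vocabulary.
movement = rng.normal(size=(5, 6))
tokens = [winning_neuron(p, som_nodes) for p in movement]
```

    This is exactly the words $\Longrightarrow$ postures analogy: once a movement is a sequence of discrete tokens, LLM-style sequence models apply directly.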

    Vector Quantization with SOM:

    Next step? Integrating Novelties in Deep Learning Systems

    The idea: a training algorithm inspired by Stuart Kauffman’s concept of the “adjacent possible”, used to explore new data spaces
    • The Dreaming Learning Algorithm:

      • Initial training of a probabilistic network with Vector Quantization

      • Dreaming Learning step: the network generates a new synthetic sequence

      • The network is trained again using the synthetically generated sequences
    $\to$ Enhance human creativity through dynamic human-machine interaction, fostering an evolving artistic partnership during performances.
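    The train/dream/retrain loop above can be sketched with a toy Markov model standing in for the probabilistic network (the real system operates on quantized postures; all names and the bigram model here are illustrative):

```python
import random

random.seed(0)

def train(counts, sequence):
    """Accumulate bigram counts from a token sequence (toy stand-in for the network)."""
    for a, b in zip(sequence, sequence[1:]):
        counts.setdefault(a, {}).setdefault(b, 0)
        counts[a][b] += 1
    return counts

def dream(counts, start, length):
    """Generate a synthetic sequence by sampling from the learned transitions."""
    seq = [start]
    for _ in range(length - 1):
        nxt = counts.get(seq[-1])
        if not nxt:
            break
        tokens, weights = zip(*nxt.items())
        seq.append(random.choices(tokens, weights=weights)[0])
    return seq

# 1) Initial training on observed posture tokens
model = train({}, [0, 1, 2, 1, 2, 3])
# 2) Dreaming step: the model generates a new synthetic sequence
dreamed = dream(model, start=0, length=6)
# 3) Retrain on the dreamed sequence, nudging the model toward novel regions
model = train(model, dreamed)
```

    The key design point is step 3: the model's own generations re-enter the training set, so over many iterations it can drift into regions of the data space it never observed directly.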

    Artistic experiment


    CodeFest Spring 2025 on classifying patent data

    Classifying Critical Raw Materials in Patents with LLMs


    • Scalable and fully reproducible methodology to map the role of Critical Raw Materials (CRMs) in patent-driven innovation and their alignment with the UN Sustainable Development Goals.

    • CRM-related patents are classified by the functional role of each material—use, refinement, recycling, or removal—linking these roles to broader trends in innovation, technology, and sustainability transitions.

    • My contribution:
      • Stage 1: Train a masked language model (MLM) based on a pre-trained architecture (e.g., a BERT model for chemistry) on a large corpus of unlabeled patent abstracts
      • Stage 2: Load the domain-specific model from Stage 1 and fine-tune it on a manually annotated dataset of CRM-related patent abstracts
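    The masked-language-modelling objective used in the first stage can be illustrated with a toy masking function (simplified: real BERT pre-training also leaves some selected tokens unchanged or replaces them with random tokens; the example abstract is made up):

```python
import random

random.seed(1)

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """BERT-style masking: hide a fraction of tokens so the model
    learns to reconstruct them from context (toy version)."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)   # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)  # not scored in the loss
    return masked, labels

abstract = "a cathode comprising lithium cobalt oxide".split()
masked, labels = mask_tokens(abstract)
```

    Training on millions of such (masked, label) pairs is what adapts a generic BERT to patent vocabulary before the supervised fine-tuning stage.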

    General Conclusion



    As a former cosmologist, I never imagined I'd be working with dancers, artists, or urbanists.
    • But the mindset we develop during a PhD — critical thinking, abstraction, modeling, coding, resilience — is highly transferable.

    • You’re not locked into one domain. Your skills can help shape new fields and solve unexpected challenges.

    • Embrace uncertainty. Follow curiosity. That’s where real innovation begins.
    • Your training isn’t just about becoming an expert — it’s about learning how to explore the unknown.

    APPENDIX

    How to overcome motion capture limitations?

    • Approaches based on deep learning to estimate human pose

    • sVision: Designed to identify and track human body poses in real-time.

      • It works by detecting key points or landmarks on the human body, such as joints and other anatomical features.

      • These landmarks are then used to estimate the overall body pose, including the positions and orientations of body parts.

      • Simple setup, flexible, non-intrusive, portable

    Main Challenges of Pose Detection

    (Illustration of the depth ambiguity (Li and Lee, 2019))
    (Erroneous predictions due to self-occlusion (Shin and Halilaj, 2020))
    • 3D poses $\Longrightarrow$ Using 2D joints to recover a 3D pose becomes an ill-defined problem as one 2D skeleton may correspond to many varied 3D poses.


    • Self-occlusion and dependence on the camera viewpoint: the method might fail when parts of a person’s body obscure other parts

  • We need to evaluate the efficacy of video-based position extraction and identify systematic errors, improving the reliability of the capture system.
  • Towards a robust pipeline: (diagram)

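    One standard way to quantify such systematic errors is the mean per-joint position error (MPJPE) of the video-based poses against a reference capture; a minimal sketch (the 5 cm depth bias below is a made-up example):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth joint positions, a standard
    pose-estimation metric."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

gt = np.zeros((4, 3))     # 4 joints, 3D reference positions (e.g. from mocap)
pred = np.zeros((4, 3))
pred[:, 2] = 0.05         # hypothetical systematic 5 cm depth bias
error = mpjpe(pred, gt)   # equals 0.05 here
```

    Tracking this metric per joint and per camera angle is one way to separate systematic biases (e.g. depth ambiguity) from random noise.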
    Future prospects: Extensions to general applications




    • Sport coaching and training

    • Physical rehabilitation

    • Speech-language disorders related to facial expressions

    • Micro-expressions detection

    • Improving human-like motions in robotic systems (movement fine-tuning)

    • Human-computer interface based on gestures and movements

    Methodology


    1. Semantic matching between CPC classes and SDGs
      We compute cosine similarity between CPC subclass titles and SDG targets using pre-trained sentence transformer models, establishing a conceptual bridge between patent classification and sustainability objectives.

    2. Keyword search of CRMs in patent abstracts
      We identify CRM mentions using a curated set of keywords and element symbols, filtered to minimise false positives and aligned with the latest EU CRM list (2023).

    3. Functional classification of CRM roles
      CRM–patent pairs are categorised into five functional roles: use, refine, recycle, remove, or wrong (non-functional mention). This classification reflects distinct innovation strategies across the CRM lifecycle.

    4. Fine-tuned LLM classification
      A BERT-based language model, adapted to the patent domain and fine-tuned on over 11,000 labelled examples, classifies CRM functions with 94% accuracy, outperforming rule-based approaches and enabling functional interpretation at scale.
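    Step 1 of the methodology reduces to a cosine similarity between embedding vectors; a minimal sketch with tiny made-up vectors standing in for the sentence-transformer outputs (real embeddings have a few hundred dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings of a CPC subclass title and an SDG target
cpc_title_emb = [0.9, 0.1, 0.0]   # e.g. a battery-related CPC title
sdg_target_emb = [0.8, 0.2, 0.1]  # e.g. a clean-energy SDG target
score = cosine_similarity(cpc_title_emb, sdg_target_emb)
```

    Thresholding this score over all CPC-SDG pairs yields the conceptual bridge between the patent classification and the sustainability objectives.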