FAQ
I received a mail informing me that the password of my computing account will expire in a few days. How can I change my password?
You can go to the Identity Management Portal to change your password.
I forgot my password, can I change it myself?
Yes, you can go to the Identity Management Portal to change it.
I launched my job a few minutes/hours ago and it's still pending. Can you have a look?
It's possible that the computing farm is full, in which case you will have to wait. Note that the resources are shared with groups other than Euclid, so the farm can be full even when Euclid activity is low. A few tips in this situation:
- Optimization is key. Always try to optimize your code and request only the resources you need. In particular, the more memory you request, the longer your job will take to start running.
- You can check why your jobs don't start with `squeue -j jobId` (or with `squeue -u youruser`). There are many reasons why your job may be pending (all reasons here):
  - Classic
    - Resources: the resources requested by the job are not available
    - Priority: one or more higher-priority jobs exist for the partition associated with the job or for the advanced reservation
    - BeginTime: the job's earliest start time has not yet been reached
    - Dependency: this job has a dependency on another job that has not been satisfied
  - Limits
    - AssocGrp*Limit: the job's association has reached an aggregate limit
    - AssocMax*Limit: a portion of the job request exceeds a maximum limit (e.g., PerJob, PerNode) for the requested association
  - Blocking reason (contact us)
    - BadConstraints: the job's constraints cannot be satisfied
    - DependencyNeverSatisfied: this job has a dependency on another job that will never be satisfied
    - QOSMaxMemoryPerJob: the memory request exceeds the maximum each job is allowed to use for the requested QOS
- If you want to launch a small test, you can use the `flash` partition by adding this option to your sbatch command: `--partition=flash`. This partition is limited to 1 hour and 10 simultaneous jobs per user.
- Verify that your pilots are suitable for your jobs; more information on the pilot configuration here.
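The grouping of pending reasons above can be sketched as a small Python helper. This is a minimal sketch rather than an official tool: the names `classify_reason` and `pending_reason` and the group labels are hypothetical, and it assumes that `squeue -h -o %r` prints only the reason column for the requested job.

```python
import fnmatch
import subprocess

# Reason patterns copied from the list above; "*" is a wildcard,
# so "AssocGrp*Limit" also matches e.g. "AssocGrpCpuLimit".
REASON_GROUPS = {
    "classic": ["Resources", "Priority", "BeginTime", "Dependency"],
    "limits": ["AssocGrp*Limit", "AssocMax*Limit"],
    "blocking": ["BadConstraints", "DependencyNeverSatisfied", "QOSMaxMemoryPerJob"],
}


def classify_reason(reason: str) -> str:
    """Return the group of a pending reason, or "other" if it is not listed."""
    for group, patterns in REASON_GROUPS.items():
        if any(fnmatch.fnmatchcase(reason, pattern) for pattern in patterns):
            return group
    return "other"


def pending_reason(job_id: str) -> str:
    """Ask squeue for the reason of one job (only works where Slurm is installed)."""
    result = subprocess.run(
        ["squeue", "-j", job_id, "-h", "-o", "%r"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

On an interactive server, `classify_reason(pending_reason("jobId"))` returning `"blocking"` would mean the job will not start on its own and that it is time to contact us.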
Would it be possible to increase my Jupyter memory?
Yes of course, you can ask us by mail or Slack. As a reminder, Jupyter memory for all Euclidians is 16 GB by default.
I can't connect anymore through ssh at CC. Is the CC down?
A few things to check:
- Try to connect to a specific node to check whether it's a general issue (cca020 or cca012, for instance).
- Make sure your network is open with `telnet cca.in2p3.fr 22`.
- Obviously, make sure you are using the right password. Note that your IP can be banned after several errors. Don't forget that you can use Kerberos to connect without typing your password.
- If you are connecting from a shared IP, keep in mind that authentication errors from other users can impact your connection.
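The `telnet` check above only tests whether TCP port 22 can be reached. Where `telnet` is not installed, the same check can be sketched in Python with the standard `socket` module. The function name `port_is_open` and the 5-second default timeout are illustrative choices, not part of any CC-IN2P3 tooling.

```python
import socket


def port_is_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host name and opens a TCP socket;
        # the "with" block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

If `port_is_open("cca.in2p3.fr")` returns `False`, the problem is at the network level rather than with your password.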
Can I use VSCode at CC-IN2P3?
Yes, you can. Here are a few tips:
- To solve some performance problems with VSCode on an interactive server, you can add the following to your config file `~/.vscode-server/data/Machine/settings.json`. Of course, you can add directories to the list or modify it. And don't forget the comma at the end of each line (except the last one)!
{
    "files.watcherExclude": {
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        "**/node_modules/*/**": true,
        "**/.cache/**": true,
        "**/.conda/**": true,
        "**/.local/**": true,
        "**/.nextflow/**": true
    }
}
- Be careful not to use VSCode with a directory under `/sps/euclid` as the working directory. When VSCode starts, it runs `git status -z -uall` from the current directory. If that directory contains many files or many GiB/TiB of data, this operation is very expensive. We recommend that you modify this option in your VSCode config or launch VSCode from a working directory that contains only a few files.
- We recommend disabling `ripgrep` (or configuring it to be less aggressive) to avoid putting too much load on the interactive server; more information here.
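The `files.watcherExclude` tip above can also be applied with a short script, which avoids comma mistakes when editing the file by hand. This is a minimal sketch under a few assumptions: the function name `add_watcher_excludes` is hypothetical, the pattern list is copied from this FAQ, and the settings file is assumed to be plain JSON (a file containing comments would be rejected by `json.loads`).

```python
import json
from pathlib import Path

# Patterns taken from the settings example in this FAQ.
EXCLUDES = (
    "**/.git/objects/**",
    "**/.git/subtree-cache/**",
    "**/node_modules/*/**",
    "**/.cache/**",
    "**/.conda/**",
    "**/.local/**",
    "**/.nextflow/**",
)


def add_watcher_excludes(settings_path: Path, patterns=EXCLUDES) -> dict:
    """Merge the exclusion patterns into settings.json and return the result."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)
    watcher = settings.setdefault("files.watcherExclude", {})
    for pattern in patterns:
        # setdefault keeps any value the user has already chosen.
        watcher.setdefault(pattern, True)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return settings
```

A typical call would be `add_watcher_excludes(Path.home() / ".vscode-server/data/Machine/settings.json")`. Existing settings are preserved and only missing patterns are added.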
My PPO is stuck in ALLOCATED on DBView when I launch it on SDC-FR-DEV. Can you help me?
The PPO is probably not stuck; it has simply not been picked up by the IAL. The IAL of SDC-FR-DEV is only connected to the SDC-NL DPS environment, so if your PPO was ingested with the OPS environment, it is not picked up. You can relaunch it with the appropriate SDC project:
SDC | Environment | Project |
---|---|---|
SDC-FR-PROD | OPS | TEST or EUCLID |
SDC-FR-DEV | TEST | TEST or EUCLID |
You'll probably need to abort the PPO before relaunching it. You can abort it by ingesting an XML file with `ERun ST_Operations ST_CloneAndRunPPOs`.
<?xml version="1.0" ?>
<ns1:PpoControl xmlns:ns1="http://euclid.esa.org/schema/interfaces/sys/orc">
<Id>NIRH_ABORT-TEST1</Id>
<PpoId>TC-LE3-005001-CM-2PCF-WL-CS_Jackknife_PPO-020924-164109</PpoId>
<Status>NEW</Status>
<Action>ABORT</Action>
</ns1:PpoControl>
Don't forget to change the Id and the PpoId.
source /cvmfs/euclid.in2p3.fr/EDEN-3.1/bin/activate
ERun ST_Operations ST_CloneAndRunPPOs --ppos PPO.xml --sdcs SDC-FR-DEV --output-dir outputDir --username userId --password .. --project TEST --env test --suffix yoursuf
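Because the Id and the PpoId must be changed for every abort, the abort file can be generated rather than edited by hand. The sketch below fills the XML shown above from a template; the function name `build_abort_xml` and the example identifiers are hypothetical, and the layout is assumed to be exactly the one in this FAQ.

```python
from xml.sax.saxutils import escape

# Same structure as the abort file shown in this FAQ; only Id and PpoId vary.
ABORT_TEMPLATE = """<?xml version="1.0" ?>
<ns1:PpoControl xmlns:ns1="http://euclid.esa.org/schema/interfaces/sys/orc">
<Id>{control_id}</Id>
<PpoId>{ppo_id}</PpoId>
<Status>NEW</Status>
<Action>ABORT</Action>
</ns1:PpoControl>
"""


def build_abort_xml(control_id: str, ppo_id: str) -> str:
    """Return the abort document for one PPO, escaping XML special characters."""
    return ABORT_TEMPLATE.format(
        control_id=escape(control_id),
        ppo_id=escape(ppo_id),
    )
```

For example, writing `build_abort_xml("MY_ABORT-1", "my-ppo-id")` to `PPO.xml` produces a file that can be passed to the `--ppos` option of the command above.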